LinuxQuestions.org
Old 11-29-2015, 01:33 PM   #1
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Rep: Reputation: 46
Less steps?


Ladies & Gents

It only took me half a day to get this working (proof that I am learning).

But there has to be a better way to do this in bash. I find it kind of pointless to have wget fetch an htm file, save it, use w3m to parse it to text, save it again, read it into an array and grep it twice just to get a single address:port out of it, and then delete the two saved files. I wouldn't go to the trouble except that periodically the site has streaming issues and the address:port gets changed, which breaks my script. Then I have to go parse the file again by hand and change the hard-coded value in my script to the new data before it will work again. The worst part is I don't know it needs changing until after it has broken. By then it is too late and all the automation I have worked for is borked.

I did not have any luck trying to just grep the htm for the data, and according to what I have read regex does not work well on htm files, especially if what you are trying to do is very complicated.

What I find most irritating is that w3m will not translate htm to text without the file being saved locally first, or at least I have not figured out how to do it yet. man w3m has not helped there. If I try to redirect the http address instead of a local file it doesn't work.

Code:
#!/bin/bash

wget http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm
w3m -cols 40 < liveservice.htm > liveservice.txt
oldifs=($IFS)
IFS= 
read -a livestream <<<"$(grep FlashVars liveservice.txt | grep -Eo '(http|https)://[^/"]+')"
echo $livestream
IFS=($oldifs)
rm liveservice*
It just seems that there should be a way to get rid of 5 or 6 steps here.
 
Old 11-29-2015, 02:03 PM   #2
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792

Rep: Reputation: 331
Use curl and pipe the output to your greps:
Code:
curl -s http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars  | grep -Eo '(http|https)://[^/"]+'
 
Old 11-29-2015, 02:05 PM   #3
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
So after looking at man w3m some more I did manage to get some of the steps removed, but it still seems kludgy to me.

Code:
oldifs=$IFS
IFS= 
read -a livestream <<< $(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+')
echo $livestream
IFS=$oldifs
 
Old 11-29-2015, 04:59 PM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
I'm confused: why are you reading it as an array (read -a) and then echoing it as a scalar?

You can use the VAR=xx command form to set VAR just for the duration of that command. Also, I think redirecting from a process substitution is better than using $() to turn a stream into a string, and then <<< to turn the string back into a stream again.
Code:
IFS= read -a livestream < <(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+')
echo $livestream
# but shouldn't it be
echo "${livestream[@]}"
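For anyone following along, here is a minimal sketch of the two points above that runs without touching the site. The urls function is a made-up stand-in for the w3m | grep | grep pipeline, and mapfile is a suggestion that is not in the thread (it assumes bash 4 or later). It shows that the IFS= prefix does not leak past the read, and that read only ever consumes one line.

```shell
#!/bin/bash
# Hypothetical stand-in for the w3m | grep | grep pipeline: one URL per line.
urls() {
    printf '%s\n' 'http://s6.voscast.com:9464' 'http://www.macromedia.com' 'http://s6.voscast.com:9464'
}

before=$IFS

# The IFS= prefix applies to this one read only, so no save/restore is needed.
# read consumes a single line; with an empty IFS that line is not split,
# so the array ends up with exactly one element.
IFS= read -r -a livestream < <(urls)
first=${livestream[0]}

# mapfile (bash 4+) stores every line of the stream as its own element.
mapfile -t all < <(urls)

if [ "$IFS" = "$before" ]; then ifs_state=unchanged; else ifs_state=changed; fi
echo "first=$first count=${#all[@]} IFS=$ifs_state"
```

So if only the first address is wanted, a plain scalar read is enough. If every line is wanted, mapfile is probably the simpler tool.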
 
Old 11-30-2015, 05:00 AM   #5
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Thanks ntubski

Code:
w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+'
actually dumps 5 addresses. Four of them are identical and are the actual address of the stream, two on each side, and the one in the middle is the address of the player. I only need the stream address once to plug into my script and I don't need the player address at all.

You are right. Instead of
Code:
echo $livestream
# it should be 
echo ${livestream[1]}
# 1 being one of the actual references to the stream address
Code:
var=$(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+'))
or anything similar has always returned an error of some kind for me, or just not returned the data I need. In this case:
Quote:
bash: syntax error near unexpected token `)'
Trying again
Code:
var=$(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+')
returns
Code:
echo $var
http://s6.voscast.com:9464 http://s6.voscast.com:9464 http://www.macromedia.com http://s6.voscast.com:9464 http://s6.voscast.com:9464
So reading it into an array seemed like the easiest way, one that I understand, to get the data I need.
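As an aside, the array is not the only way to get a single stream address out of that output. A rough sketch, assuming the player link can always be recognised by the word macromedia (true of the output quoted above, but an assumption about future pages): split the string into lines, drop the player link, and collapse the duplicates.

```shell
#!/bin/sh
# The five-address output quoted above, as a single string.
var='http://s6.voscast.com:9464 http://s6.voscast.com:9464 http://www.macromedia.com http://s6.voscast.com:9464 http://s6.voscast.com:9464'

# One address per line, drop the player link, then collapse the duplicates.
stream=$(printf '%s\n' "$var" | tr ' ' '\n' | grep -v 'macromedia' | sort -u)
echo "$stream"
```

This does not depend on the addresses arriving in any particular order, which an index like [1] does.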
 
Old 11-30-2015, 10:34 AM   #6
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Quote:
Originally Posted by rbees View Post
Code:
var=$(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+'))
Extra paren?

Quote:
Code:
var=$(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+')
returns
Code:
echo $var
http://s6.voscast.com:9464 http://s6.voscast.com:9464 http://www.macromedia.com http://s6.voscast.com:9464 http://s6.voscast.com:9464
So reading it into an array seemed like the easiest way, one that I understand, to get the data I need.
Try adding | head -1:
Code:
w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm |grep FlashVars | grep -Eo '(http|https)://[^/"]+' | head -1
Probably the 2 greps and head can be combined into a single awk statement.
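One way the awk version might look, as a sketch only. The page variable holds made-up markup for illustration (the real page source may be laid out differently), so the script runs without fetching anything. awk picks the first line containing FlashVars, uses match() to find the first URL on it, prints it, and exits, which covers both greps and the head.

```shell
#!/bin/sh
# Hypothetical page source for illustration; the real markup may differ.
page='<p>Live service</p>
<param name="movie" value="http://www.macromedia.com/player.swf" />
<param name="FlashVars" value="file=http://s6.voscast.com:9464/;stream.nsv" />'

# /FlashVars/   replaces the first grep
# match()       replaces grep -Eo (first URL on the line only)
# exit          replaces head -1
stream=$(printf '%s\n' "$page" |
    awk '/FlashVars/ && match($0, "https?://[^/\"]+") {
             print substr($0, RSTART, RLENGTH)
             exit
         }')
echo "$stream"
```

In the real script, the printf would be replaced by the w3m -dump_source (or curl -s) command from earlier in the thread. Note that match() only sees the first URL on a line, so this relies on the stream address coming before any other URL on the FlashVars line.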
 
  

