[SOLVED] Fewer steps?

rbees · 11-29-2015, 01:33 PM

Ladies & Gents

It only took me a half a day to get this working < proof that I am learning

But there has to be a better way to do this in bash. I find it kind of pointless to have wget fetch an htm file, save it, use w3m to parse it to text, save it again, read it into an array and grep it twice just to get a single address : port out of it, and then delete the two saved files. I wouldn't go to the trouble except that periodically the site has streaming issues and the address : port gets changed, which breaks my script. Then I have to go parse the file again by hand and change the hard-coded value in my script to the new data before it will work again. The worst part is I don't know it needs changing until after it has broken. By then it is too late and all the automation I have worked for is borked.

I did not have any luck trying to just grep the htm for the data, and according to what I have read regex does not work well on htm files, especially if what you are trying to do is very complicated.

What I find most irritating is that w3m will not translate htm-to-text without the file being saved locally first, or at least I have not figured out how to do it yet. man w3m has not helped there. If I try to redirect the http address instead of a local file it doesn't work.

Code:

#!/bin/bash

wget http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm
w3m -cols 40 < liveservice.htm > liveservice.txt
oldifs=$IFS
IFS=
read -a livestream <<<"$(grep FlashVars liveservice.txt | grep -Eo '(http|https)://[^/"]+')"
echo "$livestream"
IFS=$oldifs
rm liveservice*

It just seems that there should be a way to get rid of 5 or 6 steps here.

norobro · 11-29-2015, 02:03 PM

Use curl and pipe the output to your grep's:

Code:

curl -s http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars  | grep -Eo '(http|https)://[^/"]+'
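The same pipeline can be wrapped in a small function and checked for an empty result, so the script notices when the page has changed instead of failing later. This is only a sketch: the function name first_flashvars_url and the sample HTML line are made up for illustration (the real page markup is not shown in this thread), and the curl call is left as a comment so the example runs without network access.

```shell
#!/bin/bash
# Read HTML on stdin and print the first URL found on a FlashVars line.
# (first_flashvars_url is a hypothetical helper name.)
first_flashvars_url() {
    grep FlashVars | grep -Eo 'https?://[^/"]+' | head -n 1
}

# Against the live page this would be something like:
#   url=$(curl -fsS "$page_url" | first_flashvars_url)
# Here a made-up sample line stands in for the downloaded page.
sample='<embed FlashVars="url=http://s6.voscast.com:9464/;stream" pluginspage="http://www.macromedia.com/go/getflashplayer">'
url=$(printf '%s\n' "$sample" | first_flashvars_url)

if [ -z "$url" ]; then
    # Nothing matched: report it rather than carrying on with an empty value.
    echo "no stream address found; the page layout may have changed" >&2
else
    echo "$url"
fi
```

The -f option makes curl return a non-zero status on HTTP errors, which a calling script can also test for.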

rbees · 11-29-2015, 02:05 PM

So after looking at man w3m some more I did manage to get some of the steps removed, but it still seems kludgy to me.

Code:

oldifs=$IFS
IFS= 
read -a livestream <<< $(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+')
echo $livestream
IFS=$oldifs

ntubski · 11-29-2015, 04:59 PM

I'm confused: why are you reading it as an array (read -a) and then echoing it as a scalar?

You can use the VAR=xx command form to set VAR only for the duration of that command, so there is no need to save and restore IFS. Also, I think redirecting from a process substitution is better than using $() to turn a stream into a string, and then <<< to turn the string back into a stream again.

Code:

IFS= read -a livestream < <(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+')
echo $livestream
# but shouldn't it be
echo "${livestream[@]}"
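Both points can be tried without touching the network. The sketch below uses stand-in sample lines instead of the w3m output, and it swaps in mapfile (a bash builtin that stores one line per array element) in place of read -a, which is a different technique from the command above. The variable names scheme and rest are made up for the demonstration.

```shell
#!/bin/bash
# Stand-in for the output of the grep pipeline, one address per line.
sample='http://s6.voscast.com:9464
http://s6.voscast.com:9464
http://www.macromedia.com'

# mapfile reads one array element per input line; -t strips the newline.
mapfile -t livestream < <(printf '%s\n' "$sample")

echo "count:  ${#livestream[@]}"
echo "scalar: $livestream"          # a bare array name means element 0
echo "all:    ${livestream[*]}"

# An assignment written in front of a command applies to that command only,
# so IFS does not need to be saved and restored around the read.
IFS=: read -r scheme rest <<< "${livestream[0]}"
echo "scheme: $scheme"
printf 'IFS afterwards: %q\n' "$IFS"
```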

rbees · 11-30-2015, 05:00 AM

Thanks ntubski

Code:

w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+'

actually dumps 5 addresses. Four of them are identical and are the actual address of the stream; the fifth, in the middle with two stream addresses on each side, is the address of the player. I only need the stream address once to plug into my script and I don't need the player address at all.

You are right. Instead of

Code:

echo $livestream
# it should be 
echo ${livestream[1]}
# 1 being one of the actual references to the stream address

Code:

var=$(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+'))

or anything similar has always returned an error of some kind for me, or just not returned the data I need. In this case

Quote:

bash: syntax error near unexpected token `)'

Trying again

Code:

var=$(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm |grep FlashVars | grep -Eo '(http|https)://[^/"]+')

returns

Code:

echo $var
http://s6.voscast.com:9464 http://s6.voscast.com:9464 http://www.macromedia.com http://s6.voscast.com:9464 http://s6.voscast.com:9464

So reading it into an array seemed like the easiest way, one that I understand, to get the data I need.

ntubski · 11-30-2015, 10:34 AM

Quote:

Originally Posted by rbees

Code:

var=$(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm | grep FlashVars | grep -Eo '(http|https)://[^/"]+'))

Extra paren?

Quote:

Code:

var=$(w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm |grep FlashVars | grep -Eo '(http|https)://[^/"]+')

returns

Code:

echo $var
http://s6.voscast.com:9464 http://s6.voscast.com:9464 http://www.macromedia.com http://s6.voscast.com:9464 http://s6.voscast.com:9464

So reading it into an array seemed like the easiest way, one that I understand, to get the data I need.

Try adding | head -1:

Code:

w3m -dump_source http://www.shaareyzedek.mb.ca/service/serviceslive/liveservice.htm |grep FlashVars | grep -Eo '(http|https)://[^/"]+' | head -1

Probably the 2 greps and head can be combined into a single awk statement.
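One possible way to do that, as a sketch rather than a tested replacement for the live page: awk's match() sets RSTART and RLENGTH for the first match on a line, and exit stops after the first hit, which stands in for head -1. The function name first_url_awk and the sample HTML line are made up for illustration.

```shell
#!/bin/bash
# Read HTML on stdin; on the first FlashVars line that contains a URL,
# print the first scheme://host[:port] match and stop.
# (first_url_awk is a hypothetical helper name.)
first_url_awk() {
    awk '/FlashVars/ && match($0, "https?://[^/\"]+") {
        print substr($0, RSTART, RLENGTH)   # the matched text only
        exit                                # same effect as | head -1
    }'
}

# Made-up sample input; against the real page the function would be fed by
#   w3m -dump_source "$page_url" | first_url_awk
sample='<p>intro</p>
<embed FlashVars="url=http://s6.voscast.com:9464/;stream" pluginspage="http://www.macromedia.com/go/getflashplayer">'

printf '%s\n' "$sample" | first_url_awk
```

The regular expression is passed as a string rather than between slashes so that the / characters in the pattern do not need escaping.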