LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
12-05-2011, 02:32 AM #1
ted_chou12
Member
 
Registered: Aug 2010
Location: Zhongli, Taoyuan
Distribution: slackware, windows, debian (armv4l GNU/Linux)
Posts: 431
Blog Entries: 32

Reputation: 3
How can I use sed to match this?


Hi, how can I use sed to match this text string:
Code:
<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_%number%" onmouseover="showMenu(%text%)">

<a href="%variable%" target="_blank">%string%.torrent</a>

<em class="xg1">
Hi, the variable parts are represented by %number%, %text%, %variable% and %string%, and they can be any length. %number% is only a string of digits, like 21414, whereas %text% and %string% can be any string of letters or digits. The parts I want to extract are %variable% and %string%, separately.

Code:
array=(echo "$string" | sed '<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_[0-9]" onmouseover="showMenu\((*.)\)">

<a href="$1" target="_blank">$2.torrent</a>

<em class="xg1">')

for $vars in $array ..
this part is ignored.
Would it be something like this?
Thanks,
Ted

Last edited by ted_chou12; 12-05-2011 at 02:35 AM.
 
12-05-2011, 02:47 AM #2
jhwilliams
Senior Member
 
Registered: Apr 2007
Location: Portland, OR
Distribution: Debian, Android, LFS
Posts: 1,168

Reputation: 211
Just match the <a> tag, then.

Code:
cat foo.html | sed -r 's@<a href="(.*)" .*>(.*).torrent</a>@variable=\1, string=\2@'
Explanation: the sed command has search and replace parts, broken up by @ chars.

Search looks for
<a href="(.*)" .*>(.*).torrent</a>

This saves the href target and the contents of the a tag itself, by using the parentheses.

Next, the replace statement references those matches in order as \1 and \2.
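The command above has no `-n` option and no `p` flag, so sed prints every input line, rewritten or not. A minimal sketch of the same substitution that prints only the rewritten lines, assuming GNU sed (for `-E`): `-n` turns off sed's automatic printing, the trailing `p` prints a line only when the substitution succeeded, and `[^"]*` stops the first group at the closing quote of the href instead of letting a greedy `.*` run further on lines with more attributes. The function name `extract_links` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read HTML on stdin and print "variable=..., string=..."
# only for lines containing an <a> tag whose text ends in ".torrent".
extract_links() {
  # -n: no automatic printing; p: print only lines where s/// matched.
  # [^"]* stops at the closing quote of href; \. matches a literal dot.
  sed -nE 's@.*<a href="([^"]*)"[^>]*>(.*)\.torrent</a>.*@variable=\1, string=\2@p'
}

# Example with one line from the page in this thread plus a non-matching line:
printf '%s\n' \
  '<p>no link here</p>' \
  '<a href="forum.php?mod=attachment&amp;aid=NjUzOTF8OGM1YTM3MGF8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][OAD][848x480][BIG5].rmvb.torrent</a>' |
  extract_links
# prints: variable=forum.php?mod=attachment&amp;aid=NjUzOTF8OGM1YTM3MGF8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D, string=[DMG][Mirai nikki][OAD][848x480][BIG5].rmvb
```

Run on the saved page, this would be `extract_links < foo.html`.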

Last edited by jhwilliams; 12-05-2011 at 02:52 AM.
 
1 member found this post helpful.
12-05-2011, 03:05 AM #3
ted_chou12
Member
 
Registered: Aug 2010
Location: Zhongli, Taoyuan
Distribution: slackware, windows, debian (armv4l GNU/Linux)
Posts: 431

Original Poster
Blog Entries: 32

Reputation: 3
Hi, thanks, can you explain how I could extract $1 and $2?
I tried
Code:
echo $(cat foo.html | sed -r 's@<a href="(.*)" .*>(.*).torrent</a>@variable=\1, string=\2@')
It seems to output the whole page a lot of times.
Thanks,
Ted
 
12-05-2011, 03:15 AM #4
jhwilliams
Senior Member
 
Registered: Apr 2007
Location: Portland, OR
Distribution: Debian, Android, LFS
Posts: 1,168

Reputation: 211
Oh right, right. Try grepping for the <a href line first.
 
1 member found this post helpful.
12-05-2011, 03:40 AM #5
ted_chou12
Member
 
Registered: Aug 2010
Location: Zhongli, Taoyuan
Distribution: slackware, windows, debian (armv4l GNU/Linux)
Posts: 431

Original Poster
Blog Entries: 32

Reputation: 3
Hi, would that be this:
Code:
echo $(cat "aa.html" | grep "^<a href=" |sed -r 's@<a href="(.*)" .*>(.*).torrent</a>@variable=\1, string=\2@')
Thanks,
Ted
 
12-05-2011, 03:45 AM #6
jhwilliams
Senior Member
 
Registered: Apr 2007
Location: Portland, OR
Distribution: Debian, Android, LFS
Posts: 1,168

Reputation: 211
That's close, but your grep match isn't what you want. As it is, you're looking for lines that start with <a href=". There will probably be other stuff before you hit the <a> tag. So, maybe just remove the ^. Or, account for whatever you expect to find before the tag.
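A sketch of the grep-then-sed pipeline, assuming GNU grep and GNU sed. `grep -F` keeps only lines containing the literal text `.torrent</a>` wherever it appears on the line, so no `^` anchor is involved. The command substitution is also quoted: the unquoted `echo $(...)` form used earlier joins all results onto one line, because word splitting turns each newline into a separator and echo prints the words separated by single spaces. The function name `list_torrent_links` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: keep only torrent-link lines, then rewrite them.
list_torrent_links() {
  # -F makes grep treat the pattern as a fixed string, so "." is literal.
  grep -F '.torrent</a>' |
    sed -E 's@.*<a href="([^"]*)"[^>]*>(.*)\.torrent</a>.*@variable=\1, string=\2@'
}

# Quoting "$( ... )" keeps one result per line; an unquoted $( ... )
# is word-split and echo then prints everything on a single line.
result="$(printf '%s\n' \
  '<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />' \
  '<a href="a.php?aid=1" target="_blank">ep01.torrent</a>' \
  '<a href="a.php?aid=2" target="_blank">ep02.torrent</a>' | list_torrent_links)"
echo "$result"
```

The `<img>` line is dropped by grep because it contains `torrent.gif`, not `.torrent</a>`.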
 
1 member found this post helpful.
12-05-2011, 03:56 AM #7
ted_chou12
Member
 
Registered: Aug 2010
Location: Zhongli, Taoyuan
Distribution: slackware, windows, debian (armv4l GNU/Linux)
Posts: 431

Original Poster
Blog Entries: 32

Reputation: 3
Hi, the output is
Quote:
XXXTOP Part of HTML
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
variable
variable
variable
variable,text
XXXXXXXXXXXXXXBottom part of HTML
So it isn't matching everything, or at least it isn't extracting just the parts I want. I am guessing that is because I have several to extract within one page. How would I go about doing this?
Thanks,
Ted
 
12-05-2011, 04:48 AM #8
ted_chou12
Member
 
Registered: Aug 2010
Location: Zhongli, Taoyuan
Distribution: slackware, windows, debian (armv4l GNU/Linux)
Posts: 431

Original Poster
Blog Entries: 32

Reputation: 3
The page I want to extract from is:
Code:
<div id="wp" class="wp"><script type="text/javascript">var fid = parseInt('108'), tid = parseInt('147178');</script>

<script src="static/js/forum_viewthread.js?6vP" type="text/javascript"></script>
<script type="text/javascript">zoomstatus = parseInt(1);var imagemaxwidth = '750';var aimgcount = new Array();</script>

<style id="diy_style" type="text/css"></style>
<!--[diy=diynavtop]--><div id="diynavtop" class="area"></div><!--[/diy]-->
<div id="pt" class="bm cl">
<div class="z">
<a href="./" class="nvhm" title=""></a> <em>&rsaquo;</em> <a href="forum.php"></a> <em>&rsaquo;</em> <a href="forum.php?gid=107"></a> <em>&rsaquo;</em> <a href="forum.php?mod=forumdisplay&fid=108&page=1"></a> <em>&rsaquo;</em> <a href="forum.php?mod=viewthread&amp;tid=147178">[5/12] [...</a>
</div>
</div>

<style id="diy_style" type="text/css"></style>
<div class="wp">
<!--[diy=diy1]--><div id="diy1" class="area"></div><!--[/diy]-->
</div>

<div id="ct" class="wp cl">
<div id="pgt" class="pgs mbm cl ">
<div class="pgt"><div class="pg"><strong>1</strong><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=2">2</a><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=3">3</a><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=4">4</a><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=5">5</a><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=6">6</a><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=7">7</a><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=8">8</a><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=9">9</a><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=10">10</a><a href="forum.php?mod=viewthread&tid=147178&amp;extra=page%3D1&amp;page=2" class="nxt">下一頁</a></div></div>
<span class="y pgb" id="visitedforums" onmouseover="$('visitedforums').id = 'visitedforumstmp';this.id = 'visitedforums';showMenu({'ctrlid':this.id,'pos':'34'})"><a href="forum.php?mod=forumdisplay&fid=108&page=1"></a></span>
<a id="newspecial" onmouseover="$('newspecial').id = 'newspecialtmp';this.id = 'newspecial';showMenu({'ctrlid':this.id})" onclick="showWindow('newthread', 'forum.php?mod=post&action=newthread&fid=108')" href="javascript:;" title="發新帖"><img src="static/image/common/pn_post.png" alt="" /></a><a id="post_reply" onclick="showWindow('reply', 'forum.php?mod=post&action=reply&fid=108&tid=147178')" href="javascript:;" title=""><img src="static/image/common/pn_reply.png" alt="" /></a>
</div>



<div id="postlist" class="pl bm">
<table cellspacing="0" cellpadding="0">
<tr>
<td class="pls ptm pbm">
<div class="hm">
<span class="xg1"></span> <span class="xi1">11183</span><span class="pipe">|</span><span class="xg1">:</span> <span class="xi1">188</span>
</div>
<tr>
<td class="pls" rowspan="2">
 <div class="pi">
<div class="authi"><a href="home.php?mod=space&amp;uid=1" target="_blank" class="xw1">pieayu</a>

</div>
</div>
<div class="p_pop blk bui" id="userinfo2721324" style="display: none; margin-top: -11px;">
<div class="m z">
<div id="userinfo2721324_ma"></div>
</div>
<div class="i y">
<div>
<strong><a href="home.php?mod=space&amp;uid=1" target="_blank" class="xi2">pieayu</a></strong>
</p>
<ul class="xl xl2 o cl">
<li class="buddy"><a href="home.php?mod=spacecp&amp;ac=friend&amp;op=add&amp;uid=1&amp;handlekey=addfriendhk_1" id="a_friend_li_2721324" onclick="showWindow(this.id, this.href, 'get', 1, {'ctrlid':this.id,'pos':'00'});" title=></a></li>
<li class="poke2"><a href="home.php?mod=spacecp&amp;ac=poke&amp;op=send&amp;uid=1" id="a_poke_li_2721324" onclick="showWindow(this.id, this.href, 'get', 0);" title="" class="xi2"></a></li>
<li class="pm2"><a href="home.php?mod=spacecp&amp;ac=pm&amp;op=showmsg&amp;handlekey=showmsg_1&amp;touid=1&amp;pmid=0&amp;daterange=2&amp;pid=2721324&amp;tid=147178" onclick="showWindow('sendpm', this.href);" title="" class="xi2"></a></li>
</ul>
</td>
<td class="plc">
<div class="pi">
<div id="fj" class="y">
<label class="z"></label>
<input type="text" class="px p_fre z" size="2" onkeyup="$('fj_btn').href='forum.php?mod=redirect&ptid=147178&authorid=0&postno='+this.value" onkeydown="if(event.keyCode==13) {window.location=$('fj_btn').href;return false;}" title="" id="postnum2721324" onclick="setCopy(this.href, '');return false;"><em>1</em><sup>#</sup></a>
</strong>
<div class="pti">
<div class="pdbt">
</div>
<div class="authi">
<img class="authicn vm" id="authicon2721324" src="static/image/common/ayu_icon.gif" />
<em id="authorposton2721324"> 2011-10-9 17:03:38</em>
<span class="pipe">|</span><a href="forum.php?mod=viewthread&amp;tid=147178&amp;page=1&amp;authorid=1" rel="nofollow"></a>
<span class="pipe">|</span><a href="forum.php?mod=viewthread&amp;tid=147178&amp;extra=page%3D1&amp;ordertype=1"></a>
</div>
</div>
</div><div class="pct"><style type="text/css">.pcb{margin-right:0}</style><div class="pcb">
<div class="t_fsz">
<table cellspacing="0" cellpadding="0"><tr><td class="t_f" id="postmessage_2721324">
<img src="http://img165.poco.cn/mypoco/myphoto/20111010/04/5536770720111010042551037.jpg" onload="thumbImg(this)" alt="" /><br />
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_65391" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=NjUzOTF8OGM1YTM3MGF8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][OAD][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(13.74 KB, 下載次數: 839)
</em>
</span>
<div class="tip tip_4" id="attach_65391_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-10-9 17:05 上傳</div>
下載次數: 839

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_65390" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=NjUzOTB8Njk3ZTZhYjd8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][01][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(17.22 KB, 下載次數: 1339)
</em>
</span>
<div class="tip tip_4" id="attach_65390_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-10-9 17:03 上傳</div>
下載次數: 1339

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_65689" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=NjU2ODl8NGUwNzEzZTN8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][02][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(19.97 KB, 下載次數: 1198)
</em>
</span>
<div class="tip tip_4" id="attach_65689_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-10-16 17:03 上傳</div>
下載次數: 1198

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_65972" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=NjU5NzJ8ZmNlNDY3NzV8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][03][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(17.76 KB, 下載次數: 1086)
</em>
</span>
<div class="tip tip_4" id="attach_65972_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-10-23 17:04 上傳</div>
下載次數: 1086

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_66180" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=NjYxODB8YTIwNzBiYjN8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][04][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(17.55 KB, 下載次數: 1103)
</em>
</span>
<div class="tip tip_4" id="attach_66180_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-10-30 20:55 上傳</div>
下載次數: 1103

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_66441" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=NjY0NDF8ZGI0OTIwYjV8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][05][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(20.18 KB, 下載次數: 1029)
</em>
</span>
<div class="tip tip_4" id="attach_66441_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-11-6 17:05 上傳</div>
下載次數: 1029

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_66721" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=NjY3MjF8YmQ1OTYzZTJ8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][06][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(17.56 KB, 下載次數: 996)
</em>
</span>
<div class="tip tip_4" id="attach_66721_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-11-13 17:08 上傳</div>
下載次數: 996

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_66986" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=NjY5ODZ8M2NjYzQ0OTZ8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][07][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(17.66 KB, 下載次數: 995)
</em>
</span>
<div class="tip tip_4" id="attach_66986_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-11-20 17:07 上傳</div>
下載次數: 995

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_67283" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=NjcyODN8MDQyNmI0NGR8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][08][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(19.18 KB, 下載次數: 862)
</em>
</span>
<div class="tip tip_4" id="attach_67283_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-11-27 17:05 上傳</div>
下載次數: 862

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />

<ignore_js_op>

<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_67658" onmouseover="showMenu({'ctrlid':this.id,'pos':'12'})">

<a href="forum.php?mod=attachment&amp;aid=Njc2NTh8YTkyY2EwMjd8MTMyMzA4MTY4NHw5MDM3MnwxNDcxNzg%3D" target="_blank">[DMG][Mirai nikki][09][848x480][BIG5].rmvb.torrent</a>

<em class="xg1">(17.46 KB, 下載次數: 306)
</em>
</span>
<div class="tip tip_4" id="attach_67658_menu" style="position: absolute; display: none">
<div class="tip_c xs0">
<div class="y">2011-12-4 17:09 上傳</div>
下載次數: 306

</div>
<div class="tip_horn"></div>
</div>
</ignore_js_op>
<br />
<br />
<br />
<font size="4"><a href="forum.php?mod=viewthread&amp;tid=144252" target="_blank">http://pieayu.com/forum.php?mod=viewthread&amp;tid=144252</a></font></td></tr></table>
</div>
<div id="comment_2721324" class="cm">
</div>
<div id="post_rate_div_2721324"></div>
</div></div>

</td></tr>
<tr><td class="plc plm">
<div class="modact"><a href="forum.php?mod=misc&amp;action=viewthreadmod&amp;tid=147178" title="帖子模式" onclick="showWindow('viewthreadmod', .........................................a lot more
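Sorry for the Mandarin within (Traditional Chinese: 下載次數 means "download count", 上傳 "uploaded", 下一頁 "next page" and 發新帖 "post new thread").
Thanks, Ted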
 
12-05-2011, 06:26 AM #9
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Reputation: 1983
I would keep it simple and use three different sed commands to retrieve the three different items. You might store the result into arrays, then loop over their content, e.g.
Code:
#!/bin/bash
OLD_IFS=${IFS}
IFS=$'\n'
num=( $(sed -rn '/id=.*onmouseover/s/.*attach_([0-9]+).*/\1/p' file) )
text=( $(sed -rn '/onmouseover=/s/.*onmouseover="showMenu\((.*)\).*/\1/p' file) )
string=( $(sed -rn '/target=/s/.*>(.*).torrent.*/\1/p' file) )
for i in $(seq 0 $((${#num[@]}-1)))
do
  echo ${num[$i]}
  echo ${text[$i]}
  echo ${string[$i]}
done
IFS=${OLD_IFS}
Changing IFS, and restoring it afterwards, is needed because the results of the sed commands contain blank spaces (in particular, the torrent file names contain spaces). Hope this helps.
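On bash 4 or later, a sketch of the same three-array idea can use `mapfile -t`, which stores each output line in its own array element, so IFS does not need to be changed and restored. Quoting the expansions also stops names like `[DMG][Mirai nikki]...` from being treated as glob patterns. The sed expressions are the ones above, with `-E` in place of `-r` and the dot before `torrent` escaped; the function name `show_attachments` is hypothetical.

```shell
#!/bin/bash
# Sketch (bash 4+): fill one array per field with mapfile instead of changing IFS.
# show_attachments is a hypothetical name.
show_attachments() {
  local file=$1 i
  local -a num text string
  # mapfile -t stores each output line in its own element, stripping the newline.
  mapfile -t num    < <(sed -En '/id=.*onmouseover/s/.*attach_([0-9]+).*/\1/p' "$file")
  mapfile -t text   < <(sed -En '/onmouseover=/s/.*onmouseover="showMenu\((.*)\).*/\1/p' "$file")
  mapfile -t string < <(sed -En '/target=/s/.*>(.*)\.torrent.*/\1/p' "$file")
  # "${!num[@]}" expands to the array indices, so no $(seq ...) is needed.
  for i in "${!num[@]}"; do
    # Quoted expansions keep spaces and bracket characters intact.
    printf '%s\n%s\n%s\n' "${num[i]}" "${text[i]}" "${string[i]}"
  done
}
```

Called as `show_attachments aa.html`, it should print the same three lines per attachment as the loop above.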
 
1 member found this post helpful.
12-05-2011, 07:19 AM #10
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Reputation: 682
You can use -e or ";" to separate sed commands. You don't want to run sed 3 times.

This seems to work:
sed -n '/href=.*target="_blank"/s|.*<a href="\(.*\)" target="_blank">\(.*\).torrent<\/a>|variable=\1 string=\2|p' aa.html

The first part, "/.../", is an address: it selects the lines that the rest of the sed command works on.
The -n option causes sed to not output lines unless you use the "p" command. This allows us to only output lines that match.

For much more complicated sed programs, create a file with the sed commands and use "sed -f sedprogram file"
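A sketch of the `sed -f` approach, assuming GNU sed. The program file holds two commands, one per line: the first (an illustrative addition) prints the attachment number from the `<span id="attach_N">` line, and the second is the one-liner above. The same two commands could equally be passed in a single run as `sed -n -e '...' -e '...' aa.html`. The function name `run_sed_program` is hypothetical, and the program is written to a temporary file only to keep the example self-contained.

```shell
#!/bin/bash
# Sketch: keep a multi-command sed program in a file and run it with sed -f.
# run_sed_program is a hypothetical name; the script is written to a temp file.
run_sed_program() {
  local prog status
  prog=$(mktemp)
  # One command per line. The first prints the attachment number, the second
  # (the /.../ address plus s command from above) prints href and name.
  cat > "$prog" <<'EOF'
s/.*id="attach_\([0-9][0-9]*\)" onmouseover=.*/num=\1/p
/href=.*target="_blank"/s|.*<a href="\(.*\)" target="_blank">\(.*\).torrent</a>|variable=\1 string=\2|p
EOF
  # With no file arguments, sed reads standard input.
  sed -n -f "$prog" "$@"
  status=$?
  rm -f "$prog"
  return "$status"
}
```

Usage would be `run_sed_program aa.html`; in practice the program file would simply be kept alongside the script.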

Last edited by jschiwal; 12-05-2011 at 07:21 AM.
 
2 members found this post helpful.
12-05-2011, 09:20 AM #11
ted_chou12
Member
 
Registered: Aug 2010
Location: Zhongli, Taoyuan
Distribution: slackware, windows, debian (armv4l GNU/Linux)
Posts: 431

Original Poster
Blog Entries: 32

Reputation: 3
Thanks,
@jschiwal, that gave the perfect outcome.
@colucix, thanks, it did give me the correct %string%, %text% and %number%, but I was looking for %variable% and %string%. I tried to modify the code slightly to make it work, but I am quite a rookie XD. Here is what I tried:
Code:
OLD_IFS=${IFS}
IFS=$'\n'
num=( $(sed -rn '/id=.*onmouseover/s/.*attach_([0-9]+).*/\1/p' aa.html) )
var=( $(sed -rn '/a\shref="(.*)"/\1/p' aa.html) )
string=( $(sed -rn '/target=/s/.*>(.*).torrent.*/\1/p' aa.html) )
for i in $(seq 0 $((${#num[@]}-1)))
do
  echo ${num[$i]}
  echo ${var[$i]}
  echo ${string[$i]}
done
IFS=${OLD_IFS}
Would you guide me in the right direction to make this work too? (I wish to learn how to use sed better.) BTW, I learnt a new use of IFS from your code.
Thanks,
Ted
 
12-05-2011, 10:20 AM #12
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Reputation: 1983
Slightly modified:
Code:
#!/bin/bash
OLD_IFS=${IFS}
IFS=$'\n'
variable=( $(sed -rn '/target=/s/.*href="([^"]+)".*>.*.torrent.*/\1/p' file) )
string=( $(sed -rn '/target=/s/.*>(.*).torrent.*/\1/p' file) )
for i in $(seq 0 $((${#variable[@]}-1)))
do
  echo ${variable[$i]}
  echo ${string[$i]}
done
IFS=${OLD_IFS}
@jschiwal, I agree about limiting the number of sed commands to speed up the script. However, I cannot think of a method to assign the results to separate variables, as requested by the OP, unless we use a while read loop like this (without shell arrays):
Code:
while read variable string
do
  echo $variable
  echo $string
done < <(sed -rn '/target=/s/.*href="([^"]+)".*>(.*).torrent.*/\1 \2/p' file)
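The loop above relies on `read` putting the remainder of the line into its last variable, which is why spaces in the torrent names survive; it would break if an href ever contained a space, and plain `read` also treats backslashes as escape characters. A slightly more defensive sketch, assuming GNU sed (which turns `\t` in the replacement into a tab), separates the two fields with a tab and uses `read -r`. The function name `print_pairs` is hypothetical.

```shell
#!/bin/bash
# Sketch: one sed pass emits "href<TAB>name"; read -r splits on the tab only.
# print_pairs is a hypothetical name; \t in the replacement assumes GNU sed.
print_pairs() {
  local variable string
  while IFS=$'\t' read -r variable string; do
    printf 'variable=%s\nstring=%s\n' "$variable" "$string"
  done < <(sed -En '/target=/s/.*href="([^"]+)".*>(.*)\.torrent.*/\1\t\2/p' "$1")
}
```

Usage: `print_pairs aa.html`.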
 
1 member found this post helpful.
12-05-2011, 11:07 AM #13
ted_chou12
Member
 
Registered: Aug 2010
Location: Zhongli, Taoyuan
Distribution: slackware, windows, debian (armv4l GNU/Linux)
Posts: 431

Original Poster
Blog Entries: 32

Reputation: 3
Thanks, that was perfect!
 
12-09-2011, 03:10 AM #14
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Reputation: 682
My main point is that each sed command starts at the beginning of the file. You would make three reports instead of one, and if one tag is missing or a file is modified between sed commands, the lists could become misaligned. I agree that arrays are needed to hold all the values. The information extracted is incomplete because there is no meaningful field, or index (hash) associated with the lines.
You could gather statistics-type info from it, but I think one multi-field report would be more flexible than three lists.
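A sketch of the one-report approach, assuming GNU sed and the page layout shown earlier in the thread, where each `<span ... id="attach_N">` line comes before its `<a href=...>...torrent</a>` line. The hold space carries the attachment number forward until the link line arrives, so each output line holds number, href and name together, separated by tabs, from a single pass over the file. If a link ever appeared without a preceding span line, its number field would be empty or stale. The function name `attachment_report` is hypothetical.

```shell
#!/bin/bash
# Sketch: one sed pass -> "number<TAB>href<TAB>name" per attachment.
# attachment_report is a hypothetical name; \t and comments assume GNU sed.
attachment_report() {
  sed -nE '
    # Attachment span: keep only the number and copy it to the hold space.
    /<span[^>]*id="attach_[0-9]+"/{
      s/.*id="attach_([0-9]+)".*/\1/
      h
    }
    # Torrent link: reduce to "href<TAB>name", append it to the held number,
    # swap the combined text into the pattern space, join the parts and print.
    /<a href="[^"]*" target="_blank">.*\.torrent<\/a>/{
      s/.*<a href="([^"]*)" target="_blank">(.*)\.torrent<\/a>.*/\1\t\2/
      H
      x
      s/\n/\t/
      p
    }
  ' "$1"
}
```

Each report line can then be split with `while IFS=$'\t' read -r num variable string; do ...; done < <(attachment_report aa.html)`.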
 
  


All times are GMT -5.