LinuxQuestions.org
Old 12-06-2011, 01:54 PM   #1
ted_chou12
Member
 
Registered: Aug 2010
Location: Zhongli, Taoyuan
Distribution: slackware, windows, debian (armv4l GNU/Linux)
Posts: 425
Blog Entries: 28

Rep: Reputation: 2
match html with sed


Hi, sorry, I have another question about matching HTML with sed:
Code:
<input type="hidden" name="formhash" value="4e17ac62">
The value "4e17ac62" is what I want. But this tag sits in a large HTML file, and it could be at the start, middle, or end of a line.
I have come up with:
Code:
echo $(cat "/tmp/torrents.html.tmp" | sed -r 's@<input type="hidden" name="formhash" value="(.*)">@\1@')
but it doesn't match.

EDIT:
I had a more successful attempt:
Code:
sh-3.1# echo $(cat "/tmp/signin.html.tmp" | sed -n 's@<input type="hidden" name="formhash" value="\(.*\)">@\1@p')
<form id="qiandao" method="post" action="plugin.php?id=dsu_paulsign:sign&amp;operation=qiandao&amp;infloat=1" onkeydown="if(event.keyCode==13){showWindow('qwindow', 'qiandao', 'post', '0');return false}"> 0bdc06af
but I only want "0bdc06af".


Thanks,
Ted

Last edited by ted_chou12; 12-06-2011 at 02:00 PM.
 
Old 12-06-2011, 02:00 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Hi Ted. It looks correct, and it matches in my tests. Maybe there is some other issue in your original file. Could you please post an excerpt of the HTML file containing the line above?
 
Old 12-06-2011, 02:04 PM   #3
ted_chou12
Original Poster
Thanks, I think I've got it:
Code:
echo $(cat "/tmp/signin.html.tmp" | sed -n 's@.*<input type="hidden" name="formhash" value="\(.*\)">@\1@p')


---------- Post added 12-06-11 at 03:05 PM ----------

HTML SAMPLE
Code:
          <form id="qiandao" method="post" action="plugin.php?id=dsu_paulsign:sign&amp;operation=qiandao&amp;infloat=1" onkeydown="if(event.keyCode==13){showWindow('qwindow', 'qiandao', 'post', '0');return false}">  <input type="hidden" name="formhash" value="4e17ac62">

          <table width="100%" cellpadding="0" cellspacing="0" align="center">

            <tr>

              <td class="tr3 tac">

                <ul class="qdsmile">

                  <input id="kx_s" type="radio" name="qdxq" value="kx" style="display:none"><li id="kx" onclick="Icon_selected(this.id)" onmouseover="showMenu({'ctrlid':this.id, 'pos':'21'});"><center><img src="source/plugin/dsu_paulsign/img/kx.gif" /><br /></center></li>

                  <input id="ng_s" type="radio" name="qdxq" value="ng" style="display:none"><li id="ng" onclick="Icon_selected(this.id)" onmouseover="showMenu({'ctrlid':this.id, 'pos':'21'});"><center><img src="source/plugin/dsu_paulsign/img/ng.gif" /><br /></center></li>

                  <input id="ym_s" type="radio" name="qdxq" value="ym" style="display:none"><li id="ym" onclick="Icon_selected(this.id)" onmouseover="showMenu({'ctrlid':this.id, 'pos':'21'});"><center><img src="source/plugin/dsu_paulsign/img/ym.gif" /><br /></center></li>

                  <input id="wl_s" type="radio" name="qdxq" value="wl" style="display:none"><li id="wl" onclick="Icon_selected(this.id)" onmouseover="showMenu({'ctrlid':this.id, 'pos':'21'});"><center><img src="source/plugin/dsu_paulsign/img/wl.gif" /><br /></center></li>

                  <input id="nu_s" type="radio" name="qdxq" value="nu" style="display:none"><li id="nu" onclick="Icon_selected(this.id)" onmouseover="showMenu({'ctrlid':this.id, 'pos':'21'});"><center><img src="source/plugin/dsu_paulsign/img/nu.gif" /><br /></center></li>

                  <input id="ch_s" type="radio" name="qdxq" value="ch" style="display:none"><li id="ch" onclick="Icon_selected(this.id)" onmouseover="showMenu({'ctrlid':this.id, 'pos':'21'});"><center><img src="source/plugin/dsu_paulsign/img/ch.gif" /><br /></center></li>

                  <input id="fd_s" type="radio" name="qdxq" value="fd" style="display:none"><li id="fd" onclick="Icon_selected(this.id)" onmouseover="showMenu({'ctrlid':this.id, 'pos':'21'});"><center><img src="source/plugin/dsu_paulsign/img/fd.gif" /><br /></center></li>

                  <input id="yl_s" type="radio" name="qdxq" value="yl" style="display:none"><li id="yl" onclick="Icon_selected(this.id)" onmouseover="showMenu({'ctrlid':this.id, 'pos':'21'});"><center><img src="source/plugin/dsu_paulsign/img/yl.gif" /><br /></center></li>

                  <input id="shuai_s" type="radio" name="qdxq" value="shuai" style="display:none"><li id="shuai" onclick="Icon_selected(this.id)" onmouseover="showMenu({'ctrlid':this.id, 'pos':'21'});"><center
 
Old 12-06-2011, 02:07 PM   #4
ted_chou12
Original Poster
Nope, it's still not working...
 
Old 12-06-2011, 02:27 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,826

Rep: Reputation: 1973
Code:
echo $(cat "/tmp/signin.html.tmp" | sed -n 's@.*<input type="hidden" name="formhash" value="\(.*\)">@\1@p')
We have a twofer! Useless use of cat, and useless use of echo! sed can take a file name as an argument, and prints to stdout, so both of the other commands are redundant.
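As a rough illustration of that point, the sketch below writes a one-line sample to a temporary file and runs the command both ways. The sample line, the `mktemp` path, and the variable names are made up for this example; the real input would be /tmp/signin.html.tmp.

```shell
#!/bin/sh
# Hypothetical one-line sample standing in for /tmp/signin.html.tmp.
sample=$(mktemp)
printf '%s\n' '<form id="qiandao"> <input type="hidden" name="formhash" value="4e17ac62">' > "$sample"

# Original style: cat feeds sed, and echo re-prints what sed already printed.
wrapped=$(echo $(cat "$sample" | sed -n 's@.*<input type="hidden" name="formhash" value="\(.*\)">@\1@p'))

# Same sed script, with sed reading the file itself and writing to stdout.
direct=$(sed -n 's@.*<input type="hidden" name="formhash" value="\(.*\)">@\1@p' "$sample")

echo "wrapped: $wrapped"
echo "direct:  $direct"
rm -f "$sample"
```

Both variables should hold the same string, which suggests the cat and echo layers add nothing here.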


I think you may be over-complicating this. But also, could you please explain what it is exactly that distinguishes the line you want from all the rest? Is name="formhash" enough? Or does it really need type="hidden" as well? It would be much easier if we could target a single word or short string, especially since html uses a free-form syntax and there's no guarantee that all the parts will be on the same line.

It might even be worthwhile to look into piping the file through htmltidy or similar first, to ensure that the formatting is regular.


But assuming that everything is on one line, perhaps this is all you need?
Code:
sed -rn '\|formhash| s|.*value="([^"]+)".*|\1|p' file.html
If you really need to target the line more specifically, you can nest the commands like this:
Code:
sed -rn '\|type="hidden"| { \|name="formhash"| s|.*value="([^"]+)".*|\1|p}' file.html
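To try both commands without the real page, here is a small sketch that runs them on a three-line excerpt modelled on the HTML sample posted earlier. The excerpt, the temporary file, and the variable names are assumptions for illustration, and GNU sed is assumed because of -r. The nested form is written with `p; }` because a `;` before the closing brace is the portable spelling.

```shell
#!/bin/sh
# Hypothetical three-line excerpt modelled on the posted HTML sample.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<form id="qiandao" method="post"> <input type="hidden" name="formhash" value="4e17ac62">
<table width="100%" cellpadding="0" cellspacing="0" align="center">
<input id="kx_s" type="radio" name="qdxq" value="kx" style="display:none"><li id="kx">
EOF

# Single address: only lines containing "formhash" reach the substitution.
simple=$(sed -rn '\|formhash| s|.*value="([^"]+)".*|\1|p' "$sample")

# Nested addresses: the line must contain both strings before s runs.
nested=$(sed -rn '\|type="hidden"| { \|name="formhash"| s|.*value="([^"]+)".*|\1|p; }' "$sample")

echo "simple: $simple"
echo "nested: $nested"
rm -f "$sample"
```

The radio input on the third line also has a value attribute, but it never reaches the substitution because it fails the address test.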

Last edited by David the H.; 12-06-2011 at 02:31 PM. Reason: minor formatting
 
1 members found this post helpful.
Old 12-06-2011, 02:41 PM   #6
colucix
Two notes:

1. When you use the s command in sed, keep in mind that the substitution replaces only the part of the line that matches the regular expression (the first match on each line, unless the g flag is given); the rest of the line is left intact and printed. If you want to extract only part of the matching line (and discard all the other lines), you have to suppress the normal output with the -n option and print explicitly using the p flag, e.g.
Code:
sed -rn 's/some(th)ing/\1/p' file
2. When you want to extract a string using the substitution, you have to match the whole line with your regular expression; otherwise only the matched part is substituted and the rest of the line is left intact. Usually you do this by putting the .* pattern at the beginning and at the end of the regular expression, e.g.
Code:
sed -rn 's/.*some(th)ing.*/\1/p' file
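The difference between the two notes can be checked on a single line of text. This is only a sketch: the input string and variable names are invented for the example, and GNU sed is assumed for -r.

```shell
#!/bin/sh
# Hypothetical input line with text on both sides of the match.
line='before something after'

# Pattern covers only "something": the surrounding text stays in place.
partial=$(printf '%s\n' "$line" | sed -rn 's/some(th)ing/\1/p')

# Pattern covers the whole line: only the captured group is left.
whole=$(printf '%s\n' "$line" | sed -rn 's/.*some(th)ing.*/\1/p')

echo "partial: $partial"
echo "whole:   $whole"
```

The first command leaves "before" and "after" in the output, while the second keeps only the captured "th".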
At this point you can try:
Code:
sed -rn 's@.*<input type="hidden" name="formhash" value="(.*)">.*@\1@p' /tmp/torrents.html.tmp
Edit: too late! Sorry for the redundancy. I left this thread open for some minutes and didn't see David's answer before posting.
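One caveat about the (.*) capture in the last command: .* is greedy, so if another tag follows the hidden input on the same line, the capture may run on to the last "> of the line. The sketch below compares it with a [^"]* capture, in the spirit of David's [^"]+. The input line and variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical line where a second tag follows the hidden input.
line='<input type="hidden" name="formhash" value="4e17ac62"><input type="radio" name="qdxq" value="kx">'

# Greedy capture: (.*) extends to the last "> on the line.
greedy=$(printf '%s\n' "$line" | sed -rn 's@.*<input type="hidden" name="formhash" value="(.*)">.*@\1@p')

# Bounded capture: [^"]* stops at the first closing quote.
bounded=$(printf '%s\n' "$line" | sed -rn 's@.*<input type="hidden" name="formhash" value="([^"]*)">.*@\1@p')

echo "greedy:  $greedy"
echo "bounded: $bounded"
```

On the sample posted in this thread the hidden input is the last tag on its line, so both forms would give the same result there.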

Last edited by colucix; 12-06-2011 at 02:50 PM.
 
2 members found this post helpful.
Old 12-06-2011, 02:49 PM   #7
ted_chou12
Original Poster
Thanks,
Ted
 
Old 12-06-2011, 03:22 PM   #8
David the H.
Notice also how I moved the part of the match that differentiates the desired line into the address field of the sed expression. This helps make it more accurate and efficient.

To explain, the basic sed expression syntax looks like this:

<optional address 1>,<optional address 2> commands

The addresses are generally either line numbers or regular expressions surrounded by "/../" (other delimiters can be used if the first one is preceded by "\"; I used "\|..|" above). If a line matches an address or falls within an address range, then the commands following it are applied. With no address, the commands apply to all lines.

So in my first example, sed runs the substitution command only on lines that contain the string "formhash"; it extracts the value from the string value="..." and prints it (see colucix's post).
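The address forms described above can be tried on a tiny file. This is a sketch only; the four sample lines, the temporary file, and the variable names are invented for the example.

```shell
#!/bin/sh
# Hypothetical four-line file for trying out address forms.
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta formhash' 'gamma' 'delta' > "$sample"

# Line-number range: print lines 2 through 3.
by_number=$(sed -n '2,3p' "$sample")

# Regex address with the default / delimiters.
by_regex=$(sed -n '/formhash/p' "$sample")

# Same regex address with | as delimiter, introduced by a backslash.
by_custom=$(sed -n '\|formhash|p' "$sample")

# No address: the command applies to every line.
all_lines=$(sed -n 'p' "$sample")

printf '%s\n' "by_number:" "$by_number" "by_regex: $by_regex" "by_custom: $by_custom"
rm -f "$sample"
```

The two regex forms select the same line; the custom delimiter is mainly useful when the pattern itself contains slashes.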

Here are a few useful sed references.
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt
 
All times are GMT -5.