Hi, the variable parts are represented by placeholders such as %number% and %text%. They can be any length. %number% is only a string of digits, like 21414, whereas %text% and %string% can match text or digits of any length. The parts that I want to extract are %variable% and %string%, separately.
Code:
array=(echo "$string" | sed '<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_[0-9]" onmouseover="showMenu\((*.)\)">
<a href="$1" target="_blank">$2.torrent</a>
<em class="xg1">')
for $vars in $array ..
this part is ignored.
Would it be something like this?
Thanks,
Ted
Last edited by ted_chou12; 12-05-2011 at 02:35 AM.
That's close, but your grep match isn't what you want. As is, you're looking for lines that start with <a href=". There will probably be other text before the <a> tag, so either remove the ^ or account for whatever you expect to find before the tag.
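To see what the leading ^ does, here is a small sketch. The sample line is made up for illustration (the id, href and file name are assumptions, not taken from the real page).

```shell
#!/bin/bash
# Hypothetical sample line: the <a> tag is preceded by other markup.
line='<span id="attach_1"><a href="attachment.php?aid=1" target="_blank">example.torrent</a>'

# Anchored: counts the line only if it starts with <a href=" (it does not here).
# grep -c exits non-zero when the count is 0, hence the "|| true".
anchored=$(printf '%s\n' "$line" | grep -c '^<a href="' || true)

# Unanchored: counts the line if the tag appears anywhere on it.
unanchored=$(printf '%s\n' "$line" | grep -c '<a href="' || true)

echo "anchored matches:   $anchored"
echo "unanchored matches: $unanchored"
```

With this input the anchored pattern finds nothing, while the unanchored one finds the line.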
XXXTOP Part of HTML
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
variable
variable
variable
variable,text
XXXXXXXXXXXXXXBottom part of HTML
So it isn't matching all of it, or at least it isn't extracting just the parts I want. I am guessing this is because I have multiple items to extract within one page. How would I go about doing this?
Thanks,
Ted
I would keep it simple and use three different sed commands to retrieve the three different items. You might store the result into arrays, then loop over their content, e.g.
Code:
#!/bin/bash
# Split command output on newlines only, so values with spaces stay whole.
OLD_IFS=${IFS}
IFS=$'\n'
# One sed run per item: -r enables extended regexes, -n plus p prints only matches.
num=( $(sed -rn '/id=.*onmouseover/s/.*attach_([0-9]+).*/\1/p' file) )
text=( $(sed -rn '/onmouseover=/s/.*onmouseover="showMenu\((.*)\).*/\1/p' file) )
string=( $(sed -rn '/target=/s/.*>(.*)\.torrent.*/\1/p' file) )
for i in $(seq 0 $((${#num[@]}-1)))
do
echo "${num[$i]}"
echo "${text[$i]}"
echo "${string[$i]}"
done
IFS=${OLD_IFS}
IFS is replaced and later restored because the results of the sed commands contain blank spaces (in particular, the torrent file names contain spaces). With the default IFS, each word would become a separate array element. Hope this helps.
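A minimal sketch of the IFS effect described above. The two file names are made up and stand in for the output of a sed command.

```shell
#!/bin/bash
# Hypothetical sed output: two file names, one per line, each containing a space.
output=$'first file\nsecond file'

# Default IFS (space, tab, newline): every word becomes its own element.
default_split=( $output )

# Newline-only IFS: each line becomes one element.
OLD_IFS=${IFS}
IFS=$'\n'
line_split=( $output )
IFS=${OLD_IFS}

echo "default IFS: ${#default_split[@]} elements"
echo "newline IFS: ${#line_split[@]} elements"
```

With this input the first array ends up with 4 elements and the second with 2, which is why the loop indexes only line up when IFS is set to a newline.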
You can use -e or ";" to separate sed commands. You don't want to run sed 3 times.
This seems to work:
sed -n '/href=.*target="_blank"/s|.*<a href="\(.*\)" target="_blank">\(.*\)\.torrent</a>|variable=\1 string=\2|p' aa.html
The first part, "/.../", is an address: it selects the lines that the following s command works on.
The -n option causes sed to not output lines unless you use the "p" command. This allows us to only output lines that match.
For much more complicated sed programs, create a file with the sed commands and use "sed -f sedprogram file".
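As a sketch of the -e and -f forms mentioned above: the sample page, the "number=" label and the temporary file names are made up for illustration. It assumes, as in the original snippet, that the id sits on the <span> line and the link on its own line.

```shell
#!/bin/bash
# Hypothetical sample page; ids, URLs and names are made up for illustration.
page=$(mktemp)
prog=$(mktemp)
cat > "$page" <<'EOF'
<span style="white-space: nowrap" id="attach_7" onmouseover="showMenu(menu7)">
<a href="dl.php?aid=7" target="_blank">some file.torrent</a>
<em class="xg1">
EOF

# The two commands written once as a sed program file ...
cat > "$prog" <<'EOF'
/onmouseover=/s/.*attach_\([0-9]*\).*/number=\1/p
/target="_blank"/s|.*<a href="\([^"]*\)"[^>]*>\(.*\)\.torrent</a>.*|variable=\1 string=\2|p
EOF

# ... and once inline, each command passed with its own -e.
inline=$(sed -n \
  -e '/onmouseover=/s/.*attach_\([0-9]*\).*/number=\1/p' \
  -e '/target="_blank"/s|.*<a href="\([^"]*\)"[^>]*>\(.*\)\.torrent</a>.*|variable=\1 string=\2|p' \
  "$page")
from_file=$(sed -n -f "$prog" "$page")

echo "$inline"
rm -f "$page" "$prog"
```

Both forms read the page once and give the same two lines. Note that the commands work here because they act on different input lines: a second s command on the same line would see the text already changed by the first one.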
Thanks,
@jschiwal, that gave the perfect outcome.
@colucix, thanks, it did give me the perfect %string%, %text% and %number%, but I was looking for %variable% and %string%. I tried to modify the code slightly to make it work, but I am quite a rookie XD. Here is what I tried:
Code:
OLD_IFS=${IFS}
IFS=$'\n'
num=( $(sed -rn '/id=.*onmouseover/s/.*attach_([0-9]+).*/\1/p' aa.html) )
var=( $(sed -rn '/a\shref="(.*)"/\1/p' aa.html) )
string=( $(sed -rn '/target=/s/.*>(.*).torrent.*/\1/p' aa.html) )
for i in $(seq 0 $((${#num[@]}-1)))
do
echo ${num[$i]}
echo ${var[$i]}
echo ${string[$i]}
done
IFS=${OLD_IFS}
Would you guide me in the right direction to get this working too? (I wish to learn how to use sed better.) BTW, I learnt a new use of IFS from your code.
Thanks,
Ted
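Code:
#!/bin/bash
# Split on newlines only so values containing spaces stay whole.
OLD_IFS=${IFS}
IFS=$'\n'
# [^"]+ stops at the closing quote, so only the href value is captured.
variable=( $(sed -rn '/target=/s/.*href="([^"]+)".*>.*\.torrent.*/\1/p' file) )
string=( $(sed -rn '/target=/s/.*>(.*)\.torrent.*/\1/p' file) )
for i in $(seq 0 $((${#variable[@]}-1)))
do
echo "${variable[$i]}"
echo "${string[$i]}"
done
IFS=${OLD_IFS}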
@jschiwal, I agree about limiting the number of sed commands to speed up the script. However, I cannot think of a method to assign the results to separate variables, as requested by the OP, unless we use a while read loop like this (without using shell arrays):
Code:
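# read puts the first word in variable and the rest of the line in string,
# so this relies on the href value containing no spaces.
while read -r variable string
do
echo "$variable"
echo "$string"
done < <(sed -rn '/target=/s/.*href="([^"]+)".*>(.*)\.torrent.*/\1 \2/p' file)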
My main point is that each sed command starts at the beginning of the file. You would make three reports instead of one, and if one tag is missing or the file is modified between sed commands, the lists could become misaligned. I agree that arrays are needed to hold all the values. The extracted information is incomplete, though, because no meaningful field or index (hash) is associated with the lines.
You could gather statistics-type information from it, but I think one multi-field report would be more flexible than three lists.
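One way such a single multi-field report might look, as a rough sketch rather than a definitive solution. It assumes GNU sed (for -r and for \t in the replacement) and assumes that the <a> line directly follows the <span> line, as in the snippet from the first post. The sample page and the array names numbers, variables and strings are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample page with two attachments.
page=$(mktemp)
cat > "$page" <<'EOF'
<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_7" onmouseover="showMenu(menu7)">
<a href="dl.php?aid=7" target="_blank">first file.torrent</a>
<em class="xg1">
<img src="static/image/filetype/torrent.gif" border="0" class="vm" alt="" />
<span style="white-space: nowrap" id="attach_12" onmouseover="showMenu(menu12)">
<a href="dl.php?aid=12" target="_blank">second.torrent</a>
<em class="xg1">
EOF

numbers=(); variables=(); strings=()

# One sed pass: on each <span> line, N appends the following <a> line, and a
# single s command emits the three fields as one tab-separated record.
while IFS=$'\t' read -r number variable string
do
  numbers+=( "$number" )
  variables+=( "$variable" )
  strings+=( "$string" )
done < <(sed -rn '/onmouseover=/{N;s/.*attach_([0-9]+).*href="([^"]+)"[^>]*>(.*)\.torrent.*/\1\t\2\t\3/p}' "$page")

# Every index refers to the same attachment in all three arrays.
for i in "${!numbers[@]}"
do
  printf '%s | %s | %s\n' "${numbers[$i]}" "${variables[$i]}" "${strings[$i]}"
done
rm -f "$page"
```

Because each record is built from one attachment in one pass, a missing tag drops that whole record instead of shifting one list against the others.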