LinuxQuestions.org


tquang 04-13-2012 09:40 PM

Extract string in one line
 
Hi, I have this string:
Code:

<?xml version="1.0" encoding="utf-8"?><playlist version="1" xmlns:jwplayer="http://developer.longtailvideo.com/"><trackList>    <track>        <title><![CDATA[V́ Một Người Ra Đi]]></title>        <creator><![CDATA[Ưng Hoàng Phúc]]></creator>        <location><![CDATA[http://stream31.gonct.info/d1662f31a7693fbf14f8f697ce5795ba/4f88d04d/NhacCuaTui123/Vi Mot Nguoi Ra di - Ung Hoang Phuc [NCT 21634274224197812500].mp3]]></location>        <info><![CDATA[http://www.nhaccuatui.com/nghe?M=PJNGXDuPk0]]></info>    <image><![CDATA[http://static.nhaccuatui.com/generals/logo-player.jpg]]></image>    <jwplayer:adv.enable>true</jwplayer:adv.enable>    <jwplayer:adv.link><![CDATA[http://www.nhaccuatui.com/clickqc/wtfpwnghn/wefdkmgois]]></jwplayer:adv.link>    <jwplayer:adv.file><![CDATA[http://stc.nct.nixcdn.com/imgqc/2012/04/vfresh_plinner_0404-634691279827076250.swf?bid=wtfpwnghn&skey=wefdkmgois&view=yes]]></jwplayer:adv.file>    </track></trackList><trackList>    <track>        <title><![CDATA[Gọi Tên Em Trong Đêm]]></title>        <creator><![CDATA[The Men]]></creator>        <location><![CDATA[http://stream42.gonct.info/3902a50be3234d3ed4077a44f2f3bc16/4f88d04d/NhacCuaTui184/Goi Ten Em Trong dem - The Men [NCT 586345336925975812508].mp3]]></location>        <info><![CDATA[http://www.nhaccuatui.com/nghe?M=RrJkLmWoVs]]></info>    <image><![CDATA[http://static.nhaccuatui.com/generals/logo-player.jpg]]></image>    <jwplayer:adv.enable>true</jwplayer:adv.enable>    <jwplayer:adv.link><![CDATA[http://www.nhaccuatui.com/clickqc/wtfptojaw/wefdkmgois]]></jwplayer:adv.link>    <jwplayer:adv.file><![CDATA[http://stc.nct.nixcdn.com/imgqc/2012/04/h&s_plinner_1302_final-634693320016533750.swf?bid=wtfptojaw&skey=wefdkmgois&view=yes]]></jwplayer:adv.file>    </track></trackList><trackList>    <track>        <title><![CDATA[Nắng Có C̣n Xuân]]></title>        <creator><![CDATA[ Nhóm Mặt Trời]]></creator>        
<location><![CDATA[http://stream62.gonct.info/557114bb0d7c1d2f88fca23f70dad7c2/4f88d04d/NhacCuaTui033/Nang co con xuan - Nhom Mat Troi [NCT 4564256566].mp3]]></location>        <info><![CDATA[http://www.nhaccuatui.com/nghe?M=dFpjf69OL0]]></info>    <image><![CDATA[http://static.nhaccuatui.com/generals/logo-player.jpg]]></image>    <jwplayer:adv.enable>true</jwplayer:adv.enable>    <jwplayer:adv.link><![CDATA[http://www.nhaccuatui.com/clickqc/wtfpezpyb/wefdkmgois]]></jwplayer:adv.link>    <jwplayer:adv.file><![CDATA[http://stc.nct.nixcdn.com/imgqc/2012/04/vaioe_300x250_nhacucatui_phase2-634695633595440000.swf?bid=wtfpezpyb&skey=wefdkmgois&view=yes]]></jwplayer:adv.file>    </track></trackList><trackList>    <track>        <title><![CDATA[Trái Tim Không Ngủ Yên]]></title>        <creator><![CDATA[Mỹ Linh, Bằng Kiều]]></creator>
And I need to extract all strings like this one:
Code:

http://stream62.gonct.info/557114bb0d7c1d2f88fca23f70dad7c2/4f88d04d/NhacCuaTui033/Nang co con xuan - Nhom Mat Troi [NCT 4564256566].mp3
How can I do it? Thank you very much.

towheedm 04-13-2012 10:43 PM

Is your XML string (the first one) all on one line or on multiple lines? XML tags are normally on separate lines. This matters for how the strings you want are extracted. Could you post the data exactly as it appears in the file?

David the H. 04-14-2012 12:41 PM

The best way to handle XML documents is to use an actual XML parser. I suggest you check out xmlstarlet, or a language like Perl that has an XML module available.

Trying to do such work with something like awk or sed is likely to be much trickier to get working right. They are line-oriented text tools and just aren't designed for handling nested, structured data like this.

xuta 04-14-2012 01:03 PM

Hi Vietnamese guy, I am Vietnamese, too

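Assuming your string is stored in nct.txt, you can extract all the links you want with the command below. It splits each line on <location>, prints every field after the first, and keeps only the URL part:

Code:

awk -F"<location>" '{for (i = 2; i <= NF; i++) print $i}' nct.txt | grep -o "http[^<]*\.mp3"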
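As a variation, here is a minimal sketch that uses grep alone. It assumes, as in the sample above, that no "<" ever appears inside a URL, and the example.invalid URLs in the demo are made up purely for illustration. This is still pattern matching on text, not real XML parsing, so a proper parser remains the safer choice.

```shell
# Print every .mp3 URL found in the file given as $1.
# [^<]* cannot cross a '<', so a match never runs past the end of its
# own <location> element, even when the whole playlist is on one line.
extract_mp3_urls() {
  grep -o 'http://[^<]*\.mp3' "$1"
}

# Demo on a shortened two-track sample (hypothetical URLs).
sample=$(mktemp)
printf '%s' '<trackList><track><location><![CDATA[http://example.invalid/a/Song One - Artist [NCT 1].mp3]]></location><info><![CDATA[http://example.invalid/nghe?M=x]]></info></track></trackList><trackList><track><location><![CDATA[http://example.invalid/b/Song Two - Artist [NCT 2].mp3]]></location></track></trackList>' > "$sample"
extract_mp3_urls "$sample"
rm -f "$sample"
```

On the real data this would be called as extract_mp3_urls nct.txt. The <info> links are skipped because they do not end in .mp3 before the next tag starts.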

tquang 04-15-2012 09:30 PM

Thanks to all my friends. I have written a new script for downloading.
However, I have a new question now that the files are downloaded: how do I move/rename multiple files?

Original file names:
Quote:

Cho Nguoi Tinh Nho - Tuan Vu [NCT 48634443728911412500_128].mp3
Nang Chieu - The Son [NCT 46633943472496991250].mp3
Chot thay dem buon - Chu Bin [NCT 77634000197958750000].mp3
New names:
Quote:

Cho Nguoi Tinh Nho - Tuan Vu.mp3
Nang Chieu - The Son.mp3
Chot thay dem buon - Chu Bin.mp3

Tinkster 04-15-2012 09:48 PM

Does that help?

Code:

echo "Cho Nguoi Tinh Nho - Tuan Vu [NCT 48634443728911412500_128].mp3" | sed -r 's/ \[[^]]+\]//'
Or a "naive" approach w/o regex using bash ...
Code:

testing="Cho Nguoi Tinh Nho - Tuan Vu [NCT 48634443728911412500_128].mp3"
echo "${testing%% [*}.mp3"
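To apply the same idea to many files at once, a loop along these lines might work. It is only a sketch: the function name strip_nct_tags is made up for illustration, and it assumes every file to rename ends in " [NCT ...].mp3".

```shell
# Rename every "<name> [NCT ...].mp3" file in directory $1 to "<name>.mp3".
strip_nct_tags() {
  dir=$1
  for f in "$dir"/*' ['*'].mp3'; do
    [ -e "$f" ] || continue            # the glob matched nothing
    name=${f##*/}                      # file name without the directory
    base=${name%% \[*}                 # cut from the first " [" onwards
    new="$dir/$base.mp3"
    if [ -e "$new" ]; then             # never overwrite an existing file
      echo "skipping $f: $new already exists" >&2
      continue
    fi
    mv -- "$f" "$new"
  done
}
```

Calling strip_nct_tags . would process the current directory. Files without the bracketed tag are left alone because the glob does not match them.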


tquang 04-21-2012 05:14 AM

I have a list of links in the file list.txt.
Contents of the list:
Code:

http://site/com/Cho Nguoi Tinh Nho - Tuan Vu [NCT 48634443728911412500_128].mp3
http://site/com/Nang Chieu - The Son [NCT 46633943472496991250].mp3
http://site/com/Chot thay dem buon - Chu Bin [NCT 77634000197958750000].mp3

After downloading, I want the files to be stored in /path/file/saved/ and renamed to:
Code:

Cho Nguoi Tinh Nho - Tuan Vu.mp3
Nang Chieu - The Son.mp3
Chot thay dem buon - Chu Bin.mp3

Here is my code so far:
Code:

cat list.txt | while read url; do wget -P /path/file/saved/ --continue "${url}"; done
How do I continue from here?
Thank you.

Tinkster 04-21-2012 05:42 PM

I'd do something like this ...

Code:

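while IFS= read -r url
do
  # Strip the " [NCT ...]" tag, then everything up to the last slash
  outfile=$(echo "${url}" | sed -r -e 's/ \[[^]]+\]//' -e 's@^.*/@@')
  # Quote the target: the file names contain spaces
  wget -O "/path/file/saved/${outfile}" -c "${url}"
done < list.txt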
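The file name can also be derived without sed, using only shell parameter expansion. The sketch below is one possible way to do that. The names url_to_name, fetch_list and the DRY_RUN switch are made up for illustration, and it assumes the tagged URLs all end in " [NCT ...].mp3".

```shell
# Derive the local file name for a URL using only parameter expansion.
url_to_name() {
  name=${1##*/}                        # drop everything up to the last slash
  case $name in
    *' ['*'].mp3') name=${name%% \[*}.mp3 ;;   # drop the " [NCT ...]" tag
  esac
  printf '%s\n' "$name"
}

# Download each URL listed in file $1 into directory $2.
# With DRY_RUN set, only print the wget commands that would run.
fetch_list() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # ignore empty lines
    out="$2/$(url_to_name "$url")"
    if [ -n "${DRY_RUN:-}" ]; then
      printf 'wget -c -O "%s" "%s"\n' "$out" "$url"
    else
      wget -c -O "$out" "$url"
    fi
  done < "$1"
}
```

A trial run such as DRY_RUN=1 fetch_list list.txt /path/file/saved shows the target names before anything is downloaded. URLs without a bracketed tag keep their original file name.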



All times are GMT -5.