06-29-2009, 09:40 PM
|
#1
|
LQ Newbie
Registered: Jun 2009
Posts: 3
Rep:
|
Get a list of delimited filenames from a text file (sed?)
Hi, I'm really new to Bash, so this could sound silly to most of you. I'm trying to get a list of some filenames from a text file. Tried to do this with sed and awk, but couldn't get it to work with my limited knowledge.
This is a sample file content:
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 13.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 14948) -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
width="471.677px" height="126.604px" viewBox="0 0 471.677 126.604" enable-background="new 0 0 471.677 126.604"
xml:space="preserve">
<rect x="0.01" y="1.27" fill="none" width="471.667" height="125.333"/>
<text transform="matrix(1 0 0 1 0.0098 8.3701)"><tspan x="0" y="0" font-family="'MyriadPro-Regular'" font-size="10">/Volumes/Secondary500/Temp/Untitled-2_Layer 1 copy 2.pdf</tspan><tspan x="0" y="12" font-family="'MyriadPro-Regular'" font-size="10">/Volumes/Secondary500/Temp/Untitled-2_Layer 1 copy.pdf</tspan><tspan x="0" y="24" font-family="'MyriadPro-Regular'" font-size="10">/Volumes/Secondary500/Temp/Untitled-2_Layer 1.pdf</tspan></text>
</svg>
What I would like to get from this sample is a new text file with this exact content:
/Volumes/Secondary500/Temp/Untitled-2_Layer 1 copy 2.pdf
/Volumes/Secondary500/Temp/Untitled-2_Layer 1 copy.pdf
/Volumes/Secondary500/Temp/Untitled-2_Layer 1.pdf
I thought of telling sed to print everything between 'font-size="10">' and '</tspan>', but the best I got was a file with the whole line containing my field delimiters.
If you could explain each step, that would be great.
There could be more or fewer filenames; these 3 are just an example.
|
|
|
06-29-2009, 09:54 PM
|
#2
|
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809
|
Using your example, the syntax would be simple---i.e., find all patterns beginning with "/Volumes" and ending in ".pdf"
The Regex would be: "/Volumes.*\.pdf"
So--verify what the criteria should be, and post some sample code. Also, what references (books, tutorials, etc.) are you using?
|
|
|
06-29-2009, 10:04 PM
|
#3
|
LQ Guru
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678
Rep: 
|
Or
Code:
sed -e 's/.*font-size="10">\(.*\)<\/tspan>/\1/' your_input_file
where \(.*\) effectively picks up the pattern between "10"> and </tspan>, and replaces the line with it (\1).
Last edited by billymayday; 06-29-2009 at 10:06 PM.
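As a minimal sketch of what the capture group does, here is the same substitution run on a single made-up line that contains only one tspan (the path /tmp/a.pdf is a placeholder, not taken from the thread):

```shell
# Hypothetical one-line input with a single <tspan>; the path is a placeholder.
line='<tspan x="0" y="0" font-size="10">/tmp/a.pdf</tspan>'

# \( ... \) captures the text between 'font-size="10">' and '</tspan>';
# \1 in the replacement puts back only that captured text.
result=$(printf '%s\n' "$line" | sed -e 's/.*font-size="10">\(.*\)<\/tspan>/\1/')
printf '%s\n' "$result"
```

Lines that do not match pass through unchanged, and anything after </tspan> stays on the line, which is what the follow-up posts run into.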
|
|
|
06-29-2009, 11:52 PM
|
#4
|
LQ Newbie
Registered: Jun 2009
Posts: 3
Original Poster
Rep:
|
Thanks for such a quick reply!
I've already tried both methods, from billymayday and pixellany. I think I'm getting them both wrong though :b
1) Here is my code for pixellany solution:
#!/bin/bash
DEBUGGINGDIR=/Volumes/Secondary500/Temp
FILE=$DEBUGGINGDIR/*.svg
PRINTFILE=$DEBUGGINGDIR/10pt.txt
cat $FILE | awk -F '/Volumes.*\.pdf' '{print $2;}' > $PRINTFILE
And this is the output I get from it (the input .svg file content is the initially given example):
</tspan></text>
What am I doing wrong?
(I'm learning from a lot of web pages like http://linux.org.mt/article/terminal, http://www.cs.hmc.edu/tech_docs/qref/sed.html, http://ftp.gnu.org/old-gnu/Manuals/s...ter/sed_3.html, plus Google and the Apple man pages, since I'm using OS X 10.5. Could you recommend a good one? Thanks.)
2) And this is the code I used for billymayday solution:
#!/bin/bash
DEBUGGINGDIR=/Volumes/Secondary500/Temp
FILE=$DEBUGGINGDIR/*.svg
PRINTFILE=$DEBUGGINGDIR/10pt.txt
sed -e 's/.*font-size="10">\(.*\)<\/tspan>/\1/g' $FILE > $PRINTFILE
And it gave me this output (I got only the first filename; I tried adding "g" afterwards, but it didn't work):
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 13.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 14948) -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
width="362.51px" height="97.437px" viewBox="0 0 362.51 97.437" enable-background="new 0 0 362.51 97.437" xml:space="preserve">
<rect x="0.01" y="1.27" fill="none" width="362.5" height="96.167"/>
/Volumes/Secondary500/Temp/Untitled-2_Layer 1.pdf</text>
</svg>
What should I fix? I've been trying different approaches, but still can't make it :s
|
|
|
06-30-2009, 12:07 AM
|
#5
|
LQ Guru
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678
Rep: 
|
Yes, well I guess a bit more testing would have helped, huh?
Code:
sed -n -e 's/.*font-size="10">\(.*\)<\/tspan>.*/\1/p' test1
looks better.
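A small sketch of how this version behaves, assuming a cut-down two-tspan input modeled on the sample (the paths /tmp/a.pdf and /tmp/b.pdf are placeholders): -n together with the p flag prints only the lines where the substitution succeeded, but because the leading .* is greedy, only the last filename on a line survives.

```shell
# Cut-down stand-in for the SVG; paths are placeholders, not from the thread.
sample='<svg>
<text><tspan font-size="10">/tmp/a.pdf</tspan><tspan font-size="10">/tmp/b.pdf</tspan></text>
</svg>'

# -n suppresses default output; the p flag prints only lines where s/// matched.
# The leading .* is greedy, so it swallows everything up to the LAST font-size="10">.
result=$(printf '%s\n' "$sample" | sed -n -e 's/.*font-size="10">\(.*\)<\/tspan>.*/\1/p')
printf '%s\n' "$result"
```

Here the <svg> and </svg> lines are dropped, and the only line printed is /tmp/b.pdf.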
|
|
|
06-30-2009, 05:22 AM
|
#6
|
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809
|
When I suggested the structure of the Regex to be used, I did not mean that you would use it as the field separator in AWK......
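To see why the awk attempt printed only the tail of the line: -F sets the field separator, so whatever the regex matches is treated as a delimiter and thrown away, and $2 is the text that follows it. A sketch on a made-up line (placeholder paths; [.] is used in place of \. only to sidestep escape handling in the -F argument):

```shell
# Made-up line with two paths; BEFORE and AFTER mark the surrounding text.
line='BEFORE/Volumes/a.pdfMIDDLE/Volumes/b.pdfAFTER'

# The greedy match runs from the first "/Volumes" to the last ".pdf" and
# becomes the separator, so $1 is "BEFORE" and $2 is "AFTER".
result=$(printf '%s\n' "$line" | awk -F '/Volumes.*[.]pdf' '{ print $2 }')
printf '%s\n' "$result"
```

In the original attempt, $2 was therefore the </tspan></text> left over after the last .pdf on the line.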
Here is just one way to do this in SED:
Code:
sed -n 's/.*\(word\).*/\1/p' filename
Translation:
suppress printing unless stated.
for any line containing "word", replace the entire line with "word", then print.
Will only pick up one instance of "word" per line.....
How about "grep -o"?
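A quick comparison of the two suggestions on a made-up line, as a sketch: the sed form yields one result per line, while grep -o prints every match on its own line. Note that -o is not part of POSIX grep, although GNU and BSD grep both provide it.

```shell
# Made-up input: one line containing "word" twice.
line='first word, second word'

# sed: replaces the whole line with the captured text, so one result per line.
sed_out=$(printf '%s\n' "$line" | sed -n 's/.*\(word\).*/\1/p')

# grep -o: prints each non-overlapping match on a separate line.
grep_out=$(printf '%s\n' "$line" | grep -o 'word')

printf 'sed:\n%s\ngrep -o:\n%s\n' "$sed_out" "$grep_out"
```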
|
|
|
06-30-2009, 05:47 AM
|
#7
|
Senior Member
Registered: Aug 2006
Posts: 2,697
|
minimal regular expression.
Code:
awk 'BEGIN{RS="</tspan>";FS=">"}{ print $NF}' file
output
Code:
# more file
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 13.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 14948) -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
width="471.677px" height="126.604px" viewBox="0 0 471.677 126.604" enable-background="new 0 0 471.677 126.604"
xml:space="preserve">
<rect x="0.01" y="1.27" fill="none" width="471.667" height="125.333"/>
<text transform="matrix(1 0 0 1 0.0098 8.3701)"><tspan x="0" y="0" font-family="'MyriadPro-Regular'" font-size="10">/Volumes/Secondary500/T
emp/Untitled-2_Layer 1 copy 2.pdf</tspan><tspan x="0" y="12" font-family="'MyriadPro-Regular'" font-size="10">/Volumes/Secondary500/Temp/Un
titled-2_Layer 1 copy.pdf</tspan><tspan x="0" y="24" font-family="'MyriadPro-Regular'" font-size="10">/Volumes/Secondary500/Temp/Untitled-2
_Layer 1.pdf</tspan></text>
</svg>
# ./testnew.sh
/Volumes/Secondary500/Temp/Untitled-2_Layer 1 copy 2.pdf
/Volumes/Secondary500/Temp/Untitled-2_Layer 1 copy.pdf
/Volumes/Secondary500/Temp/Untitled-2_Layer 1.pdf
Last edited by ghostdog74; 06-30-2009 at 05:49 AM.
|
|
|
06-30-2009, 05:48 AM
|
#8
|
Senior Member
Registered: Aug 2006
Posts: 2,697
|
Quote:
Originally Posted by billymayday
Yes, well I guess a bit more testing would have helped, huh?
Code:
sed -n -e 's/.*font-size="10">\(.*\)<\/tspan>.*/\1/p' test1
looks better.
|
unless all of them are on single line (which i doubt), the above only can get 1 result.
|
|
|
06-30-2009, 05:57 AM
|
#9
|
LQ Guru
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678
Rep: 
|
Quote:
Originally Posted by ghostdog74
unless all of them are on single line (which i doubt), the above only can get 1 result.
|
Don't you mean unless they're all on different lines? If they're on the same line, you'll only get one result.
I didn't spend that long on the data to be honest.
|
|
|
06-30-2009, 06:26 AM
|
#10
|
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809
|
If I can get the filenames to not have line breaks in them, then this works:
grep -o '/Volumes.*pdf' file
|
|
|
06-30-2009, 06:34 AM
|
#11
|
Senior Member
Registered: Aug 2006
Posts: 2,697
|
Quote:
Originally Posted by billymayday
Don't you mean unless they're all on different lines? If they're on the same line, you'll only get one result.
I didn't spend that long on the data to be honest.
|
Yes, pardon my English. If they are all on the same line, then sed, which has no non-greedy quantifier, will give only 1 result.
|
|
|
06-30-2009, 06:45 AM
|
#12
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,365
|
Quote:
Originally Posted by pixellany
If I can get the filenames to not have line breaks in them, then this works:
grep -o '/Volumes.*pdf' file
|
Not if there are 2 or more on the one line - note ghostdog74's comment on greediosity. Try
Code:
grep -Eo "/Volumes[^.]*\.pdf" file
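[^.]* stops at the first dot, which avoids the greedy overshoot but would also miss any path that has a dot elsewhere in a directory or file name. A hedged variant, assuming the paths always sit in XML text content (where a raw < cannot appear), is to stop at < instead, with the dot before pdf escaped so that it matches only a literal dot. The input below is a cut-down stand-in with placeholder paths:

```shell
# Cut-down stand-in for the SVG line; paths are placeholders.
line='<tspan>/Volumes/v1.2/a b.pdf</tspan><tspan>/Volumes/c.pdf</tspan>'

# [^<]* cannot cross the "<" of the closing </tspan>, so each match stays
# inside its own tspan and ends at the last ".pdf" before that tag.
result=$(printf '%s\n' "$line" | grep -o '/Volumes[^<]*\.pdf')
printf '%s\n' "$result"
```

The first path contains an extra dot (v1.2) and is still matched in full.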
|
|
|
06-30-2009, 06:51 AM
|
#13
|
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809
|
Touche ( I mean: TOO-SHAY....How do I type accented letters here?)
greediosity???? Hmmmm
Now---make it work if there are line breaks in the desired matched patterns........ 
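One possible answer to that challenge, as a sketch only: if the line breaks are wrapping artifacts that should simply be discarded, deleting all newlines before matching rejoins the split names. This assumes that no filename legitimately contains a newline and that joining lines does not glue unrelated text into a match; the input is a made-up wrapped fragment with placeholder paths.

```shell
# Made-up fragment in which a path is broken across two lines.
wrapped='<tspan>/Volumes/Te
mp/a.pdf</tspan><tspan>/Volumes/Temp/b.pdf</tspan>'

# tr -d '\n' removes every newline, so the broken path becomes one piece again;
# grep -o then prints each match on its own line.
result=$(printf '%s\n' "$wrapped" | tr -d '\n' | grep -o '/Volumes[^<]*\.pdf')
printf '%s\n' "$result"
```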
|
|
|
06-30-2009, 11:57 AM
|
#14
|
LQ Newbie
Registered: Jun 2009
Posts: 3
Original Poster
Rep:
|
Thanks a lot, this line from pixellany and syg00 outputs exactly what I was looking for. So I'm gonna learn grep better!
grep -Eo "/Volumes[^.]*\.pdf" file
Ghostdog74, the awk line worked as well, but it printed a lot of empty lines before and between the filenames. Do you know why? How can that be avoided? (For educational purposes, hehe. I'm going to need awk and sed very soon for a couple of scripts.)
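A likely explanation, offered as an assumption rather than something confirmed in the thread: POSIX specifies only the first character of RS and leaves longer values unspecified, so some awk implementations (possibly including the one shipped with OS X 10.5) treat RS="</tspan>" as just "<". Every tag then starts a new record, and records with nothing after their last ">" print as empty lines. A portable sketch that splits on "<" deliberately and prints only the text following a tspan opening tag (the sample is a cut-down stand-in with placeholder paths):

```shell
# Cut-down stand-in for the SVG; paths are placeholders, not from the thread.
sample='<svg>
<text><tspan x="0" font-size="10">/tmp/a 1.pdf</tspan><tspan x="0" font-size="10">/tmp/b.pdf</tspan></text>
</svg>'

# RS="<" makes every tag start a new record; FS=">" splits tag from text.
# Only records whose tag part ($1) starts with "tspan" and whose text ($2)
# is non-empty are printed, so no blank lines appear.
result=$(printf '%s\n' "$sample" |
  awk 'BEGIN { RS = "<"; FS = ">" } $1 ~ /^tspan/ && $2 != "" { print $2 }')
printf '%s\n' "$result"
```

Closing tags such as /tspan start with "/", so they fail the ^tspan test and are skipped.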
|
|
|
06-30-2009, 04:51 PM
|
#15
|
LQ Guru
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678
Rep: 
|
|
|
|
All times are GMT -5.