04-21-2005, 01:43 AM
|
#1
|
Senior Member
Registered: May 2004
Location: Albuquerque, NM USA
Distribution: Debian-Lenny/Sid 32/64 Desktop: Generic AMD64-EVGA 680i Laptop: Generic Intel SIS-AC97
Posts: 4,250
Rep:
|
Help with a script to edit text file (awk? sed?)
I have a .txt file with a few thousand records that look roughly like this:
/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3 - 7195 KB - 44 KHz - 160 KBps - 6:08
I want to use a script to prepare a delimited file ready for import into a database.
The rules for what I want to accomplish are:
(1) Delete everything from the first 'word' up to (but not including) position 36
... or perhaps more simply, but not as educational,
remove the pattern '/mnt/windata2/MusicArchives/Archive'
(2) With what's left, replace the '/' at position 3 with ' - '
(3) Remove the pattern '.mp3'
(4) Replace all occurrences of ' - ' with '|'
I suppose I could study awk (or something) until I figure it out, but if someone would provide me with a solution, I promise to study it with manual in hand.
(Now that I've written it all out, I think I might be able to figure it out myself just using sed pattern matching. But for other purposes, I am interested in file edits based on the position of characters in a record.)
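For the position-based edits mentioned above, one possible approach is awk's substr(), which slices a record by character position. The following is only a sketch under the assumption that every record starts with the same 35-character prefix, so that the archive number begins at position 36; the function name strip_by_position is made up for illustration.

```shell
# Hypothetical helper: reads records on stdin, applies rules (1) and (2) by position.
strip_by_position() {
  awk '{
    rest = substr($0, 36)              # rule 1: keep everything from position 36 onward
    if (substr(rest, 3, 1) == "/")     # rule 2: only act if position 3 really holds a "/"
      rest = substr(rest, 1, 2) " - " substr(rest, 4)
    print rest
  }'
}

sample='/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3 - 7195 KB - 44 KHz - 160 KBps - 6:08'
printf '%s\n' "$sample" | strip_by_position
```

The guard on position 3 means a record with a one- or three-digit archive number is passed through with its "/" intact rather than being silently mangled.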
|
|
|
04-21-2005, 05:43 AM
|
#2
|
Senior Member
Registered: Dec 2003
Location: Brasil
Distribution: Arch
Posts: 1,037
Rep:
|
hi there,
You should try to do these things by yourself, or at least post what you've tried so far.
Anyway, I will give you a tip.
Try something like this (not tested):
Code:
sed "s,/,-,g" dummy | sed "s,\.mp3,,g" | awk -F "Archive" '{print $3}'
where dummy is the name of your file with this line structure.
regards
slackie1000
Last edited by slackie1000; 04-21-2005 at 06:07 AM.
|
|
|
04-21-2005, 06:23 AM
|
#3
|
Member
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293
Rep:
|
I am assuming you want it to look like this:
22-Bob Dylan | Like a Rolling Stone | 7195 KB | 44 KHz | 160 KBps | 6:08
In which case this:
Code:
sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/-/|/g' -e 's/\//-/'
will do it.
p.s. You could have replaced "-" with "|" in your find script, leaving one less task now. In fact you could have got to this in one go. If you really want to learn, you've got to try stuff on your own.
Last edited by ahh; 04-21-2005 at 06:34 AM.
|
|
|
04-21-2005, 06:25 AM
|
#4
|
Senior Member
Registered: Dec 2003
Location: Brasil
Distribution: Arch
Posts: 1,037
Rep:
|
Re: Help with a script to edit text file (awk? sed?)
hi there,
Quote:
Originally posted by rickh
(4) Replace all occurrences of ' - ' with '|'
|
sorry, missed that part.
regards
slackie1000
|
|
|
04-21-2005, 08:32 AM
|
#5
|
Senior Member
Registered: Oct 2003
Posts: 3,057
Rep:
|
Quote:
I want to use a script to prepare a delimited file ready for import into a database.
|
If you want to put this in a database, a comma or tab separator is a better choice than the |, but you can change the OFS if you want.
This may get you started....
Code:
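#!/bin/bash
# Split each record on " - " and rejoin the six fields with commas (OFS),
# turn the path up to "22/" into "22-" (the archive number is hard-coded),
# then strip the literal ".mp3" extension.
awk -F" - " 'BEGIN{OFS=","}{print $1,$2,$3,$4,$5,$6}' file.txt |\
sed -e 's/\(.*\)22\//22-/' |\
sed -e 's/\(.*\)\.mp3/\1/' >file1.txt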
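As a variation on the OFS idea, the whole job could in principle be done in awk alone, with a tab as the output separator and no hard-coded archive number. This is a sketch, not a tested import pipeline: the function name to_tsv is invented here, and it assumes that neither the artist nor the title contains the string " - ", since that would shift the fields.

```shell
# Hypothetical helper: converts " - "-separated records on stdin to tab-separated output.
to_tsv() {
  awk -F' - ' 'BEGIN { OFS = "\t" } {
    sub(/.*\/Archive/, "", $1)   # drop the leading path up to and including "/Archive"
    sub(/\//, OFS, $1)           # split archive number and artist at the remaining "/"
    sub(/\.mp3$/, "", $2)        # strip the extension from the title field
    print $1, $2, $3, $4, $5, $6
  }'
}

printf '%s\n' '/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3 - 7195 KB - 44 KHz - 160 KBps - 6:08' | to_tsv
```

Changing the OFS assignment to "," or "|" gives the other delimiters discussed in this thread.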
|
|
|
04-21-2005, 10:38 AM
|
#6
|
Senior Member
Registered: May 2004
Location: Albuquerque, NM USA
Distribution: Debian-Lenny/Sid 32/64 Desktop: Generic AMD64-EVGA 680i Laptop: Generic Intel SIS-AC97
Posts: 4,250
Original Poster
Rep:
|
Thanks to all of you ...
For the '...do it yourself...' comments ... you are right, of course, but I am an absolute newbie at scripting. I am studying, but I learn best by working around a semi-finished example accomplishing something I'm really interested in rather than a textbook 'Hello World' type of thing. And thanks, homey, for the start on a real script. With the help here on the commands, I think I can put it together.
Quote:
p.s. You could have replaced "-" with "|" in your find script, leaving one less task now.
|
I figured that out, and my starting file now looks like this:
/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3|7195 KB|44 KHz|160 KBps|6:08
with a desired conclusion of this:
22|Bob Dylan|Like a Rolling Stone|7195 KB|44 KHz|160 KBps|6:08
A couple slight modifications to ahh's script, thus:
sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/\// - /' -e 's/ - /|/g'
... accomplishes that.
The only thing I'm having trouble understanding is the 'backslash' operator. Especially the first segment.
sed -e 's/.*\/Archive//' changes this: /mnt/windata2/MusicArchives/Archive22/
to this: 22/
Sure would like to understand exactly how that works.
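One way to see what each piece does is to run the four expressions from the command above one at a time and print the record after every stage. sed applies multiple -e expressions in order to the same line, so chaining separate sed calls like this should give the same end result. The variable names are just for illustration.

```shell
record='/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3|7195 KB|44 KHz|160 KBps|6:08'

# Apply the expressions cumulatively, keeping the result of each stage.
stage1=$(printf '%s\n' "$record" | sed -e 's/.*\/Archive//')   # drop the path up to "/Archive"
stage2=$(printf '%s\n' "$stage1" | sed -e 's/\.mp3//')         # drop the literal ".mp3"
stage3=$(printf '%s\n' "$stage2" | sed -e 's/\// - /')         # first "/" becomes " - "
stage4=$(printf '%s\n' "$stage3" | sed -e 's/ - /|/g')         # every " - " becomes "|"

printf '%s\n' "$stage1" "$stage2" "$stage3" "$stage4"
```

The order matters: the "/" has to be turned into " - " before the final expression runs, otherwise it would never become a "|".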
Last edited by rickh; 04-21-2005 at 10:57 AM.
|
|
|
04-21-2005, 03:56 PM
|
#7
|
Member
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293
Rep:
|
Quote:
Originally posted by rickh
The only thing I'm having trouble understanding is the 'backslash' operator. Especially the first segment.
sed -e 's/.*\/Archive//' changes this: /mnt/windata2/MusicArchives/Archive22/
to this: 22/
Sure would like to understand exactly how that works.
|
The '-e' option tells sed that the next argument is an expression (a command) to run. It is not strictly necessary when there is only one expression, but it lets you string several together.
The 's' command substitutes: whatever matches the regular expression in 's/here//' is replaced with the string in 's//here/'.
The '.' matches any single character.
The '*' means match zero or more of the preceding character, so '.*' matches any run of characters.
The '\' is used to escape the following '/', as '/' is used as a delimiter between expressions.
So what we have here is 'Find a string that includes any number of characters followed by /Archive and replace it with nothing'.
Hope that helps.
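The need for the backslash goes away if a different delimiter is chosen, since sed accepts other characters after the 's' (slackie1000's post earlier in this thread uses commas). The sketch below writes the same substitution both ways on the path from the question; the variable names are illustrative only.

```shell
path='/mnt/windata2/MusicArchives/Archive22/'

# With "/" as the delimiter, the "/" inside the pattern must be escaped.
escaped=$(printf '%s\n' "$path" | sed -e 's/.*\/Archive//')

# With "," as the delimiter, "/" is an ordinary character and needs no backslash.
comma=$(printf '%s\n' "$path" | sed -e 's,.*/Archive,,')

printf '%s\n' "$escaped" "$comma"
```

Both forms should leave 22/ behind. Note that "MusicArchives" also contains "Archive", but the pattern only matches where "Archive" directly follows a "/".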
|
|
|
04-21-2005, 07:47 PM
|
#8
|
Senior Member
Registered: May 2004
Location: Albuquerque, NM USA
Distribution: Debian-Lenny/Sid 32/64 Desktop: Generic AMD64-EVGA 680i Laptop: Generic Intel SIS-AC97
Posts: 4,250
Original Poster
Rep:
|
Quote:
Hope that helps.
|
Ahh, yes ... except "The \ is used to escape the following /, as / is used as a delimiter between expressions." is still a little vague.
My statement: sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/\// - /' -e 's/ - /|/g'
The first \ represents the same thing as a /, used to alert the interpreter to the fact that it is immediately followed by a literal character that could be confused with an operator.
Similar situation in the 2nd segment ... the \ alerts the interpreter that the immediately following . is a literal, not an alpha wild card.
In the 3rd segment the \ actually represents a literal / ... simply turned back to avoid the confusion of three consecutive /'s.
Dang! I think I need a drink. ... and what if one of the directory names in the path being deleted had contained a numeric character? lol! You can be sure I'll never again include a numeric character in a directory name.
Last edited by rickh; 04-21-2005 at 07:50 PM.
|
|
|
04-21-2005, 08:24 PM
|
#9
|
Member
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293
Rep:
|
Quote:
Originally posted by rickh
My statement: sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/\// - /' -e 's/ - /|/g'
The first \ represents the same thing as a /, used to alert the interpreter to the fact that it is immediately followed by a literal character that could be confused with an operator.
Similar situation in the 2nd segment ... the \ alerts the interpreter that the immediately following . is a literal, not an alpha wild card.
In the 3rd segment the \ actually represents a literal / ... simply turned back to avoid the confusion of three consecutive /'s.
Dang! I think I need a drink. ... and what if one of the directory names in the path being deleted had contained a numeric character? lol! You can be sure I'll never again include a numeric character in a directory name.
|
You pretty much have it there, but maybe not quite.
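The sed expressions used have the form of an 's' followed by three '/' delimiters, though of course that skeleton by itself will do nothing. To use it we must add in the regular expression and the replacement string: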
Code:
sed -e 's/regular expression to be matched/string to replace regular expression/'
If we want to match a character that has a special meaning in regular expressions, such as a . or a * we must precede it with a \ to tell sed that the next character should be taken literally, and not to use its special meaning.
Similarly, if we want to match a character that has a special meaning to sed, in this case the / which sed is using to separate the instruction 's', the regular expression and the string, we have to precede it with a \ to tell sed to take it literally, otherwise it will think it is being used as a separator.
And you needn't worry about numbers in the path: the '.' matches any single character, whether a digit, a letter or a space.
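Both points can be checked with a small experiment. The record and path below are invented for the purpose (they are not from the original file): the title deliberately contains "mp3", and the path deliberately contains digits.

```shell
name='22/DJ - Best of mp3s.mp3'   # hypothetical record with "mp3" inside the title

# Unescaped: "." matches any character, so the first hit is " mp3" inside the title.
unescaped=$(printf '%s\n' "$name" | sed -e 's/.mp3//')
# Escaped: "\." matches only a literal dot, so only the extension is removed.
escaped=$(printf '%s\n' "$name" | sed -e 's/\.mp3//')
printf '%s\n' "$unescaped" "$escaped"

# Digits in the path are harmless: ".*" swallows them like any other character.
path='/mnt/disk2/Music99/Archive22/Song.mp3'   # hypothetical path
trimmed=$(printf '%s\n' "$path" | sed -e 's/.*\/Archive//')
printf '%s\n' "$trimmed"
```

The unescaped version damages the title and leaves the extension in place, which is why the backslash before the dot is worth keeping even when the simpler pattern appears to work on most records.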
|
|
|
All times are GMT -5.