LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

04-21-2005, 01:43 AM   #1
rickh
Senior Member
 
Registered: May 2004
Location: Albuquerque, NM USA
Distribution: Debian-Lenny/Sid 32/64 Desktop: Generic AMD64-EVGA 680i Laptop: Generic Intel SIS-AC97
Posts: 4,250

Rep: Reputation: 59
Help with a script to edit text file (awk? sed?)


I have a .txt file with a few thousand records that look roughly like this:

/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3 - 7195 KB - 44 KHz - 160 KBps - 6:08

I want to use a script to prepare a delimited file ready for import into a database.

The rules for what I want to accomplish are:
(1) Delete everything from the first 'word' up to (but not including) position 36
... or perhaps more simply, but not as educational,
remove the pattern '/mnt/windata2/MusicArchives/Archive'
(2) With what's left, replace the '/' at position 3 with ' - '
(3) Remove the pattern '.mp3'
(4) Replace all occurrences of ' - ' with '|'

I suppose I could study awk (or something) until I figure it out, but if someone would provide me with a solution, I promise to study it with manual in hand.

(Now that I've written it all out, I think I might be able to figure it out myself just using sed pattern matching. But for other purposes, I am interested in file edits based on the position of characters in a record.)
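Since every record here starts with the same fixed-length prefix, rules (1) to (4) can also be applied purely by character position. This is a minimal sketch, not a definitive solution. It assumes every line begins with the 35-character prefix /mnt/windata2/MusicArchives/Archive, that the archive number is always two digits, and that the file is plain ASCII (some cut implementations count bytes rather than characters with -c). positional_to_pipes is a hypothetical name made up for this example.

```shell
# Hypothetical helper: applies rules (1)-(4) by character position.
# Assumes a fixed 35-character prefix and a two-digit archive number.
# cut -c36- keeps character 36 onward, i.e. drops the prefix (rule 1).
# In awk, substr($0, 1, 2) is the archive number and substr($0, 4) skips
# the '/' at position 3, which is replaced by ' - ' (rule 2).
# sub() removes '.mp3' (rule 3); gsub() turns every ' - ' into '|' (rule 4).
positional_to_pipes() {
    cut -c36- | awk '{
        line = substr($0, 1, 2) " - " substr($0, 4)
        sub(/\.mp3/, "", line)
        gsub(/ - /, "|", line)
        print line
    }'
}

printf '%s\n' '/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3 - 7195 KB - 44 KHz - 160 KBps - 6:08' |
    positional_to_pipes
# prints: 22|Bob Dylan|Like a Rolling Stone|7195 KB|44 KHz|160 KBps|6:08
```

Pattern-based edits are more forgiving if the prefix length ever changes; position-based ones are handy when the layout is truly fixed.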
 
04-21-2005, 05:43 AM   #2
slackie1000
Senior Member
 
Registered: Dec 2003
Location: Brasil
Distribution: Arch
Posts: 1,037

Rep: Reputation: 45
hi there,

you should try to do these things yourself, or at least post what you've tried so far.
anyway i will give you a tip.
try something like this (not tested):
Code:
sed -e 's,/,-,g' -e 's,\.mp3,,g' dummy | awk -F 'Archive' '{print $3}'
where dummy is the name of your file with this line structure.
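To see why $3 is the field wanted here, it helps to print every field awk produces when the separator is the string "Archive". This small sketch runs on the line as it would look after the two sed steps (every '/' turned into '-', '.mp3' removed). show_fields is a hypothetical helper name. Note that "Archive" also occurs inside "MusicArchives", which is why the wanted text is the third field rather than the second. A title that itself contained "Archive" would be split into extra fields.

```shell
# Hypothetical helper: number and print each field when awk splits on "Archive".
# A multi-character -F value is treated by awk as a regular expression.
show_fields() {
    awk -F 'Archive' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

printf '%s\n' '-mnt-windata2-MusicArchives-Archive22-Bob Dylan - Like a Rolling Stone - 7195 KB - 44 KHz - 160 KBps - 6:08' |
    show_fields
# 1: -mnt-windata2-Music
# 2: s-
# 3: 22-Bob Dylan - Like a Rolling Stone - 7195 KB - 44 KHz - 160 KBps - 6:08
```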

regards

slackie1000

Last edited by slackie1000; 04-21-2005 at 06:07 AM.
 
04-21-2005, 06:23 AM   #3
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
I am assuming you want it to look like this:

22-Bob Dylan | Like a Rolling Stone | 7195 KB | 44 KHz | 160 KBps | 6:08

In which case this:
Code:
sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/-/|/g' -e 's/\//-/'
will do it.
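Worth knowing: 's/-/|/g' replaces every hyphen on the line, including any inside an artist name or title. The sketch below compares it with a variant that only replaces the ' - ' separator, which is rule (4) as originally stated. The record is made up (a hypothetical artist "Foo-Bar"), and all_hyphens and spaced_hyphens are hypothetical names.

```shell
# ahh's expressions, unchanged: every '-' becomes '|'
all_hyphens() {
    sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/-/|/g' -e 's/\//-/'
}
# Variant: only the ' - ' separator becomes '|'
spaced_hyphens() {
    sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/ - /|/g' -e 's/\//-/'
}

rec='/mnt/windata2/MusicArchives/Archive22/Foo-Bar - Some Song.mp3 - 4000 KB'
printf '%s\n' "$rec" | all_hyphens      # prints: 22-Foo|Bar | Some Song | 4000 KB
printf '%s\n' "$rec" | spaced_hyphens   # prints: 22-Foo-Bar|Some Song|4000 KB
```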

p.s. You could have replaced "-" with "|" in your find script, leaving one less task now. In fact you could have got to this in one go. If you really want to learn, you've got to try stuff on your own.

Last edited by ahh; 04-21-2005 at 06:34 AM.
 
04-21-2005, 06:25 AM   #4
slackie1000
Senior Member
 
Registered: Dec 2003
Location: Brasil
Distribution: Arch
Posts: 1,037

Rep: Reputation: 45
Re: Help with a script to edit text file (awk? sed?)

hi there,

Quote:
Originally posted by rickh
(4) Replace all occurrences of ' - ' with '|'
sorry, missed that part.

regards

slackie1000
 
04-21-2005, 08:32 AM   #5
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 56
Quote:
I want to use a script to prepare a delimited file ready for import into a database.
If you want to put this in a database, a comma or tab separator is a better choice than the |, but you can change the OFS if you want.

This may get you started....
Code:
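#!/bin/bash
# Split each record on ' - ' and rejoin the six fields with commas,
# then turn '/.../Archive22/' into '22-' (for any archive number) and drop '.mp3'.
awk -F' - ' -v OFS=',' '{print $1,$2,$3,$4,$5,$6}' file.txt |
sed -e 's/.*\/Archive\([0-9]*\)\//\1-/' -e 's/\.mp3//' >file1.txt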
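Following homey's suggestion of a tab separator, awk on its own can do all four steps. This is a minimal sketch. It assumes every record has exactly six ' - '-separated fields and that no artist or title contains ' - ' itself. to_tsv is a hypothetical name. The -v OFS='\t' trick works because awk processes escape sequences in -v assignments.

```shell
# Hypothetical helper: one tab-separated row per record, using awk only.
# Assumes six ' - '-separated fields per record and no ' - ' inside titles.
to_tsv() {
    awk -F ' - ' -v OFS='\t' '{
        sub(/^.*\/Archive/, "", $1)   # $1 is now e.g. "22/Bob Dylan"
        split($1, head, "/")          # head[1] = archive number, head[2] = artist
        sub(/\.mp3$/, "", $2)         # drop the extension from the title
        print head[1], head[2], $2, $3, $4, $5, $6
    }'
}

printf '%s\n' '/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3 - 7195 KB - 44 KHz - 160 KBps - 6:08' |
    to_tsv
```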
 
04-21-2005, 10:38 AM   #6
rickh
Senior Member
 
Registered: May 2004
Location: Albuquerque, NM USA
Distribution: Debian-Lenny/Sid 32/64 Desktop: Generic AMD64-EVGA 680i Laptop: Generic Intel SIS-AC97
Posts: 4,250

Original Poster
Rep: Reputation: 59
Thanks to all of you ...

For the '...do it yourself...' comments ... you are right, of course, but I am an absolute newbie at scripting. I am studying, but I learn best by working around a semi-finished example accomplishing something I'm really interested in rather than a textbook 'Hello World' type of thing. And thanks, homey, for the start on a real script. With the help here on the commands, I think I can put it together.

Quote:
p.s. You could have replaced "-" with "|" in your find script, leaving one less task now.

I figured that out, and my starting file now looks like this:
/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3|7195 KB|44 KHz|160 KBps|6:08

with a desired conclusion of this:
22|Bob Dylan|Like a Rolling Stone|7195 KB|44 KHz|160 KBps|6:08

A couple of slight modifications to ahh's script, thus:
Code:
sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/\// - /' -e 's/ - /|/g'
... accomplishes that.
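Before importing into a database, it may be worth checking that every output row has the same number of fields. The sketch below wraps the sed command above as a filter and adds a check. to_db_rows and check_fields are hypothetical names, and the expected count of 7 comes from the desired output line above.

```shell
# rickh's four expressions, wrapped as a filter (stdin -> stdout)
to_db_rows() {
    sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/\// - /' -e 's/ - /|/g'
}

# Print any row that does not split into exactly 7 '|'-separated fields;
# prints nothing when every row looks right.
check_fields() {
    awk -F '|' 'NF != 7 { print "line " NR ": " NF " fields: " $0 }'
}

printf '%s\n' '/mnt/windata2/MusicArchives/Archive22/Bob Dylan - Like a Rolling Stone.mp3|7195 KB|44 KHz|160 KBps|6:08' |
    to_db_rows
# prints: 22|Bob Dylan|Like a Rolling Stone|7195 KB|44 KHz|160 KBps|6:08
# For a real file (file names are placeholders):
#   to_db_rows < file.txt > file1.txt && check_fields < file1.txt
```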

The only thing I'm having trouble understanding is the 'backslash' operator. Especially the first segment.
sed -e 's/.*\/Archive//' changes this: /mnt/windata2/MusicArchives/Archive22/
to this: 22/

Sure would like to understand exactly how that works.

Last edited by rickh; 04-21-2005 at 10:57 AM.
 
04-21-2005, 03:56 PM   #7
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
Quote:
Originally posted by rickh
The only thing I'm having trouble understanding is the 'backslash' operator. Especially the first segment.
sed -e 's/.*\/Archive//' changes this: /mnt/windata2/MusicArchives/Archive22/
to this: 22/

Sure would like to understand exactly how that works.
The '-e' introduces an expression for sed to execute. It is not strictly necessary if there is only one expression, but you can string several together by putting '-e' before each one; sed applies them to each line in the order given.
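Because sed applies the -e expressions in the order given, swapping two of them can change the result. This is a quick check using the last two expressions of rickh's command on a shortened record. slash_first and pipes_first are hypothetical names.

```shell
# Slash first, then ' - ' -> '|': every separator ends up as '|'
slash_first() { sed -e 's/\// - /' -e 's/ - /|/g'; }
# Reversed: the ' - ' created from the slash appears too late to be converted
pipes_first() { sed -e 's/ - /|/g' -e 's/\// - /'; }

printf '%s\n' '22/Bob Dylan - Like a Rolling Stone' | slash_first   # prints: 22|Bob Dylan|Like a Rolling Stone
printf '%s\n' '22/Bob Dylan - Like a Rolling Stone' | pipes_first   # prints: 22 - Bob Dylan|Like a Rolling Stone
```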

The 's' (substitute) command replaces whatever matches the regular expression in the first part, 's/here//', with the string in the second part, 's//here/'.

The '.' matches any character.

The '*' means match zero or more of the previous character.

The '\' is used to escape the following '/', as '/' is used as a delimiter between expressions.

So what we have here is 'Find a string that includes any number of characters followed by /Archive and replace it with nothing'.
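One more detail: '.*' is greedy. It matches as much as it can, so the match ends at the last '/Archive' on the line, not the first. This sketch uses a made-up path containing digits and two '/Archive' components. strip_prefix is a hypothetical name.

```shell
# Hypothetical wrapper around the first expression discussed in the thread
strip_prefix() { sed 's/.*\/Archive//'; }

# '.*' is greedy: everything up to the LAST '/Archive' is removed
printf '%s\n' '/mnt/disk2/Archive1/Archive22/Song.mp3' | strip_prefix
# prints: 22/Song.mp3
```

Digits anywhere in the removed part of the path make no difference, since '.' matches them too.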

Hope that helps.
 
04-21-2005, 07:47 PM   #8
rickh
Senior Member
 
Registered: May 2004
Location: Albuquerque, NM USA
Distribution: Debian-Lenny/Sid 32/64 Desktop: Generic AMD64-EVGA 680i Laptop: Generic Intel SIS-AC97
Posts: 4,250

Original Poster
Rep: Reputation: 59
Quote:
Hope that helps.

Ahh, yes ... except "The \ is used to escape the following /, as / is used as a delimiter between expressions." is still a little vague.

My statement: sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/\// - /' -e 's/ - /|/g'

The first \ represents the same thing as a /, used to alert the interpreter to the fact that it is immediately followed by a literal character that could be confused with an operator.

Similar situation in the 2nd segment ... the \ alerts the interpreter that the immediately following . is a literal, not an alpha wildcard.

In the 3rd segment the \ actually represents a literal / ... simply turned back to avoid the confusion of three consecutive /'s.

Dang! I think I need a drink. ... and what if one of the directory names in the path being deleted had contained a numeric character? lol! You can be sure I'll never again include a numeric character in a directory name.

Last edited by rickh; 04-21-2005 at 07:50 PM.
 
04-21-2005, 08:24 PM   #9
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
Quote:
Originally posted by rickh
My statement: sed -e 's/.*\/Archive//' -e 's/\.mp3//' -e 's/\// - /' -e 's/ - /|/g'

The first \ represents the same thing as a /, used to alert the interpreter to the fact that it is immediately followed by a literal character that could be confused with an operator.

Similar situation in the 2nd segment ... the \ alerts the interpreter that the immediately following . is a literal, not an alpha wildcard.

In the 3rd segment the \ actually represents a literal / ... simply turned back to avoid the confusion of three consecutive /'s.

Dang! I think I need a drink. ... and what if one of the directory names in the path being deleted had contained a numeric character? lol! You can be sure I'll never again include a numeric character in a directory name.
You pretty much have it there, but maybe not quite.

The sed expressions used have this form:
Code:
sed -e 's///'
On its own, though, this skeleton does nothing useful. To use it we must fill in:
Code:
sed -e 's/regular expression to be matched/string to replace regular expression/'
If we want to match a character that has a special meaning in regular expressions, such as a . or a * we must precede it with a \ to tell sed that the next character should be taken literally, and not to use its special meaning.
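To see the difference the backslash makes, compare an unescaped and an escaped '.mp3' on a made-up file name in which 'mp3' also appears after another character. strip_unescaped and strip_escaped are hypothetical names.

```shell
# Unescaped: '.' matches any character, so '_mp3' inside the name matches first
strip_unescaped() { sed 's/.mp3//'; }
# Escaped: '\.' matches only a literal dot
strip_escaped() { sed 's/\.mp3//'; }

printf '%s\n' 'Bad_mp3_Rip.mp3' | strip_unescaped   # prints: Bad_Rip.mp3
printf '%s\n' 'Bad_mp3_Rip.mp3' | strip_escaped     # prints: Bad_mp3_Rip
```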

Similarly, if we want to match a character that has a special meaning to sed, in this case the / which sed is using to separate the instruction 's', the regular expression and the string, we have to precede it with a \ to tell sed to take it literally, otherwise it will think it is being used as a separator.
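Another way around the escaping: the 's' command accepts any character other than a backslash or newline as its delimiter, not just '/'. With a different delimiter, a '/' in the pattern needs no backslash. The sketch shows the third expression from rickh's command written both ways; the ',' delimiter is the same one slackie1000's command uses. slash_delim and comma_delim are hypothetical names.

```shell
# '/' as delimiter: the literal '/' in the pattern must be escaped
slash_delim() { sed 's/\// - /'; }
# ',' as delimiter: the '/' needs no backslash
comma_delim() { sed 's,/, - ,'; }

printf '%s\n' '22/Bob Dylan' | slash_delim   # prints: 22 - Bob Dylan
printf '%s\n' '22/Bob Dylan' | comma_delim   # prints: 22 - Bob Dylan
```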

And you needn't worry about numbers in the path: the '.' matches any character, including numbers, letters and spaces.
 
  


All times are GMT -5.
