LinuxQuestions.org


ifeatu 01-04-2009 09:39 AM

Another Sed challenge
 
I'm trying to create a script that takes a file full of file names (MP3s, actually) with the following syntax:

Mon_Da_YEAR_Sub_ject_Sub_ject_Location_side_1_or_2.mp3

...and makes folders from those file names in the following syntax:


Location-YEAR-MN-DY- Subject -BLH

The "BLH" part needs to be indiscriminately attached to every folder name.

Basically it'll take some Regex to parse the data from the file name and reorganize it into the folder name, and I think there will have to be a loop statement like do-while. Can anyone help me?

unSpawn 01-04-2009 04:01 PM

Quote:

Originally Posted by ifeatu (Post 3396760)
I'm trying to create a script

Unless reinventing the wheel is something you just have to do, did you check LQ, Freshmeat or Sourceforge for apps that can already do that? Besides, filenames like "Mon_Da_YEAR_Sub_ject_Sub_ject_Location_side_1_or_2.mp3" aren't reliable or easy to work with (wrt IFS). If the MP3s are tagged "right" then getting the mp3info and working on that would be *way* "safer" IMHO. The "BLH" part isn't something you would even need sed for. Maybe it's best to post whatever (pseudo) script you've got right now?


Quote:

Originally Posted by ifeatu (Post 3396760)
Basically it'll take some Regex to parse the data from the file name and reorganize it into the folder name...

No challenge for you as it seems you recently bought ISBN 9780596528126 :-]

ifeatu 01-05-2009 09:00 PM

Okay...so I took your advice and had a go at it. I ended up having to use a batch file in DOS for the folder creation part. Then I installed GNU sed for DOS and took a stab at the regex. Here is where I got stuck:

First, here is some of my raw data:

April_14_1991_Bread_of_life_Vallejo_side_1.mp3
April_14_1991_Bread_of_life_Vallejo_side_2.mp3
April_21_1991_Ministry_zadok_priesthood_Vallejo_side_1.mp3
April_21_1991_Ministry_zadok_priesthood_Vallejo_side_2.mp3
Apr_05_1992_Matthew_13_Sower_Berkeley_side_1.mp3
Apr_05_1992_Matthew_13_Sower_Berkeley_side_2.mp3
Aug_04_1991B_Moving_out_alive_soul_Vallejo_side_1.mp3
Aug_04_1991B_Moving_out_alive_soul_Vallejo_side_2.mp3
Aug_04_1992_Daniel_7_Berkeley_side_1.mp3
Aug_04_1992_Daniel_7_Berkeley_side_2.mp3

So you have the month (.*), followed by the day ([0-9]{1,2}), followed by the year ([0-9]{4}), followed by a subject (.*), followed by a location (Vallejo, vallejo, Berkeley, berkeley, UC, Union City), followed by the word "side", followed by a 1 or 2.

sed -r 's/(.*)([0-9]{2})\s*_\s*([0-9]{4})\s*_\s*(.*)\s*([vallejo][.*])\s*(side)\s*_\s*([1,2])(.*)/\3-\1-\2-\4/' test2.txt > test3.txt


I think it's pretty obvious where I'm running into my problem: I can't seem to get my syntax right for the location. Can anyone help?

The line above changes absolutely nothing; it returns the data exactly as it was read.
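
One likely reason the command leaves every line untouched: in a regex, `[vallejo]` is a character class that matches a single character out of v, a, l, e, j, o, and `[.*]` matches one literal `.` or `*`, so `([vallejo][.*])` never matches the word "Vallejo". In the same way, `[1,2]` also accepts a comma. The short Python sketch below (Python is used only for illustration; the names `BROKEN` and `LOCATION` and the spelling `Union_City` are assumptions, since the sample data shows only Vallejo and Berkeley) contrasts the character class with an alternation group:

```python
import re

# Location part of the original attempt: two character classes, i.e.
# one letter out of "valejo" followed by one literal "." or "*".
BROKEN = re.compile(r"([vallejo][.*])")

# Assumed alternative: an alternation group that lists whole words.
# "Union_City" is a guess at how "Union City" appears in a file name.
LOCATION = re.compile(r"(Vallejo|Berkeley|UC|Union_City)", re.IGNORECASE)

name = "April_14_1991_Bread_of_life_Vallejo_side_1.mp3"

broken_hit = BROKEN.search(name)      # no "." or "*" follows such a letter
location_hit = LOCATION.search(name)  # finds the whole word

print(broken_hit)             # None
print(location_hit.group(1))  # Vallejo
```

The same alternation syntax, `(Vallejo|Berkeley|...)`, is available in sed's extended regex mode (`-r`).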

chrism01 01-06-2009 03:17 AM

That's a bit advanced for me, but I notice a couple of things:
1. in your data you have Vallejo, your sed uses vallejo (case is different)
2. Aug_04_1991B_Moving_out_alive_soul_Vallejo_side_1.mp3 & Aug_04_1991B_Moving_out_alive_soul_Vallejo_side_2.mp3 do not have an underscore after the year component.

ifeatu 01-06-2009 05:00 AM

Quote:

Originally Posted by chrism01 (Post 3398582)
That's a bit advanced for me, but I notice a couple of things:
1. in your data you have Vallejo, your sed uses vallejo (case is different)
2. Aug_04_1991B_Moving_out_alive_soul_Vallejo_side_1.mp3 & Aug_04_1991B_Moving_out_alive_soul_Vallejo_side_2.mp3 do not have an underscore after the year component.

Therein lies a portion of the challenge. The regex needs to make provisions for the B or A that sometimes follows the year. I understand that the data isn't at all pretty, but in this circumstance, reinventing the wheel is an absolute necessity.
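
The letter after the year can be handled by giving it an optional group of its own. This is a minimal sketch in Python, assuming the suffix is always a single A or B directly after the four digits. The thread does not say whether the letter belongs in the folder name, so it is captured separately here; `YEAR_PART` and `split_year` are hypothetical names.

```python
import re

# Four digits, then an optional single "A" or "B" (assumed from the post),
# with an underscore on either side.
YEAR_PART = re.compile(r"_([0-9]{4})([AB]?)_")

def split_year(filename):
    """Return (year, suffix) from a file name, or None if no year is found."""
    m = YEAR_PART.search(filename)
    return (m.group(1), m.group(2)) if m else None

print(split_year("Aug_04_1991B_Moving_out_alive_soul_Vallejo_side_1.mp3"))
# ('1991', 'B')
print(split_year("Aug_04_1992_Daniel_7_Berkeley_side_1.mp3"))
# ('1992', '')
```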

syg00 01-06-2009 05:13 AM

sed is best (only?) for well-formed data.
Use something like Perl; that way you can use some real programming techniques.
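
A rough sketch of that approach, in Python rather than Perl (an arbitrary choice for illustration), could look like the following. It rests on several assumptions not confirmed in the thread: "MN" and "DY" are read as a two-digit month and day, the A/B year suffix is dropped, "Union City" is assumed to appear as `Union_City`, and both sides of a recording share one folder. All function and variable names are hypothetical.

```python
import os
import re

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

# month _ day _ year[A|B] _ subject _ location _ side _ 1|2 .mp3
PATTERN = re.compile(
    r"^([A-Za-z]+)_([0-9]{1,2})_([0-9]{4})[AB]?_(.+)_"
    r"(Vallejo|Berkeley|UC|Union_City)_side_[12]\.mp3$",
    re.IGNORECASE,
)

def folder_name(filename):
    """Build 'Location-YEAR-MN-DY- Subject -BLH', or None if unparseable."""
    m = PATTERN.match(filename.strip())
    if m is None:
        return None
    month, day, year, subject, location = m.groups()
    month_no = MONTHS.get(month[:3].lower())  # "April" and "Apr" both map to 4
    if month_no is None:
        return None
    return "{}-{}-{:02d}-{:02d}- {} -BLH".format(
        location, year, month_no, int(day), subject.replace("_", " "))

def folder_names(lines):
    """Unique folder names in input order; unparseable lines are skipped."""
    names = (folder_name(line) for line in lines)
    return list(dict.fromkeys(n for n in names if n))

def make_folders(list_path, root="."):
    """Create one folder under root per unique name found in list_path."""
    with open(list_path) as handle:
        for name in folder_names(handle):
            os.makedirs(os.path.join(root, name), exist_ok=True)
```

With the list of file names in test2.txt, `make_folders("test2.txt")` would create the folders in the current directory. A line that does not fit the pattern is skipped without a message, so a real script would probably want to report those for manual review.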


All times are GMT -5.