LinuxQuestions.org


zimbot 08-02-2018 01:35 PM

bash reg expression to split many file names on _3charnumbers_
 
friends,

I think what I need to do is split strings with a regular expression,

where underscore, any 3 digits, underscore (_NNN_) is my delimiter.

I need to split strings (a dir full of file names, thousands of mov files).

They all have various forms of this:
jug_WNYW_0000013_001_CommercialBlock7.mov
jug_WNYW_0000013_002_CommercialBlock7.mov
all the way to
jug_WNYW_0000013_099_CommercialBlock7.mov
and here are other examples
jug_WNYW_0000013_041_WeatherTwo.mov
jug_WLWT_0000099_022_reds.mov
jug_WLWT_0001234_001_reds.mov
jug_WCPO_0008994_007_shewalkedintothedoor.mov
b99456666_007_shetakesawall.mov
c9902345_003_chocolatepieisgood.mov

What I really wish I could do is get whatever is LEFT of the underscore, 3-digit number, underscore:

goodStuff_NNN_
I wish for the good stuff.

I tried a regular expression with
tapeName="$(cut -d'_[0-9]{3}_' -f1 <<<"$jug_1978-03-31_wlwt_AIR_005_SubheadsTeasers")"

I thought the _[0-9]{3}_ would be
underscore, any 3 digits, and underscore.

I just want jug_1978-03-31_wlwt_AIR


I have had limited success with

$ echo "jug_1978-03-31_WDTN_AIR_005_SubheadsTeasers" | awk -F'_' '{print $1"_"$2"_"$3}'
returns
jug_1978-03-31_WDTN

and you see that will never work with a string like
c9902345_003_chocolatepieisgood.mov

So I am back to needing what is LEFT of _NNN_,
where NNN is an increment from _001_ to _999_.

I have done some sed & awk ... but never regular expressions.

Also, this is on a Mac in bash.

It *seems* that sed on the Mac (OS X 10.11) also does not take input from standard in ... isn't that nice


any assistance is appreciated - thank you

best regards

and THANK YOU

zimbot

Turbocapitalist 08-02-2018 01:46 PM

Do you need to count the files or just trim what is to the right of the triple digit number?

If you are limited to bash on OS X then parameter expansion might be relevant.

http://wiki.bash-hackers.org/syntax/pe

Otherwise it is easy with the perl-based version of the rename utility.
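
For reference, the `cut` attempt in the question cannot work because `cut -d` accepts a single delimiter character, not a regular expression. A minimal sketch of the parameter expansion route follows. The helper name `tape_name` is made up for illustration, and the sketch assumes the wanted part is everything before the last `_NNN_` marker in the name.

```shell
#!/usr/bin/env bash
# tape_name is a hypothetical helper, not something from the thread.
# "%" removes the shortest suffix matching the pattern, so this drops
# everything from the last "_NNN_" (N = one digit) to the end of the name.
tape_name() {
    local name=$1
    printf '%s\n' "${name%_[0-9][0-9][0-9]_*}"
}

tape_name "jug_WNYW_0000013_001_CommercialBlock7.mov"
tape_name "jug_1978-03-31_wlwt_AIR_005_SubheadsTeasers"
tape_name "c9902345_003_chocolatepieisgood.mov"
```

The seven-digit field in names like `jug_WNYW_0000013_001_...` is left alone, because the pattern requires an underscore right after the third digit. This uses only features that the old bash 3.2 shipped with OS X supports.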

BW-userx 08-02-2018 05:45 PM

if you're going from left to right
Code:

userx@slacurr.ent.org:~
$ fun=jug_WNYW_0000013_001_CommercialBlock7.mov

# keep what follows the last "digit + underscore" (## removes the longest matching prefix)
$ sofun=${fun##*[0-9]_}

$ echo $sofun
CommercialBlock7.mov

or
Code:

userx@slacurr.ent.org:~
$ Name=jug_WLWT_0000099_022_reds.mov ;
 
$ leadingFname=${Name%%_[0-9]*} ;
 
$ endingFname=${Name##*[0-9]_} ;
 
$ newFileName="$leadingFname"_"$endingFname" ;
 
$ echo $newFileName
jug_WLWT_reds.mov

Look under "Substring Removal" in the bash parameter expansion documentation.
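
One caveat with the patterns above: `%%_[0-9]*` cuts at the first underscore that is followed by any digit, which also removes fields such as `0000099` or a date. The sketch below contrasts that with a tighter pattern. The variable names `loose` and `tight` are illustrative only, and the tighter pattern assumes the goal stated in the question (keep everything left of `_NNN_`).

```shell
#!/usr/bin/env bash
name="jug_1978-03-31_wlwt_AIR_005_SubheadsTeasers"

# Pattern from the post: %% removes the LONGEST suffix matching "_<digit>*",
# so the cut happens at the first underscore followed by a digit.
loose=${name%%_[0-9]*}

# Tighter variant (an assumption about the intent): % removes the SHORTEST
# suffix that begins with underscore, three digits, underscore.
tight=${name%_[0-9][0-9][0-9]_*}

printf 'loose: %s\n' "$loose"
printf 'tight: %s\n' "$tight"
```

With this input the loose pattern leaves only `jug`, while the tight one leaves `jug_1978-03-31_wlwt_AIR`.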

syg00 08-02-2018 07:00 PM

Quote:

Originally Posted by zimbot (Post 5886976)
It *seems* that sed on the Mac (OS X 10.11) also does not take input from standard in ... isn't that nice

I find that very hard to believe. But I wouldn't be caught dead using a mac, so I can't check.
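
For what it is worth, the BSD sed shipped with OS X does read standard input. The thread does not show the command that failed, so the cause is unknown; one possibility (an assumption) is GNU-specific syntax. A small sketch that should behave the same with BSD and current GNU sed, using a made-up helper name `strip_with_sed`:

```shell
#!/usr/bin/env bash
# -E selects extended regular expressions; BSD sed (macOS) and current
# GNU sed both accept it. The substitution deletes from the first
# "_NNN_" to the end of the line.
strip_with_sed() {
    sed -E 's/_[0-9]{3}_.*$//'
}

# sed reading from a pipe
echo "jug_1978-03-31_WDTN_AIR_005_SubheadsTeasers" | strip_with_sed

# sed reading from a here-string
strip_with_sed <<< "c9902345_003_chocolatepieisgood.mov"
```

Note that sed cuts at the first `_NNN_` on the line, which only matters if a name contains more than one such marker.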

Could be done with parameter substitution, but I always mess up complex globs. Does anyone know if/how you can include a (specific) repetition count in a glob in a substring removal like that?
I also agree the perl rename is the best option, as you can use a regex to get the job done. But you'd better get it correct (for all cases).
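
On the repetition count question: as far as I know, shell glob patterns (including bash's extglob operators) have no `{n}` style counter, so a fixed count is written by repeating the bracket expression. The sketch below builds such a pattern with a hypothetical helper, `digits`, which is not part of bash.

```shell
#!/usr/bin/env bash
# Glob patterns have no {n} repetition operator, so a fixed count is
# written by repeating the bracket expression. digits() is a hypothetical
# helper that builds such a pattern for any count.
digits() {
    local n=$1 pat=""
    while (( n > 0 )); do
        pat+="[0-9]"
        n=$(( n - 1 ))
    done
    printf '%s' "$pat"
}

name="jug_WCPO_0008994_007_shewalkedintothedoor.mov"
three=$(digits 3)                 # expands to [0-9][0-9][0-9]

# The unquoted ${three} inside the expansion is treated as a pattern.
left=${name%_${three}_*}
echo "$left"
```

For three digits it is usually simpler to type `[0-9][0-9][0-9]` directly; the helper only pays off when the count varies.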

chrism01 08-03-2018 12:54 AM

Perl
Code:

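use strict;
use warnings;

# Split on underscore, three digits, underscore; keep the part before the first match.
my $var1 = "jug_1978-03-31_wlwt_AIR_005_SubheadsTeasers";
my $var2 = (split(/_[0-9]{3}_/, $var1))[0];
print "var2 = $var2\n";

# output
var2 = jug_1978-03-31_wlwt_AIR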


String mangling is one of Perl's strengths... :)
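
The same regular expression can also be used from bash itself through `[[ ... =~ ... ]]` and `BASH_REMATCH`. This is a sketch rather than a drop-in equivalent of the Perl above: the helper name `left_of_marker` is made up, and because `.*` is greedy it keeps the text before the last `_NNN_`, whereas the Perl `split` keeps the text before the first one.

```shell
#!/usr/bin/env bash
# left_of_marker is a hypothetical helper. The regex is kept in a
# variable so that it is treated as a regex regardless of quoting rules.
left_of_marker() {
    local re='^(.*)_[0-9]{3}_'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        printf '%s\n' "$1"        # no marker: return the name unchanged
    fi
}

left_of_marker "jug_1978-03-31_wlwt_AIR_005_SubheadsTeasers"
```

Names without a `_NNN_` marker are passed through untouched, which is a design choice of this sketch and not something the thread specifies.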

zimbot 08-06-2018 10:31 AM

thanks everybody

I will be giving some of this a try

I very much appreciate the Perl suggestion.
For convenience, I am hoping for a full bash solution.

thanks again
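
A possible bash-only sketch for a whole directory is shown below. The function name `list_tape_names`, the directory argument, the `*.mov` glob, and the choice to print each distinct name once are all assumptions for illustration; the thread does not say what should happen with the extracted names.

```shell
#!/usr/bin/env bash
# list_tape_names is a hypothetical helper; the directory argument and the
# "*.mov" glob are assumptions based on the file names in the question.
list_tape_names() {
    local dir=$1 path file
    for path in "$dir"/*.mov; do
        [ -e "$path" ] || continue          # glob matched nothing
        file=${path##*/}                     # strip the directory part
        printf '%s\n' "${file%_[0-9][0-9][0-9]_*}"
    done | sort -u                           # one line per distinct name
}

# Use the first script argument as the directory, or the current one.
list_tape_names "${1:-.}"
```

Saved as a script, it could be run as `bash tapes.sh /path/to/movs` (the script name and path are placeholders). Files without a `_NNN_` marker are printed with their full name.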


All times are GMT -5.