[SOLVED] Make x number of changes matching pattern with sed

grail · 08-07-2010, 04:57 AM

So I tried searching on Google but found it difficult to describe exactly what I was looking for.

Task - Capitalise x number of letters at the start of words.

eg. Original line - one.two.three.four
Revised line - One.Two.three.four (here only requiring 2 changes)

Test data:

Code:

wire.in.the.blood.s04e01.ws.pdtv.xvid-river.avi
wire.in.the.blood.s04e02.ws.pdtv.xvid-river.avi
wire.in.the.blood.s04e03.ws.pdtv.xvid-river.avi
wire.in.the.blood.s04e04.ws.pdtv.xvid-river.avi

Current code:

Code:

sed -r 's@(\b[a-z])@\u\1@' file

So this will change the first letter to be capitalised.

Adding a 'g' at the end will cause the first letter of every word on a boundary to be capitalised.

Adding '4g' instead will process all matches from the fourth one to the end of the line.
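
For reference, here are the three forms side by side on a shortened line from the test data. This is only a sketch and assumes GNU sed, since \b, \u and a number combined with g are GNU extensions; the variable names are made up for the illustration.

```shell
#!/bin/sh
# Assumes GNU sed: \b, \u and the combined "Ng" flag are GNU extensions.
line='wire.in.the.blood.s04e01.ws'

# No flag: only the first match on the line is changed.
first=$(printf '%s\n' "$line" | sed -r 's@(\b[a-z])@\u\1@')

# g: every match on the line is changed.
all=$(printf '%s\n' "$line" | sed -r 's@(\b[a-z])@\u\1@g')

# 4g: matches 1 to 3 are skipped, match 4 and everything after is changed.
from4=$(printf '%s\n' "$line" | sed -r 's@(\b[a-z])@\u\1@4g')

printf '%s\n%s\n%s\n' "$first" "$all" "$from4"
```

None of the three stops after a given number of matches, which is the missing piece.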

My task is to process all words up to and including the fifth on each line, i.e.:

Code:

Wire.In.The.Blood.S04e01.ws.pdtv.xvid-river.avi

I know it will be simple but I am buggered if I can get it.

druuna · 08-07-2010, 05:23 AM

Hi,

Possible solution:

sed 's/\([a-z][a-z]*\)\.\([a-z][a-z]*\)\.\([a-z][a-z]*\)\.\([a-z][a-z]*\)\./\u\1.\u\2.\u\3.\u\4./' infile

As you can see, this is not dynamic: you need to add or remove a \([a-z][a-z]*\)\. group and its matching \u\X to make it hit 3 or 5 words.

Anyway, hope this helps.

BTW: I only use a-z; if numbers are present as well, you need to add them to the bracket expressions.
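
If the count has to vary, the same expression could be built in a loop rather than typed out. A rough sketch only: cap_first_n is a hypothetical helper name, it assumes GNU sed for \u, it allows digits after the first letter of each word, and it cannot go beyond 9 words because sed only has \1 to \9.

```shell
#!/bin/sh
# cap_first_n is a hypothetical helper, not part of sed.
# Assumes GNU sed (\u) and 1 <= n <= 9, since sed only has \1 to \9.
cap_first_n() {
    n=$1
    pat=''
    rep=''
    i=1
    while [ "$i" -le "$n" ]; do
        # One group per word; digits are allowed after the first letter.
        pat="${pat}\\([a-z][a-z0-9]*\\)\\."
        # Upper-case the first character of group i, then put the dot back.
        rep="${rep}\\u\\${i}."
        i=$((i + 1))
    done
    sed "s/${pat}/${rep}/"
}

printf '%s\n' 'wire.in.the.blood.s04e01.ws.pdtv.xvid-river.avi' | cap_first_n 5
```

As with the hand-written version, a line with fewer than n dot-terminated words does not match and is left unchanged.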

grail · 08-07-2010, 06:12 AM

Hi druuna

Thanks for the reply. I guess what I am trying to get at is: is there something in sed's arsenal, similar to '4g' changing from the fourth match onwards, that changes only up to the fourth?

druuna · 08-07-2010, 09:26 AM

Hi,

To my knowledge there is no range option in sed to do this.

If you need a dynamic solution, maybe this will help:

Code:

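#!/usr/bin/perl
use strict ;
use warnings ;

print "File to use : " ;
chomp( my $file = <STDIN> ) ;

print "Amount of words to initial uppercase : " ;
chomp( my $amount = <STDIN> ) ;

open( my $fh, '<', $file ) or die "Cannot open: $file : $!\n" ;
while ( my $line = <$fh> ) {
   chomp( $line ) ;
   # Split the line into words on one or more dots
   my @words = split( /\.+/, $line ) ;
   # Never index past the last word on short lines
   my $last = $amount < @words ? $amount : scalar( @words ) ;
   for my $x ( 0 .. $last - 1 ) {
      $words[$x] = ucfirst( $words[$x] ) ;
   }
   print join( '.', @words ), "\n" ;
}
close( $fh ) ;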

Example run:

Code:

$ cat testfile 
wire.in.the.blood.s04e01.ws.pdtv.xvid-river.avi
wire.in.the.blood.s04e02.ws.pdtv.xvid-river.avi
wire.in.the.blood.s04e03.ws.pdtv.xvid-river.avi
wire.in.the.blood.s04e04.ws.pdtv.xvid-river.avi

$ ./dynamic.to.upper.pl 
File to use : testfile
Amount of words to initial uppercase : 4
Wire.In.The.Blood.s04e01.ws.pdtv.xvid-river.avi
Wire.In.The.Blood.s04e02.ws.pdtv.xvid-river.avi
Wire.In.The.Blood.s04e03.ws.pdtv.xvid-river.avi
Wire.In.The.Blood.s04e04.ws.pdtv.xvid-river.avi

$ ./dynamic.to.upper.pl 
File to use : testfile
Amount of words to initial uppercase : 2
Wire.In.the.blood.s04e01.ws.pdtv.xvid-river.avi
Wire.In.the.blood.s04e02.ws.pdtv.xvid-river.avi
Wire.In.the.blood.s04e03.ws.pdtv.xvid-river.avi
Wire.In.the.blood.s04e04.ws.pdtv.xvid-river.avi

Hope this helps.

grail · 08-07-2010, 10:21 AM

Thanks again druuna ... always a pleasure to see an alternative. I ended up sticking a couple of seds together:

Code:

sed -r 's@(\b[a-z])@\u\1@g;s@(\b[a-zA-Z])@\l\1@5g' file

I generally despise repetition but it works ... I will leave this open a little longer in case a sed guru has another suggestion.
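
A parameterised sketch of the same two-pass idea, in case the count needs to change. cap_upto is a hypothetical wrapper name and GNU sed is assumed. As far as I can tell, the second substitution has to start at match n+1 to leave the first n capitalised: with '5g' the fifth word start is lowercased again, so only four stay capitalised.

```shell
#!/bin/sh
# cap_upto is a hypothetical wrapper; assumes GNU sed (\b, \u, \l, "Ng").
cap_upto() {
    n=$1
    # Pass 1 capitalises every word start; pass 2 lowercases again from
    # match n+1 onwards, leaving the first n capitalised.
    sed -r "s@\\b[a-z]@\\u&@g;s@\\b[a-zA-Z]@\\l&@$((n + 1))g"
}

printf '%s\n' 'wire.in.the.blood.s04e01.ws.pdtv.xvid-river.avi' | cap_upto 5
```

One side effect to keep in mind: the second pass also lowercases any word start after the nth that was already upper case in the input.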

druuna · 08-07-2010, 10:27 AM

Hi,

To be honest: I like your solution (post #5) better than the perl one _and_ the sed one I posted.

But it is always nice to play with perl and have an alternative

ghostdog74 · 08-08-2010, 06:50 AM

As always, if you have structured data with fields and field delimiters, use awk.

Code:

# awk -F"." '{for(i=1;i<=5;i++) $i=toupper( substr($i,1,1)  ) substr($i,2) }1' OFS="."  file
Wire.In.The.Blood.S04e01.ws.pdtv.xvid-river.avi
Wire.In.The.Blood.S04e02.ws.pdtv.xvid-river.avi
Wire.In.The.Blood.S04e03.ws.pdtv.xvid-river.avi
Wire.In.The.Blood.S04e04.ws.pdtv.xvid-river.avi

No need for messy regex using sed.
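
A variation on the one-liner above with the count passed in, sketched as a shell function. cap_fields is a hypothetical name. The loop is capped at NF because assigning to a field beyond NF makes awk create it, which would append empty fields (extra dots) to lines shorter than the requested count.

```shell
#!/bin/sh
# cap_fields is a hypothetical wrapper around the awk approach.
cap_fields() {
    awk -F'.' -v OFS='.' -v n="$1" '{
        # Stop at NF so short lines do not gain empty fields.
        max = (n < NF) ? n : NF
        for (i = 1; i <= max; i++)
            $i = toupper(substr($i, 1, 1)) substr($i, 2)
    } 1'
}

printf '%s\n' 'wire.in.the.blood.s04e01.ws.pdtv.xvid-river.avi' 'a.b' | cap_fields 5
```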

grail · 08-08-2010, 06:58 AM

Hi ghost

I agree and did originally use awk, but my original regex is not complicated, in fact it is kinda neat

My main query was whether there is a sedism I am missing that would let you make changes up to a given match, as opposed to from a given match onwards, which '4g' at the end allows.

Thanks for your valued input as always.