sed and regex help

zski128 · 12-12-2011, 07:59 PM

Hello,
I need some regex help. I want to extract a string of digits from a single line.

The input can look like this (three separate example lines):
o.text text 336 09-Dec-11 13:33:10
o.text text 3350126 09-Dec-11 13:33:10
o.texttext text 30473 09-Dec-11 13:33:10

I need to pull out the middle number: 336, 3350126, and 30473.

I think I am close. Here is the command I am running:

Code:

sed -r 's/([0-9]+).*/\1/'

Any help would be greatly appreciated!!

corp769 · 12-12-2011, 08:21 PM

Honestly, if the fields do not change, you could use awk to extract the data, like so:

Code:

cat filename | awk '{ print $3 }'

Where filename is the name of the file that holds the data.

Cheers,

Josh
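A quick sketch of why the awk approach copes with uneven spacing: with its default field separator, awk splits each line on runs of spaces and tabs and ignores leading whitespace, so `$3` stays the number even if the gaps vary. The function name `third_field` and the sample lines (the thread's data with extra blanks and a tab added) are hypothetical.

Code:

```shell
#!/usr/bin/env bash
# Print the third whitespace-separated field of each input line.
# awk's default field splitting treats any run of spaces/tabs as one
# separator and ignores leading whitespace, so uneven spacing is harmless.
third_field() {
    awk '{ print $3 }' "$@"   # reads the named files, or stdin if none are given
}

# Hypothetical sample: the thread's data with extra spaces and a tab added.
printf 'o.text   text 336\t09-Dec-11 13:33:10\no.texttext    text   30473 09-Dec-11 13:33:10\n' | third_field
# prints 336, then 30473
```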

danielbmartin · 12-12-2011, 08:40 PM

Quote:

Originally Posted by zski128

The input can look like this, these are 3 separate examples:
o.text text 336 09-Dec-11 13:33:10
o.text text 3350126 09-Dec-11 13:33:10
o.texttext text 30473 09-Dec-11 13:33:10

I need to pull out the middle number, 336, 3350126, 30473

It appears you always want the third field, and your fields are delimited by a single blank. Consider using "cut".

Code:

cut -d' ' -f3 < InFile

Daniel B. Martin

Telengard · 12-12-2011, 11:08 PM

zski128, are your fields separated by single space characters?

Code:

test$ cat input-file
o.text text 336 09-Dec-11 13:33:10
o.text text 3350126 09-Dec-11 13:33:10
o.texttext text 30473 09-Dec-11 13:33:10

Quote:

Originally Posted by corp769

Code:

cat filename | awk '{ print $3 }'

It works without cat, since awk can read the file directly.

Code:

test$ awk '{print $3}' input-file
336
3350126
30473
test$
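If the file might also contain lines that are not in this format, one option is to print field 3 only when it consists entirely of digits. This is a sketch that assumes such lines should simply be skipped (the thread does not say); `numeric_third_field` is a hypothetical name.

Code:

```shell
#!/usr/bin/env bash
# Print field 3 only when it is all digits; other lines (headers, blank
# lines, malformed rows) are skipped instead of printing an empty or
# non-numeric value. Skipping them is an assumption, not a stated requirement.
numeric_third_field() {
    awk '$3 ~ /^[0-9]+$/ { print $3 }' "$@"
}

printf 'o.text text 336 09-Dec-11 13:33:10\nsome header line\no.texttext text 30473 09-Dec-11 13:33:10\n' | numeric_third_field
# prints 336, then 30473 (the header line's third field is "line", so it is skipped)
```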

Quote:

Originally Posted by danielbmartin

Code:

cut -d' ' -f3 < InFile

I tried it without the redirection, and it seemed to work.

Code:

test$ cut -d' ' -f3 input-file
336
3350126
30473
test$

A Bash loop worked too.

Code:

test$ while read -r -a array; do echo "${array[2]}"; done < input-file
336
3350126
30473
test$
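The same loop can also be written with named variables instead of an array. This sketch assumes the default IFS, under which runs of spaces and tabs act as a single separator; `-r` keeps backslashes literal, and the `|| [ -n "$first" ]` test keeps a final line that lacks a trailing newline. The function name `print_third_field` is hypothetical.

Code:

```shell
#!/usr/bin/env bash
# Split each line into named variables; with the default IFS, runs of
# spaces/tabs count as one separator. Everything after the third field
# lands in "rest". The extra test handles a last line with no newline.
print_third_field() {
    local first second number rest
    while read -r first second number rest || [ -n "$first" ]; do
        printf '%s\n' "$number"
    done
}

printf 'o.text  text   336 09-Dec-11 13:33:10\no.text text 3350126 09-Dec-11 13:33:10\n' | print_third_field
# prints 336, then 3350126
```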

As for sed, when I tried the given program I got this.

Code:

test$ sed -r 's/([0-9]+).*/\1/' input-file
o.text text 336
o.text text 3350126
o.texttext text 30473

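What seems to be happening is that the match only begins at the first digit, so the text before it is never part of the match and is left untouched; only the matched part (the digits plus the rest of the line) is replaced by the backreference. So I decided to also match all the characters preceding the number, outside the backreference.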

Code:

test$ sed -r 's/.+ ([0-9]+) .+/\1/' input-file
336
3350126
30473
test$
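A variant of the sed fix, sketched under the assumption that the number is always the third whitespace-separated field: it anchors the pattern at the start of the line, allows one or more blanks or tabs between fields, and uses `-n` with the `p` flag so that only lines where the substitution succeeded are printed (without them, non-matching lines pass through unchanged). `-E` selects extended regexes in both GNU and BSD sed; GNU sed's `-r`, used in this thread, is equivalent. `extract_number` is a hypothetical name.

Code:

```shell
#!/usr/bin/env bash
# Field 1, whitespace, field 2, whitespace, then capture a run of digits
# that must be followed by whitespace. -n suppresses automatic printing;
# the p flag prints the line only when the substitution happened.
extract_number() {
    sed -nE 's/^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+([0-9]+)[[:space:]].*/\1/p' "$@"
}

printf 'o.text   text 336 09-Dec-11 13:33:10\nheader line here\no.texttext text\t30473 09-Dec-11 13:33:10\n' | extract_number
# prints 336, then 30473 (the header line does not match and is dropped)
```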

zski128, is that what you wanted?

zski128 · 12-13-2011, 06:43 AM

Quote:

What seems to be happening is that the regex is only matching text from the first digit character on (the backreference). So I decided to match all characters preceding the first digit outside the backreference.

Code:

test$ sed -r 's/.+ ([0-9]+) .+/\1/' input-file
336
3350126
30473
test$

Thanks! The first reply with awk is much simpler, but thanks for the regex too; I see where I was going wrong. The cut command would not work in my case: there is a variable amount of whitespace between the strings, which was stripped out when I posted this thread. Sorry about that.
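For the variable-whitespace case, cut can still work if the separators are normalized first: cut treats every single delimiter character as a field boundary, so adjacent spaces produce empty fields. This sketch assumes the fields are separated by spaces and/or tabs and that lines have no leading whitespace; `squeeze_then_cut` is a hypothetical name.

Code:

```shell
#!/usr/bin/env bash
# Turn tabs into spaces, squeeze runs of spaces into one, then cut field 3.
# Leading whitespace on a line would still create an empty field 1.
squeeze_then_cut() {
    tr '\t' ' ' | tr -s ' ' | cut -d' ' -f3
}

printf 'o.text    text   336 09-Dec-11 13:33:10\n' | cut -d' ' -f3      # prints an empty line
printf 'o.text    text   336 09-Dec-11 13:33:10\n' | squeeze_then_cut    # prints 336
```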

Telengard · 12-13-2011, 10:30 AM

Quote:

Originally Posted by zski128

The cut command would not work in my case, there is a variable amount of white space between the strings that were stripped out when I posted this thread, sorry about that.

That is one reason to enclose both code and data blocks in code tags: they preserve the whitespace.