LinuxQuestions.org

zski128 12-12-2011 07:59 PM

sed and regex help
 
Hello,
I am in need of some regex help. I need to pull a string of numbers out of a single line.

The input can look like this, these are 3 separate examples:
o.text text 336 09-Dec-11 13:33:10
o.text text 3350126 09-Dec-11 13:33:10
o.texttext text 30473 09-Dec-11 13:33:10

I need to pull out the middle number, 336, 3350126, 30473

I am close, I think, here is the command I am running:

Code:

sed -r 's/([0-9]+).*/\1/'
Any help would be greatly appreciated!!

corp769 12-12-2011 08:21 PM

Honestly, if the fields do not change, you could use awk to extract the data, like so:
Code:

cat filename | awk '{ print $3 }'
Where filename is the name of the file that holds the data.

Cheers,

Josh

danielbmartin 12-12-2011 08:40 PM

Quote:

Originally Posted by zski128 (Post 4548351)
The input can look like this, these are 3 separate examples:
o.text text 336 09-Dec-11 13:33:10
o.text text 3350126 09-Dec-11 13:33:10
o.texttext text 30473 09-Dec-11 13:33:10

I need to pull out the middle number, 336, 3350126, 30473

It appears you always want the third field, and your fields are delimited by a single blank. Consider using "cut".

Code:

cut -d' ' -f3 < InFile
Daniel B. Martin

Telengard 12-12-2011 11:08 PM

zski128, are your fields separated by single space characters?

Code:

test$ cat input-file
o.text text 336 09-Dec-11 13:33:10
o.text text 3350126 09-Dec-11 13:33:10
o.texttext text 30473 09-Dec-11 13:33:10

Quote:

Originally Posted by corp769 (Post 4548364)
Code:

cat filename | awk '{ print $3 }'

It should work without cat.

Code:

test$ awk '{print $3}' input-file
336
3350126
30473
test$

Quote:

Originally Posted by danielbmartin (Post 4548374)
Code:

cut -d' ' -f3 < InFile

I tried without redirection, and it seemed to work.

Code:

test$ cut -d' ' -f3 input-file
336
3350126
30473
test$

A Bash loop worked too.

Code:

test$ while read -ra array; do echo "${array[2]}"; done < input-file
336
3350126
30473
test$
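
A variation on the loop above, as a rough sketch: read can split straight into named variables, which avoids the array index. The variable names (name, kind, size, rest) and the irregular spacing in the sample are my own assumptions, not something from the original post.

```shell
# Sample data with irregular spacing (assumed; the real file was not posted intact).
sample='o.text    text      336 09-Dec-11 13:33:10
o.text    text  3350126 09-Dec-11 13:33:10
o.texttext text   30473 09-Dec-11 13:33:10'

# read splits each line on runs of whitespace; the last variable (rest)
# soaks up everything after the third field.
sizes=$(printf '%s\n' "$sample" |
  while read -r name kind size rest; do
    printf '%s\n' "$size"
  done)

printf '%s\n' "$sizes"
```

Because read splits on runs of whitespace by default, this should not care how many spaces sit between the fields.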

As for sed, when I tried the given program I got this.

Code:

test$ sed -r 's/([0-9]+).*/\1/' input-file
o.text text 336
o.text text 3350126
o.texttext text 30473

What seems to be happening is that the regex only matches from the first digit character onward, so everything before it is left untouched by the substitution. So I decided to also match all the characters preceding the number, outside the capture group.

Code:

test$ sed -r 's/.+ ([0-9]+) .+/\1/' input-file
336
3350126
30473
test$
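
For what it's worth, the greedy .+ version above should end up picking the last all-digit token that has a single space on each side, which happens to be the third field in this data. A slightly stricter sketch anchors at the start of the line and skips exactly two fields. The character classes and the irregular spacing in the sample are my assumptions about what the real data looks like.

```shell
# Sample data with irregular spacing (assumed; the real file was not posted intact).
sample='o.text    text      336 09-Dec-11 13:33:10
o.text    text  3350126 09-Dec-11 13:33:10
o.texttext text   30473 09-Dec-11 13:33:10'

# ^                  anchor at the start of the line
# [^[:space:]]+      one field (a run of non-whitespace)
# [[:space:]]+       one or more whitespace characters between fields
# ([0-9]+).*         capture the digits of the third field, discard the rest
third_field=$(printf '%s\n' "$sample" |
  sed -E 's/^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+([0-9]+).*/\1/')

printf '%s\n' "$third_field"
```

Here -E is used instead of -r; both select extended regular expressions in GNU sed, and -E is also accepted by BSD sed.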

zski128, is that what you wanted?

zski128 12-13-2011 06:43 AM

Quote:

What seems to be happening is that the regex only matches from the first digit character onward, so everything before it is left untouched by the substitution. So I decided to also match all the characters preceding the number, outside the capture group.

Code:

test$ sed -r 's/.+ ([0-9]+) .+/\1/' input-file
336
3350126
30473
test$

Thanks! The first reply with awk is much simpler, but thanks for the regex; I see where I was going wrong. The cut command would not work in my case: there is a variable amount of whitespace between the strings, which was stripped out when I posted this thread. Sorry about that.
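
If cut is still wanted with variable spacing, one possible workaround is to squeeze the runs of spaces first with tr -s. This is only a sketch; the spacing in the sample is assumed, and it would not cope with tabs or leading spaces, which awk handles on its own.

```shell
# Sample data with irregular spacing (assumed; the real file was not posted intact).
sample='o.text    text      336 09-Dec-11 13:33:10
o.text    text  3350126 09-Dec-11 13:33:10
o.texttext text   30473 09-Dec-11 13:33:10'

# tr -s ' ' collapses each run of spaces into one, so cut sees
# exactly one delimiter between fields.
cut_result=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f3)

# awk splits on runs of blanks by default, so no preprocessing is needed.
awk_result=$(printf '%s\n' "$sample" | awk '{ print $3 }')

printf '%s\n' "$cut_result"
printf '%s\n' "$awk_result"
```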

Telengard 12-13-2011 10:30 AM

Quote:

Originally Posted by zski128 (Post 4548679)
The cut command would not work in my case: there is a variable amount of whitespace between the strings, which was stripped out when I posted this thread. Sorry about that.

That is one reason you should enclose both code and data blocks in code tags. It will preserve the whitespace.


All times are GMT -5.