LinuxQuestions.org


rjkfsm 05-12-2009 10:57 AM

sed help plz
 
I am looking to use sed in a script. I have searched through the docs and cannot find what I am looking for.

If I have an email address like frustrated_sed_user@smtp1.mail.linux.com, how do I get the topmost domain out (i.e. linux.com)?

RK

MensaWater 05-12-2009 11:01 AM

awk is a better tool for extracting fields:

Code:

echo frustrated_sed_user@smtp1.mail.linux.com |awk -F. '{print $3"."$4}'
The "-F." tells awk to use a dot as the field separator instead of white space. awk strips the separators when it splits the line into fields, so the dot in quotes in the print section puts one back into the output. Fields 3 and 4 (linux and com) are printed before and after that dot, respectively.
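
To see which field number each piece gets, it can help to print them all. This is only a sketch; the helper name list_fields is made up for illustration:

```shell
# list_fields: hypothetical helper. Print every dot-separated field of the
# argument together with its field number.
list_fields() {
  printf '%s\n' "$1" | awk -F. '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

list_fields frustrated_sed_user@smtp1.mail.linux.com
```

Note that field 1 still contains the user name and the @, because only the dot is a separator here.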

rjkfsm 05-12-2009 11:36 AM

Well, I can tell I'm warmer...

How do I do a reverse search?

If I do:
Code:

echo frustrated_sed_user@smtp1.mail.linux.com |awk -F. '{print $3"."$4}'
I get linux.com just like I want, but if I have:
Code:

echo frustrated_sed_user@smtp1.linux.com |awk -F. '{print $3"."$4}'
I get com.

Further clarification: I'm looking for a more general email address filter, not something that only works with that one string.

Oh and thank you very much for your reply. I think awk may be what I need, but geez that's a lot of documentation.

RK

MensaWater 05-12-2009 11:42 AM

Code:

echo frustrated_sed_user@smtp1.mail.linux.com |awk -F. '{print $(NF-1)"."$NF}'
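NF = number of fields, so $NF is the last field and $(NF-1) is the field before it. So long as the part after the @ has at least two dots, it should work for any address that ends in the domain you want. With only one dot there (user@linux.com), the field before the last one still contains the user name and the @, so you get user@linux.com back.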
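
A possible variation, assuming it is acceptable to treat the @ as a separator too: with a bracket expression as the field separator, the user name becomes its own field and never leaks into the result. The helper name last_two_labels is made up for this sketch:

```shell
# last_two_labels: hypothetical helper. Split each input line on "." and "@"
# and print the last two fields; lines with fewer than three fields are
# skipped.
last_two_labels() {
  awk -F'[.@]' 'NF >= 3 { print $(NF-1) "." $NF }'
}

printf '%s\n' \
  frustrated_sed_user@smtp1.mail.linux.com \
  frustrated_sed_user@smtp1.linux.com \
  frustrated_sed_user@linux.com \
  localhost | last_two_labels
```

The three addresses each give linux.com, and the bare localhost line prints nothing.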

pixellany 05-12-2009 02:12 PM

Really good tutorials here---SED, AWK, and more:
http://www.grymoire.com/Unix/

Beats reading man pages.....;)

syg00 05-12-2009 04:55 PM

Not being well versed in awk, I only use it when the data is (extremely) well structured - as in the cases above.
Must admit I have a leaning toward regex in that it can be used to extract data from anywhere in a record - building a (fool-proof) regex for this could get challenging though.
Perl might be a better option than sed.

Kenhelm 05-12-2009 06:56 PM

Another method
Code:

echo '
frustrated_sed_user@smtp1.mail.linux.com
frustrated_sed_user@smtp1.linux.com
frustrated_sed_user@linux.com' | grep -o '[^.@]*\.[^.]*$'

linux.com
linux.com
linux.com
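
Since the original question asked about sed, roughly the same pattern can be carried over to a sed substitution. This is a sketch rather than a fool-proof filter: it assumes every address contains an @, and the helper name domain_sed is made up here.

```shell
# domain_sed: hypothetical helper. For each input line, keep only the last
# two labels (text without "." or "@" on either side of the final dot).
# The group must be preceded by a "." or "@"; lines that do not match
# print nothing because of -n.
domain_sed() {
  sed -n 's/.*[.@]\([^.@]*\.[^.@]*\)$/\1/p'
}

printf '%s\n' \
  frustrated_sed_user@smtp1.mail.linux.com \
  frustrated_sed_user@smtp1.linux.com \
  frustrated_sed_user@linux.com | domain_sed
```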


H_TeXMeX_H 05-13-2009 05:36 AM

Quote:

Originally Posted by rjkfsm (Post 3538408)
Well, I can tell I'm warmer...

How do I do a reverse search?

If I do:
Code:

echo frustrated_sed_user@smtp1.mail.linux.com |awk -F. '{print $3"."$4}'
I get linux.com just like I want, but if I have:
Code:

echo frustrated_sed_user@smtp1.linux.com |awk -F. '{print $3"."$4}'
I get com.

Further clarification: I'm looking for a more general email address filter, not something that only works with that one string.

Oh and thank you very much for your reply. I think awk may be what I need, but geez that's a lot of documentation.

RK

I wouldn't use awk here, but you can. Here's how I would do it to make it more useful:

Code:

bash-3.1$ echo frustrated_sed_user@smtp1.mail.linux.com | rev | cut -d . -f 1
moc
bash-3.1$ echo frustrated_sed_user@smtp1.mail.linux.com | rev | cut -d . -f 1 | rev
com
bash-3.1$ echo frustrated_sed_user@smtp1.mail.linux.com | rev | cut -d . -f 2 | rev
linux
bash-3.1$ echo frustrated_sed_user@smtp1.mail.linux.com | rev | cut -d . -f 1-2 | rev
linux.com
bash-3.1$ echo frustrated_sed_user@smtp1.linux.com | rev | cut -d . -f 1-2 | rev
linux.com

So basically using 'rev' is a good idea here. It reverses lines character by character.
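
The steps above could be wrapped up like this. It is a sketch only; the helper name domain_rev and the extra cut on the @ are additions, so that an address with no subdomain (user@linux.com) does not come back with the user name attached:

```shell
# domain_rev: hypothetical helper. Drop everything up to the "@", then keep
# the last two dot-separated pieces by reversing the line, cutting the first
# two fields, and reversing back.
domain_rev() {
  printf '%s\n' "$1" | cut -d@ -f2 | rev | cut -d. -f1-2 | rev
}

domain_rev frustrated_sed_user@smtp1.mail.linux.com
domain_rev frustrated_sed_user@linux.com
```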

ghostdog74 05-13-2009 06:35 AM

Quote:

Originally Posted by H_TeXMeX_H (Post 3539220)
I wouldn't use awk here, but you can. Here's how I would do it to make it more useful:

but here, you make extra calls to rev, cut.

ghostdog74 05-13-2009 06:39 AM

Quote:

Originally Posted by syg00 (Post 3538672)
I have a leaning toward regex in that it can be used to extract data from anywhere in a record - building a (fool-proof) regex for this could get challenging though.

Well, I don't know why you think you can't extract data from anywhere in a record without regex :)

H_TeXMeX_H 05-13-2009 07:12 AM

Quote:

Originally Posted by ghostdog74 (Post 3539275)
but here, you make extra calls to rev, cut.

The only extra call is to 'rev'. You can use awk instead of cut:

Code:

bash-3.1$ echo frustrated_sed_user@smtp1.mail.linux.com | rev | awk -F. '{print $1"."$2}'| rev
linux.com

Or you could use just awk:

Code:

bash-3.1$ echo frustrated_sed_user@smtp1.mail.linux.com | awk -F. '{print $(NF-1)"."$NF}'
linux.com

as jlightner said earlier.

NF is the number of fields, so $NF is the last field and $(NF-1) is the next-to-last field.

Whichever way you want to do it, there are so many ways.

ghostdog74 05-13-2009 07:19 AM

Quote:

Originally Posted by H_TeXMeX_H (Post 3539310)
The only extra call is to 'rev'. You can use awk instead of cut:

Code:

bash-3.1$ echo frustrated_sed_user@smtp1.mail.linux.com | rev | awk -F. '{ print $1"."$2}'| rev
linux.com


Still the same thing: you have to call rev twice and awk once. See post #4 by jlightner; that's the common way to get awk fields from the back.

Code:

# time echo frustrated_sed_user@smtp1.mail.linux.com |awk -F. '{print $(NF-1)"."$NF}'
linux.com

real    0m0.004s
user    0m0.004s
sys    0m0.000s

# time echo frustrated_sed_user@smtp1.mail.linux.com | rev | awk -F. '{ print $1"."$2}'| rev
linux.com

real    0m0.007s
user    0m0.004s
sys    0m0.004s
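
If the number of external calls is the concern, another option is to stay inside the shell. The sketch below uses only POSIX parameter expansion, so no external command such as awk, rev or cut is run; the helper name domain_pe is made up for illustration, and no timing is claimed for it.

```shell
# domain_pe: hypothetical helper. The body runs in a subshell (note the
# parentheses) so its variables do not leak into the caller.
domain_pe() (
  host=${1##*@}                 # drop everything through the last "@"
  case $host in
    *.*) ;;                     # need at least one dot to have two labels
    *) exit 1 ;;                # otherwise fail without printing anything
  esac
  tld=${host##*.}               # text after the last dot
  rest=${host%.*}               # text before the last dot
  printf '%s.%s\n' "${rest##*.}" "$tld"
)

domain_pe frustrated_sed_user@smtp1.mail.linux.com
domain_pe frustrated_sed_user@linux.com
```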


H_TeXMeX_H 05-13-2009 07:23 AM

Oh yeah, I guess I missed that post, and posted the same thing just a second ago. Oh whatever, 0.003 sec is that important.

ghostdog74 05-13-2009 07:28 AM

Quote:

Originally Posted by H_TeXMeX_H (Post 3539310)
Whichever way you want to do it, there are so many ways.

Yes, there are many ways, but don't choose the less obvious ones. :)

MensaWater 05-13-2009 09:10 AM

My second post did it without having to "rev" anything. I just changed the variables to be relative to number of fields rather than explicit 3 and 4.


All times are GMT -5.