LinuxQuestions.org
Old 09-04-2020, 03:36 PM   #1
hopeless_n00b
LQ Newbie
 
Registered: Aug 2014
Posts: 21

Rep: Reputation: Disabled
sed pattern match


I don't understand how to make sed match only lines with a numeric second field and a nonzero numeric third field (whitespace-separated fields).

A sed command that works is welcome, but I will learn more with an explanation of why it works as well.
 
Old 09-04-2020, 04:07 PM   #2
vincix
Senior Member
 
Registered: Feb 2011
Distribution: Ubuntu, Centos
Posts: 1,158

Rep: Reputation: 87
It would be easier if you could give us a few examples. In any case, as far as I know, sed doesn't work with fields, it works with lines and regex, so maybe awk would be better suited to your needs.

Something to the effect of:
Code:
echo 'one 10 3' | awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $3 != 0'
 
1 members found this post helpful.
Old 09-04-2020, 04:22 PM   #3
boughtonp
Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 570

Rep: Reputation: 366

Well, you're asking less for sed help and more for general regex help, but here's all the pieces you need...

You can use "^" to match start of line.
Then "\S*" to allow the first field to be any number of non-whitespace characters (or use "[^ ]*" if you only care about spaces).
Then "\s" for a whitespace character (or "[ ]" or "\ " for spaces only).
Then you can use "[0-9]*" to match numeric and "[^0-9]*" for non-numeric. Note that "*" also matches zero characters, so use "[0-9][0-9]*" when at least one digit is required. (In other regex implementations you can use "\d*" and "\D*" for these.)
Make sure you're quoting the pattern in single quotes, so you don't get shell expansions happening.

Depending on what the data actually is and how strict the format is, sed may be fine, or maybe grep is fine, or awk might be better, or a specific parser may be best.
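
Putting those pieces together might look like the sketch below. It is only one possible assembly: it uses the POSIX class [[:space:]] in place of "\s" and "\S" (which are GNU extensions), it runs the same pattern through both sed and grep, and the sample lines are made up for illustration.

```shell
# Sample data (made up for illustration)
sample='abc 123 456 xyz
def 0 0 fed
hij 0 17 pid
klm x9 5 foo'

# ^                 start of line
# [^[:space:]]+     first field: one or more non-whitespace characters
# [[:space:]]+      field separator: one or more whitespace characters
# [0-9]+            second field: one or more digits
# [1-9][0-9]*       third field: digits, not starting with 0
# ([[:space:]]|$)   third field must end at whitespace or end of line
pattern='^[^[:space:]]+[[:space:]]+[0-9]+[[:space:]]+[1-9][0-9]*([[:space:]]|$)'

# -E selects extended regex syntax in both tools
sed_out=$(printf '%s\n' "$sample" | sed -n -E "/$pattern/p")
grep_out=$(printf '%s\n' "$sample" | grep -E "$pattern")

printf '%s\n' "$sed_out"
```

Both commands should select the same two lines ("abc 123 456 xyz" and "hij 0 17 pid"), which is why grep may be all that is needed when the goal is only to filter lines.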


Last edited by boughtonp; 09-04-2020 at 04:28 PM.
 
1 members found this post helpful.
Old 09-04-2020, 04:24 PM   #4
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,779

Rep: Reputation: 613
Help us to help you. Provide a sample input file (10-15 lines will do). Construct a sample output file which corresponds to your sample input and post both samples here. With "InFile" and "OutFile" examples we can better understand your needs and also judge if our proposed solution fills those needs.

Daniel B. Martin

.
 
1 members found this post helpful.
Old 09-04-2020, 04:32 PM   #5
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,434
Blog Entries: 11

Rep: Reputation: 3398
I think I would use awk for field based matching and operation, so if you are not locked into sed you may try something like this:

Code:
cat infile
abc 123 456 xyz
def 0 0 fed
hij 0 17 pid

awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[1-9][0-9]*$/{print}' infile
abc 123 456 xyz
hij 0 17 pid

Or more simply, if the third field is known to be numeric...

awk '$2 ~ /^[0-9]+$/ && $3 != 0 {print}' infile
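
A quick way to sanity-check field tests like these is to run them against a few awkward lines. The sketch below compares an anchored test (^ and $ force the whole field to be digits) with an unanchored one; the "bad" rows are invented for illustration and are not from the original question.

```shell
# Invented test rows: the "bad" rows have fields that are not purely numeric
rows='abc 123 456 xyz
def 0 0 fed
hij 0 17 pid
bad a1b 5 x
bad 12 3x4 x
bad 12 fed x'

# Anchored: field 2 must be all digits, field 3 all digits and numerically nonzero
awk_strict=$(printf '%s\n' "$rows" | awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $3 != 0')

# Unanchored for comparison: a field passes if it merely contains a matching substring
awk_loose=$(printf '%s\n' "$rows" | awk '$2 ~ /[0-9]+/ && $3 ~ /[1-9][0-9]*/')

printf 'strict:\n%s\n\nloose:\n%s\n' "$awk_strict" "$awk_loose"
```

The anchored version keeps only the "abc" and "hij" rows, while the unanchored version also lets "a1b" and "3x4" through because each contains a digit somewhere.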
But if sed is your tool of choice, then this should do it:

Code:
sed -rn '/^[^ \t]+[ \t]+[0-9]+[ \t]+[1-9][0-9]*/p' infile
abc 123 456 xyz
hij 0 17 pid
How it works:

The -n option to sed tells it to not print any output unless told to do so.

The -r option tells it to use extended regex syntax, which means we do not have to escape things like '+' in our expressions, among other things. (-E is the more portable spelling of the same option.)
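
To illustrate the difference, here is the same kind of match written with and without extended syntax. This is a sketch on a single made-up line, using plain spaces as separators to keep it short; the basic-syntax version uses the POSIX interval \{1,\} to say "one or more".

```shell
line='abc 123 456 xyz'

# Extended syntax (-E, or -r in GNU sed): + is an operator as written
ere=$(printf '%s\n' "$line" | sed -n -E '/^[^ ]+ +[0-9]+ +[1-9][0-9]*/p')

# Basic syntax (no -E): a bare + would be a literal plus sign,
# so the interval \{1,\} is used for "one or more"
bre=$(printf '%s\n' "$line" | sed -n '/^[^ ]\{1,\} \{1,\}[0-9]\{1,\} \{1,\}[1-9][0-9]*/p')

# Both commands print the line
printf '%s\n%s\n' "$ere" "$bre"
```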

Since you are using whitespace-separated fields in the input, we have to detect those field breaks in order to match them. This is what the [ \t]+ terms match: literally "one or more spaces or tabs".

You don't say what is in the first field, so all we know is that it will end with a field separator. We anchor the match to the start of the line with '^' and follow it with [^ \t]+, literally "one or more characters that are not a space or tab". The [ \t]+ after it then matches the separator.

Then for the numeric fields, to match any group of numeric digits we use [0-9]+ and for a non-zero numeric field, [1-9][0-9]*, literally "a non-zero digit followed by zero or more digits".
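
One caveat worth testing: the pattern does not say what must follow the third field, so a third field such as 12abc also matches. A possible tightening, sketched below with made-up input, appends ([[:blank:]]|$) so that the digits must be followed by a separator or the end of the line. [[:blank:]] (space or tab) is used here in place of [ \t], which relies on GNU sed interpreting \t.

```shell
input='abc 123 456 xyz
hij 0 17
bad 5 12abc z'

# Pattern as posted, with [[:blank:]] standing in for [ \t]
as_posted=$(printf '%s\n' "$input" |
  sed -n -E '/^[^[:blank:]]+[[:blank:]]+[0-9]+[[:blank:]]+[1-9][0-9]*/p')

# Tightened: the third field must end at a blank or at end of line
tightened=$(printf '%s\n' "$input" |
  sed -n -E '/^[^[:blank:]]+[[:blank:]]+[0-9]+[[:blank:]]+[1-9][0-9]*([[:blank:]]|$)/p')

printf 'as posted:\n%s\n\ntightened:\n%s\n' "$as_posted" "$tightened"
```

The posted form prints all three lines; the tightened form drops the "12abc" line but still accepts a line that ends right after the third field.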

The '+' operator means one or more of what immediately precedes it in a regex; '*' means zero or more of the preceding expression.

Finally, the 'p' following the match expression tells sed to print anything that matches.
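
The interaction between -n and 'p' is easy to see by comparing the output with and without -n. A small sketch with made-up input:

```shell
data='abc 123 456 xyz
def 0 0 fed'

# With -n, only lines printed explicitly by p appear: 1 line
with_n=$(printf '%s\n' "$data" | sed -n '/456/p')

# Without -n, every line is printed automatically and matching lines
# are printed a second time by p: 3 lines
without_n=$(printf '%s\n' "$data" | sed '/456/p')

printf '%s\n---\n%s\n' "$with_n" "$without_n"
```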

As always, read the helpful man pages! In addition to man sed, be sure to become familiar with man 7 regex, which describes the POSIX regular expression syntax that sed uses (man perlre covers Perl's syntax, which differs in places).

Good luck!

Last edited by astrogeek; 09-05-2020 at 02:26 PM. Reason: spelling, duh!
 
Old 09-05-2020, 05:10 AM   #6
hopeless_n00b
LQ Newbie
 
Registered: Aug 2014
Posts: 21

Original Poster
Rep: Reputation: Disabled
I think you got me sorted. Thanks!
 
  

