LinuxQuestions.org
Old 08-19-2012, 07:44 PM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,136

Rep: Reputation: 299
Elimination of lines having fewer than 3 words


Have:
Code:
how now
now is the time for
now
 
  holy  cow  
  the quick   brown fox   
 jumped over
the candlestick
Want:
Code:
now is the time for
  the quick   brown fox
I've been fumbling with variations on this...
Code:
sed -r '/\w{3}.+/p' $InFile
... without success.

Please advise.

Daniel B. Martin
 
Old 08-19-2012, 07:50 PM   #2
byannoni
Member
 
Registered: Aug 2012
Location: /home/byannoni
Distribution: Arch
Posts: 128

Rep: Reputation: 36
This is off the top of my head; if it doesn't work for you, I'll be happy to develop it further:
Code:
awk -F'\\s*' 'NF > 4'
Edit:
Actually, this works better:
Code:
perl -ne 'print $_ if $_ =~ /\s*(?:\w+\s+){2,}\w+/'
Here is an equivalent awk for the Perl:
Code:
awk '$0 ~ /\s*(\w+\s+){2,}\w+/'

Last edited by byannoni; 08-19-2012 at 08:23 PM. Reason: Added more awk
 
Old 08-19-2012, 08:09 PM   #3
lyle_s
Member
 
Registered: Jul 2003
Distribution: Slackware
Posts: 388

Rep: Reputation: 52
Here's what I had in mind:
Code:
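#!/bin/bash

# Print only the lines that contain at least 3 whitespace-separated words.
# -r stops read from interpreting backslashes; with no variable name given,
# the whole line lands in REPLY with its leading/trailing blanks intact.
while read -r
do
        if [ "$(printf '%s\n' "$REPLY" | wc -w)" -ge 3 ]
        then
                printf '%s\n' "$REPLY"
        fi
done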
Code:
lyle@bowman:~/programming/sh$ ./lines < words.test 
now is the time for
  the quick   brown fox   
one two three
I added a line with 3 words to your sample data.

No awk/sed fanciness though.

Lyle.
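
A possible variation on the loop above, for anyone who wants to avoid starting wc once per line: the sketch below counts words with the shell's own word splitting. The function name keep_long_lines and its optional minimum-count argument are made up for illustration, and the loop variables are plain globals because POSIX sh has no local.

```shell
# Hypothetical helper: print the lines of stdin that have at least N words
# (N defaults to 3). Sketch only; "words" means whitespace-separated tokens.
keep_long_lines() {
    min=${1:-3}
    # The "|| [ -n ... ]" part also handles a last line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        set -f            # disable globbing so a bare * is not expanded
        set -- $line      # unquoted on purpose: split the line into words
        set +f
        if [ "$#" -ge "$min" ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Prints "now is the time for" and "one two three".
printf '%s\n' 'how now' 'now is the time for' 'one two three' | keep_long_lines 3
```

The original line is printed untouched, so leading and trailing blanks survive just as they do in the wc version.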
 
Old 08-19-2012, 09:07 PM   #4
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,136

Original Poster
Rep: Reputation: 299
This didn't work ...
Code:
awk -F'\\s*' 'NF > 4'
... but you put me on the right track.
This does the job nicely ...
Code:
awk 'NF > 2'
Thank you.
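
For later readers, a short sketch of why NF > 2 is enough: with the default field separator, awk splits each record on runs of blanks and tabs and ignores leading and trailing whitespace, so NF is the word count, and a pattern with no action prints the record unchanged. The wrapper name filter_min_words and its min argument are illustrative only, not anything from the posts above.

```shell
# Hypothetical wrapper: keep lines of stdin with at least N fields (default 3).
filter_min_words() {
    # -v passes the threshold into awk; a pattern with no action prints $0.
    awk -v min="${1:-3}" 'NF >= min'
}

# Prints "now is the time for", "  the quick   brown fox   " and "one two three".
printf '%s\n' 'how now' 'now is the time for' '  the quick   brown fox   ' \
    'one two three' | filter_min_words 3
```

With a threshold of 3 this is the same test as NF > 2.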

Daniel B. Martin

Last edited by danielbmartin; 08-19-2012 at 09:08 PM. Reason: Cosmetic improvement
 
Old 08-19-2012, 11:55 PM   #5
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 638

Rep: Reputation: 373
Hi.

Using egrep (or grep -E):
Code:
$ cat infile
how now
now is the time for
now
 
  holy  cow  
  the quick   brown fox   
 jumped over
the candlestick
$ egrep '(\w+ +){3}' infile
now is the time for
  the quick   brown fox
The same with basic RE:
Code:
grep  '\(\w\+ \+\)\{3\}' infile

Last edited by firstfire; 08-20-2012 at 12:01 AM.
 
Old 08-20-2012, 08:53 AM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,136

Original Poster
Rep: Reputation: 299
Quote:
Originally Posted by firstfire
Code:
$ egrep '(\w+ +){3}' infile
now is the time for
  the quick   brown fox
This works but I don't understand it. Please elaborate.
This is my (mis)understanding.
Code:
{3} means 3 instances of (\w+ +) 
\w means "a word"
 + means "zero or more blanks"
Why is there a + following \w?

Daniel B. Martin
 
Old 08-20-2012, 10:24 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,562

Rep: Reputation: 1939
Not quite:

Code:
\w means a word character class ... ie same as [[:alnum:]_] (alphanumerics plus underscore)
+ means one or more
The issue with the code example given is that if the line contains only 3 words, there will be no space after the last word, hence the match will fail.
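
A quick way to check what \w and + each cover, sketched with grep's -o option, which prints every match on its own line. Both -o and \w are extensions (GNU grep is assumed here), and the test string is made up.

```shell
# \w+ is one or more word characters: letters, digits and underscore.
printf 'foo_bar baz\n' | grep -oE '\w+'             # prints foo_bar, then baz

# [[:alnum:]] alone leaves out the underscore, so foo_bar splits in two.
printf 'foo_bar baz\n' | grep -oE '[[:alnum:]]+'    # prints foo, bar, baz

# Adding the underscore gives the same matches as \w+.
printf 'foo_bar baz\n' | grep -oE '[[:alnum:]_]+'   # prints foo_bar, then baz
```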
 
Old 08-20-2012, 10:30 AM   #8
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 638

Rep: Reputation: 373
Hi.

Well, as Firefox developers say, this is embarrassing.. There should be '*' (a.k.a. Kleene star -- zero or more) instead of '+' (one or more):
Code:
egrep '(\w+ *){3,}'
This regular expression matches a string consisting of three or more words, each followed by zero or more spaces, which is what a line of three or more words looks like.

The previous attempt (with ' +') worked on your sample data because there was no line with exactly 3 words. For such a line, there must be at least one space after the last word for that RE to match:
Code:
$ echo 'a b c' | egrep '(\w+ +){3}'
$ echo 'a b c ' | egrep '(\w+ +){3}'
a b c 
$ echo 'a b c ' | egrep '(\w+ *){3}'
a b c
$ echo 'a b c' | egrep '(\w+ *){3}'
a b c
Note last space after 'c'.

EDIT: grail beats me again

Last edited by firstfire; 08-20-2012 at 10:34 AM.
 
Old 08-20-2012, 10:47 AM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,136

Original Poster
Rep: Reputation: 299
Quote:
Originally Posted by firstfire View Post
Hi.

Well, as Firefox developers say, this is embarrassing.. There should be '*' (a.k.a. Kleene star -- zero or more) instead of '+' (one or more):
Code:
egrep '(\w+ *){3,}'
This regular expression matches a string consisting of three or more words, each followed by zero or more spaces, which is what a line of three or more words looks like.

The previous attempt (with ' +') worked on your sample data because there was no line with exactly 3 words. For such a line, there must be at least one space after the last word for that RE to match:
Code:
$ echo 'a b c' | egrep '(\w+ +){3}'
$ echo 'a b c ' | egrep '(\w+ +){3}'
a b c 
$ echo 'a b c ' | egrep '(\w+ *){3}'
a b c
$ echo 'a b c' | egrep '(\w+ *){3}'
a b c
Note last space after 'c'.

EDIT: grail beats me again
The first code line fails but the code in the examples is different, and it works. Is there a t7po?

Daniel B. Martin
 
Old 08-20-2012, 11:17 AM   #10
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,562

Rep: Reputation: 1939
Daniel, you might need to be a bit more specific about which first line of code you are referring to.
 
Old 08-20-2012, 12:31 PM   #11
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 638

Rep: Reputation: 373
Hi, Daniel.

Again, I'm wrong:
Code:
$ echo 'how now'| sed -r  's/(\w+ *)(\w+ *)(\w+ *)/\1:\2:\3/'
how :no:w
So '(\w+ *){3}' is bad: because ' *' can match zero spaces, a single word can be split across several groups. It looks like the only way to do this using RE is to treat the last word separately:
Code:
$ egrep '(\w+ +){2}\w' infile
now is the time for
  the quick   brown fox
I apologize for misleading posts. Shame on me
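
A minimal sketch for checking the final pattern against the sample data plus a line of exactly three words. grep -E is used as the current spelling of egrep; \w is a GNU extension, so the second function is a guess at a more portable equivalent that uses only POSIX classes and also accepts tabs between words. All function names are hypothetical.

```shell
# Sample data from the first post, plus one line with exactly three words.
sample_lines() {
    printf '%s\n' 'how now' 'now is the time for' 'now' ' ' \
        '  holy  cow  ' '  the quick   brown fox   ' ' jumped over' \
        'the candlestick' 'one two three'
}

# GNU grep assumed: two "word plus spaces" groups, then the start of a third word.
at_least_three_gnu() {
    grep -E '(\w+ +){2}\w'
}

# POSIX bracket expressions only; [[:space:]] also accepts tabs.
at_least_three_posix() {
    grep -E '([[:alnum:]_]+[[:space:]]+){2}[[:alnum:]_]'
}

# Each call prints the "now is the time for", "the quick brown fox"
# and "one two three" lines, with their original spacing.
sample_lines | at_least_three_gnu
sample_lines | at_least_three_posix
```

One difference from awk 'NF > 2': awk counts any run of non-blank characters as a field, so a line such as 'how -- now' passes the awk test but not these patterns, which only count runs of word characters.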
 
Old 08-20-2012, 01:14 PM   #12
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,136

Original Poster
Rep: Reputation: 299
Quote:
Originally Posted by firstfire
It looks like the only way to do this using RE is to treat last word separately:
Code:
$ egrep '(\w+ +){2}\w' infile
This one is good.

Quote:
I apologize for misleading posts. Shame on me
You are forgiven. It has been a learning experience for both of us.

I'd mark this thread as SOLVED but it already wears that badge of honor.

Daniel B. Martin
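
Coming back to the sed attempt in the first post: without -n, sed prints every line anyway and p prints matching lines a second time, and \w{3} asks for three consecutive word characters rather than three words. Below is a hedged sketch of a sed version using the pattern that was accepted above. GNU sed is assumed, and three_plus_sed is just an illustrative name.

```shell
# GNU sed assumed. -E is the same extended-regex switch as -r in the first post.
# -n turns off automatic printing, so only lines matched by /.../p appear.
three_plus_sed() {
    sed -En '/(\w+ +){2}\w/p' "$@"
}

# Prints "now is the time for" and "one two three".
printf '%s\n' 'how now' 'now is the time for' 'one two three' | three_plus_sed
```

Called with a file name, as in three_plus_sed "$InFile", it reads that file instead of standard input.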
 
  



Tags
awk, grep, sed



All times are GMT -5. The time now is 10:03 AM.
