LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 05-19-2010, 09:13 PM   #16
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191

Well, I think for the two-pattern scenario ntubski's is the winner, but I'm happy to throw my hat in for the "or more" part:
Code:
awk 'BEGIN{patterns="aunque tengo";split(patterns,array)}{for(x in array)if($0 ~ array[x])i++;if(i < 2)print;i=0}' input_file > output_file
 
Old 05-20-2010, 12:31 PM   #17
patolfo
Member
 
Registered: Jan 2006
Distribution: Debian-Sarge r2-k.2.6.8-2.386
Posts: 101

Original Poster
Blog Entries: 1

Rep: Reputation: 15
only works with two patterns

Quote:
Originally Posted by grail
Well, I think for the two-pattern scenario ntubski's is the winner, but I'm happy to throw my hat in for the "or more" part:
Code:
awk 'BEGIN{patterns="aunque tengo";split(patterns,array)}{for(x in array)if($0 ~ array[x])i++;if(i < 3)print;i=0}' input_file > output_file
I think the only problem with your code is that the number of regexes to look for is hard-coded into the "if(i < 3)" expression.

I am thinking of adding a variable that stores the array length and using it in the conditional.

Anyway, these are the commands that do the right thing:
Code:
#!/bin/bash
sed  '/aunque/{/me/{/daño/d}}' "$1" > output_file   # all three words, in any order
sed  '/aunque.*me.*daño/d' "$1" > output_file2      # all three words, in this order only
awk 'BEGIN{patterns="aunque me daño";split(patterns,array)}{for(x in array)if($0 ~ array[x])i++;if(i < 3)print;i=0}' "$1" > output_file3

max_matches=1               #max number of pattern matches allowed
patterns=('aunque' 'me' 'daño') #the patterns to match (you can use as many as you want)
file="$1"
counts="$( eval echo -n {1..$(($max_matches+1))} | tr ' ' '|' )"
{ for pattern in "${patterns[@]}"; do
  egrep -n "$pattern" "$file"
done; grep -n '' "$file"; } | sort -n | uniq -c | egrep "^ *($counts) " | sed -r 's/^[^:]+://'
exit
Can awk edit in place, like the -i option in sed?

P.S. I looked up the sed -r option on Google and got this:
-r, --regexp-extended
use extended regular expressions in the script.

And what the heck are those, expanded regexps?
Can somebody shed some light on which regexps are normal and which are advanced?
 
Old 05-20-2010, 12:35 PM   #18
patolfo
Member
 
Registered: Jan 2006
Distribution: Debian-Sarge r2-k.2.6.8-2.386
Posts: 101

Original Poster
Blog Entries: 1

Rep: Reputation: 15
ta0kira, I tried your code...

but I cannot get it to work. It runs all right, but nothing appears in the console or in the file...
Also, can you explain the regexp inside the last sed, "sed -r 's/^[^:]+://'"? I think that is where the problem is.
Code:
#!/bin/bash

max_matches=1               #max number of pattern matches allowed
patterns=('aunque' 'tengo') #the patterns to match (you can use as many as you want)

file="$1"

counts="$( eval echo -n {1..$(($max_matches+1))} | tr ' ' '|' )"

{ for pattern in "${patterns[@]}"; do
  egrep -n "$pattern" "$file"
done; grep -n '' "$file"; } | sort -n | uniq -c | egrep "^ *($counts) " | sed -r 's/^[^:]+://'
 
Old 05-20-2010, 04:04 PM   #19
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Quote:
And what the heck are those, expanded regexps?
Can somebody shed some light on which regexps are normal and which are advanced?
Extended-regexps.
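To make the difference concrete, here is a small sketch (the sample strings and function names are made up for illustration). In basic regexps, the default for grep and sed, characters such as + are ordinary literals; in extended regexps (grep -E or egrep, sed -r) they are operators:

```shell
# Sample input (made up for illustration): one string per line.
sample() { printf '%s\n' 'ab' 'aab' 'a+b'; }

# Basic regexp (the grep and sed default): "+" is a literal plus sign.
bre_matches() { sample | grep 'a+b'; }

# Extended regexp (grep -E, egrep, sed -r): "+" means "one or more".
ere_matches() { sample | grep -E 'a+b'; }

bre_matches    # prints: a+b
ere_matches    # prints: ab and aab
```

The same goes for ?, |, ( and ), which is why the scripts in this thread use egrep and sed -r for patterns like (1|2).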

The problem with ta0kira's script is the last character in the egrep pattern needs to be a tab:
Code:
| egrep "^ *($counts)<TAB>" |
The script can be simplified a bit more:
Code:
#!/bin/bash

max_matches=1               #max number of pattern matches allowed
patterns=('aunque' 'tengo') #the patterns to match (you can use as many as you want)

file="$1"

counts="$( seq --separator '|' $((max_matches+1)))"

{ for pattern in "${patterns[@]}"; do
  egrep -n "$pattern" "$file"
done; grep -n '' "$file"; } | sort -n | uniq -c | sed -nr "/^ *($counts)[\t]/{s/^[^:]+://;p}"
Quote:
ntubski (if it is a name, where does it come from?)
A combination of my first initial and a corruption of my last name.
 
Old 05-20-2010, 07:18 PM   #20
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
Quote:
I think the only problem with your code is that the number of regexes to look for is hard-coded into the "if(i < 3)" expression.
It is < 2 because a line that matches two or more of the required regexes (the name of the thread) should not be printed to the new file (hence deleted).
You can use length(array) in {g}awk instead.
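A minimal sketch of that idea which also works in awks without length(array): split() returns the number of elements, so the count can be taken from there. The function name drop_matching, its optional second argument and the sample lines are all made up for illustration:

```shell
# drop_matching PATTERNS [MIN]: print only the lines that match fewer than
# MIN of the space-separated regexps in PATTERNS. MIN defaults to the number
# of patterns, so by default only lines matching every pattern are dropped.
# (Function and variable names are hypothetical.)
drop_matching() {
    awk -v patterns="$1" -v min="${2:-0}" '
        BEGIN { n = split(patterns, array, " ")   # split() returns the element count
                if (min == 0) min = n }
        { hits = 0
          for (x = 1; x <= n; x++) if ($0 ~ array[x]) hits++
          if (hits < min) print }'
}

printf '%s\n' 'aunque tengo hambre' 'tengo sed' 'nada' |
    drop_matching 'aunque tengo'
```

Passing 2 as the second argument gives the "two or more" behaviour for any number of patterns, and because the function reads standard input it still works in a pipe.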

And as far as I know, awk has no -i option like sed's. Remember that awk doesn't necessarily have to update a file; it is more for using the file/input to generate information as required.
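The usual workaround is to write to a temporary file and then replace the original. A sketch, assuming mktemp is available; the helper name awk_inplace and the file name in the example are hypothetical. Newer gawk releases (4.1 and later) also ship an inplace extension, invoked as gawk -i inplace, but that is not portable to other awks:

```shell
# awk_inplace PROGRAM FILE: run an awk program over FILE and replace FILE's
# contents with the output, but only if awk succeeds.
# (Hypothetical helper name.)
awk_inplace() {
    tmp=$(mktemp) || return 1
    if awk "$1" "$2" > "$tmp"; then
        cat "$tmp" > "$2"    # overwrite the contents, keeping the file's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example: delete lines containing both "aunque" and "tengo" from notes.txt
# awk_inplace '!(/aunque/ && /tengo/)' notes.txt
```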
 
Old 05-20-2010, 07:55 PM   #21
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by ntubski
The problem with ta0kira's script is the last character in the egrep pattern needs to be a tab:
Code:
| egrep "^ *($counts)<TAB>" |
I guess it depends on the implementation of uniq. Mine uses a space (FreeBSD.) To be safe, maybe just use '^ *($counts)[ \t]+'.
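One way to sidestep the space-versus-tab question is the POSIX character class [[:space:]], which matches both. In a POSIX bracket expression a backslash is literal, so grep reads [ \t] as "space, backslash or t"; the class is the safer spelling there. A sketch, with a hypothetical function name and made-up sample lines:

```shell
# keep_counts COUNTS: keep "uniq -c" style lines whose count is one of COUNTS
# (an alternation such as "1|2"), whatever whitespace follows the count.
# (Hypothetical helper name.)
keep_counts() { grep -E "^[[:space:]]*($1)[[:space:]]+"; }

# Three made-up counted lines: space-separated, tab-separated, and a
# count (12) that must not be mistaken for 1 or 2.
printf '      2 7:uno\n   3\t8:dos\n     12 9:tres\n' | keep_counts '1|2|3'
```

Only the first two lines are printed; the required whitespace after the count is what stops 12 from matching.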

sed -r 's/^[^:]+://' matches from the beginning of the line up until the first ":", then deletes all of it. This gets rid of the duplication count given by uniq -c and the line numbering given by grep -n.
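A quick way to see that in action (the sample line is made up, and strip_prefix is a hypothetical name). It uses sed -E, the extended-regexp spelling accepted by both GNU and BSD sed; the -r used elsewhere in this thread is the GNU one:

```shell
# strip_prefix: remove everything up to and including the first ":" of each
# line. [^:]+ cannot cross a colon, so later colons in the text survive.
strip_prefix() { sed -E 's/^[^:]+://'; }

# Two identical "grep -n" style lines for line 12, counted by uniq -c.
printf '%s\n' '12:hora: 10:30' '12:hora: 10:30' | uniq -c | strip_prefix
# prints: hora: 10:30
```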

grail's solution has at least two advantages over mine:
  1. It can be used with piped input; it only reads the data once.
  2. It's only one line, although the comment regarding the hard-coding of the patterns can be solved any number of very simple ways using a script.
Kevin Barry
 
Old 05-21-2010, 12:30 PM   #22
patolfo
Member
 
Registered: Jan 2006
Distribution: Debian-Sarge r2-k.2.6.8-2.386
Posts: 101

Original Poster
Blog Entries: 1

Rep: Reputation: 15
Well, I am using SUSE.
 
  


All times are GMT -5.
