Old 03-03-2006, 09:06 PM   #1
realized
LQ Newbie
 
Registered: Oct 2004
Posts: 23

Rep: Reputation: 15
script help...


I have a script that compares fields in a CSV file and spams me the matches:

#!/bin/bash

# bail out if the input file was not given or is not readable
if [ ! -r "$1" ]; then
    echo "$1 does not exist or is not readable."
    exit 1
fi

# compare the first three comma-separated fields: strip spaces,
# lower-case everything, then report only the duplicated entries
gawk -F ',' '{ print $1 FS $2 FS $3 }' "$1" | sed 's/ //g' | tr A-Z a-z | sort | uniq -d




My goal is for it to spam me the ENTIRE LINE of each MATCH, i.e. I will get at least 2 lines for every 1 match.


Any ideas?
 
Old 03-04-2006, 10:49 AM   #2
muha
Member
 
Registered: Nov 2005
Distribution: xubuntu, grml
Posts: 451

Rep: Reputation: 37
I'm not entirely sure what you want to do. Maybe post some of the input and the intended output?
I think you want egrep PATTERN, so something like:
gawk -F ',' '{ print $1 FS $2 FS $3 }' $1 | egrep match
to get the lines which contain 'match'.
 
Old 03-06-2006, 11:13 PM   #3
realized
LQ Newbie
 
Registered: Oct 2004
Posts: 23

Original Poster
Rep: Reputation: 15
Change:

The goal is now to take a CSV FILE and look at the FIRST FIELD (before the first comma).

If the first field of a line matches the first field of any other line, I want the ENTIRE LINE / ALL FIELDS of that line spammed to the screen.

So the first part:

gawk -F ',' '{ print $1 }' $1 | uniq -d

works fine.

So now I have the output of JUST the matches.. how do I "grep" each line of that, i.e.

grep 'match' $1 ??

How do I define the "match"?
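(One way to wire those two pieces together, sketched here for reference; the replies below give cleaner alternatives. It feeds each duplicated first field back into grep, anchored at the start of the line. The sort before uniq -d matters because uniq only spots adjacent duplicates.)
Code:
#!/bin/bash
# $1 is the CSV file passed to the script
for addr in $(gawk -F ',' '{ print $1 }' "$1" | sort | uniq -d); do
    # print every full line whose first field is the duplicated address
    grep "^${addr}," "$1"
done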
 
Old 03-07-2006, 03:44 AM   #4
muha
Member
 
Registered: Nov 2005
Distribution: xubuntu, grml
Posts: 451

Rep: Reputation: 37
Again, I'm not really sure what you want, since you did not give an example.
Anyway, it sounds like you want to match a PATTERN (let's say foo) before the first comma.
If it is matched, the output is the whole line. Rather than using cat, gawk, grep, I'd use sed.
Inputfile:
Code:
$ : cat aa.txt
output,test,test,
foo,test,test,
bar2,test,test,
foo,test,test
footest,test,
testfoo,test,test,
The interesting part:
Code:
$ : sed -n '/^foo,/p' aa.txt
foo,test,test,
foo,test,test
Or if something else is allowed after the pattern foo:
Code:
$ : sed -n '/^foo.*,/p' aa.txt
foo,test,test,
foo,test,test
footest,test,
If something is allowed both before and after the pattern:
Code:
$ : sed -n '/^[^,]*foo.*,/p' aa.txt
foo,test,test,
foo,test,test
footest,test,
testfoo,test,test,
What I'm trying to do:
-n : suppress automatic printing. Usually used together with the p command.
p : print the matching line.
^foo : matches foo at the beginning of the line.
.* : matches any character (.) zero or more times (*).
[^,]* : ^ inside brackets means negation, so this matches any non-comma character, zero or more times (*).

If you really need the gawk/grep approach, you might be able to use line numbers: have gawk print the line number of each match and then print those lines from the file, something like the sketch below.
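(A rough, untested sketch of that line-number idea, using the same aa.txt sample file: gawk emits the numbers of the lines whose first field is foo, and sed prints those lines back out of the file.)
Code:
# collect the line numbers of lines whose first field is exactly "foo",
# then have sed print each of those lines from the original file
for n in $(gawk -F ',' '$1 == "foo" {print NR}' aa.txt); do
    sed -n "${n}p" aa.txt
done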

Last edited by muha; 03-07-2006 at 03:50 AM.
 
Old 03-07-2006, 04:25 AM   #5
timmeke
Senior Member
 
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515

Rep: Reputation: 61
Or simply use:
Code:
grep -e '^foo' $1
That greps all lines starting with "foo".
Regular expressions similar to those in muha's sed examples are possible too.
awk is more powerful than sed and grep, but it can be trickier too.

If you insist on using awk, try something like this:
Code:
awk -F',' '/^foo/ {print;}' $1
Or to match certain columns:
Code:
awk -F',' '{if ($1=='foo') print;}' $1
In awk, the command "print;" prints the current line.
There are other possibilities too, like using the ~ or !~ operators in the if-test for matching regular expressions; for example:
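(A small sketch of the ~ operator, matching a regular expression against the first column only; aa.txt is muha's sample file from above.)
Code:
# print the whole line when the first comma-separated field matches /foo/
awk -F',' '{if ($1 ~ /foo/) print;}' aa.txt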

Edit: corrected small typo in awk commands.

Last edited by timmeke; 03-07-2006 at 06:17 AM.
 
Old 03-07-2006, 05:56 AM   #6
muha
Member
 
Registered: Nov 2005
Distribution: xubuntu, grml
Posts: 451

Rep: Reputation: 37
@timmeke, these two awks don't work for me ..
This one does work for me:
Code:
awk -F ',' '/^foo/ {print}' aa.txt
It also prints the line when there is something between the pattern foo and the comma.
 
Old 03-07-2006, 06:19 AM   #7
timmeke
Senior Member
 
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515

Rep: Reputation: 61
You're right. The $1 at the end should of course be OUTSIDE of the single quotes, otherwise you get a
"file $1 not found" error or something like that (the shell needs to interpret $1 as a variable,
not pass it literally).
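(To make the contrast concrete — a sketch, not from the original post:)
Code:
# wrong: the shell never expands $1 inside single quotes, so awk tries
# to open a file literally named $1 and fails
awk -F',' '/^foo/ {print;}' '$1'

# right: $1 outside the quotes is expanded to the script's first
# argument (the CSV file); writing it as "$1" also survives spaces
awk -F',' '/^foo/ {print;}' "$1"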

I've edited my previous post accordingly.

Quote:
It also prints the line when there is something between the pattern foo and the comma.
Indeed. The regular expression /^foo/ matches any line that starts with "foo", regardless of what follows.
But you can use any regular expression you like...
 
Old 03-07-2006, 06:57 AM   #8
muha
Member
 
Registered: Nov 2005
Distribution: xubuntu, grml
Posts: 451

Rep: Reputation: 37
Not trying to bitch here, just trying to learn.
@timmeke's first awk: since we are not separating the columns, we can ditch the -F option.
We only check whether a line starts with foo, so:
Code:
awk '/^foo/ {print;}' aa.txt
The second awk only works for me when I use double quotes around "foo" instead of single quotes:
Code:
awk -F',' '{if ($1=="foo") print;}' aa.txt
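(Why the single quotes fail — a sketch to illustrate, not from the original post: the inner quotes end the shell's quoting of the awk program, so awk never sees a string at all.)
Code:
# with '{if ($1=='foo') print;}' the shell glues the pieces together and
# awk receives:  {if ($1==foo) print;}
# where foo is an uninitialized (empty) awk variable, so no line with a
# non-empty first field ever matches
awk -F',' '{if ($1=='foo') print;}' aa.txt

# with "foo" inside the single-quoted program, awk receives a real
# string literal and the comparison works as intended
awk -F',' '{if ($1=="foo") print;}' aa.txt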
 
Old 03-07-2006, 09:32 AM   #9
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 233
OP: Please, please, give us a (short) sample input file, the current output, & the desired output.

At this point I don't know what more you are trying to accomplish beyond what you already have.

Perhaps this is what you want:
Code:
F=<name_of_target_file> 
S=","   # the separator, for flexibility
L=1     # this is a variable to aid debugging
for X in `cut -d"$S" -f 1 $F | sort -u` 
do 
   if [ `grep "^$X$S" $F | wc -l` -ge $L ] 
     then grep "^$X$S" $F 
   fi 
done
Use at your own risk & no fair flaming any dumb coding errors -- I had no test target to run it on & I ain't writing one for you.

RTFM list:
  • uniq
  • sort
  • cut
  • wc
  • test
  • uniq
  • (bash)

This should also work:
Code:
F=<name_of_target_file> 
S="," 
uniq -t"$S" -D -W1 $F
Technically, this could be a one-liner, but using the variables makes it: a) easier to read, b) more flexible. (I thought my 1st try was too good to omit, & it gives a good demo of finding a better way.)

Last edited by archtoad6; 03-20-2006 at 07:11 AM. Reason: put "F=" lines @ top of code blocks
 
Old 03-07-2006, 01:11 PM   #10
realized
LQ Newbie
 
Registered: Oct 2004
Posts: 23

Original Poster
Rep: Reputation: 15
Example of file..


user@isp.com,500,200,100
test@aol.com,5431,3015,3561
casper@earthlink.net,4301,hah,mofo
user@isp.com,3051,01001,ajksdf,dadsf
homo@mofoisp.com,3035,1950,00dc,fmo
psst@hushmail.com,9315,d00,0llld,f
test@aol.com,3013,34,6,61


So that file has 2 matches... we are just comparing the first field, the email addresses.

matches are:

test@aol.com
user@isp.com

So I want a script to parse a file like that: for any LINE whose FIRST FIELD matches the first field of ANY OTHER LINE OF THE FILE, I want it to show the entire line...

so the output would be:

user@isp.com,500,200,100
user@isp.com,3051,01001,ajksdf,dadsf

test@aol.com,5431,3015,3561
test@aol.com,3013,34,6,61
 
Old 03-08-2006, 01:47 AM   #11
timmeke
Senior Member
 
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515

Rep: Reputation: 61
@muha. Thanks for the corrections.
You're right for both awks.

@realized. To find "double" entries in the first column, I often use a little shell or Perl scripting.
awk can probably do it too.

A not-so-performant Bash example, using grep:
Code:
adr=`cut -d',' -f1 your_file | sort -u`; # this retrieves all (unique) e-mail addresses from your file
for i in $adr; do
   count=`grep -e "^${i}" your_file|wc -l`; #counts the number of occurrences for each of the addresses
   if (( $count > 1 )); then
      grep -e "^${i}" your_file; #prints all the matches found 
   fi;
done
A Perl script would be more efficient, but it requires that you sort your file on the e-mail addresses first. In your case, this sorting is easy, since the addresses are in the first column (start of the lines).
So do
Code:
sort your_file > sorted_file;
first.
Then use a script like this.
You may need to change the path to the Perl interpreter and possibly add some error/input checks.
Code:
#!/usr/bin/perl -w
$prevAdr=""; #we'll store the previous entry in this var.
$prevLine="";
open(FILE, $ARGV[0])||die "Cannot open file $ARGV[0]";
while(<FILE>)
{
   chomp();              # strip the trailing newline
   $line=$_;
   @elem=split(",", $line); # split the line on the "," field separator
   if ($elem[0] eq $prevAdr)
   {
      print "$prevLine\n";
      print "$line\n";
   }
   $prevLine=$line;
   $prevAdr=$elem[0];
}
close(FILE);
Please note that:
1. I haven't tested this script, so it can be buggy.
2. For addresses that occur more than 2 times (which isn't checked), the script will print some entries twice. So you may want to pipe the output of this script into "sort -u" or "uniq", for example:
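(A usage sketch — the script name find_dups.pl is just a placeholder for wherever you saved the Perl script above:)
Code:
sort your_file > sorted_file
./find_dups.pl sorted_file | sort -u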


Last edited by timmeke; 03-08-2006 at 01:59 AM.
 
Old 03-08-2006, 02:05 AM   #12
timmeke
Senior Member
 
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515

Rep: Reputation: 61
After reading the man page of the "uniq" command (I had never used it before), it seems that that command can do the same trick I just described.
Code:
uniq -d
or
Code:
uniq -D
seem to be your friends... For example:
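(A quick sketch with -d, just to list the addresses that occur on more than one line — assuming the CSV file is called your_file, as in the earlier post:)
Code:
# uniq needs sorted input; -d prints each duplicated address once
cut -d',' -f1 your_file | sort | uniq -d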
 
Old 03-20-2006, 06:57 AM   #13
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 233
realized: Thanks for the sample.

timmeke: As I said, RT(F)M uniq

Indeed, uniq -D will do what you want, provided you sort the input 1st:
Code:
F=<name_of_target_file>
# S is a var. for flexibility -- -t, would also work
S=","
sort $F  | uniq -t"$S" -D -W1
It worked here on your sample.
 
Old 03-20-2006, 07:18 AM   #14
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 233
BTW, my 1st piece of code also works, provided you set L=2; L=1 shows all lines.
 
  

