LinuxQuestions.org
Old 11-22-2012, 12:49 PM   #1
vitki
LQ Newbie
 
Registered: Nov 2012
Location: South-Africa
Distribution: Fedora , Centos , Backtrack , Suse
Posts: 12

Rep: Reputation: Disabled
Bash scripting (matching $var)


Hi

I'm new to this forum, so please excuse me.

I need help with a bash script. I need to match the entries of two files and write the matches to a third file.

File 1 looks like this:

432345,1-Jun-12,1-Jun-12, 552343234,"1,000.00", 552343234,;
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;

File 2 looks like this:
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,;
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 234545345666,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;

Now I need to match entries from file 1 to entries in file 2, then write them to file 3 as output so that it looks like this:

654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,; match
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,; match2
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;match
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;match2
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;match
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 2345453456,;match2
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;match
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;match2
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,; Double

The entries are not all in order; this is only an example.

Any help would be appreciated. I have been struggling with this for 2 days now. I can match some of the entries, but I have a problem with the doubles and my file 3 gets huge. All the files are in CSV format.

I was thinking of reading file 1 line by line, matching each line against file 2 line by line and writing the output to file 3, then removing the matched lines from files 1 and 2. If more than one entry exists, it might match later on with another entry.

The second problem is that in some entries the numbers do not match fully, but if I can match those that do for now, I can work on the non-matching ones later.

Vitki

The code I have so far (it's a mess, I know):
#!/bin/sh

#Clean files
rm -f output.csv
rm -f nomatch.csv

#Edit files
#sed -i 's/$/;/' cre.csv
#sed -i 's/$/;/' deb.csv

DELIMITER=","
COUNT=1

function log_out() {

echo $1 >> output.csv
}

cre=`wc -l cre.csv | grep -o '[0-9]*'`
deb=`wc -l deb.csv | grep -o '[0-9]*'`

echo "total cre records are $cre"
echo "total deb records are $deb"


for i in $(cut -f 7 -d "${DELIMITER}" cre.csv | grep -ow '[0-9][0-9][0-9][0-9][0-9][0-9]*');
do

mat=`grep "$i" deb.csv`
code=$?
if [ $code -ne 0 ] ; then echo $mat >> deb.done.csv
sed -i "/$mat/d" deb.csv ; else
mat2=`grep "$i" cre.csv`
log_out "$mat,match"
log_out "$mat2,match2"
echo -n "$COUNT,"
((COUNT++))
fi

done

echo ""
echo "Data Match"
echo "done"

COUNT=1

for i in $(cut -f 1 -d "${DELIMITER}" cre.csv);
do

mat=`grep "$i" output.csv`
code=$?
if [ $code -ne 0 ] ; then echo "$i,cre no match deb file" >> nomatch.csv; else
sed -i '/'"$i"'/s/^/#/' cre.csv
echo -n "$COUNT,"
((COUNT++))
fi

done

echo ""
echo "NO Data Match"
echo "done"

nomatch=`wc -l nomatch.csv | grep -o '[0-9]*'`
echo "No match found for $nomatch records"
sed -i "s/; /\\`echo -e '\n\r'`/g" output.csv

Last edited by vitki; 11-22-2012 at 01:22 PM. Reason: add code
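A minimal sketch of the approach described in the post above (pair each file 1 line with at most one file 2 line, then flag the leftovers). It is not the poster's script: it uses a single awk pass in place of nested shell loops, and it assumes the key is the 7th comma-separated field, as in the `cut -f 7` call. The names `output.csv` and `nomatch.csv` and the sample rows come from the post; the `key` helper, and the rule that a leftover file 2 line is a `Double` only when its key was already paired, are assumptions.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Sample rows copied from the first post (file 1 = cre.csv, file 2 = deb.csv).
cat > cre.csv <<'EOF'
432345,1-Jun-12,1-Jun-12, 552343234,"1,000.00", 552343234,;
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;
EOF
cat > deb.csv <<'EOF'
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,;
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 234545345666,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
EOF

awk -F',' '
    # key(): field 7 of the current line with blanks removed (hypothetical helper)
    function key(   k) { k = $7; gsub(/[ \t]/, "", k); return k }

    # First input (deb.csv): queue every line under its key.
    NR == FNR { k = key(); n[k]++; deb[k, n[k]] = $0; next }

    # Second input (cre.csv): consume one queued deb line per cre line.
    {
        k = key()
        if (used[k] < n[k]) {
            used[k]++
            print $0 "match"
            print deb[k, used[k]] "match2"
        } else {
            print > "nomatch.csv"
        }
    }

    # Deb lines never consumed: Double if the key was paired, else unmatched.
    END {
        for (k in n)
            for (i = used[k] + 1; i <= n[k]; i++) {
                if (used[k] > 0)
                    print deb[k, i] "Double"
                else
                    print deb[k, i] > "nomatch.csv"
            }
    }
' deb.csv cre.csv > output.csv
```

Because each queued line is handed out only once, no line is reused and the input files never have to be edited in place. With the sample rows only key 2344564345 agrees in field 7, so `output.csv` holds one match/match2 pair plus the second 2344564345 line marked `Double`, and everything else lands in `nomatch.csv`.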
 
Old 11-22-2012, 03:28 PM   #2
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,974

Rep: Reputation: 849
Hello vitki, welcome to LQ.

As far as I understand your code, you want to find out which numbers in the last column of the first file match the numbers in the last column of the second file. Is that right?

Markus
 
Old 11-22-2012, 03:33 PM   #3
vitki
LQ Newbie
 
Registered: Nov 2012
Location: South-Africa
Distribution: Fedora , Centos , Backtrack , Suse
Posts: 12

Original Poster
Rep: Reputation: Disabled
Yep, that is 100% right.
 
Old 11-22-2012, 03:40 PM   #4
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,974

Rep: Reputation: 849
Then this one-liner
Code:
for i in $(cut -f 7 -d, cre.csv | grep -ow '[0-9]*'); do grep -- "$i" deb.csv; done
should already do most of the work. You can run it on the command line.

I have not yet understood exactly what the output should be. The one-liner prints every line of deb.csv that contains one of the numbers from the last column of cre.csv.

Markus
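One caveat with the loop above: `grep` looks for the number anywhere in the line, so a line of deb.csv is also printed when the number shows up in a different column. The sketch below contrasts a whole-line search with a comparison restricted to field 7. It is only an illustration: the rows are the ones from the first post, and `keys.txt`, `anywhere.csv` and `lastcol.csv` are hypothetical file names.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Sample rows copied from the first post.
cat > cre.csv <<'EOF'
432345,1-Jun-12,1-Jun-12, 552343234,"1,000.00", 552343234,;
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;
EOF
cat > deb.csv <<'EOF'
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,;
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 234545345666,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
EOF

# Collect the keys once instead of starting one grep per key.
cut -d, -f7 cre.csv | tr -d ' ' | grep -E '^[0-9]+$' | sort -u > keys.txt

# Whole-word, fixed-string search: still inspects every column of deb.csv.
grep -F -w -f keys.txt deb.csv > anywhere.csv

# Column-restricted search: compare only field 7 of deb.csv with the keys.
awk -F',' '
    NR == FNR { keys[$0]; next }          # first input: remember each key
    { k = $7; gsub(/ /, "", k) }          # second input: strip blanks from field 7
    (k in keys)                           # print the line if field 7 is a key
' keys.txt deb.csv > lastcol.csv
```

With these rows the whole-line search reports all five deb.csv lines, because each key also occurs in column 4, while the column-restricted version reports only the two lines whose last number is 2344564345.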
 
Old 11-22-2012, 03:48 PM   #5
vitki
LQ Newbie
 
Registered: Nov 2012
Location: South-Africa
Distribution: Fedora , Centos , Backtrack , Suse
Posts: 12

Original Poster
Rep: Reputation: Disabled
The output should contain lines from both file 1 and file 2. An entry in file 1 (test123) must match an entry in file 2 (test123), and both must then be stored in the output as:

test123 file1
test123 file2

In other words I want both the query and the match in the output.
 
Old 11-22-2012, 04:04 PM   #6
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,974

Rep: Reputation: 849
Well, I'll try to explain how I would do this; I'm not that fast at coding.

Loop over all lines of cre.csv, cut the number from each line and put it in a variable.
Then search for that number in deb.csv, and if a line matches, print both lines.

Markus
 
Old 11-22-2012, 04:28 PM   #7
vitki
LQ Newbie
 
Registered: Nov 2012
Location: South-Africa
Distribution: Fedora , Centos , Backtrack , Suse
Posts: 12

Original Poster
Rep: Reputation: Disabled
No problem, me too. Yes, that is what the current code does, in a lame way. I have been experimenting with a while loop (while read line cre.csv do …), but I get stuck on the second while loop for deb.csv. That means I have a while loop within a while loop (not the best way, I think), and when I run it the output file grows at an enormous rate, so something is definitely wrong.

What I have done is this: the first while loop reads a line from cre.csv, then the second while loop tries to match it with a line in deb.csv, prints both the query and the match to the output file, and removes them from cre.csv and deb.csv so that they can't be matched again. The other problem is that if more than one match exists in deb.csv, those should be output as well, and the same applies if cre.csv has a double. How to handle the doubles is a big problem.
 
Old 11-22-2012, 04:30 PM   #8
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,974

Rep: Reputation: 849
Now I have this
Code:
#!/bin/sh

IFS=$'\012'
for line in `cat cre.csv`
do
        number=$(echo $line | cut -d, -f 7)
        match=`cat deb.csv | grep $number`
        if [ -n $match ]; then
                echo $line "from cre.csv"
                echo $match "from deb.csv"
        fi
done
Maybe you can compare it with your code and get some ideas.

Markus
 
Old 11-22-2012, 04:53 PM   #9
vitki
LQ Newbie
 
Registered: Nov 2012
Location: South-Africa
Distribution: Fedora , Centos , Backtrack , Suse
Posts: 12

Original Poster
Rep: Reputation: Disabled
Very nice. I ran it just now and for the most part it works. Now and again it comes up with “line 8: [: too many arguments”, but I think that is because the deb.csv file has some text in the last field. I will play around with your code a bit. Thanks.
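A likely cause of the `[: too many arguments` message (an assumption, since the failing input is not shown): `$match` is unquoted in the test, so when `grep` returns more than one line, `[` receives several words instead of one. An empty unquoted `$match` is a problem too, because `[ -n ]` is true and unmatched lines get printed anyway. Below is a minimal variant of the script from post #8 with the expansions quoted, run on the sample rows from the first post; `paired.txt` is a hypothetical output name.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Sample rows copied from the first post.
cat > cre.csv <<'EOF'
432345,1-Jun-12,1-Jun-12, 552343234,"1,000.00", 552343234,;
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;
EOF
cat > deb.csv <<'EOF'
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,;
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 234545345666,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
EOF

# Read cre.csv line by line; IFS= and -r keep each line exactly as written.
while IFS= read -r line; do
    number=$(printf '%s\n' "$line" | cut -d, -f7 | tr -d ' ')
    # Skip lines without a key: an empty pattern would match every line.
    [ -n "$number" ] || continue
    # -F treats the key as a fixed string, -- guards against a leading dash.
    match=$(grep -F -- "$number" deb.csv)
    # Quoted, "$match" stays one argument however many lines it holds.
    if [ -n "$match" ]; then
        printf '%s from cre.csv\n' "$line"
        printf '%s\n' "$match" | sed 's/$/ from deb.csv/'
    fi
done < cre.csv > paired.txt
```

Redirecting the whole loop once (`done < cre.csv > paired.txt`) also means the output file is rewritten on every run instead of being appended to.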
 
Old 11-22-2012, 08:53 PM   #10
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,245
Blog Entries: 16

Rep: Reputation: 233
I'm a little confused. Could you give us an example for every kind of match? What type of match gets "match", and what type gets "match2", "Double" and so on? Would there be a "match3" as well? Please elaborate.
 
Old 11-23-2012, 12:40 AM   #11
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,974

Rep: Reputation: 849
In short, the OP means: match the numbers in the last column of the first file against the numbers in the last column of the second file; if they match, print both lines and append a string that marks which file each line came from.

Markus
 
Old 11-23-2012, 12:59 AM   #12
vitki
LQ Newbie
 
Registered: Nov 2012
Location: South-Africa
Distribution: Fedora , Centos , Backtrack , Suse
Posts: 12

Original Poster
Rep: Reputation: Disabled
That is correct. If no match occurs, leave the entry in the file and carry on to the next one. This means that at the end of the filtering / matching process I will have one file with the queries and their matches, and the cre and deb files with the entries that do not match. We could even move the entries that do not match out to another file; it does not matter. The appended string is also not of great importance; it is a nice-to-have.

Here's some data:
cre.csv
636973;5-Jun-12;5-Jun-12;FKEYUEWOD GFD4111800221;1000;FGHJ4111800221
645966;4-Jun-12;4-Jun-12;SDFDF RER41111800329;1000;4111800329
612343;1-Jun-12;1-Jun-12;SDSWRJDFA RER41118005043;1000;41118005043
629334;4-Jun-12;4-Jun-12;DAS RER800504;1000;ERTR4111800504
659633;4-Jun-12;4-Jun-12;DAS RER800504;1000;UYTY4111800504
645343;14-Jun-12;16-Jun-12;HJGFDAFA RER41118005;1000;41118005
deb.csv
670644;8-Jun-12;9-Jun-12;FKEYUEWOD GFD4111800221;-1000;QWE800221
629236;5-Jun-12;10-Jun-12;DFGGDF RER4111700329;-1000;EWE4111800329
622323;2-Jun-12;3-Jun-12;TDFHGGH RER41118005043;-1000;FDE41118005043
624426;2-Jun-12;3-Jun-12;TDFHGGH RER41118005043;-1000;8005043
613459;5-Jun-12;15-Jun-12;FRE RER800504;-1000;FDSA4111800504
643758;5-Jun-12;25-Jun-12;FRE RER800504;-1000;FDSA4111800504

Last edited by vitki; 11-23-2012 at 01:16 AM. Reason: add data
 
Old 11-23-2012, 05:29 AM   #13
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 13.1
Posts: 1,320

Rep: Reputation: 252
In case you want just the output, you can check the options to the join command:
Code:
$ join -j 6 -t ";" <(sort -t ";" -k 6 cre.csv) <(sort -t ";" -k 6 deb.csv)
You can limit the output to certain fields and matching or non-matching lines.

Last edited by Reuti; 11-23-2012 at 05:32 AM. Reason: Typo
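To illustrate the last sentence of the post above, here is a sketch of `join` with `-o` (choose the output fields) and `-v` (print only the lines that could not be paired). It uses the comma-separated sample from the first post, so the key is field 7 instead of field 6, and it writes sorted copies to files instead of using process substitution. All file names other than cre.csv and deb.csv are hypothetical.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Sample rows copied from the first post.
cat > cre.csv <<'EOF'
432345,1-Jun-12,1-Jun-12, 552343234,"1,000.00", 552343234,;
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;
EOF
cat > deb.csv <<'EOF'
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,;
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 234545345666,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
EOF

# sort and join must agree on the collation order, so pin the locale.
export LC_ALL=C
sort -t, -k7,7 cre.csv > cre.sorted
sort -t, -k7,7 deb.csv > deb.sorted

# Paired lines, reduced to: cre id, deb id, shared key (0 = the join field).
join -t, -1 7 -2 7 -o 1.1,2.1,0 cre.sorted deb.sorted > pairs.csv

# Lines without a partner in the other file.
join -t, -1 7 -2 7 -v 1 cre.sorted deb.sorted > cre.unmatched
join -t, -1 7 -2 7 -v 2 cre.sorted deb.sorted > deb.unmatched
```

Note that `pairs.csv` lists cre line 765756 twice, once for each deb line with key 2344564345, since `join` emits every combination of lines that share a key.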
 
Old 11-23-2012, 06:22 AM   #14
vitki
LQ Newbie
 
Registered: Nov 2012
Location: South-Africa
Distribution: Fedora , Centos , Backtrack , Suse
Posts: 12

Original Poster
Rep: Reputation: Disabled
This does seem like a nice quick way to match, although it does not know what to do with the double entries and it reuses entries that have already been used. I think I need to read the man page a bit more. Maybe this can help more with the non-matching entries. Awesome, thanks.
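On the doubles: `join` pairs every cre line with every deb line that has the same key, which is why an entry can appear to be reused. One possible workaround, sketched here under assumptions and not taken from the thread, is to number the occurrences of each key in both files and join on key plus occurrence number, so each line pairs at most once. The `tag` helper, the `#` separator and the output file names are hypothetical; the rows come from the first post.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# A few rows copied from the first post, including the doubled deb entry.
cat > cre.csv <<'EOF'
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;
EOF
cat > deb.csv <<'EOF'
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 234545345666,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
EOF

# sort and join must agree on the collation order, so pin the locale.
export LC_ALL=C

# tag FILE: prefix each line with "<key>#<n>,", where <key> is field 7
# without blanks and <n> is the occurrence number of that key in FILE.
tag() {
    awk -F',' '{ k = $7; gsub(/ /, "", k); print k "#" (++seen[k]) "," $0 }' "$1" |
        sort -t, -k1,1
}
tag cre.csv > cre.tagged
tag deb.csv > deb.tagged

# The tagged key is unique within each file, so lines pair one-to-one.
# Output fields: cre id, deb id.
join -t, -o 1.2,2.2 cre.tagged deb.tagged > pairs.csv

# Deb lines that found no partner (doubles included), with the tag removed.
join -t, -v 2 cre.tagged deb.tagged | cut -d, -f2- > deb.leftover
```

Here 765756 pairs only with 654564; the second 2344564345 line (654565) ends up in `deb.leftover` together with the line whose last number does not match.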
 
  

