Bash scripting (matching $var)
Hi
New to this forum, so please excuse me. I need help with a bash script. I need to match entries in two files and write them to a third file.

File1 looks like this:
432345,1-Jun-12,1-Jun-12, 552343234,"1,000.00", 552343234,;
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;

File2 looks like this:
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,;
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 234545345666,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;

Now I need to match entries from file 1 to entries in file 2, then write them to file 3 as output so that it looks like this:
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,; match
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,; match2
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;match
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;match2
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;match
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 2345453456,;match2
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;match
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;match2
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,; Double

The entries are not all in order; this is only an example. Any help would be appreciated. I have been struggling with this for 2 days now. I can match some of the entries, but I have a problem with the doubles, and my file 3 gets huge. The files are all in CSV format. I was thinking of reading file 1 line by line, matching each line against file 2 line by line, writing the output to file 3, and then removing the matched lines from files 1 and 2. If more than one entry exists, it might match later on with another entry.

The second problem is that the numbers do not match fully in some entries, but if I can match those that do for now, I can work on the non-matching ones later.
Vitki

The code I have.... (it's a mess, I know)
Code:
#!/bin/sh
#Clean files
rm -f output.csv
rm -f nomatch.csv
#Edit files
#sed -i 's/$/;/' cre.csv
#sed -i 's/$/;/' deb.csv
DELIMITER=","
COUNT=1
function log_out() {
    echo $1 >> output.csv
}
cre=`wc -l cre.csv | grep -o '[0-9]*'`
deb=`wc -l deb.csv | grep -o '[0-9]*'`
echo "total cre records are $cre"
echo "total deb records are $deb"
for i in $(cut -f 7 -d "${DELIMITER}" cre.csv | grep -ow '[0-9][0-9][0-9][0-9][0-9][0-9]*'); do
    mat=`grep "$i" deb.csv`
    code=$?
    if [ $code -ne 0 ] ; then
        echo $mat >> deb.done.csv
        sed -i "/$mat/d" deb.csv ;
    else
        mat2=`grep "$i" cre.csv`
        log_out "$mat,match"
        log_out "$mat2,match2"
        echo -n "$COUNT,"
        ((COUNT++))
    fi
done
echo ""
echo "Data Match"
echo "done"
COUNT=1
for i in $(cut -f 1 -d "${DELIMITER}" cre.csv); do
    mat=`grep "$i" output.csv`
    code=$?
    if [ $code -ne 0 ] ; then
        echo "$i,cre no match deb file" >> nomatch.csv;
    else
        sed -i '/'"$i"'/s/^/#/' cre.csv
        echo -n "$COUNT,"
        ((COUNT++))
    fi
done
echo ""
echo "NO Data Match"
echo "done"
nomatch=`wc -l nomatch.csv | grep -o '[0-9]*'`
echo "No match found for $nomatch records"
sed -i "s/; /\\`echo -e '\n\r'`/g" output.csv
|
Hello vitki, welcome to LQ,
As far as I understand your code, you want to find out which of the numbers in the last column of the first file match the numbers in the last column of the second file. Is that right?
Markus |
Yip that is 100% right
|
Then this one-liner
Code:
for i in `cut -f 7 -d, cre.csv | grep -ow '[0-9]*'`; do grep $i deb.csv ; done
I have not yet understood what exactly the output should be. The one-liner prints every line of deb.csv where the number in the last column matches.
Markus |
The output should be from file 1 and file 2. So an entry in file 1 (test123) must match an entry in file 2 (test123). This must then be stored in the output as
test123 file1
test123 file2
In other words, I want both the query and the match in the output. |
Well, I'll try to explain how I would do this, I'm not that fast with coding ;)
Make a loop over all lines of cre.csv. Cut the number from the line and put it in a variable. Then search for that number in the deb.csv file, and if a line matches, print both lines.
Markus |
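A minimal sketch of the loop described above, not anyone's actual script from this thread. It assumes the comma-separated layout from the first post, where the quoted amount contains a comma, so the reference number lands in field 7 for cut (the same field the one-liner uses). The function name match_lines and the labels are only illustrative.

```shell
#!/bin/sh
# Sketch only. Assumption: comma-separated input where the number to match
# is field 7 (the quoted amount "1,000.00" adds one extra comma).
# match_lines is a hypothetical helper name.
match_lines() {
    cre_file=$1
    deb_file=$2
    while IFS= read -r cre_line; do
        # Cut field 7 from the cre line and keep only its digits
        num=$(printf '%s\n' "$cre_line" | cut -d, -f7 | tr -cd '0-9')
        [ -n "$num" ] || continue
        # Collect deb lines whose own field 7 holds exactly the same digits
        # (compared as strings, so long numbers are not rounded)
        deb_matches=$(awk -F, -v n="$num" \
            '{ f = $7; gsub(/[^0-9]/, "", f) } (f "") == (n "")' "$deb_file")
        if [ -n "$deb_matches" ]; then
            # Print the query line first, then every matching deb line
            printf '%s match\n' "$cre_line"
            printf '%s\n' "$deb_matches" | sed 's/$/ match2/'
        fi
    done < "$cre_file"
}
```

It could be called as `match_lines cre.csv deb.csv > output.csv`. Comparing whole fields instead of grepping the number anywhere in the line avoids hits where one number is only a substring of another, but it does not stop a deb line from being reported more than once.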
No problem, me too. Yes, that is what the current code is doing in a lame way. I have been experimenting with a while loop (while read line cre.csv do …) but I get stuck with the second while loop for deb.csv. That means I have a while loop within a while loop (not the best way, I think), plus when I run the loop the output file grows at an enormous rate, so something is definitely wrong.
What I have done with the first while loop is read the first line of the cre.csv file, then in the second while loop try to match it with a line in deb.csv, print both the query and the match to the output file, and remove them from cre.csv and deb.csv so that they can't be matched again. The other problem is that if more than one match exists in deb.csv, that has to go to the output as well, and likewise if cre.csv has a double. How to handle the doubles is a big problem.
|
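One possible way to get the "use each line only once" behaviour without editing cre.csv and deb.csv inside nested loops is a single awk pass. This is only a sketch under the same assumption as before (comma-separated input, number in field 7); pair_once and the "unmatched" label are made-up names, not something from the thread.

```shell
#!/bin/sh
# Sketch only. pair_once is a hypothetical helper: each deb line is handed
# out to at most one cre line; leftovers from either file are "unmatched".
# Assumption: the number to match is field 7 and deb.csv is not empty.
pair_once() {
    # $1 = cre file, $2 = deb file (deb is read first to build the queues)
    awk -F, '
        function key(s) { gsub(/[^0-9]/, "", s); return s }
        NR == FNR {                    # first file on the command line: deb
            k = key($7)
            cnt[k]++
            queue[k, cnt[k]] = $0      # n-th deb line seen for this number
            dk[NR] = k; dn[NR] = cnt[k]; ndeb = NR
            next
        }
        {                              # second file: cre
            k = key($7)
            if (k != "" && used[k] < cnt[k]) {
                used[k]++              # consume the next unused deb line
                print $0 " match"
                print queue[k, used[k]] " match2"
            } else {
                print $0 " unmatched"
            }
        }
        END {                          # deb lines that were never consumed
            for (i = 1; i <= ndeb; i++)
                if (dn[i] > used[dk[i]])
                    print queue[dk[i], dn[i]] " unmatched"
        }
    ' "$2" "$1"
}
```

With two cre lines and three deb lines sharing one number, the first two deb lines are paired off and the third is reported as unmatched, so the output cannot grow beyond the combined size of the two inputs.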
Now I have this
Code:
#!/bin/sh
Markus |
Very nice…. I ran it now and for the most part it is working. Now and again it comes up with “line 8: [: too many arguments” but I think that is because of the deb.csv file that has some text in the last field… I will play around with your code a bit. Thanks
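The script that produced the error is not shown here, so this is only a guess at the cause: a common source of "[: too many arguments" is an unquoted variable inside [ ] whose value contains a space, which fits the text fields in deb.csv. The sample value below is taken from the data style in this thread; the variable names are illustrative.

```shell
#!/bin/sh
# A value containing a space, like the text column in deb.csv
val="FRE RER800504"

# Unquoted, $val is split into two words, so [ receives four arguments
# and fails with an error. The error message is discarded here so the
# demo stays quiet.
if [ $val = "FRE RER800504" ] 2>/dev/null; then
    unquoted="matched"
else
    unquoted="failed"
fi

# Quoted, the value stays a single word and the comparison works.
if [ "$val" = "FRE RER800504" ]; then
    quoted="matched"
else
    quoted="failed"
fi

printf 'unquoted: %s\nquoted: %s\n' "$unquoted" "$quoted"
```

If that is what happens on line 8, putting double quotes around every variable used inside the test should make the message go away.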
|
I'm a little confused. Could you give us an example for every kind of match? What type of match gets "match", what type gets "match2", "Double" and so on? Would there be a "match3" as well? Please elaborate.
|
In short the OP means: match the numbers in the last column of the lines in the first file with the numbers of the last column of the second file and if it matches print both lines and append a string which marks from which file the line is.
Markus |
That is correct. If no match occurs, then leave the entry in the file and carry on to the next. This means that at the end of the filtering / matching process I will have one file with the queries and matches, and the cre and deb files with the entries that do not match. We could even move the entries that do not match out to another file, it does not matter. The appended string is also not of great importance, it is a nice-to-have.
Here's some data.

cre.csv
636973;5-Jun-12;5-Jun-12;FKEYUEWOD GFD4111800221;1000;FGHJ4111800221
645966;4-Jun-12;4-Jun-12;SDFDF RER41111800329;1000;4111800329
612343;1-Jun-12;1-Jun-12;SDSWRJDFA RER41118005043;1000;41118005043
629334;4-Jun-12;4-Jun-12;DAS RER800504;1000;ERTR4111800504
659633;4-Jun-12;4-Jun-12;DAS RER800504;1000;UYTY4111800504
645343;14-Jun-12;16-Jun-12;HJGFDAFA RER41118005;1000;41118005

deb.csv
670644;8-Jun-12;9-Jun-12;FKEYUEWOD GFD4111800221;-1000;QWE800221
629236;5-Jun-12;10-Jun-12;DFGGDF RER4111700329;-1000;EWE4111800329
622323;2-Jun-12;3-Jun-12;TDFHGGH RER41118005043;-1000;FDE41118005043
624426;2-Jun-12;3-Jun-12;TDFHGGH RER41118005043;-1000;8005043
613459;5-Jun-12;15-Jun-12;FRE RER800504;-1000;FDSA4111800504
643758;5-Jun-12;25-Jun-12;FRE RER800504;-1000;FDSA4111800504
|
In case you want just the output, you can check the options to the join command:
Code:
$ join -j 6 -t ";" <(sort -t ";" -k 6 cre.csv) <(sort -t ";" -k 6 deb.csv) |
This does seem like a nice quick way to match, although it does not know what to do with the double entries and reuses entries that have already been used. I think I must read the man pages a bit more. Maybe this can help more with the non-matching entries... Awesome, thanks
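Two things are worth knowing when trying join on the semicolon-separated sample data above. The last fields carry letter prefixes (FGHJ4111800221 versus FDSA4111800504 and so on), so an exact join on field 6 may find little or nothing, and join pairs every line with every line that has the same key, which is why doubles multiply. A sketch that joins on the digits of the last field instead, assuming that stripping the letters is what is wanted; keyed and join_on_digits are made-up helper names.

```shell
#!/bin/sh
# Sketch only. Assumption: ";"-separated input, number in the last field,
# and letters in that field can be ignored when matching.

# Prefix each line with the digits of its last field, then sort on that key
keyed() {
    awk -F';' '{ k = $NF; gsub(/[^0-9]/, "", k); print k ";" $0 }' "$1" |
        LC_ALL=C sort -t';' -k1,1
}

# Join two files on the digits-only key (hypothetical helper)
join_on_digits() {
    jd_tmp=$(mktemp -d) || return 1
    keyed "$1" > "$jd_tmp/cre"
    keyed "$2" > "$jd_tmp/deb"
    # Output: key, then the fields of the cre line, then those of the deb line
    LC_ALL=C join -t';' "$jd_tmp/cre" "$jd_tmp/deb"
    rm -rf "$jd_tmp"
}
```

Called as `join_on_digits cre.csv deb.csv`, this still prints one output line for every cre/deb combination that shares a key, so two cre lines and two deb lines with the same number give four lines. Lines whose last field has no digits at all end up with an empty key and would be joined with each other.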
|