LinuxQuestions.org - Bash scripting (matching $var)


Hi

New to this forum so please excuse me….

I need help with a bash script. I need to match the entries of two files and write them to a third file.

File1 looks like this

432345,1-Jun-12,1-Jun-12, 552343234,"1,000.00", 552343234,;
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;

File2 like this
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,;
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 234545345666,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;

Now I need to match entries from file 1 to entries in file 2, then write them to file 3 as output so that it looks like this

654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,; match
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,; match2
543576,4-Jun-12,4-Jun-12, 3453452344,"1,000.00", 3453452344,;match
677865,1-Jun-12,5-Jun-12, 3453452344,"-1,000.00", 34534523444,;match2
657865,4-Jun-12,4-Jun-12, 2345453456,"1,000.00", 2345453456,;match
345325,24-Jun-12,3-Jun-12, 2345453456,"-1,000.00", 2345453456,;match2
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;match
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;match2
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,; Double

The entries are not all in order; this is only an example.

Any help would be appreciated. I have been struggling with this for 2 days now. I can match some of the entries, but I have a problem with the doubles, and my file 3 gets huge. The files are all in CSV format.

I was thinking of reading file 1 line by line, matching each line against file 2 line by line, and writing the output to file 3, then removing the matched line from files 1 and 2. If more than one entry exists, it might match later on with another entry.

The second problem is that in some entries the numbers do not match fully, but if I can match those that do for now, I can work on the non-matching ones later.

Vitki

The code I have... (it's a mess, I know)
#!/bin/bash

#Clean files
rm -f output.csv
rm -f nomatch.csv

#Edit files
#sed -i 's/$/;/' cre.csv
#sed -i 's/$/;/' deb.csv

DELIMITER=","
COUNT=1

function log_out() {

echo $1 >> output.csv
}

cre=`wc -l cre.csv | grep -o '[0-9]*'`
deb=`wc -l deb.csv | grep -o '[0-9]*'`

echo "total cre records are $cre"
echo "total deb records are $deb"

for i in $(cut -f 7 -d "${DELIMITER}" cre.csv | grep -ow '[0-9][0-9][0-9][0-9][0-9][0-9]*');
do

mat=`grep "$i" deb.csv`
code=$?
if [ $code -ne 0 ] ; then echo $mat >> deb.done.csv
sed -i "/$mat/d" deb.csv ; else
mat2=`grep "$i" cre.csv`
log_out "$mat,match"
log_out "$mat2,match2"
echo -n "$COUNT,"
((COUNT++))
fi

done

echo ""
echo "Data Match"
echo "done"

COUNT=1

for i in $(cut -f 1 -d "${DELIMITER}" cre.csv);
do

mat=`grep "$i" output.csv`
code=$?
if [ $code -ne 0 ] ; then echo "$i,cre no match deb file" >> nomatch.csv; else
sed -i '/'"$i"'/s/^/#/' cre.csv
echo -n "$COUNT,"
((COUNT++))
fi

done

echo ""
echo "NO Data Match"
echo "done"

nomatch=`wc -l nomatch.csv | grep -o '[0-9]*'`
echo "No match found for $nomatch records"
sed -i "s/; /\\`echo -e '\n\r'`/g" output.csv

Hello vitki, welcome to LQ,

As far as I understand your code, you want to find out which of the numbers in the last column of the first file match the numbers in the last column of the second file. Is this true?

Markus

Yip that is 100% right

Then this one-liner

Code:

for i in `cut -f 7 -d, cre.csv | grep -ow '[0-9]*'`; do grep "$i" deb.csv; done

should already do most of the work. You can execute it on the command line.

I have not yet understood what exactly the output should be. The one-liner prints every line of deb.csv that contains one of the numbers taken from the last column of cre.csv.

Markus

The output should come from both file 1 and file 2. So an entry in file 1 (test123) must match an entry in file 2 (test123). Both must then be stored in the output as

test123 file1
test123 file2

In other words I want both the query and the match in the output.

Well, I'll try to explain how I would do this, I'm not that fast with coding ;)

Make a loop over all lines of cre.csv. Cut the number from the line and put it in a variable.
Then search for the number in the deb.csv file, and if a line matches, print both lines.

Markus

No problem, me too. Yes, that is what the current code is doing in a lame way. I have been experimenting with a while loop (while read line cre.csv do …), but I get stuck with the second while loop for deb.csv. That means I have a while loop within a while loop (not the best way, I think), and when I run the loop the output file grows at an enormous rate, so something is definitely wrong.

What I have done with the first while loop is read the first line of cre.csv, then in the second while loop try to match it with a line in deb.csv, print both the query and the match to the output file, and remove them from cre.csv and deb.csv so that they can't be matched again. The other problem is that if more than one match exists in deb.csv, that should be output as well, and the same if cre.csv has a double. How to handle the doubles is a big problem.
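One way to get the "use each line only once" behaviour without editing cre.csv and deb.csv in place is to queue the deb.csv lines per number and hand them out one at a time. The following is only a sketch under assumptions: the key is taken to be the digits of comma-separated field 7 (as in the posted script), the labels match, match2 and Double are borrowed from the wanted output in the first post, and a deb line counts as a Double only when its number was already paired once. The sample rows are copied from the first post; the temporary directory is just for the demonstration.

```shell
#!/bin/sh
# Sketch only: pair each cre.csv line with at most one unused deb.csv line.
# Assumption: the key is the number in comma-separated field 7.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Sample rows copied from the first post.
cat > cre.csv <<'EOF'
432345,1-Jun-12,1-Jun-12, 552343234,"1,000.00", 552343234,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;
EOF

cat > deb.csv <<'EOF'
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
654565,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
EOF

awk -F, '
    # Reduce field 7 to its digits so the leading blank does not matter.
    { key = $7; gsub(/[^0-9]/, "", key) }

    # First file (deb.csv): queue every line under its key.
    FILENAME == ARGV[1] { n[key]++; deb[key, n[key]] = $0; next }

    # Second file (cre.csv): take the next unused deb line for this key, if any.
    key != "" && (key in n) && used[key] < n[key] {
        used[key]++
        print $0 " match"
        print deb[key, used[key]] " match2"
    }

    # deb lines left over under a key that was paired at least once are the doubles.
    END {
        for (k in n)
            if (used[k] > 0)
                for (i = used[k] + 1; i <= n[k]; i++)
                    print deb[k, i] " Double"
    }
' deb.csv cre.csv > output.csv

cat output.csv
```

Because both input files are only read, the output file cannot grow from re-matching lines that were already written, and a second cre.csv line with the same number would simply take the next queued deb.csv line.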

Now I have this

Code:

#!/bin/sh

IFS=$'\012'
for line in `cat cre.csv`
do
        number=$(echo $line | cut -d, -f 7)
        match=`cat deb.csv | grep $number`
        if [ -n $match ]; then
                echo $line "from cre.csv"
                echo $match "from deb.csv"
        fi
done

Maybe you can compare it with your code and find some ideas.

Markus

Very nice…. I ran it now and for the most part it is working. Now and again it comes up with “line 8: [: too many arguments” but I think that is because of the deb.csv file that has some text in the last field… I will play around with your code a bit. Thanks
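A possible cause of that message, judging only from the posted script and not from the real data: $match is unquoted in the test on line 8. With IFS set to a newline, a grep that returns several lines expands into several words, and [ then fails with an error such as "too many arguments". An empty $match has the opposite problem, because the test collapses to [ -n ], which is true. The sketch below reproduces both cases with made-up stand-in strings instead of real grep output.

```shell
#!/bin/sh
# Sketch: why the unquoted test misbehaves, using made-up stand-ins for grep output.
nl='
'
match="first line${nl}second line${nl}third line"   # as if grep had found three lines
empty=""                                            # as if grep had found nothing

old_ifs=$IFS
IFS=$nl        # newline only, the same effect as IFS=$'\012' in the posted script

# Unquoted: $match becomes three separate words, so [ gets too many operands and fails.
unquoted_status=0
[ -n $match ] 2>/dev/null || unquoted_status=$?

# Unquoted and empty: the test collapses to [ -n ], which is true.
if [ -n $empty ]; then unquoted_empty=true; else unquoted_empty=false; fi

# Quoted: always exactly one operand, whatever the variable holds.
if [ -n "$match" ]; then quoted_match=true; else quoted_match=false; fi
if [ -n "$empty" ]; then quoted_empty=true; else quoted_empty=false; fi

IFS=$old_ifs
echo "unquoted status: $unquoted_status, unquoted empty: $unquoted_empty"
echo "quoted match: $quoted_match, quoted empty: $quoted_empty"
```

In the posted loop the corresponding change would be if [ -n "$match" ]; then, with grep "$number" quoted the same way.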

I'm a little confused. Could you give us an example for every kind of match? What type of match gets "match", what type of match gets "match2", "Double" and so on. Would there be "match3" as well? Please elaborate.

In short, the OP means: match the numbers in the last column of the lines in the first file against the numbers in the last column of the second file, and if they match, print both lines and append a string that marks which file each line is from.

Markus

That is correct. If no match occurs, then leave the entry in the file and carry on to the next. This means that at the end of the filtering / matching process I will have one file with the queries and matches, and the cre and deb files with the entries that do not match. We could even move the entries that do not match out to another file; it does not matter. The appended string is also not of great importance; it is a nice-to-have.
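Moving the non-matching entries out to their own files could be sketched like this. It is an illustration under assumptions, not a finished solution: the key is again the digits of comma-separated field 7, the helper name unmatched and the output names cre.nomatch.csv and deb.nomatch.csv are made up for the example, and the sample rows are copied from the first post. The original files are only read, never edited.

```shell
#!/bin/sh
# Sketch only: collect the lines of each file whose number has no partner in the other file.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Sample rows copied from the first post.
cat > cre.csv <<'EOF'
432345,1-Jun-12,1-Jun-12, 552343234,"1,000.00", 552343234,;
765756,4-Jun-12,4-Jun-12, 2344564345,"1,000.00", 2344564345,;
EOF

cat > deb.csv <<'EOF'
123545,12-Jun-12,10-Jun-12, 552343234,"-1,000.00", 55234323,;
654564,30-Jun-12,5-Jun-12, 2344564345,"-1,000.00", 2344564345,;
EOF

# unmatched FILE OTHER (hypothetical helper):
# print the lines of FILE whose field-7 number never occurs as field 7 in OTHER.
unmatched() {
    awk -F, '
        { key = $7; gsub(/[^0-9]/, "", key) }
        # OTHER is read first: remember every key it contains.
        FILENAME == ARGV[1] { seen[key] = 1; next }
        # FILE is read second: keep only lines with an unknown key.
        !(key in seen)
    ' "$2" "$1"
}

unmatched cre.csv deb.csv > cre.nomatch.csv
unmatched deb.csv cre.csv > deb.nomatch.csv

echo "cre lines without a partner:"; cat cre.nomatch.csv
echo "deb lines without a partner:"; cat deb.nomatch.csv
```

With these rows, 552343234 and 55234323 are treated as different numbers, so both of those lines land in the no-match files. Entries whose numbers only match partially would need a looser comparison than this exact one.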

here's some data
cre.csv
636973;5-Jun-12;5-Jun-12;FKEYUEWOD GFD4111800221;1000;FGHJ4111800221
645966;4-Jun-12;4-Jun-12;SDFDF RER41111800329;1000;4111800329
612343;1-Jun-12;1-Jun-12;SDSWRJDFA RER41118005043;1000;41118005043
629334;4-Jun-12;4-Jun-12;DAS RER800504;1000;ERTR4111800504
659633;4-Jun-12;4-Jun-12;DAS RER800504;1000;UYTY4111800504
645343;14-Jun-12;16-Jun-12;HJGFDAFA RER41118005;1000;41118005
deb.csv
670644;8-Jun-12;9-Jun-12;FKEYUEWOD GFD4111800221;-1000;QWE800221
629236;5-Jun-12;10-Jun-12;DFGGDF RER4111700329;-1000;EWE4111800329
622323;2-Jun-12;3-Jun-12;TDFHGGH RER41118005043;-1000;FDE41118005043
624426;2-Jun-12;3-Jun-12;TDFHGGH RER41118005043;-1000;8005043
613459;5-Jun-12;15-Jun-12;FRE RER800504;-1000;FDSA4111800504
643758;5-Jun-12;25-Jun-12;FRE RER800504;-1000;FDSA4111800504

In case you want just the output, you can check the options to the join command:

Code:

$ join -j 6 -t ";" <(sort -t ";" -k 6 cre.csv) <(sort -t ";" -k 6 deb.csv)

You can limit the output to certain fields and matching or non-matching lines.
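As a rough illustration of those options: -o selects the output fields, and -v 1 or -v 2 prints the lines of one file that have no partner in the other. The rows below are invented for the example (same semicolon layout, with made-up keys KEY1 to KEY4 in field 6), and sorted temporary files are used in place of process substitution so that the sketch also runs in a plain sh.

```shell
#!/bin/sh
# Sketch only: join on field 6 with selected output fields and non-matching lines.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Invented sample rows in the semicolon layout; the key is field 6.
cat > cre.csv <<'EOF'
100001;1-Jun-12;1-Jun-12;AAA;1000;KEY1
100003;3-Jun-12;3-Jun-12;CCC;1000;KEY3
100002;2-Jun-12;2-Jun-12;BBB;1000;KEY2
EOF

cat > deb.csv <<'EOF'
200002;5-Jun-12;6-Jun-12;DDD;-1000;KEY4
200001;5-Jun-12;6-Jun-12;AAA;-1000;KEY1
200003;7-Jun-12;8-Jun-12;CCC;-1000;KEY3
EOF

# join needs both inputs sorted on the join field, in the same collation order.
LC_ALL=C
export LC_ALL
sort -t ';' -k 6,6 cre.csv > cre.sorted
sort -t ';' -k 6,6 deb.csv > deb.sorted

# Matching pairs, limited to: join field, first field of cre, first field of deb.
join -t ';' -1 6 -2 6 -o 0,1.1,2.1 cre.sorted deb.sorted > pairs.txt

# Lines that exist only in cre.csv (-v 1) or only in deb.csv (-v 2).
join -t ';' -1 6 -2 6 -v 1 cre.sorted deb.sorted > cre.only
join -t ';' -1 6 -2 6 -v 2 cre.sorted deb.sorted > deb.only

cat pairs.txt
```

Note that when a key occurs more than once, join prints every combination of the lines that share it, so it does not by itself give the "use each entry only once" behaviour.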

This does seem like a nice quick way to match, although it does not know what to do with the double entries and reuses entries that have been used already, but I think I must read a bit more of the man pages. Maybe this can help more with the non-matching ones... Awesome, thanks.