Using diff to compare files with common lines, but at different line numbers

jimieee · 05-07-2004, 09:49 AM

Hi,

I wouldn't normally ask this type of question, but I've had a good look on google and can't find the answer, or any decent examples to get me started!

I'm trying to analyse some files containing a few thousand lines of sql commands each. Both of them are used to create a database with the same structure, except one has data in and the other only has the schema. I want to be able to compare them, so that I can output the extra lines of code used to insert the data into the database. From this I hope to replace it with my own data.

I'm aware that I can do this with the diff tool, but I can't figure out which options are needed. Really I just want an example. Can anyone help? This is what I've tried so far:

diff -wbi kernel/setup/packages/blog/sql/mysql/blog.sql kernel/sql/mysql/cleandata.sql > out
and
diff -wbi kernel/sql/mysql/cleandata.sql kernel/setup/packages/blog/sql/mysql/blog.sql > out

Neither gives me the right results. I think this is something to do with the common lines not necessarily appearing at the same line numbers in each file.

Thanks in advance for any help!

Regards,

James

MasterC · 05-09-2004, 04:03 AM

It's been a day or 2, so I'll suggest using vimdiff instead of diff alone. It's a lot more elegant, and I think it will give you the results you are looking for, or at least something closer to them.
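
If plain diff is still preferred, here is a rough sketch of one way to get just the extra lines. In diff's default output, lines that exist only in the second file are prefixed with "> ", so sed can pick those out. The function name only_in_second is made up for illustration, and this assumes the schema lines appear in the same order in both files (diff reports moved lines as changes).

```shell
#!/bin/bash
# Print the lines that diff reports as present only in the second file.
only_in_second() {
    # -w ignores whitespace differences, -i ignores case (as in the first post).
    # diff exits with status 1 when the files differ; that is expected here.
    diff -w -i "$1" "$2" | sed -n 's/^> //p'
}
```

It would be called with the schema-only file first, for example: only_in_second kernel/sql/mysql/cleandata.sql kernel/setup/packages/blog/sql/mysql/blog.sql > out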

Another option:
If the data is simply in a different file, and you want to append it to the bottom of the other file, you can cat the first file onto the end of the second:
cat /home/whatever/file1 >> /home/whatever/file2

Cool

jimieee · 05-10-2004, 04:24 AM

Hmmm, had a quick go and it seems a lot nicer to use than diff. I need to read some docs to figure it out, but it appears to "know" what I want.

Thanks!

~James~

jimieee · 05-10-2004, 07:26 AM

In the end these tools didn't do exactly what I wanted them to do. I started to write a script, which seems to work, but it (or rather sed) doesn't like the long lines of text that I ended up dealing with. All this became too time-consuming and I was starting to lose track of my objectives here, so I'm changing tactics. Here's my script for anyone interested:

Code:

#!/bin/bash
# A quick script to compare lines in two files and just retrieve the ones that
# are different (delete CLEAN line matches when they appear in DIRTY, and
# write what is left to OUTPUT).

CLEAN="cleandata.sql"
DIRTY="blog.sql"
OUTPUT="output.sql"

# Get number of lines in CLEAN. Reading from stdin stops wc from printing
# the file name after the count.
CLEAN_LINE_NUM=$(wc -l < "$CLEAN")

# Copy DIRTY to OUTPUT, because we don't want to overwrite our sources
cp "$DIRTY" "$OUTPUT"

# For each line in CLEAN, check to see if it exists in OUTPUT,
# if it does, delete it!

# sed counts lines from 1, so start there
i=1
while [ "$i" -le "$CLEAN_LINE_NUM" ]
do
    # Find out current line for CLEAN
    line=$(sed -n -e "${i}p" "$CLEAN")
    # Delete this line from OUTPUT if it exists there. grep -F compares
    # fixed strings, so slashes and other SQL punctuation in the line are
    # not read as a pattern (which is what tripped up sed "/$line/d").
    grep -vxF -e "$line" "$OUTPUT" > "$OUTPUT.tmp"
    mv "$OUTPUT.tmp" "$OUTPUT"
    # Report progress and increment loop count variable
    echo "$i lines completed"
    i=$((i + 1))
done
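
For anyone who finds this later: the loop above can probably be replaced by a single grep call, which should also cope better with long lines. This is only a sketch, assuming a grep that supports -F, -x, -v and -f (GNU and BSD grep both do); extra_lines is a made-up function name, and the file names are the ones from the script above.

```shell
#!/bin/bash
# Print every line of the second file that does not appear anywhere in the first.
#   -F  treat each pattern as a fixed string, not a regex
#   -x  a pattern must match the whole line
#   -v  invert the match: keep lines that match none of the patterns
#   -f  read the patterns, one per line, from the given file
extra_lines() {
    grep -Fxv -f "$1" "$2"
}

# Example run, only if both files from the script above are present
if [ -f cleandata.sql ] && [ -f blog.sql ]; then
    extra_lines cleandata.sql blog.sql > output.sql
fi
```

Like the loop, this removes every occurrence of a line that appears anywhere in the clean file, regardless of its line number.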