LinuxQuestions.org

jimieee 05-07-2004 10:49 AM

Using diff to compare file with common lines, but at different line numbers
 
Hi,

I wouldn't normally ask this type of question, but I've had a good look on Google and can't find the answer, or any decent examples to get me started!

I'm trying to analyse some files containing a few thousand lines of SQL commands each. Both of them are used to create a database with the same structure, except that one includes data and the other has only the schema. I want to compare them so that I can output the extra lines of code used to insert the data into the database. I then hope to replace those lines with my own data.

I'm aware that I can do this with the diff tool, but I can't figure out which options are needed. Really I just want an example. Can anyone help? This is what I've tried so far:

diff -wbi kernel/setup/packages/blog/sql/mysql/blog.sql kernel/sql/mysql/cleandata.sql > out
and
diff -wbi kernel/sql/mysql/cleandata.sql kernel/setup/packages/blog/sql/mysql/blog.sql > out

Neither gives me the right results. I think this has something to do with the common lines not necessarily appearing at the same line numbers in each file.

Thanks in advance for any help!

Regards,

James
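
For reference, diff does cope with common lines that sit at different line numbers, provided they appear in the same relative order in both files. The following is only a minimal sketch of pulling out the lines that exist in the second file alone. The file names schema.sql and full.sql and their contents are made-up stand-ins, not the poster's real files.

```shell
#!/bin/bash
# Hypothetical sample files standing in for cleandata.sql and blog.sql.
workdir=$(mktemp -d)
printf '%s\n' 'CREATE TABLE post (id INT);' \
              'CREATE TABLE tag (id INT);' > "$workdir/schema.sql"
printf '%s\n' 'CREATE TABLE post (id INT);' 'INSERT INTO post VALUES (1);' \
              'CREATE TABLE tag (id INT);' 'INSERT INTO tag VALUES (7);' > "$workdir/full.sql"

# diff exits with status 1 when the files differ, which is the expected case
# here, so only a status above 1 is treated as a failure.
diff "$workdir/schema.sql" "$workdir/full.sql" > "$workdir/changes.txt" || [ "$?" -eq 1 ]

# In diff's default output, lines found only in the second file start with
# "> ". sed prints just those lines and strips the two-character marker.
extra=$(sed -n 's/^> //p' "$workdir/changes.txt")
printf '%s\n' "$extra"
rm -r "$workdir"
```

If the shared lines are reordered between the files, diff will also report the moved lines as changes, so this approach probably suits files that keep the same statement order.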

MasterC 05-09-2004 05:03 AM

Since it's been a day or two, I'll suggest using vimdiff instead of diff alone. It's a lot more elegant, and I think it will give you the results you are looking for, or at least something closer to them.

Another option:
If the data is simply in a different file, and you want to append it to the bottom of the other file, you can cat the first file onto the end of the second:
cat /home/whatever/file1 >> /home/whatever/file2

Cool
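
As a small illustration of the append suggested above, here is a sketch using throwaway files. The names file1 and file2 mirror the example paths, and their contents are invented for the demonstration.

```shell
#!/bin/bash
# Hypothetical stand-ins for /home/whatever/file1 and /home/whatever/file2.
workdir=$(mktemp -d)
printf '%s\n' 'INSERT INTO post VALUES (1);' > "$workdir/file1"
printf '%s\n' 'CREATE TABLE post (id INT);' > "$workdir/file2"

# ">>" appends to file2; a single ">" would overwrite it instead.
cat "$workdir/file1" >> "$workdir/file2"

result=$(cat "$workdir/file2")
printf '%s\n' "$result"
rm -r "$workdir"
```

The original contents of file2 stay in place and the contents of file1 follow them.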

jimieee 05-10-2004 05:24 AM

Hmmm, I had a quick go and it seems a lot nicer to use than diff. I need to read some docs to figure it out, but it appears to "know" what I want.

Thanks!

~James~

jimieee 05-10-2004 08:26 AM

In the end these tools didn't do exactly what I wanted. I started to write a script, which seems to work, but it doesn't like the long lines of text that I ended up dealing with (sed doesn't, that is). All this became too time-consuming and I was starting to lose track of my objectives, so I'm changing tactics. Here's my script for anyone interested:

Code:

#!/bin/bash
# A quick script to compare the lines in two files and retrieve just the ones
# that are different (delete CLEAN line matches when they appear in DIRTY, and
# write what is left to OUTPUT).

CLEAN="cleandata.sql"
DIRTY="blog.sql"
OUTPUT="output.sql"
TMP="$OUTPUT.tmp"

# Get number of lines in CLEAN. Reading from stdin stops wc from printing
# the file name after the count.
CLEAN_LINE_NUM=$(wc -l < "$CLEAN")

# Copy DIRTY to OUTPUT, because we don't want to overwrite our sources
cat "$DIRTY" > "$OUTPUT"

# For each line in CLEAN, check to see if it exists in OUTPUT,
# if it does delete it!

i=1    # sed numbers lines from 1
while [ "$i" -le "$CLEAN_LINE_NUM" ]
do
    # Find out current line for CLEAN
    line=$(sed -n -e "${i}p" < "$CLEAN")
    # Escape the characters sed treats as special, so the line is matched
    # as plain text
    pattern=$(printf '%s\n' "$line" | sed -e 's|[][\/.^$*]|\\&|g')
    # Delete this line from OUTPUT if it exists there (whole-line match)
    sed -e "/^$pattern\$/d" "$OUTPUT" > "$TMP" && mv "$TMP" "$OUTPUT"
    # Report progress and increment loop count variable
    echo "$i lines completed"
    i=$((i + 1))
done
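
One possible alternative to the loop above is to let grep do the whole comparison in a single pass. This is a sketch under assumptions rather than the poster's method: the sample files and their contents are invented, and it assumes a line counts as a match only when the entire line is identical in both files.

```shell
#!/bin/bash
# Hypothetical sample files; the names mirror the CLEAN and DIRTY variables.
workdir=$(mktemp -d)
clean="$workdir/cleandata.sql"
dirty="$workdir/blog.sql"
printf '%s\n' 'CREATE TABLE tag (id INT);' \
              'CREATE TABLE post (id INT);' > "$clean"
printf '%s\n' 'CREATE TABLE post (id INT);' 'INSERT INTO post VALUES (1);' \
              'CREATE TABLE tag (id INT);' 'INSERT INTO tag VALUES (7);' > "$dirty"

# -f FILE  read the patterns from FILE, one per line
# -F       treat each pattern as a fixed string, not a regular expression
# -x       match whole lines only
# -v       print the lines that do NOT match any pattern
output=$(grep -v -x -F -f "$clean" "$dirty")
printf '%s\n' "$output"
rm -r "$workdir"
```

Because the patterns are fixed strings, SQL punctuation needs no escaping, and the position or order of the shared lines in either file does not matter.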



All times are GMT -5.