LinuxQuestions.org


p3rcy 02-13-2012 05:24 AM

Comparing two files
 
Hi,
I am new to Linux and C++.
I want to compare two files and print the values from the first file that are not present in the second file to a third file, e.g.:

File1  File2  File3
1      1      2
2      3      4
3      5
4      7
5      9
       11

As we can see, values from File1, i.e. 2 and 4, are not in File2 and have been printed to File3. I know we have to use arrays. Please help me.

millgates 02-13-2012 05:27 AM

Hi,
1) Is this homework?
2) If not, is there a reason why you want to do this in C++?
3) Can you show us what you have so far?
4) Are those files sorted?

p3rcy 02-13-2012 05:46 AM

No, this is not homework; it is for work at the office. I can do the sorting myself. But I am out of touch with programming and basically want to read from a file into an array. If someone can help me with the complete program, I'd be very grateful. It can also be in Java, not necessarily in C++.

But I'm completely out of touch with Java. I can only remember how to compile now! :(

millgates 02-13-2012 06:21 AM

Well, if you're on Unix, something like this could do the job (if each file contains each number only once):

Code:

cat "file1" "file2" "file2" | sort -n | uniq -c | awk '{if ($1 == 1) print $2}' > file3

If you insist on using C++, loading a list of ints into memory could look like this:

Code:

#include <iostream>
#include <fstream>
#include <list>

int main() {
        std::ifstream file1("file1", std::ios::in);
        if (!file1) { return 1; }   // could not open the file
        std::list<int> list1;
        int temp;
        // extraction fails at end of file or at the first non-numeric token
        while (file1 >> temp) {
                list1.push_back(temp);
                std::cout << temp << std::endl;
        }

        // do something with list
        file1.close();
        return 0;
}

You may want to take care of some details, such as handling files that contain characters other than digits (the read loop here would stop at the first such token and silently ignore the rest of the file), but this could be a start.

Weapon S 02-13-2012 06:22 AM

Ehm... read the manual
 
[edit]
:-[ I've been beaten and overclassed by the previous post.

Diff is the tool you want.
Code:

man diff

I thought you could make it show only the lines that were missing from one file, but maybe you need to 'pipe' it through grep. (grep can get pretty complicated, but it's useful to know you don't need to construct a 'regular expression'. Just giving the string you are looking for as an argument to grep qualifies... in most cases.)

Quote:

I know we have to use arrays.

That does sound an awful lot like homework...

millgates 02-13-2012 06:32 AM

Quote:

Originally Posted by Weapon S (Post 4601162)
[edit]
Diff is the tool you want.
Code:

man diff
I thought you could make it show only the lines that were missing from one file, but maybe you need to 'pipe' it through grep.

For the diff version, perhaps something like this might work:

Code:

diff file1 file2| awk '{if ($1 == "<") print $2 }'

millgates 02-13-2012 06:36 AM

Quote:

Originally Posted by p3rcy (Post 4601128)
As we can see, values from File1, i.e. 2 and 4, are not in File2 and have been printed to File3.

Btw, shouldn't 11 also be in file3?

p3rcy 02-13-2012 09:03 AM

@millgates :
No, 11 was in file2. Improper formatting in the code, I guess :(
I tried diff and the code you posted, but it seems to show all the unique entries from both files.

@Weapon S : Thank you for dedicating your time as well.

Thank you to both of you. But I created my own code somehow in haste and I haven't checked the redundancy of LOCs. But it worked for now. :D

Code:


#include <stdio.h>
#include <stdlib.h>       
#include <iostream>
#include <string>
#define NOT_FOUND -1
#define MAX 1100

using namespace std;

int search( const int arr[], int target, int n );
void showAry( int arr[], int n );


int main()
{
    int x[MAX], y[MAX];
    int sizex = 0, sizey = 0;   /* number of values actually read */
    int i, j;

    FILE* fin;
    fin = fopen("data.txt", "r");
    if (fin == NULL)
    {
        printf("Error opening file ... Press 'Enter' to exit ... ");
        getchar();
        return -1;
    }
    /* read until EOF, a non-numeric token, or a full array */
    while (sizex < MAX && fscanf(fin, "%d", &x[sizex]) == 1) ++sizex;
    fclose(fin);

    fin = fopen("data2.txt", "r");
    if (fin == NULL)
    {
        printf("Error opening file ... Press 'Enter' to exit ... ");
        getchar();
        return -1;
    }
    while (sizey < MAX && fscanf(fin, "%d", &y[sizey]) == 1) ++sizey;
    fclose(fin);

    cout << endl;

    /* print every value of x that does not occur anywhere in y */
    for (i = 0; i < sizex; i++)
    {
        bool found = false;
        for (j = 0; j < sizey; j++)
        {
            if (x[i] == y[j])
            {
                found = true;
                break;
            }
        }
        if (!found) cout << x[i] << endl;
    }

    cout << endl;
    return 0;
}

void showAry( int arr[], int n )
{
    int i;
    for( i=0; i<n; ++i )
        printf( "%d ", arr[i] );
}

int search( const int arr[], int target, int n )
{
    int i;
    for( i=0; i<n; ++i )
        if( target==arr[i] ) return i;
       
    return -1;
}

Usage: when executed, the program compares the contents of data.txt with data2.txt and prints to the screen the values from data.txt that are not present in data2.txt. :) :)

millgates 02-13-2012 09:21 AM

Quote:

Originally Posted by p3rcy (Post 4601239)
@millgates :
I tried diff and the code you posted, but it seems to show all the unique entries from both files.

Are you sure you copied those examples correctly?

Quote:

Originally Posted by p3rcy (Post 4601239)
But I created my own code somehow in haste and I haven't checked the redundancy of LOCs. But it worked for now

This looks more like C than C++, actually. Or something in between.

Code:

printf("Error opening file ... Press 'Enter' to exit ... ");
cout<<endl;

You really shouldn't mix these two together.
You also don't have to store both files in memory. One is enough :)

p3rcy 02-13-2012 10:38 AM

Yes, I copied the examples correctly. Actually, the files I wanted to compare had more than 1000 entries, and that's why the whole chaos.

Quote:

Originally Posted by millgates
You really shouldn't mix these two together.
You also don't have to store both files in memory. One is enough

Yes, I know. But like I said, I'm really out of touch and developed the whole program in haste overlooking even the basic optimisations.

One's programming skills can really come in handy in life! :D

I'll research more with 'diff'. Thanx :)

Reuti 02-14-2012 12:27 PM

Quote:

Originally Posted by p3rcy (Post 4601288)
I'll research more with 'diff'.

Besides diff there is join (from the text utilities), which can also print unpairable lines, so you get the unique ones as desired:
Code:

$ join -v1 file1 file2

or, if the files are not sorted the way join expects (plain sort order, not numeric):
Code:

$ join -v1 <(sort file1) <(sort file2)
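
For comparison, roughly the same sorted-merge idea can be sketched in C++ with std::set_difference. The function name only_in_first is made up for this example, and unlike join it compares the values as numbers rather than as text, so take it as an analogy and not an exact equivalent.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: returns the values of `first` that are not in `second`,
// in ascending order. Both vectors are taken by value so they can be sorted here.
std::vector<int> only_in_first(std::vector<int> first, std::vector<int> second) {
    std::sort(first.begin(), first.end());     // set_difference needs sorted ranges
    std::sort(second.begin(), second.end());
    std::vector<int> result;
    std::set_difference(first.begin(), first.end(),
                        second.begin(), second.end(),
                        std::back_inserter(result));
    return result;
}
```

Note that std::set_difference treats duplicates like a multiset: a value occurring three times in the first input and once in the second is still reported twice.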


All times are GMT -5.