Old 10-01-2007, 01:05 PM   #1
Registered: Jan 2007
Location: INDIA
Distribution: Ubuntu, Debian
Posts: 334

Rep: Reputation: 30
Data Processing help required

Can anyone help me solve this problem?

I have two ASCII files (in .dat form) of the same OMR sheet, each scanned by a different OMR machine.

To get the mismatched characters between the two, I ran the command
$ cmp -l file1 file2 | awk '{printf("%c %c %c",$1,$2,$3);}'

$2 and $3 are the octal representations of the differing characters.

I want to print the actual characters, but it prints the wrong ones.

I think the problem is that awk does not treat the value as octal: it reads the octal digits as a decimal number and then converts that number to a character.

Is there any solution?

e.g. file1 contains AAAAA
and file2 contains ABAAA

Applying the above command prints the octal values of the differing characters, not the characters A and B, but I want to print A and B.
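
A small reproduction of the symptom may make the diagnosis easier to follow. This is only a sketch: instead of running cmp, it feeds awk the line that cmp -l reports for the AAAAA/ABAAA example (offset 2, octal 101 and 102). The variable names cmp_line and wrong are made up for illustration.

```shell
# The line below mimics what `cmp -l file1 file2` reports for the example
# files: offset 2, then octal 101 (A) and octal 102 (B).
cmp_line='2 101 102'

# awk reads the fields as decimal 101 and 102, so %c yields the characters
# with those decimal codes rather than A and B.
wrong=$(printf '%s\n' "$cmp_line" | awk '{ printf("%c %c", $2, $3) }')
echo "$wrong"
```

In an ASCII-compatible locale this should print e f (decimal 101 and 102) rather than A B (octal 101 and 102, i.e. decimal 65 and 66), which matches the guess that the octal digits are being read as decimal.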
Old 10-01-2007, 01:28 PM   #2
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
The first number on each output line is the decimal offset of the byte at which the files differ (offsets start at 1, so a difference in the first byte is reported as 1).

You are correct to say the second and third fields are octal representations of the characters which differ. You can use the perl oct function to convert an octal number (in a string) to a number, which you can then use in printf with the %c format. i.e.
cmp -l file1 file2 |perl -ane 'printf("%d %c %c\n", $F[0], oct($F[1]), oct($F[2]));'
Old 10-02-2007, 02:49 AM   #3
Registered: Jan 2007
Location: INDIA
Distribution: Ubuntu, Debian
Posts: 334

Original Poster
Rep: Reputation: 30
OK, thank you very much, mathew, for understanding me correctly and for the correct solution.

But I don't know Perl. Is there any other solution using filters such as sed, awk, tr, or od?
The files mentioned above are answer sheets. After correcting the mismatches, I have to score the sheet. In my office, database programs are available for mismatch detection, valuation, and other related activities.
My intention is to do the same job using Linux filters as easily as possible.
The database programs work by exporting the .dat files into a database and splitting the fields according to character length. To find the mismatches they loop over every character of each file, which I do not consider a good approach.

If there is no other solution, I shall learn Perl.

thank you
Old 10-02-2007, 04:53 AM   #4
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
Doing the same in awk would mean writing your own function to convert octal to decimal, which is a lot more work. Here's one way to do it (the conversion function was taken from here).

The solution is too long to conveniently put on a single line. You could put the awk code in its own file, or you can wrap the whole lot up in a shell script. My example uses the latter option:
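# This file should be called "my_compare".  Invoke it like this:
# my_compare file1 file2

cmp -l "$1" "$2" | awk '
        # Convert a string of octal digits to its numeric value.
        # The extra parameters (n, ret, i, c, k) act as local variables.
        function o2d(str,    n, ret, i, c, k) {
                n = length(str)
                ret = 0
                for (i = 1; i <= n; i++) {
                        c = substr(str, i, 1)
                        if ((k = index("01234567", c)) > 0)
                                k-- # adjust for 1-basing in awk

                        ret = ret * 8 + k
                }
                return ret
        }

        # Field 1 is the byte offset; fields 2 and 3 are the octal byte values.
        {
                printf("%d %c %c\n", $1, o2d($2), o2d($3))
        }
'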
So you can see it's a pain. The complexity is just because awk has no built-in octal to decimal conversion function (as far as I know). Perl does, which makes the Perl program a lot shorter.
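
For readers who want to avoid both Perl and a hand-written awk function, one possible alternative is the shell's own printf. This is a sketch, not something proposed in the replies above: it assumes a POSIX-style printf whose %b format expands \0ddd octal escapes, and the function name decode_cmp is made up for illustration.

```shell
# decode_cmp (hypothetical name): read "offset octal1 octal2" lines, as
# printed by `cmp -l`, and print the offset followed by the two characters.
decode_cmp() {
    while read -r offset oct1 oct2; do
        # "\\0$oct1" becomes e.g. \0101, which %b expands to the byte "A"
        printf '%s %b %b\n' "$offset" "\\0$oct1" "\\0$oct2"
    done
}

# Intended use: cmp -l file1 file2 | decode_cmp
# Here the input line stands in for the cmp output of the AAAAA/ABAAA example.
printf '2 101 102\n' | decode_cmp
```

For the sample line this prints 2 A B. The shell loop is slower than awk or Perl on large files, so it is probably best suited to short inputs such as a single answer sheet.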

Here's an explanation of the Perl program:

The command line option -ane is an abbreviation of -a -n -e (see the perlrun manual page for details). Essentially this means:
  • Auto-split each line of input into the array @F (-a).
  • Run the program on each line of input (-n).
  • The program is supplied in the next command line argument (-e).

The program is quite simple:
printf("%d %c %c\n", $F[0], oct($F[1]), oct($F[2]));
The only thing which might be confusing if you are not familiar with Perl is this @F $F[x] thing. When talking about a whole array in Perl, you use @arrayname. When referring to an element in an array, you use $arrayname[index].

Recall the -a command line option splits input lines into the array @F, so we can refer to the individual elements as $F[0], $F[1] etc.

In our program we use printf to print an integer (%d), and two characters (%c). The integer value is in $F[0], and is passed as-is. The two character values are converted from octal to decimal first, using the oct function.

Old 10-02-2007, 05:57 AM   #5
Senior Member
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Originally Posted by sunils1973
but I want to print A and B
GNU awk
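awk 'BEGIN{FS=""}   # empty FS: every character becomes its own field
    # first file: remember each character in f[0], f[1], ...
    FNR==NR{ for(i=1;i<=NF;i++) f[c++]=$i ;next}
    # second file: compare character i with the stored character i-1
    { for(i=1;i<=NF;i++)
              if ($i != f[i-1] )
               print "At byte " i ":  file2=" $i ", file1=" f[i-1]
    }
' "file1" "file2"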
# ./
At byte 2:  file2=B, file1=A

