LinuxQuestions.org


smritisingh03 01-18-2011 04:23 PM

script to read and compare two consecutive lines in a file
 
Hi All

I have a file like below

ADP_Comment- 4758
ADP_Comment-is missing
cbdkbckd- 46983
cbdkbckd- 46983
ljhjg- 547
ljhjg- 980
.....
.....
.....

Now I have to read each pair of consecutive lines and compare them to see whether the count MATCHES, is DIFFERENT, or the TABLE is MISSING.

Please help. It was a long and really complicated script, this is the last step to my final output, and gosh, I am stuck! Thank you.

matthewg42 01-18-2011 07:25 PM

Is Perl OK?

How about this:
Code:

#!/usr/bin/perl

use strict;
use warnings;

my %tables;

while(<>) {
        chomp;
        my ($tname, $c) = split(/\s*-\s*/, $_, 2);
        if (!defined($tables{$tname})) {
                $tables{$tname}{'first'} = $c;
        }
        else {
                $tables{$tname}{'last'} = $c;
        }
}

printf "%-20s %-15s %-15s %s\n", 'Table', 'First', 'Second', 'Diagnosis';
printf "%-20s %-15s %-15s %s\n", "-"x20, "-"x15, "-"x15, "-"x20;
foreach my $t (keys %tables) {
        my $diagnosis = undef;
        if (!defined($tables{$t}{'first'})) {
                $tables{$t}{'first'} = '[none]';
                $diagnosis = 'first data missing';
        }
        if (!defined($tables{$t}{'last'})) {
                $tables{$t}{'last'} = '[none]';
                $diagnosis = 'last data missing';
        }
        if (!defined($diagnosis)) {
                if ($tables{$t}{'last'} =~ /missing/i) {
                        $diagnosis = 'last missing';
                }
                elsif ($tables{$t}{'last'} ne $tables{$t}{'first'}) {
                        $diagnosis = 'value mismatch';
                }
                else {
                        $diagnosis = 'value match';
                }
        }
        printf "%-20s %-15s %-15s %s\n",
                $t,
                $tables{$t}{'first'} || 'not found',
                $tables{$t}{'last'} || 'not found',
                $diagnosis;
}

It's rather verbose for the sake of clarity. I'm sure the diagnosis section can be simplified if you state your needs more precisely.

*edit* btw, to run it, use the name of the file with your data as first command line argument, or pipe data into stdin. Output will look something like this:
Code:

Table                First           Second          Diagnosis
-------------------- --------------- --------------- --------------------
ADP_Comment          4758            is missing      last missing
cbdkbckd             46983           46983           value match
ljhjg                547             980             value mismatch
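
For readers more comfortable with awk than Perl, the same idea (remember the first value seen for each name, then diagnose when the second one arrives) can be sketched as below. This is an illustration only, not matthewg42's script: the function name `diagnose_pairs` is made up, the sample data is copied from the first post, and it assumes each name appears on exactly two lines.

```shell
# Hypothetical helper: reads "name- value" lines and prints one diagnosis per name.
diagnose_pairs() {
    awk '
    {
        # Split on the first dash only, so the value is everything after it.
        pos = index($0, "-")
        if (pos == 0) next
        name  = substr($0, 1, pos - 1)
        value = substr($0, pos + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/^[ \t]+|[ \t]+$/, "", value)

        if (!(name in first)) {          # first time this name is seen
            first[name] = value
            next
        }
        if (tolower(value) ~ /missing/)  diagnosis = "last missing"
        else if (value != first[name])   diagnosis = "value mismatch"
        else                             diagnosis = "value match"
        printf "%-20s %-15s %-15s %s\n", name, first[name], value, diagnosis
    }' "$@"
}

# Sample data taken from the first post.
sample=$(mktemp)
printf '%s\n' 'ADP_Comment- 4758' 'ADP_Comment-is missing' \
              'cbdkbckd- 46983' 'cbdkbckd- 46983' \
              'ljhjg- 547' 'ljhjg- 980' > "$sample"
report=$(diagnose_pairs "$sample")
printf '%s\n' "$report"
rm -f "$sample"
```

Unlike the Perl version, this prints each row as soon as the second line for a name is read, so the rows come out in input order.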


grail 01-18-2011 08:44 PM

Or you could use a simple awk script:
Code:

awk '{current = $NF;getline; if($NF == current}print "match";else print "mismatch"}' file
Obviously you can make the output nicer, but you get the drift :)

smritisingh03 01-19-2011 09:56 AM

Thank you matthewg42. Your Perl script works wonders. Just one little problem: I need to hardcode the input file. How do I do that? I would be in a much better position if you could explain your script, as this is my first script in Perl. Thank you so much.

Hey grail... the awk script isn't working. Anyway, I am trying to fix the errors, and once done I'll paste it here for beginners like me :-) Thank you so much!!!

matthewg42 01-19-2011 01:56 PM

Perhaps I'm getting old, but I find your posts to be very difficult to understand. Please use proper English on these forums - you will get more, better and more accurate responses if your posts are intelligible.

I think you want to put the name of the input file in the program?

Replace this line:

Code:

while(<>) {
with:
Code:

open(IN, "<yourfilename") || die "cannot open yourfilename for reading: $!";
while(<IN>) {


smritisingh03 01-19-2011 03:55 PM

Hey Matthewg42

When I run the script with the above changes, the following happens:

Even when the counts match, the output says "value mismatch". Otherwise everything is working right!!!

Thank you very much.

matthewg42 01-19-2011 04:11 PM

Do you have extraneous whitespace after the numbers?

If this is the case, you could add this:
Code:

open(IN, "<yourfilename") || die "cannot open yourfilename for reading: $!";
while(<IN>) {
    chomp;
    s/\s+$//;
...
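
To see why trailing whitespace matters here: the script compares the two values as strings (with `ne`), so `46983` and `46983 ` with a trailing space count as different. A small shell sketch of the effect, and of the same kind of trimming that `s/\s+$//` performs; the variable names are only for illustration.

```shell
first='46983'
last='46983 '            # same digits, plus one trailing space

# Plain string comparison sees two different values.
if [ "$first" = "$last" ]; then raw_result=match; else raw_result=mismatch; fi

# Strip trailing whitespace, as s/\s+$// does in the Perl loop.
trimmed=$(printf '%s\n' "$last" | sed 's/[[:space:]]*$//')
if [ "$first" = "$trimmed" ]; then trimmed_result=match; else trimmed_result=mismatch; fi

echo "before trimming: $raw_result"
echo "after trimming:  $trimmed_result"
```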


matthewg42 01-19-2011 04:13 PM

If that doesn't work, please post the output contents of this command:
Code:

head -4 yourdatafile |od -tc
Paste it in [code] tags so that the formatting is readable.
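
As an illustration of what this command reveals (using made-up data, not the poster's file): `od -c` prints every byte, so a stray space or tab before the `\n` becomes visible. The `grep` count at the end is an extra quick check that is not part of the original suggestion.

```shell
# Made-up two-line sample; the first line ends with a stray space.
sample=$(mktemp)
printf 'ljhjg- 547 \nljhjg- 547\n' > "$sample"

# od -c shows each character; look for a blank column right before \n.
head -4 "$sample" | od -c

# Count the lines that end in whitespace.
trailing=$(grep -c '[[:space:]]$' "$sample")
echo "lines with trailing whitespace: $trailing"
rm -f "$sample"
```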

smritisingh03 01-19-2011 10:30 PM

The output of the above command is:

Quote:

0000000 # ! / u s r / b i n / p e r l \n
0000020 \n u s e s t r i c t ; \n u s e
0000040 w a r n i n g s ; \n
0000053
A sample of the script's output looks like:


Quote:

TABLE_CON_CSC_ROLE not found not found value match
CIM_ACCT_ADDRESS is 134416 value mismatch
TABLE_X79MIT not found not found value match
ZTABLE_BPM_STEP 4135 4112 value mismatch
TABLE_CASE_AFT_SCHUPD is not found value mismatch
TABLE_CAT_TXN_ROLE not found not found value match
TABLE_SUBSET not found not found value match
ZTABLE_SHOWPAGE 18 18 value match
TABLE_LEAD_FOLDER not found not found value match
TABLE_OPP_ANALYSIS not found not found value match
TABLE_PART_STATS 154 149 value mismatch
TABLE_PER_TER_ROLE not found not found value match
TABLE_ATT_CNT_STT not found not found value match
CASE_ACCN_ADD_TAB is 261 value mismatch
TABLE_DIAG_HINT not found not found value match
TABLE_PROC_INST 380 380 value match
ZTABLE_CTX_OBJ_DB 26645 26645 value match
ZTABLE_X_CRD_OPTION 42 not found value mismatch
TABLE_DS_CALC_FLD not found not found value match
TABLE_LST_PER_ROLE not found not found value match
TABLE_TEMPLATE not found not found value match
TABLE_X_MDU_CONTRACT not found not found value match
ZTABLE_FLD_DEF 457 457 value match
TABLE_WEB_LEASE not found not found value match
TABLE_X_OMS_ROUTING_INFO not found not found value match
ZTABLE_BPM_ALARM_EDIT not found not found value match
TABLE_CALL_SCRIPT 1 1 value match
TABLE_OBJECT_SECURE not found not found value match
TABLE_CYCLE_STAGE 9 9 value match
ZTABLE_QUEUE 235 204 value mismatch
ZTABLE_TRANSN_MAP 5 5 value match
TABLE_INV_COUNT not found not found value match
TABLE_TSK_STG_ROLE not found not found value match
TABLE_EXCH_LOG not found not found value match
The first column is the output after querying the tables in the database, so wherever "is" appears, that table is missing in the database. But after running the script it is picking up "not found". I don't understand where "not found" is coming from.

Thank you so much for helping me out. I really appreciate it!!!

matthewg42 01-19-2011 10:37 PM

I want the od -tc output for the data, not the script (I'm trying to see if there is trailing whitespace).

Solving a mystery like this is something of an addiction :D

*edit*: aha, I see from the data you posted that it's different from the original data, which explains the difference in what the program is doing.

*edit2*: no, wait I'm confused. urr, well anyhow, if you post the od command output for the input file, then that would be helpful.

smritisingh03 01-19-2011 11:34 PM

Wait... do you want me to include this command, i.e. head -4 yourdatafile |od -tc, in the script, or just type it at the command prompt?

grail 01-20-2011 12:16 AM

Quote:

head -4 yourdatafile |od -tc
Yes this should be run on the command line as it will show the structure of your input file.
Quote:

the awk script isn't working
Not running? (produces an error)
Shows unexpected results?

If you let me know which, and what the error is (if any), I can probably fix it quickly.

My guess is we probably need to see some more accurate data than was initially provided.

smritisingh03 01-20-2011 01:12 AM

Hi... when I type the above command at the command prompt I get the following output:

Quote:

0000000 A C C O U N T _ M I S S I N G _
0000020 F R M _ R C I S _ L I N K - i
0000040 s A C C O U N T _ M I S S I N
0000060 G _ F R M _ R C I S _ L I N K -
0000100 4 \n A D P _ C O M M E N T -
0000120 2 3 8 4 A D P _ C O M M E N
0000140 T - 2 3 1 1 \n A D P _ C O N
0000160 F I G - 1 1 A D P _ C O N F
0000200 I G - 1 1 \n A D P _ F I E L
0000220 D - 3 6 3 3 3 A D P _ F I E
0000240 L D - 3 6 3 2 3 \n
0000253

smritisingh03 01-20-2011 01:19 AM

When I run the awk one-liner, the error looks like this:

Quote:

syntax error The source line is 1.
The error context is
{current = $NF;getline; if($NF == >>> current} <<<
awk: The statement cannot be correctly parsed.
The source line is 1.
awk: The statement cannot be correctly parsed.
The source line is 1.
awk: There is an extra } character.
awk: There is a missing ) character.

smritisingh03 01-20-2011 02:02 AM

Hi guys... since you are all really trying to help me sort out the errors, I think I had better explain the requirements and what I have done. I think this will put all of us on the same page.

So, there are 4 parts to this script.
1. I have to read a logfile which looks like:

Quote:

. exporting pre-schema procedural objects and actions
. exporting foreign function library names for user SA
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user SA
About to export SA's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export SA's tables via Conventional Path ...
. . exporting table ACCOUNT_MISSING_FRM_RCIS_LINK 4 rows exported
. . exporting table ADP_COMMENT 2311 rows exported
. . exporting table ADP_CONFIG 11 rows exported
. . exporting table ADP_FIELD 36323 rows exported
. . exporting table ADP_HEADER 1 rows exported
Now the first script reads this logfile and produces an output file which has only the table name and the row count. The script looks like:

Quote:

logcountfunction()

{
awk ' {

# when executing the script pass the logfile as parameter on the command prompt with the name of the file

#export file_name= "&1"

if (index ( $0, ". . exporting table") >0 || index ($0, ". . exporting partition") >0)
#searching for the pattern in string

{


if ($4 != "partition"){
#$4 is either table or a partition

i=$6;
#$6 is the number of rows stored in variable i here

if (table_flag ==0 && table_name ==temp_table_name){
# checking the flag

printf table_name > "logcountOP";
printf "-" > "logcountOP";
print j > "logcountOP";
}

table_name = $5;
#$5 has the name of the table from which the rows have been imported

temp_table_name = table_name;

table_flag =1;
#setting the flag to 1

}

if ($4 == "partition") {

i=i+$6;
#summing up the rows in partitioned tables

printf("value of i in first: %d\n",i);
j=i;
table_flag=0;
#setting flag to 0

}
if (table_flag !=0 && $6 !=""){

printf table_name >> "logcountOP";
printf "-" >> "logcountOP";
print i >> "logcountOP";

}



}
} ' < $1
}
ignore the partition part please
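
Leaving partitions aside as requested, the table lines could be reduced to `NAME-COUNT` with a much shorter awk filter. This is only a sketch, under the assumption that every table line has exactly the shape shown in the log excerpt (`. . exporting table NAME COUNT rows exported`); the function name `log_counts` is made up for the example.

```shell
# Hypothetical helper: print "NAME-COUNT" for every exported table in the log.
log_counts() {
    # $5 is the table name, $6 the row count.
    awk '$1 == "." && $2 == "." && $3 == "exporting" && $4 == "table" && $6 ~ /^[0-9]+$/ {
        print $5 "-" $6
    }' "$@"
}

# A few lines copied from the log excerpt above.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
. exporting cluster definitions
. about to export SA's tables via Conventional Path ...
. . exporting table ACCOUNT_MISSING_FRM_RCIS_LINK 4 rows exported
. . exporting table ADP_COMMENT 2311 rows exported
EOF
counts=$(log_counts "$logfile")
printf '%s\n' "$counts"
rm -f "$logfile"
```

Writing to standard output rather than to a hardcoded file leaves the caller free to redirect, for example `log_counts export.log > logcountOP`.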

Now the second part:
I take the log output file and cut the table name into a variable. I query the database for the row count of each table. The script for this looks like:

Quote:

DBcounttry_finalfunction()
{

#!/bin/ksh



cat logcountOP | while read LINE

TBLName=`echo $LINE|cut -d "-" -f1`

do

if [ $LINE != "" ]

then


printf "${TBLName}-" $TBLName


return_count=$(sqlplus -s ab/ab@avfd5 <<EOF

set heading off feedback off pagesize 0 linesize 30000 trimout on;
whenever sqlerror exit 1;
whenever oserror exit 1;

select count (*) from ${TBLName};
exit 0;
EOF)

if [ $return_count -ge 0 ]
then
print "${TBLName}-${return_count}" >> DBcountOP4
else
echo "$TBLName- is missing" >> DBcountOP4
fi

else
#exit
break
fi
done > DBcountOP3

}
Now the third part.

I am ready with 2 output files: the log output file and the database output file. Now I have to compare these two output files to find whether:

1. the count matches, or
2. the count does not match, or
3. the table in the logfile does not exist in the database.

The script for the above comparison is:

Quote:

#!/bin/ksh


compareLOGandDBtry1function()

{



export file1="DBcountOP4"
export file2="logcountOP"

#match=0


while read FILE1_LINE ; do
#reLOGandDBtry1reading the output file from database and the first string is stored in the variable LINE


file1_tablename="$(echo $FILE1_LINE | cut -d '-' -f1)"
#the 1st field is stored i.e,tablename


file1_count="$(echo $FILE1_LINE | cut -d '-' -f2)"
#the 2nd filed is stored i.e,the count of rows for the table


if [ $file1_count != "is" ]

then
#echo "File count is greater than 0"
while read FILE2_LINE ; do
#reading the OP file from log and stores the 1st string



file2_tablename="$(echo $FILE2_LINE | cut -d '-' -f1)"
#the 1st field is stored i.e,tablename


file2_count="$(echo $FILE2_LINE | cut -d '- ' -f2)"
#the 2nd fild stored-the count of rows for the table

if [ "$file1_tablename" = "$file2_tablename" ] && [ "$file1_count" -eq "$file2_count" ]
#start of 1st if block

#if the tablename and the rowcount from the 2 files match then print the following to OP file

then
echo "Table name $file1_tablename $file2_tablename and $file2_count matched"
echo "Both table and Count has matched $file1_tablename $file2_count" >> compareLOGandDBtry1OP
break

fi #end of 1st if block

if [ "$file1_tablename" = "$file2_tablename" ] && [ "$file1_count" -ne "$file2_count" ]
#start of 2nd if
#checking for match b/w tables but a mismatch b/w the rowcount from logfile and the DB output file

then
echo "$file2_tablename table Match count mismatch" "$file1_count" "$file2_count"
echo "table has matched but count does not match $file1_tablename $file1_count $file2_count" >>compareLOGandDBtry1OP
break
fi
# end of 2nd if block

done < $file2
else

echo "$file1_tablename does not exist" >> compareLOGandDBtry1OP
echo "$file1_tablename does not exist"

fi

done < $file1




#

}
As you can see, all of the above are functions. So the main script, which calls the above functions, is as below:

Quote:

#MAIN SCRIPT STARTS HERE
#!/bin/sh

echo "please exit and execute the script again alongwith the logfile name"

. ./logcount

logcountfunction "$1"

. ./DBcounttry_final

DBcounttry_finalfunction

. ./compareLOGandDBtry1

compareLOGandDBtry1function

This script is working fine. The first 2 functions execute properly, but the moment the 3rd function, i.e. the comparing function, runs, it takes almost an hour to give the output!!! That is where the problem is.

So to expedite the script I used the comm command for the comparison.
Quote:

#!/bin/sh

comm -3 logcountOP DBcountOP4 >c.txt

#shows diff


comm -1 logcountOP DBcountOP4 >d.txt

#shows common
~
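
A note on the flags, with a small made-up example: `comm` expects both inputs to be sorted and prints three columns (lines only in the first file, lines only in the second file, lines in both). Each digit suppresses that column, so `-3` leaves the differences, while `-1` on its own still shows the lines unique to the second file alongside the common ones; `-12` is the combination that prints only the common lines. The file contents below are illustrative.

```shell
a=$(mktemp); b=$(mktemp)
# Both files must already be sorted for comm to work as expected.
printf '%s\n' 'ADP_COMMENT-2311' 'ADP_CONFIG-11' > "$a"
printf '%s\n' 'ADP_COMMENT-2384' 'ADP_CONFIG-11' > "$b"

common=$(comm -12 "$a" "$b")      # lines present in both files
only_a=$(comm -23 "$a" "$b")      # lines only in the first file
only_b=$(comm -13 "$a" "$b")      # lines only in the second file

echo "common: $common"
echo "only in first: $only_a"
echo "only in second: $only_b"
rm -f "$a" "$b"
```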
And then I wrote a script that gives the following output:

Quote:

ACCOUNT_MISSING_FRM_RCIS_LINK- is #means missing
ACCOUNT_MISSING_FRM_RCIS_LINK-4
ADP_COMMENT- 2384
ADP_COMMENT-2311
ADP_CONFIG- 11
ADP_CONFIG-11
ADP_FIELD- 36333
ADP_FIELD-36323
I wrote another script that outputs the following:

ACCOUNT_MISSING_FRM_RCIS_LINK- is ACCOUNT_MISSING_FRM_RCIS_LINK-4
#is means the table is missing in database
ADP_COMMENT- 2384 ADP_COMMENT-2311
ADP_CONFIG- 11 ADP_CONFIG-11
ADP_FIELD- 36333 ADP_FIELD-36323
ADP_HEADER- 1 ADP_HEADER-1

Here the problem is that some tables get repeated any number of times in the output file.
Now the last script is:

Quote:

#!/bin/sh

export file="newcOP2.txt"

while read FILE_LINE ; do

tablename1="$(echo $FILE_LINE | cut -d '-' -f1)"

echo tablename1 is $tablename1

count1="$(echo $FILE_LINE | cut -d ' ' -f2)"

echo count1 is $count1

LINE2="$(echo $FILE_LINE | cut -d ' ' -f3)"
tablename2="$(echo $LINE2 | cut -d '-' -f1)"
count2="$(echo $LINE2 | cut -d '-' -f2)"

echo tablename2 is $tablename2

echo count2 is $count2

if [ $count1 = "is" ]

then

echo missing

echo status for $tablename1- does not exist in DB >> statusOP

elif [ $count1 -eq $count2 ]

then

echo match

echo status for $tablename2- match >> statusOP

elif [ $count1 -ne $count2 ]

then

echo mismatch
echo status for $tablename2- mismatch >> statusOP


fi

done < $file
Finally I am getting the output, but the problem is that in the course of reaching the output I have made a mess!!! Please help me simplify the script. It would help if you could give me a script that simply compares the log output file and the database output file. Thank you!!!

grail 01-20-2011 05:02 AM

Well in answer to your question why the awk script failed, this is because of a simple typo:
Code:

awk '{current = $NF;getline; if($NF == current)print "match";else print "mismatch"}' file
However, it would not have worked anyway as your input shown in the last post has no white space and is delimited by a dash (-)
So based on that, the following would be the small change required:
Code:

awk -F"-" '{current = $NF;getline; if($NF == current)print "match";else print "mismatch"}' file
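
A slightly more defensive variant of the same idea, as a sketch: it checks that `getline` actually read a second line (so an odd trailing line is reported rather than compared with itself), trims blanks around the value, and reports the `is` / `is missing` case separately. The output labels and the function name `compare_pairs` are illustrative, and it assumes each pair sits on two separate lines, as in the first post.

```shell
# Hypothetical helper: compare the value after the dash on each pair of lines.
compare_pairs() {
    awk -F'-' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
        name = trim($1)
        current = trim($NF)
        # getline returns 1 on success and 0 at end of input.
        if ((getline) <= 0) { print name ": unpaired line"; exit }
        other = trim($NF)
        if (current ~ /^is/ || other ~ /^is/) print name ": missing"
        else if (current == other)            print name ": match"
        else                                  print name ": mismatch"
    }' "$@"
}

result=$(printf '%s\n' 'ADP_Comment- 4758' 'ADP_Comment-is missing' \
                       'cbdkbckd- 46983' 'cbdkbckd- 46983' \
                       'ljhjg- 547' 'ljhjg- 980' | compare_pairs)
printf '%s\n' "$result"
```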
I would mention, and of course it is your prerogative, but it is unusual to write a (ba)sh script that then calls a ksh script, especially when you try to include
it within a function:
Code:

DBcounttry_finalfunction()
{

#!/bin/ksh

I am not even aware that this would work as my understanding is the interpreter needs to be the first line (I could be wrong).

Is there a reason for the mash up? On a quick scan there does not appear to be anything particularly ksh specific that required this?

Anyhoo, let me know how you get on.
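
For the simpler comparison asked for above, one option is to skip the intermediate merged file and let awk look each table up by name: read the log counts into an array, then walk the database counts once. A sketch, assuming both files use `NAME-VALUE` lines as in the excerpts (with `is` or `is missing` as the value when the table does not exist); the function name `compare_counts` and the "not in log" wording are invented for the example.

```shell
# Hypothetical helper. Usage: compare_counts LOG_COUNTS_FILE DB_COUNTS_FILE
compare_counts() {
    awk -F'-' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # First file (log counts): remember the count for each table name.
    NR == FNR { logcount[trim($1)] = trim($2); next }
    # Second file (database counts): look the table up and classify it.
    {
        name = trim($1); db = trim($2)
        if (!(name in logcount))                status = "not in log"
        else if (db ~ /^is/)                    status = "does not exist in DB"
        else if (db + 0 == logcount[name] + 0)  status = "match"
        else                                    status = "mismatch"
        print "status for " name "- " status
    }' "$1" "$2"
}

logf=$(mktemp); dbf=$(mktemp)
printf '%s\n' 'ACCOUNT_MISSING_FRM_RCIS_LINK-4' 'ADP_COMMENT-2311' 'ADP_CONFIG-11' > "$logf"
printf '%s\n' 'ACCOUNT_MISSING_FRM_RCIS_LINK- is' 'ADP_COMMENT- 2384' 'ADP_CONFIG- 11' > "$dbf"
status=$(compare_counts "$logf" "$dbf")
printf '%s\n' "$status"
rm -f "$logf" "$dbf"
```

Because each table is looked up in an array instead of rescanning the other file with `cut` for every line, each file is read only once, which should help with the hour-long run time described earlier.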

cristalp 08-28-2012 04:36 AM

Suppose you have a large file with thousands of lines, and a few of those lines differ from the line next to them. You may want to simply print their line numbers with this code:

Code:


awk '{current = $NF; getline; if($NF != current) print NR}'

If you want to check the number of fields rather than the content, you can try similar code:

Code:


awk '{current = NF; getline; if(NF != current) print NR}'

Hope this additional solution can help those who have questions that are related but not exactly the same.
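
One thing to keep in mind with the `getline` approach: it consumes lines in non-overlapping pairs (1-2, 3-4, ...), and `NR` then refers to the second line of the pair. If every line should instead be compared with the one directly before it, a sliding comparison can be sketched like this (the function name `changed_lines` and the sample input are made up):

```shell
# Hypothetical helper: print the number of each line whose last field
# differs from the last field of the line directly before it.
changed_lines() {
    awk 'NR > 1 && $NF != prev { print NR } { prev = $NF }' "$@"
}

lines=$(printf '%s\n' 'a 1' 'b 1' 'c 2' 'd 2' 'e 3' | changed_lines)
echo "$lines"
```

On this sample the pairwise version prints nothing, because the changes fall between pairs (lines 2-3 and 4-5), whereas the sliding version reports lines 3 and 5.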


All times are GMT -5.