LinuxQuestions.org

Programming forum: Comparing and Formatting the text file (https://www.linuxquestions.org/questions/programming-9/comparing-and-formatting-the-text-file-837330/)

flamingo_l 10-11-2010 01:23 AM

Comparing and Formatting the text file
 
hi,

I need a script that can format the text file below, which contains comments:


Code:


file1.txt
--------
 
START
Name: some value
Date:
Function Name: .....
...................
Changes:.............
.....................
END
START
Date:
Name: some value
Function Name: .....
...................
Changes:.............
.....................
END
.................
...................




Output should be:


Code:

Name|Date|Function Name|Changes

The script should match each sub-heading (column name) and print its value in the format shown above.

I am working on this; can anybody please help me?

grail 10-11-2010 02:29 AM

Happy to help ... what have you got so far? Are you using a particular app/script/language?

flamingo_l 10-11-2010 04:10 AM

Yes, Linux script.

I have the following script, but it only works for 3 columns.

I need it to work for the 4 columns in the sample file above.

Quote:

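BEGIN {
    FS=":" ; OFS="|"
    num=split("Name,Date,Changes",cols,",")
    print cols[1],cols[2],cols[3]
}
$1 == "START" { split("", res); next }    # new block: clear the previous values
{
    s=$2
    sub(/^ */, "", s)                     # drop the space after the colon
    if ($1 == "END") print res[1], res[2], res[3]
    else
    {
        # a line without ":" (NF == 1) continues the Changes value
        if (res[3] != "" && NF == 1)
            res[3]=res[3]" "$1
        for(i=1; i<=num; i++)
        {
            if (cols[i] == $1)
                res[i]=s
        }
    }
}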
Sample file:
Quote:

START
Name: some value
Date:
Changes:Change A
more of change A
END

Now I am trying to apply it to four fields, and it is not working.

I need this urgently. Please help me.
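One likely reason the script breaks with four columns is that every continuation line is appended to res[3], so a wrapped Function Name or Changes value ends up in the wrong column. A minimal sketch of the same line-by-line idea (not from the thread), driven entirely by the column list and appending each continuation line to the most recently seen sub-heading, might look like this. The helper name fmt_table is hypothetical; the four column names come from the sample file, and sub-headings may appear in any order.

```shell
#!/bin/sh
# Sketch: turn START...END blocks into "|"-separated rows.
# fmt_table is a hypothetical helper name; the columns are the four from the thread.
fmt_table() {
    awk '
    BEGIN {
        OFS = "|"
        ncols = split("Name,Date,Function Name,Changes", cols, ",")
        for (i = 1; i <= ncols; i++) want[cols[i]] = 1
        hdr = cols[1]
        for (i = 2; i <= ncols; i++) hdr = hdr OFS cols[i]
        print hdr                      # Name|Date|Function Name|Changes
    }
    $0 == "START" { split("", res); key = ""; next }   # new block: forget old values
    $0 == "END" {                      # end of block: one row in column order
        row = res[cols[1]]
        for (i = 2; i <= ncols; i++) row = row OFS res[cols[i]]
        print row
        next
    }
    {
        p = index($0, ":")
        k = (p > 0) ? substr($0, 1, p - 1) : ""
        if (k in want) {               # "Sub-heading: value" line, in any order
            key = k
            val = substr($0, p + 1)
            sub(/^ +/, "", val)        # drop the space after the colon
            res[key] = val
        } else if (key != "") {        # any other line continues the last value
            res[key] = res[key] " " $0
        }
    }'
}
```

Usage: `fmt_table < file1.txt`. Continuation lines are joined with a single space, so `(jjjjjjjjj,` followed by `fjddddd, gggg, ggg)` comes out as `(jjjjjjjjj, fjddddd, gggg, ggg)`.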

grail 10-11-2010 04:26 AM

I'm not sure I understand. Do you want the literal words Date, Name, etc., or the values they refer to? Or are the words you have written literally what they say, i.e. is Date an actual date?

If your sample is:
Code:

START
Name: some_name
Date: some_date
Function Name: some_function
Changes:Change A
more of change A
END

What would be the output you require, based on this as input?

flamingo_l 10-11-2010 04:47 AM

hi Grail,

The code I gave above would work for your sample file.

But if the function name spans more than one line, the code does not work properly.

I need code for this sample file:
Quote:

START
Name: some_name
Date: some_date
Function Name: some_function_name(jjjjjjjjj,
fjddddd, gggg, ggg)
Changes:Change A
more of change A
END
START
Date: some_date
Name: some_name
Function Name: some_function_nameB(jjjjjjjjj,
fjddddd, gggg, ggg)
Changes:Change B
more of change B
END
Also, the order of the sub-headings Name, Date, Function Name and Changes may vary.

flamingo_l 10-11-2010 05:53 AM

To make it easier to extract the values of Function Name and Changes, I can, if needed, place delimiters at the start and end of those values, as follows:


Quote:

START
Name: some_name
Date: some_date
Function Name: <some_function_name(jjjjjjjjj,
fjddddd, gggg, ggg)>
Changes:<Change A
more of change A>
END
START
Date: some_date
Name: some_name
Function Name: <some_function_nameB(jjjjjjjjj,
fjddddd, gggg, ggg)>
Changes:<Change B
more of change B>
END

Since I am new to awk programming, I don't know how to traverse a given field.
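If the < > delimiters are added, one way to traverse a multi-line value is to keep a flag that is set at the opening < and cleared at the closing >, appending every line in between. The sketch below (not from the thread) assumes each delimited value starts right after its sub-heading's colon and ends with > at the end of a line; read_delimited is a hypothetical name, and it simply prints "sub-heading = value" to show what was collected.

```shell
#!/bin/sh
# Sketch: read values that may be wrapped in < ... > across several lines.
# read_delimited is a hypothetical helper; it prints "sub-heading = value".
read_delimited() {
    awk '
    {
        if (inside) {                      # inside < ... >: the whole line belongs to the value
            val = val " " $0
        } else {
            p = index($0, ":")
            if (p == 0) next               # START, END and other lines without a colon
            key = substr($0, 1, p - 1)
            val = substr($0, p + 1)
            sub(/^ +/, "", val)
            if (substr(val, 1, 1) == "<") {    # opening delimiter
                inside = 1
                val = substr(val, 2)
            }
        }
        if (inside && substr(val, length(val), 1) == ">") {   # closing delimiter
            inside = 0
            val = substr(val, 1, length(val) - 1)
        }
        if (!inside) print key " = " val   # value complete: print it
    }'
}
```

Usage: `read_delimited < file.txt`. Values without delimiters are printed as soon as their line is read.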

grail 10-11-2010 07:29 AM

See what ya think:
Code:

#!/usr/bin/awk -f

BEGIN{
        RS="(START|END)\n"
        FS=":"
        OFS="|"
}

NF>0{
    counter++
    for(indx=1;indx<=NF;indx++){
        if($indx ~ "\n"){
            n=split($indx,pieces,"\n")
            if(n == 2)
                arr[counter,val]=pieces[1]
            else
                for(z=1;z<n;z++)
                    if(z > 1)
                        arr[counter,val]=arr[counter,val]"\n"pieces[z]
                    else
                        arr[counter,val]=pieces[z]

            val=pieces[n]
        }
        else
            val=$indx

        if(!(val in array_vals) && val != "")
            array_vals[val]++
    }
}

END{
    for(y=1;y<=counter;y++)
        for(u in array_vals)
            print u,arr[y,u]
}

May need some refining, but it seems to work for the given examples.

flamingo_l 10-12-2010 01:42 AM

hi Grail,

The code is not working.

I saved your code in a file named awk-script and removed the first line, because it was throwing an error.
The sample file is saved in file.txt.

I ran it like this:

Quote:

awk -f awk-script file.txt

But the output is:

Quote:

Function Name| <some_function_name(jjjjjjjjj,
fjddddd, gggg, ggg)>
Date| some_date
Changes|<Change A
more of change A>
Name| some_name
Function Name| <some_function_nameB(jjjjjjjjj,
fjddddd, gggg, ggg)>
Date| some_date
Changes|<Change B
more of change B>
END
Name| some_name
But the expected output is:


Quote:

Name |Date|Function Name|Changes
some_name|some_date|some_function_name(jjjjjjjjj,fjddddd, gggg, ggg)|Change A more of change A
some_name|some_date|some_function_nameB(jjjjjjjjj,fjddddd, gggg, ggg)|Change B more of change B

grail 10-12-2010 01:46 AM

So you will need to do the formatting part yourself, but as you can see, the data in both outputs is mostly the same. Only the part enclosed by:

END{}

needs to change for the formatting.

flamingo_l 10-12-2010 07:44 AM

hi Grail,

I need to understand the logic so that I can format the output accordingly.
Can you please explain it to me?
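To see what grail's loop works with, it helps to split one record the same way. This sketch (not part of grail's script) types in a record by hand, as it would look once RS has removed the START/END lines, and splits it on ":" exactly as FS=":" does; {NL} marks the embedded newlines. The helper name show_fields is hypothetical.

```shell
#!/bin/sh
# Sketch: split one record on ":" the way grail's script does (FS=":").
# The record is typed in by hand, as it looks once RS has removed START/END.
show_fields() {
    awk 'BEGIN {
        rec = "Name: some_name\nDate: some_date\nChanges:Change A\nmore of change A\n"
        n = split(rec, f, ":")
        for (i = 1; i <= n; i++) {
            t = f[i]
            gsub(/\n/, "{NL}", t)          # make the embedded newlines visible
            printf "field %d = [%s]\n", i, t
        }
    }'
}
show_fields
```

This prints `field 1 = [Name]`, `field 2 = [ some_name{NL}Date]`, `field 3 = [ some_date{NL}Changes]` and `field 4 = [Change A{NL}more of change A{NL}]`. Each field after the first holds the value of the previous key, a newline, and then the next key. That is why the script splits each field on "\n", stores everything before the last piece under the current val, and then sets val to the last piece (the next key). The final field ends with a newline, so its last piece is empty and is not stored as a key.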

flamingo_l 10-12-2010 08:51 AM

hi Grail,

I have an idea: instead of writing a new awk script for the formatting, can we merge the continuation lines of Changes and Function Name, so that the awk script I posted above can do the formatting?

Suppose I have the following file:

Quote:

Name: some_name
Date: some_date
Function Name: <some_function_name(jjjjjjjjj,
fjddddd, gggg, ggg)>
Changes:<Change A
more of change A>
Name: some_name
Date: some_date
Function Name: some_function_nameB(jjjjjjjjj,
fjddddd, gggg, ggg)
Changes:Change B
more of change B
I need a script that can merge the lines based on the sub-headings.
Expected output is:




Quote:

Name: some_name
Date: some_date
Function Name: some_function_name(jjjjjjjjj,fjddddd, gggg, ggg)
Changes:Change A more of change A
Name: some_name
Date: some_date
Function Name: some_function_nameB(jjjjjjjjj,fjddddd, gggg, ggg)
Changes:Change B more of change B
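A sketch of such a merge step (not from the thread), assuming every new field starts at the beginning of a line with one of the four sub-headings and that any < or > characters are only delimiters (all of them are removed). START and END lines, if present, are passed through unchanged so that a script keyed on END still sees them. merge_lines is a hypothetical name.

```shell
#!/bin/sh
# Sketch: join continuation lines onto their sub-heading line and drop < >.
# merge_lines is a hypothetical helper; the four sub-headings come from the thread.
merge_lines() {
    awk '
    function flush() {                   # print the field collected so far
        gsub(/[<>]/, "", buf)            # remove the optional delimiters
        if (buf != "") print buf
        buf = ""
    }
    $0 == "START" || $0 == "END" { flush(); print; next }   # pass block markers through
    /^(Name|Date|Function Name|Changes):/ { flush(); buf = $0; next }   # new field
    { buf = (buf == "") ? $0 : buf " " $0 }                # continuation line
    END { flush() }
    '
}
```

Usage: `merge_lines < file.txt`. Note that the sketch joins every continuation line with a space, so it produces `(jjjjjjjjj, fjddddd, gggg, ggg)` rather than the space-less `(jjjjjjjjj,fjddddd, gggg, ggg)` shown in the expected output.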

grail 10-12-2010 10:17 AM

So, using your input from post #5 and the small changes to the script below, see the output:
Code:

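#!/usr/bin/awk -f

# Note: a multi-character (regex) RS like this works in GNU awk (gawk)
# and mawk; POSIX leaves it unspecified, and some older awks use only
# its first character.
BEGIN{
        RS="(START|END)\n"    # each START...END block becomes one record
        FS=":"                # fields are split at every colon
        OFS="|"
}

NF>0{                         # skip the empty records between blocks
    counter++                 # block number
    for(indx=1;indx<=NF;indx++){
        # A field such as " some_name\nDate" holds the value of the previous
        # key (before the last newline) and the next key (after it).
        if($indx ~ "\n"){
            n=split($indx,pieces,"\n")
            if(n == 2)
                arr[counter,val]=pieces[1]
            else
                # multi-line value: join all pieces but the last with spaces
                for(z=1;z<n;z++)
                    if(z > 1)
                        arr[counter,val]=arr[counter,val]" "pieces[z]
                    else
                        arr[counter,val]=pieces[z]

            val=pieces[n]     # the last piece is the next key
        }
        else
            val=$indx         # a field without a newline (normally $1) is a key

        if(!(val in array_vals) && val != "")
            array_vals[val]++
    }
}

END{
    print "Name|Date|Function Name|Changes"
    for(y=1;y<=counter;y++)
        print arr[y,"Name"],arr[y,"Date"],arr[y,"Function Name"],arr[y,"Changes"]
}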

Run script as:
Code:

./script.awk input_file
Output based on input above:
Code:

Name|Date|Function Name|Changes
 some_name| some_date| some_function_name(jjjjjjjjj, fjddddd, gggg, ggg)|Change A more of change A
 some_name| some_date| some_function_nameB(jjjjjjjjj, fjddddd, gggg, ggg)|Change B more of change B

I feel you should do some of the hard yards yourself and look up your awk reference material to work out the how and why.

Post back on anything you get stuck on.
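The values in this output keep the space that follows each colon (with FS=":", " some_name" is everything after the colon). One simple way to drop it, shown here as a post-processing filter rather than a change to grail's script, is to trim every "|"-separated field. trim_fields is a hypothetical name.

```shell
#!/bin/sh
# Sketch: strip leading/trailing blanks from every "|"-separated field.
# trim_fields is a hypothetical helper meant to be used as a filter after the script.
trim_fields() {
    awk 'BEGIN { FS = OFS = "|" }
    {
        for (i = 1; i <= NF; i++) {
            sub(/^[ \t]+/, "", $i)       # leading blanks (e.g. the space after ":")
            sub(/[ \t]+$/, "", $i)       # trailing blanks
        }
        print
    }'
}
```

Usage: `./script.awk input_file | trim_fields`. Alternatively, the same `sub(/^ +/, "", ...)` could be applied inside the script where arr is assigned.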

flamingo_l 10-13-2010 02:25 AM

Thanks Grail. It is working perfectly. :)

I debugged each and every line and now understand the logic.

I don't think the following code is required. As I understand it, it only records each val as an index of the array array_vals, and array_vals is never used in the END block. With it removed, the code works fine.

Please correct me if I am wrong.

Quote:

if(!(val in array_vals) && val != "")
array_vals[val]++

grail 10-13-2010 03:16 AM

Yeah that was for older stuff and can be removed.


All times are GMT -5.