LinuxQuestions.org


indiancosmonaut 02-24-2009 09:50 AM

A data parsing problem - AWK
 
Hi,

I am having trouble parsing the following data.

Code:

Tom
AGE:25
EDINBURGH

Dick
AGE:24
AMSTERDAM

Harry
AGE:23
BRUSSELS

I am writing code to generate the following CSV file.

Code:

"AGE:25 EDINBURGH","AGE:24 AMSTERDAM","AGE:23 BRUSSELS"

I started writing the code, but I cannot work out how to get this data because of the following constraints:

a) The data blocks can be in any order.

My code so far:

Code:

function init_array()
{
        for (del in feed)
                delete feed[del];

        i=0;

        _Tom_ = 0;
        _Dick_ = 0;
        _Harry_ = 0;
}

BEGIN { i=0; }

{ record = $0; }

/^$/ { next; }

/Tom/ {
        print "Now reading :Tom:";
        init_array();
        _Tom_ = 1;
        next;
}

/Dick/ {
        print "Now reading :Dick:";
        init_array();
        _Dick_ = 1;
        next;
}

/Harry/ {
        print "Now reading :Harry:";
        init_array();
        _Harry_ = 1;
        next;
}

{

        print NR" : "record;
        feed[i]=record;
        i++;
}


function one()
{
        for (j in feed)
                ret=ret"\n"feed[j];

        return (sprintf("%s",ret));
}

# if (_Tom_ == 1)
#         Tom = one();
# ..... (But this doesn't seem right)


END { print Tom,",", Dick,",", Harry; };

It's incomplete, but I would appreciate some help.

Kind regards,

indiancosmonaut

amani 02-24-2009 10:34 AM

Basically, you need to read the original as a CSV, take the string on each 3x-th line, concatenate it to the string on the (3x-1)-th line, and form a new CSV from the new (3x-1)-th lines. You seem to want column names only, with zero rows, which is easy. It takes about one or two lines in R.
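
The line-position idea can also be sketched in awk rather than R. This is only a sketch under assumptions the thread does not state: every block has exactly three non-blank lines (name, age, city), blank separator lines are not counted, and `join_by_position` is a hypothetical helper name chosen for illustration.

```shell
# Hypothetical helper: assumes every block is exactly three non-blank
# lines (name, age, city); blank separator lines are ignored.
join_by_position() {
    awk '
    /^$/ { next }                      # skip the blank separator lines
    { n++ }                            # n counts non-blank lines only
    n % 3 == 2 { age = $0 }            # line 3x-1 of a block: the AGE line
    n % 3 == 0 {                       # line 3x of a block: the city line
        out = out sep "\"" age " " $0 "\""
        sep = ","
    }
    END { print out }
    '
}

printf 'Tom\nAGE:25\nEDINBURGH\n\nDick\nAGE:24\nAMSTERDAM\n\nHarry\nAGE:23\nBRUSSELS\n' |
    join_by_position
```

With this approach the output follows the order of the blocks in the input, and it breaks as soon as a block has more or fewer than three lines.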

Or just delete the 3x-th lines.
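
If the blocks can arrive in any order, as the original post says, counting lines still finds the age and city, but the output order then follows the input. One possible alternative, again only a sketch, is awk's paragraph mode (`RS = ""`), storing each block under its name line and printing the blocks in a fixed order. The helper name `blocks_to_csv` and its comma-separated order argument are assumptions made for illustration, as is the choice to print an empty pair of quotes for a name that is missing from the input.

```shell
# Hypothetical helper: stores each block under its name line, then
# prints the blocks in the order given by the first argument.
blocks_to_csv() {
    awk -v order="$1" '
    BEGIN { RS = ""; FS = "\n" }       # paragraph mode: one record per block, one field per line
    {
        entry = $2                     # $1 is the name; join the remaining lines with spaces
        for (k = 3; k <= NF; k++)
            entry = entry " " $k
        feed[$1] = entry
    }
    END {
        n = split(order, names, ",")
        for (k = 1; k <= n; k++)       # a name absent from the input yields ""
            out = out (k > 1 ? "," : "") "\"" feed[names[k]] "\""
        print out
    }
    '
}

# Blocks deliberately out of order; output still follows Tom,Dick,Harry.
printf 'Harry\nAGE:23\nBRUSSELS\n\nTom\nAGE:25\nEDINBURGH\n\nDick\nAGE:24\nAMSTERDAM\n' |
    blocks_to_csv 'Tom,Dick,Harry'
```

Keying the array by the name line replaces the separate `_Tom_`, `_Dick_`, and `_Harry_` flags from the question, so adding another person only means extending the order argument.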

