format-data-using-sed-or-awk

Abhayman · 01-11-2018, 12:01 AM

Hi,

I have data in the format below in a file:

Code:

Section : A1234,
Name : ABCBDEDF,
Age : No,   
Name : Reporting,
Age : No
Section : XYSZA,
Name : Work,
Age : YES

I am trying to get the data into the format below:

Code:

Section : A1234,Name : ABCBDEDF,Age : No
Section : A1234,Name : Reporting,Age : No
Section : XYSZA,Name : Work,Age : YES

I tried a few sed and awk statements, but I was only able to merge all the rows together.

Code:

awk '{key=$0; getline; print key "" $0;}' test.txt

Any help is appreciated.

syg00 · 01-11-2018, 12:18 AM

You are merely merging every pair of lines. The simplest approach might be to save the "Name" record as well as the key (presumably "Section") record, then print when an "Age" record is found.

AnanthaP · 01-11-2018, 01:32 AM

Look carefully at the data. One section can contain many names.
You need 2 variables, KEY and TEXT
When "Section" pattern is encountered, assign $0 to KEY.
When "Name" pattern is encountered, concatenate KEY (which contains the Section line) and this $0 (the Name line) into TEXT.
When "Age" pattern is encountered, append $0 to the variable TEXT, print it out, and reset TEXT to empty.

Finally (on END), if TEXT variable contains a value, then print it out.
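
The steps above can be sketched roughly as follows. This is only a sketch under a few assumptions: the function name merge_by_section is made up for illustration, the patterns are anchored at the start of each line, and the sub() call is an extra that is not part of the steps above (it strips the stray trailing comma and blanks that the first "Age" line carries in the sample data).

```shell
# merge_by_section is a hypothetical helper name, not something from the thread.
merge_by_section() {
  awk '
    /^Section/ { key = $0; next }        # KEY: remember the current section line
    /^Name/    { text = key $0; next }   # TEXT: start a record as section + name
    /^Age/ {
      sub(/,?[ \t]*$/, "")               # extra step: drop a trailing comma and blanks
      print text $0                      # record complete: print it
      text = ""                          # reset TEXT for the next record
    }
    END { if (text != "") print text }   # flush a record that never got an "Age" line
  ' "$@"
}

# Sample data from the first post, fed on standard input.
merge_by_section <<'EOF'
Section : A1234,
Name : ABCBDEDF,
Age : No,
Name : Reporting,
Age : No
Section : XYSZA,
Name : Work,
Age : YES
EOF
```

With that input it should print the three merged lines asked for in the first post. A file name can be passed instead of standard input, e.g. merge_by_section test.txt.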

OK

MadeInGermany · 01-11-2018, 06:19 AM

If you hard-code "Name" and "Age", then it is like this.
There are still two techniques, one with getline and one with variables.
I always prefer the variables.
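
For comparison, the getline technique might look something like the sketch below. It assumes that every "Name" line is immediately followed by its "Age" line, which is exactly the kind of assumption that makes getline fragile. The name merge_with_getline is hypothetical, and the sample input is the data from the first post with the trailing comma on the first "Age" line removed.

```shell
# merge_with_getline is a hypothetical helper name, not something from the thread.
merge_with_getline() {
  awk '
    /^Section/ { key = $0; next }   # remember the current section line
    /^Name/ {
      name = $0
      # Plain getline replaces $0 with the next input line.
      # It returns 1 on success, 0 at end of input and -1 on error.
      if ((getline) > 0)
        print key name $0           # $0 is now assumed to be the "Age" line
      else
        print key name              # input ended right after a "Name" line
    }
  ' "$@"
}

printf '%s\n' \
  'Section : A1234,' 'Name : ABCBDEDF,' 'Age : No' \
  'Name : Reporting,' 'Age : No' \
  'Section : XYSZA,' 'Name : Work,' 'Age : YES' | merge_with_getline
```

If a "Name" line were followed by anything other than its "Age" line, this version would silently glue the wrong lines together, whereas the variable-based approach only prints when it actually sees "Age".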

Here is an advanced solution that uses many variables.
It only depends on "Section" but does not need to know what follows.
If "Section" is met, it saves the line in key, sets the state variable first, and jumps to the next input cycle.
If first is set, we must be on the line right after "Section"; there it learns the first following key ffkey, "Name" in this case.
Because it does not know the ending key, it has to delay the printing of the final EOL ("\n" or ORS in awk). That EOL is printed when the next record starts, i.e. at the next line whose key is ffkey (I put it into that line's printing), or in the END section.

Code:

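awk '
  $1=="Section" { key=$0; first=1; next }   # remember the section line
  first==1 { ffkey=$1; first=0 }            # learn the key that starts each record
  { printf "%s", ($1==ffkey ? (ors key $0) : $0); ors=ORS }   # new record: end the previous line, then prefix the section
  END { printf "%s", ors }                  # terminate the last line, if anything was printed
' test.txt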

The last trick is not to print the ORS before the first record. Instead I print the variable ors, which is empty at first and is set to ORS later.

firstfire · 01-16-2018, 11:47 PM

Hi.

How about

Code:

$ awk -F' : ' '{f[$1] = $2;} $1=="Age" { print "Section :", f["Section"] "Name : " f["Name"], "Age : " f["Age"] }' /tmp/data.txt
Section : A1234,Name : ABCBDEDF, Age : No,
Section : A1234,Name : Reporting, Age : No
Section : XYSZA,Name : Work, Age : YES

This may be more suitable if there are more than 3 fields:

Code:

$ awk -F' : ' '{f[$1] = $2;} $1=="Age" {n=split("Section,Name,Age", keys, ","); for(i=1; i<=n; i++) {k=keys[i]; printf("%s : %s", k, f[k])}; printf("\n") }' /tmp/data.txt
Section : A1234,Name : ABCBDEDF,Age : No,
Section : A1234,Name : Reporting,Age : No
Section : XYSZA,Name : Work,Age : YES

keefaz · 01-17-2018, 11:15 AM

Using perl

Code:

perl -lne '/^S/ and $s=$_ or $v.=$_; /^A/ and print($s,$v), $v=""' test.txt

Edit: converted to awk

Code:

awk '/^S/{s=$0; next} {v=v$0} /^A/{print s v; v=""}' test.txt