rows to column headers

atjurhs · 12-11-2012, 12:01 PM

Hi guys,

I have a very large file with column data that doesn't have any headers

it is space separated, sometimes by several spaces, and it's too big to open in a text editor. I'll call it C.dat. So I do something like

Code:

 head -300 C.dat > littleC.dat

and I can see some of the file contents.

I have another file with row-formatted text information, like 700 or so rows. I'll call it R.txt. The R.txt file has the "field" information that goes with the columns in C.dat. The R.txt file looks like this:

Code:

--*******************************************************

FIELD_NAME := header_info_a;
 FIELD_DESCRITION := short_name;
 FIELD_SHOW_TRUNCATION := FALSE;
 FIELD_WIDTH := 14;
 FIELD_COLUMN := 12;
 FIELD_JUSTIFICATION := RIGHT;

--*******************************************************

FIELD_NAME := header_info_b;
 --FIELD_UNITS := "---";
 FIELD_SHOW_TRUNCATION := FALSE;
 FIELD_WIDTH := 11;
 FIELD_COLUMN := 37;
 FIELD_JUSTIFICATION := RIGHT;

--*******************************************************

FIELD_NAME := header_info_c;
 --FIELD_UNITS := "---";
 FIELD_SHOW_TRUNCATION := FALSE;
 FIELD_WIDTH := 9;
 FIELD_ROW := 1;
 FIELD_EXP := 0;
 FIELD_COLUMN := 62;
 FIELD_JUSTIFICATION := RIGHT;

--*******************************************************

FIELD_NAME := header_info_d;
 FIELD_WIDTH := 5;
 FIELD_ROW := 1;
 FIELD_COLUMN := 317;
 FIELD_JUSTIFICATION := LEFT;

--*******************************************************

etc.

so the row of information that I need to use from R.txt is FIELD_NAME, and what I'd like to do is strip out each FIELD_NAME string and write it to another output file along with an index number. I need the index number for another tool that filters by column index.

each "block" of field info is indexed correctly down the R.txt file even though the FIELD_COLUMN numbers are not, so I don't think that I really care about the FIELD_COLUMN numbers, and I can just go down the R.txt file using only the FIELD_NAME string and placing it sequentially in line in the new output file.

here's what I'd like the file to look like:

Code:

1 header_info_a
2 header_info_b
3 header_info_c

etc.

here's what I've done so far:

Code:

#!/bin/bash
grep -F "FIELD_NAME :=" R.txt > temp.txt
sed '/\FIELD_NAME :=/s/FIELD_NAME :=/index_counter/g' temp > outputfile.txt

so this kinda somewhat works, but not really. I don't know how to create the index_counter, and the results of sed give more than just the FIELD_NAME string, which I don't understand because my grep only has the FIELD_NAME string.

thanks soooo much for any help,

Tabitha

atjurhs · 12-11-2012, 12:21 PM

wait wait wait, I fixed part of it, I'm so excited

now I use

Code:

grep -F "FIELD_NAME :=" R.txt | sed '/\FIELD_NAME :=/s/FIELD_NAME :=/index_counter/g' > outputfile.txt

and this gives an outputfile.txt with:

Code:

index_counter header_info_a;
index_counter header_info_b;
index_counter header_info_c;
index_counter header_info_d;

etc.

so now all I need help with is creating the index_counter.....

thanks sooooo much,

Tabitha

danielbmartin · 12-11-2012, 12:30 PM

Try this ...

Code:

grep "FIELD_NAME :=" $InFile  \
|cut -d'=' -f2-               \
|nl

Daniel B. Martin

shivaa · 12-11-2012, 12:31 PM

You can use an awk one-liner:

Code:

awk 'BEGIN{FS=" "}; /FIELD_NAME/ {gsub(/;/,"",$3); print $3}' ./R.txt| nl > /path/to/output_file

Output:

Code:

     1  header_info_a
     2  header_info_b
     3  header_info_c
     4  header_info_d
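
A side note on the counter itself: awk can also keep the running index on its own, so `nl` is optional. The following is only a minimal sketch under assumptions. The sample file is a shortened, made-up stand-in for R.txt, and the variable names `sample` and `indexed` are illustrative.

```shell
#!/bin/bash
# Shortened stand-in for R.txt (contents modelled on the excerpt in the first post).
sample=$(mktemp)
cat > "$sample" <<'EOF'
--*******************************************************

FIELD_NAME := header_info_a;
 FIELD_WIDTH := 14;
 FIELD_COLUMN := 12;

--*******************************************************

FIELD_NAME := header_info_b;
 --FIELD_UNITS := "---";
 FIELD_WIDTH := 11;
EOF

# $1 == "FIELD_NAME" selects lines whose first field is exactly FIELD_NAME.
# sub() removes the trailing semicolon from the third field.
# ++n increments an awk variable that starts out empty (treated as 0).
indexed=$(awk '$1 == "FIELD_NAME" { sub(/;$/, "", $3); print ++n, $3 }' "$sample")
printf '%s\n' "$indexed"
rm -f "$sample"
```

Here the index and the name are separated by a single space, which matches the layout asked for in the first post; `nl` pads the number and uses a tab instead.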

atjurhs · 12-11-2012, 12:40 PM

uh oh, it looks like I have one other problem: my header strings have a prefix, so they really look like

Code:

prefix.1.header_info_a
prefix.1.header_info_b
prefix.2.header_info_c
prefix.1.header_info_d

and I still need them to look like

Code:

1  header_info_a
2  header_info_b
3  header_info_c
4  header_info_d

sorry guys!
and thanks for helping me!!!

Tabby

atjurhs · 12-11-2012, 12:51 PM

Hi Daniel, yours doesn't exactly work, it gives a line number for every line not an index number for just the FIELD_NAMES lines.

Hi Shivaa, yours works perfectly (except for the prefix problem), although your command is a bit beyond my scripting abilities.

could I do something like

Code:

awk 'BEGIN{FS=" "}; /FIELD_NAME/ {gsub(/;/,"",$3); print $3} | sed '/\"prefix."/s/\"prefix."//g' ' ./R.txt| nl > /path/to/output_file

and I guess I'm going to have to learn about gsub and what the BEGIN does?

I still would like to know the general way to create an index_counter and feed it into other awk/sed/bash scripts?
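
On the general index_counter question, one possible approach in plain bash is a `while read` loop with an arithmetic counter. This is a hedged sketch, not the only way to do it. The `names` list is a made-up stand-in for the output of the grep/sed step, and `numbered` is an illustrative variable name.

```shell
#!/bin/bash
# Hypothetical list of names, standing in for the output of the grep/sed step.
names='header_info_a
header_info_b
header_info_c'

index_counter=0
numbered=""
# Read one name per line; IFS= and -r keep each line exactly as written.
while IFS= read -r name; do
    index_counter=$((index_counter + 1))
    numbered+="${index_counter} ${name}"$'\n'
done <<< "$names"

printf '%s' "$numbered"
```

The loop reads from a here-string rather than from a pipe. In bash, a loop on the right-hand side of a pipe normally runs in a subshell, so `index_counter` and `numbered` would be empty again once the loop finished.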

shivaa · 12-11-2012, 12:56 PM

Alright, don't worry. Just use process substitution to achieve this, as follows:

Code:

awk 'BEGIN{FS="."} {print $3}' <(awk 'BEGIN{FS=" "}; /FIELD_NAME/ {gsub(/;/,"",$3); print $3}' R.txt) | nl > /path/to/output_file

Output:

Code:

     1  header_info_a
     2  header_info_b
     3  header_info_c
     4  header_info_d

Sure, I will give you an awk lesson later
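
For comparison, the prefix can probably also be removed in a single awk pass. The sketch below assumes the R.txt lines look like `FIELD_NAME := prefix.1.header_info_a;` and that the header names themselves contain no dots. The sample data and the variable names `sample` and `stripped` are illustrative.

```shell
#!/bin/bash
# Made-up sample lines in the assumed "prefix.N.name;" layout.
sample=$(mktemp)
cat > "$sample" <<'EOF'
FIELD_NAME := prefix.1.header_info_a;
 FIELD_WIDTH := 14;
FIELD_NAME := prefix.2.header_info_c;
 FIELD_WIDTH := 9;
EOF

stripped=$(awk '$1 == "FIELD_NAME" {
    name = $3
    sub(/;$/, "", name)      # drop the trailing semicolon
    sub(/^.*[.]/, "", name)  # drop everything up to and including the last dot
    print ++n, name
}' "$sample")
printf '%s\n' "$stripped"
rm -f "$sample"
```

Because `.*` is greedy, the second `sub()` removes both `prefix.` and the number after it, whatever that number is.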

danielbmartin · 12-11-2012, 01:08 PM

Quote:

Originally Posted by atjurhs

Hi Daniel, yours doesn't exactly work, it gives a line number for every line not an index number for just the FIELD_NAMES lines.

Perhaps I misread your problem statement.

Try this ...

Code:

nl $InFile            \
|grep "FIELD_NAME :=" \
|cut -c1-7,21-

Daniel B. Martin

atjurhs · 12-11-2012, 01:15 PM

Shivaa, thanks so much, that's very cool!

is that called "nesting" I had no idea awk could do that?

why did you use $3? I'm not sure what that means, is it different than $1?

Thanks again,

Tabby

shivaa · 12-11-2012, 01:42 PM

Quote:

Shivaa, thanks so much, that's very cool!

Thanks! You can mark the question as SOLVED (under the Thread Tools option at the top of the page).

Quote:

...is that called "nesting" I had no idea awk could do that?

It's called process substitution, which means insert output of a command in another.

Quote:

...why did you use $3? I'm not sure what that means, is it different than $1?

In awk, $ followed by a number refers to a field of the current line, so $1 is the first field and $3 is the third. A short answer may still leave you confused, so it's better to first go through the Awk lessons here.

Keep smiling!
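
To make the `$1` versus `$3` point concrete, here is a small sketch that splits one line from R.txt on whitespace (awk's default) and prints individual fields. The variable names are illustrative.

```shell
#!/bin/bash
# One FIELD_NAME line as it appears in R.txt.
line='FIELD_NAME := header_info_a;'

# With the default separator, awk splits the line on runs of spaces and tabs.
first=$(printf '%s\n' "$line" | awk '{ print $1 }')   # first field
third=$(printf '%s\n' "$line" | awk '{ print $3 }')   # third field
count=$(printf '%s\n' "$line" | awk '{ print NF }')   # NF = number of fields

echo "\$1=$first  \$3=$third  NF=$count"
```

The third field still carries the trailing semicolon, which is why the one-liner earlier in the thread calls `gsub(/;/,"",$3)` before printing it.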

atjurhs · 12-11-2012, 02:13 PM

Quote:

Originally Posted by shivaa

It's called process substitution, which means insert output of a command in another.

almost solved, one more question

what's the difference between "process substitution, which means insert output of a command in another" and the | command?

I guess I'm asking for the "I will give you an awk lesson later"

shivaa · 12-11-2012, 10:42 PM

You can read up on process substitution for background, although your original question wasn't really about it.
Well, as per the documentation:

Quote:

Process substitution feeds the output of a process (or processes) into the stdin of another process.

So I suggest you go through the following guides for a clear understanding:
1. Process Substitution
2. Advanced Bash-Scripting Guide
These guides are a treasure trove of knowledge. You'll learn many more techniques as well.
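
As a rough illustration of the pipe versus process substitution question, the sketch below runs the same awk command both ways and then shows a case where a single pipe would not be enough. It assumes bash (process substitution is not part of plain POSIX sh). The input strings and variable names are made up for the example.

```shell
#!/bin/bash
# Made-up input in the "prefix.N.name" layout discussed above.
names='prefix.1.header_info_a
prefix.2.header_info_c'

# Pipe: the left-hand command's output arrives on awk's standard input.
via_pipe=$(printf '%s\n' "$names" | awk -F. '{ print $3 }')

# Process substitution: <(...) expands to a file name (such as /dev/fd/63)
# that awk opens like any other input file.
via_procsub=$(awk -F. '{ print $3 }' <(printf '%s\n' "$names"))

# A command that needs two inputs cannot get both from one pipe,
# but it can take two process substitutions as its file arguments.
a_only=$(comm -23 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n'))

printf '%s\n' "$via_pipe" "$via_procsub" "$a_only"
```

For a single input stream the two forms give the same result, so the choice in this thread is mostly a matter of taste; a plain pipe is the more portable of the two.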