LinuxQuestions.org
Old 01-17-2014, 12:27 PM   #1
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Rep: Reputation: 1
how do i keep the header


hi guys,

i wrote an awk script that does the stuff below (filtering a file row by row based on criteria in each column), except it does not include the original file's header. can someone please show me how to keep the header in the output file for those columns of data that are kept?

Code:
BEGIN {
       FS = " "
       }
       {
        if ($3=="42" && $5=="the answer to the universe")
        printf("%f %f %d %f %f %s\n", $1, $2, $3, $11, $12, $5)
        }
END{}
and I run the script using the command line:

Code:
awk -f row_parsing_tool.awk  input.inp > output.out
thanks guys! tabby

Last edited by tabbygirl1990; 01-17-2014 at 12:31 PM.
 
Old 01-17-2014, 12:39 PM   #2
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
Code:
awk 'NR == 1 {print $0}' tabbygirl.txt
 
1 members found this post helpful.
Old 01-17-2014, 12:53 PM   #3
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
that doesn't do what i need at all

that will simply pull the header (the first line) of the file, which has not been filtered.

i need to have the headers that go along with the filtered file, in this case the headers of columns $1, $2, $3, $11, $12, $5

i'm thinking it will take another printf statement and then a cat
 
Old 01-17-2014, 05:07 PM   #4
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37, 14.2, 15.0
Posts: 635

Rep: Reputation: 154
I often would like to know how to do that too. But it seems that all these editing tools will only treat all lines equally, with the same rules or commands applied to every selected line.

That leaves a two-pass procedure as the most universal solution.
Write the header to one temp file and the filtered rows to another temp file, then cat them back together.
 
Old 01-17-2014, 07:03 PM   #5
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
schneidz did provide the answer !


Code:
awk 'NR == 1 {print $0}' tabbygirl.txt
is essentially

if line is 1 then print line 1



add it to your script
Code:
BEGIN {
   FS = " "
   }
   {
   if ( NR == 1 ) {
       print $0
       } else {
        if ($3=="42" && $5=="the answer to the universe")
           printf("%f %f %d %f %f %s\n", $1, $2, $3, $11, $12, $5)
       }
   }
END{}
untested, but you get the idea

edit, obviously replace print $0 with "printf <desired formatting> $1, $2, $3, $11, $12, $5",
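
A minimal sketch of that edit, assuming a made-up five-column file (the column names X, Y, CODE, NOTE, TAG and the filter values are hypothetical, not from the original data). The header row is printed with %s for the same columns that the data rows keep:

```shell
# Hypothetical sample: one header row plus three data rows.
out=$(printf '%s\n' \
    'X Y CODE NOTE TAG' \
    '1.5 2.5 42 a keep' \
    '3.5 4.5 7 b keep' \
    '9.0 8.0 42 c drop' |
  LC_ALL=C awk '
    # Header row: print the chosen columns as plain strings, then skip the filter.
    NR == 1 { printf("%s %s %s\n", $1, $3, $5); next }
    # Data rows: keep only matching rows, same columns in the same order.
    $3 == "42" && $5 == "keep" { printf("%f %d %s\n", $1, $3, $5) }
  ')

printf '%s\n' "$out"
```

Using %s for the header matters because the header fields are text; the numeric formats are only applied to the data rows.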

Last edited by Firerat; 01-17-2014 at 07:08 PM.
 
2 members found this post helpful.
Old 01-17-2014, 07:12 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,131

Rep: Reputation: 4121
Save the fields of interest from the first record into some variable. If you eventually find anything to print, print the saved variable once (set a flag), then the record(s) to follow. Use print rather than printf, as the header will be simple strings, and there is no need to use cat or any other external command.

D'oh - too slow at typing.

Last edited by syg00; 01-17-2014 at 07:13 PM.
 
1 members found this post helpful.
Old 01-18-2014, 12:08 AM   #7
smeezekitty
Senior Member
 
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,339

Rep: Reputation: 231
If two commands are acceptable, you can do this
Code:
head -n 1 input.inp > output.out && awk -f row_parsing_tool.awk  input.inp >> output.out
 
Old 01-18-2014, 04:02 PM   #8
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
@smeezekitty

just a few problems with that approach

the first is the field order, head -n1 is not much use since it won't 'reorder' the fields

another is highlighted by syg00
that is "do we want a header if we would have no data?"

To get round both use a single awk script
have the first record (NR == 1) 'capture' the header fields to some variable (BEGIN runs before any input is read, so it would need a getline to do this),
now test each record
when condition is 'true' check the header variable, if set print it and then unset it (or set it to null, e.g. Header=""), then print the data line, repeat with all records

should only get the header once, and only when there was actual output data
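
The steps above can be sketched roughly as follows. The three-column sample data and the `filter` helper are assumptions made for illustration; only the capture-then-print-once pattern comes from the description:

```shell
# Hypothetical sample data: a header row and three data rows.
data='DATE OPERATOR SPEC
12/04 SM 5
12/05 DK 4
12/13 SM 5'

# Hypothetical helper: $1 is the operator value to keep.
filter() {
  printf '%s\n' "$data" | awk -v op="$1" '
    # Capture the wanted header fields, but print nothing yet.
    NR == 1 { header = $1 " " $3; next }
    $2 == op {
        # Emit the header once, just before the first matching row.
        if (header != "") { print header; header = "" }
        print $1, $3
    }'
}

out_match=$(filter SM)   # header plus two rows
out_none=$(filter ZZ)    # no match, so no header either

printf '%s\n' "$out_match"
```

With this shape the output file stays empty when nothing matches, which addresses the "header with no data" case.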
 
Old 01-18-2014, 04:50 PM   #9
smeezekitty
Senior Member
 
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,339

Rep: Reputation: 231
I was under the impression that she wanted the first line verbatim but I could be wrong
 
Old 01-18-2014, 11:38 PM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
I am not sure exactly which header we are talking about, ie where it appears in the data (perhaps because no example data was provided (hint)).

However, if we are able to assume that the header is in fact the first row within the file, simply adding this criterion to the existing one would do the trick.

I would add that the current setting of FS is also not required as white space is the default.

So it could just be:
Code:
NR == 1 || ($3=="42" && $5=="the answer to the universe"){printf("%f %f %d %f %f %s\n", $1, $2, $3, $11, $12, $5)}
 
Old 01-20-2014, 12:08 PM   #11
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
schneidz - i'm sorry, i didn't mean to be snooty at all, one of those days, thank you for your help.

when i ran firerat's script in post 5, my output file came back empty

when i ran grail's command line in post #10, my output file came back with the line after the line that met the filter criteria, but also no header

so, here's an example space-delimited input file

Code:
DATE		TIME       OPERATOR VERSION  RUN_ID  SPEC      DTG      FAIL  END    ONGOING  FEATURE_1   FEATURE_2      FEATURE_3 GOODNESS     
12/04/2013      6:00:011.27   SM    2.6.8 6   90501   5   921996008.31 FALSE 5  *     5      0.711503131 5660093.929    6.22      0.91
12/05/2013      6:00:011.3     DK    2.6.8 6   90501   4   921996009.31 FALSE 8  *     8      0.567142359 5660095.848    0.53      0.90
12/06/2013      6:00:011.41    SM    2.6.8 5   90503   2   921996009.01 FALSE 8  *     8      0.708699814 5660097.221    0.54      0.91
12/06/2013      6:00:011.41    JF    2.6.8 6   90501   5   921996010.31 FALSE 3  *     3      0.142189285 5660100.259    -0.27      0.08
12/09/2013      6:00:011.55    SM    2.6.8 6   90501   1   921996010.01 FALSE 8  *     8      0.213247275 5660103.596    -0.27      0.08
12/10/2013      6:00:011.41   SM    2.6.8 4   90503   5   921996011.31 FALSE 8  *     8      0.91836074 5660103.492    0.53       0.91
12/10/2013      6:00:011.32   SM    2.6.8 4   90501   5   921996011.01 TRUE 1
12/11/2013      6:00:011.21   DK    2.6.8 4   90501   3   921996015.01 TRUE 1
12/11/2013      6:00:011.42   SM    2.6.8 4   90501   3   921996015.01 FALSE 10  *    10      0.864147301 5660105.265    0.622     0.91
12/12/2013      6:00:011.50   JF    2.6.8 4   90501   3   921996015.31 FALSE 8  *     8      0.539123318 5660104.795    0.622     0.92
12/13/2013      6:00:011.15   SM    2.6.8 4   90503   5   921996016.01 FALSE 2  *     2      0.922633758 5660109.457    7.05      0.96
if i filter on OPERATOR=SM and SPEC=5 then what I'd like to get out is

Code:
DATE		TIME       OPERATOR VERSION  RUN_ID  SPEC      DTG      FAIL  END FEATURE_1   FEATURE_2      FEATURE_3 GOODNESS     
12/04/2013      6:00:011.27   SM    2.6.8 6   90501   5   921996008.31  FALSE  5  0.711503131 5660093.929    6.22      0.91
12/10/2013      6:00:011.41   SM    2.6.8 4   90503   5   921996011.31  FALSE  8  0.91836074 5660103.492    0.53      0.91
12/10/2013      6:00:011.32   SM    2.6.8 4   90501   5   921996011.01  TRUE  1
12/13/2013      6:00:011.15   SM    2.6.8 4   90503   5   921996016.01  FALSE  2  0.922633758 5660109.457    7.05      0.96
the files that i'm trying to process are much much bigger but i think this little one covers all the cases

thanks so much guys!!!

tabby
 
Old 01-20-2014, 07:56 PM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Tabby, I see an issue prior to the solution: you have more columns of data than you have of header. This means that once you pass the VERSION column, the header and data become out of sync. Not sure if your current formatting has allowed for this?

Also, looking at your data, your reference in your format for printf to %f and %d will not match most of the data presented.

So I will leave these 2 issues to you, but the sort of thing I would look at doing is:
Code:
BEGIN{    fmt[1] = "%s %s %s %s %s\n" # header
          fmt[2] = "<the format for other lines>"
}

NR == 1 || ( $3 == "SM" && $7 == 5 ){
    printf(fmt[NR==1?1:2],<choose your columns here>)
}
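
The format mismatch mentioned above can be seen with a small test. This is only an illustration on two lines cut down from the sample file; it prints the same three fields first with the numeric formats and then with %s:

```shell
# First three fields of the header and of one data row from the sample.
out=$(printf '%s\n' \
    'DATE TIME OPERATOR' \
    '12/04/2013 6:00:011.27 SM' |
  LC_ALL=C awk '
    # Left of the bar: numeric formats. Right of the bar: plain strings.
    { printf("%f %f %d | %s %s %s\n", $1, $2, $3, $1, $2, $3) }
  ')

printf '%s\n' "$out"
```

With %f and %d, awk converts each field to a number first, so text such as DATE or SM becomes 0 and 12/04/2013 is cut off at the first non-numeric character. A header printed this way turns into a row of zeros, which may be why no recognisable header showed up in the earlier output.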
 
Old 01-20-2014, 09:49 PM   #13
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
thanks sooo much grail !!!

yep, that's actually the way the files are after VERSION, the headers and the data columns don't line up 1 for 1 and when FAIL is set to TRUE then no more data is written to that line

i know i can't have a different number of arguments in the format statement of fmt[2], but is there a way to "PAD" the control characters in fmt[1] ?

Code:
fmt[1] = "%s  %s  %s  %s PAD %s  %s  %s  %s %s  %s  %s  %s %s\n" 
fmt[2] = " printf("%s  %s  %s  %s  %d  %d  %d %f  %s  %d  %f  %f  %f  %f\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $13,  $14, $15, $16)"
as i understand "<choose your columns here>" it would be the column calls in fmt[2] yes?

Code:
NR==1?1:2
i thought that was really cool using the ?: operator for NR, i'll have to think of using that more

thanks!!!

tabby
 
Old 01-20-2014, 10:02 PM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
For more on padding have a look here
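
As a small illustration of padding, printf accepts a field width between the % and the conversion letter, and a leading minus sign left-justifies. The three-column sample and the widths 12, 10 and 4 are arbitrary choices for this sketch:

```shell
# Header and one data row, printed with the same fixed widths.
out=$(printf '%s\n' \
    'DATE OPERATOR SPEC' \
    '12/04/2013 SM 5' |
  awk '
    # %-12s and %-10s pad on the right; %4s pads on the left.
    { printf("%-12s%-10s%4s\n", $1, $2, $3) }
  ')

printf '%s\n' "$out"
```

Because both rows go through the same widths, the columns line up even though the header text and the data have different lengths.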

Unfortunately I seem to have led you astray with fmt[2]. It should be of the same format as fmt[1] but using different modifiers to display the data you need, like
Code:
fmt[2]= "%s  %s  %s  %s  %d  %d  %d %f  %s  %d  %f  %f  %f  %f\n"
Whereas the "<choose your columns here>" part would be:
Code:
printf(fmt[NR==1?1:2],$1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $13,  $14, $15, $16)
Hope that is a little clearer
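
For the sample file posted earlier, one possible way to put the pieces together is sketched below. It is not the fmt[] approach itself: since the header has 14 names, full data rows have 16 fields, and FAIL=TRUE rows stop after 10, this sketch builds each output line from strings instead of using one fixed printf argument list. The variable names `op`, `spec`, `hdr` and `line` are made up for the example, and only three data rows are included:

```shell
# Header plus three rows taken from the sample file (whitespace condensed).
data='DATE TIME OPERATOR VERSION RUN_ID SPEC DTG FAIL END ONGOING FEATURE_1 FEATURE_2 FEATURE_3 GOODNESS
12/04/2013 6:00:011.27 SM 2.6.8 6 90501 5 921996008.31 FALSE 5 * 5 0.711503131 5660093.929 6.22 0.91
12/05/2013 6:00:011.3 DK 2.6.8 6 90501 4 921996009.31 FALSE 8 * 8 0.567142359 5660095.848 0.53 0.90
12/10/2013 6:00:011.32 SM 2.6.8 4 90501 5 921996011.01 TRUE 1'

out=$(printf '%s\n' "$data" | awk -v op=SM -v spec=5 '
  NR == 1 {
      # Header: keep every name except ONGOING (header field 10).
      hdr = $1
      for (i = 2; i <= NF; i++) if (i != 10) hdr = hdr " " $i
      next
  }
  $3 == op && $7 == spec {
      # Print the saved header once, before the first matching row.
      if (hdr != "") { print hdr; hdr = "" }
      # Data: fields 1-10, skip 11 and 12, then 13 onward if present.
      line = $1
      for (i = 2; i <= 10 && i <= NF; i++) line = line " " $i
      for (i = 13; i <= NF; i++) line = line " " $i
      print line
  }')

printf '%s\n' "$out"
```

Treating the fields as strings also leaves values such as 921996008.31 exactly as they appear in the input, whereas %f would rewrite them with six decimal places. Column alignment is not handled here; fixed widths could be added on top if needed.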
 
Old 01-20-2014, 11:15 PM   #15
smeezekitty
Senior Member
 
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,339

Rep: Reputation: 231
Here is a working Perl solution
Code:
# values to filter on
$oc = "SM";   # OPERATOR
$sp = "5";    # SPEC

<>;   # read and discard the original header line
print("DATE		TIME       OPERATOR VERSION  RUN_ID  SPEC      DTG      FAIL  END FEATURE_1   FEATURE_2      FEATURE_3 GOODNESS\n");
while(<>){
    ($date, $time, $opr, $version, $v2, $run, $spec, $dtg, $fail, $end, $p, $ongoing, $f1, $f2, $f3, $goodness) = split(' ',$_);
    if($oc eq $opr && $sp == $spec){print ($date, "      ", $time, "   ", $opr, "    ", $version, " ", $v2, "   ", $run, "   ", $spec, "   ", $dtg, "  ", $fail, "  ", $end, "  ", $f1, " ", $f2, "    ", $f3, "      ", $goodness, "\n");}
}

Last edited by smeezekitty; 01-21-2014 at 02:05 AM. Reason: broken code
 
  

