LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 11-23-2010, 08:19 AM   #1
emmalg
Member
 
Registered: Jun 2009
Location: Spain
Distribution: Various, Ubuntu, Fedora, Open Solaris, Solaris, RHEL, CentOS
Posts: 64

Rep: Reputation: 16
Reformatting file to be a table in a bash script


Hi All,

I have a file which has the following format repeated over and over again:

Code:
10295> DOP_CENTROID_COEFFS_ADS.14.zero_doppler_time.mjd (DateTime) = 23-NOV-2010 07:32:55.007935
10308> DOP_CENTROID_COEFFS_ADS.14.slant_range_time (ns) = 5.39094e+06
10312> DOP_CENTROID_COEFFS_ADS.14.dop_coef.0 (HzHz/sHz/s2Hz/s3Hz/s4) = -79.0127
10316> DOP_CENTROID_COEFFS_ADS.14.dop_coef.1 (HzHz/sHz/s2Hz/s3Hz/s4) = -647444
10320> DOP_CENTROID_COEFFS_ADS.14.dop_coef.2 (HzHz/sHz/s2Hz/s3Hz/s4) = 5.46523e+08
10324> DOP_CENTROID_COEFFS_ADS.14.dop_coef.3 (HzHz/sHz/s2Hz/s3Hz/s4) = -2.85647e+11
10328> DOP_CENTROID_COEFFS_ADS.14.dop_coef.4 (HzHz/sHz/s2Hz/s3Hz/s4) = 6.39915e+13
I've been trying to reformat this so it is a table I can use in a spreadsheet, but unfortunately I'm getting stuck - probably at the easiest part!

The output I want is a single line per entry in the file:
Code:
23-NOV-2010 07:32:55.007935 5.39094e+06 -79.0127 -647444 5.46523e+08 -2.85647e+11 6.39915e+13
23-NOV-2010 08........
Using:

Code:
awk '{print $4 $5}'
I can print just the numbers (though the date time no longer has a space which I have to fix).

Code:
23-NOV-201007:32:55.007935 
5.39094e+06 
-79.0127 
-647444 
5.46523e+08 
-2.85647e+11 
6.39915e+13
I just can't work out how to get the groups of lines into a tab or comma separated table. I tried using an xemacs keyboard macro for the whole thing, but the file is much too large to do it that way.

Any suggestions will be much appreciated. As this is only required for a temporary problem and I have several slightly different files to work with, I was aiming to make a very simple, easily editable bash script.

Cheers
Emma
 
Old 11-23-2010, 08:32 AM   #2
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
Well for starters, your current code would not give the output you have shown as all after the first line are not showing field 4.

First thing you need to ask yourself is what delimits both the fields you require and the records?

You show as your required output the following:
Quote:
23-NOV-2010 08........
The record delimiter can only be determined if you show some more data; it is not possible to tell from the current sample.

In answer to some of the things you might be looking for:
Code:
print $4","$5
This will generate the fields with a comma separator.
Code:
BEGIN{OFS=","} ... {print $4,$5}
so will this ... OFS = Output Field Separator
Code:
printf "%s,%s",$4,$5
printf does not use an implicit newline at the end so may help with juggling things onto one line.
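To compare the three forms side by side, here is a small sketch using the first line of the sample data. The shell variable names are only illustrative. It assumes default whitespace field splitting, under which the "=" is field 4, the date is field 5 and the time is field 6 on this line.

```shell
line='10295> DOP_CENTROID_COEFFS_ADS.14.zero_doppler_time.mjd (DateTime) = 23-NOV-2010 07:32:55.007935'

# Adjacent expressions in print are concatenated with nothing in between.
joined=$(printf '%s\n' "$line" | awk '{ print $5 $6 }')

# A comma between print arguments inserts OFS, set here to ",".
with_ofs=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "," } { print $5, $6 }')

# printf adds no newline of its own, so output from successive input
# lines stays on one row until a newline is printed explicitly.
one_row=$(printf '%s\n%s\n' "$line" "$line" |
  awk '{ printf "%s %s;", $5, $6 } END { print "" }')

printf '%s\n%s\n%s\n' "$joined" "$with_ofs" "$one_row"
```

The first result shows the missing space from the original attempt, the second the comma-separated pair, and the third two input lines collapsed onto a single output row.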

Let us know how you get on.
 
Old 11-23-2010, 08:54 AM   #3
emmalg
Member
 
Registered: Jun 2009
Location: Spain
Distribution: Various, Ubuntu, Fedora, Open Solaris, Solaris, RHEL, CentOS
Posts: 64

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by grail View Post
Well for starters, your current code would not give the output you have shown as all after the first line are not showing field 4.
You're right, I actually printed $5 and $6 (and yes, 6 only exists on the lines where there is a date and time which is why I specifically used that and not $NF)! I just forgot in the time it took me to change screens!

Printing, selecting and formatting the data is less of a worry to me than getting the seven lines for each entry into a single row. Basically, I'm perfectly indifferent to whether a comma or a tab separates each variable, so long as there is a new line before the next entry.

To make it simpler to understand, say this is the input:

10295> DOP_CENTROID_COEFFS_ADS.14.zero_doppler_time.mjd (DateTime) = 23-NOV-2010 07:32:55.007935
10308> DOP_CENTROID_COEFFS_ADS.14.slant_range_time (ns) = 5.39094e+06
10312> DOP_CENTROID_COEFFS_ADS.14.dop_coef.0 (HzHz/sHz/s2Hz/s3Hz/s4) = -79.0127
10316> DOP_CENTROID_COEFFS_ADS.14.dop_coef.1 (HzHz/sHz/s2Hz/s3Hz/s4) = -647444
10320> DOP_CENTROID_COEFFS_ADS.14.dop_coef.2 (HzHz/sHz/s2Hz/s3Hz/s4) = 5.46523e+08
10324> DOP_CENTROID_COEFFS_ADS.14.dop_coef.3 (HzHz/sHz/s2Hz/s3Hz/s4) = -2.85647e+11
10328> DOP_CENTROID_COEFFS_ADS.14.dop_coef.4 (HzHz/sHz/s2Hz/s3Hz/s4) = 6.39915e+13
10295> DOP_CENTROID_COEFFS_ADS.14.zero_doppler_time.mjd (DateTime) = 23-NOV-2010 07:32:55.007935
10308> DOP_CENTROID_COEFFS_ADS.14.slant_range_time (ns) = 5.39094e+06
10312> DOP_CENTROID_COEFFS_ADS.14.dop_coef.0 (HzHz/sHz/s2Hz/s3Hz/s4) = -79.0127
10316> DOP_CENTROID_COEFFS_ADS.14.dop_coef.1 (HzHz/sHz/s2Hz/s3Hz/s4) = -647444
10320> DOP_CENTROID_COEFFS_ADS.14.dop_coef.2 (HzHz/sHz/s2Hz/s3Hz/s4) = 5.46523e+08
10324> DOP_CENTROID_COEFFS_ADS.14.dop_coef.3 (HzHz/sHz/s2Hz/s3Hz/s4) = -2.85647e+11
10328> DOP_CENTROID_COEFFS_ADS.14.dop_coef.4 (HzHz/sHz/s2Hz/s3Hz/s4) = 6.39915e+13

Then the output I want is this (comma or tab separated), comma here for clarity:

23-NOV-2010 07:32:55.007935, 5.39094e+06, -79.0127, -647444, 5.46523e+08, -2.85647e+11, 6.39915e+13
23-NOV-2010 07:32:55.007935, 5.39094e+06, -79.0127, -647444, 5.46523e+08, -2.85647e+11, 6.39915e+13

The important thing to me is how to get the seven parameters for each record onto the same line, with a newline after each.

Using printf might well be a good answer.

Cheers
Emma

Last edited by emmalg; 11-23-2010 at 08:56 AM. Reason: clarity
 
Old 11-23-2010, 09:18 AM   #4
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
Quote:
Originally Posted by emmalg View Post
Using printf might well be a good answer.
Yes, printf is the answer, as grail suggested. You just have to find a way to write out the newline at the end of every record. For example, you can start by writing data on the same line and, when you encounter a line containing DateTime, emit a newline and then print the date and time fields. You also have to add a newline at the very end to close the last output line. Here is a way to accomplish the task:
Code:
/DateTime/ {
  
  printf "%s %s", $(NF-1), $NF

  while ( getline > 0 )
    if ( $0 ~ /DateTime/ )
      printf "\n%s %s", $(NF-1), $NF
    else
      printf ",%s", $NF
    
  print ""

}
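For comparison, the same idea can probably be written without getline at all, letting awk's own record loop do the reading. This is only a sketch and not the code posted above: the `started` flag is a name made up for the example, and the sample is abbreviated from the data in this thread so that the two records have different lengths.

```shell
# Abbreviated sample: the second record has fewer value lines than the first.
sample='009525> DOP_CENTROID_COEFFS_ADS.0.zero_doppler_time.mjd (DateTime) = 29-JUL-2010 00:08:27.956552
009538> DOP_CENTROID_COEFFS_ADS.0.slant_range_time (ns) = 5.45439e+06
009542> DOP_CENTROID_COEFFS_ADS.0.dop_coef.0 (HzHz/sHz/s2Hz/s3Hz/s4) = -487.407
009580> DOP_CENTROID_COEFFS_ADS.1.zero_doppler_time.mjd (DateTime) = 29-JUL-2010 00:08:33.998848
009593> DOP_CENTROID_COEFFS_ADS.1.slant_range_time (ns) = 5.45496e+06'

table=$(printf '%s\n' "$sample" | awk '
  /DateTime/ {
    if (started) printf "\n"         # close the previous row
    printf "%s %s", $(NF-1), $NF     # date and time start a new row
    started = 1
    next
  }
  started { printf ",%s", $NF }      # append every other value to the current row
  END { if (started) printf "\n" }   # close the last row
')

printf '%s\n' "$table"
```

Because every input line passes through the normal pattern rules, no record is consumed before it can be examined.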
 
Old 11-23-2010, 09:33 AM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
colucix beat me to it but thought I would show my similar alternative:
Code:
/DateTime/{
    line=$(NF-1)" "$NF

    while(++x < 7){
        getline
        line=line","$NF
    }

    print line
    x=0
}
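One thing to watch with a fixed-count loop like this: getline returns 0 at end of input and leaves $0 unchanged, so a truncated final record would repeat its last value. A hedged variant that checks the return value might look like the sketch below. The `-v n=3` parameter (lines per record) and the abbreviated sample are assumptions for the example; the real files in this thread would use 7.

```shell
# Abbreviated sample: the second record is cut short on purpose.
sample='009525> DOP_CENTROID_COEFFS_ADS.0.zero_doppler_time.mjd (DateTime) = 29-JUL-2010 00:08:27.956552
009538> DOP_CENTROID_COEFFS_ADS.0.slant_range_time (ns) = 5.45439e+06
009542> DOP_CENTROID_COEFFS_ADS.0.dop_coef.0 (HzHz/sHz/s2Hz/s3Hz/s4) = -487.407
009580> DOP_CENTROID_COEFFS_ADS.1.zero_doppler_time.mjd (DateTime) = 29-JUL-2010 00:08:33.998848
009593> DOP_CENTROID_COEFFS_ADS.1.slant_range_time (ns) = 5.45496e+06'

rows=$(printf '%s\n' "$sample" | awk -v n=3 '
  /DateTime/ {
    line = $(NF-1) " " $NF
    for (i = 1; i < n; i++) {
      if ((getline) <= 0) break    # end of input: stop rather than reuse the old $NF
      line = line "," $NF
    }
    print line
  }
')

printf '%s\n' "$rows"
```

With the check in place, the short second record is printed with the values it actually has.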
 
Old 11-23-2010, 10:38 AM   #6
emmalg
Member
 
Registered: Jun 2009
Location: Spain
Distribution: Various, Ubuntu, Fedora, Open Solaris, Solaris, RHEL, CentOS
Posts: 64

Original Poster
Rep: Reputation: 16
Hi again,

I was trying a way based on what colucix suggested and haven't got the expected output. No worries though, I'm working on fixing that after your very helpful suggestions.

I was just a little concerned as I've never tried such a complicated pattern match, so there might well be some issue there. Will this match a date of the format DD-MMM-YYYY?

Code:
    if ( $0 ~  /[0-3][0-9]-[A-Z]{3}-[1-2][0-9]{3}/ )
Cheers
Emma
 
Old 11-23-2010, 10:56 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
It will, but be aware that it will also match a date with 39 as the day portion. Is there a reason for trying to match the date format?

There are also date constructs in awk, but these may not necessarily help what you are trying to do.
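A quick way to try a pattern like this is to feed it a few strings. In the sketch below the `{3}` repetitions are written out in full, on the assumption that not every awk honours interval expressions (older gawk releases, as far as I know, needed an option such as --re-interval for them). The `check` helper is just a name invented for the example.

```shell
check() {
  # Prints "match" or "no match" for the string passed as $1.
  printf '%s\n' "$1" | awk '{
    if ($0 ~ /[0-3][0-9]-[A-Z][A-Z][A-Z]-[12][0-9][0-9][0-9]/)
      print "match"
    else
      print "no match"
  }'
}

r1=$(check '(DateTime) = 23-NOV-2010 07:32:55.007935')  # a real date
r2=$(check '39-XYZ-2010')                               # not a valid date, same shape
r3=$(check '(ns) = 5.39094e+06')                        # a plain value line

printf '%s\n%s\n%s\n' "$r1" "$r2" "$r3"
```

The second case is the point made above: the pattern describes the shape of a date, not its validity.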
 
Old 11-23-2010, 01:43 PM   #9
emmalg
Member
 
Registered: Jun 2009
Location: Spain
Distribution: Various, Ubuntu, Fedora, Open Solaris, Solaris, RHEL, CentOS
Posts: 64

Original Poster
Rep: Reputation: 16
Partly because I had already started working on the solution presented by colucix and once I started, I had to keep going as I was trying to work out the logic to match the lines which contained the data. :-)

Your solution is simpler but as I have several files containing a different number of parameters (all with a date), the more complicated option means I shouldn't have to re-work the code each time.

I know none of the data in the files will have a crazy date as I have the good fortune to be one of the engineers responsible for its quality and would only have myself to blame if it was wrong! ;-)

I'll post my solution as soon as I get it sorted at work tomorrow.

Thank you both for your help.
 
Old 11-23-2010, 06:18 PM   #10
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
Fair call on colucix's being more robust for your needs. I cannot see why you would need the if test you are asking about, though: DateTime already appears on the line, so checking the date format seems to have no real use.

Also, I realise that the date will be in the correct format, as you are the one generating it. I was just warning you in case another field has the same format but is not a date, in which case it would match too.

Let us know if we can help more.
 
Old 11-24-2010, 09:55 AM   #11
emmalg
Member
 
Registered: Jun 2009
Location: Spain
Distribution: Various, Ubuntu, Fedora, Open Solaris, Solaris, RHEL, CentOS
Posts: 64

Original Poster
Rep: Reputation: 16
Hi Guys,

I really am a bit of a muppet at times!

There was I thinking that your references to "DateTime" were a sort of pseudo code reference to me needing to work out how to match the date!

Then after having created some rather untidy if, else if, else awk code which checks for JAN, FEB, MAR... in turn, and which seemed to work on all but the first row of data, I went back to the input file and lo and behold it quite clearly says "DateTime" in the label. Now I realise why you were questioning what I was up to! Never mind, I've learnt a lot about regular expressions and how bad at them I am!

Anyway, the solution almost works. The code below gives the following output:

Code:
awk '{

  while ( getline > 0 ) {
    if ( match($0, /DateTime/) ) {
      printf "\n%s %s", $5, $6
    }
    else {
      printf ",%s", $5
    }
  }

  printf "\n"

}' dop_gm_emma.txt > dop_gm_tab.txt

,5.45439e+06,-487.407,-672346,9.15808e+08,-6.75078e+11,1.8763e+14
29-JUL-2010 00:08:33.998848,5.45496e+06,-496.71,-663010,8.98857e+08,-6.64175e+11,1.8547e+14
29-JUL-2010 00:08:40.041145,5.45554e+06,-506.013,-653675,8.81907e+08,-6.53271e+11,1.8331e+14
29-JUL-2010 00:08:46.083440,5.45612e+06,-515.316,-644339,8.64956e+08,-6.42368e+11,1.8115e+14
29-JUL-2010 00:08:52.125737,5.4567e+06,-524.619,-635004,8.48005e+08,-6.31465e+11,1.7899e+14
29-JUL-2010 00:08:58.168033,5.45728e+06,-533.922,-625668,8.31054e+08,-6.20562e+11,1.7683e+14

As you can see the first line is missing. The first two records are given below for reference:

009525> DOP_CENTROID_COEFFS_ADS.0.zero_doppler_time.mjd (DateTime) = 29-JUL-2010 00:08:27.956552
009538> DOP_CENTROID_COEFFS_ADS.0.slant_range_time (ns) = 5.45439e+06
009542> DOP_CENTROID_COEFFS_ADS.0.dop_coef.0 (HzHz/sHz/s2Hz/s3Hz/s4) = -487.407
009546> DOP_CENTROID_COEFFS_ADS.0.dop_coef.1 (HzHz/sHz/s2Hz/s3Hz/s4) = -672346
009550> DOP_CENTROID_COEFFS_ADS.0.dop_coef.2 (HzHz/sHz/s2Hz/s3Hz/s4) = 9.15808e+08
009554> DOP_CENTROID_COEFFS_ADS.0.dop_coef.3 (HzHz/sHz/s2Hz/s3Hz/s4) = -6.75078e+11
009558> DOP_CENTROID_COEFFS_ADS.0.dop_coef.4 (HzHz/sHz/s2Hz/s3Hz/s4) = 1.8763e+14
009580> DOP_CENTROID_COEFFS_ADS.1.zero_doppler_time.mjd (DateTime) = 29-JUL-2010 00:08:33.998848
009593> DOP_CENTROID_COEFFS_ADS.1.slant_range_time (ns) = 5.45496e+06
009597> DOP_CENTROID_COEFFS_ADS.1.dop_coef.0 (HzHz/sHz/s2Hz/s3Hz/s4) = -496.71
009601> DOP_CENTROID_COEFFS_ADS.1.dop_coef.1 (HzHz/sHz/s2Hz/s3Hz/s4) = -663010
009605> DOP_CENTROID_COEFFS_ADS.1.dop_coef.2 (HzHz/sHz/s2Hz/s3Hz/s4) = 8.98857e+08
009609> DOP_CENTROID_COEFFS_ADS.1.dop_coef.3 (HzHz/sHz/s2Hz/s3Hz/s4) = -6.64175e+11
009613> DOP_CENTROID_COEFFS_ADS.1.dop_coef.4 (HzHz/sHz/s2Hz/s3Hz/s4) = 1.8547e+14


I altered the code to include the following directly after the while statement (only for debugging) and found it is definitely skipping the first line, but I have no idea why; a Google search didn't turn anything up either:

Code:
    printf "%s\n", $0
009538> DOP_CENTROID_COEFFS_ADS.0.slant_range_time (ns) = 5.45439e+06
,5.45439e+06009542> DOP_CENTROID_COEFFS_ADS.0.dop_coef.0 (HzHz/sHz/s2Hz/s3Hz/s4) = -487.407
,-487.407009546> DOP_CENTROID_COEFFS_ADS.0.dop_coef.1 (HzHz/sHz/s2Hz/s3Hz/s4) = -672346
,-672346009550> DOP_CENTROID_COEFFS_ADS.0.dop_coef.2 (HzHz/sHz/s2Hz/s3Hz/s4) = 9.15808e+08
,9.15808e+08009554> DOP_CENTROID_COEFFS_ADS.0.dop_coef.3 (HzHz/sHz/s2Hz/s3Hz/s4) = -6.75078e+11
,-6.75078e+11009558> DOP_CENTROID_COEFFS_ADS.0.dop_coef.4 (HzHz/sHz/s2Hz/s3Hz/s4) = 1.8763e+14
,1.8763e+14009580> DOP_CENTROID_COEFFS_ADS.1.zero_doppler_time.mjd (DateTime) = 29-JUL-2010 00:08:33.998848

On the off chance it would make a difference, I added a newline at the top of my input file and it now works - what is all that about?!

29-JUL-2010 00:08:27.956552,5.45439e+06,-487.407,-672346,9.15808e+08,-6.75078e+11,1.8763e+14
29-JUL-2010 00:08:33.998848,5.45496e+06,-496.71,-663010,8.98857e+08,-6.64175e+11,1.8547e+14
29-JUL-2010 00:08:40.041145,5.45554e+06,-506.013,-653675,8.81907e+08,-6.53271e+11,1.8331e+14
29-JUL-2010 00:08:46.083440,5.45612e+06,-515.316,-644339,8.64956e+08,-6.42368e+11,1.8115e+14

Anyway, thank you both very much for your help! :-)

Emma
 
Old 11-24-2010, 10:07 AM   #12
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
The first line is skipped, because the code starts with a getline statement:
Code:
{

  while ( getline > 0 ) {
Keep in mind that awk reads the first line, stores the record in memory and splits it into fields, then begins to execute every block of code (called a "rule"). If the first statement in the first (or only) rule is getline, it reads a new record and the previous one is lost. This is why I put a printf statement before the while loop in my example.
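The effect is easy to reproduce on a three-line input. This is a minimal sketch with made-up data (`a`, `b`, `c`) rather than the real file. It also suggests why adding an empty line at the top of the input appeared to fix things: the empty line becomes the record that gets thrown away.

```shell
# The rule body starts with the first record ("a") already in $0;
# getline immediately replaces it, so "a" is never printed.
skipped=$(printf 'a\nb\nc\n' | awk '{ while ((getline) > 0) print }')

# Handling the current record before entering the loop keeps the first line.
kept=$(printf 'a\nb\nc\n' | awk '{ print; while ((getline) > 0) print }')

printf 'without handling the first record:\n%s\n' "$skipped"
printf 'with the first record handled:\n%s\n' "$kept"
```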

At this point I don't understand why the proposed solutions would not work out of the box. Mine works for any number of lines after the one containing the DateTime field; grail's solution works if the number of lines after DateTime is always 6. Am I missing something?
 
Old 11-24-2010, 10:39 AM   #13
emmalg
Member
 
Registered: Jun 2009
Location: Spain
Distribution: Various, Ubuntu, Fedora, Open Solaris, Solaris, RHEL, CentOS
Posts: 64

Original Poster
Rep: Reputation: 16
Thank you, it would have done, it was just my stupidity! I shall fix it now and thank you very much.
 
Old 11-24-2010, 10:51 AM   #14
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
You're welcome!
 
Old 11-24-2010, 09:15 PM   #15
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
Hey, glad to hear you got it working. I thought I would throw another alternative at you for future ideas:
Code:
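BEGIN{
    FS=" = "                 # fields are separated by " = "
    OFS=","
    RS="\\(DateTime\\)"      # a regex RS is not POSIX awk (gawk supports it)
}

NF > 1{
    # keep only the value at the start of each field, up to the first newline
    for( x = 2; x <= NF; x++ )
        gsub(/\n.*/,"",$x)

    # $1 is empty, so $0 begins with a comma; gensub (gawk only) removes it
    print gensub(/,/,"","1")
}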
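If gawk is not available, much the same effect can probably be had in plain awk by keeping the default record separator and only changing FS. With `-F' = '` the whole value, date and time together, lands in $2. This is a sketch rather than a drop-in replacement: the `row` variable is a name chosen for the example and the sample is abbreviated from the data in this thread.

```shell
sample='009525> DOP_CENTROID_COEFFS_ADS.0.zero_doppler_time.mjd (DateTime) = 29-JUL-2010 00:08:27.956552
009538> DOP_CENTROID_COEFFS_ADS.0.slant_range_time (ns) = 5.45439e+06
009542> DOP_CENTROID_COEFFS_ADS.0.dop_coef.0 (HzHz/sHz/s2Hz/s3Hz/s4) = -487.407
009580> DOP_CENTROID_COEFFS_ADS.1.zero_doppler_time.mjd (DateTime) = 29-JUL-2010 00:08:33.998848
009593> DOP_CENTROID_COEFFS_ADS.1.slant_range_time (ns) = 5.45496e+06'

csv=$(printf '%s\n' "$sample" | awk -F' = ' '
  /\(DateTime\)/ {
    if (row != "") print row       # flush the previous record
    row = $2                       # date and time arrive as one field
    next
  }
  row != "" { row = row "," $2 }   # append each following value
  END { if (row != "") print row } # flush the last record
')

printf '%s\n' "$csv"
```

Building each row in a variable and printing it in one go also means a half-finished line is never left on the output if the input stops early.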
 
  



Tags
awk, bash, getline, printf, reformatting

