LinuxQuestions.org
Review your favorite Linux distribution.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 01-27-2014, 12:52 PM   #1
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Rep: Reputation: 1
changing row data files into column data files


hi guys (again)

with your guys help i've built a bunch of "tools" that can "do stuff" to files that have data listed in columns like this:
Code:
COLUMN DATA
   abcdefg  1_2_3_4  xxxxxxx  tuvwxyz
    3.14      42       2.71      0
    4.13      24       1.72      0
     -1       0        -1        0
but now i've got a data file that is arranged by a "block of rows"
BLOCK OF ROW DATA
Code:
<blank line>
abcdefg := 3.14
1_2_3_4 := 42
xxxxxxx := 2.71
tuvwxyz := 0
<blank line>
abcdefg := 41.3
1_2_3_4 := 24
xxxxxxx := 1.72
tuvwxyz := 0
<blank line>
abcdefg := -1
1_2_3_4 := 0
xxxxxxx := -1
tuvwxyz := 0
<blank line>
I know how to "filter" like on rows that have "xxxxxxx"
but that would only get 1 column of data.

Code:
2.71
1.72
-1
what i really need to do is to magically transform (through the power of awk) the BLOCK OF ROW DATA into COLUMN DATA so i can use all the other awk tools I've written

Code:
   abcdefg 1_2_3_4 xxxxxxx tuvwxyz
    3.14     42      2.71     0
    4.13     24      1.72     0
     -1      0       -1       0
it's kinda like a giant matrix transform of n by 2 -> 2 by n

also, except for the first block, each line (like "abcdefg") will repeat itself every 5 lines from top to bottom in each block
thanks guys!

tabby

Last edited by tabbygirl1990; 01-27-2014 at 04:03 PM. Reason: clarification
 
Old 01-27-2014, 01:10 PM   #2
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
oooh, i got an idea to google on "awk matrix transform" ....
 
Old 01-27-2014, 03:22 PM   #3
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
so with a little help from google this is what i came up with
Code:
BEGIN {
       FS = " "
      }

      {
       ORS = (NR%5?",":RS)
       print $0;
      }

END{}
i would really rather use a standard printf statement but after formatting the 2nd column things get really screwy. here's what a couple lines of a block really look like

Code:
<blank line>
STRING.A  :=  21-MAR-2013 06:00:50.901  ;
STRING.B  :=  90701  ;
STRING.C  :=  1.59573857100974E+08  ;
STRING.D  :=  NEGATIVE  ;
<blank line>
so if i use print $0 everything comes out , and it's waaaay more than what i need
when i try to do a printf statement it's all goofed up because in the first row there's two elemnts in that cell, date and time, and the rest of the cells in each block are all of different data types.

it seems like the best thing to do is use a print $0 get it all out in a nice column format like it should be and then run a second awk that chops out the unwanted columns, but you guys probably know a better way?

thanks so much for your help!

tabby

Last edited by tabbygirl1990; 01-27-2014 at 07:09 PM. Reason: clarification
 
Old 01-27-2014, 04:27 PM   #4
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492Reputation: 492Reputation: 492Reputation: 492Reputation: 492
So, which columns are unwanted ? Can you post what the output should look like ? I know it looks like that right now, and it is close.
 
Old 01-27-2014, 05:46 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190
Please remember to place your code in the code tags just like your data, so it is clearer and easier to read / follow.

If I understand the issue (and aboves suggestion about output data is always a good idea), you wish to test how many columns you need to output but that it changes for some, ie you want time
and date to be considered as one column. The answer is to test the NF variable as this will indicate just how many columns you have.
 
Old 01-27-2014, 08:27 PM   #6
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
from this
Code:
<blank line>
STRING.A  :=  21-MAR-2013 06:00:51.901  ;
STRING.B  :=  90701  ;
STRING.C  :=  6.57648100001238E+08  ;
STRING.D  :=  NEGATIVE  ;
STRING.E  :=  22  ;
STRING.F  :=  1000001 ;
<blank line>
STRING.A  :=  21-MAR-2013 06:00:52.801  ;
STRING.B  :=  90555  ;
STRING.C  :=  3.76581234932011E+08  ;
STRING.D  :=  NEGATIVE  ;
STRING.E  :=  23  ;
STRING.F  :=  1000001 ;
<blank line>
STRING.A  :=  21-MAR-2013 06:00:53.701  ;
STRING.B  :=  90703  ;
STRING.C  :=  1.59573857100974E+08  ;
STRING.D  :=  NEGATIVE  ;
STRING.E  :=  24  ;
STRING.F  :=  1000001 ;
<blank line>
STRING.A  :=  21-MAR-2013 06:00:54.601  ;
STRING.B  :=  90703  ;
STRING.C  :=  0.00000000000000E+00  ;
STRING.D  :=  COMPLETED  ;
STRING.E  :=  25  ;
STRING.F  :=  1000001 ;
<blank line>
STRING.A  :=  21-MAR-2013 06:00:55.501  ;
STRING.B  :=  90703  ;
STRING.C  :=  0.00000000000000E+00  ;
STRING.D  :=  COMPLETED  ;
STRING.E  :=  26  ;
STRING.F  :=  1000001 ;
<blank line>
STRING.A  :=  21-MAR-2013 06:00:56.401  ;
STRING.B  :=  90703  ;
STRING.C  :=  1.43879002153652E+08  ;
STRING.D  :=  NEGATIVE  ;
STRING.E  :=  27  ;
STRING.F  :=  1000003 ;
<blank line>
STRING.A  :=  21-MAR-2013 06:00:57.301  ;
STRING.B  :=  90555  ;
STRING.C  :=  2.95467710114362E+08  ;
STRING.D  :=  NEGATIVE  ;
STRING.E  :=  28  ;
STRING.F  :=  1000003 ;
<blank line>
STRING.A  :=  21-MAR-2013 06:00:58.201  ;
STRING.B  :=  90555  ;
STRING.C  :=  8.99106598812167E+08  ;
STRING.D  :=  NEGATIVE  ;
STRING.E  :=  29  ;
STRING.F  :=  1000003 ;
<blank line>
STRING.A  :=  21-MAR-2013 06:00:59.101  ;
STRING.B  :=  90703  ;
STRING.C  :=  0.00000000000000E+00  ;
STRING.D  :=  COMPLETED  ;
STRING.E  :=  30  ;
STRING.F  :=  1000003 ;
<blank line>
STRING.A  :=  21-MAR-2013 06:00:60.001  ;
STRING.B  :=  90701  ;
STRING.C  :=  3.32657821783015E+08  ;
STRING.D  :=  NEGATIVE  ;
STRING.E  :=  31  ;
STRING.F  :=  1000004 ;
<blank line>
to this
Code:
STRING.B, STRING.C,STRING.D,STRING.E
90701,6.57648100001238E+08,NEGATIVE,22
90555,3.76581234932011E+08,NEGATIVE,23
90703,1.59573857100974E+08,NEGATIVE,24
90703,0.00000000000000E+00,COMPLETED,25
90703,0.00000000000000E+00,COMPLETED,26
90703,1.43879002153652E+08,NEGATIVE,27
90555,2.95467710114362E+08,NEGATIVE,28
90555,8.99106598812167E+08,NEGATIVE,29
90703,0.00000000000000E+00,COMPLETED,30
90701,3.32657821783015E+08,NEGATIVE,31
 
Old 01-28-2014, 02:16 AM   #7
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983
This one assumes you know in advance what are the strings in the first field and which ones you want to print out (from STRING.B to STRING.E):
Code:
BEGIN {

  OFS = ","
  
}

!/^$/ {

    v[$1,++c[$1]] = $(NF-1)
  
}

END {

  print "STRING.B", "STRING.C", "STRING.D", "STRING.E"
  
  for ( i = 1; i <= c["STRING.B"]; i++ )
    print v["STRING.B",i], v["STRING.C",i], v["STRING.D",i], v["STRING.E",i]

}
Hope this helps.

Edit: here is a more general case, that is independent from the value of the first field, but it prints out all the columns from STRING.A to STRING.F:

Code:
!/^$/ {
    
    h[$1] ? 0 : s[++n] = $1
    
    h[$1]++

    gsub(/  +/," ")
    v[$1,++c[$1]] = gensub(/^.* := (.*) ;$/,"\\1","g")
  
}

END {

  printf "%s", s[1]
  for ( j = 2; j <= n; j++ )
    printf ",%s", s[j]
  print ""
  
  for ( i = 1; i <= c[s[1]]; i++ ) {
    printf "%s", v[s[1],i]
    for ( j = 2; j <= n; j++ )
      printf ",%s", v[s[j],i]
    print ""
  }

}

Last edited by colucix; 01-28-2014 at 03:32 AM.
 
1 members found this post helpful.
Old 01-28-2014, 07:33 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190Reputation: 3190
I am with colucix on the fact it will depend how much you know about the data.
Here is another alternative:
Code:
awk 'NR==1{print $6,$10,$14,$18}{print $8,$12,$16,$20}' RS="" OFS="," file
 
1 members found this post helpful.
Old 01-28-2014, 10:15 AM   #9
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
thanks you sooooo much guys!!! you guys are awesome!!! both your scripts work perfectly, as always

i keep trying to learn this stuff. i know i'm getting better, still i'm a kindergartner in a land of giants at the school of hard knocks. ok i'll quit whining and put on my big girl panties

i feel like i should have been able to write grail's answer, but mine wasn't even close whooooops, whining again

i really like colucix's 1st script, cause he gives lots o flexibility for other files i'll probably get sent my way.

anywho

thanks again!

tabby
 
Old 01-28-2014, 10:58 AM   #10
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
so can one of you guys show me how i could have fixed this (my post #3) to get what i needed, or is there not really a simple fix?


Code:
BEGIN {
       FS = " "
      }

      {
       ORS = (NR%5?",":RS)
       print $0;
      }

END{}
 
Old 01-28-2014, 01:02 PM   #11
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
so i tried to put in a "filter" that i think should work, it gave me back just the header and no data or all the data, not filtered. i understand why i get the header, but i don't understand why the data filtered by 90701 in the first column???

here's the first part of colucix script
Code:
BEGIN {
      OFS = ","  
       }

!/^$/ {    
      v[$1,++c[$1]] = $(NF-1)  
      }

END   {

      print "STRING.B", "STRING.C", "STRING.D", "STRING.E"
  
      for ( i = 1; i <= c["STRING.B"]; i++ )
here's what i tried
Code:
      if($8 == 90701)

      <AND I ALSO TRIED>

      if("STRING.B" == 90701)
here's the last part of colocuix script
Code:
      print v["STRING.B",i], v["STRING.C",i], v["STRING.D",i], v["STRING.E",i]
}
so the output should be
Code:
STRING.B, STRING.C,STRING.D,STRING.E
90701,6.57648100001238E+08,NEGATIVE,22
90701,3.32657821783015E+08,NEGATIVE,31
 
Old 01-28-2014, 01:08 PM   #12
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983
The array v contains the values, therefore the logical expression should be:
Code:
if (v["STRING.B",i] == 90701)
 
1 members found this post helpful.
Old 01-28-2014, 01:16 PM   #13
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983
Quote:
Originally Posted by tabbygirl1990 View Post
so can one of you guys show me how i could have fixed this (my post #3) to get what i needed, or is there not really a simple fix?


Code:
BEGIN {
       FS = " "
      }

      {
       ORS = (NR%5?",":RS)
       print $0;
      }

END{}
Well, your idea is good. Anyway, you don't want to print the whole record but only the field before the semicolon, that is $(NF-1). Then you have to ignore the blank lines and adjust the number of records in each section (that is 7 instead of 5):
Code:
awk '!/^$/{ORS = (NR%7?",":RS); print $(NF-1)}' file
It works, except that it doesn't print out the header and it prints only the last part of the date field (the value of STRING.A).

Last edited by colucix; 01-28-2014 at 01:18 PM.
 
1 members found this post helpful.
Old 01-28-2014, 01:33 PM   #14
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
super!

probably the best way to learn it, i'll study this some more. i have O'Riely's book on AWK & SED but i sure wish there was a course for it!

thanks soooo much

tabby
 
Old 01-28-2014, 02:14 PM   #15
John VV
LQ Muse
 
Registered: Aug 2005
Location: A2 area Mi.
Posts: 17,622

Rep: Reputation: 2651Reputation: 2651Reputation: 2651Reputation: 2651Reputation: 2651Reputation: 2651Reputation: 2651Reputation: 2651Reputation: 2651Reputation: 2651Reputation: 2651
you might want to have a look at this NASA/ JPL code
"DataProductivityToolKit"
http://code.nasa.gov/project/data-productivity-toolkit/

it is a collection of python scripts
and to quote the README
Quote:

DATA PRODUCTIVITY TOOLKIT

Description
--------------------------------------------------------------------------------
The Data Productivity Toolkit is a collection of linux command-line tools
designed to facilitate the analysis of text-based data sets. Modeled after the
general linux pipeline tools such as awk, grep, and sed, the kit provides
powerfull tools for selecting/combining data, performing statistics, and
visualizing results. The tools are all written in python and in many instances
provide a command-line API to basic python and numpy/scipy/matplotlib routines.

Prerequisites
--------------------------------------------------------------------------------
The Data Productivity Toolkit is written completely in python. It does,
however, require that the following third-party python modules be installed.
- numpy
- scipy
- matplotlib
- mpl-toolkits.basemap
- mpl_toolkits.natgrid
- jinja2
- django

Installation
--------------------------------------------------------------------------------
1) Copy all files into a directory.
2) Add that directory to your path.
3) In that directory, create a symbolic link with the name ppython. It should
point to the python install on your system that contains the modules listed
above. (Note: it is a good idea to use a python install created by the
utitity virtualenv. This will allow good flexibility for maintaining a
version of python best suited to run the toolkit. Note that the package
ships with a ppython symlink to /usr/bin/python.
4) Make sure your install of matplotlib is capable of sending plots to the
screen. You may have to set your matplotlib graphics back-end appropriately.


List of tools (run with -h option for documentaion)
--------------------------------------------------------------------------------
p.bar Creates bar charts
p.binit Assigns data to 2 dimensional bin structure
p.cat Rearrages columnar data into key,x,y format
p.catToTable Create a table from data in key,x,y format
p.cdf Plots the cumulative distribution
p.cl An awk-like math utility
p.color Makes color scatter plots
p.cumsum Computes the cumulative sum of inputs
p.datetime Converts text-based time stamps to seconds from an epoch
p.dedup Removes duplicate keys
p.distribute Distribute jobs across computers efficiently
p.exec Sequentially run commands read from stdin
p.gps2utc Convert gps time to utc time
p.grab Grab columns from a file with python-like indexing
p.grabHeader Extract the commented header from a file
p.groupStat Perform statistics over keyed subgroups of input
p.hist Plots a histogram
p.htmlWrap Create an html wrapper for images in a directory
p.interp Does polynomial interpolation
p.join Join two files on specified key columns
p.link Link to files based on specified key columns
p.linspace Generate a linear spaced sequence of numbers
p.map Plot points on a map
p.medianFilter Runs data through a median filter
p.minMax Find min/max values in specified data column
p.multiJoin Join multiple files together based on key
p.normalize Normalizes input data
p.parallel Run commands in parallel
p.parallelSSH Run commands in parallel across several machines
p.plot Plot points on a graph
p.quadAdd Add all columns from stdin in quadrature
p.quantiles Compute quantiles from input data
p.rand Generate a sequence of random numbers
p.rex Bring python rex to the command line
p.scat Make a scatter plot of input data
p.sed A sed-like utility with python syntax
p.shuffle Randomly shuffle rows of data
p.smooth Smooth data
p.sort Sort data based on specified keys
p.split Split data based on a supplied delimeter
p.strip Remove comments and/or nans from rows
p.tableFormat Nicely format input columns in a table format
p.template Bring jinja templates to the command line
p.utc2gps Convert utc time to gps time
p.utc2local Convert utc time to local time given a lon
 
1 members found this post helpful.
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] reading column data from multiple files and find average vjramana Programming 4 11-07-2012 10:53 AM
[SOLVED] Match codes in first column in 2 files and return data in other columns cgcamal Programming 12 09-30-2011 01:03 AM
PHP: storing all data form a db row in variables with the same name as the db column konqi Programming 2 07-10-2008 05:13 AM
how to get the second column and first row data from a file??? loplayers Linux - Newbie 3 11-05-2007 07:35 PM
changing data or files into .iso!! justsimran Linux - Networking 4 03-07-2007 12:38 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

All times are GMT -5. The time now is 02:56 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration