[SOLVED] changing row data files into column data files
Linux - Newbie. This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's, this is the place!
I know how to "filter", like matching rows that contain "xxxxxxx",
but that would only get one column of data.
Code:
2.71
1.72
-1
what i really need to do is to magically transform (through the power of awk) the BLOCK OF ROW DATA into COLUMN DATA so i can use all the other awk tools I've written
i would really rather use a standard printf statement but after formatting the 2nd column things get really screwy. here's what a couple lines of a block really look like
so if i use print $0 everything comes out, and it's waaaay more than what i need
when i try to do a printf statement it's all goofed up because in the first row there are two elements in that cell, date and time, and the rest of the cells in each block are all of different data types.
it seems like the best thing to do is use a print $0 get it all out in a nice column format like it should be and then run a second awk that chops out the unwanted columns, but you guys probably know a better way?
thanks so much for your help!
tabby
Last edited by tabbygirl1990; 01-27-2014 at 07:09 PM.
Reason: clarification
Please remember to place your code in the code tags just like your data, so it is clearer and easier to read / follow.
If I understand the issue (and the above suggestion about showing your expected output data is always a good idea), you wish to test how many columns you need to output, but that this changes for some rows, i.e. you want the time
and date to be considered as one column. The answer is to test the NF variable, as it will indicate exactly how many columns you have.
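For instance, a minimal sketch of that NF test (the sample rows here are made up for illustration, not the real data):

```shell
# Print the field count for each non-blank line, so you can see which
# rows carry the extra date+time pair (hypothetical sample input).
printf '2014-01-27 19:05:00 2.71\n1.72\n' |
awk 'NF { print "fields:", NF, "->", $0 }'
```

Once you know NF for a row, you can branch on it and treat the date and time fields as a single logical column.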
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63
Original Poster
Rep:
thanks you sooooo much guys!!! you guys are awesome!!! both your scripts work perfectly, as always
i keep trying to learn this stuff. i know i'm getting better, still i'm a kindergartner in a land of giants at the school of hard knocks. ok i'll quit whining and put on my big girl panties
i feel like i should have been able to write grail's answer, but mine wasn't even close whooooops, whining again
i really like colucix's 1st script, because it gives lots of flexibility for other files that will probably get sent my way.
so i tried to put in a "filter" that i think should work, but it gave me back either just the header and no data, or all the data, not filtered. i understand why i get the header, but i don't understand why the data isn't filtered by 90701 in the first column???
here's the first part of colucix's script
Code:
BEGIN {
OFS = ","
}
!/^$/ {
v[$1,++c[$1]] = $(NF-1)
}
END {
print "STRING.B", "STRING.C", "STRING.D", "STRING.E"
for ( i = 1; i <= c["STRING.B"]; i++ )
here's what i tried
Code:
if($8 == 90701)
<AND I ALSO TRIED>
if("STRING.B" == 90701)
Well, your idea is good. Anyway, you don't want to print the whole record but only the field before the semicolon, that is $(NF-1). Then you have to ignore the blank lines and adjust the number of records in each section (that is 7 instead of 5):
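To illustrate the idea (the field names, the trailing semicolon, and the sample blocks below are assumptions pieced together from the thread, not the actual file):

```shell
# Turn blocks of "NAME ... value ;" rows into CSV columns, one column per
# NAME. $(NF-1) grabs the field just before the semicolon; blank lines
# between blocks are skipped. Sample data is hypothetical.
printf 'STRING.B x 2.71 ;\nSTRING.C y 1.72 ;\n\nSTRING.B x -1 ;\nSTRING.C y 0.5 ;\n' |
awk '
BEGIN { OFS = "," }
!/^$/ {                       # ignore the blank line between blocks
    v[$1, ++c[$1]] = $(NF-1)  # value is the field before the semicolon
}
END {
    print "STRING.B", "STRING.C"
    for (i = 1; i <= c["STRING.B"]; i++)
        print v["STRING.B", i], v["STRING.C", i]
}'
```

A row filter then belongs in the pattern block (e.g. test a field there before storing it), not inside END, which is why comparing the literal string "STRING.B" against 90701 can never match.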
it is a collection of python scripts
and to quote the README
Quote:
DATA PRODUCTIVITY TOOLKIT
Description
--------------------------------------------------------------------------------
The Data Productivity Toolkit is a collection of linux command-line tools
designed to facilitate the analysis of text-based data sets. Modeled after the
general linux pipeline tools such as awk, grep, and sed, the kit provides
powerful tools for selecting/combining data, performing statistics, and
visualizing results. The tools are all written in python and in many instances
provide a command-line API to basic python and numpy/scipy/matplotlib routines.
Prerequisites
--------------------------------------------------------------------------------
The Data Productivity Toolkit is written completely in python. It does,
however, require that the following third-party python modules be installed.
- numpy
- scipy
- matplotlib
- mpl-toolkits.basemap
- mpl_toolkits.natgrid
- jinja2
- django
Installation
--------------------------------------------------------------------------------
1) Copy all files into a directory.
2) Add that directory to your path.
3) In that directory, create a symbolic link with the name ppython. It should
point to the python install on your system that contains the modules listed
above. (Note: it is a good idea to use a python install created by the
utility virtualenv. This will allow good flexibility for maintaining a
version of python best suited to run the toolkit.) Note that the package
ships with a ppython symlink to /usr/bin/python.
4) Make sure your install of matplotlib is capable of sending plots to the
screen. You may have to set your matplotlib graphics back-end appropriately.
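A rough shell sketch of steps 1-3 above (the directory name is a hypothetical example; point the symlink at whichever python has the listed modules):

```shell
# Sketch of the toolkit install steps; ~/dptk is an example location.
mkdir -p "$HOME/dptk" && cd "$HOME/dptk"
# 1-2) copy the p.* tools into this directory, then put it on PATH
export PATH="$HOME/dptk:$PATH"
# 3) create the ppython symlink (ideally to a virtualenv's python)
ln -sf /usr/bin/python ppython
```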
List of tools (run with -h option for documentation)
--------------------------------------------------------------------------------
p.bar Creates bar charts
p.binit Assigns data to 2 dimensional bin structure
p.cat Rearranges columnar data into key,x,y format
p.catToTable Create a table from data in key,x,y format
p.cdf Plots the cumulative distribution
p.cl An awk-like math utility
p.color Makes color scatter plots
p.cumsum Computes the cumulative sum of inputs
p.datetime Converts text-based time stamps to seconds from an epoch
p.dedup Removes duplicate keys
p.distribute Distribute jobs across computers efficiently
p.exec Sequentially run commands read from stdin
p.gps2utc Convert gps time to utc time
p.grab Grab columns from a file with python-like indexing
p.grabHeader Extract the commented header from a file
p.groupStat Perform statistics over keyed subgroups of input
p.hist Plots a histogram
p.htmlWrap Create an html wrapper for images in a directory
p.interp Does polynomial interpolation
p.join Join two files on specified key columns
p.link Link to files based on specified key columns
p.linspace Generate a linear spaced sequence of numbers
p.map Plot points on a map
p.medianFilter Runs data through a median filter
p.minMax Find min/max values in specified data column
p.multiJoin Join multiple files together based on key
p.normalize Normalizes input data
p.parallel Run commands in parallel
p.parallelSSH Run commands in parallel across several machines
p.plot Plot points on a graph
p.quadAdd Add all columns from stdin in quadrature
p.quantiles Compute quantiles from input data
p.rand Generate a sequence of random numbers
p.rex Bring python rex to the command line
p.scat Make a scatter plot of input data
p.sed A sed-like utility with python syntax
p.shuffle Randomly shuffle rows of data
p.smooth Smooth data
p.sort Sort data based on specified keys
p.split Split data based on a supplied delimiter
p.strip Remove comments and/or nans from rows
p.tableFormat Nicely format input columns in a table format
p.template Bring jinja templates to the command line
p.utc2gps Convert utc time to gps time
p.utc2local Convert utc time to local time given a lon