LinuxQuestions.org
Old 09-15-2012, 12:27 PM   #1
pomico
LQ Newbie
 
Registered: Jun 2012
Posts: 16

Rep: Reputation: Disabled
awk question - read in txt files, offset data by given amount, output new txt files


Hi, I'm new to bash and I'm taking a 'learn by doing' approach, so apologies if the following script is rather messy. I posted on here a couple of days ago about a script to find the largest number in a set of text files, and I was referred to awk.
I now have a list of files, spec001.txt, spec002.txt, etc., each consisting of two columns of data, with the second column being of interest to me. My script finds the largest number in the second column across all these files, which I've called 'max'. I would like to offset the data in the second column of each text file by -(n-1)max, where n is the number in the file name.

So far, I have my portion of script to find the largest number,
Code:
echo "Finding maximum I/I_0..."
awk '                                                                   
BEGIN {max = 1}                                                                 
{if ($2>max) max=$2}                                                            
END {print max}' spec0*.txt
I have then tried to calculate the offset data, which is where I run into trouble. So far, I have
Code:
echo "Creating offset data files..."
        awk '
BEGIN {n=1; newI=0}                                                             
{newI=$2+(n-1)*max}                                                        
END {print "     " newI}' spec001.txt
for processing the first spec file. I need to increment n by 1 though, after reading through each 'spec' file, and change the file name accordingly. I was thinking of just using spec0*.txt since they should be done in alphabetical/numerical order.

Presumably I need an array in which to store the offset values from one 'spec' file, but first I am trying to figure out how to name my new offset data files. I would like to keep the same names as described above, except with an 'offset' prefix, i.e. offsetspec001.txt, etc. In bash, I would do this using
Code:
for file in spec0*; do
filename="`echo ${file} | cut -d. -f1`"
and name the output file offset${filename}.txt
However, from what I can see I'll need to do this in awk so everything gets done together, before moving onto the next spec file.
I have found an example on processing delimited files using awk, which gives awk -F':' '{ print $1 }' /etc/passwd as the example, but I'm unsure how to change this to meet my needs. I understand that in my case I'll need -F'.' '{ print $1 }' but I'm not sure how to apply it...
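One way the -F'.' idea might be applied to a file name, rather than to a file's contents, is to pipe the name into awk, or to strip the extension with sub(). A minimal sketch, assuming names of the form spec001.txt; the variable names file, stem, out and out2 are only for illustration:

```shell
#!/bin/bash
# Assumption: file names look like spec001.txt (one dot, before the extension).
file="spec001.txt"

# Split the name on "." and keep the first piece, as cut -d. -f1 would.
stem=$(echo "$file" | awk -F'.' '{ print $1 }')
out="offset${stem}.txt"

# The same result built entirely inside awk, by deleting the trailing ".txt".
out2=$(awk -v f="$file" 'BEGIN { sub(/\.txt$/, "", f); print "offset" f ".txt" }')

echo "$out"
echo "$out2"
```

Both variables should end up holding the same output name; the second form may be handier if the renaming has to happen inside a larger awk program.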

Also if anyone has any information to help with arrays, such as any good web pages with examples that might help, that would be much appreciated.
Apologies if any of this is unclear. To summarise, I have a number of text files, I would like to offset these by a certain number by doing a calculation using awk, and would like to save the output by cutting the original filenames.
Thank you for any help you can offer, it is appreciated

Last edited by pomico; 09-15-2012 at 12:41 PM.
 
Old 09-15-2012, 01:00 PM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

Rep: Reputation: 660
Quote:
Originally Posted by pomico View Post
... Thank you for any help you can offer, it is appreciated
Help us to help you.

You described your input files and desired output in words. Better yet, provide example input files and an example of the desired output file. That gives us a better understanding of your question and some real-world data to use if we come up with useful code.

Daniel B. Martin
 
Old 09-15-2012, 01:06 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

Rep: Reputation: 660
Quote:
Originally Posted by pomico View Post
... I would like to offset the data in the second column of each text file by -(n-1)max, where n is the number in the file name.
Depending on the values of n and max, -(n-1)max could be negative. I don't understand what is meant by a negative offset.

"Offset" is relative to something. What? Column 1?

Daniel B. Martin
 
Old 09-15-2012, 01:11 PM   #4
pomico
LQ Newbie
 
Registered: Jun 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
Of course
Part of spec001.txt is
Code:
  -247.474747474747        1.00000008900185     
  -237.373737373737        1.00003800378827     
  -227.272727272727        1.00924933884979     
  -217.171717171717        1.14643171128656     
  -207.070707070707        1.42297237329297     
  -196.969696969697        1.86530318211860     
  -186.868686868687        2.53444398143706     
  -176.767676767677        3.56322308437322     
  -166.666666666667        4.94624626704007     
  -156.565656565657        6.70217935260413     
  -146.464646464646        8.69047062574172     
  -136.363636363636        10.8910028211805     
  -126.262626262626        12.9247640694563     
  -116.161616161616        14.6242507602093
Part of spec002.txt is
Code:
  -247.474747474747        1.00118081627977     
  -237.373737373737        1.00086584950188     
  -227.272727272727        1.00459210256576     
  -217.171717171717        1.11994731593279     
  -207.070707070707        1.38989998877390     
  -196.969696969697        1.81656108743561     
  -186.868686868687        2.47390947169402     
  -176.767676767677        3.47519327486751     
  -166.666666666667        4.87114758578318
  -156.565656565657        6.62087512538831     
  -146.464646464646        8.65638440573515     
  -136.363636363636        10.8249082369995     
  -126.262626262626        12.9564606566660     
  -116.161616161616        14.6793540160340
and spec003.txt
Code:
  -247.474747474747        1.00025716272262     
  -237.373737373737        1.00057455937513     
  -227.272727272727        1.00084089388210     
  -217.171717171717        1.03689614823861     
  -207.070707070707        1.28630342167525     
  -196.969696969697        1.70214771440898     
  -186.868686868687        2.32464289128508     
  -176.767676767677        3.30119822307261     
  -166.666666666667        4.71667934842233     
  -156.565656565657        6.55964464274313     
  -146.464646464646        8.70558960286527     
  -136.363636363636        11.0143788885279     
  -126.262626262626        13.3092101294597     
  -116.161616161616        15.1377605536371
From looking at just these, the value of 'max' is 15.1377605536371.
So, I would like spec001.txt to remain the same, spec002.txt to have 15.1377605536371 subtracted from each of the values in the second column, and spec003.txt to have 2*15.1377605536371 subtracted from each value in the second column.
This is for plotting purposes, to have a number of data sets plotted on a grid with each offset by a certain amount to aid in comparisons between them.
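As a worked illustration of the arithmetic described above, the following sketch applies -(n-1)*max to the first column-two value of each sample file. The values are copied from the excerpts in this post; the array name v is just for illustration.

```shell
#!/bin/bash
# max as read off the three excerpts above.
max=15.1377605536371

result=$(awk -v max="$max" 'BEGIN {
    # First column-2 value of spec001, spec002 and spec003 respectively.
    v[1] = 1.00000008900185
    v[2] = 1.00118081627977
    v[3] = 1.00025716272262
    # n is the number in the file name; spec001 (n = 1) is left unchanged.
    for (n = 1; n <= 3; n++)
        printf "spec%03d: %.14f -> %.14f\n", n, v[n], v[n] - (n - 1) * max
}')
echo "$result"
```

The first line should show the spec001 value unchanged, while the spec002 and spec003 values come out negative, shifted down by max and 2*max.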

Thank you
 
Old 09-15-2012, 01:34 PM   #5
pomico
LQ Newbie
 
Registered: Jun 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
The 'offset' is relative to column two - hopefully my example data sets help. Yes, the offset will be negative.
Quote:
Originally Posted by danielbmartin View Post
Depending on the values of n and max, -(n-1)max could be negative. I don't understand what is meant by a negative offset.

"Offset" is relative to something. What? Column 1?

Daniel B. Martin
 
Old 09-15-2012, 01:44 PM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

Rep: Reputation: 660
Quote:
Originally Posted by pomico View Post
Yes, the offset will be negative.
I misunderstood what you meant by "offset." I thought it referred to the print position of a numerical value. Something like this:
Code:
1   2
3         4
Anyway, that clears the air.

I'll be able to look at this on Monday. Maybe somebody else will solve it before then.

Daniel B. Martin
 
Old 09-15-2012, 02:20 PM   #7
pomico
LQ Newbie
 
Registered: Jun 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin View Post
I misunderstood what you meant by "offset." I thought it referred to the print position of a numerical value. Something like this:
Code:
1   2
3         4
Anyway, that clears the air.

I'll be able to look at this on Monday. Maybe somebody else will solve it before then.

Daniel B. Martin
Aah I see, apologies for that. Thanks, I'll have another look tomorrow and will post if I get any closer
 
Old 09-15-2012, 02:28 PM   #8
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,692

Rep: Reputation: 7274
That looks as if they were sorted. In that case you only need to take the last value of each file to find max.
awk knows the current file name, so you do not need to put it in a for loop (the name of the variable is FILENAME).
I do not know if awk can handle 16-digit floats, but it probably can.
So you need something like this.
The first part is OK; it finds max:
Code:
echo "Finding maximum I/I_0..."
max=$(awk '                                                                   
BEGIN {max = 1}                                                                 
{if ($2>max) max=$2}                                                            
END {print max}' spec0*.txt)
This should work, but there is probably a faster solution.
Next, construct an awk command to do the job:
Code:
awk -v max=$max '   # here we set the max value from the script
{
    match (FILENAME, "[0-9]+", a)   # a[0] will contain the extracted numbers
    $2 -= (a[0] - 1) * max          # from FILENAME
    print > FILENAME ".new"
}' spec*.txt
This will generate new files with the extension .new.

I did not test it, but it probably works as you wish.
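On the 16-digit question: awk does its arithmetic in double precision, but when a field is assigned a number and the record is rebuilt, the value is converted back to text using CONVFMT, which defaults to "%.6g". A small sketch (the sample line is taken from the spec002.txt excerpt, and max from the figure quoted above) suggests that the "$2 -= ...; print" form keeps only about six significant digits unless CONVFMT is widened or printf is used:

```shell
#!/bin/bash
line="-247.474747474747 1.00118081627977"
max=15.1377605536371

# Default CONVFMT ("%.6g"): the modified field is cut to about 6 significant digits.
short=$(echo "$line" | awk -v max="$max" '{ $2 -= max; print }')

# Widening CONVFMT keeps more digits when the field is turned back into text.
long=$(echo "$line" | awk -v max="$max" -v CONVFMT="%.15g" '{ $2 -= max; print }')

echo "$short"
echo "$long"
```

Column one is untouched in both cases because it is never assigned to; only the recomputed field goes through the conversion.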
 
Old 09-15-2012, 03:00 PM   #9
pomico
LQ Newbie
 
Registered: Jun 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
That looks as if they were sorted. In that case you only need to take the last value of each file to find max.
The example data sets I posted are only a section of each data set. There are 100 values in each, which increase from 1 to max and then decrease again later on.

Thank you for your suggestion, I'll have a look
 
Old 09-15-2012, 03:48 PM   #10
pomico
LQ Newbie
 
Registered: Jun 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
Code:
awk -v max=$max '   # here we set the max value from the script
{
    match (FILENAME, "[0-9]+", a)   # a[0] will contain the extracted numbers
    $2 -= (a[0] - 1) * max          # from FILENAME
    print > FILENAME ".new"
}' spec*.txt
I get an error:
awk: syntax error at source line 3
context is
match (FILENAME, >>> "[0-9]+", <<< "

although I did read in one place that the 'match' function is only available with nawk. Is this the case? (If it is, then I still can't get it to work, but I don't get the same error as above.)
After a quick read around, I see the syntax is match(string, expression, array). Does "[0-9]+" tell the function that we're looking for a string that starts with 0, then 1, then 2, etc., or have I misunderstood what "[0-9]+" is doing?
 
Old 09-15-2012, 04:33 PM   #11
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,692

Rep: Reputation: 7274
[0-9] matches any single character between 0 and 9, i.e. one of the digits 0, 1, 2, 3, ... 8, 9.
+ means one or more repetitions of the preceding item, so "[0-9]+" matches a run of one or more digits.
My awk supports it; you could try gawk, or at least tell us the version: awk --version
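For awk versions that reject an array as the third argument of match() (that form is a gawk extension), the two-argument match(string, regex) sets the built-in variables RSTART and RLENGTH, and substr() can then pull out the digits. A minimal sketch, assuming a file name such as spec002.txt passed in through a hypothetical variable f:

```shell
#!/bin/bash
num=$(awk -v f="spec002.txt" 'BEGIN {
    # match() returns the position of the first run of digits (0 if there is none)
    # and sets RSTART and RLENGTH as a side effect.
    if (match(f, /[0-9]+/))
        print substr(f, RSTART, RLENGTH) + 0   # "+ 0" turns the text "002" into the number 2
}')
echo "$num"
```

Inside a real script, FILENAME would take the place of f.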
 
Old 09-15-2012, 05:07 PM   #12
pomico
LQ Newbie
 
Registered: Jun 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
Ah I see, thanks.
Yes, my awk knows it too, it somehow propagated through from an earlier error on my part

I'm getting a different error now though, "awk: syntax error at source line 3
context is
match (FILENAME, >>> "[0-9]+", <<<"

awk version is 20070501

This is my script, in full
Code:
#!/bin/bash                                                                                  

#Script used for creating datasets for 'offset' plots.                                       
echo "Finding maximum I/I_0..."
        max=$(awk '                                                                          
BEGIN {max = 1}                                                                              
{if ($2>max) max=$2}                                                                         
END {print max}' spec0*.txt)

        echo "Creating offset data files..."                                                
awk -v max=$max '                                                                            
{                                                                                            
    match (FILENAME, "[0-9]+", a)                                                            
    $2 -= (a[0] - 1) * max                                                                   
    print > FILENAME ".new"                                                                  
}' spec0*.txt
I didn't realise FILENAME was inbuilt. Handy
 
Old 09-16-2012, 01:27 AM   #13
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,692

Rep: Reputation: 7274
You can try gawk instead of awk; it will probably work. You can also try "[0-9]*" or "[0-9][0-9]*" instead of "[0-9]+"; one of them will probably work too.
I got the following:
Quote:
awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.
I do not really know that version (20070501). It is probably an old one...
 
Old 09-16-2012, 06:59 AM   #14
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Maybe the following works even with not-GNU versions of AWK:
Code:
#!/bin/bash
#
max=$(awk 'BEGIN{max = -9999} $2 > max{max = $2} END{print max}' spec0*)

awk -v max=$max 'FNR == 1{n++} {printf "%19.12f %23.14f\n", $1, $2-(n-1)*max > "offset_" FILENAME }' spec0*
Looking at the man page on-line, it should work even on MacOS X. Looking at the icon in the lower-left corner of your posts it looks like you're running on this operating system, aren't you?
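The FNR == 1 {n++} pattern in the code above relies on FNR restarting at 1 for each input file, so n counts files in the order they appear on the command line. A small sketch with throwaway files in a temporary directory (the file contents are made up for illustration):

```shell
#!/bin/bash
tmp=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmp/spec001.txt"
printf 'c 3\nd 4\n' > "$tmp/spec002.txt"

# n is bumped once per file, on that file's first line.
# Each output line shows: file counter, line number within the file, column 2.
counts=$(awk 'FNR == 1 { n++ } { print n, FNR, $2 }' "$tmp"/spec0*.txt)
echo "$counts"

rm -rf "$tmp"
```

One caveat with this approach: an empty file has no first line, so it would not advance n, and the counter would then drift away from the number in the file name.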
 
Old 09-16-2012, 08:37 AM   #15
pomico
LQ Newbie
 
Registered: Jun 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
you can try gawk instead of awk, probably works. Also you can try "[0-9]*" or "[0-9][0-9]*", instead of "[0-9]+"
Unfortunately the alternatives to "[0-9]+" didn't work either.
I don't have gawk installed, but will look into this if another fix can't be found. Thanks for the suggestions.

Quote:
Originally Posted by colucix View Post
Maybe the following works even with not-GNU versions of AWK:
Code:
#!/bin/bash
#
max=$(awk 'BEGIN{max = -9999} $2 > max{max = $2} END{print max}' spec0*)

awk -v max=$max 'FNR == 1{n++} {printf "%19.12f %23.14f\n", $1, $2-(n-1)*max > "offset_" FILENAME }' spec0*
Looking at the man page on-line, it should work even on MacOS X. Looking at the icon in the lower-left corner of your posts it looks like you're running on this operating system, aren't you?
Yes I'm on Mac OS X
awk seems to be complaining about using FILENAME in the above, the old "illegal statement" error.
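A possible cause of the "illegal statement" error: some awk implementations do not accept an unparenthesized concatenation such as "offset_" FILENAME as the target of >. Building the output name in a variable first (or wrapping the concatenation in parentheses) is the more portable form. A sketch of that variant, not tested on Mac OS X, using made-up two-line sample files in a temporary directory; the variable name out is only for illustration:

```shell
#!/bin/bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1          # work inside the directory so FILENAME has no path part
printf ' -247.47  1.0\n -237.37  3.0\n' > spec001.txt
printf ' -247.47  1.5\n -237.37  2.0\n' > spec002.txt

# Largest value in column 2 across all files (same logic as in the thread).
max=$(awk 'BEGIN { max = -9999 } $2 > max { max = $2 } END { print max }' spec0*.txt)

awk -v max="$max" '
FNR == 1 { n++; out = "offset_" FILENAME }    # new output name for each input file
{ printf "%19.12f %23.14f\n", $1, $2 - (n - 1) * max > out }
' spec0*.txt

result=$(cat offset_spec002.txt)
echo "$result"
```

With these sample files max is 3.0, so offset_spec001.txt should repeat the original values and offset_spec002.txt should hold them shifted down by 3.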
 
  


All times are GMT -5. The time now is 08:38 PM.
