[SOLVED] awk question - read in txt files, offset data by given amount, output new txt files
Hi, I'm new to bash and I'm taking a 'learn by doing' approach, so apologies if the following script is rather messy. I posted on here a couple of days ago about a script to find the largest number in a set of text files, and I was referred to awk.
I now have a list of files, spec001.txt, spec002.txt, etc., which consist of two columns of data, with the second column being of interest to me. I have my script to find the largest number in the second column amongst these files, which I've called 'max'. I would like to offset the data in the second column of each text file by -(n-1)*max, where n is the number in the file name.
So far, I have my portion of script to find the largest number,
Code:
echo "Finding maximum I/I_0..."
awk '
BEGIN {max = 1}
{if ($2>max) max=$2}
END {print max}' spec0*.txt
I have then tried to calculate the offset data, which is where I run into trouble. So far, I have
Code:
echo "Creating offset data files..."
awk '
BEGIN {n=1; newI=0}
{newI=$2+(n-1)*max}
END {print " " newI}' spec001.txt
for processing the first spec file. I need to increment n by 1 though, after reading through each 'spec' file, and change the file name accordingly. I was thinking of just using spec0*.txt since they should be done in alphabetical/numerical order.
Presumably I need an array in which to store the offset values from one 'spec' file, but first I am trying to figure out how to name my new offset data files. I would like to keep the same names as described above, except with an 'offset' prefix, i.e. offsetspec001.txt, etc. In bash, I would do this using
Code:
for file in spec0*; do
filename="`echo ${file} | cut -d. -f1`"
and name the output file offset${filename}.txt
However, from what I can see I'll need to do this in awk so everything gets done together, before moving onto the next spec file.
I have found an example on processing delimited files using awk, which gives awk -F':' '{ print $1 }' /etc/passwd as the example, but I'm unsure how to change this to meet my needs. I understand that in my case I'll need -F'.' '{ print $1 }' but I'm not sure how to apply it...
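As a rough sketch of the naming step asked about above, here are three ways to get from spec001.txt to offsetspec001.txt. The variable names (file, base, name_shell, name_cut, name_awk, f, part) are only illustrative and do not come from the original script.

```shell
file="spec001.txt"

# 1. Shell parameter expansion: strip the .txt suffix, then add the prefix
name_shell="offset${file%.txt}.txt"

# 2. The -F approach from the question: split on "." and keep field 1
base=$(echo "$file" | awk -F'.' '{ print $1 }')
name_cut="offset${base}.txt"

# 3. Entirely inside awk, where FILENAME would take the place of f
name_awk=$(awk -v f="$file" 'BEGIN { split(f, part, "."); print "offset" part[1] ".txt" }')

echo "$name_shell $name_cut $name_awk"
```

Since the extension stays the same, simply building the name as "offset" followed by the original file name would also do.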
Also if anyone has any information to help with arrays, such as any good web pages with examples that might help, that would be much appreciated.
Apologies if any of this is unclear. To summarise, I have a number of text files, I would like to offset these by a certain number by doing a calculation using awk, and would like to save the output by cutting the original filenames.
Thank you for any help you can offer, it is appreciated
Help us to help you.
You described your input files and desired output in words. Better yet, provide example input files and an example of the desired output file. That gives us a better understanding of your question and some real-world data to use if we come up with useful code.
From looking at just these, the value of 'max' is 15.1377605536371.
So, I would like spec001.txt to remain the same, spec002.txt to have 15.1377605536371 subtracted from each of the values in the second column, and spec003.txt to have 2*15.1377605536371 subtracted from each value in the second column.
This is for plotting purposes, to have a number of data sets plotted on a grid with each offset by a certain amount to aid in comparisons between them.
That looks as if they were sorted. In that case you only need to take the last value of each file to find max.
awk knows the current filename, so you do not need to put it in a for loop (the name of the variable is FILENAME).
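To illustrate how awk can tell where one file ends and the next begins, here is a minimal sketch. It relies on FNR, the per-file record counter, which restarts at 1 for every input file. The sample data and the scratch directory are made up for the demonstration.

```shell
# Hypothetical sample data in a scratch directory (values are made up)
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '1 1.0\n2 3.0\n' > spec001.txt
printf '1 2.0\n2 5.0\n' > spec002.txt

# FNR restarts at 1 for every input file, so FNR == 1 marks a new file;
# n counts the files and FILENAME names the one being read
summary=$(awk 'FNR == 1 { n++; print n, FILENAME }' spec0*.txt)
echo "$summary"
```

One caveat: an empty file has no records, so it would not be counted this way.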
I do not know if awk can handle 16 digit floats, but probably it can.
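A small sketch on the precision question: awk does its arithmetic in double precision, but a number that is converted back to text is formatted with %.6g by default, so digits can be lost on output. The input line and the %.13f format below are assumptions chosen for the example.

```shell
line="1 15.1377605536371"

# Assigning to field 2 rebuilds the record; the new number is converted
# to text with the default %.6g format, so most of the digits are dropped.
short=$(echo "$line" | awk '{ $2 -= 1; print }')

# printf with an explicit format keeps the digits that are wanted.
full=$(echo "$line" | awk '{ printf "%s %.13f\n", $1, $2 - 1 }')

echo "default: $short"
echo "printf:  $full"
```

Setting CONVFMT and OFMT with -v is another way to change the default format.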
So you need something like this:
The first part is OK; just capture the result in a shell variable so max can be reused.
Code:
echo "Finding maximum I/I_0..."
max=$(awk '
BEGIN {max = 1}
{if ($2>max) max=$2}
END {print max}' spec0*.txt)
this should work, but probably there is a faster solution
next, construct an awk to do the job:
Code:
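awk -v max="$max" '                   # pass the shell value of max into awk
{
    match(FILENAME, "[0-9]+", a)       # 3-argument match is gawk only: a[0] holds the digits from FILENAME
    $2 -= (a[0] - 1) * max             # subtract (n-1)*max from the second column
    print > (FILENAME ".new")          # parentheses keep the redirection target unambiguous
}' spec*.txt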
this will generate new files with the extension .new
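For awk versions without the three-argument match (it is a gawk extension), the same idea can be sketched with only POSIX features. The sample data, the variable names off and out, and the %.10g output format are assumptions for this example; the "offset" prefix is the naming asked for in the question.

```shell
# Hypothetical sample data in a scratch directory (values are made up)
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '1 1.5\n2 4.0\n' > spec001.txt
printf '1 2.0\n2 3.0\n' > spec002.txt
printf '1 0.5\n2 1.0\n' > spec003.txt

# Pass 1: largest value in column 2 across all files
max=$(awk 'NR == 1 || $2 > max { max = $2 } END { print max }' spec0*.txt)

# Pass 2: subtract (n-1)*max, with n taken from the digits in FILENAME
awk -v max="$max" '
FNR == 1 {                                   # first record of each file
    if (out != "") close(out)                # finish the previous output file
    match(FILENAME, /[0-9]+/)                # sets RSTART and RLENGTH
    off = (substr(FILENAME, RSTART, RLENGTH) - 1) * max
    out = "offset" FILENAME                  # e.g. offsetspec002.txt
}
{ printf("%s %.10g\n", $1, $2 - off) > out }
' spec0*.txt

cat offsetspec003.txt
```

With this data max is 4.0, so spec001.txt is copied unchanged, spec002.txt is lowered by 4 and spec003.txt by 8.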
that looks if they were sorted. In that case you only need to take the last values to find out max.
The example data sets I posted are only a section of each data set. There are 100 values in each, which increase from 1 to max and then decrease again later on.
awk -v max=$max ' # here we set the max value from the script
{
match (FILENAME, "[0-9]+", a) # a[0] will contain the extracted numbers
$2 -= (a[0] - 1) * max # from FILENAME
print > FILENAME ".new"
} spec*.txt
I get an error:
awk: syntax error at source line 3
context is
match (FILENAME, >>> "[0-9]+", <<< "
although I did read in one place that the 'match' function is only available with nawk. Is this the case? (If it is then I still can't get it to work, but I don't get the same error as above).
After having a quick read around, I see the syntax is match(string,expression,array). Does "[0-9]+" tell the function that we're looking for a string that starts with 0, then 1, then 2, etc, or have I misunderstood what "[0-9]+" is doing?
[0-9] matches any single character between 0 and 9, i.e. one of the digits 0, 1, 2, 3, ... 8, 9.
+ means the preceding item is repeated one or more times, so [0-9]+ matches a run of one or more digits.
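A quick way to see what the pattern picks out, using only the two-argument form of match that POSIX awk provides. The name spec012.txt and the variables name, digits and n are made up for this example.

```shell
n=$(awk 'BEGIN {
    name = "spec012.txt"              # stands in for FILENAME
    if (match(name, /[0-9]+/)) {      # sets RSTART and RLENGTH on success
        digits = substr(name, RSTART, RLENGTH)
        print digits, digits + 0      # text form, then numeric form
    }
}')
echo "$n"
```

Adding 0 forces the matched text to be treated as a number, which drops the leading zero.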
My awk knows it. You could try gawk, or at least tell us your version: awk --version
You can try gawk instead of awk; it will probably work. You can also try "[0-9]*" or "[0-9][0-9]*" instead of "[0-9]+"; probably one of them will work too (though "[0-9]*" can also match an empty string).
I got the following:
Looking at the man page on-line, it should work even on MacOS X. Looking at the icon in the lower-left corner of your posts it looks like you're running on this operating system, aren't you?
you can try gawk instead of awk, probably works. Also you can try "[0-9]*" or "[0-9][0-9]*", instead of "[0-9]+"
Unfortunately the alternatives to "[0-9]+" didn't work either.
I don't have gawk installed, but will look into this if another fix can't be found. Thanks for the suggestions.
Quote:
Originally Posted by colucix
Maybe the following works even with not-GNU versions of AWK:
Looking at the man page on-line, it should work even on MacOS X. Looking at the icon in the lower-left corner of your posts it looks like you're running on this operating system, aren't you?
Yes I'm on Mac OS X
awk seems to be complaining about using FILENAME in the above, the old "illegal statement" error.