why can't I operate on a file and replace it

atjurhs · 01-03-2013, 03:42 PM

hi guys,

I want to do something that's simple, but I can't figure out the right syntax. So far everything I've tried (except creating a temp file that I have to rename later) destroys the original file, leaving it empty. Here's what I'm trying to do:

awk -F " " '{print $1, $2, $3, $48*(-1), $5}' file.dat > file.dat

so that the output file replaces the input file under the same name. Like I said, it works fine if I give it a different output file name, but I don't want to do that.

is there something easy?

Tabby

AwesomeMachine · 01-03-2013, 03:52 PM

Generally, you have to use a temp file.

theNbomr · 01-03-2013, 03:59 PM

If your task can be accomplished with sed or Perl, you can use the '-i' switch to edit the file 'in-place'.
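For edits that sed can express, a minimal sketch of the '-i' switch might look like this. The file name and its contents are made up for illustration, and the .bak suffix is an optional extra that keeps the original around:

```shell
# Work in a scratch directory so nothing real gets touched
cd "$(mktemp -d)" || exit 1

# Hypothetical sample file
printf 'a\nb\nc\n' > sed_demo.txt

# Replace the line "b" with "bb" in place; the original is kept as sed_demo.txt.bak
sed -i.bak 's/^b$/bb/' sed_demo.txt

cat sed_demo.txt
```

Once the result looks right, the .bak copy can be removed.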
--- rod.

atjurhs · 01-03-2013, 04:45 PM

i have sed, but i knew the math and print part of this was so easy to do in awk so i wrote the one-liner in awk. maybe there's an easy way to do math and print in sed???

i know it's not exactly right but could i do something like....

awk -F " " '{print $1, $2, $3, $48*(-1), $5}' inputfile.dat > outputfile.dat | mv outputfile.dat inputfile.dat

thanks guys for your help!

Tabby

theNbomr · 01-03-2013, 06:08 PM

I'm pretty sure math is out of range for sed, but I can offer a Perl one-liner (test without the -i switch, first):

Code:

perl -i -e 'while(<>){ @z=split; print "$z[0] $z[1] $z[2] ",$z[47]*-1," $z[4]\n";}' inputfile.dat

--- rod.

atjurhs · 01-03-2013, 06:49 PM

hi Rod,

i'm REALLY a newbie to script writing and barely waddle thru awk sed and bash script and usually with help.

I know Perl is really powerful and really good with math stuff, but it looks so cryptic, idk....

thanks soooooo much for the script! i'll give it a go tomorrow....

Tabby

rknichols · 01-03-2013, 07:22 PM

Quote:

Originally Posted by theNbomr

I'm pretty sure math is out of range for sed,

The sed language has been shown to be Turing-complete, so math is at least theoretically within its range. Whether that use of sed is practical, ..., well it would almost certainly be easier than coding that in Ook!

theNbomr · 01-03-2013, 07:24 PM

I think if you examine it somewhat, you'll see that it quite closely resembles the Awk script. The key differences are:

Perl needs explicit looping constructs ( while(<>){ .... } )
Perl needs to explicitly split into fields ( 'split', and the default separator is whitespace, just like Awk )
Perl uses zero-based array indexing, whereas Awk's built-in field variables are numbered from 1 (so Awk's $48 is $z[47] in Perl)

Yes, Perl is cryptic to those who have not drunk the magical elixir....

--- rod.

theNbomr · 01-03-2013, 07:30 PM

Quote:

Originally Posted by rknichols

The sed language has been shown to be Turing-complete, so math is at least theoretically within its range. Whether that use of sed is practical, ..., well it would almost certainly be easier than coding that in Ook!

I'd much rather see the sed version than the Ook version. Really, that would be impressive, either by virtue of the usefulness of learning something new, or by the length someone might go to to accomplish it.
--- rod.

AnanthaP · 01-03-2013, 07:53 PM

Quote:

awk -F " " '{print $1, $2, $3, $48*(-1), $5}' file.dat > file_temp.dat
rm file.dat
mv file_temp.dat file.dat

I think this is what was meant in post #2.

By the way, Perl is readily available on virtually all distros.

OK

ntubski · 01-03-2013, 08:12 PM

Quote:

Originally Posted by AnanthaP

Quote:

awk -F " " '{print $1, $2, $3, $48*(-1), $5}' file.dat > file_temp.dat
rm file.dat
mv file_temp.dat file.dat

I think this is what was meant in post #2.

The rm is not necessary.
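
Put together, the temp-file approach might look like the sketch below. The sample data, the use of mktemp, and negating field 4 (rather than the $48 of the original one-liner) are assumptions made to keep the example small. The && means mv only runs if awk succeeded, and mv replaces the old file by itself, which is why no rm is needed.

```shell
# Work in a scratch directory so nothing real gets touched
cd "$(mktemp -d)" || exit 1

# Hypothetical five-field sample file
printf '1 2 3 4 5\n6 7 8 -9 10\n' > file.dat

# Write to a temp file in the same directory, then rename it over the original
tmp=$(mktemp ./file.dat.XXXXXX) || exit 1
awk '{print $1, $2, $3, $4*(-1), $5}' file.dat > "$tmp" && mv "$tmp" file.dat

cat file.dat
```

One side effect to be aware of: mktemp creates its file with restrictive permissions, so the replaced file.dat may end up with a different mode than the original.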

Quote:

Originally Posted by theNbomr

I think if you examine it somewhat, you'll see that it quite closely resembles the Awk script. The key differences are:

Perl needs explicit looping constructs ( while(<>){ .... } )
Perl needs to explicitly split into fields ( 'split', and the default separator is whitespace, just like Awk )

Quote:

perlrun:
...
-a

turns on autosplit mode when used with a -n or -p. An implicit split command to the @F array is done as the first thing inside the implicit while loop produced by the -n or -p.
...
-n

causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed -n or awk:

Quote:

Originally Posted by theNbomr

I'd much rather see the sed version than the Ook version. Really, that would be impressive, either by virtue of the usefulness of learning something new, or by the length someone might go to to accomplish it.
--- rod.

Web search turns up dc.sed.

AwesomeMachine · 01-03-2013, 09:11 PM

You usually don't want to blow away the original file right away, at least not until you have tested the result of the changes. Coreutils programs have some surprising behaviors under certain conditions. For instance, if you overwrite the first 20 bytes of a file using the dd command, the output file will be only 20 bytes long unless you use conv=notrunc. Sed has issues with certain characters used in file names. So you really want to keep the original file until the result file is tested, and then rm the original, or however you want to do it. I usually use cat.
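
The dd behaviour is easy to check on scratch files. A small sketch, with made-up file names and a 100-byte file of zero bytes standing in for real data:

```shell
# Work in a scratch directory so nothing real gets touched
cd "$(mktemp -d)" || exit 1

# Two identical 100-byte scratch files
head -c 100 /dev/zero > truncated.bin
cp truncated.bin preserved.bin

# Overwrite the first 20 bytes of each; only the second run uses conv=notrunc
printf '%020d' 0 | dd of=truncated.bin 2>/dev/null
printf '%020d' 0 | dd of=preserved.bin conv=notrunc 2>/dev/null

# Compare the resulting sizes
wc -c truncated.bin preserved.bin
```

The first file should come out 20 bytes long and the second should still be 100.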

jpollard · 01-04-2013, 07:44 AM

The problem is that you are reading a data file and writing an update to the same file.

When the input and output are the same file, then the output will modify the input... causing problems.

Tools like sed use a tmp file internally, and then do the equivalent of "mv tmp originalfilename".

Think about how updates occur. If the original file had:

Code:

a
b
c

And you want to update it by replacing b with bb. If you use the same filename for both input and output, what happens is:

Code:

a
bb

because the second b in your update overwrites the newline that followed the original b, and the newline written after it overwrites the c. What is left over is the newline from the former "c" line, so the file now ends with newline,newline...

The only saving grace (for very small files) is that the system buffers (or the runtime library buffers) could hold the entire file in memory... and give you the illusion of a tmp file. That doesn't always work either (updates to a file go to the same system buffer as used in input... though if the input has already been read it isn't a problem).

This is the same problem as having two people edit a file simultaneously... the output will be whoever closes the file last...

There is also the problem of making data shorter (replacing bb with b, for instance). You might get duplicated data... or other funny-looking stuff. This is closely related to issues with random access files (usually opened read/write). It works with fixed length records.. but if you extend/shorten a record your file gets corrupted unless you also do something to compensate (like using a temp file).
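
The a/b/c case can be reproduced by overwriting the file from the start without truncating it. In this sketch the 1<> redirection, which opens the file for reading and writing and leaves the existing bytes alone, is just a stand-in for a program that updates a file in place; the file name is made up:

```shell
# Work in a scratch directory so nothing real gets touched
cd "$(mktemp -d)" || exit 1

# The 6-byte original: a, b, c, each followed by a newline
printf 'a\nb\nc\n' > inplace_demo.txt

# Write the 5-byte update over the original, starting at offset 0
printf 'a\nbb\n' 1<> inplace_demo.txt

# Show every byte: the original final newline is still there after the update
od -c inplace_demo.txt
```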

rknichols · 01-04-2013, 12:26 PM

Quote:

Originally Posted by jpollard

The problem is that you are reading a data file and writing an update to the same file.

When the input and output are the same file, then the output will modify the input... causing problems.

It's worse than that. When you try to run something like

Code:

awk '......' file.dat >file.dat

The shell will open file.dat for output, truncating it to zero length, before awk is even invoked, and all awk will see is the empty input file.
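
This is easy to confirm on a scratch file (the file name and contents here are made up):

```shell
# Work in a scratch directory so nothing real gets touched
cd "$(mktemp -d)" || exit 1

printf '1 2 3\n4 5 6\n' > file.dat

# The shell truncates file.dat while setting up the redirection,
# so awk starts with an empty input file and prints nothing
awk '{print $1}' file.dat > file.dat

# Size of the file afterwards, in bytes
wc -c < file.dat
```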

shivaa · 01-04-2013, 10:59 PM

Quote:

... so that the output file replaces the input file under the same name. Like I said, it works fine if I give it a different output file name, but I don't want to do that.

is there something easy?

As you mentioned, the output of an operation can be stored back in the input file by combining two commands: the command itself plus mv.

To do it as a one-liner, chain the two with && so that mv only runs if the command succeeded. Process substitution in the form (cat outfile <(command...infile))> infile does not help here, because the > infile redirection truncates the file before the command inside gets to read it.

Code:

command...infile > outfile && mv outfile infile

For example, if infile.txt has:

Code:

A A
B
C C
D D
E
F F

Then invoke the following:

Code:

awk 'NF>=2 {print $0}' infile.txt > outfile.txt && mv outfile.txt infile.txt

And now infile.txt will have:-

Code:

A A
C C
D D
F F

So give this a try.
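
Whatever one-liner ends up being used, it may be worth trying it on a scratch copy first and comparing the result with what the same command writes to a separate file. A sketch of such a check, using the sample data above; the file names and the && mv candidate are assumptions for illustration:

```shell
# Work in a scratch directory so nothing real gets touched
cd "$(mktemp -d)" || exit 1
printf 'A A\nB\nC C\nD D\nE\nF F\n' > infile.txt

# Reference result, written to a different file so the input is never at risk
awk 'NF>=2 {print $0}' infile.txt > expected.txt

# Candidate one-liner, run against a copy of the input
cp infile.txt scratch.txt
awk 'NF>=2 {print $0}' scratch.txt > scratch.tmp && mv scratch.tmp scratch.txt

# cmp -s is silent and returns 0 only if the two files are identical
if cmp -s expected.txt scratch.txt; then
    echo "candidate matches the reference"
else
    echo "candidate changed or destroyed the data"
fi
```

Swapping a different candidate into the middle step gives a quick answer on whether it really preserves the data.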