Is there a generic solution to modifying files in place?

stormcloud · 08-11-2009, 10:42 AM

Is there a generic solution to modifying files in place?

For example, suppose I've got a file with data in it I need to process as follows:

Code:

  grep "^tag-" MyFile | cut -d"-" -f2

which would search for lines of data with a specific prefix, chop off the prefix and write the results to standard out.

Now suppose I want to write the results back to the original file. What I can't do is:

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 > MyFile

The result is always an empty file because bash is unable to read and write a file at the same time.

The obvious solution is to write to a temporary file and copy it back, which is a bit ugly. A more elegant solution would be some sort of buffering command that will

* Read ALL of the original file
* Close the file
* Write the contents back to standard out.

The bash solution would be:

Code:

  BufferingCommand MyFile | grep "^tag-" | cut -d"-" -f2 > MyFile

Does such a command exist?
Is there a better solution?

pixellany · 08-11-2009, 10:48 AM

What you call a "buffering command" would actually write a temporary file...

For example, the following sed command replaces all occurrences of "yes" with "no" and writes the result back to the same file. But what it REALLY does is first make a temporary file.

sed -i 's/yes/no/g' filename

Some utilities also have options to write a backup file as part of the process.
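
As a small illustration of the backup option, here is a sketch assuming GNU sed, where a suffix attached to -i keeps a copy of the original. The scratch directory, the `dir` variable and the sample data are made up for the example.

```shell
# Scratch directory so no real file is touched.
dir=$(mktemp -d)
printf 'yes\nmaybe yes\n' > "$dir/filename"

# -i.bak edits the file in place and keeps the original as filename.bak.
sed -i.bak 's/yes/no/g' "$dir/filename"

cat "$dir/filename"       # the edited file
cat "$dir/filename.bak"   # the untouched backup
```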

lumak · 08-11-2009, 01:06 PM

Code:

FILE=`tempfile`; cat MyFile > "$FILE"; grep "^tag-" "$FILE" | cut -d"-" -f2 > MyFile; rm "$FILE"

OK that's really 3 commands on one line....

EDIT: I tried making a script to pipe in... but it appeared to have the same problem.
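
A variation on the same temporary-file idea, sketched under a few assumptions: it uses mktemp rather than tempfile, writes the result to the temporary file, and only replaces the original if the pipeline succeeded. The `dir` and `tmp` names and the sample data are illustrative.

```shell
# Scratch directory with sample data.
dir=$(mktemp -d)
printf 'tag-one\nother\ntag-two\n' > "$dir/MyFile"

# Create the temp file next to the original so mv stays on one filesystem.
tmp=$(mktemp "$dir/MyFile.XXXXXX") || exit 1

# Write the result to the temp file; replace the original only on success.
if grep "^tag-" "$dir/MyFile" | cut -d"-" -f2 > "$tmp"; then
    mv "$tmp" "$dir/MyFile"
else
    rm -f "$tmp"
fi

cat "$dir/MyFile"
```

Note that mv replaces the file rather than rewriting it, so the new file takes the permissions of the temporary file and any hard links to the old one are left pointing at the old contents.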

stormcloud · 08-12-2009, 04:21 AM

Thanks for the feedback.

I knew that a few commands had the ability to modify files (sed --in-place being a great example), but I can see this being a more general issue. This is why I used an example that was a bit more complex.

I don't mind commands that internally create temporary files (as sed does) as long as they clean up afterwards. These commands are certainly more elegant than managing the temp files myself.

The thing is, most commands don't have an in-place flag, so what to do in these cases? It looks like managing the file myself (with lumak's suggestion of the `tempfile` command) is the only real option, which is strange as I would imagine this to be a fairly common problem.

nowonmai · 08-12-2009, 04:59 AM

Quote:

Originally Posted by stormcloud

It looks like managing the file myself (with lumak's suggestion of the `tempfile` command) is the only real option, which is strange as I would imagine this to be a fairly common problem.

It seems strange until you realise that most of the text utils in unix were designed to talk to stdin/out so they could be used as inputs to each other, like links in a chain.

jlliagre · 08-12-2009, 05:12 AM

Quote:

Originally Posted by stormcloud

The result is always an empty file because bash is unable to read and write a file at the same time.

Bash, like any other shell, is perfectly able to read and write a file at the same time. It does what you ask for, and here you asked it to truncate the file, so it does just that.

Quote:

The obvious solution is to write to a temporary file and to copy it back, which is a bit ugly. A more elegant solution would be some sort of buffering command that will

* Read ALL of the original file
* Close the file
* Write the contents back to standard out.

The bash solution would be:

Code:

  BufferingCommand MyFile | grep "^tag-" | cut -d"-" -f2 > MyFile

Does such a command exist?
Is there a better solution?

Your suggestion won't work either: MyFile would be erased before any buffering takes place anyway. The shell doesn't run the commands in a pipeline from left to right but the other way around. All but the rightmost program need the reading end of their pipe to be started before they are launched.

Something like this should work though:

Code:

grep "^tag-" MyFile | cut -d"-" -f2 | BufferingCommand MyFile

with BufferingCommand making sure it has finished reading its input before opening and writing to the file, which is passed as a parameter rather than used as a redirection target.

lumak · 08-12-2009, 11:13 AM

@ jlliagre
Thanks for that solution.
My question is then: why don't these commands already read everything and close the file before writing new files? Would that cause issues with streaming to stdout? Would it only make the pipe respond more slowly? Is there an existing command we can fake as a buffer? (e.g. somecommands MyFile | sed -i "s@@@" > MyFile) (that doesn't work, but it illustrates my question)

jlliagre · 08-12-2009, 03:02 PM

Whether the commands buffer or not doesn't matter. They don't even get a chance to run before it's too late. The shell itself handles the output redirection before launching any command. That's why I suggest not using ">" on the rightmost command.
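
A minimal sketch of that behaviour, run in a scratch directory (the `dir` variable and the sample data are illustrative). It uses a single command rather than the full pipeline so that the outcome does not depend on process timing: the shell opens and truncates the output file while setting up the redirection, and only then starts cut, which finds nothing left to read.

```shell
# Scratch directory with sample data.
dir=$(mktemp -d)
printf 'tag-one\nother\ntag-two\n' > "$dir/MyFile"

# The ">" is processed first: MyFile is truncated before cut is executed.
cut -d"-" -f2 "$dir/MyFile" > "$dir/MyFile"

# -s is true only for a file with a size greater than zero.
[ -s "$dir/MyFile" ] || echo "MyFile is empty"
```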

stormcloud · 08-13-2009, 04:35 AM

Hi jlliagre,

Knowing that the commands are started in reverse (right to left) order explains what I'm seeing. MyFile is emptied by the last command (the > operator) before the first command has had a chance to read it.

I'm curious, why does the shell launch the commands in this order?

I'd like to give your alternative suggestion a go:

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | BufferingCommand MyFile

Do you have a suggestion about what I can use as BufferingCommand?

jlliagre · 08-13-2009, 05:17 AM

Quote:

Originally Posted by stormcloud

I'm curious, why does the shell launch the commands in this order?

It doesn't necessarily; you can assume all commands are launched in parallel. However, the shell manages redirections before launching the commands that use them. There would be no point in launching the last command if the output file can't be created or written to. If it can, then the shell needs to empty the file before running the command.

Quote:

Do you have a suggestion about what I can use as BufferingCommand?

As you want to avoid temporary files, one simple solution might be:

Code:

a=$(cat)
echo "$a" > "${1:?}"

Note that this script would only properly handle text files, not binary ones.
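
For reference, the same idea can be wrapped in a shell function. This is only a sketch: the name `buffer_to`, the `dir` variable and the sample data are made up for the example, and the text-only caveat above still applies, since command substitution drops trailing newlines and cannot hold NUL bytes.

```shell
# buffer_to FILE: read all of standard input first, then write it to FILE.
buffer_to() {
    target=${1:?usage: buffer_to FILE}
    data=$(cat)                         # returns only once stdin reaches EOF
    printf '%s\n' "$data" > "$target"   # the file is opened for writing only now
}

# Scratch directory with sample data.
dir=$(mktemp -d)
printf 'tag-one\nother\ntag-two\n' > "$dir/MyFile"

grep "^tag-" "$dir/MyFile" | cut -d"-" -f2 | buffer_to "$dir/MyFile"
cat "$dir/MyFile"
```

Because the file name is passed as an argument instead of a redirection target, nothing truncates MyFile until grep and cut have finished. Where the moreutils package is installed, its sponge utility is meant for this same job.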

stormcloud · 08-13-2009, 05:31 AM

Sorry, but I'm not sure I understood

Code:

  a=$(cat)
  echo "$a" > "${1:?}"

Could you please explain.

Incidentally I tried using tee as a buffering command. The following works:

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | tee MyFile

However

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | tee MyFile > /dev/null

fails!

jlliagre · 08-13-2009, 05:39 AM

Quote:

Originally Posted by stormcloud

Sorry, but I'm not sure I understood

Code:

  a=$(cat)
  echo "$a" > "${1:?}"

Could you please explain.

The variable a holds everything read from standard input; it is then printed to the file passed as a parameter.

Quote:

Incidentally I tried using tee as a buffering command. The following works:

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | tee MyFile

It is much riskier than my suggestion. There is no full buffering with tee.

Quote:

However

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | tee MyFile > /dev/null

fails!

Not sure. Possibly a timing issue.

jlliagre · 08-13-2009, 05:47 AM

You can use this construction:

Code:

grep "^tag-" MyFile | cut -d"-" -f2 | { a=$(cat) ; echo "$a" > MyFile ;}

stormcloud · 08-13-2009, 10:28 AM

Works perfectly :-)

Thanks

stormcloud · 08-14-2009, 02:40 AM

It also helps if you know that

Code:

a=$(cat)

will do the same as

Code:

a=`cat`
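
A quick way to convince yourself, as a sketch (the path /usr/local/bin/tool is just an example string and does not need to exist). The two forms give the same result; the practical difference shows up when nesting, where backticks need escaping.

```shell
# Both forms capture the command's output with trailing newlines removed.
a=$(printf 'hello\n')
b=`printf 'hello\n'`
[ "$a" = "$b" ] && echo "same"

# Nesting: $(...) nests directly, backticks must be escaped.
outer=$(basename "$(dirname /usr/local/bin/tool)")
outer_bt=`basename \`dirname /usr/local/bin/tool\``
echo "$outer" "$outer_bt"   # prints "bin bin"
```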

There is always more to learn!

Thanks again