LinuxQuestions.org


stormcloud 08-11-2009 10:42 AM

Is there a generic solution to modifying files in place?
 
Is there a generic solution to modifying files in place?

For example, suppose I've got a file with data in it I need to process as follows:

Code:

  grep "^tag-" MyFile | cut -d"-" -f2
which would search for lines of data with a specific prefix, chop off the prefix and write the results to standard out.

Now suppose I want to write the results back to the original file. What I can't do is:

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 > MyFile
The result is always an empty file because bash is unable to read and write a file at the same time.

The obvious solution is to write to a temporary file and to copy it back, which is a bit ugly. A more elegant solution would be some sort of buffering command that will

* Read ALL of the original file
* Close the file
* Write the contents back to standard out.

The bash solution would be:

Code:

  BufferingCommand MyFile | grep "^tag-" | cut -d"-" -f2 > MyFile

Does such a command exist?
Is there a better solution?

pixellany 08-11-2009 10:48 AM

What you call a "buffering command" would actually write a temporary file...

For example, the following SED command replaces all occurrences of "yes" with "no" and writes back to the same file. But what it REALLY does is first make a temporary file.

Code:

  sed -i 's/yes/no/g' filename

Some utilities also have options to write a backup file as part of the process.
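For example, GNU sed's -i option accepts an optional suffix, in which case it keeps the original as a backup next to the edited file (a small sketch, assuming GNU sed and a throwaway file called filename):

Code:

  sed -i.bak 's/yes/no/g' filename    # edits filename in place and keeps the original as filename.bak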

lumak 08-11-2009 01:06 PM

Code:

FILE=`tempfile`; cat MyFile > $FILE; grep "^tag-" $FILE | cut -d"-" -f2 > MyFile; rm $FILE
OK that's really 3 commands on one line....

EDIT: I tried making a script to pipe in... but it appeared to have the same problem.
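A slightly more careful version of the same temp-file idea might look like this (a sketch only, assuming mktemp is available; it writes the result to a temporary file and then moves it over the original, with trap cleaning up if anything fails):

Code:

  #!/bin/sh
  # process MyFile via a temporary file, then replace the original with the result
  TMP=$(mktemp) || exit 1
  trap 'rm -f "$TMP"' EXIT                      # remove the temp file on any exit
  grep "^tag-" MyFile | cut -d"-" -f2 > "$TMP"
  mv "$TMP" MyFile                              # note: MyFile is replaced, so its permissions may change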

stormcloud 08-12-2009 04:21 AM

Thanks for the feedback.

I knew that a few commands had the ability to modify files (sed --in-place being a great example), but I can see this being a more general issue. This is why I used an example that was a bit more complex.

I don't mind commands that internally create temporary files (as sed does) as long as they clean up afterwards. These commands are certainly more elegant than managing the temp files myself.

The thing is, most commands don't have an in-place flag, so what to do in these cases? It looks like managing the file myself (with lumak's suggestion of the `tempfile` command) is the only real option, which is strange, as I would imagine this to be a fairly common problem.

nowonmai 08-12-2009 04:59 AM

Quote:

Originally Posted by stormcloud (Post 3640208)
It looks like managing the file myself (with lumak's suggestion of the `tempfile` command) is the only real option, which is strange, as I would imagine this to be a fairly common problem.

It seems strange until you realise that most of the text utils in Unix were designed to talk to stdin/stdout so they could be used as inputs to each other, like links in a chain.

jlliagre 08-12-2009 05:12 AM

Quote:

Originally Posted by stormcloud (Post 3639394)
The result is always an empty file because bash is unable to read and write a file at the same time.

Bash, like any other shell, is perfectly able to read from and write to a file at the same time. It does what you ask for, and here you are asking it to truncate the file, so it just does it.
Quote:

The obvious solution is to write to a temporary file and to copy it back, which is a bit ugly. A more elegant solution would be some sort of buffering command that will
* Read ALL of the original file
* Close the file
* Write the contents back to standard out.

The bash solution would be:

Code:

  BufferingCommand MyFile | grep "^tag-" | cut -d"-" -f2 > MyFile
Does such a command exist?
Is there a better solution?
Your suggestion won't work either: MyFile would be erased before any buffering takes place anyway. The shell doesn't set up the commands of a pipeline from left to right but the other way around; every program except the rightmost needs the reading end of its pipe (the command to its right) to be started before it is launched.

Something like this should work though:
Code:

grep "^tag-" MyFile | cut -d"-" -f2 | BufferingCommand MyFile
with BufferingCommand making sure it has finished reading its input before opening and writing to the file, which is passed as a parameter rather than redirected to.

lumak 08-12-2009 11:13 AM

@ jlliagre
Thanks for that solution.
My question is then: why don't these commands already read everything and close the input before writing new files? Would that cause issues with streaming to stdout? Would it only make the pipe respond more slowly? Is there an existing command we can press into service as a buffer? (e.g. somecommands MyFile | sed -i "s@@@" > MyFile; that doesn't work, but it illustrates my question)

jlliagre 08-12-2009 03:02 PM

Whether the commands buffer or not doesn't matter. They don't even get a chance to run before it's too late: the shell itself handles the output redirection before launching any command. That's why I suggest not using ">" in the rightmost command.
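A quick way to see the shell doing this (a throwaway demonstration on a test file you don't mind losing):

Code:

  printf 'tag-one\ntag-two\n' > MyFile
  grep "^tag-" MyFile > MyFile    # the shell truncates MyFile before grep is even started
  wc -c MyFile                    # prints "0 MyFile": grep only ever saw an empty file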

stormcloud 08-13-2009 04:35 AM

Hi jlliagre,

Knowing that the commands are started in reverse (right-to-left) order explains what I'm seeing. MyFile is emptied by the last command's redirection (the > operator) before the first command has had a chance to read it.

I'm curious, why does the shell launch the commands in this order?

I'd like to give your alternative suggestion a go:
Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | BufferingCommand MyFile
Do you have a suggestion about what I can use as BufferingCommand?

jlliagre 08-13-2009 05:17 AM

Quote:

Originally Posted by stormcloud (Post 3641616)
I'm curious, why does the shell launch the commands in this order?

It doesn't necessarily; you can assume all the commands are launched in parallel. However, the shell sets up redirections before launching the commands that use them. There would be no point in launching the last command if the output file couldn't be created or written to, and if it can be, the shell needs to empty the file before running the command.
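The redirection being handled first is easy to demonstrate with a command that fails outright (no-such-file is just a name assumed not to exist):

Code:

  echo "keep me" > out.txt
  ls no-such-file > out.txt    # ls fails with an error on stderr, but...
  wc -c out.txt                # ...the shell has already emptied out.txt: prints "0 out.txt"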
Quote:

Do you have a suggestion about what I can use as BufferingCommand?
As you want to avoid temporary files, one simple solution might be:
Code:

a=$(cat)
echo "$a" > ${1:?}

Note that this script would only properly handle text files, not binary ones.
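For reference, saved as a standalone script (the name overwrite.sh is just a placeholder, not something from this thread), it slots into the pipeline in place of the hypothetical BufferingCommand from the first post:

Code:

  #!/bin/sh
  # overwrite.sh: read ALL of standard input first, then write it to the file named in $1
  a=$(cat)              # blocks until stdin hits EOF, i.e. the upstream commands have finished
  echo "$a" > ${1:?}    # ${1:?} aborts with an error if no file name was given

  # usage:
  #   grep "^tag-" MyFile | cut -d"-" -f2 | ./overwrite.sh MyFile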

stormcloud 08-13-2009 05:31 AM

Sorry, but I'm not sure I understood

Code:

  a=$(cat)
  echo "$a" > ${1:?}

Could you please explain?

Incidentally I tried using tee as a buffering command. The following works:

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | tee MyFile
However

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | tee MyFile > /dev/null
fails!

jlliagre 08-13-2009 05:39 AM

Quote:

Originally Posted by stormcloud (Post 3641664)
Sorry, but I'm not sure I understood

Code:

  a=$(cat)
  echo "$a" > ${1:?}

Could you please explain?

The variable a captures the whole of standard input; then the contents of that variable are written to the file passed as a parameter.
Quote:

Incidentally I tried using tee as a buffering command. The following works:

Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | tee MyFile

It is much riskier than my suggestion. There is no full buffering with tee.
Quote:

However
Code:

  grep "^tag-" MyFile | cut -d"-" -f2 | tee MyFile > /dev/null
fails!
Not sure. Possibly a timing issue.
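One way to make the timing issue and the missing buffering visible is to use a test file large enough that grep cannot finish reading it before tee truncates it (purely illustrative; the exact count will vary from run to run):

Code:

  seq 200000 | sed 's/^/tag-/' > MyFile     # build a throwaway test file of 200000 "tag-" lines
  grep "^tag-" MyFile | cut -d"-" -f2 | tee MyFile > /dev/null
  wc -l MyFile                              # typically far fewer than 200000 lines survive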

jlliagre 08-13-2009 05:47 AM

You can use this construction:
Code:

grep "^tag-" MyFile | cut -d"-" -f2 | { a=$(cat) ; echo "$a" > MyFile ;}

stormcloud 08-13-2009 10:28 AM

Works perfectly :-)

Thanks

stormcloud 08-14-2009 02:40 AM

It also helps to know that

Code:

a=$(cat)
will do the same as

Code:

a=`cat`
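One practical difference between the two forms: $( ) nests without any escaping, whereas backticks need backslashes for an inner command (a tiny illustration, not taken from the thread):

Code:

  inner=$(basename "$(pwd)")     # $( ) nests cleanly
  inner=`basename \`pwd\``       # the same with backticks; the escaping gets awkward fast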
There is always more to learn!

Thanks again

