LinuxQuestions.org


integrale16 10-13-2009 09:24 AM

Problems with command substitution and whitespaces
 
Hi,

I have a text file with a lot of filenames in it, one filename per line, and the files listed in it should be deleted.
To delete the files, I thought I could do something like this:

Code:

rm `cat files_to_be_deleted.txt`
or
Code:

for file in `cat files_to_be_deleted.txt`; do rm $file; done
But this doesn't work because the filenames contain whitespace, so 'for' and 'rm' receive only the pieces of each filename between the spaces.
So I tried to quote every filename in the file like this: "file to be deleted". This doesn't change anything; the filenames are still cut at the whitespace. I also tried escaping the whitespace in the file with backslashes, but that doesn't change the behaviour either.


Can someone help me understand the behaviour and show me the right way, please? Thanks!


-integrale16

rn_ 10-13-2009 09:36 AM

Yes, I have run into this issue in the past too. There are at least three different ways I have used to get around it, and I'm sure there are more out there; however, I will just list the easiest one here:

Code:

cat files_to_be_deleted.txt | while IFS= read -r filename
do
rm -- "$filename"   # double quotes keep the whole line as one argument
done

HTH.
-RN.
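A self-contained way to try RN's loop without touching real files is sketched below. It assumes bash and the common mktemp utility, and the file names it creates (some stupid file name, another file, keep me) are made up for the demo. It uses IFS= read -r so each line of the list is taken exactly as written.

```shell
#!/usr/bin/env bash
# Sketch: delete the files named in a list, one name per line, where the
# names contain spaces. All file names here are hypothetical demo data.
workdir=$(mktemp -d)          # scratch directory so nothing real is touched
cd "$workdir" || exit 1

touch "some stupid file name" "another file" "keep me"
printf '%s\n' "some stupid file name" "another file" > files_to_be_deleted.txt

# read takes one whole line per iteration; the quotes around "$filename"
# stop bash from splitting it again before rm sees it.
cat files_to_be_deleted.txt | while IFS= read -r filename
do
    rm -- "$filename"
done
```

Afterwards only "keep me" and the list file remain in the scratch directory; rm -r "$workdir" cleans it up.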

integrale16 10-13-2009 12:27 PM

@RN: Thanks a lot for the alternative solution!

But is there someone out there who is able and willing to explain to me why it doesn't work the way I tried it? I would like to understand it.

As far as I can tell, the quotation marks "" and the backslashes are passed on to the next command (i.e. for), but obviously they are not interpreted the way I expected. Even though the quotes are passed along, the argument is still cut at the whitespace.

I would be happy if someone could explain it to me.


-integrale16

catkin 10-13-2009 01:16 PM

Quote:

Originally Posted by integrale16 (Post 3717832)
Would be happy if someone could explain it to me.

Here comes happiness :) ...

Without the double quotes, when bash finds $file, it substitutes the value of $file. If the value of $file is "some stupid file name with spaces in the name", bash then "tokenises" it into individual words delimited by the spaces (the Bash manual calls this word splitting). Finally, bash runs the command, and rm is asked to remove not one file but several: some, stupid, ... and name. When double quotes are used, they tell bash not to tokenise the value of $file but to take it as a single token, so rm is asked to remove a single file named "some stupid file name with spaces in the name".

For more information on quoting, see the GNU Bash Reference Manual.
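A quick way to see the difference catkin describes is to count how many arguments a command actually receives. The sketch below assumes bash; count_args is a hypothetical stand-in for rm that only prints how many arguments it was given.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper standing in for rm:
# it just reports how many arguments it received.
count_args() { echo "$#"; }

file="some stupid file name with spaces in the name"

unquoted=$(count_args $file)    # word splitting: one argument per word
quoted=$(count_args "$file")    # double quotes: a single argument

echo "unquoted: $unquoted, quoted: $quoted"   # prints: unquoted: 9, quoted: 1
```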

integrale16 10-13-2009 04:06 PM

@catkin:
I think you misunderstood me.
In general it's clear to me that if I write
Code:

rm some stupid file name
bash will pass each whitespace-separated part to rm as a separate argument. And I can write
Code:

rm "some stupid file name"
or
Code:

rm some\ stupid\ file\ name
to prevent this.

But my problem is that I have a text file containing a list of filenames that I want to delete.
Therefore I tried this:
Code:

rm `cat textfile`
and this does not work, because the arguments are not separated only at the newlines in the text file but at all whitespace. Then I tried to quote every filename in the text file like this: "some stupid file name". But this doesn't work either. It gives the arguments:
"some
stupid
file
name"

I also tried to escape the spaces in the filenames like this: some\ stupid\ file\ name. This gives the arguments:
some\
stupid\
file\
name

What I am trying to understand now is why the quotes are handed over to rm but don't take effect.

The solution posted by RN works without any problems.
It seems to me that it has something to do with the command substitution, because the solution with the pipe works fine.

I hope it's clear now what my problem is and what I am trying to understand.

-integrale16

lutusp 10-13-2009 06:03 PM

Quote:

Originally Posted by integrale16 (Post 3718059)
What I am trying to understand now is why the quotes are handed over to rm but don't take effect.


To properly parse these filenames, you need a particular kind of loop. Without it, Bash breaks the filenames up into individual tokens and passes each token to "rm" as a separate argument. By contrast, a loop like this --

Code:

cat filename | while read path
-- works because the "read" command reads entire lines, not tokens. This means the entire line is placed in the Bash variable "path", spaces and all.

The remainder of the script must guard against tokenization too:

Code:

cat filename | while read path
do
  some-command "$path"
done

See the quotes around the variable name? They prevent tokenization and allow the use of paths with spaces. Bash removes the quotes before passing the string to the command ("rm", in your case), but it also understands that the quotes are an instruction not to tokenize. So the command gets the entire line at once, as intended.

chrism01 10-13-2009 06:03 PM

Look up the IFS variable: http://tldp.org/LDP/abs/html/special-chars.html
By default, the internal field separator (http://tldp.org/LDP/abs/html/interna...es.html#IFSREF) splits on any run of spaces, tabs or newlines.

One solution is to temporarily set it to newline only:

Code:

OLDIFS=$IFS    # save old value: optional
IFS=$'\n'      # newline only; IFS="\n" would be a backslash and the letter n
for file in `cat filelist.txt`
do
    rm "$file"
done
IFS=$OLDIFS    # restore the saved value

Note that a script runs in its own process, so the change to IFS disappears when the script ends. Saving the old value is therefore only needed if you want to reinstate it for further processing in the same script.
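The IFS approach depends on IFS really containing a newline. In bash, IFS="\n" sets it to the two characters backslash and n, whereas IFS=$'\n' (bash's ANSI-C quoting) gives a real newline. The minimal sketch below, assuming bash and a made-up two-line string, shows the difference by counting the words each setting produces.

```shell
#!/usr/bin/env bash
# Sketch: IFS="\n" versus IFS=$'\n'. The sample text is made up.
text=$'filename 1\nfilename 2'   # two lines, each containing a space

IFS="\n"                  # backslash and the letter n, NOT a newline
set -- $text
wrong_count=$#            # split at every "n": file / ame 1<newline>file / ame 2

IFS=$'\n'                 # a real newline character
set -- $text
right_count=$#            # one word per line
first_line=$1

unset IFS                 # back to the default splitting behaviour
echo "$wrong_count $right_count [$first_line]"   # prints: 3 2 [filename 1]
```

One more caveat for any unquoted `cat ...` or $(cat ...): after splitting, each word also undergoes filename expansion, so a listed name containing * or ? could match other files. Running set -f before the loop turns that off.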

integrale16 10-13-2009 06:49 PM

@lutusp:
Thanks for your explanation. I think it's (more or less) clear to me how the script with cat and while works.

Now I would like to know why
Code:

rm `cat files_to_be_deleted`
does _not_ work. Who is the bad guy, cat or bash?

At the moment my explanation would be that the cat command tokenizes the input from the text file. I think cat reads not whole lines but character by character, right? And the output of cat goes directly to rm and is not seen or interpreted by bash, so the quotes don't take effect, also right?
The reason the solution with cat, while and read works is that read collects the data from a whole line before it hands it over to the variable, right?

I would be pleased if someone could tell me whether I got that right, thanks.


@chrism01:
Thanks for your solution. I also thought about something like this, but was not sure how to do it :-)


-integrale16

chrism01 10-13-2009 06:54 PM

If you read those links, they'll explain it. Also, most *nix commands take whitespace as the default parameter separator; you just have to keep that in mind.
In that context, the output of cat ends up as space-separated words.

catkin 10-14-2009 12:51 AM

Quote:

Originally Posted by integrale16 (Post 3718197)
Now I would like to know why
Code:

rm `cat files_to_be_deleted`
does _not_ work. Who is the bad guy, cat or bash?

bash is "the bad guy" in the sense that it tokenises the contents of files_to_be_deleted by splitting them into whitespace-separated words.
Quote:

Originally Posted by integrale16 (Post 3718197)
At the moment my explanation would be that the cat command tokenizes the input from the text file. I think cat reads not whole lines but character by character, right?

cat (with no options) pays no attention to the contents of the input file and simply reproduces it verbatim on stdout.

Quote:

Originally Posted by integrale16 (Post 3718197)
And the output of cat goes directly to rm and is not seen or interpreted by bash, so the quotes don't take effect, also right?

No. After running the cat command, bash replaces `cat files_to_be_deleted` with the output from the cat command which is the contents of files_to_be_deleted, verbatim. Now comes your problem; bash then tokenises the output from the cat command. If you double quote `cat files_to_be_deleted` then the entire contents of files_to_be_deleted becomes a single word which is passed to rm as the name of a single file to delete. If there is only one file listed in files_to_be_deleted then this will work.
Quote:

Originally Posted by integrale16 (Post 3718197)
The reason the solution with cat, while and read works is that read collects the data from a whole line before it hands it over to the variable, right?

Right.

For the full story of how bash expands command lines (many steps, in a specific sequence), see the GNU Bash Reference Manual.
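The step that defeats the quotes in the list file is the order of expansions: bash performs quote removal only on quote characters that were in the command line as typed, not on quotes that appear in the result of an expansion such as `cat ...` or $(cat ...). Quote characters coming out of cat are therefore ordinary data, just like the letters around them. The sketch below, assuming bash and a made-up list file, reproduces the arguments integrale16 observed.

```shell
#!/usr/bin/env bash
# Sketch: quotes inside command-substitution output are just characters.
# The list file and its contents are made-up demo data.
listfile=$(mktemp)
printf '%s\n' '"filename 1"' '"filename 2"' > "$listfile"

set -- $(cat "$listfile")      # unquoted: split at spaces and newlines
n_unquoted=$#                  # 4 words: "filename  1"  "filename  2"
first_word=$1                  # the leading " is kept as a literal character

set -- "$(cat "$listfile")"    # quoted: the whole output is one word
n_quoted=$#

printf '%s words, first is [%s]; quoted: %s word\n' "$n_unquoted" "$first_word" "$n_quoted"
# prints: 4 words, first is ["filename]; quoted: 1 word
rm -f "$listfile"
```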

integrale16 10-15-2009 07:44 PM

Quote:

No. After running the cat command, bash replaces `cat files_to_be_deleted` with the output from the cat command which is the contents of files_to_be_deleted, verbatim. Now comes your problem; bash then tokenises the output from the cat command.
This is exactly what I don't understand.
If, as you say, the cat command is executed and then bash replaces the command with its output, why doesn't bash see, or rather respect, the "" quotes?

Let's assume the file has this content:
"filename 1"
"filename 2"
"filename 3"

Replacing the cat command with its output would, in my eyes, give:
rm "filename 1" "filename 2" "filename 3"

Then the command should work as I expect.
But the resulting command line behaves something like this (in principle, not verbatim):
rm '"filename' '1"' '"filename' '2"' '"filename' '3"'
or
rm "filename
rm 1"
rm "filename
rm 2"
rm "filename
rm 3"

Maybe I will never get this :-(

-integrale16

i92guboj 10-15-2009 09:53 PM

Quote:

Originally Posted by integrale16 (Post 3720995)
This is exactly what I don't understand.
If, as you say, the cat command is executed and then bash replaces the command with its output, why doesn't bash see, or rather respect, the "" quotes?

Let's assume the file has this content:
"filename 1"
"filename 2"
"filename 3"

Replacing the cat command with its output would, in my eyes, give:
rm "filename 1" "filename 2" "filename 3"

The resulting command is nothing like this. It will behave more like this:

Code:

rm whatever there was in your file and the next line and the following one
The output of `cat file` will be "tokenized", which means it is split into words at every separator character, and the resulting words are passed as an argument list to "rm".

For bash, the default value of the separator ($IFS) consists of three characters: space, tab and newline. *ALL* of these are separators, and they are treated as such whenever the expansion is not quoted.

So your loop will run once for each token, where a token is any run of characters that contains no separator of any kind (again: space, tab or newline).
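The default separator rules can be checked directly. The sketch below assumes bash (for $'...' and ${@: -1}) and uses a made-up string containing a space, a tab, a newline and a run of three spaces; a run of default separators counts as a single break between words.

```shell
#!/usr/bin/env bash
# Sketch: with the default IFS, spaces, tabs and newlines all separate words.
# The sample text is made up.
unset IFS                          # make sure the default splitting applies
text=$'one two\tthree\nfour   five'

set -- $text
word_count=$#                      # the run of three spaces is one separator
last_word=${@: -1}                 # last positional parameter

echo "$word_count words, last: $last_word"   # prints: 5 words, last: five
```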

As I said, the whole deal is about IFS, so you can define your own IFS and make your original idea work. On the command line it would look like this:

Code:

$ IFS=$'\n'    # a single newline; same as typing IFS=' then ENTER then '
$ for file in $(cat foo); do rm "$file"; done
$ unset IFS    # back to the default splitting afterwards

Quote:

Then the command should work as I expect.
But the resulting command line behaves something like this (in principle, not verbatim):
rm '"filename' '1"' '"filename' '2"' '"filename' '3"'
or
rm "filename
rm 1"
rm "filename
rm 2"
rm "filename
rm 3"
Inside single quotes, no expansion takes place, and *everything* is literal, including the double quotation marks.

Quote:

Maybe I will never get this :-(

-integrale16
When you do this:

Code:

cat file | while read foo
The big difference is that you are using the file as the input for "read". Unlike "for", read only reads complete lines: it takes characters from the input source (it doesn't matter whether that's the keyboard or a file) and keeps reading until it reaches a newline. At that moment it stores the buffered characters in the given variable. Since there is only one variable, the internal spaces are kept and $foo contains the whole line (by default, read still strips leading and trailing whitespace and treats backslashes as escapes; IFS= read -r avoids both).

The key difference is that in a for loop bash gets in the middle, and the output is already broken into tokens by the time it reaches the tools. With the "cat | while read" construct, on the other hand, the data is streamed directly from cat into read, so no word splitting happens on the way.

In the first case, bash reads the output, cuts it into tokens and then passes them to the rm command. In the second case, cat sends the data to read, and bash is not in the middle to cut or tokenize anything. Read consumes data from cat until it reads a newline.
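The "read" side of the comparison has two defaults worth knowing: with the default IFS, read strips leading and trailing whitespace from the line, and without -r it treats backslashes as escape characters. The sketch below, assuming bash (for the <<< here-strings) and made-up input lines, shows plain read next to IFS= read -r. The second pair also explains why a list written as some\ stupid\ file\ name still works with plain read: read itself removes the backslashes.

```shell
#!/usr/bin/env bash
# Sketch: plain read versus IFS= read -r. The input lines are made up.
line='  some stupid file name  '

read plain <<< "$line"           # default IFS: leading/trailing blanks stripped
IFS= read -r exact <<< "$line"   # empty IFS and -r: line kept exactly

escaped='some\ stupid\ file\ name'
read unescaped <<< "$escaped"    # without -r, a backslash escapes the next char
read -r raw <<< "$escaped"       # with -r, backslashes are kept literally

printf '[%s]\n' "$plain" "$exact" "$unescaped" "$raw"
```

This prints [some stupid file name], [  some stupid file name  ], [some stupid file name] and [some\ stupid\ file\ name], one per line.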

In any case, and regardless of how you filled the variable, you have to remember to quote the variable when you use its contents.

Code:

cat file | while IFS= read -r foo; do some_command "$foo"; done   # some_command: whatever you want to do with each line


All times are GMT -5.