LinuxQuestions.org
Old 10-13-2009, 10:24 AM   #1
integrale16
Member
 
Registered: Sep 2009
Posts: 56

Rep: Reputation: 15
Problems with command substitution and whitespaces


Hi,

I have a text file with a lot of filenames in it. The files listed in that file should be deleted. Each filename is on a separate line.
To delete the files, I thought I could do something like this:

Code:
rm `cat files_to_be_deleted.txt`
or
Code:
for file in `cat files_to_be_deleted.txt`; do rm $file; done
But this doesn't work because the filenames contain whitespace, so 'for' and 'rm' only get fragments of the filenames, split at the whitespace.
So I tried quoting every filename in the file, like "file to be deleted", but the filenames are still cut at the whitespace. I also tried escaping the whitespace with backslashes in the file, but that doesn't change the behaviour either.


Can someone help me understand this behaviour and show me the right way, please? Thanks!


-integrale16
 
Old 10-13-2009, 10:36 AM   #2
rn_
Member
 
Registered: Jun 2009
Location: Orlando, FL, USA
Distribution: Suse, Redhat
Posts: 127
Blog Entries: 1

Rep: Reputation: 25
Yes, I have run into this issue in the past too. There are at least three different ways I have used to get around it, and I'm sure there are more out there, but I will just list the easiest one here:

Code:
# IFS= and -r make read keep each line exactly as written (leading blanks, backslashes)
cat files_to_be_deleted.txt | while IFS= read -r filename
do
    rm -- "$filename"
done
HTH.
-RN.
 
Old 10-13-2009, 01:27 PM   #3
integrale16
Member
 
Registered: Sep 2009
Posts: 56

Original Poster
Rep: Reputation: 15
@RN: Thanks a lot for the alternative solution!

But is there someone out there who is able and willing to explain to me why it doesn't work the way I tried it? I would like to understand it.

As far as I can tell, the quotation marks "" and the backslashes are passed on to the next command (e.g. for), but obviously they are not interpreted the way I would expect. Although the quotes are passed on, the argument is still cut at the whitespace.

I would be happy if someone could explain it to me.


-integrale16
 
Old 10-13-2009, 02:16 PM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195
Quote:
Originally Posted by integrale16 View Post
Would be happy if someone could explain it to me.
Here comes happiness ...

Without the double quotes, when bash finds $file, it substitutes the value of $file. If the value of $file is "some stupid file name with spaces in the name", bash then "tokenises" it into individual words as delimited by the spaces. Finally bash runs the command and rm is asked to remove not one file but several files: some, stupid, ... and name. When double quotes are used they tell bash not to tokenise the value of $file but to take it as a single token and rm is asked to remove a single file named "some stupid file name with spaces in the name".

For more info on quoting see the GNU Bash Reference Manual
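
The difference between the quoted and the unquoted variable can be checked without deleting anything. The following is a minimal sketch, assuming bash with the default IFS; count_args is a hypothetical helper, not something from the thread, that only reports how many arguments it was given.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints the number of arguments it received.
count_args() { echo "$#"; }

file="some stupid file name"

unquoted=$(count_args $file)      # unquoted: the value is split into 4 words
quoted=$(count_args "$file")      # quoted: the value stays a single word

echo "unquoted: $unquoted argument(s)"   # prints: unquoted: 4 argument(s)
echo "quoted:   $quoted argument(s)"     # prints: quoted:   1 argument(s)
```

rm sees exactly the same argument lists that count_args does, which is why the unquoted form asks it to remove four separate files.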
 
Old 10-13-2009, 05:06 PM   #5
integrale16
Member
 
Registered: Sep 2009
Posts: 56

Original Poster
Rep: Reputation: 15
@catkin:
I think you misunderstood me.
In general it's clear to me that if I write
Code:
 rm some stupid file name
bash will hand over the white space separated parts each as a single argument to rm. And I can write
Code:
rm "some stupid file name"
or
Code:
rm some\ stupid\ file\ name
to prevent this.

But my problem is that I have a text file which contains a list of filenames which I want to delete.
Therefore I tried this
Code:
rm `cat textfile`
and this does not work, because the arguments are not simply separated at the newlines in the text file but at all whitespace. Then I tried to quote every filename in the text file this way: "some stupid file name". But this doesn't work either. This gives the arguments:
"some
stupid
file
name"

I also tried to quote the filenames this way some\ stupid\ file\ name. This gives the arguments:
some\
stupid\
file\
name

What I am trying to understand now is: why are the quotes handed over to rm without taking effect?

The solution posted by RN works without any problems.
To me it seems to have something to do with the command substitution, because the solution with the pipe works fine.

I hope it's clear now what my problem is and what I am trying to understand.

-integrale16
 
Old 10-13-2009, 07:03 PM   #6
lutusp
Member
 
Registered: Sep 2009
Distribution: Fedora
Posts: 835

Rep: Reputation: 102Reputation: 102
To properly parse these filenames, you need to use a certain kind of loop. The reason is that, without it, Bash will break the filenames up into individual tokens and submit them one by one to "rm". By contrast, a loop like this --

Code:
cat filename | while read path
-- works because the "read" command reads entire lines, not tokens. This means the entire line is placed in the Bash variable "path", spaces and all.

The remainder of the script must guard against tokenization too:

Code:
cat filename | while read path
do
   some-command "$path"
done
See the quotes around the variable name? This prevents tokenization, and allows the use of paths with spaces. Bash removes the quotes from the path before submitting the string to "rm", but it also understands that the quotes represent an instruction not to tokenize. So "rm" gets the entire line at once, as was intended.
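
To try the loop safely, here is a small self-contained sketch that works in a throwaway directory. The file names and list.txt are made up for illustration, and IFS= and -r are additions that are not in the loop above; they only make read keep each line exactly as written. The list is fed in with a redirect instead of cat, which behaves the same for this purpose.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is deleted.
workdir=$(mktemp -d)
touch "$workdir/some stupid file name" "$workdir/another file"

# One path per line, spaces and all (list.txt is an illustrative name).
printf '%s\n' "$workdir/some stupid file name" "$workdir/another file" > "$workdir/list.txt"

# read puts one whole line into "path" per iteration;
# the quotes around "$path" keep it a single argument for rm.
while IFS= read -r path
do
    rm -- "$path"
done < "$workdir/list.txt"

remaining=$(ls "$workdir")
echo "left in workdir: $remaining"   # only list.txt should remain
```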
 
Old 10-13-2009, 07:03 PM   #7
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324
Lookup the IFS var; http://tldp.org/LDP/abs/html/special-chars.html
The default internal field separator http://tldp.org/LDP/abs/html/interna...es.html#IFSREF

By default it is a space, a tab, or a newline.

One solution is to temporarily set it to newline only

Code:
OLDIFS=$IFS  # save old val : optional
IFS=$'\n'    # newline only; IFS="\n" would set it to a backslash and the letter n
for file in `cat filelist.txt`
do
    rm "$file"
done
Note that the change only lasts until the script exits, so saving the old value isn't needed unless you want to reinstate it for further processing in the same script.
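
A runnable sketch of the newline-only IFS approach, using a scratch directory so it can be tested without risk. The file names are invented for the example, and set -f is an extra precaution that is not part of the post above: it stops bash from expanding characters such as * or ? in a filename after the split.

```shell
#!/usr/bin/env bash
# Scratch directory with two files whose names contain spaces.
workdir=$(mktemp -d)
touch "$workdir/file one" "$workdir/file two"
printf '%s\n' "$workdir/file one" "$workdir/file two" > "$workdir/filelist.txt"

OLDIFS=$IFS
IFS=$'\n'      # a real newline; "\n" in double quotes would be a backslash plus n
set -f         # assumption: disable globbing so * or ? in a name stays literal
for file in $(cat "$workdir/filelist.txt")
do
    rm -- "$file"
done
set +f
IFS=$OLDIFS    # restore, in case more code follows in the same script

remaining=$(ls "$workdir")
echo "left in workdir: $remaining"   # only filelist.txt should remain
```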
 
Old 10-13-2009, 07:49 PM   #8
integrale16
Member
 
Registered: Sep 2009
Posts: 56

Original Poster
Rep: Reputation: 15
@lutusp:
Thanks for your explanation. I think, it's (more or less) clear for me, how the script with cat and while works.

I would like to know now, why
Code:
rm `cat files_to_be_deleted`
does _not_ work. Who is the bad guy, cat or the bash?

At the moment my explanation would be that the cat command tokenizes the input from the text file. I think cat reads character by character rather than whole lines, right? And the output of cat goes directly to the rm command and is not seen or interpreted by bash, so that the quoting doesn't take effect. Is that also right?
The reason the solution with cat, while and read works is that read collects the data from a whole line before it hands it over to the variable, right?

I would be pleased if someone could tell me whether I got that right, thanks.


@chrism01:
Thanks for your solution. I also thought about something like this, but was not sure how to do it :-)


-integrale16
 
Old 10-13-2009, 07:54 PM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324Reputation: 2324
If you read those links, they'll explain it. Also, the shell takes whitespace as the default argument separator; you just have to keep that in mind.
In that context, the output of cat gets split into space-separated words.

Last edited by chrism01; 10-14-2009 at 01:55 AM. Reason: typo
 
Old 10-14-2009, 01:51 AM   #10
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195Reputation: 1195
Quote:
Originally Posted by integrale16 View Post
I would like to know now, why
Code:
rm `cat files_to_be_deleted`
does _not_ work. Who is the bad guy, cat or the bash?
bash is "the bad guy" in the sense that it tokenises the contents of files_to_be_deleted by splitting them into whitespace-separated words.
Quote:
Originally Posted by integrale16 View Post
At the moment my explanation would be that the command cat tokenized the input from the textfile. I think cat reads not whole lines but character by character, right?
cat (with no options) pays no attention to the contents of the input file and simply reproduces it verbatim on stdout.

Quote:
Originally Posted by integrale16 View Post
And the output of cat goes directly to the command rm and is not seen or interpreted by the bash, so that the quotation doesn't take effect, also right?
No. After running the cat command, bash replaces `cat files_to_be_deleted` with the output from the cat command which is the contents of files_to_be_deleted, verbatim. Now comes your problem; bash then tokenises the output from the cat command. If you double quote `cat files_to_be_deleted` then the entire contents of files_to_be_deleted becomes a single word which is passed to rm as the name of a single file to delete. If there is only one file listed in files_to_be_deleted then this will work.
Quote:
Originally Posted by integrale16 View Post
The thing why the solution with cat, while and read works is, that read collects the data from a whole line before it hands it over to the variable, right?
Right.

For the full story of how bash expands command lines (many steps, specific sequence) see the GNU Bash Reference.
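
The steps described above can be observed directly. This is a minimal sketch, assuming bash with the default IFS; count_args is a hypothetical helper that prints its argument count, and the list file is a temporary file created just for the demonstration.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints the number of arguments it received.
count_args() { echo "$#"; }

# A list whose lines are wrapped in double quotes, as tried earlier in the thread.
list=$(mktemp)
printf '%s\n' '"filename 1"' '"filename 2"' '"filename 3"' > "$list"

split=$(count_args $(cat "$list"))      # unquoted substitution: split at every blank and newline
whole=$(count_args "$(cat "$list")")    # quoted substitution: the whole file is one word

# First word after splitting; the quote character is just part of the word.
first=$(set -- $(cat "$list"); printf '%s' "$1")

echo "unquoted: $split words, first word is: $first"
echo "quoted:   $whole word"
rm -f "$list"
```

The quote characters read from the file arrive after bash has already parsed the command line for quoting, so they are ordinary data and cannot protect the spaces.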
 
Old 10-15-2009, 08:44 PM   #11
integrale16
Member
 
Registered: Sep 2009
Posts: 56

Original Poster
Rep: Reputation: 15
Quote:
No. After running the cat command, bash replaces `cat files_to_be_deleted` with the output from the cat command which is the contents of files_to_be_deleted, verbatim. Now comes your problem; bash then tokenises the output from the cat command.
This is exactly what I don't understand.
If it's as you say, that the cat command is executed and then bash replaces the command with its output, why doesn't it see, or rather care about, the "" quotes?

Let's assume, the file has this content:
"filename 1"
"filename 2"
"filename 3"

Replacing the cat command with its output would, in my eyes, give:
rm "filename 1" "filename 2" "filename 3"

Then the command should work as I would expect it.
But the resulting command line works something like this (in principle, not verbatim):
rm '"filename' '1"' '"filename' '2"' '"filename' '3"'
or
rm "filename
rm 1"
rm "filename
rm 2"
rm "filename
rm 3"

Maybe I will never catch this :-(

-integrale16
 
Old 10-15-2009, 10:53 PM   #12
i92guboj
Gentoo support team
 
Registered: May 2008
Location: Lucena, Córdoba (Spain)
Distribution: Gentoo
Posts: 4,063

Rep: Reputation: 381Reputation: 381Reputation: 381Reputation: 381
Quote:
Originally Posted by integrale16 View Post
This is exactly what I don't understand.
If it's as you say, that the cat command is executed and then bash replaces the command with its output, why doesn't it see, or rather care about, the "" quotes?

Let's assume, the file has this content:
"filename 1"
"filename 2"
"filename 3"

Replacing the cat command with its output would, in my eyes, give:
rm "filename 1" "filename 2" "filename 3"
The resulting command is nothing like this. It will behave more like this:

Code:
rm whatever there was in your file and the next line and the following one
The output of `cat file` will be "tokenized", which means it is split into separate words wherever a separator character appears, and those words are then passed as the argument list to "rm". Quote characters and backslashes that come out of the substitution are not parsed as shell quoting any more; they are just ordinary characters inside the words.

For bash, the default value of the separator ($IFS) includes three special characters: space, TAB and newline. *ALL* of these are separators, and are treated as such when not quoted or escaped.

So, your loop will run once for each token, and a token is anything that sits between two separators of any kind (again: space, tab or newline), or between a separator and the start or end of the output.

As I told you, the whole deal is about IFS, so you can very well define your own IFS, and that way you can make your original idea work. On the command line it would look like this:

Code:
$ IFS='
'    # typed as: single quote, ENTER, single quote, so IFS is just a newline
$ for file in $(cat foo); do rm "$file"; done
Quote:
Then the command should work as I would expect it.
But the resulting command line works something like this (in principle, not verbatim):
rm '"filename' '1"' '"filename' '2"' '"filename' '3"'
or
rm "filename
rm 1"
rm "filename
rm 2"
rm "filename
rm 3"
Inside single quotes, no expansion takes place, and *everything* is literal, including the double quotation marks.

Quote:
Maybe I will never catch this :-(

-integrale16
When you do this:

Code:
cat file | while read foo
The big difference is that you are using the file as the input for "read". Read (unlike "for") reads complete lines, and only stops reading when it reaches a line ending. So "read" takes characters from the input source (it doesn't matter if it's the keyboard or a file) and continues until it reads a newline. At that moment it saves all the buffered characters into the given variable, and since read has no problem with spaces inside a line, the $foo variable will contain the whole line.

The key difference is that in a for loop bash gets in the middle, and the output is already broken into tokens by the time it reaches the tools. With the "cat | while read" construct, on the other hand, the data is streamed directly from cat into read, so it never goes through the word splitting that bash applies to a command line.

In the first case, bash reads something, cuts it into tokens and then passes them to the rm command. In the second case cat sends the data to read, and bash is not in the middle to cut or tokenize anything. Read will eat data from cat until it reads a newline.

In any case, and regardless of how you filled the variable, you have to be careful to quote that variable when you want to use its contents.

Code:
cat file | while read foo; do whatever with "$foo"; done
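
One detail worth knowing about read, shown here as a small sketch with an invented sample line: a plain read strips leading and trailing blanks and treats backslashes as escape characters, while IFS= read -r hands over the line unchanged. For ordinary filenames both behave the same; the difference only shows up for names with leading or trailing blanks or with backslashes in them.

```shell
#!/usr/bin/env bash
# Sample line with leading/trailing blanks and a backslash (made up for illustration).
line='  dir\name with spaces  '

# Plain read: outer blanks are stripped and the backslash is consumed.
plain=$(printf '%s\n' "$line" | { read foo; printf '%s' "$foo"; })

# IFS= read -r: the line is stored exactly as it was written.
exact=$(printf '%s\n' "$line" | { IFS= read -r foo; printf '%s' "$foo"; })

printf 'plain read:   <%s>\n' "$plain"   # <dirname with spaces>
printf 'IFS= read -r: <%s>\n' "$exact"   # <  dir\name with spaces  >
```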

Last edited by i92guboj; 10-15-2009 at 10:57 PM.
 
  

