04-15-2012, 04:58 AM
|
#1
|
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
|
Bash: when an empty IFS does not work like a default IFS (info)
A few years ago, here on LQ, some of us essayed a theory that, for bash, an empty IFS is functionally identical to the default IFS.
There is a situation in which that is not true. It is when used in conjunction with read. Here's a demonstration.
Code:
c@CW8:/tmp/tmp$ rm *
c@CW8:/tmp/tmp$ touch no_space_after 'space_after '
c@CW8:/tmp/tmp$ echo -n "$IFS" | od -a
0000000 sp ht nl
0000003
c@CW8:/tmp/tmp$ while read -r -d '' file; do echo "$file<"; done < <(find $dir -type f -print0)
./no_space_after<
./space_after<
c@CW8:/tmp/tmp$ while IFS= read -r -d '' file; do echo "$file<"; done < <(find $dir -type f -print0)
./no_space_after<
./space_after <
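For anyone who wants to reproduce this without touching /tmp/tmp, here is a minimal sketch of the same test in a throwaway directory. The names demo_dir, name_default and name_empty are made up for illustration, and it assumes mktemp -d and a find that supports -print0 are available.

```shell
#!/usr/bin/env bash
IFS=$' \t\n'                       # start from the default IFS
demo_dir=$(mktemp -d)              # throwaway directory (hypothetical name)
touch "$demo_dir/space_after "     # file name ends in a space

# Default IFS: read strips the trailing space from the name.
while read -r -d '' file; do
    name_default=${file##*/}
done < <(find "$demo_dir" -type f -print0)

# Empty IFS: the name arrives intact.
while IFS= read -r -d '' file; do
    name_empty=${file##*/}
done < <(find "$demo_dir" -type f -print0)

printf 'default IFS: >%s<\n' "$name_default"
printf 'empty IFS:   >%s<\n' "$name_empty"
rm -rf "$demo_dir"
```

Only the second loop gives back a name that still matches the file on disk.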
|
|
|
|
04-15-2012, 06:12 AM
|
#2
|
LQ Guru
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509
|
I cannot find the cited thread, but the opposite is quite obviously true: an empty IFS does not act like the default. Another simple example:
Code:
$ while read one two three; do echo "$two<"; done < <(echo one two three)
two<
$ while IFS= read one two three; do echo "$two<"; done < <(echo one two three)
<
$ while IFS= read one two three; do echo "$one<"; done < <(echo one two three)
one two three<
Since IFS is null, the input line is not split at all. Using bash 4.1.7 here.
Last edited by colucix; 04-15-2012 at 06:14 AM.
|
|
|
04-15-2012, 09:29 AM
|
#3
|
Senior Member
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,831
|
Well the documented behaviour is that unset IFS is the same as default, not empty IFS:
Quote:
3.5.7 Word Splitting
...
If IFS is unset, or its value is exactly <space><tab><newline>, the default...
If the value of IFS is null, no word splitting occurs.
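The quoted rule is easy to check with plain word splitting, leaving read out of it. A small sketch follows; count_words is a hypothetical helper, and the expansions are left unquoted on purpose so that splitting can happen.

```shell
#!/usr/bin/env bash
count_words() { echo "$#"; }       # hypothetical helper: prints its argument count
text='one two three'

IFS=$' \t\n'
n_default=$(count_words $text)     # default IFS: split on whitespace

unset IFS
n_unset=$(count_words $text)       # unset IFS: treated like the default

IFS=
n_empty=$(count_words $text)       # null IFS: no word splitting at all

IFS=$' \t\n'                       # put the default back
echo "default=$n_default unset=$n_unset empty=$n_empty"
```

The default and unset cases both see three words, while the null IFS passes the whole string as a single word.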
|
|
|
2 members found this post helpful.
|
04-15-2012, 10:02 AM
|
#4
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
Catkin, the -d delimiter option for the read Bash built-in has nothing to do with IFS if you only read into one variable. The delimiter defaults to newline, not IFS, and an empty delimiter refers to ASCII NUL, zero byte.
Only when you read into multiple variables or an array, IFS comes into play. delimiter still specifies the delimiter for the entire input, but that input is then split using IFS between the variables/array. If multiple variables are used, the final parameter will always receive the rest of the input.
It is easier to understand if you think of read as obtaining one full record, delimited by delimiter which defaults to a newline. The record will be split into fields according to IFS, except that the final variable will receive all the rest of the fields. If only one variable is specified, it will receive the entire record, regardless of what IFS is.
Here is the relevant snippet of the bash-builtins man page, edited for brevity:
Code:
read [-ers] [-a aname] [-d delim] options... [name ...]
       One line is read from the standard input, or from the file
       descriptor fd supplied as an argument to the -u option, and the
       first word is assigned to the first name, the second word to the
       second name, and so on, with leftover words and their intervening
       separators assigned to the last name. If there are fewer
       words read from the input stream than names, the remaining names
       are assigned empty values. The characters in IFS are used to
       split the line into words. The backslash character (\) may be
       used to remove any special meaning for the next character read
       and for line continuation. Options, if supplied, have the
       following meanings:
       -a aname
              The words are assigned to sequential indices of the array
              variable aname, starting at 0. aname is unset before any
              new values are assigned. Other name arguments are
              ignored.
       -d delim
              The first character of delim is used to terminate the
              input line, rather than newline.
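The record-and-fields model described above can be sketched in a few lines. The ';' delimiter and the variable names are arbitrary choices for illustration: -d picks where the record ends, IFS splits that record into fields, and the last name collects whatever is left over.

```shell
#!/usr/bin/env bash
IFS=$' \t\n'                       # default IFS for field splitting

# One record, terminated by ';' instead of newline; text after ';' is not part of it.
read -r -d ';' first second rest <<< 'alpha beta gamma delta;ignored'
echo "first=$first second=$second rest=$rest"

# Same record into an array: one element per field.
read -r -d ';' -a fields <<< 'alpha beta gamma delta;ignored'
echo "fields=${#fields[@]}"
```

Here rest ends up holding the two leftover words together with the separator between them, as the man page excerpt says.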
|
|
|
04-15-2012, 09:56 PM
|
#5
|
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Original Poster
|
Quote:
Originally Posted by ntubski
Well the documented behaviour is that unset IFS is the same as default, not empty IFS:
|
Thanks for the correction, ntubski.
I was misremembering.
Sorry for the confusion.
|
|
|
04-15-2012, 10:26 PM
|
#6
|
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Original Poster
|
Quote:
Originally Posted by Nominal Animal
Only when you read into multiple variables or an array, IFS comes into play.
|
Hello Nominal Animal
The OP shows read being used with a single value and IFS having an effect (the trailing space is removed). Here's a more comprehensive demonstration:
Code:
c@CW8:~$ input='no_space
> space_before
> space_after
> space_both_sides '
c@CW8:~$ echo "$input" | while read -r record; do echo ">$record<"; done
>no_space<
>space_before<
>space_after<
>space_both_sides<
c@CW8:~$ echo "$input" | while IFS= read -r record; do echo ">$record<"; done
>no_space<
> space_before<
>space_after <
> space_both_sides <
I interpreted that to mean that any characters in IFS are stripped from the left and right sides of the record, but it is not so:
Code:
c@CW8:~$ input='no_space
Xspace_before
space_afterX
Xspace_both_sidesX'
c@CW8:~$ echo "$input" | while read record; do echo ">$record<"; done
>no_space<
>Xspace_before<
>space_afterX<
>Xspace_both_sidesX<
c@CW8:~$ echo "$input" | while IFS=X read record; do echo ">$record<"; done
>no_space<
>Xspace_before<
>space_after<
>Xspace_both_sidesX<
I do not understand why the trailing X was stripped from the third record but not the fourth. It is not because it is the last record:
Code:
c@CW8:~$ input='no_space
> Xspace_before
> space_afterX
> Xspace_both_sidesX
> another record'
c@CW8:~$ echo "$input" | while IFS=X read record; do echo ">$record<"; done
>no_space<
>Xspace_before<
>space_after<
>Xspace_both_sidesX<
>another record<
|
|
|
04-16-2012, 12:43 AM
|
#7
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
It gets even weirder. If you don't have a trailing record separator, the entire last record is silently discarded:
Code:
na@farm:~$ input=$'no_space\n space_before\nspace_after \n space_both_sides '
na@farm:~$ IFS=$'\t '; printf '%s\n' "$input" | while read -r record ; do echo ">$record<"; done
>no_space<
>space_before<
>space_after<
>space_both_sides<
na@farm:~$ IFS=$'\t '; printf '%s' "$input" | while read -r record ; do echo ">$record<"; done
>no_space<
>space_before<
>space_after<
(no space_both_sides in output at all)
It happens even if IFS is something else, and we use NUL separators:
Code:
na@farm:~$ input=$'no_space\n space_before\nspace_after \n space_both_sides '
na@farm:~$ IFS='Z'; printf '%s\0' "$input" | while read -rd '' record ; do echo ">$record<"; done
>no_space
space_before
space_after
space_both_sides <
na@farm:~$ IFS='Z'; printf '%s' "$input" | while read -rd '' record ; do echo ">$record<"; done
(outputs nothing!?!)
If we use IFS explicitly as the record separator, it still happens if we do not have a trailing separator:
Code:
na@farm:~$ input=$'no_space\n space_before\nspace_after \n space_both_sides '
na@farm:~$ IFS=$'\n'; printf '%s\n' "$input" | while read -rd $'\n' record ; do echo ">$record<"; done
>no_space<
> space_before<
>space_after <
> space_both_sides <
na@farm:~$ IFS=$'\n'; printf '%s' "$input" | while read -rd $'\n' record ; do echo ">$record<"; done
>no_space<
> space_before<
>space_after <
(no space_both_sides here either)
I believe this is a bug in the Bash read builtin. It should not ignore the record if there is no trailing separator. Nor should it consume the trailing separator for a variable that receives the rest of the record, as it does here:
Code:
na@farm:~$ input=$'no_space\n space_before\nspace_after \n space_both_sides '
na@farm:~$ IFS=$'\t '; printf '%s\0' "$input" | while read -rd '' record ; do echo ">$record<"; done
>no_space<
>space_before<
>space_after<
>space_both_sides< (should have a space before <)
It does consume the trailing separator even when multiple variables or an array is used, and happens whatever record separator is used, so at least it is consistent:
Code:
na@farm:~$ input=$'no_space\n space_before\nspace_after \n space_both_sides '
na@farm:~$ IFS=$'\t '; printf '%sZ' "$input" | while read -rd 'Z' one two ; do echo ">$one<|>$two<"; done
>no_space
<|>space_before
space_after
space_both_sides< (should have a space before <)
na@farm:~$ IFS=$'\t '; printf '%sZ' "$input" | while read -rd 'Z' -a any ; do printf '>%s<\n' "${any[@]}" ; done
>no_space
<
>space_before
space_after<
>
<
>space_both_sides<
This has big implications for safe file name handling in Bash. In particular, to avoid truncating file names with trailing characters that might match IFS, one has to set IFS to an empty string:
Code:
na@farm:~$ touch $'test-file1' $'test-file2 '
na@farm:~$ unset IFS
na@farm:~$ find . -maxdepth 1 -type f -name 'test-file*' -print0 |
while read -rd "" FILE ; do
[ -f "$FILE" ] || printf '%s: No such file.\n' "$FILE" >&2
done
./test-file2: No such file.
na@farm:~$ find . -maxdepth 1 -type f -name 'test-file*' -print0 |
while IFS="" read -rd "" FILE ; do
[ -f "$FILE" ] || printf '%s: No such file.\n' "$FILE" >&2
done
(no output; both file names handled correctly)
Fortunately, it does not mess up the IFS, since it only sets it for the read built-in, temporarily. To wit:
Code:
na@farm:~$ touch $'test-file1' $'test-file2 '
na@farm:~$ IFS=$'\t '
na@farm:~$ while IFS="" read -rd "" FILE ; do
[ -f "$FILE" ] || printf '%s: No such file.\n' "$FILE" >&2
done < <( find . -maxdepth 1 -type f -name 'test-file*' -print0 )
(no output; both file names handled correctly)
na@farm:~$ printf '>%s< (%d chars)\n' "$IFS" ${#IFS}
> < (2 chars)
na@farm:~$ rm -f $'test-file1' $'test-file2 '
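The same point in a form that does not need any files: a sketch that checks the shell's own IFS before and after a prefixed read. The variable names are made up for illustration.

```shell
#!/usr/bin/env bash
IFS=$'\t '                         # a non-default value to watch
before=$IFS
IFS= read -r line <<< '  padded  ' # the empty IFS applies to this one read only
after=$IFS

if [ "$before" = "$after" ]; then unchanged=yes; else unchanged=no; fi
printf 'line=>%s< IFS unchanged: %s\n' "$line" "$unchanged"
IFS=$' \t\n'                       # back to the default for whatever follows
```

The prefix assignment keeps the padding in line and leaves the surrounding shell's IFS alone.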
I'm off to fix my blog post about this.
|
|
2 members found this post helpful.
|
04-16-2012, 01:19 AM
|
#8
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852
|
Quote:
Originally Posted by Nominal Animal
It gets even weirder. If you don't have a trailing record separator, the entire last record is silently discarded:
|
That's a problem with the while loop, not read. At least not exactly. If there's no trailing delimiter, read doesn't return as true. So the input gets read, but the loop's sub-commands don't get executed. You need to process the final variable values outside the loop if you want to safely handle all situations.
See here: http://mywiki.wooledge.org/BashFAQ/001
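A minimal sketch of what "process the final variable values outside the loop" could look like. The process function and the count variable are hypothetical stand-ins for the real loop body.

```shell
#!/usr/bin/env bash
count=0
process() {                        # hypothetical handler for one record
    count=$((count + 1))
    printf '>%s<\n' "$1"
}

# The last record has no trailing newline, so the final read returns non-zero.
while IFS= read -r line; do
    process "$line"
done < <(printf 'first\nsecond\nlast-without-newline')

# read failed at end of input, but line still holds the unterminated tail.
[ -n "$line" ] && process "$line"
echo "records handled: $count"
```

Putting the body in a function keeps the after-loop call from duplicating the whole code section.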
Last edited by David the H.; 04-16-2012 at 01:21 AM.
|
|
1 members found this post helpful.
|
04-16-2012, 01:59 AM
|
#9
|
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Original Poster
|
Quote:
Originally Posted by Nominal Animal
Code:
na@farm:~$ IFS=$'\t '
na@farm:~$ printf '>%s< (%d chars)\n' "$IFS" ${#IFS}
> < (2 chars)
na@farm:~$ rm -f $'test-file1' $'test-file2 '
|
Thanks for coming on-board with this
So it looks like that very rare phenomenon, a bash bug. I plan to report it after waiting a few days for understanding to clarify.
Incidentally, printf's %q format specifier is helpful to show the actual value of IFS in the above:
Code:
c@CW8:~$ IFS=$'\t '
c@CW8:~$ printf '>%q< (%d chars)\n' "$IFS" ${#IFS}
>$'\t '< (2 chars)
---------- Post added 16th Apr 2012 at 12:30 ----------
Quote:
Originally Posted by David the H.
That's a problem with the while loop, not read.
|
Thanks David 
|
|
|
04-16-2012, 02:01 AM
|
#10
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
Quote:
Originally Posted by David the H.
That's a problem with the while loop, not read. At least not exactly. If there's no trailing delimiter, read doesn't return as true.
|
Now whose brilliant idea was that?
Now we need to do e.g.
Code:
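# Save the current locale settings (test -v needs Bash 4.2 or later), then force the C locale.
test -v LANG && OLD_LANG="$LANG" || unset OLD_LANG ; LANG=C
test -v LC_ALL && OLD_LC_ALL="$LC_ALL" || unset OLD_LC_ALL ; LC_ALL=C
while true; do
    FILE=""
    # Continue if read succeeded, or if it hit end of input but still filled FILE.
    IFS="" read -rd '' FILE || [ -n "$FILE" ] || break
    #
    # do something with file
    #
done
# Restore the saved locale settings.
test -v OLD_LANG && LANG="$OLD_LANG" || unset LANG
test -v OLD_LC_ALL && LC_ALL="$OLD_LC_ALL" || unset LC_ALL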
to handle e.g. NUL-delimited file lists correctly, just in case there is no final NUL at the end. The locale override is necessary to keep non-UTF-8 sequences from aborting the script (if a UTF-8 locale is used).
Thanks for the info, though, David the H. 
Last edited by Nominal Animal; 04-16-2012 at 02:11 AM.
|
|
|
04-17-2012, 06:19 AM
|
#11
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852
|
I was a bit pressed for time when I posted yesterday, so I couldn't go through the thread carefully. I could only post & run re NA's last post.
To first respond to what's come since then:
Pulling up "help read" gives us this:
Code:
Exit Status:
The return code is zero, unless end-of-file is encountered, read times out,
or an invalid file descriptor is supplied as the argument to -u.
Remember, one invocation of read grabs only a single delimited section of input data. It doesn't seem completely unreasonable to me to have it differentiate between hitting a delimiter and EOF. It's only when used in a loop, which invokes it multiple times, that it becomes a "gotcha".
I suppose it would be nice to be able to have read return true on an EOF as well, perhaps as an option flag. Other than that, except in trivial cases, it would probably be best to set up a function for the sub-commands, to keep from having to duplicate the entire code section again.
Another option could be to use mapfile or similar to capture the input lines into an array first, and process those.
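A rough sketch of the mapfile route, assuming Bash 4 or later (where mapfile exists) and newline-separated input. The array name records is arbitrary. For NUL-separated input, mapfile would need its -d option, which as far as I know only arrived in Bash 4.4.

```shell
#!/usr/bin/env bash
# Capture every line first; -t drops the trailing newline from each element.
mapfile -t records < <(printf 'first\n padded \nlast-without-newline')

for record in "${records[@]}"; do
    printf '>%s<\n' "$record"
done
echo "records captured: ${#records[@]}"
```

mapfile does no IFS splitting, so the padded line keeps its spaces, and the unterminated last line still lands in the array.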
Now, to respond to catkin, whitespace is treated slightly differently by IFS than other characters.
When IFS contains whitespace characters, any run of them at the front or end of a line is removed entirely, and the first character not in IFS starts the first field.
Non-whitespace characters in IFS, OTOH, always match individually, and are always considered actual delimiters. That means that, if one is encountered at the very start of the line, the "empty" value in front of it is considered the first field.
http://mywiki.wooledge.org/IFS
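Here is a small sketch of that difference, using made-up inputs. The last case mirrors catkin's space_afterX record: a single field followed by a single delimiter.

```shell
#!/usr/bin/env bash
# Whitespace in IFS: leading and trailing runs vanish, inner runs act as one separator.
IFS=' ' read -r w1 w2 <<< '   foo   bar   '
printf 'w1=>%s< w2=>%s<\n' "$w1" "$w2"

# Non-whitespace in IFS: every X is a delimiter, so a leading X yields an empty first field.
IFS=X read -r x1 x2 <<< 'XfooXbar'
printf 'x1=>%s< x2=>%s<\n' "$x1" "$x2"

# Single variable: one field followed by one delimiter, as in catkin's test.
IFS=X read -r s1 <<< 'space_afterX'
printf 's1=>%s<\n' "$s1"
```

As far as I can tell, this also accounts for Xspace_both_sidesX coming back untouched. The leading X creates an empty first field, so there are more fields than variables, and the only variable then receives all the fields along with the delimiters between and after them.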
|
|
|
04-17-2012, 02:14 PM
|
#12
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
Quote:
Originally Posted by David the H.
Remember, one invocation of read grabs only a single delimited section of input data. It doesn't seem completely unreasonable to me to have it differentiate between hitting a delimiter and EOF.
|
No, it is not unreasonable.
What is unreasonable is that you cannot differentiate between "data read, but no delimiter", "no more input", and "read error". The exit status is the same (false, 1) in all three cases, at least in Bash-4.2.10(1)-release (x86_64-pc-linux-gnu).
In most programming environments one encounters EOF only in the end of file error sense, i.e. only when trying to read past the end of input. Indeed, in POSIX systems, this is the only way kernels tell the userspace that the file pointer is at the end of input, or that there will be no further data available. A short read is always possible, and does not indicate anything about whether there is further data or not.
To me, personally, getting an EOF error (nonzero exit status) while also having input, is counterintuitive.
That said, it is historical behaviour, and therefore will not change.
Fortunately, the workaround can apparently be written pretty concisely. For NUL-separated records:
Code:
DATA=""
while IFS="" read -rd "" DATA || [ -n "$DATA" ]; do
# Do something with DATA
DATA=""
done
While read does clear DATA in normal situations, it does not clear it if a read error occurs, for example if you accidentally close the input (say, via exec <&-) in the loop body. Without explicitly clearing DATA, the loop would never exit in the true read error case.
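To see the workaround cope with a missing final NUL, here is a sketch that feeds it three records with the last one unterminated. The records array is only there so the result can be inspected afterwards.

```shell
#!/usr/bin/env bash
records=()
DATA=""
# read fails on the unterminated last record, but the -n test lets the body run once more.
while IFS="" read -rd "" DATA || [ -n "$DATA" ]; do
    records+=("$DATA")
    DATA=""                        # cleared so a genuine read error ends the loop
done < <(printf 'one\0two \0three-without-nul')

printf '>%s<\n' "${records[@]}"
echo "records seen: ${#records[@]}"
```

All three records arrive, and the trailing space in the second one survives because IFS is empty for the read.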
I guess I was just a bit frustrated with the behaviour. Thanks again for your efforts, David the H. 
|
|
2 members found this post helpful.
|
04-18-2012, 09:43 AM
|
#13
|
LQ 5k Club
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Original Poster
|
Quote:
Originally Posted by Nominal Animal
Fortunately, the workaround can apparently be written pretty concisely. For NUL-separated records:
Code:
DATA=""
while IFS="" read -rd "" DATA || [ -n "$DATA" ]; do
# Do something with DATA
DATA=""
done
|
Neat workaround for read's counter-intuitive behaviour, Nominal Animal.
A couple of points of style: DATA= is functionally identical to DATA="" and it may be preferable to use single quotes where doubles are not necessary as in the -d option.
Last edited by catkin; 04-18-2012 at 09:44 AM.
Reason: Fix missing bolding
|
|
|
04-19-2012, 09:40 AM
|
#14
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852
|
Agreed. It's an exquisite workaround. So simple.
You also stated more eloquently what I was trying to express with my suggestion of an option flag. We'd need to preserve historical compatibility while providing some ability to detect both states.
Thanks for the discussion and the hint.
|
|
|
All times are GMT -5.