LinuxQuestions.org
Old 04-24-2012, 12:32 PM   #1
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,493

Rep: Reputation: 2867
(INFO) Something I just learnt about reading a file


Now there may be some out there who are just going to say "well, no sh!t", but I have been playing with bash for a while now and hadn't come across this before, so I thought I would share.

Most experienced bash scripters learnt fairly early on that the following is a bad idea when it comes to reading an unpredictable file:
Code:
for line in $(< file);do <some stuff>;done
The reason is word splitting: the output of $(< file) is split on every character stored in IFS, so each line of our nicely formatted file is broken up into separate words (not helpful).
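A minimal sketch of that word-splitting effect, assuming a throwaway two-line sample file (the file, its contents, and the variable names are made up for illustration):

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp)
printf 'first line here\nsecond line\n' > "$tmp"

count=0
# $(cat file) is used for portability; bash's $(< file) is split the same way.
for word in $(cat "$tmp"); do
    count=$((count + 1))
    echo "[$word]"
done

# Two lines went in, but the loop body ran once per word.
echo "loop iterations: $count"
rm -f "$tmp"
```

With the default IFS the loop runs five times, once per word, rather than twice, once per line.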

So then we find that a much better solution is to use the following:
Code:
while read line;do <stuff here>;done
Yay we say, all is right with our little corner of the world

So here I was up after midnight playing with a script which required me to read a file. No biggy, trusty while loop to the rescue .... wrong (so it appears)

Here is my test data file (test_file):
Code:
this a test
 this another test with space at the front
    and yet another with a tab at the front
back to the start
try some space at the end 
and with a tab at the end
Then we use a very basic script to simply echo the lines (test_script.sh):
Code:
#!/bin/bash

while read line
do
    echo "|$line|"   #pipes are to make sure we see the whitespace in the output
done<test_file
Okee dokee ... seems fairly straightforward, so we run it:
Code:
$ ./test_script.sh
|this a test|
|this another test with space at the front|
|and yet another with a tab at the front|
|back to the start|
|try some space at the end|
|and with a tab at the end|
Well gee willikers batman ... what happened to all my whitespace at the start and the end?

hmmm ... so I run off to look at the read command:
Code:
The line is split into fields as with word
splitting, and the first word is assigned to the first NAME, the second
word to the second NAME, and so on, with any leftover words assigned to
the last NAME.  Only the characters found in $IFS are recognised as word
delimiters.
OK, I see the mention of word splitting, but it has not been applied to the entire line. Reading on, it says the words are assigned across however many variables are given to read. We have supplied only one, so I would have thought the whole line would be placed in "line" (apparently not).
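A small sketch of the behaviour that help text describes, assuming a made-up input string (the variable names are hypothetical). It suggests that the last NAME receives the remainder of the line with its internal whitespace intact, while IFS whitespace at the edges is still dropped:

```shell
# One line with leading, internal, and trailing spaces (sample data only).
input='   one   two   three   four   '

# Three names: the first two get one word each, the last gets the rest.
read -r first second rest <<EOF
$input
EOF
echo "|$first|$second|$rest|"

# One name: it is also "the last NAME", so it gets everything,
# minus the leading and trailing IFS whitespace.
read -r whole <<EOF
$input
EOF
echo "|$whole|"
```

The second read is the single-variable case from the script above: the spaces between the words survive, the padding at either end does not.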

So I hit the trusty search sites but was unable to find anything to state that read will still remove IFS characters from the start and end.
I made a small change to confirm:
Code:
# previous line
while read line

# revised line
while IFS= read line
On re-running with alteration:
Code:
$ ./test_script.sh 
|this a test|
| this another test with space at the front|
|    and yet another with a tab at the front|
|back to the start|
|try some space at the end |
|and with a tab at the end    |
YAY ... now we have the desired output.

Not sure if it is just me that wasn't aware of this, but in case not ... now you know
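For anyone wanting to try this side by side, here is a minimal sketch, assuming a throwaway sample file. The -r option is an addition that does not appear in the script above; it stops read from treating backslashes as escape characters, which is a separate way a line can be altered on the way in.

```shell
# Hypothetical sample data: leading space, trailing space, and a backslash.
tmp=$(mktemp)
printf '%s\n' ' leading space' 'trailing space ' 'a\tb' > "$tmp"

# Plain read: edge whitespace is stripped and the backslash is consumed.
plain=""
while read line; do
    plain="$plain|$line|"
done < "$tmp"

# IFS= keeps edge whitespace; -r keeps backslashes literal.
safe=""
while IFS= read -r line; do
    safe="$safe|$line|"
done < "$tmp"

echo "plain: $plain"
echo "safe:  $safe"
rm -f "$tmp"
```

Because the assignment is written as a prefix to read, IFS is only emptied for that one command; the loop body still sees the normal IFS.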
 
Old 04-24-2012, 09:30 PM   #2
towheedm
Member
 
Registered: Sep 2011
Location: Trinidad & Tobago
Distribution: Debian Jessie
Posts: 592

Rep: Reputation: 119
From http://tldp.org/LDP/abs/html/internalvariables.html

Code:
# However ...
# $IFS treats whitespace differently than other characters.

output_args_one_per_line()
{
  for arg
  do
    echo "[$arg]"
  done #  ^    ^   Embed within brackets, for your viewing pleasure.
}

echo; echo "IFS=\" \""
echo "-------"

IFS=" "
var=" a  b c   "
#    ^ ^^   ^^^
output_args_one_per_line $var  # output_args_one_per_line `echo " a  b c   "`
# [a]
# [b]
# [c]


echo; echo "IFS=:"
echo "-----"

IFS=:
var=":a::b:c:::"               # Same pattern as above,
#    ^ ^^   ^^^                #+ but substituting ":" for " "  ...
output_args_one_per_line $var
# []
# [a]
# []
# [b]
# [c]
# []
# []

# Note "empty" brackets.
# The same thing happens with the "FS" field separator in awk.
 
Old 04-24-2012, 09:51 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,533

Rep: Reputation: 2041
Thanks grail.
You need to find better things to occupy yourself with after midnight - Perth too tame for you ??? ....
 
Old 04-24-2012, 10:39 PM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195
You might be interested in this related post, grail.

The thread is a little confusing because I started out misremembering the difference between IFS= and unset IFS, and then the thread moved on to read returning non-zero when the last record does not end in a record separator, but I'm still confused by bash's behaviour in the linked post.
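A rough sketch of the two points mentioned there, assuming a made-up file whose last record has no trailing newline. The `|| [ -n "$line" ]` guard is one common workaround and is not taken from the linked post:

```shell
# Hypothetical sample: two records, the second without a trailing newline.
tmp=$(mktemp)
printf 'one\n two' > "$tmp"

# Plain loop: read fills "line" with " two" but returns non-zero at EOF,
# so the loop body never runs for the last record.
seen=0
while IFS= read -r line; do
    seen=$((seen + 1))
done < "$tmp"

# Guarded loop: also run the body when read failed but left data in "line".
seen_all=0
last=""
while IFS= read -r line || [ -n "$line" ]; do
    seen_all=$((seen_all + 1))
    last=$line
done < "$tmp"
echo "plain loop: $seen record(s), guarded loop: $seen_all, last=|$last|"

# IFS= (set but empty) turns off the trimming; unset IFS falls back to the
# default space/tab/newline behaviour, so the padding is stripped again.
empty_ifs=$(printf '  padded  \n' | (IFS=; read -r x; printf '%s' "$x"))
unset_ifs=$(printf '  padded  \n' | (unset IFS; read -r x; printf '%s' "$x"))
echo "IFS=      -> |$empty_ifs|"
echo "unset IFS -> |$unset_ifs|"

rm -f "$tmp"
```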
 
Old 04-25-2012, 03:21 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,493

Original Poster
Rep: Reputation: 2867
@towheedm - I did see the section you are referring to, but as my initial example showed, I was already aware that a for loop would use IFS on the input data, hence the move to the while loop

@syg00 - Yeah just moved back here from Darwin at the start of this year and still looking for jobs ... so one must keep busy

@catkin - I knew I should have kept following that thread, but stopped after the first few replies thinking I already had all the answers. I am pleased to see that I am not the only one stumped by the behaviour. Still, it is good to see alternative solutions and the fact that there are solutions.
 
2 members found this post helpful.
Old 04-25-2012, 09:41 PM   #6
towheedm
Member
 
Registered: Sep 2011
Location: Trinidad & Tobago
Distribution: Debian Jessie
Posts: 592

Rep: Reputation: 119
Unless I'm not clear on what you're saying, the BASH info pages talk about IFS whitespace characters at the beginning and end of the result from word splitting:
Quote:
Word Splitting
The shell scans the results of parameter expansion, command substitution,
and arithmetic expansion that did not occur within double quotes
for word splitting.

The shell treats each character of IFS as a delimiter, and splits the
results of the other expansions into words on these characters. If IFS
is unset, or its value is exactly <space><tab><newline>, the default,
then sequences of <space>, <tab>, and <newline> at the beginning and
end of the results of the previous expansions are ignored, and any
sequence of IFS characters not at the beginning or end serves to
delimit words. If IFS has a value other than the default, then
sequences of the whitespace characters space and tab are ignored at the
beginning and end of the word, as long as the whitespace character is
in the value of IFS (an IFS whitespace character). Any character in
IFS that is not IFS whitespace, along with any adjacent IFS whitespace
characters, delimits a field. A sequence of IFS whitespace characters
is also treated as a delimiter. If the value of IFS is null, no word
splitting occurs.
I'm a BASH novice, so please correct me if I'm wrong.
 
Old 04-26-2012, 04:28 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,493

Original Poster
Rep: Reputation: 2867
No, you are clear on the issue of word splitting and the fact that a for loop does in fact perform this task on whitespace-separated data. As a general solution, before now anyway, I (and others, as you can see from some of the comments and other links) thought that the read solution of a while loop overcame this issue. In a way it does, as the whitespace between words is left alone, but IFS whitespace before the first and after the last non-whitespace character is still removed.

Ultimately it is one of those slightly unusual occurrences where read obeys the IFS word-splitting rules at the start and end of the line but leaves everything between the first and last non-whitespace characters untouched.
 
Old 04-26-2012, 07:42 PM   #8
towheedm
Member
 
Registered: Sep 2011
Location: Trinidad & Tobago
Distribution: Debian Jessie
Posts: 592

Rep: Reputation: 119
OK, I think I'm getting this now. This has to do specifically with using 'read' to retrieve the contents of a line in a file, where the contents of that line may be unknown and may contain IFS whitespace chars at the beginning, the end, or both.

But then, if the line contains non-whitespace IFS characters, the IFS rule on word-splitting is not obeyed.

Am I on track here?

Something new to learn in BASH, will have to look at the thread posted by catkin.
 
Old 04-26-2012, 07:57 PM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,356

Rep: Reputation: 2367
I rarely have to deal with that in bash, but (iirc), I think I usually just 'fix' IFS to be newline only (paranoia mode) and use a for loop (?).
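A minimal sketch of that newline-only IFS approach, assuming a throwaway sample file. The set -f call is an addition not mentioned above, included because unquoted expansion would otherwise also expand glob characters in the data:

```shell
# Hypothetical sample data: padded lines with an empty line in between.
tmp=$(mktemp)
printf '%s\n' ' leading space' '' 'trailing space ' > "$tmp"

old_ifs=$IFS
IFS='
'                 # newline only, so spaces and tabs no longer split or trim
set -f            # disable globbing while the unquoted expansion is in use

count=0
out=""
for line in $(cat "$tmp"); do
    count=$((count + 1))
    out="$out|$line|"
done

set +f
IFS=$old_ifs

echo "lines seen: $count"
echo "$out"
rm -f "$tmp"
```

Edge whitespace survives here, but the empty line does not: newline is still IFS whitespace, so consecutive newlines collapse into one delimiter, and the whole file is held in memory before the loop starts.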

@grail; I'd have thought the mining co's would have a job for you? It all sounds good over there, in the media over here ..
 
Old 04-26-2012, 11:56 PM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,493

Original Poster
Rep: Reputation: 2867
@chrism01 - Yes I thought so too but as yet no luck

@towheedm - Yes, you're on track. Just tricky to find things were not as black and white as they first seemed.

Well I will mark as SOLVED, but thanks to everyone for the feedback. Hopefully it will help others when searching about this topic
 
  


All times are GMT -5.