LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

03-14-2011, 12:55 AM   #1
Last Attacker
Member
 
Registered: Jun 2004
Location: South Africa
Distribution: Ubuntu
Posts: 120

Rep: Reputation: 15
Bash - Read file without whitespace


Good day all

I am struggling with Bash scripting at the moment (I can't see how anyone can write scripts in this language!!!)

At home I need a cron job that runs daily, looks up my downloads.txt file, reads each url (one per line) and downloads the content from that url. Then that entry needs to be removed (well, I keep all urls in memory and clear the file afterwards). If an error occurs during the download, the url is written to a downloads.err file. I got all of the above working except for reading the url from the text file without including newline characters. I have been spending days on this script; I am still pretty new to Bash, but scripting is nothing new to me.

I am using the following to read:
Code:
while read url; do
	echo "$url"   # --Do whatever here--
done < downloads.txt

How can I get it not to let the url variable have newline characters?

Thanks
 
03-14-2011, 01:27 AM   #2
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147
Quote:
Originally Posted by Last Attacker
while read url; do
	echo "$url"   # --Do whatever here--
done < downloads.txt
That loop by itself does not add any newline characters to the data read into the url variable. I think the newlines must be coming from something in the --Do whatever here-- part.
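A quick way to confirm this is to print each value between brackets: a newline left in the variable would show up as a line break before the closing bracket. In this sketch the URLs and the `show_lines` helper are hypothetical, and `IFS= read -r` is used so that nothing else in the line is altered.

```shell
# show_lines (hypothetical helper): print every line read from stdin
# wrapped in brackets, so hidden characters at the end become visible.
show_lines() {
    while IFS= read -r url; do
        printf '[%s]\n' "$url"
    done
}

# Two sample URLs, each terminated by a newline.
printf 'http://example.com/a\nhttp://example.com/b\n' | show_lines
# prints:
# [http://example.com/a]
# [http://example.com/b]
```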

If you need to remove newline characters from a string it can be done with the tr command.

Code:
tr -d '\n'
http://www.linuxcommand.org/man_pages/tr1.html

Here is a sample data file.

Code:
foo$ cat movies.txt
Re-Animator (1985)
Cthulhu Mansion (1992)
In the Mouth of Madness (1994)
foo$
As you can see, the file contains three newline characters, one after each movie title. Now let's see what happens when we process the file with the tr command from above.

Code:
foo$ tr -d '\n' < movies.txt
Re-Animator (1985)Cthulhu Mansion (1992)In the Mouth of Madness (1994)foo$
Now all the newlines have been removed. There are other ways, but this seems easy to me.

Be a little careful if you use this approach because utilities like awk (for example) use newlines as record separators.
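One of those other ways, when the text is already in a Bash variable, is pattern substitution, which deletes the newlines without starting a separate `tr` process. This sketch is Bash-specific (`$'\n'` is ANSI-C quoting and `${var//pattern/}` is not POSIX); the `titles` variable simply reuses two of the sample titles.

```shell
#!/bin/bash
# Two titles, each followed by a newline (ANSI-C quoting: $'\n').
titles=$'Re-Animator (1985)\nCthulhu Mansion (1992)\n'

# ${var//pattern/} deletes every match of pattern; here, every newline.
joined=${titles//$'\n'/}

printf '%s\n' "$joined"
# prints: Re-Animator (1985)Cthulhu Mansion (1992)
```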

If this approach doesn't fit your needs then perhaps it would be helpful to review what else is going on inside your loop.

HTH

Last edited by Telengard; 03-14-2011 at 01:32 AM.
 
03-14-2011, 01:39 AM   #3
Last Attacker
Member
 
Registered: Jun 2004
Location: South Africa
Distribution: Ubuntu
Posts: 120

Original Poster
Rep: Reputation: 15
Hi Telengard

Thanks for your quick response.
I did indeed wonder if the newlines came from somewhere else, but I found the following:
* if my downloads.txt file contains a single url on one line with no newline at the end, that url isn't read
* I added an echo of the url after it is read and found that the newline is indeed there (you need to pipe the output to less to see the ^M character)
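One way to see exactly which characters are left at the end of the variable is to dump its bytes with `od -c` (or pipe it through `cat -v`, which shows a carriage return as ^M). The sketch below assumes a temporary file created with `mktemp` and a made-up URL; it recreates a line the way a Windows editor saves it.

```shell
# Hypothetical test file with one DOS-style line (CR LF ending).
tmp=$(mktemp)
printf 'http://example.com/file.iso\r\n' > "$tmp"

# read strips the LF, but the CR before it stays in the variable.
IFS= read -r url < "$tmp"
rm -f "$tmp"

# The last character od prints is \r (shown as ^M by less), not \n.
printf '%s' "$url" | od -c
```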

Any ideas?

Thanks
 
03-14-2011, 02:55 AM   #4
vishnu_sreekumar
Member
 
Registered: Jan 2006
Location: India
Distribution: Ubuntu, RHEL, Debian
Posts: 49

Rep: Reputation: 20
Can you please try this?

Code:
while read url; do
	newurl=$(printf '%s' "$url" | tr -d '\n')
	# --Do whatever here-- (using "$newurl")
done < downloads.txt
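Note that `tr -d '\n'` only deletes line-feed characters, and `read` has already removed the line feed, so this line cannot remove a ^M (carriage return). The sketch below, using a made-up URL, compares it with `tr -d '\r'`.

```shell
cr=$(printf '\r')                        # a literal carriage return (^M)
url="http://example.com/file.iso$cr"     # what read yields for a DOS line

# Deleting LF characters leaves the carriage return in place...
newurl=$(printf '%s' "$url" | tr -d '\n')
[ "$newurl" = "$url" ] && printf '%s\n' "CR still present after deleting LF"

# ...while deleting CR characters removes the ^M.
fixed=$(printf '%s' "$url" | tr -d '\r')
printf '%s\n' "$fixed"
# prints: http://example.com/file.iso
```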

Last edited by vishnu_sreekumar; 03-14-2011 at 02:57 AM.
 
03-14-2011, 03:06 AM   #5
Last Attacker
Member
 
Registered: Jun 2004
Location: South Africa
Distribution: Ubuntu
Posts: 120

Original Poster
Rep: Reputation: 15
I'll try it when I get home.

Thanks
 
03-14-2011, 03:15 AM   #6
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147
Quote:
Originally Posted by Last Attacker
* I added an echo of the url after it is read and found that the newline is indeed there (you need to pipe the output to less to see the ^M character)
echo will always add a newline to the end of the output string unless you specifically tell it not to. You can tell echo to suppress the newline character by using the -n option.

Code:
foo$ echo "adds a newline"
adds a newline
foo$ echo -n "no newline added"
no newline addedfoo$
http://www.linuxcommand.org/man_pages/echo1.html

If that doesn't help you then you may need to resort to using tr in the way vishnu_sreekumar showed.
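If the output must not end in a newline, `printf` is a portable alternative to `echo -n`: POSIX leaves the behaviour of `echo -n` implementation-defined, whereas `printf` only prints a newline when the format contains `\n`. A minimal sketch with a made-up URL:

```shell
url="http://example.com/file.iso"

printf '%s' "$url"      # prints the URL with no newline after it
printf '\n'
printf '%s\n' "$url"    # prints the URL followed by exactly one newline

# Byte counts make the difference visible: prints 27, then 28.
printf '%s' "$url" | wc -c
printf '%s\n' "$url" | wc -c
```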

Last edited by Telengard; 03-14-2011 at 03:16 AM.
 
03-14-2011, 03:21 AM   #7
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,749

Rep: Reputation: 461
What about using read in raw mode (-r)?
Code:
while read -r url; do
	newurl=$url
	# --Do whatever here--
done < downloads.txt
or setting IFS to the empty string:
Code:
while IFS='' read url; do
	newurl=$url
	# --Do whatever here--
done < downloads.txt
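The difference between the two options is easiest to see on a line with leading spaces and a backslash (a made-up example below): plain `read` trims the spaces and treats the backslash as an escape, `-r` keeps the backslash, and `IFS=` keeps the spaces. Neither option removes a trailing carriage return, because CR is not in the default IFS (space, tab, newline), so on their own they will not get rid of the ^M.

```shell
# Hypothetical input: two leading spaces and a backslash in the path.
line='  http://example.com/a\b.iso'

# Plain read: leading blanks trimmed, backslash swallowed as an escape.
printf '%s\n' "$line" | { read url; printf '[%s]\n' "$url"; }
# prints: [http://example.com/ab.iso]

# read -r keeps the backslash; IFS= additionally keeps the leading blanks.
printf '%s\n' "$line" | { IFS= read -r url; printf '[%s]\n' "$url"; }
# prints: [  http://example.com/a\b.iso]
```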
 
03-14-2011, 04:10 AM   #8
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,458

Rep: Reputation: 1941
Or you can add the missing newline on the fly, for example with awk, which doesn't care about a missing final newline and automatically appends one to the end of each record it prints:
Code:
while read url
do
	echo "$url"   # --Do whatever here--
done < <(awk 1 downloads.txt)
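Another common idiom, which avoids both the extra awk process and Bash-only process substitution, is to let the loop run once more when `read` fails at end of file but has still filled the variable. A sketch, assuming a temporary file from `mktemp` whose last line has no trailing newline:

```shell
tmp=$(mktemp)
# Hypothetical list: the second URL is NOT followed by a newline.
printf 'http://example.com/a\nhttp://example.com/b' > "$tmp"

count=0
# read returns non-zero on the unterminated last line but still sets url,
# so the [ -n "$url" ] test keeps that final line in the loop.
while IFS= read -r url || [ -n "$url" ]; do
    printf '[%s]\n' "$url"
    count=$((count + 1))
done < "$tmp"
rm -f "$tmp"

printf '%s lines read\n' "$count"
# prints [http://example.com/a] and [http://example.com/b] on separate
# lines, then: 2 lines read
```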
 
03-14-2011, 11:46 AM   #9
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
In Linux, ^M is not a newline; it is a carriage return. You must have created the file with a DOS/Windows editor, which ends each line with a carriage return followed by a newline (CR LF). I suggest pre-filtering the file to remove them.
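A minimal sketch of such a pre-filter, assuming the list arrives on standard input: `tr -d '\r'` deletes every carriage return (not only those at line ends, which is harmless for a list of URLs), and the loop then sees clean lines. The `strip_cr` helper name and the sample URLs are hypothetical; in the real script the input would come from downloads.txt.

```shell
# strip_cr (hypothetical helper): copy stdin to stdout without any CR bytes.
strip_cr() {
    tr -d '\r'
}

# Two DOS-style lines, as a Windows editor would save them.
printf 'http://example.com/a\r\nhttp://example.com/b\r\n' |
strip_cr |
while IFS= read -r url; do
    printf '[%s]\n' "$url"
done
# prints:
# [http://example.com/a]
# [http://example.com/b]
```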

--- rod.
 
03-14-2011, 11:47 AM   #10
Last Attacker
Member
 
Registered: Jun 2004
Location: South Africa
Distribution: Ubuntu
Posts: 120

Original Poster
Rep: Reputation: 15
Thanks for everyone's reply.

Unfortunately the ^M character still didn't go away after trying everyone's suggestions.
However, after some Googling I found out about the tofrodos utility, which did the trick. I run it first, before doing anything else.
The file most probably had Windows (DOS) line endings.

So now I have something like this:
Code:
fromdos -d download.txt

while read url; do
aria2c -s 2 "$url"
done < download.txt
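Where tofrodos isn't installed, a sketch that needs no extra package strips a single trailing carriage return from each line with POSIX parameter expansion (`${url%"$cr"}`). The sample input is made up, `printf` stands in for the aria2c call, and in the real loop the data would come from `< download.txt`.

```shell
cr=$(printf '\r')             # a literal carriage return (^M)

# One DOS-style line and one Unix-style line, to show both are handled.
printf 'http://example.com/a\r\nhttp://example.com/b\n' |
while IFS= read -r url; do
    url=${url%"$cr"}          # remove one trailing CR, if there is one
    printf '[%s]\n' "$url"    # the real script would run: aria2c -s 2 "$url"
done
# prints:
# [http://example.com/a]
# [http://example.com/b]
```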

---------- Post added 03-14-11 at 06:48 PM ----------

@theNbomr: Man, you replied at the same time as I did. :-)
Thx anyway.
 
03-14-2011, 12:38 PM   #11
Last Attacker
Member
 
Registered: Jun 2004
Location: South Africa
Distribution: Ubuntu
Posts: 120

Original Poster
Rep: Reputation: 15

Here is my script, in case anyone else would like to know what is going on or wants an auto-downloader for their Linux server.

Code:
#!/bin/bash

LOCK_FILE=/tmp/download.busy
BASE_DIR=/var/shares # Your shares directory (e.g. shared via Samba); adjust to suit
DOWNLOAD_FILE="$BASE_DIR/downloads.txt"
LOG_FILE="$BASE_DIR/downloads.log"
ERROR_FILE="$BASE_DIR/downloads.err"

# Make sure cron job doesn't execute this while already running.
if [ ! -f "$LOCK_FILE" ]; then
	echo "Currently downloading files... Remove if faulty." > "$LOCK_FILE"
else
	exit 1
fi
# Remove the lock file when the script exits.
trap 'rm -f "$LOCK_FILE"' EXIT

fromdos -d "$DOWNLOAD_FILE" # Convert DOS line endings (strip the ^M characters)
echo "" >> "$DOWNLOAD_FILE" # Make sure last line can be read

while IFS= read -r url; do
	if [ -n "$url" ]; then # Not empty
		# Record the url if the downloader returns a non-zero status
		if ! aria2c -c -s 2 -l "$LOG_FILE" --dir="$BASE_DIR" "$url"; then
			echo "$url" >> "$ERROR_FILE"
		fi
	fi
done < "$DOWNLOAD_FILE"

: > "$DOWNLOAD_FILE" # Empty the list to prevent re-downloading
Now just to hook this up to Cron.
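For the cron side, a crontab entry along these lines would run the script once a day. The script path, the 02:00 schedule and the use of `flock` are assumptions: `flock -n` (from util-linux) skips the run if another instance still holds the lock, which would also close the small window between the lock-file test and its creation in the script. Cron usually starts jobs with a minimal PATH, so absolute paths are the safer choice (the location of flock may differ on your system).

```
# Edit with: crontab -e
# m  h  dom mon dow  command
0    2  *   *   *    /usr/bin/flock -n /tmp/download.flock /usr/local/bin/download.sh
```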
Thanks to everyone's help!
 
03-14-2011, 01:51 PM   #12
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147

Quote:
Originally Posted by theNbomr
In Linux, ^M is not a newline.
Good catch. I totally missed that.
 
  


All times are GMT -5.
