LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (http://www.linuxquestions.org/questions/programming-9/)
-   -   Bash - Read file without whitespace (http://www.linuxquestions.org/questions/programming-9/bash-read-file-without-whitespace-868408/)

Last Attacker 03-14-2011 12:55 AM

Bash - Read file without whitespace
 
Good day all

I am struggling with Bash scripting at the moment (I can't see how anyone manages to write scripts in this language!!!)

I need a daily cron job at home that reads my downloads.txt file, takes each url (one per line), and downloads the content from that url. Each entry then needs to be removed (in practice I keep all the urls in memory and clear the file afterwards). If an error occurs during the download, the url is written to a downloads.err file. I have all of the above working except for reading the url from the text file without picking up newline characters. I have spent days on this script; I am still fairly new to Bash, although scripting in general is nothing new to me.

I am using the following to read:
while read url; do
--Do whatever here--
done < downloads.txt

How can I stop the url variable from containing newline characters?

Thanks

Telengard 03-14-2011 01:27 AM

Quote:

Originally Posted by Last Attacker (Post 4289882)
while read url; do
--Do whatever here--
done < downloads.txt

That loop by itself does not add any newline characters to the data read into the url variable. I think the newlines must be coming from something in the --Do whatever here-- part.

If you need to remove newline characters from a string, it can be done with the tr command.

Code:

tr -d '\n'
:study: http://www.linuxcommand.org/man_pages/tr1.html

Here is a sample data file.

Code:

foo$ cat movies.txt
Re-Animator (1985)
Cthulhu Mansion (1992)
In the Mouth of Madness (1994)
foo$

As you can see, the file contains three newline characters, one after each movie title. Now let's see what happens when we process the file with the tr command from above.

Code:

foo$ tr -d '\n' < movies.txt
Re-Animator (1985)Cthulhu Mansion (1992)In the Mouth of Madness (1994)foo$

Now all the newlines have been removed. There are other ways, but this seems easy to me.

Be a little careful if you use this approach because utilities like awk (for example) use newlines as record separators.
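For example, here is what that caveat looks like in practice (sketched with a throwaway file in /tmp):

```shell
# Demonstrates the caveat: once the newlines are gone, awk sees one record.
printf 'a\nb\nc\n' > /tmp/recs.txt
awk 'END { print NR }' /tmp/recs.txt                  # prints 3
tr -d '\n' < /tmp/recs.txt | awk 'END { print NR }'   # prints 1
```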

If this approach doesn't fit your needs then perhaps it would be helpful to review what else is going on inside your loop.

HTH

Last Attacker 03-14-2011 01:39 AM

Hi Telengard

Thanks for your quick response.
I did wonder whether the newlines came from somewhere else, but I found the following:
* if my download.txt file has the url on the first line but that is the only line, with no trailing newline, then that url isn't read at all
* I added an echo on the url after it is read and found that the extra character is indeed there (you need to pipe it to less to see the ^M character)

Any ideas?

Thanks

vishnu_sreekumar 03-14-2011 02:55 AM

Can you please try this?

Code:

while read url; do
newurl=`echo "$url" | tr -d '\n'`
--Do whatever here--
done < downloads.txt


Last Attacker 03-14-2011 03:06 AM

I'll try it when I get home.

Thanks

Telengard 03-14-2011 03:15 AM

Quote:

Originally Posted by Last Attacker (Post 4289918)
* I added a echo on the url after its being read and found that the newline is indeed being added (you need to pipe it to less to see the ^M character)

echo will always add a newline to the end of the output string unless you specifically tell it not to. You can tell echo to suppress the newline character by using the -n option.

Code:

foo$ echo "adds a newline"
adds a newline
foo$ echo -n "no newline added"
no newline addedfoo$

:study: http://www.linuxcommand.org/man_pages/echo1.html

If that doesn't help you then you may need to resort to using tr in the way vishnu_sreekumar showed.

gnashley 03-14-2011 03:21 AM

What about using read in 'raw' mode:
Code:

while read -r url; do
newurl=$url
--Do whatever here--
done < downloads.txt

or nulling the IFS:
Code:

while IFS='' read url; do
newurl=$url
--Do whatever here--
done < downloads.txt
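For completeness, the two can be combined; this is the usual idiom for reading lines verbatim (sketched here with a throwaway file in /tmp):

```shell
# Combining raw mode with an empty IFS reads each line exactly as-is,
# preserving leading/trailing whitespace and backslashes.
printf 'one\n  two with spaces  \n' > /tmp/lines.txt
while IFS= read -r line; do
    printf '[%s]\n' "$line"      # brackets make the whitespace visible
done < /tmp/lines.txt
```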


colucix 03-14-2011 04:10 AM

Or you can add a newline on the fly, for example by means of awk, which doesn't care about a missing final newline and automatically appends one at the end of each record:
Code:

while read url
do
  --Do whatever here--
done < <(awk 1 downloads.txt)


theNbomr 03-14-2011 11:46 AM

In Linux, ^M is not a newline. You must have created the file with a DOS/Windows editor, which does use carriage returns to delimit lines. I suggest pre-filtering the file to remove them.
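A minimal sketch of such a pre-filter (tr is one option; dos2unix-style tools work too), using throwaway files in /tmp:

```shell
# Strip DOS carriage returns before the file is processed.
printf 'http://example.com/file\r\n' > /tmp/dos.txt
tr -d '\r' < /tmp/dos.txt > /tmp/unix.txt
cat -A /tmp/unix.txt     # line now ends in $ only, no ^M
```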

--- rod.

Last Attacker 03-14-2011 11:47 AM

Thanks for everyone's reply.

Unfortunately the ^M character still didn't go away after trying everyone's suggestions.
However, after some Googling I found the tofrodos utility, which did the trick. I run it first, before doing anything else.
The file most probably had Windows line endings or something.

So now I have something like this:

Code:

fromdos -d download.txt

while read url; do
aria2c -s 2 "$url"
done < download.txt
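For the record, the carriage return could also be stripped in bash itself, without the fromdos pass; a sketch using a throwaway sample file in place of download.txt:

```shell
# Strip a trailing CR from each line inside the loop itself.
printf 'http://example.com/a\r\nhttp://example.com/b\r\n' > /tmp/sample.txt
while IFS= read -r url; do
    url=${url%$'\r'}                 # drop the carriage return, if any
    [ -n "$url" ] && echo "clean: $url"
done < /tmp/sample.txt
```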

---------- Post added 03-14-11 at 06:48 PM ----------

@theNbomr: Man you replied at the same time I replied. :-)
Thx anyway.

Last Attacker 03-14-2011 12:38 PM

Here is my script, in case someone else would like to know what is going on or wants an auto-downloader for their Linux server.

Code:

#!/bin/bash

LOCK_FILE=/tmp/download.busy
BASE_DIR= << Your shares directory via Samba or something: /var/shares >>
DOWNLOAD_FILE=$BASE_DIR/downloads.txt
LOG_FILE=$BASE_DIR/downloads.log
ERROR_FILE=$BASE_DIR/downloads.err

# Make sure cron job doesn't execute this while already running.
if [ ! -f "$LOCK_FILE" ]; then
        echo "Currently downloading files... Remove if faulty." > "$LOCK_FILE"
else
        exit 1
fi

fromdos -d "$DOWNLOAD_FILE" # Fix DOS line endings
echo "" >> "$DOWNLOAD_FILE" # Make sure last line can be read

while IFS= read -r url; do
        if [ -n "$url" ] ; then # Not empty
                aria2c -c -s 2 -l "$LOG_FILE" --dir="$BASE_DIR" "$url"
                if [ "$?" -ne "0" ] ; then # Downloader returned non-zero
                        echo "$url" >> "$ERROR_FILE"
                fi
        fi
done < "$DOWNLOAD_FILE"

echo "" > "$DOWNLOAD_FILE" # Clean list to prevent re-downloading

rm "$LOCK_FILE"

Now just to hook this up to Cron.
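For example, a crontab entry along these lines would run it daily (the script path here is hypothetical; adjust it to wherever the script actually lives):

```shell
# Hypothetical crontab entry: run the downloader daily at 02:00.
# Install with `crontab -e`.
0 2 * * * /usr/local/bin/auto_download.sh
```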
Thanks to everyone's help!

Telengard 03-14-2011 01:51 PM

Quote:

Originally Posted by theNbomr (Post 4290388)
In Linux, ^M is not a newline.

Good catch. I totally missed that.

