LinuxQuestions.org
Old 03-12-2015, 08:06 PM   #1
aadams
LQ Newbie
 
Registered: Jun 2011
Posts: 14

Rep: Reputation: Disabled
Bash script modification: Allowing spaces in directory/file name searches


OS: RHEL5
Scripting Experience: Beginner
Bash



Currently, myscript.sh will check the contents of fileA and compare it to directoryA and all of its subdirectory names and file names. If myscript.sh finds matches, it writes the matches to $1_found.csv. If myscript.sh looks at the line-by-line content inside fileA and is unable to match any entries to anything inside directoryA, it then writes each of those items to $1_notfound.csv. This is how it should work.



As it stands, the problem is that directoryA and fileA both contain some files whose names contain spaces (for example, "My file 1" alongside "Myfile2"). When myscript.sh runs, it finds "Myfile2" in both fileA and directoryA and writes the name to $1_found.csv. That's perfect. However, "My file 1" is read as 3 separate files, so even though it is actually present in both fileA and directoryA, its name gets written to $1_notfound.csv because of the spaces. I need "My file 1", and any other file with spaces in its name, to be written to $1_found.csv if it exists in both locations.



So, I am looking for a nice fix for this problem. Directory and file names that match between fileA and directoryA should still be written to $1_found.csv, even if the name has spaces in it. I know it's not good practice to use filenames with spaces in Linux, but there are other stakeholders involved and files that came from other OS platforms (e.g. Windows), so I don't have the luxury of simply replacing all the spaces with underscores.

I GREATLY appreciate any help on this!


Here's the script...



printf "\n"
echo "--- File Existance Script ----------------------------------"
echo "Script Started: `date`"
echo "Clearing log files..."
rm $1_found.csv
rm $1_notfound.csv
echo "Starting File Check..."
ROW_COUNTER=0
FOUND_COUNTER=0
NOTFOUND_COUNTER=0

while IFS="," read path id
do
for f in $path;
do
if [ -f $f ]
then
FOUND_COUNTER=$((FOUND_COUNTER+1))
echo "$id,$f" >> $1_found.csv;
else
NOTFOUND_COUNTER=$((NOTFOUND_COUNTER+1))
echo "$id,$f" >> $1_notfound.csv
fi
done
ROW_COUNTER=$((ROW_COUNTER+1))
printf "\rProcessing Record(s): $ROW_COUNTER"
done < $1
printf "\n"
echo "Script Ended: `date`"
printf "\n"
echo "Files Found: $FOUND_COUNTER"
echo "Files NOT Found: $NOTFOUND_COUNTER"
echo "------------------------------------------------------------"
printf "\n"
 
Old 03-12-2015, 08:35 PM   #2
T3RM1NVT0R
Senior Member
 
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385

Rep: Reputation: 477
How about using double quotes in the source file (fileA in your case)? You can use the following sed command to wrap every line of fileA in double quotes:

Code:
sed 's/.*/"&"/' fileA > newfilename
Once the quotes are in place, rename newfilename to fileA and try it against your script.
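
One caveat worth checking before relying on this: quote characters that arrive as data are not re-parsed by the shell after variable expansion, so they become part of the words rather than grouping them. A minimal sketch, using a made-up `line` variable in place of a line read from fileA:

```shell
#!/bin/bash
# Hypothetical line as it would look after sed has added the quotes.
line='"My file 1"'

# Unquoted expansion: word splitting still happens at the spaces, and the
# quote characters stay attached to the first and last words.
unquoted_count=0
for word in $line; do
    unquoted_count=$((unquoted_count + 1))
    printf 'unquoted word: %s\n' "$word"
done

# Quoted expansion: the whole value stays together as a single word.
quoted_count=0
for word in "$line"; do
    quoted_count=$((quoted_count + 1))
    printf 'quoted word:   %s\n' "$word"
done
```

In other words, it is the quoting around the expansion in the script that keeps a name together, not quotes stored in the file.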
 
1 members found this post helpful.
Old 03-12-2015, 09:52 PM   #3
Miati
Member
 
Registered: Dec 2014
Distribution: Linux Mint 17.*
Posts: 326

Rep: Reputation: 106
For loops generally don't work well for reading lists of files.
Here's a quick little script to demonstrate my point.

If I have a file named content with these 3 lines:
Code:
foo
bar
foo bar
I want to echo each line, so I wrote this script to try both ways:

Code:
while read lines
do
    for i in $lines
    do
        echo "$i"
    done
done < "$1"

echo ""

while read lines
do
    echo "$lines"
done < "$1"
Then I run it as ./filelister content

and I get this:

Code:
foo
bar
foo
bar

foo
bar
foo bar
Notice that the second version (the plain while loop) kept the line together, while the first (with the inner for loop) did not, despite being very similar otherwise.
This makes sense if you think about how for loops work.

Code:
for i in A B C D; do echo $i; done
This will echo A, B, C and D, each in its own pass through the loop.
But what if the list were read from a file that looked like this?

Code:
A
B C D
It doesn't matter: once the contents are handed to a for loop unquoted, they look like A B C D and each word is treated separately.
Reading with a while loop prevents this, since it deals with one whole line at a time before moving on.
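
A common refinement of the while loop, sketched here as a suggestion rather than something taken from the post: `IFS=` stops read from trimming leading and trailing whitespace, and `-r` stops it from interpreting backslashes. The temporary file stands in for the content file above.

```shell
#!/bin/bash
# Build a throwaway copy of the three-line "content" file from the post.
list=$(mktemp)
printf '%s\n' 'foo' 'bar' 'foo bar' > "$list"

line_count=0
last_line=
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r line; do
    line_count=$((line_count + 1))
    last_line=$line
    printf '%s\n' "$line"
done < "$list"

rm -f "$list"
```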

Last edited by Miati; 03-12-2015 at 10:11 PM.
 
1 members found this post helpful.
Old 03-12-2015, 11:53 PM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Ok, first off, please place code/data in [code][/code] tags to make it more readable.

Other than that, I see a few issues:

1. All places where you refer to $1_anything: this may not behave the way you expect. I would suggest putting set -xv on the second line of your script and checking the output it produces.

2. As advised above, when in doubt, quote. As a beginner, I would suggest you put double quotes around ALL variables until you are ready to experiment with the cases where they might not be needed.

3. $() is preferred over `` as it is both clearer and can be nested easily, should you require.

4. The documentation for -f says 'True if file exists and is a regular file.' Is there any requirement for the case where it exists but is a different type of file (e.g. a sym link or directory)?

5. Try assigning passed-in parameters to named variables. In this code it is not such an issue, but once you start creating functions and the like you will have many $1 values, and it would be nice to know what each refers to and that we are not looking at global variables when we shouldn't be.

6. I have to presume you pass the name of the file to be read to the script. There is no usage statement to advise what I need to do, nor any test to make sure the file exists in the first place.

7. As you provided no example input data, from the script I ascertain that it is a csv file with a path and an id.
a. Assuming 'path' is something like 'directoryA', the for loop will run exactly once and test to see if there exists a file in my current directory called 'directoryA'
b. If a path to a file, 'directoryA/fileA', it will again run only once and now need the sub-directory and the file to exist
c. If it contains a glob, 'directoryA/*', now it will run multiple times (assuming sub-directory exists, otherwise it runs once), but unless the 'id' is meaningful enough, all data from all sub-directories will end up in the same found/notfound files (not sure if this is desired or not)

8. Info: ++ increments work in bash (the two lines below are equivalent):
Code:
FOUND_COUNTER=$((FOUND_COUNTER+1))

((FOUND_COUNTER++))
9. Personal preference: I'm not sure I see the point of mixing printf and echo statements. Even better, you could use heredocs.

Hope some of that helps
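
Points 1, 2, 3, 5 and 6 can be sketched together as follows. This is only an illustration: the helper names `output_names` and `validate_args` are made up, and the usage text is an assumption. One detail on point 1: bash reads `$1_found.csv` as `$1` followed by `_found.csv`, because a positional parameter written without braces is a single digit. A named variable written as `$list_found` would be looked up as a different variable, so braces plus quotes are the safer habit.

```shell
#!/bin/bash
# Hypothetical helper: derive the two output names from the list file name.
# Braces mark where the variable name ends; quotes keep spaces intact.
output_names() {
    local list=$1    # named variable instead of a bare $1 (point 5)
    printf '%s\n' "${list}_found.csv" "${list}_notfound.csv"
}

# Hypothetical usage check (point 6): exactly one argument, and it must be
# a readable file.
validate_args() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: myscript.sh LISTFILE" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "Cannot read list file: $1" >&2
        return 1
    fi
}

# $() instead of backticks (point 3), quoted when used (point 2).
started=$(date)
echo "Script Started: ${started}"
```

In a real script the check would typically run first, for example `validate_args "$@" || exit 1`.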
 
2 members found this post helpful.
Old 03-17-2015, 01:22 PM   #5
aadams
LQ Newbie
 
Registered: Jun 2011
Posts: 14

Original Poster
Rep: Reputation: Disabled
Thank you everyone for the feedback. grail and T3RM1NVT0R, your suggestions were definitely beneficial to me, but I still haven't been able to figure out a solution for this particular problem. My fileA is a file list in .csv format with commas as delimiters. directoryA is a large (360GB) directory structure that contains regular files and a few irregular files throughout the directory tree. I thought it was odd that when I ran cat fileA or vim fileA the output looked clean, yet after running my search script against fileA, notfound.csv lists its entries in a strange way, not at all like they looked in fileA. So I presume my search script isn't reading fileA correctly. What is it about the search script that's not reading or translating my fileA correctly?

fileA is not broken up at the spaces when viewed with cat or vim; however, notfound.csv breaks the entries up just after the spaces.

Note the spaces after "BM". fileA lists each entry like this, on a single line.



#cat fileA.csv



/content/bogusdirectory/BM BOSIS-00-B00-00297-00/BM BOSIS-00-G10-00297-00_991_A.*,13999
/content/bogusdirectory/BM BOSIS-00-B00-00299-00/BM BOSIS-00-G10-00299-00_991_A.*,14000



notfound.csv lists all the entries, but they are broken up at the spaces and put on separate lines. Every entry in notfound.csv came from the file list in fileA: the search script read fileA, compared it to directoryA, and wrote each entry it didn't find in directoryA to notfound.csv. But wherever it found a space in an entry, it put the subsequent characters on a new line.



cat notfound.csv



,/content/bogusdirectory/BM
,BOSIS-00-B00-00297-00/BM
,BOSIS-00-B00-00297-00_991_A.*
,/content/bogusdirectory/BM
,BOSIS-00-B00-00299-00/BM
,BOSIS-00-B00-00299-00_991_A.*
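
One detail the thread never explains is the empty id column in front of each comma. A possible cause, offered only as an assumption because the raw bytes of fileA.csv are not shown, is Windows-style CRLF line endings: the carriage return stays attached to the last field that read fills in, and when the result is displayed on a terminal it moves the cursor back to the start of the line so the id gets overwritten. A minimal sketch of spotting and stripping it, with a made-up record:

```shell
#!/bin/bash
# Hypothetical record ending in a carriage return (CRLF is an assumption).
cr=$(printf '\r')
record="/content/bogusdirectory/BM BOSIS,13999${cr}"

# Same field splitting as the script: comma-separated path and id.
IFS="," read -r path id <<EOF
${record}
EOF

# The carriage return is still part of the id: 6 characters, not 5.
raw_id=$id
echo "raw id length: ${#raw_id}"

# Strip one trailing carriage return, if present.
id=${id%"$cr"}
echo "cleaned id: $id"
```

Running `od -c` on the first line of the real file would confirm or rule this out.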
 
Old 03-17-2015, 08:34 PM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Ok ... so now that we have some data

The for loop below splits the unquoted variable using IFS, which by default means white space, so the strings are not what you expect:
Code:
for f in $path;
So my suggestion would be to remove the '*' from the fileA.csv file and then we can use the globbing in the for loop:
Code:
$ cat fileA.csv
/content/bogusdirectory/BM BOSIS-00-B00-00297-00/BM BOSIS-00-G10-00297-00_991_A.,13999
/content/bogusdirectory/BM BOSIS-00-B00-00299-00/BM BOSIS-00-G10-00299-00_991_A.,14000

# then inside script use
while IFS="," read path id
do
  for f in "$path"*
  do
Let me know if that works out for you
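
A self-contained sketch of the quoted-prefix glob, using a throwaway directory with made-up names rather than the real /content tree. It also shows what happens when nothing matches: by default (without `shopt -s nullglob`) bash leaves the pattern as a literal string, so the loop body still runs once and the -f test fails.

```shell
#!/bin/bash
# Throwaway directory tree with spaces in the names (all names are made up).
root=$(mktemp -d)
mkdir -p "$root/BM DIR"
touch "$root/BM DIR/BM FILE_A.pdf" "$root/BM DIR/BM FILE_A.txt"

# Prefix as it would appear in fileA.csv once the trailing '*' is removed.
path="$root/BM DIR/BM FILE_A."

matches=0
# The quotes protect the spaces; the * outside the quotes is still a glob.
for f in "$path"*; do
    if [ -f "$f" ]; then
        matches=$((matches + 1))
        printf 'found: %s\n' "$f"
    fi
done

# No match: the loop runs once with the literal pattern, and -f is false.
missing=0
for f in "$root/no such prefix"*; do
    [ -f "$f" ] || missing=$((missing + 1))
done

rm -rf "$root"
```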
 
1 members found this post helpful.
Old 03-18-2015, 07:36 AM   #7
aadams
LQ Newbie
 
Registered: Jun 2011
Posts: 14

Original Poster
Rep: Reputation: Disabled
grail, I have removed the '*' from all lines in fileA and used the globbing in the for loop as recommended. Now I get the output listed below. Line 16 is the line containing if [ -f $f ]. Much gratitude for trying to help me figure this out.

Output after making the above changes, then running search.sh against fileA:

$ ./search.sh fileA.csv

--- File Existance Script ----------------------------------
Script Started: Wed Mar 18 08:18:47 EDT 2015
Clearing log files...
Starting File Check...
./search.sh: line 16: [: too many arguments
Processing Record(s): 1./search.sh: line 16: [: too many arguments
Processing Record(s): 2./search.sh: line 16: [: too many arguments
Processing Record(s): 3./search.sh: line 16: [: too many arguments
Processing Record(s): 4./search.sh: line 16: [: too many arguments
Processing Record(s): 5./search.sh: line 16: [: too many arguments
Processing Record(s): 6./search.sh: line 16: [: too many arguments
Processing Record(s): 7./search.sh: line 16: [: too many arguments
Processing Record(s): 8./search.sh: line 16: [: too many arguments
Processing Record(s): 9./search.sh: line 16: [: too many arguments
Processing Record(s): 10./search.sh: line 16: [: too many arguments
Processing Record(s): 11./search.sh: line 16: [: too many arguments
Processing Record(s): 12./search.sh: line 16: [: too many arguments
Processing Record(s): 17
Script Ended: Wed Mar 18 08:18:47 EDT 2015

Files Found: 4
Files NOT Found: 13


Code, showing line 16 at "if":

[code]
do
    for f in "$path"*
    do
        if [ -f $f ]
        then
            FOUND_COUNTER=$((FOUND_COUNTER+1))
            echo "$id,$f" >> $1_found.csv;
        else
            NOTFOUND_COUNTER=$((NOTFOUND_COUNTER+1))
            echo "$id,$f" >> $1_notfound.csv
        fi
    done
[/code]
 
Old 03-18-2015, 08:27 AM   #8
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
Please use [code]here comes your code[/code] tags.
You probably need to use:
Code:
if [ -f "$f" ]
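
Why the quotes matter here, as a minimal sketch with a made-up file name: without them the name is split into several words before [ sees it, which is what produces "too many arguments".

```shell
#!/bin/bash
f="My file 1"   # hypothetical name containing spaces

# Unquoted: [ receives the words -f My file 1, which is not a valid test
# expression, so it reports an error and returns a status greater than 1.
unquoted_status=0
[ -f $f ] 2>/dev/null || unquoted_status=$?

# Quoted: [ receives exactly one operand after -f. The file does not exist
# here, so the result is a plain "false" (status 1) rather than an error.
quoted_status=0
[ -f "$f" ] || quoted_status=$?

echo "unquoted: $unquoted_status, quoted: $quoted_status"
```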
 
1 members found this post helpful.
Old 03-18-2015, 08:39 PM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
I am with pan64 on this one ... as with point 2 of my first post. You can almost never have enough quoting in bash scripts.
 
Old 03-18-2015, 10:57 PM   #10
aadams
LQ Newbie
 
Registered: Jun 2011
Posts: 14

Original Poster
Rep: Reputation: Disabled
You guys rock! Thanks so much for the help! It's working as desired now. I added a line to the search script that uses sed to strip the '*' from each line of fileA, then put the double quotes and '*' in the loop: for f in "$path"*. The double quotes in the test were also necessary: if [ -f "$f" ]
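
The final script was not posted, so the following is only a sketch of how the pieces from this thread might fit together. The function name `check_list` is hypothetical, and the trailing '*' is stripped with a parameter expansion here instead of the sed line described above.

```shell
#!/bin/bash
# check_list LISTFILE: hypothetical wrapper around the loop from this thread.
# Each record is "path-prefix[*],id"; results go to LISTFILE_found.csv and
# LISTFILE_notfound.csv.
check_list() {
    local list=$1 path id f
    FOUND_COUNTER=0
    NOTFOUND_COUNTER=0
    rm -f "${list}_found.csv" "${list}_notfound.csv"

    while IFS="," read -r path id; do
        path=${path%\*}              # drop one trailing '*', if present
        for f in "$path"*; do        # quoted prefix, unquoted glob
            if [ -f "$f" ]; then
                FOUND_COUNTER=$((FOUND_COUNTER + 1))
                echo "$id,$f" >> "${list}_found.csv"
            else
                NOTFOUND_COUNTER=$((NOTFOUND_COUNTER + 1))
                echo "$id,$f" >> "${list}_notfound.csv"
            fi
        done
    done < "$list"

    echo "Files Found: $FOUND_COUNTER"
    echo "Files NOT Found: $NOTFOUND_COUNTER"
}
```

It would be called as `check_list fileA.csv`. A prefix that matches nothing ends up in the notfound file as the literal pattern, which mirrors how the original loop treats missing files.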
 
Old 03-18-2015, 11:56 PM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Glad we got there in the end. Please remember to mark the thread as SOLVED.
 
1 members found this post helpful.
  



Tags
bash, scripting, whitespace


