[SOLVED] Bash script modification: Allowing spaces in directory/file name searches
OS: RHEL5
Scripting Experience: Beginner
Bash
Currently, myscript.sh will check the contents of fileA and compare it to directoryA and all of its subdirectory names and file names. If myscript.sh finds matches, it writes the matches to $1_found.csv. If myscript.sh looks at the line-by-line content inside fileA and is unable to match any entries to anything inside directoryA, it then writes each of those items to $1_notfound.csv. This is how it should work.
As it stands, the problem is that directoryA and fileA both contain some files whose names contain spaces (for example "My file 1" and "Myfile2"). When myscript.sh runs, it finds "Myfile2" in both fileA and directoryA, then writes "Myfile2" to $1_found.csv. That's perfect. However, "My file 1" is read as three separate names. So even though it is present in both fileA and directoryA, it gets written to $1_notfound.csv because of the spaces in its name. I need "My file 1", and any other file with spaces in its name, to be written to $1_found.csv if it exists in both locations.
So, I am looking for a clean fix for this problem. Matching directory and file names between fileA and directoryA should still be written to $1_found.csv, even if the name has spaces in it. I know it's not good practice to use file names with spaces in Linux, but other stakeholders are involved, some files came from other platforms (e.g. Windows), and I don't have the luxury of simply replacing all the spaces with underscores.
How about using double quotes in the source file (fileA in your case)? You can use the following sed command to put double quotes around every file name in fileA:
Code:
sed 's/.*/"&"/' fileA > newfilename
Once you have the quotes, rename newfilename to fileA and try it against your script.
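One caveat with this approach: quote characters stored inside the data are ordinary characters, not shell syntax, so they do not prevent word splitting when an unquoted variable expands. A minimal sketch to check this, assuming bash (the function name `count_words` is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: report how many words an UNQUOTED expansion produces.
count_words() {
    set -- $1      # deliberately unquoted: the value is split on IFS
    echo "$#"
}

line='"My file 1"'         # a line as it would look after the sed command
count_words "$line"        # prints 3: the embedded quotes do not group anything
count_words "My file 1"    # prints 3 as well
```

What keeps a name in one piece is quoting the variable where it is used, e.g. `"$line"`, not quotes inside the file.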
For loops generally don't work well when reading from file lists.
Here's a quick little script to demonstrate my point.
If I have a file named content with these 3 lines:
Code:
foo
bar
foo bar
I want to echo each line, so I wrote this script to test both ways:
Code:
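#!/bin/bash
# First pass: the unquoted $line in the for loop is split on whitespace
while IFS= read -r line
do
    for i in $line
    do
        echo "$i"
    done
done < "$1"
echo ""
# Second pass: each line is echoed whole
while IFS= read -r line
do
    echo "$line"
done < "$1"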
Then I run it as ./filelister content and get this:
Code:
foo
bar
foo
bar

foo
bar
foo bar
Notice that the second loop (while) kept the line together while the first (for) did not, despite being very similar otherwise.
This makes sense if you think about how for loops work.
Code:
for i in A B C D; do echo "$i"; done
This will echo A, B, C and D, each in its own iteration of the loop.
But what if it was read from a file that looked like this?
Code:
A
B C D
It doesn't matter: once the contents are handed to a for loop as A B C D, it is going to treat each word separately.
Reading it with a while loop prevents this, since it handles one whole line at a time before moving on.
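The same contrast as a self-contained sketch, assuming bash: `for` over `$(cat file)` splits on every blank, whereas `while IFS= read -r` returns whole lines. `IFS=` keeps leading and trailing blanks, and `-r` stops backslashes from being treated as escapes. The function names are hypothetical.

```shell
#!/bin/bash
# Word-splits the whole file: "foo bar" becomes two items.
items_via_for() {
    local i
    for i in $(cat "$1"); do
        echo "$i"
    done
}

# Reads one full line per iteration: "foo bar" stays together.
items_via_while() {
    local line
    while IFS= read -r line; do
        echo "$line"
    done < "$1"
}

list=$(mktemp)                              # temporary stand-in for the "content" file
printf '%s\n' foo bar 'foo bar' > "$list"
items_via_for "$list" | wc -l               # 4 items
items_via_while "$list" | wc -l             # 3 items
rm -f "$list"
```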
Ok, first off, please place code/data in [code][/code] tags to make it more readable.
Other than that, I see a few issues:
1. Everywhere you refer to $1_anything, this will not behave at all like you expect. I would suggest putting set -xv on the second line of your script and checking the output it produces.
2. As advised above, when in doubt, quote. As a beginner, I would suggest putting double quotes around ALL variables until you are ready to experiment with the cases where they might not be needed.
3. $() is preferred over `` as it is both clearer and can be nested easily, should you require.
4. -f means 'True if file exists and is a regular file.' Is there any requirement for a path that exists but is a different type of file (i.e. a symlink or directory)?
5. Try assigning the passed-in parameters to named variables instead of using them bare. In this code it is not such an issue, but once you start writing functions and the like you will have many $1 values, and it helps to know what each one refers to and that we are not looking at global variables when we shouldn't be.
6. I have to presume you pass the name of the file to be read to the script; there is no usage statement to tell me what I need to do, nor any test to make sure the file exists in the first place.
7. As you provided no example input data, from the script I ascertain that it is a csv file with a path and an id.
a. Assuming 'path' is something like 'directoryA', the for loop will run exactly once and test to see if there exists a file in my current directory called 'directoryA'
b. If a path to a file, 'directoryA/fileA', it will again run only once and now need the sub-directory and the file to exist
c. If it contains a glob, 'directoryA/*', now it will run multiple times (assuming sub-directory exists, otherwise it runs once), but unless the 'id' is meaningful enough, all data from all sub-directories will end up in the same found/notfound files (not sure if this is desired or not)
8. Info :- ++ increments work in bash (below are equivalent):
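A minimal sketch of points 1, 5, 6 and 8 applied together, assuming bash and a list file passed as the only argument. The function names `check_args`, `naive_name`, `output_names` and `count_demo`, and the choice to strip `.csv` before naming the output files, are assumptions rather than part of the original script. The `naive_name` function shows one likely surprise behind point 1: bash reads only a single digit after `$` for positional parameters, so `_found.csv` is simply appended to the whole argument.

```shell
#!/bin/bash
# Hypothetical skeleton illustrating points 1, 5, 6 and 8 above.

# Point 6: check the arguments before doing any work.
check_args() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: ${0##*/} LIST_FILE.csv" >&2
        return 1
    fi
    if [ ! -f "$1" ] || [ ! -r "$1" ]; then
        echo "Error: cannot read '$1'" >&2
        return 1
    fi
}

# Point 1: "$1_found.csv" is "<value of $1>" followed by "_found.csv".
naive_name() {
    echo "$1_found.csv"
}

# Point 5: name the parameter, then build the output names from it.
output_names() {
    local list_file="$1"
    local base="${list_file%.csv}"     # strip a trailing .csv, if any
    echo "${base}_found.csv"
    echo "${base}_notfound.csv"
}

# Point 8: equivalent ways to add one to a counter in bash.
count_demo() {
    local n=0
    n=$((n + 1))     # arithmetic expansion, also valid in POSIX sh
    ((n += 1))       # arithmetic command (bash)
    let n++          # let builtin (bash)
    ((n++))          # post-increment; status is 1 only when the old value is 0
    echo "$n"
}
```

At the top of search.sh you would call `check_args "$@" || exit 1` before assigning `list_file=$1`. With `fileA.csv`, `naive_name` prints `fileA.csv_found.csv`, `output_names` prints `fileA_found.csv` and `fileA_notfound.csv`, and `count_demo` prints `4`. Because `((n++))` returns status 1 when the old value is 0, it will abort a script running under `set -e`.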
Thank you everyone for the feedback. grail and T3RM1NVT0R, your suggestions were definitely beneficial to me, but I still haven't been able to solve this particular problem. My fileA is a file list in .csv format with commas as delimiters. directoryA is a large (360GB) directory structure that contains regular files and a few irregular files throughout the tree. I thought it was odd that when I ran cat fileA or vim fileA, the output looked clean; however, after running my search script against fileA, notfound.csv lists its entries in a strange way, not at all like they looked in fileA beforehand. So I presume my search script isn't reading fileA correctly. What is it about the search script that keeps it from reading fileA correctly?
fileA is not broken up at the spaces when using cat or vim to view it, however, notfound.csv breaks the entries up just after the spaces.
Note the spaces after "BM". fileA lists all the entries like this, as a single line.
notfound.csv lists all the entries, but they are broken up at the spaces and put on separate lines. Again, every entry in notfound.csv came from the file list in fileA: the search script read fileA, compared it to directoryA, and wrote each entry it could not find in directoryA to notfound.csv. Whenever an entry contained a space, the characters after the space ended up on a new line.
The for loop below splits the string on the characters in IFS, which include whitespace, so the strings you get are not what you expect:
Code:
for f in $path;
So my suggestion would be to remove the '*' from the fileA.csv file and then we can use the globbing in the for loop:
Code:
$ cat fileA.csv
/content/bogusdirectory/BM BOSIS-00-B00-00297-00/BM BOSIS-00-G10-00297-00_991_A.,13999
/content/bogusdirectory/BM BOSIS-00-B00-00299-00/BM BOSIS-00-G10-00299-00_991_A.,14000
# then inside the script use:
while IFS="," read -r path id
do
    for f in "$path"*
    do
        :   # the existing -f test on "$f" goes here
    done
done < "$1"
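A minimal sketch of why `"$path"*` works, assuming bash: the quoted part is taken literally, spaces included, and only the unquoted `*` is treated as a glob. The function name `list_matches` and the `.pdf` extension are hypothetical. Unlike the thread's script, which lets an unmatched pattern fall through to the `-f` test so that it is recorded as not found, this sketch simply skips it.

```shell
#!/bin/bash
# Quoted prefix + unquoted *: the prefix may contain spaces, the * still globs.
list_matches() {
    local prefix="$1" f
    for f in "$prefix"*; do
        [ -e "$f" ] || continue    # no match: bash leaves the pattern as a literal word
        printf '%s\n' "$f"
    done
}

tmp=$(mktemp -d)
mkdir "$tmp/BM BOSIS-00-B00-00297-00"
touch "$tmp/BM BOSIS-00-B00-00297-00/BM BOSIS-00-G10-00297-00_991_A.pdf"   # hypothetical file
list_matches "$tmp/BM BOSIS-00-B00-00297-00/BM BOSIS-00-G10-00297-00_991_A."
rm -rf "$tmp"
```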
grail, I have removed the '*' from all lines in fileA and used the globbing in the for loop as recommended. Now, I get the output listed below. Line 16 is where you'll find... if [ -f $f ]. Much gratitude for trying to help me figure this out.
Output after making the above changes and then running search.sh against fileA:
$ ./search.sh fileA.csv
--- File Existance Script ----------------------------------
Script Started: Wed Mar 18 08:18:47 EDT 2015
Clearing log files...
Starting File Check...
./search.sh: line 16: [: too many arguments
Processing Record(s): 1./search.sh: line 16: [: too many arguments
Processing Record(s): 2./search.sh: line 16: [: too many arguments
Processing Record(s): 3./search.sh: line 16: [: too many arguments
Processing Record(s): 4./search.sh: line 16: [: too many arguments
Processing Record(s): 5./search.sh: line 16: [: too many arguments
Processing Record(s): 6./search.sh: line 16: [: too many arguments
Processing Record(s): 7./search.sh: line 16: [: too many arguments
Processing Record(s): 8./search.sh: line 16: [: too many arguments
Processing Record(s): 9./search.sh: line 16: [: too many arguments
Processing Record(s): 10./search.sh: line 16: [: too many arguments
Processing Record(s): 11./search.sh: line 16: [: too many arguments
Processing Record(s): 12./search.sh: line 16: [: too many arguments
Processing Record(s): 17
Script Ended: Wed Mar 18 08:18:47 EDT 2015
Files Found: 4
Files NOT Found: 13
Code, showing line 16 at "if":
do
    for f in "$path"*
    do
        if [ -f $f ]
        then
            FOUND_COUNTER=$((FOUND_COUNTER+1))
            echo "$id,$f" >> $1_found.csv;
        else
            NOTFOUND_COUNTER=$((NOTFOUND_COUNTER+1))
            echo "$id,$f" >> $1_notfound.csv
        fi
    done
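Line 16 fails because the unquoted `$f` is split at the spaces before `[` ever sees it, so `[` receives extra arguments and reports a syntax error instead of testing the file. A minimal reproduction, assuming bash and a hypothetical file name modelled on the fileA.csv entries:

```shell
#!/bin/bash
# Reproduce the "[: too many arguments" error from line 16 of search.sh.
tmp=$(mktemp -d)
dir="$tmp/BM BOSIS-00-B00-00297-00"
f="$dir/BM BOSIS-00-G10-00297-00_991_A.pdf"   # hypothetical file; two spaces in the path
mkdir "$dir"
touch "$f"

unquoted_status=0
[ -f $f ] || unquoted_status=$?     # split into 3 words: [ reports too many arguments
quoted_status=0
[ -f "$f" ] || quoted_status=$?     # one word: a real file test

echo "unquoted: $unquoted_status, quoted: $quoted_status"
rm -rf "$tmp"
```

In bash this prints `unquoted: 2, quoted: 0`; status 2 is how `[` signals a usage error rather than a plain "false".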
You guys rock! Thanks so much for the help! It's working as desired now. I added a line to the search script that has sed strip the '*' from each line of fileA, then used double quotes and '*' in the for loop: for f in "$path"*. Double quotes were also necessary in the test: if [ -f "$f" ]
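Putting the thread's fixes together, a minimal sketch of what the corrected loop might look like, assuming bash. The full search.sh was never posted, so the function name `search_list`, the `${path%\*}` parameter expansion used in place of the poster's sed step, and the two-line summary are assumptions. The output files keep the original `"$1"_found.csv` naming.

```shell
#!/bin/bash
# Hypothetical reconstruction of search.sh's main loop with the thread's fixes.
# Each input line is assumed to look like "path-prefix,id" (see fileA.csv above).

search_list() {
    local list_file="$1"
    local found_file="${list_file}_found.csv"        # same names as "$1"_found.csv
    local notfound_file="${list_file}_notfound.csv"
    local found=0 notfound=0 path id f

    : > "$found_file"                                 # clear results from earlier runs
    : > "$notfound_file"

    # IFS=, splits each line at the comma only; -r keeps backslashes literal.
    # The || test also processes a last line that has no trailing newline.
    while IFS=, read -r path id || [ -n "$path" ]; do
        path=${path%\*}                               # drop a trailing '*' (replaces the sed step)
        for f in "$path"*; do                         # quoted prefix, unquoted glob
            if [ -f "$f" ]; then                      # quoted, so spaces stay inside one word
                found=$((found + 1))
                printf '%s,%s\n' "$id" "$f" >> "$found_file"
            else
                notfound=$((notfound + 1))
                printf '%s,%s\n' "$id" "$f" >> "$notfound_file"
            fi
        done
    done < "$list_file"

    echo "Files Found: $found"
    echo "Files NOT Found: $notfound"
}

# Run only when a list file is given, e.g. ./search.sh fileA.csv
if [ "$#" -eq 1 ]; then
    search_list "$1"
fi
```

An unmatched prefix leaves the literal pattern in `$f`, which fails `-f` and is therefore recorded in the not-found file, as in the original script.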