LinuxQuestions.org


kristo5747 08-03-2012 06:26 PM

Prevent word splitting with file with spaces in name
 
Hello,

I have a script that "validates" a ZIP file that looks like this:

Code:

AAA_20120801.zip =>
x~back end~20120801.TXT
y~time in~20120801.TXT
z~heat_chamber~20120801.TXT
AAA_20120801.ctl

My task is to compare its contents (i.e., the list of files contained inside) with the control file that is provided inside the ZIP file itself.

Since we've started receiving files with spaces in their names, I prevent the shell from word-splitting the file names when I list the ZIP's contents, like so:

Code:

FIRST_Array=(); while read length date time filename; do FIRST_Array+=( "$filename" ); echo -e "$filename";
done < <(/usr/bin/unzip -qql AAA_20120801.zip)

When I try to do the same with the control file,

Code:

SECOND_Array=(); while read length date time filename; do SECOND_Array+=( "$filename" ); echo -e "$filename"; done < <(/usr/bin/unzip -c AAA_20120801.zip AAA_20120801.ctl )

SECOND_Array correctly contains the file names from the control file, but it also picks up the file sizes listed in the control file:
Code:

x~back end~20120801.TXT 2KB
y~time in~20120801.TXT 2KB
z~heat_chamber~20120801.TXT 2KB

and my array comparison (diff -q) fails.

I tried adding this bit of awk code to remove the file size field, but it brings word splitting back (awk's $1 is only the text up to the first space)!

Code:

SECOND_Array=(); while read filename; do SECOND_Array+=( "$filename" ); echo -e "$filename"; done < <(/usr/bin/unzip -p AAA_20120801.zip AAA_20120801.ctl |awk '{print $1}')

How can I remove the file size info and prevent word splitting? Any ideas?

Thank you.

smallpond 08-04-2012 05:51 AM

Bash has a very neat feature for removing a suffix, if it is present:

Code:

>echo $A
Alfredo Garcia
>echo ${A% *KB}
Alfredo Garcia
>echo $B     
Moon Unit 16KB
>echo ${B% *KB}
Moon Unit

You can also remove a prefix by using '#' instead of '%'.
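
Applied to the control-file lines from the first post, the same expansion might look like the sketch below. It assumes every size field is the last space-separated token and ends in "KB", as in the sample shown there. The lines are pasted into a here-document instead of being read from unzip, and the array name "names" is only illustrative.

```bash
#!/usr/bin/env bash
# Sketch: strip a trailing " <size>KB" field from each line while keeping
# the spaces inside the file name. Sample data is copied from the first post.

names=()
while IFS= read -r line; do
    # ${line% *KB} removes the shortest suffix matching " *KB";
    # a line without such a suffix is left unchanged.
    names+=( "${line% *KB}" )
done <<'EOF'
x~back end~20120801.TXT 2KB
y~time in~20120801.TXT 2KB
z~heat_chamber~20120801.TXT 2KB
EOF

printf '%s\n' "${names[@]}"
```

If the sizes can come in other units, a pattern such as "${line% *}" (drop everything after the last space) may be a better fit, at the cost of also trimming lines that have no size field.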

grail 08-04-2012 09:31 AM

Why not use the simple option of providing something to store what you don't want:
Code:

SECOND_Array=(); while read length date time filename _; do SECOND_Array+=( "$filename" ); echo -e "$filename"; done < <(/usr/bin/unzip -c AAA_20120801.zip AAA_20120801.ctl )

David the H. 08-05-2012 01:32 AM

It would be much easier to read your code if you formatted it for multiple lines. It's not good practice to concatenate multiple commands on a single line inside a script file. You should generally only do it when working in an interactive shell.


In any case, what appears to have happened is that the format of your input data has changed, and the loop isn't designed to handle that change.

[ Edit: after re-reading the OP a few times, I think I may have misunderstood the situation. I think the input lines can now contain a filename with spaces, plus a size field? It would help if we could see what the raw input from the unzip command looks like. It could be that smallpond's answer is the correct one. ]

grail [perhaps] has the correct answer. The read command will cram all remaining text into the last variable. So simply add another variable. One common practice is to use "_" as a disposable variable, which has a special meaning to bash and is overwritten after each command, but you can use anything you want.
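
A quick way to see this behavior is to feed read a single line. The sketch below uses one sample line from the first post and made-up variable names (whole, first, rest). It also suggests why an extra variable only helps when the unwanted field follows a field that has no embedded spaces.

```bash
#!/usr/bin/env bash
# Sketch: read assigns one word per variable and puts the remainder of the
# line into the last variable.

line='x~back end~20120801.TXT 2KB'

# One variable: the whole line (minus leading/trailing whitespace) is stored.
read -r whole <<< "$line"

# Two variables: the split happens at the FIRST space, so a file name that
# contains spaces is cut short and the rest spills into the second variable.
read -r first rest <<< "$line"

printf 'whole=[%s]\nfirst=[%s]\nrest=[%s]\n' "$whole" "$first" "$rest"
```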

In fact, since it appears that you're only interested in the filename, you can simply re-use the disposable variable as many times as necessary to capture the unwanted fields.

Code:

SECOND_Array=()

while read -r _ _ _ filename _; do

        SECOND_Array+=( "$filename" )
        echo -e "$filename"

done < <(/usr/bin/unzip -c AAA_20120801.zip AAA_20120801.ctl )

(Note also that it's generally recommended to include the "-r" option as well, in case the input data contains any backslashes.)
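
To illustrate what "-r" changes, here is a small sketch with a made-up input string containing a backslash. The variable names plain and raw are only for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: the effect of read's -r option on input containing a backslash.

# Without -r, the backslash is treated as an escape character and removed.
read plain <<< 'a\tb'

# With -r, the backslash is kept exactly as it appears in the input.
read -r raw <<< 'a\tb'

printf 'plain=[%s]\nraw=[%s]\n' "$plain" "$raw"
```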

Another option would be to change it to use an array instead (the "-a" option).

Code:

SECOND_Array=()

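while read -r -a line; do

        # Note: -a splits on every space, so ${line[3]} holds only the
        # fourth word; a filename containing spaces is spread over
        # several elements.
        SECOND_Array+=( "${line[3]}" )
        echo -e "${line[3]}"

done < <(/usr/bin/unzip -c AAA_20120801.zip AAA_20120801.ctl )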
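
If the array approach is used on lines where the file name itself contains spaces and the size comes last, one possible variation is to drop the last element and join the rest back together. This is only a sketch under that assumption, with a sample line from the first post and hypothetical names (fields, count, name). Runs of several spaces inside a name would be collapsed to one.

```bash
#!/usr/bin/env bash
# Sketch: split a control-file line into words, drop the last word (the size),
# and join the remaining words back into one file name.

line='x~back end~20120801.TXT 2KB'

read -r -a fields <<< "$line"
count=${#fields[@]}

# "${fields[*]:0:count-1}" takes every element except the last and joins
# them with a single space (the first character of the default IFS).
name="${fields[*]:0:count-1}"

printf '%s\n' "$name"
```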

How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
http://mywiki.wooledge.org/BashFAQ/001

kristo5747 08-07-2012 03:46 PM

This did the trick:

Quote:

while read -r line
do
file=${line% *}
array+=( "$file" )
printf '%s\n' "$file"
done < <(/usr/bin/unzip -p JABL_XML_20120801_165917.zip JABL_XML_20120801_165917.ctl)

Thanks to everyone for taking the time.
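
For completeness, once both arrays hold clean names, the comparison with diff -q mentioned in the first post could be done along these lines. This is only a sketch: the array names and sample values are made up for illustration, and sorting both lists first is an assumption that the order of the entries does not matter.

```bash
#!/usr/bin/env bash
# Sketch: compare two arrays of file names with diff -q.
# Both arrays are hypothetical stand-ins for the ZIP listing and the
# cleaned-up control-file listing.

zip_names=( 'x~back end~20120801.TXT' 'y~time in~20120801.TXT' )
ctl_names=( 'y~time in~20120801.TXT' 'x~back end~20120801.TXT' )

# Print one name per line, sort, and let diff compare the two streams.
if diff -q <(printf '%s\n' "${zip_names[@]}" | sort) \
           <(printf '%s\n' "${ctl_names[@]}" | sort) >/dev/null; then
    result=match
else
    result=mismatch
fi

echo "$result"
```

Printing one name per line keeps names with spaces intact, although it would not cope with names that contain newlines.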


All times are GMT -5.