LinuxQuestions.org
Old 08-03-2012, 06:26 PM   #1
kristo5747
Member
 
Registered: Jul 2010
Location: Earth
Distribution: Ubuntu 11.04 (Natty Narwhal)
Posts: 31

Rep: Reputation: 0
Prevent word splitting with file with spaces in name


Hello,

I have a script that "validates" a ZIP file that looks like this:

Code:
AAA_20120801.zip =>
x~back end~20120801.TXT
y~time in~20120801.TXT
z~heat_chamber~20120801.TXT
AAA_20120801.ctl
My task is to compare its contents (i.e., the list of files contained inside) with the control file that is provided inside the ZIP file itself.

Since we've started receiving files with spaces in their names, I prevented the shell from word-splitting the file names when I list the ZIP's contents, like so:

Code:
FIRST_Array=(); while read length date time filename; do FIRST_Array+=( "$filename" ); echo -e "$filename"; 
done < <(/usr/bin/unzip -qql AAA_20120801.zip)
When I try to do the same with the control file,

Code:
SECOND_Array=(); while read length date time filename; do SECOND_Array+=( "$filename" ); echo -e "$filename"; done < <(/usr/bin/unzip -c AAA_20120801.zip AAA_20120801.ctl )
SECOND_Array correctly picks up the file names from the control file, but it also includes the file sizes listed there:
Code:
x~back end~20120801.TXT 2KB
y~time in~20120801.TXT 2KB
z~heat_chamber~20120801.TXT 2KB
and my array comparison (diff -q) fails.

I tried adding this bit of awk code to remove the file size field, but it brings word splitting back!

Code:
SECOND_Array=(); while read filename; do SECOND_Array+=( "$filename" ); echo -e "$filename"; done < <(/usr/bin/unzip -p AAA_20120801.zip AAA_20120801.ctl |awk '{print $1}')

How can I remove the file size info and prevent word splitting? Any ideas?

Thank you.

Last edited by kristo5747; 08-03-2012 at 07:01 PM.
 
Old 08-04-2012, 05:51 AM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,140

Rep: Reputation: 1263
Bash has a very neat feature for removing a suffix, if it is present:

Code:
>echo $A
Alfredo Garcia
>echo ${A% *KB}
Alfredo Garcia
>echo $B       
Moon Unit 16KB
>echo ${B% *KB}
Moon Unit
You can also remove a prefix by using '#' instead of '%'.
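
Applied to one of the control-file lines from the first post, the same expansions might look like the sketch below. The sample line is copied from the thread; the variable names line, name and rest are only illustrative.

```shell
# Sample control-file line from the first post: a name, then a size field.
line='x~back end~20120801.TXT 2KB'

# % removes the shortest matching suffix: here the last space plus the size.
name=${line% *KB}
printf '%s\n' "$name"    # x~back end~20120801.TXT

# # removes the shortest matching prefix: here everything up to the first '~'.
rest=${name#*~}
printf '%s\n' "$rest"    # back end~20120801.TXT
```

Because only the shortest match is removed, the spaces inside the file name are left alone.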
 
Old 08-04-2012, 09:31 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Why not use the simple option of providing something to store what you don't want:
Code:
SECOND_Array=(); while read length date time filename _; do SECOND_Array+=( "$filename" ); echo -e "$filename"; done < <(/usr/bin/unzip -c AAA_20120801.zip AAA_20120801.ctl )
 
Old 08-05-2012, 01:32 AM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
It would be much easier to read your code if you formatted it for multiple lines. It's not good practice to concatenate multiple commands on a single line inside a script file. You should generally only do it when working in an interactive shell.


In any case, what appears to have happened is that the format of your input data has changed, and the loop isn't designed to handle that change.

[ Edit: after re-reading the OP a few times, I think I may have misunderstood the situation. Is it that the input lines can now contain a filename with spaces, plus a size field? It would help if we could see what the raw input from the unzip command looks like. It could be that smallpond's answer is the correct one. ]

grail [perhaps] has the correct answer. The read command will cram all remaining text into the last variable. So simply add another variable. One common practice is to use "_" as a disposable variable, which has a special meaning to bash and is overwritten after each command, but you can use anything you want.
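
A quick way to see this behaviour is the sketch below. The input line is made up; it only follows the length/date/time/name column order that the loops above assume, and the names whole and first_word are illustrative.

```shell
# Hypothetical input line: three fields, then a name that contains a space.
read -r length date time filename <<'EOF'
2048 2012-08-01 16:59 x~back end~20120801.TXT
EOF
whole=$filename
printf '%s\n' "$whole"         # x~back end~20120801.TXT

# With one more variable, "filename" gets a single word and "_" takes the rest.
read -r length date time filename _ <<'EOF'
2048 2012-08-01 16:59 x~back end~20120801.TXT
EOF
first_word=$filename
printf '%s\n' "$first_word"    # x~back
```

So the extra variable helps when the unwanted text comes after a name without spaces; if the name itself can contain spaces, the last named field will only hold its first word.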

In fact, since it appears that you're only interested in the filename, you can simply re-use the disposable variable as many times as necessary to capture the unwanted fields.

Code:
SECOND_Array=()

while read -r _ _ _ filename _; do

	SECOND_Array+=( "$filename" )
	echo -e "$filename" 

done < <(/usr/bin/unzip -c AAA_20120801.zip AAA_20120801.ctl )
(Note also that it's generally recommended to include the "-r" option as well, in case the input data contains any backslashes.)
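
To illustrate what "-r" changes, here is a small sketch with a made-up line containing a backslash (the line and the variable names with_r and without_r are hypothetical):

```shell
# Quoted here-doc delimiters keep the backslash literal in the input.
read -r with_r <<'EOF'
back\slash name.TXT 2KB
EOF

read without_r <<'EOF'
back\slash name.TXT 2KB
EOF

printf '%s\n' "$with_r"       # back\slash name.TXT 2KB
printf '%s\n' "$without_r"    # backslash name.TXT 2KB
```

Without "-r", read treats the backslash as an escape character and drops it, so the stored name no longer matches the real one.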

Another option would be to change it to use an array instead (the "-a" option).

Code:
SECOND_Array=()

while read -r -a line; do

	SECOND_Array+=( "${line[3]}" )
	echo -e "${line[3]}" 

done < <(/usr/bin/unzip -c AAA_20120801.zip AAA_20120801.ctl )
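
For reference, a small sketch of how "-a" splits a line, again with a made-up input line in the assumed length/date/time/name order. Each whitespace-separated word becomes its own element, so a name with spaces spans several elements and would need to be rejoined.

```shell
#!/usr/bin/env bash
# Hypothetical input line; the name contains one space.
read -r -a line <<'EOF'
2048 2012-08-01 16:59 x~back end~20120801.TXT
EOF

printf '%s\n' "${#line[@]}"    # 5
printf '%s\n' "${line[3]}"     # x~back

# Rejoin elements 3 onward with single spaces to rebuild the name.
name="${line[*]:3}"
printf '%s\n' "$name"          # x~back end~20120801.TXT
```

Note that rejoining this way collapses any run of several spaces in the original name into one.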
How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
http://mywiki.wooledge.org/BashFAQ/001

Last edited by David the H.; 08-05-2012 at 01:42 AM. Reason: as added
 
Old 08-07-2012, 03:46 PM   #5
kristo5747
Member
 
Registered: Jul 2010
Location: Earth
Distribution: Ubuntu 11.04 (Natty Narwhal)
Posts: 31

Original Poster
Rep: Reputation: 0
This did the trick:

Quote:
while read -r line
do
    file=${line% *}
    array+=( "$file" )
    printf '%s\n' "$file"
done < <(/usr/bin/unzip -p JABL_XML_20120801_165917.zip JABL_XML_20120801_165917.ctl)
Thanks to everyone for taking the time.
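
For anyone landing here later, a rough end-to-end sketch of the comparison follows. It is not the production script: list_zip and print_ctl are hypothetical stand-ins that print sample data in place of the real unzip calls, and it assumes the control file does not list itself.

```shell
#!/usr/bin/env bash
# Stand-ins with made-up sample data (not real unzip output):
#   list_zip  ~ /usr/bin/unzip -qql AAA_20120801.zip
#   print_ctl ~ /usr/bin/unzip -p AAA_20120801.zip AAA_20120801.ctl
list_zip() {
    printf '%s\n' \
        '     2048  2012-08-01 16:59   x~back end~20120801.TXT' \
        '     2048  2012-08-01 16:59   y~time in~20120801.TXT' \
        '     2048  2012-08-01 16:59   z~heat_chamber~20120801.TXT' \
        '      120  2012-08-01 16:59   AAA_20120801.ctl'
}
print_ctl() {
    printf '%s\n' \
        'x~back end~20120801.TXT 2KB' \
        'y~time in~20120801.TXT 2KB' \
        'z~heat_chamber~20120801.TXT 2KB'
}

# Names from the ZIP listing: the last variable keeps the whole name.
zip_names=()
while read -r _ _ _ filename; do
    # Assumption: the control file does not list itself, so skip it.
    [[ $filename == *.ctl ]] && continue
    zip_names+=( "$filename" )
done < <(list_zip)

# Names from the control file: strip the trailing size field.
ctl_names=()
while read -r line; do
    ctl_names+=( "${line% *}" )
done < <(print_ctl)

# Compare the two lists one name per line.
if diff -q <(printf '%s\n' "${zip_names[@]}") \
           <(printf '%s\n' "${ctl_names[@]}") >/dev/null; then
    result="match"
else
    result="mismatch"
fi
printf '%s\n' "$result"    # match
```

Swapping the two stand-in functions for the real unzip commands should be the only change needed, provided the listing and control file really have the layouts assumed here.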
 
  

