Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-tos, this is the place!
I tried searching the forums but couldn't find what I was looking for.
I have a directory with multiple files that I want to zip to individual zip files, i.e. 1.txt 2.txt 3.txt to 1.zip 2.zip 3.zip, etc.
I have tried doing this: for i in *.*; do f=${i%.*}; zip ${f}.zip $i; done
However, the filenames have spaces in them and I do not know how to get it to ignore these spaces. I've done it before, I just can't remember. Very frustrating. Thanks in advance.
Quote:
However, the filenames have spaces in them and I do not know how to get it to ignore these spaces.
Just quote the names. For instance, given the code snippet you provided:
Code:
for i in *.*; do
    f=${i%.*}
    zip "${f}.zip" "$i"
done
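For anyone new to the `${i%.*}` part: it strips the shortest match of the pattern `.*` from the end of the value, leaving the name without its extension. A quick illustration (the filename here is made up):

```shell
#!/bin/sh
# ${var%pattern} strips the shortest match of pattern from the end of $var.
i="my report.txt"
f=${i%.*}            # -> "my report"
echo "$f"            # quoting keeps the embedded space intact
echo "${f}.zip"      # -> "my report.zip", the archive name the loop builds
```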
EDIT:
If you want to remove the spaces in the filenames or replace the spaces with something else (like an underscore), then all you have to do is change your assignment to f:
Code:
f=$( echo "${i%.*}" | sed 's@ @_@g' )
Last edited by Dark_Helmet; 12-25-2011 at 03:12 PM.
Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability.
It's vitally important in shell use and scripting to understand how the shell handles arguments and whitespace. Read these three links for a full explanation.
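As a quick illustration of why the quoting matters (the filename and helper function here are made up), compare how many arguments an unquoted versus a quoted variable expands to:

```shell
#!/bin/sh
# Print how many arguments the function receives.
count_args() { echo $#; }

file="my holiday photos.txt"
count_args $file      # unquoted: split on whitespace into 3 arguments
count_args "$file"    # quoted: passed as a single argument
```

A command like `zip $f.zip $i` hits exactly this splitting, which is why each expansion must be double-quoted.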
@D.H., there's no need to use sed (or tr) here. The shell can do character replacement itself with another simple parameter substitution.
Code:
for i in *.*; do
    f=${i%.*}
    f=${f//[[:space:]]/_}
    zip "$f.zip" "$i"
done
Use the [:space:] character class instead so that it will handle all whitespace characters (space, tab, newline, vertical tab, form feed, carriage return).
Notice though that the above code example removes spaces from the names of the zip archives, but it doesn't affect the files going into them. To do that you'd have to do a separate rename operation on them before zipping them (a true rename, not just changing the text stored in the variable).
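A sketch of that rename-first approach, run in a scratch directory with a made-up filename (and assuming the zip tool is installed for the archiving step):

```shell
#!/bin/bash
# Scratch directory with one sample file, so the loop has something to work on.
dir=$(mktemp -d)
cd "$dir" || exit 1
echo "sample" > "holiday photos.txt"

for i in *.*; do
    new=${i//[[:space:]]/_}              # whitespace -> underscores
    if [ "$new" != "$i" ]; then
        mv -- "$i" "$new"                # true rename on disk
    fi
    # Zip each renamed file into its own archive (skipped if zip is absent).
    command -v zip >/dev/null && zip -q "${new%.*}.zip" "$new"
done
```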
Quote:
@D.H., there's no need to use sed (or tr) here. The shell can do character replacement itself with another simple parameter substitution.
Oh, I know about the multiple different parameter substitutions that bash is capable of performing. However, I choose not to use them because they are cryptic. It's far more straightforward (to me) to see a sed command as opposed to bash's ##, %%, //, ^^, or ',,'.
It's the same reason I avoid Perl like the plague. There's a Perl culture that's developed that seems to praise "solutions" that contain the fewest keystrokes, most cryptic syntax, and gratuitous use of hidden variables.
If a particular syntax works better for the author, great! To each his own.
For me, the performance hit because of an echo-sed pipeline is negligible--especially compared to the time it takes for me to open the bash man page, search for "parameter expansion", hit 'n' a couple times, and read the appropriate section.
I fail to see, however, how "${var//x/y}", which globally substitutes y for x, is any more "cryptic" than sed's comparable "s/x/y/g" syntax. Like all commands and coding structures it becomes second nature when you use it often enough.
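For instance, with a throwaway string, the two forms read almost identically:

```shell
#!/bin/bash
text="one two three"

# sed: s/pattern/replacement/g -- the trailing g makes it global
echo "$text" | sed 's/ /_/g'       # one_two_three

# bash: ${var//pattern/replacement} -- the doubled slash makes it global
echo "${text// /_}"                # one_two_three
```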
As for performance...
Code:
$ text='a1b2c3d4e5'
$ time echo "$text" | sed 's/[0-9]//g'
abcde
real 0m0.016s
user 0m0.000s
sys 0m0.000s
$ time echo "$text" | tr -d '[0-9]'
abcde
real 0m0.007s
user 0m0.000s
sys 0m0.000s
$ time echo "${text//[0-9]/}"
abcde
real 0m0.000s
user 0m0.000s
sys 0m0.000s
Not especially significant in a short operation, sure, but those extra milliseconds can really start to add up in a long-running script. And I at least think the last command line as a whole is cleaner than the others. It's certainly shorter.
Quote:
I fail to see, however, how "${var//x/y}", which globally substitutes y for x, is any more "cryptic" than sed's comparable "s/x/y/g" syntax.
On its own, not much. Taken together with bash's other expansion modifiers... there's no contextual clue what '#' does as compared to '^' or '/' or ',' or '%'. At least with sed, there's the "hint" that 's' = substitution, 'g' = global, 'd' = delete, and so on.
And as for the performance, I certainly cannot open up bash's man page, search, and read in less than 0.016 seconds. In fact, reading comments in the script about what the expansion does takes much longer than that in even the most concise phrasing imaginable.
For a long script (i.e. thousands of operations), an accumulation of 16 seconds could be noticeable, I grant. But again, to me, not a deal breaker. Hundreds of thousands... and in such situations, it might be worth it to research alternatives. Though, I wouldn't think to use a shell script to handle such a large data volume.
sed was just as hard to grapple with when I first started to really learn how to use it -- even harder in many ways, as it's much more complex and feature-filled. Indeed, I still have to consult the documentation occasionally for expressions I don't use often. The multi-line/hold buffer stuff in particular is a real nightmare.
In contrast bash has about 30 parameter substitution patterns. They all use the same basic syntax, and about half of them are simply variations like ${var%}/${var%%}. So really, there are only about 15 basic expressions and their variations that you need to worry about. Nor does it take much brain power to remember that / means substitution, that # and % mean start from the front and the end, respectively, that (perhaps the easiest of all) ^ and , are for upper and lower case, and that doubling up a character makes it a global/longest match.
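To illustrate those patterns side by side (the path here is a made-up example; the case conversions need bash 4+):

```shell
#!/bin/bash
var="path/to/archive.tar.gz"

echo "${var#*/}"     # '#'  strip shortest match from the front -> to/archive.tar.gz
echo "${var##*/}"    # '##' strip longest match from the front  -> archive.tar.gz
echo "${var%.*}"     # '%'  strip shortest match from the end   -> path/to/archive.tar
echo "${var%%.*}"    # '%%' strip longest match from the end    -> path/to/archive
echo "${var/a/A}"    # '/'  replace first match                 -> pAth/to/archive.tar.gz
echo "${var//a/A}"   # '//' replace all matches                 -> pAth/to/Archive.tAr.gz
echo "${var^^}"      # '^^' uppercase (bash 4+)                 -> PATH/TO/ARCHIVE.TAR.GZ
echo "${var,,}"      # ',,' lowercase                           -> path/to/archive.tar.gz
```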
Is memorizing that any more difficult than learning regular expressions, for example?
So again, the only time it really takes is the time for you to become accustomed to using them. It's only things you don't use often that require the trips to the man pages. Force yourself to use them as much as possible for a few months and you'll find that they soon become second-nature. In the end all you really appear to be arguing against is the fact that there's a learning curve involved; just like sed, just like regex, just like vi commands, and just like just about everything else, in computing and out of it.
Quote:
In the end all you really appear to be arguing against is the fact that there's a learning curve involved; just like sed, just like regex, just like vi commands, and just like just about everything else, in computing and out of it.
Well, yes and no.
===============
The "yes" part:
===============
Something is cryptic if it is difficult to understand. Something is difficult to understand if it is not amenable to memory. A structure that has mnemonics is more amenable to memory than a structure with no mnemonics. I'm going to come back to the quoted text above in just a second, but let me turn your argument around if I may...
Because, in the end, all you really appear to be arguing is that once you learn something, it's easy to know and no longer cryptic. Well, of course! If you already know it, it's obvious! Just as if I already know German, then German is no longer cryptic.
Your experience learning sed was certainly different as compared to mine. However, the mnemonics enabled me to experiment with the commands from one session to the next--each session separated by days--as opposed to opening the man page and re-reading for each session. That being the case, I should state that my use for sed is 99% simple-ish regex substitutions. There may be darker, seedy corners of sed, but I have no use for them (yet). Though, for the purposes of this discussion, those darker, seedy corners are irrelevant. Simple-ish regex substitutions with sed are sufficient to perform the special form bash parameter expansions--because that's what we're comparing and not the complexity of the sed command in its entirety.
And, because "special form bash parameter expansions" is a tad unwieldy, I hope everyone will excuse me if I substitute "Expansions" for it. Or a 's@special form bash parameter e\(xpansions\)@E\1@' if you will.
===============
The "no" part:
===============
Also, if I may pull a new card into this discussion, I could put forward an argument that learning the form of Expansions runs counter to the Unix philosophy: do one thing and do it well. In that respect, as you said, sed is more "feature-filled" in text manipulation than the Expansions. So, philosophically, bash is not "[doing] it well" compared to other available text manipulation tools. The implication is that (a) bash shouldn't offer the Expansions, or (b) the user should learn the more-capable tool.
For the other prong of the philosophy, "do one thing," bash's primary tasks should be maintaining environment variables, spawning processes after parsing user command lines, and job control--not text manipulation. And, arguably, it should leave scripting entirely to the likes of Perl, Python, Tcl, and their other cohorts.
Ok, great. Philosophy is wonderful, but it's not really concrete and not everyone subscribes to it. Even so, it's hard to escape the practical implications of the "do it well" part of the philosophical argument. As a practical matter, why would I, as a user, want to learn a second, third, or fourth syntax for text manipulation when the new syntax has no mnemonics and is less "feature-filled" than what I am already familiar with? You may want to feed me my own words, but I'll cut you off at the pass:
"Simple-ish regex substitutions with sed are sufficient to perform the [Expansions]--because that's what we're comparing and not the complexity of the sed command in its entirety."
Indeed. Even if we take away the features sed provides that are beyond Expansions, the argument still stands: why would I, as a user, want to learn a second, third, or fourth syntax for text manipulation when that new syntax gains no new capability over the syntax that I'm already familiar with?
As you pointed out in an earlier response, an echo-sed pipeline is slower than an Expansion. The time saving could be the answer to the "why" question above. But as I said in response to those statistics, for me, the time consumed is negligible until you get to hundreds of thousands of operations. And in such large data volumes, I would probably look for (or write in C/C++/Java/whatever) a tool more specialized to the processing I need.