LinuxQuestions.org - [SOLVED] Find files not already gzipped

Find files not already gzipped

I have a directory with a load of dns logs. Some are gzipped (ending in .gz) and some are just text files.

I run the following command:

ls | grep -v *.gz

expecting to get a list of the files that are NOT zipped, instead I get:

Binary file bind.log.1.120204.gz matches
Binary file bind.log.120125.gz matches
Binary file bind.log.120204.gz matches

Can someone explain what I am doing wrong? And why would I see this output instead of just the files that are not zipped?

The wildcard in your grep (*.gz) is being expanded on the call.

I would use find:

Code:

find . -maxdepth 1 ! -iname "*.gz"

Quote:

Originally Posted by suicidaleggroll (Post 4695514)

The wildcard in your grep (*.gz) is being expanded on the call.

I would use find:

Code:

find . -maxdepth 1 ! -iname "*.gz"

Thanks that works, but can you explain "expanded on the call" for me? I really like to learn the reason why.

Thanks!

For grep, * means "the preceding item will be matched zero or more times".

Instead, write:
ls | grep -v gz$

You'll get all files whose name doesn't end in gz.

EDIT First sentence was wrong, corrected after reading post #8, thanks to David The H.

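When you execute the command "ls | grep -v *.gz", the shell expands *.gz to the names of all .gz files in the cwd before running the command. In other words, grep does not receive "*.gz" as an argument; it receives every .gz filename in the cwd. grep then takes the first of those names as its pattern and treats the rest as files to search, ignoring the output of ls entirely, which is why you see the "Binary file ... matches" lines. You would have to escape the * with a backslash, or put "*.gz" in quotes, for it to be passed to grep as you expect.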

However, as Didier pointed out, "*.gz" doesn't mean the same thing to grep as it does to the shell. You would need "gz$" to do what you're looking for.
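To see the expansion for yourself, here is a small sketch. The scratch directory, the file names and the show_args helper are made up for illustration; show_args simply prints each argument it receives in brackets, standing in for grep.

```shell
# Work in a scratch directory with hypothetical file names.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.log.gz b.log.gz c.log

# Hypothetical helper: print every argument it was given, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]' "$arg"
    done
    printf '\n'
}

# Unquoted: the shell replaces *.gz with the matching file names first.
unquoted=$(show_args grep -v *.gz)

# Quoted: the pattern is passed through untouched.
quoted=$(show_args grep -v '*.gz')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# The pipeline from the question, with a quoted regex instead of a glob.
not_gzipped=$(ls | grep -v '\.gz$')
echo "not gzipped: $not_gzipped"

# Clean up the scratch directory.
cd "$start_dir" && rm -rf "$demo_dir"
```

In the unquoted case the command sees the file names, not the pattern, which is exactly what happened to grep in the original command.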

Thanks for the explanations!

How about getting this done with only the "ls" command? GNU "ls" has rich features. :)

Code:

ls -l --ignore='*.gz'

Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.

In bash, you can also exclude files using a simple extended globbing rule.

Code:

shopt -s extglob    # It's not enabled by default
echo !(*.gz)        # Glob patterns can be used with almost any command,
                    # as they're expanded by the shell before execution.


Quote:

Originally Posted by Didier Spaier (Post 4695519)

For grep, * means "match what follows zero or more times".

Huh? It means no such thing. grep uses regular expressions in its pattern matching, and in regex, "*" means "match the preceding item zero or more times". "*.gz" is thus not a useful regular expression here (there's no preceding item to repeat, so grep's basic regex syntax treats the leading "*" as a literal asterisk), although it is a valid globbing pattern. The regex equivalent for that glob pattern is "^.*\.gz$". "." in regex means "any character", "^" is "start of line", and "$" is "end of line". Note that the second period has to be backslash-escaped to make it literal.

Actually, the "^.*" part is really not necessary since the expression is anchored to the end of the line, so "\.gz$" is equivalent.

The "gz$" used above does also work for the most part, but do be aware that it will match any string that ends in "gz", e.g. "thingz".
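The difference between the two patterns is easy to check on a short list of names. This is just a sketch: the three names are made up, with "thingz" being the example from above.

```shell
# Three hypothetical names: a plain log, a gzipped log, and a name that
# merely ends in the letters "gz".
names='bind.log
bind.log.120204.gz
thingz'

# Loose pattern: drops anything ending in "gz", including "thingz".
loose=$(printf '%s\n' "$names" | grep -v 'gz$')

# Strict pattern: drops only names ending in a literal ".gz".
strict=$(printf '%s\n' "$names" | grep -v '\.gz$')

echo "with gz\$:"
echo "$loose"
echo "with \\.gz\$:"
echo "$strict"
```

With the loose pattern "thingz" is filtered out along with the real .gz file; with the escaped period it stays in the list.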

Learning how to properly use regular expressions is one of the best bang-for-the-buck subjects you can spend your time on. A very large number of programs support, or even depend on, them.

Here are a few regular expressions tutorials:
http://mywiki.wooledge.org/RegularExpression
http://www.grymoire.com/Unix/Regular.html
http://www.regular-expressions.info/

Finally, be aware that parsing ls is generally not recommended. Use globbing patterns for simple file matching, and find for more complex ones, although it usually takes a bit more work to handle find's output safely.
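As a rough sketch of the globbing approach without ls, and without relying on extglob, a plain loop with a case statement can do the filtering. The scratch directory and file names here are made up for the demonstration, and collecting the results in the positional parameters is only one way to keep awkward names intact.

```shell
# Work in a scratch directory with hypothetical file names,
# including one that contains a space.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch bind.log bind.log.1.gz 'odd name.log'

set --                              # start with an empty argument list
for f in *; do
    [ -f "$f" ] || continue         # skip directories and an unmatched '*'
    case $f in
        *.gz) ;;                    # already gzipped, skip it
        *)    set -- "$@" "$f" ;;   # keep it, whitespace and all
    esac
done

plain_count=$#
listing=$(printf '%s\n' "$@")
echo "$plain_count files not gzipped:"
echo "$listing"

# Clean up the scratch directory.
cd "$start_dir" && rm -rf "$demo_dir"
```

Because the names never pass through a pipe as text, a file such as 'odd name.log' stays a single item.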

Edit: One more point. Unlike shell globbing, find searches recursively, so it would return all matching files in subdirectories as well. Use the -maxdepth option to restrict it to the current directory only, as demonstrated by suicidaleggroll. Also, be aware that the -name options use globbing patterns (but not bash's extended globs). There's a separate set of -regex options if you need more sophisticated pattern matching.
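Here is a small sketch of the recursion point, again in a made-up scratch directory with made-up file names. The -exec ... {} + form at the end is one common way to hand the names to another command without parsing find's text output; the sh -c snippet that just counts its arguments is only a stand-in for a real command.

```shell
# Scratch directory with a subdirectory, so the recursion is visible.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir sub
touch bind.log bind.log.1.gz 'odd name.log' sub/deep.log

# Default behaviour: find descends into sub/ as well.
recursive=$(find . -type f ! -name '*.gz' | sort)

# Restricted to the current directory only.
top_only=$(find . -maxdepth 1 -type f ! -name '*.gz' | sort)

echo "recursive:"
echo "$recursive"
echo "top level only:"
echo "$top_only"

# Pass the names directly to a command as separate arguments.
# The stand-in command here only reports how many names it received.
handled=$(find . -maxdepth 1 -type f ! -name '*.gz' -exec sh -c 'echo "$#"' sh {} +)
echo "names passed to the command: $handled"

# Clean up the scratch directory.
cd "$start_dir" && rm -rf "$demo_dir"
```

Note that the pattern given to -name is quoted, for the same reason as with grep: otherwise the shell would expand it before find ever saw it.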

Here are a couple of good links about using find:
http://mywiki.wooledge.org/UsingFind
http://www.grymoire.com/Unix/Find.html

Thanks David, post #4 corrected.