LinuxQuestions.org


dahweeds 08-15-2012 10:51 AM

How to list all file types recursively?
 
I'd like to print a list of all file types (extensions) in the current directory and below. I got this far with a little help from Google. I'm posting it since my search did not turn up an instant result. Hopefully it will save someone time down the road, or some better bashers can share a better way :)

Code:

user@system $ find ./ -type f | awk -F . '{print $NF}' |  sort --unique
css
gif
htaccess
jpg
js
php
png
sql
swf
txt

However, this goes wrong for a typeless file (one with no dot in its name): the path is printed instead of a type. Example:

Code:

user@system $ touch install/somefile
user@system $ touch README
user@system $ find  ./ -type f  | awk -F . '{print $NF}' |  sort --unique
css
gif
htaccess
/install/somefile
jpg
js
php
png
/README
sql
swf
txt

So it takes another awk pass to strip the directory parts off the list.

Code:

user@system $ find  ./ -type f  | awk -F . '{print $NF}' |  sort --unique |  awk -F / '{print $NF}'
css
gif
htaccess
somefile
jpg
js
php
png
README
sql
swf
txt
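
One thing to watch in the pipeline above: it de-duplicates before stripping the directory part, so two typeless files with the same name in different directories would both be listed. A minimal sketch that strips the path first and sorts last; the function name list_types is made up for this example, and `sort -u` is the short form of `sort --unique`.

```shell
# list_types DIR: print each distinct "type" (text after the last dot of the
# file name, or the whole name when there is no dot) found under DIR.
# list_types is a hypothetical helper name, not part of any tool.
list_types() {
    find "${1:-.}" -type f |
        awk -F / '{print $NF}' |   # drop the directory part first
        awk -F . '{print $NF}' |   # keep the text after the last dot
        sort -u                    # de-duplicate last
}
```

Stripping the path first also means that a dot in a directory name can no longer be mistaken for the start of a type.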


ICYWTK:
I was looking for the types so I could set the permissions recursively too. Example:
Code:

user@system $ find ./ -type f -name '*.png' -exec chmod 644 {} \;

grail 08-15-2012 11:02 AM

So you have already solved your problem? Maybe you should post solutions as well in case others would like to know how.

ntubski 08-15-2012 12:08 PM

Quote:

Originally Posted by dahweeds (Post 4754751)
So it takes another awk pass to strip the directory parts off the list.

You could also use find's printf:
Code:

find . -type f -printf '%f\n'  # prints filenames with leading directories removed
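
Note that -printf is not part of POSIX find, so it may be unavailable on non-GNU systems. A rough equivalent, assuming no file name contains a newline, deletes everything up to the last slash with sed. The function name names_only is only for illustration.

```shell
# names_only DIR: print the file-name part of every regular file under DIR
# without relying on -printf. names_only is a made-up helper name.
names_only() {
    find "${1:-.}" -type f | sed 's|.*/||'   # delete everything up to the last slash
}
```
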
Quote:

ICYWTK:
I was looking for the types so I could set the permissions recursively too. Example:
Code:

user@system $ find ./ -type f -name '*.png' -exec chmod 644 {} \;

I don't quite see why. If you want to set permissions for PNGs (or any other specific set of types), you can do that without listing types first; if you want to set permissions for every type you find, you can just remove the -name condition.
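
Both cases can be sketched as follows. The 644 mode comes from the example above; the choice of png, gif and jpg and the two function names are only for illustration. Grouping several -name tests with -o, and ending -exec with + so that many files go to each chmod call, are standard find features.

```shell
# Set mode 644 on a chosen set of types under DIR (default: current directory).
# The three extensions are example choices.
chmod_images() {
    find "${1:-.}" -type f \( -name '*.png' -o -name '*.gif' -o -name '*.jpg' \) \
        -exec chmod 644 {} +
}

# Set mode 644 on every regular file, whatever its type: no -name test at all.
chmod_all_files() {
    find "${1:-.}" -type f -exec chmod 644 {} +
}
```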

shane_kerr 08-16-2012 02:41 AM

You're pretty close.

You can use the "basename" command to remove the path from the files. It only runs on a single file name at a time, but you can get around this using the "xargs" command with "-l". To make this run smoothly, we need to use null-terminated file names, since otherwise xargs will treat spaces in file names as separators.

The following will give you only the file names under a given directory:

$ find . -type f -print0 | xargs -0l basename

The awk command actually has a way to filter based on a regular expression. So, rather than saying:

awk -F . '{print $NF}'

All you have to do is:

awk -F . '/\./{print $NF}'

That looks for a '.' character, which you have to escape with a backslash, otherwise '.' matches any character.
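
A small illustration of that pattern, feeding three made-up file names to awk. Only lines containing a literal dot reach the print action. The function name ext_only is hypothetical.

```shell
# ext_only: read file names on stdin and print the text after the last dot,
# skipping any name that contains no dot at all.
ext_only() {
    awk -F . '/\./{print $NF}'
}

# README has no dot, so it is filtered out; this prints "php" and "png".
printf '%s\n' index.php README logo.png | ext_only
```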

So, putting all this together:

$ find . -type f -print0 | xargs -0l basename | awk -F . '/\./{print $NF}' | sort --unique

What could be simpler? ;)

dahweeds 08-16-2012 07:27 AM

Quote:

Originally Posted by shane_kerr (Post 4755316)

$ find . -type f -print0 | xargs -0l basename | awk -F . '/\./{print $NF}' | sort --unique

What could be simpler? ;)

I guess from my noobie perspective, the original was a little simpler for two reasons.
  1. basename: I did not know about this. Thanks for sharing the tool.
  2. regular expressions: I am way below understanding regular expressions. They come up a lot on the web though, so I often just cut and paste such things when I find a cure for a problem. I hope some may find this one too in the future.


I timed the original discovery against this idea.
Reg-ex version:

Code:

user@system $ time  find . -type f -print0 | xargs -0l basename | awk -F . '/\./{print $NF}' | sort --unique
.... file types ....

real        0m0.390s
user        0m0.008s
sys        0m0.028s

The execution delay was noticeable. Maybe if there is a huge directory with many, many files this command could eat resources. (If that matters any more.)

Original:
Code:

user@system $ time find  ./ -type f  | awk -F . '{print $NF}' |  sort --unique |  awk -F / '{print $NF}'
.... file types ....

real        0m0.008s
user        0m0.004s
sys        0m0.000s


For comparison I also put the -printf option in the mix.
Code:

user@system $ time find  ./ -type f  -printf '%f\n' | awk -F . '{print $NF}' |  sort --unique
.... file types ....

real        0m0.007s
user        0m0.004s
sys        0m0.000s


It looks like the winner!

However, the same -printf should work in the reg-ex version too, which can eliminate the pipe to basename. Let's see how it does.
Code:

user@system $ time  find . -type f  -printf '%f\n' | awk -F . '/\./{print $NF}' | sort --unique
.... file types ....

real        0m0.007s
user        0m0.000s
sys        0m0.004s

Well, after all that, I realized that I had already deleted my typeless files from the directory. When I put them back two of the improved versions failed to find the typeless files.

example:
Code:

user@system $ touch README
user@system $ touch inc/SOMEFILE
user@system $ time  find . -type f  -printf '%f\n' | awk -F . '/\./{print $NF}' | sort --unique
css
gif
htaccess
jpg
js
patch
php
png
sql
swf
txt

real        0m0.007s
user        0m0.000s
sys        0m0.004s

The original idea and ntubski's -printf option work. So perhaps the winner is:
Code:

user@system $ time find  ./ -type f  -printf '%f\n' | awk -F . '{print $NF}' |  sort --unique
css
gif
htaccess
jpg
js
patch
php
png
README
SOMEFILE
sql
swf
txt

real        0m0.007s
user        0m0.000s
sys        0m0.004s

It is as fast as the original and finds all the file types, including files with no type.

ntubski 08-16-2012 01:54 PM

Quote:

I timed the original discovery against this idea.
Reg-ex version:

Code:

user@system $ time  find . -type f -print0 | xargs -0l basename | awk -F . '/\./{print $NF}' | sort --unique
.... file types ....

real        0m0.390s
user        0m0.008s
sys        0m0.028s

The execution delay was noticeable. Maybe if there is a huge directory with many, many files this command could eat resources. (If that matters any more.)

The increased time is likely due to having to call basename once per file.
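
If GNU coreutils is available, one way to avoid that cost is basename's -a option, which accepts many names per invocation, so xargs can pass them in batches. This is a sketch under that assumption: -a and xargs -r (skip the run when there is no input) are extensions that may not exist everywhere, and names_batched is a made-up name.

```shell
# names_batched DIR: strip directories from every regular file under DIR
# using a handful of basename processes instead of one per file.
# Assumes a basename that supports -a (multiple arguments) and an xargs
# that supports -0 and -r.
names_batched() {
    find "${1:-.}" -type f -print0 | xargs -0 -r basename -a
}
```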

Quote:

Original:
Code:

user@system $ time find  ./ -type f  | awk -F . '{print $NF}' |  sort --unique |  awk -F / '{print $NF}'
.... file types ....

real        0m0.008s
user        0m0.004s
sys        0m0.000s


For comparison I also put the -printf option in the mix.
Code:

user@system $ time find  ./ -type f  -printf '%f\n' | awk -F . '{print $NF}' |  sort --unique
.... file types ....

real        0m0.007s
user        0m0.004s
sys        0m0.000s


It looks like the winner!

I would hesitate to declare one method faster than the other based on such a tiny difference. In this case I'm pretty sure the printf version is doing less work, but in general I would try to increase the input size until it takes at least a few hundred milliseconds to run.
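
A rough sketch of that approach: fill a scratch directory with many empty files, then time each pipeline against it. The make_test_tree name, the four extensions, and the count in the usage comment are arbitrary choices for illustration; real timings depend on the machine and file system.

```shell
# make_test_tree DIR COUNT: create COUNT empty files for each of four
# extensions in DIR, so the pipelines have something to de-duplicate.
# (Hypothetical helper; names and extensions are arbitrary.)
make_test_tree() {
    dir=$1 count=$2 i=0
    while [ "$i" -lt "$count" ]; do
        for ext in css js php png; do
            : > "$dir/file$i.$ext"
        done
        i=$((i + 1))
    done
}

# Possible use in bash, where the time keyword covers the whole pipeline:
#   d=$(mktemp -d) && make_test_tree "$d" 5000
#   time find "$d" -type f -printf '%f\n' | awk -F . '{print $NF}' | sort -u >/dev/null
```

Raising the count until each run takes a few hundred milliseconds, and repeating each run several times, should give numbers that are easier to compare.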

I had another idea which might make things faster:
Code:

find . -type f -printf '%f\n' | awk -F. '!($NF in exts){print $NF; exts[$NF]=1}'
This avoids sorting the output.
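
The same idea is often written with awk's `!seen[key]++` idiom. A sketch, with one caveat: the output comes out in first-seen order rather than alphabetical order. The array name seen and the function name uniq_ext are arbitrary.

```shell
# uniq_ext: read file names on stdin and print each extension the first time
# it appears. seen[$NF]++ evaluates to 0 (false) on first sight, so
# !seen[$NF]++ is true exactly once per distinct value.
uniq_ext() {
    awk -F . '!seen[$NF]++ {print $NF}'
}

# Prints png, css, README (one per line, in first-seen order).
printf '%s\n' b.png a.css c.png README d.css | uniq_ext
```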


Quote:

When I put them back two of the improved versions failed to find the typeless files.

I wouldn't say failed: the regex filters out filenames lacking a dot. It wasn't really clear from your first post that you wanted to keep those "typeless" files.
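
If the goal is to keep those files but tell them apart from real extensions, one option is to test awk's field count instead of filtering. In this sketch the `(none)` label and the function name ext_or_none are made up; note that a dotfile such as .htaccess still counts as type htaccess, as in the listings above.

```shell
# ext_or_none: read file names on stdin; print the text after the last dot,
# or "(none)" when the name contains no dot at all (NF is 1 in that case),
# then de-duplicate the result.
ext_or_none() {
    awk -F . 'NF > 1 {print $NF; next} {print "(none)"}' | sort -u
}
```

It can be fed from the same find command, for example `find . -type f -printf '%f\n' | ext_or_none`.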

