LinuxQuestions.org
Old 08-15-2012, 10:51 AM   #1
dahweeds
LQ Newbie
 
Registered: Nov 2008
Posts: 24

Rep: Reputation: 0
How to list all file types recursively?


I'd like to print a list of all file types in the current directory and below. I got this far with a little help from Google. I'm posting it because my search did not turn up an instant result; hopefully it will save someone time down the road. Or maybe some better bashers can share a better way.

Code:
user@system $ find ./ -type f | awk -F . '{print $NF}' |  sort --unique
css
gif
htaccess
jpg
js
php
png
sql
swf
txt
However, this breaks on files with no extension: awk prints the rest of the path instead of a type. Example:

Code:
user@system $ touch install/somefile
user@system $ touch README
user@system $ find   ./ -type f   | awk -F . '{print $NF}' |  sort --unique
css
gif
htaccess
/install/somefile
jpg
js
php
png
/README
sql
swf
txt
So it takes another step to strip the directory parts off the list.

Code:
user@system $ find   ./ -type f   | awk -F . '{print $NF}' |  sort --unique |  awk -F / '{print $NF}'
css
gif
htaccess
somefile
jpg
js
php
png
README
sql
swf
txt
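
The path shows up because awk splits each line on every dot. For ./install/somefile the only dot is the leading one in "./", so the last field is everything after it. A minimal sketch of that splitting, using made-up sample paths and a hypothetical helper name (last_field):

```shell
# Print the last dot-separated field of a single path, as the pipeline's awk does.
# last_field is a hypothetical helper; the paths below are illustrative only.
last_field() {
    printf '%s\n' "$1" | awk -F . '{print $NF}'
}

with_ext=$(last_field './images/logo.png')      # fields: "", "/images/logo", "png"
without_ext=$(last_field './install/somefile')  # fields: "", "/install/somefile"

echo "$with_ext"
echo "$without_ext"
```

The second awk in the pipeline then splits on "/" and keeps only the final path component.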

ICYWTK:
I was looking for the types so I could set the permissions recursively too. Example:
Code:
user@system $ find ./ -type f -name '*.png' -exec chmod 644 {} \;
 
Old 08-15-2012, 11:02 AM   #2
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,432

Rep: Reputation: 1878
So you have already solved your problem? Maybe you should post solutions as well in case others would like to know how.
 
Old 08-15-2012, 12:08 PM   #3
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,396

Rep: Reputation: 814
Quote:
Originally Posted by dahweeds
So it takes another step to strip the directory parts off the list.
You could also use find's printf:
Code:
find . -type f -printf '%f\n'  # prints filenames with leading directories removed
Quote:
ICYWTK:
I was looking for the types so I could set the permissions recursively too. Example:
Code:
user@system $ find ./ -type f -name '*.png' -exec chmod 644 {} \;
I don't quite see why. If you want to set permissions for PNGs (or any other specific set of types), you can do that without listing the types first. If you want to set permissions for every type you find, you can just remove the -name condition.
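
As a rough sketch of the first case, several -name tests can be grouped in one find call. The scratch directory and file names below are made up so that nothing real is touched, and "{} +" is used instead of "{} \;" so that chmod receives many file names per call:

```shell
# Hypothetical scratch directory with three sample files, all starting at mode 600.
dir=$(mktemp -d)
touch "$dir/a.png" "$dir/b.jpg" "$dir/c.php"
chmod 600 "$dir"/*

# Escaped parentheses group the two -name tests so -exec applies to either match.
find "$dir" -type f \( -name '*.png' -o -name '*.jpg' \) -exec chmod 644 {} +

# Count files by exact mode to see what changed.
images=$(find "$dir" -type f -perm 644 | wc -l | tr -d ' ')
others=$(find "$dir" -type f -perm 600 | wc -l | tr -d ' ')
echo "$images image files changed, $others left alone"
rm -rf "$dir"
```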
 
Old 08-16-2012, 02:41 AM   #4
shane_kerr
LQ Newbie
 
Registered: Oct 2005
Location: Amsterdam, Netherlands
Posts: 14

Rep: Reputation: 1
You're pretty close.

You can use the "basename" command to remove the path from the files. It only runs on a single file name at a time, but you can get around this using the "xargs" command with "-l". To make this run smoothly, we need to use null-terminated file names, since otherwise xargs will treat spaces in file names as separators.

The following will give you only the file names under a given directory:

$ find . -type f -print0 | xargs -0l basename

The awk command actually has a way to filter based on a regular expression. So, rather than saying:

awk -F . '{print $NF}'

All you have to do is:

awk -F . '/\./{print $NF}'

The /\./ pattern selects only lines that contain a literal '.' character. The dot has to be escaped with a backslash because an unescaped '.' matches any character.

So, putting all this together:

$ find . -type f -print0 | xargs -0l basename | awk -F . '/\./{print $NF}' | sort --unique

What could be simpler?
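
To see what the /\./ filter changes, here is a small sketch that runs both awk programs over the same made-up list of file names. LC_ALL=C is an addition here, used only to make the sort order predictable (uppercase before lowercase):

```shell
# Sample file names are illustrative only.
names='index.php
README
logo.png
archive.tar.gz'

# With the regex filter: names without a dot are skipped entirely.
with_filter=$(printf '%s\n' "$names" | awk -F . '/\./{print $NF}' | LC_ALL=C sort --unique)

# Without the filter: a name with no dot is its own last field, so it is printed whole.
without_filter=$(printf '%s\n' "$names" | awk -F . '{print $NF}' | LC_ALL=C sort --unique)

echo "filtered:";   echo "$with_filter"
echo "unfiltered:"; echo "$without_filter"
```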
 
Old 08-16-2012, 07:27 AM   #5
dahweeds
LQ Newbie
 
Registered: Nov 2008
Posts: 24

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by shane_kerr

$ find . -type f -print0 | xargs -0l basename | awk -F . '/\./{print $NF}' | sort --unique

What could be simpler?
I guess from my noobie perspective, the original was a little simpler, for two reasons.
  1. basename: I did not know about this. Thanks for sharing the tool.
  2. regular expressions: Regular expressions are way beyond my understanding. They come up a lot on the web, though, so I often just cut and paste such things when I find a cure for a problem. I hope some may find this one too in the future.


I timed the original discovery against this idea.
Reg-ex version:

Code:
user@system $ time  find . -type f -print0 | xargs -0l basename | awk -F . '/\./{print $NF}' | sort --unique
.... file types ....

real	0m0.390s
user	0m0.008s
sys	0m0.028s
The execution delay was noticeable. Maybe if there is a huge directory with many, many files, this command could eat resources. (If that matters any more.)

Original:
Code:
user@system $ time find   ./ -type f   | awk -F . '{print $NF}' |  sort --unique |  awk -F / '{print $NF}'
.... file types ....

real	0m0.008s
user	0m0.004s
sys	0m0.000s

For comparison I also put the -printf option in the mix.
Code:
user@system $ time find   ./ -type f  -printf '%f\n' | awk -F . '{print $NF}' |  sort --unique 
.... file types ....

real	0m0.007s
user	0m0.004s
sys	0m0.000s

It looks like the winner!

However, the same -printf should work in the reg-ex version too, which can eliminate the pipe to basename. Let's see how it does.
Code:
user@system $ time  find . -type f  -printf '%f\n' | awk -F . '/\./{print $NF}' | sort --unique
.... file types ....

real	0m0.007s
user	0m0.000s
sys	0m0.004s
Well, after all that, I realized that I had already deleted my typeless files from the directory. When I put them back, two of the improved versions failed to find the typeless files.

example:
Code:
user@system $ touch README
user@system $ touch inc/SOMEFILE
user@system $ time  find . -type f  -printf '%f\n' | awk -F . '/\./{print $NF}' | sort --unique
css
gif
htaccess
jpg
js
patch
php
png
sql
swf
txt

real	0m0.007s
user	0m0.000s
sys	0m0.004s
The original idea and ntubski's -printf option work. So perhaps the winner is:
Code:
user@system $ time find   ./ -type f   -printf '%f\n' | awk -F . '{print $NF}' |  sort --unique 
css
gif
htaccess
jpg
js
patch
php
png
README
SOMEFILE
sql
swf
txt

real	0m0.007s
user	0m0.000s
sys	0m0.004s
It is as fast as the original and finds all the file types, including files with no type.
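
For reuse, the pipeline above could be wrapped in a small shell function. This is only a sketch: list_types is a hypothetical name, the scratch files are made up, -printf requires GNU find, and LC_ALL=C is added so the sort order is predictable (uppercase names sort first, unlike the listing above).

```shell
# list_types: hypothetical helper; prints the unique last dot-separated field
# of every regular file name under the given directory (default: current directory).
list_types() {
    find "${1:-.}" -type f -printf '%f\n' | awk -F . '{print $NF}' | LC_ALL=C sort --unique
}

# Try it on a scratch tree so no real files are involved.
dir=$(mktemp -d)
mkdir "$dir/inc"
touch "$dir/index.php" "$dir/logo.png" "$dir/README" "$dir/inc/SOMEFILE" "$dir/inc/lib.php"
types=$(list_types "$dir")
echo "$types"
rm -rf "$dir"
```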

Last edited by dahweeds; 08-16-2012 at 07:44 AM. Reason: Further testing showed that all adaptations failed to find the typeless file.
 
Old 08-16-2012, 01:54 PM   #6
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,396

Rep: Reputation: 814
Quote:
I timed the original discovery against this idea.
Reg-ex version:

Code:
user@system $ time  find . -type f -print0 | xargs -0l basename | awk -F . '/\./{print $NF}' | sort --unique
.... file types ....

real	0m0.390s
user	0m0.008s
sys	0m0.028s
The execution delay was noticeable. Maybe if there is a huge directory with many, many files, this command could eat resources. (If that matters any more.)
The increased time is likely due to having to call basename once per file.
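
One possible way around the per-file cost, assuming a basename that supports -a (GNU coreutils 8.16 or later does): drop the -l from xargs so that it packs many names into each call, and let basename -a handle them all. The scratch files below are made up.

```shell
# Hypothetical scratch tree, including a name with a space.
dir=$(mktemp -d)
mkdir "$dir/sub"
touch "$dir/a b.txt" "$dir/c.png" "$dir/sub/README"

# xargs without -l batches the null-terminated names; basename -a strips
# the directory part from every argument it is given.
names=$(find "$dir" -type f -print0 | xargs -0 basename -a | LC_ALL=C sort)
echo "$names"
rm -rf "$dir"
```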

Quote:
Original:
Code:
user@system $ time find   ./ -type f   | awk -F . '{print $NF}' |  sort --unique |  awk -F / '{print $NF}'
.... file types ....

real	0m0.008s
user	0m0.004s
sys	0m0.000s

For comparison I also put the -printf option in the mix.
Code:
user@system $ time find   ./ -type f  -printf '%f\n' | awk -F . '{print $NF}' |  sort --unique 
.... file types ....

real	0m0.007s
user	0m0.004s
sys	0m0.000s

It looks like the winner!
I would hesitate to declare one method faster than the other based on such a tiny difference. In this case I'm pretty sure the printf version is doing less work, but in general I would try to increase the input size until it takes at least a few hundred milliseconds to run.
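
A sketch of one way to build a larger input for such a comparison. The file count, names, and extensions are arbitrary choices for illustration, and -printf assumes GNU find:

```shell
# Create 3000 empty files spread over three extensions in a scratch directory.
dir=$(mktemp -d)
for ext in php png txt; do
    seq 1 1000 | sed "s|.*|$dir/file&.$ext|" | xargs touch
done

# Run the pipeline under test against the generated tree.
sorted=$(find "$dir" -type f -printf '%f\n' | awk -F . '{print $NF}' | LC_ALL=C sort --unique)
count=$(find "$dir" -type f | wc -l | tr -d ' ')
echo "$count files, types: $(echo $sorted)"
rm -rf "$dir"
```

Prefixing each candidate pipeline with time, as in the posts above, and raising the file count until the run takes a few hundred milliseconds should give more meaningful numbers.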

I had another idea which might make things faster:
Code:
find . -type f -printf '%f\n' | awk -F. '!($NF in exts){print $NF; exts[$NF]=1}'
This avoids sorting the output.
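
A small sketch of what that awk program does, run on made-up file names: each extension is printed the first time it is seen, so the output keeps first-seen order rather than sorted order. The second form, with a hypothetical array name (seen), is a common shorter spelling of the same idea.

```shell
# The awk program from the post above, applied to sample names.
unsorted=$(printf '%s\n' index.php logo.png admin.php README banner.png |
    awk -F. '!($NF in exts){print $NF; exts[$NF]=1}')
echo "$unsorted"

# Equivalent idiom: the post-increment is 0 (false) only on the first sighting.
short=$(printf '%s\n' index.php logo.png admin.php README banner.png |
    awk -F. '!seen[$NF]++ {print $NF}')
echo "$short"
```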


Quote:
When I put them back, two of the improved versions failed to find the typeless files.
I wouldn't say failed: the regex filters out filenames lacking a dot. It wasn't really clear from your first post that you wanted to keep those "typeless" files.
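
If the goal is to keep the typeless files but not list each one by name, one option is to test the field count instead of using the regex. A sketch on made-up names, where the "(no extension)" label is an arbitrary choice:

```shell
# NF > 1 means the name contains at least one dot; anything else gets a label.
labelled=$(printf '%s\n' index.php README logo.png SOMEFILE |
    awk -F . 'NF > 1 {print $NF; next} {print "(no extension)"}' |
    LC_ALL=C sort --unique)
echo "$labelled"
```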
 
  

