LinuxQuestions.org
02-18-2018, 09:24 AM   #1
Runarsson
Member
 
Registered: Dec 2017
Location: Soderhamn, Sweden
Distribution: Mint Cinnamon and LXLE + VirtualBox bunch
Posts: 35

List directories with specific file types...


I have written a script for a project at work and would like some help from people who know more than I do.
The script converts the *.xls files on a drive to text, searches the text for patterns, and prints the name and directory of each matching file.
My Linux knowledge is limited, so I am sure my way of doing it is not the most efficient.
It starts by making a list of all the directories (ALL) and uses each of them as a variable for (1) searching for the Excel files, (2) creating the text files and finally (3) building the text for the printout. That was the best I could do with the conversion code I had, which only converts files in its own directory.
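A minimal bash sketch of the per-directory workflow described above, to show its shape. The post does not say which converter is used, so `convert_dir_to_text` is a hypothetical placeholder that merely copies each `*.xls` file to a `.txt` file; `search_dirs` is likewise a made-up name. Replace the placeholder body with your real conversion command.

```shell
#!/bin/bash
# Hypothetical sketch: read one directory per line on stdin, convert the
# spreadsheets in each one, then print the converted files containing a pattern.

# Placeholder converter (assumption: the real tool is not named in the post).
# Like the converter described above, it only works in the current directory;
# this stand-in just copies each *.xls file to a *.txt file next to it.
convert_dir_to_text() {
  local f
  for f in *.xls; do
    [ -e "$f" ] || continue              # the glob matched nothing
    cp -- "$f" "${f%.xls}.txt"
  done
}

# search_dirs PATTERN: PATTERN is matched as a fixed string, not a regex.
search_dirs() {
  local pattern=$1 dir txt
  while IFS= read -r dir; do
    [ -n "$dir" ] || continue            # skip blank lines
    # Steps (1) and (2): convert inside the directory, in a subshell so the
    # caller's working directory does not change.
    ( cd -- "$dir" && convert_dir_to_text )
    # Step (3): report every converted file that contains the pattern.
    for txt in "$dir"/*.txt; do
      [ -e "$txt" ] || continue
      if grep -qF -- "$pattern" "$txt"; then
        printf '%s\n' "$txt"
      fi
    done
  done
}

# Example, run from the top of the drive ("invoice" is only a sample pattern):
# find -type d -printf '%d\t%P\n' | sort -r -nk1 | cut -f2- | awk 'NF' | search_dirs invoice
```

Running the converter in a `( cd ... )` subshell mirrors the constraint in the post (a converter that only works in its own directory) without moving the rest of the script around.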

This was the code I found for listing the directories:
find -type d -printf '%d\t%P\n' | sort -r -nk1 | cut -f2- | awk 'NF'

That gives me this form of list:
folderA/subfolderA/subfolderAA
folderB/subfolderB/subfolderBB
folderC/subfolderC/subfolderCC

... which is good enough for the rest of the script.
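The pipeline above can be wrapped in a small function to see what each stage does. This sketch assumes GNU find (for `-printf`); the function name `list_dirs_deepest_first` is hypothetical, and it passes an explicit `.` to find, which GNU find also assumes when no starting point is given.

```shell
#!/bin/bash
# Hypothetical wrapper around the directory-listing pipeline quoted above.
# Needs GNU find for -printf.
list_dirs_deepest_first() {
  ( cd -- "${1:-.}" || exit 1
    # %d = depth below the starting point, %P = path without the leading "./".
    # The starting point itself comes out as depth 0 with an empty path.
    find . -type d -printf '%d\t%P\n' |
      sort -r -nk1 |   # largest depth first, i.e. deepest directories first
      cut -f2- |       # drop the depth column, keep the path
      awk 'NF'         # drop the empty line left by the starting point
  )
}

# Usage: list_dirs_deepest_first /path/to/drive
```

Sorting deepest-first means a subdirectory always comes before its parent; the order among directories at the same depth is not something to rely on.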

What would be much(!) better, though, is the same kind of list containing only the directories that hold Excel files: something like 'find . -name "*.xls*"', but without the file names. That would cut thousands and thousands of directories from the list and make everything much faster.
 
02-18-2018, 10:42 AM   #2
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,230

Would this work?
Code:
find . -name "*.xls*" | sed -r 's:/[^/]+$:/:' | sort -u
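A sketch of what keefaz's command prints, wrapped in hypothetical functions so it can be tried on a throwaway directory. It assumes GNU sed (`-r`; BSD sed uses `-E`). Note that `*.xls*` also matches `.xlsx` and `.xlsm` files. The second function is only one possible way to drop the trailing `/` and the leading `./`, not necessarily the workaround mentioned later in the thread; it also adds `-type f` so that a directory named, say, `old.xls.d` is not matched.

```shell
#!/bin/bash
# Hypothetical wrapper around the command above. sed deletes everything after
# the last "/" (the file name) but keeps the slash, e.g. "./folderA/subfolderA/".
excel_dirs_sed() {
  ( cd -- "${1:-.}" || exit 1
    find . -name "*.xls*" | sed -r 's:/[^/]+$:/:' | sort -u )
}

# One possible variant (an assumption, not the thread's workaround): strip the
# file name together with its slash first, then the leading "./", giving the
# same "folderA/subfolderA" form as the original directory list. Files directly
# in the starting directory come out as ".".
excel_dirs_sed_plain() {
  ( cd -- "${1:-.}" || exit 1
    find . -type f -name "*.xls*" | sed -r 's:/[^/]+$::; s:^\./::' | sort -u )
}
```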
 
02-18-2018, 11:55 AM   #3
Turbocapitalist
Senior Member
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 4,384
Blog Entries: 3

Quote:
Originally Posted by Runarsson
What would be much(!) better though, would be the same kind of list with only the directories that include Excel files... a kind of 'find . -name "*.xls*"', but without the file names. It would shorten the list with thousands and thousands of directories and make everything much faster.
If you want the directories that contain files named '*.xls', you could use the -printf option to print just the directory name and then remove duplicates:

Code:
find . -type f -name '*.xls' -printf '%h\n' | sort -u
However, if you want to invert that (list the directories that do not contain such files), it is much harder to do. Apparently the founders of find never considered that use case, so you have to fiddle a little. If you are using bash or zsh, you can use process substitution:

Code:
comm -3 <(find . -type f -name '*.xls' -printf '%h\n' | sort -u ) <(find . -type d -print | sort -u )
See "man comm" and especially "man find"
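A sketch of the inverted search, wrapped in a hypothetical function `dirs_without_xls`. One detail worth knowing: `comm -3` prints lines that occur only in the second input in its second column, i.e. indented by a tab. Every directory in the first list also appears in the second, so `comm -13` lists the same directories without the tab. It requires bash or zsh (process substitution) and GNU find, and it lists a directory whenever no `*.xls` file sits directly inside it, even if one of its subdirectories has some.

```shell
#!/bin/bash
# Hypothetical wrapper: directories that do NOT directly contain a *.xls file.
dirs_without_xls() {
  ( cd -- "${1:-.}" || exit 1
    # comm needs both inputs sorted the same way; sort -u takes care of that.
    # -1 hides lines only in the first list, -3 hides lines in both lists,
    # leaving the directories that appear only in the second list.
    comm -13 <(find . -type f -name '*.xls' -printf '%h\n' | sort -u) \
             <(find . -type d -print | sort -u) )
}
```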

Neither of those actually finds or excludes "Excel files"; they only check the file name. The "file" utility checks the actual contents.
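A sketch of a content-based check with `file --mime-type`, as suggested above. The function name is hypothetical, and so is the MIME-type pattern: the strings file(1) reports depend on the libmagic version (older `.xls` files are sometimes reported as a generic OLE/compound-document type rather than `application/vnd.ms-excel`), so check the output for a few of your own files before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: directories holding files whose *contents* file(1)
# reports as a spreadsheet, whatever the files are called.
# Assumption: the MIME types matched by the regex below.
excel_dirs_by_content() {
  # file prints "path: type", padding with spaces after the colon when it is
  # given several files. Paths that themselves contain ": " would confuse
  # this simple field split.
  find "${1:-.}" -type f -exec file --mime-type {} + |
    awk -F': +' '$NF ~ /ms-excel|spreadsheetml/ {
      sub(/\/[^\/]*$/, "", $1)
      print $1
    }' |
    sort -u
}
```

The `sub()` call removes the last path element (the file name), so only the directory is printed, and `sort -u` removes duplicates as in the replies above.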
 
02-18-2018, 01:18 PM   #4
Runarsson
Original Poster
Quote:
Originally Posted by keefaz
Would it work?
Code:
find . -name "*.xls*" | sed -r 's:/[^/]+$:/:' | sort -u
PERFECT!!! Thank you SO much for that! After a small workaround (which I managed to figure out myself), it became exactly what I wanted.

The old list of 10,097 directories shrank to a list of 73 directories. It cut the total runtime of the script from 34 to 5.5 minutes(!!!)... and made a grown man cry on his sofa. If you were here, I would hug you.
 
02-18-2018, 01:21 PM   #5
Runarsson
Original Poster
Quote:
Originally Posted by Turbocapitalist
If you want the directories that contain files named '*.xls' then you could use the -printf option to just show the directory name and then remove duplicates:

Code:
find . -type f -name '*.xls' -printf '%h\n' | sort -u
Thank you. I'm going to try this out too (even though I have already solved it). You can't learn too much, especially if you don't know much to start with.
 
02-18-2018, 01:42 PM   #6
Runarsson
Original Poster
Quote:
Originally Posted by Turbocapitalist
If you want the directories that contain files named '*.xls' then you could use the -printf option to just show the directory name and then remove duplicates:

Code:
find . -type f -name '*.xls' -printf '%h\n' | sort -u
That was actually better. Since the trailing '/' is never printed in the first place, there was no need for that workaround.
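For reference, a sketch of the `%h` version that also picks up `.xlsx` files; the function name and the choice of extensions are assumptions. `%h` prints the directory part of each match without a trailing `/`, and prints `.` for files directly in the starting directory. Requires GNU find.

```shell
#!/bin/bash
# Hypothetical wrapper around the -printf '%h' command quoted above.
excel_dirs() {
  ( cd -- "${1:-.}" || exit 1
    # The \( ... \) group makes -type f apply to both name patterns.
    find . -type f \( -name '*.xls' -o -name '*.xlsx' \) -printf '%h\n' |
      sort -u )
}
```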
 
  


All times are GMT -5.