LinuxQuestions.org
11-18-2011, 12:58 AM   #1
suhasingale
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Rep: Reputation: 0
How to list all ascii/text files in a folder recursively?


Hi Geeks,

I want to create a list of ascii/text/xml files under one folder, recursively. I know I can use the find command to list all the files and the file command to check each file's type, but I am not sure how to combine the two to list only the text/ascii/xml files, excluding data, binary and other types.

Please guide me through.

Regards,
Suhas
 
11-18-2011, 01:50 AM   #2
davemguru
Member
 
Registered: Apr 2006
Location: London
Distribution: Pclos,Debian,Puppy,Fedora
Posts: 87

Rep: Reputation: 40
Quote:
Originally Posted by suhasingale
Hi Geeks,

I want to create a list of ascii/text/xml files under one folder, recursively. I know I can use the find command to list all the files and the file command to check each file's type, but I am not sure how to combine the two to list only the text/ascii/xml files, excluding data, binary and other types.

Please guide me through.

Regards,
Suhas
A simple search covers all files in the current directory, with NO recursion:
Code:
file * | grep text | grep -v OpenDocument
This prints to STDOUT every file that file describes as some kind of text: ascii, xml, html, shell scripts, ascii English text (".csv" files, for example).
The "-v" option tells grep to drop the matching lines ("not these"). So if, for example, you don't want the English text or the UTF-8 files, you could add more pipes with further exclusions.
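One way the chained exclusions described above might look, as a rough sketch only. The function name list_text_here is made up for illustration, and the result relies on file's default "name: description" output, so a file name containing a colon or the word "text" would confuse it.

```shell
# list_text_here is a hypothetical helper name, not something from this thread.
# It prints the names of text-like files in one directory, with no recursion.
list_text_here() (
    cd "${1:-.}" || exit 1
    # file prints "name: description" for each entry; keep the lines that
    # mention "text", then drop OpenDocument and UTF-8 matches with a single
    # grep -v that takes several -e patterns.
    file -- * | grep text | grep -v -e OpenDocument -e UTF-8 | cut -d: -f1
)
```

Called as list_text_here /some/dir, or with no argument for the current directory.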

You know that "find" will recurse from wherever you tell it. So you could use find to create a temporary list (in /tmp, for example) and then iterate through that list in a simple shell script:
Code:
# /tmp/whatever holds one pathname per line, e.g. from: find . -type f >/tmp/whatever
while IFS= read -r pathname
do
    # same filter as the one-liner; extend the pipeline with more exclusions as needed
    file "$pathname" | grep text | grep -v OpenDocument
done </tmp/whatever
The script could then be given a name, e.g. homework.sh, and you could run it with its standard output redirected to a file, which would then contain your results.
Code:
 ./homework.sh >results
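Putting the pieces above together, homework.sh could look roughly like the sketch below. It is only one possible reading of the outline: the function name list_text_recursive and the use of mktemp for the temporary list are my assumptions, and it still breaks on file names that contain a newline.

```shell
# list_text_recursive is a hypothetical name; mktemp stands in for the
# fixed /tmp/whatever list file from the outline above.
list_text_recursive() (
    list=$(mktemp) || exit 1
    # Step 1: let find recurse and write one pathname per line.
    find "${1:-.}" -type f > "$list"
    # Step 2: test each pathname with file.
    while IFS= read -r pathname; do
        # -b prints only the description, so the file name itself
        # cannot cause a false match.
        desc=$(file -b -- "$pathname")
        case $desc in
            *OpenDocument*) ;;                    # excluded, as in the one-liner
            *text*) printf '%s\n' "$pathname" ;;  # any "... text" description
        esac
    done < "$list"
    rm -f -- "$list"
)
```

With a final line such as list_text_recursive "$1" added to the script, it would be run as ./homework.sh /some/dir >results.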
BTW, some people view "Geek" as a derogatory term. Perhaps it would be better to open your requests with "Hi helpful people". Or, if you felt that too presumptuous, you could put "possibly" in parentheses before the word "helpful".
dave

Last edited by davemguru; 11-18-2011 at 01:53 AM.
 
11-18-2011, 05:00 AM   #3
suhasingale
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Original Poster
Rep: Reputation: 0
Thanks for the solutions davemguru.

I wrote a simple script myself as below:

Quote:
#!/bin/ksh

find <folder/path> -type f > /tmp/1
temp="/tmp/1"
echo > /tmp/2
for i in `cat $temp`
do
temp1=`file $i | egrep -i "text|xml"`
if [[ $temp1 ]]
then
echo $i >> /tmp/2
fi
done
 
11-18-2011, 01:28 PM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1949
(The links I'm posting are for bash, but most of what they say applies to ksh too.)


1) Please use [code][/code] tags around your code, to preserve formatting and improve readability. Do NOT use quote tags, as they don't protect whitespace.

2) Don't read lines with for: the shell splits the output of cat on whitespace, so any file name containing a space gets broken into pieces. Also, cat is usually unnecessary for reading from single files in modern shells.

3) $(..) is highly recommended over `..`

4) file is, in my experience, notoriously unreliable when it comes to detecting text files. It often mistakenly detects simple text as being, say, a lisp script. It has something to do with the way it parses the beginning of the file to determine its "magic" type. At the very least, try using the -i option and parse the mime-type info instead.

5) Instead of using grep, try generating a simple list of file names, then loop through the list, testing the output of file for each one directly. Perhaps something like this:

Code:
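# filelist holds one file name per line
while IFS= read -r file ; do
	# -b leaves the file name out of the output, so only the mime-type is tested
	if [[ $( file -b -i "$file" ) =~ plain ]]; then
		echo "$file is a plain text file"
	fi
done <filelist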
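For what it's worth, points 2 to 5 can also be combined without any list file. The sketch below is a different technique from the loop above: find hands the names straight to a small inline sh script, which sidesteps the word-splitting problem entirely, and the test is made on the MIME type instead of the free-form description. The function name list_by_mime and the choice of accepted types (text/* plus application/xml) are my assumptions, and --mime-type is the long option of the file command shipped on most Linux systems, so it may not exist elsewhere.

```shell
# list_by_mime is a hypothetical name; the accepted MIME types are a guess
# at what "text/ascii/xml" should cover.
list_by_mime() {
    # find passes batches of pathnames as arguments to the inline script,
    # so names with spaces or newlines arrive intact.
    find "${1:-.}" -type f -exec sh -c '
        for path do
            # -b omits the file name; --mime-type prints e.g. text/plain
            mime=$(file -b --mime-type -- "$path")
            case $mime in
                text/*|application/xml) printf "%s\n" "$path" ;;
            esac
        done' sh {} +
}
```

Something like list_by_mime /some/dir >results would then write the matching pathnames to a file.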
P.S. I don't think you'll find many people here who see "geek" as being derogatory. Indeed, "geek pride" is on the rise, as subjects like gaming and comics that were once nerdy are now mainstream, and it's becoming clearer to everyone that those who really know the tech are the ones who hold the reins.

Last edited by David the H.; 11-18-2011 at 01:40 PM. Reason: addendum
 
  


All times are GMT -5.
