LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 12-04-2008, 10:14 AM   #1
ChartmanSg
LQ Newbie
 
Registered: May 2008
Posts: 5

Rep: Reputation: 0
Script needs to choose plain text documents


Hi there,

I've written a bash script to process plain text documents.

Unfortunately, the files that I get daily include other formats like swf, PNG, JPEG, HTML, XML, unknown, C Source Code, etc.

What script do I need to select just the plain text documents and ignore the rest?

Would appreciate any help from the experts.

James
 
Old 12-04-2008, 10:20 AM   #2
indienick
Senior Member
 
Registered: Dec 2005
Location: London, ON, Canada
Distribution: Arch, Ubuntu, Slackware, OpenBSD, FreeBSD
Posts: 1,853

Rep: Reputation: 65
You can use the "file" command to get details about a file - then, using the output from that, move the plain-text files into a sandbox directory where your script can process them.
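A minimal sketch of that idea, assuming a "file" that supports the -b (brief) option. The function name collect_plain_text and the directory names are made up for illustration, and copying instead of moving is my own choice so the originals stay untouched.

```shell
#!/bin/sh
# Copy every regular file in "$1" that file(1) describes as plain ASCII text
# into "$2" (the "sandbox"). collect_plain_text is a hypothetical name.
collect_plain_text() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue      # skip directories and an unmatched glob
        desc=$(file -b "$f")         # -b: description only, without the file name
        case $desc in
            # also accepts e.g. "ASCII text, with CRLF line terminators"
            "ASCII text"*) cp "$f" "$dest"/ ;;
        esac
    done
}
```

Something like "collect_plain_text incoming sandbox" would then leave only the plain-text files in sandbox/ for your script to work on.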
 
Old 12-04-2008, 10:21 AM   #3
zer0x333
Member
 
Registered: Oct 2007
Posts: 31

Rep: Reputation: 16
Could you use the 'file' command to identify the filetype and act accordingly?
 
Old 12-04-2008, 10:25 AM   #4
ChartmanSg
LQ Newbie
 
Registered: May 2008
Posts: 5

Original Poster
Rep: Reputation: 0
Thanks for the replies guys.

I've identified the files that I need as plain text documents.

But I don't know and can't seem to find the file extensions for plain text documents.

A normal text file would be file.txt

But what extension does a plain text document have?

James
 
Old 12-04-2008, 10:30 AM   #5
zer0x333
Member
 
Registered: Oct 2007
Posts: 31

Rep: Reputation: 16
I would ignore the file extensions if at all possible, and use the output from 'file'.

Plain ASCII text files are described as 'ASCII text'.
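The wording of the description can differ between versions of "file", so here is a sketch of an alternative check that compares the MIME type instead. It assumes your "file" supports --mime-type; is_plain_text is just a name I picked. Source code and markup typically get their own types (text/x-c, text/html and so on), so they should not pass this test.

```shell
#!/bin/sh
# Succeeds (exit status 0) if file(1) reports the MIME type text/plain
# for "$1". is_plain_text is a hypothetical helper name.
is_plain_text() {
    [ "$(file -b --mime-type "$1")" = "text/plain" ]
}
```

You could then write "is_plain_text somefile && your-script somefile" in your own script.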
 
Old 12-04-2008, 10:33 AM   #6
indienick
Senior Member
 
Registered: Dec 2005
Location: London, ON, Canada
Distribution: Arch, Ubuntu, Slackware, OpenBSD, FreeBSD
Posts: 1,853

Rep: Reputation: 65
Please bear in mind that file extensions are really only used for organization, and are a left-over from the days of DOS.

A plain-text document, under the DOS ideal, would also have a ".txt" extension - BUT - not everyone likes to use extensions on their files (especially plain-text). So, use the "file" command on a file you know is plain-text, then run it on a second file that is something completely different (C source code, or something). Note the differences in output between the two files, and construct an algorithm something like this:
Code:
1. Use the "file" command on file X; pipe the output to "grep -i" with the text you noted for plain-text files (e.g. "ascii text").
2. If grep returns a result, rename the file to have a ".txt" extension.
3. If grep does not return a result, leave the file alone. Either way, move on to the next file in the glob and go to Step 1.
So, perhaps something like:
Code:
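# -b drops the file name from the output, -q keeps grep quiet, ^ skips "C source, ASCII text" and the like
$ file -b fileX | grep -qi "^ascii text" && mv fileX fileX.txt && plain-text-thingamajig-script fileX.txt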
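The three steps can be sketched as a loop over several files. This is only a rough outline under my own assumptions: process_text is a placeholder for your real script, process_plain_text_files is a made-up name, and the pattern is anchored to the start of the description so that something like "C source, ASCII text" is skipped.

```shell
#!/bin/sh
# Placeholder for your own processing script (hypothetical).
process_text() {
    printf 'processing %s\n' "$1"
}

# For each argument: check it with file(1), rename plain-text files
# to *.txt if they are not already named that way, then process them.
process_plain_text_files() {
    for f in "$@"; do
        [ -f "$f" ] || continue                   # regular files only
        if file -b "$f" | grep -q '^ASCII text'; then
            case $f in
                *.txt) target=$f ;;               # already has the extension
                *)     target=$f.txt
                       mv "$f" "$target" || continue ;;
            esac
            process_text "$target"
        fi
    done
}
```

Calling it as "process_plain_text_files incoming/*" would cover a whole directory; files that are not plain text are left alone.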
 
Old 12-04-2008, 10:38 AM   #7
ChartmanSg
LQ Newbie
 
Registered: May 2008
Posts: 5

Original Poster
Rep: Reputation: 0
Thank you, indienick and zer0x333 !!!

I finally see light at the end of my tunnel.

Now I know what to do

You're both godsends.

James
 
Old 12-04-2008, 10:39 AM   #8
indienick
Senior Member
 
Registered: Dec 2005
Location: London, ON, Canada
Distribution: Arch, Ubuntu, Slackware, OpenBSD, FreeBSD
Posts: 1,853

Rep: Reputation: 65
*blushes* Well, now...I wouldn't go that far, but you're very welcome!
 
Old 12-04-2008, 10:44 AM   #9
zer0x333
Member
 
Registered: Oct 2007
Posts: 31

Rep: Reputation: 16
No problem, not as helpful as indienick! Slack xD

Good Luck!
 
  


All times are GMT -5.
