LinuxQuestions.org
Old 03-15-2005, 12:56 PM   #1
tifu
LQ Newbie
 
Registered: Mar 2005
Location: markham, ontario
Posts: 6

Rep: Reputation: 0
Formatting a file with awk


Using KSH (and awk), can I get some help with reading file01 with the content below

================================
This is a sample text file
that contains letters and numbers,
some in UPPERCASE and some in lowercase.

-list 1
list 234
some words are not ForMATed correctly
================================


... and outputting it to a new file (file02) in the format below (each word on a new line, formatted as one column, with numeric characters removed)? If possible, the words should be in lowercase only, but that's not critical.


This
is
a
sample
text
file
that
contains
letters
and
numbers
some
in
UPPERCASE
and
some
in
lowercase
-list
list
some
words
are
not
ForMATed
correctly




Thanks

Tifu
 
Old 03-15-2005, 01:33 PM   #2
whittycat
Member
 
Registered: Nov 2004
Location: Wellington Somerset UK
Distribution: Debian lenny, also DSL-N
Posts: 32

Rep: Reputation: 15
Forget awk. Sed will do this quite nicely with a little help from tr.

Put your text in a file, say qug, then

tr ' ' '\n' <qug | sed -e 's/[.,]//g;s/[0-9]//g;/^$/d' >qug1

If that isn't exactly right it's got to be close.

Tony Sumner
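
Since the original question asked about awk, here is a minimal sketch of the same idea in awk alone, with lowercasing added. The function name split_words is made up for this example and is not from the thread; it assumes a POSIX awk that provides gsub() and tolower().

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Reads text on stdin and prints one lowercase word per line,
# with digits, periods and commas removed.
split_words() {
    awk '{
        gsub(/[0-9.,]/, "")          # strip digits, periods and commas from the line
        for (i = 1; i <= NF; i++)    # changing $0 re-splits it into fields
            print tolower($i)        # one lowercase word per output line
    }'
}

# Sample run on a few lines from the question
printf '%s\n' 'This is a sample text file' '-list 1' 'list 234' | split_words
```

Blank lines and lines that contained only digits produce no output, because they have no fields left. To process the files from the question it could be called as: split_words <file01 >file02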
 
Old 03-15-2005, 01:47 PM   #3
tifu
LQ Newbie
 
Registered: Mar 2005
Location: markham, ontario
Posts: 6

Original Poster
Rep: Reputation: 0
Thank you

Tony,

I appreciate the help. That's exactly what I needed.

Tifu
 
Old 03-15-2005, 03:07 PM   #4
six6
Member
 
Registered: Jun 2004
Location: In Adamantine Chains and penal Fire
Distribution: Debian Sarge & Ubuntu Breezy
Posts: 107

Rep: Reputation: 16
And if you needed an operating system independent method:

perl -e 'while (<>) {tr/A-Za-z\ \n-//cd; s/^\s+//; s/\s+/\n/g; s/(\w)/\L$1/g; print;}' < file1 > file2

Notice that does uppercase->lowercase too.

Last edited by six6; 03-15-2005 at 05:54 PM.
 
Old 03-16-2005, 11:12 PM   #5
tifu
LQ Newbie
 
Registered: Mar 2005
Location: markham, ontario
Posts: 6

Original Poster
Rep: Reputation: 0
six6,

Wonderful! As an extra bonus (apart from changing the case for all to lowercase) the perl one-liner also removes all non-alpha characters like # $ % & etc.

What I intended (and did not say at the beginning) was to strip everything but a-z characters, while keeping the dash [ - ] as in "e-mail" and the apostrophe [ ' ] as in "ain't".

Could the perl solution you provided be modified easily to do that?

Thanks again

Tifu
 
Old 03-16-2005, 11:35 PM   #6
six6
Member
 
Registered: Jun 2004
Location: In Adamantine Chains and penal Fire
Distribution: Debian Sarge & Ubuntu Breezy
Posts: 107

Rep: Reputation: 16
@tifu

Unfortunately, this is awkward as a "one-liner" from the shell, because of a shell limitation: it won't allow a single quote (apostrophe) inside a single-quoted string, even if preceded by a backslash. You would have to close the string, add an escaped apostrophe, and reopen it.

But you can easily overcome this. Just turn the "one-liner" into its own Perl script!
Code:
#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '<', 'file1' or die "Couldn't open file1 because: $!\n";
while (<$fh>) {
 tr/A-Za-z \n\-'//cd; # Deletes every character not listed (keeps letters, space, newline, - and ')
 s/^\s+//; # Deletes leading whitespace
 s/\s+/\n/g; # Replaces each run of whitespace with a single newline
 $_ = lc; # Changes letters to lowercase
 print;
}
close $fh;
And run it from your shell.

If you want to allow other things, like @ symbols for example, just change the line
tr/A-Za-z \n\-'//cd;
to
tr/A-Za-z \n\-'\@//cd;

Get the idea? Behold the power of perl!

Edit: Oops, should have put a "\" in front of the "-" originally...
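
On the single-quote limitation mentioned above: a single-quoted shell string can be closed, followed by an escaped apostrophe, and reopened, which is written '\''. A minimal sketch of the dash-and-apostrophe requirement using that idiom with tr and sed follows. The function name keep_words is invented for this example, and the pipeline assumes plain ASCII text.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Reads text on stdin and prints one lowercase word per line,
# keeping only letters, dashes and apostrophes.
keep_words() {
    tr 'A-Z' 'a-z' |          # lowercase everything first
    tr -cd 'a-z \n'\''-' |    # '\'' ends the quote, adds a literal ', reopens the quote
    tr -s ' ' '\n' |          # spaces become newlines; repeated newlines are squeezed
    sed '/^$/d'               # drop a leftover empty line, if any
}

# Sample run
printf '%s\n' "Ain't e-mail #1 great?" | keep_words
```

The set passed to the second tr is a-z, space, newline, apostrophe and a trailing dash; putting the dash last keeps it from being read as a range.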

Last edited by six6; 03-16-2005 at 11:45 PM.
 
  

