Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's, this is the place!
01-18-2005, 12:25 PM
|
#1
|
Member
Registered: Sep 2004
Distribution: Debian, kernel 2.6.10
Posts: 50
|
mawk
Hi, stupid question...
How can I get a list of the words in a text file using awk or something? I need to put them into a database and do some operations on them... Not lines, but words. Thx.
J.
|
|
|
01-18-2005, 12:51 PM
|
#2
|
Member
Registered: Sep 2003
Posts: 52
|
grep might be more appropriate. Try looking up a grep howto.
|
|
|
01-18-2005, 02:00 PM
|
#3
|
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
|
strings <file> | uniq
Cheers,
Tink
|
|
|
01-19-2005, 06:58 AM
|
#4
|
Member
Registered: Sep 2004
Distribution: Debian, kernel 2.6.10
Posts: 50
Original Poster
|
This is what I was looking for:
Code:
awk 'BEGIN { FS="[SEPARATORS]" } { for (i = 1; i <= NF; i++) print $i }' file_name
J.
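A minimal sketch of the same idea with a concrete separator set filled in for the [SEPARATORS] placeholder. The choice of separators is an assumption: any run of characters other than letters, digits and hyphens splits words, so "don't" would come out as "don" and "t". The helper name split_words is hypothetical. Empty fields (which appear when a line starts or ends with a separator) are skipped.

```shell
# split_words: print one word per line from the given files (or stdin).
# Assumption: a "word" is a run of ASCII letters, digits and hyphens.
split_words() {
    awk 'BEGIN { FS = "[^A-Za-z0-9-]+" }
         { for (i = 1; i <= NF; i++) if ($i != "") print $i }' "$@"
}

# Usage: hyphenated words such as file-system stay in one piece.
printf 'Hello, world! The file-system is fine.\n' | split_words
```

Called as `split_words file_name`, it reads the file directly, so no `cat` is needed.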
|
|
|
01-19-2005, 07:05 AM
|
#5
|
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
|
You could split the lines up into words using 'tr' or 'sed'.
sed 's/ /\n/g' sourcefile | sort -bf | uniq -i >wordlist
tr ' ' '\n' <sourcefile | sort -bf | uniq >wordlist
Then you might want to add a filter that removes entries containing numbers or special characters from the word list:
sed 's/ /\n/g' sourcefile | sed '/[0-9<>+_]/d' | sort | uniq >wordlist
When I tested my first attempt, some lines weren't unique. Looking in the man page, I found that uniq only collapses successive identical lines, hence I added the sort filter to ensure all identical words would be adjacent.
You may also want to use a sed script instead, in order to handle special cases as they occur.
One thing to consider is capitalization. Do you wish to reduce all words to lowercase? If you did that, proper nouns would be incorrect. Also, words split with a hyphen could be joined by the sed script, but some words should stay hyphenated, like 'file-system'.
Last edited by jschiwal; 01-19-2005 at 07:07 AM.
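A small demonstration of the uniq behaviour described above, using a made-up sample sentence. It only illustrates why the sort step is needed.

```shell
# uniq by itself only collapses adjacent duplicates, so "the" survives twice.
printf 'the cat saw the dog\n' | tr ' ' '\n' | uniq

# Sorting first makes identical words adjacent; uniq can then merge them.
printf 'the cat saw the dog\n' | tr ' ' '\n' | sort | uniq

# sort -u does both steps in one command.
printf 'the cat saw the dog\n' | tr ' ' '\n' | sort -u
```

The first pipeline prints five lines, the other two print four.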
|
|
|
01-20-2005, 10:42 AM
|
#6
|
Member
Registered: Sep 2004
Distribution: Debian, kernel 2.6.10
Posts: 50
Original Poster
|
hmm
Hmm, my first question was not correct. I didn't need unique words, but words and their counts in a single file. Here is what I was thinking of:
use awk to create a file containing a single word on each line, and use C code to put the words into a database (they have to go into the database anyway), then run a SELECT with a GROUP BY clause. It works, but any other ideas are welcome.
J.
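If the counts are all that is needed, one possible alternative to the database round trip is `uniq -c`, which prefixes each distinct line with its number of occurrences, much like GROUP BY with COUNT(*). This is only a sketch: the helper name word_counts is hypothetical, and it assumes words are separated by spaces, tabs or newlines.

```shell
# word_counts: read text on stdin, print "count word" lines, most frequent first.
# Assumption: words are separated by spaces, tabs or newlines only.
word_counts() {
    tr -s ' \t' '\n\n' |   # one word per line, runs of separators squeezed
        sed '/^$/d' |      # drop the empty line a leading separator can leave
        sort |             # make identical words adjacent
        uniq -c |          # count each run of identical words
        sort -rn           # highest count first
}

# Usage with a made-up sentence.
printf 'the cat saw the dog the\n' | word_counts
```

The result could still be loaded into the database afterwards, one row per word with its count.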
|
|
|
01-20-2005, 01:48 PM
|
#7
|
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
|
Did you see the example in the awk manual?
Code:
#!/usr/bin/awk -f
# Print list of word frequencies
{
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}
Cheers,
Tink
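A hedged variation on the script above that also folds case and strips punctuation, so that "The", "the" and "THE," are counted as one word. Lowercasing everything is a design choice that loses proper nouns, as noted earlier in the thread. The helper name word_freq and the rule for what counts as punctuation (anything other than a-z, 0-9 and hyphen) are assumptions made for illustration.

```shell
# word_freq: print "word<TAB>count" for the given files (or stdin), sorted by word.
# Assumption: everything except a-z, 0-9 and "-" is treated as punctuation.
word_freq() {
    awk '{
        for (i = 1; i <= NF; i++) {
            word = tolower($i)
            gsub(/[^a-z0-9-]/, "", word)   # strip punctuation
            if (word != "")
                freq[word]++
        }
    }
    END {
        for (word in freq)
            printf "%s\t%d\n", word, freq[word]
    }' "$@" | sort
}

# Usage with a made-up sentence.
printf 'The cat. the Dog, THE end\n' | word_freq
```

The final `sort` is there because awk's `for (word in freq)` visits the array in no particular order.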
|
|
|
01-21-2005, 06:03 AM
|
#8
|
Member
Registered: Sep 2004
Distribution: Debian, kernel 2.6.10
Posts: 50
Original Poster
|
bingo!
I should get better glasses :-)
thx
J.
|
|
|
All times are GMT -5.