Linux - Newbie
This Linux forum is for members that are new to Linux.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
Then you might want to include a filter that removes word-list entries containing numbers or special characters:
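sed -e 's/ /\n/g' | sed -e '/[0-9<>+_]/d' | sort | uniq >wordlist
(The delete runs in a second sed so that it is applied to each word after the split. Inside a single sed it would test the whole input line and drop all of it.)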
When I tested my first attempt, some lines weren't unique. Looking in the man page, I found that uniq only collapses successive identical lines, hence I added the sort filter to ensure all identical words end up next to each other.
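A quick way to see this for yourself (a minimal sketch; the `sample` helper and its word list are made up for the demonstration):

```shell
# sample: a tiny invented word list with a non-adjacent duplicate
sample() {
    printf 'apple\npear\napple\napple\n'
}

# uniq alone only merges neighbours, so the first "apple" survives separately
sample | uniq

# sorting first makes all duplicates adjacent, so uniq can collapse them
sample | sort | uniq

# sort -u does the same in one step
sample | sort -u
```

The first command prints three lines (apple, pear, apple); the other two print two lines (apple, pear).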
You may also want to use a sed script instead, in order to handle special cases as they occur.
One thing to consider is capitalization. Do you wish to reduce all words to lowercase? If you did that, proper nouns would be incorrect. Also, words split with a hyphen could be joined by the sed script, but some words should stay hyphenated, like 'file-system'.
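To make the trade-off concrete, here is a rough sketch of both normalisations. The function names `join_hyphens` and `lowercase` are my own, the sample text is invented, and the sed loop is only one possible way to rejoin words broken at a line end:

```shell
# join_hyphens: when a line ends in "-", pull in the next line and
# drop the hyphen plus the line break, then check the joined line again
join_hyphens() {
    sed -e ':a' -e '/-$/{' -e 'N' -e 's/-\n//' -e 'ba' -e '}'
}

# lowercase: fold every upper-case letter to lower case
lowercase() {
    tr '[:upper:]' '[:lower:]'
}

printf 'The file-\nsystem is The\n' | join_hyphens | lowercase
# prints: the filesystem is the
```

Note that 'file-system' comes out as 'filesystem', which is exactly the case where blindly joining loses a hyphen that should have been kept.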
Hmm, my first question was not quite right. I don't need the unique words, but each word together with its count, in a single file. This is what I was thinking of:
Use awk to create a file containing a single word on every line, then use some C code to put the words into a database (they have to go into a database anyway), and finally run a select with the "group by" option. It works, but any other ideas are welcome.
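For comparison, the counting step can also be sketched in shell alone, before anything goes into the database. This is only an illustration under my own assumptions: the function names `one_word_per_line` and `word_counts` are invented, words are taken to be whitespace-separated, and `uniq -c` stands in for the "group by" with a count:

```shell
# one_word_per_line: print every whitespace-separated field on its own line
one_word_per_line() {
    awk '{ for (i = 1; i <= NF; i++) print $i }'
}

# word_counts: emit "word count" pairs, one per distinct word;
# sort groups identical words, uniq -c counts each group
word_counts() {
    one_word_per_line | sort | uniq -c | awk '{ print $2, $1 }'
}

printf 'the cat saw the dog\nthe end\n' | word_counts
# prints:
# cat 1
# dog 1
# end 1
# saw 1
# the 3
```

Redirecting the output of `word_counts` to a file gives the words and their counts in a single file.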