LinuxQuestions.org - Taking in a text file and analyzing it

Acidg3rm5

11-04-2008 11:11 AM

Taking in a text file and analyzing it

Hi. I would like to write a shell program in bash that takes a given text file and analyzes it to produce the frequency of each character in the file. I also want to report the frequency of 1-letter words, 2-letter words, and so on up to 4-letter words.
I'm very new to Linux, and this is my first program. Can anyone please help me with it? I would love to post an attempt here so that someone could correct me, but I just have no idea how to start.
An example of the output would be something like this:
Character used          Number of occurrences
a                       2
b                       4
------------------------------------------

Length of words used    Number of occurrences
1-letter words          1
2-letter words          3
3-letter words          2
4-letter words          8
Many thanks.

TB0ne

11-04-2008 11:34 AM

Quote:

Originally Posted by Acidg3rm5 (Post 3331446)

This sounds very much like homework...

These bash scripting guides should get you started.
http://tldp.org/LDP/abs/html/
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html

nishamathew1980

11-04-2008 02:03 PM

A simple Google search will even get you the actual script you need. Have fun "googling"
:)

Acidg3rm5

11-05-2008 12:10 PM

I was able to come up with something after some research. However, I don't know how to report the frequency of 1-letter words up to 4-letter words in the code, since I have set the field separator to "".

Another problem I'm hitting is that the program also counts and outputs the whitespace it encounters. Is there a way to make the program ignore whitespace, or at least not print it? Here's an example of my current output:

echo i am testing | bash words.sh
Character used Number of Occurrence
(space)          2   <<< Can I get rid of this? Can I tell the program not to print the count for whitespace?
a 1
e 1
g 1
i 2
m 1
n 1
s 1
t 2

Quote:

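nawk '
BEGIN {
    FS = ""    # empty FS: each character on the line becomes its own field
    print "Character used\t Number of Occurrence"
}

{
    # tally every character (field) on the current line
    for (i = 1; i <= NF; i++)
        count[$i]++
}

END {
    # one row per distinct character; for-in order is unspecified
    for (i in count)
        print i "\t\t", count[i]
}'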
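One way to handle the whitespace question is to skip blank characters before counting them. The following is a minimal sketch, not the poster's script: `char_freq` is a hypothetical function name, it walks each line with `substr()` instead of relying on `FS=""` (POSIX leaves an empty FS unspecified, although nawk, gawk and mawk split per character), it treats only spaces, tabs and carriage returns as whitespace, and it pipes the rows through `sort` because `for (c in count)` visits keys in no particular order.

```shell
#!/usr/bin/env bash
# char_freq (hypothetical name): count how often each non-whitespace
# character appears in the named files, or in standard input if none.
char_freq() {
    printf 'Character used\tNumber of occurrences\n'
    awk '
    {
        # Walk the line one character at a time with substr(),
        # which works in any POSIX awk.
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            # assumption: spaces, tabs and stray CRs are the only whitespace
            if (c == " " || c == "\t" || c == "\r")
                continue
            count[c]++
        }
    }
    END {
        for (c in count)
            print c "\t" count[c]
    }' "$@" | sort    # for-in order is unspecified, so sort the rows
}
```

For example, `echo i am testing | char_freq` reports a, e, g, m, n and s once each and i and t twice each, with no row for the spaces.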

openSauce

11-05-2008 04:22 PM

Quote:

Originally Posted by Acidg3rm5 (Post 3332579)

If you have a look at the man page for awk, you should be able to find a way to determine the length of a word. Once you've got that, you can adapt the script you've got so far.
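Following that hint, awk's built-in `length()` function returns the length of a string, so word lengths can be tallied in a separate pass that keeps awk's default field splitting (fields separated by runs of spaces and tabs) instead of `FS=""`. A sketch under stated assumptions: `word_lengths` is a hypothetical name, and a "word" is taken to be a whitespace-separated field with everything except ASCII letters removed, so `cat,` counts as a 3-letter word and digits are ignored.

```shell
#!/usr/bin/env bash
# word_lengths (hypothetical name): report how many 1- to 4-letter words
# appear in the named files, or in standard input if none are given.
word_lengths() {
    awk '
    BEGIN {
        print "Length of words used\tNumber of occurrences"
    }
    {
        # Default FS: fields are separated by runs of spaces/tabs.
        for (i = 1; i <= NF; i++) {
            w = $i
            gsub(/[^A-Za-z]/, "", w)   # assumption: drop punctuation and digits
            n = length(w)
            if (n >= 1 && n <= 4)
                freq[n]++
        }
    }
    END {
        # "+ 0" prints 0 for lengths that never occurred
        for (n = 1; n <= 4; n++)
            printf "%d-letter words\t%d\n", n, freq[n] + 0
    }' "$@"
}
```

For example, `echo 'I am a big cat, and this is fine.' | word_lengths` reports two 1-letter, two 2-letter, three 3-letter and two 4-letter words. Running it as a second pass keeps it independent of the character counter, which needs per-character fields.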

nukoso

11-05-2008 05:00 PM

You should do your homework!

And take a closer look at
$ man awk
and
$ man cut
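The `man cut` suggestion points toward the standard text utilities. As a swapped-in alternative (it uses `tr`, `fold`, `sort` and `uniq` rather than `cut`), the character count can also be written as a pipeline: delete whitespace, put one character per line, then count identical lines. `char_freq_pipeline` is a hypothetical name, and the sketch assumes plain ASCII text, since `fold` may split multibyte characters.

```shell
#!/usr/bin/env bash
# char_freq_pipeline (hypothetical name): character counts using standard
# text utilities instead of a single awk program.
char_freq_pipeline() {
    cat "$@" |
        tr -d ' \t\r\n' |           # drop spaces, tabs, CRs and newlines
        fold -w 1 |                 # one character per line
        sort |                      # group identical characters together
        uniq -c |                   # "count char" for each distinct character
        awk '{ print $2 "\t" $1 }'  # reorder to "char<TAB>count"
}
```

With no arguments, `cat` reads standard input, so `echo i am testing | char_freq_pipeline` works the same way as passing a file name.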

All times are GMT -5.