LinuxQuestions.org


nick2price 04-06-2008 10:12 AM

searching a file
 
I am trying to print out a list of names that appear in a file. The file contains ordinary sentences, but I only need to find the words that are names and print them. The question was given to me as:
List of proper names appearing in the text (in English they appear with a capital letter, but so does the first word of the sentence!). I am very new to this, so I am not sure what to do. Here is an example of the file:

My name is Nick, and i have a friend called John. He is 18 and married to Jill. How do i know Sam? Is it because he is friends with Julie?

So I need Linux code that will print out
Nick
John
Jill
Sam
Julie

How would I go about doing this?

beadyallen 04-06-2008 10:27 AM

Sounds like homework to me. Look at 'man grep'

nick2price 04-06-2008 10:35 AM

It's not homework. I am just trying to create a database to learn Linux, but I cannot figure this one out. Do you have any examples?

nick2price 04-06-2008 11:44 AM

So if I want to search for the capital letters in the file, I know I would have to use something like
grep '[A-Z]' database.txt

But how would I get this to work without printing the words that start a sentence with a capital letter?

osor 04-06-2008 02:21 PM

Here’s a hint:

Words at the beginning of a sentence either occur directly after the beginning of a line (in grep you use the anchor character ^), or after end-punctuation ([.!?]) followed by space characters. So you would need a regex to filter these out.

Of course this is not completely correct, since a name can also be the first word of a sentence, as in:
Code:

John is my friend.
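
A minimal sketch of the hint above, not a definitive solution. It assumes a sed that accepts -E and a grep that accepts -o and -w (GNU and BSD versions do); the function name list_mid_sentence_caps is made up for illustration. It deletes any capitalized word found at the start of a line or after end-punctuation plus whitespace, then prints the capitalized words that remain, one per line.

```shell
# Hypothetical helper: reads text on stdin, prints capitalized words
# that do NOT start a sentence.
list_mid_sentence_caps() {
    # LC_ALL=C makes [A-Z] mean exactly the 26 ASCII capitals.
    # Step 1 (sed): drop a capitalized word at the start of a line (^)
    #               or after . ! ? followed by whitespace.
    # Step 2 (grep): -o prints each match on its own line,
    #                -w keeps only whole words.
    LC_ALL=C sed -E 's/(^|[.!?][[:space:]]+)[A-Z][[:alpha:]]*/\1/g' |
        LC_ALL=C grep -owE '[A-Z][[:alpha:]]*'
}
```

Running list_mid_sentence_caps < database.txt on the sample text from the first post should list Nick, John, Jill, Sam and Julie. As noted above, a name that opens a sentence (John is my friend.) is thrown away along with every other sentence-initial word.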

marquardl 04-06-2008 03:18 PM

what do you want to learn by this?
 
What exactly do you want to learn?

If this is some programming exercise, then the solution will depend on the programming environment in question.

As for the logic: You could filter out the first word in a sentence as not being a name. To separate sentences you must look for .!? closing characters.

Also note that the first word of a sentence can itself be a name; that not everything within a sentence that starts with an uppercase letter is a name (citations or book titles, for example); and that a dot (.) can appear within a sentence in computer-related text (a filename, for instance). Syntax errors can happen too, but checking for them requires dictionary look-ups.

A simple project like this can grow into a complex one before any coding has even started.

Bye,
M


Tischbein 04-06-2008 03:50 PM

I get my shortest answer with gawk. But using tr to arrange the sentences one per line and then unleashing your favourite tools on that also works.
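
One way the tr route might look, as a rough sketch rather than the exact commands meant above. The function name names_via_tr is invented for illustration, and the sed and grep steps are just one possible choice of "favourite tools" (assuming sed -E and grep -o/-w are available). Line breaks are first turned into spaces so that wrapped sentences are rejoined, then every . ! ? becomes a newline, leaving one sentence per line.

```shell
# Hypothetical helper: reads text on stdin, prints capitalized words
# other than the first word of each sentence.
names_via_tr() {
    LC_ALL=C tr '\n' ' ' |                              # rejoin wrapped lines
        LC_ALL=C tr '.!?' '\n\n\n' |                    # one sentence per line
        LC_ALL=C sed -E 's/^[[:space:]]*[^[:space:]]+//' |  # drop first word of each line
        LC_ALL=C grep -owE '[A-Z][[:alpha:]]*'          # print remaining capitalized words
}
```

Because every dot ends a "sentence" here, abbreviations and filenames will split sentences in the wrong place, so the result is only as good as the punctuation in the input.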

Regards, Beineken

prad77 04-06-2008 07:47 PM

Split the text into sentences at ., ?, ! and so on.
Then use awk to print the matches from the second word of each sentence onward.


Tischbein 04-08-2008 06:32 PM

No cutting required. In gawk, FS is the field separator and RS is the record separator, so setting RS to "." makes each sentence a record of its own:

> gawk 'BEGIN{RS="."}{ for (i=2; i<=NF; i++) if ( $(i) ~ /^[A-Z].*/ ) print $(i) }' <<EOF
> The little Green dog. Jumped
> over the Elephant's back.
> EOF
Green
Elephant's
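
A possible variation on the same idea, offered as a sketch under assumptions rather than a finished tool. It assumes an awk that accepts a regular expression as RS (gawk does, and mawk does as well), so that ! and ? end a record too, and it trims punctuation from each word so that "Nick," prints as "Nick". The function name names_via_awk is hypothetical.

```shell
# Hypothetical helper: reads text on stdin, prints capitalized words
# other than the first word of each sentence.
names_via_awk() {
    LC_ALL=C awk '
        BEGIN { RS = "[.!?]" }              # each sentence becomes one record
        {
            for (i = 2; i <= NF; i++) {     # start at 2 to skip the first word
                word = $i
                # trim leading and trailing non-letters, e.g. a trailing comma
                gsub(/^[^A-Za-z]+|[^A-Za-z]+$/, "", word)
                if (word ~ /^[A-Z]/) print word
            }
        }'
}
```

The loop runs up to and including NF so that a name sitting at the very end of a sentence (as in "married to Jill.") is still checked.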

Regards, Tinned food.


All times are GMT -5.