searching a file
I am trying to print out a list of names that appear in a file. The file contains ordinary sentences, but I only need to find the words that are names and print them out. The question was given to me as:
List of proper names appearing in the text (In English they appear with a capital letter - but so does the first word of the sentence!).
I am very new to this, so I am not sure what to do. Here is an example of the file:
My name is Nick, and i have a friend called John. He is 18 and married to Jill. How do i know Sam? Is it because he is friends with Julie?
So I need Linux code that will print out:
Nick John Jill Sam Julie
How would I go about doing this? |
Sounds like homework to me. Look at 'man grep'
|
It's not homework. I am just trying to create a database to learn Linux, but I cannot seem to figure this one out. Do you have any examples?
|
So if I want to search for the capital letters in the file, I know I would have to use something like
grep '[A-Z]' database.txt
But how would I get this to work without printing the words that start a sentence with a capital letter? |
Here’s a hint:
Words at the beginning of a sentence occur either directly after the beginning of a line (in grep you use the anchor character ^), or after end-punctuation ([.!?]) followed by space characters. So you would need a regex to filter these out. Of course this is not completely correct, since a sentence can itself begin with a name, such as: Code:
John is my friend. |
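The hint above could be sketched roughly as follows. This is only one possible reading of it, not the poster's own solution: the function name names_by_regex is made up for illustration, and it assumes a sed that accepts -E and a grep that accepts -o (GNU and BSD versions do). It first deletes any capitalised word that follows the start of a line or end-punctuation plus spaces, then prints every capitalised word that is left.

```shell
# Hypothetical helper: reads the named file(s), or standard input if none.
names_by_regex() {
  # Step 1 (sed): remove a capitalised word that sits at the start of a
  #   line, or after . ! ? followed by whitespace; \1 keeps the
  #   punctuation and spacing that preceded it.
  # Step 2 (grep -o): print each remaining capitalised word on its own line.
  sed -E 's/(^[[:space:]]*|[.!?][[:space:]]+)[[:upper:]][[:alpha:]]*/\1/g' "$@" |
    grep -oE "[[:upper:]][[:alpha:]']*"
}
```

Called as names_by_regex database.txt. As the hint says, a name that opens a sentence (or merely begins a wrapped line) is lost, and other capitalised words inside a sentence, such as the pronoun "I", are still printed.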
what do you want to learn by this?
What exactly do you want to learn?
If this is some programming exercise, then the solution will depend on the programming environment in question.
As for the logic: you could filter out the first word in a sentence as not being a name. To separate sentences you must look for the closing characters . ! and ?
Also note that the first word can be a name; that not everything within a sentence that starts with an uppercase letter is a name (citations or book titles, for example); that a dot (.) can appear within a sentence in computer-related text (a filename, for instance); etc. Errors in the text can also occur, but for error checking you will need dictionary look-ups. A simple project like this can grow into a complex one before any coding has even started.
Bye, M |
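To make the dictionary look-up idea a little more concrete, here is a rough sketch. Everything in it is an assumption for illustration: the function name sentence_initial_names is invented, and the word list is a plain file of lowercase ordinary words, one per line, that you supply yourself (many systems ship /usr/share/dict/words, but not all). It looks only at the first word of each sentence and keeps it if its lowercase form is not in the list, on the guess that it is then a name.

```shell
# Hypothetical helper. $1 = word-list file; the text is read from stdin.
sentence_initial_names() {
  tr '\n' ' ' |                                    # join wrapped lines
  tr '.!?' '\n\n\n' |                              # one sentence per line
  sed -E 's/^[[:space:]]*([^[:space:]]+).*/\1/' |  # keep only the first word
  grep -E '^[[:upper:]][[:alpha:]]*$' |            # capitalised, letters only
  while read -r word; do
    lower=$(printf '%s\n' "$word" | tr '[:upper:]' '[:lower:]')
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$lower" "$1" || printf '%s\n' "$word"
  done
}
```

For example, sentence_initial_names words.txt < database.txt. The result is only as good as the word list: a name that is also an ordinary word (Mark, Rose) would be dropped, and any ordinary word missing from the list would be reported as a name.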
I get my shortest answer with gawk. But using tr to arrange the sentences one per line and then unleashing your favourite tools on that also works.
Regards, Beineken |
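For anyone wondering what the tr route might look like, here is a minimal sketch. It is a guess at the idea, not Beineken's actual commands, and the function name names_one_sentence_per_line is made up. It joins wrapped lines, turns each . ! ? into a newline so that every sentence sits on its own line, drops the first word of each line, and prints the capitalised words that remain.

```shell
# Hypothetical helper: reads the text from standard input.
names_one_sentence_per_line() {
  tr '\n' ' ' |                              # join wrapped lines into one
  tr '.!?' '\n\n\n' |                        # end-punctuation -> newline
  sed -E 's/^[[:space:]]*[^[:space:]]+//' |  # drop the first word per sentence
  grep -oE "[[:upper:]][[:alpha:]']*"        # print remaining capitalised words
}
```

Used as names_one_sentence_per_line < database.txt. Because the lines are joined first, a name that happens to start a wrapped line in the middle of a sentence is kept. A dot inside a filename or an abbreviation would still be treated as the end of a sentence.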
|
No cutting required: gawk: FS == field separator, RS == record separator.
> gawk 'BEGIN{RS="."}{ for (i=2; i<=NF; i++) if ( $(i) ~ /^[A-Z]/ ) print $(i) }' <<EOF
> The little Green dog. Jumped
> over the Elephant's back.
> EOF
Green
Elephant's
Regards, Tinned food. |