Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!


04-06-2008, 10:12 AM   #1
LQ Newbie
Registered: Apr 2008
Posts: 3

Rep: Reputation: 0
searching a file

I am trying to print out a list of names that appear in a file. The file contains written sentences, but I only need to find the words that are names and print them out. The question was given to me as:
List of proper names appearing in the text (in English they start with a capital letter, but so does the first word of a sentence!). I am very new to this, so I am not sure what to do. Here is an example of the file:

My name is Nick, and i have a friend called John. He is 18 and married to Jill. How do i know Sam? Is it because he is friends with Julie?

So I need Linux code that will print out just the names from this text.

How would I go about doing this?
04-06-2008, 10:27 AM   #2
Registered: Mar 2008
Location: UK
Distribution: Fedora, Gentoo
Posts: 209

Rep: Reputation: 36
Sounds like homework to me. Look at 'man grep'
04-06-2008, 10:35 AM   #3
LQ Newbie
Registered: Apr 2008
Posts: 3

Original Poster
Rep: Reputation: 0
It's not homework; I am just trying to build a database to learn Linux, but I cannot seem to figure this one out. Do you have any examples?
04-06-2008, 11:44 AM   #4
LQ Newbie
Registered: Apr 2008
Posts: 3

Original Poster
Rep: Reputation: 0
So if I want to search for the capital letters in the file, I know I would have to use something like
grep '[A-Z]' database.txt

But how would I get this to work without printing the words that start a sentence with a capital letter?
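A minimal sketch of where that command leads, assuming GNU or BSD grep (the -o option is not in POSIX) and using a shell variable in place of database.txt: grep '[A-Z]' prints every whole line that contains a capital letter, while -o prints only the matching words, one per line. Sentence-initial words such as My and He still show up, which is exactly the problem being asked about. The function name capitalized_words is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every capitalized word read from stdin, one per line.
# -o (GNU/BSD grep, not POSIX) prints only the matched text, not the whole line.
capitalized_words() {
  grep -oE '[[:upper:]][[:alpha:]]*'
}

# Sample text from the first post, kept in a variable instead of database.txt.
sample='My name is Nick, and i have a friend called John. He is 18 and married to Jill. How do i know Sam? Is it because he is friends with Julie?'

printf '%s\n' "$sample" | capitalized_words
```

On the sample this prints My, Nick, John, He, Jill, How, Sam, Is and Julie, one per line; against the real file it would be capitalized_words < database.txt.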
04-06-2008, 02:21 PM   #5
HCL Maintainer
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 76
Here’s a hint:

Words at the beginning of a sentence either occur directly after the beginning of a line (in grep you use the anchor character ^), or after end-punctuation ([.!?]) followed by space characters. So you would need a regex to filter these out.

Of course this is not completely correct, since a sentence can also begin with a name, such as:
John is my friend.
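A minimal sketch of that hint, assuming GNU or BSD sed and grep (for -E and -o): sed first deletes any capitalized word that follows the start of a line (^) or end punctuation plus spaces, and grep then prints the capitalized words that are left. The helper name names_from_text is made up for illustration. Lines are treated independently, so a name that happens to start a wrapped line is dropped as well.

```shell
#!/bin/sh
# Hypothetical helper: print capitalized words that do not start a sentence.
# A sentence start is taken to be the beginning of a line (^, after optional
# spaces) or end punctuation [.!?] followed by one or more spaces.
names_from_text() {
  sed -E 's/(^[[:space:]]*|[.!?][[:space:]]+)[[:upper:]][[:alpha:]]*/\1/g' |
    grep -oE '[[:upper:]][[:alpha:]]*'
}

sample='My name is Nick, and i have a friend called John. He is 18 and married to Jill. How do i know Sam? Is it because he is friends with Julie?'
printf '%s\n' "$sample" | names_from_text
```

On the sample this prints Nick, John, Jill, Sam and Julie. As the post warns, printf 'John is my friend.\n' | names_from_text prints nothing, because John starts the sentence.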
04-06-2008, 03:18 PM   #6
Registered: Apr 2008
Posts: 100

Rep: Reputation: 15
What exactly do you want to learn by this?

If this is some programming exercise, then the solution will depend on the programming environment in question.

As for the logic: you could filter out the first word of each sentence as not being a name. To separate sentences, you must look for the closing characters ., ! and ?.

Also note that the first word can be a name; not every word inside a sentence that starts with an uppercase letter is a name (quotations or book titles, for example); a dot (.) can appear inside a sentence in computer-related text (a file name, say); and so on. The text can also contain syntax errors, but checking for those would need dictionary look-ups.

A simple project like this can grow into a complex one, without having started any coding yet.



Last edited by marquardl; 05-01-2008 at 01:40 AM.
04-06-2008, 03:50 PM   #7
Registered: Oct 2006
Distribution: debian
Posts: 124

Rep: Reputation: 15
I get my shortest answer with gawk. But using tr to arrange the sentences one per line and then unleashing your favourite tools on that also works.
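A sketch of the tr route, assuming GNU or BSD tools: the text is joined into one stream, tr turns every ., ! and ? into a newline so each sentence sits on its own line, sed removes the first word of each sentence, and grep -o prints the capitalized words that remain. The function name names_tr is hypothetical, and a name that starts a sentence is still lost.

```shell
#!/bin/sh
# Hypothetical helper: one sentence per line, then drop each sentence's first word.
names_tr() {
  tr '\n' ' ' |                                # join wrapped lines into one stream
    tr '.!?' '\n\n\n' |                        # end punctuation -> newline: one sentence per line
    sed -E 's/^[[:space:]]*[^[:space:]]+//' |  # delete the first word of each sentence
    grep -oE '[[:upper:]][[:alpha:]]*'         # print the capitalized words left over
}

sample='My name is Nick, and i have a friend called John. He is 18 and married to Jill. How do i know Sam? Is it because he is friends with Julie?'
printf '%s\n' "$sample" | names_tr
```

Because the lines are joined first, a sentence wrapped across lines is handled too: printf 'I met\nJohn. Sam\nsaid hi.\n' | names_tr prints only John.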

Regards, Beineken
04-06-2008, 07:47 PM   #8
Registered: Mar 2008
Posts: 101

Rep: Reputation: 15
Cut the lines at the ., ? and ! characters.
Then use awk to display the capitalized words from the second word of each piece onward.
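A sketch of that idea in plain awk (the names_awk helper is hypothetical): split() cuts each line into sentences at ., ? or !, a second split() breaks each sentence into words, and the loop starts at the second word. match() and RLENGTH keep only the leading capitalized letters, so "Nick," prints as Nick. Each line is handled on its own, so this assumes sentences do not wrap across lines.

```shell
#!/bin/sh
# Hypothetical helper: print capitalized words from the second word of each sentence on.
names_awk() {
  awk '{
    n = split($0, sentence, "[.!?]")        # cut the line into sentences
    for (s = 1; s <= n; s++) {
      m = split(sentence[s], word, " ")     # " " splits on runs of blanks
      for (i = 2; i <= m; i++)              # skip the sentence-initial word
        if (match(word[i], /^[A-Z][A-Za-z]*/))
          print substr(word[i], 1, RLENGTH) # drop trailing punctuation such as a comma
    }
  }'
}

sample='My name is Nick, and i have a friend called John. He is 18 and married to Jill. How do i know Sam? Is it because he is friends with Julie?'
printf '%s\n' "$sample" | names_awk
```

On the sample this prints Nick, John, Jill, Sam and Julie.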


Last edited by prad77; 04-17-2008 at 03:43 AM.
04-08-2008, 06:32 PM   #9
Registered: Oct 2006
Distribution: debian
Posts: 124

Rep: Reputation: 15
No cutting required: in gawk, FS is the field separator and RS is the record separator.

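gawk 'BEGIN { RS = "[.!?]" }   # a multi-character RS is a regex in gawk: one record per sentence
# skip field 1 (the sentence-initial word); use <= NF so the last word is checked too
{ for (i = 2; i <= NF; i++) if ($i ~ /^[A-Z]/) print $i }' <<EOF
The little Green dog. Jumped
over the Elephant's back.
EOF

This prints Green and Elephant's.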

Regards, Tinned food.

