LinuxQuestions.org
Old 04-06-2008, 11:12 AM   #1
nick2price
LQ Newbie
 
Registered: Apr 2008
Posts: 3

Rep: Reputation: 0
searching a file


I am trying to print out a list of names that appear in a file. The file contains ordinary sentences, but I just need to find the words that are names and print them out. The question was given to me as:
List of proper names appearing in the text (in English they appear with a capital letter, but so does the first word of the sentence!). I am very new to this, so I am not sure what to do. Here is an example of the file:

My name is Nick, and i have a friend called John. He is 18 and married to Jill. How do i know Sam? Is it because he is friends with Julie?

So I need Linux code that will print out:
Nick
John
Jill
Sam
Julie

How would I go about doing this?
 
Old 04-06-2008, 11:27 AM   #2
beadyallen
Member
 
Registered: Mar 2008
Location: UK
Distribution: Fedora, Gentoo
Posts: 209

Rep: Reputation: 36
Sounds like homework to me. Look at 'man grep'
 
Old 04-06-2008, 11:35 AM   #3
nick2price
LQ Newbie
 
Registered: Apr 2008
Posts: 3

Original Poster
Rep: Reputation: 0
It's not homework. I am just trying to create a database to learn Linux, but I cannot seem to figure this one out. Do you have any examples?
 
Old 04-06-2008, 12:44 PM   #4
nick2price
LQ Newbie
 
Registered: Apr 2008
Posts: 3

Original Poster
Rep: Reputation: 0
So if I want to search for the capital letters in the file, I know I would have to use something like
grep '[A-Z]' database.txt

But how would I get this to work without printing the words that start a sentence with a capital letter?
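
For comparison, here is a minimal sketch of what grep can do on its own. It assumes a grep with the -o option (GNU and BSD grep have it; it is not in POSIX), and the function name capitalized_words is made up for illustration. Plain grep '[A-Z]' prints every whole line that contains a capital letter; with -o only the matched words are printed, which still includes the first word of each sentence.

```shell
# Print every capitalized word read from stdin, one per line.
# -o prints only the matching text (GNU/BSD extension); -E enables extended regex.
capitalized_words() {
  LC_ALL=C grep -oE '[[:upper:]][[:alpha:]]*'
}

# Hypothetical usage against the file named in the post:
#   capitalized_words < database.txt
```

So grep alone narrows the output to capitalized words, but something more is needed to drop the sentence-initial ones.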
 
Old 04-06-2008, 03:21 PM   #5
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 75
Here’s a hint:

Words at the beginning of a sentence either occur directly after the beginning of a line (in grep you use the anchor character ^), or after end-punctuation ([.!?]) followed by space characters. So you would need a regex to filter these out.

Of course this is not completely correct, since the first word of a sentence can itself be a name, as in:
Code:
John is my friend.
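
One way the hint might be turned into a pipeline, as a sketch only: it assumes a sed that accepts -E and a grep that accepts -o (both true for the GNU and BSD tools), and names_by_regex is a made-up function name. sed first deletes the capitalized word at the start of each line and after end-punctuation plus spaces, then grep prints the capitalized words that remain.

```shell
# Drop sentence-initial words, then list the remaining capitalized words.
names_by_regex() {
  LC_ALL=C sed -E \
    -e 's/^[[:space:]]*[[:upper:]][[:alpha:]]*//' \
    -e 's/([.!?][[:space:]]+)[[:upper:]][[:alpha:]]*/\1/g' |
  LC_ALL=C grep -oE '[[:upper:]][[:alpha:]]*'
}

# Hypothetical usage:
#   names_by_regex < database.txt
```

As noted above, this also throws away a name that happens to open a sentence, and it drops any capitalized word that merely starts a wrapped line.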
 
Old 04-06-2008, 04:18 PM   #6
marquardl
Member
 
Registered: Apr 2008
Posts: 100

Rep: Reputation: 15
what do you want to learn by this?

What exactly do you want to learn?

If this is some programming exercise, then the solution will depend on the programming environment in question.

As for the logic: You could filter out the first word in a sentence as not being a name. To separate sentences you must look for .!? closing characters.

Also note that the first word can be a name; not everything within a sentence that starts with an uppercase letter is a name (citations or book titles, for example); a dot (.) can appear within a sentence in computer-related text (a filename, say); and so on. Syntax errors can also happen, but to check for them you will need dictionary look-ups.

A simple project like this can grow into a complex one, without having started any coding yet.
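
A rough sketch of the dictionary look-up idea, under loud assumptions: the word list is a hypothetical file with one lowercase common word per line, passed as the first argument, and names_with_wordlist is an invented name. A capitalized word that opens a sentence is kept only if its lowercase form is not in that list; every other capitalized word is kept as before.

```shell
# $1: hypothetical word list, one lowercase common word per line.
names_with_wordlist() {
  # Join lines, then break the text into one sentence per line.
  tr '\n' ' ' | tr '.!?' '\n\n\n' |
  LC_ALL=C awk -v dict="$1" '
    BEGIN { while ((getline w < dict) > 0) common[w] = 1 }
    {
      for (i = 1; i <= NF; i++) {
        word = $i
        sub(/^[^A-Za-z]+/, "", word)   # strip leading punctuation
        sub(/[^A-Za-z]+$/, "", word)   # strip trailing punctuation
        if (word !~ /^[A-Z]/) continue
        # First word of a sentence: skip it if it is a common word.
        if (i == 1 && (tolower(word) in common)) continue
        print word
      }
    }'
}
```

With a list containing words such as "he" and "my", the sentence "John is my friend." would keep John while "He knows Jill." would drop He. The result is only as good as the word list.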

Bye,
M


Last edited by marquardl; 05-01-2008 at 02:40 AM.
 
Old 04-06-2008, 04:50 PM   #7
Tischbein
Member
 
Registered: Oct 2006
Distribution: debian
Posts: 124

Rep: Reputation: 15
I get my shortest answer with gawk. But using tr to arrange the sentences one per line and then unleashing your favourite tools on that also works.

Regards, Beineken
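
The tr route mentioned in this post could look roughly like the following. It is a sketch, not the poster's actual command: names_via_tr is a made-up name, and plain awk stands in for "your favourite tools". The first tr joins all lines, the second turns each end-punctuation character into a newline so every sentence sits on its own line, and awk then skips the first word of each line.

```shell
# One sentence per line via tr, then print capitalized words from field 2 on.
names_via_tr() {
  tr '\n' ' ' | tr '.!?' '\n\n\n' |
  LC_ALL=C awk '{
    for (i = 2; i <= NF; i++) {
      word = $i
      sub(/[^A-Za-z]+$/, "", word)     # strip trailing punctuation such as ","
      if (word ~ /^[A-Z]/) print word
    }
  }'
}

# Hypothetical usage:
#   names_via_tr < database.txt
```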
 
Old 04-06-2008, 08:47 PM   #8
prad77
Member
 
Registered: Mar 2008
Posts: 101

Rep: Reputation: 15
Cut the lines at ., ?, !, etc.
Then use awk and print the matches from the second word onward.


Last edited by prad77; 04-17-2008 at 04:43 AM.
 
Old 04-08-2008, 07:32 PM   #9
Tischbein
Member
 
Registered: Oct 2006
Distribution: debian
Posts: 124

Rep: Reputation: 15
No cutting required. In gawk, FS is the field separator and RS is the record separator.

> gawk 'BEGIN{RS="."}{ for (i=2; i<=NF; i++) if ( $(i) ~ /^[A-Z].*/ ) print $(i) }' <<EOF
> The little Green dog. Jumped
> over the Elephant's back.
> EOF
Green
Elephant's

Regards, Tinned food.
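
A possible variation on the same idea, offered as a sketch rather than as the poster's method: it assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do; POSIX does not require it), and unique_names is an invented name. Using RS="[.!?]" also ends records at question and exclamation marks, the loop runs through the last field so a name just before the punctuation is not missed, and each name is printed only once.

```shell
# Split records at . ! or ?, skip the first word, print each name once.
unique_names() {
  LC_ALL=C awk '
    BEGIN { RS = "[.!?]" }             # regex RS: gawk/mawk extension
    {
      for (i = 2; i <= NF; i++) {
        word = $i
        sub(/[^A-Za-z]+$/, "", word)   # strip trailing punctuation
        if (word ~ /^[A-Z]/ && !seen[word]++) print word
      }
    }'
}

# Hypothetical usage:
#   unique_names < database.txt
```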
 
  

