LinuxQuestions.org
Old 03-31-2008, 07:56 PM   #1
linuxmaveric
Member
 
Registered: Aug 2007
Location: Southern California
Distribution: Ubuntu 8.04 for my desktop & FreeBSD 7.0 for my server blade.
Posts: 31

Rep: Reputation: 15
Help: how to find an address using "grep" and regexes


Hello all! I am new to Linux and the command line. How could I set up a regular expression using grep that would find a "City, ST Zip" in a file, where the city can be any word, the state can only be two capital letters ("ST"), and the zip is 5 digits? I understand basic grep arguments, but regular expressions seem almost like hieroglyphics.

I hope this better explains the problem:


Write a regular expression that would match lines of the following form:

City, ST 12345

"City" can be more than one word, and ST is always two capital letters.

HINT: Since the city name can be almost anything, it might be good to start matching at the comma.
 
Old 03-31-2008, 09:04 PM   #2
ophirg
Member
 
Registered: Jan 2008
Location: Israel
Distribution: Kubuntu 13.10
Posts: 134

Rep: Reputation: 34
Hi linuxmaveric

I think the best regular expression would be:
"\w+\s*,\s*ST\s+[0-9]{5,5}"

But wait...
If you think you are going to work with the command line or with scripts, then my advice is to learn regular expressions. They are really handy with tools like grep. And later, you can learn how to use them with awk and then with programming languages like perl and python.

Look at http://www.regular-expressions.info/.
They have nice resources there.
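
A hedged sketch of how a pattern like the one above could be tried out. Plain grep uses basic regular expressions, where + and {5,5} are taken as literal characters, so the sketch assumes the -E option (extended regular expressions). \w and \s are GNU extensions, so they are spelled here with POSIX bracket classes, and {5,5} is shortened to the equivalent {5}. The literal ST from the post is kept as-is. The sample file and the function name match_literal_st are made up for illustration.

```shell
# Hypothetical sample data, written to a temporary file for illustration.
sample=$(mktemp)
printf '%s\n' 'Springfield, ST 12345' 'Springfield, ST 1234' 'no address here' > "$sample"

# -E selects extended regular expressions, so + and {5} act as quantifiers.
# [[:alnum:]_] and [[:space:]] are the POSIX spellings of \w and \s.
# LC_ALL=C keeps the bracket expressions independent of the current locale.
match_literal_st() {
    LC_ALL=C grep -E '[[:alnum:]_]+[[:space:]]*,[[:space:]]*ST[[:space:]]+[0-9]{5}' "$1"
}

match_literal_st "$sample"
```

Only the first sample line has both the literal ST and five digits, so it is the only one that should be printed.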
 
Old 03-31-2008, 09:04 PM   #3
prad77
Member
 
Registered: Mar 2008
Posts: 101

Rep: Reputation: 15
regex="\s*(.*)\s*,\s*([A-Z]{2})\s+(\d{5}?)\s*"

It could look like the one above. You may have to tune it further, too.
Of course, it is also interesting to explore on your own:

http://www.grymoire.com/Unix/Regular.html#uh-8

Gentoo

Last edited by prad77; 04-17-2008 at 03:32 AM.
 
Old 03-31-2008, 09:28 PM   #4
linuxmaveric
Member
 
Registered: Aug 2007
Location: Southern California
Distribution: Ubuntu 8.04 for my desktop & FreeBSD 7.0 for my server blade.
Posts: 31

Original Poster
Rep: Reputation: 15
Thanks for the code examples!

Thanks for the code. I'll work with these to start.
Thanks for the head start and links to resources.
RR.
 
Old 04-02-2008, 05:26 PM   #5
linuxmaveric
Member
 
Registered: Aug 2007
Location: Southern California
Distribution: Ubuntu 8.04 for my desktop & FreeBSD 7.0 for my server blade.
Posts: 31

Original Poster
Rep: Reputation: 15
I figured out a simpler, quicker solution to my original problem.

I came up with a simpler solution to the problem I originally posted.
The goal was to use grep and a regex to find an address of the form "City, ST Zip": the city could be anything, the state is always two capital letters ("ST"), and the zip is 5 digits.

Here is my basic solution:

grep ", [A-Z][A-Z] [0-9][0-9][0-9][0-9][0-9]" /filename

This would do the job nicely. But thanks everyone for the suggestions. They gave me the ideas to figure this out.
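
For anyone who wants to try that command, here is a minimal sketch run against a made-up sample file. The addresses, the temporary file, and the function name find_city_state_zip are assumptions for illustration only. LC_ALL=C is added because ranges such as [A-Z] are only guaranteed to mean "the 26 capital letters" in the C/POSIX locale.

```shell
# Hypothetical sample file; the addresses are made up for illustration.
addresses=$(mktemp)
cat > "$addresses" <<'EOF'
Los Angeles, CA 90001
Salt Lake City, UT 84101
Portland, Or 97201
Austin, TX 7870
EOF

# Basic regular expression: comma, space, two capitals, space, five digits.
find_city_state_zip() {
    LC_ALL=C grep ', [A-Z][A-Z] [0-9][0-9][0-9][0-9][0-9]' "$1"
}

find_city_state_zip "$addresses"
```

With this sample, only the Los Angeles and Salt Lake City lines should be printed: "Or" is not two capitals, and "7870" is only four digits. Because the pattern starts at the comma, multi-word city names need no special handling.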
 
Old 04-02-2008, 10:33 PM   #6
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,610
Blog Entries: 4

Rep: Reputation: 3905
Sounds like a homework problem ... but it's a valid exercise nonetheless.

A good strategy for planning a regular-expression is to look for "the rocks in the stream." These are the definable anchor-points, and the variable content flows around them.

So, what are the "rocks" in this scenario? Let's see...
  1. The "comma followed by one whitespace."
  2. "A sequence of exactly two alphabetic characters" (which, significantly, is both a "rock" and a data-item that we'll want to capture...)
  3. A series of one-or-more whitespace characters between the state and the zip-code.

Another pair of "rocks" is the beginning of the line and the end of the line ... denoted by the characters '^' and '$'. If you know that the pattern you're looking for must start at the first position of the line and/or must conclude at the last position, you should include this in your pattern since that's very useful to the computer.

So... what's between those "rocks?" Data, of course, and that's the other thing to consider when you're building a regular-expression. You'll enclose these pieces in (parentheses) as a signal that you want to capture whatever characters match these things. If the string that you've been given "matches" the regular-expression, you'll have (some kind of) "easy" way to extract these pieces.

Okay... so what do we have here? Let's see:
  1. A data-item that we want to extract... beginning at the "^"start of the line"^" and consisting of zero-or-more "*" any-characters ".", which we want to capture. (Parentheses...)
  2. The sequence of characters {comma, white"\s"pace}, which is just a rock.
  3. Exactly two "[A-Z]"alphabetic characters. Data.
  4. ... Followed by one or more white"\s"pace characters. Another rock.
  5. Followed by "{5}" "\d"igit-characters, which we want to (capture) as data. ...
  6. Followed by the "$"end-of-the-line"$".

Now, since this undoubtedly is a homework assignment, I'm gonna stop right there. But, each and every one of those bullet points corresponds to one-or-more somethings in a regular-expression pattern.
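
For readers who want to see how those bullet points might map onto a pattern, here is one possible sketch, not necessarily what the poster had in mind. grep only selects lines and has no way to hand back the captured pieces, so sed is swapped in here to print them. \s and \d are written as [[:space:]] and [0-9] to stay within POSIX extended syntax, and sed's -E option is assumed to be available. The function name parse_address and the "|" separator are made up for illustration.

```shell
# Split "City, ST 12345" lines into city|state|zip; other lines are skipped.
#   ^(.*)             start of line, then the city (captured)
#   ,[[:space:]]      the comma-plus-whitespace "rock"
#   ([A-Z]{2})        exactly two capital letters (captured)
#   [[:space:]]+      one or more whitespace characters
#   ([0-9]{5})$       exactly five digits (captured), then end of line
parse_address() {
    LC_ALL=C sed -n -E 's/^(.*),[[:space:]]([A-Z]{2})[[:space:]]+([0-9]{5})$/\1|\2|\3/p'
}

printf '%s\n' 'Salt Lake City, UT 84101' 'not an address' | parse_address
```

The -n option together with the trailing p flag means that only lines matching the whole pattern are printed, so the second sample line is dropped.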
 
Old 04-03-2008, 12:45 AM   #7
linuxmaveric
Member
 
Registered: Aug 2007
Location: Southern California
Distribution: Ubuntu 8.04 for my desktop & FreeBSD 7.0 for my server blade.
Posts: 31

Original Poster
Rep: Reputation: 15
Naughty naughty :) that's funny

Actually, this was not homework or a test, as all our tests are done in class. It was part of a classroom project, and forums were OK'd, so no harm done. But your emoticon is pretty funny, LOL! I got good ideas from the previous posts, but our project was to simplify the solution as much as possible. My instructor did accept my solution: grep ", [A-Z][A-Z] [0-9][0-9][0-9][0-9][0-9]" /filename.

"ST" is just an example I used. The state could be anything. Thanks for the reply, but I already nailed this one on the head. I imagine there are so many different ways to figure this out: using perl, php, ruby... heck, even python has its own twist. Regexes can be so varied.

Wow, thanks for your detailed explanation. You sound like you have a strong command of regex. I will be there someday too. Wish me luck!
Once again Thank you everyone for your help.
RR.

Last edited by linuxmaveric; 04-04-2008 at 01:52 AM.
 
Old 04-27-2010, 11:07 PM   #8
Darkcrimson
LQ Newbie
 
Registered: Apr 2010
Posts: 1

Rep: Reputation: 0
Looks like someone's taking the Unix/Linux System Administration course with O'Reilly. I remember this question... taken right from the quiz, haha. Good luck to you.
 
Old 04-28-2010, 12:31 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
May I also suggest looking up regular expressions and quantifiers? Look for something like {n,m}; it might help make your simple solution even shorter.
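
As a rough sketch of that suggestion, the repeated bracket expressions in the earlier solution can be collapsed with interval quantifiers. This assumes grep's -E option, where the braces are written bare; in basic regular expressions the same quantifier is spelled \{2\}. The function name find_city_state_zip_ere and the sample lines are made up for illustration.

```shell
# {2} and {5} replace the repeated [A-Z] and [0-9] bracket expressions.
# "$@" passes along any file names; with none, grep reads standard input.
find_city_state_zip_ere() {
    LC_ALL=C grep -E ', [A-Z]{2} [0-9]{5}' "$@"
}

printf '%s\n' 'Los Angeles, CA 90001' 'Austin, TX 7870' | find_city_state_zip_ere
```

Only the Los Angeles line should be printed. Like the original, this pattern is not anchored, so a line with more than five digits after the state would still match.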
 
  

