LinuxQuestions.org
Old 03-16-2006, 09:58 PM   #1
buldir
Member
 
Registered: Mar 2004
Location: Fairbanks, AK USA
Posts: 135

Rep: Reputation: 15
regexp question: first instance for each line


So I have some text like:
Code:
Identification_Information:
  Citation:
    Citation_Information:
      Originator: Shmo, Joe
      Originator: Shmoe, Jan
      Publication_Date: 092005
      Title: Some Title Goes Here
      Geospatial_Data_Presentation_Form: map
      Series_Information:
        Series_Name: Report
        Issue_Identification: PIR 2005-6
      Publication_Information:
        Publication_Place: Backwoods, USA
        Publisher: Department of Natural Resources
      Other_Citation_Details: 15 p., 1 sheet, scale 1:250,000
      Online_Linkage: http://www.bananas.org
and I want to list the first instance, in each line, of a word that begins with a capital letter and ends with a ":". So far I have:
Code:
(\w?[A-Z][a-z].+:)
Which gives:
Code:
Identification_Information:
Citation:
Citation_Information:
Originator:
Originator:
Publication_Date:
Title:
Geospatial_Data_Presentation_Form:
Series_Information:
Series_Name:
Issue_Identification:
Publication_Information:
Publication_Place:
Publisher:
Other_Citation_Details: 15 p., 1 sheet, scale 1:
Online_Linkage: http:
I need to get rid of the " 15 p., 1 sheet, scale 1:" and " http:" in the last two lines. Any help would be greatly appreciated.
 
Old 03-16-2006, 10:14 PM   #2
xhi
Senior Member
 
Registered: Mar 2005
Location: USA::Pennsylvania
Distribution: Slackware
Posts: 1,065

Rep: Reputation: 45
Code:
(\w?[A-Z][a-z].+?:)
I think the ? after the .+ makes it lazy, so it should stop at the first colon instead of the last..

edit> actually this should do it..
Code:
(.+?:)
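
A minimal sketch of the greedy versus lazy behaviour discussed here, using Python's re module as an assumed test bench (the thread itself is not tied to any language). The sample line comes from the first post; the variable names are illustrative only.

```python
import re

# Sample line from the first post.
line = "Online_Linkage: http://www.bananas.org"

# Greedy: .+ grabs the whole line, then backs up to the LAST colon.
greedy = re.search(r"(.+:)", line).group(1)

# Lazy: .+? grows one character at a time and stops at the FIRST colon.
lazy = re.search(r"(.+?:)", line).group(1)

print(repr(greedy))  # 'Online_Linkage: http:'
print(repr(lazy))    # 'Online_Linkage:'
```

Note that re.search returns only the first match. A tool that lists every match keeps scanning after the first colon, so the lazy pattern alone may still report more than one hit per line.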

Last edited by xhi; 03-16-2006 at 10:18 PM.
 
Old 03-17-2006, 12:26 AM   #3
buldir
Member
 
Registered: Mar 2004
Location: Fairbanks, AK USA
Posts: 135

Original Poster
Rep: Reputation: 15
Thanks for the quick response.
Code:
(.+?:)
gives me:
Code:
Supplemental_Information:
(contact information below). web site (http:
Process_Description:
environment to a 1:
Other_Citation_Details:
15 p., 1 sheet, scale 1:
Online_Linkage: http:
for the text:

Supplemental_Information: (contact information below). web site (http://www.
Process_Description: environment to a 1:250,000 topographic basemap.
Other_Citation_Details: 15 p., 1 sheet, scale 1:250,000
Online_Linkage: http://www.

which is close. I still need to get rid of any other text beyond the first colon. I tried placing:
Code:
{1}
after the colon, but no luck.
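
A rough sketch of why the extra matches appear and why {1} does not help, again assuming Python's re module as a stand-in for the tool used in the thread. A {1} quantifier only says "the preceding token exactly once", which a plain colon already means; it does not limit how many matches are reported per line.

```python
import re

# Sample line from the post above.
line = "Process_Description: environment to a 1:250,000 topographic basemap."

# findall keeps scanning after each match, so every colon ends a match.
all_matches = re.findall(r"(.+?:)", line)

# ":{1}" means "exactly one colon", which is the same as ":".
with_braces = re.findall(r"(.+?:{1})", line)

# Taking only the first match is one way to ignore the rest.
first_only = re.search(r"(.+?:)", line).group(1)

print(all_matches)   # ['Process_Description:', ' environment to a 1:']
print(with_braces)   # same list as above
print(first_only)    # Process_Description:
```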
 
Old 03-17-2006, 02:43 AM   #4
buldir
Member
 
Registered: Mar 2004
Location: Fairbanks, AK USA
Posts: 135

Original Poster
Rep: Reputation: 15
Here's my last attempt before I hit the sack...
Code:
(\w?[A-Z][a-z].+[a-z]:[^//0-9])
which takes care of the four troublesome lines I mentioned above and gives me
Code:
Supplemental_Information:
Process_Description:
Other_Citation_Details:
Online_Linkage:
but not for the text:

Ordering: Order by phone, Payment accepted: Cash, check, money order, VISA, or MasterCard

which is still:

Code:
Ordering: Order by phone, Payment accepted:
Almost...
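
A small sketch, assuming Python's re module, of why this attempt handles the URL and scale lines but not the "Ordering" line. The greedy .+ still runs to the last colon that is preceded by a lowercase letter and followed by something other than a slash or digit; "accepted: " satisfies that, while "http:/" and "1:2" do not. The trailing character class also consumes one extra character (here a space).

```python
import re

# Pattern from the post above, unchanged.
ATTEMPT = re.compile(r"(\w?[A-Z][a-z].+[a-z]:[^//0-9])")

url_line = "Online_Linkage: http://www."
order_line = ("Ordering: Order by phone, Payment accepted: "
              "Cash, check, money order, VISA, or MasterCard")

url_result = ATTEMPT.search(url_line).group(1)
order_result = ATTEMPT.search(order_line).group(1)

print(repr(url_result))    # 'Online_Linkage: '
print(repr(order_result))  # 'Ordering: Order by phone, Payment accepted: '
```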
 
Old 03-17-2006, 05:32 AM   #5
muha
Member
 
Registered: Nov 2005
Distribution: xubuntu, grml
Posts: 451

Rep: Reputation: 37
Using sed I get this:
Code:
$ sed -n 's/\ *\([A-Z][^:]*:\).*/\1/p' file
Identification_Information:
Citation:
Citation_Information:
Originator:
Originator:
Publication_Date:
Title:
Geospatial_Data_Presentation_Form:
Series_Information:
Series_Name:
Issue_Identification:
Publication_Information:
Publication_Place:
Publisher:
Other_Citation_Details:
Online_Linkage:
Does that work?
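
For readers without sed at hand, here is a hedged Python approximation of the command above. The function name first_fields and the sample text are assumptions made for illustration; the pattern is the sed expression rewritten in Python syntax. The key idea is [^:]*, which cannot run past the first colon, so no lazy quantifier is needed.

```python
import re

# The sed pattern in Python syntax: optional leading spaces, a capital
# letter, any run of non-colon characters, a colon, then the rest of the line.
FIELD = re.compile(r" *([A-Z][^:]*:).*")

def first_fields(text):
    """Return the substituted lines, skipping lines where nothing matched."""
    out = []
    for line in text.splitlines():
        # Replace the matched part with group 1, like sed's s/.../\1/.
        new_line, count = FIELD.subn(r"\1", line, count=1)
        if count:  # sed -n ... p prints only lines that were changed
            out.append(new_line)
    return out

sample = (
    "      Other_Citation_Details: 15 p., 1 sheet, scale 1:250,000\n"
    "      Online_Linkage: http://www.bananas.org\n"
    "Ordering: Order by phone, Payment accepted: Cash, check\n"
)
fields = first_fields(sample)
print(fields)
# ['Other_Citation_Details:', 'Online_Linkage:', 'Ordering:']
```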

Last edited by muha; 03-17-2006 at 05:34 AM.
 
Old 03-17-2006, 08:31 AM   #6
xhi
Senior Member
 
Registered: Mar 2005
Location: USA::Pennsylvania
Distribution: Slackware
Posts: 1,065

Rep: Reputation: 45
oops .. should have anchored it to the start of the string..
Code:
^(.+?:)
see if that works.. what lang is this btw?
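
A brief sketch of what the anchor changes, assuming Python's re module and one line of input at a time. With ^ in front, a match can only begin at the start of the line, so the later colons no longer produce extra matches. The TRIMMED variant is a hypothetical addition, not something proposed in the thread; it only shows one way to keep leading indentation out of the captured group.

```python
import re

# Anchored pattern from the post above.
ANCHORED = re.compile(r"^(.+?:)")

flat = "Ordering: Order by phone, Payment accepted: Cash, check"
indented = "      Online_Linkage: http://www."

# Only one match per line is possible, because ^ matches only at the start.
flat_matches = ANCHORED.findall(flat)
indented_matches = ANCHORED.findall(indented)

# Hypothetical variant (not from the thread): leading blanks sit outside the group.
TRIMMED = re.compile(r"^[ \t]*(.+?:)")
trimmed_matches = TRIMMED.findall(indented)

print(flat_matches)      # ['Ordering:']
print(indented_matches)  # ['      Online_Linkage:']
print(trimmed_matches)   # ['Online_Linkage:']
```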
 
Old 03-20-2006, 01:20 PM   #7
buldir
Member
 
Registered: Mar 2004
Location: Fairbanks, AK USA
Posts: 135

Original Poster
Rep: Reputation: 15
Thanks muha and xhi. This problem wasn't related to any specific language. I needed a regexp to highlight all elements in a metadata file using the program EditPad Pro.
Code:
^(.+?:)
works great. I couldn't use sed because the program only supports regular expressions. I was testing the regular expression in another program called Expresso, but because I forgot to check the "Multiline" option box, the "^" at the beginning of the regexp matched only at the start of the entire string, not at the start of every line. After I checked the option, the regexp that xhi suggested worked like a charm. Thanks again to you both.
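
The effect of the "Multiline" option can be sketched in Python, assuming that re.MULTILINE plays the same role as the checkbox in Expresso. The sample text is taken from earlier in the thread; the variable names are illustrative.

```python
import re

# Three lines from earlier in the thread, joined into one string.
text = (
    "Process_Description: environment to a 1:250,000 topographic basemap.\n"
    "Other_Citation_Details: 15 p., 1 sheet, scale 1:250,000\n"
    "Online_Linkage: http://www.\n"
)

# Without the flag, ^ matches only at the very start of the whole string.
without_flag = re.findall(r"^(.+?:)", text)

# With the flag, ^ also matches right after every newline.
with_flag = re.findall(r"^(.+?:)", text, flags=re.MULTILINE)

print(without_flag)  # ['Process_Description:']
print(with_flag)
# ['Process_Description:', 'Other_Citation_Details:', 'Online_Linkage:']
```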
 
  


All times are GMT -5.