LinuxQuestions.org
Old 03-15-2005, 02:52 PM   #1
pld
Member
 
Registered: Jun 2003
Location: Southern US
Distribution: Ubuntu 5.10
Posts: 206

Rep: Reputation: 30
regexp help ...


Hi all,

I'm still learning regular expressions, and I have a little project I want to use them on.

I have a raw HTML page that I want to pull a handful of components out of. The string I am searching for is something like:

<div class="myclass"...

Now, unfortunately, there is another place in the file where this occurs, but in the middle of a line instead of at the start of one:

<div id="something else"><div class="myclass"...

and what I want to grab has no other elements before it on its line:

<div class="myclass"

I believe there is whitespace before the tag on some of these lines. So what type of regexp would I use to single out the element I am looking for, with nothing else in front of it? Oh, and I'm grepping the file for these lines, then using awk later on for the rest of the data extraction...
 
Old 03-15-2005, 03:45 PM   #2
rose_bud4201
Member
 
Registered: Aug 2002
Location: St Louis, MO
Distribution: Xubuntu, RHEL, Solaris 10
Posts: 929

Rep: Reputation: 30
I would probably do something like

$ cat testfile | grep "whatever you're looking for" | grep "^<div class=\"myclass\""

Edit: I realized that I should probably add some more information.

I used cat because it's easier when chaining commands like this, and used grep again (instead of awk) mainly because I understand grep, and know next to nothing of awk. The regexp should be the same either way.
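To illustrate the point that the regexp carries over between tools, here is a minimal sketch that runs the same anchored pattern through grep and awk. The sample file contents are made up for the example and only stand in for the real page; grep reads the file directly here, so cat is optional.

```shell
# Build a small sample file (hypothetical content standing in for the real page).
tmpfile=$(mktemp)
printf '%s\n' \
  '<div class="myclass">wanted</div>' \
  '<div id="something else"><div class="myclass">nested</div></div>' \
  > "$tmpfile"

# grep can take the file name as an argument, so no cat is needed.
grep_hits=$(grep '^<div class="myclass"' "$tmpfile")

# The same anchored pattern used as an awk regex; the default action prints the line.
awk_hits=$(awk '/^<div class="myclass"/' "$tmpfile")

echo "grep: $grep_hits"
echo "awk:  $awk_hits"

rm -f "$tmpfile"
```

Single quotes around the pattern avoid having to backslash-escape the double quotes inside it.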

The ^ character specifies the beginning of the line. So ^stuff would find the word "stuff" in a file if and only if it occurred at the beginning of the line. It would not find "randomstuff", for example. ^random would match that line, however.
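Since the original question mentions possible whitespace in front of the tag, a plain ^ would miss indented lines. One way to allow for that, sketched here on an invented three-line sample file, is to put [[:space:]]* between the anchor and the tag so that zero or more spaces or tabs are accepted.

```shell
# Hypothetical sample: one plain line, one indented line, one nested occurrence.
sample=$(mktemp)
printf '%s\n' \
  '<div class="myclass">first</div>' \
  '    <div class="myclass">indented</div>' \
  '<div id="something else"><div class="myclass">nested</div></div>' \
  > "$sample"

# ^ anchors at the start of the line; [[:space:]]* allows optional leading whitespace.
anchored=$(grep '^[[:space:]]*<div class="myclass"' "$sample")
anchored_count=$(grep -c '^[[:space:]]*<div class="myclass"' "$sample")

# Without the anchor, the nested occurrence is picked up as well.
unanchored_count=$(grep -c '<div class="myclass"' "$sample")

echo "$anchored"
echo "anchored: $anchored_count, unanchored: $unanchored_count"

rm -f "$sample"
```

The nested line is rejected by the anchored pattern because something other than whitespace comes before the tag.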

Conversely, the $ character specifies the end of a line. So stuff$ would find the word "stuff" in a file if and only if it occurred at the end of the line. It _would_ find "stuff" in "randomstuff", but random$ would not match that line.
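The two anchors can be tried out side by side on a throwaway word list. This is only a sketch, and the three sample lines are invented for the demonstration.

```shell
# Hypothetical word list to try the anchors on.
words=$(mktemp)
printf '%s\n' 'stuff' 'randomstuff' 'stuff and more' > "$words"

# stuff$ matches lines that END with "stuff".
end_hits=$(grep 'stuff$' "$words")

# ^stuff matches lines that BEGIN with "stuff".
start_hits=$(grep '^stuff' "$words")

# random$ matches nothing here, since no line ends with "random".
# grep exits non-zero when there are no matches, hence the "|| true".
random_end_count=$(grep -c 'random$' "$words" || true)

# Using both anchors requires the whole line to be exactly "stuff".
exact_hits=$(grep '^stuff$' "$words")

echo "ends with stuff:   $end_hits"
echo "starts with stuff: $start_hits"
echo "ends with random:  $random_end_count"
echo "exactly stuff:     $exact_hits"

rm -f "$words"
```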

Hope that helps!

Last edited by rose_bud4201; 03-15-2005 at 03:50 PM.
 
  

