LinuxQuestions.org
Old 09-06-2008, 02:41 PM   #1
borinus
LQ Newbie
 
Registered: Sep 2008
Posts: 0

Rep: Reputation: 0
Matching two wildcards with perl and regex


Hi,

I am new to regex and need some help.

I have been trying to write a regular expression for two hours without success. What am I missing?

I have a string like this, stored in the $content variable:

(text...);PADDING-BOTTOM: 4px;"><b><font color=#CC0000>Type</font></b>:<br>&nbsp;<a class=type href="ps.asp?order=yes">PlayStation 3</a>(...text)

and I would like to extract the value "PlayStation 3" from it. I need a regex something like this:

Type</font></b>:<br>&nbsp;<a class=* href=*>(.*?)</a>

so the literal "href" has to stay in the pattern.

I have written this code:

if ($content =~ /Type<\/font><\/b>:<br>&nbsp;<a class=.+ href=.+>(.*?)<\/a><br>/) {
$platform = $1;
}

but it gives "Namco" as the result. Namco is on the following line. Why does it return that one? The regex starts with "Type", not "Producer". What should I do?

Producer</font></b>:<br>&nbsp;<a class=producer href="pd.asp?name=namco">Namco</a>

Thanks,
 
Old 09-08-2008, 08:32 PM   #2
nadroj
Senior Member
 
Registered: Jan 2005
Location: Canada
Distribution: ubuntu
Posts: 2,539

Rep: Reputation: 59
In your $content variable, I'm assuming you are escaping the double quotes (i.e. \"), even though the string you posted shows them unescaped.

For your "Namco" line, the expression is not giving incorrect output; it is not matching at all. Notice that the regex ends with "<br>", which is not present in your "Producer...Namco" line, so that string never matches. Since the if block is not executed in that case (and $platform is not set to $1), $platform must already have held the value "Namco". To see this, set $platform to something else, or to an empty string, before the match; when you print it afterwards, it will not have changed.
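
A minimal sketch of the stale-value effect described above. It uses the Producer line exactly as quoted in the question and the asker's pattern unchanged; the assumption that an earlier step left "Namco" in $platform, and the helper names $re and $without_reset, are illustrative only.

```perl
use strict;
use warnings;

# The Producer line as quoted in the question; note there is no <br> after </a>.
my $content = 'Producer</font></b>:<br>&nbsp;<a class=producer href="pd.asp?name=namco">Namco</a>';

# The asker's pattern, unchanged.
my $re = qr/Type<\/font><\/b>:<br>&nbsp;<a class=.+ href=.+>(.*?)<\/a><br>/;

# Assumption: an earlier step already left "Namco" in $platform.
my $platform = 'Namco';
if ($content =~ $re) {
    $platform = $1;
}
my $without_reset = $platform;   # the old value survives the failed match
print "without reset: '$without_reset'\n";

# Clear the variable first so that a failed match becomes visible.
$platform = '';
if ($content =~ $re) {
    $platform = $1;
}
print "with reset:    '$platform'\n";
```

If the second line prints an empty value, the "Namco" seen earlier came from somewhere other than this match.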

Whenever I am debugging code, I use print statements to check whether something is (or isn't) happening correctly. A simple check here would be to put a print "matched!"; line in the if block to see when the regex matches and when it does not.
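
The print-statement check could look roughly like this. The sample string is the "Type" excerpt from the question with the elided text left out, so it is only an assumption about what $content really holds; the $matched flag is a hypothetical helper added for illustration.

```perl
use strict;
use warnings;

# The "Type" excerpt as quoted in the question, without the elided text.
my $content = ';PADDING-BOTTOM: 4px;"><b><font color=#CC0000>Type</font></b>:<br>&nbsp;<a class=type href="ps.asp?order=yes">PlayStation 3</a>';

my $platform = '';
my $matched  = 0;

# The asker's pattern, unchanged, with a print in each branch.
if ($content =~ /Type<\/font><\/b>:<br>&nbsp;<a class=.+ href=.+>(.*?)<\/a><br>/) {
    print "matched!\n";
    $matched  = 1;
    $platform = $1;
} else {
    print "no match\n";
}
print "platform: '$platform'\n";
```

On the excerpt as quoted, the else branch runs, because nothing in it follows </a> with <br>. The real page may well differ, which is exactly what the print statements are meant to reveal.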

hope it helps

Last edited by nadroj; 09-08-2008 at 08:34 PM.
 
Old 09-09-2008, 04:27 AM   #3
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
Parsing HTML and XML is difficult. In the long run, you will probably save yourself headaches if you use an HTML/XML parsing library rather than trying to do it all with regular expressions.
 
Old 09-09-2008, 05:04 AM   #4
keefaz
Senior Member
 
Registered: Mar 2004
Distribution: Slackware
Posts: 4,617

Rep: Reputation: 136
I would just test for a match on Type and then strip everything except the link text, like this:
Code:
if ($content =~ /Type/) {
    $platform = $content;
    # Keep only the link text; the lazy (.*?) stops at the first </a>.
    $platform =~ s/.*<a.+order=yes">(.*?)<\/a>.*/$1/;
    print "platform: $platform\n";
}
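
A related sketch that captures the value in a single match. It assumes the page arrives as one long line with both links on it; the sample string below is hypothetical, stitched together from the two fragments in the question with a <br> added after each </a>. Under that assumption the greedy .+ in the original pattern runs on to the last link, while character classes that cannot cross a tag boundary stay inside the first one.

```perl
use strict;
use warnings;

# Hypothetical one-line sample built from the two fragments in the question.
# The <br> after each </a> is an assumption, not something the question shows.
my $content =
      '<b><font color=#CC0000>Type</font></b>:<br>&nbsp;'
    . '<a class=type href="ps.asp?order=yes">PlayStation 3</a><br>'
    . '<b><font color=#CC0000>Producer</font></b>:<br>&nbsp;'
    . '<a class=producer href="pd.asp?name=namco">Namco</a><br>';

# Original pattern: each greedy .+ grabs as much of the line as it can.
my ($greedy) = $content =~ m{Type</font></b>:<br>&nbsp;<a class=.+ href=.+>(.*?)</a><br>};

# Bounded pattern: \S+ stops at whitespace, [^>]+ stops at the end of the tag,
# and [^<]* stops at the start of the closing tag.
my ($platform) = $content =~ m{Type</font></b>:<br>&nbsp;<a class=\S+ href=[^>]+>([^<]*)</a>};

print "greedy:  $greedy\n";
print "bounded: $platform\n";
```

If $content actually contains newlines between the two links, the greedy version behaves differently, since . does not match a newline without the /s modifier.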
Is this a script for downloading thousands of PS games?

Last edited by keefaz; 09-09-2008 at 05:06 AM.
 
  


All times are GMT -5.