03-22-2011, 08:05 AM
#1
Member
Registered: Apr 2004
Location: oxford
Distribution: gentoo
Posts: 463
Java regex for links
Hello all,
I'm trying to extract the href of a <link> tag from an HTML page, but because some links have other attributes before the href, I can't seem to extract it. Do you have any idea how I can write this?
Link:
Code:
<link rel="stylesheet" type="text/css" media="screen,print" href="Home_files/Home.css" />
regex:
Code:
"(?i)<link\\s*href="
I'm trying to extract Home_files/Home.css. Thanks in advance.
trscookie
03-22-2011, 08:18 AM
#2
Senior Member
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
You need to extend the regex with a capturing group, then extract that group. This is from memory, so it's just a rough approach:
Code:
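// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern p = Pattern.compile("<link.*?href=\"(.*?)\".*?/>");
// Double quotes inside a Java string literal must be escaped as \"
Matcher m = p.matcher("<link rel=\"stylesheet\" type=\"text/css\" media=\"screen,print\" href=\"Home_files/Home.css\" />");
if (m.find()) {
    // group(1) is whatever (.*?) captured, i.e. the href value
    System.out.println(m.group(1));
}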
This should print the first group: in your case, the text between the two quotes after 'href' in a 'link' tag. Again, it's untested, but you should be able to adapt it so that it works.
Hope this helps,
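For reference, a self-contained version of the approach above that compiles and runs. The class and method names (`LinkHrefDemo`, `extractHref`) are made up for illustration, and the capture group uses `[^"]*` (anything except a double quote) instead of `.*`, which assumes the href value is double-quoted:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkHrefDemo {
    // Matches a <link ...> tag and captures its double-quoted href value.
    // [^>]*? skips other attributes (rel, type, media...) without leaving the tag;
    // [^"]* stops at the closing quote, so the group can't run past the attribute.
    private static final Pattern LINK_HREF =
            Pattern.compile("(?i)<link\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"");

    // Returns the first href found, or null if there is no match.
    static String extractHref(String html) {
        Matcher m = LINK_HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "<link rel=\"stylesheet\" type=\"text/css\" "
                + "media=\"screen,print\" href=\"Home_files/Home.css\" />";
        System.out.println(extractHref(tag)); // prints Home_files/Home.css
    }
}
```

The negated character class is a common alternative to a reluctant `.*?`: it states exactly which characters may appear in the value, so it cannot overshoot into a later attribute.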
03-22-2011, 12:25 PM
#3
Member
Registered: Apr 2004
Location: oxford
Distribution: gentoo
Posts: 463
Original Poster
Ah, I've got another quick question: for some reason my regex is skipping one image.
Regex:
Code:
"(?i)<img(.*?)src\\s*=\\s*[\"'](.*?)[\"']"
Finding images:
Code:
Image Found: Home_files/logonew.png
Image Found: Home_files/shapeimage_1.png
Image Found: Home_files/shapeimage_2.jpg
But it misses one of the images in this HTML:
Code:
:<img src="Home_files/logonew.png" alt="" style="border: none; height: 425px; width: 230px; " />
:<img usemap="#map1" id="shapeimage_1" src="Home_files/shapeimage_1.png"
style="border: none; height: 359px; left: -6px; position: absolute; top: -5px; width: 226px; z-index: 1; title="" />
<map name="map1" id="map1">
<area href="" title="" onmouseover="IMmouseover('shapeimage_1', '0');" alt=""
onmouseout="IMmouseout('shapeimage_1', '0');" shape="rect" coords="11, 207, 177, 225" /></map>
<img style="height: 18px; left: 5px; position: absolute; top: 202px; width: 166px; "
id="shapeimage_1_link_0" alt="shapeimage_1_link_0" src="Home_files/shapeimage_1_link_0.png" />
: <img src="Home_files/shapeimage_2.jpg" alt="" style="height: 301px; left: 0px; position: absolute; top: 0px; width: 723px; " />
Do you know why it wouldn't find the one highlighted in red (Home_files/shapeimage_1_link_0.png)?
trscookie.
Last edited by trscookie; 03-22-2011 at 12:28 PM.
03-22-2011, 12:54 PM
#4
Senior Member
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
The only thing I can see is that the tag spans two lines; it's possible the regex won't match across the line break. See if moving it all onto one line fixes it.
EDIT: I got home and tested it, and yes, the newline causes the problem.
Last edited by Snark1994; 03-22-2011 at 01:22 PM.
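A small sketch that reproduces the problem with the regex from post #3, unchanged. By default `.` in a Java regex does not match line terminators, so `(.*?)` between `<img` and `src` cannot cross the newline inside the tag. The class name `NewlineDemo` and the shortened tag are only for illustration:

```java
import java.util.regex.Pattern;

public class NewlineDemo {
    // The image regex from post #3, unchanged.
    private static final Pattern IMG_SRC =
            Pattern.compile("(?i)<img(.*?)src\\s*=\\s*[\"'](.*?)[\"']");

    static boolean found(String html) {
        return IMG_SRC.matcher(html).find();
    }

    public static void main(String[] args) {
        // Same (shortened) tag, once on one line and once split by a newline.
        String oneLine = "<img style=\"height: 18px; \" src=\"Home_files/shapeimage_1_link_0.png\" />";
        String twoLines = "<img style=\"height: 18px; \"\nsrc=\"Home_files/shapeimage_1_link_0.png\" />";
        System.out.println(found(oneLine));  // true
        System.out.println(found(twoLines)); // false: '.' does not match '\n'
    }
}
```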
03-22-2011, 01:09 PM
#5
Member
Registered: Apr 2004
Location: oxford
Distribution: gentoo
Posts: 463
Original Poster
I think you're right. I've changed it to:
Code:
"(?im)<img(.*?)src\\s*=\\s*[\"'](.*?)[\"']"
However, this doesn't seem to have fixed it. Are there any other options I can use?
03-22-2011, 01:38 PM
#6
Senior Member
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Try using 's' (DOTALL) instead.
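Why 'm' didn't help: `(?m)` (MULTILINE) only changes what `^` and `$` match, while `(?s)` (DOTALL, also available as the `Pattern.DOTALL` flag) lets `.` match line terminators. A sketch comparing the two on a shortened copy of the skipped tag; the class name `FlagDemo` is hypothetical:

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // Shortened copy of the skipped <img> tag, which spans two lines.
    static final String TAG = "<img style=\"height: 18px; \"\n"
            + "id=\"shapeimage_1_link_0\" src=\"Home_files/shapeimage_1_link_0.png\" />";

    // Prepends the given embedded flags to the image regex from post #3.
    static boolean matches(String flags) {
        String regex = flags + "<img(.*?)src\\s*=\\s*[\"'](.*?)[\"']";
        return Pattern.compile(regex).matcher(TAG).find();
    }

    public static void main(String[] args) {
        // MULTILINE: '.' still stops at '\n', so no match.
        System.out.println(matches("(?im)")); // false
        // DOTALL: (.*?) can now cross the line break.
        System.out.println(matches("(?is)")); // true
    }
}
```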
03-22-2011, 08:06 PM
#7
Member
Registered: Apr 2004
Location: oxford
Distribution: gentoo
Posts: 463
Original Poster
I've just worked out that it's because I have multiple occurrences on the same line. What's the best way to split that up?
Cheers again,
trscookie.
03-23-2011, 11:27 AM
#8
Senior Member
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Hm, reading the Matcher docs, it looks like you should call the find() method repeatedly to move on to the next match in the input string. It depends how your code is laid out, though (I'm guessing you're feeding the lines to the matcher one at a time?)
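The original code isn't shown in the thread, so this is only a sketch of what the suggestion could look like: one `Matcher` over the whole page, with `find()` called in a `while` loop so each call resumes after the previous match. It also adds `(?s)` so tags split over lines are found. The class name, method name and sample HTML (a shortened copy of the page from post #3, with two tags placed on one line) are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllImagesDemo {
    // Regex from post #3 with (?s) added so (.*?) may cross line breaks.
    private static final Pattern IMG_SRC =
            Pattern.compile("(?is)<img(.*?)src\\s*=\\s*[\"'](.*?)[\"']");

    // Shortened copy of the page; the last line holds two <img> tags.
    static final String SAMPLE_PAGE = String.join("\n",
            "<img src=\"Home_files/logonew.png\" alt=\"\" />",
            "<img usemap=\"#map1\" id=\"shapeimage_1\" src=\"Home_files/shapeimage_1.png\"",
            "style=\"border: none;\" title=\"\" />",
            "<map name=\"map1\" id=\"map1\"><area href=\"\" shape=\"rect\" /></map>",
            "<img style=\"height: 18px; \"",
            "id=\"shapeimage_1_link_0\" src=\"Home_files/shapeimage_1_link_0.png\" />"
                    + " <img src=\"Home_files/shapeimage_2.jpg\" alt=\"\" />");

    static List<String> findImageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        // Each find() continues after the previous match, so this collects
        // every <img>, including several on the same line.
        while (m.find()) {
            sources.add(m.group(2));
        }
        return sources;
    }

    public static void main(String[] args) {
        for (String src : findImageSources(SAMPLE_PAGE)) {
            System.out.println("Image Found: " + src);
        }
    }
}
```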
03-23-2011, 11:53 AM
#9
Member
Registered: Apr 2004
Location: oxford
Distribution: gentoo
Posts: 463
Original Poster
Hmm, I tried the .find() option but it would only find one per line. I've cheated a little and split the string at the end of each tag, like so: for(String tag : string.split(">")), but it seems to work. Cheers for your help!
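A sketch of the same tag-by-tag idea done with regexes only, instead of split(">"): first match each whole `<img ...>` tag with `[^>]*`, which (being a negated class) also matches newlines, so no `(?s)` is needed, then pull `src` out of that tag. The class and method names are hypothetical, and like split(">") it assumes no `>` appears inside an attribute value:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagThenAttributeDemo {
    // One whole <img ...> tag; [^>]* matches any character except '>', including '\n'.
    private static final Pattern IMG_TAG = Pattern.compile("(?i)<img\\b[^>]*>");
    // The src attribute inside a single tag, single- or double-quoted.
    private static final Pattern SRC_ATTR =
            Pattern.compile("(?i)\\bsrc\\s*=\\s*[\"']([^\"']*)[\"']");

    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher tags = IMG_TAG.matcher(html);
        while (tags.find()) {
            // Search for src only within the tag just matched.
            Matcher src = SRC_ATTR.matcher(tags.group());
            if (src.find()) {
                sources.add(src.group(1));
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<img style=\"height: 18px;\"\nsrc=\"a.png\" /> <img src='b.jpg'>";
        System.out.println(imageSources(html)); // [a.png, b.jpg]
    }
}
```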
03-24-2011, 10:45 AM
#10
Senior Member
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
No problem. It was nice to dig out Java again.