LinuxQuestions.org
Old 05-13-2008, 12:45 PM   #1
s0l1dsnak3123
LQ Newbie
 
Registered: Jan 2008
Distribution: Ubuntu Hardy Heron
Posts: 18

Rep: Reputation: 0
[perl] getting a URL from an input box in HTML


I am trying to figure out how to strip an HTML scalar (its contents are below) so that I get just the URL.

[HTML]<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<meta name="keywords" content="picboost, Images, Free, Upload" />
<meta name="description" content="picboost - the simplest way to host your images" />

<meta name="rating" content="general" />
<meta name="author" content="Henry Legge" />
<meta name="copyright" content="Copyright 2008 PicBoost.com" />
<meta http-equiv="Content-Language" content="en-GB" />


<base href="http://picboost.com/" />

<link href="css/style.css" rel="stylesheet" type="text/css" />

<link rel="icon" href="imgs/favicon.ico" type="image/x-icon" />
<link rel="shortcut icon" href="imgs/favicon.ico" type="image/x-icon" />

<title>Picboost - The simplest way to host your images</title>

<script src="http://mint.picboost.com/?js" type="text/javascript"></script>
<script src="/mint/?js" type="text/javascript"></script>

</head>

<body>
<div id="container">
<h1><span>picboost</span></h1>

<div class="box">
<div class="box-top"></div>
<div class="box-main">
<div id="step2">
<h2 id="s2"><span>Distribute Your File</span></h2>


<span>Upload complete!</span>
<p>
<input type="text" name="direct" value="http://picboost.com/images/2008/May/13/wallpaper.gif" maxlength="" size="" id="direct" onclick="this.select()" />
<label for="direct">direct link</label> </p>
<p>
<input type="text" name="bbcode" value="http://picboost.com/images/2008/May/13/wallpaper.gif" maxlength="" size="" id="bbcode" onclick="this.select()" />
<label for="bbcode">bbcode</label> </p>

<p>
<input type="text" name="html" value="&lt;a href=&quot;http://picboost.com/&quot;&gt;&lt;img src=&quot;http://picboost.com/images/2008/May/...aper.gif&quot; alt=&quot;PicBoost Image&quot; /&gt;&lt;/a&gt;" maxlength="" size="" id="html" onclick="this.select()" />
<label for="html">html</label> </p>

<br /><a href="/">Upload another?</a>
</div>
</div>
</div>

<div id="footer">
<div id="credit">
Copyright 2008 PicBoost.com<br />
Created by <a href="http://henrylegge.com">Henry Legge</a><br />
Designed by <a href="http://pixelspread.com">Pixelspread</a><br />
<a href="tos">Terms of Service</a>

</div>

<div id="ad">
<img src="/ad.png" alt="Ad" />
</div>
</div>
</div>
</body>
</html>[/HTML]

I want to strip everything away apart from this chunk (please note that the URL will change every time the script is run):
Code:
<input type="text" name="direct" value="http://picboost.com/images/2008/May/13/wallpaper.gif" maxlength="" size="" id="direct" onclick="this.select()" />
Then I want to get rid of everything apart from:
Code:
http://picboost.com/images/2008/May/13/wallpaper.gif
This is the code I have so far:

Code:
$response = $response->content;
if ( $response =~ m/<span>Upload complete\!<\/span>/ ){
	$conn->privmsg($conn->{channel}, "Upload complete! link at:");
	$URL = $response;
	$URL =~ s/^((.)*)(\<input type=\"text\" name=\"direct\" value=\")//ig;
	$URL =~ s/"(.+)//ig;
	print $URL;
}
When I run that code I get:

Code:
<!DOCTYPE html PUBLIC 
<html xmlns=
<head>
	<meta http-equiv=

	<meta name=
	<meta name=
	
	<meta name=
	<meta name=
	<meta name=
	<meta http-equiv=
	
	<base href=
	
	<link href=

	<link rel=
	<link rel=
	
	<title>Picboost - The simplest way to host your images</title>
	
	<script src=
<script src=

<body>
<div id=
	<h1><span>picboost</span></h1>

	<div class=
		<div class=
		<div class=
			<div id=
				<h2 id=
				
				<span>Upload complete!</span>
				<p>
					<input type=
 <label for=
				<p>
					<input type=
 <label for=
				<p>
					<input type=
 <label for=
				
				<br /><a href=
			</div>
		</div>
	</div>
	<div id=
		<div id=
			Copyright 2008 PicBoost.com<br />
			Created by <a href=
			Designed by <a href=
			<a href=
		</div>
		
		<div id=
			<img src=
		</div>
	</div>
</div>
</body>
I think I am pretty close to what I intend to do; I just need a little push in the right direction. I have been using this tool: http://regex.larsolavtorvik.com/ to check my regex, and it seems to work there.

Thanks in advance,
s0l1dsnak3123
 
Old 05-14-2008, 03:21 AM   #2
Su-Shee
Member
 
Registered: Sep 2007
Location: Berlin
Distribution: Slackware
Posts: 510

Rep: Reputation: 53
Why don't you just skip straight to the URL you want?

Something like this:

Code:
$url =~ m/value\=\"(http:\/\/.+\.(?:png|jpg|gif))/g;
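Match value=", begin capturing, match http:// followed by one or more characters (. matches any character except a newline, not only word chars), followed by a dot and a non-capturing group for png, jpg or gif, end capturing, do it globally. After a successful match the URL is in $1.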
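
To make that concrete, here is a minimal sketch of the same idea run against a trimmed copy of the page from post #1. The helper name extract_direct_url is made up for illustration, and the tighter pattern assumes that name="direct" appears before value= inside the tag, as it does in the posted HTML.

```perl
use strict;
use warnings;

# Trimmed-down stand-in for the page content from post #1.
my $response = <<'HTML';
<p>
<input type="text" name="direct" value="http://picboost.com/images/2008/May/13/wallpaper.gif" maxlength="" size="" id="direct" onclick="this.select()" />
<label for="direct">direct link</label> </p>
<p>
<input type="text" name="bbcode" value="http://picboost.com/images/2008/May/13/wallpaper.gif" maxlength="" size="" id="bbcode" onclick="this.select()" />
<label for="bbcode">bbcode</label> </p>
HTML

# The pattern from the reply, used in list context without /g:
# the capture is returned directly and the first match in the string wins.
my ($first) = $response =~ m/value="(http:\/\/.+\.(?:png|jpg|gif))/;

# Hypothetical helper (not from the thread): tie the match to the
# name="direct" field and stop at the closing quote with [^"]+ instead
# of relying on a greedy .+ and a fixed list of file extensions.
# Assumes name= comes before value= inside the tag.
sub extract_direct_url {
    my ($html) = @_;
    my ($url) = $html =~ m{<input\b[^>]*\bname="direct"[^>]*\bvalue="(http://[^"]+)"}i;
    return $url;    # undef when the field is not present
}

print "reply pattern: $first\n" if defined $first;
my $direct = extract_direct_url($response);
print "direct field:  $direct\n" if defined $direct;
```

Both variants should print the same wallpaper.gif URL for this input. The second one is a bit more defensive if the page ever has another value="http://..." attribute ahead of the direct link.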

Something like that.
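
As for why the code in post #1 prints most of the page: without the /s modifier, . does not match a newline, and without /m, ^ only matches at the very start of the string. So the first substitution never reaches the input line and changes nothing, while the second one, with /g, cuts every line off at its first double quote. Below is a minimal sketch of the original two-step approach with /s added. The sample page is trimmed, and it assumes the name="direct" field appears exactly once.

```perl
use strict;
use warnings;

# Trimmed-down stand-in for $response->content from post #1.
my $response = <<'HTML';
<html>
<body>
<span>Upload complete!</span>
<p>
<input type="text" name="direct" value="http://picboost.com/images/2008/May/13/wallpaper.gif" maxlength="" size="" id="direct" onclick="this.select()" />
<label for="direct">direct link</label> </p>
</body>
</html>
HTML

my $URL = $response;

# /s lets . match newlines, so ^.* can consume every line before the field.
$URL =~ s/^.*<input type="text" name="direct" value="//is;

# Drop the closing quote and everything after it, newlines included.
$URL =~ s/".*//s;

print "$URL\n";
```

This keeps the strip-away style of the original code. Capturing the URL directly, as suggested above, is usually shorter and easier to read.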
 
  



Tags
html, perl, regex


