Old 05-11-2005, 04:29 PM   #1
lowpro2k3
Member
 
Registered: Oct 2003
Location: Canada
Distribution: Slackware
Posts: 340

Rep: Reputation: 30
Perl/regexp help... - query string parsing...


I have the unlucky situation of having taken over control at a small company whose CTO quit. I'm the only programmer (I'm a 20-year-old comp sci student!) and I have this massive, undocumented code base to figure out. Fun job for anyone, I know.

Anyway, I'm looking at one particular web page I have to change. I'm really not as good at Perl as I am in other languages, so I'm trying to figure out what the old programmer was doing to break apart this query string. If someone could explain this little code block I would really appreciate it. The same block, or similar ones, are all over the site; it must be on 20 pages. If it makes a difference, I've more or less worked out that we're using mod_perl and/or Mason, but we still use a regular shebang line ... Anyway, some lines/pieces I understand completely, others I have NO clue about. Please read my comments...

Code:
use URI::Escape;


my ($erl, $shname);
my ($buffer, @NVPairs, $NameValue, $Name, $Value, %var, $encshname);

if($ENV{REQUEST_METHOD} eq 'GET') {
    $buffer = $ENV{QUERY_STRING};   # I fully understand this...
} else {
    read(STDIN, $buffer, $ENV{CONTENT_LENGTH});   # But have no clue about this
}



# I get this, break up the query string using ampersand as delimiter
@NVPairs = split(/&/, $buffer);

# Now loop through the options and do something??? I think he's sanitizing the data...
foreach $NameValue (@NVPairs) {
    ($Name, $Value) = split(/=/, $NameValue);  # Understand this part...
    $Value =~ tr/+/ /;   # Huh? Replace '+' with ' '??? I'm lost...

    # I don't really understand this too well, especially the pack() function
    $Value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;

    $var{$Name} = $Value;
}



$erl = $var{url};
$shname = uc($var{comp});
$encshname = uri_escape("$shname");
$encshname =~ s/'/%27/g;    # umm... what?
If anyone can help with all or parts of that code it would be a huge help. It's really critical that I understand this. I didn't even want to take the 20 minutes to write this, but I needed to. I'll keep researching; I've learned some parts, but a Perl guru walking me through it would help 10x more.

Thanks
 
Old 05-11-2005, 04:52 PM   #2
puffinman
Member
 
Registered: Jan 2005
Location: Atlanta, GA
Distribution: Gentoo, Slackware
Posts: 217

Rep: Reputation: 30
Code:
read(STDIN, $buffer, $ENV{CONTENT_LENGTH});
This reads from standard input the number of bytes given by the CONTENT_LENGTH environment variable and puts the result in $buffer.
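
A minimal sketch of that GET/POST branch, for illustration only. The helper name `read_request_body`, the hash reference standing in for %ENV, and the in-memory filehandle standing in for STDIN are my own assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the raw, still-encoded parameter string.
# $env is a hash ref shaped like %ENV; $in is the handle that carries the
# request body (STDIN under plain CGI).
sub read_request_body {
    my ($env, $in) = @_;
    my $method = defined $env->{REQUEST_METHOD} ? $env->{REQUEST_METHOD} : '';

    # GET: the parameters arrive in the URL and show up as QUERY_STRING.
    if ($method eq 'GET') {
        return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
    }

    # Anything else (typically POST): read exactly CONTENT_LENGTH bytes.
    my $length = $env->{CONTENT_LENGTH} || 0;
    my $buffer = '';
    read($in, $buffer, $length) if $length > 0;
    return $buffer;
}

# Simulate a POST body with an in-memory filehandle.
my $body = 'a=1&b=2';
open(my $fake_stdin, '<', \$body) or die "cannot open in-memory handle: $!";
my $raw = read_request_body(
    { REQUEST_METHOD => 'POST', CONTENT_LENGTH => length $body },
    $fake_stdin,
);
print "$raw\n";
```

In a real CGI script the call would be `read_request_body(\%ENV, \*STDIN)`.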

Code:
$Value =~ tr/+/ /;
Yeah, this is replacing + with a space. URLs can't have spaces in them, so form data encodes them as pluses instead.


Code:
$Value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
This goes through the $Value string and replaces each %XX escape (a percent sign followed by two hexadecimal digits) with the character it stands for. So %5A is the letter Z. It's a little bit obfuscated here: he's using the saved value from the parenthesized capture ($1) and converting it to a number, so hex("5A") is 90. Then pack with the "C" template turns that number into a single byte, which is Z in ASCII.
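
The same conversion done one step at a time, as a small sketch. The variable names are mine, chosen for illustration.

```perl
use strict;
use warnings;

my $digits = '5A';               # what the capture group $1 holds for "%5A"
my $code   = hex($digits);       # hex string -> number
my $char   = pack("C", $code);   # number -> one unsigned byte

print "$digits -> $code -> $char\n";   # 5A -> 90 -> Z

# For values 0..255, chr() gives the same result as pack("C", ...).
print "chr() agrees\n" if $char eq chr($code);
```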

Code:
$encshname =~ s/'/%27/g;
This is the opposite of the last one: it encodes instead of decoding. The ' character is 27 in hexadecimal (39 in decimal) in ASCII, so every ' becomes %27.
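
A short sketch of the encoding direction, computing the %27 instead of hard-coding it. The helper name `escape_apostrophes` is hypothetical and the sample string is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode each apostrophe.
sub escape_apostrophes {
    my ($text) = @_;
    # ord("'") is 39; "%%" is a literal percent sign and %02X prints 39 as "27".
    $text =~ s/(')/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

print escape_apostrophes("O'BRIEN & SONS"), "\n";   # O%27BRIEN & SONS
```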
 
Old 05-11-2005, 04:57 PM   #3
mcosta
Member
 
Registered: Jan 2005
Location: Spain
Distribution: Debian
Posts: 44

Rep: Reputation: 15
It's not too difficult if you know a little about HTTP.

Thinking in CGI:

When you request /cgi-bin/page.pl?a=1&b=2 you do a GET request. The parameters are in the URL.
But you can also request /cgi-bin/page.pl and send a=1&b=2 afterwards in the request body. Then you do a POST. You can imagine the browser telling this to the web server:

POST /cgi-bin/page.pl HTTP/1.1
agent: foo
accept: bar
Content-Length: 7

a=1&b=2

The web server hands that body to the script on stdin and puts its length in CONTENT_LENGTH, so you read it via stdin. OK? That's easy.

Now comes the +. A URL can't have spaces inside, so what do you do when you want to send one? Encode it as '+'. And when you want to send a '+'? Encode it as %2B. Yeah, it's really fun.

foo bar: foo+bar
foo+bar: foo%2Bbar
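
The two examples above can be checked with a small sketch. The helper name `decode_value` is hypothetical. Note the order: the '+' translation has to run before the %XX decoding, otherwise a decoded %2B would be turned into a space as well.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one form-encoded value.
sub decode_value {
    my ($value) = @_;
    $value =~ tr/+/ /;                                    # 1. '+' means space
    $value =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/eg;  # 2. %XX -> byte
    return $value;
}

print decode_value('foo+bar'), "\n";     # foo bar
print decode_value('foo%2Bbar'), "\n";   # foo+bar
```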
 
Old 05-11-2005, 05:05 PM   #4
mcosta
Member
 
Registered: Jan 2005
Location: Spain
Distribution: Debian
Posts: 44

Rep: Reputation: 15
I forgot the:

s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg

Do you remember the + as %2B? Sure. 2B is the ASCII hex value of the '+' char. %([\dA-Fa-f][\dA-Fa-f]) is the regex for a percent sign followed by a two-digit hexadecimal number. You'll surely see it if I rewrite it as:

%([0-9A-Fa-f][0-9A-Fa-f])

The \d matches a decimal digit.

Now be a good engineer and RTFM about pack(). perldoc is your friend.
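
A quick sketch to see what the regex does and does not match; the helper name `percent_decode` and the sample inputs are made up for illustration. Anything that is not a percent sign followed by exactly two hex digits is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: decode only well-formed %XX escapes.
sub percent_decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/eg;
    return $text;
}

# Upper and lower case hex digits both match; "%zz" and a lone "%" do not.
for my $input ('%2B', '%2b', '%zz', '100%') {
    print "$input -> ", percent_decode($input), "\n";
}
```

`perldoc -f pack` prints the documentation for pack(), including the "C" template used here.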
 
Old 05-11-2005, 05:18 PM   #5
AltF4
Member
 
Registered: Sep 2002
Location: .at
Distribution: SuSE, Knoppix
Posts: 532

Rep: Reputation: 31
Re: Perl/regexp help... - query string parsing...

Code:
use URI::Escape;


my ($erl, $shname);
my ($buffer, @NVPairs, $NameValue, $Name, $Value, %var, $encshname);

if($ENV{REQUEST_METHOD} eq 'GET') {
    $buffer = $ENV{QUERY_STRING};   # I fully understand this...
} else {
    read(STDIN, $buffer, $ENV{CONTENT_LENGTH});   # But have no clue about this

## depending on REQUEST_METHOD the CGI parameters
## are either in the QUERY_STRING parameter or are delivered
## on STDIN

}


# I get this, break up the query string using ampersand as delimiter
@NVPairs = split(/&/, $buffer);

# Now loop through the options and do something??? I think he's sanitizing the data...
foreach $NameValue (@NVPairs) {
    ($Name, $Value) = split(/=/, $NameValue);  # Understand this part...
    $Value =~ tr/+/ /;   # Huh? Replace '+' with ' '??? I'm lost...

## Blanks in CGI parameters are escaped as '+'
## this converts back to blanks


    # I don't really understand this too well, especially the pack() function
    $Value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;

## decode %XX escapes ('%00' - '%FF' hex values, used in URLs for
## reserved and non-printable characters) back to the original values


    $var{$Name} = $Value;
}



$erl = $var{url};
$shname = uc($var{comp});
$encshname = uri_escape("$shname");
$encshname =~ s/'/%27/g;    # umm... what?

## replace single quotes (') with their HEX representation
## (probably to avoid SQL injection attacks)
## read more: http://www.unixwiz.net/techtips/sql-injection.html

Hope this helps
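
Putting the pieces of the thread together, here is a minimal sketch of the whole parsing loop as one sub. It is not the original author's code: the name `parse_query` is hypothetical, the example.com URL is made up, and it differs from the posted block in two places (it also decodes the parameter names, and it limits the split on '=' to two fields so a value containing '=' survives).

```perl
use strict;
use warnings;

# Hypothetical helper: turn "a=1&b=2" into a hash of decoded values.
sub parse_query {
    my ($buffer) = @_;
    my %var;

    for my $pair (split /&/, defined $buffer ? $buffer : '') {
        # The limit of 2 keeps any '=' inside the value intact.
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;

        # Same two steps as the original loop, applied to name and value.
        for ($name, $value) {
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/eg;
        }
        $var{$name} = $value;   # a repeated name overwrites the earlier one
    }
    return %var;
}

my %var = parse_query('url=http%3A%2F%2Fexample.com%2F&comp=acme+corp');
print "$var{url}\n";          # http://example.com/
print uc($var{comp}), "\n";   # ACME CORP
```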
 
  

