LinuxQuestions.org

lowpro2k3 05-11-2005 04:29 PM

Perl/regexp help... - query string parsing...
 
I have the unlucky situation of having taken over control at a small company whose CTO quit. I'm the only programmer (I'm a 20-year-old comp sci student!) and I have this massive, undocumented code base to figure out. Fun job for anyone, I know :(

Anyways, I'm looking at this one particular webpage I'm forced to change. I'm really not that good at Perl compared to other languages, so I'm trying to figure out what the old programmer was doing to break apart this query string. If someone could explain this little code block I would really appreciate it. The same block, or similar blocks, are all over the site; it must be on 20 pages. If it makes a difference, I've kinda worked out that we're using mod_perl and/or Mason, but we still use a regular shebang line... Anyways, some lines/pieces I understand completely, others I have NO clue about. Please read my comments...

Code:

use URI::Escape;


my ($erl, $shname);
my ($buffer, @NVPairs, $NameValue, $Name, $Value, %var, $encshname);

if($ENV{REQUEST_METHOD} eq 'GET') {
    $buffer = $ENV{QUERY_STRING};  # I fully understand this...
} else {
    read(STDIN, $buffer, $ENV{CONTENT_LENGTH});  # But have no clue about this
}



# I get this, break up the query string using ampersand as delimiter
@NVPairs = split(/&/, $buffer);

# Now loop through the options and do something??? I think he's sanitizing the data...
foreach $NameValue (@NVPairs) {
    ($Name, $Value) = split(/=/, $NameValue);  # Understand this part...
    $Value =~ tr/+/ /;  # Huh? Replace '+' with ' '??? Im lost...

    # I dont really understand this too well, especially the pack() function
    $Value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;

    $var{$Name} = $Value;
}



$erl = $var{url};
$shname = uc($var{comp});
$encshname = uri_escape("$shname");
$encshname =~ s/'/%27/g;    # umm... what?

If anyone can help with all or parts of that code it would be a huge help. It's really critical that I understand this. I didn't even want to take the 20 minutes to write this, but I needed to. I'll keep researching, and I've learned some parts, but a Perl guru walking me through it would help 10x more.

Thanks :)

puffinman 05-11-2005 04:52 PM

Code:

read(STDIN, $buffer, $ENV{CONTENT_LENGTH});
This reads from standard input the number of bytes given by the CONTENT_LENGTH environment variable and puts the result in $buffer. For a POST request, that is where the web server hands the form data to the script.
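If you want to poke at that read() call without a web server, here is a minimal sketch. It assumes nothing about the real site: the form body is made up, and an in-memory filehandle ($fake_stdin, a name invented for this example) stands in for STDIN.

```perl
use strict;
use warnings;

# Simulate what the web server does for a POST: the form body arrives on
# STDIN and its byte length is passed in the CONTENT_LENGTH variable.
# The body text here is invented for the example.
my $body = 'url=http%3A%2F%2Fexample.com&comp=acme+inc';
$ENV{CONTENT_LENGTH} = length $body;

# An in-memory filehandle standing in for STDIN (an assumption of this sketch).
open(my $fake_stdin, '<', \$body) or die "cannot open in-memory handle: $!";

# Same call shape as in the original script: read N bytes into $buffer.
my $buffer = '';
my $bytes_read = read($fake_stdin, $buffer, $ENV{CONTENT_LENGTH});
close $fake_stdin;

print "read $bytes_read bytes: $buffer\n";
```

read() returns the number of bytes it actually got, so comparing that against CONTENT_LENGTH is a cheap way to spot a truncated request.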

Code:

$Value =~ tr/+/ /;
Yeah, this is replacing + with a space. URLs can't have spaces in them, so form data encodes a space as a plus instead.


Code:

$Value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg
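This goes through the $Value string and replaces each %XX escape (a percent sign followed by two hexadecimal digits) with the character it stands for. So %5A is the letter Z. It's a little bit obfuscated here: he's using the saved value from the parenthesized capture ($1) and converting it to a number, so hex("5a") is 90. Then the pack as a character makes it into a Z in ASCII. The /e flag makes Perl evaluate the replacement as code instead of treating it as a string, and /g applies it to every match in the value.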
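Broken into single steps, the substitution looks like this. It's only a sketch for experimenting; the input string is made up, and chr() is shown next to pack("C", ...) because the two give the same result for values 0 to 255.

```perl
use strict;
use warnings;

# Step by step for a single escape.
my $hex_digits = '5A';
my $code       = hex($hex_digits);     # hex string -> number 90
my $char       = pack('C', $code);     # number -> one-byte character 'Z'
my $same       = chr($code);           # chr() does the same job here

print "$hex_digits -> $code -> $char ($same)\n";

# The full substitution from the script, applied to an invented value.
my $value = 'Fi%5A%5A%21';
(my $decoded = $value) =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;

print "$decoded\n";    # FiZZ!
```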

Code:

$encshname =~ s/'/%27/g;
This is the opposite of the last one: it encodes instead of decoding. The ' character is 27 in hexadecimal (39 decimal) in ASCII, so every single quote in the string becomes %27.
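A quick way to check the 27 yourself, plus the same substitution on a made-up company name. This sketch deliberately leaves out uri_escape so it runs without the URI module installed; the name "o'brien's" is invented.

```perl
use strict;
use warnings;

# ord() gives the character code; sprintf formats it as a %XX escape.
my $escape = sprintf('%%%02X', ord("'"));
print "$escape\n";     # %27

# Same substitution as in the original script, on an invented name
# that has been upper-cased the way the script does with uc().
my $shname = uc("o'brien's");
(my $encoded = $shname) =~ s/'/%27/g;
print "$encoded\n";    # O%27BRIEN%27S
```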

mcosta 05-11-2005 04:57 PM

It's not too difficult if you know a little about HTTP.

Thinking in CGI:

When you request /cgi-bin/page.pl?a=1&b=2 you do a GET request. The parameters are in the URL.
But you can also request /cgi-bin/page.pl and send a=1&b=2 afterwards, in the request body. Then you do a POST. You can imagine the browser telling this to the web server:

POST /cgi-bin/page.pl HTTP/1.1
agent: foo
accept: bar

a=1&b=2

And then the script reads this via stdin. OK? That's easy.

Now comes the +. A URL can't have spaces in it, so what do you do when you want to send one? Encode it as '+'. And when you want to send a '+'? Encode it as %2B. Yeah, it's really fun.

foo bar: foo+bar
foo+bar: foo%2Bbar
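That's also why the script does the tr/+/ / before the %XX decoding and not after. A small sketch to show the difference; both sub names are invented for this example, and chr(hex($1)) is used in place of pack("C", hex($1)), which gives the same result here.

```perl
use strict;
use warnings;

# Correct order: turn '+' into a space first, then decode %XX escapes.
sub decode_value {
    my ($value) = @_;
    $value =~ tr/+/ /;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $value;
}

# Wrong order, for comparison: a decoded %2B is then turned into a space.
sub decode_value_wrong_order {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    $value =~ tr/+/ /;
    return $value;
}

print decode_value('foo+bar'), "\n";                  # foo bar
print decode_value('foo%2Bbar'), "\n";                # foo+bar
print decode_value_wrong_order('foo%2Bbar'), "\n";    # foo bar (the '+' is lost)
```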

mcosta 05-11-2005 05:05 PM

I forgot the:

s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg

Do you remember the + as %2B? Sure. 2B is the ASCII hex value of the '+' char. %([\dA-Fa-f][\dA-Fa-f]) is the regex for a percent sign followed by a two-digit hexadecimal number. You'll see it if I rewrite it as:

%([0-9A-Fa-f][0-9A-Fa-f])

The \d matches a decimal digit.
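If you don't trust me, try both spellings side by side. This is just a sketch with a few invented test strings; $with_d and $with_range are names made up for the example.

```perl
use strict;
use warnings;

# The pattern from the script and the spelled-out version.
my $with_d     = qr/%([\dA-Fa-f][\dA-Fa-f])/;
my $with_range = qr/%([0-9A-Fa-f][0-9A-Fa-f])/;

# Two valid escapes, one with non-hex letters, one that is too short.
for my $input ('%2B', 'a%7e', '%zz', '%5') {
    my $first  = $input =~ $with_d     ? $1 : 'no match';
    my $second = $input =~ $with_range ? $1 : 'no match';
    print "$input: $first / $second\n";
}
```

For plain ASCII input like a query string the two patterns behave the same. On Unicode strings \d can also match digits outside 0-9, so the [0-9] spelling is the stricter of the two.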

Now be a good engineer and RTFM about pack(). perldoc is your friend (try perldoc -f pack).

AltF4 05-11-2005 05:18 PM

Re: Perl/regexp help... - query string parsing...
 
Code:

use URI::Escape;


my ($erl, $shname);
my ($buffer, @NVPairs, $NameValue, $Name, $Value, %var, $encshname);

if($ENV{REQUEST_METHOD} eq 'GET') {
    $buffer = $ENV{QUERY_STRING};  # I fully understand this...
} else {
    read(STDIN, $buffer, $ENV{CONTENT_LENGTH});  # But have no clue about this

## depending on REQUEST_METHOD the CGI parameters
## are either in the QUERY_STRING parameter or are delivered
## on STDIN

}


# I get this, break up the query string using ampersand as delimiter
@NVPairs = split(/&/, $buffer);

# Now loop through the options and do something??? I think he's sanitizing the data...
foreach $NameValue (@NVPairs) {
    ($Name, $Value) = split(/=/, $NameValue);  # Understand this part...
    $Value =~ tr/+/ /;  # Huh? Replace '+' with ' '??? Im lost...

## Blanks in CGI parameters are escaped as '+'
## this converts back to blanks


    # I dont really understand this too well, especially the pack() function
    $Value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;

## decode %XX escapes ('%00' - '%FF' hex values in URLs, used for
## reserved and non-printable characters) back to the original values


    $var{$Name} = $Value;
}



$erl = $var{url};
$shname = uc($var{comp});
$encshname = uri_escape("$shname");
$encshname =~ s/'/%27/g;    # umm... what?

## replace single quotes (') with their HEX representation
## (probably to avoid SQL injection attacks)
## read more: http://www.unixwiz.net/techtips/sql-injection.html


Hope this helps

