LinuxQuestions.org


Prokke 10-01-2009 05:21 AM

Bash and netcat: Stripping http header
 
Hi!

I'm receiving HTTP requests with XML content on a server that uses netcat as a backend.

I want to extract the body of each HTTP request and format it with xmllint.

Code:

while true; do
        tmp=`mktemp -u $CWD/tfile.XXXXXX 2>/dev/null`
        echo "$HDR\n\n$HTTPRESP" | nc -l -p $lport > $tmp
        LINT_RS=`cat $tmp | xmllint --format - 2>/dev/null`
        echo "------ `date +\"%F %T\"` --------"
        echo "$LINT_RS"
        echo
        echo "request closed, restarting"
        sleep 1
done

I've been searching for help for a while but haven't found anything. Any ideas?

The header is separated from the body by an empty line, which should make this easier for awk.
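
The empty-line rule mentioned above can be sketched in awk roughly as follows. This is a minimal sketch, not something tested against the poster's setup: the function name strip_http_header is hypothetical, the input is a canned message rather than live netcat output, and it assumes header lines end in either CRLF or a bare LF.

```shell
#!/bin/bash
# strip_http_header (hypothetical name): read an HTTP message on stdin
# and print only what follows the first empty line.
strip_http_header() {
    awk '
        body    { print; next }   # past the separator: copy lines through
        /^\r?$/ { body = 1 }      # first empty line (LF or CRLF) ends the header
    '
}

# Example with a canned message instead of live netcat input.
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n\r\n<xml>blablabla</xml>\n' |
    strip_http_header
```

Only the first empty line is treated as the separator, so empty lines inside the body are passed through unchanged.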

catkin 10-01-2009 07:09 AM

What is the output from the script and how does it differ from what you want? Please post in code tags to preserve indentation etc.

Prokke 10-01-2009 07:30 AM

Hi Catkin!

Let me rephrase: I want to transform an HTTP response with a header, for example:

Code:

HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8

<xml>blablabla</xml>

To this:

Code:

<xml>blablabla</xml>
The size of the header may vary, but according to the RFC the HTTP header should always be followed by an empty line.

konsolebox 10-01-2009 09:33 AM

Try this one. The code is untested and I'm not sure it will work, but you should get the concept.
Code:

#!/bin/bash

for ((;;)); do
        tmp=$(mktemp -u "$CWD"/tfile.XXXXXX 2>/dev/null)

        exec 4< <(exec nc -l -p $lport <<< "$HDR\n\n$HTTPRESP")

        : > "$tmp"

        while read -u 4 LINE && test -n "$LINE"; do
                continue
        done

        while read -u 4 LINE; do
                echo "$LINE" >> "$tmp"
        done

        LINT_RS=`cat $tmp | xmllint --format - 2>/dev/null`
        echo "------ `date +\"%F %T\"` --------"
        echo "$LINT_RS"
        echo
        echo "request closed, restarting"
        sleep 1
done

Edit: No, I think it's not going to work since the code will continue after exec.

Edit: New Code:

Code:

#!/bin/bash

for ((;;)); do
        tmp=$(mktemp -u "$CWD"/tfile.XXXXXX 2>/dev/null)

        exec 4< <(exec nc -l -p $lport <<< "$HDR\n\n$HTTPRESP")

        if read -u 4 LINE; then
                if [[ -n "$LINE" ]]; then
                        while read -u 4 LINE && test -n "$LINE"; do
                                continue
                        done
                fi

                if read -u 4 LINE; then
                        echo "$LINE" > "$tmp"

                        while read -u 4 LINE; do
                                echo "$LINE" >> "$tmp"
                        done

                        LINT_RS=`cat $tmp | xmllint --format - 2>/dev/null`
                        echo "------ `date +\"%F %T\"` --------"
                        echo "$LINT_RS"
                        echo
                        echo "request closed, restarting"
                fi
        fi

        exec 4<&-

        sleep 1
done

Edit: perhaps "$HDR\n\n$HTTPRESP" should be "$HDR"$'\n\n'"$HTTPRESP"
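
The point of that last edit is that bash's builtin echo prints \n literally inside double quotes unless it is called with -e (or xpg_echo is enabled), so the client would receive a backslash and an "n" rather than line breaks. A minimal sketch of building the reply with printf instead: build_response is a hypothetical helper, the HDR and HTTPRESP values are placeholders because the thread never shows the real ones, and CRLF line endings are assumed since that is what HTTP specifies.

```shell
#!/bin/bash
# Placeholder values; the thread never shows the real contents.
HDR='HTTP/1.1 200 OK'
HTTPRESP='<ok/>'

# build_response (hypothetical helper): status line/headers, an empty
# line, then the body. printf expands \r\n in its format string, while
# the two arguments are inserted verbatim through %s.
build_response() {
    printf '%s\r\n\r\n%s' "$1" "$2"
}

# Show the bytes that would be sent; od -c makes \r and \n visible.
build_response "$HDR" "$HTTPRESP" | od -c
```

In the listen loop this would take the place of the echo, for example: build_response "$HDR" "$HTTPRESP" | nc -l -p "$lport" > "$tmp"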

Prokke 10-02-2009 05:26 AM

Nice, konsolebox, but it doesn't work completely for me.

It reads the header OK, but misses the body for some reason. I haven't figured out why yet.

This one works for me. It finds the line number of the first empty line, which separates the header from the body, and then uses tail to print the body.

Code:

    while true; do
        tmp=`mktemp -u $CWD/tfile.XXXXXX 2>/dev/null`
        dbg "TMP file $tmp"       
       
        #listen for incoming requests using netcat, respond w $HDR\n
        #$HTTPRESP and store the incoming request in the temporary file
        echo "$HDR\n\n$HTTPRESP" | nc -l -p $lport > $tmp

        #dbg "TMP contents `cat $tmp` "
        LC=`wc -l $tmp | gawk ' {print $1} '`
        #Get line number where 1st full newline is
        LN=`sed -n '/^\r$/ =' $tmp | head -n 1`
        if [ -z "$LN" ]
        then
            #no CRLF empty line found, fall back to a bare LF empty line
            LN=`sed -n '/^$/ =' $tmp | head -n 1`
        fi

        PL=`expr $LC - $LN`
        PL=`expr $PL + 1`
        echo ""
        echo "------ `date +\"%F %T\"` -------- "
        RS=`tail -$PL $tmp`
        LINT_RS=`echo $RS | xmllint --format - 2>/dev/null`
        if [ "$?" != "0"  ]
        then
            echo "bad xml or empty request"
        else
            echo "$LINT_RS"
        fi
        echo "-------------- END -------------- "
        echo "got ${#RS} "
        echo "session closed, restarting"
       
    done
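
The line-number approach above can be condensed somewhat. A rough sketch under stated assumptions: body_from_file is a hypothetical name, the request is a canned one written to a temporary file, grep -n locates the first line that is empty or holds only a carriage return, and tail -n +N starts printing at the line after it.

```shell
#!/bin/bash
# body_from_file (hypothetical name): print the part of file "$1" that
# follows the first empty line, using the line number of that line.
body_from_file() {
    local cr ln
    cr=$(printf '\r')
    # Line number of the first line that is empty or only a carriage return.
    ln=$(grep -n "^${cr}\{0,1\}\$" "$1" | head -n 1 | cut -d: -f1)
    [ -n "$ln" ] || return 1          # no separator: not a complete request
    tail -n +"$((ln + 1))" "$1"       # start printing one line after it
}

# Example with a canned request in a temporary file.
tmp=$(mktemp)
printf 'POST / HTTP/1.1\r\nContent-Type: text/xml\r\n\r\n<xml>blablabla</xml>\n' > "$tmp"
body_from_file "$tmp"
rm -f "$tmp"
```

Because the body starts one line after the separator, no arithmetic on the total line count is needed.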


catkin 10-02-2009 06:14 AM

Is that an OK solution for you (in which case, please mark the thread [SOLVED]) or do you want to further refine it?

lutusp 10-02-2009 06:44 AM

Quote:

Originally Posted by Prokke (Post 3703348)
Hi Catkin!

Let me rephrase: I want to transform an HTTP response with a header, for example:

Code:

HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8

<xml>blablabla</xml>

To this:

Code:

<xml>blablabla</xml>
The size of the header may vary, but according to the RFC the HTTP header should always be followed by an empty line.

Try this ('data.txt' contains the text from your example):

Code:

cat data.txt | tr '\n' '#' | sed "s/.*##//" | tr '#' '\n'
output:

Code:

<xml>blablabla</xml>
There are a bunch of ways to get what you want; this is just an example.

Writing your script in Ruby or Python would be better overall, and more flexible.
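
The tr/sed pipeline above depends on "#" not occurring in the data and on bare LF line endings, and its greedy ".*##" cuts at the last empty line rather than the first. A hedged alternative sketch using a sed address range: drop_header is a hypothetical name, the input is canned, and the carriage return is placed into the pattern literally so that no sed escape extension is needed.

```shell
#!/bin/bash
# drop_header (hypothetical name): delete everything from line 1 through
# the first line that is empty or contains only a carriage return.
drop_header() {
    local cr
    cr=$(printf '\r')
    sed "1,/^${cr}\{0,1\}\$/d"
}

# Canned response whose body contains both '#' and an empty line.
printf 'HTTP/1.1 200 OK\r\nConnection: close\r\n\r\n<xml>a#b</xml>\n\n<more/>\n' |
    drop_header
```

The range ends at the first matching line, so later empty lines and any "#" characters in the body are left alone.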

konsolebox 10-03-2009 02:22 AM

Quote:

Originally Posted by Prokke (Post 3704787)
Nice, konsolebox, but it doesn't work completely for me.

It reads the header OK, but misses the body for some reason. I haven't figured out why yet.

did you try to see what was sent to $tmp?
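
One possible explanation, offered as a guess rather than a diagnosis: HTTP lines end in CRLF, so the line separating header and body reaches read as a lone carriage return, which test -n treats as non-empty, and the header loop then keeps consuming lines from the body. A minimal sketch that strips the trailing carriage return before testing for an empty line: skip_header_then_copy is a hypothetical name and the request is canned rather than taken from netcat.

```shell
#!/bin/bash
# skip_header_then_copy (hypothetical name): consume header lines from
# stdin up to the empty separator line, then copy the body to stdout.
skip_header_then_copy() {
    local line cr
    cr=$(printf '\r')
    # IFS= and -r keep leading whitespace and backslashes intact.
    while IFS= read -r line; do
        line=${line%"$cr"}            # drop a trailing CR, if any
        if [ -z "$line" ]; then       # empty line: header is finished
            break
        fi
    done
    # "|| [ -n ... ]" also catches a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done
}

# Canned request in place of the netcat output.
printf 'POST / HTTP/1.1\r\nContent-Length: 20\r\n\r\n<xml>blablabla</xml>' |
    skip_header_then_copy
```

To see which bytes actually arrived, od -c "$tmp" | head prints carriage returns and newlines explicitly as \r and \n.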

gnashley 10-03-2009 04:22 AM

Below is an implementation of wget written in pure bash. The fetch-page function does what you want, skipping over the header and outputting the rest of the page.

Code:

#!/bin/bash
# Copyright 2008 GilbertAshley <amigo@ibiblio.org>
# BashTrix wget is a minimal implementation of wget
# written in pure BASH, with only a few options.
# The original idea and basic code for this are Copyright 2006 Ainsley Pereira.
# The idea for verify_url is from code which is Copyright 2007 Piete Sartain
# But the above code fragments both still used 'cat'.
# Copyright 2008 Noam Postavsky worked out how to
# get rid of 'cat' and provided other improvements

VERSION=0.2
# Minimum number of arguments needed by this program
MINARGS=1

show_usage() {
echo "Usage: ${0#*/} [OPTIONS] URL"
echo "${0#*/} [-hiOqV] URL"
echo ""
echo "  -i FILE --input-file=FILE                read filenames from FILE"
echo "  -O FILE --output-document=FILE        concatenate output to FILE"
echo "  -q --quiet                                Turn off wget's output"
echo "  -h --help                                Show this help page"
echo "  -V --version                                Show BashTrix wget version"
echo
exit
}

show_version() {
echo "BashTrix: wget $VERSION"
echo "BashTrix wget is a minimal implementation of wget"
echo "written in pure BASH, with only a few options."
exit
}

# show usage if '-h' or  '--help' is the first argument or no argument is given
case $1 in
        ""|"-h"|"--help") show_usage ;;
        "-V"|"--version") show_version ;;
esac

# get the number of command-line arguments given
ARGC=${#}

# check to make sure enough arguments were given or exit
if [[ $ARGC -lt $MINARGS ]] ; then
 echo "Too few arguments given (Minimum:$MINARGS)"
 echo
 show_usage
fi

# process command-line arguments
for WORD in "$@" ; do
        case $WORD in
                -*)  true ;
                        case $WORD in
                                --debug) [[ $DEBUG ]] && echo "Long Option"
                                        DEBUG=1
                                        shift ;;
                                --input-file=*) [[ $DEBUG ]] && echo "Long FIELD Option using '='"
                                        INPUT_FILE=${WORD:13}
                                        shift ;;
                                -i) [[ $DEBUG ]] && echo "Short split FIELD Option"
                                        if [[ ${2:0:1} != "-" ]] ; then
                                        INPUT_FILE=$2
                                        shift 2
                                        else
                                        echo "Missing argument"
                                        show_usage
                                        fi ;;
                                -i*) [[ $DEBUG ]] && echo "Short FIELD Option range -Bad syntax"
                                        echo "Bad syntax. Did you mean this?:"
                                        echo "-i ${WORD:2}"
                                        show_usage
                                        shift ;;
                                --output-document=*) [[ $DEBUG ]] && echo "Long FIELD Option using '='"
                                        DEST=${WORD:18}
                                        shift ;;
                                -O) [[ $DEBUG ]] && echo "Short split FIELD Option"
                                        if [[ ${2:0:1} != "-" ]] ; then
                                        DEST=$2
                                        shift 2
                                        else
                                        echo "Missing argument"
                                        show_usage
                                        fi ;;
                                -O*) [[ $DEBUG ]] && echo "Short FIELD Option range -Bad syntax"
                                        echo "Bad syntax. Did you mean this?:"
                                        echo "-O ${WORD:2}"
                                        show_usage
                                        shift ;;
                                -q|--quiet) BE_QUIET=1
                                        shift;;
                        esac
                ;;
        esac
done

# Starts reading from ${HOST}/${URL}. Throws away HTTP headers so
# page contents can be read from file descriptor "$1"
fetch-page()
{
    # eval's are necessary so that bash parses expansion of $1<> as a single token
    eval "exec $1<>/dev/tcp/${HOST}/80"
    eval "echo -e 'GET ${URL} HTTP/0.9\r\n\r\n' >&$1"
    # read and throw away HTTP headers, the end of headers is
    # indicated by an empty line (all lines are terminated \r\n)
    OLD_IFS="$IFS"
    IFS=$'\r'$'\n'
    while read -u$1 i && [ "${i/$'\r'/}" != "" ]; do : ; done
    IFS="$OLD_IFS"
}

# puts contents of ${HOST}/${URL} into ${DEST}
get_it()
{
# make sure $DEST starts empty
: > $DEST
fetch-page 3
fetch-page 4
# clear IFS, otherwise the bytes in it would read as empty
OLD_IFS="$IFS"
IFS=
# we read a single byte at a time from 3 with delimiter 'A',
# and from 4 with delimiter 'B'.
while read -r -n1 -dA -u3 A && read -r -n1 -dB -u4 B ; do
    # Now $A is the empty string if the true byte is 'A' or NULL, and
    # $B is the empty string if the true byte is 'B' or NULL.
    # Therefore if either $A or $B is not empty they have the true byte
    if [ -n "$B" ] ; then
        echo -n "$B" >> $DEST
    elif [ -n "$A" ] ; then
        echo -n "$A" >> $DEST
    else
        # both are empty so the true byte is NULL
        echo -en '\0' >> $DEST
    fi
done
# restore IFS
IFS="$OLD_IFS"
}

verify_url() {
exec 3<>"/dev/tcp/${HOST}/80"
echo -e "GET ${URL} HTTP/0.9\r\n\r\n" >&3
read -u3 i
if [[ $i =~ "200 OK" ]]; then
        echo 1
else
        echo 0
fi
}

strip_url() {
# remove the http:// or ftp:// from the RAW_URL
RAW_URL=$1
if [[ ${RAW_URL:0:7} = "http://" ]] ; then
        URL=${RAW_URL:7}
elif [[ ${RAW_URL:0:6} = "ftp://" ]] ; then
        URL=${RAW_URL:6}
else
        URL=${RAW_URL}
fi
}

show_error_404() {
if ! [[ $BE_QUIET ]] ; then
        echo "${HOST}/${URL}:"
        echo "ERROR 404: Not Found."
fi
}

if [[ $INPUT_FILE ]] ; then
        for RAW_URL in $(cat $INPUT_FILE) ; do
                # remove the http:// or ftp:// from the RAW_URL
                strip_url $RAW_URL
                # the HOST is the base name of the website
                HOST=${URL%%/*}
                # the URL is the remaining path to the file (plus the leading '/')
                URL=/${URL#*/}
                # if --output-document is not being used, then the DEST is $(basename $URL)
                if [[ $DEST = "" ]] ; then
                        DEST=${URL##*/}
                fi
                # make sure the URL exists
                if [[ "$(verify_url)" = 1  ]] ; then
                        [[ $DEBUG ]] && echo "${HOST}/${URL} - ${GREEN}found."
                        get_it
                else
                        show_error_404
                fi
        done
else
        RAW_URL="$@"
        # this is the same as above, but for single files
        strip_url $RAW_URL
        HOST=${URL%%/*}
        URL=/${URL#*/}
        if [[ $DEST = "" ]] ; then
                DEST=${URL##*/}
        fi
        if [[ "$(verify_url)" = "1" ]] ; then
                get_it
        else
                show_error_404
        fi
fi


Prokke 10-05-2009 03:58 AM

Quote:

Originally Posted by konsolebox (Post 3705844)
did you try to see what was sent to $tmp?

I tried echoing the lines before writing them to $tmp, but it didn't print anything.



lutusp: Nice! Thanks! You are probably right, Python/Ruby would have been easier.


Gnashley: Thanks!


All times are GMT -5.