Old 07-18-2008, 04:09 PM   #1
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,792

Rep: Reputation: 484
BASH -copy stdin to stdout (replace cat) (bash browser)


I'm working on an interesting snippet of bash code that amounts to a browser, or a replacement for wget. I'm trying to get rid of the last references to external programs, in this case 'cat'.
Can anyone figure out how to get rid of the two places where this calls cat? The crux of it seems to be 'copy stdin to stdout'.

Code:
#!/bin/bash
# bbrowser.sh

# example usage
# ./bbrowser.sh www.slackbuilds.org /slackbuilds/12.0/network/amsn.tar.gz

usage() {
echo "Usage: ${0##*/} URI FILENAME [DEST]"
}


if [ $# -lt 2 ] ; then
 echo "Missing required arguments."
 usage
 exit 1
fi

URI=$1
FILE=$2

if [ -n "$3" ] ; then
 DEST=$3
else
 # pure-bash basename, avoiding the external 'basename'
 DEST=${FILE##*/}
fi

# open a TCP connection on fd 3, send the HTTP request in the background,
# and use the first 'cat' to copy the server's reply from fd 3 to stdout
(echo -e "GET ${FILE} HTTP/0.9\r\n\r\n" 1>&3 & cat 0<&3) 3<> /dev/tcp/${URI}/80 \
| (read i; 
	# skip the HTTP headers up to the blank line, then use the
	# second 'cat' to copy the remaining body to ${DEST}
	while [ "$(echo ${i/$'\r'/})" != "" ]; do
	read i; 
	done; cat ) > ${DEST}
 
Old 07-19-2008, 09:51 AM   #2
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,678

Rep: Reputation: 926
This turns out to be pretty hard. Here's a reasonable way of doing it with 1 cat (I found your original code to be very confusing since everything is jammed into a single command).

Code:
exec 3<>/dev/tcp/${URI}/80
echo -e "GET ${FILE} HTTP/0.9\r\n\r\n" >&3

#read the HTTP headers
while read -u3 i && [ "${i/$'\r'/}" != "" ]; do : ; done

#read the contents
cat <&3 > ${DEST}
Replacing that last cat is hard because the only way to transfer data using builtins is read'ing into a variable and echoing it back out (unless anyone sees another way). Null bytes don't show up in a variable (can bash handle binary data?), and read requires a (non-null) delimiter. This means there will be two different bytes in the data stream that will read as the empty string. The only way around this I could figure out is to read the data twice, with two different delimiters:

Code:
function fetch-page
{
    eval "exec $1<>/dev/tcp/${URI}/80"
    eval "echo -e 'GET ${FILE} HTTP/0.9\r\n\r\n' >&$1"

    while read -u$1 i && [ "${i/$'\r'/}" != "" ]; do : ; done
}

fetch-page 3
fetch-page 4

: > $DEST
IFS=
while true ; do
    read -r -n1 -dA -u3 A || break
    read -r -n1 -dB -u4 B || break

    if [ -n "$B" ] ; then
        echo -n "$B" >> $DEST
    elif [ -n "$A" ] ; then
        echo -n "$A" >> $DEST
    else
        echo -en '\0' >> $DEST
    fi
done
This is about 4 times slower (for amsn.tar.gz) than the first method I posted...
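
To see the ambiguity the read loop works around, a quick illustrative check (the variable name c and the sample bytes are arbitrary) shows that both the delimiter byte 'A' and a NUL byte come back as the empty string, while any other byte comes back as itself:

Code:
# both the delimiter byte and a NUL byte read back as "" with -dA
printf 'A'  | { read -r -n1 -dA c; echo "got: '$c'"; }   # got: ''
printf '\0' | { read -r -n1 -dA c; echo "got: '$c'"; }   # got: ''
printf 'x'  | { read -r -n1 -dA c; echo "got: '$c'"; }   # got: 'x'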
 
Old 07-20-2008, 11:49 AM   #3
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,792

Original Poster
Rep: Reputation: 484
Great piece of work there! The original code for this trick comes from an LFS user named Ainsley Pereira. I had just begun to convert the code into something more usable and wanted to add it to my collection of pure-bash utilities found here:
http://distro.ibiblio.org/pub/linux/...ects/BashTrix/

Here's what it looks like so far with extra notes at the end:

wget.sh:
Code:
#!/bin/bash
# BashTrix wget is a minimal implementation of wget
# written in pure BASH, with only a few options.
# Copyright Ainsley Pereira, Pieter Sartain, ntubski?, Gilbert Ashley

VERSION=0.1
# Minimum number of arguments needed by this program
MINARGS=1

show_usage() {
echo "Usage: ${0#*/} [OPTIONS] URL"
echo "${0#*/} [-hiOqV] URL"
echo ""
echo "  -i FILE --input-file=FILE		read filenames from FILE"
echo "  -o FILE --output-document=FILE	concatenate output to FILE"
echo "  -q --quiet				Turn off wget's output"
echo "  -h --help				Show this help page"
echo "  -V --version				Show BashTrix wget version"
echo
exit
}

show_version() {
echo "BashTrix: wget $VERSION"
echo "BashTrix wget is a minimal implementation of wget"
echo "written in pure BASH, with only a few options."
exit
}

# show usage if '-h' or  '--help' is the first argument or no argument is given
case $1 in
	""|"-h"|"--help") show_usage ;;
	"-V"|"--version") show_version ;;
esac

# get the number of command-line arguments given
ARGC=${#}

# check to make sure enough arguments were given or exit
if [[ $ARGC -lt $MINARGS ]] ; then
 echo "Too few arguments given (Minimum:$MINARGS)"
 echo
 show_usage
fi

# self-sorting argument types LongEquals, ShortSingle, ShortSplit, and ShortMulti
# process command-line arguments
for WORD in "$@" ; do
	case $WORD in
		-*)  true ;
			case $WORD in
				--debug) [[ $DEBUG ]] && echo "Long Option"
					DEBUG=1
					shift ;;
				--input-file=*) [[ $DEBUG ]] && echo "Long FIELD Option using '='"
					INPUT_FILE=${WORD:13}
					shift ;;
				-i) [[ $DEBUG ]] && echo "Short split FIELD Option"
					if [[ ${2:0:1} != "-" ]] ; then
					 INPUT_FILE=$2
					 shift 2
					else
					 echo "Missing argument"
					 show_usage
					fi ;;
				-i*) [[ $DEBUG ]] && echo "Short FIELD Option range -Bad syntax"
					echo "Bad syntax. Did you mean this?:"
					echo "-i ${WORD:2}"
					 show_usage
					shift ;;
				--output-document=*) [[ $DEBUG ]] && echo "Long FIELD Option using '='"
					DEST=${WORD:18}
					shift ;;
				-O) [[ $DEBUG ]] && echo "Short split FIELD Option"
					if [[ ${2:0:1} != "-" ]] ; then
					 DEST=$2
					 shift 2
					else
					 echo "Missing argument"
					 show_usage
					fi ;;
				-O*) [[ $DEBUG ]] && echo "Short FIELD Option range -Bad syntax"
					echo "Bad syntax. Did you mean this?:"
					echo "-i ${WORD:2}"
					 show_usage
					shift ;;
				-q|--quiet) BE_QUIET=1
					shift;;
			esac
		;;
	esac
done

fetch-page()
{
    eval "exec $1<>/dev/tcp/${DOMAIN}/80"
    eval "echo -e 'GET ${URL} HTTP/0.9\r\n\r\n' >&$1"
    while read -u$1 i && [ "${i/$'\r'/}" != "" ]; do : ; done
}

get_it()
{
fetch-page 3
fetch-page 4
: > $DEST
IFS=
while true ; do
    read -r -n1 -dA -u3 A || break
    read -r -n1 -dB -u4 B || break

    if [ -n "$B" ] ; then
        echo -n "$B" >> $DEST
    elif [ -n "$A" ] ; then
        echo -n "$A" >> $DEST
    else
        echo -en '\0' >> $DEST
    fi
done
}

verify_url() {
eval "exec 3<>/dev/tcp/${DOMAIN}/80"
eval "echo -e 'GET ${URL} HTTP/0.9\r\n\r\n' >&3"
while read -u3 i ; do 
	if [[ $i =~ "200 OK" ]]; then
		echo 1
		break
	else
		echo 0
		break
	fi
done
}

strip_url() {
# remove the http:// or ftp:// from the RAW_URL
RAW_URL=$1
if [[ ${RAW_URL:0:7} = "http://" ]] ; then
	URL=${RAW_URL:7}
elif [[ ${RAW_URL:0:6} = "ftp://" ]] ; then
	URL=${RAW_URL:6}
else
	URL=${RAW_URL}
fi
}

show_error_404() {
if ! [[ $BE_QUIET ]] ; then
	echo "${DOMAIN}/${URL}:"
	echo "ERROR 404: Not Found."
fi
}

if [[ $INPUT_FILE ]] ; then
	for RAW_URL in $(< $INPUT_FILE) ; do  # $(< file) avoids the external 'cat'
		# remove the http:// or ftp:// from the RAW_URL
		strip_url $RAW_URL
		# the DOMAIN is the base name of the website
		DOMAIN=${URL%%/*}
		# the url is the remaining path to the file (plus the leading '/')
		URL=/${URL#*/}
		# if the --output-file is not being used, then the DEST is $(basename $URL)
		if [[ $DEST = "" ]] ; then
			DEST=${URL##*/}
		fi
		
		if [[ "$(verify_url)" = 1  ]] ; then
			[[ $DEBUG ]] && echo "${DOMAIN}/${URL} - ${GREEN}found."
			get_it
		else
			show_error_404
		fi
	done
else
	RAW_URL="$@"
	# this is the same as above, but for single files
	strip_url $RAW_URL
	DOMAIN=${URL%%/*}
	URL=/${URL#*/}
	if [[ $DEST = "" ]] ; then
		DEST=${URL##*/}
	fi
	if [[ "$(verify_url)" = 1  ]] ; then
		get_it
	else
		show_error_404
	fi
fi

exit

# unused history notes; kept as a quoted here-document so the embedded
# quotes and backticks are never parsed by the shell
: <<'HISTORY'
This really cool implementation originated with this code:
(echo -e "GET /~clock/twibright/download/links-2.1pre1.tar.bz2 HTTP/0.9\r\n\r\n" \
1>&3 & cat 0<&3) 3<> /dev/tcp/atrey.karlin.mff.cuni.cz/80 \
| (read i; while [ "$(echo $i | tr -d '\r')" != "" ]; \
do read i; done; cat) > links-2.1pre1.tar.bz2
which is Copyright 2006 by Ainsley Pereira.

Then, LinuxQuestions member Piete Sartain reworked it more or less like this:
(echo -e "HEAD ${repath[$i]} HTTP/1.0\r\n\r\n" 1>&3 & cat 0<&3) 3<> /dev/tcp/${repository[$i]}/80
tmp=`(echo -e "HEAD ${repath[$i]} HTTP/1.0\r\n\r\n" 1>&3 & cat 0<&3) 3<> /dev/tcp/${repository[$i]}/80 | grep "200 OK" | wc -l`

if [ $tmp == 1 ]; then
echo "${BLUE}${program[$i]} - ${GREEN}found."
else
echo "${BLUE}${program[$i]} - ${RED}not found!!"
fi
to check if a URL exists.

I asked for help on the LQ forum to eliminate the use of
the external 'cat' from the original code. Member 'ntubski'
came up with the solution used above in 'get_it'. I reworked
it to check for the existence of the URL in 'verify_url'.
HISTORY
I was able to get Piete's code working using my bash-only versions of grep and wc, but it still needed the external 'cat'. Your code fixes that (and probably points to a possible improvement in my bash-only 'cat': handling raw throughput like 'cat' with no options).
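
For reference, the counting part of a bash-only 'wc -l' can be sketched roughly like this (illustrative only, not the actual BashTrix code); it counts newline-terminated lines on stdin:

Code:
# count newline-terminated lines on stdin, pure bash
count=0
while IFS= read -r line; do
    count=$((count + 1))
done
echo "$count"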

Congratulations and thanks! Care to have your real name in the credits? You deserve it. Whoever heard of such a thing: a shell-only version of wget!

Do you have any ideas for other utilities which could be done in pure BASH? I'd love to hear your suggestions or to include what you write... I nearly have enough to write a 'bashybox' multi-call utility. Please note that all these utilities are deliberately written in a very verbose style of coding, so that even shell beginners may be able to make sense of them, and so that I don't have to struggle so much with them when I come back to them after a while.

I still don't really understand why you're using the extra fd #4. And I also don't understand the -dA/-dB syntax with read. I guess I figured it out enough to make verify_url work, but it would be nice if you added some comments that explain it more fully. Thanks a whole bunch, really! I'll have another look at my 'cat' to see if I can implement the 'raw' handling.
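
As a rough starting point, a line-oriented pure-bash 'cat' (no options) might look something like this sketch; it handles text only and, as discussed above, cannot preserve NUL bytes:

Code:
# rough sketch: copy stdin to stdout line by line using only builtins
line=
while IFS= read -r line; do
    printf '%s\n' "$line"
done
# a final line with no trailing newline makes read fail but still fill $line
if [ -n "$line" ]; then printf '%s' "$line"; fi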

I started the idea of BASH-only utilities mostly as a learning exercise, but they are actually becoming pretty useful as they multiply. Ideas?
 
Old 07-20-2008, 11:51 AM   #4
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,792

Original Poster
Rep: Reputation: 484
Hummm, I don't seem to be able to edit the above post. I forgot to mention that the above works like wget in its simplest usage:
./wget.sh http://www.slackbuilds.org/slackbuil...rk/amsn.tar.gz
 
Old 07-21-2008, 01:14 PM   #5
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,678

Rep: Reputation: 926
Quote:
Congratulations and thanks! Care to have your real name in the credits?
Sure, it was fun. Here's a well-commented version (with a few improvements):

Code:
# by Noam Postavsky

# Starts reading from ${DOMAIN}/${URL}. Throws away HTTP headers so
# page contents can be read from file descriptor "$1"
fetch-page()
{
    # eval's are necessary so that bash parses expansion of $1<> as a
    # single token

    eval "exec $1<>/dev/tcp/${DOMAIN}/80"
    eval "echo -e 'GET ${URL} HTTP/0.9\r\n\r\n' >&$1"

    # read and throw away HTTP headers, the end of headers is
    # indicated by an empty line (all lines are terminated \r\n)
    OLD_IFS="$IFS"
    IFS=$'\r'$'\n'
    while read -u$1 i && [ -n "$i" ]; do : ; done
    IFS="$OLD_IFS"
}

# puts contents of ${DOMAIN}/${URL} into ${DEST}
get_it()
{
    # make sure $DEST starts empty
    : > $DEST

    # to read binary data faithfully, we have to read the page twice:
    # each time with a different delimiter.
    fetch-page 3
    fetch-page 4

    # clear IFS, otherwise the bytes in it would read as empty
    OLD_IFS="$IFS"
    IFS=

    # we read a single byte at a time from 3 with delimiter 'A', and
    # from 4 with delimiter 'B'.
    while read -r -n1 -dA -u3 A && read -r -n1 -dB -u4 B ; do
        # Now $A is the empty string if the true byte is 'A' or NUL, and
        # $B is the empty string if the true byte is 'B' or NUL.

        # therefore if either $A or $B is not empty they have the true
        # byte
        if [ -n "$B" ] ; then
            echo -n "$B" >> $DEST
        elif [ -n "$A" ] ; then
            echo -n "$A" >> $DEST
        else
            # both are empty so the true byte is NUL
            echo -en '\0' >> $DEST
        fi
    done

    # restore IFS
    IFS="$OLD_IFS"
}
Quote:
I still don't really understand why you're using the extra fd #4. And I also don't understand the -dA/-dB syntax with read. I guess I figured it out enough to make verify_url work, but it would be nice if you added some comments that explain it more fully. Thanks a whole bunch, really! I'll have another look at my 'cat' to see if I can implement the 'raw' handling.
Hopefully the above explains it clearly enough. Unfortunately, my trick won't help in making a fully general cat command, since you can't always read a data source twice.

I think verify_url and strip_url could be improved a bit:
Code:
verify_url() {
    exec 3<>"/dev/tcp/${DOMAIN}/80"
    echo -e "GET ${URL} HTTP/0.9\r\n\r\n" >&3
    read -u3 i

    # check for HTTP OK status 
    [[ $i =~ "200 OK" ]]
    got_ok=$?

    # don't leave the poor server hanging, close the stream
    exec 3>&-

    return $got_ok
}
Then
Code:
if [[ "$(verify_url)" = 1  ]]
becomes
Code:
if verify_url
Code:
strip_url()
{
     shopt -qs extglob # with extglob we can use prettier patterns
     RAW_URL="$1"
     URL="${RAW_URL#@(http|ftp)://}"
}
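For example (made-up URLs), the extglob pattern strips either prefix:

Code:
strip_url "http://www.example.org/pub/file.tar.gz"
echo "$URL"   # www.example.org/pub/file.tar.gz
strip_url "ftp://ftp.example.org/pub/file.tar.gz"
echo "$URL"   # ftp.example.org/pub/file.tar.gz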
Although I think fetch-page won't work for ftp sites, since the ftp protocol isn't the same as http...

Last edited by ntubski; 07-21-2008 at 01:24 PM. Reason: got the operator screwed up in verify_url
 
  

