LinuxQuestions.org

frenchn00b 04-20-2009 06:24 AM

BASH/No X: Using google translate to convert TXT files (translate)
 
Any ideas how we could use Google Translate without X?

eg:
Quote:

googletranslator en de file1.rtf output.rtf
(Oh, because of the special chars, we should use doc or rtf)

Hint:
The location might be derived from "http://google.com/translate?langpair=en%7Ces&u="
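
A minimal sketch of how such an address could be assembled, assuming the `langpair` and `u` parameters work the way the hint suggests (this is not verified against Google). The function name `build_translate_url` and the example page address are made up for illustration.

```python
from urllib.parse import quote

# Base address taken from the hint above; whether it is still served is an assumption.
BASE = "http://google.com/translate"


def build_translate_url(src, dst, page_url):
    """Return a translate URL for page_url, from language src to language dst."""
    # "|" separates the two language codes and has to be sent as %7C.
    langpair = quote(f"{src}|{dst}", safe="")
    # The page address is itself a query value, so ":" and "/" are escaped as well.
    return f"{BASE}?langpair={langpair}&u={quote(page_url, safe='')}"


if __name__ == "__main__":
    print(build_translate_url("en", "es", "http://example.com/file1.html"))
```

The only real work here is the percent-encoding: the pipe becomes %7C and every slash in the page address becomes %2F.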

Best regards

AnanthaP 04-20-2009 07:27 AM

I had a look at the site you mentioned. I think you are out of luck. Google doesn't yet provide general translation services (including file formats and embedded control characters).

It works only for web pages.

End

frenchn00b 05-04-2009 12:18 AM

Quote:

Originally Posted by AnanthaP (Post 3514937)
I had a look at the site you mentioned. I think you are out of luck. Google doesn't yet provide general translation services (including file formats and embedded control characters).

It works only for web pages.

End

It is almost certainly possible. Whatever Google offers is meant for users, so we can use it.

Actually, I would like to:
feed a TXT file to my console and have it translated via Google.

Why not? Is no one interested because nothing similar exists for Linux yet?

the_ultimate_samurai 05-04-2009 12:22 AM

Hmm... off the top of my head, I'd say you could write the text into an HTML file, put it in your /var/www (if you have Apache installed), link to http://(your ip)/(the html file), and then remove the file once you have your result.

Of course, I don't know how to put text into a text field from bash.

ah, ok so you can just use the address:

http://translate.google.com/translat...ate0=es|en|foo

So you only need to replace foo and bar in that, as well as es and en...

foo is the domain, bar is any subdirectories below it, separated by %2F, es is the source language (Spanish, the default option) and en is the destination language (English, the default option).

So I guess your script only needs to write an HTML file into /var/www (or wherever your webroot is), link to it in the format above (with the appropriate encoding, replacing the relevant parts of the link with the address of your HTML file), and then promptly clean up the HTML file. At this point you can either just use the link to the translated page, or go a step further and extract the translated text from the resulting HTML page (which is really just the reverse of the process used to make the page, though the page might be more complicated after translation).
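
A rough sketch of the two ends of that idea: turning the text into a page, and pulling the text back out afterwards. The function names are made up for illustration, and a real translated page would probably carry extra markup that this simple parser does not account for.

```python
import html
from html.parser import HTMLParser


def text_to_html(text):
    """Wrap plain text in a minimal HTML page, one <p> element per line."""
    body = "\n".join(f"<p>{html.escape(line)}</p>" for line in text.splitlines())
    return ('<html><head><meta charset="utf-8"></head>\n'
            f"<body>\n{body}\n</body></html>\n")


class _ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements, one entry per paragraph."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.lines.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.lines[-1] += data


def html_to_text(page):
    """Reverse of text_to_html: return the paragraph texts joined by newlines."""
    parser = _ParagraphText()
    parser.feed(page)
    parser.close()
    return "\n".join(parser.lines)
```

Writing the result of `text_to_html` into the webroot and deleting it afterwards is left out on purpose, since the webroot path depends on the machine.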

zeebu 05-04-2009 12:33 AM

An alternative to setting up a web server is to look at the HTML source of the Google page and examine the form (if you need to understand how forms work, google "HTML Forms tutorial"). It's a form whose method is POST. The textarea 'text' is what you want to set to the input you wish to translate.

If you've not come across wget, you should read about it; it's very useful for automating web-page downloads. It's also capable of simulating form entry (for forms using either the GET or the POST method).

After examining the Google translation page, you can use wget to submit the request (try "man wget" at your prompt to check the syntax).

Finally this can all be setup and automated into a script. I recommend your script is invoked as:

./GoogleTranslate <filename> <from_language> <to_language>

NB: be sure to pass all *hidden* fields necessary from the form in the POST request too - ignoring them may not give you good results.
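
To illustrate the point about hidden fields, here is a small sketch that collects every named field of a form and builds the POST body from them. `SAMPLE_FORM` is a made-up stand-in for the real page, and the field names `sl` and `tl` are borrowed from the Python example further down this thread; the real form may well differ.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFields(HTMLParser):
    """Record the name/value pairs of <input> and <textarea> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if name and tag in ("input", "textarea"):
            # Hidden inputs carry their default in "value"; a textarea starts empty.
            self.fields[name] = attrs.get("value") or ""


def build_post_body(form_html, text, src, dst):
    """Return an urlencoded body: the form's defaults plus our own values."""
    parser = FormFields()
    parser.feed(form_html)
    fields = dict(parser.fields)
    fields.update({"text": text, "sl": src, "tl": dst})
    return urlencode(fields)


# Hypothetical form markup, only here so the sketch can run offline.
SAMPLE_FORM = """
<form action="/translate_t" method="post">
  <input type="hidden" name="hl" value="en">
  <input type="hidden" name="ie" value="UTF-8">
  <textarea name="text"></textarea>
  <input type="submit" value="Translate">
</form>
"""

if __name__ == "__main__":
    print(build_post_body(SAMPLE_FORM, "good morning", "en", "de"))
```

The resulting string is the kind of value that would be handed to wget's --post-data option.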

hope that helps

the_ultimate_samurai 05-04-2009 12:43 AM

Yeah, mine was just the first thing that came to mind... I'm sure there are many better ways.

frenchn00b 05-04-2009 02:07 PM

We can use elinks, and give wget a user agent of the Mozilla 4.0-something kind...
(Sorry for the lack of info, I'm in a rush.)
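
A small sketch of the user-agent idea, assuming the site treats browser-like clients differently. The agent string below is only an example value, and `make_request` is a made-up helper name.

```python
import urllib.request

# Example browser-style identification string; the exact value is an assumption.
USER_AGENT = "Mozilla/4.0 (compatible)"


def make_request(url, user_agent=USER_AGENT):
    """Build (but do not send) a request that identifies itself as a browser."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = make_request("http://translate.google.com/")
    # urllib stores header names capitalised as "User-agent".
    print(req.get_header("User-agent"))
```

Passing the request to `urllib.request.urlopen` would actually send it; with wget the same thing is done with the --user-agent option.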

gnashley 05-05-2009 05:21 AM

I started to say that curl might be the best way to handle both GET and POST requests, but then I remembered you can do this with pure bash as well.
Have a look at bashbrowser:
http://www.pebble.org.uk/linux/bashbrowser
and lastbash:
http://freshmeat.net/projects/lastbash

Here's my adaption of wget using pure bash:

Code:

#!/bin/bash
# Copyright 2008 GilbertAshley <amigo@ibiblio.org>
# BashTrix wget is a minimal implementation of wget
# written in pure BASH, with only a few options.
# The original idea and basic code for this are Copyright 2006 Ainsley Pereira.
# The idea for verify_url is from code which is Copyright 2007 Piete Sartain
# But the above code fragments both still used 'cat'.
# Copyright 2008 Noam Postavsky worked out how to
# get rid of 'cat' and provided other improvements

VERSION=0.2
# Minimum number of arguments needed by this program
MINARGS=1

show_usage() {
echo "Usage: ${0##*/} [OPTIONS] URL"
echo "${0##*/} [-hiOqV] URL"
echo ""
echo "  -i FILE --input-file=FILE                read URLs from FILE"
echo "  -O FILE --output-document=FILE        concatenate output to FILE"
echo "  -q --quiet                                Turn off wget's output"
echo "  -h --help                                Show this help page"
echo "  -V --version                                Show BashTrix wget version"
echo
exit
}

show_version() {
echo "BashTrix: wget $VERSION"
echo "BashTrix wget is a minimal implementation of wget"
echo "written in pure BASH, with only a few options."
exit
}

# show usage if '-h' or  '--help' is the first argument or no argument is given
case $1 in
        ""|"-h"|"--help") show_usage ;;
        "-V"|"--version") show_version ;;
esac

# get the number of command-line arguments given
ARGC=${#}

# check to make sure enough arguments were given or exit
if [[ $ARGC -lt $MINARGS ]] ; then
 echo "Too few arguments given (Minimum:$MINARGS)"
 echo
 show_usage
fi

# process command-line arguments
for WORD in "$@" ; do
        case $WORD in
                -*)  true ;
                        case $WORD in
                                --debug) [[ $DEBUG ]] && echo "Long Option"
                                        DEBUG=1
                                        shift ;;
                                --input-file=*) [[ $DEBUG ]] && echo "Long FIELD Option using '='"
                                        INPUT_FILE=${WORD:13}
                                        shift ;;
                                -i) [[ $DEBUG ]] && echo "Short split FIELD Option"
                                        if [[ ${2:0:1} != "-" ]] ; then
                                        INPUT_FILE=$2
                                        shift 2
                                        else
                                        echo "Missing argument"
                                        show_usage
                                        fi ;;
                                -i*) [[ $DEBUG ]] && echo "Short FIELD Option range -Bad syntax"
                                        echo "Bad syntax. Did you mean this?:"
                                        echo "-i ${WORD:2}"
                                        show_usage
                                        shift ;;
                                --output-document=*) [[ $DEBUG ]] && echo "Long FIELD Option using '='"
                                        DEST=${WORD:18}
                                        shift ;;
                                -O) [[ $DEBUG ]] && echo "Short split FIELD Option"
                                        if [[ ${2:0:1} != "-" ]] ; then
                                        DEST=$2
                                        shift 2
                                        else
                                        echo "Missing argument"
                                        show_usage
                                        fi ;;
                                -O*) [[ $DEBUG ]] && echo "Short FIELD Option range -Bad syntax"
                                        echo "Bad syntax. Did you mean this?:"
                                        echo "-O ${WORD:2}"
                                        show_usage
                                        shift ;;
                                -q|--quiet) BE_QUIET=1
                                        shift;;
                        esac
                ;;
        esac
done

# Starts reading from ${HOST}/${URL}. Throws away HTTP headers so
# page contents can be read from file descriptor "$1"
fetch-page()
{
    # eval's are necessary so that bash parses expansion of $1<> as a single token
    eval "exec $1<>/dev/tcp/${HOST}/80"
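    # an HTTP/0.9 request carries no version or headers, so ask for HTTP/1.0 and name the host
    eval "echo -e 'GET ${URL} HTTP/1.0\r\nHost: ${HOST}\r\n\r\n' >&$1"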
    # read and throw away HTTP headers, the end of headers is
    # indicated by an empty line (all lines are terminated \r\n)
    OLD_IFS="$IFS"
    IFS=$'\r'$'\n'
    while read -u$1 i && [ "${i/$'\r'/}" != "" ]; do : ; done
    IFS="$OLD_IFS"
}

# puts contents of ${HOST}/${URL} into ${DEST}
get_it()
{
# make sure $DEST starts empty
: > "$DEST"
fetch-page 3
fetch-page 4
# clear IFS, otherwise the bytes in it would read as empty
OLD_IFS="$IFS"
IFS=
# we read a single byte at a time from 3 with delimiter 'A',
# and from 4 with delimiter 'B'.
while read -r -n1 -dA -u3 A && read -r -n1 -dB -u4 B ; do
    # Now $A is the empty string if the true byte is 'A' or NULL, and
    # $B is the empty string if the true byte is 'B' or NULL.
    # Therefore if either $A or $B is not empty they have the true byte
    if [ -n "$B" ] ; then
        echo -n "$B" >> "$DEST"
    elif [ -n "$A" ] ; then
        echo -n "$A" >> "$DEST"
    else
        # both are empty so the true byte is NULL
        echo -en '\0' >> "$DEST"
    fi
done
# restore IFS
IFS="$OLD_IFS"
}

verify_url() {
exec 3<>"/dev/tcp/${HOST}/80"
# request with HTTP/1.0 and a Host header; only the status line is read back
echo -e "GET ${URL} HTTP/1.0\r\nHost: ${HOST}\r\n\r\n" >&3
read -u3 i
if [[ $i =~ "200 OK" ]]; then
        echo 1
else
        echo 0
fi
}

strip_url() {
# remove the http:// or ftp:// from the RAW_URL
RAW_URL=$1
if [[ ${RAW_URL:0:7} = "http://" ]] ; then
        URL=${RAW_URL:7}
elif [[ ${RAW_URL:0:6} = "ftp://" ]] ; then
        URL=${RAW_URL:6}
else
        URL=${RAW_URL}
fi
}

show_error_404() {
if ! [[ $BE_QUIET ]] ; then
        echo "${HOST}/${URL}:"
        echo "ERROR 404: Not Found."
fi
}

if [[ $INPUT_FILE ]] ; then
        # $(< FILE) reads the file without spawning 'cat'
        for RAW_URL in $(< "$INPUT_FILE") ; do
                # remove the http:// or ftp:// from the RAW_URL
                strip_url "$RAW_URL"
                # the HOST is the base name of the website
                HOST=${URL%%/*}
                # the URL is the remaining path to the file (plus the leading '/')
                URL=/${URL#*/}
                # if --output-document is not being used, then the DEST is $(basename $URL)
                if [[ $DEST = "" ]] ; then
                        DEST=${URL##*/}
                fi
                # make sure the URL exists
                if [[ "$(verify_url)" = 1  ]] ; then
                        [[ $DEBUG ]] && echo "${HOST}/${URL} - ${GREEN}found."
                        get_it
                else
                        show_error_404
                fi
        done
else
        RAW_URL="$@"
        # this is the same as above, but for single files
        strip_url $RAW_URL
        HOST=${URL%%/*}
        URL=/${URL#*/}
        if [[ $DEST = "" ]] ; then
                DEST=${URL##*/}
        fi
        if [[ "$(verify_url)" = "1" ]] ; then
                get_it
        else
                show_error_404
        fi
fi
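
For comparison, here is a short Python sketch of roughly the same steps as the script above: split the address into host and path, send a plain GET over a socket, and throw away the headers. It sends an HTTP/1.0 request with a Host header, which is my assumption about what servers expect today, and the function names are made up.

```python
import socket


def split_url(raw_url):
    """Roughly mirror strip_url and the HOST/URL split of the script above."""
    for scheme in ("http://", "ftp://"):
        if raw_url.startswith(scheme):
            raw_url = raw_url[len(scheme):]
            break
    host, _, path = raw_url.partition("/")
    return host, "/" + path


def strip_headers(response):
    """Return the body: everything after the first empty line."""
    _, _, body = response.partition(b"\r\n\r\n")
    return body


def fetch(raw_url):
    """Fetch raw_url over port 80 and return the response body as bytes."""
    host, path = split_url(raw_url)
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, 80), timeout=10) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closes the connection when it is done
                break
            chunks.append(chunk)
    return strip_headers(b"".join(chunks))
```

Because the body is handled as bytes, NUL bytes need none of the two-descriptor trickery that the bash version uses.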


chrism01 05-05-2009 11:34 PM

You can use Perl and WWW::Mechanize http://search.cpan.org/~petdance/WWW...W/Mechanize.pm
It's designed for that sort of thing.

ghostdog74 05-06-2009 12:08 AM

An example using Python; it translates a single word only, from en to de.
Code:

import http.client
import re
import urllib.parse

# matches any HTML tag so it can be stripped from the result
pat = re.compile("<.*?>", re.M | re.DOTALL)
# form fields for the translate page; "text" is the word to translate ("cat")
params = urllib.parse.urlencode({"hl": "en", "ie": "UTF-8", "text": "cat", "sl": "en", "tl": "de"})
headers = {
 "Content-type": "application/x-www-form-urlencoded",
 "Accept": "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
 "Host": "translate.google.com" }
connection = http.client.HTTPConnection("translate.google.com:80")
connection.request("POST", "/translate_t", params, headers)
response = connection.getresponse()
#print(response.status, response.reason)
data = response.read().decode("utf-8", "replace")
# cut out the dictionary section of the result page and print each entry
start = data.index("class=thead>Dictionary:")
end = data.index("<a class=morelink")
data = data[start+1:end].split("<li>")
for items in data[1:]:
    print(pat.sub("", items))

output:
Code:

# ./test.py
Katze
Raubkatze
Typ
Raupe
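
The extraction step can be tried offline. The sketch below runs the same slice-and-split logic on a made-up fragment; `SAMPLE` is only shaped like the markers the script searches for and is not real Google output.

```python
import re

# Matches any HTML tag so it can be stripped from an entry.
TAG = re.compile(r"<.*?>", re.M | re.DOTALL)

# Made-up fragment, shaped like the markers the script above looks for.
SAMPLE = ('<td class=thead>Dictionary:</td><ol>'
          '<li><b>Katze</b><li>Raubkatze</ol>'
          '<a class=morelink href="#">more</a>')


def extract_entries(page):
    """Return the list items between the two markers, with tags removed."""
    start = page.index("class=thead>Dictionary:")
    end = page.index("<a class=morelink")
    items = page[start:end].split("<li>")
    # items[0] is the text before the first <li>, so it is skipped.
    return [TAG.sub("", item).strip() for item in items[1:]]


if __name__ == "__main__":
    for entry in extract_entries(SAMPLE):
        print(entry)
```

Scraping by fixed markers like this breaks as soon as the page layout changes, so treat it as a demonstration only.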


frenchn00b 09-13-2009 11:55 PM

By now, are there already packages available in the distros that we can install, such as:

Code:

translatetxt en ru --format rtf myfiletotrans.rtf myfiletranslated.rtf

?


All times are GMT -5.