Old 04-20-2009, 05:24 AM   #1
frenchn00b
Senior Member
 
Registered: Jun 2007
Location: E.U., Mountains :-)
Distribution: Debian, Etch, the greatest
Posts: 2,546

Rep: Reputation: 50
BASH/No X: Using google translate to convert TXT files (translate)


Any ideas how we could use Google Translate without X?

eg:
Quote:
googletranslator en de file1.rtf output.rtf
(Because of the special characters, the files would be doc or rtf.)

Hint:
The location may be obtained from "http://google.com/translate?langpair=en%7Ces&u="
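
As a rough illustration of the hint above, the URL could be assembled like this in Python 3. This is only a sketch: the function name `build_translate_url` and the example page address are made up for illustration, and the endpoint is simply the one quoted in the post, which may not exist anymore.

```python
from urllib.parse import urlencode

# Endpoint quoted in the post; assumed here, it may no longer be served.
BASE = "http://google.com/translate"


def build_translate_url(source_lang, target_lang, page_url):
    """Return the translate URL for a publicly reachable web page.

    build_translate_url is a hypothetical helper, not part of any library.
    """
    query = urlencode({
        # "en|es" becomes "en%7Ces" once percent-encoded
        "langpair": f"{source_lang}|{target_lang}",
        # address of the page to translate, also percent-encoded
        "u": page_url,
    })
    return f"{BASE}?{query}"


if __name__ == "__main__":
    print(build_translate_url("en", "es", "http://example.com/page.html"))
```

The `%7C` in the hint is just the percent-encoded `|` between the two language codes, which is why `urlencode` reproduces it.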

Best regards

Last edited by frenchn00b; 04-20-2009 at 05:25 AM.
 
Old 04-20-2009, 06:27 AM   #2
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Distribution: UBUNTU 5.10 since Jul-18,2006 on Intel 820 DC
Posts: 583

Rep: Reputation: 121
I had a look at the site you mentioned. I think you are out of luck. Google doesn't yet provide general translation services (including file formats and embedded control characters).

It works only for web pages.

End
 
Old 05-03-2009, 11:18 PM   #3
frenchn00b
Senior Member
 
Registered: Jun 2007
Location: E.U., Mountains :-)
Distribution: Debian, Etch, the greatest
Posts: 2,546

Original Poster
Rep: Reputation: 50
Quote:
Originally Posted by AnanthaP
I had a look at the site you mentioned. I think you are out of luck. Google doesn't yet provide general translation services (including file formats and embedded control characters).

It works only for web pages.

End
It should certainly be possible. What Google offers is meant for users, so we can use it.

Actually, I would like to:
feed a TXT file to my console and have it translated via Google.

Why not? Is no one interested because nothing similar exists for Linux yet?
 
Old 05-03-2009, 11:22 PM   #4
the_ultimate_samurai
Member
 
Registered: Jan 2006
Distribution: debian-lenny
Posts: 37

Rep: Reputation: 15
Hmm... off the top of my head, I'd say you could write the text into an HTML file, put it in your /var/www (if you have Apache installed), link to http://(your ip)/(the html file), and remove the file after you get your result.

Of course, I don't know how to put text into a text field from bash.

ah, ok so you can just use the address:

http://translate.google.com/translat...ate0=es|en|foo

So you only need to replace foo and bar in that, as well as es and en...

Here foo is the domain, bar is any subdomains separated by %2F, es is the source language (Spanish, the default option), and en is the destination language (English, the default option).

So I guess your script only needs to write an HTML file into /var/www (or wherever your web root is), link to it (with appropriate Unicode encodings) in the format above (replacing the important parts of the link with the link to your HTML file), and then promptly clean up the HTML file. At this point you can either just use the link to the translated page, or go a step further and extract the translated text from the resulting HTML page (which is really just the reverse of the process used to make the page, though the page might be more complicated after translation).
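
A minimal Python 3 sketch of the two conversion steps described above: wrapping plain text in an HTML page, and pulling the text back out afterwards. The helper names `text_to_html` and `html_to_text` are invented for this example, and it assumes the translated page still keeps the text inside `<p>` elements, which a real translated page may not.

```python
import html
from html.parser import HTMLParser


def text_to_html(text):
    """Wrap plain text in a minimal UTF-8 page, one <p> per non-empty line."""
    paragraphs = "\n".join(
        f"<p>{html.escape(line)}</p>"
        for line in text.splitlines()
        if line.strip()
    )
    return (
        '<html><head><meta charset="utf-8"></head>\n'
        f"<body>\n{paragraphs}\n</body></html>\n"
    )


class _ParagraphCollector(HTMLParser):
    """Collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._inside = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        # Entities such as &lt; arrive here already unescaped.
        if self._inside:
            self.paragraphs[-1] += data


def html_to_text(page):
    """Reverse of text_to_html: return one line of text per <p> element."""
    collector = _ParagraphCollector()
    collector.feed(page)
    collector.close()
    return "\n".join(collector.paragraphs)


if __name__ == "__main__":
    page = text_to_html("Hello <world>\nSecond & line")
    print(page)
    print(html_to_text(page))
```

Writing the page into the web root and deleting it afterwards is left out, since the right location and permissions depend on the local setup.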

Last edited by the_ultimate_samurai; 05-03-2009 at 11:36 PM.
 
Old 05-03-2009, 11:33 PM   #5
zeebu
LQ Newbie
 
Registered: May 2007
Posts: 4

Rep: Reputation: 0
An alternative to setting up a web server is to look at the HTML source of the Google page and examine the form (if you need to understand how forms work, google "HTML Forms tutorial"). It's a form whose method is POST. The textarea 'text' is what you want to set with the input you wish to translate.

If you've not come across wget, you should read about it; it's very useful for automating web-page downloads. It's also capable of simulating form entry (for forms using either GET or POST).

After examining the Google web-page translation form, you can use wget to request the page (try "man wget" at your prompt to check the syntax).

Finally, this can all be set up and automated in a script. I recommend your script is invoked as:

./GoogleTranslate <filename> <from_language> <to_language>

NB: be sure to pass all *hidden* fields necessary from the form in the POST request too - ignoring them may not give you good results.
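
To make the point about hidden fields concrete, here is a small Python 3 sketch that collects the hidden inputs from a form and merges them into the POST body. The form in `SAMPLE_FORM` is invented for the example and is not Google's actual markup; the field names `text`, `sl`, `tl`, `hl` and `ie` are taken from elsewhere in this thread, and `build_post_body` is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldCollector(HTMLParser):
    """Record name/value pairs of every <input type="hidden">."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_post_body(form_html, text, source_lang, target_lang):
    """Return an urlencoded POST body: hidden fields plus our own values."""
    collector = HiddenFieldCollector()
    collector.feed(form_html)
    collector.close()
    fields = dict(collector.fields)
    fields.update({"text": text, "sl": source_lang, "tl": target_lang})
    return urlencode(fields)


# Made-up form used only to exercise the code above.
SAMPLE_FORM = """
<form action="/translate_t" method="post">
  <input type="hidden" name="hl" value="en">
  <input type="hidden" name="ie" value="UTF-8">
  <textarea name="text"></textarea>
</form>
"""

if __name__ == "__main__":
    print(build_post_body(SAMPLE_FORM, "cat", "en", "de"))
```

The resulting string is the kind of value one would hand to wget's `--post-data` option or to any other HTTP client.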

hope that helps
 
Old 05-03-2009, 11:43 PM   #6
the_ultimate_samurai
Member
 
Registered: Jan 2006
Distribution: debian-lenny
Posts: 37

Rep: Reputation: 15
Yeah, mine was just the first thing that came to mind... I'm sure there are many better ways.
 
Old 05-04-2009, 01:07 PM   #7
frenchn00b
Senior Member
 
Registered: Jun 2007
Location: E.U., Mountains :-)
Distribution: Debian, Etch, the greatest
Posts: 2,546

Original Poster
Rep: Reputation: 50
We can use elinks, or specify a Mozilla 4.0-style user agent in wget...
(Sorry for the lack of info, I'm in a rush.)
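
For what it's worth, here is how a browser-like user agent could be attached to a request in Python 3 (wget has a `--user-agent` option for the same purpose). The exact string below is only a placeholder assumption, since the post does not give one, and `make_request` is a hypothetical helper. Nothing is sent over the network here.

```python
from urllib.request import Request

# Placeholder value; the post only says "mozilla 4.0 something".
USER_AGENT = "Mozilla/4.0 (compatible)"


def make_request(url, user_agent=USER_AGENT):
    """Build (but do not send) a request carrying a custom User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = make_request("http://example.com/")
    # urllib stores header names capitalized as "User-agent"
    print(req.get_header("User-agent"))
```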
 
Old 05-05-2009, 04:21 AM   #8
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,724

Rep: Reputation: 449
I started to say that curl might be the best way to handle both GET and POST requests, but then I remembered you can do this with pure bash as well.
Have a look at bashbrowser:
http://www.pebble.org.uk/linux/bashbrowser
and lastbash:
http://freshmeat.net/projects/lastbash

Here's my adaptation of wget using pure bash:

Code:
#!/bin/bash
# Copyright 2008 GilbertAshley <amigo@ibiblio.org>
# BashTrix wget is a minimal implementation of wget
# written in pure BASH, with only a few options.
# The original idea and basic code for this are Copyright 2006 Ainsley Pereira.
# The idea for verify_url is from code which is Copyright 2007 Piete Sartain
# But the above code fragments both still used 'cat'.
# Copyright 2008 Noam Postavsky worked out how to
# get rid of 'cat' and provided other improvements

VERSION=0.2
# Minimum number of arguments needed by this program
MINARGS=1

show_usage() {
echo "Usage: ${0#*/} [OPTIONS] URL"
echo "${0#*/} [-hiOqV] URL"
echo ""
echo "  -i FILE --input-file=FILE		read filenames from FILE"
echo "  -O FILE --output-document=FILE	concatenate output to FILE"
echo "  -q --quiet				Turn off wget's output"
echo "  -h --help				Show this help page"
echo "  -V --version				Show BashTrix wget version"
echo
exit
}

show_version() {
echo "BashTrix: wget $VERSION"
echo "BashTrix wget is a minimal implementation of wget"
echo "written in pure BASH, with only a few options."
exit
}

# show usage if '-h' or  '--help' is the first argument or no argument is given
case $1 in
	""|"-h"|"--help") show_usage ;;
	"-V"|"--version") show_version ;;
esac

# get the number of command-line arguments given
ARGC=${#}

# check to make sure enough arguments were given or exit
if [[ $ARGC -lt $MINARGS ]] ; then
 echo "Too few arguments given (Minimum:$MINARGS)"
 echo
 show_usage
fi

# process command-line arguments
for WORD in "$@" ; do
	case $WORD in
		-*)  true ;
			case $WORD in
				--debug) [[ $DEBUG ]] && echo "Long Option"
					DEBUG=1
					shift ;;
				--input-file=*) [[ $DEBUG ]] && echo "Long FIELD Option using '='"
					INPUT_FILE=${WORD:13}
					shift ;;
				-i) [[ $DEBUG ]] && echo "Short split FIELD Option"
					if [[ ${2:0:1} != "-" ]] ; then
					 INPUT_FILE=$2
					 shift 2
					else
					 echo "Missing argument"
					 show_usage
					fi ;;
				-i*) [[ $DEBUG ]] && echo "Short FIELD Option range -Bad syntax"
					echo "Bad syntax. Did you mean this?:"
					echo "-i ${WORD:2}"
					 show_usage
					shift ;;
				--output-document=*) [[ $DEBUG ]] && echo "Long FIELD Option using '='"
					DEST=${WORD:18}
					shift ;;
				-O) [[ $DEBUG ]] && echo "Short split FIELD Option"
					if [[ ${2:0:1} != "-" ]] ; then
					 DEST=$2
					 shift 2
					else
					 echo "Missing argument"
					 show_usage
					fi ;;
				-O*) [[ $DEBUG ]] && echo "Short FIELD Option range -Bad syntax"
					echo "Bad syntax. Did you mean this?:"
					echo "-O ${WORD:2}"
					 show_usage
					shift ;;
				-q|--quiet) BE_QUIET=1
					shift;;
			esac
		;;
	esac
done

# Starts reading from ${HOST}/${URL}. Throws away HTTP headers so
# page contents can be read from file descriptor "$1"
fetch-page()
{
    # eval's are necessary so that bash parses expansion of $1<> as a single token
    eval "exec $1<>/dev/tcp/${HOST}/80"
    eval "echo -e 'GET ${URL} HTTP/0.9\r\n\r\n' >&$1"
    # read and throw away HTTP headers, the end of headers is
    # indicated by an empty line (all lines are terminated \r\n)
    OLD_IFS="$IFS"
    IFS=$'\r'$'\n'
    while read -u$1 i && [ "${i/$'\r'/}" != "" ]; do : ; done
    IFS="$OLD_IFS"
}

# puts contents of ${HOST}/${URL} into ${DEST}
get_it()
{
# make sure $DEST starts empty
: > $DEST
fetch-page 3
fetch-page 4
# clear IFS, otherwise the bytes in it would read as empty
OLD_IFS="$IFS"
IFS=
# we read a single byte at a time from 3 with delimiter 'A',
# and from 4 with delimiter 'B'.
while read -r -n1 -dA -u3 A && read -r -n1 -dB -u4 B ; do
    # Now $A is the empty string if the true byte is 'A' or NULL, and
    # $B is the empty string if the true byte is 'B' or NULL.
    # Therefore if either $A or $B is not empty they have the true byte
    if [ -n "$B" ] ; then
        echo -n "$B" >> $DEST
    elif [ -n "$A" ] ; then
        echo -n "$A" >> $DEST
    else
        # both are empty so the true byte is NULL
	echo -en '\0' >> $DEST
    fi
done
# restore IFS
IFS="$OLD_IFS"
}

verify_url() {
exec 3<>"/dev/tcp/${HOST}/80"
echo -e "GET ${URL} HTTP/0.9\r\n\r\n" >&3
read -u3 i
if [[ $i =~ "200 OK" ]]; then
	echo 1
else
	echo 0
fi
}

strip_url() {
# remove the http:// or ftp:// from the RAW_URL
RAW_URL=$1
if [[ ${RAW_URL:0:7} = "http://" ]] ; then
	URL=${RAW_URL:7}
elif [[ ${RAW_URL:0:6} = "ftp://" ]] ; then
	URL=${RAW_URL:6}
else
	URL=${RAW_URL}
fi
}

show_error_404() {
if ! [[ $BE_QUIET ]] ; then
	echo "${HOST}/${URL}:"
	echo "ERROR 404: Not Found."
fi
}

if [[ $INPUT_FILE ]] ; then
	for RAW_URL in $(< "$INPUT_FILE") ; do
		# remove the http:// or ftp:// from the RAW_URL
		strip_url $RAW_URL
		# the HOST is the base name of the website
		HOST=${URL%%/*}
		# the url is the remaining path to the file(plus the leading '/'
		URL=/${URL#*/}
		# if --output-document is not being used, then the DEST is $(basename $URL)
		if [[ $DEST = "" ]] ; then
			DEST=${URL##*/}
		fi
		# make sure the URL exists
		if [[ "$(verify_url)" = 1  ]] ; then
			[[ $DEBUG ]] && echo "${HOST}/${URL} - ${GREEN}found."
			get_it
		else
			show_error_404
		fi
	done
else
	RAW_URL="$@"
	# this is the same as above, but for single files
	strip_url $RAW_URL
	HOST=${URL%%/*}
	URL=/${URL#*/}
	if [[ $DEST = "" ]] ; then
		DEST=${URL##*/}
	fi
	if [[ "$(verify_url)" = "1" ]] ; then
		get_it
	else
		show_error_404
	fi
fi
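
For readers who find the file-descriptor juggling hard to follow, the same three steps (split the URL into host and path, send a bare GET, skip the headers up to the first empty line) can be sketched in Python 3. This is not a port of the script: the helper names are invented, the request uses HTTP/1.0 with a Host header rather than the script's request line, and the functions only build and parse bytes so that no network access is needed to try them.

```python
from urllib.parse import urlsplit


def split_url(raw_url):
    """Return (host, path), similar to the script's HOST and URL variables."""
    if "://" not in raw_url:
        raw_url = "http://" + raw_url  # assume http when no scheme is given
    parts = urlsplit(raw_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path


def build_get_request(host, path):
    """Bytes of a minimal GET request; every line ends with CRLF."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def split_response(raw):
    """Split a raw response into (status line, body).

    Headers end at the first empty line, i.e. the first CRLF CRLF pair.
    The body is kept as bytes, so NUL bytes survive untouched.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return status_line, body


if __name__ == "__main__":
    host, path = split_url("http://example.com/dir/file.txt")
    print(build_get_request(host, path))
    canned = b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\x00world"
    print(split_response(canned))
```

To actually fetch something, the request bytes would be written to a socket opened with `socket.create_connection((host, 80))` and the reply read until the server closes the connection.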
 
Old 05-05-2009, 10:34 PM   #9
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5
Posts: 16,086

Rep: Reputation: 1995
You can use Perl and WWW::Mechanize http://search.cpan.org/~petdance/WWW...W/Mechanize.pm
It's designed for that sort of thing.
 
Old 05-05-2009, 11:08 PM   #10
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 239
An example using Python, translating a single word only, from en to de:
Code:
import urllib , httplib,re
pat = re.compile("<.*?>",re.M|re.DOTALL)
params = urllib.urlencode({"hl":"en","ie":"UTF-8","text":"cat","sl":"en","tl":"de"}) #translate the word "cat"
headers = {
 "Content-type" : "application/x-www-form-urlencoded",
 "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
 "Host" : "translate.google.com" }
connection = httplib.HTTPConnection("translate.google.com:80")
connection.request("POST","/translate_t",params,headers)
response = connection.getresponse()
#print response.status,response.reason
data=response.read()
start = data.index("class=thead>Dictionary:") 
end = data.index("<a class=morelink")
data = data[start+1:end].split("<li>")
for items in data[1:]:
    print pat.sub("",items)
output:
Code:
# ./test.py
Katze
Raubkatze
Typ
Raupe
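
The extraction step in that script (slice the page between two markers, split on `<li>`, strip the remaining tags) can be tried offline with a Python 3 sketch like the one below. The `SAMPLE` page is made up to resemble what the script expects and is not real Google output; `extract_items` is a hypothetical helper name.

```python
import re

# Same idea as the script's pattern: remove anything that looks like a tag.
TAG = re.compile(r"<.*?>", re.DOTALL)


def extract_items(page, start_marker, end_marker):
    """Return the text of each <li> found between the two markers."""
    start = page.index(start_marker)       # raises ValueError if missing
    end = page.index(end_marker, start)
    chunks = page[start:end].split("<li>")[1:]  # drop text before first <li>
    return [TAG.sub("", chunk).strip() for chunk in chunks]


# Invented sample page, only for demonstration.
SAMPLE = (
    "<div class=thead>Dictionary:</div><ol>"
    "<li>Katze</li><li>Raubkatze</li></ol>"
    "<a class=morelink href=#>more</a>"
)

if __name__ == "__main__":
    for item in extract_items(SAMPLE, "class=thead>Dictionary:", "<a class=morelink"):
        print(item)
```

Marker-based scraping like this breaks as soon as the page layout changes, so the markers should be treated as something to re-check against the live page.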
 
Old 09-13-2009, 10:55 PM   #11
frenchn00b
Senior Member
 
Registered: Jun 2007
Location: E.U., Mountains :-)
Distribution: Debian, Etch, the greatest
Posts: 2,546

Original Poster
Rep: Reputation: 50
In the meantime, are there already any packages available in the distros that we could install, such as:

Code:
translatetxt en ru --format rtf myfiletotrans.rtf myfiletranslated.rtf

?
 
  

