Old 06-03-2013, 12:39 AM   #1
NikosGr
Member
 
Registered: Jun 2013
Posts: 63

Rep: Reputation: 0
Changing filenames from Greeklish => Greek (subprocess complains)


Here is the snippet of files.py responsible for printing the Greek filenames.
When metrites.py calls files.py via subprocess, and files.py in turn tries to print filenames containing Greek letters, I get the traceback shown at the end of this post:


Code:
    for row in data:
        (url, hits, host, lastvisit) = row
        shorturl = url.replace( '/home/nikos/www/data/apps/', '' )
        lastvisit = lastvisit.strftime('%A %e %b, %H:%M')
        print('''
        <form method="get" action="cgi-bin/files.py">
            <tr>
                <td> <center> <input type="submit" name="shorturl" value="%s"> </td>
                <td> <center> <font color=yellow size=5> %s </td>
                <td> <center> <font color=orange size=4> %s </td>
                <td> <center> <font color=silver size=4> %s </td>
            </tr>
        </form>
        ''' % (shorturl, hits, host, lastvisit) )

And here is the metrites.py snippet that runs files.py via subprocess:

Code:
    if page.endswith('.html'):
        with open('/home/nikos/public_html/' + page, encoding='utf-8') as f:
            htmldata = f.read()
        htmldata = htmldata % (quote, music)
        template = htmldata + counter
    elif page.endswith('.py'):
        htmldata = subprocess.check_output( '/home/nikos/public_html/cgi-bin/' + page )
        template = htmldata.decode('utf-8') + counter
    print( template )

Code:
 /home/nikos/public_html/cgi-bin/metrites.py in () 
    217                 template = htmldata + counter 
    218         elif page.endswith('.py'): 
=>  219                 htmldata = subprocess.check_output( '/home/nikos/public_html/cgi-bin/' + page ) 
    220                 template = htmldata.decode('utf-8').replace( 'Content-type: text/html; charset=utf-8', '' ) + counter 
    221                 
htmldata undefined, subprocess = <module 'subprocess' from '/opt/python3/lib/python3.3/subprocess.py'>, subprocess.check_output = <function check_output>, page = 'files.py' 
 /opt/python3/lib/python3.3/subprocess.py in check_output(timeout=None, *popenargs=('/home/nikos/public_html/cgi-bin/files.py',), **kwargs={}) 
    584         retcode = process.poll() 
    585         if retcode: 
=>  586             raise CalledProcessError(retcode, process.args, output=output) 
    587     return output 
    588 
global CalledProcessError = <class 'subprocess.CalledProcessError'>, retcode = 1, process = <subprocess.Popen object>, process.args = '/home/nikos/public_html/cgi-bin/files.py', output = b'Content-type: text/html; charset=utf-8\n\n<bod...n position 74: surrogates not allowed\n\n-->\n\n' 
CalledProcessError: Command '/home/nikos/public_html/cgi-bin/files.py' returned non-zero exit status 1 
      args = (1, '/home/nikos/public_html/cgi-bin/files.py') 
      cmd = '/home/nikos/public_html/cgi-bin/files.py' 
      output = b'Content-type: text/html; charset=utf-8\n\n<bod...n position 74: surrogates not allowed\n\n-->\n\n' 
      returncode = 1 
      with_traceback = <built-in method with_traceback of CalledProcessError object>
I have been stuck on this for 5 days; please, someone help me. Thank you very much.
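A note on reading the traceback above: CalledProcessError keeps whatever the child printed before it exited in its output attribute, so the truncated bytes shown there can be read in full. What follows is only a minimal sketch. The child command is a hypothetical stand-in (a one-liner that prints something and exits with status 1), and run_and_capture is an illustrative helper name that does not exist in either script.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (returncode, decoded output), even when it fails."""
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        return 0, out.decode('utf-8', errors='replace')
    except subprocess.CalledProcessError as e:
        # e.output holds everything the child wrote before it exited.
        return e.returncode, e.output.decode('utf-8', errors='replace')


# Hypothetical stand-in for the failing CGI script.
child = [sys.executable, '-c',
         "import sys; print('partial output'); sys.exit(1)"]

code, text = run_and_capture(child)
print(code)
print(text)
```

In metrites.py the same try/except around the check_output call would reveal the full error message that files.py printed before it exited with status 1.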
 
Old 06-03-2013, 02:10 PM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Please explain what you need in more detail. What exactly is this code supposed to do? What is it actually doing? Give us a little background context on what these scripts are, some examples of the input, the desired output, and the actual output.

I'm not a python person myself, but I'm sure someone here can help you.
 
Old 06-04-2013, 01:11 AM   #3
NikosGr
Member
 
Registered: Jun 2013
Posts: 63

Original Poster
Rep: Reputation: 0
Yes, I am sorry, I should have given more insight. Here it is.

Here is the whole code of files.py (I initially posted a snippet, but later thought that was not enough). It is responsible for printing the Greek filenames and letting the user pick one file to download.

http://pastebin.com/qXasy5iU

And here is the metrites.py snippet that runs files.py via subprocess. The failure occurs when subprocess calls files.py, which in turn tries to print filenames containing Greek letters:

    if page.endswith('.html'):
        with open('/home/nikos/public_html/' + page, encoding='utf-8') as f:
            htmldata = f.read()
        htmldata = htmldata % (quote, music)
        template = htmldata + counter
    elif page.endswith('.py'):
        htmldata = subprocess.check_output( '/home/nikos/public_html/cgi-bin/' + page )
        template = htmldata.decode('utf-8') + counter
    print( template )

Here is the error that appears: http://pastebin.com/iTg0mJbF

I have been stuck on this for 5 days; please, someone help me. Thank you very much.

If I am right, the solution is to fix the file names so that they are all valid UTF-8. When I view the directory containing these files in a file browser that supports UTF-8, I see file names containing mojibake.

==================================

So, I just renamed one file from 'Euxi tou Ihsou.mp3' => 'Eυχή του Ιησού.mp3', and here is how it appears in the directory listing in Chrome:

superhost.gr/data/apps/

It doesn't display the file name in proper Greek but as garbage characters (!@#$%^&...).

So, when files.py needs to handle that file, it cannot decode the name's bytes, as stored on the disk, as UTF-8.

So, how do I properly fix those filenames, if this is the problem?
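One possible way to fix such names is to read the directory as bytes, decode each name that is not valid UTF-8 with the legacy Greek encoding, and rename the file to the UTF-8 form. The sketch below is hedged: it assumes the names were written as ISO-8859-7 (Windows-1253 is another candidate and agrees with it on the plain Greek letters), and both to_utf8_name and rename_legacy_names are illustrative names, not functions from files.py. It defaults to a dry run so nothing is renamed until the proposed changes have been checked.

```python
import os


def to_utf8_name(raw, legacy_encoding='iso-8859-7'):
    """Decode a raw file name: UTF-8 if valid, else the assumed legacy encoding."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 was written as ISO-8859-7.
        return raw.decode(legacy_encoding)


def rename_legacy_names(directory, legacy_encoding='iso-8859-7', dry_run=True):
    """Return (old, new) byte-name pairs; rename on disk only if dry_run is False."""
    directory = os.fsencode(directory)
    changes = []
    for raw in os.listdir(directory):  # bytes path in, bytes names out
        fixed = to_utf8_name(raw, legacy_encoding).encode('utf-8')
        if fixed == raw:
            continue  # already valid UTF-8 (plain ASCII names included)
        src = os.path.join(directory, raw)
        dst = os.path.join(directory, fixed)
        if os.path.exists(dst):
            continue  # never overwrite an existing file
        changes.append((raw, fixed))
        if not dry_run:
            os.rename(src, dst)
    return changes
```

Calling rename_legacy_names('/home/nikos/www/data/apps/') first, and only then repeating the call with dry_run=False, keeps the rename reviewable. A name containing a byte that ISO-8859-7 leaves undefined would raise UnicodeDecodeError, which is a signal that the encoding assumption is wrong for that file.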
 
Old 06-04-2013, 02:37 PM   #4
NikosGr
Member
 
Registered: Jun 2013
Posts: 63

Original Poster
Rep: Reputation: 0
print( '''Content-type: text/html; charset=utf-8\n''' )

# Compute a set of current fullpaths
fullpaths = set()
path = "/home/nikos/www/data/apps/"

for root, dirs, files in os.walk(path):
    for fullpath in files:
        fullpaths.add( os.path.join(root, fullpath) )


I don't have to deal with the files' contents, only with the filenames themselves.

Code:
root@nikos [~]# ls -l /home/nikos/www/data/apps/
total 368548
drwxr-xr-x 2 nikos nikos 4096 Jun 4 14:49 ./
drwxr-xr-x 6 nikos nikos 4096 May 26 21:13 ../
-rwxr-xr-x 1 nikos nikos 13157283 Mar 17 12:57 100\ Mythoi\ tou\ Aiswpou.pdf*
-rwxr-xr-x 1 nikos nikos 29524686 Mar 11 18:17 Anekdotologio.exe*
-rw-r--r-- 1 nikos nikos 42413964 Jun 2 20:29 Battleship.exe
-rw-r--r-- 1 nikos nikos 236032 Jun 4 14:10 \323\352\335\370\357\365\ \335\355\341\355\ \341\361\351\350\354\374.exe
-rwxr-xr-x 1 nikos nikos 66896732 Mar 17 13:13 Kosmas\ o\ Aitwlos\ -\ Profiteies.pdf*
-rw-r--r-- 1 nikos nikos 51819750 Jun 2 20:04 Luxor\ Evolved.exe
-rw-r--r-- 1 nikos nikos 60571648 Jun 2 14:59 Monopoly.exe
-rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3
-rwxr-xr-x 1 nikos nikos 1788164 Mar 14 11:31 Online\ Movie\ Player.zip*
-rw-r--r-- 1 nikos nikos 5277287 Jun 1 18:35 O\ Nomos\ tou\ Merfy\ v1-2-3.zip
-rwxr-xr-x 1 nikos nikos 16383001 Jun 22 2010 Orthodoxo\ Imerologio.exe*
-rw-r--r-- 1 nikos nikos 6084806 Jun 1 18:22 Pac-Man.exe
-rw-r--r-- 1 nikos nikos 25476584 Jun 2 19:50 Scrabble.exe
-rwxr-xr-x 1 nikos nikos 49141166 Mar 17 12:48 To\ 1o\ mou\ vivlio\ gia\ to\ skaki.pdf*
-rwxr-xr-x 1 nikos nikos 3298310 Mar 17 12:45 Vivlos\ gia\ Atheofovous.pdf*
-rw-r--r-- 1 nikos nikos 1764864 May 29 21:50 V-Radio\ v2.4.msi
root@nikos [~]#
-------------------------------------------------
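The backslash sequences in the listing above are octal escapes for the raw bytes of each name, so the two escaped entries can be decoded by hand to check which encoding they were written in. This is a small sketch only: unescape_ls_name is an illustrative helper, and ISO-8859-7 is an assumption about how the names were created.

```python
import re

# Matches a backslash followed by three octal digits, or by any single byte.
_ESCAPE = re.compile(rb'\\([0-7]{3}|.)')


def unescape_ls_name(escaped):
    """Turn ls-style escapes (\\ooo octal, backslash-space) into raw bytes."""
    def repl(match):
        token = match.group(1)
        if len(token) == 3:
            return bytes([int(token.decode('ascii'), 8)])
        return token  # an escaped literal byte, such as a space
    return _ESCAPE.sub(repl, escaped.encode('ascii'))


listing_entry = r'\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3'
raw = unescape_ls_name(listing_entry)
print(raw)

# Assumption: the bytes are ISO-8859-7. If so, decoded holds readable Greek.
decoded = raw.decode('iso-8859-7')
```

The first byte is 0xC5, which is not a valid start of a UTF-8 sequence when followed by 0xF5. Under the ISO-8859-7 assumption the whole name decodes to 'Ευχή του Ιησού.mp3'.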


As you can see, the subdirectory 'apps' contains filenames in both English and Greek letters.
Are both kinds Unicode? Are the names of the actual files also stored as byte streams, much like the contents inside them?
If they are Unicode, then I really see no trouble when trying to run:

cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath, ) )

but this is what I have been getting for days now:

Code:
[Tue Jun 04 20:33:28 2013] [error] [client 46.12.95.59] ValueError: underlying buffer has been detached
[Tue Jun 04 20:33:28 2013] [error] [client 46.12.95.59]
[Tue Jun 04 20:33:28 2013] [error] [client 46.12.95.59] Original exception was:
[Tue Jun 04 20:33:28 2013] [error] [client 46.12.95.59] Traceback (most recent call last):
[Tue Jun 04 20:33:28 2013] [error] [client 46.12.95.59] File "files.py", line 72, in <module>
[Tue Jun 04 20:33:28 2013] [error] [client 46.12.95.59] cur.execute('''SELECT url FROM files WHERE url = %s''', (fullpath,) )
[Tue Jun 04 20:33:28 2013] [error] [client 46.12.95.59] File "/usr/local/lib/python3.3/site-packages/PyMySQL3-0.5-py3.3.egg/pymysql/cursors.py", line 108, in execute
[Tue Jun 04 20:33:28 2013] [error] [client 46.12.95.59] query = query.encode(charset)
[Tue Jun 04 20:33:28 2013] [error] [client 46.12.95.59] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcc5' in position 61: surrogates not allowed


What is the problem, in your opinion? Since everything is encoded in UTF-8 and I am using Python 3.3.2, what does this error mean?
Please tell me what to try; I am hopeless and very tired of this issue.
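For what it is worth, the '\udcc5' in that error is consistent with how Python 3 handles file names that are not valid in the filesystem encoding: each undecodable byte is mapped to a lone surrogate (the surrogateescape error handler), and such a string cannot be encoded to UTF-8 again with the default strict handler. The sketch below reproduces this without touching the disk. The byte string is a shortened stand-in for one of the names in the listing, and ISO-8859-7 is an assumption about the original encoding.

```python
# Shortened stand-in for a legacy-encoded name from the directory listing.
raw = b'\xc5\xf5\xf7\xde.mp3'

# Roughly what os.walk() returns for such a name under a UTF-8 locale:
# undecodable bytes become lone surrogates U+DC80..U+DCFF.
name = raw.decode('utf-8', 'surrogateescape')
print(ascii(name))  # ascii() avoids printing the surrogates themselves

# Strict UTF-8 encoding, as done by the database driver, then fails.
try:
    name.encode('utf-8')
    reason = None
except UnicodeEncodeError as e:
    reason = e.reason
print(reason)

# The original bytes are still recoverable, and can then be decoded with
# the encoding the name was really written in (assumed ISO-8859-7 here).
recovered = name.encode('utf-8', 'surrogateescape')
greek = recovered.decode('iso-8859-7')
```

So the strings themselves are Unicode, but the names on disk are bytes, and the two Greek-lettered names are not valid UTF-8 bytes. Either the files get renamed to UTF-8, or the script recovers the bytes as shown and decodes them with the right encoding before passing them to cur.execute or print.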
 
  

