LinuxQuestions.org
08-12-2014, 04:26 AM #1
vujanic86
LQ Newbie
Registered: Aug 2014
Posts: 3
File Name is not saved correctly for specific letters (probably encoding issue)


Hi,

We have Red Hat Enterprise Linux Server release 6.0 (Santiago)

I have two PDF files.
I copy them to the Red Hat server using FTP.
File names are:
Beogradska.pdf
ČarlijaČaplina.pdf

On the Linux system, the saved files are listed like this:
Beogradska.pdf (correct)
?arlija?aplina.pdf (NOT correct)

This is a big problem because I open these files through an HTTP server. The first file opens correctly, but for the second one the HTTP server reports that file "?arlija?aplina.pdf does not exist". That is true, because my file name is ČarlijaČaplina.pdf, not ?arlija?aplina.pdf as it is saved in Linux.
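To see why the HTTP server cannot find the file, it helps to look at the URL: a browser percent-encodes non-ASCII characters, and the server decodes them back to raw bytes before looking for a file with exactly those bytes. The sketch below uses Python's standard `urllib.parse` to encode the same name two ways. Windows-1250 (`cp1250`) is only an assumed example of a legacy Central European encoding; the thread does not say which encoding the FTP client used.

```python
from urllib.parse import quote, unquote_to_bytes

name = "ČarlijaČaplina.pdf"

# Percent-encode the name as UTF-8 (what most browsers send today).
utf8_url = quote(name, encoding="utf-8")
# Percent-encode the same name as Windows-1250 (assumed legacy encoding).
cp1250_url = quote(name, encoding="cp1250")

print(utf8_url)    # %C4%8Carlija%C4%8Caplina.pdf
print(cp1250_url)  # %C8arlija%C8aplina.pdf

# The server turns the URL back into raw bytes and looks for a file with
# exactly those bytes; the two spellings are different byte strings.
print(unquote_to_bytes(utf8_url) == unquote_to_bytes(cp1250_url))  # False
```

If the name on disk is stored in one encoding and the URL arrives in another, the lookup fails with a "does not exist" error even though the file is there.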

I have the same problem with every Serbian letter that appears in a file name (č, ć, etc.).


How can I save the file names correctly in Linux?
The LANG variable is: LANG=en_US.UTF-8

But I do not know whether that is the issue or something else. When a file is opened, its contents are displayed correctly.
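The LANG setting only tells programs how to interpret bytes; on disk, a file name is just a byte string. A minimal Python sketch, assuming (hypothetically) that the FTP client sent the name in Windows-1250 (`cp1250`), shows how the same name becomes different bytes and why a UTF-8 locale then prints "?" for bytes it cannot decode:

```python
name = "ČarlijaČaplina.pdf"

# The same text encoded two ways (cp1250 is an assumed client encoding).
utf8_bytes = name.encode("utf-8")
cp1250_bytes = name.encode("cp1250")

print(utf8_bytes[:3])    # b'\xc4\x8ca'  (Č takes two bytes in UTF-8)
print(cp1250_bytes[:3])  # b'\xc8ar'     (Č takes one byte in cp1250)

# Under a UTF-8 locale the lone 0xC8 byte is not a valid character, so tools
# substitute a placeholder: ls shows "?", Python uses U+FFFD.
shown = cp1250_bytes.decode("utf-8", errors="replace").replace("\ufffd", "?")
print(shown)  # ?arlija?aplina.pdf
```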
 
08-12-2014, 08:35 AM #2
jpollard
Senior Member
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,654
Reputation: 1255
The problem appears to be an incomplete UTF-8 implementation somewhere along the way. Your LANG setting is en_US.UTF-8, and the "?" in "?arlija?aplina" means the locale does not know how to display those particular bytes, so it prints "?" instead. The correct bytes should still be in the file name; they are just not recognized on output.

You should be able to see the original bytes with "ls -b" (long form "ls --escape"), which prints the unrecognized characters as octal escape codes. You can then match these numbers against a UTF-8 table to see what the characters should be.
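To check many names at once, the same idea can be sketched in Python: read the directory with a bytes path so the locale decodes nothing, print non-ASCII bytes as octal escapes (similar in spirit to `ls -b`, though not guaranteed to match its output exactly), and flag names that are not valid UTF-8. The function names `escape_name` and `is_valid_utf8` are illustrative, not from any tool.

```python
import os
import sys


def escape_name(raw: bytes) -> str:
    """Show printable ASCII as-is and every other byte as a C-style octal escape."""
    out = []
    for b in raw:
        if 0x20 <= b < 0x7F and b != ord("\\"):
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


def is_valid_utf8(raw: bytes) -> bool:
    """True if the raw name decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    # A bytes path makes os.listdir return raw bytes names, untouched by LANG.
    for raw in sorted(os.listdir(os.fsencode(directory))):
        status = "utf-8" if is_valid_utf8(raw) else "NOT utf-8"
        print(f"{escape_name(raw):40} {status}")
```

A UTF-8 "Č" shows up as `\304\214`, while a single-byte legacy encoding would show one escape such as `\310` and be flagged as not UTF-8.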

Since you are using FTP to copy the files, the language context may be mismatched in several places: on the server and on the client. Because the files are transferred as-is, the names SHOULD contain the same bytes as the original file names. There can also be issues with Apache not handling the encoding properly.
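If the stored names turn out to be in a legacy encoding, they could be converted to UTF-8 on the server. This is a hypothetical sketch that assumes the client sent Windows-1250 (`cp1250`); substitute whatever encoding the bytes actually match. It defaults to a dry run that only prints the planned renames.

```python
import os


def recode_name(raw: bytes, assumed_source: str = "cp1250") -> bytes:
    """Return a UTF-8 version of a raw file name.

    Names that already decode as UTF-8 are returned unchanged; anything else
    is assumed to be in `assumed_source` and re-encoded as UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(assumed_source).encode("utf-8")


def recode_directory(directory: str, assumed_source: str = "cp1250",
                     dry_run: bool = True) -> None:
    """Rename non-UTF-8 entries in `directory` (prints only when dry_run)."""
    base = os.fsencode(directory)
    for raw in os.listdir(base):
        new = recode_name(raw, assumed_source)
        if new == raw:
            continue
        src, dst = os.path.join(base, raw), os.path.join(base, new)
        if os.path.exists(dst):
            print(f"skip {raw!r}: target already exists")
            continue
        print(f"{raw!r} -> {new.decode('utf-8')}")
        if not dry_run:
            os.rename(src, dst)


if __name__ == "__main__":
    recode_directory(".")  # dry run: prints planned renames only
```

Treat the "already valid UTF-8" test as a heuristic: a legacy-encoded name can occasionally form valid UTF-8 by accident, and `decode` raises `UnicodeDecodeError` for bytes that `cp1250` does not define.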

For a quick lookup, I like the search box at http://unicode-table.com/en/.
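The same lookup can also be done offline with Python's standard `unicodedata` module. A minimal sketch that decodes escaped UTF-8 bytes and names each non-ASCII character (the helper name `describe` is illustrative):

```python
import unicodedata


def describe(raw: bytes) -> list[str]:
    """Decode UTF-8 bytes and describe every non-ASCII character."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in raw.decode("utf-8")
        if ord(ch) > 0x7F
    ]


# Octal escapes as they would appear for "Č" in UTF-8: \304\214
print(describe(b"\304\214arlija\304\214aplina.pdf"))
# ['U+010C LATIN CAPITAL LETTER C WITH CARON', 'U+010C LATIN CAPITAL LETTER C WITH CARON']
```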

Last edited by jpollard; 08-12-2014 at 08:40 AM.
All times are GMT -5.