1) I have an old hard disk with file names that have accented characters. For example á é ã ô. When I browse that old disk, those characters are truncated.
I thought they should have been encoded in iso-8859-1 because I was a big fan of iso-8859-1 at the time, so I tried to convert them like this:
$ convmv -f iso-8859-1 -t utf8 ./
It didn't work. Neither did cp1252.
So I ran a loop over the output of `convmv --list` and did a dry run with each of the possible encodings as the -f option. None of them worked.
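The same trial-and-error can be done on the raw bytes of a single name, without touching the disk. What follows is only a minimal sketch: the candidate list and the sample bytes are assumptions chosen for illustration, and `decode_attempts` is a hypothetical helper name.

```python
# Candidate source encodings to try; this list is an assumption, extend as needed.
CANDIDATES = ["utf-8", "iso-8859-1", "cp1252", "cp850", "cp437"]


def decode_attempts(raw: bytes, candidates=CANDIDATES) -> dict:
    """Map each candidate encoding to the decoded text, or None if decoding fails."""
    results = {}
    for encoding in candidates:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


if __name__ == "__main__":
    # Hypothetical sample: "ORTOGRÁFICO" with Á stored as the single byte 0xC1.
    sample = b"ORTOGR\xc1FICO"
    for encoding, text in decode_attempts(sample).items():
        print(f"{encoding:12} {text!r}")
```

An encoding that decodes without error is not necessarily the right one: the single-byte codepages accept every byte, so the result still has to be judged by eye.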
Question number 1: is there some way to convert and correct those truncated characters in file names?
2) Then I used rsync to copy some very new files to the old disk, and I noticed that many (not all) file names were written enclosed in 'hard quotes' and not all of those have accented characters.
Question number 2: Why are those hard quotes being added to the file names? Is it because the old hard disk is formatted with a different encoding?
3) Question number 3: Does "file system encoding" even exist? I thought the OS had its encoding enforced at run time, but file systems were agnostic.
I would say this is less about encoding, and more about locales. Presuming the Portuguese of your location, I would suggest you stay away from iso-8859-1 & cp1252, which are legacy single-byte Western European codepages.
Get information. Look at 'man locale' and get some lists up. There's no problem setting up a Portuguese user with a different locale for that stuff. You need the correct locale to see the correct names.
Generally, you want to get on Unicode. I have hacked the Irish keymap to get characters not on my keymap. There are some ingenious character definitions like 'dead_abovedot', because old Irish uses an abovedot. I have that mapped to AltGr & h. I press that, and nothing is printed; but the next letter I type has the abovedot, e.g. ḃ.
Last edited by business_kid; 12-28-2022 at 02:13 PM.
OK, thanks, but UTF-8 has been my locale for more than 10 years and everything works fine... usually. This very old backup hard disk is an exception.
I also know how to map weird characters. I have a bunch mapped to my right-most Windows key for Russian. I was interested in learning Russian a couple of years ago, but gave up on that very fast. Anyway, my real problem now is recursively fixing a large number of broken file names.
What do you mean by 'truncated'? Are they missing? Garbled? Replaced with non-accented characters? Can you show an example of ls and ls | hexdump? In general, there is no such thing as 'filesystem encoding': filenames are stored as plain bytes and interpreted according to the current session-level locale, so you have to guess the correct one and set the LANG/LC_* variables accordingly.
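For anyone who wants to look at the stored bytes without the locale getting in the way, here is a rough Python counterpart to piping ls into hexdump. It is a sketch only: `describe_name` and `dump_directory` are hypothetical helper names, and `bytes.hex` with a separator needs Python 3.8 or later.

```python
import os


def describe_name(raw: bytes) -> str:
    """Return a hex dump of a filename plus whether it is valid UTF-8."""
    try:
        raw.decode("utf-8")
        verdict = "valid UTF-8"
    except UnicodeDecodeError:
        verdict = "not valid UTF-8"
    return f"{raw.hex(' ')}  ({verdict})"


def dump_directory(path: str = ".") -> None:
    """Print every entry of `path` as raw bytes, independent of the session locale."""
    # Passing bytes to os.listdir makes it return bytes, so nothing is decoded.
    for raw in sorted(os.listdir(os.fsencode(path))):
        print(describe_name(raw))


if __name__ == "__main__":
    dump_directory(".")
```

A lone byte such as c1 where an accented letter should be points to a single-byte encoding; a pair such as c3 81 means the name is already UTF-8.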
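Re Q2: recent versions of GNU ls, when writing to a terminal, add single quotes around filenames containing spaces or other characters the shell treats specially, since such names would otherwise be split into multiple parts in subsequent processing. The quotes are not part of the name on disk. Use 'ls -N' to list literal file names.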
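To see why only some names pick up quotes, Python's `shlex.quote` can stand in for the idea. This is an approximation, not what ls runs: ls has its own quoting rules, which differ in detail, and the sample names and the `needs_quoting` helper are made up for illustration.

```python
import shlex


def needs_quoting(name: str) -> bool:
    """True if shlex.quote would wrap `name` in quotes to keep it as one shell word."""
    return shlex.quote(name) != name


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    for name in ["tabela.doc", "2009 ACORDO - tabela.doc", "notes(old).txt"]:
        # shlex.quote returns safe names unchanged and wraps the rest in '...'.
        print(shlex.quote(name))
```

The first name comes back bare and the other two come back wrapped in single quotes, while the strings themselves never change. That matches the rsync observation: the quoting depends on spaces and special characters, not on accents or on the disk.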
On 'ls' output, it shows as '2009 ACORDO ORTOGR?FICO 2009 - tabela.doc'
The correct form would be '2009 ACORDO ORTOGRÁFICO 2009 - tabela.doc'
I suspected there is no such thing as filesystem encoding. Now I see what happened is that I used a different locale at the time the file was written. Most likely iso-8859-1.
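The `?` is consistent with that explanation: in ISO-8859-1 the letter Á is the single byte 0xC1, which is not valid UTF-8, so a UTF-8 session has nothing to display for it. A small sketch of the effect, assuming the file really was written under ISO-8859-1 (the helper name is hypothetical):

```python
def as_listed_in_utf8_session(raw: bytes) -> str:
    """Approximate what a UTF-8 session displays for a raw filename."""
    # Bytes that are not valid UTF-8 become U+FFFD; show them as "?" like ls does.
    return raw.decode("utf-8", errors="replace").replace("\ufffd", "?")


if __name__ == "__main__":
    # The name from the post, encoded the way an ISO-8859-1 session would store it
    # (an assumption about how the file was originally written).
    stored = "2009 ACORDO ORTOGRÁFICO 2009 - tabela.doc".encode("iso-8859-1")
    print(as_listed_in_utf8_session(stored))  # the "?" form seen in ls
    print(stored.decode("iso-8859-1"))        # the intended name
```

Nothing on the disk is damaged in this scenario; the bytes are intact and only their interpretation is off.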
Glad you know your way around. Are you sure you're not just viewing them in the wrong codepage, locale or whatever? If the files are named wrongly, why not rename them? Are you sure the disk is OK?
While I think of it, Codepage 437 used to be an old chestnut from DOS. It was one of the standard lines in autoexec.bat (if memory serves correctly). That might have varied in other places. I didn't see CP-1252 until Windows 95(?).
There are too many of them in directories and subdirectories. Doing that manually would be a good punishment sentence.
There is no DOS or Windows involved. It's all Linux, ext3. I believe I used Ubuntu at the time, but could have been Slackware.
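For the recursive part, convmv can do the job itself: `-r` recurses into subdirectories, and it only reports what it would do until `--notest` is given. As an alternative, below is a hedged Python sketch of the same idea. It assumes the old names are ISO-8859-1, treats any name that already decodes as UTF-8 as correct, and defaults to a dry run; `to_utf8` and `fix_tree` are hypothetical names, not part of any tool.

```python
import os


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Return `raw` re-encoded as UTF-8, or unchanged if it already is UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")


def fix_tree(root: str, source_encoding: str = "iso-8859-1", apply: bool = False):
    """Rename non-UTF-8 entries under `root`; only report them unless apply=True."""
    planned = []
    # Walk with a bytes path so names arrive undecoded; go bottom-up so that
    # renaming a directory never invalidates paths still waiting to be visited.
    for parent, dirs, files in os.walk(os.fsencode(root), topdown=False):
        for raw in dirs + files:
            fixed = to_utf8(raw, source_encoding)
            if fixed == raw:
                continue
            old, new = os.path.join(parent, raw), os.path.join(parent, fixed)
            if os.path.lexists(new):
                continue  # never overwrite an existing entry
            planned.append((old, new))
            if apply:
                os.rename(old, new)
    return planned


if __name__ == "__main__":
    # Dry run on the current directory: print the planned renames only.
    for old, new in fix_tree("."):
        print(old, "->", new)
```

Review the dry-run list first and call `fix_tree(path, apply=True)` only once it looks right. A name whose legacy bytes happen to form valid UTF-8 would be skipped by this check, which should be rare for Portuguese text.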