{Converting the encoding} How to avoid using a file name more than once?
But the filename is used twice!
1. How can I reduce that to once?
2. How can I overwrite the input file with the output?
3. `file -bi` only guesses the text encoding, so it can be wrong! Is there a better alternative?
It's not really clear what you're trying to do. Also, I almost missed what you were trying to do in the command substitution. You should really use $() instead of backticks (`) for command substitution.
What are you trying to do? I'm guessing you're trying to detect the encoding of the current file and convert it to utf-8. I'd rather avoid guessing. Be explicit and please provide the sample contents of abc.srt.
EDIT:
I think I see what you're trying to do. Create a small function instead and add it to one of your RC files (e.g. ~/.bashrc).
Code:
# Convert a file to UTF-8 on stdout, using file's guess as the source encoding.
function toutf8() {
    iconv -f "$(file -bi "$1" | sed 's/^.*=//')" -t "utf-8" "$1"
}
Then you can execute it on the command line like so...
@OP: you can store the file name in a variable with no problem. The real problem is that encoding cannot be detected programmatically. Full stop. You can check whether the file is valid UTF-8, but even if it is, it could still be ISO-8859-x (with no telling which x between 1 and 16).
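The validity check mentioned above can be sketched by asking iconv to convert the file from UTF-8 to UTF-8 and looking only at the exit status. The function name is_valid_utf8 is made up for illustration, and as said, a pass does not prove the file was really written as UTF-8.

```shell
# is_valid_utf8 is a hypothetical helper name, not something from this thread.
# Succeeds (exit status 0) if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    # Read via stdin so odd file names cannot be mistaken for options;
    # discard the converted text and any error message.
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}
```

For example, `is_valid_utf8 abc.srt && echo "decodes as UTF-8"`.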
However, "file" cannot recognize the encoding; it just guesses!
If you want this to be general-purpose code, it needs more work: it should stop converting and print an error when the encoding cannot be recognized.
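A rough sketch of that idea, which also overwrites the input file and takes the file name only once. The name toutf8_inplace is made up here, and it assumes a version of file that has the --mime-encoding option and reports "binary" or "unknown-8bit" when it cannot make a guess. It still relies on file's guess, so it is only as good as that guess.

```shell
# toutf8_inplace is a hypothetical name; it converts "$1" to UTF-8 in place.
toutf8_inplace() {
    local src enc tmp
    src=$1
    # Ask file(1) for its guess; this is a heuristic, not a guarantee.
    enc=$(file -b --mime-encoding -- "$src") || return 1
    case $enc in
        binary|unknown*|'')
            echo "toutf8_inplace: cannot determine encoding of $src" >&2
            return 1 ;;
        utf-8|us-ascii)
            return 0 ;;   # already valid as UTF-8, nothing to do
    esac
    tmp=$(mktemp) || return 1
    # Convert into a temporary file first, so that a failed conversion
    # never truncates the original.
    if iconv -f "$enc" -t UTF-8 < "$src" > "$tmp"; then
        cat "$tmp" > "$src"   # overwrite contents, keep the file's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "toutf8_inplace: iconv failed on $src (guessed $enc)" >&2
        return 1
    fi
}
```

Usage would be `toutf8_inplace abc.srt`. Copying the result back with cat keeps the original permissions; a `mv` would be closer to atomic but replaces the file itself.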
In your experience, which one gives the most accurate results: file, ICU, or konwert?
Unfortunately, konwert only supports these languages: cs (Czech), de (German), el (Greek), eo (Esperanto), es (Spanish), fr (French), he (Hebrew), it (Italian), pl (Polish), pt (Portuguese), ru (Russian), and sv (Swedish).
konwert is the command-line tool and it works great for me. I just tried file -bi and it gives the wrong encoding on my test input. I don't know of any command-line tool based on ICU (which is a library), so you would probably have to write one yourself. But my experience using it in nodejs was generally positive. At least you know when ICU has failed to find the encoding.
I just found another tool in the Ubuntu repos called uchardet. It works (detects the encoding) fine on my test input.
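For anyone who wants to try it, here is a small wrapper sketch around uchardet. The function name detect_encoding is made up, and the check for "unknown" is an assumption about what uchardet prints when it gives up, which may differ between versions. The names it prints are usually accepted by iconv, but that is not guaranteed either.

```shell
# detect_encoding is a hypothetical helper name.
# Prints uchardet's guess for "$1", or fails if there is no usable guess.
detect_encoding() {
    local enc
    enc=$(uchardet "$1") || return 1
    case $enc in
        # Assumption: an undetected encoding shows up as empty output
        # or as a name containing "unknown".
        ''|*unknown*) return 1 ;;
    esac
    printf '%s\n' "$enc"
}
```

It could then be combined with iconv like `enc=$(detect_encoding abc.srt) && iconv -f "$enc" -t UTF-8 abc.srt`.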