Bash: Trouble converting files from dos to unix format
TITLE
ABS Ch16-19DictLookupDef
INTRO
I am reading the Advanced Bash-Scripting Guide by Mendel Cooper. Chapter 16, "External Filters, Programs and Commands", contains Example 16-19, "Looking up definitions in Webster's 1913 Dictionary", which asks the reader to download Webster's Dictionary 1913 (1st 100 pages). I first tried this program, but it didn't work for me and gave the following error:
========================================================================
Problem 1: [[: not found
========================================================================
Code:
ll /usr/share/dict/webster1913-dict.txt
I quickly fixed this specific error by
========================================================================
Problem 2: dictionary - non-ASCII characters
========================================================================
Before running this program, I check /usr/share/dict/webster1913-dict.txt:
Code:
ll /usr/share/dict/webster1913-dict.txt
Workaround 2
I then fix the entries for about 4 words, along with their definitions, and run the program.
Code:
$ /usr/local/bin/practice/ABS/Ch16-19DictLookupDef3.sh Ape
PREMISES
I guess I had to make these changes because of the comment on lines 9-10: "Convert it from DOS to UNIX format (with only LF at end of line) before using it with this script.".
QUESTION
How do I convert it to UNIX format? I tried the utility dos2unix below, but it says that the dictionary file downloaded from Project Gutenberg is a binary file!?
Code:
$ dos2unix webster1913-dict.txt
If you cat it and get that strange text, then it doesn't seem like that's a plain text file.
Usually, to convert an MS-DOS text file to a UNIX text file, you'd just have to do something like this with it:
Code:
#!/usr/bin/perl
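For comparison, the line-ending part of such a conversion can also be done in plain shell. This is only a minimal sketch and not the Perl script from the reply: the function name `crlf_to_lf` and the demo file names are made up for illustration, and it only strips the carriage return at the end of each line. It does nothing about character encoding.

```shell
#!/bin/sh
# Hypothetical helper: copy a file, dropping the CR from each CRLF line ending.
# Usage: crlf_to_lf INPUT OUTPUT
crlf_to_lf() {
    # Build a literal CR so the sed expression stays portable
    # (not every sed understands '\r').
    cr=$(printf '\r')
    sed "s/${cr}\$//" "$1" > "$2"
}

# Demo on a throwaway file with DOS line endings.
printf 'Ape\r\nApple\r\n' > demo-dos.txt
crlf_to_lf demo-dos.txt demo-unix.txt
```

Writing to a second file rather than editing in place keeps the original around in case the result is not what was expected.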
Another issue with many "DOS" files is that the encoding they use may be different from the encoding you use.
You can try to find the encoding with the command 'file':
Code:
file /my/file.txt
and then convert it with 'iconv', for example:
Code:
iconv -f ISO-8859-15 -t UTF-8 -o /my/file.utf8.txt /my/file.txt
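Putting the two commands together might look like the sketch below. It assumes a `file` that supports `-b --mime-encoding` and an `iconv` that accepts `-f`/`-t`; the helper name `to_utf8` and the demo file names are hypothetical. Keep in mind that `file` only guesses. When it reports `binary` or `unknown-8bit` there is no encoding name to pass to iconv, so the helper gives up in that case.

```shell
#!/bin/sh
# Hypothetical helper: guess a file's encoding with file(1), then convert to UTF-8.
# Usage: to_utf8 INPUT OUTPUT
to_utf8() {
    enc=$(file -b --mime-encoding "$1")
    case $enc in
        binary|unknown-8bit|"")
            echo "cannot guess encoding of $1 (file says: $enc)" >&2
            return 1 ;;
    esac
    iconv -f "$enc" -t UTF-8 "$1" > "$2"
}

# Demo: "cafe" with an e-acute, encoded as ISO-8859-1 (byte 0xE9, octal 351).
printf 'caf\351\n' > demo-latin1.txt
to_utf8 demo-latin1.txt demo-utf8.txt
```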
no success
Quote:
Code:
$ ll 247*.txt
bash
Quote:
Code:
[[: not found
How dependable is "sh" for testing backward compatibility with older machines?
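For what it's worth, `[[: not found` is the message a plain POSIX shell such as dash prints when it meets the bash-only `[[ ... ]]` construct. Whether that is what happened here is an assumption, since the exact invocation isn't shown. Below is a sketch of the same kind of test written portably, so it behaves the same under `sh` and `bash`. The function name and the capital-letter check are illustrative and not copied from the ABS example.

```shell
#!/bin/sh
# Portable replacements for common [[ ... ]] tests; meant to run under sh, dash and bash.

# Hypothetical check: succeed if the word starts with an uppercase letter.
# A bash-only form would be:  [[ $1 == [[:upper:]]* ]]
starts_with_capital() {
    case $1 in
        [[:upper:]]*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Plain [ ... ] works everywhere, as long as variables are quoted.
word=Ape
if [ -n "$word" ] && starts_with_capital "$word"; then
    echo "looking up: $word"
fi
```

Running a script as `sh script.sh` ignores a `#!/bin/bash` first line, so a script that really needs bash features should be started with `bash script.sh` or executed directly.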
attempt of Laserbreak's solution
Quote:
Below is my attempt at your proposal:
Code:
$ vim convert_msdos-UNIX.sh
Code:
$ ll convert_msdos-UNIX.sh
-rw-rw-r-- 1 a a 55 6月 9 15:37 convert_msdos-UNIX.sh
Ramurd's solution - Attempt
Quote:
Ramurd, below is my attempt at your proposal:
Code:
$ file 247-0.txt
Code:
$ iconv -f data -t UTF-8 -o ./247-0.txt ./247-0-UTF8.txt
Any ideas what to do for format type 'data'?
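Two notes on the command above. `data` is what `file` prints when it cannot classify the contents; it is not an encoding name, so `iconv -f data` has nothing to work with. Also, with `iconv` the `-o` option names the output file and the remaining argument is the input, so the two file names look swapped. One way to test a guess is to let iconv read the file as a given encoding and throw the result away. A sketch, with a hypothetical helper name `is_valid_as` and made-up demo files:

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE decodes cleanly as ENCODING.
# Usage: is_valid_as ENCODING FILE
is_valid_as() {
    iconv -f "$1" -t UTF-8 "$2" > /dev/null 2>&1
}

# Demo files: one valid UTF-8, one with a lone Latin-1 byte (0xE9).
printf 'caf\303\251\n' > demo-good.txt
printf 'caf\351\n'     > demo-bad.txt

for f in demo-good.txt demo-bad.txt; do
    if is_valid_as UTF-8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

Single-byte encodings such as ISO-8859-1 accept every byte value, so a success there proves little; the check is mostly useful for UTF-8. `iconv -l` lists the encoding names your iconv knows.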
@OP: Sorry, I've lost track somewhere. What is the actual question? If it is related to a file, examine it with a hex viewer, e.g.:
Code:
echo 'árvíztűrő tükörfúrógép' >sample
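A sketch of the idea using `od`, which is part of POSIX; any hex viewer would do. The demo file name is a throwaway. With `-c`, a DOS line ending shows up as `\r` followed by `\n`, and in the C locale bytes outside ASCII show up as octal numbers.

```shell
#!/bin/sh
# Show the raw bytes of a small sample so line endings and
# non-ASCII bytes become visible.
printf 'Ape\r\ncaf\303\251\n' > demo-bytes.txt

# -An: no offset column; -c: printable characters and C-style escapes.
LC_ALL=C od -An -c demo-bytes.txt

# -tx1: the same bytes as two-digit hex values.
od -An -tx1 demo-bytes.txt
```

On a large file such as the dictionary, piping the file through `head -c 256` first keeps the dump readable.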
Before using a hex viewer, see if the magic number gives you information on the file. The command is:
Code:
file /usr/share/dict/webster1913-dict.txt
The issue here is that dos2unix will do the conversion, but it assumes that the file IS a DOS text file. This file appears to have an encoding that is not the simple text that these utilities assume. You will need to find out WHAT it is to determine what conversions or mappings can be done.
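Before reaching for a converter, it can help to measure what is actually in the file. The sketch below counts lines that end in a carriage return and lines that contain any byte outside plain ASCII. The helper names and the demo file are hypothetical. If the first count is zero, the line endings are already UNIX-style and the trouble lies in the encoding.

```shell
#!/bin/sh
# Hypothetical helpers for a quick look at a text file's contents.

# Count lines ending in CR (i.e. DOS-style CRLF line endings).
count_crlf_lines() {
    cr=$(printf '\r')
    grep -c "${cr}\$" "$1"
}

# Count lines containing at least one byte outside printable ASCII/whitespace.
# LC_ALL=C makes grep treat the file as single bytes.
count_non_ascii_lines() {
    LC_ALL=C grep -c -v '^[[:print:][:space:]]*$' "$1"
}

printf 'Ape\r\ncaf\303\251\r\nApple\n' > demo-mixed.txt
count_crlf_lines demo-mixed.txt        # lines with CRLF endings
count_non_ascii_lines demo-mixed.txt   # lines with non-ASCII bytes
```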
Quote:
It should be no great surprise that an old dictionary contains non-ASCII characters showing how words are pronounced.
Quote:
The question then becomes "can the apps you are using properly use a dictionary file in this particular data format?" and if the answer is "no" then you have a more interesting problem. You may need to switch to an app that can use this dictionary, find a dictionary for your app, or find a converter SPECIFIC to the dictionary formats to do the conversion.