Old 06-07-2017, 10:42 AM   #1
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311
Blog Entries: 2

Rep: Reputation: 16
Question Bash: Trouble converting files from dos to unix format


TITLE
ABS Ch16-19DictLookupDef

INTRO
I am reading the Advanced Bash-Scripting Guide by Mendel Cooper. Chapter 16, "External Filters, Programs and Commands", contains Example 16-19, "Looking up definitions in Webster's 1913 Dictionary", which asks the reader to download Webster's 1913 Dictionary (1st 100 pages).

I first tried this program, but it failed with the following error:

========================================================================
Problem 1: [[: not found
========================================================================
Code:
	ll /usr/share/dict/webster1913-dict.txt 
	-rw-rw-r-- 1 a a 1.5M  3月 23  2012 /usr/share/dict/webster1913-dict.txt
	
	sh Ch16-19DictLookupDef2.sh Abbey
	Ch16-19DictLookupDef2.sh: 20: Ch16-19DictLookupDef2.sh: [[: not found
	1st parameter detected!
Workaround 1
I quickly fixed this specific error by
  1. changing all double brackets into single brackets
  2. changing the "Definition"
    FROM
    Code:
    Definition=$(fgrep -A $MAXCONTEXTLINES "$1 \\" "$dictfile")
    TO
    Code:
    Definition=$(fgrep -A $MAXCONTEXTLINES "$1" "$dictfile")
{SOLVED}
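
For reference, the lookup used in the workaround above could be sketched as follows. This is only an illustration: `lookup_definition` is a hypothetical helper name, the default of 50 context lines is an assumption (the script's MAXCONTEXTLINES may differ), and `grep -F` stands in for `fgrep`, which is the older spelling of the same fixed-string search.

```shell
#!/bin/sh
# Hypothetical helper: print the line containing WORD plus some context.
# Usage: lookup_definition WORD DICTFILE [CONTEXT_LINES]
lookup_definition() {
    word=$1
    dictfile=$2
    context=${3:-50}   # assumed default, not taken from the book's script

    # -F  treat WORD as a fixed string, not a regular expression
    # -A  also print CONTEXT lines after each matching line
    # --  end of options, so a WORD starting with "-" is not read as a flag
    grep -F -A "$context" -- "$word" "$dictfile"
}

# Example (path as used in this thread):
#   lookup_definition "Ape" /usr/share/dict/webster1913-dict.txt 20
```

Passing the word and the file as arguments keeps the helper independent of any global variables in the surrounding script.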


========================================================================
Problem 2: dictionary - non-ASCII characters
========================================================================
Before running the program again, I checked /usr/share/dict/webster1913-dict.txt:
Code:
ll /usr/share/dict/webster1913-dict.txt 
-rw-rw-r-- 1 a a 1.5M  3月 23  2012 /usr/share/dict/webster1913-dict.txt
I then ran cat on it and looked at some of its content. Strangely, it does not consist solely of ASCII characters. Examples taken from the first 66 lines:
  1. ½
  2. ØAb¶aÏca (?),
  3. AÏbac¶iÏnate

Workaround 2
I then fixed the content by hand for about four words, along with their definitions, and ran the program.

Code:
$ /usr/local/bin/practice/ABS/Ch16-19DictLookupDef3.sh Ape
1st parameter detected!
Ape (?), n. [AS. apa; akin to D. aap, OHG. affo, G. affe, Icel. api, Sw. apa, Dan. abe, W. epa.] 1. (Zo”l.) A quadrumanous mammal, esp. of the family Simiad‘, having teeth of the same number and form as in man, having teeth of the same number and form as in man, and possessing neither a tail nor cheek pouches. The name is applied esp. to species of the genus Hylobates, and is sometimes used as a general term for all Quadrumana. The higher forms, the gorilla, chimpanzee, and ourang, are often called anthropoid apes or man apes.
 The ape of the Old Testament was probably the rhesus monkey of India, and allied forms.
========================================================================

PREMISES
I guess I had to make these changes because of the comment on lines 9-10 of the script: "Convert it from DOS to UNIX format (with only LF at end of line) before using it with this script."

QUESTION
How do I convert it to UNIX format? I tried the dos2unix utility below, but it says that the dictionary file downloaded from Project Gutenberg is a binary file!?
Code:
	$ dos2unix webster1913-dict.txt
	dos2unix: Binary symbol 0x15 found at line 35
	dos2unix: Skipping binary file webster1913-dict.txt
Is there a better way of fixing this than my workaround above for Example 16-19, "Looking up definitions in Webster's 1913 Dictionary"?

Last edited by andrew.comly; 06-09-2017 at 12:04 AM.
 
Old 06-07-2017, 12:02 PM   #2
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792

Rep: Reputation: 331
  1. Try the -f option of dos2unix https://waterlan.home.xs4all.nl/dos2....htm#f---force
  2. Load the file into vim then do:
    Code:
    :set ff=unix
    :w
 
Old 06-07-2017, 12:16 PM   #3
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,779

Rep: Reputation: 2212
Quote:
Originally Posted by andrew.comly View Post
I first tried this program, but it didn't work for me giving the following error:

========================================================================
Problem 1: [[: not found
========================================================================
Code:
	ll /usr/share/dict/webster1913-dict.txt 
	-rw-rw-r-- 1 a a 1.5M  3月 23  2012 /usr/share/dict/webster1913-dict.txt
	
	sh Ch16-19DictLookupDef2.sh Abbey
	Ch16-19DictLookupDef2.sh: 20: Ch16-19DictLookupDef2.sh: [[: not found
	1st parameter detected!
You need to run that with "bash", not "sh". When invoked with the name "sh", bash tries to mimic historical versions of sh that do not support "[[ ... ]]".
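
One way to check which program `sh` really is, and to write the test so that it does not need `[[ ... ]]` at all, is sketched below. On Debian-derived systems `sh` is commonly a link to dash, which does not implement `[[`. The names `which_sh` and `starts_with_capital` are made up for this illustration and are not taken from the book's script.

```shell
#!/bin/sh
# Print the real program behind "sh" (for example /bin/dash or /bin/bash).
which_sh() {
    sh_path=$(command -v sh) || return 1
    # readlink -f follows symlinks; fall back to the plain path if it fails.
    readlink -f "$sh_path" 2>/dev/null || printf '%s\n' "$sh_path"
}

# POSIX-only stand-in for a [[ ... ]] test: succeed if $1 begins with an
# uppercase letter. "case" patterns work in dash, bash and other sh shells.
starts_with_capital() {
    case $1 in
        [[:upper:]]*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example:
#   which_sh
#   starts_with_capital "Abbey" && echo "looks like a headword"
```

Running the original script with `bash scriptname` remains the simpler fix when the script is meant to use bash features.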
 
Old 06-07-2017, 04:22 PM   #4
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
If you cat it and get that strange text, then it doesn't seem like that's a plain text file.


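Usually, to convert an MS-DOS text file to a UNIX text file, you'd just have to do something like this with it:

Code:
#!/usr/bin/perl
undef $/;        # slurp mode: read the whole input at once ('' would select paragraph mode)
$_ = <>;
s/\r\n/\n/g;     # turn every CRLF pair into a bare LF
print;
Save that as an executable file and run it like this:
Code:
./abovecode < msdos.txt > unix.txt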
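
The same CRLF-to-LF conversion can also be sketched in plain shell, which avoids any confusion about which interpreter to use. This is a minimal sketch, not a replacement for dos2unix: `crlf_to_lf` is a hypothetical name, and only a carriage return at the very end of a line is removed.

```shell
#!/bin/sh
# Strip a trailing carriage return from every line read on standard input.
# LC_ALL=C makes awk treat the input as plain bytes, so text in an unknown
# 8-bit encoding passes through without being interpreted.
crlf_to_lf() {
    LC_ALL=C awk '{ sub(/\r$/, ""); print }'
}

# Example (file names are placeholders):
#   crlf_to_lf < msdos.txt > unix.txt
```

Because it works on bytes, this changes only the line endings; any non-ASCII characters in the file are left exactly as they were.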
 
Old 06-08-2017, 11:17 AM   #5
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 703

Rep: Reputation: 111
Another issue with many "DOS" files is that the encoding they use may differ from the encoding you use.
You can try to find the encoding with the command 'file'
Code:
file /my/file.txt
You may get a result like this:
Quote:
/my/file.txt: ISO-8859 text, with very long lines, with CRLF line terminators
For example, if you use UTF-8 you can convert it with iconv:
Code:
iconv -f ISO-8859-15 -t UTF-8 -o /my/file.utf8.txt /my/file.txt
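
The two steps above can be wrapped into small helpers, sketched here under the assumption that file(1) and iconv(1) are installed. `guess_encoding` and `to_utf8` are hypothetical names; the encoding passed to `to_utf8` must be one that `iconv -l` lists.

```shell
#!/bin/sh
# Ask file(1) for its best guess at a file's character encoding.
# It prints a name such as "us-ascii", "iso-8859-1", "utf-8" or "binary".
guess_encoding() {
    file -b --mime-encoding "$1"
}

# Convert standard input from the named encoding to UTF-8.
# Usage: to_utf8 ISO-8859-15 < /my/file.txt > /my/file.utf8.txt
to_utf8() {
    iconv -f "$1" -t UTF-8
}
```

Note that file(1) only guesses. A report such as "ISO-8859 text" names a family rather than one exact character set, so the result of the conversion is worth checking by eye.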
 
Old 06-09-2017, 12:42 AM   #6
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311

Original Poster
Blog Entries: 2

Rep: Reputation: 16
Post no success

Quote:
Originally Posted by norobro View Post
  1. Try the -f option of dos2unix https://waterlan.home.xs4all.nl/dos2....htm#f---force
  2. Load the file into vim then do:
    Code:
    :set ff=unix
    :w
Thanks, below are my results with the above advice:
Code:
$ ll 247*.txt
-rw-rw-r-- 1 a a 1.5M  3月 22  2012 247-0.txt
$ dos2unix -f 247-0.txt
dos2unix: converting file 247-0.txt to Unix format ...
$ vim 247-0.txt
:set ff
{fileformat=unix              1,1           Top}
:wq
$ ll 247*.txt
-rw-rw-r-- 1 a a 1.5M  6月  9 13:21 247-0.txt
But the dictionary file still has non-ASCII characters in it, especially in the headwords, e.g.
  1. Ab¶botÏship (?), n. [Abbot + Ïship.] The state or office of an abbot.
  2. AbÏbre¶viÏate (?), v.t. [imp. & p.p. Abbreviated (?); p.pr. & vb.n. Abbreviating.] [L. abbreviatus, p.p. of abbreviare; ad + breviare to shorten, fr. brevis short. See Abridge.] 1. To make briefer; to shorten; to abridge; to reduce by contraction or omission, especially of words written or spoken.
    It is one thing to abbreviate by contracting, another by cutting off.
    Bacon.
    2. (Math.) To reduce to lower terms, as a fraction.
    AbÏbre¶viÏate (?), a. [L. abbreviatus, p.p.] 1. Abbreviated; abridged; shortened. [R.] ½The abbreviate form.¸
    Earle.
    2. (Biol.) Having one part relatively shorter than another or than the ordinary type.
  3. AbÏbre¶viÏate, n. An abridgment. [Obs.]
    Elyot.
  4. AbÏbre¶viÏa·ted (?), a. Shortened; relatively short; abbreviate.
  5. AbÏbre·viÏa¶tion (?), n. [LL. abbreviatio: cf. F. abbr‚viation.] 1. The act of shortening, or reducing.
    2. The result of abbreviating; an abridgment.
    ...
  6. AbÏbre¶viÏa·tor (?), n. [LL.: cf. F. abbr‚viateur.] 1. One who abbreviates or shortens.
    2. One of a college of seventyÐtwo officers of the papal court whose duty is to make a short minute of a decision on a petition, or reply of the pope to a letter, and afterwards expand the minute into official form.
 
Old 06-09-2017, 12:52 AM   #7
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311

Original Poster
Blog Entries: 2

Rep: Reputation: 16
Smile bash

Quote:
Originally Posted by rknichols View Post
You need to run that with "bash", not "sh". When invoked with the name "sh", bash tries to mimic historical versions of sh that do not support "[[ ... ]]".
Thanks a lot. Now when I run the original version of this program with bash, there is no more
Code:
[[: not found
error message.

How dependable is "sh" to test backward compatibility with older machines?
 
Old 06-09-2017, 08:50 AM   #8
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,779

Rep: Reputation: 2212
Quote:
Originally Posted by andrew.comly View Post
How dependable is "sh" to test backward compatibility with older machines?
Going back through history, there have been a lot of programs that have been called "sh". There is no way a simple switch could make bash behave exactly like all of them.
 
Old 06-09-2017, 06:41 PM   #9
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311

Original Poster
Blog Entries: 2

Rep: Reputation: 16
Question Attempt at Laserbeak's solution

Quote:
Originally Posted by Laserbeak View Post
Usually to convert a MS-DOS text file to a UNIX text file, you'd just have to do something like this with it:
Code:
#!/usr/bin/perl
$/ = ''; 
$_ = <>;
s/\r\n/\n/gs;
print;
./abovecode < msdos.txt > unix.txt
Laserbeak,

Below is my attempt at your proposal:
Code:
$ vim convert_msdos-UNIX.sh
#!/usr/bin/perl
$/ = ''; 
$_ = <>;
s/\r\n/\n/gs;
print;
{:wq}

Code:
$ ll convert_msdos-UNIX.sh
-rw-rw-r-- 1 a a 55  6月  9 15:37 convert_msdos-UNIX.sh
$ chmod +x convert_msdos-UNIX.sh
$ bash convert_msdos-UNIX.sh < 247-0.txt > 247-0_unix.txt
convert_msdos-UNIX.sh: line 2: $/: No such file or directory
convert_msdos-UNIX.sh: line 3: syntax error near unexpected token `;'
convert_msdos-UNIX.sh: line 3: `$_ = <>;'
{No Success Yet}

Last edited by andrew.comly; 06-09-2017 at 06:44 PM. Reason: accuracy
 
Old 06-09-2017, 06:48 PM   #10
andrew.comly
Member
 
Registered: Dec 2012
Distribution: Trisquel-Mini 7.0, Lubuntu 14.04, Debian lxde 8.0
Posts: 311

Original Poster
Blog Entries: 2

Rep: Reputation: 16
Question Ramurd's solution - Attempt

Quote:
Originally Posted by Ramurd View Post
You can try to find the encoding with the command 'file'
Code:
file /my/file.txt
may get this result:


For example, if you use UTF-8 you can convert it with iconv:
Code:
iconv -f ISO-8859-15 -t UTF-8 -o /my/file.utf8.txt /my/file.txt
_____________________________________
Ramurd,

Below is my attempt at your proposal:
Code:
$ file 247-0.txt 
247-0.txt: data
The type 'data' is not what your proposal expects, but reasoning from the syntax alone, I then substituted 'data' for 'ISO-8859':
Code:
$ iconv -f data -t UTF-8 -o ./247-0.txt ./247-0-UTF8.txt
iconv: conversion from `data' is not supported
Try `iconv --help' or `iconv --usage' for more information.
{No Success Yet}

Any ideas what to do for format type 'data'?
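
For what it is worth, `data` is what file(1) prints when it cannot classify the contents, so it is not an encoding name that iconv accepts. One possibility, and it is only a guess based on how the garbled characters look, is that the file uses an old DOS code page such as CP437. The sketch below previews a small sample under several candidate encodings so the result can be inspected before converting the whole file. `preview_encodings` is a hypothetical helper name.

```shell
#!/bin/sh
# Show the same sample file converted from each candidate encoding to UTF-8.
# Usage: preview_encodings SAMPLE_FILE ENCODING [ENCODING...]
preview_encodings() {
    sample=$1
    shift
    for enc in "$@"; do
        printf '== %s ==\n' "$enc"
        iconv -f "$enc" -t UTF-8 < "$sample" || printf '(conversion failed)\n'
    done
}

# Example (file names are placeholders):
#   head -n 40 247-0.txt > sample.txt
#   preview_encodings sample.txt CP437 CP850 ISO-8859-1
#
# If one candidate looks right, convert into a NEW file, input file last:
#   iconv -f CP437 -t UTF-8 -o 247-0-utf8.txt 247-0.txt
```

Under CP437, byte 0x94 is "ö" and byte 0x82 is "é", which would turn "Zo”l." into "Zoöl." and "abbr‚viation" into "abbréviation". Marks such as ¶ and Ï inside the headwords may be the transcriber's own pronunciation markup and would probably still need separate handling.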
 
Old 06-09-2017, 08:34 PM   #11
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by andrew.comly View Post
Laserbeak,

Below is my attempt at your proposal:
Code:
$ vim convert_msdos-UNIX.sh
#!/usr/bin/perl
$/ = ''; 
$_ = <>;
s/\r\n/\n/gs;
print;
{:wq}

Code:
$ ll convert_msdos-UNIX.sh
-rw-rw-r-- 1 a a 55  6月  9 15:37 convert_msdos-UNIX.sh
$ chmod +x convert_msdos-UNIX.sh
$ bash convert_msdos-UNIX.sh < 247-0.txt > 247-0_unix.txt
convert_msdos-UNIX.sh: line 2: $/: No such file or directory
convert_msdos-UNIX.sh: line 3: syntax error near unexpected token `;'
convert_msdos-UNIX.sh: line 3: `$_ = <>;'
{No Success Yet}
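It's a Perl program, not a bash or sh program. Run it with perl, or directly as ./convert_msdos-UNIX.sh so that the #! line picks the interpreter, instead of passing it to bash.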
 
Old 06-10-2017, 07:07 AM   #12
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,862
Blog Entries: 1

Rep: Reputation: 1869
@OP: Sorry, I've lost track somewhere. What is the actual question? If it is related to a file, examine it with a hex viewer, e.g.:

Code:
echo 'árvíztűrő tükörfúrógép' >sample
adcr sample
od -tx1 sample
0000000 e1 72 76 ed 7a 74 fb 72 f5 20 74 fc 6b f6 72 66
0000020 fa 72 f3 67 e9 70 0d 0a
iconv -f iso-8859-2 -t utf-8 sample >sample_u
od -tx1 sample_u
0000000 c3 a1 72 76 c3 ad 7a 74 c5 b1 72 c5 91 20 74 c3
0000020 bc 6b c3 b6 72 66 c3 ba 72 c3 b3 67 c3 a9 70 0d
0000040 0a
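
Along the same lines, a full hex dump of a 1.5M file is a lot to read, so it can help to list only the distinct non-ASCII byte values it contains. The following is a sketch using standard tools; `non_ascii_bytes` is a hypothetical name.

```shell
#!/bin/sh
# List each distinct non-ASCII byte value (two hex digits) found on
# standard input, one per line, sorted.
non_ascii_bytes() {
    LC_ALL=C tr -d '\000-\177' |   # drop every 7-bit ASCII byte
        od -An -v -tx1 |           # dump the remaining bytes as hex
        tr -s ' ' '\n' |           # put one value on each line
        grep . |                   # discard empty lines
        sort -u
}

# Example (file name as used in this thread):
#   non_ascii_bytes < 247-0.txt
```

A short list of recurring values points to a single-byte encoding, and the values themselves can then be looked up in a code page table.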
 
Old 06-10-2017, 07:27 AM   #13
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,625

Rep: Reputation: 2695
Before using a hex viewer, see if the magic number gives you information on the file. The command is
Code:
file /usr/share/dict/webster1913-dict.txt
and the output should tell you something.

The issue here is that dos2unix will do the conversion, but it assumes that the file IS DOS-format text. This file appears to use an encoding that is not the simple text these utilities assume. You will need to find out WHAT it is to determine what conversions or mappings can be done.
 
Old 06-10-2017, 08:45 AM   #14
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,779

Rep: Reputation: 2212
Quote:
Originally Posted by wpeckham View Post
Before using a hex viewer, see if the magic number gives you information on the file. The command is
Code:
file /usr/share/dict/webster1913-dict.txt
and the output should tell you something.
Already done in #10, with the result: "247-0.txt: data".

It should be no great surprise that an old dictionary contains non-ASCII characters showing how words are pronounced.
 
Old 06-10-2017, 03:09 PM   #15
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,625

Rep: Reputation: 2695
Quote:
Originally Posted by rknichols View Post
Already done in #10, with the result: "247-0.txt: data".
Well there you go. Tools for properly reformatting text files are going to have undefined behavior if you use them on data files. The difference between DOS and UNIX format text is not the issue because this is not text.

The question then becomes "can the apps you are using properly use a dictionary file of this particular data format?" and if the answer is "no" then you have a more interesting problem. You may need to change apps to one that can use this dictionary, find a dictionary for your app, or find a converter SPECIFIC to the dictionary formats to do the conversion.
 
  

