LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

09-23-2024, 09:16 PM   #1
lucmove
Senior Member
Registered: Aug 2005
Location: Brazil
Distribution: Debian
Posts: 1,484
Is this an encoding problem?


Maybe this is not the right place to ask my question since it involves some Windows and no Linux. But the only off-topic section of LQ is described as for "non-technical" topics and my topic is technical.

Anyway,

Some word processors have an auto-correct feature. It detects a very specific misspelling and corrects it automatically. For example, you type "speling", press space, and the software automatically changes the word to "spelling." This has always been done based on a list: if the misspelling is not in the list, it won't be corrected.

That is a very important feature to me because I use it as shorthand. My very long list has, for example:

Code:
b=because
fex=for example
i=I
ot=of the
smt=something
t=the
theyre=they're
That helps me type considerably faster. I've been doing that for more than 20 years.
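The list-based replacement described above amounts to a lookup table keyed on whole words. The sketch below models it with `awk`, assuming a hypothetical file `shorthand.txt` in the same `abbreviation=expansion` form as the list; it only illustrates the idea and is not how any particular word processor implements auto-correct.

```shell
#!/bin/sh
# Hypothetical shorthand list, one "abbreviation=expansion" pair per line.
cat > shorthand.txt <<'EOF'
b=because
fex=for example
i=I
t=the
EOF

# The first file is split on "=" to fill the table; FS=' ' then restores the
# default whitespace splitting before the text is read from standard input.
echo 'i think t idea is good b it is simple' |
    awk -F= '
        NR == FNR { map[$1] = $2; next }   # build the lookup table
        { for (i = 1; i <= NF; i++) if ($i in map) $i = map[$i]; print }
    ' shorthand.txt FS=' ' -
# prints: I think the idea is good because it is simple
```

Words that are not in the table pass through unchanged, which matches the behaviour described: anything missing from the list is simply not corrected.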

I found this very old Windows office suite called Easy Office and tried it for a while. It has a word processor. I immediately checked whether it had auto-correct (yes!) and whether I could use it as shorthand (yes too!), and soon found the list in a file named 'correct.tlx', which contains, for example:

Code:
ahev	Ahave
ahppen	Ahappen
ahs	Chas
ahve	Ahave
almots	Aalmost
almsot	Aalmost
alomst	Aalmost
alot	Aa lot
So I added my own items (in Portuguese: com = with, mais = more, não = not, porque = because, que = that):

Code:
c	Acom
m	Amais
n	Anão
pq	Aporque
q	Aque
I tested it, and it works... well, almost. Replacing 'n' with 'não' does not work; all the other entries do. I can see that something happens, some kind of flicker, but the word is not "corrected."

So here are my questions to you gentlemen:

1. Why does 'não' fail? The accented character must be part of the reason, but what would the entire reason be? In what situation would an accented string fail?

2. Why is the capital 'A' used as the column separator? Is it arbitrary or could it be some obscure character that "nobody uses" that looks like a capital A to me because of how weird encodings are displayed sometimes? I am very ignorant of encodings but I have seen capital As before in mangled text.
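Question 2 can be settled by looking at the raw bytes rather than at how they are displayed. The sketch below rebuilds one entry from the excerpt in a hypothetical file `entry.tlx`, assuming the columns are separated by a tab and lines end in CRLF (the usual Windows line ending); on the real list, `od -c correct.tlx | head` shows the same kind of dump.

```shell
#!/bin/sh
# Rebuild one entry (assumed layout: word, TAB, letter + replacement, CRLF).
printf 'ahev\tAhave\r\n' > entry.tlx
# Show each byte as a character (control bytes as escapes such as \t, \r).
od -An -c entry.tlx
# Show each byte as a two-digit hex value.
od -An -tx1 entry.tlx
```

In the hex dump, 09 is the tab and 41 is an ordinary ASCII "A"; an obscure look-alike character would appear as a byte of 80 or above. Note that the excerpt also contains the entry `ahs` with `Chas`, so the letter after the tab is not always an "A".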

Thank you for any clarification. Apologies for going off-topic.
 
09-24-2024, 09:46 AM   #2
boughtonp
Senior Member
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,778
Quote:
Originally Posted by lucmove
Maybe this is not the right place to ask my question since it involves some Windows and no Linux. But the only off-topic section of LQ is described as for "non-technical" topics and my topic is technical.
No maybe about it. The LQ Programming forum is for questions involving programming (meaning the question should revolve around code).

A question about a probably broken feature in an obsolete and proprietary Windows word processor is not a programming question.


Since it is proprietary, nobody here can inspect the source code and see what it does or doesn't do. (If any source even exists, given that the company website disappeared a decade ago and the suite itself was replaced four years before that.)

Either ask on a forum that accepts questions about unmaintained Windows software, or switch to using a Linux-compatible suite on a Linux-based OS, where the bug likely doesn't exist.

 
09-24-2024, 09:58 AM   #3
smallpond
Senior Member
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,220
Most likely a software bug. If the code reads não and stops on the accented character it will replace n with n.
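This hypothesis can be modelled in a couple of lines: if a program copied the replacement only up to the first byte outside printable ASCII, "não" would shrink to "n", and replacing "n" with "n" would look as if nothing happened. This is only a model of the guess, assuming the replacement was stored as UTF-8 and using a hypothetical file `replacement.txt`; nobody here has seen the program's code.

```shell
#!/bin/sh
# "não" as UTF-8: n, then C3 A3 (the two bytes of "ã"), then o.
printf 'n\303\243o\n' > replacement.txt
# Naive reader: keep bytes only up to the first one that is not printable
# ASCII (0x20-0x7E). LC_ALL=C makes sed treat every byte as one character.
LC_ALL=C sed 's/[^ -~].*//' replacement.txt
# prints: n
```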
 
09-24-2024, 04:17 PM   #4
yancek
LQ Guru
Registered: Apr 2008
Distribution: Slackware, Ubuntu, PCLinux
Posts: 10,958
Quote:
Replacing 'n' with 'não' does not work
What does 'not work' even mean? Do you just see a 'flicker' when you try to use it? Which release of Windows are you using to try this? Did you try posting on a support forum for that release? I agree with post #2 and doubt you can get help here, for the reasons mentioned in that post.
 
09-24-2024, 10:05 PM   #5
EdGr
Senior Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 1,038
Quote:
Originally Posted by lucmove
1. Why does 'não' fail? The accented character must be part of the reason, but what would the entire reason be? In what situation would an accented string fail?
Most likely, you have added a UTF-8 character. The program expects code pages.
Ed
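To see what this means at the byte level, compare "não" in UTF-8 with the same text in Windows-1252, the code page Windows used for Western European languages such as Portuguese. The sketch assumes a Linux shell with `iconv` and `od`; `WINDOWS-1252` is an encoding name GNU iconv accepts (`CP1252` is an alias).

```shell
#!/bin/sh
# "não" encoded as UTF-8: "ã" (U+00E3) takes two bytes, C3 A3.
printf 'n\303\243o' | od -An -tx1
# The same text converted to the Windows-1252 code page: "ã" is one byte, E3.
printf 'n\303\243o' | iconv -f UTF-8 -t WINDOWS-1252 | od -An -tx1
```

A program written for code pages that meets C3 A3 reads it as two characters ("Ã£" in Windows-1252), not as "ã".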
09-24-2024, 10:30 PM   #6
lucmove
Senior Member
Registered: Aug 2005
Location: Brazil
Distribution: Debian
Posts: 1,484
Original Poster
Quote:
Originally Posted by smallpond
Most likely a software bug. If the code reads não and stops on the accented character it will replace n with n.
Yes, but what kind of bug? Lack of string cleaning? Some kind of assumption?

Quote:
Originally Posted by EdGr
Most likely, you have added a UTF-8 character. The program expects code pages.
Ed
Thanks. This answer is headed where I wanted to go. I edited the file on Windows 8. The application is from the Windows 95 era. Would the encoding be correct if I had edited the file on Windows 95? Does the application used for editing (Notepad) affect the encoding?

Code:
$ file encoding.tlx 
encoding.tlx: ASCII text, with CRLF line terminators
$ file -i encoding.tlx 
encoding.tlx: text/plain; charset=us-ascii
Doesn't look like UTF-8 to me.
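`file` reporting "ASCII text" means every byte in the file is below 0x80, so "ã" is not stored in it in any encoding: neither as E3 (Windows-1252) nor as C3 A3 (UTF-8). A quick way to confirm this is to count the bytes outside 7-bit ASCII. The function `count_non_ascii` and the two sample files are hypothetical, made up for illustration; on the real list you would run `count_non_ascii encoding.tlx`.

```shell
#!/bin/sh
# Count the bytes at or above 0x80 (anything outside 7-bit ASCII) in a file.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

printf 'n\tAn\343o\r\n' > with_accent.tlx     # "ã" stored as E3 (Windows-1252)
printf 'n\tAno\r\n'     > without_accent.tlx  # "ã" missing altogether

count_non_ascii with_accent.tlx      # prints 1
count_non_ascii without_accent.tlx   # prints 0
```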
 
09-24-2024, 10:45 PM   #7
EdGr
Senior Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 1,038
The program may have removed the UTF-8 character (U+00E3).

Some text editors allow one to specify the encoding. The file may not have been intended to be edited.
Ed
 
09-25-2024, 12:45 AM   #8
NevemTeve
Senior Member
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,935
Blog Entries: 1
You can try with both Windows-1252 and UTF-8 (though most likely neither will work).
Code:
printf 'nw nww\xe3\nnu nuu\xc3\xa3\n' >sample.txt
result:
Code:
nw nww@
nu nuu@@
Depending on your editor's encoding setting, one of the two lines (not both) will be shown as 'ã'.

Last edited by NevemTeve; 09-25-2024 at 12:48 AM.
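Only one line can look right at a time because an editor decodes every byte with a single encoding. The sketch reproduces the sample with octal escapes (which any POSIX `printf` understands; `\x` escapes are a bash/GNU extension), in the hypothetical files `sample-w.txt` and `sample-u.txt`, and decodes both as Windows-1252, the way a code-page program would. It assumes a UTF-8 terminal for display.

```shell
#!/bin/sh
# The two sample lines: "ã" as Windows-1252 (E3) and as UTF-8 (C3 A3).
printf 'nw nww\343\n'     > sample-w.txt
printf 'nu nuu\303\243\n' > sample-u.txt
# Decode both as Windows-1252 and re-encode as UTF-8 for the terminal.
iconv -f WINDOWS-1252 -t UTF-8 sample-w.txt   # prints: nw nwwã
iconv -f WINDOWS-1252 -t UTF-8 sample-u.txt   # prints: nu nuuÃ£
```

Read as UTF-8 instead, it is the first line that fails, since a lone E3 byte is not valid UTF-8. The "Ã" in the second result is also the kind of stray capital A often seen in mangled text.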
 
09-25-2024, 09:17 AM   #9
sundialsvcs
LQ Guru
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,933
Blog Entries: 4
IIRC, in "the Windows 95 era," UTF-8 already existed (it dates from 1992) but was not yet in common use. We were still using cumbersome "code pages." So the application might not know what to do with UTF-8.

UTF-8 and Unicode were "the obvious, elegant solution" that was a long time in coming.

Last edited by sundialsvcs; 09-26-2024 at 09:49 AM.
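Part of what made code pages cumbersome is that the same byte means a different character depending on which code page is assumed. The sketch decodes the single byte E3 with three of them: Windows-1252 (Windows, Western European languages), CP850 (DOS, Western European) and CP437 (the original IBM PC set). It assumes GNU iconv, which knows all three names.

```shell
#!/bin/sh
# Decode the single byte E3 under three different code pages.
for cp in WINDOWS-1252 CP850 CP437; do
    printf '%-13s ' "$cp"
    printf '\343\n' | iconv -f "$cp" -t UTF-8
done
# prints:
# WINDOWS-1252  ã
# CP850         Ò
# CP437         π
```

A text file carries no record of which code page it was written in, so a program has to assume one; UTF-8 removes that guesswork by giving every character a single encoding.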
 
09-25-2024, 09:41 AM   #10
Guttorm
Senior Member
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,460
Hi

Before UTF-8, Latin-1 (ISO 8859-1) was a commonly used encoding, at least for most languages written in the Latin alphabet.

You could try to convert it with iconv:

Code:
iconv -c -f UTF-8 -t ISO8859-1 input_file > output_file
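A quick round trip shows what this does to the problem entry. The sketch assumes the entry was typed as UTF-8 and uses the hypothetical file names `utf8.tlx` and `latin1.tlx`; it spells the target `ISO-8859-1`, and GNU iconv accepts `ISO8859-1` as well. Keep in mind that `-c` silently drops characters with no ISO 8859-1 equivalent, and that iconv cannot bring back an "ã" that is no longer in the file: if `file` reports plain ASCII, the accent has to be typed again in an editor set to Windows-1252 or ISO 8859-1.

```shell
#!/bin/sh
# The entry as a UTF-8 editor would save it: "ã" is the two bytes C3 A3.
printf 'n\tAn\303\243o\r\n' > utf8.tlx
# Convert to ISO 8859-1 (Latin-1); -c drops characters with no Latin-1 form.
iconv -c -f UTF-8 -t ISO-8859-1 utf8.tlx > latin1.tlx
# "ã" should now be the single byte E3.
od -An -tx1 latin1.tlx
```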
All times are GMT -5.