Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!


12-06-2014, 12:00 PM   #1
Registered: Jan 2010
Distribution: Ubuntu
Posts: 95

Rep: Reputation: 4
Strange characters in a file

Hello all,

I recently got a dataset from the website of [...]. Within the data, I see a question mark on a black background, as if some characters could not be read. I ran more FILE_NAME.txt and got the result described above. I have also attached a screenshot.
Attached thumbnail: error code.PNG (8.9 KB)
12-06-2014, 01:42 PM   #2
Ser Olmy
Senior Member
Registered: Jan 2012
Distribution: Slackware
Posts: 2,463

Rep: Reputation: Disabled
The text file was created or saved on a system that uses a different character encoding than the one your computer or application expects.

Only US ASCII codes are reasonably universal; these include characters A-Z and a-z, numbers, basic punctuation and a small selection of special characters like the dollar, the hash and percentage signs, some very basic mathematical symbols and so on. Other characters are considered "special", and various encoding schemes exist to handle various types of "extended" character sets.
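A quick way to check whether a file strays outside US ASCII is to look for bytes above 0x7F. Below is a minimal Python sketch; the function name and the sample bytes are made up for illustration and are not taken from the dataset in question.

```python
def find_non_ascii(data: bytes):
    """Return (offset, byte value) pairs for every byte outside 7-bit ASCII."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]


# Hypothetical sample: "café" saved as ISO-8859-1, where é is the single byte 0xE9.
sample = b"caf\xe9 au lait"

for offset, value in find_non_ascii(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

To inspect a real file, read it in binary mode first, for example open("FILE_NAME.txt", "rb").read(), and pass the result to the function.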

If there's a mismatch between the encoding schemes used by a sender and a recipient of data, any "extended" codes may be interpreted incorrectly. In your case, accented characters aren't displayed properly. This is a very common problem with "pure" text files, since they lack any sort of header that identifies the character set encoding scheme being used.
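The mismatch can be reproduced in a few lines of Python. This sketch assumes the file was written as ISO-8859-1 and is being read as UTF-8; that pairing is only a guess for illustration, since the thread never establishes the actual encodings.

```python
# "é" in ISO-8859-1 is the single byte 0xE9 (assumed source encoding).
raw = "r\u00e9sum\u00e9".encode("iso-8859-1")

# A UTF-8 reader cannot interpret a lone 0xE9, so it substitutes U+FFFD,
# the replacement character that many terminals draw as a question mark
# on a dark background.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the writer actually used recovers the text.
as_latin1 = raw.decode("iso-8859-1")

print(as_utf8)
print(as_latin1)
```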

If you can figure out which encoding scheme was used to create the file, you can convert it to the encoding scheme you're using with the iconv command.
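On the command line that would look something like iconv -f ISO-8859-1 -t UTF-8 FILE_NAME.txt > converted.txt, where ISO-8859-1 is only a guess at the source encoding. The same idea as a rough Python sketch follows; the helper names and the candidate list are assumptions for illustration, and the guessing step is a heuristic, not a reliable detector.

```python
def convert_encoding(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Re-encode a text file, roughly what `iconv -f SRC -t DST` does."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)


def guess_encoding(data, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first candidate that decodes without error.

    ISO-8859-1 accepts every byte sequence, so it acts as a last resort
    and a match on it proves nothing by itself.
    """
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


# Hypothetical sample bytes: not valid UTF-8, but valid Windows-1252.
print(guess_encoding(b"caf\xe9"))
```

Always check the converted output by eye: a file can decode without errors under the wrong encoding and still show the wrong characters.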
12-06-2014, 03:24 PM   #3
Senior Member
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,707

Rep: Reputation: 1270
As above...

but in addition, this is common when the data originates on a Windows system. Microsoft software tends to generate/use some not quite standard character sets. In at least one instance such screwups involved having a parity bit set on the apostrophe character... thus showing up as a ? instead.

In your specific case, it does look a bit more like just a different character font, but it could just be some Windows software with the not-quite-standard characters.
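One plausible illustration of the Windows case (an assumption, not something confirmed in this thread): Windows-1252 stores the typographic apostrophe as the single byte 0x92, which is not valid UTF-8 and is an invisible control character in ISO-8859-1. The sample bytes below are hypothetical.

```python
raw = b"don\x92t"  # hypothetical bytes as written by a Windows-1252 editor

as_cp1252 = raw.decode("cp1252")                   # typographic apostrophe, U+2019
as_utf8 = raw.decode("utf-8", errors="replace")    # replacement character, U+FFFD
as_latin1 = raw.decode("iso-8859-1")               # C1 control character, U+0092

for label, text in (("cp1252", as_cp1252), ("utf-8", as_utf8), ("iso-8859-1", as_latin1)):
    # ascii() escapes non-ASCII characters so the differences are visible on any terminal.
    print(label, ascii(text))
```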


All times are GMT -5.
