LinuxQuestions.org


Sector11 02-26-2010 09:01 AM

Strange "characters" appearing in auto "created" man pages
 
Hello people

If I use:
Code:

man aptitude

I see what I am supposed to see, for example:
Code:

      install
          Install one or more packages. The packages should be listed after
          the “install” command; if a package name contains a tilde character
          (“~”) or a question mark (“?”), it will be treated as a search
          pattern and every package matching the pattern will be installed
          (see the section “Search Patterns” in the aptitude reference
          manual).

Now if I create a text file with:
Code:

man aptitude > aptitude.txt

and then look at it, I see:
Code:

      install
          Install one or more packages. The packages should be listed after
          the “install” command; if a package name contains a tilde character
          (“~”) or a question mark (“?”), it will be treated as a search
          pattern and every package matching the pattern will be installed
          (see the section “Search Patterns” in the aptitude reference
          manual).

Does anyone know why and is there a fix?

David the H. 02-26-2010 09:55 AM

Garbled symbols are a sure sign of a conflict in character encodings. Either the file is being created in an encoding that can't handle those characters, or it's being created correctly and the display program is set to use the wrong encoding. Do you get the same effect no matter what text reader or editor you use? If not, then my first guess is that the file is being created using UTF-8, but the text display is trying to use something else, such as Western European (ISO-8859-1).

If all programs show the same problem, then the source is likely the encoding used when the file is created; in which case I couldn't off-hand tell you why it's doing that exactly or how to fix it. The same command works just fine for me.

Please run the "locale" command and post the results, so we can see what encoding your shell is set to.
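
To illustrate the kind of mismatch described above, here is a minimal sketch. It assumes `iconv` is installed and uses the left curly quote (U+201C) as the test character, since curly quotes appear in the aptitude page; the function names are made up for this example. It writes the quote as UTF-8 and then imitates a viewer that wrongly assumes ISO-8859-1.

```shell
#!/bin/sh
# The left curly quote (U+201C) is three bytes in UTF-8: e2 80 9c
# (written here as octal escapes, which POSIX printf understands).
curly_quote_bytes() {
    printf '\342\200\234'
}

# Imitate a viewer that assumes ISO-8859-1: each of the three bytes is
# taken as a separate character. Re-encoded as UTF-8 for display, each
# of those characters needs two bytes.
misread_as_latin1() {
    iconv -f ISO-8859-1 -t UTF-8
}

curly_quote_bytes | wc -c                      # byte count as written
curly_quote_bytes | misread_as_latin1 | wc -c  # byte count after the misreading
```

One character turning into three is the typical signature of UTF-8 text being displayed as a single-byte encoding, which would fit a file that is written correctly but opened with the wrong encoding.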

knudfl 02-26-2010 02:39 PM

Are you sure that a "troff" document can be converted to
text just like that? I don't think so.

http://heirloom.sourceforge.net/doctools/troff.1b.html

http://vmlinux.org/cgi-bin/dwww?type...cation=TROFF/1
.....

jschiwal 02-26-2010 02:44 PM

That is what the man command does: it converts troff documents to text in your terminal. Changing the encoding of your terminal to UTF-8 would resolve strange characters when reading a man page.

You may have a document that is intended to be printed instead of viewed in the terminal. But this wouldn't be the case for man pages.

It may be better to do something like this:
man --pager=cat --encoding=utf8 <topic> > <topic>.txt

You could create a one-liner in ~/bin/ or use an alias:

alias man2txt='man --pager=cat --encoding=utf8'

man2txt smb.conf

#!/bin/bash
# Save the plain-text rendering of a man page as <topic>.txt
topic="$1"
man --pager=cat --encoding=utf8 "$topic" > "${topic}.txt"

p.s. No, I didn't change my signature just for this post. I had it previously.

Sector11 02-28-2010 09:56 AM

Quote:

Originally Posted by David the H. (Post 3877888)
Please run the "locale" command and post the results, so we can see what encoding your shell is set to.

Code:

Sun Feb 28, 12:47 $ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
Sun Feb 28, 12:47 $

Hi David the H., Thanks for the response.

I see UTF-8 in there, but Geany tells me the file is ISO-8859-1.

But the file I created by "copying" the terminal output into a text file is UTF-8 (without BOM), and both gedit and Geany show the strange characters in the first file.
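
Since editors guess the encoding and can guess wrong, it may help to check the bytes independently. The following is a rough sketch, assuming `iconv` is available; `is_valid_utf8` is a hypothetical helper name, not an existing tool. It asks `iconv` to decode the file as UTF-8 and relies on `iconv` exiting with an error when it hits a byte sequence that is not valid UTF-8.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the file decodes cleanly as UTF-8,
# fails otherwise. The converted output is discarded; only the exit
# status matters.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

Usage would look like `is_valid_utf8 aptitude.txt && echo "valid UTF-8"`. If the file passes, the file itself is probably fine and the editor is picking the wrong encoding when it opens it. Where the `file` utility is installed, `file -i aptitude.txt` gives a second opinion.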

Sector11 02-28-2010 10:13 AM

Quote:

Originally Posted by knudfl (Post 3878134)
Are you sure that a "troff" document can be converted to
text just like that? I don't think so.

http://heirloom.sourceforge.net/doctools/troff.1b.html

http://vmlinux.org/cgi-bin/dwww?type...cation=TROFF/1
.....

Hi knudfl, thanks for responding, I'll check those links out.

I had to reinstall lately and was looking at the aptitude man pages when I decided that I wanted it as a text file, and that's the result. The strange thing is that I only get these strange characters about 50% of the time.

They are usually the single quote ( ' ), the double quotes, not the straight "text" ones ( " ) but the ones that look like a ( 66 ) and ( 99 ) if you get my drift, and the hyphen ( - ).

A search and replace fixes it but it is a "process" I could do without.
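
The search and replace can be scripted. This is only a sketch under assumptions: it supposes the file really is UTF-8 and that the characters involved are the six code points listed in the comments, which the thread does not confirm. `asciify_punctuation` is a made-up name.

```shell
#!/bin/sh
# UTF-8 byte sequences for the typographic characters, written as octal
# escapes so the script itself stays plain ASCII.
lsq=$(printf '\342\200\230')   # U+2018 left single quote
rsq=$(printf '\342\200\231')   # U+2019 right single quote
ldq=$(printf '\342\200\234')   # U+201C left double quote (the "66")
rdq=$(printf '\342\200\235')   # U+201D right double quote (the "99")
hyp=$(printf '\342\200\220')   # U+2010 hyphen
min=$(printf '\342\210\222')   # U+2212 minus sign

# Read text on stdin, write it to stdout with ASCII punctuation.
asciify_punctuation() {
    sed -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
        -e "s/$ldq/\"/g" -e "s/$rdq/\"/g" \
        -e "s/$hyp/-/g" -e "s/$min/-/g"
}
```

It could be used as `man --pager=cat aptitude | asciify_punctuation > aptitude.txt`, or run over an existing file with `asciify_punctuation < aptitude.txt > aptitude-ascii.txt`.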

Sector11 02-28-2010 10:31 AM

Quote:

Originally Posted by jschiwal (Post 3878140)
That is what the man command does: it converts troff documents to text in your terminal. Changing the encoding of your terminal to UTF-8 would resolve strange characters when reading a man page.

You may have a document that is intended to be printed instead of viewed in the terminal. But this wouldn't be the case for man pages.

It may be better to do something like this:
man --pager=cat --encoding=utf8 <topic> > <topic>.txt

You could create a one-liner in ~/bin/ or use an alias:

alias man2txt='man --pager=cat --encoding=utf8'

man2txt smb.conf

#!/bin/bash
# Save the plain-text rendering of a man page as <topic>.txt
topic="$1"
man --pager=cat --encoding=utf8 "$topic" > "${topic}.txt"

p.s. No, I didn't change my signature just for this post. I had it previously.

Hi jschiwal,

Terminator is configured to use UTF-8, and I've never had the problem when "reading" in a terminal, only when reading the text file created with:

Code:

man program_name > program_name.txt

I tried your man2txt above and ended up with the same strange characters.

Sector11 02-28-2010 11:05 AM

Quote:

Originally Posted by jschiwal (Post 3878140)
p.s. No, I didn't change my signature just for this post. I had it previously.

Cute, I'm doing 110 things at once and just saw what you were talking about. Works nice except I don't use KDE. Evince can read the .ps file but not like you have in your sig. :)

I'm going to play with that though. I would much rather have text files; I can read them more easily and edit things (add notes etc., for personal use).

Another thanks for you.


All times are GMT -5.