
damianpfister 07-28-2009 09:40 AM

Saving output without control characters
I have a text log file which has a copy of everything typed on the console. When I cat this file it will show me the likes of:


...and so on.

Now if I either cat -v or simply vi this file I get the following:


Now it is obvious that the ^H is the control character for BACKSPACE and ^M for RETURN - both of which are actually translated by cat so that the final command executed is shown and not all the errors (plus backspaces/enters).

It is easy enough to get rid of the ^M through sed or tr, but how do you get around all those ^H characters?

If I type in:

server1:>grep top outputfile.txt

(where outputfile.txt has the above 3 commands in it), it does not show me the last line of server1:>top since the actual output was server1:>tap^H^Hop^M
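A minimal sketch that reproduces the mismatch, assuming the logged bytes are exactly the ones quoted above. The temporary file `demo_log` is made up for illustration and is not the real log:

```shell
# Recreate one logged line: "tap", two backspaces, "op", then CR LF.
# demo_log is a hypothetical temp file standing in for outputfile.txt.
demo_log=$(mktemp)
printf 'server1:>tap\b\bop\r\n' > "$demo_log"

# grep compares raw bytes, and the literal string "top" never occurs in them.
if grep -q 'top' "$demo_log"; then
    echo "match"
else
    echo "no match"
fi

# cat -v shows the control characters as ^H and ^M instead of acting on them.
cat -v "$demo_log"
```

The terminal moves the cursor when it receives a backspace, whereas grep only ever sees the stored bytes, which is why the two disagree.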

This makes it difficult to manipulate the data in this text file since the file is not "true" text but a combination of text and control characters:

server1:>file outputfile.txt
outputfile.txt: ASCII text, with CRLF line terminators, with overstriking

Is there any way I can do a cat of this file and then redirect that to a file, with that file looking the same as it would to STDOUT (console/xterm)?

If I currently try to do:

server1:>cat outputfile.txt > newfile.txt

I am left with all those original control characters, which makes viewing in vi or parsing with grep difficult. Somewhere between cat and the console those control characters are actually interpreted rather than simply displayed... something I just cannot seem to replicate!

Any suggestions?

unSpawn 07-28-2009 10:17 AM

'dos2unix', 'recode' (or tr, sed or vi)?

damianpfister 07-28-2009 10:33 AM

The problem is that a simple conversion is not what is needed - rather a translation. Converting the likes of ESC and carriage returns between dos and unix is straightforward enough with dos2unix, vi, tr, sed and the like (not sure about recode - never used that before).

When you actually cat the file, it displays the "interpreted" output based on all the control characters, thus "masking" the fact that there was one command typed followed by a bunch of backspaces and then another command, before RETURN was hit.

Converting ^M (Carriage return) is fine, but how do you convert ^H when it is essentially a backspace that signifies the previous character needs to be erased?

Another example
cat file.txt

Hello World!

cat -v file.txt

Hello everyone^H^H^H^H^H^H^H^Hworld!^M

I tried doing a cat of the file (without -v so it shows up as I want it to...without control characters) and then an xsel to "select" the output and put that into the mouse copy buffer (middle-click paste) and then try output that to a file. No joy - get the exact same thing.

Yet if I cat the file, select the text with a mouse and paste it (middle-click) into another file it copies Hello World! rather than Hello everyone^H^H^H^H^H^H^H^Hworld!^M which is what I want (no control characters)....but it is not practical using a mouse (especially when inside of a script).

catkin 07-28-2009 12:40 PM

Hello damianpfister :)

It could be done in bash or maybe better by awk (how big are these files?). It would get complex if the user did command line editing, but the backspaces should be easy enough.

I guess the files are generated by the script command. That would make it a common problem; there are a lot of hits if you netsearch for "script command" and output.

There may also be font control characters (color, underlining etc.) and cursor positioning controls which would make it even more complex -- a very difficult problem to solve completely and generically (especially for all terminal types!) but maybe basic cleanup is "good enough".
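As a rough sketch of what "basic cleanup" of colour codes might look like, the filter below deletes sequences of the form ESC `[` parameters letter. The helper name `strip_csi` is hypothetical, and the pattern is an assumption about what the logs contain; it removes cursor-movement sequences of that shape without acting on them, so it is not a terminal emulator:

```shell
# strip_csi: hypothetical helper; reads stdin, writes stdout.
strip_csi() {
    esc=$(printf '\033')                  # a literal ESC byte
    # Delete ESC [ followed by digits/semicolons and one final letter.
    sed "s/$esc\[[0-9;]*[A-Za-z]//g"
}

# Sample input: the word "error" wrapped in bold-red on / reset sequences.
printf '\033[1;31merror\033[0m: disk full\n' | strip_csi
# prints: error: disk full
```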



tredegar 07-28-2009 02:03 PM

Errrr..... ... .. .

You seem to be running a key-logger. That you did not write yourself (or you'd know how to parse the files it produces, or make it produce "better" log files).

This sort of software is usually most unwelcome. And you have only this single post to your LQ name.

Can you give us some reasons why we should help you?

I expect your reasons will be very understandable, but ... .. .

This is a polite request for further information, and I expect it to be answered as such.

damianpfister 07-29-2009 04:16 AM

I am using the rootsh wrapper to monitor commands being executed by junior team members. The reason rootsh was chosen rather than script was for complete audit control, as script writes its output to the user's home directory (and is thus very visible and modifiable by them).

The log files are stored in a more secure location. I then run a custom script to pick out which Sysadmin logged into which server, at what date/time and the commands they executed. A weekly report is then created for management to peruse - any complaints/issues then dealt with between management and the Junior admin concerned.

So yes it is "keylogging" in the true sense of the word, but nothing sinister at all (everything within company policy).

The unfortunate thing is that rootsh logs absolutely everything - including ^H backspace keys and previous commands before they were erased by backspace.

It was suggested that I attempt to use col -b as a way around the whole backspace issue....plan on testing that out today.

tredegar 07-29-2009 11:15 AM

Thanks for the explanation.
Maybe sed can help you, but my sed skills are almost zero.
I did try for you, but, as I said I'm hopeless at sed
The closest I got was this (lifted from sed one liners explained Number 84) sed 's/.^H//g' but that's not quite right (you are welcome to try it, and improve on it though)
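One likely reason the one-liner falls short: `.` also matches a ^H, so a run of several backspaces gets consumed in pairs instead of each one erasing a typed character. A hedged sketch of a looping variant follows; `strip_backspaces` is a hypothetical helper name, and it assumes every ^H is meant to erase the character before it:

```shell
# strip_backspaces: hypothetical helper; reads stdin, writes stdout.
strip_backspaces() {
    bs=$(printf '\b')   # a literal backspace byte (shown as ^H by cat -v)
    # :a          label to loop back to
    # s/X^H//     delete one non-backspace character and the ^H after it
    # ta          if that substitution happened, jump back to :a
    # s/^H//g     drop any ^H left over with nothing in front of it to erase
    # tr -d '\r'  finally remove the carriage returns (^M)
    sed -e ':a' -e "s/[^$bs]$bs//" -e 'ta' -e "s/$bs//g" | tr -d '\r'
}

printf 'Hello everyone\b\b\b\b\b\b\b\bworld!\r\n' | strip_backspaces
# prints: Hello world!
```

Each pass removes the innermost typed-character/backspace pair, so eight backspaces erase eight characters.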

How are you going to cope with other control sequences (e.g. Ctrl-C, Ctrl-D)?
Is there any config file for rootsh that can adjust its behaviour?

damianpfister 07-30-2009 04:34 AM

Normally I would use sed to get rid of characters I do not want - especially ^M carriage returns. The problem is that simply doing this will not make the command display correctly.

For example:

Hello everyone^H^H^H^H^H^H^H^Hworld!^M

If I simply deleted all the ^H (backspaces) I would end up with:

Hello everyoneworld!^M

When what I really want is:

Hello world!^M

Those ^H backspaces are what was actually typed to delete the word everyone and then world was typed in to replace it.

I have tried the col -b option and it appears to work!

cat file.txt
Hello World!

cat -v file.txt
Hello everyone^H^H^H^H^H^H^H^Hworld!^M

col -b < file.txt > newfile.txt
Hello World!

cat -v newfile.txt
Hello World!

So essentially col -b acts as a filter of sorts, "interpreting" the backspace characters and only showing the final output not the initially deleted word nor the backspaces themselves.
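A short sketch of how this could slot into a script, so that grep searches the interpreted text rather than the raw bytes. The wrapper name `clean_log` and the sample file are made up for illustration, and the call is skipped if col is not installed:

```shell
# clean_log: hypothetical wrapper around col -b; takes a file, writes stdout.
clean_log() {
    col -b < "$1"
}

# Stand-in for the real log: two prompts, the second one with a typo fixed
# by two backspaces, each line ending in CR LF.
sample=$(mktemp)
printf 'server1:>ls\r\nserver1:>tap\b\bop\r\n' > "$sample"

# Search the cleaned text instead of the raw file.
if command -v col >/dev/null 2>&1; then
    clean_log "$sample" | grep 'top'
fi
```

One thing worth checking against real logs: the col man page describes -b as printing only the last character written to each column, which models overwriting rather than erasing, so a correction shorter than the text it replaces may leave trailing characters behind.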

All times are GMT -5.