LinuxQuestions.org


NeonFlash 02-04-2013 08:30 PM

Remove Control Characters from a File
 
I want to delete all the control characters from my file using Linux shell commands.

Some control characters, especially EOF (0x1A), cause problems when I load the file into other software. I want to delete these.

Here is what I have tried so far:

This shows the control characters at the start of the file:


Code:

cat -v -e -t file.txt | head -n 10

^A+^X$
^A1^X$
^D ^_$
^E-^D$
^E-^S$
^E1^V$
^F%^_$
^F-^D$
^F.^_$
^F/^_$
^F4EZ$
^G%$

This lists the lines that contain control characters, using grep:

Code:

$ cat file.txt | head -n 10 | grep '[[:cntrl:]]'
+
1

-
-
1
%
-
.
/

This matches the output of the cat command above.

Next, I ran the following command to show the lines that do not contain control characters, but it still shows the same output as above (lines with control characters):

Code:

$ cat file.txt | head -n 10 | grep '[^[:cntrl:]]'
+
1

-
-
1
%
-
.
/
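A possible explanation, sketched on a made-up two-line sample rather than the real file: '[^[:cntrl:]]' matches any line that contains at least one non-control character, which is true of every line shown above. To select lines with no control characters at all, the usual approach is to invert the match with -v. The temporary file and its contents below are assumptions for illustration only.

```shell
# Hypothetical sample: line 1 contains 0x01, line 2 is clean.
sample=$(mktemp)
printf 'a\001b\nclean\n' > "$sample"

# '[^[:cntrl:]]' means "has at least one non-control character",
# so both lines match and the count is 2.
matched=$(grep -c '[^[:cntrl:]]' "$sample")
echo "$matched"

# -v inverts the test "has a control character",
# leaving only lines that contain none.
clean_lines=$(grep -v '[[:cntrl:]]' "$sample")
printf '%s\n' "$clean_lines"
```

One caveat: \r is itself a control character, so on a file with Windows line endings grep -v '[[:cntrl:]]' would likely reject every line.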

Here is the output in hex format:

Code:

$ cat file.txt | head -n 10 | grep '[[:cntrl:]]' | od -t x2
0000000 2b01 0a18 3101 0a18 2004 0a1f 2d05 0a04
0000020 2d05 0a13 3105 0a16 2506 0a1f 2d06 0a04
0000040 2e06 0a1f 2f06 0a1f
0000050

As you can see, the hex values 0x01 and 0x18 are control characters.

I tried using the tr command to delete the control characters, but it also deletes \r\n:
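A side note on reading that dump: -t x2 groups the bytes into 16-bit words in host byte order, so on a little-endian machine the byte pair 01 2b prints as 2b01. A minimal sketch with -t x1, which prints one byte at a time in file order, using a made-up input that reproduces the first four bytes above:

```shell
# Same four bytes as the first line of the dump: 0x01 '+' 0x18 newline.
# -tx1 prints single bytes in file order; -An drops the offset column.
printf '\001+\030\n' | od -An -tx1

# -tx2 prints 16-bit words in host byte order, so a little-endian
# machine shows the pair 01 2b as 2b01.
printf '\001+\030\n' | od -An -tx2

# Collapse the -tx1 output into one string for easy comparison.
bytes=$(printf '\001+\030\n' | od -An -tx1 | tr -d ' \n')
echo "$bytes"
```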

Code:

$ cat file.txt | tr -d "[:cntrl:]" >> test.txt

$ cat test.txt | wc -l
0

Note: I want to delete all the control characters except \r and \n, since \r\n is the newline sequence on Windows. If I delete every control character, the newlines go too and everything ends up on one line.

Thanks.

shivaa 02-04-2013 08:55 PM

Can you try using awk?
Code:

awk '{gsub(/[[:cntrl:]]/,"",$0); print $0}' file.txt

joshp 02-04-2013 09:05 PM

Another option may be to use sed, something along the lines of:

Code:

sed 's/[[:cntrl:]]//g' file.txt
Not 100% sure that will work.
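
For what it's worth, here is a small sketch of what a line-based tool does when the class is written inside a bracket expression, as [[:cntrl:]]. The one-line input is made up for illustration. Because sed processes one line at a time, the trailing \n is not part of the text the substitution sees, but \r is, so it gets removed along with the other control characters.

```shell
# Hypothetical sample: a stray 0x01 and a Windows CRLF line ending.
# od -c shows which bytes survive the substitution.
printf 'a\001b\r\n' | sed 's/[[:cntrl:]]//g' | od -An -c

# Same pipeline, captured as a hex string: 'a', 'b', and the newline remain.
sed_hex=$(printf 'a\001b\r\n' | sed 's/[[:cntrl:]]//g' | od -An -tx1 | tr -d ' \n')
echo "$sed_hex"
```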

NeonFlash 02-04-2013 09:24 PM

Thanks for the answers, I will try them out.

However, both of the above commands will remove all the control characters. How do I exclude \r and \n from that?
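
One way this could be approached, as a sketch rather than a tested fix for the real file: instead of the [:cntrl:] class, give tr an explicit list of octal ranges that covers the ASCII control characters but skips LF (012) and CR (015). The sample data and temporary files below are assumptions for illustration.

```shell
# Hypothetical sample with CRLF line endings and stray control bytes
# (0x01, 0x1A, tab, 0x18).
sample=$(mktemp)
printf 'a\001b\032\tc\r\nsecond\030 line\r\n' > "$sample"

# Delete octal 000-011, 013-014, 016-037 and 177 (DEL):
# every ASCII control character except LF (012) and CR (015).
cleaned=$(mktemp)
tr -d '\000-\011\013\014\016-\037\177' < "$sample" > "$cleaned"

# Inspect the result byte by byte.
od -An -c "$cleaned"

# Line count and raw bytes of the cleaned file.
line_count=$(wc -l < "$cleaned" | tr -d ' ')
hex=$(od -An -tx1 "$cleaned" | tr -d ' \n')
echo "$line_count"
```

Note that this range also deletes tabs (011). To keep them, the first range could be shortened to '\000-\010'.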


All times are GMT -5.