LinuxQuestions.org
Old 04-25-2016, 07:54 AM   #1
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,289

Rep: Reputation: 2322
Strip unwanted carriage returns


A simple (apparently) problem. I want to strip carriage returns (0x0a) from plain text. I also need to take out column breaks & form feeds in some files. This fails to do it:

Code:
sed 's/\x0a/\x20/g' -i some_file.txt
But the instruction format works for other unwanted hex characters in the text. I am not familiar with awk or perl. I have vim installed, & nano, Sigil, Calibre & Libreoffice if magic there will work. I don't understand why sed does not.
 
Old 04-25-2016, 07:59 AM   #2
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Rep: Reputation: Disabled
fromdos --help

EDIT: But anyway, carriage return (CR) is not 0A (that is line feed, or LF) but 0D; see for instance http://www.utf8-chartable.de/unicode...ble.pl?names=2

Also I don't know any character named "column break". What is that?

Last edited by Didier Spaier; 04-25-2016 at 10:44 AM.
 
Old 04-25-2016, 08:32 AM   #3
Paulo2
Member
 
Registered: Aug 2012
Distribution: Slackware64 15.0 (started with 13.37). Testing -current in a spare partition.
Posts: 928

Rep: Reputation: 515
Isn't carriage-return 0x0d?
As Didier pointed out, it is a DOS thing.
Code:
echo -e '\x0d'|cat -A
^M$
echo -e '\x0a'|cat -A
$
$
 
Old 04-25-2016, 09:04 AM   #4
55020
Senior Member
 
Registered: Sep 2009
Location: Yorks. W.R. 167397
Distribution: Slackware
Posts: 1,307
Blog Entries: 4

Rep: Reputation: Disabled
I suspect your sed substitution just needs '-e', then you can zap any other stuff like FF in the same command.
 
Old 04-25-2016, 09:53 AM   #5
aaazen
Member
 
Registered: Dec 2009
Posts: 358

Rep: Reputation: Disabled
Here are some commands that might be helpful.

$man ascii

$hexdump -C [filename]

$fromdos < [dostextfile] > [unixtextfile]

$todos < [unixtextfile] > [dostextfile]
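
To check which character is actually in a file before stripping anything, a dump of a small sample helps. This is only a minimal sketch, assuming POSIX od, tr and wc plus mktemp; the sample data and variable names are made up for illustration:

```shell
# Build a two-line sample with DOS (CRLF) line endings in a temporary file.
sample=$(mktemp)
printf 'one\r\ntwo\r\n' > "$sample"

# od -c shows CR as \r and LF as \n, so the two are easy to tell apart.
od -c "$sample"

# Count each character separately: delete everything except the one wanted,
# then count the bytes that remain.
cr_count=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < "$sample" | wc -c | tr -d ' ')
echo "CR: $cr_count  LF: $lf_count"

rm -f "$sample"
```

If the CR count is zero, the file already has UNIX line endings and fromdos will not change it.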
 
1 members found this post helpful.
Old 04-25-2016, 10:23 AM   #6
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
Within vi, you can do the non-printing character trick to remove all carriage returns:

Open the text file that contains the return characters and enter
Code:
:g/Ctrl-VCtrl-M/s///g
Ctrl-V allows you to insert "control" characters in a file, Ctrl-M is the carriage return; you simply substitute the CR with nothing. Note that you do not place a space between the two control characters.

Here's a little cross reference of all of the control characters with their names:
Code:
	Dec	Hex	Octal	Binary		ASCII
	000	000	0000	00000000	NUL	(Ctrl-@)
	001	001	0001	00000001	SOH	(Ctrl-A)
	002	002	0002	00000010	STX	(Ctrl-B)
	003	003	0003	00000011	ETX	(Ctrl-C)
	004	004	0004	00000100	EOT	(Ctrl-D)
	005	005	0005	00000101	ENQ	(Ctrl-E)
	006	006	0006	00000110	ACK	(Ctrl-F)
	007	007	0007	00000111	BEL	(Ctrl-G)
	008	008	0010	00001000	BS	(Ctrl-H)
	009	009	0011	00001001	HT	(Ctrl-I)
	010	00a	0012	00001010	NL	(Ctrl-J)
	011	00b	0013	00001011	VT	(Ctrl-K)
	012	00c	0014	00001100	NP	(Ctrl-L)
	013	00d	0015	00001101	CR	(Ctrl-M)
	014	00e	0016	00001110	SO	(Ctrl-N)
	015	00f	0017	00001111	SI	(Ctrl-O)
	016	010	0020	00010000	DLE	(Ctrl-P)
	017	011	0021	00010001	DC1	(Ctrl-Q)
	018	012	0022	00010010	DC2	(Ctrl-R)
	019	013	0023	00010011	DC3	(Ctrl-S)
	020	014	0024	00010100	DC4	(Ctrl-T)
	021	015	0025	00010101	NAK	(Ctrl-U)
	022	016	0026	00010110	SYN	(Ctrl-V)
	023	017	0027	00010111	ETB	(Ctrl-W)
	024	018	0030	00011000	CAN	(Ctrl-X)
	025	019	0031	00011001	EM	(Ctrl-Y)
	026	01a	0032	00011010	SUB	(Ctrl-Z)
	027	01b	0033	00011011	ESC	(Ctrl-[)
	028	01c	0034	00011100	FS	(Ctrl-\)
	029	01d	0035	00011101	GS	(Ctrl-])
	030	01e	0036	00011110	RS	(Ctrl-^)
	031	01f	0037	00011111	US	(Ctrl-_)
	032	020	0040	00100000	SP	(space; printable, not a control character)
You can use the same trick in the shell; e.g., if your screen becomes unreadable (with all sorts of goofy characters), you can enter Ctrl-VCtrl-N (or Ctrl-VCtrl-O) to, hopefully, recover the screen. You may have shifted in or shifted out, which changes character sets; this happens if you cat a non-ASCII file.

Hope this helps some.

Last edited by tronayne; 04-25-2016 at 10:26 AM.
 
2 members found this post helpful.
Old 04-25-2016, 11:52 AM   #7
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,289

Original Poster
Rep: Reputation: 2322
Thanks All of you for the replies. To answer some points

Carriage Returns could well be 0x0d; I opened the file with xxd & 0x0a is the offending character. If it's not <CR>, that's my bad.
I know about fromdos & todos.

The vim thing looks great for Carriage returns - what's the magic for a line feed?
EDIT: Oh I see it. Ctrl-J. Thanks, tronayne

I'll get back to work tomorrow and try all those suggestions.

Last edited by business_kid; 04-25-2016 at 11:55 AM.
 
Old 04-28-2016, 05:58 AM   #8
fsbooks
Member
 
Registered: Jan 2002
Location: Missoula. Montana, USA
Distribution: Slackware (various)
Posts: 464

Rep: Reputation: 52
Being a linefeed editor, I don't think sed can remove new lines. I believe it takes the line, defined by a '\n' at the end (NL, \x0a, etc.). It then returns the line, changed by the expression(s), with a new line at the end. By definition, I would think, the material being edited would not contain a new line, as that would start a new line. For your usage, I'd use tr for a simple solution.

Code:
$ cat ttt
aaa
bbbb
ccccc
$ <ttt od -h
0000000 6161 0a61 6262 6262 630a 6363 6363 000a
0000017
$ <ttt >ttt2 tr '\n' ' '
$ <ttt2 od -h
0000000 6161 2061 6262 6262 6320 6363 6363 0020
0000017
$ cat ttt2
aaa bbbb ccccc $ wc -l ttt2
0 ttt2
Note that the od dump shows the 0a's replaced with 20's, a cat of the file leaves the prompt on the same line, and wc reports 0 lines (the trailing NL has also been replaced with a space).
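
The difference between the two tools can be shown side by side. A minimal sketch, assuming any POSIX sed and tr; the variable names are only for illustration:

```shell
# sed reads one line at a time and strips the trailing newline before the
# script runs, so a plain s/\n/ /g never sees a newline to replace.
sed_out=$(printf 'aaa\nbbbb\nccccc\n' | sed 's/\n/ /g' | wc -l | tr -d ' ')
echo "lines after sed: $sed_out"

# tr works on the raw stream, so every newline is replaced,
# including the final one.
tr_out=$(printf 'aaa\nbbbb\nccccc\n' | tr '\n' ' ' | wc -l | tr -d ' ')
echo "lines after tr: $tr_out"
```

The sed pipeline still reports three lines, while the tr pipeline reports none. That would also explain why the original 's/\x0a/\x20/g' appeared to do nothing.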
 
2 members found this post helpful.
Old 04-28-2016, 06:54 AM   #9
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Rep: Reputation: Disabled
Quote:
Originally Posted by fsbooks
Being a linefeed editor, I don't think sed can remove new lines.
It can.
Code:
/tmp$ sed ":a;N;s/\n/ /;ba" ttt
aaa bbbb ccccc
/tmp$
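
The loop works because N appends the next input line to the pattern space with a newline in between, which the substitution can then match. Below is a sketch of a common variant of the same idea, split into -e pieces so it should not depend on GNU's semicolon handling, with paste as an alternative; the variable names are hypothetical:

```shell
# :a     define a label
# N      append the next input line (with its newline) to the pattern space
# $!ba   if this is not the last line, branch back to label a
# s/\n/ /g   once everything is collected, replace the embedded newlines
joined=$(printf 'aaa\nbbbb\nccccc\n' |
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
echo "$joined"

# paste -s joins all input lines into one, using the -d delimiter.
pasted=$(printf 'aaa\nbbbb\nccccc\n' | paste -s -d ' ' -)
echo "$pasted"
```

Unlike the tr approach, both of these keep the final newline, so the shell prompt ends up on its own line.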
 
1 members found this post helpful.
Old 04-28-2016, 10:55 AM   #10
lazydog
Senior Member
 
Registered: Dec 2003
Location: The Key Stone State
Distribution: CentOS Sabayon and now Gentoo
Posts: 1,249
Blog Entries: 3

Rep: Reputation: 194
I use dos2unix and it cleans up the files imported from Windows nicely.
 
Old 04-29-2016, 12:32 AM   #11
vonbiber
Member
 
Registered: Apr 2009
Distribution: slackware 14.1 64-bit, slackware 14.2 64-bit, SystemRescueCD
Posts: 533

Rep: Reputation: 129
I'm surprised nobody mentioned tr.
Code:
$ tr -d '\r' < input.txt > output.txt
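
Since the original question also asked about form feeds, the same tr call can delete several characters at once. A minimal sketch, assuming POSIX tr escape sequences; the sample text and variable name are made up:

```shell
# Delete every carriage return (\r) and form feed (\f) from the stream.
# Line feeds (\n) are left alone, so the line structure is kept.
cleaned=$(printf 'page one\r\n\fpage two\r\n' | tr -d '\r\f')
printf '%s\n' "$cleaned"
```

Note that tr cannot edit a file in place; redirect to a new file as in the command above and rename it afterwards.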
 
Old 04-29-2016, 09:05 AM   #12
GazL
LQ Veteran
 
Registered: May 2008
Posts: 6,897

Rep: Reputation: 5019
The problem with that tr is that it will remove all CRs not just the CR on a CRLF. It's unlikely that you'll often encounter a solitary CR but the possibility is there.

I have used sed -i 's/\r$//' before, but as someone recently pointed out to me, '\r' is a GNU extension and not in POSIX sed, which is something that some people care about.
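
One way around the '\r' portability issue is to let printf produce a literal CR and pass that to sed. This is only a sketch under that assumption; the variable names are hypothetical and -i is left out because it is not in POSIX either:

```shell
# Store a literal carriage return; command substitution only strips
# trailing newlines, so the CR survives.
cr=$(printf '\r')

# Remove a CR only when it is the last character on the line.
# The solitary CR in the middle of the first line is left untouched.
fixed=$(printf 'a\rb\r\nc\r\n' | sed "s/${cr}\$//")
printf '%s\n' "$fixed" | od -c
```

The od output should show the CR between a and b still present, with both line-ending CRs gone.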
 
Old 04-30-2016, 01:14 AM   #13
vonbiber
Member
 
Registered: Apr 2009
Distribution: slackware 14.1 64-bit, slackware 14.2 64-bit, SystemRescueCD
Posts: 533

Rep: Reputation: 129
Quote:
Originally Posted by GazL
The problem with that tr is that it will remove all CRs not just the CR on a CRLF. It's unlikely that you'll often encounter a solitary CR but the possibility is there.
Yes, you're right.
Actually single CRs used (?) to be MacOS's way of terminating a line.
I wonder if they switched to LFs since they started using BSD.
 
Old 04-30-2016, 02:39 AM   #14
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Rep: Reputation: Disabled
Quote:
Originally Posted by GazL
The problem with that tr is that it will remove all CRs not just the CR on a CRLF. It's unlikely that you'll often encounter a solitary CR but the possibility is there.
Just to clarify: like other utilities such as sed, tr deals with characters, not bytes.

In the most commonly used character encodings (ASCII and UTF-8), CR is represented by a single byte, but by two bytes in UCS-2 and by four in UCS-4, aka UTF-32.
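
The byte counts can be checked by converting a single CR and counting the result. A rough sketch, assuming an iconv that knows the encoding names UTF-16LE and UTF-32LE (glibc's does); the little-endian variants are chosen so that no byte order mark is added:

```shell
# Count the bytes used to encode a single CR in three encodings.
utf8_bytes=$(printf '\r' | iconv -f ASCII -t UTF-8 | wc -c | tr -d ' ')
utf16_bytes=$(printf '\r' | iconv -f ASCII -t UTF-16LE | wc -c | tr -d ' ')
utf32_bytes=$(printf '\r' | iconv -f ASCII -t UTF-32LE | wc -c | tr -d ' ')
echo "UTF-8: $utf8_bytes  UTF-16LE: $utf16_bytes  UTF-32LE: $utf32_bytes"
```

For a file in one of the wider encodings, converting it to UTF-8 with iconv first and stripping the CRs afterwards is probably the safer order.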

Also, I assume that we are speaking of text files, not binary files.

Last edited by Didier Spaier; 04-30-2016 at 02:41 AM.
 
Old 04-30-2016, 06:08 AM   #15
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
I've never had a failure, stripping the carriage returns from Windows text files, with this little utility, dos2unx:
Code:
#!/bin/sh
#
# dos2unx file [file...]
#
# Converts text files (names specified on command line) from MS-DOS
# format to UNIX format.  Essentially, gets rid of all carriage returns
# (\r), since line feeds (\n) are all UNIX needs.

if [ $# -lt 1 ]
then
        echo usage: dos2unx file [file ...]
        exit 1
fi

for FILE
do
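        printf 'dos2unx: converting %s ... ' "${FILE}"
        # Strip CRs into a temporary file and copy it back over the
        # original only if tr succeeded; quoting keeps file names with
        # spaces intact.
        tr -d '\r' < "${FILE}" > "/tmp/conv$$" &&
                cp -f "/tmp/conv$$" "${FILE}"
        rm -f "/tmp/conv$$"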
        echo "done"
done
Just save this as dos2unx.sh and
Code:
make dos2unx
mv dos2unx /usr/local/bin
Works just fine (and /usr/local/bin is on your PATH).
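
For use inside another script, the core of the utility above can be wrapped in a function. This is a sketch only: strip_cr is a hypothetical name, and mktemp is assumed to be available as a replacement for the fixed /tmp/conv$$ name:

```shell
# strip_cr FILE: remove every carriage return from FILE in place.
# Hypothetical helper, not part of the dos2unx script above.
strip_cr() {
    tmp=$(mktemp) || return 1
    status=0
    # Only overwrite the original if tr succeeded. Writing with cat keeps
    # the original file's permissions and ownership.
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" || status=1
    rm -f "$tmp"
    return "$status"
}
```

It would be called as strip_cr somefile.txt, once per file.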

Hope this helps some.

Last edited by tronayne; 05-01-2016 at 07:19 AM.
 
  

