Slackware: This forum is for the discussion of Slackware Linux.
A simple (apparently) problem. I want to strip carriage returns (0x0a) from plain text. I also need to take out column breaks & form feeds on some things. This fails to do it:
Code:
sed 's/\x0a/\x20/g' -i some_file.txt
But the same instruction format works for other unwanted hex characters in the text. I am not familiar with awk or perl. I have vim installed, & nano, Sigil, Calibre & LibreOffice, if some magic there will work. I don't understand why sed does not.
Within vi, you can do the non-printing character trick to remove all carriage returns:
Open the text file that contains the return characters and enter
Code:
:g/Ctrl-VCtrl-M/s///g
Ctrl-V allows you to insert "control" characters in a file, Ctrl-M is the carriage return; you simply substitute the CR with nothing. Note that you do not place a space between the two control characters.
Here's a little cross reference of all of the control characters with their names:
You can use the same trick in the shell. For example, if your screen becomes unreadable (with all sorts of goofy characters), which happens if you cat a non-ASCII file, you can enter Ctrl-VCtrl-N (or Ctrl-VCtrl-O) to, hopefully, recover the screen; you may have shifted in or shifted out, which changes character sets.
Thanks All of you for the replies. To answer some points
Carriage Returns could well be 0x0d; I opened the file with xxd & 0x0a is the offending character. If it's not <CR>, that's my bad.
I know about fromdos & todos.
The vim thing looks great for Carriage returns - what's the magic for a line feed?
EDIT: Oh I see it. Ctrl-J. Thanks, tronayne
I'll get back to work tomorrow and try all those suggestions.
Last edited by business_kid; 04-25-2016 at 11:55 AM.
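For anyone unsure which byte is actually in a file, here is a minimal sketch using od and grep instead of xxd. The scratch directory and the file names dos.txt and unix.txt are made up for illustration; 0x0d is CR and 0x0a is LF.

```shell
# Build two small sample files (hypothetical names) in a scratch directory.
workdir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$workdir/dos.txt"    # CR LF line endings
printf 'one\ntwo\n'     > "$workdir/unix.txt"   # LF-only line endings

# od -An -tx1 prints every byte as two hex digits, without the address column.
# tr -s squeezes the spacing so the dump fits on one line.
dos_hex=$(od -An -tx1 "$workdir/dos.txt" | tr -s ' \n' ' ')
unix_hex=$(od -An -tx1 "$workdir/unix.txt" | tr -s ' \n' ' ')
echo "dos: $dos_hex"
echo "unix:$unix_hex"

# Count the lines that contain a carriage return (0x0d).
# grep exits non-zero when nothing matches, hence the "|| true".
cr=$(printf '\r')
dos_cr_lines=$(grep -c "$cr" "$workdir/dos.txt" || true)
unix_cr_lines=$(grep -c "$cr" "$workdir/unix.txt" || true)
echo "lines with CR: dos=$dos_cr_lines unix=$unix_cr_lines"

rm -rf "$workdir"
```

If the dump shows only 0a and never 0d, the file has plain UNIX line feeds and there are no carriage returns to strip.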
sed is a line-oriented editor, so I don't think it can remove newlines on its own. I believe it reads a line, defined by a '\n' at the end (NL, \x0a, etc.), strips that newline, applies the expression(s), and then writes the line back out with a newline at the end. So by definition the text being edited never contains a newline, since that would start a new line. For your usage, I'd use tr for a simple solution.
Note that in the od dump the 0a's have been replaced with 20's, a cat of the file leaves the prompt on the same line, and wc reports 0 lines (the trailing NL has also been replaced with a space).
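The behaviour described above could be reproduced with something like the following sketch. The file sample.txt and its contents are invented for illustration, and the last sed command is one common workaround, not something from the original post.

```shell
workdir=$(mktemp -d)
sample="$workdir/sample.txt"        # hypothetical sample file
printf 'one\ntwo\nthree\n' > "$sample"

# sed strips the trailing newline before applying the expression and adds it
# back afterwards, so the pattern never sees 0x0a and nothing changes.
sed_lines=$(sed 's/\n/ /g' "$sample" | wc -l | tr -d ' ')

# tr works on the raw stream, so every line feed becomes a space,
# including the final one.
tr '\n' ' ' < "$sample" > "$workdir/joined.txt"
tr_lines=$(wc -l < "$workdir/joined.txt" | tr -d ' ')
joined_by_tr=$(cat "$workdir/joined.txt")
od -c "$workdir/joined.txt"

# sed can still join lines if they are first collected into the pattern
# space with N; it then re-adds a single newline at the very end.
joined_by_sed=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' "$sample")

echo "sed alone: $sed_lines lines, tr: $tr_lines lines"
rm -rf "$workdir"
```

The tr version is the simpler of the two; the sed loop is mainly useful when the join has to be combined with other sed edits.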
The problem with that tr is that it will remove all CRs not just the CR on a CRLF. It's unlikely that you'll often encounter a solitary CR but the possibility is there.
I have used sed -i 's/\r$//' before, but as someone recently pointed out to me, '\r' is a GNU extension and not part of POSIX sed, which is something that some people care about.
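One way around the '\r' portability concern is to let printf produce the literal CR and hand that to sed. This is only a sketch: the file names are hypothetical, and it writes to a new file rather than using -i, since -i is not in POSIX sed either.

```shell
workdir=$(mktemp -d)
sample="$workdir/dos.txt"           # hypothetical sample file
# The second line has a stray CR in the middle as well as its CR LF ending.
printf 'one\r\ntw\ro\r\n' > "$sample"

# Generate a literal CR with printf instead of relying on sed's '\r'.
cr=$(printf '\r')

# Strip a CR only where it sits directly before the end of the line.
sed "s/${cr}\$//" "$sample" > "$workdir/unix.txt"

# For comparison: tr -d removes every CR, wherever it is.
tr -d '\r' < "$sample" > "$workdir/all_removed.txt"

# Count the CR bytes left in each result.
sed_cr_count=$(tr -cd '\r' < "$workdir/unix.txt" | wc -c | tr -d ' ')
tr_cr_count=$(tr -cd '\r' < "$workdir/all_removed.txt" | wc -c | tr -d ' ')
echo "CRs left after sed: $sed_cr_count, after tr: $tr_cr_count"

rm -rf "$workdir"
```

The sed version leaves the solitary CR in the middle of the second line alone, which is the difference being discussed here.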
Yes, you're right.
Actually single CRs used (?) to be MacOS's way of terminating a line.
I wonder if they switched to LFs since they started using BSD.
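For a file that really does use a solitary CR as its line terminator, deleting the CRs would glue all the lines together. A small sketch of translating them instead, using invented sample text:

```shell
# CR-only text: three "lines", each terminated by a single carriage return.
# tr translates each CR into a line feed instead of deleting it.
converted=$(printf 'one\rtwo\rthree\r' | tr '\r' '\n')
converted_lines=$(printf 'one\rtwo\rthree\r' | tr '\r' '\n' | wc -l | tr -d ' ')

# Deleting the CRs instead leaves one long line with no terminator at all.
deleted=$(printf 'one\rtwo\rthree\r' | tr -d '\r')

echo "$converted"
echo "lines after translating: $converted_lines"
echo "after deleting: $deleted"
```

This should not be run on CR LF files, where it would turn every line ending into two line feeds.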
Just to clarify: like other utilities such as sed, tr deals with characters, not bytes.
In the most widely used character encodings (ASCII and UTF-8), CR is represented by a single byte, but it takes two bytes in UCS-2 and four in UCS-4, a.k.a. UTF-32.
Also, I assume that we are speaking of text files, not binary files.
Last edited by Didier Spaier; 04-30-2016 at 02:41 AM.
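The byte counts mentioned above can be checked with iconv, assuming it is installed and supports these encoding names. The "LE" variants are used here so that no byte order mark is added to the output.

```shell
# Encode a single CR in several encodings and count the resulting bytes.
utf8_bytes=$(printf '\r' | wc -c | tr -d ' ')
utf16_bytes=$(printf '\r' | iconv -f UTF-8 -t UTF-16LE | wc -c | tr -d ' ')
utf32_bytes=$(printf '\r' | iconv -f UTF-8 -t UTF-32LE | wc -c | tr -d ' ')

echo "CR takes $utf8_bytes byte in UTF-8, $utf16_bytes in UTF-16LE, $utf32_bytes in UTF-32LE"

# Hex dump of the UTF-16LE form: the 0x0d byte is followed by a 0x00 byte.
printf '\r' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1
```

For text in one of the wider encodings, converting it to UTF-8 with iconv first and then stripping the CRs is probably the safer order.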
I've never had a failure, stripping the carriage returns from Windows text files, with this little utility, dos2unx:
Code:
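#!/bin/sh
#
# dos2unx file [file...]
#
# Converts text files (names specified on command line) from MS-DOS
# format to UNIX format. Essentially, gets rid of all carriage
# returns (\r), since line feeds (\n) are all UNIX needs.
if [ $# -lt 1 ]
then
    echo "usage: dos2unx file [file ...]" >&2
    exit 1
fi
# Temporary file named after this shell's process ID.
TMP=/tmp/conv$$
for FILE
do
    printf 'dos2unx: converting %s ... ' "${FILE}"
    # Only replace the original if tr succeeded, so a failed
    # conversion does not destroy the file.
    if tr -d '\r' < "${FILE}" > "${TMP}"
    then
        # cp (rather than mv) keeps the original's owner and mode.
        cp -f "${TMP}" "${FILE}"
        echo "done"
    else
        echo "failed"
    fi
    rm -f "${TMP}"
done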
Just save this as dos2unx.sh and
Code:
make dos2unx
mv dos2unx /usr/local/bin
Works just fine (and /usr/local/bin is on your PATH).