LinuxQuestions.org

cosmbrth 11-03-2016 11:21 AM

clean file from broken lines and join them together
 
Hello everyone,

I have an issue formatting some files. Every day I receive files with broken lines. Every complete line ends with ^M, but some lines are split by stray newlines.

I have to format them manually: delete the false newlines and join the pieces back into a single line.

Example:

Received file:

line1line1-sameending^M
line2line2-sameending^M
line3
line3--sameending^M
line4line4-sameending^M
line5line5
-sameending^M
line6line6-sameending^M


I have to format it to:

line1line1-sameending^M
line2line2-sameending^M
line3line3--sameending^M
line4line4-sameending^M
line5line5-sameending^M
line6line6-sameending^M

When I run cat -E myfile, the output looks like this:

$line1line1-sameending^M
$line2line2-sameending^M
line3$
$line3--sameending^M
$line4line4-sameending^M
line5line5$
$-sameending^M
$line6line6-sameending^M


I've tried several approaches without success, for example reading the file one character at a time:

"while IFS= read -r -n1 char; do echo "$char"; done < myfile"
and then trying to merge the multiple lines into a single line.

But I can't get it to work.

Do you have any ideas?
Thank you in advance,

szboardstretcher 11-03-2016 11:41 AM

Well, first off - make a backup.

Then I would use 'dos2unix' to remove the Windows-style carriage returns (^M):

Code:

dos2unix yourfile

Then I would use sed to look for 'sameending' and join any line that doesn't end with it to the line that follows.

Code:

sed ':a;/sameending$/!{N;s/\n//;ba}' yourfile

Which gives this (the ^M characters are gone because dos2unix removed them):

Code:

line1line1-sameending
line2line2-sameending
line3line3--sameending
line4line4-sameending
line5line5-sameending
line6line6-sameending
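
For anyone who wants to try the two steps above without touching a real file, here is a minimal sketch run against the sample from the question. Assumptions: tr -d '\r' stands in for dos2unix (so nothing extra has to be installed), join_on_ending is a made-up helper name, and the sed script is the same one split into -e pieces so it does not depend on how a particular sed parses ';' after a label.

Code:

```shell
#!/bin/sh
# join_on_ending: hypothetical helper, not part of any tool.
# Strips carriage returns (stand-in for dos2unix), then keeps appending
# the next line until the current one ends with "sameending".
join_on_ending() {
    tr -d '\r' < "$1" |
    sed -e ':a' -e '/sameending$/!{' -e 'N' -e 's/\n//' -e 'ba' -e '}'
}

# Rebuild the sample from the question; \r is the character shown as ^M.
sample=$(mktemp)
printf 'line1line1-sameending\r\nline2line2-sameending\r\nline3\nline3--sameending\r\nline4line4-sameending\r\nline5line5\n-sameending\r\nline6line6-sameending\r\n' > "$sample"

join_on_ending "$sample"
rm -f "$sample"
```

On the sample this should print six joined lines with no ^M. If the carriage returns have to be put back afterwards, the result can be piped through unix2dos.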


MadeInGermany 11-04-2016 03:41 PM

Here is another sed script that works with your original file (no dos2unix needed, and the ^M line endings are kept):
Code:

#!/bin/bash
eol=$'\015' # a ^M character
sed '
:L
# if the eol is found, branch to the end
/'"$eol"'$/b
# append the next line; join it (remove the NL character)
$!N; s/\n//
# on success branch to the :L
tL
' yourfile
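
A quick way to check the script above against the sample from the question. This is only a sketch: join_keep_cr is a made-up wrapper name, eol is built with printf instead of bash's $'\015' so it also runs under plain sh, and cat -v is used to make the carriage returns visible as ^M.

Code:

```shell
#!/bin/sh
# join_keep_cr: hypothetical wrapper around the sed script above.
join_keep_cr() {
    eol=$(printf '\r')   # a ^M character (same byte as $'\015' in bash)
    sed '
:L
# a line ending in ^M is complete: branch to the end and print it
/'"$eol"'$/b
# otherwise append the next line and remove the newline between them
$!N; s/\n//
# if a join happened, go back and test the longer line again
tL
' "$1"
}

# Rebuild the sample from the question; \r is the character shown as ^M.
sample=$(mktemp)
printf 'line1line1-sameending\r\nline2line2-sameending\r\nline3\nline3--sameending\r\nline4line4-sameending\r\nline5line5\n-sameending\r\nline6line6-sameending\r\n' > "$sample"

# cat -v prints each carriage return as ^M so it can be seen
join_keep_cr "$sample" | cat -v
rm -f "$sample"
```

Every line of the output should end in a visible ^M, which matches the format asked for in the first post.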


