LinuxQuestions.org - [SOLVED] Remove lone lines from a text file

- - Remove lone lines from a text file (https://www.linuxquestions.org/questions/linux-newbie-8/remove-lone-lines-from-a-text-file-890321/)

lethalfang

07-06-2011 05:44 PM

Remove lone lines from a text file

Hey, does anyone have ideas on how to remove lone lines from a text file?

If I have a file that is like this:
-----------------------------------
line 1
line 2
line 3

line 4

line 5
line 6

line 7

line 8
line 9
line 10
-----------------------------------

What command(s) will remove the lone lines of this file, i.e., line 4 and line 7?

Thanks in advance.

colucix

07-06-2011 06:41 PM

Slightly modified from the sed FAQ here:

Code:

sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;/^\n.*\n$/d;P;D' file

This also deletes the blank lines around the single line. If you want to preserve them:

Code:

sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;s/^\n.*\n$/\n/;P;D' file

If you want to keep only one blank line:

Code:

sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;s/^\n.*\n$//;P;D' file

Hope this helps.

rojak

07-06-2011 06:43 PM

Try: $ sed -i.bk '/^$/ d' myfile

sycamorex

07-06-2011 06:57 PM

Quote:

Originally Posted by rojak (Post 4407279)

Try: $ sed -i.bk '/^$/ d' myfile

That doesn't delete lines containing just spaces or tabs.
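For reference, a minimal sketch of a variant that also drops lines made up only of spaces or tabs, assuming a sed that supports POSIX character classes such as [[:space:]]. The function name strip_blank_lines is made up for illustration and does not come from the thread.

```shell
# strip_blank_lines [FILE...]: hypothetical helper, not from the thread.
# Deletes lines that are empty or contain only whitespace (spaces, tabs).
# Reads standard input when no file is given.
strip_blank_lines() {
    sed '/^[[:space:]]*$/d' "$@"
}

# Example: the second, third and fourth input lines are blank or whitespace-only.
printf 'line 1\n \n\t\n\nline 2\n' | strip_blank_lines
```

Like the command it extends, this only removes blank lines; it does not touch lone lines of text.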

lethalfang

07-06-2011 07:17 PM

Quote:

Originally Posted by colucix (Post 4407277)

Thanks. This kinda works, but when there are multiple lone lines, it seems to only delete one at a time. For example:
------
line 1
line 2

line 3

line 4

line 5
line 6
------

The script gets rid of line 3, but not line 4.
Is there any way to get rid of all lone lines at once?

Thanks.

sandwormusmc

07-07-2011 12:43 PM

Quote:

Originally Posted by lethalfang (Post 4407296)

I actually found a ridiculously easy way to do this a while back that made me say "duh" at the way I'd been doing it (complex sed commands and whatnot).

Try:

Code:

# grep . myfile

Then you can redirect that to a temp file and remove the old one if necessary ...

lethalfang

07-07-2011 12:47 PM

Quote:

Originally Posted by sandwormusmc (Post 4408032)

This just gets rid of empty lines?
I'm wondering if I can get rid of the lines that are empty above and below.

sandwormusmc

07-07-2011 03:24 PM

Guess I'm confused on what you mean by "lone lines". I assumed you meant empty lines, but are you saying you want to remove specific lines? As in "remove arbitrary line X and Y" from a set of input?

lethalfang

07-07-2011 03:58 PM

Quote:

Originally Posted by sandwormusmc (Post 4408164)

Guess I'm confused on what you mean by "lone lines". I assumed you meant empty lines, but are you saying you want to remove specific lines? As in "remove arbitrary line X and Y" from a set of input?

Yep. Basically, if a line has an empty line both above and beneath, I want that line removed.

I actually wrote a tedious and rudimentary script to do that. It kinda works, but it's totally inefficient. I can write some rudimentary bash scripts, but I'm not all that good at it.

Code:

#!/bin/bash

file=$1

# Get the total number of lines in the file
num_lines=$(cat $file | wc -l)

line_j=1

empty_var=""

while [ $line_j -le $num_lines ]
do

  # line_i is the line before line_j, and line_k is the line after line_j.
  line_i=$(( $line_j - 1 ))
  line_k=$(( $line_j + 1 ))

  # see if those lines are empty
  val_i=$(cat $file | awk 'NR=='$line_i'' | awk '{print $1}' )
  val_k=$(cat $file | awk 'NR=='$line_k'' | awk '{print $1}' )

  if [ $val_i = $empty_var -a $val_k = $empty_var ]
  then
    true
  else
    cat $file | awk 'NR=='$line_j'' >> Duplicate_$file
  fi

  line_j=$(( $line_j + 1 ))

done

Besides, the terminal keeps popping up
"./identify_duplicates.sh: line 22: [: too many arguments"
for this line of code:
"if [ $val_i = $empty_var -a $val_k = $empty_var ]"

chrism01

07-07-2011 07:11 PM

You can add

Code:

set -xv

as the 2nd line to see what the script is actually doing.

lethalfang

07-07-2011 09:41 PM

Quote:

Originally Posted by chrism01 (Post 4408339)

You can add

Code:

set -xv

as the 2nd line to see what the script is actually doing.

Ahh, that's a good tip.
The error messages went away when I changed

Code:

if [ $val_i = $empty_var -a $val_k = $empty_var ]

into

Code:

if [ "$val_i" = "$empty_var" -a "$val_k" = "$empty_var" ]

The issue seemed to be that when $val_i has a non-empty value, say "STUFF", the code was reading '[' STUFF = ']', i.e., something being compared to nothing. It's a malformed comparison, but the equality test failed anyway, so the previous code did its job.
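To illustrate the explanation above, here is a small sketch of what the shell does with unquoted versus quoted expansions. The helper count_args is hypothetical and only exists to show how many words end up on the command line.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

val_i="STUFF"
empty_var=""

# Unquoted: the empty expansion disappears, leaving only STUFF and =
count_args $val_i = $empty_var        # prints 2

# Quoted: the empty string survives as a third (empty) argument
count_args "$val_i" = "$empty_var"    # prints 3
```

The same thing happens inside [ ... ]: without quotes the test sees fewer words than intended on one side of -a, which seems consistent with the "too many arguments" message.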

Now does anyone have a more efficient one-liner for that stuff? :-)

Diantre

07-07-2011 09:50 PM

Quote:

Originally Posted by lethalfang (Post 4408406)

Now does anyone have a more efficient one-liner for that stuff? :-)

I'm not sure if this is more efficient or not, but it's a one-liner that does the same:

Code:

[ -z "$val_i" -a -z "$val_k" ] || cat $file | awk 'NR=='$line_j'' >> Duplicate_$file

ntubski

07-07-2011 10:47 PM

Quote:

Originally Posted by lethalfang (Post 4408406)

Now does anyone have a more efficient one-liner for that stuff? :-)

Getting close to the edge of what you can reasonably call a "one-liner", but yes:

Code:

awk '{l3=l2;l2=l1;l1=$0}NR>=2&&!(l3==""&&l2!=""&&l1==""){print l2}END{print}' $file >> Duplicate_$file
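Unpacked into a shell function with comments, the same sliding-window idea might look like the sketch below. The function name remove_lone_lines and the variable names are my own, not from the post. Like the one-liner, it treats the start of the file as an empty line and always keeps the last line.

```shell
# remove_lone_lines FILE: hypothetical helper, not from the thread.
# Prints FILE without the lines that have an empty line directly above
# and directly below them. Blank lines themselves are left in place.
remove_lone_lines() {
    awk '
        # Slide a three-line window: prev2 (oldest), prev1 (middle), cur (newest).
        { prev2 = prev1; prev1 = cur; cur = $0 }

        # From the second line on, print the middle line unless it is
        # non-empty and both of its neighbours are empty.
        NR >= 2 && !(prev2 == "" && prev1 != "" && cur == "") { print prev1 }

        # The last line never becomes the middle line, so print it here.
        END { if (NR > 0) print cur }
    ' "$1"
}

# Usage: remove_lone_lines myfile > myfile.new
```

Because every decision is made against the original neighbouring lines, consecutive lone lines separated by single blank lines are all removed in one pass.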

grail

07-07-2011 11:02 PM

Well, it's not pretty and can probably be condensed, but this seems to work:

Code:

awk 'x && NF{ y=1 }y{ print x }{if(NF)x = $0;else{ if(y)print; x = y = 0}}END{if(y)print x}' file

chrism01

07-07-2011 11:47 PM

Re post #11: double brackets [[ ]] work better. See http://tldp.org/LDP/abs/html/testcon...ml#DBLBRACKETS

All times are GMT -5.