[SOLVED] Remove lone lines from a text file

lethalfang · 07-06-2011, 05:44 PM

Hey, does anyone have any ideas on how to remove lone lines from a text file?

If I have a file like this:
-----------------------------------
line 1
line 2
line 3

line 4

line 5
line 6

line 7

line 8
line 9
line 10
-----------------------------------

What command(s) will remove the lone lines in this file, i.e., the lines with an empty line directly above and below them (line 4 and line 7 here)?

Thanks in advance.

colucix · 07-06-2011, 06:41 PM

Slightly modified from the sed FAQ here:

Code:

sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;/^\n.*\n$/d;P;D' file

This also deletes the blank lines around the single line. If you want to preserve them:

Code:

sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;s/^\n.*\n$/\n/;P;D' file

If you want to keep only one blank line:

Code:

sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;s/^\n.*\n$//;P;D' file

Hope this helps.
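For anyone puzzling over the first command, here is the same sed script laid out over several lines with comments. It is a sketch that assumes GNU sed; putting labels on their own lines also sidesteps the question of whether a given sed accepts `;` right after a label. `lone_sed` is just a hypothetical wrapper name. The comments also show why lone lines separated by a single blank line go one at a time: `d` discards the whole three-line window, including the trailing blank line, so the next lone line is never seen together with its leading blank.

```shell
#!/bin/bash
# Commented layout of:
#   sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;/^\n.*\n$/d;P;D' file
# lone_sed is a hypothetical wrapper name; it reads the files given, or stdin.
lone_sed() {
  sed '
# Append lines until the pattern space holds three lines
# (two embedded newlines) or the input runs out.
:more
$!N
s/\n/&/2
t enough
$!b more
:enough
# Window is blank / some text / blank: delete all three lines.
/^\n.*\n$/d
# Otherwise print the oldest line, drop it, and restart the cycle
# with the remaining lines (D does not read a new line first).
P
D
' "$@"
}
```

Fed the "line 1 ... line 6" example with lone lines 3 and 4, it removes line 3 and its surrounding blanks but keeps line 4.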

rojak · 07-06-2011, 06:43 PM

Try: $ sed -i.bk '/^$/ d' myfile

sycamorex · 07-06-2011, 06:57 PM

Quote:

Originally Posted by rojak

Try: $ sed -i.bk '/^$/ d' myfile

That doesn't delete lines containing just spaces or tabs.
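A minimal sketch covering that case: the POSIX class `[[:space:]]` also matches spaces and tabs, so lines that only look empty are deleted too. `drop_blank_lines` is a hypothetical wrapper name, and like rojak's command this removes blank lines, not lone lines.

```shell
#!/bin/bash
# Delete lines that are empty or contain only whitespace (spaces, tabs, ...).
# drop_blank_lines is a hypothetical wrapper; it reads the files given, or stdin.
drop_blank_lines() {
  sed '/^[[:space:]]*$/d' "$@"
}

# In-place variant with a backup copy, in the style of the command above
# (the attached -i.bk spelling works with both GNU sed and BSD/macOS sed):
#   sed -i.bk '/^[[:space:]]*$/d' myfile
```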

lethalfang · 07-06-2011, 07:17 PM

Thanks, colucix. This kinda works, but when there are multiple lone lines separated by a single blank line, it seems to delete only one at a time. For example:
------
line 1
line 2

line 3

line 4

line 5
line 6
------

The script gets rid of line 3, but not line 4.
Is there any way to get rid of all lone lines at once?

Thanks.

sandwormusmc · 07-07-2011, 12:43 PM

I actually found a ridiculously easy way to do this a while back that made me say "duh" at the way I'd been doing it (complex sed commands and whatnot).

Try:

Code:

grep . myfile

Then you can redirect the output to a temp file and replace the old file with it if necessary...
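A sketch of that temp-file step, assuming the goal is to edit the file in place. `keep_nonempty` is a hypothetical helper name. Note that `grep .` keeps every line with at least one character, so lines holding only spaces or tabs survive.

```shell
#!/bin/bash
# Remove empty lines from a file in place using grep and a temporary file.
# keep_nonempty is a hypothetical helper name.
keep_nonempty() {
  local file=$1 tmp status
  tmp=$(mktemp) || return 1
  grep . "$file" > "$tmp"
  status=$?
  # grep exits 0 (lines kept), 1 (nothing matched: every line empty) or 2 (error).
  if [ "$status" -gt 1 ]; then
    rm -f "$tmp"
    return 1
  fi
  # Copy back with cat rather than mv so the original file keeps its own
  # permissions (mktemp creates files readable only by their owner).
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```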

lethalfang · 07-07-2011, 12:47 PM

Doesn't this just get rid of empty lines?
I'm wondering if I can get rid of the lines that have an empty line both above and below them.

sandwormusmc · 07-07-2011, 03:24 PM

I guess I'm confused about what you mean by "lone lines". I assumed you meant empty lines, but are you saying you want to remove specific lines? As in "remove arbitrary lines X and Y" from a set of input?

lethalfang · 07-07-2011, 03:58 PM

Yep. Basically, if a line has an empty line both above and beneath, I want that line removed.

I actually wrote a tedious and rudimentary script to do that. It kinda works, but it's totally inefficient. I can write some rudimentary bash scripts, but I'm not all that good at it.

Code:

#!/bin/bash

file=$1

# Get the total number of lines in the file
num_lines=$(cat $file | wc -l)

line_j=1

empty_var=""

while [ $line_j -le $num_lines ]
do
   # line_i is the line before line_j, and line_k is the line after line_j.
   line_i=$(( $line_j - 1 ))
   line_k=$(( $line_j + 1 ))

   # See if those lines are empty: awk prints the first field, which is
   # empty for blank or whitespace-only lines (and for line 0 or a line
   # past the end of the file, since those do not exist).
   val_i=$(cat $file | awk 'NR=='$line_i'' | awk '{print $1}' )
   val_k=$(cat $file | awk 'NR=='$line_k'' | awk '{print $1}' )

   # Both neighbors empty: skip line_j. Otherwise append it to the output file.
   if [ $val_i = $empty_var -a $val_k = $empty_var ]
   then
      true
   else
      cat $file | awk 'NR=='$line_j'' >> Duplicate_$file
   fi

   line_j=$(( $line_j + 1 ))
done

Also, the terminal keeps printing
"./identify_duplicates.sh: line 22: [: too many arguments"
for this line of code:
"if [ $val_i = $empty_var -a $val_k = $empty_var ]"

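Part of why that script is slow is that every pass of the loop runs several cat/awk pipelines over the whole file, so the work grows roughly with the square of the number of lines. A sketch that reads the file once into a bash array instead (needs bash 4+ for `mapfile`), applying the same rule as the script: a line is dropped when the lines above and below are both empty or whitespace-only, and positions outside the file count as empty. `is_blank` and `drop_lone_lines` are hypothetical names.

```shell
#!/bin/bash
# Same rule as the script above, but the file is read only once.
# is_blank and drop_lone_lines are hypothetical names; needs bash 4+ (mapfile).

# Succeeds if the argument is empty or contains only whitespace.
is_blank() {
  [[ $1 =~ ^[[:space:]]*$ ]]
}

# Print every line of file $1 except lines whose neighbors are both blank.
drop_lone_lines() {
  local -a lines
  local n j above below
  mapfile -t lines < "$1"          # one array element per line
  n=${#lines[@]}
  for (( j = 0; j < n; j++ )); do
    # Positions outside the file count as blank, as in the original script.
    above="" below=""
    (( j > 0 )) && above=${lines[j-1]}
    (( j + 1 < n )) && below=${lines[j+1]}
    if is_blank "$above" && is_blank "$below"; then
      continue                     # lone line: skip it
    fi
    printf '%s\n' "${lines[j]}"
  done
}
```

Usage would be along the lines of `drop_lone_lines "$file" > "Duplicate_$file"`.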
chrism01 · 07-07-2011, 07:11 PM

You can add

Code:

set -xv

as the 2nd line to see what the script is actually doing.
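To see what that reveals for the failing test, here is a small reproduction with made-up values standing in for one pass of the loop (`trace_demo` is a hypothetical name). `set -x` on its own prints each command after expansion, prefixed with `+`, on stderr; the `v` in `set -xv` additionally echoes the script's lines as they are read.

```shell
#!/bin/bash
# trace_demo: hypothetical reproduction of one pass of the loop, with
# made-up values (line above non-empty, line below empty).
trace_demo() {
  local empty_var="" val_i="STUFF" val_k=""
  set -x          # from here on, print each command after expansion (stderr)
  if [ $val_i = $empty_var -a $val_k = $empty_var ]; then
    echo "both neighbors empty"
  else
    echo "else branch"
  fi
  set +x
}
```

Calling `trace_demo` should show the `[` command with the empty variables gone (something like `+ '[' STUFF = -a = ']'`), then a `[: too many arguments` error, then "else branch".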

lethalfang · 07-07-2011, 09:41 PM

Ahh, that's a good tip.
The error messages went away when I changed

Code:

if [ $val_i = $empty_var -a $val_k = $empty_var ]

into

Code:

if [ "$val_i" = "$empty_var" -a "$val_k" = "$empty_var" ]

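The issue seemed to be that an unquoted variable holding an empty string disappears entirely. So when $val_i had a non-empty value, say "STUFF", the test actually run was something like '[' STUFF = -a = ']', i.e., something being compared to nothing in a malformed expression. [ reported the error and returned a failure status, which sent the script into the else branch, the same place a failed equality test would have, so the previous code still did its job.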
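The difference quoting makes can be checked by counting the words the shell actually hands to a command. `count_args` is a hypothetical helper, and the values are made up to mirror one pass of the loop:

```shell
#!/bin/bash
# count_args: hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

empty_var=""
val_i="STUFF"
val_k=""

# Unquoted: the empty expansions vanish, leaving STUFF = -a =  (4 words)
count_args $val_i = $empty_var -a $val_k = $empty_var
# Quoted: the empty strings survive as real arguments (7 words)
count_args "$val_i" = "$empty_var" -a "$val_k" = "$empty_var"
```

This prints 4 and then 7; only the quoted form gives `[` the two complete comparisons it expects.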

Now does anyone have a more efficient one-liner for that stuff? :-)

Diantre · 07-07-2011, 09:50 PM

Quote:

Originally Posted by lethalfang

Now does anyone have a more efficient one-liner for that stuff? :-)

I'm not sure if this is more efficient or not, but here's a one-liner that does the same thing as the if/else block:

Code:

[ -z "$val_i" -a -z "$val_k" ] || cat $file | awk 'NR=='$line_j'' >> Duplicate_$file
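The if/else block in the script appends line_j only when at least one neighbor is non-empty, so a one-liner built from `[ ... ]` has to run the command after `||`, not `&&`. A small sketch of that logic, using two separate `[ ]` tests joined with `&&` because POSIX marks the `-a` operator inside `[ ]` as obsolescent. `keep_unless_lone` is a hypothetical helper name.

```shell
#!/bin/bash
# keep_unless_lone ABOVE BELOW LINE (hypothetical helper):
# print LINE unless both neighbor values are empty.
keep_unless_lone() {
  # Parsed as (A && B) || C: printf runs unless both -z tests succeed.
  [ -z "$1" ] && [ -z "$2" ] || printf '%s\n' "$3"
}
```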

ntubski · 07-07-2011, 10:47 PM

Quote:

Originally Posted by lethalfang

Now does anyone have a more efficient one-liner for that stuff? :-)

Getting close to the edge of what you can reasonably call a "one-liner", but yes:

Code:

awk '{l3=l2;l2=l1;l1=$0}NR>=2&&!(l3==""&&l2!=""&&l1==""){print l2}END{print}' "$file" >> "Duplicate_$file"
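The same awk program spread out with comments, behavior unchanged (`drop_lone_awk` is a hypothetical wrapper name). Each line is judged against its original neighbors, so unlike the sed command earlier in the thread it catches lone lines separated by a single blank line; blank lines themselves are kept, and only truly empty lines (not whitespace-only ones) count as empty.

```shell
#!/bin/bash
# drop_lone_awk is a hypothetical wrapper; it reads the files given, or stdin.
drop_lone_awk() {
  awk '
    # Slide a three-line window: l3 = two lines back, l2 = previous line,
    # l1 = current line.
    { l3 = l2; l2 = l1; l1 = $0 }

    # From record 2 on, print the middle line unless it is non-empty and
    # sits between two empty lines. On record 2, l3 is still unset and
    # compares equal to "", so line 1 is treated as if an empty line
    # preceded it.
    NR >= 2 && !(l3 == "" && l2 != "" && l1 == "") { print l2 }

    # The last line never becomes the middle of a window, so always print it.
    END { print }
  ' "$@"
}
```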

grail · 07-07-2011, 11:02 PM

Well, it's not pretty and it can probably be condensed, but this seems to work:

Code:

awk 'x && NF{ y=1 }y{ print x }{if(NF)x = $0;else{ if(y)print; x = y = 0}}END{if(y)print x}' file
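A commented layout of this awk program, behavior unchanged (`drop_lone_grail` is a hypothetical wrapper name). Because it tests `NF`, whitespace-only lines count as blank, and it also drops the blank line after each lone line, so its output differs from the previous answer. It assumes no line consists of just `0`, which awk would treat as false when testing `x`.

```shell
#!/bin/bash
# drop_lone_grail is a hypothetical wrapper; it reads the files given, or stdin.
drop_lone_grail() {
  awk '
    # x: the previous non-empty line (0 after a blank line, unset at the start).
    # y: set once the current block is known to have two or more lines.

    # A non-empty line following another one confirms a multi-line block.
    x && NF { y = 1 }

    # Inside a multi-line block, print the previous line.
    y { print x }

    # Remember non-empty lines; a blank line ends the block and is printed
    # only if that block had two or more lines.
    {
      if (NF)
        x = $0
      else {
        if (y) print
        x = y = 0
      }
    }

    # Flush the last line of a multi-line block at end of input.
    END { if (y) print x }
  ' "$@"
}
```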

chrism01 · 07-07-2011, 11:47 PM

Re post #11: double brackets [[ ]] work better, since bash does no word splitting on unquoted variables inside them: http://tldp.org/LDP/abs/html/testcon...ml#DBLBRACKETS
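A small sketch of the double-bracket form: inside `[[ ]]` bash performs no word splitting or filename expansion on unquoted variables, so an empty value stays an empty operand and the original unquoted comparison works. `neighbors_empty` is a hypothetical helper name.

```shell
#!/bin/bash
# neighbors_empty VAL_ABOVE VAL_BELOW: succeeds when both values are empty.
# neighbors_empty is a hypothetical helper name.
neighbors_empty() {
  local empty_var="" val_i=$1 val_k=$2
  # Same comparison as the original script, still unquoted, but inside [[ ]].
  # The right-hand side of = is a glob pattern here; quote it ("$empty_var")
  # if it could ever contain *, ? or [.
  [[ $val_i = $empty_var && $val_k = $empty_var ]]
}
```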