LinuxQuestions.org

Linux - Newbie: Remove lone lines from a text file (https://www.linuxquestions.org/questions/linux-newbie-8/remove-lone-lines-from-a-text-file-890321/)

lethalfang 07-06-2011 05:44 PM

Remove lone lines from a text file
 
Hey, does anyone have any ideas on how to remove lone lines from a text file?

If I have a file that is like this:
-----------------------------------
line 1
line 2
line 3

line 4

line 5
line 6

line 7

line 8
line 9
line 10
-----------------------------------

What command(s) will remove the lone lines of this file, i.e., line 4 and line 7?

Thanks in advance.

colucix 07-06-2011 06:41 PM

Slightly modified from the sed FAQ here:
Code:

sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;/^\n.*\n$/d;P;D' file
This also deletes the blank lines around the single line. If you want to preserve them:
Code:

sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;s/^\n.*\n$/\n/;P;D' file
If you want to keep only one blank line:
Code:

sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;s/^\n.*\n$//;P;D' file
Hope this helps.
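Here is the same first command with one sed command per line and comments, which may make the N/P/D loop easier to follow. This is a sketch that assumes GNU sed, which understands \n in regexes and lets . match a newline. The function name remove_lone_and_blanks is hypothetical and does not come from the sed FAQ.

```bash
# Hypothetical wrapper around the first command above, one sed command per line.
# Assumes GNU sed. Reads the files given as arguments, or stdin if none.
remove_lone_and_blanks() {
  sed '
    :more
    # Append the next input line (unless this is the last line).
    $!N
    # Replacing the 2nd newline with itself only succeeds once the
    # pattern space holds three lines.
    s/\n/&/2
    t enough
    $!b more
    :enough
    # Blank line, any line, blank line: delete the whole window.
    /^\n.*\n$/d
    # Otherwise print the first line, drop it, and restart the cycle
    # with the remaining lines without reading new input.
    P
    D
  ' "$@"
}

# Usage: remove_lone_and_blanks file
```

Each cycle looks at a three-line window. P and D slide it forward one line at a time, and d discards the whole window when it matches.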

rojak 07-06-2011 06:43 PM

Try: $ sed -i.bk '/^$/ d' myfile

sycamorex 07-06-2011 06:57 PM

Quote:

Originally Posted by rojak (Post 4407279)
Try: $ sed -i.bk '/^$/ d' myfile

That doesn't delete lines containing just spaces or tabs.
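One way to handle that is to match lines made up of nothing but whitespace with the POSIX class [[:space:]]. This is a sketch, and the function name delete_blank_lines is hypothetical. Like rojak's command, it removes every blank line, not just the ones around lone lines.

```bash
# Delete empty lines and lines holding only whitespace (spaces, tabs, ...).
delete_blank_lines() {
  sed '/^[[:space:]]*$/d' "$@"
}

# In place with a backup, as in rojak's post (GNU sed):
#   sed -i.bk '/^[[:space:]]*$/d' myfile
```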

lethalfang 07-06-2011 07:17 PM

Quote:

Originally Posted by colucix (Post 4407277)
Slightly modified from the sed FAQ here:

Thanks. This kinda works, but when there are multiple lone lines, it seems to only delete one at a time. For example:
------
line 1
line 2

line 3

line 4

line 5
line 6
------

The script gets rid of line 3, but not line 4.
Is there any way to get rid of all lone lines at once?

Thanks.
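Running the first command on this example shows why this happens (GNU sed assumed). The /^\n.*\n$/d step deletes the whole three-line window, so the blank line after line 3 is deleted along with it. That blank line is also the one above line 4. The next window therefore starts at line 4, and sed never sees line 4 with a blank line on both sides.

```bash
# lethalfang's second example, fed to the first command from colucix's post.
out=$(printf 'line 1\nline 2\n\nline 3\n\nline 4\n\nline 5\nline 6\n' |
  sed ': more;$!N;s/\n/&/2;t enough;$!b more;: enough;/^\n.*\n$/d;P;D')
printf '%s\n' "$out"   # line 3 is gone, line 4 is still there
```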

sandwormusmc 07-07-2011 12:43 PM

Quote:

Originally Posted by lethalfang (Post 4407296)
The script gets rid of line 3, but not line 4.
Is there any way to get rid of all lone lines at once?

I actually found a ridiculously easy way to do this a while back that made me say "duh" at the way I'd been doing it (complex sed commands and whatnot).

Try:

Code:

# grep . myfile
Then you can redirect that to a temp file and remove the old one if necessary ...
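Below is a sketch of the "redirect to a temp file" step, written as a hypothetical helper called strip_empty_lines. grep . prints only lines that contain at least one character, so it drops empty lines but keeps lines that contain only whitespace. grep exits with status 1 when nothing matched, so the helper treats 1 as "every line was empty" rather than as a failure.

```bash
# Hypothetical helper: drop empty lines from a file via a temporary file.
strip_empty_lines() {
  file=$1
  tmp=$(mktemp) || return 1
  # grep exits 0 if some line matched, 1 if none did, 2 on errors.
  if grep . "$file" > "$tmp"; then status=0; else status=$?; fi
  if [ "$status" -le 1 ]; then
    # Copy back instead of mv so the original file keeps its permissions.
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
  [ "$status" -le 1 ]
}

# Usage: strip_empty_lines myfile
```

The helper copies the result back with cat instead of moving it with mv because mktemp creates its file readable only by the owner.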

lethalfang 07-07-2011 12:47 PM

Quote:

Originally Posted by sandwormusmc (Post 4408032)
Try:

Code:

# grep . myfile
Then you can redirect that to a temp file and remove the old one if necessary ...

This just gets rid of empty lines?
I'm wondering if I can get rid of the lines that are empty above and below.

sandwormusmc 07-07-2011 03:24 PM

Guess I'm confused on what you mean by "lone lines". I assumed you meant empty lines, but are you saying you want to remove specific lines? As in "remove arbitrary line X and Y" from a set of input?

lethalfang 07-07-2011 03:58 PM

Quote:

Originally Posted by sandwormusmc (Post 4408164)
Guess I'm confused on what you mean by "lone lines". I assumed you meant empty lines, but are you saying you want to remove specific lines? As in "remove arbitrary line X and Y" from a set of input?

Yep. Basically, if a line has an empty line both above and beneath, I want that line removed.

I actually wrote a tedious and rudimentary script to do that. It kinda works, but it's totally inefficient, since it rereads the whole file with cat and awk several times for every line. I can write some rudimentary bash scripts, but I'm not all that good at it.

Code:

#!/bin/bash

file=$1

# Get the total number of lines in the file
num_lines=$(cat $file | wc -l)

line_j=1

empty_var=""

while [ $line_j -le $num_lines ]
do
  # line_i is the line before line_j, and line_k is the line after line_j.
  line_i=$(( $line_j - 1 ))
  line_k=$(( $line_j + 1 ))

  # see if those lines are empty
  val_i=$(cat $file | awk 'NR=='$line_i'' | awk '{print $1}' )
  val_k=$(cat $file | awk 'NR=='$line_k'' | awk '{print $1}' )

  if [ $val_i = $empty_var -a $val_k = $empty_var ]
  then true
  else
    cat $file | awk 'NR=='$line_j'' >> Duplicate_$file
  fi

  line_j=$(( $line_j + 1 ))
done

Besides, the terminal keeps popping up
"./identify_duplicates.sh: line 22: [: too many arguments"
for this line of code
"if [ $val_i = $empty_var -a $val_k = $empty_var ]"

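The same rule (drop a line when the lines directly above and below it are both blank) can be written in awk without rerunning cat and awk for every line. The sketch below reads the file once to record which lines are blank, then reads it a second time to print. The function name remove_lone_lines is hypothetical. "Blank" is assumed to mean "no fields", which matches the script's awk '{print $1}' check. Unlike the script, it never treats the area beyond the start or end of the file as blank, so the first and last lines are always kept.

```bash
# Hypothetical helper: print the file without lines that have a blank line
# directly above and below. "Blank" means no fields (empty or whitespace only).
# The file is read twice, so pass a real file name, not stdin.
remove_lone_lines() {
  awk '
    # Pass 1 (first copy of the file): remember which line numbers are blank.
    NR == FNR { blank[FNR] = (NF == 0); next }
    # Pass 2: print every line except a non-blank one sandwiched by blanks.
    # Lines 0 and N+1 do not exist, so the first and last lines are always kept.
    !(NF > 0 && blank[FNR - 1] && blank[FNR + 1])
  ' "$1" "$1"
}

# Usage, writing to the same output file name as the script:
#   remove_lone_lines "$file" > "Duplicate_$file"
```

Blank lines are kept, and consecutive lone lines such as line 3 and line 4 in the second example are all removed in one run.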
chrism01 07-07-2011 07:11 PM

You can add
Code:

set -xv
as the 2nd line to see what the script is actually doing.

lethalfang 07-07-2011 09:41 PM

Quote:

Originally Posted by chrism01 (Post 4408339)
You can add
Code:

set -xv
as the 2nd line to see what the script is actually doing.

Ahh, that's a good tip.
The error messages went away when I changed
Code:

if [ $val_i = $empty_var -a $val_k = $empty_var ]
into
Code:

if [ "$val_i" = "$empty_var" -a "$val_k" = "$empty_var" ]
The issue seemed to be that, when $val_i had a non-empty value, say, "STUFF," the unquoted and empty $empty_var simply disappeared, so the code was reading '[' STUFF = -a ... ']' with nothing after the first '=', i.e., something being compared to nothing. It's a malformed test, but it failed anyway, so the previous code still did its job.
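A quick way to see the difference uses made-up values for the case where the line above is non-blank and the line below is blank. The quoted form also splits the test into two [ ] commands joined by &&, which is a common alternative to -a.

```bash
# Made-up values: the line above was "line 3" (awk printed its first field,
# "line"), and the line below was blank.
empty_var=""
val_i="line"
val_k=""

# Unquoted: the empty variables vanish, so [ receives the four words
#   line = -a =
# which is not a valid expression. bash reports "too many arguments",
# and the non-zero status sends the if statement to its else branch.
if [ $val_i = $empty_var -a $val_k = $empty_var ] 2>/dev/null; then
  unquoted_status=0
else
  unquoted_status=$?
fi

# Quoted: every operand survives, even when empty, so the test is
# well-formed and simply false here.
if [ "$val_i" = "$empty_var" ] && [ "$val_k" = "$empty_var" ]; then
  quoted_status=0
else
  quoted_status=$?
fi

echo "unquoted: $unquoted_status, quoted: $quoted_status"
```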

Now does anyone have a more efficient one-liner for that stuff? :-)

Diantre 07-07-2011 09:50 PM

Quote:

Originally Posted by lethalfang (Post 4408406)
Now does anyone have a more efficient one-liner for that stuff? :-)

I'm not sure if this is more efficient or not, but it's a one-liner that does the same:

Code:

[ -z "$val_i" -a -z "$val_k" ] || cat $file | awk 'NR=='$line_j'' >> Duplicate_$file

ntubski 07-07-2011 10:47 PM

Quote:

Originally Posted by lethalfang (Post 4408406)
Now does anyone have a more efficient one-liner for that stuff? :-)

Getting close to the edge of what you can reasonably call a "one-liner", but yes:
Code:

awk '{l3=l2;l2=l1;l1=$0}NR>=2&&!(l3==""&&l2!=""&&l1==""){print l2}END{print}' $file >> Duplicate_$file

grail 07-07-2011 11:02 PM

Well, it's not pretty and can probably be condensed, but this seems to work:
Code:

awk 'x && NF{ y=1 }y{ print x }{if(NF)x = $0;else{ if(y)print; x = y = 0}}END{if(y)print x}' file

chrism01 07-07-2011 11:47 PM

Re post #11: double brackets [[ ]] work better, since bash does not word-split unquoted variables inside them. See http://tldp.org/LDP/abs/html/testcon...ml#DBLBRACKETS
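Below is a minimal bash-only sketch of the same emptiness check using [[ ]]. The helper name is_sandwiched is hypothetical. Inside [[ ]] an empty $1 or $2 stays an empty operand even without quotes, and && takes the place of -a.

```bash
#!/bin/bash
# Hypothetical helper: succeeds when both neighbouring lines are empty.
# $1 = the line above, $2 = the line below (like val_i and val_k in the script).
is_sandwiched() {
  # No word splitting inside [[ ]], so empty arguments do not disappear.
  [[ -z $1 && -z $2 ]]
}

is_sandwiched "" "" && echo "drop the middle line"
is_sandwiched "line" "" || echo "keep the middle line"
```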


All times are GMT -5.