Removing lines containing "0"s

dwarf · 02-07-2012, 02:07 PM

Hi experts,
I have a text file consisting of five tab-separated columns. The first column is a timestamp; the other columns hold values between +4000.0 and -4000.0. I would like to remove all lines that, apart from the timestamp, contain only "0" values.
Do you have any suggestions for doing this easily (awk or sed)?

Thanks in advance!

sycamorex · 02-07-2012, 02:11 PM

Hi and welcome to LQ.

What have you tried so far? Can you post the code so that we could help you?

edit: Additionally, can you post a sample of your data (including some lines which need to be removed)?

colucix · 02-07-2012, 02:31 PM

A suggestion:

Code:

awk '$2+$3+$4+$5' file

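This relies on awk truth values (http://www.gnu.org/software/gawk/man...l#Truth-Values): a pattern that evaluates to a non-zero number is true, so the line is printed. To edit the file in place: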

Code:

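# Go through a temporary file (file.tmp is an arbitrary name): printing to FILENAME while awk is still reading it can truncate the input.
awk '$2+$3+$4+$5' file > file.tmp && mv file.tmp file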
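
To illustrate the truth-value rule behind the first command: a bare expression used as a pattern prints the line whenever it evaluates to a non-zero number. The following is only a minimal sketch; the three rows (t1, t2, t3) are invented for illustration and are not taken from the original poster's file.

```shell
# Invented sample rows: a timestamp placeholder plus four tab-separated values.
data=$(printf '%s\t%s\t%s\t%s\t%s\n' \
    t1 0 100 0 0 \
    t2 0 0 0 0 \
    t3 0 0 0 -223)

# A bare expression is a pattern; awk prints the line when the sum is non-zero.
kept=$(printf '%s\n' "$data" | awk '$2+$3+$4+$5')
printf '%s\n' "$kept"
```

Only the t1 and t3 rows are printed; the all-zero t2 row sums to 0, which awk treats as false.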

dwarf · 02-08-2012, 03:11 AM

OK,
here's what I've tried so far:

Quote:

grep "[0-9][0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9]" input_file | awk 'BEGIN {FS="\t"}{ if(($2 >0) || ($3 >0) || ($4 >0) || ($5 >0)) print}' > input_filtered.txt

It returns the correct lines, but only for positive values. When I try to filter for negative values (e.g. $2 < 0), it returns everything, including the lines with only "0" values.
There must be a way to exclude the zeros.

Regards

colucix · 02-08-2012, 04:31 AM

Difficult to tell without seeing a sample of the input data. Please, post it enclosed in CODE tags (not QUOTE) to preserve formatting.

dwarf · 02-08-2012, 01:38 PM

Please find attached a sample file to understand what I am talking about.

Harlin · 02-08-2012, 01:51 PM

You're only wanting to keep the timestamps and that's it?

colucix · 02-08-2012, 02:19 PM

The solution suggested in post #3 should work. Otherwise, please describe what you want to achieve in more detail. Example:

Code:

$ cat input_file.txt 
2012.012.00:01.000      0       100     0       0
2012.012.00:01.001      0       0       2331    -400
2012.012.01:01.002      0       88      0       -423
2012.012.01:01.003      0       85      0       0
2012.012.01:01.004      0       0       0       -437
2012.012.02:01.005      0       83      2299    0
2012.012.03:01.006      0       0       0       0
2012.012.03:01.007      0       0       0       0
2012.012.03:01.008      0       0       0       -223
$ awk '$2+$3+$4+$5' input_file.txt
2012.012.00:01.000      0       100     0       0
2012.012.00:01.001      0       0       2331    -400
2012.012.01:01.002      0       88      0       -423
2012.012.01:01.003      0       85      0       0
2012.012.01:01.004      0       0       0       -437
2012.012.02:01.005      0       83      2299    0
2012.012.03:01.008      0       0       0       -223

The two lines containing only zeroes (timestamps ending in .006 and .007) are removed from the output. Also note that you can omit the FS specification, since TAB is one of the default separators in awk. From the GNU awk user's guide:

Quote:

By default, fields are separated by whitespace, like words in a line. Whitespace in awk means any string of one or more spaces, TABs, or newlines;
...
In POSIX awk, newlines are not considered whitespace for separating fields.
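
A quick way to check the separator behaviour described in the quote. This is a sketch on one invented line that mixes a TAB and spaces: with the default FS both count as separators, while FS="\t" splits on TABs only.

```shell
# One invented line: "a" and "b" separated by a TAB, "b" and "c" by two spaces.
line=$(printf 'a\tb  c')

# Default FS: any run of spaces or TABs separates fields, giving 3 fields.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# FS set to a single TAB: only the TAB separates, giving 2 fields ("a" and "b  c").
tab_nf=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { print NF }')

echo "default: $default_nf, tab-only: $tab_nf"
```

For a file that is strictly tab separated with no spaces inside the values, both settings split the line the same way, which is why the FS assignment can be dropped here.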

barnac1e · 02-08-2012, 02:30 PM

In awk, try

match(str, regex)
match(str, regex [, array])   (gawk only)

Cedrik · 02-08-2012, 02:31 PM

I don't know whether it can happen in this data, but columns 2 to 5 could add up to zero without every value being zero. I mean:

Code:

2012.012.01:01.003      0       85      0       -85

In this case, maybe it's better to do something like:

Code:

awk '$2$3$4$5 != "0000"' input_file.txt

# or maybe safer
awk '$2 || $3 || $4 || $5' input_file.txt
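
The difference between the three tests can be seen on a few invented rows. This is only a sketch: the row with "0.0" values is an assumption about how zeroes might be written, given that the values are described as ranging from +4000.0 to -4000.0.

```shell
# Invented rows: values that cancel out, zeroes written as 0.0, and plain zeroes.
data=$(printf '%s\t%s\t%s\t%s\t%s\n' \
    t1 0 85 0 -85 \
    t2 0.0 0.0 0.0 0.0 \
    t3 0 0 0 0)

# Sum test: drops t1 too, because 85 + -85 = 0.
by_sum=$(printf '%s\n' "$data" | awk '$2+$3+$4+$5 { print $1 }')

# String test: keeps t2, because "0.00.00.00.0" is not the string "0000".
by_string=$(printf '%s\n' "$data" | awk '$2$3$4$5 != "0000" { print $1 }')

# Per-field numeric test: keeps only t1.
by_or=$(printf '%s\n' "$data" | awk '$2 || $3 || $4 || $5 { print $1 }')

echo "sum: [$by_sum] string: [$by_string] or: [$by_or]"
```

On this input only the per-field test keeps exactly the row that contains a non-zero value.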

colucix · 02-09-2012, 03:58 AM

Thanks, Cedrik. You're absolutely right! I should have thought about this possibility.

dwarf · 02-09-2012, 12:41 PM

Thanks colucix and Cedrik for the answers.
I've tested:

Code:

awk '$2+$3+$4+$5' input_file.txt
awk '$2 || $3 || $4 || $5' input_file.txt
awk '$2$3$4$5 != "0000"' input_file.txt

on openSUSE 12.1.

All of them return what I was looking for. I will check tomorrow on a Solaris 8 machine.
The third command makes sense to me, but it is not clear to me why the other two give this result,
and why my first attempt

Code:

awk 'BEGIN {FS="\t"}{ if(($2 !=0) || ($3 !=0) || ($4 !=0) || ($5 !=0)) print}'> output_file.txt

doesn't work.
Regards,

Cedrik · 02-09-2012, 03:35 PM

Your first attempt didn't work because:

This expression is fine:

Code:

awk 'BEGIN {FS="\t"}{ if(($2 !=0) || ($3 !=0) || ($4 !=0) || ($5 !=0)) print}'

But your grep regular expression (which was not needed btw) does not match:

Code:

[0-9][0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9]

It matches:

<4 digits number>
<any char>
<3 digits number>
<any char>
<2 digits number>
<any char>
<2 digits number>
<any char>
<2 digits number>
<any char>
<3 digits number>

This one matches:

Code:

^[0-9]\{4\}\.[0-9]\{3\}\.[0-9][0-9]:[0-9][0-9]\.[0-9]\{3\}

<4 digits number> (at the start of line: ^)
<a '.' char>
<3 digits number>
<a '.' char>
<2 digits number>
<a ':' char>
<2 digits number>
<a '.' char>
<3 digits number>
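
The two patterns can be compared directly with grep -c. This sketch assumes the timestamp format shown in the example data earlier in the thread (2012.012.00:01.000); the original poster's real file may use a different format.

```shell
# Timestamp in the format of the example data posted earlier in the thread.
ts='2012.012.00:01.000'

# Original pattern: six digit groups, so a five-group timestamp cannot match.
old='[0-9][0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9]'
# Corrected pattern: five groups, anchored at line start, literal '.' and ':'.
new='^[0-9]\{4\}\.[0-9]\{3\}\.[0-9][0-9]:[0-9][0-9]\.[0-9]\{3\}'

# grep -c prints the number of matching lines; "|| true" ignores its
# non-zero exit status when nothing matches.
old_hits=$(printf '%s\n' "$ts" | grep -c "$old" || true)
new_hits=$(printf '%s\n' "$ts" | grep -c "$new" || true)
echo "old: $old_hits, new: $new_hits"
```

The original pattern needs at least 21 characters (six digit groups plus five separators), while this timestamp has only 18, so it reports 0 matching lines.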

dwarf · 02-10-2012, 02:49 AM

I've tried the different commands that work on openSUSE on a Solaris 8 machine, and none of them works there. They return an error message:
awk: syntax error near line 1
awk: bailing out near line 1
I've also tried

Code:

awk 'BEGIN {FS="\t"}{ if(($2 !=0) || ($3 !=0) || ($4 !=0) || ($5 !=0)) print}' input_file

which returns the original input_file without any changes.
Any idea what is different on Solaris?

Cedrik · 02-10-2012, 05:06 AM

I think on Solaris 8 you should use /usr/bin/nawk instead of /usr/bin/awk

or if you have perl installed

Code:

perl -ane 'print if grep {$_} @F[1..4]' input_file.txt
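
If the same script has to run on both systems, one option is to pick the awk binary at run time. This is a rough sketch that assumes a POSIX shell providing `command -v`; the AWK variable name and the two sample rows are invented for illustration.

```shell
# Prefer nawk when it exists (as suggested for Solaris), otherwise fall back to awk.
AWK=$(command -v nawk || command -v awk)

# Invented sample: one all-zero row and one row with a non-zero value.
result=$(printf 't1\t0\t0\t0\t0\nt2\t0\t0\t0\t-5\n' |
    "$AWK" '$2 != 0 || $3 != 0 || $4 != 0 || $5 != 0')
printf '%s\n' "$result"
```

Only the t2 row is printed, whichever of the two binaries is selected.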