LinuxQuestions.org


Mike_Brown 03-28-2016 07:24 PM

How to remove the columns which contain NA in Linux
 
I would like to remove every column that contains NA in any row. I used this command:

Code:

awk ' $0 !="NA" {print $0}' file
But it does not work.
For example, the file is as follows:

1 2 3 NA 6 male
4 6 2 1 NA female
NA 2 2 NA 3 male
7 2 2 7 NA male

I want the output file to be:

2 3 male
6 2 female
2 2 male
2 2 male

jpollard 03-28-2016 07:56 PM

Well, if you read the documentation, $0 is the input record. $1 is the first field, $2 is the second...

So instead of print $0, try print $2 $3 $7
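
To illustrate the difference between $0 and the individual fields, here is a minimal sketch. The function name print_fields_2_3_6 is made up for this example. Two details are worth noting: the sample file has only six fields, so the last column is $6, and print needs commas between the fields, otherwise awk concatenates them with no separator.

```shell
# Hypothetical helper: print fields 2, 3 and 6 of every line in the file "$1".
# The commas make awk join the fields with the output field separator (a space).
print_fields_2_3_6() {
    awk '{ print $2, $3, $6 }' "$1"
}
```

This only helps when the column numbers are known in advance. It also shows why the original test prints every line: $0 is the whole record, and no whole record is equal to the string "NA".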

Mike_Brown 03-28-2016 08:02 PM

Quote:

Originally Posted by jpollard (Post 5522759)
Well, if you read the documentation, $0 is the input record. $1 is the first field, $2 is the second...

So instead of print $0, try print $2 $3 $7

I do not know in advance which columns contain NA, so I cannot hard-code $2 $3 $7.

syg00 03-28-2016 09:26 PM

This is not a trivial exercise - you would do well to take note and read the doco.
For example, you cannot know all the columns that contain the string until you have read the entire file. Having saved a list of those columns you will have to re-read the entire file to ascertain which fields you still want. Or you could keep each record in an array for later processing.

In the doco you will find an inbuilt variable that tells you the number of fields in the current record - which you can loop through to test each for the string.
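
A minimal sketch of the two-pass approach described above, assuming whitespace-separated fields and a regular file that can be read twice (not a pipe). The function name drop_na_columns and the array name bad are made up for this example. NF is the built-in variable holding the number of fields in the current record.

```shell
# Hypothetical helper: drop every column that contains NA in any record.
# The file "$1" is passed to awk twice so it can be read in two passes.
drop_na_columns() {
    awk '
        # Pass 1 (NR == FNR is true only while reading the first copy):
        # remember every column number that holds NA in any record.
        NR == FNR {
            for (i = 1; i <= NF; i++)
                if ($i == "NA") bad[i] = 1
            next
        }
        # Pass 2: rebuild each record from the columns never flagged.
        {
            line = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if (!(i in bad)) { line = line sep $i; sep = " " }
            print line
        }
    ' "$1" "$1"
}
```

Called as drop_na_columns file on the sample data, columns 1, 4 and 5 are flagged in the first pass, so the second pass prints columns 2, 3 and 6.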

BW-userx 03-29-2016 08:43 AM

awk
http://www.cyberciti.biz/faq/howto-d...-bsd-appleosx/

http://how-to.linuxcareer.com/learni...x-commands-awk

grep
http://www.cyberciti.biz/faq/howto-u...in-linux-unix/

sed
http://www.cyberciti.biz/faq/howto-d...-bsd-appleosx/

That should be enough to get you started down the right path. Look closely at how each tool works.

HMW 03-29-2016 11:31 AM

I have to confess that I don't understand the description of the problem. Nevertheless, given your INPUT
Code:

1 2 3 NA 6 male
4 6 2 1 NA female
NA 2 2 NA 3 male
7 2 2 7 NA male

And your expected OUTPUT
Code:

2 3 male
6 2 female
2 2 male
2 2 male

I solved this using sed:
Code:

echo "1 2 3 NA 6 male
4 6 2 1 NA female
NA 2 2 NA 3 male
7 2 2 7 NA male" | sed '<hidden>'
2 3 male
6 2 female
2 2 male
2 2 male

Will be more than happy to share my solution once OP gets back with a renewed effort and/or clarification of the problem at hand.

Best regards,
HMW

BW-userx 03-29-2016 11:56 AM

Quote:

Originally Posted by HMW (Post 5523035)
Best regards,
HMW

off topic while you wait. But does HMW mean "Her Majesty's Wardrobe"? I was just trying to figure out that acronym. :D NHI

HMW 03-29-2016 12:58 PM

Quote:

Originally Posted by BW-userx (Post 5523052)
But does HMW mean "Her Majesty's Wardrobe"? I was just trying to figure out that acronym. :D NHI

HAHA!

No, nothing fancy at all. It's just my initials, inspired by RMS.

hydrurga 03-29-2016 01:22 PM

Quote:

Originally Posted by HMW (Post 5523035)
I have to confess that I don't understand the description of the problem.

I think what OP wants is to (i) output only those columns that do not contain NA in any record of the file, and (ii) always output the 6th column.

In the example they gave, only columns 2 and 3 did not contain NA in any of the records.

As syg00 pointed out, all the records therefore need to be read before any output can be generated (unless you determine earlier on that no columns can be printed ;)).
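
A sketch of the other option syg00 mentioned, keeping every record in an array and printing only at the end, so the input can come from a pipe. The names drop_na_columns_stdin, rec, bad and f are made up for this example, and it assumes the whole input fits in memory. It does not force the 6th column: in the sample that column never contains NA, so it survives without special handling.

```shell
# Hypothetical helper: read records from standard input, drop every column
# that contains NA in any record, and print the result once all input is read.
drop_na_columns_stdin() {
    awk '
        # Store each record and flag the columns where NA appears.
        {
            rec[NR] = $0
            for (i = 1; i <= NF; i++)
                if ($i == "NA") bad[i] = 1
        }
        # All records have been seen: print the unflagged columns of each.
        END {
            for (r = 1; r <= NR; r++) {
                n = split(rec[r], f, " ")
                line = ""; sep = ""
                for (i = 1; i <= n; i++)
                    if (!(i in bad)) { line = line sep f[i]; sep = " " }
                print line
            }
        }
    '
}
```

Usage would be along the lines of: cat file | drop_na_columns_stdin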

