LinuxQuestions.org
Old 04-12-2013, 05:03 PM   #1
sonia102d
LQ Newbie
 
Registered: Sep 2012
Posts: 18

Rep: Reputation: Disabled
Delete particular lines in a file


Hi

This is a sample of my data file.

##field PH01000000 1 4869017
#PH01000000G0240
WWW278545G0240 P.he_model_v1.0 erine 119238 121805 . - . ID=PH01000000G0240;Description="zinc finger, C3HC4 type domain containing protein, expressed"
WWW278545G0240 P.he_model_v1.0 RA 119238 121805 . - . ID=PH01000000G0240.RA;Parent=PH01000000G0240
WWW278545G0240 P.he_model_v1.0 NGS 120721 121773 . - . ID=PH01000000G0240.NGS;Parent=PH01000000G0240

#PH01000000G0250
WWW278545G0250 P.he_model_v1.0 erine 125260 126544 . - . ID=PH01000000G0250;Description="FERONIA receptor-like kinase, putative, expressed"
WWW278545G0250 P.he_model_v1.0 RA 125260 126544 . - . ID=PH01000000G0250.RA;Parent=PH01000000G0250
WWW278545G0250 P.he_model_v1.0 NGS 125971 126544 . - . ID=PH01000000G0250.NGS;Parent=PH01000000G0250

#PH01000000G0290
WWW278545G0290 P.he_model_v1.0 erine 151334 153926 . + . ID=PH01000000G0290;Description="DUF581 domain containing protein, expressed"
WWW278545G0290 P.he_model_v1.0 RA 151334 153926 . + . ID=PH01000000G0290.RA;Parent=PH01000000G0290

I want to use a basic vi command, or any other command, to delete all lines whose third column entry is RA or NGS.

I want to retain only those lines whose 3rd column is erine.
I used awk '$3=="erine" {print}' file
but this command deleted all lines other than the ones having erine.

However, I want the entire file, including headers and empty lines, with only the RA and NGS lines removed. What should I try?
 
Old 04-12-2013, 06:10 PM   #2
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,604

Rep: Reputation: 1241
You do realize you are contradicting yourself with:
Quote:
I want to retain only those lines whose 3rd column is erine.
I used awk '$3=="erine" {print}' file
but this command deleted all lines other than the ones having erine.
"Retain only those lines" is exactly what you asked for.

Now, if you wanted to "delete the lines that have RA or NGS in the third column", then that is something else.

You might check (note: not tested):
Code:
awk '$3=="NGS"||$3=="RA" { next } {print}' file
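One way to try that command out is on a cut-down copy of the data. This is only a sketch: the file name sample.gff and its shortened contents are made up for illustration, based on the sample in post #1.

```shell
# Hypothetical sample file, shortened from the data in post #1.
cat > sample.gff <<'EOF'
##field PH01000000 1 4869017
#PH01000000G0240
WWW278545G0240 P.he_model_v1.0 erine 119238 121805 . - . ID=PH01000000G0240
WWW278545G0240 P.he_model_v1.0 RA 119238 121805 . - . ID=PH01000000G0240.RA
WWW278545G0240 P.he_model_v1.0 NGS 120721 121773 . - . ID=PH01000000G0240.NGS

#PH01000000G0250
WWW278545G0250 P.he_model_v1.0 erine 125260 126544 . - . ID=PH01000000G0250
EOF

# Skip lines whose third field is exactly NGS or RA; print everything else.
# Header lines and empty lines have a different (or empty) third field,
# so they pass through untouched.
result=$(awk '$3=="NGS"||$3=="RA" { next } {print}' sample.gff)
printf '%s\n' "$result"
```

The headers, the empty line, and the two erine lines should remain; redirect the output to a new file once it looks right.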
 
Old 04-14-2013, 09:50 AM   #3
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1957
You can make it even easier if you switch to a regex test, and go for the negation:

Code:
awk '$3 !~ /^(NGS|RA)$/' file
Since the default action is to print, you can leave that off too.
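A small sketch of why the ^ and $ anchors matter in that regex. The input lines are invented for illustration; "RAW" and "NGS2" do not occur in the original data.

```shell
# Hypothetical test lines: only an exact RA or NGS in column 3 should go.
result=$(printf '%s\n' '#header' 'a b erine' 'a b RA' '' 'a b NGS' 'a b RAW' 'a b NGS2' |
  awk '$3 !~ /^(NGS|RA)$/')
printf '%s\n' "$result"
```

Without the anchors, the RAW and NGS2 lines would be dropped as well, since the pattern would match anywhere inside the field.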


And please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.

Last edited by David the H.; 04-14-2013 at 09:51 AM. Reason: more robust regex
 
Old 04-14-2013, 09:53 AM   #4
BruceFerjulian
LQ Newbie
 
Registered: Apr 2013
Posts: 5

Rep: Reputation: Disabled
How about SED

You can use sed to cut out a range of lines.

# cat one.txt

1
2
3
4
5
6
7
8
9
10

# cat one.txt | sed '2,3d' > two.txt
# cat two.txt

1
4
5
6
7
8
9
10
 
Old 04-14-2013, 12:17 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1957
The OP is asking how to remove lines based on patterns, not line numbers. And although sed can match by patterns too, in this particular case it gets ugly since it needs to specifically match column 3.

Code:
sed -rn '/^[^ ]+[ ][^ ]+[ ](NGS|RA)\b/!p' infile
awk is definitely the better tool to use here.
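For comparison, the same sed filter can be written with the d command instead of -n and !p. This is a sketch that assumes GNU sed (for -r and \b) and single-space separators as in the sample data; the test lines are invented.

```shell
# Hypothetical test lines; "RAW" is invented to show the effect of \b.
result=$(printf '%s\n' '#header' 'a b erine 1' 'a b RA 1' '' 'a b NGS 2' 'a b RAW 3' |
  sed -r '/^[^ ]+[ ][^ ]+[ ](NGS|RA)\b/d')
printf '%s\n' "$result"
```

Either form has to spell out the first two columns by hand, which is the part awk handles for you with $3.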


PS: Useless Use Of Cat!
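To spell that out: sed can read the file itself, so the line-range example from post #4 needs no cat or pipe. A minimal sketch, recreating one.txt with the numbers 1 to 10:

```shell
# Recreate the one.txt from post #4.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > one.txt

# Pass the file name to sed directly instead of piping it in from cat.
sed '2,3d' one.txt > two.txt

result=$(sed '2,3d' one.txt)
printf '%s\n' "$result"
```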
 
  

