LinuxQuestions.org
Old 08-10-2013, 05:35 AM   #1
hector00
LQ Newbie
 
Registered: Aug 2011
Posts: 8

Rep: Reputation: Disabled
Using Sed to recognise second value in TSV text file... remove all others


Hi Gurus,
I've been at sed again and am not getting very far.
I have loads of text files (dictionaries for inverted text indexes), the content of which looks like this:

Code:
475470
#term	doc freq	idx
carbendacime	1	114569
carbendacime35	1	114570
carbendazim	1	114571
carbene	5	114572
carbeni	5	114573
carbenicillin	4	114574
carbenoxolone	1	114575
carbethoxypsoralen	1	114576
Here I only care about the first and second tokens, which are the term and its doc freq, e.g. carbendacime and 1, carbendacime35 and 1, carbendazim and 1, etc.

I would like to use sed to identify all terms that have a doc freq value >= 10, and then print each matching (term, doc freq) tuple to a new text file.

Any advice on whether to even use sed as opposed to awk would be greatly appreciated.
Thank you
Lewis
 
Old 08-10-2013, 09:54 AM   #2
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian Jessie / sid
Posts: 1,471

Rep: Reputation: 444
awk would be much easier.

It looks like the field separator (FS) is a tab:
Code:
awk -F $'\t' '($2 >= 10 ) {printf "%s\t%s\n",$1,$2}' Input
Note: your sample data will only return the header, because none of the sample terms has a doc freq of 10 or more.
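For anyone wondering why the header slips through at all, here is a small sketch of what is most likely going on. awk compares a field numerically only when the field looks like a number; "doc freq" does not, so it is compared to "10" as a string, and "d" sorts after "1". The three-line sample below is made up for illustration (the carbomer row is invented so that something passes the threshold).

```shell
# Hypothetical sample: header, one real row below the threshold, one invented row above it.
sample=$(printf '#term\tdoc freq\tidx\ncarbene\t5\t114572\ncarbomer\t12\t999999\n')

# Plain comparison: the non-numeric header field is compared as a string and passes.
plain=$(printf '%s\n' "$sample" | awk -F '\t' '$2 >= 10 {print $1}')

# Adding 0 forces a numeric comparison: "doc freq" + 0 is 0, so the header is dropped.
forced=$(printf '%s\n' "$sample" | awk -F '\t' '$2 + 0 >= 10 {print $1}')

printf 'plain:\n%s\n' "$plain"
printf 'forced:\n%s\n' "$forced"
```

Writing `$2 + 0 >= 10` is one way to avoid depending on the header text when filtering it out.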


If you don't want the header:
Code:
awk -F $'\t' '(!/#term/ && $2 >= 10) {printf "%s\t%d\n",$1,$2}' Input
If you are not fussed about the output being tab-separated, then:
Code:
awk -F $'\t' '(!/#term/ && $2 >= 10) {print $1" "$2}' Input
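Since the original question mentions loads of files and writing the result to a new text file, here is a rough sketch of a loop around the same filter. The working directory, the `.tsv` extension and the `.filtered` suffix are all assumptions for illustration, as is the invented carbomer row in the stand-in file; adjust them to your own layout. It skips any line starting with `#` and forces a numeric comparison, so the count line and the header are both dropped.

```shell
# Assumed layout: dictionary files sit in one directory and end in .tsv (hypothetical).
workdir=$(mktemp -d)

# Tiny stand-in file so the sketch runs; the carbomer row is invented.
printf '3\n#term\tdoc freq\tidx\ncarbene\t5\t114572\ncarbomer\t12\t999999\n' > "$workdir/demo.tsv"

for f in "$workdir"/*.tsv; do
    # Drop comment/header lines, test doc freq numerically, keep term and doc freq only.
    awk -F '\t' '!/^#/ && $2 + 0 >= 10 {printf "%s\t%s\n", $1, $2}' "$f" > "${f%.tsv}.filtered"
done

cat "$workdir/demo.filtered"
```

Each input file gets its own output file next to it, so nothing is overwritten and the originals stay untouched.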
 
Old 08-11-2013, 05:10 PM   #3
hector00
LQ Newbie
 
Registered: Aug 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
This is absolutely dynamite and exactly what I was after.
I am amazed at how powerful awk is, but the syntax throws me off every time.
Thank you so much for the verbose answer... really appreciated.
Best
Lewis
 
Old 08-11-2013, 08:57 PM   #4
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,357

Rep: Reputation: 2367
There's a good awk HOWTO here: http://www.grymoire.com/Unix/Awk.html
 
Old 08-11-2013, 09:20 PM   #5
hector00
LQ Newbie
 
Registered: Aug 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Thanks chrism01.
 
  



Tags
awk, regex, sed




All times are GMT -5. The time now is 08:56 PM.
