LinuxQuestions.org
Old 10-27-2011, 07:32 AM   #1
grandeabobora
LQ Newbie
 
Registered: Oct 2011
Posts: 2

Rep: Reputation: Disabled
Help to edit a large text file


Hello all

I have a text file with 2 columns, sorted by its 2nd column. Some of the entries in the 2nd column are duplicated (I don't care about duplicates in the 1st column). I want to remove the duplicated entries according to the 2nd column, but whenever an entry is duplicated, the corresponding values in the 1st column must be added together. For example, if I have the following file

Column 1 Column 2
1 a
2 b
4 b
8 b
16 c
10 c

I want a script that returns the following table:

Column 1 Column 2
1 a
14 b
26 c


Is there a way to do this using awk, sed, or any other terminal command? I don't want to open the file in Excel or similar software because:

1) I have 1,207,522 rows
2) I need to apply this procedure to at least 36 different files, and there may be more to come
 
Old 10-27-2011, 07:42 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
A typical example to unleash the power of awk:
Code:
awk '{_[$2] += $1}END{for (i in _) print _[i],i}' file
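
For readers new to awk, the one-liner above can be spelled out with a descriptive array name and comments. This is only an illustrative sketch: the function name `sum_by_key`, the array name `sum`, and the temporary sample file are assumptions made for the example, and the sample data is the table from post #1 without its header row.

```shell
# Hypothetical helper wrapping the same logic as the one-liner above.
sum_by_key() {
    awk '
        # sum[] is an associative array keyed by the 2nd column;
        # each input line adds its 1st column to the total for its key.
        { sum[$2] += $1 }
        # END runs once after the last line and prints "total key" pairs.
        END { for (key in sum) print sum[key], key }
    ' "$1"
}

# Try it on the example data from post #1 (header row omitted).
sample=$(mktemp)
printf '%s\n' '1 a' '2 b' '4 b' '8 b' '16 c' '10 c' > "$sample"
sum_by_key "$sample"
```

If the real files do contain a header row, putting `NR > 1` in front of the first block would skip it. The order of the printed lines is not guaranteed.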
 
Old 10-27-2011, 07:46 AM   #3
grandeabobora
LQ Newbie
 
Registered: Oct 2011
Posts: 2

Original Poster
Rep: Reputation: Disabled
Awesome! It works perfectly!

Thanks
 
Old 10-27-2011, 07:51 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,119

Rep: Reputation: 4120
lol ... of course it does !!!.
@colucix strikes again ....
 
Old 10-27-2011, 07:53 AM   #5
Person_1873
Member
 
Registered: Sep 2007
Location: Australia
Distribution: Gentoo / Debian / Rasbian / Proxmox
Posts: 519

Rep: Reputation: 44
I would tend to think Excel is the best solution here. However, you could use grep in a loop of some sort, coupled with sed and basic bash arithmetic, to achieve your goal.

If the second column has a limited range of values, you could use a simple for loop. Post back with your range and I'll see if I can help further.
 
Old 10-27-2011, 08:34 AM   #6
linuxwin2
Member
 
Registered: Oct 2011
Posts: 44

Rep: Reputation: Disabled
Quote:
Originally Posted by colucix
A typical example to unleash the power of awk:
Code:
awk '{_[$2] += $1}END{for (i in _) print _[i],i}' file
Good
 
Old 10-27-2011, 08:46 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
Be aware that the output may not come out in the same order as the input, because awk's 'for (i in array)' loop does not guarantee any particular traversal order.
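
One possible way around the ordering issue, given that post #1 says the file is already sorted by its 2nd column, is to print each total as soon as the key changes instead of collecting everything in an array. The sketch below is an assumption-laden illustration, not something posted in this thread: the function name `sum_sorted` and the variables `key` and `total` are made up for the example, and it gives wrong results if equal keys are not on adjacent lines.

```shell
# Hypothetical helper: assumes equal keys in column 2 are on adjacent lines.
# Output keeps the input order, and only one running total is held in memory.
sum_sorted() {
    awk '
        # Key changed: print the finished group and reset the total.
        NR > 1 && $2 != key { print total, key; total = 0 }
        # Remember the current key and accumulate its 1st column.
        { key = $2; total += $1 }
        # Print the last group (skipped if the file was empty).
        END { if (NR > 0) print total, key }
    ' "$1"
}

# Possible usage over many files (the file names are placeholders):
#   for f in *.txt; do sum_sorted "$f" > "${f%.txt}.summed"; done
```

Another option is to keep the array-based one-liner and pipe its output through `sort -k2` to restore the order by the 2nd column.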
 
  


All times are GMT -5. The time now is 06:12 AM.
