LinuxQuestions.org
10-20-2010, 02:35 AM #1
hattori.hanzo
Member
Removing duplicates with offset (datetime)


I have a column of datetime entries that I sorted to remove duplicate entries. I still have lots of entries that are 1 second apart from an adjacent entry. How would I go about removing any entry that is offset from the previous or next entry by 1 second?

Code:
2010-10-19 00:00:59;
2010-10-19 00:04:38; <--
2010-10-19 00:04:39; <--
2010-10-19 00:05:27; 
2010-10-19 00:10:14; <--
2010-10-19 00:10:15; <--
2010-10-19 00:10:31;
2010-10-19 00:12:14;
2010-10-19 00:14:04; <--
2010-10-19 00:14:05; <--
2010-10-19 00:14:06; <--
2010-10-19 00:15:19;
 
10-20-2010, 06:11 AM #2
Jerry Mcguire
Member

Convert each date-time into seconds since 1970 (Unix time). You then have a list of numbers, e.g.
List A:
Code:
10 19 20 30 31 32 40
Add 1 to each number in the list:
List B:
Code:
11 20 21 31 32 33 41
For each 'b' in List B:
  if 'b' is found in List A,
  then remove 'b' and 'b'-1 from List A.

The numbers remaining in List A are the timestamps you need.

Last edited by Jerry Mcguire; 10-20-2010 at 06:35 AM. Reason: made a mistake
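A minimal sketch of the List A / List B idea as a shell function wrapping awk, assuming one `YYYY-MM-DD HH:MM:SS;` entry per line as in the first post and an awk that provides mktime() (GNU awk does). `drop_runs` is a hypothetical helper name. Rather than building List B explicitly, it checks each timestamp t for t+1 or t-1 in a set built from List A, which removes the same entries. Note that this approach drops every entry in a run of 1-second neighbours instead of keeping one of them.

```shell
# drop_runs: hypothetical helper implementing the List A / List B idea.
# Reads "YYYY-MM-DD HH:MM:SS;" lines from the given files (or stdin) and
# drops every entry that has another entry exactly 1 second before or after it.
# Assumes an awk that provides mktime(), such as GNU awk.
drop_runs() {
    awk '
    NF {
        s = $0
        gsub(/[-:;]/, " ", s)   # "2010-10-19 00:04:38;" -> "2010 10 19 00 04 38 "
        n++
        line[n] = $0            # original text, in input order
        t[n] = mktime(s)        # List A: seconds since 1970 (local time)
        inA[t[n]] = 1           # set for the "is b found in List A?" lookup
    }
    END {
        for (i = 1; i <= n; i++) {
            # If t+1 is in List A, t is the b-1 of b = t+1 from List B.
            # If t-1 is in List A, t is itself a b from List B found in List A.
            if (((t[i] + 1) in inA) || ((t[i] - 1) in inA)) continue
            print line[i]
        }
    }' "$@"
}
```

For example, `drop_runs file` on the sample data from the first post (without the `<--` markers) keeps 00:00:59, 00:05:27, 00:10:31, 00:12:14 and 00:15:19.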
 
10-20-2010, 06:39 AM #3
colucix
Moderator
A solution in GNU awk:
Code:
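# Save as e.g. dedup.awk and run: gawk -f dedup.awk file
{
  # "2010-10-19 00:04:38;" -> "2010 10 19 00 04 38 ", the format mktime() expects
  datespec = gensub(/[-:;]/, " ", "g")
  t = mktime(datespec)
  # Print only entries more than 1 second after the previous one;
  # pre starts at 0, so the first entry is always printed
  if (t - pre > 1) print
  pre = t
}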
 
2 members found this post helpful.
10-20-2010, 06:49 AM #4
grail
Guru
I wasn't sure from the description whether you want to keep the first entry of each run or the last. This keeps the last:
Code:
awk -F"[:;]" 'NR > 1 && (sec > $(NF-1)+1 || sec < $(NF-1)-1){print line;last=sec}{line=$0;sec=$(NF-1)}END{if(sec > last + 1 || sec < last -1)print}' file
 
1 member found this post helpful.
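The one-liner above splits each line on `:` and `;` and compares only `$(NF-1)`, the seconds field, so entries such as 00:04:59 and 00:05:00 are not recognised as 1 second apart, while 00:04:38 followed by 00:05:39 would be. A minimal sketch that also keeps the last entry of each run but compares full timestamps, assuming sorted input with one `YYYY-MM-DD HH:MM:SS;` entry per line and an awk that provides mktime() (GNU awk does); `keep_last` is a hypothetical helper name:

```shell
# keep_last: hypothetical helper. Keeps the last entry of each run of entries
# spaced 1 second apart, comparing full timestamps via mktime() instead of
# the seconds field alone. Input is assumed sorted, one entry per line.
# Assumes an awk that provides mktime(), such as GNU awk.
keep_last() {
    awk '
    NF {
        s = $0
        gsub(/[-:;]/, " ", s)   # "2010-10-19 00:04:38;" -> "2010 10 19 00 04 38 "
        t = mktime(s)
        # The previous entry ends its run unless this entry is 1 second after it
        if (have && t - prev > 1) print prevline
        have = 1; prev = t; prevline = $0
    }
    # The final entry always ends a run
    END { if (have) print prevline }
    ' "$@"
}
```

On the sample data from the first post, `keep_last file` prints 00:00:59, 00:04:39, 00:05:27, 00:10:15, 00:10:31, 00:12:14, 00:14:06 and 00:15:19.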
10-20-2010, 06:58 AM #5
grail
Guru
Well, that looks a lot better... I always forget about the time functions.
 
10-20-2010, 08:28 PM #6
hattori.hanzo
Member
Original Poster
Thank you colucix and grail. Much appreciated.
 
  


Similar Threads
Thread Thread Starter Forum Replies Last Post
Merging files and removing near-duplicates TheBigH Linux - Newbie 3 12-02-2009 05:24 PM
[SOLVED] datetime values offset when restoring from db dump? sneakyimp Linux - Software 2 10-05-2009 02:59 PM
LXer: Sorting Perl Lists And Removing Duplicates On Linux Or Unix LXer Syndicated Linux News 0 09-04-2008 06:20 AM
Perl DateTime abdul_zu Linux - General 1 01-14-2006 03:55 AM
php :: datetime gmarais Programming 3 03-06-2004 05:33 PM


All times are GMT -5.