LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 03-17-2011, 06:41 PM   #1
fdiaz05
LQ Newbie
 
Registered: Jan 2003
Location: Los Angeles
Posts: 14

Duplicate removal/text manipulation


Hey guys, I wonder if anyone can help with this little dilemma.

I'm trying to remove lines from a syslog text file that contain duplicate strings. For example:


Mar 10 06:51:11[http-8080-1] INFO com.MYCOMPANY.webservices.userservice.web.UserServiceController [u:2533274802474744|360] Authorize [platformI$tformIdAndOs=2533274802474744|360, userRegion=America|360]

Then, a few lines down:

Mar 10 06:52:03 [http-8080-1] INFO com.MYCOMPANY.webservices.userservice.web.UserServiceController [u:2533274802474744|360] Authorize [platformI$tformIdAndOs=2533274802474744|360, userRegion=America|360

That line has the same u: number. I need to remove such duplicates and keep just one line per u: number. The file contains multiple duplicates for many different u: numbers, and it's 14,000 lines long.

Can anyone tell me whether I can use awk, sed, or sort for something like this, i.e. removing lines that contain a certain string that has already appeared elsewhere in the file?

Any help is appreciated! Thanks.
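
Before removing anything, it can help to see which u: values actually repeat. The following is a minimal sketch, assuming every relevant line carries a single token of the form [u:...] as in the two sample lines above. The function name count_user_ids and the file name syslog.txt are made up for illustration.

```shell
#!/bin/sh
# count_user_ids: hypothetical helper, not from the thread.
# Prints each distinct "u:..." value with the number of lines it appears on,
# most frequent first. Lines without a [u:...] token are ignored.
count_user_ids() {
    sed -n 's/.*\[\(u:[^]]*\)].*/\1/p' "$@" |   # extract the text inside [u:...]
        sort |                                   # group identical values together
        uniq -c |                                # prefix each value with its count
        sort -rn                                 # highest count first
}

# Example (syslog.txt is a placeholder file name):
#   count_user_ids syslog.txt
```

Any value reported with a count above 1 appears on more than one line.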
 
Old 03-17-2011, 07:00 PM   #2
k3lt01
Senior Member
 
Registered: Feb 2011
Location: Australia
Distribution: Debian Wheezy, Jessie, Sid/Experimental, playing with LFS.
Posts: 2,900

I do a very similar thing when I build a hosts file from several files put together, which always contain multiple entries for the same web addresses.

It probably won't serve your purpose, but it may give you a few tips.
Code:
# Normalize whitespace first so lines that differ only in tabs/spaces end up adjacent for uniq
tr '\t' ' ' < /home/michael/hosts | tr -s ' ' | sort | uniq >| /home/michael/hosts.new
 
Old 03-17-2011, 07:03 PM   #3
fdiaz05
Original Poster
Can you explain? I see it, but where is the indicator you are using in your case to do the text manipulation?
 
Old 03-17-2011, 07:22 PM   #4
k3lt01
sort - sorts the lines into alphabetical order, so lines starting with a are placed before lines starting with b, etc.

tr '\t' ' ' | tr -s ' ' - this part cleans up whitespace: the first tr replaces every tab with a space, and tr -s ' ' squeezes each run of repeated spaces into a single space.

uniq - deletes duplicate lines that sit next to each other (which is why the input is sorted), so if I have more than one line saying something like abcdefg.com in the combined hosts file, the output at the end will have only one abcdefg.com line in it.

Everything is in the man pages.
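
To see what each stage contributes, here is a small sketch run on made-up hosts entries (the addresses and the function name normalize_and_dedupe are illustrative only). It runs the tr steps before sort, so that lines differing only in whitespace end up next to each other by the time uniq sees them.

```shell
#!/bin/sh
# normalize_and_dedupe: hypothetical helper that reads lines on stdin.
normalize_and_dedupe() {
    tr '\t' ' ' |   # turn every tab into a space
        tr -s ' ' | # squeeze runs of spaces into a single space
        sort |      # bring identical lines together
        uniq        # drop repeats that are now adjacent
}

# Three spellings of the same entry plus one different entry (sample data):
printf '127.0.0.1\tabcdefg.com\n127.0.0.1   abcdefg.com\n127.0.0.1 example.org\n127.0.0.1 abcdefg.com\n' |
    normalize_and_dedupe

# uniq on its own only collapses neighbouring repeats, hence the sort:
printf 'a\nb\na\n' | uniq    # still prints three lines
```

Note that this compares whole lines, so it does not help when two lines share one field but differ elsewhere (a timestamp, for example).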
 
Old 03-17-2011, 07:41 PM   #5
fdiaz05
Original Poster
I did, but I'm looking for something keyed on a particular indicator: if the value after the u: is not unique, I need to remove that entire line. It's a bit tricky.
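
One way to key the duplicate check on the u: value is awk with an associative array. The following is a minimal sketch, assuming each line has at most one [u:...] token and that the first occurrence is the one to keep. The function name dedupe_by_user, the file names, and the shortened log lines are made up for illustration.

```shell
#!/bin/sh
# dedupe_by_user: hypothetical helper, not from the thread.
# Prints the first line seen for each [u:...] value and skips later ones.
# Lines without a [u:...] token are printed unchanged.
dedupe_by_user() {
    awk '
        match($0, /\[u:[^]]*]/) {
            key = substr($0, RSTART, RLENGTH)   # e.g. "[u:111|360]"
            if (seen[key]++) next               # key already printed: drop line
        }
        { print }
    ' "$@"
}

# Shortened sample lines in the same shape as the log in the first post:
sample=$(mktemp)
cat > "$sample" <<'EOF'
Mar 10 06:51:11 [http-8080-1] INFO Controller [u:111|360] Authorize [userRegion=America|360]
Mar 10 06:51:40 [http-8080-2] INFO Controller [u:222|360] Authorize [userRegion=America|360]
Mar 10 06:52:03 [http-8080-1] INFO Controller [u:111|360] Authorize [userRegion=America|360]
EOF

dedupe_by_user "$sample"    # the 06:52:03 line repeats u:111 and is dropped
rm -f "$sample"
```

On a real file this would be run as something like dedupe_by_user syslog.txt > syslog.dedup. Unlike a sort-based approach, the surviving lines stay in their original order.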
 
  


All times are GMT -5. The time now is 06:27 AM.
