LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
03-29-2015, 07:11 PM #16
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,505
Rep: Reputation: 2890

Quote:
Is there any command that can remove duplicate lines if the ip and the username is the same on each line?
I am not 100% sure which item is the username, but assuming it is the item after /user, none of your output seems to match this.

In the input below, the lines that are the same are the first and third (both "GET /" requests from 141.101.105.102):
Code:
141.101.105.102 - - [28/Mar/2015:01:59:56 +0200] "GET / HTTP/1.1" 200 24194 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.158 - - [28/Mar/2015:02:09:56 +0200] "GET / HTTP/1.1" 200 24260 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.102 - - [28/Mar/2015:02:19:56 +0200] "GET / HTTP/1.1" 200 24277 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
108.162.215.53 - - [27/Mar/2015:23:13:21 +0200] "GET /user/74595-tery1/?tab=idm HTTP/1.1" 200 3905 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.53 - - [27/Mar/2015:23:11:59 +0200] "GET /user/275904-ktlk21/ HTTP/1.1" 200 3805 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.75 - - [27/Mar/2015:23:21:31 +0200] "GET /user/74595-tery1/?tab=topics HTTP/1.1" 200 13588 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
So here I see only one duplicate to be removed. Please advise: what have I missed?

Currently I can remove this duplicate with a simple awk one-liner:
Code:
awk '/spider|bot/{split($7,a,"/");if(!_[$1a[3]]++)print}' access.log
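For anyone following along, here is the same awk logic spread over several lines with comments. It is only a sketch: the function name dedup_by_ip_user is made up for illustration, and the key joins the IP and the path segment with awk's SUBSEP instead of plain concatenation ($1a[3]), so that two different IP/segment pairs cannot run together into the same string.

```shell
#!/bin/sh
# Hypothetical helper: commented, multi-line form of
#   awk '/spider|bot/{split($7,a,"/");if(!_[$1a[3]]++)print}' access.log
# Reads a combined-format access log on stdin (or from the files given as
# arguments) and prints the first bot/spider line for each key.
dedup_by_ip_user() {
  awk '
    /spider|bot/ {
      # $7 is the request path, e.g. /user/74595-tery1/?tab=idm
      # split on "/" -> a[1]="", a[2]="user", a[3]="74595-tery1"
      # for a bare "/" request a[3] is empty, so the key is just the IP
      split($7, a, "/")
      key = $1 SUBSEP a[3]
      # seen[key] is 0 the first time, so !seen[key]++ is true exactly once
      if (!seen[key]++) print
    }
  ' "$@"
}
```

Usage: dedup_by_ip_user access.log. On the six sample lines above it keeps five of them, dropping the second request from 141.101.105.102.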
 
03-29-2015, 08:07 PM #17
ASTRAPI
Member
Registered: Feb 2007
Posts: 210
Original Poster
Rep: Reputation: 16
Quote:
I am not 100% sure which item is the username, but assuming it is the item after /user, none of your output seems to match this.
My mistake, sorry: by username I meant user-agent (e.g. Pingdom.com_bot_version_1.4 and Sogou web spider/4.0).

@unSpawn

With your command I got some lines like:

Code:
108.162.220.125 "Superfeedr bot/2.0 http://superfeedr.com - Make your feeds realtime: get in touch - feed-id:133270590"
108.162.220.125 "Superfeedr bot/2.0 http://superfeedr.com - Make your feeds realtime: get in touch - feed-id:237947010"
What I want is to keep only one line for each combination of IP and user-agent, and the two lines above have the same IP (108.162.220.125) and the same user-agent (Superfeedr bot/2.0).
I think I got this result because the feed-id is different, but I don't care about that part.

@grail

Your command just lists the IPs in order.

What I want is to keep only one line for each IP and user-agent pair.

Last edited by ASTRAPI; 03-29-2015 at 08:08 PM.
 
03-29-2015, 08:17 PM #18
grail
So if the user-agent is the last field, it can be simplified to:
Code:
awk '/spider|bot/ && !_[$1$NF]++' access.log
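A note on what $NF holds here: with awk's default whitespace splitting it is the last word of the line, not the whole user-agent. For the Pingdom lines that word happens to be the entire user-agent (plus its closing quote), for Sogou it is spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)", and for the two Superfeedr lines quoted earlier it would be feed-id:133270590" versus feed-id:237947010", so those two would both still be printed. The sketch below puts grail's key next to a variant that splits on double quotes, so the full user-agent is one field ($6 in the combined log format, assuming no quotes inside the request or referrer), and strips a trailing " - feed-id:NNN" before comparing. That feed-id pattern is an assumption based only on the two Superfeedr lines in this thread, and both function names are hypothetical.

```shell
#!/bin/sh
# grail's one-liner, unchanged: key = client IP ($1) + last word ($NF).
dedup_ip_lastword() {
  awk '/spider|bot/ && !_[$1$NF]++' "$@"
}

# Hypothetical variant: with FS set to a double quote, field 6 of a
# combined-format line is the whole user-agent string.
dedup_ip_useragent() {
  awk -F'"' '
    /spider|bot/ {
      split($1, f, " ")                  # f[1] is the client IP
      ua = $6
      sub(/ - feed-id:[0-9]+$/, "", ua)  # assumed Superfeedr suffix
      if (!seen[f[1] SUBSEP ua]++) print
    }
  ' "$@"
}
```

On the six sample lines earlier in the thread both functions keep four lines; they only differ for user-agents like Superfeedr's, where the last word changes from request to request.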
 
03-29-2015, 08:37 PM #19
ASTRAPI
Yes, grail, your command is working great, thanks!

I also added >> bots.txt at the end to save the results to a file.
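One detail about that redirection: >> appends to bots.txt (creating it if it does not exist), so running the command a second time adds every line again, while > truncates the file first. A minimal sketch, with save_bots and its two arguments as hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: write the deduplicated bot lines to a file.
# ">" replaces the file contents on each run; ">>" would append instead.
save_bots() {
  # shell $1 = input log, shell $2 = output file
  # (inside the single-quoted awk program, $1 and $NF are awk fields)
  awk '/spider|bot/ && !_[$1$NF]++' "$1" > "$2"
}
```

Usage: save_bots access.log bots.txt. Running it twice leaves the same contents in bots.txt; >> is only worth using when you deliberately want to collect results from several logs into one file.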
 
03-30-2015, 01:11 AM #20
grail
No probs ... don't forget to mark the thread as SOLVED once you have a working solution.
 
03-30-2015, 07:02 AM #21
allend
Senior Member
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,550
Rep: Reputation: 1433
For grail writing awk is simple, while for the simple writing awk is a grail.
For anybody else trying to follow grail's solution, this may help. http://unix.stackexchange.com/questi...es-awk-a0-work
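The idiom behind grail's !_[...]++ is the same one the linked question covers for !a[$0]++. A small sketch, with hypothetical function names, showing the short form next to a spelled-out equivalent:

```shell
#!/bin/sh
# Short form: keep the first occurrence of each distinct line, in input order.
#  - seen[$0] is an uninitialised array element (0) the first time a line appears
#  - !seen[$0]++ tests the old value (0 -> true), then increments it to 1
#  - a pattern with no action prints the line; later copies (!1 -> false) are skipped
first_occurrences() {
  awk '!seen[$0]++' "$@"
}

# Spelled-out equivalent of the same logic.
first_occurrences_verbose() {
  awk '{
    if (seen[$0] == 0) print    # first time this exact line is seen
    seen[$0]++                  # count it so later copies are skipped
  }' "$@"
}
```

For example, printf 'a\nb\na\nc\nb\n' | first_occurrences prints a, b and c, one per line. grail's versions use the same trick, only with a key built from selected fields ($1$NF) instead of the whole line ($0).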
 
03-30-2015, 07:36 AM #22
jpollard
Senior Member
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,696
Rep: Reputation: 1261


It works as long as the keys being tracked (in the worst case one per line of the file) plus overhead fit in memory. With a bit more work the same algorithm also works in Perl (tying the hash to a key file on disk), which then keeps working until the disk fills up.
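In the awk versions the array holds one entry per distinct key, so memory grows with the number of distinct IP/user-agent pairs, which in the worst case is one per line. If even that is too large, one alternative to the Perl tied-hash approach is to let sort do the heavy lifting, since implementations such as GNU sort spill to temporary files when the input does not fit in memory. This is a sketch under stated assumptions, not a drop-in replacement: it relies on GNU sort keeping the earliest line of each equal run with -s and -u, keys on the quote-split user-agent ($6 in the combined log format), and assumes the log lines contain no tab characters. dedup_external is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical disk-friendly dedup on (client IP, user-agent):
#  1. awk prefixes each bot line with IP, user-agent and line number (tab-separated)
#  2. sort -s -u keeps the earliest line for each IP/user-agent pair
#  3. a numeric sort on the line number restores the original order
#  4. cut removes the three helper columns
dedup_external() {
  tab="$(printf '\t')"
  awk -F'"' '
    /spider|bot/ {
      split($1, f, " ")     # f[1] is the client IP
      printf "%s\t%s\t%d\t%s\n", f[1], $6, NR, $0
    }
  ' "$@" |
    LC_ALL=C sort -s -u -t "$tab" -k1,2 |
    LC_ALL=C sort -t "$tab" -k3,3n |
    cut -f4-
}
```

Usage: dedup_external access.log > bots.txt. With GNU sort, adding -T to the sort commands chooses which directory the temporary files go to.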
 
  


All times are GMT -5.
