No, that's not what I need. I want to remove the duplicate IPs in the file; I also need the output to look like "188.8.131.52": just the duplicate IPs that were removed, as bare IPs without the port or any other info.
I am not certain whether you want a list of the IP addresses from the file or a list of the IP addresses that appear more than once. If you want a list of all unique IP addresses, then you can use "cut" to extract the field with the IP address. Since they might not appear in order, you then pipe that through "sort" and finally pipe that output through "uniq".
My first example used sort options which started on a certain field. For this, you don't need any options to "sort" or "uniq".
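If you want the IP addresses that are duplicates, then use "uniq -d" or "uniq -D" instead. "-d" prints one copy of each repeated line, while "-D" (a GNU extension) prints every repeated line.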
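As a rough sketch of the two cases described above, the following uses a made-up sample file. The file layout (single-space delimited, IP address in the second field), the host names, and the variable names are all assumptions for illustration, not the original poster's data.

```shell
# Hypothetical sample data: one record per line, single-space delimited,
# with the IP address in the second field.
sample=$(mktemp)
cat > "$sample" <<'EOF'
host1 10.0.0.1 up
host2 10.0.0.2 up
host3 10.0.0.1 down
host4 10.0.0.3 up
host5 10.0.0.2 up
EOF

# Every distinct IP address, listed once.
all_ips=$(cut -d ' ' -f2 "$sample" | sort | uniq)

# Only the IP addresses that occur more than once, listed once each.
dup_ips=$(cut -d ' ' -f2 "$sample" | sort | uniq -d)

printf 'all:\n%s\n' "$all_ips"
printf 'duplicates:\n%s\n' "$dup_ips"
rm -f "$sample"
```

The "sort" step matters because "uniq" only compares neighbouring lines.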
I am at work now, so I am unable to test anything. The input will need some sort of data extraction before it can be parsed correctly. Other than that, I think grep and uniq have options (grep -c, uniq -c) to count the number of matching lines.
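For what it's worth, here is a small untested-on-your-data sketch of the two counting options mentioned above. The addresses and variable names are invented for the example.

```shell
# Hypothetical input: one bare IP address per line.
sample=$(mktemp)
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 > "$sample"

# grep -c prints the number of matching lines. -F treats the pattern as a
# fixed string (so the dots are literal) and -x requires a whole-line match.
count_one=$(grep -c -F -x '10.0.0.1' "$sample")

# uniq -c prefixes each distinct line with its count; the input must be
# sorted first. The second sort puts the most frequent address on top.
top=$(sort "$sample" | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

echo "10.0.0.1 appears $count_one times; most frequent address: $top"
rm -f "$sample"
```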
If you get the AWK manual, it has examples of counting duplicates, and with awk you can produce just about any output format you need. One example, somewhere near the start of the manual, does almost literally what you need.
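This is not the example from the manual, just a minimal sketch of the usual awk idiom for counting duplicates with an associative array. The input addresses and the names "seen" and "dups" are assumptions for illustration.

```shell
# Count how often each line occurs, then report only the lines seen more
# than once. awk's "for (key in array)" order is unspecified, so the result
# is piped through sort to make it stable.
dups=$(printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.2 10.0.0.1 |
  awk '{ seen[$1]++ }
       END { for (ip in seen) if (seen[ip] > 1) print ip, seen[ip] }' |
  sort)

printf '%s\n' "$dups"
```

Unlike "uniq", this does not need sorted input, because the counts are kept in memory until the end.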
Sorry, I didn't understand your first question at all. On the second one, look at my earlier example. I piped the output of sort to the "sed" command. The sed command removed the port number.
You can use that same example but change the parameters to the uniq command.
Since you just want to retain the IP:PORT field, you can use the "cut" command and pipe that to the same "sed" command as in the earlier example.
Sometimes you want to use a program like 'tr' in the pipe to either change the delimiter or squeeze the whitespace, so that cut behaves better.
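A small sketch of that idea, using an invented line that mixes tabs and repeated spaces (the field values and variable names are made up):

```shell
# Hypothetical line with two tabs and then three spaces between fields.
line=$(printf 'alpha\t\tbeta   gamma')

# Turn each tab into a space, squeeze runs of spaces into a single space,
# and only then let cut pick the third field.
third=$(printf '%s\n' "$line" | tr '\t' ' ' | tr -s ' ' | cut -d ' ' -f3)

echo "$third"
```

Without the two "tr" steps, "cut -d ' '" would treat every single space as a field boundary and return an empty field.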
Try it out one part at a time so you understand what each part does. Use the up arrow to recall the previous command, then pipe what you have so far to the next utility.
I can't tell from what you posted whether your file uses a tab delimiter or a number of spaces. "cut" uses a single tab by default. If your list uses spaces between fields, you can use "tr -s ' '" to squeeze the extra spaces between fields down to one:
try 1: eliminate extra spaces
tr -s ' ' <originalfile
try 2: cut out the fourth column:
tr -s ' ' <originalfile | cut -d ' ' -f4
try 3: get rid of the port number:
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//'
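finally: sort so that repeated addresses end up on adjacent lines ("uniq" only compares neighbouring lines), then select just the dupes:
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//' | sort | uniq -d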
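To see the whole pipeline run end to end, here is a sketch against an invented file. I am assuming the layout discussed above: space-padded columns with IP:PORT in the fourth one. The dates, addresses, and the name "dupes" are made up. Because "uniq" only compares adjacent lines, the sketch sorts before "uniq -d".

```shell
# Hypothetical input file: columns padded with a varying number of spaces,
# IP:PORT in the fourth column. Lines must not start with a space, or the
# field numbers seen by cut would shift by one.
originalfile=$(mktemp)
cat > "$originalfile" <<'EOF'
2015-02-10  12:00:01  tcp  192.168.1.10:443   ESTABLISHED
2015-02-10  12:00:02  tcp  192.168.1.11:80    ESTABLISHED
2015-02-10  12:00:03  tcp  192.168.1.10:8080  TIME_WAIT
2015-02-10  12:00:04  tcp  192.168.1.12:22    ESTABLISHED
2015-02-10  12:00:05  tcp  192.168.1.11:443   ESTABLISHED
EOF

dupes=$(tr -s ' ' <"$originalfile" |      # squeeze runs of spaces
  cut -d ' ' -f4 |                        # keep the IP:PORT column
  sed 's/:[[:digit:]][[:digit:]]*$//' |   # strip the trailing :PORT
  sort |                                  # bring equal addresses together
  uniq -d)                                # keep one copy of each repeat

printf '%s\n' "$dupes"
rm -f "$originalfile"
```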
Unix and Linux excel in these handy text handling utilities that each do a small job very well and can be piped from one to the next. At work, I installed Cygwin, so I can do this sort of thing easily.
I came up with a couple of one-liners to catalog backups and produce a PDF catalog that I put on the server.
The first one-liner reads the directory of the DVD and produces a tab-separated listing of the files and dates. The second one-liner merges the .tsv files and uses enscript to pretty-print the catalog. Another line (actually a two-liner!) runs "ps2pdf" so that the output is a PDF file that anyone can open and read.
I would suggest that you print out the man pages for some of these common commands:
man -t cut | lpr
man -t tr | lpr
man -t uniq | lpr
man -t sort | lpr
Right now, try it by piping the output to kghostview or gv:
man -t cut | kghostview -
Having a printout with the options of the commands is handy when crafting your short script. Some trial and error is inevitable.
You may also want to print out the info manual for the coreutils package. For this, however, you need to install the coreutils source. There is a "make pdf" or "make ps" target to produce the pdf or postscript versions of the info manual.
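If you use a distro that is RPM based, you can install the coreutils src.rpm package, then in the SPECS directory use the following (add sudo to the later commands if your build tree is owned by root):
sudo rpmbuild -bp coreutils.spec # unpacks the source and applies patches, if any
cd ../BUILD/coreutils-<version> # "cd" is a shell builtin, so "sudo cd" does not work
./configure # creates the Makefile if the tree has not been configured yet
make pdf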
If you want to use awk, the info manual for gawk is very good. You may also have a gawk-doc package, which gives you the book "Gawk: Effective Awk Programming", which is excellent!