Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
I need advice please. I have a file containing many numbers and I want to remove the duplicate numbers from it, but I don't know in advance what the numbers are; I just want any duplicates removed.
I'm assuming you mean duplicate numbers. This works when the file is just one column of numbers; otherwise you need the -k option to tell sort which column(s) to use.
Since you want to eliminate lines with duplicate IP addresses, I take it that the port numbers aren't important.
In that case you can strip them off and then use the uniq command.
sort -k 4 logfile | sed 's/:[[:digit:]]*$//' | uniq -f3
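To make that pipeline concrete, here is a sketch on made-up data. The actual log layout was never posted, so the field positions and filename here are assumptions: I pretend field 4 is the last field and holds IP:PORT.

```shell
# Hypothetical input; the real log layout is assumed (field 4 = IP:PORT).
cat > logfile <<'EOF'
01/Jan 10:00 GET 231.158.187.245:443
01/Jan 10:01 GET 10.0.0.7:8080
01/Jan 10:02 GET 231.158.187.245:80
EOF

# Sort on field 4, strip the trailing :port with sed, then have uniq
# skip the first 3 fields (-f3) so it compares only the IP.
sort -k 4 logfile | sed 's/:[[:digit:]]*$//' | uniq -f3
```

With this sample, the two lines for 231.158.187.245 collapse into one once the differing ports are stripped.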
No, that's not what I need. I want to remove the duplicate IPs in the file, and I also need the output to look like "231.158.187.245": just the duplicate IPs that were deleted, the IP without the port or other info.
I am not certain whether you want a list of the IP addresses in the file, or a list of the IP addresses that appear more than once. If you want a list of all unique IP addresses, then you can use "cut" to extract the field with the IP address. Since they might not appear in order, pipe that through "sort" and finally pipe that output through "uniq".
My first example used sort options which started on a certain field. For this, you don't need any options to "sort" or "uniq".
If you want the IP addresses that are duplicates, then use "uniq -d" or "uniq -D" instead.
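Here is a sketch of both variants on sample data. Again, the field number, the delimiter, and the filename are assumptions, since the real file was never posted:

```shell
# Assumed layout: space-delimited, field 4 holds IP:PORT.
cat > logfile <<'EOF'
01/Jan 10:00 GET 231.158.187.245:443
01/Jan 10:01 GET 10.0.0.7:8080
01/Jan 10:02 GET 231.158.187.245:80
EOF

# Every distinct IP, one per line:
cut -d' ' -f4 logfile | sed 's/:[[:digit:]]*$//' | sort | uniq

# Only the IPs that appear more than once:
cut -d' ' -f4 logfile | sed 's/:[[:digit:]]*$//' | sort | uniq -d
```

The second pipeline prints just "231.158.187.245" for this sample, which is the output format you asked for: the duplicated IP, nothing else.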
I am at work now, so I am unable to test anything. There needs to be some sort of data mining to get the input parsed correctly. Other than that, I think grep and uniq have options (grep -c, uniq -c) to count the number of matching lines.
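For the counting idea, "uniq -c" prefixes each line with its occurrence count; piping that through "sort -rn" puts the most frequent entry first. A sketch, with the same assumed field layout and made-up data as before:

```shell
cat > logfile <<'EOF'
01/Jan 10:00 GET 231.158.187.245:443
01/Jan 10:01 GET 10.0.0.7:8080
01/Jan 10:02 GET 231.158.187.245:80
EOF

# Count each IP: uniq -c needs sorted input, sort -rn ranks by count.
cut -d' ' -f4 logfile | sed 's/:[[:digit:]]*$//' | sort | uniq -c | sort -rn
```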
Guru Mind,
If you get the AWK manual, there are examples of counting duplicates. You can also create every conceivable output. There is an example near the start of the manual that does almost literally what you need.
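In the spirit of those manual examples, here is a rough awk one-liner that counts IPs in an array and prints only the duplicated ones. The field layout and data are assumed, not from the original poster's file:

```shell
cat > logfile <<'EOF'
01/Jan 10:00 GET 231.158.187.245:443
01/Jan 10:01 GET 10.0.0.7:8080
01/Jan 10:02 GET 231.158.187.245:80
EOF

# Strip the :port from field 4, tally each IP in an array, and print
# only the IPs seen more than once.
awk '{ sub(/:[0-9]+$/, "", $4); seen[$4]++ }
     END { for (ip in seen) if (seen[ip] > 1) print ip }' logfile
```

One nice property of the awk version is that it needs no sort: the array keeps the counts regardless of line order.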
Sorry, I didn't understand your first question at all. On the second one, look at my earlier example. I piped the output of sort to the "sed" command. The sed command removed the port number.
You can use that same example but change the parameters to the uniq command.
Since you just want to retain the IP:PORT field, you can use the "cut" command and pipe that to the same "sed" command as in the earlier example.
Sometimes you want to use a program like 'tr' in the pipe to either change the delimiter or squeeze the whitespace, so that cut behaves better.
Try it out one part at a time so you understand what each part does. Use the up arrow and add pipe what you have so far to the next utility.
I can't tell from what you posted whether your file uses a tab delimiter or runs of spaces. "cut" uses a single tab by default. If your file uses spaces between fields, you can use "tr -s ' '" to squeeze the repeated spaces between fields down to one:
try 1: eliminate extra spaces
tr -s ' ' <originalfile
try 2: cut out the fourth column:
tr -s ' ' <originalfile | cut -d ' ' -f4
try 3: get rid of the portnumber:
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//'
finally: sort and select just the dupes (uniq only detects adjacent duplicates, so sort first):
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//' | sort | uniq -d
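Here is the whole walkthrough run end to end on made-up data padded with repeated spaces, to show why the "tr -s" step matters. The layout is still an assumption, and note that I put a "sort" before "uniq -d", because uniq only spots adjacent duplicates:

```shell
# Hypothetical input with runs of spaces between fields.
cat > originalfile <<'EOF'
01/Jan   10:00  GET   231.158.187.245:443
01/Jan   10:01  GET   10.0.0.7:8080
01/Jan   10:02  GET   231.158.187.245:80
EOF

# Squeeze spaces, cut field 4, drop the :port, sort, keep the dupes.
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//' | sort | uniq -d
```

Without the tr step, cut -d ' ' would treat each extra space as an empty field and -f4 would grab the wrong column.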
Unix and Linux excel at these handy text-handling utilities that each do a small job very well and can be piped from one to the next. At work, I installed Cygwin, so I can do this sort of thing easily.
I came up with a couple one liners to catalog backups and produce a PDF catalog that I put on the server.
The first one-liner reads the directory of the DVD and produces a tab-separated listing of the files and dates. The second one-liner merges each .tsv file and uses enscript to pretty-print the catalog. Another line (actually a two-liner!) runs "ps2pdf" so that the output is a PDF file that anyone can open and read.
I would suggest that you print out the man pages for some of these common commands:
man -t cut | lpr
man -t tr | lpr
man -t uniq | lpr
man -t sort | lpr
Right now, try it by piping the output to kghostview or gv:
man -t cut | kghostview -
Having a printout with the options of the commands is handy when crafting your short script. Some trial and error is inevitable.
You may also want to print out the info manual for the coreutils package. For this, however, you need to install the coreutils source. There is a "make pdf" or "make ps" target to produce the pdf or postscript versions of the info manual.
./configure
make pdf
If you use a distro that is RPM based, you can install the coreutils src.rpm package, then in the SPECS directory use:
sudo rpmbuild -bp coreutils.spec # applies patches if any
cd ../BUILD
cd coreutils-<version>   # "sudo cd" doesn't work; cd is a shell builtin
sudo ./configure
sudo make pdf
If you want to use awk, the info manual for gawk is very good. You may also have a gawk-doc package with give you a book "Gawk: Effective Awk Programming" which is excellent!