Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
I need advice please. I have a file containing many numbers and I want to remove the duplicate numbers from it, but I don't know in advance what the numbers are; I just want any duplicates removed.
I'm assuming you mean duplicate numbers. This works when the file is just one column of numbers; otherwise you need the -k option to tell sort which column(s) to use.
Since you want to eliminate lines with duplicate IP addresses, I take it that the port numbers aren't important.
In that case you can strip them off and then use the uniq command.
sort -k 4 logfile | sed 's/:[[:digit:]]*$//' | uniq -f3
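To make that pipeline concrete, here is a sketch on made-up data. The actual log layout was never posted, so the field positions and filename here are assumptions: I pretend field 4 is the last field and holds IP:PORT.

```shell
# Hypothetical input; the real log layout is assumed (field 4 = IP:PORT).
cat > logfile <<'EOF'
01/Jan 10:00 GET 231.158.187.245:443
01/Jan 10:01 GET 10.0.0.7:8080
01/Jan 10:02 GET 231.158.187.245:80
EOF

# Sort on field 4, strip the trailing :port with sed, then have uniq
# skip the first 3 fields (-f3) so it compares only the IP.
sort -k 4 logfile | sed 's/:[[:digit:]]*$//' | uniq -f3
```

With this sample, the two lines for 231.158.187.245 collapse into one once the differing ports are stripped.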
No, that's not what I need. I want to remove the duplicate IPs in the file, and I also need the output to look like "231.158.187.245": just the duplicate IPs that were deleted, the IP without the port or other info.
I am not certain whether you want a list of the IP addresses in the file, or a list of the IP addresses that appear more than once. If you want a list of all unique IP addresses, then you can use "cut" to extract the field with the IP address. Since they might not appear in order, pipe that through "sort" and finally pipe that output through "uniq".
My first example used sort options which started on a certain field. For this, you don't need any options to "sort" or "uniq".
If you want the IP addresses that are duplicates, then use "uniq -d" or "uniq -D" instead.
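Here is a sketch of both variants on sample data. Again, the field number, the delimiter, and the filename are assumptions, since the real file was never posted:

```shell
# Assumed layout: space-delimited, field 4 holds IP:PORT.
cat > logfile <<'EOF'
01/Jan 10:00 GET 231.158.187.245:443
01/Jan 10:01 GET 10.0.0.7:8080
01/Jan 10:02 GET 231.158.187.245:80
EOF

# Every distinct IP, one per line:
cut -d' ' -f4 logfile | sed 's/:[[:digit:]]*$//' | sort | uniq

# Only the IPs that appear more than once:
cut -d' ' -f4 logfile | sed 's/:[[:digit:]]*$//' | sort | uniq -d
```

The second pipeline prints just "231.158.187.245" for this sample, which is the output format you asked for: the duplicated IP, nothing else.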
I am at work now, so I am unable to test anything. There needs to be some sort of data mining to get the input parsed correctly. Other than that, I think grep and uniq have options (grep -c, uniq -c) to count the number of matching lines.
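For the counting idea, "uniq -c" prefixes each line with its occurrence count; piping that through "sort -rn" puts the most frequent entry first. A sketch, with the same assumed field layout and made-up data as before:

```shell
cat > logfile <<'EOF'
01/Jan 10:00 GET 231.158.187.245:443
01/Jan 10:01 GET 10.0.0.7:8080
01/Jan 10:02 GET 231.158.187.245:80
EOF

# Count each IP: uniq -c needs sorted input, sort -rn ranks by count.
cut -d' ' -f4 logfile | sed 's/:[[:digit:]]*$//' | sort | uniq -c | sort -rn
```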
Guru Mind,
If you get the AWK manual, there are examples of counting duplicates. You can also create every conceivable output. There is an example near the start of the manual that does almost literally what you need.
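In the spirit of those manual examples, here is a rough awk one-liner that counts IPs in an array and prints only the duplicated ones. The field layout and data are assumed, not from the original poster's file:

```shell
cat > logfile <<'EOF'
01/Jan 10:00 GET 231.158.187.245:443
01/Jan 10:01 GET 10.0.0.7:8080
01/Jan 10:02 GET 231.158.187.245:80
EOF

# Strip the :port from field 4, tally each IP in an array, and print
# only the IPs seen more than once.
awk '{ sub(/:[0-9]+$/, "", $4); seen[$4]++ }
     END { for (ip in seen) if (seen[ip] > 1) print ip }' logfile
```

One nice property of the awk version is that it needs no sort: the array keeps the counts regardless of line order.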
Sorry, I didn't understand your first question at all. On the second one, look at my earlier example. I piped the output of sort to the "sed" command. The sed command removed the port number.
You can use that same example but change the parameters to the uniq command.
Since you just want to retain the IP:PORT field, you can use the "cut" command and pipe that to the same "sed" command as in the earlier example.
Sometimes you want to use a program like 'tr' in the pipe to either change the delimiter or squeeze the whitespace, so that cut behaves better.
Try it out one part at a time so you understand what each part does. Use the up arrow and add pipe what you have so far to the next utility.
I can't tell from what you posted whether your file uses a tab delimiter or runs of spaces. "cut" uses a single tab by default. If your file uses spaces between fields, you can use "tr -s ' '" to squeeze the repeated spaces between fields down to one:
try 1: eliminate extra spaces
tr -s ' ' <originalfile
try 2: cut out the fourth column:
tr -s ' ' <originalfile | cut -d ' ' -f4
try 3: get rid of the portnumber:
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//'
finally: sort and select just the dupes (uniq only detects adjacent duplicates, so sort first):
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//' | sort | uniq -d
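Here is the whole walkthrough run end to end on made-up data padded with repeated spaces, to show why the "tr -s" step matters. The layout is still an assumption, and note that I put a "sort" before "uniq -d", because uniq only spots adjacent duplicates:

```shell
# Hypothetical input with runs of spaces between fields.
cat > originalfile <<'EOF'
01/Jan   10:00  GET   231.158.187.245:443
01/Jan   10:01  GET   10.0.0.7:8080
01/Jan   10:02  GET   231.158.187.245:80
EOF

# Squeeze spaces, cut field 4, drop the :port, sort, keep the dupes.
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//' | sort | uniq -d
```

Without the tr step, cut -d ' ' would treat each extra space as an empty field and -f4 would grab the wrong column.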
Unix and Linux excel at these handy text-handling utilities that each do a small job very well and can be piped from one to the next. At work, I installed Cygwin, so I can do this sort of thing easily.
I came up with a couple one liners to catalog backups and produce a PDF catalog that I put on the server.
The first one-liner reads the directory of the DVD and produces a tab-separated listing of the files and dates. The second one-liner merges each .tsv file and uses enscript to pretty-print the catalog. Another line (actually a two-liner!) runs "ps2pdf" so that the output is a PDF file that anyone can open and read.
I would suggest that you print out the man pages for some of these common commands:
man -t cut | lpr
man -t tr | lpr
man -t uniq | lpr
man -t sort | lpr
Right now, try it by piping the output to kghostview or gv:
man -t cut | kghostview -
Having a printout with the options of the commands is handy when crafting your short script. Some trial and error is inevitable.
You may also want to print out the info manual for the coreutils package. For this, however, you need to install the coreutils source. There is a "make pdf" or "make ps" target to produce the pdf or postscript versions of the info manual.
./configure
make pdf
If you use a distro that is RPM based, you can install the coreutils src.rpm package, then in the SPECS directory use:
sudo rpmbuild -bp coreutils.spec # applies patches if any
cd ../BUILD
cd coreutils-<version>   # "sudo cd" doesn't work; cd is a shell builtin
sudo ./configure
sudo make pdf
If you want to use awk, the info manual for gawk is very good. You may also have a gawk-doc package with give you a book "Gawk: Effective Awk Programming" which is excellent!