LinuxQuestions.org
Old 05-16-2006, 05:20 AM   #1
Guru Mind
Member
 
Registered: Dec 2005
Posts: 41

Rep: Reputation: 15
bash : remove replicator words?, how?


hi

i need some advice please, sirs. i have a file with many numbers and i want to remove the replicator numbers in this file, but i don't know which these numbers are, i just want to remove any replicator numbers

how can i do that with a bash script?


thanks, allen
 
Old 05-16-2006, 06:13 AM   #2
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 34
I'm assuming you mean duplicate numbers. This works when the file is just one column of numbers, otherwise you need to use the -k qualifier to tell sort which column(s) to use.

Code:
sort -u filename > newfile
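To see the effect on a small sample (the file contents below are made up for illustration), and how -k would restrict the comparison to one column:

```shell
# Hypothetical one-column sample file.
printf '3\n1\n3\n2\n1\n' > numbers.txt

# sort -u sorts and drops duplicate lines in one pass.
sort -u numbers.txt > newfile
cat newfile    # 1, 2, 3 -- each number appears once

# For a multi-column file, -k limits the comparison key,
# e.g. keep one line per distinct value in column 2:
# sort -u -k2,2 multicolumn.txt > newfile
```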
 
Old 05-16-2006, 06:30 AM   #3
Guru Mind
Member
 
Registered: Dec 2005
Posts: 41

Original Poster
Rep: Reputation: 15
yes, i meant duplicate. anyway, the problem is still not fixed

filename :

Code:
 0    0 127.0.0.1:pop          231.158.187.245:4907
 0    0 127.0.0.1:pop          231.158.187.245:43123
 0    0 127.0.0.1:pop          82.156.113.60:57802
------

when i use your command, the output in newfile is

Code:
 0    0 127.0.0.1:pop          231.158.187.245:4907
 0    0 127.0.0.1:pop          231.158.187.245:43123
 0    0 127.0.0.1:pop          82.156.113.60:57802
---

i want to remove the lines with that duplicate number "231.158.187.245"

any idea ?

thanks , allen .

Last edited by Guru Mind; 05-16-2006 at 06:31 AM.
 
Old 05-16-2006, 07:14 AM   #4
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Since you want to eliminate lines with duplicate IP addresses, I take it that the port numbers aren't important.
In that case you can strip them off and then use the uniq command.

sort -k 4 logfile | sed 's/:[[:digit:]]*$//' | uniq -f3

0 0 127.0.0.1:pop 231.158.187.245
0 0 127.0.0.1:pop 82.156.113.60
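A quick way to try that pipeline yourself, with a hypothetical logfile modeled on the lines posted above:

```shell
# Hypothetical logfile shaped like the poster's netstat output;
# field 4 holds the remote IP:port.
cat > logfile <<'EOF'
0 0 127.0.0.1:pop 231.158.187.245:4907
0 0 127.0.0.1:pop 231.158.187.245:43123
0 0 127.0.0.1:pop 82.156.113.60:57802
EOF

# Sort on field 4, strip the trailing :port with sed, then let
# uniq skip the first 3 fields (-f3) so it compares only the IP.
sort -k 4 logfile | sed 's/:[[:digit:]]*$//' | uniq -f3
# prints one line per distinct remote IP
```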

Last edited by jschiwal; 05-16-2006 at 07:15 AM.
 
Old 05-16-2006, 08:28 AM   #5
Guru Mind
Member
 
Registered: Dec 2005
Posts: 41

Original Poster
Rep: Reputation: 15
thanks jschiwal for the trick, everything is good now but i have a new question

if i want to remove the duplicate numbers, here i mean "231.158.187.245", and also remove all the other numbers.. for example i want the output for
Code:
tcp        0      0 127.0.0.1:pop          231.158.187.245:4907        TIME_WAIT

tcp        0      0 127.0.0.1:pop          231.158.187.245:4907        TIME_WAIT

tcp        0      0 127.0.0.1:pop          82.156.113.60:1312        TIME_WAIT
to be like this

Code:
231.158.187.245
does that mean i must use awk? or what must i do here? i want to keep the duplicate ip and remove the others..

thanks , allen
 
Old 05-16-2006, 01:15 PM   #6
Guru Mind
Member
 
Registered: Dec 2005
Posts: 41

Original Poster
Rep: Reputation: 15
up..any help please?
 
Old 05-16-2006, 01:51 PM   #7
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 3,849

Rep: Reputation: 592
not sure what you are asking, but i think:
Code:
grep -o '231\.158\.187\.245' filename
will get you close to what you need.
 
Old 05-16-2006, 03:30 PM   #8
Guru Mind
Member
 
Registered: Dec 2005
Posts: 41

Original Poster
Rep: Reputation: 15
no, that's not what i need. i want to remove the duplicate ip's in the file, and i also need the output to be like "231.158.187.245", just the duplicate ip's that were deleted before, the ip without the port or other info

is my question clear now ?

allen
 
Old 05-16-2006, 10:04 PM   #9
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
I am not certain whether you want a list of the IP addresses from the file, or a list of the IP addresses that appear more than once. If you want a list of all unique IP addresses, then you can use "cut" to extract the field with the IP address. Since they might not appear in order, you then pipe that through "sort" and finally pipe that output through "uniq".

My first example used sort options which started on a certain field. For this, you don't need any options to "sort" or "uniq".

If you want the IP addresses that are duplicates, then use "uniq -d" or "uniq -D" instead.
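As a sketch of that cut | sort | uniq chain, using made-up data shaped like the poster's file (field 4 holds the remote IP:port):

```shell
# Hypothetical sample shaped like the netstat output above.
cat > conns.txt <<'EOF'
0 0 127.0.0.1:pop 231.158.187.245:4907
0 0 127.0.0.1:pop 231.158.187.245:43123
0 0 127.0.0.1:pop 82.156.113.60:57802
EOF

# All unique IPs: extract field 4, strip the port, sort, dedupe.
cut -d ' ' -f4 conns.txt | sed 's/:[0-9]*$//' | sort | uniq

# Only the IPs that appear more than once:
cut -d ' ' -f4 conns.txt | sed 's/:[0-9]*$//' | sort | uniq -d
# prints: 231.158.187.245
```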
 
Old 05-17-2006, 04:33 AM   #10
Guru Mind
Member
 
Registered: Dec 2005
Posts: 41

Original Poster
Rep: Reputation: 15
thanks jschiwal for the advice, but please can you post a small example script that does what i want..
 
Old 05-17-2006, 06:49 AM   #11
Guru Mind
Member
 
Registered: Dec 2005
Posts: 41

Original Poster
Rep: Reputation: 15
everything is clear now for me but i have 2 questions (last questions)

1- i need to set a value for the duplicate ip's, for example.. an ip must be duplicated 30 times to get into the output


2- when i used uniq -d or -D, the command output gives me the duplicate ip's and ports, and i need just the ip.

thanks.
 
Old 05-17-2006, 10:02 AM   #12
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 3,849

Rep: Reputation: 592
Code:
man grep
man uniq
i am at work now so i am unable to test anything. there needs to be some sort of data-mining in order to get the input parsed correctly. other than that, i think grep and uniq have options (grep -c, uniq -c) to count the number of matching lines.
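For the "must be duplicated 30 times" requirement, uniq -c plus a small awk filter is one way to sketch it (the sample data and the threshold of 3 are made up; the poster would use 30):

```shell
# Hypothetical list of already-extracted IPs.
printf '231.158.187.245\n231.158.187.245\n231.158.187.245\n82.156.113.60\n' > ips.txt

# uniq -c prefixes each line with its count; awk keeps lines
# whose count meets the threshold and prints just the IP.
sort ips.txt | uniq -c | awk '$1 >= 3 { print $2 }'
# prints: 231.158.187.245
```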

Last edited by schneidz; 05-17-2006 at 10:03 AM.
 
Old 05-17-2006, 09:24 PM   #13
jlinkels
Senior Member
 
Registered: Oct 2003
Location: Bonaire
Distribution: Debian Lenny/Squeeze/Wheezy/Sid
Posts: 4,053

Rep: Reputation: 484
Guru Mind,

If you get the AWK manual, there are examples of counting duplicates. You can also create every conceivable output. There is an example which almost literally does what you need, somewhere near the start of the manual.
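The duplicate-counting idiom from the awk manual looks roughly like this (a generic sketch on made-up input, not the manual's exact example):

```shell
# Count every line in an associative array; at the end,
# print each line seen more than once, with its count.
printf 'a\nb\na\nc\na\n' | awk '{ n[$0]++ } END { for (k in n) if (n[k] > 1) print k, n[k] }'
# prints: a 3
```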

jlinkels
 
Old 05-18-2006, 12:24 AM   #14
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Sorry, I didn't understand your first question at all. On the second one, look at my earlier example. I piped the output of sort to the "sed" command. The sed command removed the port number.
You can use that same example but change the parameters to the uniq command.
Since you just want to retain the IP:PORT field, you can use the "cut" command and pipe that to the same "sed" command as in the earlier example.

Sometimes you want to use a program like 'tr' in the pipe to either change the delimiter or squeeze the whitespace, so that cut behaves better.

Try it out one part at a time so you understand what each part does. Use the up arrow and pipe what you have so far into the next utility.

I can't tell from what you posted whether your file uses a tab delimiter or a number of spaces. "cut" uses a single tab by default. If your list uses spaces between fields, you can use "tr -s ' '" to squeeze runs of spaces down to one:
try 1: eliminate extra spaces
tr -s ' ' <originalfile
try 2: cut out the fourth column:
tr -s ' ' <originalfile | cut -d ' ' -f4
try 3: get rid of the portnumber:
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//'
finally: select just the dupes:
tr -s ' ' <originalfile | cut -d ' ' -f4 | sed 's/:[[:digit:]][[:digit:]]*$//' | uniq -d

Unix and Linux excel at these handy text-handling utilities, each of which does one small job very well and can be piped from one to the next. At work, I installed Cygwin, so I can do this sort of thing easily.
I came up with a couple of one-liners to catalog backups and produce a PDF catalog that I put on the server.
The first one-liner reads the directory of the DVD and produces a tab-separated listing of the files and dates. The second one-liner merges each .tsv file and uses enscript to pretty-print the catalog. Another line (actually a two-liner!) runs "ps2pdf" so that the output is a PDF file that anyone can open and read.

I would suggest that you print out the man pages for some of these common commands:
man -t cut | lpr
man -t tr | lpr
man -t uniq | lpr
man -t sort | lpr

Right now, try it by piping the output to kghostview or gv:
man -t cut | kghostview -

Having a printout with the options of the commands is handy when crafting your short script. Some trial and error is inevitable.

You may also want to print out the info manual for the coreutils package. For this, however, you need to install the coreutils source. There is a "make pdf" or "make ps" target to produce the pdf or postscript versions of the info manual.
./configure
make pdf

If you use a distro that is RPM based, you can install the coreutils src.rpm package, then in the SPECS directory use:
sudo rpmbuild -bp coreutils.spec # applies patches if any
cd ../BUILD
cd coreutils-<version>
sudo ./configure
sudo make pdf

If you want to use awk, the info manual for gawk is very good. You may also have a gawk-doc package, which gives you the book "Gawk: Effective Awk Programming", which is excellent!

Last edited by jschiwal; 05-18-2006 at 12:30 AM.
 