[SOLVED] Remove the duplicate and count the line!!

grail · 09-07-2011, 09:23 AM

Correct ... on quick inspection you could change the test to:

Code:

/^[^ ]+COM/

anishkumarv · 09-07-2011, 02:04 PM

Hi grail,

Thanks a lot for your immediate response; it's working now. AWK/sed really are awesome tools :-)

I have started reading
Sed & awk
By Dale Dougherty, Arnold Robbins

I am reading this book and practising too. What are the best ways to master AWK? I need your guidance.

chrism01 · 09-07-2011, 07:59 PM

Try the Grymoire for both tools http://www.grymoire.com/Unix/ and then, like any tool, practise, e.g. try solving LQ questions using ONLY those tools.

grail · 09-07-2011, 08:36 PM

+1 to chrism01's suggestion. I can also recommend :- http://www.gnu.org/software/gawk/man...ode/index.html

anishkumarv · 09-08-2011, 04:51 AM

Hi chrism01, grail,

Thanks for your guidance for a new beginning in my Linux world :-)

Code:

$TTL 900
biz. IN SOA a.anish.biz. hostmaster.anish.biz. (
        45266628 ; Serial
        900 ; Refresh
        900 ; Retry
        604800 ; Expire
        86400 ) ; Minimum
;
; Generation start time = SEP-06-2011 21:06:03
; TLD RECORDS
a.anish.biz. 518400 IN A 165.154.124.45
b.anish.biz. 518400 IN A 165.154.125.45
; A RECORDS
NS1.000A IN A 209.190.16.82
NS2.000A IN A 72.36.219.162
;
; NS RECORDS
0--0 IN NS DOCS03.RZONE.DE.
0--0 IN NS SHADES04.RZONE.DE.
0--1 IN NS 01.DNSV.JP.
0--1 IN NS 02.DNSV.JP.
ZZZZZZZZZZZZZZZZZZ IN NS DOCS09.RZONE.DE.
ZZZZZZZZZZZZZZZZZZ IN NS SHADES17.RZONE.DE.
; Generation end time = SEP-06-2011 21:07:42
; END OF FILE

I need the same output from this sample file too, but its format is quite different from the old one.

I want output like this:

Code:

0--0 
0--1
ZZZZZZZZZZZZZZZZZZ

Total domains 3

I am working on this. I am able to remove the duplicates, but I am not able to skip the first and last few lines, so they are also included in the count :-(

PTrenholme · 09-08-2011, 11:10 AM

If just looking for the lines that contain " IN NS " is sufficient, this would work

Code:

$ awk '/ IN NS /{++uniq[$1]} END {n=asorti(uniq,ind);for (i=1;i<=n;++i) {print ind[i] "\t(" uniq[ind[i]] " times)"};print "\nTotal domains: " n}' test3.txt
0--0    (2 times)
0--1    (2 times)
ZZZZZZZZZZZZZZZZZZ      (2 times)

Total domains: 3
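
Note that asorti() is a gawk extension, so the command above needs gawk. A minimal portable sketch of the same report, assuming a hypothetical sample file with a few NS records in the style of the post above (the names `zone`, `names`, and `total` are illustrative, not from the thread), counts in awk and leaves the ordering to sort:

```shell
# Hypothetical sample data in the style of the zone file above.
zone=$(mktemp)
cat > "$zone" <<'EOF'
; NS RECORDS
0--1 IN NS 01.DNSV.JP.
0--0 IN NS DOCS03.RZONE.DE.
0--0 IN NS SHADES04.RZONE.DE.
0--1 IN NS 02.DNSV.JP.
EOF

# Count occurrences per name in awk, then let sort(1) order the report.
names=$(awk '/ IN NS / { count[$1]++ }
             END { for (name in count) print name "\t(" count[name] " times)" }' "$zone" | sort)

# One report line per distinct name, so the line count is the domain count.
total=$(printf '%s\n' "$names" | wc -l | tr -d ' ')
printf '%s\n\nTotal domains: %s\n' "$names" "$total"
rm -f "$zone"
```

The comment line is skipped because it does not contain " IN NS ", which is the same filtering idea as in the one-liner above.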

grail · 09-08-2011, 07:28 PM

Basically, the idea is that you need to find an item that is common to all the lines you want in every file you are looking at, or a group of items
that are unique to those lines (these can be separated like /COM|NS/), and place that at the start as I have shown.

anishkumarv · 09-09-2011, 12:36 PM

Hi grail,

Thanks a lot, man.

Code:

awk '/^[^ ]+ IN NS/ && !_[$1]++{print $1; tot++}END{print "\nTotal",tot,"Domains"}' file

Now I just need one more idea to complete my script.

The process is: we download a zone file, count the values, and compare the counts from the current and previous zone files. Only then do we know how many new domains there are.

For example, I have two files named

08092011.com

09092011.com

and for file 08092011.com

Code:

awk '/Total/' u.txt | cut -f2 -d: |sed 's/^ *//g'

Using this command I get just the value:

4

and for 09092011.com, using the same process, I get the value

6

Now I want output like file2 - file1 = 2.

I am able to do this by hand, but suppose I download a new file on Sep 10; that file will be named 10092011.com

and then the same process needs to compare

09092011.com

10092011.com

these two files instead.

I don't know how to pick the latest existing file to compare with the newly downloaded file. I need your guidance to complete this script.

grail · 09-10-2011, 02:36 AM

awk, sed and cut ... no no no. Awk can easily do all the other tasks and it is also not required to be so complicated:

Code:

awk '/Total/{if(tot)print "diff is",tot - $NF;else tot = $NF}' 09092011.com 10092011.com
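
With the older file listed first, the command above prints the first total minus the second, so a growth from 4 to 6 is reported as -2; it also prints nothing if the first total is 0, because `if(tot)` is then false. A minimal sketch of a variant that prints second minus first, assuming each file ends with a line like "Total domains: N" (the sample files and the names `old`, `new`, `first`, and `seen` are hypothetical):

```shell
# Hypothetical count files; each ends with a "Total domains: N" line.
old=$(mktemp); new=$(mktemp)
printf 'a.com\nb.com\n\nTotal domains: 4\n' > "$old"
printf 'a.com\nc.com\n\nTotal domains: 6\n' > "$new"

# Remember the first total, then print (second total - first total).
# "seen" records that a total has been stored, so a first total of 0 still works.
diff_count=$(awk '/Total/ { if (seen) print $NF - first; else { first = $NF; seen = 1 } }' "$old" "$new")
echo "diff is $diff_count"
rm -f "$old" "$new"
```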

anishkumarv · 09-11-2011, 11:38 AM

Hi grail,

Thanks a lot man, you made my work very easy :-)

Now my script is at the finishing stage; only one task is left.

Example:

We have a directory named zones,

and in it:

Quote:

09092011.com
10092011.com
11092011.com
12092011.com

There are four files like that, but if my script runs on 12th Sep, it needs to compare only the two most recent files:

Code:

 awk '/Total/{if(tot)print "diff is",tot - $NF;else tot = $NF}' 11092011.com  12092011.com

There is no need to compare the other two files. Every time, this script needs to compare only the two most recent files.

I think we can make this possible using a for loop. I am working on it, grail; if you have any ideas, please share!

anishkumarv · 09-11-2011, 02:48 PM

Hi grail,

I got an idea to finish this script, but I need your help. Is it possible to do something like this?

Example:

/var/tmp/v1 contains the file name 09092011.com

/var/tmp/v2 contains the file name 10092011.com

Code:

awk '/Total/{if(tot)print "diff is",tot - $NF;else tot = $NF}' /var/tmp/v1 /var/tmp/v2

I know this command won't work as written, because awk reads v1 and v2 themselves rather than the files they name. Is there an alternative way to get these two file names out of the files?

Need your guidance to finish this thread.

grail · 09-11-2011, 08:19 PM

Assuming they are created each day (hence the time stamp reference) you could get the files with a simple find:

Code:

<your awk here> $(find /dir/with/files -type f -ctime -1 -name '*.com')
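
Another way to pick the two most recent files, sketched here under the assumption that modification times reflect download order (names in ddmmyyyy form do not sort chronologically as text). The directory, the touch timestamps, and the variable `latest_two` are hypothetical:

```shell
# Hypothetical zones directory with four daily files and distinct mtimes.
dir=$(mktemp -d)
touch -t 201109091200 "$dir/09092011.com"
touch -t 201109101200 "$dir/10092011.com"
touch -t 201109111200 "$dir/11092011.com"
touch -t 201109121200 "$dir/12092011.com"

# ls -tr lists oldest first, so the last two lines are the two newest files,
# already in older-then-newer order.
latest_two=$(cd "$dir" && ls -tr -- *.com | tail -n 2)
echo "$latest_two"
rm -rf "$dir"
```

The result can then be passed unquoted to awk from inside that directory, e.g. `awk '...' $latest_two`. Parsing ls output like this is only safe because these file names contain no spaces or special characters.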

anishkumarv · 09-12-2011, 08:02 PM

Hi Grail,

Thanks a lot for your support and ideas :-)

Code:

awk '/Total/{if(tot)print "diff is",tot - $NF;else tot = $NF}' $(</var/tmp/v1) $(</var/tmp/v2)

Using this code I am able to get the output,

and

Code:

awk 'NR==FNR{a[$0];next} $0 in a' file1 file2

This is the command for the intersection of two files, right? I am asking because I did the same intersection using a database too, but the two values differ. Please share your ideas, and thanks for your patience in answering my questions so far. I need your guidance to solve this thread.

grail · 09-13-2011, 02:57 AM

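$(<) is not required when the files at the end of the awk command are the data files themselves; it is needed only when, as with /var/tmp/v1 and /var/tmp/v2, each file holds the name of the data file.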

Your second awk prints the lines in file2 that are exactly the same as a line in file1. If the lines differ in any way
they will not be printed. So essentially yes, it will show you the lines that are the same.
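
A minimal sketch of that comparison, also turned around to list lines present in only one file. The two sample lists and the variable names are hypothetical. Because the match is on the exact line, differences in case, trailing whitespace, or line endings, as well as repeated lines in the second file, are possible reasons for a count that disagrees with a database query:

```shell
# Hypothetical domain lists, one name per line.
yesterday=$(mktemp); today=$(mktemp)
printf 'a\nb\nc\n' > "$yesterday"
printf 'b\nc\nd\ne\n' > "$today"

# NR==FNR is true only while reading the first file: store its lines as keys.
# For the second file, print lines that are (or are not) among those keys.
common=$(awk 'NR==FNR { a[$0]; next } $0 in a' "$yesterday" "$today" | wc -l | tr -d ' ')
added=$(awk 'NR==FNR { a[$0]; next } !($0 in a)' "$yesterday" "$today" | wc -l | tr -d ' ')
removed=$(awk 'NR==FNR { a[$0]; next } !($0 in a)' "$today" "$yesterday" | wc -l | tr -d ' ')
echo "common=$common new=$added deleted=$removed"
rm -f "$yesterday" "$today"
```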

anishkumarv · 09-19-2011, 06:27 AM

Hi Grail,

Thank you man. Without you I couldn't have finished this script, the ugliest script in the world :-( but it fulfils my requirement.

Code:

#!/bin/bash
date=`date +%d-%m-%Y_%H.%M.%S`
today=`date +%d%m%Y`
yesterday=`date -d 'yesterday' '+%d%m%Y'`
RootPath=/var/domaincount/asia/
cd $RootPath
mkdir $today'asia'
cd $today'asia'
echo Intelliscan Process started for Asia TLD $date 
#################################################################################################
## Using Wget Downloading the Zone files it will try only one time
wget --tries=1 --ftp-user=USERNAME --ftp-password=PASSWORD ftp://ftp.anish.com:21/zonefile/anish.zone.gz
if [ $? -ne 0 ]
then
echo Download Not Success Domain count Failed With Error >> /var/tmp/log/asia/asiacount$date.log
exit
fi
###The downloaded file in Gunzip format from that we need to unzip and start the domain count process####
gunzip asia.zone.gz
mv asia.zone $RootPath/$today.asia
###### It will start the Count #####
awk '/^[^ ]+ASIA/ && !_[$1]++{print $1; tot++}END{print "Total",tot,"Domains"}' $RootPath/$today.asia > $RootPath/zonefile/$today.asia
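## Write the count to a file first, then read it back; redirecting inside the backticks would leave $a empty
awk '/Total/ {print $2;}' $RootPath/zonefile/$today.asia > $RootPath/$today.count
a=`cat $RootPath/$today.count`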
echo "$date Count For Asia TlD $a" >> /var/tmp/log/asia/asiacount$date.log
cd $RootPath/zonefile/
ls -l $RootPath/zonefile | sort -rg | awk '{print $9}' > /var/tmp/a
cd $RootPath
awk 'NR == 2 { print }' /var/tmp/a > $today.filename
cd $RootPath/zonefile
#b=`awk 'NR==FNR{a[$0];next} $0 in a' $(<$RootPath/$today.filename) $(<$RootPath/$yesterday.filename) | awk 'END {print NR}'`
awk 'NR==FNR{a[$0];next} $0 in a' $(<$RootPath/$today.filename) $(<$RootPath/$yesterday.filename) | awk 'END {print NR}' > $RootPath/$today.overallcount
b=`cat $RootPath/$today.overallcount`
echo "$date Overall Count For Asia TlD $b" >> /var/tmp/log/asia/asiacount$date.log
#awk 'NR==FNR{a=$0;next} {b=$0; print a-b}' $(<$RootPath/$today.overallcount) $(<$RootPath/$today.count)
#awk 'NR==FNR{a=$0;next} {b=$0; print a-b}' $(<$RootPath/$today.overallcount) $(<$RootPath/$yesterday.count)
## New registrations = today's total minus the names common to both days
c=$(( $(<$RootPath/$today.count) - $(<$RootPath/$today.overallcount) ))
echo "$date New Registration Domain Counts $c" >> /var/tmp/log/asia/asiacount$date.log
## Deleted domains = yesterday's total minus the names common to both days
d=$(( $(<$RootPath/$yesterday.count) - $(<$RootPath/$today.overallcount) ))
echo "$date Deleted Domain Counts $d" >> /var/tmp/log/asia/asiacount$date.log
cat /var/tmp/log/asia/asiacount$date.log | mail -s "Asia Tld Count log" 07anis@gmail.com

Using this script, we compare yesterday's zone file with today's zone file and find the deleted and new domains.

My last doubt regarding this script:

Code:

awk '/^[^ ]+ASIA/; /^[^ ]+asia/ && !_[$1]++{print $1; tot++}END{print "Total",tot,"Domains"}'

Using this code I am searching for both "ASIA" and "asia" TLDs. I get the content, but I don't get the count.

Any suggestions are welcome. Once I get an answer for this I will mark this thread solved. Thanks a lot, LinuxQuestions.org, for your support.
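
A possible explanation, offered as a sketch rather than a confirmed answer from the thread: in awk a `;` ends a rule, so `/^[^ ]+ASIA/;` is a complete rule with no action, and a rule with no action prints the whole line. Lines matching ASIA are therefore printed as-is but are not counted by the second rule, which only tests for lowercase asia. Putting both spellings into one pattern with alternation avoids this. The sample lines and the names `zone`, `seen`, and `result` are hypothetical:

```shell
# Hypothetical sample lines; the real zone file format may differ.
zone=$(mktemp)
cat > "$zone" <<'EOF'
FOO.ASIA NS NS1.EXAMPLE.NET.
FOO.ASIA NS NS2.EXAMPLE.NET.
bar.asia NS ns1.example.net.
; comment line
EOF

# One rule: match either spelling, print each first field once, count it.
result=$(awk '/^[^ ]+(ASIA|asia)/ && !seen[$1]++ { print $1; tot++ }
              END { print "Total", tot, "Domains" }' "$zone")
echo "$result"
rm -f "$zone"
```

In this sketch FOO.ASIA and foo.asia would count as two different names; comparing on tolower($1) instead of $1 would merge them if that is what is wanted.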