[SOLVED] Remove the duplicate and count the line!!
1. Use a bracket expression for each letter so that either case matches:
Code:
/^[^ ]+[Aa][Ss][Ii][Aa]/
2. Tell awk to ignore case:
Code:
BEGIN{IGNORECASE=1}/^[^ ]+asia/
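Note that IGNORECASE is a GNU awk feature; other awks treat it as an ordinary variable. If gawk may not be available, a portable sketch is to lower-case the record before matching. The sample lines and name servers below are made up for illustration:

```shell
# Hypothetical zone lines, not taken from the real zone file
sample='anish.asia. NS ns1.example.net.
ANISH.ASIA. NS ns2.example.net.
example.com. NS ns1.example.net.'

# tolower() is POSIX awk, so this does not need any gawk extension
matches=$(printf '%s\n' "$sample" | awk 'tolower($0) ~ /^[^ ]+asia/ {print $1}')
printf '%s\n' "$matches"
```

Both spellings of the .asia name are printed and the .com line is skipped.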
I am not sure if I followed all of what you are trying to achieve, but the below might give you some ideas. There are probably more improvements to be had:
Code:
#!/bin/bash
current_date=$(date +%d-%m-%Y_%H.%M.%S)
today=$(date +%d%m%Y)
yesterday=$(date -d 'yesterday' '+%d%m%Y')
RootPath=/var/domaincount/asia/
MainPath=$RootPath${today}asia
LOG=/var/tmp/log/asia/asiacount$current_date.log
mkdir -p $MainPath
echo Intelliscan Process started for Asia TLD $current_date
exec 6>&1 >> $LOG
#################################################################################################
## Download the zone file with wget; it will try only once
if ! wget --tries=1 --ftp-user=USERNAME --ftp-password=PASSWORD ftp://ftp.anish.com:21/zonefile/anish.zone.gz
then
echo Download Not Success Domain count Failed With Error
exit 1
fi
### The downloaded file is gzip-compressed; decompress it before counting.
### gunzip -c writes to stdout (plain gunzip does not), and the name must match the file wget fetched.
gunzip -c anish.zone.gz > $MainPath/$today.asia
###### Start the count #####
awk '/^[^ ]+ASIA/ && !_[$1]++{print $1; tot++}END{print "Total",tot,"Domains"}' $MainPath/$today.asia > $RootPath/zonefile/$today.asia
awk '/Total/ {print $2}' $RootPath/zonefile/$today.asia > $RootPath/$today.count
a=$(< $RootPath/$today.count)
b=$(< $RootPath/$yesterday.count)
# c = number of lines present in both today's and yesterday's lists
c=$(awk 'NR==FNR{a[$0];next} $0 in a{tot++}END{print tot+0}' $RootPath/zonefile/$today.asia $RootPath/zonefile/$yesterday.asia)
echo "$current_date Count For Asia TLD $a"
echo "$current_date Overall Count For Asia TLD $c"
echo "$current_date New Registration Domain Counts $((a - c))"
echo "$current_date Deleted Domain Counts $((b - c))"
exec >&6 6>&-
cat $LOG | mail -s "Asia Tld Count log" 07anis@gmail.com
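To check the new/deleted arithmetic at the end of the script, here is a minimal sketch on two tiny made-up lists. It assumes "new" means present today only and "deleted" means present yesterday only; the file names and domains are hypothetical:

```shell
tmp=$(mktemp -d)
printf '%s\n' a.asia. b.asia. c.asia. > "$tmp/today"
printf '%s\n' b.asia. c.asia. d.asia. e.asia. > "$tmp/yesterday"

# The first file fills the array; lines of the second file found in it are common to both
common=$(awk 'NR==FNR { seen[$0]; next } $0 in seen { tot++ } END { print tot + 0 }' \
    "$tmp/today" "$tmp/yesterday")
today_count=$(wc -l < "$tmp/today")
yesterday_count=$(wc -l < "$tmp/yesterday")

summary="common=$common new=$((today_count - common)) deleted=$((yesterday_count - common))"
echo "$summary"
rm -rf "$tmp"
```

Here a.asia. is the only new name and d.asia. and e.asia. are the deleted ones.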
This code works fine sometimes, but sometimes it skips all of the "www" values. Is there any way to avoid this? If you know, kindly post your idea.
You have the following in your code:
Code:
length(a)==2
As www.0008.asia has a length of 3, it is ignored.
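A quick way to see this, assuming the array a is filled by splitting the name on dots (the split call itself is not shown here, so that part is a guess):

```shell
# split() returns the number of pieces, which is what a length-of-array test compares against
parts_report=$(printf '%s\n' 0008.asia www.0008.asia |
    awk '{ n = split($1, parts, "."); print $1, n }')
printf '%s\n' "$parts_report"
```

The name with the www prefix yields 3 pieces, so a test for exactly 2 drops it; a test such as n >= 2 would keep both.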
Quote:
that to duplicate anish.asia., ANISH.asia. both are same domains,
The regex is ignoring case but when being stored to be printed later you are not testing to see if they are the same, minus the case.
So you need to add a test for what is being stored in b
Or just use b[toupper($1)]++ or b[tolower($1)]++ to let AWK handle it, picking the case you want displayed.
(As a side comment, you seem to be ignoring the pending change to IP version 6, and assuming that the DNS records are always going to be in version 4 format.)
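A minimal sketch of that idea on made-up names (the array name b follows the suggestion; the sample data is hypothetical):

```shell
sample='anish.asia.
ANISH.asia.
www.0008.asia.
Anish.Asia.'

# Index the array by the lower-cased name so every spelling lands on the same element
unique=$(printf '%s\n' "$sample" |
    awk '!b[tolower($1)]++ { print tolower($1); tot++ }
         END { print "Total", tot, "Domains" }')
printf '%s\n' "$unique"
```

Swap tolower for toupper in both places if you would rather display the names in upper case.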
Not sure if this might help to count the domains. I used something similar to eliminate duplicates in a text file. With some slight modifications it counts the unique domains:
Code:
awk 'BEGIN {IGNORECASE=1;count=0}
{
$1=gensub("([0-9A-Za-z]+\\.)*([0-9A-Za-z]+\\.[0-9A-Za-z]+)\\.*$", "\\2","1", $1)
for (i=0;i<count;i++) {
if (store[i] == $1 || /^[[:blank:]]*$/) {
next
}
}
store[count++]=$1
} END {
for (k=0;k<count;k++) {
print store[k]
}
print "Total: " count
}' "$1"
You quoted my post, above, but I don't think you tried it. Using the test data crts posted, here's what I get using the tolower function in the code you posted:
I think you may have misunderstood my comment. I didn't mean to suggest that you weren't trying to solve your problem. I just meant that I saw no evidence that you'd tried to use the tolower or toupper function on your array indices.
In any case, where you have _[$1]++ you need _[tolower($1)]++ because _["ASIA"] and _["asia"] refer to different array elements.
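If it helps to see that directly, here is a small sketch (the array names exact and folded are just for illustration):

```shell
result=$(awk 'BEGIN {
    # Subscripts are compared as exact strings: these are two separate elements
    exact["ASIA"]++; exact["asia"]++
    # Folding the case first makes both spellings hit the same element
    folded[tolower("ASIA")]++; folded[tolower("asia")]++
    for (k in exact) n_exact++
    for (k in folded) n_folded++
    print n_exact, n_folded, folded["asia"]
}')
echo "$result"
```

The first number is how many elements the unfolded array holds, the second is the same for the folded one, and the third is the count stored under the single folded key.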
<edit>
Note the test for a non-null $1 in the condition of the print stanza. ($2=="asia"&&$1&&!_[tolower($1)]++)
I had to add that to eliminate a .asia line.
(This is what I was trying to write when the cat intervened. See the next post . . .)
</edit>
Last edited by PTrenholme; 09-23-2011 at 07:33 AM.
They are clever like that (mine has locked me out of my computer before).
@OP - PT has you on the right path. Without setting the index of your array to one case or the other, you will always receive all variations. Where you may have been getting confused is the IGNORECASE option I gave you. It affects the testing you do in your regex but does not affect the items you assign to the array.
Well, glad you got a working solution, although of course it does not work with any of the data you have provided us, but I assume the real data to be very different.