LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Old 09-19-2011, 08:38 AM   #31
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191

Two options:

1. Use a bracket expression for each letter so the pattern matches either case:
Code:
/^[^ ]+[Aa][Ss][Ii][Aa]/
2. Tell awk to ignore case (IGNORECASE is specific to GNU awk):
Code:
BEGIN{IGNORECASE=1}/^[^ ]+asia/
I am not sure if I followed all of what you are trying to achieve, but the below might give you some ideas. There are probably more improvements to be had:
Code:
#!/bin/bash

current_date=`date +%d-%m-%Y_%H.%M.%S`
today=`date +%d%m%Y`
yesterday=`date -d 'yesterday' '+%d%m%Y'`
RootPath=/var/domaincount/asia/
MainPath=$RootPath${today}asia
LOG=/var/tmp/log/asia/asiacount$current_date.log

mkdir -p $MainPath
echo Intelliscan Process started for Asia TLD $current_date 

exec 6>&1 >> $LOG

#################################################################################################
## Using Wget Downloading the Zone files it will try only one time
if ! wget --tries=1 --ftp-user=USERNAME --ftp-password=PASSWORD ftp://ftp.anish.com:21/zonefile/anish.zone.gz
then
    echo Download Not Success Domain count Failed With Error
    exit 1
fi
### The downloaded file is gzip-compressed; decompress it before starting the domain count ###
### (gunzip needs -c to write to stdout; the name matches the file fetched by wget above) ###
gunzip -c anish.zone.gz > $MainPath/$today.asia

###### It will start the Count #####
awk '/^[^ ]+ASIA/ && !_[$1]++{print $1; tot++}END{print "Total",tot,"Domains"}' $MainPath/$today.asia > $RootPath/zonefile/$today.asia
awk '/Total/ {print $2}' $RootPath/zonefile/$today.asia > $RootPath/$today.count

a=$(< $RootPath/$today.count)
b=$(< $RootPath/$yesterday.count)
c=$(awk 'NR==FNR{a[$0];next} $0 in a{tot++}END{print tot}' $RootPath/zonefile/$today.asia $RootPath/zonefile/$yesterday.asia)

echo "$current_date Count For Asia TLD $a"
echo "$current_date Overall Count For Asia TLD $c"
# c counts the names present on both days, so subtract it from each day's total
echo "$current_date New Registration Domain Counts $((a - c))"
echo "$current_date Deleted Domain Counts $((b - c))"

exec >&6 6>&-
mail -s "Asia Tld Count log" 07anis@gmail.com < $LOG
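
The new and deleted counts come from three numbers: today's total, yesterday's total, and the number of names present on both days. Below is a minimal sketch of that arithmetic, assuming two files with one domain per line and no duplicates within a file. The function name diff_counts and its two file arguments are made up for illustration and are not part of the script above.

```shell
# Hypothetical helper: $1 = today's domain list, $2 = yesterday's list.
# Assumes today's list is not empty (the NR == FNR idiom relies on that).
diff_counts() {
    awk '
        NR == FNR { today[$0]; n_today++; next }   # first file: remember today
        { n_yesterday++ }                          # second file: count yesterday
        $0 in today { common++ }                   # name present on both days
        END {
            print "New", n_today - common          # only in today
            print "Deleted", n_yesterday - common  # only in yesterday
        }' "$1" "$2"
}
```

Called as diff_counts today.list yesterday.list, it prints one "New" line and one "Deleted" line.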
 
1 members found this post helpful.
Old 09-19-2011, 02:21 PM   #32
anishkumarv
Member
 
Registered: Feb 2010
Location: chennai - India
Distribution: centos
Posts: 294

Original Poster
Rep: Reputation: 10
Hi Grail,

Compared to my script (sorry, it is not really a script, just commands), yours is 100% better. Thanks a lot.
 
Old 09-19-2011, 03:51 PM   #33
anishkumarv
Member
 
Registered: Feb 2010
Location: chennai - India
Distribution: centos
Posts: 294

Original Poster
Rep: Reputation: 10
Hi Grail,

Code:
awk 'BEGIN{IGNORECASE=1}/^[^ ]+asia/ { gsub(/\.$/,"",$1);split($1,a,".")} length(a)==2{b[$1]++;}END{for (x in b)print x}'
Using this code I get only the main domains from a file like this:

Quote:
0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia NS AS2.DNS.ASIA.CN.
www.0008.asia NS AS2.DNS.ASIA.CN.
anish.asia NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
The output looks like this:

Quote:
0008.ASIA
anish.asia
This code works fine most of the time, but it sometimes skips the "www" entries entirely. Is there any way to avoid this? If you know, kindly post your idea.

Last edited by anishkumarv; 09-19-2011 at 08:36 PM.
 
Old 09-19-2011, 08:55 PM   #34
anishkumarv
Member
 
Registered: Feb 2010
Location: chennai - India
Distribution: centos
Posts: 294

Original Poster
Rep: Reputation: 10
Hi grail,

Again, issues keep coming up one by one in my awk command :-(



0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia. NS AS2.DNS.ASIA.CN.
www.0008.asia. NS AS2.DNS.ASIA.CN.
anish.asia NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
ANISH.asia. NS AS2.DNS.ASIA.CN.

Suppose a file has the content shown above.

Using this command:
Code:
awk 'BEGIN{IGNORECASE=1}/^[^ ]+asia/ { gsub(/\.$/,"",$1);split($1,a,".")} length(a)==2{b[$1]++;}END{for (x in b)print x}'
I got output like this:
Quote:
0008.ASIA.
anish.asia.
ANISH.asia.
The problem is the duplicate: anish.asia. and ANISH.asia. are the same domain, and I want to display only one of them, in either upper or lower case.

Note: I keep posting my doubts in this thread, but please don't think I am not trying; I am also working on these issues myself.
 
Old 09-20-2011, 01:00 AM   #35
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
So, in the order they were posted:
Quote:
This code works fine most of the time, but it sometimes skips the "www" entries entirely. Is there any way to avoid this? If you know, kindly post your idea.
You have the following in your code:
Code:
length(a)==2
Since www.0008.asia splits into 3 parts, it is ignored.

Quote:
The problem is the duplicate: anish.asia. and ANISH.asia. are the same domain,
The regex is ignoring case but when being stored to be printed later you are not testing to see if they are the same, minus the case.
So you need to add a test for what is being stored in b
 
Old 09-20-2011, 11:09 AM   #36
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Quote:
Originally Posted by grail
. . .
The regex is ignoring case but when being stored to be printed later you are not testing to see if they are the same, minus the case.
So you need to add a test for what is being stored in b
Or just use b[toupper($1)]++ or b[tolower($1)]++ to let AWK handle it, picking the case you want displayed.

(As a side comment, you seem to be ignoring the pending change to IP version 6, and assuming that the DNS records are always going to be in version 4 format.)
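
As a small sketch of that suggestion: keying the array on tolower($1) makes anish.asia and ANISH.asia. land in the same element. The function name unique_names is only for illustration, and stripping the trailing dot before the lookup is an extra step assumed here so that names with and without it compare equal.

```shell
# Hypothetical helper: reads zone-style lines on stdin and prints each
# first field once, lower-cased, in the order first seen.
unique_names() {
    awk 'NF {
        name = tolower($1)        # fold case before using it as the array key
        sub(/\.$/, "", name)      # drop the trailing root dot, if any
        if (!seen[name]++) print name
    }'
}

printf '%s\n' 'anish.asia NS x.' 'ANISH.asia. NS x.' '0008.ASIA. NS x.' | unique_names
# prints anish.asia, then 0008.asia
```

Printing on first sight rather than in an END loop keeps the output in input order, since the order of "for (x in b)" is not defined.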
 
1 members found this post helpful.
Old 09-22-2011, 10:11 AM   #37
anishkumarv
Member
 
Registered: Feb 2010
Location: chennai - India
Distribution: centos
Posts: 294

Original Poster
Rep: Reputation: 10
Quote:
Originally Posted by PTrenholme
Or just use b[toupper($1)]++ or b[tolower($1)]++ to let AWK handle it, picking the case you want displayed.

(As a side comment, you seem to be ignoring the pending change to IP version 6, and assuming that the DNS records are always going to be in version 4 format.)
Code:
awk -F'[. ]' 'BEGIN{IGNORECASE=1}$3=="asia" {$1=$2;$2=$3} $2=="asia"&&!_[$1]++{print $1"."$2}END{print "Total",length(_),"Domains"}'
Using this I can skip subdomains and so on, but I still get this duplicate:

Code:
0008.ASIA.
anish.asia.
ANISH.asia.
I am still only getting this kind of output. I am totally stuck on this issue: how do I avoid this kind of duplicate?
 
Old 09-22-2011, 11:07 AM   #38
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Hi,

Not sure if this might help to count the domains. I used something similar to eliminate duplicates in a text file. With some slight modifications it counts the unique domains:
Code:
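awk 'BEGIN {IGNORECASE=1;count=0}
/^[[:blank:]]*$/ {next}        # skip blank lines before anything is stored
{
        # reduce the first field to its last two labels, dropping any trailing dot
        $1=gensub("([0-9A-Za-z]+\\.)*([0-9A-Za-z]+\\.[0-9A-Za-z]+)\\.*$", "\\2","1", $1)
        for (i=0;i<count;i++) {
                # with IGNORECASE set, gawk compares strings case-insensitively
                if (store[i] == $1) {
                        next
                }
        }
        store[count++]=$1
} END {
        for (k=0;k<count;k++) {
                print store[k]
        }
        print "Total: " count
}' "$1"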
Tested with:
Code:
0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia. NS AS2.DNS.ASIA.CN.
www.0008.asia. NS AS2.DNS.ASIA.CN.
anish.asia NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
ANISH.asia. NS AS2.DNS.ASIA.CN.
SWEATY.COM. NS NS1.PARKED.COM.
SWEATY.COM. NS NS2.PARKED.COM.
SWEATYANDREADY.COM. NS NS63.ANISH.COM.
SWEATYANDREADY.COM. NS NS64.ANISH.COM.
SWEATYBANDS.COM. NS NS03.ANISH.COM.
SWEATYBANDS.COM. NS NS04.ANISH.COM.
SWEATYBETTY.COM. NS NS67.ANISH.COM.
SWEATYBETTY.COM. NS NS68.ANISH.COM.
SWEATYDANCER.COM. NS NS13.ANISH.COM.
SWEATYDANCER.COM. NS NS14.ANISH.COM.
It requires GNU awk.
 
Old 09-22-2011, 01:05 PM   #39
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
You quoted my post, above, but I don't think you tried it. Using the test data crts posted, here's what I get using the tolower function in the code you posted:
Code:
$ cat <<EOF >sample.txt
> 0008.ASIA. NS AS2.DNS.ASIA.CN.
> 0008.ASIA. NS AS2.DNS.ASIA.CN.
> ns1.0008.asia. NS AS2.DNS.ASIA.CN.
> www.0008.asia. NS AS2.DNS.ASIA.CN.
> anish.asia NS AS2.DNS.ASIA.CN.
> ns2.anish.asia NS AS2.DNS.ASIA.CN
> ANISH.asia. NS AS2.DNS.ASIA.CN.
> SWEATY.COM. NS NS1.PARKED.COM.
> SWEATY.COM. NS NS2.PARKED.COM.
> SWEATYANDREADY.COM. NS NS63.ANISH.COM.
> SWEATYANDREADY.COM. NS NS64.ANISH.COM.
> SWEATYBANDS.COM. NS NS03.ANISH.COM.
> SWEATYBANDS.COM. NS NS04.ANISH.COM.
> SWEATYBETTY.COM. NS NS67.ANISH.COM.
> SWEATYBETTY.COM. NS NS68.ANISH.COM.
> SWEATYDANCER.COM. NS NS13.ANISH.COM.
> SWEATYDANCER.COM. NS NS14.ANISH.COM.
> EOF
$ awk 'BEGIN{IGNORECASE=1}/^[^ ]+asia/ { gsub(/\.$/,"",$1);split($1,a,".")} length(a)==2{b[tolower($1)]++;}END{for (x in b)print x}' sample.txt 
sweatybands.com.
sweatyandready.com.
anish.asia
sweatydancer.com.
sweaty.com.
sweatybetty.com.
0008.asia
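
The .com names in that output come from how the command is structured rather than from tolower: length(a)==2 is a separate pattern that awk evaluates for every line, and on lines that do not match the asia regex the array a still holds the parts of the last line that did. A sketch that keeps the test and the store in one action is shown below. It is written without IGNORECASE so it does not depend on GNU awk, and the name two_label_asia is made up for illustration.

```shell
# Hypothetical helper: prints each two-label .asia name once, lower-cased.
two_label_asia() {
    awk '{
        name = tolower($1)
        sub(/\.$/, "", name)             # drop the trailing root dot, if any
        n = split(name, part, ".")       # re-split on every line, so no stale parts
        if (n == 2 && part[2] == "asia" && !seen[name]++) print name
    }'
}
```

Run as two_label_asia < sample.txt, this should print only 0008.asia and anish.asia for the sample above.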
 
Old 09-22-2011, 01:52 PM   #40
anishkumarv
Member
 
Registered: Feb 2010
Location: chennai - India
Distribution: centos
Posts: 294

Original Poster
Rep: Reputation: 10
Hmmm... I have been stuck on this for nearly 2 weeks, but you are telling me I have not tried.

If you read my thread fully you will find my exact file format; I am posting it again for you.


Code:
;start: 1315288329
;File created: 2011-09-06 05:52:09 IST
;Export host: 199.115.158.5
;Record count: 2330419
;Created by ANISH

$ORIGIN asia.
@ IN SOA A.COM.ANISH.INFO. NOC.ANISH.INFO. (
                                    2008334441 ; serial
                                    10800 ; refresh
                                    3600 ; retry
                                    2592000 ; expire
                                    86400 ; minimum
                                    )
$TTL 86400

0008.ASIA. NS AS2.DNS.ASIA.CN.
0008.ASIA. NS AS2.DNS.ASIA.CN.
ns1.0008.asia. NS AS2.DNS.ASIA.CN.
www.0008.asia. NS AS2.DNS.ASIA.CN.
anish.asia. NS AS2.DNS.ASIA.CN.
ns2.anish.asia NS AS2.DNS.ASIA.CN
ANISH.ASIA. NS AS2.DNS.ASIA.CN.

;End of file: 1315288329

Code:
awk -F'[. ]' 'BEGIN{IGNORECASE=1}$3=="asia" {$1=$2;$2=$3} $2=="asia"&&!_[$1]++{print $1"."$2}END{print "Total",length(_),"Domains"}' filename
Using this code I get output like this:

Code:
$ORIGIN.asia
0008.ASIA
anish.asia
ANISH.ASIA
Total 4 Domains

but I want output like this:

Quote:
0008.ASIA
anish.asia
Total 2 Domains
or like this:


Quote:
0008.ASIA
ANISH.ASIA
Total 2 Domains
 
Old 09-22-2011, 06:19 PM   #41
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
I think you may have misunderstood my comment. I didn't mean to suggest that you weren't trying to solve your problem. I just meant that I saw no evidence that you'd tried to use the tolower or toupper function on your array indices.

In any case, where you have _[$1]++ you need _[tolower($1)]++ because _["ASIA"] and _["asia"] refer to different array elements.

So:
Code:
$ awk -F'[. ]' 'BEGIN{IGNORECASE=1}$3=="asia" {$1=$2;$2=$3} $2=="asia"&&$1&&!_[tolower($1)]++{print $1"."$2}END{print "Total",length(_),"Domains"}' sample2.txt
0008.ASIA
anish.asia
Total 2 Domains
<edit>
Note the test for a non-null $1 in the condition of the print stanza. ($2=="asia"&&$1&&!_[tolower($1)]++)
I had to add that to eliminate a .asia line.

(This is what I was trying to write when the cat intervened. See the next post . . .)
</edit>
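
One thing to watch with the zone file format posted earlier in the thread: the "$ORIGIN asia." line splits into $ORIGIN and asia, so its first field is not empty and it still passes the non-null test. A sketch that skips comment and directive lines before the comparison is shown below. It compares against an empty string instead of testing $1 for truth (a name such as 0.asia would otherwise count as false), avoids IGNORECASE and length(_) so it is not tied to GNU awk, and uses the made-up name count_asia.

```shell
# Hypothetical helper: reads a zone file on stdin.
count_asia() {
    awk -F'[. ]' '
        /^[;$]/ { next }                              # comments, $ORIGIN and $TTL lines
        tolower($3) == "asia" { $1 = $2; $2 = $3 }    # sub.name.asia -> name.asia
        tolower($2) == "asia" && $1 != "" && !seen[tolower($1)]++ {
            print $1 "." $2
            total++
        }
        END { print "Total", total + 0, "Domains" }'
}
```

When building a test file with a here-document, quoting the delimiter (<<'EOF') keeps the shell from expanding $ORIGIN and $TTL.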

Last edited by PTrenholme; 09-23-2011 at 07:33 AM.
 
Old 09-22-2011, 06:37 PM   #42
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Error post. (My cat decided to play with my keyboard.)
 
Old 09-22-2011, 08:21 PM   #43
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
They are clever like that (mine has locked me out of my computer before)

@OP - PT has you on the right path. Without setting the index of your array to one case or the other you will always
receive all variations. Where you may have been getting confused is the IGNORECASE option I gave you. This affects
the testing you do in your regex but does not affect the items you assign to the array.
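
A quick way to see the point about array indices, as a sketch that runs in any awk (the function name subscript_demo and the array names raw and folded are made up for illustration):

```shell
subscript_demo() {
    awk 'BEGIN {
        raw["ANISH.asia"]; raw["anish.asia"]        # two distinct subscripts
        folded[tolower("ANISH.asia")]               # both of these fold to
        folded[tolower("anish.asia")]               # the same subscript
        for (k in raw) r++
        for (k in folded) f++
        print r, f
    }'
}

subscript_demo    # prints: 2 1
```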
 
Old 09-24-2011, 12:32 PM   #44
anishkumarv
Member
 
Registered: Feb 2010
Location: chennai - India
Distribution: centos
Posts: 294

Original Poster
Rep: Reputation: 10
Hi all,

Thanks a lot for helping me solve this. I learned a lot about AWK from this thread.

Mainly thanks to grail: dude, thanks for your patience in tolerating my bad English.


Finally, using this command I got the output I expected:

Code:
awk '(i=match($1,/[^.]+\.[Ii][Nn][Ff][Oo]/))&&(d=tolower(substr($1,i,RLENGTH)))&&!a[d]++{print d;tot++}END{print "Total",tot,"Domains"}'
Thanks a lot guys :-)
 
Old 09-24-2011, 12:41 PM   #45
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Well, glad you got a working solution, although of course it does not work with any of the data you have provided us (it matches .info names rather than .asia), but I assume the real data to be very different.
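
For the .asia data posted in this thread, the same match/substr idea could be adapted roughly as follows. This is only a sketch: the wrapper name registered_names and the tld variable are made up, the TLD is expected in lower case, and anchoring the pattern at the end of the name is an addition that the command above does not have.

```shell
# Hypothetical wrapper: TLD as $1, zone data on stdin.
registered_names() {
    awk -v tld="$1" '
        {
            name = tolower($1)
            sub(/\.$/, "", name)                      # strip the trailing root dot
            if (match(name, "[^.]+\\." tld "$")) {    # last two labels of the name
                d = substr(name, RSTART, RLENGTH)
                if (!seen[d]++) { print d; tot++ }
            }
        }
        END { print "Total", tot + 0, "Domains" }'
}
```

Run as registered_names asia < zonefile, it should reduce ns1.0008.asia. and www.0008.asia. to 0008.asia and count ANISH.ASIA. and anish.asia. once.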
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration