script to compare users in files

s_linux · 03-28-2011, 10:40 AM

All,
I have two files with user DNs that were exported from two different LDAP directories. I want to write a script that reads the users (cn=user1) in file A, checks whether each user (cn=user1) exists in file B, and gives me a nice output listing the users that are missing from file B.
I have around 30k users in file A, in the following format:

Quote:

cn=user1,ou=some,o=org
cn=user2,ou=some,o=org
cn=user3,ou=some,o=org
cn=user4,ou=some,o=org
-
-
-
etc

File B has the same format.
Does anyone have an idea how I can do this with a shell script?
Thanks

rtmistler · 03-28-2011, 11:55 AM

See this thread for how to write a loop in your script. Then feed each line you read from file A into a grep command run against file B, using -c to get a count.

http://www.linuxquestions.org/questi...aratly-364259/

To do something this complex, I'd save the user name from file A and the "found count" from file B into an array defined in your script, and then process that array and create an output file using the entries where the found count is zero on a per-user basis.
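A minimal bash sketch of that approach, assuming bash (for arrays) and that the cn is always the first comma-separated field of each DN. The function name `count_missing` is hypothetical. It keeps the DNs from file A and their `grep -c` counts from file B in two parallel arrays, then prints the DNs whose count is zero. Searching for the fixed string "cn=value," (with the trailing comma) stops cn=user1 from matching cn=user10, but it is still a substring match anywhere in a line of file B.

```shell
#!/bin/bash
# Sketch: print DNs from file A whose cn has a zero grep -c count in file B.
# Hypothetical usage: count_missing filea fileb > filec
count_missing() {
    local file_a=$1 file_b=$2
    local -a names=() counts=()    # parallel arrays: DN from file A, match count in file B
    local line cn i

    # Read file A one whole line at a time; IFS= and -r keep spaces and backslashes.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        cn=${line%%,*}             # "cn=user one,ou=some,o=org" -> "cn=user one"
        names+=("$line")
        # -F: literal string, -c: count matching lines. The trailing comma stops
        # "cn=user1" from matching "cn=user10", but this is still a substring match.
        counts+=("$(grep -c -F -- "$cn," "$file_b")")
    done < "$file_a"

    # Report every DN whose cn was found zero times.
    for i in "${!names[@]}"; do
        if [ "${counts[$i]}" -eq 0 ]; then
            printf '%s\n' "${names[$i]}"
        fi
    done
}
```

This runs one grep over file B for every line of file A, so with around 30k users it may be slow; it is meant to show the structure rather than to be fast.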

A suggestion is to add:

set -x
set -v

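Near the top of your script. set -v prints each line as the shell reads it and set -x prints each command as it is executed, which lets you follow the flow of the script to debug it (note that the trace goes to stderr, not stdout). Comment those lines out later when you put the script into use.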

Further, use functions instead of writing one big, hard-to-read script. If you aren't familiar with functions in shell scripts, search for some examples; there are plenty.
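For example, a small function that returns an exit status can be used directly in an `if`. This is a sketch: `cn_in_file` is a hypothetical name, and it assumes the cn is the first comma-separated field of each line.

```shell
#!/bin/bash
# Hypothetical helper: exit status 0 if the given cn is the first field of some line in the file.
# Usage: if cn_in_file "cn=user one" fileb; then echo found; fi
cn_in_file() {
    local cn=$1 file=$2
    # cut keeps the text before the first comma; grep -x -F matches that whole
    # field literally, and -q suppresses output so only the exit status is used.
    cut -d, -f1 "$file" | grep -q -x -F -- "$cn"
}
```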

s_linux · 04-01-2011, 04:11 PM

Thanks.
I know how a for loop works. But what I need is:

for line in filea;do
get the "cn=user1"
and store in a variable
then
check fileb to see that variable exists
if not write the whole dn to filec.

But I'm not sure how to get only the cn value, store it in a variable, and then check whether that cn value exists in fileb.

Some of my users have spaces in the name, like cn=user one,ou=some,o=org.
Thanks again.
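One way to do what the pseudocode describes, sketched under some assumptions: bash 4 or later (for associative arrays), the cn is the first comma-separated field, and no cn value contains an escaped comma (such as cn=Smith\, John). `find_missing` is a hypothetical name. Instead of searching file B once per user, it reads file B once into an associative array keyed by cn, then checks each line of file A against it.

```shell
#!/bin/bash
# Requires bash 4+ (associative arrays). Hypothetical usage: find_missing filea fileb > filec
find_missing() {
    local filea=$1 fileb=$2
    local line cn
    local -A seen=()    # keys: cn values that appear in file B

    # Pass 1: remember the cn of every line in file B.
    # IFS= and -r keep the whole line intact, spaces included.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        seen["${line%%,*}"]=1    # ${line%%,*} drops everything from the first comma on
    done < "$fileb"

    # Pass 2: print each DN from file A whose cn was not seen in file B.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        cn=${line%%,*}
        if [ -z "${seen[$cn]+set}" ]; then
            printf '%s\n' "$line"
        fi
    done < "$filea"
}
```

Running `find_missing filea fileb > filec` writes the whole DNs of the missing users to filec. The comparison is case sensitive and ignores everything after the cn, so users under a different ou still count as present.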

s_linux · 04-01-2011, 04:22 PM

Also, I just started writing the script:

Quote:

#!/bin/bash
#set -x

if [ -s filea ];then
for line in $(< filea);do
echo $line
linea=`$line |grep -e ".*CN=([0-9A-Za-z]+),*"`
echo "$linea"
done
fi

I just wanted to see what the value of the "line" variable is. I'm getting something like:

Quote:

cn=usera
one,ou=some,ou=some,o=org
cn=userb
one,ou=some,ou=some,o=org

When I run the script, I see the "linea" variable as below:
cn=usera
cn=userb

But I want to get "cn=usera one".
I'm not sure whether the regular expression works; I'm still testing.
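A sketch of extracting the cn with a regular expression in bash itself, assuming each DN arrives as one whole line rather than split on spaces. `[^,]*` matches everything up to the first comma, so names containing spaces are kept. `extract_cn` is a hypothetical name. As a side note, in the script above `$line |grep ...` tries to run the DN as a command; to filter a variable through grep it has to be piped in, e.g. `printf '%s\n' "$line" | grep ...`, and unescaped `(`, `)` and `+` only act as regex operators with `grep -E`.

```shell
#!/bin/bash
# Hypothetical helper: print the leading "cn=..." part of a DN using a bash regex.
# [^,]* stops at the first comma, so "cn=usera one" keeps its space.
extract_cn() {
    local dn=$1
    local re='^([Cc][Nn]=[^,]*)'    # kept in a variable so [[ =~ ]] treats it as a regex
    if [[ $dn =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1                    # the line does not start with cn=
    fi
}
```

For example, `cn=$(extract_cn "$line")` stores the value in a variable for the comparison step.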

Nominal Animal · 04-01-2011, 04:50 PM

Please tell me this is not homework. (No, I'm serious.)

Quote:

Originally Posted by s_linux

Code:

cn=user1,ou=some,o=org
cn=user2,ou=some,o=org
cn=user3,ou=some,o=org
cn=user4,ou=some,o=org

I'd say awk would be a good match for this.

Code:

awk -v "file1=path-to-file1" -v "file2=path-to-file2" '
    BEGIN {
        # A multi-character (regex) RS is a gawk/mawk extension; POSIX leaves it unspecified.
        RS="[\t\n\v\f\r ]*[\r\n]+[\t\n\v\f\r ]*"
        FS="[\t\v\f ]*[,][\t\v\f ]*"

        # Read first file into list1
        split("", list1)
        while ((getline < file1) > 0)
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[\t\v\f ]*[Cc][Nn][\t\v\f ]*=/) {
                    cn = tolower($i)
                    sub(/^[\t\v\f ]*[Cc][Nn][\t\v\f ]*=[\t\v\f ]*/, "", cn)
                    sub(/[\t\v\f ]*$/, "", cn)
                    list1[cn] = $0
                }

        # Read second file into list2
        split("", list2)
        while ((getline < file2) > 0)
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[\t\v\f ]*[Cc][Nn][\t\v\f ]*=/) {
                    cn = tolower($i)
                    sub(/^[\t\v\f ]*[Cc][Nn][\t\v\f ]*=[\t\v\f ]*/, "", cn)
                    sub(/[\t\v\f ]*$/, "", cn)
                    list2[cn] = $0
                }

        # List all users in list1 that do not exist in list2.
        for (cn in list1)
            if (!(cn in list2))
                printf("%s (%s)\n", cn, list1[cn])
    }'

In the above code, I tell awk that records are separated by newlines, with any leading or trailing whitespace treated as part of the separator. Fields are separated by commas, again with any surrounding whitespace treated as part of the separator.

I used two identical loops to read in the files. They check whether any of the fields is the common name field (cn=) and, if so, add the entire record (as a string) to an associative array keyed by the lower-cased value of the common name. I assume you want the comparison to be case insensitive; if not, use $i instead of tolower($i).

The sub commands remove the cn= part and any leading and trailing whitespace.

Finally, the script loops over all names in list1 and outputs the ones that are not listed in list2.

Note that unlike normal awk scripts, this one has no input files. It would have been pretty natural to read only the second user list in the BEGIN section, and use a normal rule to process each record in the first file; however, I think you'll probably want to do the check the other way too -- list all users that are listed in the second file but not the first -- and you can only do both if you read both into arrays. So I'm anticipating your needs a bit.
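For comparison, here is a sketch of the "normal rule" variant, using the common `NR == FNR` awk idiom: awk reads file B first and remembers its cn fields, then prints each line of file A whose cn was not seen. Unlike the script above, it assumes the cn is the first comma-separated field and does no whitespace trimming. `missing_in_b` is a hypothetical wrapper name.

```shell
#!/bin/bash
# Hypothetical usage: missing_in_b fileA fileB
missing_in_b() {
    awk -F',' '
        # While reading the first file given (file B), NR equals FNR: remember its cn fields.
        NR == FNR { seen[tolower($1)] = 1; next }
        # For file A, print lines whose lower-cased cn field was not seen in file B.
        !(tolower($1) in seen)
    ' "$2" "$1"
}
```

Swapping the arguments (`missing_in_b fileB fileA`) gives the other direction. One known caveat of `NR == FNR`: if file B is empty, the condition is also true for every line of file A, so nothing is printed.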

I hope this helps, and I hope it is not your homework.

grail · 04-02-2011, 03:14 AM

Could we not just use grep or comm?

Code:

#fileA
cn=user1,ou=some,o=org
cn=user2,ou=some,o=org
cn=user3,ou=some,o=org
cn=user4,ou=some,o=org

#fileB
cn=user1,ou=some,o=org
cn=user2,ou=some,o=org
cn=user4,ou=some,o=org

So we want what is in fileA but not in fileB:

Code:

grep -v -x -F -f fileB fileA
comm -23 fileA fileB

Both return the third line of fileA. The nice thing about grep is that, unlike comm, it does not need sorted input.
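If you do want to use comm, the inputs must be sorted in the same collation order comm uses. A sketch using bash process substitution, with `only_in_a` as a hypothetical name: `-2` suppresses lines found only in the second file and `-3` suppresses lines found in both, leaving lines found only in the first. Like the grep version, it compares whole DNs, so the same cn under a different ou counts as different.

```shell
#!/bin/bash
# Hypothetical usage: only_in_a fileA fileB
only_in_a() {
    # sort both files on the fly; comm -23 keeps lines that appear only in the first input
    comm -23 <(sort "$1") <(sort "$2")
}
```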

s_linux · 04-05-2011, 10:01 AM

Thanks for your help.
This is NOT homework. I've been learning scripting, and I take every chance I can to write a script at my company. I've written a few so far. Now I have a requirement to compare a couple of files from time to time and make changes to the directory based on what the comparison finds.
Since I'm still learning, I don't want to use someone else's script. I want to write it myself so that I can learn and help others in the future.
So, back to my previous post (04-01-11, 04:22 PM): I'm not sure why a single line gets split into two, three, or more lines wherever there are spaces.

Quote:

cn=usera
one,ou=some,ou=some,o=org
cn=userb
one,ou=some,ou=some,o=org
cn=userb
one
test,ou=some,ou=some,o=org

But the actual data is:

Quote:

cn=usera one,ou=some,ou=some,o=org
cn=userb one,ou=some,ou=some,o=org
cn=userb one test,ou=some,ou=some,o=org

If I can get each entry as a single line, I can put it in a variable, use a regular expression to get the cn=... part, and then compare that cn value with the second file.

Grail, I'm not sure your solution works, because in most cases the DN context (the part after the cn) is entirely different in the two files. But I will test it when I have similar DNs in both files.

grail · 04-05-2011, 10:34 AM

Quote:

I'm not sure why the single line splits into 2-3 or may be more line based on the spaces in between.

This one is easy: your for loop performs word splitting based on the value of IFS, which by default is whitespace (space, tab and newline), so each space-separated piece is passed into your line variable separately.
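A small demonstration, assuming bash: the first function reproduces the word splitting, and the second reads whole lines with `while IFS= read -r`, the usual way to process a file line by line. `split_loop` and `line_loop` are hypothetical names. The unquoted `$(< file)` is also subject to filename globbing, so a DN containing `*` could expand to file names.

```shell
#!/bin/bash
# Word-splitting loop: for a line "cn=usera one,ou=some,o=org" this prints
# "cn=usera" and "one,ou=some,o=org" as two separate items.
split_loop() {
    local line
    for line in $(< "$1"); do      # unquoted: split on IFS (and globbed)
        printf '%s\n' "$line"
    done
}

# Line-by-line loop: one whole DN per iteration. IFS= keeps leading/trailing
# blanks, -r keeps backslashes, and the || test handles a last line without a newline.
line_loop() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}
```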

I also realise that I misread the question: the cn=userX part is what you need to look for, and the rest of the DN is irrelevant.
So how about:

Code:

# Build ^(cn=user1|cn=user2|...), from fileB's first fields and drop matching lines from fileA
grep -v -E "^($(cut -d',' -f1 fileB | paste -sd'|' -))," fileA

Again this seems to work with the examples I provided previously.