LinuxQuestions.org


winairmvs 10-19-2011 10:01 AM

search for list of usernames in syslog quickly
 
Hello, I am trying to find which usernames from the passwd file appear in Dovecot's syslog. My current script loops through the passwd file and, for each username, greps the syslog file for the first match before moving on to the next username. The passwd file is about 6000 lines long, so this takes a very long time to complete. Is there a way to search for all the usernames with a single grep, or some other more efficient way to do this?

Here is what I have so far:

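#!/bin/zsh

base="/usr/local/admin/report"
passwd="/etc/passwd"

# The username is the first field of each passwd entry
userlist=$(cut -d":" -f1 ${passwd})

# Split the command substitution below on newlines only
IFS="
"

# Start with an empty output file (echo "" would leave a blank first line)
: > ${base}/tmpgrep

# One full grep pass over the log per user: ~6000 scans of the maillog
for user in `echo ${userlist}`
do
    grep -m 1 "${user}" /syslog/dovecot/maillog >> ${base}/tmpgrep
done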

jthill 10-19-2011 10:40 AM

Quote:

Originally Posted by winairmvs (Post 4502574)
Hello, I am trying to find matching usernames from the passwd file in a syslog for dovecot.

Code:

$ cut -f1 -d: /etc/passwd | grep -F -f- /syslog/dovecot/maillog > $base/tmpgrep
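If only the set of usernames that occur in the log is needed, rather than the log lines themselves, the same single grep pass can print just the matched names. This is a minimal sketch assuming GNU grep (the `-o`, `-w`, `-F` and `-f -` options); `usernames_in_log` is a hypothetical helper name, not something from the thread.

```shell
#!/bin/sh
# Sketch: list which passwd usernames occur anywhere in a log, in ONE grep pass.
# Assumes GNU grep; usernames_in_log is a hypothetical helper name.
# Usage: usernames_in_log PASSWD_FILE LOG_FILE
usernames_in_log() {
    cut -d: -f1 "$1" |      # username = first field of each passwd entry
        grep -v '^$' |      # drop empty lines: an empty pattern would match every line
        grep -owFf - "$2" | # -f -: patterns from stdin; -o: print only the match; -w: whole words
        sort -u             # one line per distinct username
}
```

Usage, with the paths from the original script: `usernames_in_log /etc/passwd /syslog/dovecot/maillog > $base/tmpgrep`. The `-w` flag keeps `bin` from matching inside `binary`, but usernames that are also ordinary words will still be reported.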

crts 10-19-2011 10:58 AM

Quote:

Originally Posted by jthill (Post 4502611)
Code:

$ cut -f1 -d: /etc/passwd | grep -F -f- /syslog/dovecot/maillog > $base/tmpgrep

Hi,

Basically a good idea, but it returns every matching line when a name matches multiple times. From the OP's example I take it that he wants each name printed only once. A small modification:
Code:

cut -f1 -d: /etc/passwd|xargs -I{} grep -m 1 '{}' /syslog/dovecot/maillog > $base/tmpgrep

winairmvs 10-19-2011 11:10 AM

Quote:

Originally Posted by jthill (Post 4502611)
Code:

$ cut -f1 -d: /etc/passwd | grep -F -f- /syslog/dovecot/maillog > $base/tmpgrep

Cool, I didn't know I could do that with grep. Unfortunately, this grep matches every line in the maillog file. I was reading the man page for grep, and the -F option reads:

Treats each specified pattern as a string instead of a regular expression. A NULL string matches every line.

I am assuming it's getting back a null string and matching everything?
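One possible cause, not confirmed in the thread: if the pattern list contains an empty line (for example a blank line in /etc/passwd), that empty pattern matches every line, exactly as the man page says. A small self-contained demonstration with throwaway files, together with the obvious guard of deleting empty lines before handing the list to grep:

```shell
#!/bin/sh
# Sketch: one empty pattern makes grep -F match every line; removing empty
# lines from the pattern list first avoids that. Uses throwaway temp files.
tmp=$(mktemp -d)
printf 'root\n\nalice\n' > "$tmp/patterns"     # note the empty middle line
printf 'user=<alice>\nunrelated line\n' > "$tmp/log"

# -c counts matching lines; the empty pattern matches both log lines
with_blank=$(grep -c -F -f "$tmp/patterns" "$tmp/log")

# Same search after deleting empty lines: only the alice line matches
grep -v '^$' "$tmp/patterns" > "$tmp/clean"
without_blank=$(grep -c -F -f "$tmp/clean" "$tmp/log")

echo "with empty pattern: $with_blank lines, without: $without_blank"
rm -r "$tmp"
```

Applied to jthill's command, that would be something like `cut -f1 -d: /etc/passwd | grep -v '^$' | grep -F -f- /syslog/dovecot/maillog`, assuming the local grep reads patterns from standard input with `-f -`, as GNU grep does.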

winairmvs 10-19-2011 11:25 AM

Quote:

Originally Posted by crts (Post 4502633)
Hi,

Basically a good idea, but it returns every matching line when a name matches multiple times. From the OP's example I take it that he wants each name printed only once. A small modification:
Code:

cut -f1 -d: /etc/passwd|xargs -I{} grep -m 1 '{}' /syslog/dovecot/maillog > $base/tmpgrep

crts, thanks for the updated code. Unfortunately this script runs very slowly, maybe even slower than how I was originally doing it.
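That slowness is expected: `xargs -I{}` runs its command once per input line, so this is still one grep process, and one full read of the maillog, per username. A tiny sketch that makes the one-command-per-line behaviour visible, with `echo` standing in for grep:

```shell
#!/bin/sh
# Sketch: xargs -I{} starts its command once per input line. Here echo stands in
# for grep, so three "usernames" mean three separate command runs; with ~6000
# names that is ~6000 grep processes, each reading the whole maillog again.
runs=$(printf 'alice\nbob\ncarol\n' | xargs -I{} echo "grep would run for {}")
printf '%s\n' "$runs"
count=$(printf '%s\n' "$runs" | grep -c 'grep would run')
echo "commands started: $count"
```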

jthill 10-19-2011 12:40 PM

Yes, I missed the -m1 part, my apologies.

Here's an awk-builder:
Code:

$ cut -f1 -d: /etc/passwd| sed 's,.*,/\\<&\\>/ \&\& !saw["&"] { saw["&"]=1; print },' > findthem
$ awk -f findthem /syslog/dovecot/maillog > $base/tmpgrep

I added word-boundary testing (\< and \>) while fixing it up.

I've tested this with 'awk -f findthem /etc/passwd /etc/passwd'. It works, and it also shows a weakness: some userids are also common words. It prints the root line twice because that line matches both root and bin. It would be easy enough to make it print a line only once no matter how many hits it gets, but that won't help with the false matches in the real logs.

I don't have 6000 users, so I tried it with the apt-cache pkgnames output against the apt logs instead. awk took a few seconds and a few hundred MB to compile 35410 tests, but did the job just fine.
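One way to avoid compiling one regex per name is to load the usernames into an awk array and look up each word of each log line in it, which is a hash lookup rather than a regex test. This is a different technique from the awk-builder above, swapped in for illustration. It is a minimal POSIX awk sketch; the word definition (letters, digits, `_`, `.`, `-`) and the helper name `first_hits` are assumptions.

```shell
#!/bin/sh
# Sketch: one pass over the log; each word is looked up in a hash (awk array)
# of usernames instead of testing one compiled regex per username.
# ASSUMPTION: a "word" is a run of letters, digits, _ . or - ; adjust the
# split() regex if your usernames use other characters.
# Usage: first_hits PASSWD_FILE LOG_FILE   (first_hits is a hypothetical name)
first_hits() {
    awk -F: '
    NR == FNR {                     # first file: passwd
        if ($1 != "") users[$1] = 1
        next
    }
    {                               # second file: the log
        n = split($0, w, /[^A-Za-z0-9_.-]+/)
        printit = 0
        for (i = 1; i <= n; i++) {
            if ((w[i] in users) && !(w[i] in seen)) {
                seen[w[i]] = 1      # this user has now had its first hit
                printit = 1
            }
        }
        if (printit) print          # a line is printed at most once
    }' "$1" "$2"
}
```

Because a line is printed at most once, a line containing both root and bin comes out a single time. The false matches on common words remain.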

jthill 10-19-2011 12:47 PM

... forgot to include the print-a-line-only-once alternative, haste makes waste, I knew that, really ...
Code:

$ echo '{ printit=0 }' > findthem
$ cut -f1 -d: /etc/passwd| sed 's,.*,/\\<&\\>/ \&\& !saw["&"] { saw["&"]=1; printit=1 },' >> findthem
$ echo 'printit { print }' >> findthem
$ awk -f findthem /syslog/dovecot/maillog > $base/tmpgrep
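For the false matches on common words, a narrower option is to look only at the field where Dovecot records the login name. The sketch below assumes the interesting lines contain `user=<name>`, as Dovecot's login lines typically do; check that against the actual maillog before relying on it. `dovecot_first_logins` is a hypothetical name, and the awk is plain POSIX (`match`, `RSTART`, `RLENGTH`).

```shell
#!/bin/sh
# Sketch: only consider the user=<...> field, so usernames that happen to be
# ordinary words elsewhere in a line are ignored.
# ASSUMPTION: relevant lines carry the login name as user=<name>.
# Usage: dovecot_first_logins PASSWD_FILE LOG_FILE   (hypothetical name)
dovecot_first_logins() {
    awk -F: '
    NR == FNR {                                      # first file: passwd
        if ($1 != "") users[$1] = 1
        next
    }
    match($0, /user=<[^>]*>/) {                      # second file: the log
        name = substr($0, RSTART + 6, RLENGTH - 7)   # strip "user=<" and ">"
        if ((name in users) && !(name in seen)) {
            seen[name] = 1                           # first line for this user
            print
        }
    }' "$1" "$2"
}
```

Lines that mention a username only in passing are skipped, at the cost of missing any line that does not use the `user=<...>` form.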


jthill 10-20-2011 02:51 PM

Here's an actually reasonable solution using GNU grep's --color=always.

Here's firstfind.awk:
Code:

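# This awk postprocesses `grep --color=always` output, eliminating duplicate hits
BEGIN{FS="\0"}
{
        # Split on grep's highlight start (ESC[01;31mESC[K) and end (ESC[mESC[K)
        # sequences: f[2], f[4], ... are the matched names, odd pieces the text between
        n=split($0,f,/\033\[(01;31)?m\033\[K/);
        # Print the line only if it holds at least one name not seen before
        printit=0
        for (i=2; i<n; i+=2) {
                if (!seen[f[i]]) {
                        printit=1;
                        break;
                }
        }
        if ( printit ) {
                # Rebuild the line, re-highlighting only first-time hits
                text=f[1]
                for (i=2; i<n; i+=2) {
                        if (!seen[f[i]]) {
                                seen[f[i]]=1;
                                f[i]="\033[01;31m"f[i]"\033[m"
                        }
                        text=text""f[i]""f[i+1]
                }
                print text;
        }
}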

and you feed it like so:
Code:

$ cut -f1 -d: /etc/passwd >userids
$ grep -wFf userids --color=always your-logfile-here | awk -f firstfind.awk

This handles scanning /var/log/apt/* for the first hits on apt-cache pkgnames (35410 names) very nicely.
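When the output is saved to a file such as `$base/tmpgrep` rather than viewed on a terminal, the color escape sequences from `--color=always` (and the ones firstfind.awk writes back) end up in the file. A small sketch for removing them afterwards; `strip_ansi` is a hypothetical helper, and the pattern only targets the `ESC[...m` and `ESC[...K` sequences that grep emits.

```shell
#!/bin/sh
# Sketch: strip the ANSI color sequences left by grep --color=always so a saved
# report is plain text. strip_ansi is a hypothetical helper: stdin -> stdout.
strip_ansi() {
    esc=$(printf '\033')                    # a literal ESC character
    sed "s/${esc}\[[0-9;]*[mK]//g"          # drop ESC[...m and ESC[...K
}
# e.g.: grep -wFf userids --color=always your-logfile-here | awk -f firstfind.awk | strip_ansi > "$base/tmpgrep"
```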


All times are GMT -5.