Need help with log files and browser statistics
I am extremely new to Linux. I'm taking a class on it right now, but I am hopelessly lost in this project we are doing with log files. Our teacher never covered log files in class, so I have basically been looking them up online.
I have to find the number of visitors using MSIE and Firefox, broken down by version. Then I have to get the number of people using any other browser. I can get to the field the browser is listed in with cut -f6 example.log. I have tried several other things using grep, tr, cut, uniq, sort, and wc, but I still can't get just the count of people by browser version. Can anyone please help me? Here are a few lines of the log file we are using:
192.168.28.168 user143 [08/May/2010:09:52:52] "GET /NoAuth/js/scriptaculous/scriptaculous.js?load=effects,controls HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168 user147 [08/May/2010:09:52:52] "GET /NoAuth/js/prototype/prototype.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168 user174 [08/May/2010:09:52:52] "GET /NoAuth/js/ahah.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168 user82 [08/May/2010:09:52:52] "GET /NoAuth/js/titlebox-state.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168 user14 [08/May/2010:09:52:52] "GET /NoAuth/css/validation.css HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0" |
The idea with anything like this is to cut it down into steps, starting simple, and then make it increasingly complex.
First of all, ignoring MSIE, how would you determine how many people had used Firefox? |
Well, I know I would have to get to the field with Firefox in it first, and I'm pretty sure I would use something like cut -d with a single ' ', but I have no clue what I would put in there. All the symbols confuse me, even when I am sitting there with the book open to it. So I figure there would probably be two cut commands in there: one to get to the field, cut -f6, and the other cut -d. I started with grep example.log | cut -f6 | cut -d, but I have no clue what else to put there besides sort and uniq -c.
|
Quote:
Can you figure out either of these? As I said, we will start simple and *then* make things more complex. Always try to break any complex task into simpler steps. |
Okay, so I used grep Firefox example.log | wc -l and it came back with the number 7871.
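A side note on the counting step: grep can also count matching lines itself with -c, which gives the same number as piping to wc -l. Here is a minimal sketch on a made-up three-line sample (the temporary file and its contents are assumptions for illustration, not the class log):

```shell
# Build a tiny sample log (hypothetical data, not the class file).
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.168.28.168 user143 [08/May/2010:09:52:52] "GET /a.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.169 user147 [08/May/2010:09:52:53] "GET /b.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
192.168.28.170 user174 [08/May/2010:09:52:54] "GET /c.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.1.9) Gecko/20100315 Firefox/3.5.9"
EOF

# Count matching lines by piping them to wc -l ...
count_wc=$(grep Firefox "$sample" | wc -l)
# ... or let grep count the matching lines itself with -c.
count_c=$(grep -c Firefox "$sample")

echo "wc -l: $count_wc   grep -c: $count_c"
rm -f "$sample"
```

Both forms count matching lines, i.e. requests in the log, not distinct visitors.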
|
Quote:
Another thing to bear in mind is that there are often many ways to achieve the same result in Linux, due to the strength and versatility of the commands. Ok. Next. Do a simple grep for Firefox but produce only the word Firefox on each line that is produced (i.e. the search term), not the whole line (hint: look at the grep options). |
Okay, finally got that one. It's grep -o Firefox example.log
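To see what -o changes, here is a small sketch that runs grep with and without it on a single user-agent string copied from the sample log above. It assumes a grep that supports -o (GNU and BSD grep do); the variable names are only for illustration.

```shell
# One user-agent string from the sample log lines in the first post.
line='Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0'

# Without -o, grep prints the whole matching line.
whole=$(printf '%s\n' "$line" | grep Firefox)

# With -o, grep prints only the part of the line that matched the pattern.
only=$(printf '%s\n' "$line" | grep -o Firefox)

printf 'whole line: %s\n' "$whole"
printf 'match only: %s\n' "$only"
```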
|
Quote:
There may well be an easier way to do it, but using regular expressions in grep is a good way of going about this. In English, instead of just looking for "Firefox", you want to look for Firefox, followed by a forward slash /, followed by a group of characters that can be a number or a dot. Grep uses regular expressions by default. However it's a better idea to put double quotes around the search term if things get more complicated than a simple alphanumeric expression (it's not necessary in *this particular* case but it is good practice). So,
Code:
grep "Firefox/"
All you have to do now is find out how to use a regular expression search for the number/dot group of characters as well. |
|
A simplified cheat sheet of the regexp operators:
http://ryanstutorials.net/linuxtutor...tsheetgrep.php |
got it
Code:
grep "Firefox/.*" example.log
Or I can still do grep -o Firefox/.* example.log. Now I have to sort it, right? I think I have it:
Code:
grep "Firefox/.*" example.log | sort | uniq -c
|
Quote:
The regular expression you have used looks for Firefox, followed by a forward slash, followed by 0 or more characters. So, it will output everything until the end of the line. Not what we wanted.
If you want to search for a collection of different characters, but don't care what order they're in, you use square brackets []. For example, to search for any alphabetical character, lower or upper case, you use [a-zA-Z] (i.e. 2 ranges of characters).
If you want to search for any numerical digit, you use [0-9]
If you want to search for one of a combination of lowercase characters and digits, you use [a-z0-9]
You can also have other characters in there, e.g. to search for a % or & symbol or the lowercase f, you can use [%&f] (it doesn't matter what order you put these in the square brackets). You can also mix ranges and individual characters in the same square brackets.
But these only look for *one* character from within the square brackets. To look for 0 or more, you put a * afterwards, so []*
Finally, a dot is one of the special characters in regular expressions, so if you want to look for an actual dot then you must "escape" it by putting a backslash in front of it, e.g. \. (that applies outside square brackets; inside them a dot already stands for itself).
You have what you need. Can you come up with the grep command to find Firefox, followed by a forward slash /, followed by a group of characters that can be a number or a dot? |
P.S. Excellent job with the
Code:
sort | uniq -c |
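The difference between .* and a bracket expression can be tried out on other parts of the same user-agent string. The sketch below deliberately uses the Gecko and rv: tokens rather than Firefox, and assumes a grep with -o support; the variable names are made up for the example.

```shell
# One user-agent string from the sample log lines in the first post.
ua='Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0'

# .* is greedy: it matches everything up to the end of the line.
greedy=$(printf '%s\n' "$ua" | grep -o 'Gecko/.*')

# [0-9]* matches only a run of digits, so the match stops at the space.
digits=$(printf '%s\n' "$ua" | grep -o 'Gecko/[0-9]*')

# [0-9.]* allows digits and dots, so it stops at the closing parenthesis.
revision=$(printf '%s\n' "$ua" | grep -o 'rv:[0-9.]*')

# Outside brackets, an escaped dot matches only a literal dot.
literal=$(printf '%s\n' "$ua" | grep -o 'NT 5\.1')

printf '%s\n' "$greedy" "$digits" "$revision" "$literal"
```

Single quotes are used around the patterns here so the shell passes the backslash and brackets to grep unchanged.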
You're right, some of them have the (.NET CLR and numbers after it), but others have Firefox and the version number.
So I did this one instead: grep "Firefox/.[0-9.]* example.log | sort | uniq -c So the MSIE would be about the same, right? Answered my own question, I guess: this one is harder. Was that a , after the /? Firefox/, |
Quote:
Code:
grep "Firefox/.[0-9]* example.log
To properly develop Linux commands and scripts, you need to be sitting in front of a keyboard and screen, testing what you're developing as you go along to see that it works correctly. Edit: I see that you've now changed the grep command in your previous post. By all means cut and paste the output from that revised command. |
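For anyone following the thread later, one possible way to put the pieces together is sketched below. It is not necessarily the solution the class expects. It assumes that Internet Explorer user agents contain "MSIE" followed by a space and the version, and it runs on a small made-up sample file rather than the real example.log. Note that it counts log lines (requests), not distinct visitors.

```shell
# Tiny sample log (hypothetical data, same layout as the lines in the first post).
sample=$(mktemp)
cat > "$sample" <<'EOF'
192.168.28.168 user143 [08/May/2010:09:52:52] "GET /a.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.168 user147 [08/May/2010:09:52:52] "GET /b.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
192.168.28.170 user174 [08/May/2010:09:52:54] "GET /c.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.1.9) Gecko/20100315 Firefox/3.5.9"
192.168.28.171 user82 [08/May/2010:09:52:55] "GET /d.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
192.168.28.172 user14 [08/May/2010:09:52:56] "GET /e.css HTTP/1.1" "http://www.example.com/index.html" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727)"
192.168.28.173 user9 [08/May/2010:09:52:57] "GET /f.css HTTP/1.1" "http://www.example.com/index.html" "Opera/9.80 (Windows NT 5.1; U; en) Presto/2.5.24 Version/10.53"
EOF

# Firefox requests per version: extract "Firefox/<digits and dots>", then count.
firefox_counts=$(grep -o 'Firefox/[0-9.]*' "$sample" | sort | uniq -c)

# MSIE requests per version: assumes the form "MSIE <digits and dots>".
msie_counts=$(grep -o 'MSIE [0-9.]*' "$sample" | sort | uniq -c)

# Everything else: count the lines that match neither pattern.
other_count=$(grep -c -v -e 'Firefox/' -e 'MSIE ' "$sample")

printf 'Firefox:\n%s\n' "$firefox_counts"
printf 'MSIE:\n%s\n' "$msie_counts"
printf 'Other: %s\n' "$other_count"
rm -f "$sample"
```

The sort before uniq -c matters because uniq only merges adjacent identical lines.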