Hi, I'm trying to implement a script for our IT dept. to retrieve the status of the main servers in different departments and notify people (email, pager, etc.) if there's trouble.
I use wget to retrieve the file, since it's an HTML file served by a web server. The parsing needs to be done in a Linux Bash shell script. This is where I'm puzzled: the script needs to look for a combination of keywords and pass the results on to another subroutine for processing (send email, page, etc.). I was wondering if anyone has an idea how to properly parse these requirements:
The keyword I need to look for is the word "alert". Once the script finds it, it needs to pull out the {Department} and then email/page people. I know I can use a combo of sed and awk to find the keyword "alert", but how do I then go back and pull out the {Department}, which is 2 lines above the line for the {Status}?
The problem is that when it finds the word "alert", it needs to "go back up" two lines, skipping the {When} line (we don't care when it happened), to get to the line where the {Department} is. I'm not sure how I should do this.
Also, each <TR> table row's data cells have an alternating BGCOLOR, as you can see, so the bgcolor="red" attribute appears in every other row of data. That adds to the complexity of my parsing. Any idea what the right way to do this is?
Thanks for the advice. Unfortunately I have no control over how I get the input file. The monitoring
is done by a partner, so I can only grab whatever output the web server sends out. So I still
need to figure out how to get from the keyword "alert" back to the {Department}.
Take the "marketing" dept. for example,
I can grep the word "alert" just fine from the line :
<td bgcolor="red"><a class="n1" bgcolor="red">alert</a></td>
(Let's call this LINE 3)
However, now that I know that a department is in trouble, how do I
know which department it is? So now I must retrieve the word
"marketing" from the line :
<td bgcolor="red"><a class="n1" bgcolor="red">marketing</a></td>
(Let's call this LINE 1)
That's not a problem either. I can use a combination of sed and awk
to cut out the word "marketing".
However, since this is the sequence of the lines from the HTML file :
LINE 1 : <td bgcolor="red"><a class="n1" bgcolor="red">marketing</a></td>
LINE 2 : <td bgcolor="red"><a class="n1" bgcolor="red">today</a></td>
LINE 3 : <td bgcolor="red"><a class="n1" bgcolor="red">alert</a></td>
the script is already at LINE 3 when it finds the word
"alert", so there's no way for it to look back at LINE 1 and
tell me that it's the "marketing" department that's in trouble!
That's what I'm trying to figure out.
I'm not familiar with lynx, but I assume what David is suggesting is this:
1. You receive the markup code from wherever
2. You run lynx by pointing it to the local copy of the html you received
3. Lynx processes the document, and spits out text on the console as an interpretation of the file
4. Now that the html is presented as a processed web page, you would use grep and awk on the text output of lynx to find your info, since each table row would be printed on a single line of output.
I may be interpreting that incorrectly, but thought I would mention it.
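A rough sketch of steps 2 to 4, in case it helps. It assumes lynx prints each table row as one line with the department in the first column and the status in the last; I have not checked that against the real page, so look at the raw `lynx -dump` output first. The function name `departments_from_text`, the file name `status.html`, and the "sales"/"ok" row are made up for illustration, and a department name containing spaces would need different handling.

```shell
#!/bin/bash
# Read rendered text (one table row per line) on stdin and print the
# first column of every row whose last column is exactly "alert".
# Assumption: columns are department, when, status, separated by whitespace.
departments_from_text() {
    awk '$NF == "alert" { print $1 }'
}

# Real use would look like this (needs lynx; status.html is a placeholder):
#   lynx -dump -nolist status.html | departments_from_text

# Demo on simulated lynx output, so the sketch runs without lynx installed.
printf '   marketing today alert\n   sales today ok\n' | departments_from_text
```

The demo prints only "marketing", because the made-up "sales" row does not end in "alert".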
Second, another option would be to use grep. Look at the "-B" option for grep. It will give you X lines of context before the matched text. So you could do something like:
$ grep -B 2 "alert" your_html_file
The output would look something like this:
<td bgcolor="red"><a class="n1" bgcolor="red">marketing</a></td>
<td bgcolor="red"><a class="n1" bgcolor="red">today</a></td>
<td bgcolor="red"><a class="n1" bgcolor="red">alert</a></td>
--
<td bgcolor="blue"><a class="n1">shipping</a></td>
<td bgcolor="blue"><a class="n1">today</a></td>
<td bgcolor="blue"><a class="n1">alert</a></td>
--
You could then do any number of other things. You could use sed/awk by themselves to pick out every fourth line (the department line of each group, counting the "--" separator), or you could pipe the data into wc to count the lines of output and then use combinations of the head and tail commands to mask off everything but one line to process at a time, or whatever works.
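If you would rather do the "go back two lines" part in awk alone, here is a minimal sketch. It keeps the previous two lines in variables and prints the older one whenever the current line holds the alert cell. It assumes one <td> per line in the order department, when, status, as in the sample lines; the function name `dept_lines_for_alerts` and the "sales"/"ok" rows are made up for illustration.

```shell
#!/bin/bash
# Print the line that sits two lines above each line containing ">alert<".
dept_lines_for_alerts() {
    awk '
        />alert</ { if (NR > 2) print prev2 }   # prev2 is the line two back
        { prev2 = prev1; prev1 = $0 }           # slide the two-line window
    ' "$1"
}

# Demo file: one department in alert, one (hypothetical) department that is fine.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<td bgcolor="red"><a class="n1" bgcolor="red">marketing</a></td>
<td bgcolor="red"><a class="n1" bgcolor="red">today</a></td>
<td bgcolor="red"><a class="n1" bgcolor="red">alert</a></td>
<td bgcolor="blue"><a class="n1">sales</a></td>
<td bgcolor="blue"><a class="n1">today</a></td>
<td bgcolor="blue"><a class="n1">ok</a></td>
EOF

dept_lines_for_alerts "$sample"
```

This prints the full marketing line, markup included, so the department name still has to be cut out of it afterwards.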
Last edited by Dark_Helmet; 06-08-2004 at 07:18 PM.
That would give you exactly what you want: a list of each department that received an alert, one on each line.
The -B option to the first grep gives you two lines of previous context from each alert you have in the file.
The second grep takes that output and filters out every line that does not include the name of a recognized department. This command will get lengthy if you have many, many different departments. This will reduce the output to the html lines that contain departments that received an alert.
The first cut removes the html markup on the line all the way to the beginning of the department name's text.
The second cut removes the rest of the markup following the department's name.
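A pipeline matching the four steps described above might look like the sketch below. The exact delimiters and field numbers are my assumption, worked out from the sample lines (each cell looks like `<td ...><a ...>NAME</a></td>`), and the department list only covers marketing and shipping. The function name `alert_departments` is made up.

```shell
#!/bin/bash
# List every department that has an alert, one per line.
alert_departments() {
    grep -B 2 '>alert<' "$1" |               # alert line plus the two lines before it
        grep -E '>(marketing|shipping)<' |   # keep only lines naming a known department
        cut -d '>' -f 3 |                    # drop the markup in front of the name
        cut -d '<' -f 1                      # drop the markup after the name
}

# Demo file built from the sample output shown earlier in the thread.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<td bgcolor="red"><a class="n1" bgcolor="red">marketing</a></td>
<td bgcolor="red"><a class="n1" bgcolor="red">today</a></td>
<td bgcolor="red"><a class="n1" bgcolor="red">alert</a></td>
<td bgcolor="blue"><a class="n1">shipping</a></td>
<td bgcolor="blue"><a class="n1">today</a></td>
<td bgcolor="blue"><a class="n1">alert</a></td>
EOF

alert_departments "$sample"
```

On the demo file this prints "marketing" and "shipping" on separate lines, which could then be fed to whatever sends the email or page.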
---
Of course, you could substitute your own sed/awk stuff instead of the two cuts. Using cut is simple, but it does not cope well with changes in the format of the input lines. For instance, if anyone decided to make the department name bold, change the font size, or whatever, the cut commands would return useless data (the name of the new tag inserted).
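As one possible sed substitute for the two cuts, the sketch below deletes every tag on the line instead of counting fields, so an extra <b> or <font> around the name does not matter. It still assumes a tag never spans more than one line and that the department list is just marketing and shipping; the function names `strip_tags` and `alert_departments_sed` are made up.

```shell
#!/bin/bash
# Remove every <...> tag from each line, then trim surrounding whitespace.
strip_tags() {
    sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Same idea as the grep/cut pipeline, with sed doing the cleanup.
alert_departments_sed() {
    grep -B 2 '>alert<' "$1" |
        grep -E '>(marketing|shipping)<' |
        strip_tags
}

# Demo file: the marketing name is wrapped in <b>, which would confuse cut.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<td bgcolor="red"><a class="n1" bgcolor="red"><b>marketing</b></a></td>
<td bgcolor="red"><a class="n1" bgcolor="red">today</a></td>
<td bgcolor="red"><a class="n1" bgcolor="red">alert</a></td>
<td bgcolor="blue"><a class="n1">shipping</a></td>
<td bgcolor="blue"><a class="n1">today</a></td>
<td bgcolor="blue"><a class="n1">alert</a></td>
EOF

alert_departments_sed "$sample"
```

The demo still prints "marketing" and "shipping", even though the first name has an extra tag around it.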
Oh, now I see what David might have meant.
In that case I might try lynx, since it takes away the problem of
analyzing the ever-changing HTML tags that could come out
of the web status server.
But thanks for the detailed idea on the -B option and the cut command.
I haven't used cut all that much. I will try your way and see.
Thanks a lot!
Originally posted by fnd:
Oh, now I see what David might have meant.
In that case I might try lynx, since it takes away the problem of
analyzing the ever-changing HTML tags that could come out
of the web status server.
That's right, although it doesn't just strip out the tags: it reads them and formats the document accordingly, so there will be one line for each department (since each department's cells all appear in one table row in the html code).