[SOLVED] bash script to dynamically edit an html file
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
Hey all, I'm having a bit of a problem with a script I'm trying to write. I'll try to give as many details as possible without overwhelming anyone with huge code blocks....
Essentially what I want the script to do is edit an html file based off of the contents of that same file. I'll give an example. FYI, for any of you that are familiar with the html pages that nessus creates, this should look familiar.
I've been (no pun intended) bashing my head against this for several days now and I haven't had any real luck. I can get the basics, sed replaces and sed appends, but the compare and replace is killing me.
Until now, I had been working on the premise of stripping the ip and hostname from stanza 2 and putting them in a separate file (let's call it hostnames.txt), and then running some sort of nested loop that would compare the ip in the first line of hostnames.txt with each line of the nessus.html file. If it found a match, it would attempt a sed replace and then an append based on what those lines look like. I assumed that no other lines would match, and therefore no replace or append would take place until it found the appropriate line. Unfortunately, the nested while loops didn't work, so I've tried rewriting the script multiple times in different ways but nothing is working for me.
I'm relatively new to any bash script longer than 15 lines or so, so I would appreciate any "pointing in the right direction" that anyone can offer.
Hey all, I'm having a bit of a problem with a script I'm trying to write. I'll try to give as many details as possible without overwhelming anyone with huge code blocks....
Essentially what I want the script to do is edit an html file based off of the contents of that same file. I'll give an example. FYI, for any of you that are familiar with the html pages that nessus creates, this should look familiar.
I've been (no pun intended) bashing my head against this for several days now and I haven't had any real luck. I can get the basics, sed replaces and sed appends, but the compare and replace is killing me.
Until now, I had been working on the premise of stripping the ip and hostname from stanza 2 and putting them in a separate file (let's call it hostnames.txt), and then running some sort of nested loop that would compare the ip in the first line of hostnames.txt with each line of the nessus.html file. If it found a match, it would attempt a sed replace and then an append based on what those lines look like. I assumed that no other lines would match, and therefore no replace or append would take place until it found the appropriate line. Unfortunately, the nested while loops didn't work, so I've tried rewriting the script multiple times in different ways but nothing is working for me.
I'm relatively new to any bash script longer than 15 lines or so, so I would appreciate any "pointing in the right direction" that anyone can offer.
Thanks!
Mike
And why are you trying to do this in 'bash' in the first place ?
The book method is:
parse (i.e. convert into a data structure);
modify the data structure;
reconstitute from the modified data structure.
For example, Perl has had HTML parser modules for years, so using an HTML parser in Perl you can do the job.
I was doing this in bash for a couple reasons. 1. as little as I know bash, I know perl even less. I can usually "Forest Gump" my way through a bash script, but I know absolutely zero perl, and for the purposes of this project, I don't think I have time to learn it. And 2. None of the other guys who maintain our systems are perl-savvy either, so if the script needed maintenance, it could conceivably be a hassle.
That said, when someone mentions an 'html parser', I tend to think of a piece of code that is 'html-aware'. i.e. it knows what opening and closing tags are, it knows how to pull hyperlinks out if asked, etc. I was treating this as just text parsing. Some of the text happens to be <'s and >'s, but it's all just text, right?
...
That said, when someone mentions an 'html parser', I tend to think of a piece of code that is 'html-aware'. i.e. it knows what opening and closing tags are, it knows how to pull hyperlinks out if asked, etc. I was treating this as just text parsing. Some of the text happens to be <'s and >'s, but it's all just text, right?
Am I thinking about this incorrectly?
Yes, this is what an HTML parser is. I.e. it recognizes HTML constructs as they are defined in the standard.
Last edited by Sergei Steshenko; 04-14-2010 at 01:40 PM.
In reality, I only need to search the file for two types of lines. One to strip out the hostnames and ip's, and one to search for lines to replace/append. The first search is done and tested. It's the second search that's causing me problems when I try to loop it.
So to my understanding, while this is an html file that I'm parsing, it really has very little to do with html itself and more to do with parsing strings. And bash should be more than capable, yes?
So, since I'm not necessarily concerned with HTML constructs, I should be able to do this with a few well placed greps and seds, right?
...
Wrong. HTML is not line-oriented format. I.e. what one day is located on one line can another day be spread on several lines, but the meaning will and thus the way the original HTML page is rendered will stay the same.
So eventually, I'll rewrite this script into a language that makes more sense (probably python as that's the direction my shop is taking).
But for now... Can anyone assist me in doing this in bash? Let's assume for the sake of this argument that the html won't change from what I've posted in this thread.
Wrong. HTML is not line-oriented format. I.e. what one day is located on one line can another day be spread on several lines, but the meaning will and thus the way the original HTML page is rendered will stay the same.
Although perl/php is preferred; it's not impossible to make HTML pages "dynamic" with any shell...
I have many pages written dynamically in ksh for our sites...
Hey all, I'm having a bit of a problem with a script I'm trying to write. I'll try to give as many details as possible without overwhelming anyone with huge code blocks....
Essentially what I want the script to do is edit an html file based off of the contents of that same file. I'll give an example. FYI, for any of you that are familiar with the html pages that nessus creates, this should look familiar.
I've been (no pun intended) bashing my head against this for several days now and I haven't had any real luck. I can get the basics, sed replaces and sed appends, but the compare and replace is killing me.
Until now, I had been working on the premise of stripping the ip and hostname from stanza 2 and putting them in a separate file (let's call it hostnames.txt), and then running some sort of nested loop that would compare the ip in the first line of hostnames.txt with each line of the nessus.html file. If it found a match, it would attempt a sed replace and then an append based on what those lines look like. I assumed that no other lines would match, and therefore no replace or append would take place until it found the appropriate line. Unfortunately, the nested while loops didn't work, so I've tried rewriting the script multiple times in different ways but nothing is working for me.
I'm relatively new to any bash script longer than 15 lines or so, so I would appreciate any "pointing in the right direction" that anyone can offer.
Thanks!
Mike
What have you written so far? Can you post your code?
I agree wholeheartedly grail. Unfortunately, that's exactly the problem I'm having. I need the script to look through each line of File 1 until it finds a line like:
and then strip out the ip address, compare it to the hostnames.txt file and pull out the hostname that corresponds to that ip (they'd be on the same line). Then the script would need to do an append of the second line depending on what hostname it found.
Should I be looking at awk for that functionality?
And the change from 60 to 30 isn't an issue for the purposes of this thread. I can do that with sed pretty quickly. For the sake of non-complexity, we can say that I'd like the end result to go from this;
Wow. Thanks grail. I haven't tried this out yet, as I may need some help figuring out where my filenames go. Do I just replace "host" and "html" at the end of the script with my hostnames.txt and nessus.html file, respectively? If so, does this script just output to stdout? That's fine if it does, I can always redirect, I just want to understand what this script is doing.
Google tells me what gsub and gensub are, but what is "pc" and "match"? Are those just variables? Or I guess in this case, maybe arrays?
Again, I'm just trying to understand what it is that the script does. I'd hate to walk away from this with a cool awk one-liner, but no knowledge of how it works.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.