LinuxQuestions.org
Old 07-18-2008, 01:41 PM   #1
tekmann33
Member
 
Registered: Nov 2006
Posts: 188

Rep: Reputation: 30
simple bash script editing


This is probably a simple thing, but my experience with bash is limited.

I want to automatically edit a bunch of HTML files when they are generated every month. Here are the basic criteria for the edits:

1. For every table row <TR> that contains the string "Total Files", delete the entire table row.

2. For every occurrence of the string "Hostname", replace it with "Connections".

3. For every string of the form "Top * of * Total URLs", replace it with "Databases". (The * are automatically generated numbers.)
 
Old 07-18-2008, 01:44 PM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 18,336

Rep: Reputation: 3895
Yes, it is a simple thing. Look at the man page for sed; it should give you what you need.
 
Old 07-18-2008, 02:03 PM   #3
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 738
Personally, I would not try to learn SED from the man page. Go here for an excellent tutorial: http://www.grymoire.com/Unix/Sed.html

The problem you describe is non-trivial in sed if the <TR> structure spans multiple lines. (By default, sed works on one line at a time.)

When you say "every string that...", it is ambiguous. You have to be able to define where the string starts and stops.
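
One way to pin down where each string starts and stops is to treat the targets as substrings within a line. This is only a sketch: it assumes the generated numbers are plain digits and that every occurrence of "Hostname" should change. The function name rename_labels is made up for illustration.

```shell
#!/bin/bash
# rename_labels: hypothetical helper; reads HTML on stdin, writes the result to stdout.
rename_labels() {
  # -E enables extended regular expressions, so + means "one or more".
  sed -E \
    -e 's/Hostname/Connections/g' \
    -e 's/Top [0-9]+ of [0-9]+ Total URLs/Databases/g'
}

# Example run on two sample lines (sample data, not from the real reports).
printf '%s\n' '<th>Hostname</th>' '<th>Top 10 of 250 Total URLs</th>' | rename_labels
```

If the numbers can contain separators such as commas, the [0-9]+ parts would need to be widened accordingly.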

Look also at awk. The Grymoire site has a good tutorial on that as well. In addition, you may want to look at "Bash Guide for Beginners" and "The Advanced Bash Scripting Guide", both free at http://tldp.org
 
Old 07-18-2008, 03:34 PM   #4
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
Processing HTML with sed is a bit tricky, because you can't really know how it will be formatted. sed is line-oriented. If you can be sure that your HTML will be something like this:

Code:
...
<tr><td>something</td><td>else</td></tr>
<tr><td>Total files</td><td>17</td></tr>
...
...then it's pretty easy - you can just drop the lines which start with <tr> and have "Total files" on them.
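
For that one-row-per-line layout, a rough sketch could look like the following. It assumes each <tr>...</tr> sits entirely on one line and that the tag has no attributes; drop_total_rows is a made-up name.

```shell
#!/bin/bash
# drop_total_rows: hypothetical helper; only valid when every row is on a single line.
drop_total_rows() {
  # Delete lines that start (after optional whitespace) with <tr> or <TR>
  # and mention "Total files" or "Total Files" somewhere later on the line.
  sed '/^[[:space:]]*<[Tt][Rr]>.*Total [Ff]iles/d'
}

# Example run on sample data.
printf '%s\n' \
  '<tr><td>something</td><td>else</td></tr>' \
  '<tr><td>Total files</td><td>17</td></tr>' | drop_total_rows
```

The bracket expressions cover both spellings seen in this thread without relying on sed extensions for case-insensitive matching.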

However, consider this:
Code:
...
<tr>
  <td>something</td>
  <td>else</td>
</tr>
<tr>
  <td>Total files</td>
  <td>17</td>
</tr>
...
It's the same thing as far as HTML is concerned, but a simple line-by-line sed filter won't cut it, as no single line contains the whole row.

For sure you can write a program with awk or perl which can do it, but it's annoyingly tricky for something so apparently simple. If you can remove the data before it is turned into HTML, it would probably be easier and more robust.

If not, you might want to consider using some HTML parsing library.
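
As an illustration of the awk route, here is a minimal sketch that buffers each row and decides at the closing tag whether to print it. It assumes <tr> tags carry no attributes, rows are not nested, and nothing else shares a line with a row's opening or closing tag. The function name is made up.

```shell
#!/bin/bash
# drop_total_rows_multiline: hypothetical helper; HTML on stdin, filtered HTML on stdout.
drop_total_rows_multiline() {
  awk '
    # Start buffering when a row opens.
    /<[Tt][Rr]>/ && !in_row { in_row = 1; buf = "" }
    in_row {
      buf = buf $0 "\n"
      if ($0 ~ /<\/[Tt][Rr]>/) {
        # Row is complete: print it unless it mentions the unwanted label.
        if (buf !~ /Total [Ff]iles/) printf "%s", buf
        in_row = 0
      }
      next
    }
    { print }   # lines outside any row pass through unchanged
  '
}
```

It would be used as a filter, for example `drop_total_rows_multiline < report.html > report.new.html`, where both file names are placeholders. It is still pattern matching rather than real HTML parsing, so unusual markup can defeat it.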
 
Old 07-18-2008, 08:22 PM   #5
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 728

Rep: Reputation: 74
Hi.

Here is one way to delete a table row:
Code:
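#!/usr/bin/perl

# @(#) p1       Demonstrate delete of line-spanning HTML table row.

use warnings;
use strict;

# Read the whole input (the file named on the command line, or stdin)
# into a single scalar.
my ($entire) = slurp();

# Delete every row whose first cell is "Total files".
# /s lets "." match newlines, so a row may span several lines;
# /g removes all matching rows rather than only the first one.
$entire =~ s|<tr>\s*<td>Total files</td>.*?</tr>||gs;
print $entire;

sub slurp {

  # Best practices, p213 for a file: with $/ undefined locally,
  # <> returns everything in one read.
  my $scalar = do { local $/; <> };
  return $scalar;
}

exit(0);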
Driving this with a short shell script:
Code:
#!/bin/bash -

# @(#) s1       Demonstrate match across lines.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version =o $(_eat $0 $1) perl tidy
set -o nounset
echo

# Use the first argument as the data file, defaulting to "data1".
FILE=${1-data1}

echo " Data file $FILE:"
cat "$FILE"

echo
echo " Results:"
./p1 "$FILE" |
tidy -i -q

exit 0
To produce:
Code:
% ./s1

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0
perl 5.8.4
HTML Tidy for Linux/x86 released on 1st August 2004

 Data file data1:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1st August 2004), see www.w3.org">
<title>Stuff</title>
</head>
<body>
<table summary = "This is what is in this table">
<tr>
<td>something</td>
<td>else</td>
</tr>
<tr>
<td>Total files</td>
<td>17</td>
</tr>
</table>
</body>
</html>

 Results:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">

<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for Linux/x86 (vers 1st August 2004), see www.w3.org">

  <title>Stuff</title>
</head>

<body>
  <table summary="This is what is in this table">
    <tr>
      <td>something</td>

      <td>else</td>
    </tr>
  </table>
</body>
</html>
The entire file is read into a scalar, then the specific row is deleted, and whatever remains is written out.

The HTML was cleaned up on output with tidy.

See appropriate man pages for details ... cheers, makyo
 
  

