Formatting Fields and Text Displayed from a Text File

devUnix · 02-09-2011, 07:04 AM

I want to display the contents of a particular log file (a plain text file on Linux). The problem is that the contents need to be organized in a fixed format. Have a look at this log file:

sampleLog.txt

Code:

User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!
Remarks: None

So, when displaying the contents of the above file on a web page, I want to format the field names found in the log file: "User Name:", "Reported Problems Description:", and "Remarks:". Each field may hold a variable amount of text, and the fields are not guaranteed to appear on any particular line number.

Any ideas?

The desired output should look like this (with the field names in bold):

User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!
Remarks: None

Well, what I am trying to do may sound weird to some of you. The field "Reported Problems Description:" can possibly contain text that itself includes a colon (:).
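One way to keep colons inside a value from being treated as field separators is to bold only the field names you already know, instead of whatever precedes the first colon. A minimal sketch, assuming one field per line and a sed that accepts -E (GNU and BSD both do); FIELDS and log_to_html are hypothetical names, and the names in FIELDS must not contain regex metacharacters:

```shell
#!/bin/sh
# Hypothetical list of known field names, joined as an ERE alternation.
FIELDS='User Name|Reported Problems Description|Remarks'

log_to_html() {
    # 1. Escape &, < and > so text in the log cannot inject markup.
    # 2. Bold a known field name only when it starts the line, so a
    #    colon later in the value is left alone.
    # 3. End every line with <br /> so line breaks survive in the browser.
    sed -E \
        -e 's#&#\&amp;#g' \
        -e 's#<#\&lt;#g' \
        -e 's#>#\&gt;#g' \
        -e "s#^(${FIELDS}):#<b>\1:</b>#" \
        -e 's#$#<br />#'
}
```

Usage: `log_to_html < sampleLog.txt > sampleLog.html`. The original log is only read, never modified.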

grail · 02-09-2011, 08:07 AM

You might want to explain what formatting you wish to perform as I do not see a difference apart from the bold text?

MTK358 · 02-09-2011, 08:32 AM

Code:

sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#.*#&<br />#' logfile

Will turn it into valid HTML and make the part before the first ":" in the line bold.

devUnix · 02-09-2011, 08:33 AM

Quote:

Originally Posted by grail

You might want to explain what formatting you wish to perform as I do not see a difference apart from the bold text?

Yes, that "bold" formatting is exactly what I want to do (as of now).

devUnix · 02-09-2011, 08:56 AM

Quote:

Originally Posted by MTK358

Code:

sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#.*#&<br />#' logfile

Will turn it into valid HTML and make the part before the first ":" in the line bold.

That works, but it modifies the source log file itself. Of course, that is what "-i" does. There is another problem, though: when I display the contents of the log file on a page, some extra tags are displayed literally.

For example: <User Name

Your sed script does turn the field names bold, and the rest of the text is unaffected when displayed on a web page. But, as I said, some extra tags are displayed along with it. I have already closed the web page; otherwise I would show you the exact output, but it is similar to the example I gave above.

MTK358 · 02-09-2011, 09:02 AM

Did you put <html> tags around the output and save it with a .html extension?

It works great for me:

Code:

<b>User Name:</b> XYZ<br />
<b>Reported Problems Description:</b> Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!<br />
<b>Remarks:</b> None<br />

crts · 02-09-2011, 09:09 AM

Quote:

some extra tags are being displayed as they are.

For example: <User Name

Did you run the script twice on the same file? Omit the -i option and redirect it instead:

Code:

sed -e .... logfile > logfile.html

Do this with an unmodified copy of logfile. Firefox displays it even without the wrapping '<html>' tags.
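crts's advice can be packaged as a small script that only reads the log. A minimal sketch, assuming a POSIX shell and a sed that accepts -E (used here in place of GNU -r); mtk_sed wraps MTK358's expressions without -i, and the <html><body> wrapper in the hypothetical log_to_page function is an assumption, since crts notes Firefox renders the fragment without it:

```shell
#!/bin/sh
# MTK358's sed expressions, minus -i, reading stdin and writing stdout.
mtk_sed() {
    sed -E -e 's#&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' \
        -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#.*#&<br />#'
}

# Hypothetical wrapper: emits a complete page, leaves the log untouched.
log_to_page() {
    printf '<html><body>\n'
    mtk_sed < "$1"
    printf '</body></html>\n'
}
```

Usage: `log_to_page logfile > logfile.html`. If mtk_sed is fed its own output, the `<b>` tags from the first pass come back as `&lt;b&gt;`...`&lt;/b&gt;`, which would explain literal tags appearing in the browser.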

devUnix · 02-09-2011, 11:04 AM

Quote:

Originally Posted by MTK358

Did you put <html> tags around the output and save it with a .html extension?

Sed Script as given:

Code:

-bash-2.05b# cat test.sh
#!/bin/bash
sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+)(:.*)$#<b>\1</b>\2#' 09022011-IT.log
-bash-2.05b#

The log file after the sed script has processed it:

Code:

-bash-2.05b# cat 09022011-IT.log
&lt;b&gt;User ID&lt;/b&gt;: XYZ
&lt;b&gt;Group&lt;/b&gt;: IT
&lt;b&gt;Shift Time&lt;/b&gt;: First
&lt;b&gt;Problems Reported &lt;/b&gt;:
None
-bash-2.05b#

PHP Script I have written:

Code:

-bash-2.05b# cat test.php
<?php
$FH=fopen("09022011-RA.log","r") or die("Error");
$result = fread($FH,1024);
fclose($FH);
echo $result;
?>

Output in IE (Web Browser):

Code:

<b>&lt;b&gt;User ID&lt;/b&gt;</b>: XYZ <b>&lt;b&gt;Group&lt;/b&gt;</b>: IT <b>&lt;b&gt;Shift Time&lt;/b&gt;</b>: First <b>&lt;b&gt;Problems Reported&lt;/b&gt;</b>: None

Nominal Animal · 02-09-2011, 01:01 PM

Since you use PHP, try the following.

This reads all /path/to/it-logs/DDMMYYYY-IT.log files (in their raw form unprocessed by any scripts), sorts them latest first, then outputs them each in a separate table:

Code:

<html>
 <head>
  <title>
   Example problem report
  </title>
  <style type="text/css">

    table.report {
        width: 40em !important;
        padding: 0 0 0 0;
        border: 1px solid #cccccc;
        margin: 0 0 2em 0;
        border-collapse: collapse;
        border-spacing: 0;
    }

    table.report td {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border: 0 none;
        margin: 0 0 0 0;
        text-align: left;
        vertical-align: top;
        font-weight: normal;
    }

    table.report th {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border: 0 none;
        margin: 0 0 0 0;
        text-align: right;
        vertical-align: top;
        font-weight: bold;
    }

    table.report th.title {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border-top: 0 none;
        border-right: 0 none;
        border-bottom: 1px solid #cccccc;
        border-left: 0 none;
        background: #efefef;
        text-align: center;
        vertical-align: middle;
        font-weight: bold;
    }

  </style>
 </head>
 <body>
  <?PHP

    $files = glob('/path/to/it-logs/*-IT.log', GLOB_NOSORT);
    if ($files !== FALSE) {

        $temp = $files;
        $files = array();
        foreach ($temp as $logfile) {
            $index = preg_replace('/^.*\/([0-3][0-9][0-1][0-9][0-9][0-9][0-9][0-9]).*$/', '$1', $logfile);
            $time = mktime(0,0,0, intval(substr($index, 2, 2), 10),
                                  intval(substr($index, 0, 2), 10),
                                  intval(substr($index,4,4), 10));
            $files[$time] = $logfile;
        }
        krsort($files);

        foreach ($files as $time => $logfile) {

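            // FILE_SKIP_EMPTY_LINES only takes effect together with FILE_IGNORE_NEW_LINES
            $data = @file($logfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);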
            if ($data !== FALSE) {
                $title = date('D, j M Y', $time);
                echo " <table class=\"report\">\n";
                echo " <tr>\n";
                echo " <th class=\"title\" colspan=\"2\">", htmlentities($title, ENT_QUOTES, 'UTF-8'), "</th>\n";
                echo " </tr>\n";
                foreach ($data as $entry) {
                    @list($key, $value) = @explode(':', $entry, 2);
                    echo " <tr>\n";
                    echo " <th>", htmlentities(trim($key), ENT_COMPAT, 'UTF-8'), "</th>\n";
                    echo " <td>", htmlentities(trim($value), ENT_COMPAT, 'UTF-8'), "</td>\n";
                    echo " </tr>\n";
                }
                echo " </table>\n";
            }
        }
    }
?>
 </body>
</html>

glob() finds all file names that match the pattern.
The first foreach loop parses each file name, /path/to/it-logs/DDMMYYYY-IT.log, creates a Unix timestamp (seconds since the epoch) from the date, and adds the file name to a new array keyed by that timestamp.
krsort() sorts the file names by timestamp, latest first.
The second foreach loop processes each log file. The file() function reads the file as an array of lines.
The inner foreach loop processes each line of the current log file. The explode() function splits the line into two parts at the first colon (':').
The htmlentities() function is used to display the strings correctly in HTML. Since you're using Linux, your log files are most likely UTF-8, so I used that.
The rest is just pretty-printing. trim() removes leading and trailing whitespace; it is invisible in the rendered page anyway, but the output HTML looks tidier.
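For comparison, the glob()/krsort() ordering can be reproduced in shell by turning each DDMMYYYY name into a YYYYMMDD key that sorts naturally as a string. A minimal sketch; sort_logs_newest_first is a hypothetical name, and unlike the PHP it skips names that do not match the pattern. In the PHP, such names get a meaningless timestamp, and all of them in the same directory end up on the same array key, which is one possible reason for seeing fewer tables than files:

```shell
#!/bin/sh
# Print matching log paths newest first; names not of the form
# DDMMYYYY-IT.log are dropped instead of receiving a bogus date.
sort_logs_newest_first() {
    printf '%s\n' "$@" | awk '
        {
            n = split($0, parts, "/")
            base = parts[n]
            if (base !~ /^[0-3][0-9][01][0-9][0-9][0-9][0-9][0-9]-IT\.log$/) next
            # DDMMYYYY -> YYYYMMDD, so a plain reverse sort orders by date
            key = substr(base, 5, 4) substr(base, 3, 2) substr(base, 1, 2)
            print key "\t" $0
        }' | sort -r | cut -f2-
}
```

Usage: `sort_logs_newest_first /path/to/it-logs/*-IT.log`.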

You'll probably notice that since the file name array is keyed by the Unix timestamp, you can easily add query parameters limiting the output to desired dates. I recommend using the strtotime() function to parse any query start and end date parameters.
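For date filtering, comparing YYYYMMDD keys as numbers is enough once both ends of the range are normalized. A minimal sketch, assuming inclusive bounds given as YYYYMMDD; logs_in_range is a hypothetical name. On GNU systems, `date -d '2011-02-09' +%Y%m%d` prints 20110209, which can play the role strtotime() plays in the PHP:

```shell
#!/bin/sh
# Usage: logs_in_range FROM TO FILE...   (FROM and TO as YYYYMMDD, inclusive)
# Prints the matching paths in the order they were given.
logs_in_range() {
    from=$1 to=$2
    shift 2
    printf '%s\n' "$@" | awk -v lo="$from" -v hi="$to" '
        {
            n = split($0, parts, "/")
            base = parts[n]
            if (base !~ /^[0-3][0-9][01][0-9][0-9][0-9][0-9][0-9]-IT\.log$/) next
            # DDMMYYYY -> YYYYMMDD, compared numerically against the bounds
            key = substr(base, 5, 4) substr(base, 3, 2) substr(base, 1, 2)
            if (key + 0 >= lo + 0 && key + 0 <= hi + 0) print
        }'
}
```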

Hope this helps,

Nominal Animal

MTK358 · 02-09-2011, 01:19 PM

@devUnix

Could you post part of the actual log file?

devUnix · 02-10-2011, 02:29 PM

Quote:

Originally Posted by MTK358

@devUnix

Could you post part of the actual log file?

Yeah, sure. Here it is:

cat 09022011-IT.log (Original Log File)

User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.

devUnix · 02-10-2011, 02:41 PM

Quote:

Originally Posted by Nominal Animal

Hope this helps,

Nominal Animal

I appreciate your effort and willingness to come up with solutions. I will try out your script and let you know how it works for me.

The main question is how to extract pieces of information from a text file that contains varied fields. The good news is that we know the field names, or can identify them, so we can present an enhanced view of the information to the user. In the present example, the text file does not use any particular character or whitespace as a delimiter between fields.

If there is a better way of organizing these items in a text file, please let me know. I am simply putting each item on a separate line, but some items can span more than one line.
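One common layout for multi-line values is the folding convention used by e-mail headers: a line that starts with a space or tab continues the previous field, so colons inside the text no longer matter. A minimal sketch of a parser for that layout, assuming the logs are written that way (they currently are not); unfold_fields is a hypothetical name, and it prints one Name<TAB>Value record per field:

```shell
#!/bin/sh
# Join indented continuation lines onto the preceding "Name: value" line.
# Non-indented lines without a colon are ignored by this sketch.
unfold_fields() {
    awk '
        /^[ \t]/ && have {            # continuation: append to current value
            sub(/^[ \t]+/, "")
            value = value " " $0
            next
        }
        index($0, ":") {              # a new "Name: value" line
            if (have) print name "\t" value
            name = substr($0, 1, index($0, ":") - 1)
            value = substr($0, index($0, ":") + 1)
            sub(/^[ \t]+/, "", value)
            have = 1
        }
        END { if (have) print name "\t" value }
    '
}
```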

MTK358 · 02-10-2011, 02:59 PM

Quote:

Originally Posted by devUnix

I am simply putting each item on a separate line but some of the items can consist of more than one line.

You mean in the actual log file?

Then post a sample, because my solution relies on the idea that there's one "name: value" statement per line.

devUnix · 02-10-2011, 03:09 PM

Quote:

Originally Posted by Nominal Animal

Since you use PHP, try the following.

Okay, it partly works. First, there are 5 log files, but it displays only 3 of them. Second, sometimes the value is displayed in the left column along with the field name:

Reported Problems

Nothing works fine in this company. Blah sdjfkhsd kfhsd fh sdjkfhs djkfh sdjkf sd fsh dfkjh sdkfj sdjkh fjkd sfjk sdkf sd fsd fsdfNone

That way, both the field and the value are in bold. Fields whose values fit on one line are displayed properly.

Still, you have done very well, I must say, and you have provided a good layout for the report. Thank you!
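The left-column problem follows from the per-line explode(): a continuation line usually has no colon, so the whole line lands in $key and is printed in the bold <th> cell. One fix is to start a new field only on a line that begins with a known field name and to append every other non-blank line to the current value. A minimal sketch of that idea in awk, producing rows in the same markup as the PHP; KNOWN and log_to_rows are hypothetical names, and any text before the first known field is dropped:

```shell
#!/bin/sh
# Hypothetical list of known field names (ERE alternation).
KNOWN='User ID|Group|Shift Time|Problems Reported'

log_to_rows() {
    awk -v known="$KNOWN" '
        function esc(s) {             # minimal HTML escaping
            gsub(/&/, "\\&amp;", s); gsub(/</, "\\&lt;", s); gsub(/>/, "\\&gt;", s)
            return s
        }
        function flush() {            # print the field collected so far
            if (name != "")
                printf "   <tr>\n    <th>%s</th>\n    <td>%s</td>\n   </tr>\n", esc(name), esc(value)
        }
        BEGIN { start = "^(" known "):" }
        $0 ~ start {                  # a known field begins on this line
            flush()
            name = substr($0, 1, index($0, ":") - 1)
            value = substr($0, index($0, ":") + 1)
            sub(/^[ \t]+/, "", value)
            next
        }
        name != "" && NF {            # anything else continues the value
            value = (value == "" ? $0 : value " " $0)
        }
        END { flush() }
    '
}
```

Usage: `log_to_rows < 09022011-IT.log`, wrapped in the same `<table class="report">` markup as the PHP.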

devUnix · 02-10-2011, 03:14 PM

Quote:

Originally Posted by MTK358

You mean in the actual log file?

Then post a sample, because my solution relies on the idea that there's one "name: value" statement per line.

Yes, the fields can have more than a line of text. In fact, hundreds of lines may go under, for example, "Problems Reported:".

cat 09022011-IT.log (Original Log File)

User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.

Above, I have shown only 4 fields; there are actually more. That should not make any difference, because if we can handle these few fields, we can handle any number of them.
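Since there are more fields than shown here, it may help to collect the candidate names from the existing logs before writing a list of known fields. A minimal sketch, assuming field names start with a capital letter and contain only letters and spaces; candidate_fields is a hypothetical name. Free text such as "Note: ..." at the start of a continuation line will also be counted and has to be weeded out by hand:

```shell
#!/bin/sh
# Count line-initial "Name:" prefixes across the given logs, most frequent first.
candidate_fields() {
    grep -ohE '^[A-Z][A-Za-z ]*:' "$@" | sort | uniq -c | sort -rn
}
```

Usage: `candidate_fields /path/to/it-logs/*-IT.log` prints each name with the number of times it occurs.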