LinuxQuestions.org

devUnix 02-09-2011 07:04 AM

Formatting Fields and Text Being Displayed from Text File
 
I want to display the contents of a particular log file (a plain text file on Linux). The problem is that the contents need to be organized in a fixed format. Have a look at this log file:

sampleLog.txt

Code:

User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!
Remarks: None

So, while displaying the contents of the above file on a web page, I want to format the field names found in the log file: User Name:, Reported Problems Description:, and Remarks:. These fields may contain text of variable length, and they are not assumed to appear on any specific line.

Any ideas?

The desired output should look like this (with the field names in bold):

User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!
Remarks: None


Well, what I am trying to do may sound weird to some of you. The field "Reported Problems Description:" can possibly contain text with an embedded colon (:).

grail 02-09-2011 08:07 AM

You might want to explain what formatting you wish to perform as I do not see a difference apart from the bold text?

MTK358 02-09-2011 08:32 AM

Code:

sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#.*#&<br />#' logfile
Will turn it into valid HTML and make the part before the first ":" in the line bold.
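
For illustration, here is a minimal sketch of the same idea that reads standard input and prints to standard output, so the log file itself is never modified. It assumes a sed that accepts -E (GNU sed does); the function name to_html is made up for this example. Because [^:]+ stops at the first colon, any later colon in the value is left alone.

```shell
#!/bin/sh
# Sketch: escape HTML special characters, then bold the text up to and
# including the first colon on each line. Reads stdin, writes stdout.
# to_html is a hypothetical helper name, not part of the original post.
to_html() {
    sed -E \
        -e 's#&#\&amp;#g' \
        -e 's#<#\&lt;#g' \
        -e 's#>#\&gt;#g' \
        -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' \
        -e 's#$#<br />#'
}

# The value contains a second colon (10:30); only the field name is bolded.
printf '%s\n' 'Remarks: restart at 10:30 failed' | to_html
# prints: <b>Remarks:</b> restart at 10:30 failed<br />
```

The ampersand has to be escaped first; otherwise the & in the &lt; and &gt; entities would itself be escaped.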

devUnix 02-09-2011 08:33 AM

Quote:

Originally Posted by grail (Post 4253036)
You might want to explain what formatting you wish to perform as I do not see a difference apart from the bold text?

Yes, that "bold" formatting is exactly what I want to do (as of now).

devUnix 02-09-2011 08:56 AM

Quote:

Originally Posted by MTK358 (Post 4253060)
Code:

sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#.*#&<br />#' logfile
Will turn it into valid HTML and make the part before the first ":" in the line bold.


That is good. But it makes changes in the source log file itself. Of course, that is what "-i" is doing. But there is a problem. When I display the contents of the log file on a page, some extra tags are being displayed as they are.

For example: <b>&lt;User Name

Of course, your sed script does turn the field names bold, and the rest of the text remains unaffected when displayed on a web page. But, as I said earlier, some extra tags are also being displayed along with it. I have closed the web page; otherwise I would show you the exact output, but it is similar to the example given above.

MTK358 02-09-2011 09:02 AM

Did you put <html> tags around the output and save it with a .html extension?

It works great for me:

Code:

<b>User Name:</b> XYZ<br />
<b>Reported Problems Description:</b> Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!<br />
<b>Remarks:</b> None<br />


crts 02-09-2011 09:09 AM

Quote:

some extra tags are being displayed as they are.

For example: <b>&lt;User Name
Did you run the script twice on the same file? Omit the -i option and redirect it instead:
Code:

sed -e .... logfile > logfile.html
Do this with an unmodified copy of logfile. Firefox displays it even without the wrapping '<html>' tags.
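
To see why a second run matters, here is a small sketch, assuming a sed that accepts -E. It applies only the escaping and bolding steps (no <br /> step) to one line, then feeds the result through the same steps again. The function name escape_and_bold is made up for this example.

```shell
#!/bin/sh
# Sketch: the escaping step is not idempotent. Running it again on its own
# output escapes the tags added by the previous run, which is the kind of
# damage an in-place (-i) script does when run twice on the same file.
# escape_and_bold is a hypothetical helper name.
escape_and_bold() {
    sed -E \
        -e 's#&#\&amp;#g' \
        -e 's#<#\&lt;#g' \
        -e 's#>#\&gt;#g' \
        -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#'
}

once=$(printf '%s\n' 'User Name: XYZ' | escape_and_bold)
twice=$(printf '%s\n' "$once" | escape_and_bold)

printf '%s\n' "$once"    # <b>User Name:</b> XYZ
printf '%s\n' "$twice"   # <b>&lt;b&gt;User Name:</b>&lt;/b&gt; XYZ
```

Writing the result to a separate file, as suggested above, keeps the original log intact so it can be processed again safely.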

devUnix 02-09-2011 11:04 AM

Quote:

Originally Posted by MTK358 (Post 4253100)
Did you put <html> tags around the output and save it with a .html extension?


Sed Script as given:

Code:

-bash-2.05b# cat test.sh
#!/bin/bash
sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+)(:.*)$#<b>\1</b>\2#' 09022011-IT.log
-bash-2.05b#


The log file after the sed script has processed it:

-bash-2.05b# cat 09022011-IT.log
<b>&lt;b&gt;&amp;lt;b&amp;gt;User ID&amp;lt;/b&amp;gt;&lt;/b&gt;</b>: XYZ
<b>&lt;b&gt;&amp;lt;b&amp;gt;Group&amp;lt;/b&amp;gt;&lt;/b&gt;</b>: IT
<b>&lt;b&gt;&amp;lt;b&amp;gt;Shift Time&amp;lt;/b&amp;gt;&lt;/b&gt;</b>: First
<b>&lt;b&gt;&amp;lt;b&amp;gt;Problems Reported &amp;lt;/b&amp;gt;&lt;/b&gt;</b>:
None
-bash-2.05b#



PHP Script I have written:

Code:

-bash-2.05b# cat test.php
<?php
$FH=fopen("09022011-RA.log","r") or die("Error");
$result = fread($FH,1024);
fclose($FH);
echo $result;
?>

Output in IE (Web Browser):

Code:

<b>&lt;b&gt;User ID&lt;/b&gt;</b>: XYZ <b>&lt;b&gt;Group&lt;/b&gt;</b>: IT <b>&lt;b&gt;Shift Time&lt;/b&gt;</b>: First <b>&lt;b&gt;Problems Reported&lt;/b&gt;</b>: None

Nominal Animal 02-09-2011 01:01 PM

Since you use PHP, try the following.

This reads all /path/to/it-logs/DDMMYYYY-IT.log files (in their raw form unprocessed by any scripts), sorts them latest first, then outputs them each in a separate table:
Code:

<html>
 <head>
  <title>
  Example problem report
  </title>
  <style type="text/css">

    table.report {
        width: 40em !important;
        padding: 0 0 0 0;
        border: 1px solid #cccccc;
        margin: 0 0 2em 0;
        border-collapse: collapse;
        border-spacing: 0;
    }

    table.report td {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border: 0 none;
        margin: 0 0 0 0;
        text-align: left;
        vertical-align: top;
        font-weight: normal;
    }

    table.report th {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border: 0 none;
        margin: 0 0 0 0;
        text-align: right;
        vertical-align: top;
        font-weight: bold;
    }

    table.report th.title {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border-top: 0 none;
        border-right: 0 none;
        border-bottom: 1px solid #cccccc;
        border-left: 0 none;
        background: #efefef;
        text-align: center;
        vertical-align: middle;
        font-weight: bold;
    }

  </style>
 </head>
 <body>
  <?PHP

    $files = glob('/path/to/it-logs/*-IT.log', GLOB_NOSORT);
    if ($files !== FALSE) {

        $temp = $files;
        $files = array();
        foreach ($temp as $logfile) {
            $index = preg_replace('/^.*\/([0-3][0-9][0-1][0-9][0-9][0-9][0-9][0-9]).*$/', '$1', $logfile);
            $time = mktime(0,0,0, intval(substr($index, 2, 2), 10),
                                  intval(substr($index, 0, 2), 10),
                                  intval(substr($index,4,4), 10));
            $files[$time] = $logfile;
        }
        krsort($files);

        foreach ($files as $time => $logfile) {

            $data = @file($logfile, FILE_SKIP_EMPTY_LINES);
            if ($data !== FALSE) {
                $title = date('D, j M Y', $time);
                echo " <table class=\"report\">\n";
                echo " <tr>\n";
                echo " <th class=\"title\" colspan=\"2\">", htmlentities($title, ENT_QUOTES, 'UTF-8'), "</th>\n";
                echo " </tr>\n";
                foreach ($data as $entry) {
                    @list($key, $value) = @explode(':', $entry, 2);
                    echo " <tr>\n";
                    echo " <th>", htmlentities(trim($key), ENT_COMPAT, 'UTF-8'), "</th>\n";
                    echo " <td>", htmlentities(trim($value), ENT_COMPAT, 'UTF-8'), "</td>\n";
                    echo " </tr>\n";
                }
                echo " </table>\n";
            }
        }
    }
?>
 </body>
</html>

  • glob() finds all file names matching the pattern.
  • The first foreach loop parses each file name, /path/to/it-logs/DDMMYYYY-IT.log, creates a Unix timestamp (seconds since the epoch) from the date, and adds the file name to a new array keyed by the timestamp.
  • krsort() sorts the file names by timestamp, latest first.
  • The second foreach loop processes each log file. The file() function reads the file as an array of lines.
  • The inner foreach loop processes each line from the current log file. The explode() function splits the line into two parts at the first colon (':').
  • The htmlentities() function is used to display the strings correctly in HTML. Since you're using Linux, your log files are most likely UTF-8, so I used that.
  • The rest is just pretty-printing. trim() is used to remove leading and trailing whitespace; the whitespace is invisible in the rendered page, but the output HTML looks tidier without it.
You'll probably notice that since the file name array is keyed by the Unix timestamp, you can easily add query parameters limiting the output to desired dates. I recommend using the strtotime() function to parse any query start and end date parameters.
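
The ordering step can also be sketched outside PHP. The following is a shell pipeline rather than the PHP above, and it does not reproduce the rest of the script. It assumes file names of the form DDMMYYYY-IT.log and a sed that accepts -E; the function name sort_logs_latest_first and the /logs/ paths are made up for this example. A plain sort of the names would order them by day of month, so a sortable YYYYMMDD key is built first.

```shell
#!/bin/sh
# Sketch: order DDMMYYYY-*.log names latest first.
# 1. Prefix each name with a YYYYMMDD key taken from the file name.
# 2. Sort on that key in reverse order.
# 3. Strip the key again.
# sort_logs_latest_first is a hypothetical helper name.
sort_logs_latest_first() {
    sed -E 's#^(.*/)?([0-9]{2})([0-9]{2})([0-9]{4})(-[^/]*)$#\4\3\2 &#' |
        LC_ALL=C sort -r |
        cut -d ' ' -f 2-
}

printf '%s\n' \
    /logs/09022011-IT.log \
    /logs/31012011-IT.log \
    /logs/10022011-IT.log | sort_logs_latest_first
```

With these three names, 10 Feb 2011 comes first, then 9 Feb 2011, then 31 Jan 2011.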

Hope this helps,
Nominal Animal

MTK358 02-09-2011 01:19 PM

@devUnix

Could you post part of the actual log file?

devUnix 02-10-2011 02:29 PM

Quote:

Originally Posted by MTK358 (Post 4253347)
@devUnix

Could you post part of the actual log file?

Yeah, sure. Here it is:

cat 09022011-IT.log (Original Log File)

User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.

devUnix 02-10-2011 02:41 PM

Quote:

Originally Posted by Nominal Animal (Post 4253326)
Hope this helps,
Nominal Animal

I appreciate your effort and willingness to come up with solutions. I will check out your devised script and will let you know how it works for me.

The main question is how to extract pieces of information from a text file whose fields vary in type and length. The good news is that we know the field names, or can identify them, so we can provide an enhanced view of the information to the user. In the present example, the text file does not contain fields delimited by a particular character or by whitespace.

If there is a better way of organizing these items in a text file, please, do let me know. I am simply putting each item on a separate line but some of the items can consist of more than one line.

MTK358 02-10-2011 02:59 PM

Quote:

Originally Posted by devUnix (Post 4254437)
I am simply putting each item on a separate line but some of the items can consist of more than one line.

You mean in the actual log file?

Then post a sample, because my solution relies on the idea that there's one "name: value" statement per line.

devUnix 02-10-2011 03:09 PM

Quote:

Originally Posted by Nominal Animal (Post 4253326)
Since you use PHP, try the following.

[...]




Okay, it works, but not completely. For example, there are 5 log files and it is displaying only 3 of them. Secondly, sometimes the value is displayed in the left column along with the field name:

Reported Problems

Nothing works fine in this company. Blah sdjfkhsd kfhsd fh sdjkfhs djkfh sdjkf sd fsh dfkjh sdkfj sdjkh fjkd sfjk sdkf sd fsd fsdfNone


That way, both the field name and the value are in bold face. Fields whose values are no longer than one line are displayed properly.

Well, you have done very well. I must say. You have also provided a good layout for the report. Thank you!

devUnix 02-10-2011 03:14 PM

Quote:

Originally Posted by MTK358 (Post 4254447)
You mean in the actual log file?

Then post a sample, because my solution relies on the idea that there's one "name: value" statement per line.


Yes, the fields can have more than a line of text. In fact, hundreds of lines may go under, for example, "Problems Reported:".


cat 09022011-IT.log (Original Log File)

User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.


Above, I have given only 4 fields. There are actually more of them. But that would not make any difference because if we can work with those few fields then we can work with any number of fields.
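
One way to handle values that span several lines is sketched below. It relies on the field names being known in advance. A line starts a new field only if it begins with a known name followed by a colon, and every other non-empty line is appended to the field before it. The FIELDS list and the function name join_fields are made up for this example; the names come from the sample log above, and the real list would be longer.

```shell
#!/bin/sh
# Sketch: join continuation lines onto the preceding field so that each
# field ends up on exactly one "name: value" line.
# FIELDS is a hypothetical, semicolon-separated list of known field names.
FIELDS='User ID;Group;Shift Time;Problems Reported'

join_fields() {
    awk -v names="$FIELDS" '
        BEGIN { n = split(names, name, ";") }
        {
            # Does this line begin with "<known name>:"?
            isfield = 0
            for (i = 1; i <= n; i++) {
                prefix = name[i] ":"
                if (substr($0, 1, length(prefix)) == prefix) { isfield = 1; break }
            }
            if (isfield) {
                if (have) print record   # flush the previous field
                record = $0
                have = 1
            } else if (have && $0 != "") {
                record = record " " $0   # continuation line
            }
        }
        END { if (have) print record }
    '
}

printf '%s\n' \
    'User ID: XYZ' \
    'Problems Reported: All the servers are down.' \
    'Error: disk full at 10:30' \
    'Group: IT' | join_fields
```

The line starting with "Error:" contains a colon but is not a known field name, so it stays part of "Problems Reported". The joined output has one field per line, which is the form a one-statement-per-line formatter expects.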


All times are GMT -5.