LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

02-09-2011, 07:04 AM   #1
devUnix
Formatting Fields and Text Being Displayed from Text File


I want to display the contents of a particular log file (a simple text file on Linux). But there is a problem: the contents need to be organized in a fixed format. Have a look at this log file:

sampleLog.txt

Code:
User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!
Remarks: None
So, while displaying the contents of the above file on a web page, I want to format the field names found in the log file: User Name:, Reported Problems Description:, and Remarks:. These fields may contain text of variable length, and they are not assumed to appear on any specific line number.

Any ideas?

The desired output should look like this (with the field names in bold):

User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!
Remarks: None


Well, what I am trying to do may sound weird to some of you. The field "Reported Problems Description:" can possibly contain text which includes a colon (:).

Last edited by devUnix; 02-09-2011 at 09:03 AM.
 
02-09-2011, 08:07 AM   #2
grail
You might want to explain what formatting you wish to perform as I do not see a difference apart from the bold text?
 
02-09-2011, 08:32 AM   #3
MTK358
Code:
sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#.*#&<br />#' logfile
Will turn it into valid HTML and make the part before the first ":" in the line bold.

Last edited by MTK358; 02-09-2011 at 08:35 AM.
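For reference, here is a minimal sketch of the same substitutions wrapped in a shell function that writes the HTML to standard output instead of editing the log in place with -i. The function name log_to_html is hypothetical, and the sketch assumes a sed that accepts -E (recent GNU sed and BSD sed do). The & has to be escaped first; otherwise the & inside a freshly inserted &lt; or &gt; would be escaped again. Because the bold pattern stops at the first colon, colons later in a description are left alone.

```shell
#!/bin/sh
# Sketch: same substitutions as the sed command above, but without -i,
# so the original log file is never modified. HTML goes to stdout.
# Assumption: sed supports -E (extended regular expressions).
log_to_html() {
    sed -E \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/^([^:]+:)(.*)$/<b>\1<\/b>\2/' \
        -e 's/$/<br \/>/' \
        "$@"
}
# & is escaped first so that "&lt;" and "&gt;" are not escaped twice.
# The bold group ends at the first ":" of each line; later colons are untouched.
```

For example, `log_to_html 09022011-IT.log > 09022011-IT.html` leaves the log untouched, so the conversion can be rerun as often as needed.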
 
02-09-2011, 08:33 AM   #4
devUnix (Original Poster)
Quote:
Originally Posted by grail
You might want to explain what formatting you wish to perform as I do not see a difference apart from the bold text?
Yes, that "bold" formatting is exactly what I want to do (as of now).
 
02-09-2011, 08:56 AM   #5
devUnix (Original Poster)
Quote:
Originally Posted by MTK358
Code:
sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#.*#&<br />#' logfile
Will turn it into valid HTML and make the part before the first ":" in the line bold.

That is good, but it changes the source log file itself. Of course, that is what "-i" does. But there is a problem: when I display the contents of the log file on a page, some extra tags are displayed literally.

For example: <b>&lt;User Name

Of course, your sed script does turn the field names bold, and the rest of the text is unaffected when displayed on a web page. But, as I said earlier, some extra tags are displayed along with it. I have closed the web page, otherwise I would show you the exact output, but it is similar to the example given above.
 
02-09-2011, 09:02 AM   #6
MTK358
Did you put <html> tags around the output and save it with a .html extension?

It works great for me:

Code:
<b>User Name:</b> XYZ<br />
<b>Reported Problems Description:</b> Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!<br />
<b>Remarks:</b> None<br />
 
02-09-2011, 09:09 AM   #7
crts
Quote:
some extra tags are being displayed as they are.

For example: <b>&lt;User Name
Did you run the script twice on the same file? Omit the -i option and redirect the output instead:
Code:
sed -e .... logfile > logfile.html
Do this with an unmodified copy of logfile. Firefox displays it even without the wrapping '<html>' tags.
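To see why running the in-place script a second time produces strings like the <b>&lt;User Name quoted above, here is a minimal sketch with a hypothetical escape_html function that applies the same three escapes as MTK358's command, twice in a row. The second pass escapes the entities that the first pass created.

```shell
#!/bin/sh
# Sketch: escaping text that is already escaped doubles the entities.
# escape_html is a hypothetical helper; & must be replaced first.
escape_html() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

once=$(printf '%s\n' '<b>User ID</b>' | escape_html)   # &lt;b&gt;User ID&lt;/b&gt;
twice=$(printf '%s\n' "$once" | escape_html)           # &amp;lt;b&amp;gt;User ID&amp;lt;/b&amp;gt;
printf '%s\n' "$once" "$twice"
```

This is why the conversion should always start from an unmodified log and write its result to a separate file.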
 
02-09-2011, 11:04 AM   #8
devUnix (Original Poster)
Quote:
Originally Posted by MTK358
Did you put <html> tags around the output and save it with a .html extension?

Sed Script as given:

Code:
-bash-2.05b# cat test.sh
#!/bin/bash
sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+)(:.*)$#<b>\1</b>\2#' 09022011-IT.log
-bash-2.05b#

The log file after the sed script has processed it:

-bash-2.05b# cat 09022011-IT.log
<b>&lt;b&gt;&amp;lt;b&amp;gt;User ID&amp;lt;/b&amp;gt;&lt;/b&gt;</b>: XYZ
<b>&lt;b&gt;&amp;lt;b&amp;gt;Group&amp;lt;/b&amp;gt;&lt;/b&gt;</b>: IT
<b>&lt;b&gt;&amp;lt;b&amp;gt;Shift Time&amp;lt;/b&amp;gt;&lt;/b&gt;</b>: First
<b>&lt;b&gt;&amp;lt;b&amp;gt;Problems Reported &amp;lt;/b&amp;gt;&lt;/b&gt;</b>:
None
-bash-2.05b#



PHP Script I have written:

Code:
-bash-2.05b# cat test.php
<?php
$FH=fopen("09022011-RA.log","r") or die("Error");
$result = fread($FH,1024);
fclose($FH);
echo $result;
?>
Output in IE (Web Browser):

Code:
<b>&lt;b&gt;User ID&lt;/b&gt;</b>: XYZ <b>&lt;b&gt;Group&lt;/b&gt;</b>: IT <b>&lt;b&gt;Shift Time&lt;/b&gt;</b>: First <b>&lt;b&gt;Problems Reported&lt;/b&gt;</b>: None

Last edited by devUnix; 02-09-2011 at 11:10 AM.
 
02-09-2011, 01:01 PM   #9
Nominal Animal
Since you use PHP, try the following.

This reads all /path/to/it-logs/DDMMYYYY-IT.log files (in their raw form, unprocessed by any scripts), sorts them latest first, then outputs each of them in a separate table:
Code:
<html>
 <head>
  <title>
   Example problem report
  </title>
  <style type="text/css">

    table.report {
        width: 40em !important;
        padding: 0 0 0 0;
        border: 1px solid #cccccc;
        margin: 0 0 2em 0;
        border-collapse: collapse;
        border-spacing: 0;
    }

    table.report td {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border: 0 none;
        margin: 0 0 0 0;
        text-align: left;
        vertical-align: top;
        font-weight: normal;
    }

    table.report th {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border: 0 none;
        margin: 0 0 0 0;
        text-align: right;
        vertical-align: top;
        font-weight: bold;
    }

    table.report th.title {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border-top: 0 none;
        border-right: 0 none;
        border-bottom: 1px solid #cccccc;
        border-left: 0 none;
        background: #efefef;
        text-align: center;
        vertical-align: middle;
        font-weight: bold;
    }

  </style>
 </head>
 <body>
  <?PHP

    $files = glob('/path/to/it-logs/*-IT.log', GLOB_NOSORT);
    if ($files !== FALSE) {

        $temp = $files;
        $files = array();
        foreach ($temp as $logfile) {
            $index = preg_replace('/^.*\/([0-3][0-9][0-1][0-9][0-9][0-9][0-9][0-9]).*$/', '$1', $logfile);
            $time = mktime(0,0,0, intval(substr($index, 2, 2), 10),
                                  intval(substr($index, 0, 2), 10),
                                  intval(substr($index,4,4), 10));
            $files[$time] = $logfile;
        }
        krsort($files);

        foreach ($files as $time => $logfile) {

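            // FILE_SKIP_EMPTY_LINES only has an effect together with FILE_IGNORE_NEW_LINES
            $data = @file($logfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);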
            if ($data !== FALSE) {
                $title = date('D, j M Y', $time);
                echo " <table class=\"report\">\n";
                echo " <tr>\n";
                echo " <th class=\"title\" colspan=\"2\">", htmlentities($title, ENT_QUOTES, 'UTF-8'), "</th>\n";
                echo " </tr>\n";
                foreach ($data as $entry) {
                    @list($key, $value) = @explode(':', $entry, 2);
                    echo " <tr>\n";
                    echo " <th>", htmlentities(trim($key), ENT_COMPAT, 'UTF-8'), "</th>\n";
                    echo " <td>", htmlentities(trim($value), ENT_COMPAT, 'UTF-8'), "</td>\n";
                    echo " </tr>\n";
                }
                echo " </table>\n";
            }
        }
    }
?>
 </body>
</html>
  • glob() finds all file names matching the pattern.
  • The first foreach loop parses the file name, /path/to/it-logs/DDMMYYYY-IT.log, creates a Unix timestamp (seconds since the epoch) based on the date, and adds the file name to a new array keyed by the timestamps.
  • krsort() sorts the file names based on the timestamps, latest first.
  • The second foreach loop processes each log file. The file() function reads the file as an array of lines.
  • The inner foreach loop processes each line from the current log file. The explode() function splits the line into two parts at the first colon (':').
  • The htmlentities() function is used to display the strings correctly in HTML. Since you're using Linux, your log files are most likely UTF-8, so I used that.
  • The rest is just pretty-printing. trim() is used to remove leading and trailing whitespace; it is invisible in the end result, but the output HTML looks prettier.
You'll probably notice that since the file name array is keyed by the Unix timestamp, you can easily add query parameters limiting the output to desired dates. I recommend using the strtotime() function to parse any query start and end date parameters.

Hope this helps,
Nominal Animal

Last edited by Nominal Animal; 03-21-2011 at 07:04 AM.
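The first foreach loop and krsort() in the PHP above can be mirrored in shell. Below is a minimal sketch, with a hypothetical function name newest_first, assuming every log name starts with a DDMMYYYY date. One difference is deliberate: if the glob were widened to pick up other logs (such as the 09022011-RA.log opened in the earlier PHP script), an array keyed only by the timestamp would keep just one file per day. Keeping the full path in the sort key avoids that.

```shell
#!/bin/sh
# Sketch: print DDMMYYYY-*.log paths newest first, similar to the
# preg_replace/mktime/krsort steps in the PHP script above.
# Assumption: each file name starts with an 8-digit DDMMYYYY date;
# names that do not are skipped.
newest_first() {
    for f in "$@"; do
        base=${f##*/}
        # Reorder DDMMYYYY as YYYYMMDD so that a plain text sort is chronological.
        key=$(printf '%s\n' "$base" |
            sed -n 's/^\([0-9][0-9]\)\([0-9][0-9]\)\([0-9]\{4\}\).*/\3\2\1/p')
        # Keep the whole path next to the date, so two logs from the same
        # day (for example -IT and -RA) both survive.
        [ -n "$key" ] && printf '%s %s\n' "$key" "$f"
    done | LC_ALL=C sort -r | cut -d ' ' -f 2-
}
```

For example, `newest_first /path/to/it-logs/*.log` lists every dated log, not only the -IT ones.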
 
02-09-2011, 01:19 PM   #10
MTK358
@devUnix

Could you post part of the actual log file?
 
02-10-2011, 02:29 PM   #11
devUnix (Original Poster)
Quote:
Originally Posted by MTK358
@devUnix

Could you post part of the actual log file?
Yeah, sure. Here it is:

cat 09022011-IT.log (Original Log File)

User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.

Last edited by devUnix; 02-10-2011 at 02:30 PM.
 
02-10-2011, 02:41 PM   #12
devUnix (Original Poster)
Quote:
Originally Posted by Nominal Animal
Hope this helps,
Nominal Animal
I appreciate your effort and willingness to come up with solutions. I will check out your devised script and will let you know how it works for me.

The main question is: how do I extract pieces of information from a text file that has varied field types? The good news is that we know the field names, or can identify them, so we can provide an enhanced view of the information to the user. In the present example, I have not used a text file format in which fields are delimited by some character or whitespace.

If there is a better way of organizing these items in a text file, please do let me know. I am simply putting each item on a separate line, but some of the items can consist of more than one line.
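Since the field names are known in advance, one way to handle values that span several lines is to treat a line as the start of a new field only when it begins with one of those names followed by a colon, and to append every other line to the current value. Below is a minimal sketch in awk, wrapped in a hypothetical shell function log_fields; the |-separated list of field names is an assumption to be filled in with the real names. It escapes &, < and > and emits th/td table rows similar to the layout in Nominal Animal's PHP.

```shell
#!/bin/sh
# Sketch: turn a log whose values may span several lines into HTML table rows.
# Assumptions (hypothetical, adjust to the real log format):
#   - a field starts at the beginning of a line with a known name plus ":";
#   - every other non-blank line, even one containing a colon, continues
#     the value of the previous field;
#   - blank lines are skipped (a simplification).
# Usage: log_fields 'User ID|Group|Shift Time|Problems Reported' file.log
log_fields() {
    names=$1
    shift
    awk -v names="$names" '
        # Escape HTML special characters; & must go first.
        function esc(s) {
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            gsub(/\n/, "<br />", s)   # keep the line breaks of multi-line values
            return s
        }
        # Print the field collected so far, if any.
        function emit_row() {
            if (key != "")
                printf "<tr><th>%s</th><td>%s</td></tr>\n", esc(key), esc(val)
        }
        BEGIN { n = split(names, known, "|") }
        /^[ \t]*$/ { next }
        {
            for (i = 1; i <= n; i++) {
                prefix = known[i] ":"
                if (substr($0, 1, length(prefix)) == prefix) {
                    emit_row()
                    key = known[i]
                    val = substr($0, length(prefix) + 1)
                    sub(/^[ \t]+/, "", val)
                    next
                }
            }
            # Not a known field name: append the line to the current value.
            val = (val == "") ? $0 : (val "\n" $0)
        }
        END { emit_row() }
    ' "$@"
}
```

For example, `log_fields 'User ID|Group|Shift Time|Problems Reported' 09022011-IT.log`. Supporting more fields only means adding their names to the list, and because a new field must match a known name at the start of a line, a colon inside a description does not split it.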
 
02-10-2011, 02:59 PM   #13
MTK358
Quote:
Originally Posted by devUnix
I am simply putting each item on a separate line but some of the items can consist of more than one line.
You mean in the actual log file?

Then post a sample, because my solution relies on the idea that there's one "name: value" statement per line.
 
02-10-2011, 03:09 PM   #14
devUnix (Original Poster)
Quote:
Originally Posted by Nominal Animal
Since you use PHP, try the following.



Okay, it works, and it doesn't work as well. For example, there are 5 log files and it displays only 3 of them. Secondly, the value is sometimes displayed in the left column along with the field name:

Reported Problems

Nothing works fine in this company. Blah sdjfkhsd kfhsd fh sdjkfhs djkfh sdjkf sd fsh dfkjh sdkfj sdjkh fjkd sfjk sdkf sd fsd fsdfNone


That way, both the field and the value are in bold face. Fields whose values are not longer than one line are displayed properly.

Still, you have done very well, I must say. You have also provided a good layout for the report. Thank you!
 
02-10-2011, 03:14 PM   #15
devUnix (Original Poster)
Quote:
Originally Posted by MTK358
You mean in the actual log file?

Then post a sample, because my solution relies on the idea that there's one "name: value" statement per line.

Yes, the fields can have more than one line of text. In fact, hundreds of lines may go under a field such as "Problems Reported:".


cat 09022011-IT.log (Original Log File)

User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.


Above, I have given only 4 fields. There are actually more of them, but that would not make any difference: if we can work with these few fields, then we can work with any number of fields.
 
  


All times are GMT -5.