Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.

02-09-2011, 07:04 AM   #1
Member
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 438
Formatting Fields and Text Being Displayed from Text File
I want to display the contents of a particular log file (a plain text file on Linux) on a web page. The catch is that the contents need to be presented in a fixed format. Have a look at this log file:
sampleLog.txt
Code:
User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!
Remarks: None
So, while displaying the contents of the above file on a web page, I want to format the field names found in the log file: User Name:, Reported Problems Description:, and Remarks:. The field values may be of variable length, and the fields are not guaranteed to appear on any specific line number.
Any ideas?
The desired output should look like this (with the field names in bold):
User Name: XYZ
Reported Problems Description: Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!
Remarks: None
Well, what I am trying to do may sound weird to some of you. The field "Reported Problems Description:" can possibly contain text that itself includes a colon (:).
Last edited by devUnix; 02-09-2011 at 09:03 AM.

02-09-2011, 08:07 AM   #2
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,306
You might want to explain what formatting you wish to perform as I do not see a difference apart from the bold text?

02-09-2011, 08:32 AM   #3
LQ 5k Club
Registered: Sep 2009
Distribution: Arch x86_64
Posts: 6,443
Code:
sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#.*#&<br />#' logfile
Will turn it into valid HTML and make the part before the first ":" in the line bold.
Last edited by MTK358; 02-09-2011 at 08:35 AM.

02-09-2011, 08:33 AM   #4
Member
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 438
Original Poster
Quote:
Originally Posted by grail
You might want to explain what formatting you wish to perform as I do not see a difference apart from the bold text?
|
Yes, that "bold" formatting is exactly what I want to do (as of now).

02-09-2011, 08:56 AM   #5
Member
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 438
Original Poster
Quote:
Originally Posted by MTK358
Code:
sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#.*#&<br />#' logfile
Will turn it into valid HTML and make the part before the first ":" in the line bold.
|
That is good. But it makes changes in the source log file itself. Of course, that is what "-i" is doing. But there is a problem. When I display the contents of the log file on a page, some extra tags are being displayed as they are.
For example: <b>< User Name
Of course, your sed script does turn the field names bold, and the rest of the text remains unaffected when displayed on a web page. But, as I said earlier, some extra tags are also displayed alongside. I have closed the web page; otherwise I would show you the exact output, but it is similar to the example I have given above.

02-09-2011, 09:02 AM   #6
LQ 5k Club
Registered: Sep 2009
Distribution: Arch x86_64
Posts: 6,443
Did you put <html> tags around the output and save it with a .html extension?
It works great for me:
Code:
<b>User Name:</b> XYZ<br />
<b>Reported Problems Description:</b> Blah! Blah! Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!Blah! Blah!<br />
<b>Remarks:</b> None<br />

02-09-2011, 09:09 AM   #7
Senior Member
Registered: Jan 2010
Posts: 1,604
Quote:
some extra tags are being displayed as they are.
For example: <b><User Name
|
Did you run the script twice on the same file? Omit the -i option and redirect it instead:
Code:
sed -e .... logfile > logfile.html
Do this with an unmodified copy of logfile. Firefox displays it even without the wrapping '<html>' tags.
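To make the double-run effect concrete, here is a minimal sketch that works on a scratch copy instead of a real log. The helper name to_html and the temporary paths are made up for illustration; the substitutions are the ones from the sed one-liner earlier in the thread, with -E used as the equivalent of -r and the -i option left out so the source file is never modified.

```shell
#!/bin/sh
# Work in a scratch directory so no real log is touched (paths are made up).
tmp=$(mktemp -d)
printf 'User ID: XYZ\nGroup: IT\n' > "$tmp/logfile"

# Same substitutions as the one-liner in this thread, but without -i:
# the result goes to stdout and the input file is left alone.
to_html() {
    sed -E -e 's#&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' \
           -e 's#^([^:]+:)(.*)$#<b>\1</b>\2#' -e 's#$#<br />#' "$1"
}

to_html "$tmp/logfile" > "$tmp/logfile.html"      # first pass: clean HTML
to_html "$tmp/logfile.html" > "$tmp/twice.html"   # second pass: tags get escaped

first=$(head -n 1 "$tmp/logfile.html")
second=$(head -n 1 "$tmp/twice.html")
echo "$first"
echo "$second"
rm -rf "$tmp"
```

The first line printed is the intended markup; the second shows the tags from the first pass escaped into visible text and wrapped in a fresh <b>, which is the kind of output you get when the in-place script is run on an already processed file.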

02-09-2011, 11:04 AM   #8
Member
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 438
Original Poster
Quote:
Originally Posted by MTK358
Did you put <html> tags around the output and save it with a .html extension?
|
Sed Script as given:
Code:
-bash-2.05b# cat test.sh
#!/bin/bash
sed -i -e 's#\&#\&amp;#g' -e 's#>#\&gt;#g' -e 's#<#\&lt;#g' -r -e 's#^([^:]+)(:.*)$#<b>\1</b>\2#' 09022011-IT.log
-bash-2.05b#
The log file after the sed script has processed it:
Code:
-bash-2.05b# cat 09022011-IT.log
<b><b>&lt;b&gt;User ID&lt;/b&gt;</b></b>: XYZ
<b><b>&lt;b&gt;Group&lt;/b&gt;</b></b>: IT
<b><b>&lt;b&gt;Shift Time&lt;/b&gt;</b></b>: First
<b><b>&lt;b&gt;Problems Reported &lt;/b&gt;</b></b>:
None
-bash-2.05b#
PHP Script I have written:
Code:
-bash-2.05b# cat test.php
<?php
$FH=fopen("09022011-RA.log","r") or die("Error");
$result = fread($FH,1024);
fclose($FH);
echo $result;
?>
Output in IE (Web Browser):
Code:
<b><b>User ID</b></b>: XYZ <b><b>Group</b></b>: IT <b><b>Shift Time</b></b>: First <b><b>Problems Reported</b></b>: None
Last edited by devUnix; 02-09-2011 at 11:10 AM.

02-09-2011, 01:01 PM   #9
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Since you use PHP, try the following.
This reads all /path/to/it-logs/DDMMYYYY-IT.log files (in their raw form unprocessed by any scripts), sorts them latest first, then outputs them each in a separate table:
Code:
<html>
  <head>
    <title>
      Example problem report
    </title>
    <style type="text/css">
      table.report {
        width: 40em !important;
        padding: 0 0 0 0;
        border: 1px solid #cccccc;
        margin: 0 0 2em 0;
        border-collapse: collapse;
        border-spacing: 0;
      }
      table.report td {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border: 0 none;
        margin: 0 0 0 0;
        text-align: left;
        vertical-align: top;
        font-weight: normal;
      }
      table.report th {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border: 0 none;
        margin: 0 0 0 0;
        text-align: right;
        vertical-align: top;
        font-weight: bold;
      }
      table.report th.title {
        padding: 0.5em 0.5em 0.5em 0.5em;
        border-top: 0 none;
        border-right: 0 none;
        border-bottom: 1px solid #cccccc;
        border-left: 0 none;
        background: #efefef;
        text-align: center;
        vertical-align: middle;
        font-weight: bold;
      }
    </style>
  </head>
  <body>
<?PHP
    $files = glob('/path/to/it-logs/*-IT.log', GLOB_NOSORT);
    if ($files !== FALSE) {
        $temp = $files;
        $files = array();
        foreach ($temp as $logfile) {
            $index = preg_replace('/^.*\/([0-3][0-9][0-1][0-9][0-9][0-9][0-9][0-9]).*$/', '$1', $logfile);
            $time = mktime(0,0,0, intval(substr($index, 2, 2), 10),
                                  intval(substr($index, 0, 2), 10),
                                  intval(substr($index,4,4), 10));
            $files[$time] = $logfile;
        }
        krsort($files);
        foreach ($files as $time => $logfile) {
            $data = @file($logfile, FILE_SKIP_EMPTY_LINES);
            if ($data !== FALSE) {
                $title = date('D, j M Y', $time);
                echo " <table class=\"report\">\n";
                echo " <tr>\n";
                echo " <th class=\"title\" colspan=\"2\">", htmlentities($title, ENT_QUOTES, 'UTF-8'), "</th>\n";
                echo " </tr>\n";
                foreach ($data as $entry) {
                    @list($key, $value) = @explode(':', $entry, 2);
                    echo " <tr>\n";
                    echo " <th>", htmlentities(trim($key), ENT_COMPAT, 'UTF-8'), "</th>\n";
                    echo " <td>", htmlentities(trim($value), ENT_COMPAT, 'UTF-8'), "</td>\n";
                    echo " </tr>\n";
                }
                echo " </table>\n";
            }
        }
    }
?>
  </body>
</html>
- glob() finds all file names matching the pattern.
- The first foreach loop parses the file name, /path/to/it-logs/DDMMYYYY-IT.log, creates a Unix timestamp (seconds since the epoch) based on the date, and adds the file name to a new array keyed by the timestamps.
- krsort() sorts the file names based on the timestamps, latest first.
- The second foreach loop processes each log file. The file() function reads the file as an array of lines.
- The inner foreach loop processes each line from the current log file. The explode() function splits the line into two parts at the first colon (':').
- The htmlentities() function is used to display the strings correctly in HTML. Since you're using Linux, your log files are most likely UTF-8, so I used that.
- The rest is just prettyprinting. trim() is used to trim out leading and trailing whitespace; it is invisible in the end result, but the output HTML looks prettier.
You'll probably notice that since the file name array is keyed by the Unix timestamp, you can easily add query parameters limiting the output to desired dates. I recommend using the strtotime() function to parse any query start and end date parameters.
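For anyone who wants to check the file ordering outside PHP, the same idea (pull DDMMYYYY out of the file name, order by date, latest first) can be sketched in shell. This is only an illustration under assumptions: the function name sort_logs_latest_first is made up, the paths are the placeholder ones used above, and the sort key is a YYYYMMDD string rather than a Unix timestamp.

```shell
#!/bin/sh
# sort_logs_latest_first: read file names on stdin (one per line) and print
# them newest first, judging only by the DDMMYYYY prefix of the base name.
# Names that do not match the DDMMYYYY-something pattern are dropped.
sort_logs_latest_first() {
    # Prefix each matching name with a YYYYMMDD key, sort on it in reverse,
    # then strip the key again.
    sed -n -E 's#^(.*/)?([0-9]{2})([0-9]{2})([0-9]{4})(-[^/]*)$#\4\3\2 &#p' |
        sort -r |
        cut -d ' ' -f 2-
}

printf '%s\n' \
    /path/to/it-logs/09022011-IT.log \
    /path/to/it-logs/10022011-IT.log \
    /path/to/it-logs/31122010-IT.log |
    sort_logs_latest_first
```

Because every file keeps its own line here, two files that carry the same date are both listed, whereas an array keyed by the date can hold only one entry per key.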
Hope this helps, Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 07:04 AM.

02-09-2011, 01:19 PM   #10
LQ 5k Club
Registered: Sep 2009
Distribution: Arch x86_64
Posts: 6,443
@devUnix
Could you post part of the actual log file?

02-10-2011, 02:29 PM   #11
Member
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 438
Original Poster
Quote:
Originally Posted by MTK358
@devUnix
Could you post part of the actual log file?
|
Yeah, sure. Here it is:
cat 09022011-IT.log (Original Log File)
User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.
Last edited by devUnix; 02-10-2011 at 02:30 PM.

02-10-2011, 02:41 PM   #12
Member
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 438
Original Poster
Quote:
Originally Posted by Nominal Animal
Hope this helps, Nominal Animal
|
I appreciate your effort and willingness to come up with solutions. I will check out your devised script and will let you know how it works for me.
The main question / problem is: How to extract pieces of data / information from a text file which has varied field types. The good news is we know the field names or can identify them so that we can provide an enhanced view of the data/information to the user. In the present example, I have not used any text file which would contain fields delimited by some character or whitespace.
If there is a better way of organizing these items in a text file, please, do let me know. I am simply putting each item on a separate line but some of the items can consist of more than one line.

02-10-2011, 02:59 PM   #13
LQ 5k Club
Registered: Sep 2009
Distribution: Arch x86_64
Posts: 6,443
Quote:
Originally Posted by devUnix
I am simply putting each item on a separate line but some of the items can consist of more than one line.
|
You mean in the actual log file?
Then post a sample, because my solution relies on the idea that there's one "name: value" statement per line.

02-10-2011, 03:09 PM   #14
Member
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 438
Original Poster
Quote:
Originally Posted by Nominal Animal
Since you use PHP, try the following.
Hope this helps, Nominal Animal
|
Okay, it works, but not completely. For example, there are 5 log files and it is displaying only 3 of them. Secondly, sometimes the value is being displayed in the left column along with the field name:
Reported Problems
Nothing works fine in this company. Blah sdjfkhsd kfhsd fh sdjkfhs djkfh sdjkf sd fsh dfkjh sdkfj sdjkh fjkd sfjk sdkf sd fsd fsdfNone
That way both the field name and the value are in bold face. Fields whose values are not longer than one line are displayed properly.
Well, you have done very well. I must say. You have also provided a good layout for the report. Thank you!

02-10-2011, 03:14 PM   #15
Member
Registered: Oct 2010
Location: Bengaluru, India
Distribution: RHEL 5.1 on My PC, & SunOS / Sun Solaris, RHEL, SuSe, Debian, FreeBSD and other Linux flavors @ Work
Posts: 438
Original Poster
Quote:
Originally Posted by MTK358
You mean in the actual log file?
Then post a sample, because my solution relies on the idea that there's one "name: value" statement per line.
|
Yes, the fields can have more than a line of text. In fact, hundreds of lines may go under, for example, "Problems Reported:".
cat 09022011-IT.log (Original Log File)
User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.
Above, I have given only 4 fields. There are actually more of them. But that would not make any difference because if we can work with those few fields then we can work with any number of fields.
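Since the field names are known in advance, one possible way to cope with multi-line values and with colons inside a value is to bold a prefix only when the line starts with one of those names. The following is a rough sketch, not a finished solution: the helper name fields_to_html is made up, and the list of names is an assumption taken from the four fields shown above, so it would have to be extended to cover the real set.

```shell
#!/bin/sh
# fields_to_html: hypothetical helper. Reads a log on stdin and treats a line
# as the start of a field only when it begins with one of the known names
# followed by a colon; every other line is printed as plain continuation text.
# The list of names is an assumption based on the sample log in this thread.
fields_to_html() {
    awk -v names='User ID|Group|Shift Time|Problems Reported' '
        function esc(s) {                 # minimal HTML escaping
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            return s
        }
        BEGIN { pat = "^(" names "):" }   # anchored: name must start the line
        {
            if (match($0, pat)) {
                name  = substr($0, 1, RLENGTH)
                value = substr($0, RLENGTH + 1)
                printf "<b>%s</b>%s<br />\n", esc(name), esc(value)
            } else {
                printf "%s<br />\n", esc($0)
            }
        }'
}

printf '%s\n' 'User ID: XYZ' \
              'Problems Reported: Error: disk full' \
              'on host A & host B' | fields_to_html
```

With this approach a line such as "Error: disk full" inside a long description is not mistaken for a field, because "Error" is not in the list of names, and continuation lines stay in normal weight.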