Formatting Fields and Text Being Displayed from Text File
I want to display the contents of a particular log file (a plain text file on Linux). The problem is that the contents need to be presented in a fixed format. Have a look at this log file:
So, while displaying the contents of the above file on a web page, I want to format the field names found in the log file: User Name:, Reported Problems Description:, and Remarks:. These fields may contain text of variable length, and they cannot be assumed to appear on any specific line number.
Well, what I am trying to do may sound weird to some of you. The field "Reported Problems Description:" can possibly contain text with embedded colons (:).
The sed script will turn the file into valid HTML and make the part before the first ":" on each line bold.
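For comparison, here is a minimal sketch of the same idea in Python rather than sed. The function name `bold_field_name` is made up for illustration. It builds new strings in memory, so the log file on disk is never modified, and it escapes the text first so that any "<" or "&" in the log is shown literally instead of being treated as a tag.

```python
import html

def bold_field_name(line: str) -> str:
    """Escape one log line for HTML and bold the text before the first ':'."""
    # partition() splits at the FIRST colon only, so later colons stay in the value.
    name, sep, rest = line.rstrip("\n").partition(":")
    if not sep:
        # No colon on this line: escape it and leave it unformatted.
        return html.escape(name)
    return f"<b>{html.escape(name)}:</b>{html.escape(rest)}"
```

For instance, `bold_field_name("Remarks: see <syslog>: disk full")` gives `<b>Remarks:</b> see &lt;syslog&gt;: disk full`.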
That is good, but it changes the source log file itself. Of course, that is what "-i" does. There is another problem, though: when I display the contents of the log file on a page, some extra tags are displayed literally.
For example: <b><User Name
Of course, your sed script does turn the field names bold, and the rest of the text remains unaffected when displayed on a web page. But, as I said earlier, some extra tags are also displayed along with it. I have closed the web page; otherwise I would show you the exact output, but it is similar to the example above.
This reads all /path/to/it-logs/DDMMYYYY-IT.log files (in their raw form unprocessed by any scripts), sorts them latest first, then outputs them each in a separate table:
The first foreach loop parses the file name, /path/to/it-logs/DDMMYYYY-IT.log, creates a Unix timestamp (seconds since the epoch) based on the date, and adds the file name into a new array keyed by the timestamps.
krsort() sorts the file names based on the timestamps, latest first.
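The two steps above can be sketched in Python (not the PHP of the post) roughly as follows. The function name `sort_logs_latest_first` and the regular expression are assumptions for illustration; only the DDMMYYYY-IT.log naming comes from the thread. Note that file names that do not match the pattern are skipped silently in this sketch.

```python
import re
from datetime import datetime

# Assumed pattern for names such as 09022011-IT.log (DDMMYYYY-IT.log).
NAME_RE = re.compile(r"(\d{2})(\d{2})(\d{4})-IT\.log$")

def sort_logs_latest_first(paths):
    """Key each log file by the Unix timestamp of its date, newest first."""
    by_time = {}
    for path in paths:
        match = NAME_RE.search(path)
        if not match:
            continue  # not a DDMMYYYY-IT.log name
        day, month, year = map(int, match.groups())
        try:
            stamp = datetime(year, month, day).timestamp()
        except ValueError:
            continue  # digits that do not form a real date, e.g. 99992011
        by_time[stamp] = path
    # Sorting the keys in reverse order plays the role of krsort().
    return [by_time[stamp] for stamp in sorted(by_time, reverse=True)]
```

The list of paths could come from something like `glob.glob("/path/to/it-logs/*-IT.log")`.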
The second foreach loop processes each log file. The file() function reads the file as an array of lines.
The inner foreach loop processes each line from the current log file. The explode() function splits the line into two parts at the first colon (':').
The htmlentities() function is used to display the strings correctly in HTML. Since you're using Linux, your log files are most likely UTF-8, so I used that.
The rest is just pretty-printing. trim() is used to trim leading and trailing whitespace; it is invisible in the end result, but the output HTML looks prettier.
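As a rough Python equivalent of the per-line processing described above (split at the first colon, trim, escape, emit a table row), something like the following should work. The function name `log_to_table` and the exact table markup are assumptions; skipping lines that contain no colon is also an assumption of this sketch, not necessarily what the PHP version does.

```python
import html

def log_to_table(lines):
    """Render 'name: value' lines as an HTML table, one row per line."""
    rows = []
    for line in lines:
        # Split at the first colon only, like explode(':', $line, 2).
        name, sep, value = line.partition(":")
        if not sep:
            continue  # assumption: lines without a colon are skipped
        rows.append(
            "  <tr><th>{}</th><td>{}</td></tr>".format(
                html.escape(name.strip()), html.escape(value.strip())
            )
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

The lines can be read with `open(path, encoding="utf-8")`, which matches the UTF-8 assumption mentioned above.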
You'll probably notice that since the file name array is keyed by the Unix timestamp, you can easily add query parameters limiting the output to desired dates. I recommend using the strtotime() function to parse any query start and end date parameters.
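A sketch of such a date filter in Python might look like this. Here `datetime.strptime` stands in for PHP's strtotime(), and the `YYYY-MM-DD` parameter format and the name `filter_by_date` are assumptions made for the example.

```python
from datetime import datetime

def filter_by_date(logs_by_time, start=None, end=None):
    """Keep only entries whose timestamp key lies within [start, end].

    logs_by_time maps Unix timestamp -> file name.
    start and end are 'YYYY-MM-DD' strings, or None for an open bound.
    """
    lo = datetime.strptime(start, "%Y-%m-%d").timestamp() if start else float("-inf")
    hi = datetime.strptime(end, "%Y-%m-%d").timestamp() if end else float("inf")
    return {t: name for t, name in logs_by_time.items() if lo <= t <= hi}
```

Both bounds are inclusive, so a log dated exactly on the start or end day is kept.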
Hope this helps,
Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 07:04 AM.
User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.
I appreciate your effort and willingness to come up with solutions. I will check out your devised script and will let you know how it works for me.
The main question is: how to extract pieces of data from a text file which has varied field types. The good news is that we know the field names, or can identify them, so we can provide an enhanced view of the data to the user. In the present example, the text file does not have fields delimited by a particular character or by whitespace.
If there is a better way of organizing these items in a text file, please, do let me know. I am simply putting each item on a separate line but some of the items can consist of more than one line.
Okay, it works, but not completely. For example, there are 5 log files and it displays only 3 of them. Secondly, sometimes the value is displayed in the left column along with the field name:
Reported Problems
Nothing works fine in this company. Blah sdjfkhsd kfhsd fh sdjkfhs djkfh sdjkf sd fsh dfkjh sdkfj sdjkh fjkd sfjk sdkf sd fsd fsdfNone
That way, both the field name and the value are in bold face. Fields whose values are no longer than one line are displayed properly.
Well, you have done very well, I must say. You have also provided a good layout for the report. Thank you!
Then post a sample, because my solution relies on the idea that there's one "name: value" statement per line.
Yes, the fields can have more than a line of text. In fact, hundreds of lines may go under, for example, "Problems Reported:".
cat 09022011-IT.log (Original Log File)
User ID: XYZ
Group: IT
Shift Time: First
Problems Reported: All the servers are down. The company is going to lose its business. We are helpless and cannot do anything. The company should have hired more intelligent engineers.
Above, I have given only 4 fields. There are actually more of them. But that would not make any difference because if we can work with those few fields then we can work with any number of fields.
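Since the field names are known in advance, one possible way to handle multi-line values is to start a new field only when a line begins with a known name followed by a colon, and to treat every other line as a continuation of the previous field. The Python sketch below illustrates that idea; the function name `parse_fields` is made up, and dropping blank lines is an assumption. A side effect is that colons inside a value are harmless, because "Error" or "10" is not a known field name.

```python
def parse_fields(lines, field_names):
    """Group log lines into (name, value) pairs, allowing multi-line values."""
    fields = []  # [name, value] pairs, in file order
    for raw in lines:
        line = raw.rstrip("\n")
        name, sep, rest = line.partition(":")
        if sep and name.strip() in field_names:
            # A known field name starts a new field.
            fields.append([name.strip(), rest.strip()])
        elif fields and line.strip():
            # Anything else continues the previous field (blank lines are dropped).
            fields[-1][1] = (fields[-1][1] + "\n" + line.strip()).lstrip("\n")
    return [(name, value) for name, value in fields]
```

With the sample above, the set of names would be `{"User ID", "Group", "Shift Time", "Problems Reported"}`; more fields only mean more entries in that set.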