Programming
This forum is for all programming questions. The question does not have to be directly related to Linux and any language is fair game.
I'm thinking I want to use some form of code to make a single pass at /var/log/mail.log, which is usually under 150 MB, and do pattern matching to append output to separate new *.txt files.
To me, the "best" way is the fastest. Right now I'm thinking grep, but I'm looking for suggestions on better ways to tackle this. For example, is a single pass over the file programmatically better than making multiple passes, each looking for one of the discrete elements?
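For a concrete sense of the grep route: each pattern gets its own pass over the file. A minimal sketch, where the Postfix-style patterns, output file names, and sample log lines are all illustrative assumptions, not taken from the actual log:

```shell
# Stand-in for /var/log/mail.log, with invented Postfix-style lines.
LOG=mail.log
printf '%s\n' \
  'May  1 10:00:01 mx postfix/smtp[123]: ABC: to=<a@example.com>, status=sent (250 ok)' \
  'May  1 10:00:02 mx postfix/smtp[124]: DEF: to=<b@example.com>, status=bounced (550)' \
  'May  1 10:00:03 mx postfix/smtpd[125]: NOQUEUE: reject: RCPT from unknown' > "$LOG"

# One grep per category: simple, but each is a separate pass over the file.
grep 'status=sent'     "$LOG" >> sent.txt      # pass 1
grep 'status=bounced'  "$LOG" >> bounced.txt   # pass 2
grep 'NOQUEUE: reject' "$LOG" >> rejected.txt  # pass 3
```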
Quote:
Originally Posted by elfoozo
...
Hi,
I would suggest using Perl. Perl is very fast and powerful at parsing lines, e.g. to distribute and reformat lines to make them more readable for humans.
OK, two recommendations for perl so far, I can live with that. Is doing a single pass best practice? Or should I be sweeping the file multiple times to glean the criteria I am after? Or should I be creating TMP files with subsets of data and assembling those into a finished file? I know each can be done, I'm just wondering what is proper etiquette for future changes and maintainability.
Quote:
Originally Posted by elfoozo
...
Are you wanting to grab specific lines or fragments associated with an individual message: start...this ... that ... end ... and so on?
Are you wanting to see what exists in your file right now or do you want a report on whatever might be in the entire file?
grep might work well for a one time grab of lines matching some pattern.
If you want to scan this log file routinely, and slice and dice its content into a periodic report, I'd suggest sed and awk for the totally geek effort. perl might offer a more satisfying implementation for a manage-my-workstation application.
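The awk route can do all the splitting in a single pass, routing each line to its output file as it is read. A sketch under the same kind of assumed, illustrative patterns (sample data invented here):

```shell
# Invented sample log standing in for /var/log/mail.log.
printf '%s\n' \
  'id=1 status=sent' \
  'id=2 status=bounced' \
  'id=3 NOQUEUE: reject' \
  'id=4 status=sent' > mail.log

# One pass: awk tests each line against every pattern and appends
# matches to the corresponding output file.
awk '
  /status=sent/     { print >> "sent.txt";     next }
  /status=bounced/  { print >> "bounced.txt";  next }
  /NOQUEUE: reject/ { print >> "rejected.txt" }
' mail.log
```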
I haven't fully defined all the criteria I'm going to parse for yet, but the idea is to produce a collection of files that are less overwhelming to consume than the raw mail log. One file might contain all items blocked from a domain. Another might contain all items accepted for one domain or one user. They will also have totals... stuff like that.
...
should I be sweeping the file multiple times to glean the criteria I am after? Or should I be creating TMP files with subsets of data and assembling those ...
We might be able to make more effective recommendations, if we had an overview of what you want to accomplish. There might be an existing application [email has been around so long it is hard to imagine that something doesn't already do almost everything] that someone can tell you about.
Humans want and pay for benefits
Features deliver benefits
Components implement features
Applications are collections of required and optional components
What is the primary benefit of what you are wanting to accomplish?
I want to create a raw text (*.txt) file that contains XXXX and deliver that file to each of my end-users.
Which application features are needed to deliver that benefit?
I need a per-end-user file of raw text (*.txt) of their mail.log entries.
Which implementation details make each application feature possible?
I need to scan /var/log/mail.log and select XXX for each end-user.
How do I best implement those details?
(You might need to define "best" first, but ...) Perl [my opinion] will make it straightforward to implement those details.
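As a sketch of that last step, here is one hypothetical way to split a log into per-end-user files with awk, assuming recipient addresses appear in a Postfix-style to=<...> field (the field format and sample lines are assumptions, not from the thread's actual log):

```shell
# Invented sample log lines with a to=<user@domain> field.
printf '%s\n' \
  'to=<alice@example.com>, status=sent' \
  'to=<bob@example.com>, status=bounced' \
  'to=<alice@example.com>, status=sent' > mail.log

# Extract the local part of the recipient and append the whole line
# to <user>.txt; match() sets RSTART/RLENGTH for the matched span.
awk 'match($0, /to=<[^@>]+/) {
  user = substr($0, RSTART + 4, RLENGTH - 4)   # strip the "to=<" prefix
  print >> (user ".txt")
}' mail.log
```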
SaintDanBert, I hear what you're saying, all excellent points. Problem is, I don't know what I don't know (yet).
From my perspective, the recommendations that have come so far have been helpful because without much scope I didn't "hear use C++", etc. Granted that might be what I need to do later on but I feel I have more than what I came to the forum with; a path to explore. From here I can begin to formulate better questions as I uncover what I could want to deliver.
If you have a big file, use grep + awk: grep for its fast pattern-searching algorithm, and awk for processing/manipulating the text. Otherwise, awk alone is enough for file processing. If you only need to search a big file from the end, use tail.
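For example, tail limits how much of a big file grep has to read when only the most recent entries matter (the sample log and patterns here are invented for illustration):

```shell
# Build a sample log: ten old lines, then one recent reject.
for i in $(seq 1 10); do echo "line $i status=sent"; done > big.log
echo 'line 11 NOQUEUE: reject' >> big.log

# Only look at the last 5 lines instead of scanning the whole file.
tail -n 5 big.log | grep 'reject' > recent_rejects.txt
```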
"... apps always have a database whether they planned for it or not ..."
Quote:
Originally Posted by elfoozo
...
I feel I have more than what I came to the forum with; a path to explore. From here I can begin to formulate better questions as I uncover what I could want to deliver.
If you want to create reports or similar from the raw log file, you might load the raw file data into a "database" of some sort. This might work really well if you want to look at the details historically: "Show me all rejects from May of ought-four."
I put the word "database" in quotes because you don't need the heavy weight of a MySQL or Postgres to gain the benefits that database management software might offer to a data mining problem.
** read your log
** post to your "database"
** update your data index files
** search your "database" for details that match your current desire
** format a text file "report" based on your search results
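In that lightweight spirit, the "database" could even be a flat tab-separated file built from the log, with grep/awk as the query engine. A sketch using an invented (month, status, recipient) layout, not anything the thread actually specified:

```shell
# "Post to the database": three TSV records, one per log event.
printf '%s\t%s\t%s\n' \
  2004-05 reject a@example.com \
  2004-05 sent   b@example.com \
  2004-06 reject c@example.com > maildb.tsv

# "Search the database": all rejects from May of ought-four,
# then "format a report" by writing the matches to a text file.
awk -F'\t' '$1 == "2004-05" && $2 == "reject"' maildb.tsv > report.txt
```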
(grinning) We'll always have more suggestions than you want to use,
~~~ 0;-Dan
Definitely sounds like a job for Perl; pattern matching & text mangling is what it's really good at. You can probably do it in one pass as well, since you can open and close files on demand for each type of output.
Really, whether one pass or more is required depends on what kind of output you decide you want, e.g. overlapping record sets, i.e. records that satisfy more than one criterion.
Just FYI, Perl is 'compiled-on-the-fly' so it's pretty swift. http://www.perl.com/doc/FMTEYEWTK/comp-vs-interp.html
Based on what you're saying, you're basically looking for informational reports generated from the mail logs on your system. I definitely suggest Perl for this. If not Perl, then awk... but really, Perl is the easiest route here and the most expandable for future use with the least difficulty, unless you're already an awk expert.