LinuxQuestions.org
Old 03-23-2010, 12:15 PM   #1
elfoozo
Member
 
Registered: Feb 2004
Location: Washington, USA
Distribution: Debian
Posts: 265

Rep: Reputation: 32
Parsing a file?


I'm thinking I want to write something that makes a single pass over /var/log/mail.log, which is usually under 150 MB, does pattern matching, and appends the output to separate new *.txt files.

To me, the "best" way is the fastest. Right now I'm thinking grep, but I'm looking for suggestions on better ways to tackle this. For example, is a single programmatic pass over the file better than making multiple passes, one for each discrete element?
 
Old 03-23-2010, 12:35 PM   #2
rweaver
Senior Member
 
Registered: Dec 2008
Location: Louisville, OH
Distribution: Debian, CentOS, Slackware, RHEL, Gentoo
Posts: 1,833

Rep: Reputation: 167
Grep is fine if you're pulling a couple of distinct elements and want to discard the rest; otherwise I'd say you're better off moving to Perl.
 
1 member found this post helpful.
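For example, the grep route for a couple of distinct elements might look like this. The Postfix-style sample lines and the "status=" patterns are invented stand-ins for whatever your real mail.log contains:

```shell
# Invented sample in mail.log's general shape (not your real format):
cat > mail.log <<'EOF'
Mar 23 12:00:01 mx postfix/smtp[101]: to=<alice@example.com>, status=sent (250 ok)
Mar 23 12:00:02 mx postfix/smtp[102]: to=<bob@example.org>, status=bounced (550 unknown user)
Mar 23 12:00:03 mx postfix/smtp[103]: to=<carol@example.com>, status=sent (250 ok)
EOF

# One grep per distinct element, each writing its own *.txt file:
grep 'status=sent'    mail.log > sent.txt
grep 'status=bounced' mail.log > bounced.txt
```

Note that each grep is a separate full scan of the file.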
Old 03-23-2010, 12:40 PM   #3
winni
LQ Newbie
 
Registered: Jul 2005
Location: Olching, Germany (near Munich)
Distribution: Novell/SuSE 10.2
Posts: 3

Rep: Reputation: 1
parsing log files

Quote:
Originally Posted by elfoozo
I'm thinking I want to write something that makes a single pass over /var/log/mail.log, which is usually under 150 MB, does pattern matching, and appends the output to separate new *.txt files.

To me, the "best" way is the fastest. Right now I'm thinking grep, but I'm looking for suggestions on better ways to tackle this. For example, is a single programmatic pass over the file better than making multiple passes, one for each discrete element?
Hi,

I would suggest using Perl. Perl is very fast and powerful at parsing lines, e.g. to split out and reformat lines to make them more readable for humans.

Hope this helps,
Winfried
 
1 member found this post helpful.
Old 03-23-2010, 01:01 PM   #4
elfoozo
Member
 
Registered: Feb 2004
Location: Washington, USA
Distribution: Debian
Posts: 265

Original Poster
Rep: Reputation: 32
OK, two recommendations for Perl so far; I can live with that. Is doing a single pass best practice? Or should I be sweeping the file multiple times to glean the criteria I am after? Or should I be creating TMP files with subsets of data and assembling those into a finished file? I know each can be done; I'm just wondering what is proper etiquette for future changes and maintainability.
 
Old 03-23-2010, 01:01 PM   #5
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
grep should work fine; if you require something more complex you can go for Perl, but grep is probably faster. A single pass will be much faster.

Last edited by H_TeXMeX_H; 03-23-2010 at 01:02 PM.
 
1 member found this post helpful.
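As a sketch of the single-pass point: awk can route each line to the right output file as it is read, so the log is scanned only once. The sample lines and patterns below are invented:

```shell
# Invented Postfix-style sample data:
cat > mail.log <<'EOF'
Mar 23 12:00:01 mx postfix/smtp[101]: to=<alice@example.com>, status=sent (250 ok)
Mar 23 12:00:02 mx postfix/smtp[102]: to=<bob@example.org>, status=bounced (550 unknown user)
Mar 23 12:00:03 mx postfix/smtp[103]: to=<carol@example.com>, status=sent (250 ok)
EOF

# One read of the log; each line is appended to whichever file its pattern selects.
awk '/status=sent/    { print > "sent.txt" }
     /status=bounced/ { print > "bounced.txt" }' mail.log
```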
Old 03-23-2010, 02:06 PM   #6
SaintDanBert
Senior Member
 
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,771
Blog Entries: 3

Rep: Reputation: 108
Quote:
Originally Posted by elfoozo
I'm thinking I want to write something that makes a single pass over /var/log/mail.log, which is usually under 150 MB, does pattern matching, and appends the output to separate new *.txt files.
Are you wanting to grab specific lines or fragments associated with an individual message: start...this ... that ... end ... and so on?

Are you wanting to see what exists in your file right now or do you want a report on whatever might be in the entire file?

grep might work well for a one-time grab of lines matching some pattern.

If you want to scan this log file routinely and slice and dice its content into a periodic report, I'd suggest sed and awk for the totally-geek effort. Perl might offer a more satisfying implementation for a manage-my-workstation application.

Cheers,
~~~ 8d;-Dan
 
1 member found this post helpful.
Old 03-23-2010, 02:19 PM   #7
elfoozo
Member
 
Registered: Feb 2004
Location: Washington, USA
Distribution: Debian
Posts: 265

Original Poster
Rep: Reputation: 32
I haven't fully defined in my head all the criteria I'm going to parse yet, but the idea is to produce a collection of files that are less overwhelming to consume than the raw mail log. One file might contain all items blocked from a domain. Another might contain all items accepted for one domain or a user. They will also have totals... stuff like that.
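One sketch of how per-category files plus totals could fall out of a single awk pass; the "status=" patterns are invented stand-ins for whatever the MTA actually logs as blocked/accepted:

```shell
# Invented Postfix-style sample data:
cat > mail.log <<'EOF'
Mar 23 12:00:01 mx postfix/smtp[101]: to=<alice@example.com>, status=sent (250 ok)
Mar 23 12:00:02 mx postfix/smtp[102]: to=<bob@example.org>, status=bounced (550 unknown user)
Mar 23 12:00:03 mx postfix/smtp[103]: to=<carol@example.com>, status=sent (250 ok)
EOF

# Split into per-category files and keep running totals, printed once at the end.
awk '/status=sent/    { sent++;    print > "accepted.txt" }
     /status=bounced/ { bounced++; print > "blocked.txt" }
     END { printf "accepted=%d blocked=%d\n", sent, bounced }' mail.log
# prints: accepted=2 blocked=1
```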
 
Old 03-23-2010, 02:21 PM   #8
SaintDanBert
Senior Member
 
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,771
Blog Entries: 3

Rep: Reputation: 108
Quote:
Originally Posted by elfoozo
...
should I be sweeping the file multiple times to glean the criteria I am after? Or should I be creating TMP files with subsets of data and assembling those ...
We might be able to make more effective recommendations if we had an overview of what you want to accomplish. There might be an existing application [email has been around so long that it is hard to imagine something doesn't already do almost everything] that someone can tell you about.
  • Humans want and pay for benefits
  • Features deliver benefits
  • Components implement features
  • Applications are collections of required and optional components

What is the primary benefit of what you are wanting to accomplish?
I want to create a raw text (*.txt) file that contains XXXX and deliver that file to each of my end-users.

Which application features are needed to deliver that benefit?
I need a per-end-user file of raw text (*.txt) of their mail.log entries.

Which implementation details make each application feature possible?
I need to scan /var/log/mail.log and select XXX for each end-user.

How do I best implement those details?
(You might need to define "best" first, but ...) Perl [my opinion] will make it straightforward to
  • open a data file /var/log/mail.log
  • setup for each end-user
  • open a results file, mumble.txt
  • gather details for each end-user
  • close a results file
  • wrap-up this end-user and prepare for the next
  • reset data file for next end user
  • close the data file
 
1 member found this post helpful.
Old 03-23-2010, 02:39 PM   #9
elfoozo
Member
 
Registered: Feb 2004
Location: Washington, USA
Distribution: Debian
Posts: 265

Original Poster
Rep: Reputation: 32
SaintDanBert, I hear what you're saying, all excellent points. Problem is, I don't know what I don't know (yet).

From my perspective, the recommendations that have come so far have been helpful because without much scope I didn't "hear use C++", etc. Granted that might be what I need to do later on but I feel I have more than what I came to the forum with; a path to explore. From here I can begin to formulate better questions as I uncover what I could want to deliver.
 
Old 03-23-2010, 07:48 PM   #10
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by elfoozo
SaintDanBert, I hear what you're saying, all excellent points. Problem is, I don't know what I don't know (yet).

From my perspective, the recommendations that have come so far have been helpful because without much scope I didn't "hear use C++", etc. Granted that might be what I need to do later on but I feel I have more than what I came to the forum with; a path to explore. From here I can begin to formulate better questions as I uncover what I could want to deliver.
If you have a big file, use grep+awk: grep for its fast pattern-searching algorithm, awk for processing/manipulating the text. Otherwise, awk alone is enough for file processing. If you want to search a big file from the end, use tail.
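A sketch of that grep+awk split (grep prunes the big file quickly, awk reworks only the survivors), plus tail for looking only at the end. The sample lines and the field layout are invented:

```shell
# Invented Postfix-style sample data:
cat > mail.log <<'EOF'
Mar 23 12:00:01 mx postfix/smtp[101]: to=<alice@example.com>, status=sent (250 ok)
Mar 23 12:00:02 mx postfix/smtp[102]: to=<bob@example.org>, status=bounced (550 unknown user)
Mar 23 12:00:03 mx postfix/smtp[103]: to=<carol@example.com>, status=sent (250 ok)
EOF

# grep narrows the file fast; awk splits each surviving line on the assumed
# "to=<" and ">," delimiters to extract just the recipient address.
grep 'status=bounced' mail.log | awk -F'to=<|>,' '{ print $2 }' > bounced-addrs.txt

# To inspect only recent activity, seed the pipe with tail instead:
tail -n 100 mail.log | grep -c 'status=bounced'
# prints: 1
```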
 
Old 03-23-2010, 07:49 PM   #11
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by winni
Hi,

I would suggest using Perl. Perl is very fast and powerful at parsing lines, e.g. to split out and reformat lines to make them more readable for humans.

Hope this helps,
Winfried
To add to this list: so are Python, awk, grep, etc.
 
Old 03-24-2010, 01:10 PM   #12
SaintDanBert
Senior Member
 
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,771
Blog Entries: 3

Rep: Reputation: 108
"... apps always have a database whether they planned for it or not ..."

Quote:
Originally Posted by elfoozo
...
I feel I have more than what I came to the forum with; a path to explore. From here I can begin to formulate better questions as I uncover what I could want to deliver.
If you want to create reports or similar from the raw log file, you might load the raw data into a "database" of some sort. This could work really well if you want to look at the details historically: "Show me all rejects from May of ought-four". I put the word "database" in quotes because you don't need the heavy weight of a MySQL or Postgres to gain the benefits that database management software might offer to a data-mining problem.

** read your log
** post to your "database"
** update your data index files
** search your "database" for details that match your current desire
** format a text file "report" based on your search results

(grinning) We'll always have more suggestions than you want to use,
~~~ 0;-Dan
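As one hypothetical example of such a lightweight "database", those steps could use sqlite3 (a serverless, single-file database; a reasonably recent CLI is assumed). The sample lines are invented, and the month is taken from syslog's first field:

```shell
# Invented sample spanning two months:
cat > mail.log <<'EOF'
Mar 23 12:00:01 mx postfix/smtp[101]: to=<alice@example.com>, status=sent (250 ok)
May 04 09:10:11 mx postfix/smtp[102]: to=<bob@example.org>, status=bounced (550 unknown user)
EOF

# Read the log and post it to the "database": one row per line, keyed by month.
awk -v OFS='\t' '{ print $1, $0 }' mail.log > load.tsv
sqlite3 mail.db <<'EOF'
CREATE TABLE IF NOT EXISTS log(month TEXT, line TEXT);
.mode tabs
.import load.tsv log
EOF

# Search the "database" and format a text-file report: all rejects from May.
sqlite3 mail.db \
  "SELECT line FROM log WHERE month='May' AND line LIKE '%bounced%';" > may-rejects.txt
```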
 
Old 03-24-2010, 09:17 PM   #13
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
Definitely sounds like a job for Perl; pattern matching and text mangling are what it's really good at. You can probably do it in one pass as well, since you can open and close files on demand for each type of output.
Really, whether one pass or more is required depends on what kind of output you decide you want, e.g. overlapping record sets, i.e. records that satisfy more than one criterion.
Just FYI, Perl is 'compiled on the fly', so it's pretty swift:
http://www.perl.com/doc/FMTEYEWTK/comp-vs-interp.html
 
Old 03-26-2010, 03:25 PM   #14
rweaver
Senior Member
 
Registered: Dec 2008
Location: Louisville, OH
Distribution: Debian, CentOS, Slackware, RHEL, Gentoo
Posts: 1,833

Rep: Reputation: 167
Based on what you're saying, you're basically looking for informational reports generated from the mail logs on your system. I definitely suggest Perl for this. If not Perl, then awk... but really, Perl is the easiest route here and the most expandable for future use with the least difficulty, unless you're already an awk expert.
 
  

