LinuxQuestions.org
03-13-2010, 01:52 PM   #1
nze
LQ Newbie
 
Registered: Mar 2010
Posts: 5

Rep: Reputation: 6
Tricky text file parsing [portability?]


Hi,

I'm trying to extract information from text logs, but the task is a bit non-trivial.
First of all, I'd like to do it in a portable way. Personally I'm in the Linux world, but I'd like to share the script with a community that I know consists largely of Windows users.

So here's an idea of what the raw logs look like:
Code:
["class1"] Name1
noise
noise 70 noise
5 foo
noisenoise
bar 200
noise
["class2"] Name2.1

["class3"] Name3
noise
foobar
foobar
foobar
["class2"] Name2.1

["class2"] Name2.2
The parsing itself can be broken down into the following points:
  1. getting rid of some useless information, e.g. deleting all the lines matching a certain regex
  2. counting all the class2 items with similar names and writing one line that includes the name and number of occurrences
  3. sorting according to classes and names
  4. this is where it gets a bit challenging: getting rid of 'static' data in class1
The last point requires some further explanation:
I want to match entries from class1 with some .lua configuration files. There are about 10 files, and every class1 entry has a small configuration matching its name. I need to compare the data under each Name and keep only the lines containing numeric values in the ranges specified in the configuration. Here's a snippet of a .lua config:

Code:
Name1
foobar 1-10 
foobar 100-300
Unfortunately, foo and bar are not found in the config; I just have foobar there, so the matching has to be done according to the value ranges, which are always of the form '[0-9]\{2,3\}-[0-9]\{2,3\}'.

The output of the log above would then look like this:
Code:
["class1"] Name1
5 foo
bar 200

["class2"] Name2.1 * 2
["class2"] Name2.2 * 1

["class3"] Name3
foobar
foobar
foobar
So, how do I get started? As I mentioned earlier, I'd prefer not to rely on typical Linux tools like sed. Maybe Python could do the job? If so, please give me some hints on what to do and where to look.

Thanks for reading, and for any help!
 
03-13-2010, 02:06 PM   #2
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 453
Quote:
Originally Posted by nze
Hi,

...

So, how do I get started? As I mentioned earlier, I'd prefer not to rely on typical Linux tools like sed. Maybe Python could do the job? If so, please give me some hints on what to do and where to look.

Thanks for reading, and for any help!
There is, of course, Perl for Linux and there is Strawberry Perl for Windows. The beauty of the latter Perl is that it also comes as a self-sufficient tree which can be put into any directory and run from there, i.e. one doesn't even need to have Administrator privileges under Windows to be able to run Strawberry Perl.

Look for Portable Strawberry Perl in order to get the version not requiring Administrator privileges/installation.

Last edited by Sergei Steshenko; 03-14-2010 at 08:51 AM.
 
03-13-2010, 11:56 PM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by nze
Hi,
Maybe Python could do the job?
Yes, of course. It's portable as well.

Quote:
If so, please give me some hints what to do/where to look.
First, you should learn about it if you have not already. See my sig.


As for your questions:
Quote:
The parsing itself can be broken down into the following points:
  1. getting rid of some useless information, eg deleting all the lines matching certain regex
Python has very good string manipulation capabilities, so a regex is often not needed. For example, to delete all lines containing a certain pattern, in the simplest case:

Code:
import fileinput

# Print only the lines of "file" that do not contain the substring "pattern".
for line in fileinput.input("file", inplace=0):
    if "pattern" not in line:
        # Drop only the trailing newline; print() adds one back.
        print(line.rstrip("\n"))
In the above, the fileinput module can also do "inplace" editing (much like sed's -i option): change inplace to 1 and the printed lines are written back to the file instead of to the screen.
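
If a plain substring test is not enough, the same idea works with the re module. This is only a minimal sketch, assuming Python 3; the pattern and the helper name drop_matching are made up for illustration and would need adjusting to the real logs:

```python
import re

# Hypothetical pattern: any line that starts with "noise".
NOISE = re.compile(r"^noise")

def drop_matching(lines, pattern=NOISE):
    """Return the lines that do NOT match the pattern."""
    return [line for line in lines if not pattern.search(line)]

sample = ["noise", "noise 70 noise", "5 foo", "noisenoise", "bar 200"]
print(drop_matching(sample))  # ['5 foo', 'bar 200']
```

Compiling the pattern once and passing it in keeps the filter reusable for several different "useless" line types.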

Quote:
  2. counting all the class2 items with similar names and writing one line that includes the name and number of occurrences
There are data structures, e.g. lists and dictionaries, that let you count things. For example, once you have put your items into a list, you can use the len() function to count them:
Code:
mylist = []
mylist.append("one")
mylist.append("two")
print(len(mylist))  # prints 2
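
For counting how often each name occurs, a dictionary is probably a closer fit than len() on a single list. A rough sketch, assuming Python 3 and header lines shaped like the ones in the sample log; the regex and the helper name count_class are assumptions for illustration:

```python
import re
from collections import Counter

# Assumed header shape, taken from the sample log: ["class2"] Name2.1
HEADER = re.compile(r'^\["(?P<cls>[^"]+)"\]\s+(?P<name>\S.*)$')

def count_class(lines, cls="class2"):
    """Count how many times each name appears under the given class."""
    counts = Counter()
    for line in lines:
        m = HEADER.match(line.strip())
        if m and m.group("cls") == cls:
            counts[m.group("name")] += 1
    return counts

log = ['["class1"] Name1', '["class2"] Name2.1', '["class3"] Name3',
       '["class2"] Name2.1', '["class2"] Name2.2']
for name, n in sorted(count_class(log).items()):
    print('["class2"] %s * %d' % (name, n))
```

With the sample headers this prints one line per distinct class2 name, in the "Name * count" shape asked for in the question.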
Quote:
  3. sorting according to classes and names
Python has the list.sort() method and the sorted() function to do this.
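
For instance, if each entry is kept as a (class, name) tuple, sorted() with a key orders by class first and name second. A small sketch assuming Python 3; the tuple layout is just one possible way to hold the records:

```python
# Each record is assumed to be a (class, name) tuple.
records = [("class3", "Name3"), ("class2", "Name2.2"),
           ("class1", "Name1"), ("class2", "Name2.1")]

# sorted() returns a new list; list.sort() would sort in place instead.
by_class_then_name = sorted(records, key=lambda rec: (rec[0], rec[1]))

for cls, name in by_class_then_name:
    print('["%s"] %s' % (cls, name))
```

Note that this compares strings character by character, so "class10" would sort before "class2"; a numeric key would be needed if the class numbers go past 9.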

These are just the tip of the iceberg. To get the real deal, head over to the Python documentation site and learn about it.
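
Point 4 of the question (keeping only the class1 lines whose values fall in a configured range) is not covered above. One possible approach, sketched under assumptions: read the ranges from the config lines with a regex, then keep a log line if any number on it lies inside any range. This assumes Python 3; parse_ranges and keep_in_range are hypothetical helper names, and the regex accepts 1 to 3 digits so that the "1-10" line from the sample config matches:

```python
import re

# Assumption: a range looks like "1-10" or "100-300" (1 to 3 digits each side).
RANGE = re.compile(r"\b(\d{1,3})-(\d{1,3})\b")
NUMBER = re.compile(r"\b\d+\b")

def parse_ranges(config_lines):
    """Collect (low, high) pairs from config lines such as 'foobar 100-300'."""
    ranges = []
    for line in config_lines:
        for low, high in RANGE.findall(line):
            ranges.append((int(low), int(high)))
    return ranges

def keep_in_range(data_lines, ranges):
    """Keep lines holding at least one number that lies inside some range."""
    kept = []
    for line in data_lines:
        numbers = [int(n) for n in NUMBER.findall(line)]
        if any(low <= n <= high for n in numbers for low, high in ranges):
            kept.append(line)
    return kept

config = ["Name1", "foobar 1-10", "foobar 100-300"]
data = ["noise", "noise 70 noise", "5 foo", "noisenoise", "bar 200", "noise"]
print(keep_in_range(data, parse_ranges(config)))  # ['5 foo', 'bar 200']
```

On the sample data this drops "noise 70 noise" because 70 is in neither range, which matches the expected output in the question. Real .lua files will likely need a more careful parser than a single regex.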
 
  


All times are GMT -5.
