Hi,
I'm trying to extract information from text logs, but the task is a bit non-trivial.
First of all, I'd like to do it in a portable way. Personally I'm in the Linux world, but I'd like to share the script with a community that I know consists largely of Windows users.
So here's an idea of what the raw logs look like:
Code:
["class1"] Name1
noise
noise 70 noise
5 foo
noisenoise
bar 200
noise
["class2"] Name2.1
["class3"] Name3
noise
foobar
foobar
foobar
["class2"] Name2.1
["class2"] Name2.2
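Since Python scripts run unchanged on Windows and Linux, it fits the portability requirement. A natural first step is to split the log into entries. This is a minimal sketch, assuming every entry starts with a header of the form `["class"] Name` as in the sample, and that all following lines up to the next header belong to that entry. `parse_blocks` is a hypothetical helper name.

```python
import re

# Header lines look like: ["class1"] Name1  (format inferred from the sample log)
HEADER_RE = re.compile(r'^\["(?P<cls>[^"]+)"\]\s+(?P<name>.+)$')


def parse_blocks(lines):
    """Group log lines into (class, name, body_lines) tuples.

    Every line up to the next header belongs to the current entry;
    anything before the first header is ignored.
    """
    blocks = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = HEADER_RE.match(line)
        if match:
            blocks.append((match.group("cls"), match.group("name").strip(), []))
        elif blocks:
            # The body list is mutable, so it can be extended in place.
            blocks[-1][2].append(line)
    return blocks


sample = '''["class1"] Name1
noise
noise 70 noise
5 foo
noisenoise
bar 200
noise
["class2"] Name2.1
["class3"] Name3
noise
foobar
foobar
foobar
["class2"] Name2.1
["class2"] Name2.2'''

for cls, name, body in parse_blocks(sample.splitlines()):
    print(cls, name, body)
```

Once the log is a list of tuples, each of the following steps becomes an ordinary list operation.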
The parsing itself can be broken down into the following points:
- getting rid of some useless information, e.g. deleting all lines matching certain regexes
- counting all class2 items with identical names and writing one line that includes the name and the number of occurrences
- sorting by class and then by name
- this is where it gets a bit challenging: getting rid of 'static' data in class1
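The first three points can be sketched as below, working on a list of `(class, name, body_lines)` tuples. `NOISE_PATTERNS`, `drop_noise` and `summarize` are hypothetical names, and the `^noise` pattern is only a placeholder that happens to strip the sample's noise lines; the real patterns depend on the actual logs. The sketch also assumes class2 entries have no body lines worth keeping, as in the sample.

```python
import re
from collections import Counter

# Hypothetical noise patterns: replace with the ones the real logs need.
NOISE_PATTERNS = [re.compile(r"^noise")]


def drop_noise(body):
    """Remove every line that matches one of NOISE_PATTERNS."""
    return [line for line in body
            if not any(p.search(line) for p in NOISE_PATTERNS)]


def summarize(blocks, counted_class="class2"):
    """Collapse repeated counted_class entries into 'Name * n' lines,
    filter noise from the other entries, and sort by (class, name)."""
    counts = Counter(name for cls, name, _ in blocks if cls == counted_class)
    # Each entry: (class, name, header line, body lines)
    entries = [(cls, name, f'["{cls}"] {name}', drop_noise(body))
               for cls, name, body in blocks if cls != counted_class]
    entries += [(counted_class, name, f'["{counted_class}"] {name} * {n}', [])
                for name, n in counts.items()]
    out = []
    # Sort on the raw class and name only, not on the formatted header.
    for _, _, header, body in sorted(entries, key=lambda e: (e[0], e[1])):
        out.append(header)
        out.extend(body)
    return out


blocks = [
    ("class1", "Name1", ["noise", "noise 70 noise", "5 foo",
                         "noisenoise", "bar 200", "noise"]),
    ("class2", "Name2.1", []),
    ("class3", "Name3", ["noise", "foobar", "foobar", "foobar"]),
    ("class2", "Name2.1", []),
    ("class2", "Name2.2", []),
]
print("\n".join(summarize(blocks)))
```

Names are compared as plain strings here, so `Name10` would sort before `Name2`; a "natural" order would need a custom sort key.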
The last point requires some further explanation:
I want to match entries from class1 against some .lua configuration files. There are about 10 files, and every class1 entry has a small configuration matching its name. For each Name I need to check its data lines and keep only those containing numeric values within the ranges specified in the configuration. Here's a snippet of a .lua config:
Code:
Name1
foobar 1-10
foobar 100-300
Unfortunately, foo and bar don't appear in the config; I just have foobar there, so the matching has to be done by value range. The ranges always have the form '[0-9]\{1,3\}-[0-9]\{1,3\}'.
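Matching by value range could look roughly like this. It is a sketch under two assumptions: the config files follow the simplified layout of the snippet above (a name line followed by lines such as `foobar 1-10`), whereas real .lua files may contain Lua tables that need different parsing; and the values in the log are integers. `parse_ranges` and `filter_by_ranges` are hypothetical names.

```python
import re

RANGE_RE = re.compile(r"(\d+)-(\d+)")   # e.g. 'foobar 1-10' -> ('1', '10')
NUMBER_RE = re.compile(r"\d+")          # integers only; '2.5' would be read as 2 and 5


def parse_ranges(config_text):
    """Map each entry name to its list of (low, high) ranges.

    Assumes the simplified layout of the config snippet in the question:
    a line with just the name, then lines containing ranges.
    """
    ranges = {}
    current = None
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and Lua comments.
        if not line or line.startswith("--"):
            continue
        found = RANGE_RE.findall(line)
        if not found:
            current = line                      # a line without ranges starts a new entry
            ranges.setdefault(current, [])
        elif current is not None:
            ranges[current].extend((int(lo), int(hi)) for lo, hi in found)
    return ranges


def filter_by_ranges(body, ranges):
    """Keep only lines containing at least one number inside any range."""
    return [line for line in body
            if any(lo <= int(n) <= hi
                   for n in NUMBER_RE.findall(line)
                   for lo, hi in ranges)]


config = "Name1\nfoobar 1-10\nfoobar 100-300\n"
ranges = parse_ranges(config)
body = ["noise", "noise 70 noise", "5 foo", "noisenoise", "bar 200", "noise"]
print(filter_by_ranges(body, ranges["Name1"]))   # ['5 foo', 'bar 200']
```

Because the filter only looks at numbers, it does not matter that the config says foobar while the log says foo or bar; lines without any number (like the noise lines) are dropped as well.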
The output of the log above would then look like this:
Code:
["class1"] Name1
5 foo
bar 200
["class2"] Name2.1 * 2
["class2"] Name2.2 * 1
["class3"] Name3
foobar
foobar
foobar
So, how do I get started? As I mentioned, I'd prefer not to rely on typical Linux tools like sed. Could Python do the job? If so, please give me some hints on what to do and where to look.
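For the portability part, Python's standard library already hides the Windows/Linux differences in file handling. This is a minimal sketch of reading the log and all config files; `read_lines` and `load_configs` are hypothetical names, and UTF-8 is an assumed encoding (Windows-generated logs may use e.g. cp1252 instead).

```python
from pathlib import Path


def read_lines(path):
    """Read a text file as a list of lines on both Windows and Linux.

    Text mode normalizes Windows line endings; errors="replace" keeps
    going if a byte does not decode as UTF-8 (an assumed encoding).
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read().splitlines()


def load_configs(config_dir, pattern="*.lua"):
    """Return {file name: list of lines} for every config file in config_dir.

    pathlib builds paths with the right separator for the current OS.
    """
    return {p.name: read_lines(p) for p in sorted(Path(config_dir).glob(pattern))}
```

Usage would be something like `configs = load_configs("configs")` and `log_lines = read_lines("game.log")`, where both paths are placeholders.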
Thanks for reading, and thanks in advance for any help!