Quote:
Originally Posted by fantas
If the OP would have wanted a _good_ answer (maybe he's found a sufficient answer already here ?) then he'd have got back into this thread.
I got sidetracked.
I set up a brute-force method as a temporary solution so that I could get something working. Getting to that point required a lot more coding, in which I got lost. Now I'm back.
For the record, I'm looking for words, not just "ab bc cd" etc. That said, there are probably 50 or so keywords I'm parsing for. They all start with a capital letter, and probably not every letter of the alphabet is used. I like the idea of tokenizing and throwing away the things I don't need.
There's really no typical file (or string) size I'm processing. File sizes for this app fall on a bell curve anywhere from 50 or so bytes up to 1-2GB, with the peak of the bell around 5-10MB, I'd say. Much of that data is numeric, kind of like arguments for the keyword. For instance, here's a sample:
Code:
##CreationDate Wed Feb 20 19:25:54 2008
version 3.04
Declare "resource" "string"
Declare "dirmap" "string"
Declare "minmax" "int"
Declare "serverresource" "string"
AttributeBegin
ResourceBegin
Attribute "identifier" "name" ["some_stuff"]
ConcatTransform [1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1]
ShadingInterpolation "smooth"
IfBegin "!defined(RATFilterLightSource)"
IfEnd
IfBegin "!defined(RATFilterSurface)"
Color [0 0 1]
Opacity [1 1 1]
Surface "mtorLambert" "float refractiveIndex" [1] "float diffuseCoeff" [0.8] "color ambientColor" [0 0 0] "color incandescence" [0 0 0] "float translucenceCoeff" [0] "float glowIntensity" [0]
IfEnd
PointsGeneralPolygons [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1] [4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4] [1 21 20 0 2
22 21 1 3 23 22 2 4 24 23 3 5 25 24 4 6 26 25 5 7 27 26 6] "P" [20.6299 183.861 -736.018 21.2625 184.402 -737.719 22.2606 185.245 -739.074] "facevarying normal N" [-0.655181 -0.491973 -0.573324 -0.686153 -0.448746 -0.572557]
ResourceEnd
AttributeEnd
... in this example, what I'm looking for (at the moment) is the "PointsGeneralPolygons" keyword followed by its arguments. The issue is that "PointsGeneralPolygons" is not the only keyword. There are many, which I'll be adding as I add features (one group of which is "facevarying normal N", as you can see here), and their order is important. So not only would I rather not parse the whole file/string once per keyword, but I also have to go in order, since some of the keywords relate to transformation matrices which affect the polygons.
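To make the "one pass, in order" requirement concrete, here is a minimal sketch, written in Python purely for illustration. It assumes that keywords are bare alphabetic words starting with a capital letter, that `#` starts a comment running to the end of the line, and that quoted strings contain no embedded quotes. The names `WANTED`, `TOKEN_RE`, `is_keyword` and `parse` are made up for this example, and the keyword set is just a placeholder.

```python
import re

# Hypothetical set of keywords to keep; everything else is thrown away.
WANTED = {"PointsGeneralPolygons", "ConcatTransform"}

# One token is: a comment to end of line, a quoted string, a bracket,
# or a run of any other non-space characters.
TOKEN_RE = re.compile(r'#[^\n]*|"[^"]*"|[\[\]]|[^\s\[\]"#]+')


def is_keyword(tok):
    # Assumption from the post: keywords are bare words starting with a
    # capital letter. A lowercase word such as "version" is therefore
    # treated as an argument of whatever keyword precedes it.
    return tok[0].isupper() and tok.isalpha()


def parse(text, wanted=WANTED):
    """Yield (keyword, args) for wanted keywords, in file order, in one pass."""
    keyword, args = None, []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        if tok.startswith("#"):
            continue  # skip comments entirely
        if is_keyword(tok):
            # A new keyword closes the previous one.
            if keyword in wanted:
                yield keyword, args
            keyword, args = tok, []
        elif keyword in wanted:
            # Only collect arguments for keywords we care about.
            args.append(tok)
    if keyword in wanted:
        yield keyword, args
```

Because the text is scanned once and results come out in file order, a transform can be applied before the polygons that follow it. Supporting a new feature would then mostly mean adding a name to the keyword set rather than adding another scan over the data.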
The machines this will run on are pretty stout, with at least 2GB of memory (and often 4), not all of which is devoted to my app (this is all eventually drawing to a GL window, so a decent amount of memory is used up there too). I like the idea of chunking the raw data into smaller pieces for processing to save memory, though that should only be necessary in the extreme cases.
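For the chunking idea, the main thing to get right is that a token must not be split across two chunks. Here is a rough sketch, again in Python only for illustration. It assumes that no quoted string or comment spans a line break, so it only tokenizes up to the last newline seen and carries the remainder into the next read. The names `TOKEN_RE` and `iter_tokens` and the 1MB default chunk size are made up for this example.

```python
import io
import re

# Same token shapes as in the sample data: comment, quoted string,
# bracket, or a run of other non-space characters.
TOKEN_RE = re.compile(r'#[^\n]*|"[^"]*"|[\[\]]|[^\s\[\]"#]+')


def iter_tokens(stream, chunk_size=1 << 20):
    """Yield tokens from a text stream, reading chunk_size characters at a time."""
    carry = ""  # text after the last newline, not yet tokenized
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        carry += chunk
        cut = carry.rfind("\n") + 1
        if cut == 0:
            continue  # no complete line yet, keep reading
        # Tokenize only the complete lines; the tail may hold a partial token.
        for match in TOKEN_RE.finditer(carry, 0, cut):
            yield match.group()
        carry = carry[cut:]
    # Whatever is left after the final read is a complete (last) line.
    for match in TOKEN_RE.finditer(carry):
        yield match.group()


# Small demonstration on an in-memory stream with a deliberately tiny chunk size.
demo = io.StringIO('Color [0 0 1]\nOpacity [1 1\n1]\n')
tokens = list(iter_tokens(demo, chunk_size=8))
```

Memory use stays around one chunk plus the longest single line. If a file puts a huge argument list on one line, the carried text grows until that line ends, which is still correct but saves less.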
I don't know if this clears things up or makes any difference. After all the effort put into the replies, I just thought I'd put some effort into an explanation and a thanks.