LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 02-19-2008, 11:56 AM   #16
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled

strtok/strsep are essentially the same as the search we're trying to do (but strtok copies the data,) except with 1-char strings instead of multi-char ones. If you're going to do that, you might as well just go ahead and compare against the first char of every target string. I think the thinking needs to take place at the point of "know what you're looking for" rather than "know how to look efficiently because you haven't actually broken down what you're looking for."
ta0kira

Last edited by ta0kira; 02-19-2008 at 12:12 PM.
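One way to read the suggestion in post #16 is sketched below: at each position, compare the current character against the first character of every target string, and run a full comparison only when that cheap test passes. The keyword list and the function names (match_keyword_at, count_keywords) are hypothetical and only for illustration. The sketch matches substrings and does not check word boundaries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword list for illustration; not taken from the thread. */
static const char *const keywords[] = { "Color", "Opacity", "Surface" };
static const size_t keyword_count = sizeof keywords / sizeof keywords[0];

/* Return the index of the keyword that starts at s, or -1 if none does.
 * The first-character test rejects most candidates before strncmp runs. */
static int match_keyword_at(const char *s)
{
    for (size_t i = 0; i < keyword_count; i++) {
        if (s[0] != keywords[i][0])
            continue;
        size_t len = strlen(keywords[i]);
        if (strncmp(s, keywords[i], len) == 0)
            return (int)i;
    }
    return -1;
}

/* Count keyword occurrences in text in a single left-to-right pass. */
static size_t count_keywords(const char *text)
{
    size_t hits = 0;
    for (const char *p = text; *p != '\0'; ) {
        int k = match_keyword_at(p);
        if (k >= 0) {
            hits++;
            p += strlen(keywords[k]);   /* skip past the matched keyword */
        } else {
            p++;
        }
    }
    return hits;
}
```

With around 50 keywords, the linear loop over the table could be replaced by a 256-entry lookup indexed by first character, but that is an optimization on top of the same idea.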
 
Old 02-19-2008, 12:33 PM   #17
fantas
Member
 
Registered: Jun 2007
Location: Bavaria
Distribution: slackware, xubuntu
Posts: 143

Rep: Reputation: 22

Quote:
Originally Posted by ta0kira
<snip>
(but strtok copies the data,)
<snip>
That's not accurate, AFAIK. According to the C reference, strtok always works on the caller's buffer, and it does so destructively: it overwrites each delimiter it finds with a null character. But that's just another detail, and not really decisive for answering this thread.
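The in-place behaviour of strtok can be checked with a small sketch like the following. The helper name split_in_place is hypothetical; strtok itself is the standard C library function.

```c
#include <string.h>

/* Tokenize buf on spaces with strtok and return the number of tokens stored.
 * buf must be writable: strtok overwrites each delimiter it consumes with
 * '\0' and returns pointers into buf rather than copies. */
static size_t split_in_place(char *buf, char **tokens, size_t max_tokens)
{
    size_t n = 0;
    for (char *tok = strtok(buf, " "); tok != NULL && n < max_tokens;
         tok = strtok(NULL, " "))
        tokens[n++] = tok;
    return n;
}
```

After calling this on a writable array holding "Color 0 0 1", the first token pointer equals the address of the array itself and the space after "Color" has become a null character, so no data was copied. Passing a string literal would be undefined behaviour, since literals are not writable.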

If the OP would have wanted a _good_ answer (maybe he's found a sufficient answer already here ?) then he'd have got back into this thread.
 
Old 02-19-2008, 06:29 PM   #18
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Original Poster
Rep: Reputation: 51
Just wanted to post a thanks. I've been following the discussion. Very informative.
 
Old 02-20-2008, 09:53 PM   #19
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Original Poster
Rep: Reputation: 51
Quote:
Originally Posted by fantas
If the OP would have wanted a _good_ answer (maybe he's found a sufficient answer already here ?) then he'd have got back into this thread.
I got sidetracked.

I set up a brute-force method as a temporary solution so that I could get something working. Getting to that point required a lot more coding, in which I got lost. Now I'm back.

For the record, I'm looking for words, not just "ab bc cd" etc. That said, there are probably 50 or so keywords I'm parsing for. All of them start with capital letters, and they probably don't cover every letter of the alphabet. I like the idea of tokenizing and throwing away things I don't need.

There's really no typical file (or string) size I'm processing. File sizes for this app are on a bell curve anywhere from 50 or so bytes up to 1-2GB, with the peak of the bell being around 5-10MB, I'd say. Much of that data is numeric - kind of like arguments for the keyword. For instance, here's a sample:
Code:
##CreationDate Wed Feb 20 19:25:54 2008

version 3.04
Declare "resource" "string"
Declare "dirmap" "string"
Declare "minmax" "int"
Declare "serverresource" "string"

	AttributeBegin 
		ResourceBegin 
		Attribute "identifier" "name" ["some_stuff"]
		ConcatTransform [1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1]
		ShadingInterpolation "smooth"
		IfBegin "!defined(RATFilterLightSource)"
		IfEnd 
		IfBegin "!defined(RATFilterSurface)"
		Color [0 0 1]
		Opacity [1 1 1]
		Surface "mtorLambert" "float refractiveIndex" [1] "float diffuseCoeff" [0.8] "color ambientColor" [0 0 0] "color incandescence" [0 0 0] "float translucenceCoeff" [0] "float glowIntensity" [0]
		IfEnd 
		PointsGeneralPolygons [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1] [4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
		4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4] [1 21 20 0 2
		22 21 1 3 23 22 2 4 24 23 3 5 25 24 4 6 26 25 5 7 27 26 6]  "P" [20.6299 183.861 -736.018 21.2625 184.402 -737.719 22.2606 185.245 -739.074]  "facevarying normal N" [-0.655181 -0.491973 -0.573324 -0.686153 -0.448746 -0.572557]
		ResourceEnd 
	AttributeEnd
... in this example, what I'm looking for (at the moment) is the "PointsGeneralPolygons" keyword followed by its arguments. The issue is that "PointsGeneralPolygons" is not the only keyword. There are many, which I'll be adding as I add features (one group of which is "facevarying normal N", as you can see here), and their order is important. So not only would I rather not parse the whole file/string once per keyword, but I also have to go in order, since some of the keywords relate to transformation matrices that affect the polygons.
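A single ordered pass of the kind described above might look roughly like this sketch. It assumes whitespace-separated tokens, skips double-quoted strings so that words inside them are not mistaken for keywords, and looks up only tokens that start with a capital letter in a sorted table. The table is a small subset taken from the sample, and every name here (scan_keywords, keyword_handler, record_keyword, order_log) is hypothetical. Argument parsing is left to the handler.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Subset of keywords from the sample, kept sorted (strcmp order) for bsearch. */
static const char *const keywords[] = {
    "Color", "ConcatTransform", "Opacity", "PointsGeneralPolygons", "Surface"
};
static const size_t keyword_count = sizeof keywords / sizeof keywords[0];

struct token { const char *start; size_t len; };   /* not NUL-terminated */

/* bsearch comparator: orders a token against a NUL-terminated keyword. */
static int compare_token_to_keyword(const void *key, const void *elem)
{
    const struct token *tok = key;
    const char *kw = *(const char *const *)elem;
    int c = strncmp(tok->start, kw, tok->len);
    if (c != 0)
        return c;
    /* Same first tok->len chars: equal only if the keyword ends here too. */
    return kw[tok->len] == '\0' ? 0 : -1;
}

/* Called once per keyword, in file order; args points just past the keyword. */
typedef void (*keyword_handler)(size_t keyword_index, const char *args, void *ctx);

/* Walk text once, report keywords in order, return how many were found. */
static size_t scan_keywords(const char *text, keyword_handler on_keyword, void *ctx)
{
    size_t found = 0;
    const char *p = text;
    for (;;) {
        p += strspn(p, " \t\r\n");              /* skip whitespace */
        if (*p == '\0')
            break;
        if (*p == '"') {                        /* skip a quoted string */
            const char *close = strchr(p + 1, '"');
            if (close == NULL)
                break;                          /* unterminated string */
            p = close + 1;
            continue;
        }
        size_t len = strcspn(p, " \t\r\n");
        if (isupper((unsigned char)*p)) {       /* keywords start uppercase */
            struct token tok = { p, len };
            const char *const *hit = bsearch(&tok, keywords, keyword_count,
                                             sizeof keywords[0],
                                             compare_token_to_keyword);
            if (hit != NULL) {
                on_keyword((size_t)(hit - keywords), p + len, ctx);
                found++;
            }
        }
        p += len;
    }
    return found;
}

/* Example handler: records the order in which keywords were seen. */
struct order_log { size_t indices[16]; size_t count; };

static void record_keyword(size_t keyword_index, const char *args, void *ctx)
{
    struct order_log *log = ctx;
    (void)args;                                 /* arguments ignored here */
    if (log->count < 16)
        log->indices[log->count++] = keyword_index;
}
```

Because the handler fires in file order, a ConcatTransform seen before a PointsGeneralPolygons is reported first, which is what the ordering requirement needs. The input is never modified, unlike with strtok.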

The machines this will run on are pretty stout, with at least 2GB of memory (and often 4), not all of which is devoted to my app (this is all eventually drawing to a GL window, so a decent amount of memory is used up there too). I like the idea of chunking the raw data into smaller sizes for processing to save memory, though that should only be necessary in extreme cases.
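As a rough illustration of the chunking idea, the sketch below reads a stream through a fixed-size buffer and counts whitespace-separated tokens. Instead of stitching partial tokens together, it carries one bit of state (whether the previous byte was inside a token) across chunk boundaries, so memory use stays at the buffer size no matter how large the file is. The function name count_tokens_chunked and its parameters are hypothetical, and a real parser would carry more state than this.

```c
#include <ctype.h>
#include <stdio.h>

/* Count whitespace-separated tokens in a stream, reading at most buf_size
 * bytes at a time into the caller-supplied buffer. */
static size_t count_tokens_chunked(FILE *in, char *buf, size_t buf_size)
{
    size_t tokens = 0;
    int in_token = 0;   /* carried across chunk boundaries */
    size_t got;

    while ((got = fread(buf, 1, buf_size, in)) > 0) {
        for (size_t i = 0; i < got; i++) {
            int space = isspace((unsigned char)buf[i]);
            if (!space && !in_token)
                tokens++;           /* first byte of a new token */
            in_token = !space;
        }
    }
    return tokens;
}
```

A token that straddles two chunks is still counted once, because in_token is already set when the second chunk begins.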

I don't know if this clears things up or makes any difference - just after all the effort put into the replies, I thought I'd put some effort into an explanation and a thanks.
 
Old 02-21-2008, 02:32 PM   #20
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 115
Since we're talking about an actual language, use a lexical analyzer and parser generators. It'll save you a lot more headaches in the future when you need to do more complicated stuff.
 
Old 02-21-2008, 02:59 PM   #21
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Original Poster
Rep: Reputation: 51
Quote:
Originally Posted by tuxdev
Since we're talking about an actual language, use a lexical analyzer and parser generators. It'll save you a lot more headaches in the future when you need to do more complicated stuff.
::smacks forehead::

The last time I wrote a parser for these files was about 7 years ago at another company - before I knew about yacc/lex. After I found out about those tools, I thought, "Man, that would be a perfect, more robust, replacement for my by-hand parsing."

Now that I have to do it again, I'd totally forgotten about these tools.

Thanks for the reminder.
 
  


All times are GMT -5. The time now is 08:26 AM.
