need *fast* algorithm for binary file search

Tinkster · 12-04-2002, 03:54 PM

Hi guys ...

I have to search huge binary files for strings in order
to parse the information in them. The files consist of a
number of building blocks in no particular order, some of
them with a fixed structure. I'm looking for a fast search
algorithm, preferably ready-to-use code in C++ :), that
returns the position of the searched-for pattern within
the file.

Cheers,
Tink

P.S.: Huge is > 100MB :}

DavidPhillips · 12-05-2002, 02:15 AM

how about grep?

DavidPhillips · 12-05-2002, 02:17 AM

grep -n string file

Tinkster · 12-05-2002, 01:10 PM

LOL ... thanks, but thanks no ... first of all,
grep works line-oriented, and I need the location
of my hits within the BINARY file ... :}

And no, I've looked at the source and DON'T want
to use it, would take me ages to understand it, not
to speak of modify, C++-ify and use ...

Cheers,
Tink

Azrael · 12-05-2002, 04:41 PM

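Maybe you want to have a look at string matching algorithms like Knuth-Morris-Pratt. These have a lower complexity than the naive version (KMP runs in O(n + m) time for n bytes of data and a pattern of length m, versus O(n * m) in the worst case for the naive search), but they will still take their time on files that size, and of course some extra space (O(m) for KMP's prefix table).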
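
For illustration, here is a minimal sketch of Knuth-Morris-Pratt over raw bytes that returns the offset of every hit. It assumes the data is already in memory as a byte vector; the names build_failure_table and kmp_find_all are made up for this example and do not come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// fail[i] = length of the longest proper prefix of pattern[0..i]
// that is also a suffix of pattern[0..i].
std::vector<std::size_t> build_failure_table(const std::vector<std::uint8_t>& pattern) {
    std::vector<std::size_t> fail(pattern.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Returns the byte offset of every match, including overlapping ones.
// Hypothetical helper; the pattern must not be empty.
std::vector<std::size_t> kmp_find_all(const std::vector<std::uint8_t>& data,
                                      const std::vector<std::uint8_t>& pattern) {
    assert(!pattern.empty() && "pattern must not be empty");
    const std::vector<std::size_t> fail = build_failure_table(pattern);
    std::vector<std::size_t> hits;
    std::size_t k = 0;  // number of pattern bytes matched so far
    for (std::size_t i = 0; i < data.size(); ++i) {
        // On a mismatch, fall back in the pattern; never step back in the data.
        while (k > 0 && data[i] != pattern[k]) k = fail[k - 1];
        if (data[i] == pattern[k]) ++k;
        if (k == pattern.size()) {
            hits.push_back(i + 1 - k);  // offset where this match starts
            k = fail[k - 1];            // keep going to catch overlapping matches
        }
    }
    return hits;
}
```

Because the search never moves backwards in the data, the same loop would also work on a file that is read piece by piece, as long as k is kept between reads.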

llama_meme · 12-05-2002, 04:46 PM

Quote:

LOL ... thanks, but thanks no ... first of all,
grep works line-oriented, and I need the location
of my hits within the BINARY file ... :}

grep works fine with binary files (it doesn't try to operate on a per line basis). Not sure if you can get it to print the byte-offset of the match, but it's worth a look at the man page methinks.

The strings command might be useful (depending on what exactly you're doing, you didn't make it very clear).

Alex

Tinkster · 12-05-2002, 06:25 PM

I have a bunch of files that contain data from "foreign"
echo-sounders, and am in the process of writing a tool that
converts their heterogeneous chunks into our tidy set of files :}

The files are binary, contain different objects (configuration,
sounder setup, actual acoustic data, navigational data,
annotations, ...) in no particular order, and some of them
unfortunately of varying length, too. They can be huge;
the biggest ones I saw were over 120MB ... I want to split the files
into chunks in memory, and write the appropriate sections
into the appropriate files that we use to store that kind of information
so we can analyze data that we didn't record using our own
equipment/software.

Quote:

grep works fine with binary files (it doesn't try to operate on a per line basis). Not sure if you can get it to print the byte-offset of the match, but it's worth a look at the man page methinks.

Hmm ...
It outputs gibberish 'til it hits end-of-file or a \0 ...
at least it does here. With other options the offset
doesn't seem right, either.

Quote:

(depending on what exactly you're doing, you didn't make it very clear)

Quote:

looking for a fast search algorithm, preferably ready-to-use
code in C++ :)

Cheers,
Tink