Efficient search technique for text file of size 2 mb or more
Hi all,
I want to implement a C program that finds a user-specified number or word in a text file of around 2 MB or more. What would be an efficient search technique?
(text file is a combination of words and numbers)
I recommend a bash script as a front end to find candidate text files, then pass those files to your C program. Look at 'man find'. I'm pretty sure that can take care of the size thing. Then look at 'man file' or 'man stat'; after obtaining a list of every file larger than 2 MB, you can use these to determine whether they are text files or not. You'll have to 'grep' and/or 'sed' to get something pretty looking out of it, though.
ta0kira
If the file is not sorted and you don't have a clue where to find the thing you're searching for, a linear search is the way to go. A C program that reads data into a buffer, searches the buffer, and then reads the next fragment is a simple and rather effective approach.
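To make the buffered approach concrete, here is a minimal sketch, not a tuned implementation. The names search_stream and find_bytes and the chunk_size parameter are my own assumptions, not anything from this thread. The one subtle point is that a match can straddle two reads, so the last strlen(needle) - 1 bytes of each pass are carried over to the front of the buffer before the next read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find `needle` (nlen bytes) inside hay[0..hlen). Returns a pointer to the
 * first match, or NULL. A plain memchr + memcmp scan; hypothetical helper. */
static const char *find_bytes(const char *hay, size_t hlen,
                              const char *needle, size_t nlen)
{
    if (nlen == 0 || hlen < nlen)
        return NULL;
    const char *p = hay;
    size_t remaining = hlen - nlen + 1;   /* valid start positions left */
    while (remaining > 0) {
        const char *q = memchr(p, needle[0], remaining);
        if (q == NULL)
            return NULL;
        if (memcmp(q, needle, nlen) == 0)
            return q;
        remaining -= (size_t)(q - p) + 1;
        p = q + 1;
    }
    return NULL;
}

/* Linear search of a stream in fixed-size chunks.
 * Returns the byte offset of the first occurrence of `needle`, counted from
 * the stream position at the time of the call, or -1 if it is not found
 * (also -1 on bad arguments or allocation failure). */
long search_stream(FILE *fp, const char *needle, size_t chunk_size)
{
    size_t nlen = strlen(needle);
    if (fp == NULL || nlen == 0 || chunk_size == 0)
        return -1;

    /* Room for the carried-over tail plus one fresh chunk. */
    char *buf = malloc(nlen - 1 + chunk_size);
    if (buf == NULL)
        return -1;

    size_t keep = 0;   /* bytes carried over from the previous pass */
    long base = 0;     /* stream offset corresponding to buf[0] */
    long result = -1;
    size_t n;

    while ((n = fread(buf + keep, 1, chunk_size, fp)) > 0) {
        size_t total = keep + n;
        const char *hit = find_bytes(buf, total, needle, nlen);
        if (hit != NULL) {
            result = base + (long)(hit - buf);
            break;
        }
        /* Keep the last nlen - 1 bytes so a match split across two
         * reads is still seen on the next pass. */
        size_t new_keep = total < nlen - 1 ? total : nlen - 1;
        memmove(buf, buf + total - new_keep, new_keep);
        base += (long)(total - new_keep);
        keep = new_keep;
    }

    free(buf);
    return result;
}
```

You would call it as something like search_stream(fp, "12345", 65536) on a file opened with fopen. This matches raw byte sequences, so "cat" will also match inside "concatenate"; whole-word matching would need an extra check on the bytes around the hit.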
Your first question, as Mara noted, is whether there's any order in the file itself: is the file sorted? Can you read it a line at a time, or is it just a random byte stream? And so on.
The next question is whether you need to parse the entire file itself for each query, or whether it makes sense to index the file (as the Wikipedia article addy86 suggests).
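If indexing does make sense for you, a rough sketch of the idea might look like the following. It assumes the whole file has already been read into one writable, NUL-terminated buffer (2 MB fits in memory easily) and that "words" are simply whitespace-separated tokens. The struct entry type and the build_index and lookup functions are hypothetical names of mine. The index is built once, sorted with qsort, and each later query is a binary search via bsearch instead of a scan of the whole file.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One index entry: a token and its byte offset in the original text. */
struct entry {
    const char *word;
    size_t offset;
};

static int cmp_entry(const void *a, const void *b)
{
    const struct entry *ea = a;
    const struct entry *eb = b;
    return strcmp(ea->word, eb->word);
}

/* Split `text` in place on whitespace (separators are overwritten with
 * '\0') and return a sorted array of entries, or NULL on allocation
 * failure. The caller frees the array; entries point into `text`. */
struct entry *build_index(char *text, size_t *count)
{
    size_t cap = 64, n = 0;
    struct entry *idx = malloc(cap * sizeof *idx);
    if (idx == NULL)
        return NULL;

    char *p = text;
    while (*p) {
        while (*p && isspace((unsigned char)*p))
            p++;                          /* skip separators */
        if (!*p)
            break;
        char *start = p;
        while (*p && !isspace((unsigned char)*p))
            p++;                          /* walk to end of token */
        if (*p)
            *p++ = '\0';                  /* terminate the token */

        if (n == cap) {                   /* grow the array as needed */
            cap *= 2;
            struct entry *tmp = realloc(idx, cap * sizeof *idx);
            if (tmp == NULL) {
                free(idx);
                return NULL;
            }
            idx = tmp;
        }
        idx[n].word = start;
        idx[n].offset = (size_t)(start - text);
        n++;
    }

    qsort(idx, n, sizeof *idx, cmp_entry);
    *count = n;
    return idx;
}

/* Binary search for `word`. Returns the offset of one occurrence
 * (not necessarily the first if the word repeats), or -1 if absent. */
long lookup(const struct entry *idx, size_t count, const char *word)
{
    struct entry key = { word, 0 };
    const struct entry *hit = bsearch(&key, idx, count, sizeof *idx, cmp_entry);
    return hit ? (long)hit->offset : -1;
}
```

Building the index costs a sort up front, so it only pays off if you run many queries against the same file. For a single lookup the plain linear scan is simpler and likely just as fast.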
It would be interesting to do some tests, but I think it's unlikely you could easily write a C program that would necessarily out-perform "grep" or "awk" for basic pattern matching (i.e. "search") speed and efficiency. (I'm prepared to be 100% wrong about that statement, by the way ;-))