Linux - Newbie
This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I have a question: I have a file which contains around 3000 elements, and I want to search for an element in that file.
What would be an efficient way of searching, without consuming much memory or time?
1) Is it efficient to read all 3000 elements from the file into a list or array, sort them, and search?
2) Or is it more efficient to search directly inside the file?
Can anyone please suggest some algorithms for an efficient search in this case?
You might need to explain what you mean by 'element', as there are different search methods for various kinds of data (eg fixed length vs variable length).
It will also depend on whether you are looking for a known element (or sequence of elements), and/or need to do wild card matching.
It is not efficient to load the file and then sort and search. Sorting takes on the order of n log(n) operations, whereas a single linear search takes only on the order of n operations. Loading the whole file would also be wasteful of memory.
However, if you are doing lots of searches, then this might become more reasonable (especially if you have only 3000 elements, and the elements are not too large - it isn't like we are short on memory these days).
I would strongly recommend the third volume, "Sorting and Searching", of Donald Knuth's magnum opus "The Art of Computer Programming", which comprehensively covers many search algorithms.
A typical string search algorithm is the Boyer-Moore one. But this is only relevant if the sequences you are looking for consist of more than one element (eg strings of characters).
To use a string search algorithm for a file, you might buffer the file through a circular array.
Of course, if you aren't actually required to write a program, you could use grep :-)
Wow, thanks for all the replies.
I will make it clearer.
An element in my case is not an integer or a float; it is a string of fewer than 300 characters, and yes, I am looking for a known element. The elements are not sorted. In my case "best" means: it should not take much RAM and it should run in the minimal time possible (if I had to compromise on one, I would accept a second or two more runtime rather than more RAM usage).
I need one more clarification: say I write a program for the search and all that, but what if I instead use system("cat -n <file> | grep <word>"); ? Will that take less time and memory, or does it only save coding time? Anything that takes less time and RAM is fine with me.
Just checking - is this a real world application or a programming assignment?
Here is an example, but it won't win any prizes :-)
// quick and dirty record search
// takes two arguments: filename of record list and string to be matched
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
	// check arguments and open the file
	if (argc != 3) return 1;
	FILE *pFile = fopen(argv[1], "r");
	if (!pFile) return 1;

	// read each record (one per line)
	char *record = 0; size_t lBuffer = 0;
	ssize_t length;
	unsigned int index = 0;
	while ((length = getline(&record, &lBuffer, pFile)) != -1)
	{
		// strip the trailing newline
		if (length && record[length-1] == '\n') record[length-1] = 0;
		// if the record matches, show its index
		if (strcmp(record, argv[2]) == 0)
			printf("%u\n", index);
		++index;
	}

	// free getline space and exit
	if (record) free(record);
	fclose(pFile);
	return 0;
}
Quote (chrism01): "Sounds a lot like homework, in which case show us what you've tried?"
Ha ha, no, it's not homework; if it were, I would not have struggled this much. The thing is that my code has to run on an embedded system where BusyBox is not installed, i.e. I can't use grep or system() calls like that, and the RAM is very limited, around 64MB I think. So I was looking for an efficient design that takes minimal memory and minimal time!
Thanks for the code, neonsignal.
But we are comparing the strings in the file directly; I may have around 3000 to 4000 entries, so if the element is at position 2999, the preceding 2998 iterations are wasted (i.e. it takes more time). What might be the best compromise? I also don't want to read all the elements into a local array, since that would take roughly 3000 * (string length) bytes.
Thanks so much for the replies.
Assuming the entries in the file are sorted, can anyone please give me a simple hash table implementation for this (I mean code showing how the hash table is generated)? I vaguely know how to generate an index file for the entries, but I still want to know exactly how.
Please give me a simple code example: a file containing 10 string entries in sorted order, and how to generate a hash table for it.
To adapt this to a file based database, you would store file offset indexes in the hash table instead of the strings themselves (to save memory). I leave this as an exercise for the reader.
If you aren't confident with writing the code, try to start with something simple (such as the linear search). You might even find it works well enough for your application.
A lot of decisions depend on the specifications of your embedded platform, and you have provided little information about what this is running on, what the purpose is, or how it is to function. It isn't even clear to me why speed is the main issue, given how small the set of strings is.
Whether the data set is fixed or dynamic will also make a lot of difference to the choice of algorithms (for example, a fixed data set can be perfect hashed, or can use dictionary compression so that it fits in a small space in memory). The file system could be linearly accessible (like a disk), or randomly accessible (like an SD card) - also relevant to the choice of algorithms. Or how often searches occur compared to inserts? And many other factors.
No one method is superior, they all have advantages and disadvantages depending on the application.
Should you want to provide detail and/or code, then people will be able to give suggestions for improvements.
Thanks for that information. I implemented a very simple indexing method which stores the offset and size, but I need one more piece of information.
Say I have a file file1.txt and an index file for it, file1_index.txt. I read file1_index.txt for a string and got the offset 234, and now I have to go to the 234th line in file1.txt. How can I go directly to that line in file1.txt?