LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Best Data Structure to store data in C (https://www.linuxquestions.org/questions/programming-9/best-data-structure-to-store-data-in-c-908535/)

meenakshi.khurana 10-17-2011 02:51 AM

Best Data Structure to store data in C
 
I need to load thousands of records in a text file into memory. The record comprises of two fields: one is number and other is string

Eg:

8914,BHR
9918,ORS

Which is the best data structure to hold this data? The operations performed on this data would be:
1) Exhaustive Searching
2) Limited addition, addition of 5-10 records a day.

Also, what will be the best technique to insert new record dynamically in this scenario without being restarting the program? Currently i am using IPC (Linux Message Queues) and Signals to indicate the C binary for availability of new records.

lyle_s 10-17-2011 06:53 AM

A dynamic array sounds about right:

http://developer.gnome.org/glib/stable/glib-Arrays.html

You probably already have GLib.

Lyle.

ta0kira 10-17-2011 01:29 PM

What type of searching are you doing? I suggest a balanced binary tree sorted based on either of the columns, depending on which is more important. Alternatively, you could represent each as a pair (e.g. struct pair { int val; char name[32]; };) and have two balanced binary trees, each sorted based on one of the columns, but holding a pointer to a struct pair (both trees representing a different ordering of the same data.) This is really only helpful if sorting order will help with your search, though.
Kevin Barry

johnsfine 10-17-2011 01:53 PM

Quote:

Originally Posted by meenakshi.khurana (Post 4500306)
I need to load thousands of records in a text file into memory.

Thousands is actually pretty trivial for a computer. So it might not matter much how you store the data.

Quote:

1) Exhaustive Searching
Searching for which field? Or is it both? Searching for exact match of a field? Or for partial match?

For exact match searching, a hash is best.

If ordering matters or for partial matches on the beginning of a field, a tree structure such as red black or AVL is likely best.

To search on either key, you might want a hash or tree for each.

For more complicated fast partial match searching you might want a more complicated index.

Quote:

Originally Posted by meenakshi.khurana (Post 4500306)
2) Limited addition, addition of 5-10 records a day.

Also, what will be the best technique to insert new record dynamically

Why do you think good technique for that matters? 5 to ten insertions per day into a data structure of thousands! If you needed to totally rebuild a complicated index across ten thousand records ten time a day, that still shouldn't be a performance problem worth worrying about.

So if "Exhaustive Searching" was really true, that could justify some serious performance consideration. But I think you might be talking about a problem so small that performance doesn't matter.

theNbomr 10-17-2011 01:53 PM

Maybe you should consider the use of a RDB such as sqlite, which will do efficient searching for you, and allow you to focus on the business logic of your application, rather than the minutia of small data structures. This can potentially add value to your data by making it accessible through other applications, and provide relatively easy methods for complex search criteria. Creation, retrieval, update and deletion (crud) of data comes pretty much for free with an SQL database tool.

--- rod.

sundialsvcs 10-17-2011 08:07 PM

What you want to minimize, in this "in memory" (sic...) data structure, is exactly one thing: virtual memory page-faults. Don't try to "save memory." Minimize the number of storage accesses required to locate a particular value.

But the previous suggestion, to throw the whole thing into an SQLite database file, is an excellent and superior suggestion.

meenakshi.khurana 10-18-2011 12:04 AM

Quote:

Originally Posted by johnsfine (Post 4500745)
Thousands is actually pretty trivial for a computer. So it might not matter much how you store the data.

I know thousands is pretty trivial. Thats why i was using arrays till date for same with no performance issues. But i want to optimize the code.

Quote:

Originally Posted by johnsfine (Post 4500745)
Searching for which field? Or is it both? Searching for exact match of a field? Or for partial match?

Searching on first field and retrieving second field corresponding to that. Search should be exact match.

Quote:

Originally Posted by johnsfine (Post 4500745)

Why do you think good technique for that matters? 5 to ten insertions per day into a data structure of thousands! If you needed to totally rebuild a complicated index across ten thousand records ten time a day, that still shouldn't be a performance problem worth worrying about.

So if "Exhaustive Searching" was really true, that could justify some serious performance consideration. But I think you might be talking about a problem so small that performance doesn't matter.


I asked for good technique to inform a running C binary about availability of new records not for inserting new records. As i told earlier even, i am using signals and Message queues for same currently. So, i just want to confirm if this method is ok or something better can be use.

And yes you are right, this problem is not that big. Currently i have thousands of data and this may extend to lakhs in coming time. So, should i continue using arrays or should i use dynamic data structure like hash. Please suggest.

Thanks in advance.

chrism01 10-18-2011 08:54 PM

Hmmm, well according to https://secure.wikimedia.org/wikipedia/en/wiki/Lakh = 100,000 in Western notation, so for data as per OP, that should all fit in mem; they're not very big fields.
Assuming that the 1st field is ALWAYS unique, then a hash would be good for fast searching as described, OTOH SQLLite would be easier to interface with generally.
If its always just a few new recs per day, then your current method of notification would work, but again using SQL would enable you to use other tools & be able to easily export as well.


All times are GMT -5. The time now is 01:04 AM.