Old 10-17-2011, 02:51 AM   #1
meenakshi.khurana
LQ Newbie
 
Registered: Mar 2010
Location: India
Distribution: Fedora,Redhat
Posts: 19

Rep: Reputation: 5
Best Data Structure to store data in C


I need to load thousands of records from a text file into memory. Each record comprises two fields: one is a number and the other is a string.

E.g.:

8914,BHR
9918,ORS

Which is the best data structure to hold this data? The operations performed on it would be:
1) Exhaustive searching
2) Limited addition: 5-10 new records a day.

Also, what would be the best technique to insert a new record dynamically in this scenario, without restarting the program? Currently I am using IPC (Linux message queues) and signals to notify the C binary that new records are available.
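For context, the loading I have in mind is roughly like this (a simplified sketch; the struct fields, sizes and function name here are only illustrative, not my actual code):

Code:
#include <stdio.h>
#include <stdlib.h>

/* Illustrative record layout: numeric key plus a short string value. */
struct record {
    long key;          /* e.g. 8914 */
    char value[16];    /* e.g. "BHR" */
};

/* Load "key,value" lines from path into a growable array.
 * Returns the number of records loaded, or -1 on error. */
static long load_records(const char *path, struct record **out)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;

    size_t cap = 1024, n = 0;
    struct record *recs = malloc(cap * sizeof *recs);
    char line[64];

    while (recs && fgets(line, sizeof line, fp)) {
        if (n == cap) {
            struct record *tmp = realloc(recs, 2 * cap * sizeof *recs);
            if (!tmp) { free(recs); recs = NULL; break; }
            recs = tmp;
            cap *= 2;
        }
        if (sscanf(line, "%ld,%15s", &recs[n].key, recs[n].value) == 2)
            n++;
    }
    fclose(fp);
    *out = recs;
    return recs ? (long)n : -1;
}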
 
Old 10-17-2011, 06:53 AM   #2
lyle_s
Member
 
Registered: Jul 2003
Distribution: Slackware
Posts: 392

Rep: Reputation: 55
A dynamic array sounds about right:

http://developer.gnome.org/glib/stable/glib-Arrays.html

You probably already have GLib.
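A minimal sketch of what that might look like (the Record struct below is an assumption based on your example data, not part of GLib):

Code:
#include <glib.h>

/* Assumed record layout: numeric key plus a short string value. */
typedef struct {
    gint  key;
    gchar value[16];
} Record;

int main(void)
{
    GArray *records = g_array_new(FALSE, FALSE, sizeof(Record));

    Record r = { 8914, "BHR" };
    g_array_append_val(records, r);            /* appending is cheap */

    /* Exhaustive (linear) search on the key. */
    for (guint i = 0; i < records->len; i++) {
        Record *p = &g_array_index(records, Record, i);
        if (p->key == 8914)
            g_print("%d -> %s\n", p->key, p->value);
    }

    g_array_free(records, TRUE);
    return 0;
}

Build it against GLib with something like pkg-config --cflags --libs glib-2.0.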

Lyle.
 
Old 10-17-2011, 01:29 PM   #3
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
What type of searching are you doing? I suggest a balanced binary tree sorted on one of the columns, depending on which is more important. Alternatively, you could represent each record as a pair (e.g. struct pair { int val; char name[32]; };) and keep two balanced binary trees, each sorted on one of the columns but holding a pointer to the same struct pair (the two trees representing different orderings of the same data). This only helps if sorted order will aid your search, though.
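As a rough sketch of that two-tree arrangement, POSIX tsearch()/tfind() (which glibc implements as a balanced red-black tree) can hold both orderings over the same struct pair objects; the helper names below are just illustration:

Code:
#include <search.h>   /* tsearch/tfind: POSIX binary search trees */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct pair { int val; char name[32]; };

/* One tree ordered by the number, one by the string; both store
 * pointers to the same struct pair instances. */
static void *by_val  = NULL;
static void *by_name = NULL;

static int cmp_val(const void *a, const void *b)
{
    const struct pair *pa = a, *pb = b;
    return (pa->val > pb->val) - (pa->val < pb->val);
}

static int cmp_name(const void *a, const void *b)
{
    return strcmp(((const struct pair *)a)->name,
                  ((const struct pair *)b)->name);
}

static void insert_pair(int val, const char *name)
{
    struct pair *p = malloc(sizeof *p);
    p->val = val;
    snprintf(p->name, sizeof p->name, "%s", name);
    tsearch(p, &by_val, cmp_val);    /* indexed by number */
    tsearch(p, &by_name, cmp_name);  /* indexed by string */
}

int main(void)
{
    insert_pair(8914, "BHR");
    insert_pair(9918, "ORS");

    struct pair key = { 9918, "" };
    void *hit = tfind(&key, &by_val, cmp_val);
    if (hit)
        printf("%d -> %s\n", (*(struct pair **)hit)->val,
                             (*(struct pair **)hit)->name);
    return 0;
}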
Kevin Barry
 
Old 10-17-2011, 01:53 PM   #4
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by meenakshi.khurana View Post
I need to load thousands of records in a text file into memory.
Thousands is actually pretty trivial for a computer. So it might not matter much how you store the data.

Quote:
1) Exhaustive Searching
Searching for which field? Or is it both? Searching for exact match of a field? Or for partial match?

For exact match searching, a hash is best.

If ordering matters, or for partial matches on the beginning of a field, a tree structure such as a red-black or AVL tree is likely best.

To search on either key, you might want a hash or tree for each.

For more complicated fast partial match searching you might want a more complicated index.

Quote:
Originally Posted by meenakshi.khurana View Post
2) Limited addition, addition of 5-10 records a day.

Also, what will be the best technique to insert new record dynamically
Why do you think a good technique for that matters? Five to ten insertions per day into a data structure of thousands! If you needed to totally rebuild a complicated index across ten thousand records ten times a day, that still shouldn't be a performance problem worth worrying about.

So if "exhaustive searching" were really the case, that could justify some serious performance consideration. But I think you might be talking about a problem so small that performance doesn't matter.

Last edited by johnsfine; 10-17-2011 at 01:58 PM.
 
Old 10-17-2011, 01:53 PM   #5
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Maybe you should consider using an RDBMS such as SQLite, which will do efficient searching for you and allow you to focus on the business logic of your application rather than the minutiae of small data structures. This can potentially add value to your data by making it accessible to other applications, and provides relatively easy methods for complex search criteria. Creation, retrieval, update and deletion (CRUD) of data comes pretty much for free with an SQL database tool.
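To give a flavour of it, a minimal sqlite3 C API example (the database file and column names are only placeholders); link with -lsqlite3:

Code:
#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("records.db", &db) != SQLITE_OK)
        return 1;

    /* Create the table and seed a couple of rows. */
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS records (num INTEGER PRIMARY KEY, name TEXT);"
        "INSERT OR IGNORE INTO records VALUES (8914, 'BHR');"
        "INSERT OR IGNORE INTO records VALUES (9918, 'ORS');",
        NULL, NULL, NULL);

    /* Exact-match lookup on the number, using a prepared statement. */
    sqlite3_stmt *stmt;
    sqlite3_prepare_v2(db, "SELECT name FROM records WHERE num = ?;", -1, &stmt, NULL);
    sqlite3_bind_int(stmt, 1, 9918);
    if (sqlite3_step(stmt) == SQLITE_ROW)
        printf("9918 -> %s\n", (const char *)sqlite3_column_text(stmt, 0));

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}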

--- rod.
 
Old 10-17-2011, 08:07 PM   #6
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,609
Blog Entries: 4

Rep: Reputation: 3905
What you want to minimize, in this "in memory" (sic...) data structure, is exactly one thing: virtual memory page-faults. Don't try to "save memory." Minimize the number of storage accesses required to locate a particular value.

But the previous suggestion, to throw the whole thing into an SQLite database file, is an excellent one.
 
Old 10-18-2011, 12:04 AM   #7
meenakshi.khurana
LQ Newbie
 
Registered: Mar 2010
Location: India
Distribution: Fedora,Redhat
Posts: 19

Original Poster
Rep: Reputation: 5
Quote:
Originally Posted by johnsfine View Post
Thousands is actually pretty trivial for a computer. So it might not matter much how you store the data.
I know thousands is pretty trivial. That's why I have been using arrays for this so far, with no performance issues. But I want to optimize the code.

Quote:
Originally Posted by johnsfine View Post
Searching for which field? Or is it both? Searching for exact match of a field? Or for partial match?
Searching on the first field and retrieving the corresponding second field. The search should be an exact match.

Quote:
Originally Posted by johnsfine View Post

Why do you think a good technique for that matters? Five to ten insertions per day into a data structure of thousands! If you needed to totally rebuild a complicated index across ten thousand records ten times a day, that still shouldn't be a performance problem worth worrying about.

So if "exhaustive searching" were really the case, that could justify some serious performance consideration. But I think you might be talking about a problem so small that performance doesn't matter.

I asked for a good technique to inform a running C binary about the availability of new records, not for inserting new records. As I mentioned earlier, I am currently using signals and message queues for this. So I just want to confirm whether this method is OK or whether something better can be used.
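For reference, the notification side currently looks roughly like this (a simplified sketch; the queue key, message type and record layout here are made up for illustration):

Code:
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>

/* The writer puts the new record on a System V message queue and then
 * signals this process with SIGUSR1. */
struct msgbuf_rec {
    long mtype;
    struct { long key; char value[16]; } rec;
};

static volatile sig_atomic_t got_new_record = 0;

static void on_sigusr1(int sig) { (void)sig; got_new_record = 1; }

int main(void)
{
    int qid = msgget(ftok("/tmp/records", 'R'), 0666 | IPC_CREAT);
    signal(SIGUSR1, on_sigusr1);

    for (;;) {
        pause();                      /* sleep until a signal arrives */
        if (!got_new_record)
            continue;
        got_new_record = 0;

        struct msgbuf_rec m;
        /* Drain all queued records without blocking. */
        while (msgrcv(qid, &m, sizeof m.rec, 0, IPC_NOWAIT) != -1)
            printf("new record: %ld,%s\n", m.rec.key, m.rec.value);
    }
}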

And yes, you are right, this problem is not that big. Currently I have thousands of records, and this may grow to lakhs in time. So should I continue using arrays, or should I use a dynamic data structure like a hash? Please suggest.

Thanks in advance.
 
Old 10-18-2011, 08:54 PM   #8
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,348

Rep: Reputation: 2749
Hmmm, well, according to https://secure.wikimedia.org/wikipedia/en/wiki/Lakh, a lakh = 100,000 in Western notation, so the data as described by the OP should all fit in memory; they're not very big fields.
Assuming that the 1st field is ALWAYS unique, a hash would be good for the fast searching described; OTOH, SQLite would be easier to interface with generally.
If it's always just a few new records per day, your current method of notification would work, but again, using SQL would let you use other tools and export the data easily as well.
 
  

