Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game. |
| Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
 |
GNU/Linux Basic Guide
This 255-page guide will provide you with the keys to understand the philosophy of free software, teach you how to use and handle it, and give you the tools required to move easily in the world of GNU/Linux. Many users and administrators will be taking their first steps with this GNU/Linux Basic guide and it will show you how to approach and solve the problems you encounter.
Click Here to receive this Complete Guide absolutely free. |
|
 |
06-11-2004, 04:02 AM
|
#1
|
|
Member
Registered: Jan 2004
Location: Berne, Switzerland
Distribution: FC4, Gentoo
Posts: 112
Rep:
|
Hash-sort like in PERL in C++
I have a nice algorithm in PERL that uses hashes and works fine. But it uses way too much memory. Thats why I want to do the same in C. Searching for an implementation of PERL hashes in C, the best I found was the C++ STL map container. I would have prefered pure C, but maybe hash problems are easier to solve in C++.
I want to base my program on the following example code from the web. This example just creates, fills and then prints a hash.
But my problem is now that I have to sort the hash by value (not by key!). I tried this with the sort command, but this doesn't work and just gives a mass of errors when compiling. Anybody know whats wrong about this sort command I inserted in the otherwise correctly functioning example?
#include <iostream>
#include <map>
#include <string>
using namespace std;
int main()
{
typedef map<string,float> StringFloatMap;
StringFloatMap coll;
// insert some elements into the collection
coll["VAT"] = 0.15;
coll["Pi"] = 3.1415;
coll["an arbitrary number"] = 4983.223;
coll["Null"] = 0;
StringFloatMap::iterator pos;
/* THIS IS THE COMMAND THAN DOES NOT WORK !!!!! */
sort(coll.begin(),coll.end());
/* ------*/
for (pos = coll.begin(); pos != coll.end(); ++pos) {
cout << "key: \"" << pos->first << "\" "
<< "value: " << pos->second << endl;
}
}
|
|
|
|
06-11-2004, 06:10 AM
|
#2
|
|
Member
Registered: Jun 2003
Location: SEUK
Distribution: Debian & OS X
Posts: 194
Rep:
|
You can't std::sort a map. It is by definition sorted by key.
In addition, if, for whatever reason, you want a hashed map, use hash_map.
I am not entirely sure what to suggest for you purposes -
maybe create another map, reversing the key and value,
or maybe just a std::set using the values,
or maybe create a class containing both your key and value and then use a std::vector and define two comparison functors to use with std::sort...
I am a lttle rusty on this stuff but if you describe your requirements a bit more someone else might come up with some suitable ideas.
|
|
|
|
06-11-2004, 07:05 AM
|
#3
|
|
Member
Registered: Jan 2004
Location: Berne, Switzerland
Distribution: FC4, Gentoo
Posts: 112
Original Poster
Rep:
|
thanks for your reply! ok, I could use hash_map, this looks also good, but I don't know the difference to the other map...
What I want to do is the following:
I have a protein sequence, that mean a string of characters. e.g. MAGRTGTGTGTGAAA
Now I want to find repeats in this string. To do this, I create hashes of all possible and reasonable substrings like e.g. "MA", "AG", "GR", "TG","GRTG","TGAAA". For each occurence of the same substring I increment the corresponding hash. So the "GT" hast would have the value 3 and so on. Because I'm interested mostly in the most frequent repeats, it would be the most efficient to sort the hash by value now, so I can interate and get "GT"=3 for this string as the first element. I did this in PERL and it works fine, but because it is now memory efficient, I want to do this in C. But how can I sort the hash_map by its value?
Thanks for any help!
|
|
|
|
06-11-2004, 07:06 AM
|
#4
|
|
Member
Registered: Jan 2004
Location: Berne, Switzerland
Distribution: FC4, Gentoo
Posts: 112
Original Poster
Rep:
|
I meant: It was NOT memory efficient in perl...
|
|
|
|
06-11-2004, 08:40 AM
|
#5
|
|
Member
Registered: Jun 2003
Location: SEUK
Distribution: Debian & OS X
Posts: 194
Rep:
|
You just have a trade off here between various features.
My first instinct is to say that in comparison to searching a DNA sequence, ordering the results is going to have a fairly small overhead, but that depends on the number of sequences you ae looking for - it may be 10, but it may be 10,000. I know of one friend who was doing something similar and they were only looking for a handful of sequences in sequences of hundrends of millions codon, so creating another map would carry almost no overhead in comparison. (Bear in mind it might, although unlikely need to be a multimap as you could have the same number of two sequences.)
Code:
map<string, int> found;
// search and increment
multimap<int,string> freq;
for (iter = found.begin() to found.end())
freq[iter->second] = iter->first;
On the other hand, with large numbers of sequences you might just want to index each with a number in a std::vector (much as you are doing now but in a map) and create a vector of "sequence_counts" and a map for quick key lookup.
e.g. some rough pseudo-code
Code:
struct seq_count
{
string name;
int count;
seq_count(string s) : name(s), count(0) {}
// overload operator< to call sort with seq_count's
bool operator< (const seq_count& rhs)
{ return count < rhs.count; }
};
int main()
{
map<string,int> seq_table;
vector<seq_count> histogram;
// you might want to call reserve or vector ctor using the number of sequences
for (int i=0 to all sequences)
{
seq_table["seq_str"] = i;
histogram[i].name = "seq_str";
}
// search gene
histogram[seq_table["found_str"]].count++;
// sort the sequernces by value
sort(histogram.begin(), histogram.end());
}
Anyway, HTH
Last edited by dakensta; 06-11-2004 at 08:46 AM.
|
|
|
|
| Thread Tools |
Search this Thread |
|
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
All times are GMT -5. The time now is 03:07 AM.
|
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|