LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 06-11-2004, 04:02 AM   #1
nyk
Member
 
Registered: Jan 2004
Location: Berne, Switzerland
Distribution: FC4, Gentoo
Posts: 112

Rep: Reputation: 15
Hash-sort like in PERL in C++


I have a nice algorithm in PERL that uses hashes and works fine. But it uses way too much memory. Thats why I want to do the same in C. Searching for an implementation of PERL hashes in C, the best I found was the C++ STL map container. I would have prefered pure C, but maybe hash problems are easier to solve in C++.
I want to base my program on the following example code from the web. This example just creates, fills and then prints a hash.
But my problem is now that I have to sort the hash by value (not by key!). I tried this with the sort command, but this doesn't work and just gives a mass of errors when compiling. Anybody know whats wrong about this sort command I inserted in the otherwise correctly functioning example?

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main()
{
typedef map<string,float> StringFloatMap;
StringFloatMap coll;
// insert some elements into the collection
coll["VAT"] = 0.15;
coll["Pi"] = 3.1415;
coll["an arbitrary number"] = 4983.223;
coll["Null"] = 0;
StringFloatMap::iterator pos;

/* THIS IS THE COMMAND THAN DOES NOT WORK !!!!! */
sort(coll.begin(),coll.end());
/* ------*/

for (pos = coll.begin(); pos != coll.end(); ++pos) {
cout << "key: \"" << pos->first << "\" "
<< "value: " << pos->second << endl;
}
}
 
Old 06-11-2004, 06:10 AM   #2
dakensta
Member
 
Registered: Jun 2003
Location: SEUK
Distribution: Debian & OS X
Posts: 194

Rep: Reputation: 35
You can't std::sort a map. It is by definition sorted by key.

In addition, if, for whatever reason, you want a hashed map, use hash_map.

I am not entirely sure what to suggest for you purposes -
maybe create another map, reversing the key and value,
or maybe just a std::set using the values,
or maybe create a class containing both your key and value and then use a std::vector and define two comparison functors to use with std::sort...

I am a lttle rusty on this stuff but if you describe your requirements a bit more someone else might come up with some suitable ideas.
 
Old 06-11-2004, 07:05 AM   #3
nyk
Member
 
Registered: Jan 2004
Location: Berne, Switzerland
Distribution: FC4, Gentoo
Posts: 112

Original Poster
Rep: Reputation: 15
thanks for your reply! ok, I could use hash_map, this looks also good, but I don't know the difference to the other map...

What I want to do is the following:
I have a protein sequence, that mean a string of characters. e.g. MAGRTGTGTGTGAAA
Now I want to find repeats in this string. To do this, I create hashes of all possible and reasonable substrings like e.g. "MA", "AG", "GR", "TG","GRTG","TGAAA". For each occurence of the same substring I increment the corresponding hash. So the "GT" hast would have the value 3 and so on. Because I'm interested mostly in the most frequent repeats, it would be the most efficient to sort the hash by value now, so I can interate and get "GT"=3 for this string as the first element. I did this in PERL and it works fine, but because it is now memory efficient, I want to do this in C. But how can I sort the hash_map by its value?

Thanks for any help!
 
Old 06-11-2004, 07:06 AM   #4
nyk
Member
 
Registered: Jan 2004
Location: Berne, Switzerland
Distribution: FC4, Gentoo
Posts: 112

Original Poster
Rep: Reputation: 15
I meant: It was NOT memory efficient in perl...
 
Old 06-11-2004, 08:40 AM   #5
dakensta
Member
 
Registered: Jun 2003
Location: SEUK
Distribution: Debian & OS X
Posts: 194

Rep: Reputation: 35
You just have a trade off here between various features.

My first instinct is to say that in comparison to searching a DNA sequence, ordering the results is going to have a fairly small overhead, but that depends on the number of sequences you ae looking for - it may be 10, but it may be 10,000. I know of one friend who was doing something similar and they were only looking for a handful of sequences in sequences of hundrends of millions codon, so creating another map would carry almost no overhead in comparison. (Bear in mind it might, although unlikely need to be a multimap as you could have the same number of two sequences.)

Code:
map<string, int> found;
// search and increment
multimap<int,string> freq;
for (iter = found.begin() to found.end())
 freq[iter->second] = iter->first;
On the other hand, with large numbers of sequences you might just want to index each with a number in a std::vector (much as you are doing now but in a map) and create a vector of "sequence_counts" and a map for quick key lookup.

e.g. some rough pseudo-code

Code:
struct seq_count
{
  string name;
  int count;
  seq_count(string s) : name(s), count(0) {}

  // overload operator< to call sort with seq_count's
  bool operator< (const seq_count& rhs)
 {  return count < rhs.count; }
};

int main()
{
map<string,int> seq_table;
vector<seq_count> histogram;
// you might want to call reserve or vector ctor using the number of sequences

for (int i=0 to all sequences)
{
  seq_table["seq_str"] = i;
  histogram[i].name = "seq_str";
}

// search gene
histogram[seq_table["found_str"]].count++;

// sort the sequernces by value
sort(histogram.begin(), histogram.end());
}

Anyway, HTH

Last edited by dakensta; 06-11-2004 at 08:46 AM.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Perl and Hash automatic PB0711 Programming 3 09-23-2005 02:14 AM
Generating hash in perl ? MikeFoo1 Programming 3 05-21-2005 06:03 PM
Perl: hash tying amnesty_puppy Programming 1 01-17-2005 11:03 PM
Perl XML::Simple - Hash Question smaida Programming 2 05-26-2004 04:20 AM
renaming ENV hash in Perl nbp Programming 4 11-01-2003 03:07 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 06:16 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration