Hash table in C++? STL?

**Kevin Goodsell** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

Claudio Puviani wrote:
[color=blue]
> "Kevin Goodsell" <usenet2.spamfr ee.fusion@never box.com> wrote
>[color=green]
>>So why the prime table size? I can't see an obvious reason.
>>If hash values are approximately uniformly distributed between
>>0 and N, I'd think that ideal table sizes would be numbers that
>>evenly divide N, because any remainder would split the table
>>into two areas with different probabilities of any particular
>>value hashing into them.[/color]
>
>
> If the hash values are uniformly distributed (which is one characteristic of a
> good hashing function), then any table size would be equivalent; i.e., there
> would be no "ideal" size. The reason that prime numbers are used for table
> sizes is that many hashing functions are NOT good and sometimes generate hash
> values that are multiples of one or more numbers. Those values, modulo a
> relatively non-prime number will cluster. Those same values modulo a relatively
> prime number will uniformly span the entire set of 0..N-1. In other words,
> primality is used as insurance against bad hashing functions. You can persuade
> yourself of this by modeling it in a spreadsheet.
>[/color]

I thought it might be something like that. Although, I disagree that
there's no "ideal" size with uniformly distributed hash values. I
definitely think some sizes are better than others (though it may be a
small difference). As a trivial example, suppose the range of hash
values is 0 to 75, and the table size is 50. Hash values from 0 to 25,
and also from 51 to 75, end up in the lower 25 slots, while only those
hash values in the range 26 to 50 end up in the upper 25 slots. As a
result, the probability of a key hashing to the first 25 slots is twice
as high as the probability of hashing to the last 25.

I doubt this makes any difference in practice, but it is what I had in
mind in my previous post.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

**pembed2003** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

Michiel.Salters @logicacmg.com (Michiel Salters) wrote in message news:<fcaee77e. 0404150102.1f08 d3dd@posting.go ogle.com>...[color=blue]
> pembed2003@yaho o.com (pembed2003) wrote in message news:<db593abd. 0404141727.4137 8bf2@posting.go ogle.com>...
>[color=green]
> > Hi John,
> > Can you point me to the documentation where I can find the map and
> > hash_map data structure? Thanks for the help![/color]
>
> std::map is documented in quite a number of books, e.g. Accelerated C++
> (Koenig&Moo) or Generic Programming and the STL(Austern)
>
> http://www.amazon.com/exec/obidos/ASIN/020170353X/
> http://www.amazon.com/exec/obidos/ASIN/0201309564/
>
> hash_map isn't in the standard, so there are multiple different
> versions around. Get the documentation from the same source as
> your code. Your compiler vendor may have included something.
>[/color]

Thanks for everyone's help so it looks like map is the standard and
should be available for any platform? But hash_map is not standard.

The reason I am asking for hash table is because I need to write a
function that finds the overlap of two integer array. I am currently
doing:

#include<stdlib .h>
#include<string .h>
#include<sys/int_limits.h>

#define SIZE ((sizeof(int)) * (INT16_MAX) * (2))

void find(int* one,int c1,int* two,int c2){
int i;
int* buffer = malloc(SIZE);
memset((void*)b uffer,0,SIZE);
for(i = 0; i < c1; i++)
buffer[one[i] + INT16_MAX] = 1;
for(i = 0; i < c2; i++)
if(buffer[two[i] + INT16_MAX])
printf("%d\n",t wo[i]);
free(buffer);
}

int main(int argc,char** argv){
int a[] = {1,3,45,6,4,-1,8};
int b[] = {4,2,4,222,8,45 ,-1};
find(a,7,b,7);
}

this gives me a O(n) algr. but seems to be wasting a lot of memory so
I am thinking using a hashtable instead. am i in the right track?
Thanks!

**Siemel Naran** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

"Claudio Puviani" <puviani@hotmai l.com> wrote in message news:Hexfc.6607 9[color=blue]
> "Kevin Goodsell" <usenet2.spamfr ee.fusion@never box.com> wrote[/color]
[color=blue][color=green]
> > So why the prime table size? I can't see an obvious reason.
> > If hash values are approximately uniformly distributed between
> > 0 and N, I'd think that ideal table sizes would be numbers that
> > evenly divide N, because any remainder would split the table
> > into two areas with different probabilities of any particular
> > value hashing into them.[/color]
>
> If the hash values are uniformly distributed (which is one characteristic[/color]
of a[color=blue]
> good hashing function), then any table size would be equivalent; i.e.,[/color]
there[color=blue]
> would be no "ideal" size. The reason that prime numbers are used for table
> sizes is that many hashing functions are NOT good and sometimes generate[/color]
hash[color=blue]
> values that are multiples of one or more numbers. Those values, modulo a
> relatively non-prime number will cluster. Those same values modulo a[/color]
relatively[color=blue]
> prime number will uniformly span the entire set of 0..N-1. In other words,
> primality is used as insurance against bad hashing functions. You can[/color]
persuade[color=blue]
> yourself of this by modeling it in a spreadsheet.[/color]

This makes sense. Can you write the barebones of a C++ program that can
illustrate this? I guess you could write a spreadshet too, but the
newsgroup only wants free text. I can test it if you want.

We could call call a function hash_function, and we could use my simple
function in the other post, maybe multiplying by the constant Kevin
suggested.

**Siemel Naran** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

"Kevin Goodsell" <usenet2.spamfr ee.fusion@never box.com> wrote in message
news:Gnqfc.1129 9
[color=blue][color=green]
> > size_t hash(const char * s, int buckets) {
> > size_t h = 0;
> > for ( ; *s; ++s) h += *s;
> > return h%bucket.
> > }[/color]
>
> Kernighan & Pike's _The Practice of Programming_ mentioned a similar
> hash function, but it included (if I recall correctly) multiplying the
> total by a constant before adding in the next character value. This
> constant has been experimentally determined to be effective for ASCII
> text. I'm not sure if anyone really knows why it works. I don't recall
> what the value was, but I'm sure someone can tell us.[/color]

I did some research, and they said multiplying by the alphabet size (ie.
number of valid chars).

[color=blue][color=green]
> > But we need hash only the first maxchars characters.[/color]
>
> Actually, I think K&P also talked about this, and suggested it was a bad
> idea. This is from memory again, so I may be wrong, but I believe they
> mentioned Java using a scheme like this originally. It turned out to not
> work well, and was changed to use the entire string.[/color]

Suppose our company coding standard is to prefix namespace and class names
with ABC_. Thus ABC_Contact, ABC_Thing. Imagine if our hash function used
just the first 3 chars!

So I think the number of chars to use depends on the scenario, but it does
intuitively make sense to me. We could perhaps use the last N chars, middle
N chars, etc. A templated hashing function makes all this possible.

**Siemel Naran** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

"pembed2003 " <pembed2003@yah oo.com> wrote in message
[color=blue]
> #define SIZE ((sizeof(int)) * (INT16_MAX) * (2))
>
> void find(int* one,int c1,int* two,int c2){
> int i;
> int* buffer = malloc(SIZE);
> memset((void*)b uffer,0,SIZE);
> for(i = 0; i < c1; i++)
> buffer[one[i] + INT16_MAX] = 1;
> for(i = 0; i < c2; i++)
> if(buffer[two[i] + INT16_MAX])
> printf("%d\n",t wo[i]);
> free(buffer);
> }[/color]

It seems you don't use the first INT16_MAX chars of buffer, or maybe I'm
reading something wrong.
[color=blue]
> int main(int argc,char** argv){
> int a[] = {1,3,45,6,4,-1,8};
> int b[] = {4,2,4,222,8,45 ,-1};
> find(a,7,b,7);
> }
>
> this gives me a O(n) algr. but seems to be wasting a lot of memory so
> I am thinking using a hashtable instead. am i in the right track?
> Thanks![/color]

If the array size 'a' or 'one' is small then the O(N^2) algorithm is fine.
Either loop over the chars in 'one' or 'two', then over the chars in the
other loop. If you want to use STL, you can use std::count_if and supply a
binary predicate that employs std::find (the binary predicate will have a
constructor that takes the array 'one' as an argument).

Note you can also sort the elements in 'one' and 'two', then loop over both
containers simultaneously. Running time O(lg(N)) for the sort if using
std::sort which normally uses quicksort, and O(N) for the scan. I don't
think the standard provides such an algorithm though.

You could sort the array in O(N) time using radix sort (but it will use so
much memory, which is what you want to avoid), or some other specialized
algorithm. You could also sort only 'one' and loop over the chars in 'two',
but use binary_search to see if a char in 'one' exists in 'two'.

If the array size is big, what difference will a hashtable make? You're
using the hashtable to store elements visited, right? If the hashtable
bucket size is small then you'll have lots of collisions (lots of elements
mapping to the same element) and looking one element could be linear time
(if the collided elements are stored in a vector). If the hashtable bucket
size is big then you won't have lots of collisions, but then might as well
use your original solution.

**John Harrison** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

>[color=blue]
> Thanks for everyone's help so it looks like map is the standard and
> should be available for any platform? But hash_map is not standard.
>[/color]

Right.
[color=blue]
> The reason I am asking for hash table is because I need to write a
> function that finds the overlap of two integer array. I am currently
> doing:
>
> #include<stdlib .h>
> #include<string .h>
> #include<sys/int_limits.h>
>
> #define SIZE ((sizeof(int)) * (INT16_MAX) * (2))
>
> void find(int* one,int c1,int* two,int c2){
> int i;
> int* buffer = malloc(SIZE);
> memset((void*)b uffer,0,SIZE);
> for(i = 0; i < c1; i++)
> buffer[one[i] + INT16_MAX] = 1;
> for(i = 0; i < c2; i++)
> if(buffer[two[i] + INT16_MAX])
> printf("%d\n",t wo[i]);
> free(buffer);
> }
>
> int main(int argc,char** argv){
> int a[] = {1,3,45,6,4,-1,8};
> int b[] = {4,2,4,222,8,45 ,-1};
> find(a,7,b,7);
> }
>
> this gives me a O(n) algr. but seems to be wasting a lot of memory so
> I am thinking using a hashtable instead. am i in the right track?
> Thanks![/color]

I think the standard answer would be to use a set. This is untested code.

#include <set>
#include <algorithm>
#include <iterator>
#include <iostream>

void find(int* one,int c1,int* two,int c2)
{
std::set<int> common;
common.insert(o ne, one + c1);
common.insert(t wo, two + c2);
// output common elements to cout
std::copy(commo n.begin(), common.end(),
std::ostream_it erator<int>(std ::cout, "\n"));
}

That is an O(N lg N) algorithm.

john

**Kevin Goodsell** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

Siemel Naran wrote:[color=blue]
> "Kevin Goodsell" <usenet2.spamfr ee.fusion@never box.com> wrote in message
> news:Gnqfc.1129 9
>[color=green]
>>
>>Kernighan & Pike's _The Practice of Programming_ mentioned a similar
>>hash function, but it included (if I recall correctly) multiplying the
>>total by a constant before adding in the next character value. This
>>constant has been experimentally determined to be effective for ASCII
>>text. I'm not sure if anyone really knows why it works. I don't recall
>>what the value was, but I'm sure someone can tell us.[/color]
>
>
> I did some research, and they said multiplying by the alphabet size (ie.
> number of valid chars).[/color]

I don't recall that. The particular version I was talking about has been
discussed from time to time in the newsgroups, and I found one such
discussion here:

http://groups.google.com/groups?threadm=bvftas%245l6%241%40titan.btinternet.com

If I did that right, the link should take you to Richard Heathfield's
reply to the thread's original poster.

They mention a few points. First, K&R2 contains an almost identical
hashing function on page 144 using 31 as the constant. Second, the
values suggested by K&P were apparently 31 and 37. Finally, Chris Torek
suggests 33, a value he got from James Gosling. During widespread
experimentation for the Berkley DB library, it did better on average
than 31.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

**pembed2003** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

"John Harrison" <john_andronicu s@hotmail.com> wrote in message news:<c5l5pa$2t 12k$1@ID-196037.news.uni-berlin.de>...[color=blue][color=green]
> >
> > Hi John,
> > Can you point me to the documentation where I can find the map and
> > hash_map data structure? Thanks for the help![/color]
>
> http://www.dinkumware.com/refxcpp.html
>
> but remember because hasp_map is non standard the description here might
> differ from the implementation you have (if you have one at all).
>
> john[/color]

I read the documentation but I don't fully understand it. I don't want
to sound like a dump axx but so far, I have tried:

#include<map>
#include<string >

using namespace std;

int main(int argc,char** argv){
std::map<std::s tring,int>m();
m.insert("hello ",1); // doesn't compile? why?
return 0;
}

I am sure I did something wrong maybe even in the constructor. What I
mean to do is:

* m is a map whoes key is a std::string and values will be of type int
* insert the string "hello" into m with value 1.

I am trying to read the man page with 'man map.h' but nothing show up.
Where is the man page and how can I find map.h if I want to look at
it?

Can someone tell me what's wrong? Can you post a short working example
so I can learn from it? Thanks!

**Kevin Goodsell** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

pembed2003 wrote:
[color=blue]
>
> I read the documentation but I don't fully understand it. I don't want
> to sound like a dump axx but so far, I have tried:
>
> #include<map>
> #include<string >
>
> using namespace std;
>
> int main(int argc,char** argv){
> std::map<std::s tring,int>m();[/color]

I don't think you really wanted to declare a function called 'm' that
takes no arguments and returns a std::map. Lose the parentheses.
[color=blue]
> m.insert("hello ",1); // doesn't compile? why?[/color]

I don't believe there is a form of map::insert that takes two arguments.
Try one of these:

m["hello"] = 1;

OR

m.insert(std::m ap<std::string, int>::value_typ e("hello", 1));

Obviously I'd recommend the former for this case. When you do need the
insert() function it might help to have a typedef handy (in fact, this
is nice in many cases):

typedef std::map<std::s tring, int> Map;
Map m;
m.insert(Map::v alue_type("hell o", 1));
[color=blue]
> return 0;
> }
>
> I am sure I did something wrong maybe even in the constructor. What I
> mean to do is:
>
> * m is a map whoes key is a std::string and values will be of type int
> * insert the string "hello" into m with value 1.
>
> I am trying to read the man page with 'man map.h' but nothing show up.
> Where is the man page and how can I find map.h if I want to look at
> it?[/color]

1) 'man pages' are not defined by the standard. If you have them, the
names, locations, commands, etc. depend on your system.

2) Why would it be map.h? The header is <map>. Typically the file name
is the same as the name in the brackets, i.e. 'map'. This is not
required of course, but it's usually the case.

3) Locations of headers (if they exist as files -- which they typically
do) depend on your implementation. Consult the documentation or a group
that discusses your system and/or compiler.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

**Kevin Goodsell** · Jul 22 '05, 09:44 AM

Re: Hash table in C++? STL?

Kevin Goodsell wrote:
[color=blue]
>
> I don't believe there is a form of map::insert that takes two arguments.[/color]

Come to think of it, that's wrong. There is a form that takes a "hint"
argument, for example. But there is not one that takes the key and value
as separate arguments, I'm fairly sure.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

**Jerry Coffin** · Jul 22 '05, 09:45 AM

Re: Hash table in C++? STL?

"Siemel Naran" <SiemelNaran@RE MOVE.att.net> wrote in message news:<A6pfc.323 37$i74.709706@b gtnsc04-news.ops.worldn et.att.net>...

[ ... ]
[color=blue]
> size_t hash(const char * s, int buckets) {
> size_t h = 0;
> for ( ; *s; ++s) h += *s;
> return h%bucket.
> }
>
> But we need hash only the first maxchars characters.[/color]

As hash functions go, this is better than many, but less than ideal
for general use. Long strings might produce values larger than a
size_t can represent, producing values that vary from one
implementation to another, while short strings can only produce quite
small values (e.g. with 8-bit characters, a 3-character string can
only produce a value up to 765 and typical English text will only
produce values in a MUCH more restricted range). As one final caveat,
since it uses plain char, which might be signed, it could produce
negative hash values, which can often cause problems.

One way of dealing with this is something like this:

// warning: untested code.

size_t hash(unsigned char const *key, size_t buckets) {

const int bits = 32; // assume 32 bits is enough for the hash
value.
size_t h = 0x55555555; // alternating 0's and 1's.
size_t carry;
size_t shift = bits - CHAR_BIT;
size_t mask = ((1 << CHAR_BIT) -1) << shift;

while (*key) {
h ^= *key++;
carry = (h & mask) >> shift;
h <<= CHAR_BIT;
h |= carry;
}
return h % buckets;
}

The left-shift is to propagate the effects of the input characters
across the entire hash value more quickly. The right-shift (producing
an N-bit rotate) keeps from losing the effects of the characters from
early in the string.

The result is that every bit in the input string has an effect on the
output, and the effects of the characters get distributed throughout
the output fairly quickly.

Results are undoubtedly open to variation, but most of the
"improvemen ts" I've tried to this scheme have taken more time than
they saved.
[color=blue]
> The hash may increase numbuckets dynamically as it grows, so numbuckets is a
> parameter to operator(). But the number of chars to hash is a constant, and
> should not change as we increase or decrease the number of buckets, so it is
> argument to the constructor and gets stored in the class.[/color]

In typical use, you normally want to hash the entire key.

**pembed2003** · Jul 22 '05, 09:45 AM

Re: Hash table in C++? STL?

"Siemel Naran" <SiemelNaran@RE MOVE.att.net> wrote in message news:<wfCfc.345 40$i74.784430@b gtnsc04-news.ops.worldn et.att.net>...[color=blue]
> "pembed2003 " <pembed2003@yah oo.com> wrote in message
>[color=green]
> > #define SIZE ((sizeof(int)) * (INT16_MAX) * (2))
> >
> > void find(int* one,int c1,int* two,int c2){
> > int i;
> > int* buffer = malloc(SIZE);
> > memset((void*)b uffer,0,SIZE);
> > for(i = 0; i < c1; i++)
> > buffer[one[i] + INT16_MAX] = 1;
> > for(i = 0; i < c2; i++)
> > if(buffer[two[i] + INT16_MAX])
> > printf("%d\n",t wo[i]);
> > free(buffer);
> > }[/color]
>
> It seems you don't use the first INT16_MAX chars of buffer, or maybe I'm
> reading something wrong.
>[color=green]
> > int main(int argc,char** argv){
> > int a[] = {1,3,45,6,4,-1,8};
> > int b[] = {4,2,4,222,8,45 ,-1};
> > find(a,7,b,7);
> > }
> >
> > this gives me a O(n) algr. but seems to be wasting a lot of memory so
> > I am thinking using a hashtable instead. am i in the right track?
> > Thanks![/color]
>
> If the array size 'a' or 'one' is small then the O(N^2) algorithm is fine.
>[/color]

Is my solution a O(N^2) solution? I think it's a O(N) right?
Becasue...

Assume array a and b both have N elements, the program loops through
both array exactly once, so we have:

for(i = 0; i < N; i++) // O(N)
array_one[i] = 1;
for(i = 0; i < N; i++) // O(N)
does array_two[i] in array_one[i]

N + N = 2N

which is a O(N)
[color=blue]
> Either loop over the chars in 'one' or 'two', then over the chars in the
> other loop. If you want to use STL, you can use std::count_if and supply a
> binary predicate that employs std::find (the binary predicate will have a
> constructor that takes the array 'one' as an argument).
>[/color]

Will a binary search beat a hash lookup? I know this has no exact
answer but generally speaking will a binary search be faster than a
hash lookup?
[color=blue]
> Note you can also sort the elements in 'one' and 'two', then loop over both
> containers simultaneously. Running time O(lg(N)) for the sort if using
> std::sort which normally uses quicksort, and O(N) for the scan. I don't
> think the standard provides such an algorithm though.[/color]

But this solution requires:

1. Sort one array with best case O(N*log N) using a quicksort
2. Loop through another array lookup for dups which is a O(N)

I think it goes like:

sort(array_one) ; // O(N*log N)
for(i = 0; i < N; i++) // O(N)
look for array_two[i] in array_one

which is a O(N*log N) algr. It doesn't look like it will beat my O(N)
algr. Am I right?

After I discovered map, I am actually doing:

void find(int* a,int c1,int* b,int c2){
std::map<int,in t> m;
for(int i = 0; i < c1; i++)
m[a[i]] = 1;
for(int j = 0; j < c2; j++)
if(m[b[j]])
std::cout<<b[j]<<std::endl;
}

What do you think about that? Thanks!

**Siemel Naran** · Jul 22 '05, 09:45 AM

Re: Hash table in C++? STL?

"Kevin Goodsell" <usenet2.spamfr ee.fusion@never box.com> wrote in message
news:C%
[color=blue]
> I don't believe there is a form of map::insert that takes two arguments.
>
> Come to think of it, that's wrong. There is a form that takes a "hint"
> argument, for example. But there is not one that takes the key and value
> as separate arguments, I'm fairly sure.[/color]

Right. There's also another 2 argument insert, namely map::insert(Ite r
begin, Iter end) where Iter is a template argument. It's basically batch
insert. If the range is sorted then the running time of the algorithm is
O(N). If the range is not sorted then the running time is O(N*lg(N)). So
this is a really smart function that detects if your range is sorted! I
think it probably calls the 2 argument insert where the first argument is a
hint.

**Siemel Naran** · Jul 22 '05, 09:45 AM

Re: Hash table in C++? STL?

"pembed2003 " <pembed2003@yah oo.com> wrote in message[color=blue]
> "Siemel Naran" <SiemelNaran@RE MOVE.att.net> wrote in message[/color]
news:<wfCfc.345 40
[color=blue][color=green][color=darkred]
> > > #define SIZE ((sizeof(int)) * (INT16_MAX) * (2))[/color][/color][/color]

Prefer to use const size_t instead of #define.
[color=blue][color=green][color=darkred]
> > > void find(int* one,int c1,int* two,int c2){
> > > int i;
> > > int* buffer = malloc(SIZE);
> > > memset((void*)b uffer,0,SIZE);
> > > for(i = 0; i < c1; i++)
> > > buffer[one[i] + INT16_MAX] = 1;
> > > for(i = 0; i < c2; i++)
> > > if(buffer[two[i] + INT16_MAX])
> > > printf("%d\n",t wo[i]);
> > > free(buffer);
> > > }[/color][/color][/color]
[color=blue][color=green][color=darkred]
> > > this gives me a O(n) algr. but seems to be wasting a lot of memory so
> > > I am thinking using a hashtable instead. am i in the right track?
> > > Thanks![/color]
> >
> > If the array size 'a' or 'one' is small then the O(N^2) algorithm is[/color][/color]
fine.[color=blue]
>
> Is my solution a O(N^2) solution? I think it's a O(N) right?[/color]

It is O(N). Sorry for confusing. The O(N^2) algorithm is when you scan all
the chars in 'one', and for each scan all the chars in 'two'.
[color=blue]
> Becasue...
>
> Assume array a and b both have N elements, the program loops through
> both array exactly once, so we have:
>
> for(i = 0; i < N; i++) // O(N)
> array_one[i] = 1;
> for(i = 0; i < N; i++) // O(N)
> does array_two[i] in array_one[i]
>
> N + N = 2N
>
> which is a O(N)[/color]

Right.
[color=blue][color=green]
> > Either loop over the chars in 'one' or 'two', then over the chars in the
> > other loop. If you want to use STL, you can use std::count_if and[/color][/color]
supply a[color=blue][color=green]
> > binary predicate that employs std::find (the binary predicate will have[/color][/color]
a[color=blue][color=green]
> > constructor that takes the array 'one' as an argument).[/color]
>
> Will a binary search beat a hash lookup? I know this has no exact
> answer but generally speaking will a binary search be faster than a
> hash lookup?[/color]

Guessing hash lookup is faster in general. It's worth running an experiment
to see for your case. Who knows, maybe the hash function is too complicated
to compute and the sorted array is faster. It's pretty easy to all methods
given you have an implementation of hash_map available. If you can, post
your results if you do the experiment, because it's nice to know. (State
the size of N, result using: original algorithm, hash_map, std::map,
binary_search, N^2 method, double sort).
[color=blue][color=green]
> > Note you can also sort the elements in 'one' and 'two', then loop over[/color][/color]
both[color=blue][color=green]
> > containers simultaneously. Running time O(lg(N)) for the sort if using
> > std::sort which normally uses quicksort, and O(N) for the scan. I don't
> > think the standard provides such an algorithm though.[/color]
>
> But this solution requires:
>
> 1. Sort one array with best case O(N*log N) using a quicksort
> 2. Loop through another array lookup for dups which is a O(N)
>
> I think it goes like:
>
> sort(array_one) ; // O(N*log N)
> for(i = 0; i < N; i++) // O(N)
> look for array_two[i] in array_one
>
> which is a O(N*log N) algr. It doesn't look like it will beat my O(N)
> algr. Am I right?[/color]

Yes, I think your O(N) algorithm is probably best, but it consumes lots of
space. Nevertheless, it is possible that the binary search method is faster
than hash_table.

Also, you can sort both arrays, and walk through each simultaneously in
linear time.
[color=blue]
> After I discovered map, I am actually doing:
>
> void find(int* a,int c1,int* b,int c2){
> std::map<int,in t> m;
> for(int i = 0; i < c1; i++)
> m[a[i]] = 1;
> for(int j = 0; j < c2; j++)
> if(m[b[j]])
> std::cout<<b[j]<<std::endl;
> }
>
> What do you think about that? Thanks![/color]

Looks right. You can use std::set too, which in principle should be a tad
faster. How fast does it run? Also note that a node in a map is pretty
expensive because you have to allocate space for one parent and two child
pointers, and if sizeof(int)==si zeof(node*), this is 4 times the memory per
node, plus the cost of maintaining the pointers. I'm willing to bet you a
$1 that sorting 'a' with std::sort and using std::binary_sea rch instead is
faster than map and uses less memory.

You can replace std::map with std::hash_map if you are using STL. Don't
know about the other implementations , probably something similar.

original algorithm:
What you originally wrote
Still not sure why array size is double, why the multiply by 2 at the end?
const size_t SIZE = ((sizeof(int)) * (INT16_MAX) * (2))

std::map:
The above algorithm

hash_map (probably largest:
The above algorithm with std::map replaced by hash_map. Maybe there is a
function to set the bucket size that you can try to tweak for optimal
performance?

binary_search:
Similar to the above algorithm.
void find(int* a,int c1,int* b,int c2){
std::sort(a, a+c1);
for(int j = 0; j < c2; j++)
if(std::binary_ search(a, a+c1, b[j]))
std::cout<<b[j]<<std::endl;
}

N^2 method:
void find(int* a,int c1,int* b,int c2){
for(int j = 0; j < c2; j++)
if(std::find(a, a+c1, b[j]))
std::cout<<b[j]<<std::endl;
}

double sort:
void find(int* a,int c1,int* b,int c2){
std::sort(a, a+c1);
std::sort(b, b+c1);
int i =0, j = 0;
while (true) {
if (i==c1 || j==c2) break;
const int ai = a[i];
const int bj = b[j];
if (ai < bj) ++i;
else if (ai > bj) ++j;
else {
std::cout<<bj<< std::endl;
++i;
++j;
}
}
}
It's 1:30am, so I don't claim the above is error free!

**Siemel Naran** · Jul 22 '05, 09:45 AM

Re: Hash table in C++? STL?

"John Harrison" <john_andronicu s@hotmail.com> wrote in message
news:c5msbf$3he bc$1@ID-
[color=blue]
> void find(int* one,int c1,int* two,int c2)
> {
> std::set<int> common;
> common.insert(o ne, one + c1);
> common.insert(t wo, two + c2);
> // output common elements to cout
> std::copy(commo n.begin(), common.end(),
> std::ostream_it erator<int>(std ::cout, "\n"));
> }
>
> That is an O(N lg N) algorithm.[/color]

This is a good algorithm!

Hash table in C++? STL?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment