
Provides fast access to objects either by name or by numerical key. If properly used, a hash table is faster than binary search.
The diagram above shows the internal arrangement, normally transparent to the user. Since the hash table is based on an array, it is attached [just like an array] to a holder object. The array contains pointers to individual buckets, implemented as singly-linked rings. Note that OrgC++ uses no special objects for the bucket. The rings are formed by pointers, which are stored directly inside the objects (under the ZZ_EXT_..statement), one pointer per object.
A hash table is treated as a fixed-size array. If you want to change the size, you have to call resize().
The performance of a hash table depends on the algorithm which converts a key into an index. Sometimes, the key may be a combination of several names or numbers. To provide flexibility, OrgC++ requires that, for each hash table, the user provides two functions:
inline int id_class::cmp(oType *obj1,oType *obj2) which compares two objects of type oTYPE (returns 0 if they have the same key). It also controls the order of the objects inside the buckets (see Indexed tables below.)
inline int id_class::hash(oType *obj,int size) which, for a given object and the hash table size, generates the bucket index.
Note that both functions are NOT assigned to class oType, but rather to the class that internally represents the hash organization. Normally, except for this case, you don't even know the name of this class, you only deal with its instance, id.
If you don't have any special requirements, you can use the default functions provided by OrgC++:
int ZZhashStr(char *str,int size) for a given text string and a hash table size, generates the bucket index;
int ZZhashInt(int i,int size) for a given integer and a hash table size, generates the bucket index.
The default functions ZZhashStr() and ZZhashInt() are based on the multiplicative method (Knuth, Standish), using the improved golden ratio. ZZhashStr() first converts the text string into an integer, and then calls ZZhashInt() to randomize the number. The conversion of the text into an integer is based on adding individual bytes multiplied by an increasing factor, which generally leads to different indices for strings ABC, CBA, and BCA.
The following example uses two hash tables on the object Employee, one based on a character string (name), and the other on an integer key (IDnum). Programs test25b.c and test25c.c show these hash tables used together with other organizations.
#define HASH_SIZE 200 class Header { // header for the hash table ZZ_EXT_Header ... public: Header(); }; class Employee { ZZ_EXT_Employee int IDnum; char *name; ... public: char *getName(void){return name;} int getID(void){return IDnum;} void setName(char *n){name=n;} void setID(int i){IDnum=i;} }; ZZ_HYPER_HASH(byIDnum,Header,Employee); ZZ_HYPER_HASH(byName,Header,Employee);Header::Header(){ // for each header, form hash tables automatically byIDnum.form(h,HASH_SIZE); byName.form(h,HASH_SIZE); if(util.error()){printf("cannot allocate hash tables\n"); }int main(void){ Employee *p,e; Header *hh=new Header; ... // use hashing without worrying about details byIDnum.add(h,p); byName.add(h,p); // when searching for an object, set up a dummy // object with an attribute for which you search e.setID(127); p=byIDnum.get(h,&e); e.setName("Brown J."); p=byName.get(h,&e); ... }// Each hash table requires two functions that control hashing // ZZhashStr() and ZZhashInt() are default functions provided inline int byName_class::cmp(Employee *p1,Employee *p2) {return(strcmp(p1->getName(),p2->getName()));}inline int byName_class::hash(Employee *p,int size) { int ZZhashStr(char *name,int size); return(ZZhashStr(p->getName(),size)); }T inline int byIDnum_class::cmp(Employee *p1,Employee *p2) {return(p1->getID() - p2->getID()); } inline int byIDnum_class::hash(Employee *p,int size) { int ZZhashInt(int key,int size); return(ZZhashInt(p->getID(),size)); }
In addition to its basic operation with pseudo)random access of buckets, the HASH class can be also used as an indexed, sorted dictionary. For example, assume that we want buckets sorted by the first letter of each word, and words alphabetically sorted within each bucket. In this case, function hash() does not provide random hashing. Instead, it uses the first character as the bucket index.
Function cmp() detects identical objects as for the random hashing, but it is also controls the order of objects within individual buckets, using the same syntax as the compare function required for !!TODO Ask Jiri???? or when sorting collections or other OrgC++ organizations. The following code sample demonstrates this approach. For the complete program, look at test51.c.
class Root { ZZ_EXT_Root ... }; class Word { ZZ_EXT_Word ... }; ZZ_HYPER_HASH(dict,Root,Word); ZZ_HYPER_NAME(word,Word); ZZ_HYPER_UTILITIES(util):inline dict_class::cmp(Word *w1,Word *w2){ return strcmp(word.fwd(w1), word.fwd(w2)); } inline int dict_class::hash(Word *w,int size){ char* p=word.fwd(w); unsigned char c=(*p); // first character int i=(int)(c)(unsigned char)('a')); if(i<0 || i>=size)... // error return i; } unsigned char ua,uz; ua='a'; uz='z'; dict.form((int)(uz,ua));
When saving/retrieving an object to/from disk, a hash table is treated as part of the holder object, and automatically stored with it. The SWEEP operation, when collecting all objects, expands through the hash table in the same way it does with pointers and arrays.
When restoring data from the disk, the same hashing function is automatically assumed. If you changed the hash function after saving data to disk, call resize() after restoring data from the disk. This will reload the table using the new hashing function.
Under special circumstances, you may want to switch between two or more hashing functions while running your program. The two or more optional hashing functions must be included in function id_class::hash(), using an external switch. For example:
class Block {
ZZ_EXT_Block
public:
static int hashControl;
...
};
class Label {
ZZ_EXT_Label
public:
char *name;
...
};
int Block::hashControl=0; // controls hashing function
ZZ_HYPER_HASH(bHash,Block,Label);
...
int main(){
... // using hashControl=0 Block::hashControl=1;
bHash.newFun(bp); // resets the hash table 'in place'
}
... // now working with hashControl=1
int bHash_class::hash(Label *p,int size){
if(!hashControl) return ZZhashStr(p>name,size);
else return ZZhashStr(&(p>name[1]),size);
}
Note how hashControl changes the hashing function from hashing the full name to hashing only a part of the name, starting from the second character. For another, similar example, look at test25b.c.
| class HOLDER; class OBJECT; ZZ_HYPER_HASH(id,HOLDER,OBJECT); void id.form(HOLDER *hp,int size); |
forms a hash table of the given size on the holder object hp. |
| void* id.formed(HOLDER *hp); | allows to check whether the the hash table has been formed; returns NULL if it has not been. |
| void id.resize(HOLDER *hp,int size); | resets the table to a new size, reloads all data, and releases the old table. |
| void id.add(id,HOLDER *hp,OBJECT *p); | adds object p to the table; if the same key already exists, the object is not added; |
| void del(HOLDER *hp,OBJECT *p); | deletes object p from the table; |
| int id.add2(id,HOLDER *hp,OBJECT *p,int sameKey); | adds object hp to the table; when sameKey=1 identical keys are permitted, when sameKey=0 they are not; when the object is not added add2() returns 1; |
| OBJECT* id.get(HOLDER *hp,OBJECT *p); | returns the object that has the same key as p. |
| int id.size(HOLDER *hp,int *totNum); | returns the current size and (through the second parameter) the total number of objects in the hash table. |
| OBJECT* id.slot(HOLDER *hp,int slot); | returns the first object in the given slot. |
| id_iterator it(OBJECT *s); | declares an iterator, and initializes for traversing objects in the same bucket as object s. |
| void it.start(OBJECT *s); | initializes previously declared iterator to start another loop. |
| void id.free(HOLDER *hp); | deallocates the whole table, including the array header. |
| void id.newFun(HOLDER *hp); | without reallocating the array, it resets the entire hash table. Typical use: Changing the hash() or cmp() functions. |
Use the following double loop to traverse all objects in the table:
int i,size,totNum;
HOLDER *hp;
OBJECT *s,*p;
...
size=id.size(hp,totNum);D
for(i=0; i<<size; i ++){
s=id.slot(hp,i);
id_iterator it(s);
while(p= ++it){
.. here you have object p ..
}
}
IMPORTANT: Check the error condition after form(), or resize() by calling util.error(). If it is not 0, there is an allocation problem.
| Next Section 11.14 Pager |