
The secret of efficient software design: Internal data organization

Jiri Soukup, President, Code Farms Inc., 7214 Jock Trail, Richmond, Ont., Canada, K0A 2Z0, eMail: jiri@codefarms.com, 613-838-4829, fax 613-838-3316

Abstract

Programmers often start coding without thinking about data. As the program evolves, pointers form an intricate network which is difficult to debug. The major improvement introduced by object-oriented programming is that it forces the programmer to consider both data and functions (methods) right from the beginning. However, even in object-oriented languages, relations between objects are often not treated properly, causing a new style of complex, spaghetti-like code. This paper explains a new method of managing internal program data. This method completely separates data objects from relations, improving code clarity and dramatically increasing software productivity. An additional benefit is that it also improves run-time performance. The methodology has been implemented as a library which works with regular C or C++, without being a special language. The library provides generic, fully-typed data structures which are automatically persistent.

Introduction

Every program involves two basic parts: data and algorithms. In a good program, the two parts must be carefully balanced and must complement each other.

For example, let us consider a program that stores a set of towns connected by highways, and calculates the shortest connection between two given towns.

Before we start to code, we have to consider several alternative means of managing internal data:

  • If we store the towns in an array, we use the minimum amount of memory per town.
  • If the array is sorted then, for a given name, we can quickly find the corresponding town, using binary search.
  • If we plan to add new towns frequently, a linked list provides a more flexible solution than an array.
  • If we plan to delete towns frequently, the list should be doubly-linked.
  • Since linked lists do not allow fast searches, we may need a hash table.

These are the most important decisions a programmer makes, because they affect both software design and program performance. Different data structures often lead to different algorithms.
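
The trade-off between the first two and the last of these alternatives can be sketched with standard C++ containers. This is only an illustration, not part of the Code Farms library; the record type TownRec and the functions findSorted() and findHashed() are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record type used only for this illustration.
struct TownRec {
    std::string name;
    int population;
};

// Sorted array: compact storage, O(log n) lookup by binary search.
// Precondition: 'towns' is sorted by name.
const TownRec* findSorted(const std::vector<TownRec>& towns,
                          const std::string& name) {
    auto it = std::lower_bound(
        towns.begin(), towns.end(), name,
        [](const TownRec& t, const std::string& key) { return t.name < key; });
    return (it != towns.end() && it->name == name) ? &*it : nullptr;
}

// Hash table: more memory per town, but expected O(1) lookup and
// cheap insertion of new towns.
const TownRec* findHashed(const std::unordered_map<std::string, TownRec>& towns,
                          const std::string& name) {
    auto it = towns.find(name);
    return it == towns.end() ? nullptr : &it->second;
}
```

The two lookups return the same answer; what differs is the memory used per town and the cost of keeping the structure up to date when towns are added.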

As can be seen from this example, the important part of the decision is not how we form data objects, but what relations (data structures) we use. The general popularity of object-oriented programming has helped to recognize the importance of data. However, many textbooks stress objects and neglect the importance of relations.

Perhaps the root of this problem lies in the object-oriented paradigm itself. This paradigm assumes that objects contain both data and the methods (functions) that operate on the data.

This model seems to fail when implementing some data structures. For example, in our network problem, every town must contain references (pointers or indices) to adjacent highways, and every highway must contain references to adjacent towns. The function that adds a new highway must modify pointers in both highway and town objects, and therefore cannot belong to either object, if we strictly adhere to the object-oriented paradigm.
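
A minimal sketch of the difficulty, using hand-written structs rather than the library: the types Town and Hwy below and the function addHighway() are hypothetical and serve only to show that one operation touches both kinds of object.

```cpp
#include <vector>

struct Hwy;  // forward declaration

// Hypothetical plain structs; this is not how the library stores the links.
struct Town {
    std::vector<Hwy*> hwys;              // highways adjacent to this town
};

struct Hwy {
    Town* ends[2] = {nullptr, nullptr};  // the two towns this highway joins
    int dist = 0;
};

// Adding a highway changes three objects at once, so it is written as a
// free function rather than as a method of Town or of Hwy.
void addHighway(Town& a, Town& b, Hwy& h, int dist) {
    h.ends[0] = &a;
    h.ends[1] = &b;
    h.dist = dist;
    a.hwys.push_back(&h);
    b.hwys.push_back(&h);
}
```

Whichever class addHighway() were placed in, it would still have to reach into the private links of the other class.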

Instead of getting involved in esoteric discussions, I will present a practical way of managing data in C and C++ programs by coding the town/highway example with the C++ version of the Code Farms library. I will refer to it simply as the library. Readers interested in a comparison of different methods, and in the reasons for selecting this particular library, may look at [2]. Note that the Code Farms library is also available in plain C.

STEP 1: Conceptual design

We can start with a naive representation of data as shown in Fig.1. We have two classes, Town and Hwy. Each Hwy includes an id# and a distance. Each Town has a variable-length name, which is kept as a separate object.

Fig.2 shows a more elegant Booch notation, where we have only one box for each type. Bars represent relations. Since Booch does not have a special notation for a graph, we can compose a graph from two 1-TO-N relations. One relation links highways that start in a given town, the other links highways that end there.

The program will have a simple interface consisting of 3 commands:

add [town1] [town2] [id] [distance] ... add a highway,
route [town1] [town2] ... find a route,
exit ... exit the program and store data on disk.

We assume that new towns are automatically created when required and that, when invoking the program, the old data is automatically restored from the disk.

This means that the names given by the user will have to be translated to Town pointers. A fast search by name is required. Fig.3 adds a hash table as a collection of Towns (1-TO-N relation) on a dummy object Root.

The route searching algorithm will expand from town1 to its neighbours, then expand to their neighbours, updating the distance (best) and using the pointer route to record the best route direction. The algorithm will need a circular stack of Towns that were updated in the last round; you can imagine these towns as the front of a wave expanding from town1.

For this reason, Fig.3 adds another collection of Towns (wave) under Root. Note that the basic collection in the library is a ring. Compare Fig.3 with Fig.4, which shows the pointers that will actually be used. The complexity of this network is the reason why C and C++ programs are difficult to debug without a data structure library.
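
The wave expansion described above can be sketched with standard containers. This is a simplified queue-based label-correcting search, not the listing from this paper: towns are plain indices, std::deque stands in for the ring, the early exit once town2 has been reached is omitted, and the names Graph, expandWave() and via are assumptions made for the illustration.

```cpp
#include <climits>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical graph representation: towns are indices 0..n-1 and
// adj[t] lists (neighbour, distance) pairs for every highway at town t.
using Graph = std::vector<std::vector<std::pair<int, int>>>;

// Returns best[t], the shortest distance found from 'start' to every
// town t (INT_MAX if unreachable). via[t] records the neighbour through
// which the best route to t arrives, or -1 if there is none.
std::vector<int> expandWave(const Graph& adj, int start, std::vector<int>& via) {
    std::vector<int> best(adj.size(), INT_MAX);
    std::vector<bool> inWave(adj.size(), false);
    via.assign(adj.size(), -1);

    std::deque<int> wave;            // towns whose distance changed recently
    best[start] = 0;
    wave.push_back(start);
    inWave[start] = true;

    while (!wave.empty()) {
        int t = wave.front();
        wave.pop_front();
        inWave[t] = false;
        for (const auto& edge : adj[t]) {
            int tt = edge.first;
            int d = best[t] + edge.second;
            if (d < best[tt]) {      // found a shorter way to tt
                best[tt] = d;
                via[tt] = t;
                if (!inWave[tt]) {   // a town sits in the wave at most once
                    wave.push_back(tt);
                    inWave[tt] = true;
                }
            }
        }
    }
    return best;
}
```

The inWave flags play the role that membership in the wave collection plays in the paper's design: a town is added to the wave only if it is not already there.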

The last important consideration is the persistence of the data. We assume that, after invoking the program, old data will automatically be pulled in, and when exiting the program, the data will be stored to the same file, called backup. Each of these steps is only one command in the library.

STEP 2: Mapping data into the library

We will use one class for each object type, with two special statements in it:


ZZ_EXT_    links the class to the library
ZZ_INIT(); initializes all pointers

struct Root {
 ZZ_EXT_Root
public:
 Root() {ZZ_INIT(Root); }
};

struct Hwy {
 int id;
 int dist;
 ZZ_EXT_Hwy
public:
 Hwy() {ZZ_INIT(Hwy); };
 Hwy(int a,int b);
 int getDist(void) {return dist; }
 int getID(void){return id;}
};

struct Town {
 ZZ_EXT_Town
public:
 Town() { ZZ_INIT(Town); };
 Town(char *nm);
 int best; // temporary
 Hwy *route; // temporary
};
 

After declaring all object types (classes), we declare the relations (data organization). If you are familiar with databases, you can think of the following statements as being similar to a database schema. In a way, they resemble templates, but something much more complex and efficient happens behind the scenes:


ZZ_HYPER_SINGLE_GRAPH(netw,Town,Hwy);
ZZ_HYPER_NAME(name,Town);
ZZ_HYPER_HASH(hash,Root,Town);
ZZ_HYPER_DOUBLE_COLLECT(wave,Root,Town);
ZZ_HYPER_UTILITIES (util);

These five lines precisely declare the entire organization. We have:

SINGLE_GRAPH - undirected graph with Towns as nodes and Hwys as edges; internally, edges adjacent to each node form a singly-linked list; netw is the identification name for this organization.
NAME - similar to the string class; assigns a variable-length name to each Town; its identification name is name.
HASH - generic hash table of Towns; it lives on a Root; its identification name is hash. It can be controlled by a user-provided hashing function, or by a default function from the library.
DOUBLE_COLLECT - double collection which encapsulates a doubly-linked list of Towns under a Root; wave is its identification name.
UTILITIES - memory allocation and disk IO (persistence) utilities.

The word HYPER refers to a new concept of the hyper-class, used in this particular library.

Each HYPER declaration creates one instance of a special interface class, which contains no data, only methods for the given data organization. Even though these classes are global, they cannot be used out of local scope, if the class on which they operate is local.
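
The idea of an interface class that holds no data of its own can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: the link fields are written by hand here (the library inserts its own through the ZZ_EXT_ statements), the ring is singly linked, and the names WaveRing, first(), popFirst() and contains() are hypothetical.

```cpp
// Hypothetical types for the illustration.
struct Town;

struct Root {
    Town* tail = nullptr;     // last element of the ring, or nullptr if empty
};

struct Town {
    Town* next = nullptr;     // nullptr means "not in the ring"
    int best = 0;
};

// The relation itself: no data members, only operations on the
// pointers stored inside Root and Town.
struct WaveRing {
    // Append t at the end of the ring owned by r; no effect if t is a member.
    static void add(Root& r, Town& t) {
        if (t.next) return;
        if (!r.tail) { t.next = &t; }                       // ring of one
        else { t.next = r.tail->next; r.tail->next = &t; }  // link after tail
        r.tail = &t;
    }

    // First element, or nullptr when the ring is empty.
    static Town* first(const Root& r) {
        return r.tail ? r.tail->next : nullptr;
    }

    // Remove and return the first element, or nullptr when the ring is empty.
    static Town* popFirst(Root& r) {
        Town* t = first(r);
        if (!t) return nullptr;
        if (t == r.tail) r.tail = nullptr;                  // last one removed
        else r.tail->next = t->next;
        t->next = nullptr;
        return t;
    }

    // Membership test in constant time, using the embedded pointer.
    static bool contains(const Town& t) { return t.next != nullptr; }
};
```

Because all state lives in the objects, the relation class needs no storage, and a test such as contains() costs a single pointer comparison.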

STEP 3: Coding the algorithm

Once the organization is declared, the library automatically provides all functions required for its access and modification.

The entire program, including the data declarations above, is only 180 lines long, and took half a day to code and fully debug. I apologize for the small font and crowded coding style in the enclosed listing; this paper is limited to 6 pages.

Comments in the listing will help you with the logic of the algorithm:


(0)  contains data declaration shown in the paper
(1)  strAlloc() allocates another copy of string nm
(2)  route and best are initialized for the routing
(3)  adding highway h between towns t[0] and t[1]
(4)  add the name to the dummy Town object, s
(5)  hash table search (this table lives on root)
(6)  adding t to the hash table
(7)  follow route pointers from town2 to town1
(8)  for given Hwy h, nodes() returns adjacent Towns
(9)  if the first child of wave is NULL, the wave is empty
(10) for() loop goes through the wave round-robin style.
     When t==child, it is the end of the next round.
(11) iterator ni runs through all Hwys adjacent to t
(12) add tt to the wave
(13) delete tt from the wave
(14) clear() is similar to search(), but no cost is
     considered. The expansion proceeds only through
     Towns modified in the last search (route!=NULL).
(15) start search by putting town1 into the wave
(16) testing whether there is file backup
(17) re-opening the old data in one command
(18) for the first call, start with a new root,
     and form a new hash table.
(19) interactive loop that reads commands
(20) one command saves all data to disk
(21) functions that control hashing refer to
     ZZhashStr() provided by the library
(22) this file has been automatically generated
     by the class generator, which is a part
     of the library.

Conclusions:

The paper demonstrates how this new method improves code clarity. From conceptual design through final debugging, a clearly declared data organization makes the software more systematic. Experience on large industrial projects indicates, on average, 2-3 times faster coding and debugging, with much improved maintenance and code re-usability.

References:


[1] Soukup J.: Organized C: A unified method of
    handling data in CAD algorithms and databases,
    ACM/IEEE 27th Design Automation Conference, June
    1990, pp. 425-429.
[2] Soukup J.: Beyond templates, C++ Report, May
    1992 (to be published).




#include <stdio.h>
#include <string.h>
#define ZZmain
#include "zzincl.h" // (22)
#include "data.h"   // (0)
#define INF 0x7FFFFFFF

Town::Town(char *nm){
    ZZ_INIT(Town);
    char *n=util.strAlloc(nm);              // (1)
    name.add(this,n);
    route=NULL; best=INF;                   // (2)
}
Hwy::Hwy(int a, int b){ ZZ_INIT(Hwy); id=a; dist=b; }
Root *root;
//------------------------------------
// add a new hwy between two towns
void addRoute(char *name1, char *name2, int id, int dist){
    Town *t[2],*getTown(char *); Hwy *h;
    t[0]=getTown(name1); t[1]=getTown(name2);
    h=new Hwy(id,dist);
    netw.add(t,h);                          // (3)
}
//------------------------------------
// get town, create new if not found
Town *getTown(char *nm){
    Town *t; static Town s;
    name.add(&s,nm);                        // (4)
    t=hash.get(root,&s);                    // (5)
    if(!t)t=new Town(nm);
    nm=name.del(&s);
    hash.add(root,t);                       // (6)
    return(t);
}
//-------------------------------------
// print a route, find total distance
void prtRoute(Town *t){                     // (7)
    int tot; Hwy *h; Town *tt[2];
    tot=0;
    if(!(t->route)) { printf("no connection\n"); return; }
    while(t){
        printf("%s",name.fwd(t));
        h=t->route; if(!h)break;
        printf("(%d) ",h->getID());
        netw.nodes(tt,h);                   // (8)
        // step to the other end of the highway
        if(tt[0]==t)t=tt[1]; else t=tt[0];
        tot+=h->getDist();
    }
    printf(" dist=%d\n",tot);
}
//--------------------------------------
// find route, mark by 'route' pointers
void search(Town *t2){
    Town *t,*tt,*fw,*nt; Hwy *h; int bot;
    bot=0;
    for(t=wave.child(root);;t=nt){          // (9)
        if(t==wave.child(root)){            // (10)
            // The source text is damaged here ("t2->bestbestbest;").
            // The statements up to netw_iterator are a reconstruction
            // (an assumption), not the original code.
            if(t2->route && t2->best<=bot)break;
            bot=INF;
        }
        if(t->best<bot)bot=t->best;
        netw_iterator ni(t);
        while(h=ni++) {                     // (11)
            tt=ni.adj();
            if(t->best+h->dist<tt->best) {
                tt->route=h; tt->best=t->best+h->dist;
                fw=wave.fwd(tt);
                if(!fw)wave.add(root,tt);   // (12)
            }
        }
        nt=wave.bwd(t);
        wave.del(root,t);                   // (13)
        if(nt==t)break;
    }
}
//----------------------------------------
// re-initialize used Towns
void clear(Town *t1){                       // (14)
    Town *t,*tt,*fw,*nt; Hwy *h;
    wave.add(root,t1);
    for(t=wave.child(root);; t=nt){
        if(t->route||t==t1){
            netw_iterator ni(t);
            while(h=ni++) {
                tt=ni.adj();
                if(tt->route){
                    fw=wave.fwd(tt);
                    if(!fw)wave.add(root,tt);
                }
            }
        }
        nt=wave.bwd(t);
        wave.del(root,t);
        t->route=NULL; t->best=INF;
        if(nt==t)break;
    }
}
//------------------------------------------
// for given town names, find the route
void findRoute(char *town1,char *town2){
    Town *t1,*t2,*getTown(char *);
    t1=getTown(town1); t2=getTown(town2);
    if(t1==t2) { printf("same point\n"); return; }
    wave.add(root,t1);                      // (15)
    t1->best=0;
    search(t2); prtRoute(t2); clear(t1);
}
//-------------------------------------------
#define BSIZE 80
int main(void){
    char buff[BSIZE],cmd[BSIZE],town1[BSIZE],town2[BSIZE];
    static char *t[]={"Root"}; char *v[1];
    FILE *fp; int id,dist;

    fp=fopen("backup","r");                 // (16)
    if(fp){
        fclose(fp);                         // close only if the file exists
        util.open("backup",1,v,t);          // (17)
        root=(Root *)(v[0]);
    }
    else {
        root=new Root;
        hash.form(root,200);                // (18)
    }
    while(fgets(buff,BSIZE,stdin)) {        // (19)
        sscanf(buff,"%s",cmd);
        if(!strcmp(cmd,"route")) {
            sscanf(buff,"%s %s %s",cmd,town1,town2);
            findRoute(town1,town2);
        }
        else if(!strcmp(cmd,"add")){
            sscanf(buff,"%s %s %s %d %d",cmd,town1,town2,&id,&dist);
            addRoute(town1,town2,id,dist);
        }
        else if(!strcmp(cmd,"exit"))break;
        else printf("please try again \n");
    }
    v[0]=(char *)root;
    util.save("backup",1,v,t);              // (20)
    return(0);
}
//-------------------------------------------------
int zz_hashCmp_hash(char *t1,char *t2){     // (21)
    return(strcmp(name.fwd((Town *)t1), name.fwd((Town *)t2)));
}
int zz_hashInd_hash(char *t,int size){
    int ZZhashStr(char *,int);
    return(ZZhashStr(name.fwd((Town *)t),size));
}
#include "zzfunc.c" // (22)

 
