Code Farms Inc

1.2 LARGE PROJECTS


This section explains the scope of this book, what a "large" project is, and why we are interested in such projects.

We use the term large project interchangeably with serious project. What determines whether a project is "serious" or not? Does it mean that some C++ styles are not good for such projects? The answers to these two questions determine the scope of this book.

Serious projects are intended for some commercial, scientific, or industrial use. This means that the software must be solid and free of bugs, and that it will have to be supported over a long period of time.

It is well known that most software cost is in maintenance, not in code development. In that sense, serious software is almost synonymous with maintainable software. Software is easy to maintain if it has clear internal organization, if it consists of modules that can be tested independently, and if its logic is easy to understand for a new programmer assigned to the project.

Most textbook examples of C++ are not serious projects. Textbooks usually do not show all of the details needed for production-type software, and the examples which they contain are not designed for long term maintenance and support.

Serious projects are usually fairly large in size because they reflect real life, which is never simple. The size of the problem naturally adds to the complexity of the software. The existence of many mutually interacting classes may seriously impact the complexity and maintainability of software.

When considering large projects, how you divide the project between individual programmers, how these individuals interact, and how the resulting code is updated are often more important than the programming style. When hundreds of programmers work simultaneously on the same project, the probability of a software error due to miscommunication or due to uncoordinated updates is greatly increased.

Note that this book is concerned with programming techniques and C++ style, but it does not discuss the required project organization and management. The problem of how to manage large object-oriented projects is currently being researched at many places, and all new CASE tools are trying to address this issue.

Since this book is about large programs, it contains program listings which are generally longer than is customary in textbooks. Long program listings can make a book boring and difficult to read, but without documenting the new techniques with fully operational code, a text could fall into the same trap as many C++ books: that of distorting the picture by pretending that the problems are simple and the code is short, when this is not the case.

When comparing sizes, we can divide software projects into three categories:

  1. Small projects are typically coded by a single person over a period of up to several weeks. These projects are simple enough that, with relatively little effort, one person can understand (and carry in his/her head) the entire logic of the program.
  2. Large projects, usually coded by a small group (3 - 10 programmers), contain many files. Some large projects may be developed by a single person over a long period of time (for example some C++ compilers); other projects may require the cooperation of more than 10 people. Large projects typically contain 10,000 - 100,000 lines of code, and are beyond the mental capacity of a single programmer. On the other hand, the communication among subprojects and individual programmers is not so complex that it would become a major design issue. A typical example of a large project is a VLSI layout system.
  3. Large software systems are typically designed by an army of programmers, and often involve millions of lines of code. At this level, the design of individual classes and coding style are less critical. System architecture, project management, version control, database design, and other systematic issues are more important. For example, a telephone switching system is a large software system, and may comprise 5 million lines of code.

The programming techniques discussed in this book will help you regardless of the size of your project. If you violate the recommended principles on a small project, your project will be more complex but, most likely, still within a manageable range. For a large project, however, the new techniques are very important; without them, a project may fail before being completed, or may be found unmaintainable later. For large systems, project organization may be of greater importance than the software architecture of individual modules. However, since large systems are composed of large projects, the new techniques are essential.

It is important to stress that this book is about programming techniques applicable to any C++ development, not only about libraries and, in particular, not about the Code Farms library. This library is used in some examples in order to demonstrate the viability of the new approach, but it could easily be replaced by classes that you develop yourself. Chapter 4 explains the new class organization, Chapter 5 shows its application to class libraries, and Examples 6.1 and 6.2 provide the complete code for a more complex application.

The key idea is to control dependencies between classes in such a way that instead of having a big knot of mutually dependent objects, you have layers of classes, where any class depends only on classes from lower layers. This arrangement permits more independent testing, and makes software easier for new programmers to understand.
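
The layering rule can be stated as a simple check on the class dependency graph. The following is only a minimal sketch, not code from this book or from the Code Farms library: the names DependencyGraph, assignLayer, and computeLayers are hypothetical, and the graph is assumed to be given as a map from each class name to the names of the classes it depends on. A class with no dependencies gets layer 0, any other class gets a layer one above its highest dependency, and a "knot" (a dependency cycle) makes the assignment fail.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: each class name maps to the classes it depends on.
using DependencyGraph = std::map<std::string, std::vector<std::string>>;

enum class Mark { Unvisited, InProgress, Done };

struct LayerState {
    Mark mark = Mark::Unvisited;
    int layer = 0;
};

// Depth-first visit. Returns false if 'name' takes part in a dependency cycle.
bool assignLayer(const DependencyGraph& graph, const std::string& name,
                 std::map<std::string, LayerState>& state) {
    LayerState& s = state[name];  // std::map references stay valid on insertion
    if (s.mark == Mark::Done) return true;
    if (s.mark == Mark::InProgress) return false;  // reached again: a cycle
    s.mark = Mark::InProgress;

    int layer = 0;  // a class without dependencies sits in the lowest layer
    DependencyGraph::const_iterator it = graph.find(name);
    if (it != graph.end()) {
        for (const std::string& dep : it->second) {
            if (!assignLayer(graph, dep, state)) return false;
            layer = std::max(layer, state[dep].layer + 1);
        }
    }
    s.layer = layer;
    s.mark = Mark::Done;
    return true;
}

// Fills 'layers' and returns true if the classes can be arranged in layers;
// returns false (leaving 'layers' untouched) if the graph contains a cycle.
bool computeLayers(const DependencyGraph& graph,
                   std::map<std::string, int>& layers) {
    std::map<std::string, LayerState> state;
    for (const auto& entry : graph) {
        if (!assignLayer(graph, entry.first, state)) return false;
    }
    layers.clear();
    for (const auto& entry : state) {
        assert(entry.second.mark == Mark::Done);
        layers[entry.first] = entry.second.layer;
    }
    return true;
}
```

With three hypothetical classes where Layout depends on Net and List, and Net depends on List, the sketch places List, Net, and Layout in layers 0, 1, and 2. Two classes that depend on each other cannot be placed in any layer, and computeLayers reports this by returning false.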

As shown below, this almost obvious idea has a major impact on the architecture of practically any C++ project. Some techniques recommended in existing C++ textbooks are not always appropriate from this point of view, and need to be updated. The typical examples are polylithic data structures such as aggregations, associations, graphs, entity relationship models, and Booch's "mechanisms" in general. Even the implementation of such basic data organizations as linked lists can be improved by using the new technique. The new concept is equally important for those who design class libraries as for those who use them. The new methodology has been successfully tested on dozens of commercial applications. Some of these projects are described as Case Studies (Chapter 9).

In large projects, various classes usually form groups (mechanisms, design patterns) of closely cooperating classes, with many classes participating in more than one pattern. It is most critical how these patterns (mechanisms) are implemented. Currently used techniques can easily make all classes mutually interdependent, rendering the software extremely complex and difficult to maintain. The pattern classes introduced in Chapter 4 treat patterns as objects, and lead to more structured class dependencies where cycles are avoided.
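
To make the idea of "patterns as objects" more concrete, here is a rough sketch of a one-to-many relation kept in an object of its own. It is an illustration only and should not be read as the pattern classes of Chapter 4, whose implementation is not shown in this section: the class names Department, Employee, and Aggregate are hypothetical, and the sketch assumes a simple map-based bookkeeping chosen for brevity. The point is the shape of the dependencies: the two application classes do not mention each other, and only the relation, which lives one layer higher, knows both.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Lower layer: two application classes that know nothing about each other.
struct Department { std::string name; };
struct Employee   { std::string name; };

// Higher layer: the one-to-many relation is an object in its own right.
// (Hypothetical class; map-based storage is an assumption of this sketch.)
template <class Parent, class Child>
class Aggregate {
public:
    // Attach 'child' to 'parent'. A child may belong to at most one parent.
    void add(Parent& parent, Child& child) {
        assert(parentOf_.count(&child) == 0);
        children_[&parent].push_back(&child);
        parentOf_[&child] = &parent;
    }

    // Returns the parent of 'child', or nullptr if it is not attached.
    Parent* parent(const Child& child) const {
        auto it = parentOf_.find(&child);
        return it == parentOf_.end() ? nullptr : it->second;
    }

    // Returns how many children are attached to 'parent'.
    std::size_t childCount(const Parent& parent) const {
        auto it = children_.find(&parent);
        return it == children_.end() ? 0 : it->second.size();
    }

private:
    std::map<const Parent*, std::vector<Child*>> children_;
    std::map<const Child*, Parent*> parentOf_;
};
```

A program would create one relation object, for example Aggregate<Department, Employee> staff, and call staff.add(dept, person). Because Department and Employee stay free of pointers to each other, each can be compiled and tested alone, and a class that participates in several relations does not grow a new member for each of them.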

Most large projects will need persistent data sooner or later, because they deal with complex problems that cannot be solved within one program run, and the data must be saved to disk before the next session. Sometimes the data exceed the available memory and must therefore, at least partially, reside on disk. Large projects that do not need persistent data are relatively rare. For example, a C compiler reads the input (program source), processes it, and produces the output (object file). When the compilation is finished, all the internal data are lost.

Adding persistency to existing software is extremely difficult (see Case Study 5). In many cases, it is wise to make data persistent even though, in the beginning of the project, it does not seem to be necessary. Also, there is an interesting connection between patterns (mechanisms) and persistency. Relations between objects are usually implemented through reference pointers, but the storage of pointers is the main problem when implementing persistency. Usually this important point is omitted, and the two problems are treated as completely independent. The result may be software which is unnecessarily complex and inefficient. Note that many class libraries take the latter approach, combining foundation classes (data patterns) with persistent data. Chapter 8, which represents a relatively large portion of this book, looks at different ways of implementing persistent data from a more global point of view.
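
The difficulty with stored pointers can be seen in a small sketch. A raw address is valid only within one program run, so one common workaround is to replace each pointer with the position of its target when saving, and to rebuild the pointers from those positions when loading. The code below illustrates that workaround only; it is not one of the methods discussed in Chapter 8, and the names Node, save, load, and roundTrip, as well as the text file layout, are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical object with one relation implemented as a raw pointer.
struct Node {
    int value = 0;
    Node* next = nullptr;  // an address: meaningless in a later session
};

using NodeList = std::vector<std::unique_ptr<Node>>;

// Save: write the node count, then "value index-of-next" per node,
// with -1 standing for a null pointer.
void save(const NodeList& nodes, std::ostream& out) {
    std::map<const Node*, long> indexOf;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        indexOf[nodes[i].get()] = static_cast<long>(i);
    }
    out << nodes.size() << '\n';
    for (const auto& node : nodes) {
        long nextIndex = -1;
        if (node->next != nullptr) {
            auto it = indexOf.find(node->next);
            assert(it != indexOf.end());  // target must be in the saved set
            nextIndex = it->second;
        }
        out << node->value << ' ' << nextIndex << '\n';
    }
}

// Load: allocate all nodes first, then turn indices back into pointers.
NodeList load(std::istream& in) {
    std::size_t count = 0;
    in >> count;
    NodeList nodes;
    std::vector<long> nextIndex(count, -1);
    for (std::size_t i = 0; i < count; ++i) {
        nodes.emplace_back(new Node);
        in >> nodes.back()->value >> nextIndex[i];
    }
    for (std::size_t i = 0; i < count; ++i) {
        if (nextIndex[i] >= 0) {
            std::size_t target = static_cast<std::size_t>(nextIndex[i]);
            assert(target < count);
            nodes[i]->next = nodes[target].get();
        }
    }
    return nodes;
}

// Stand-in for "end one session, start the next": an in-memory stream
// plays the role of the disk file.
NodeList roundTrip(const NodeList& nodes) {
    std::stringstream disk;
    save(nodes, disk);
    return load(disk);
}
```

Even in this tiny case, the save and load routines must know every pointer member of every class, which is why relations and persistency are hard to treat as independent problems.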

This book discusses neither object-oriented databases, in which data normally reside on disk and are retrieved only when required, nor distributed architectures, in which the disk-bound data again essentially form a database. The persistent data described in Chapter 8 are typically internal program data which normally reside in virtual memory, and are saved to disk at the end of one program session and retrieved at the start of the next.

Code Farms Inc | www.codefarms.com | info@codefarms.com