1.2 LARGE PROJECTS
This section explains the scope of this book, what a "large" project
is, and why we are interested in such projects.
We use the term large project interchangeably with serious project.
What determines whether a project is "serious" or not? Does it mean that
some C++ styles are not good for such projects? The answers to these two
questions determine the scope of this book.
Serious projects are intended for some commercial, scientific, or industrial
use. This means that the software must be solid and free of bugs, and
that it will have to be supported over a long period of time.
It is well known that most software cost is in maintenance, not in code
development. In that sense, serious software is almost synonymous with
maintainable software. Software is easy to maintain if it has clear internal
organization, if it consists of modules that can be tested independently,
and if it has logic which is easy to understand for a new programmer assigned
to the project.
Most textbook examples of C++ are not serious projects. Textbooks usually
do not show all of the details needed for production-type software, and
the examples they contain are not designed for long-term maintenance
and support.
Serious projects are usually fairly large in size because they reflect
real life, which is never simple. The size of the problem naturally adds
to the complexity of the software. The existence of many mutually interacting
classes may seriously impact the complexity and maintainability of software.
When considering large projects, how you divide the project between
individual programmers, how these individuals interact, and how the resulting
code is updated are often more important than the programming style. When
hundreds of programmers work simultaneously on the same project, the probability
of a software error due to miscommunication or due to uncoordinated updates
is greatly increased.
Note that this book is concerned with programming techniques and C++
style, but it does not discuss the required project organization and management.
The problem of how to manage large object-oriented projects is currently
being researched in many places, and all new CASE tools are trying to
address this issue.
Since this book is about large programs, it contains program listings
which are generally longer than customary in textbooks. Long program listings
can make a book boring and difficult to read, but without documenting
the new techniques with fully operational code, a text could fall into
the same trap as many C++ books: that of distorting the picture by pretending
that the problems are simple and the code is short, while this is not
the case.
When comparing sizes, we can divide software projects into three categories:
- Small projects are typically coded by a single person over a period
of up to several weeks. These projects are simple enough that, with
relatively little effort, one person can understand (and carry in his/her
head) the entire logic of the program.
- Large projects, usually coded by a small group (3 - 10 programmers),
contain many files. Some large projects may be developed by a single
person over a long period of time (for example some C++ compilers);
other projects may require the cooperation of more than 10 people. Large
projects typically contain 10,000 - 100,000 lines of code, and are beyond
the mental capacity of a single programmer. On the other hand, the communication
among subprojects and individual programmers is not so complex that
it would become a major design issue. A typical example of a large project
is a VLSI layout system.
- Large software systems are typically designed by an army of programmers,
and often involve millions of lines of code. At this level, the design
of individual classes and coding style are less critical. System architecture,
project management, version control, database design, and other systematic
issues are more important. For example, a telephone switching system
is a large software system, and may comprise 5 million lines of code.
The programming techniques discussed in this book will help you regardless
of the size of your project. If you violate the recommended principles on
a small project, your project will be more complex but, most likely, still
within a manageable range. For a large project, however, the new techniques
are very important; without them, a project may fail before being completed,
or may be found unmaintainable later. For large systems, project organization
may be more important than the software architecture of individual modules.
However, since large systems are composed of large projects, the new techniques
are essential.
It is important to stress that this book is about programming techniques
applicable to any C++ development, not only about libraries and, in particular,
not about the Code Farms library. This library is used in some examples
in order to demonstrate the viability of the new approach, but it could
easily be replaced by classes that you develop yourself. Chapter 4 explains
the new class organization, Chapter 5 shows its application to class libraries,
and Examples 6.1 and 6.2 provide the complete code for a more complex
application.
The key idea is to control dependencies between classes in such a way
that instead of having a big knot of mutually dependent objects, you have
layers of classes, where any class depends only on classes from lower
layers. This arrangement permits more independent testing, and makes software
easier for new programmers to understand.
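The layering idea can be illustrated with a minimal sketch. The class names Point, Polygon, and Layout are hypothetical and chosen only for illustration; they are not taken from this book or from any library, and the code assumes a C++11 or later compiler. Each class refers only to classes from the layer below it, so each layer can be compiled and tested before the next one exists.

```cpp
#include <cstddef>
#include <vector>

// Layer 0: depends on no other class in the project.
struct Point {
    double x;
    double y;
};

// Layer 1: depends only on Point (layer 0).
class Polygon {
public:
    void addVertex(const Point& p) { vertices_.push_back(p); }
    std::size_t vertexCount() const { return vertices_.size(); }

    // Shoelace formula; returns the absolute area of the polygon.
    double area() const {
        double twice = 0.0;
        const std::size_t n = vertices_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const Point& a = vertices_[i];
            const Point& b = vertices_[(i + 1) % n];
            twice += a.x * b.y - b.x * a.y;
        }
        return (twice < 0.0 ? -twice : twice) / 2.0;
    }

private:
    std::vector<Point> vertices_;
};

// Layer 2: depends only on Polygon (layer 1).
// Nothing in the lower layers refers back to Layout.
class Layout {
public:
    void addPolygon(const Polygon& p) { polygons_.push_back(p); }

    double totalArea() const {
        double sum = 0.0;
        for (const Polygon& p : polygons_) sum += p.area();
        return sum;
    }

private:
    std::vector<Polygon> polygons_;
};
```

In this sketch, Polygon can be tested with nothing but Point available, and Layout with nothing but Polygon. If Polygon also stored a pointer back to its Layout, the two classes would form a cycle and could no longer be tested separately.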
As shown below, this almost obvious idea has a major impact on the architecture
of practically any C++ project. Some techniques recommended in existing
C++ textbooks are not always appropriate from this point of view, and
need to be updated. The typical examples are polylithic data structures
such as aggregations, associations, graphs, entity relationship models,
and Booch's "mechanisms" in general. Even the implementation of such basic
data organizations as linked lists can be improved by using the new technique.
The new concept is equally important for those who design class libraries
as for those who use them. The new methodology has been successfully tested
on dozens of commercial applications. Some of these projects are described
as Case Studies (Chapter 9).
In large projects, various classes usually form groups (mechanisms,
design patterns) of closely cooperating classes, with many classes participating
in more than one pattern. How these patterns (mechanisms) are implemented
is critical. Currently used techniques can easily make all classes
mutually interdependent, rendering the software extremely complex and
difficult to maintain. The pattern classes introduced in Chapter 4 treat
patterns as objects, and lead to more structured class dependencies where
cycles are avoided.
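As a rough illustration of treating a relation as an object, consider the sketch below. It is only one possible reading of the idea and is not the implementation from Chapter 4; the names Department, Employee, and Employment are hypothetical. The relation lives in its own class, so neither participating class needs to know about the other.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// The participating classes know nothing about each other
// or about the relation that connects them.
struct Department { int id; };
struct Employee { int id; };

// The one-to-many relation is an object in its own right.
// It depends on Department and Employee; they do not depend on it.
class Employment {
public:
    // Assign an employee to a department, removing any earlier assignment.
    void assign(const Department& d, const Employee& e) {
        auto it = deptOf_.find(&e);
        if (it != deptOf_.end()) {
            std::vector<const Employee*>& old = staff_[it->second];
            old.erase(std::remove(old.begin(), old.end(), &e), old.end());
        }
        deptOf_[&e] = &d;
        staff_[&d].push_back(&e);
    }

    // Returns nullptr if the employee has not been assigned.
    const Department* departmentOf(const Employee& e) const {
        auto it = deptOf_.find(&e);
        return it == deptOf_.end() ? nullptr : it->second;
    }

    std::size_t staffCount(const Department& d) const {
        auto it = staff_.find(&d);
        return it == staff_.end() ? 0 : it->second.size();
    }

private:
    std::map<const Employee*, const Department*> deptOf_;
    std::map<const Department*, std::vector<const Employee*>> staff_;
};
```

The dependency graph here has no cycle: Employment sits one layer above Department and Employee. The lookup tables are a simplification made for brevity and cost more than pointers embedded in the objects would.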
Most large projects will need persistent data sooner or later, because
they deal with complex problems that cannot be solved within one program
run, and the data must be saved to disk before the next session. Sometimes
the data exceed the available memory and must therefore, at least partially,
reside on disk. Large projects that do not need persistent data are relatively
rare. One example is a C compiler, which reads its input (program source), processes
it, and produces its output (object file); when the compilation is finished,
all the internal data are lost.
Adding persistency to existing software is extremely difficult (see
Case Study 5). In many cases, it is wise to make data persistent even
if, at the beginning of the project, this does not seem necessary.
Also, there is an interesting connection between patterns (mechanisms)
and persistency. Relations between objects are usually implemented through
reference pointers, but the storage of pointers is the main problem when
implementing persistency. Usually this important point is omitted, and
the two problems are treated as completely independent. The result may
be software which is unnecessarily complex and inefficient. Note that
many class libraries take the latter approach, combining foundation classes
(data patterns) with persistent data. Chapter 8, which represents a relatively
large portion of this book, looks at different ways of implementing persistent
data from a more global point of view.
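The difficulty with storing pointers can be seen in a small sketch. A raw address written to disk is meaningless in the next program run, so a relation held in a pointer must be converted to something position-independent on the way out and rebuilt on the way in. The code below uses indices for this purpose; this is one common approach, shown under simplifying assumptions, and not necessarily one of the methods covered in Chapter 8. The names Node, save, and load are hypothetical, and every non-null next pointer is assumed to point into the same vector.

```cpp
#include <cstddef>
#include <sstream>  // stream types; also allows trying the code with std::stringstream
#include <vector>

struct Node {
    int value = 0;
    Node* next = nullptr;  // in-memory relation, valid only during this run
};

// Write each node as "value nextIndex"; -1 stands for a null pointer.
// Assumes every non-null 'next' points at an element of 'nodes'.
void save(const std::vector<Node>& nodes, std::ostream& out) {
    out << nodes.size() << '\n';
    for (const Node& n : nodes) {
        long next = n.next ? static_cast<long>(n.next - nodes.data()) : -1;
        out << n.value << ' ' << next << '\n';
    }
}

// Rebuild the nodes and convert each stored index back into a pointer
// that is valid for the newly allocated storage.
std::vector<Node> load(std::istream& in) {
    std::size_t count = 0;
    in >> count;
    std::vector<Node> nodes(count);
    for (Node& n : nodes) {
        long next = -1;
        in >> n.value >> next;
        n.next = next >= 0 ? &nodes[static_cast<std::size_t>(next)] : nullptr;
    }
    return nodes;
}
```

After a save to a std::stringstream followed by a load, the nodes occupy different addresses, yet each next pointer again refers to the same logical node as before. The conversion step is exactly where the organization of relations and the persistence mechanism meet.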
This book discusses neither object-oriented databases, in which data
normally reside on disk and are retrieved only when required, nor distributed
architectures, in which the disk-bound data again essentially form a database.
The persistent data described in Chapter 8 are typically internal program
data which normally reside in virtual memory, and are stored to and retrieved
from disk between different program sessions.
