Comparing DOL, PPF, and POST++

COMPARING FEATURES

(1) Mechanism of persistence:

  • DOL can switch between three styles of persistence:
    • In the ASCII mode, the code generator generates serialization functions which save the data in a format readable by different platforms.
    • In the Binary mode, the code generator generates serialization functions which save data in a format which is more efficient (less space, faster access) but is not portable between different platforms.
    • Memory blasting uses a special memory paging with a bit map marking the location of pointers. It is the most efficient of the three modes, but the data is not portable.
  • PPF stores objects of each class in a separate file and pages them in/out of memory on demand. The user can control the paging (page size, how many pages in memory) separately for each class. This method is suitable for both large data sets and for transactions consisting of only a few objects.
  • POST++ is based on memory mapping and shadow-page transactions. Details are not described in its documentation.

 

(2) Adding persistence to existing code:

  • POST++: A statement listing all references (members which are pointers) is required for each class.
  • PPF: All pointers in your code (even in the implementation methods) must be replaced by a smart pointer.
  • DOL: All data structures must be replaced by DOL data structures. STL cannot be used.

 

(3) Use of code generator:

  • DOL: yes
  • PPF: no
  • POST++: no

 

(4) Handling of strings:

  • DOL: String is a special data structure.
  • PPF: A special smart pointer is provided for persistent strings.
  • POST++: String is a special data structure.

 

(5) Data structures:

  • PPF: No data structures; PPF provides only persistence.
  • DOL: The prime purpose of DOL is persistent data structures, not persistence by itself. DOL intrusive data structures are built on a new paradigm which provides automatic integrity checking. DOL has classes similar to those of POST++, but it also has: rings, graphs, binary heap, hierarchy (one-to-many), many-to-many, reference, queues, dynamically assigned properties, access to the type info, chains of free objects by type.
  • POST++: The prime purpose of POST++ is persistence, but it also has a container class library. This library provides classes similar to those of DOL, but it also has: matrices, AVL-tree, R-tree, and T-tree.

 

(6) Default constructors:

  • PPF and DOL: Default constructors used as usual.
  • POST++: Default constructors are not allowed, and must be replaced by constructors with a dummy parameter or by a wrapper function such as create().

 

(7) Virtual functions:

  • DOL: Virtual functions can be used as usual.
  • PPF: When virtual functions are present, a different smart pointer must be used.
  • POST++: Some restrictions apply.

 

(8) Data portability (e.g. using data from a PC on a SUN):

  • DOL: The ASCII saving mode allows data to be moved between platforms.
  • PPF: Data not portable
  • POST++: Data not portable

 

(9) Schema migration (e.g. using data from disk after changing the data structure):

  • DOL: yes, in ASCII mode.
  • PPF: no
  • POST++: no

 

COMPARING PERFORMANCE

The same simple problem was coded with POST++, PPF, and DOL. Each program creates a doubly linked ring of LinkNodes. The number of these nodes is given as the program parameter.

    class LinkNode {
        ...
        LinkNode *next;
        LinkNode *prev;
        int id[SZ];
    };

Since each LinkNode is 12 bytes long, a run with 5,000,000 nodes needs 60MB of memory.

You can also download all these programs, including executables and the batch files which compile and run the programs. Each time, after creating the data, the program saves it to disk and stops. In the second run, it opens the data and traverses the entire list. The time to perform each step is recorded.

The test was performed on a Sony VAIO laptop with Pentium III, 900 MHz, 128MB RAM, and 15GB Toshiba hard drive. The computer was re-booted before running these tests.

The main problem when comparing performance on large data sets is the irregularity with which Windows manages system resources. Repeated calls to the same program run faster, but not always in exactly the same time. For this reason, we ran the test in this sequence:

  • We compiled all three programs.
  • Five times in a row, we ran the POST++ version, then the PPF version, and then the DOL version.
  • We rejected the first run of each program (the cold start) and averaged the individual times over runs 2 to 5.

-------------------------------------------------------------
             Each run for 5,000,000 LinkNodes
-------------------------------------------------------------
	create	closeDB	total		openDB	count	total
-------------------------------------------------------------
POST++	139	29	168		 0	21	21
PPF	 24	 2	 26		 0	 9	 9
DOL	  7	14	 21		13	 1	14
-------------------------------------------------------------
POST++	 21	43	 53		 0	11	11
PPF	 22	 1	 23		 0	 8	 8
DOL	  7	18	 25		12	 1	13
-------------------------------------------------------------
POST++	 15	57	 72		 0	10	10
PPF	 25	 0	 25		 0	 7	 7
DOL	  7	12	 19		15	 1	16
-------------------------------------------------------------
POST++	 24	41	 65		 1	 9	10
PPF	 19	 1	 20		 0	14	14
DOL	  7	30	 37		13	 1	14
-------------------------------------------------------------
POST++	 12	49	 61		 0	17	17
PPF	 25	 1	 26		 0	 8	 8
DOL	  7	14	 21		 8	 1	 9
-------------------------------------------------------------
Averages of runs 2 to 5:
-------------------------------------------------------------
POST++	 18.0	47.5	 62.5		 0.3	11.8	12.0
PPF	 22.8	 0.8	 23.5		 0.0	 9.3	 9.3
DOL	  7.0	18.5	 25.5		12.0	 1.0	13.0
-------------------------------------------------------------

The disk footprint (size of the DB stored to disk) was:
   POST++  122.2 MB
   PPF	   60.0 MB
   DOL	   61.9 MB
-------------------------------------------------------------

CONCLUSIONS FOR THIS PARTICULAR PROBLEM:

  • When creating and saving the data (run1), PPF and DOL run at about the same speed, roughly 2.5x faster than POST++.
  • When restoring the data from disk and making one traversal (run2), the speed of all three programs was about the same.
  • PPF and DOL are both about 2x more efficient than POST++ in how they store the data on disk.

Note that, in spite of the different internal algorithms, the speed of data access depends mostly on the disk I/O. DOL and PPF store raw images of objects, while POST++ stores additional information about each object. This, in our opinion, is the main reason for the difference in performance.

SUMMARY:

We chose to compare our products with POST++ because we believe that it is the best of our competition. Yet, as we have shown on a larger (60MB) benchmark, the POST++ footprint is 2x bigger and the time for saving/retrieving is about 2x longer than for DOL. POST++ cannot save data in the transactional mode as PPF does, and it is free, unsupported software from Russia.

POST++ does not use a code generator as DOL does, but it does not handle some cases of virtual functions, its default constructors are reserved, its data is not portable between platforms, and it cannot handle schema migration. To be fair, POST++ is easier to retrofit onto existing projects, but we will soon provide an optional interface in a similar style. The fact that POST++ supports STL may be an advantage for some, but even that is just a matter of time. Our new IN_CODE modelling is already making STL obsolete.