Analytic benchmarks

Author(s): Manuel Carro (adapted to Ciao Prolog).

This module provides a set of analytic benchmarks which try to isolate and measure the speed of certain very common operations in a Prolog system. The benchmarks come from a variety of sources, mostly within former ECRC (please look at the comments in the source code) and were posted by Jean-Claude Syre on 24th June 1986 to the Prolog Digest. Further packaging was done by Thoma Sjoland and SICS. They were adapted to Ciao Prolog by Manuel Carro, by:

  • Changing the syntax to Ciao Prolog, when needed, including extensive use of higher-order programming for the benchmarking loops.

  • Adapting the size of the benchmarks and the number of repetitions to fit modern computers and Prolog compilation.

  • Changing the format of the output and recording performance data in the incore database.

  • Changing benchmarking loops to be failure-driven.

  • Adding a void initial dry run to heat up the caches.

The comments corresponding to the original programs follow. They have been largely unchanged, except to reflect changes in the program interface necessary to perform the modularization and to adapt them to Ciao Prolog. Of course the number of repeated calls was changed. The original comments are still in the source files.

Testing Calls

This is the one you always dreamed to test! Like all benchmarks, it uses a loop calling the actual benchmark program. The benchmark program consists of a sequence of 200 predicates having no arguments, no choice points, NOTHING. 200 is chosen to have sufficient accuracy in measuring the execution time.

The results show the effect of pure calls, and the Klips performance can be called the peak performance of the prolog system. Note that the peak performance has very little significance to classify the overall performance of a Prolog system.

Testing non-deterministic behavior

This program contains a series of 3 different benchmark predicates.

The predicate choice_point/1 tests calls invoking the creation of a choice point, i.e. a branch point where the execution will possibly come back to in case of backtracking. It does NOT backtrack. Two versions are proposed, one with and the other without arguments.

We then present two predicates to evaluate the mechanism of backtracking during execution. Both predicates create one choice_point and then backtrack 20 times on every loop iteration step. baktrak1/1 exhibits a kind of backtracking called deep, while baktrak2/1 deals with shallow backtracking. Both are worth being tried, whatever your particular Prolog System is.

Testing environment handling

The creation and deletion of environments are an important feature in prolog machines. The following program attempts to evaluate that. A usual condition to have environments is that the clause is made of several goals. Thus there will be calls in the clause creating environments, and some work to set the parameters of each call. Three arguments per goal were chosen because this number comes close to the average number of arguments of a predicate and to the average number of permanent variables in an environment. The arguments were arranged in different orders for every goal, because we did not want to measure the merits of register transfer optimisations. Note that these transfers exist, so the results cannot be compared with those given by the program which creates choice points (generally slower).

Another version, envir0ar/1, with 0 argument in each call, can also be usefully tried

Testing indexing mechanisms

We give only one test for indexing, i.e. the selection of a clause due to the type of an argument. This program does not test the merits of indexing on an argument other than the first one. It does not test for multiple indexing either. It does not show the inefficiency which occurs if 2 choice points per clause are created. This may happen e.g. in Warren's indexing scheme.

Each of these tests would require an extra benchmark program. The program given below tests the main point in indexing. Right now we think it is not worth adding all this complexity to the benchmarks, in order to measure all the details in indexing. Therefore we give only this single test.

Testing unification

We have 6 programs to evaluate the process of unification in the Prolog system:

  • Test of list construction via unification.

  • Test of list matching unification.

  • Test of structure construction via unification This program is equivalent to construct_list, except that it uses the standard structure representation instead of the simplified list notation.

  • Test of structure matching via unification. This predicate matches a list of 100 elements in structure notation.

  • Test to match a nested structure. This predicate tests the (compiled) unification of a complex structure.

  • Test of general unification of 2 complex structures. This predicate tests general unification. We call it general unification, because it cannot be analysed at compile time. Therefore this kind of unification cannot be compiled and, even in a compiled system, it must be handled at run time, exactly as by an interpreter. This is done by a general procedure for unification. The name of the benchmark therefore does not reflect that the unification is general, i.e. including all Prolog types (e.g. it does not contain variables), but it reflects the use of the procedure for general unification as opposed to specific, compiled unification.

Manuel Carro: note that in this case the term "Logical Inference" is a bit contrived, since by design some of these (head) unifications are very more compled, naturally being slower and giving slow KLIPS results.

Testing dereferencing

Program to benchmark the dereferencing speed. It constructs a list containing 500 variables which are then bound together. Since different systems use different strategies for binding variables on the global stack, the whole is made for two lists and the long variable chain is created only in one of them.

Manuel Carro: different results in this benchmark are not likely to affect larger, general programs. It is a well-known fact that n programs tend not to generate long dereferencing chains. Empirical measurements show that dereference chains of length greater than three are extremely rare. So a suboptimal / optimal behavior in this test is not likely to affect greatly the overall speed of a system.

Testing the cut

It seems almost impossible to isolate the cut operator in a simple test program. However, the cut-testing program in this benchmark set contains a lot of cut at exec time. It may be regarded as a partial test of cut, and may be worthwhile for some software implementations of Prolog. cuttest/1 calls the cutit11 predicate, which performs 100 calls to a predicate cutt1 where a cut operator appears in the second clause. Having indexing makes the evaluation of the cut more accurate, so please indicate in our result whether or not your Prolog system uses indexing, to clarify the comparison with others.

Assorted small programs

Here we deal with prolog programs that do something, while being still small but representative of some well-known Prolog computations. This set should be augmented by other programs, some of them might come from your ideas.

Some of the following programs were taken from the Berkeley paper by Peter Van Roy "A Prolog Compiler for the PLM". Other programs were kindly donated by the following ECRC members: Helmut Simonis, Mehmet Dincbas, Micha Meier and Pascal Van Hentenryck.

The programs have been statically analysed and they represent fairly standard programs as far as the statistical averages are concerned. That is the arity of most clauses is 2 or 3 and there are usually 2 or 3 clauses per predicate. The programs range from fairly trivial programs like fibonacci series to problems such as Hamiltonian graph traversal.

Also, some more programs have been added since the last release and some corrections have been made. Most of the writes were removed in order to reduce i/o activity.

The programs added were symbolic differentiation (from Warren's paper) and a quick sort algorithm using difference lists. The last addition is a bit of a rogue: its a naive reverse, where one can enter the list length. The list gets constructed and then gets reversed.

We are grateful to Saumya Debray from Stony Brook and others for comments, suggestions, feedback and useful inputs.

These benchmarks were run on a VAX 785 with 8 Meg of memory, under 4.2 BSD Unix. The interpreter was C-Prolog version 1.5.

This entire file (without mail/net headers) contains 584 lines.

Name      |      Call by      |  # of Inferences  | KLips
          |                   |  (one iteration)  | (C-Prolog)
fib       | fibonacci(1).     |        4932       |   2.0
map       | map(200).         |          68       |   1.3
mham      | mham(1).          |      493824       |   1.7
mutest    | mutest(1).        |        1366       |   2.3
quicksort | qs(10).           |         601       |   1.9
queens    | qu(10).           |         684       |   1.7
query     | query(1).         |        2294       |   0.9
sym_diff  | differen(150).    |          71       |   1.5
diff_lists| diff(50).         |         608       |   2.1
nrev  10  | nrev.             |          66       |   2.0
nrev  30  | nrev.             |         496       |   2.5
nrev  50  | nrev.             |        1326       |   2.5
nrev 100  | nrev.             |        5151       |   2.5
nrev 150  | nrev.             |       11476       |   2.5
nrev 200  | nrev.             |       20301       |   2.5

Usage and interface

Documentation on exports



Main entry point. Execute all benchmarks and report on the performance obtained. This makes it easy to run the set of benchmarks as an executable. Its behavior regarding printing gathered data can be controlled with the list of flags passed as argument. Data is always asserted and available to other programs through the dump_benchmark_data/0 and access_benchmark_data/8 predicates.

  • The following properties should hold at call time:
    (term_typing:nonvar/1)Flags is currently a term which is not a free variable.
    (basic_props:list/2)Flags is a list of benchmark_usages.



Options which determine what this module should do with the execution results when called through the main/1 entry point (i.e., if compiled to an executable). It is defined as


with the following meaning:

  • '--no-human': do not dump human-readable data.

  • '--no-machine': do not dump data as a series of facts (which is a machine-readable format) which can be saved to a file and later read back in Prolog.

  • '--send-info': send a mail to the Ciao developers with the information gathered plus a terse description of the machine (O.S., architecture, CPU type and speed). The existence of a suitable user command to send mail is expected. No message is sent otherwise. No sensible (personal, etc.) information is gathered or sent.

  • --base-file-name file-name: use file-name as a base to generate file with the reports this module generates. The machine-oriented file will have the .pl extension and the human-oriented file will have the .txt extension.

The options aboce can be used when calling main/1 predicate or as command-line options for an executable built from this file. Note that the default options print available data both in human-readable and machine-readable formats.

  • Call and exit should be compatible with:
    (basic_props:atm/1)Flag is an atom.



Run the set of benchmarks in this program and save the speed information gathered. They can be later accessed using the predicates generate_machine_file/0 or generate_human_file/0.



    Print to standard output a human-readable report of the information gathered by running just_benchmarks/0.



      Print to standard output a machine-readable report of the information gathered by running just_benchmarks/0.



        Send a message to the Ciao developers with a report of the information gathered by running just_benchmarks/0.

          No further documentation available for this predicate.

          No further documentation available for this predicate.

          Known bugs and planned improvements

          • The actual logical inferences each benchmark does has to be checked.