Replacing Virtual Methods with Templates

Abstract

Virtual methods are a very powerful system to implement the basics of the object oriented abstraction. On the other hand, virtual methods have hidden costs. This document describes a way to replace virtual methods with templates to be able to maintain an high abstraction without paying the hidden cost of virtual methods.

Introduction

In this document we explore how it is possible to use templates to substitute the virtual methods call in C++.

virtual methods are a way used by object oriented languages to implement late binding. When a method is called on an object, the late binding system takes care of calling the right method inside a hierarchy of objects. For instance:

class base {
public:
  virtual void foo() { /* do something here */ }
};

class derived: public base {
  virtual void foo() { /* do something else here */ }
};

int main(int argc, char** argv) {
  base *x = new derived();
  x->foo(); // the foo method of derived class is called
  return 0;
}

Also if the object that we are using is a pointer to a base class object, the method is dispatched (at run time) in such a way that the method that is really called is the one belonging to the derived class.

In the rest of this document, for our examples and explanations, we will use the g++ compiler version 4.3.3 on a x86_64 architecture running ubuntu.

This document is recommended to readers who already know C++, templates and what a virtual table is. A little bit of x86_64 assembly knowledge is needed too.

Our Master Example

The example that we will use in this document is the following one. Suppose that we want to create a family of loggers to log the activities of a program and we want to be able to log on different systems: files, remote network logging servers and so on.

Inheritance and Virtual Methods

The standard theory of object oriented modeling and programming teaches us that the best way to do this job is to use a hierarchy of classes rooted in a base class, namely the logger class that implements the methods that the user will use to report warning, errors and simple informative messages.

To provide logging on different media, we demand the real writing of the log messages to a method that will be specialized in the sub classes, namely the write method. We keep this last method protected because we do not want the user to invoke it directly.

A "toy" code for this example is reported below:

#include <iostream>
#include <string>
#include <vector>

class logger {
public:
  void warn(const std::string& msg){ write("WARN", msg); }
  void error(const std::string& msg){ write("ERROR", msg); }
  void info(const std::string& msg){ write("INFO", msg); }

  virtual ~logger() { }

protected:
  virtual void write( const std::string& kind, const std::string& msg ) = 0;
};

class file_logger: public logger {
  /*...*/
public:
  file_logger( const std::string& file ) { /*...*/ }
  ~file_logger() { /*...*/ }

protected:
  virtual void write( const std::string& kind, const std::string& msg ) {
    /*...*/
  }
};

class network_logger: public logger {
  /*...*/
public:
  network_logger( const std::string& address, unsigned short port ) { /*...*/ }
  ~network_logger() { /*...*/ }

protected:
  virtual void write( const std::string& kind, const std::string& msg ) {
    /*...*/
  }
};

Here, we do not implement the various write methods for simplicity (along with constructors, etc.) and the warn, error and info methods are just invocations of the write instead of a nice formatting of the message with the timestamp and other data that could be useful in a logger.

The previous code can be easily used as following:

// as pointer to the base class
logger *l = new file_logger( "foo.log" );
l->warn("this is a warning message");

// as automatic variable on the stack
network_logger nl( "10.0.0.32", 8888 );
nl.warn("this is another warning message");

Where is the Cost?

The method is clean, nice and correct. The level of abstraction is the right one and we could be happy... But we are not (now, I have a satanic smile on my face).

Just to be curious, we want to see what the warn method is implemented by the C++ compile. To do so, we compile the file that contains the example code (inheritance.cpp) using the -S and the -O0 options.

The -S tells the compile to compile the code into the equivalent assembler and the -O0 option disable optimizations (we do not want the code to be inlined etc.). The resulting code for the warn method is the following one:

        /* inheritance.cpp 7 */
        /*   void warn(const std::string& msg){ write("WARN", msg); } */
        pushq %rbp
.LCFI92:
        movq %rsp, %rbp
.LCFI93:
        pushq %r12
.LCFI94:
        pushq %rbx
.LCFI95:
        subq $64, %rsp
.LCFI96:
        movq %rdi, -40(%rbp)
        movq %rsi, -48(%rbp)

        /* inheritance.cpp 7 */
        /*   void warn(const std::string& msg){ write("WARN", msg); } */
        movq -40(%rbp), %rax
        movq (%rax), %rax
        addq $16, %rax
        movq (%rax), %rax
        movq %rax, -64(%rbp)
        leaq -17(%rbp), %rdi
        call _ZNSaIcEC1Ev
        leaq -17(%rbp), %rdx
        leaq -32(%rbp), %rdi
        movl $.LC3, %esi
.LEHB0:
        call _ZNSsC1EPKcRKSaIcE
.LEHE0:
        movq -48(%rbp), %rdx
        leaq -32(%rbp), %rsi
        movq -40(%rbp), %rdi
.LEHB1:
        call *-64(%rbp)
.LEHE1:
        leaq -32(%rbp), %rax
        movq %rax, -56(%rbp)
        movq -56(%rbp), %rdi
.LEHB2:
        call _ZNSsD1Ev
.LEHE2:
        jmp .L88
.L86:
        movq %rax, -80(%rbp)
        movq %rdx, -72(%rbp)
.L83:
        movl -72(%rbp), %r12d
        movq -80(%rbp), %rbx
        leaq -32(%rbp), %rax
        movq %rax, -56(%rbp)
        movq -56(%rbp), %rdi
        call _ZNSsD1Ev
        movq %rbx, -80(%rbp)
        movslq %r12d,%r12
        movq %r12, -72(%rbp)
        jmp .L84
.L88:
        leaq -17(%rbp), %rdi
        call _ZNSaIcED1Ev
        addq $64, %rsp
        popq %rbx
        popq %r12
        leave
        ret
.L87:
        movq %rax, -80(%rbp)
        movq %rdx, -72(%rbp)
.L84:
        movl -72(%rbp), %r12d
        movq -80(%rbp), %rbx
        leaq -17(%rbp), %rdi
        call _ZNSaIcED1Ev
        movq %rbx, -80(%rbp)
        movslq %r12d,%r12
        movq -80(%rbp), %rdi
.LEHB3:
        call _Unwind_Resume
.LEHE3:
.LFE1198:
        .size _ZN6logger4warnERKSs, .-_ZN6logger4warnERKSs
.globl __gxx_personality_v0
        .section .gcc_except_table,"a",@progbits
.LLSDA1198:
        .byte 0xff
        .byte 0xff
        .byte 0x1
        .uleb128 .LLSDACSE1198-.LLSDACSB1198
.LLSDACSB1198:
        .uleb128 .LEHB0-.LFB1198
        .uleb128 .LEHE0-.LEHB0
        .uleb128 .L87-.LFB1198
        .uleb128 0x0
        .uleb128 .LEHB1-.LFB1198
        .uleb128 .LEHE1-.LEHB1
        .uleb128 .L86-.LFB1198
        .uleb128 0x0
        .uleb128 .LEHB2-.LFB1198
        .uleb128 .LEHE2-.LEHB2
        .uleb128 .L87-.LFB1198
        .uleb128 0x0
        .uleb128 .LEHB3-.LFB1198
        .uleb128 .LEHE3-.LEHB3
        .uleb128 0x0
        .uleb128 0x0

The first part (labels LCFI92 to LCFI96) deals with arguments passed to the function. When we arrive to the body of the function we meet this piece of code:

        movq -40(%rbp), %rax
        movq (%rax), %rax
        addq $16, %rax
        movq (%rax), %rax
        movq %rax, -64(%rbp)

This code reads the virtual table of the object and finds out the pointer (in memory) where the right write is placed. Then, after some packing of arguments to be passed to that function, the write is called with the following piece of code:

        call *-64(%rbp)

The last part of the code is the code used to clean stuff and to return from the method and the code to be executed in case of exception. We are not interested in it.

Can you guess where the problem is?

The problem is that every time that we call the warn method we have to load the virtual table, look inside it the location of the right write method and call it using an indirection. The indirection is needed because the address of the method is not in the code of the warn but it is in the virtual table.

Can we do better? Yes, we can!

Using Templates to Mimicry Virtual Methods

If we do not want to pay the cost of looking inside the virtual table and deference the pointer to the write method, we just need not to have a virtual table and know a priori where the write code is.

It looks hard, but it is not.

The easy way to do this is to use templates. Using templates we can move the logic of the write method not to derived classes but to specific base classes that the logger will implement. The base classes that provide the write method are the base classes of our logger and they are passed as template parameters of the logger. Moreover, the write method does not need to be virtual and we do not need a virtual table (we could need for other reasons, but this is another story...) and we know the exact location of the method to be called.

The code that implements the idea previously described is the following one:

#include <iostream>
#include <string>

template<typename writer>
class logger: public writer {
public:
  void warn(const std::string& msg){ this->write("WARN", msg); }
  void error(const std::string& msg){ this->write("ERROR", msg); }
  void info(const std::string& msg){ this->write("INFO", msg); }
};

class file_writer {
  /*...*/
public:
  void init( const std::string& file ) { /*...*/ }

protected:
  void write( const std::string& kind, const std::string& msg ) {
    /*...*/
  }
};

class network_writer {
  /*...*/
public:
  void init( const std::string& address, unsigned short port ) { /*...*/ }

protected:
  void write( const std::string& kind, const std::string& msg ) {
    /*...*/
  }
};

typedef logger<file_writer> file_logger;
typedef logger<network_writer> network_logger;

The logger is no more the base class but it is a derived class. And what about the base class? The possible base classes are the ones that provide the logic for the write method. We call that classes writers.

The writer classes implement both the write method and the init method. The last one is used to initialize the objects (as we did before with subclasses constructors) because we define them as base classes for the logger and this one cannot have the "constructors" for all the possible super classes.

We provide two typedefs to guarantee an easy usage and easy remember names.

The previous code can be easily used as following:

network_logger nl;
nl.init( "10.0.0.32", 8888 );
nl.warn("this is another warning message");

On the opposite of the virtual method example, we construct the object on the stack, then we initialize. As last step, we can call the method.

Where is the advantage?

Now we want to see if the new code is better than the old one (i.e., it does not need to deal virtual tables and it knows where the write function is).

As before, we will read the assembly code for the warn method:

        /* template.cpp 7 */
        /*   void warn(const std::string& msg){ this->write("WARN", msg); } */
        pushq %rbp
.LCFI15:
        movq %rsp, %rbp
.LCFI16:
        pushq %r12
.LCFI17:
        pushq %rbx
.LCFI18:
        subq $64, %rsp
.LCFI19:
        movq %rdi, -40(%rbp)
        movq %rsi, -48(%rbp)

        /* template.cpp 7 */
        /*   void warn(const std::string& msg){ this->write("WARN", msg); } */
        leaq -17(%rbp), %rdi
        call _ZNSaIcEC1Ev
        leaq -17(%rbp), %rdx
        leaq -32(%rbp), %rdi
        movl $.LC3, %esi
.LEHB0:
        call _ZNSsC1EPKcRKSaIcE
.LEHE0:
        movq -40(%rbp), %rdi
        movq -48(%rbp), %rdx
        leaq -32(%rbp), %rsi
.LEHB1:
        call _ZN11file_writer5writeERKSsS1_
.LEHE1:
        leaq -32(%rbp), %rax
        movq %rax, -56(%rbp)
        movq -56(%rbp), %rdi
.LEHB2:
        call _ZNSsD1Ev
.LEHE2:
        jmp .L21
.L19:
        movq %rax, -72(%rbp)
        movq %rdx, -64(%rbp)
.L16:
        movl -64(%rbp), %r12d
        movq -72(%rbp), %rbx
        leaq -32(%rbp), %rax
        movq %rax, -56(%rbp)
        movq -56(%rbp), %rdi
        call _ZNSsD1Ev
        movq %rbx, -72(%rbp)
        movslq %r12d,%r12
        movq %r12, -64(%rbp)
        jmp .L17
.L21:
        leaq -17(%rbp), %rdi
        call _ZNSaIcED1Ev
        addq $64, %rsp
        popq %rbx
        popq %r12
        leave
        ret
.L20:
        movq %rax, -72(%rbp)
        movq %rdx, -64(%rbp)
.L17:
        movl -64(%rbp), %r12d
        movq -72(%rbp), %rbx
        leaq -17(%rbp), %rdi
        call _ZNSaIcED1Ev
        movq %rbx, -72(%rbp)
        movslq %r12d,%r12
        movq -72(%rbp), %rdi
.LEHB3:
        call _Unwind_Resume
.LEHE3:
.LFE976:
        .size _ZN6loggerI11file_writerE4warnERKSs, .-_ZN6loggerI11file_writerE4warnERKSs
.globl __gxx_personality_v0
        .section .gcc_except_table,"a",@progbits
.LLSDA976:
        .byte 0xff
        .byte 0xff
        .byte 0x1
        .uleb128 .LLSDACSE976-.LLSDACSB976
.LLSDACSB976:
        .uleb128 .LEHB0-.LFB976
        .uleb128 .LEHE0-.LEHB0
        .uleb128 .L20-.LFB976
        .uleb128 0x0
        .uleb128 .LEHB1-.LFB976
        .uleb128 .LEHE1-.LEHB1
        .uleb128 .L19-.LFB976
        .uleb128 0x0
        .uleb128 .LEHB2-.LFB976
        .uleb128 .LEHE2-.LEHB2
        .uleb128 .L20-.LFB976
        .uleb128 0x0
        .uleb128 .LEHB3-.LFB976
        .uleb128 .LEHE3-.LEHB3
        .uleb128 0x0
        .uleb128 0x0

Two beautiful news!

The first one is that the code to deal with the virtual table is no more generated.

The second one is that the code to call the write method, now, points directly to the position of the code:

        call _ZN11file_writer5writeERKSsS1_

Mission accomplished!

There is another good point to consider that we can spot from this example. Since the code that calls the write method points directly to the position of the code, it is possible, at higher optimization levels, to directly inline the code of the write method inside the warn method. This is not possible, in general, with virtual method calls because the address of the write method is known only at run-time.

Where are the disadvantages?

The disadvantage of the proposed system is that we loose the polymorphic characteristic of the code using virtual methods. For instance with the virtual methods based system it was possible to write code like this:

void f( logger *l ) {
   /*...*/
   l->warn("something");
   /*...*/
}

...

logger *l = new file_logger( "foo.log" );
logger *m = new network_logger( "10.0.0.32", 8888 );

f(l);
f(m);

...

The previous code can be still written using even more templates in this way:

template< typename ALOGGER >
void f( ALOGGER& l ) {
   /*...*/
   l.warn("something");
   /*...*/
}

...

file_logger l( "foo.log" );
network_logger m( "10.0.0.32", 8888 );

f(l);
f(m);
...

Because the compiler creates the specialized version of the code for each type used as type of the parameter.

It is obviously not possible to write code like this:

...

std::vector v;
v.push_back( new file_logger( "foo.log" ) );
v.push_back( new network_logger( "10.0.0.32", 8888 ) );

for( std::vector::iterator i = v.begin(); i != v.end(); ++i ) {
    (*i)->warn("something")
}
...

Because of the lack of the base class common to the specialized loggers.

But for the scope of this document, that was "substitute virtual methods with templates" a solution that takes care of a lot of cases but not all possible cases is good enough... after all the two systems are complementary and should be used for different purposes too.

Comparing the Two Codes

Here, we have the two pieces of assembly code, side by side, to be easily compared to see the differences. Apart from the missing virtual table code and the direct call of the right method, we can spot some lines switched: after inspection, we noticed that the switched lines are indeed semantically equivalent and it is just a matter of who wrote the code generators (with and without templates).

        pushq %rbp
.LCFI92:
        movq %rsp, %rbp
.LCFI93:
        pushq %r12
.LCFI94:
        pushq %rbx
.LCFI95:
        subq $64, %rsp
.LCFI96:
        movq %rdi, -40(%rbp)
        movq %rsi, -48(%rbp)

        movq -40(%rbp), %rax
        movq (%rax), %rax
        addq $16, %rax
        movq (%rax), %rax
        movq %rax, -64(%rbp)
        leaq -17(%rbp), %rdi
        call _ZNSaIcEC1Ev
        leaq -17(%rbp), %rdx
        leaq -32(%rbp), %rdi
        movl $.LC3, %esi
.LEHB0:
        call _ZNSsC1EPKcRKSaIcE
.LEHE0:
        movq -48(%rbp), %rdx
        leaq -32(%rbp), %rsi
        movq -40(%rbp), %rdi
.LEHB1:
        call *-64(%rbp)
.LEHE1:
        leaq -32(%rbp), %rax
        movq %rax, -56(%rbp)
        movq -56(%rbp), %rdi
.LEHB2:
        call _ZNSsD1Ev
.LEHE2:
        jmp .L88
.L86:
        movq %rax, -80(%rbp)
        movq %rdx, -72(%rbp)
.L83:
        movl -72(%rbp), %r12d
        movq -80(%rbp), %rbx
        leaq -32(%rbp), %rax
        movq %rax, -56(%rbp)
        movq -56(%rbp), %rdi
        call _ZNSsD1Ev
        movq %rbx, -80(%rbp)
        movslq %r12d,%r12
        movq %r12, -72(%rbp)
        jmp .L84
.L88:
        leaq -17(%rbp), %rdi
        call _ZNSaIcED1Ev
        addq $64, %rsp
        popq %rbx
        popq %r12
        leave
        ret

        pushq %rbp
.LCFI15:
        movq %rsp, %rbp
.LCFI16:
        pushq %r12
.LCFI17:
        pushq %rbx
.LCFI18:
        subq $64, %rsp
.LCFI19:
        movq %rdi, -40(%rbp)
        movq %rsi, -48(%rbp)






        leaq -17(%rbp), %rdi
        call _ZNSaIcEC1Ev
        leaq -17(%rbp), %rdx
        leaq -32(%rbp), %rdi
        movl $.LC3, %esi
.LEHB0:
        call _ZNSsC1EPKcRKSaIcE
.LEHE0:
        movq -40(%rbp), %rdi
        movq -48(%rbp), %rdx
        leaq -32(%rbp), %rsi
.LEHB1:
        call _ZN11file_writer5writeERKSsS1_
.LEHE1:
        leaq -32(%rbp), %rax
        movq %rax, -56(%rbp)
        movq -56(%rbp), %rdi
.LEHB2:
        call _ZNSsD1Ev
.LEHE2:
        jmp .L21
.L19:
        movq %rax, -72(%rbp)
        movq %rdx, -64(%rbp)
.L16:
        movl -64(%rbp), %r12d
        movq -72(%rbp), %rbx
        leaq -32(%rbp), %rax
        movq %rax, -56(%rbp)
        movq -56(%rbp), %rdi
        call _ZNSsD1Ev
        movq %rbx, -72(%rbp)
        movslq %r12d,%r12
        movq %r12, -64(%rbp)
        jmp .L17
.L21:
        leaq -17(%rbp), %rdi
        call _ZNSaIcED1Ev
        addq $64, %rsp
        popq %rbx
        popq %r12
        leave
        ret

Conclusions

In this document we explored how the virtual methods are implemented and we discovered the hidden costs of such method. Then, we found a way to have a way to abstract concepts with templates and we used it to substitute virtual calls.

Edit (2014-06-06 23.27 GMT)

After many years — this piece was originally written in the October of 2009 — I got an email from a reader (Mohamed Sakr — 3dsakr [at] gmail [dot] com) who pointed out the similarities with the Curiously Reoccurring Template Pattern (CRTP, from now on) claiming, as well, to prefer that one to my approach. Fair enough.

While the practical effect of both the CRTP pattern and the system described here are exactly the same, as both eliminate the need for resolving the method call through the virtual table (saving an jump in memory), there are a couple of suble differences.

Let's start from the explanation of the differences. With the CRTP, the code of the logger above would have looked like the following one:

template<typename writer>
class logger {
public:
  void warn(const std::string& msg) {
       static_cast(this)->write("WARN", msg);
  }
  /*...error(), info()...*/
};

class file_logger: public logger<file_logger>{
    /*...*/
public:
  void write( const std::string& kind, const std::string& msg ) {
    /*...*/
  }
};

class network_logger: public logger<network_logger> {
  /*...*/
public:
  void write( const std::string& kind, const std::string& msg ) {
    /*...*/
  }
};

The very first difference is that both the file_logger and the network_logger now inherit from the logger base class, fact that, in all honesty, is elegant makes sense.

The second difference is that the logger needs to perform a down-cast of itself (i.e., casting the this pointer) to the derived class. While this is acceptable, it is not something that I percieve as elegant or, at least, not too elegant.

The third difference is that the write method in the derived classes needs to be declared as public and it cannot be hidden as in the solution proposed above. Why this? Essentially because a protected section of a class is visible only to the classes inheriting from that base class. With the CRTP, since the inheritance relation changes from logger is-a writer to file_logger is-a logger, the logger cannot see the methods in the file_logger class. By design.

The following one is the output of gcc (version 4.8.1) when asked to compile the code using the CRTP with the write methos marked as protected:

crtp.cpp: In instantiation of 'void logger<writer>::warn(const std::string&) [with writer = file_logger]':
crtp.cpp:29:24:   required from here
crtp.cpp:21:9: error: 'void file_logger::write(const std::string&, const std::string&)' is protected
    void write(const std::string& kind, const std::string& msg)
         ^
crtp.cpp:13:9: error: within this context
         static_cast(this)->write("WARN", msg);
         ^

While, according to my taste, the first difference is a positive one, the second one is not too elegant but it works, the third difference is something that I do not like because it breaks the general — and initial — idea that the write method should be an implementation detail and, for this reason, not visible and/or callable from the outside.

That being said, the CRTP is a nice way of adding some behaviour, both internal and external, to classes. For instance when the base class needs to perform operations only in the constructor and destructor methods (as in the case of a base class implementing a counter for the number of instances of the objects, of a certain type, around the system) or in the case of the base class implementing a generic Singleton pattern providing a get_instance method to all the derived classes.