Introduction to metaprogramming in C++

This book excerpt from “C++ Templates, Second Edition” explains how to get more functional, more efficient, and more easily maintained code by performing user-defined computation at translation time.

Metaprogramming consists of programming a program. In other words, you lay out code that the programming system executes to generate new code that implements the functionality you really want. Usually, the term metaprogramming implies a reflexive attribute: The metaprogramming component is part of the program for which it generates a bit of code (that is, an additional or different bit of the program).

Why would metaprogramming be desirable? As with most other programming techniques, the goal is to achieve more functionality with less effort, where effort can be measured as code size, maintenance cost, and so forth. What characterizes metaprogramming is that some user-defined computation happens at translation time. The underlying motivation is often performance (things computed at translation time can frequently be optimized away) or interface simplicity (a metaprogram is generally shorter than what it expands to) or both.

Metaprogramming often relies on the concepts of traits and type functions.

The state of modern C++ metaprogramming

C++ metaprogramming techniques evolved over time. Let’s categorize the various approaches to metaprogramming that are in common use in modern C++.

Value metaprogramming

In the first edition of this book, we were limited to the features introduced in the original C++ standard (published in 1998, with minor corrections in 2003). In that world, composing simple compile-time (“meta-”) computations was a minor challenge. We therefore devoted a good chunk of this chapter to doing just that; one fairly advanced example computed the square root of an integer value at compile time using recursive template instantiations. C++ 11 and, especially, C++ 14 removed most of that challenge with the introduction of constexpr functions. (The C++ 11 constexpr capabilities were sufficient to solve many common problems, but the programming model was not always pleasant: for example, no loop statements were available, so iterative computation had to exploit recursive function calls. C++ 14 enabled loop statements and various other constructs.)

For example, since C++ 14, a compile-time function to compute a square root is easily written as follows:

template<typename T>
constexpr T sqrt(T x)
{
   // handle cases where x and its square root are equal
   // as a special case to simplify the iteration criterion
   // for larger x:
   if (x <= 1) {
     return x;
   }

   // repeatedly determine in which half of a [lo, hi] interval
   // the square root of x is located, until the interval is
   // reduced to just one value:
   T lo = 0, hi = x;
   for (;;) {
     auto mid = (hi+lo)/2, midSquared = mid*mid;
     if (lo+1 >= hi || midSquared == x) {
       // mid must be the square root:
       return mid;
     }
     // continue with the higher/lower half-interval:
     if (midSquared < x) {
       lo = mid;
     }
     else {
       hi = mid;
     }
   }
}

This algorithm searches for the answer by repeatedly halving an interval known to contain the square root of x (the roots of 0 and 1 are treated as special cases to keep the convergence criterion simple). This sqrt() function can be evaluated at compile or run time:

static_assert(sqrt(25) == 5, "");   // OK (evaluated at compile time)
static_assert(sqrt(40) == 6, "");   // OK (evaluated at compile time)
std::array<int, sqrt(40)+1> arr;    // declares array of 7 elements (compile time)
long long l = 53478;
std::cout << sqrt(l) << '\n';       // prints 231 (evaluated at run time)

This function’s implementation may not be the most efficient at run time (where exploiting peculiarities of the machine often pays off), but because it is meant to perform compile-time computations, absolute efficiency is less important than portability. Note that no advanced “template magic” is in sight in that square root example, only the usual template argument deduction for a function template. The code is plain C++ and is not particularly challenging to read.

Value metaprogramming (that is, programming the computation of compile-time values) as we did above is occasionally quite useful, but there are two additional kinds of metaprogramming that can be performed with modern C++ (say, C++ 14 and C++ 17): type metaprogramming and hybrid metaprogramming.

Type metaprogramming

By relying on recursive template instantiation—a mainstay of template-based metaprogramming—we can perform type computations that are considerably more complex.

Consider the following small example:

// primary template: in general we yield the given type:
template<typename T>
struct RemoveAllExtentsT {
   using Type = T;
};

// partial specializations for array types (with and without bounds):
template<typename T, std::size_t SZ>
struct RemoveAllExtentsT<T[SZ]> {
   using Type = typename RemoveAllExtentsT<T>::Type;
};

template<typename T>
struct RemoveAllExtentsT<T[]> {
   using Type = typename RemoveAllExtentsT<T>::Type;
};

template<typename T>
using RemoveAllExtents = typename RemoveAllExtentsT<T>::Type;

Here, RemoveAllExtents is a type metafunction (that is, a computational device that produces a result type) that will remove an arbitrary number of top-level “array layers” from a type. (The C++ standard library provides a corresponding type trait std::remove_all_extents.)

You can use it as follows:

RemoveAllExtents<int[]>          // yields int
RemoveAllExtents<int[5][10]>     // yields int
RemoveAllExtents<int[][10]>      // yields int
RemoveAllExtents<int(*)[5]>      // yields int(*)[5] 

The metafunction performs its task by having the partial specialization that matches the top-level array case recursively “invoke” the metafunction itself.

Computing with values would be very limited if all that was available to you were scalar values. Fortunately, just about any programming language has at least one container of values construct that greatly magnifies the power of that language (and most languages have a variety of container kinds, such as arrays/vectors and hash tables). The same is true of type metaprogramming: Adding a “container of types” construct greatly increases the applicability of the discipline. Fortunately, modern C++ includes mechanisms that enable the development of such a container.

Hybrid metaprogramming

With value metaprogramming and type metaprogramming, you can compute values and types at compile time. Ultimately, however, we’re interested in runtime effects, so we use these metaprograms in run time code in places where types and constants are expected. Metaprogramming can do more than that, however: You can programmatically assemble at compile time bits of code with a runtime effect. We call that hybrid metaprogramming.

To illustrate this principle, let’s start with a simple example: computing the dot-product of two std::array values. Recall that std::array is a fixed-length container template declared as follows:

namespace std {
   template<typename T, size_t N> struct array;
}

where N is the number of elements (of type T) in the array. Given two objects of the same array type, their dot-product can be computed as follows:

template<typename T, std::size_t N>
auto dotProduct(std::array<T, N> const& x, std::array<T, N> const& y)
{
   T result{};
   for (std::size_t k = 0; k < N; ++k) {
     result += x[k]*y[k];
   }
   return result;
}
A straightforward compilation of the for loop will produce branching instructions that on some machines may cause some overhead compared to a straight-line execution of

result += x[0]*y[0]; 
result += x[1]*y[1]; 
result += x[2]*y[2]; 
result += x[3]*y[3]; 

Fortunately, modern compilers will optimize the loop into whichever form is most efficient for the target platform. For the sake of discussion, however, let’s rewrite the dotProduct() implementation in a way that avoids the loop:

template<typename T, std::size_t N>
struct DotProductT {
   static inline T result(T const* a, T const* b) {
     return *a * *b  +  DotProductT<T, N-1>::result(a+1, b+1);
   }
};

// partial specialization as end criterion
template<typename T>
struct DotProductT<T, 0> {
   static inline T result(T const*, T const*) {
     return T{};
   }
};

template<typename T, std::size_t N>
auto dotProduct(std::array<T, N> const& x,
                std::array<T, N> const& y)
{
   return DotProductT<T, N>::result(x.begin(), y.begin());
}

This is known as loop unrolling. We generally recommend against explicitly unrolling loops in portable code because the details that determine the best unrolling strategy depend highly on the target platform and the loop body; the compiler usually does a much better job of taking those considerations into account.

This new implementation delegates the work to a class template DotProductT. That lets you use recursive template instantiation with class template partial specialization to end the recursion. Note how each instantiation of DotProductT produces the sum of one term of the dot-product and the dot-product of the remaining components of the array. For values of type std::array<T,N> there will therefore be N instances of the primary template and one instance of the terminating partial specialization. For this to be efficient, it is critical that the compiler inlines every invocation of the static member functions result(). Fortunately, that is generally true when even a moderate level of compiler optimizations is enabled.

We specify the inline keyword explicitly here because some compilers (notably Clang) take this as a hint to try a little harder to inline calls. From a language point of view, these functions are implicitly inline because they are defined in the body of their enclosing class.

The central observation about this code is that it blends a compile-time computation (achieved here through recursive template instantiation) that determines the overall structure of the code with a runtime computation (calling result()) that determines the specific runtime effect.

We mentioned earlier that type metaprogramming is greatly enhanced by the availability of a “container of types.” We’ve already seen that in hybrid metaprogramming a fixed-length array type can be useful. Nonetheless, the true “hero container” of hybrid metaprogramming is the tuple. A tuple is a sequence of values, each with a selectable type. The C++ standard library includes a std::tuple class template that supports that notion. For example,

std::tuple<int, std::string, bool> tVal{42, "Answer", true};

defines a variable tVal that aggregates three values of types int, std::string, and bool (in that specific order). The type of tVal above is very similar to a simple struct type like:

struct MyTriple {
   int v1;
   std::string v2;
   bool v3;
};

Given that in std::array and std::tuple you have flexible counterparts to array types and (simple) struct types, it is natural to wonder whether a counterpart to simple union types would also be useful for hybrid computation. The answer is yes. The C++ standard library introduced the std::variant template for this purpose in C++ 17.

Because std::tuple and std::variant, like struct types, are heterogeneous types, hybrid metaprogramming that uses such types is sometimes called heterogeneous metaprogramming.

Hybrid metaprogramming for unit types

Another example demonstrating the power of hybrid computing is a library that can compute with values of different unit types. The value computation is performed at run time, but the computation of the resulting units is determined at compile time.

Let’s illustrate this with a highly simplified example. We are going to keep track of units in terms of their ratio (fraction) of a principal unit. For example, if the principal unit for time is a second, a millisecond is represented with a ratio of 1/1000 and a minute with a ratio of 60/1. The key, then, is to define a ratio type where each value has its own type:

template<unsigned N, unsigned D = 1>
struct Ratio {
   static constexpr unsigned num = N; // numerator
   static constexpr unsigned den = D; // denominator
   using Type = Ratio<num, den>;
};

Now we can define compile-time computations such as adding two units:
