% vim:spell spelllang=en_us textwidth=75

\documentclass{scrartcl}
\usepackage[utf8]{inputenc}
\usepackage{amsfonts}

\title{A generic expression template library for arithmetic data types}
\author{Tom Bachmann\footnote{\texttt{e\_mc\_h2@web.de}}}

\begin{document}
\maketitle

\section*{Introduction}

  This note describes the design I plan to implement this summer for a c++
wrapper for the FLINT library\footnote{\texttt{www.flintlib.org}}.

Any kinds of comments are appreciated.

\emph{Added after the summer}: this document turned out to be reasonably
accurate. I updated it slightly to reflect what was actually done.

\paragraph{Overview}
In first approximation, FLINT implements arithmetic operations on and
representations of elements of specific rings, \emph{with unlimited
precision}. The implemented rings include $\mathbb{Z}, \mathbb{Q},
\mathbb{Z}[X], \mathbb{Z}[X_{ij}], \mathbb{F}_q[X], \mathbb{Z}/n\mathbb{Z}$
and similar ones. FLINT is written in C, so arithmetic operations look like
\texttt{fmpz\_add(target, source1, source2)}. In crude terms, this note
describes how to write a C++ wrapper, allowing us to turn the above
expression into \texttt{target = source1 + source2}.


\section*{Objectives}

The wrapper library has to satisfy a number of competing objectives:

\begin{description}
\item[Performance] Whenever feasible, the C++ code should compile down into
    equivalent C code which is as close in performance to hand-written code as
    possible.
\item[Portability] The library should be usable on as many different
    compilers and compiler versions as possible.
\item[Easy extensibility] It should be straightforward for FLINT developers
    (which may only know C) to extend the wrapper library to add a new data
    type.
\item[Completeness] The wrapper library should expose all FLINT C
    functions.
\end{description}

There are also the following secondary objectives:

\begin{itemize}
\item If possible, the library should be sufficiently generic to be used by
    other open source projects seeking to create a C++ wrapper.
\item If possible, the wrapper should anticipate and/or facilitate
    generics.
\item The wrapper should allow for further layers of abstraction, e.g. to
    provide an NTL-compatible interface.
\end{itemize}


\section*{Design}

In order to meet the above goals, we have made the following decisions.
Performance will be achieved using expression templates, following the
C++98 standard. To improve diagnostics, we will use static assert
frequently (in C++98 mode we use standard implementations, in C++11 mode we
use the language internal one). There will be no automatic casts. The
wrapper library will be split clearly into FLINT-specific and generic
parts.

\subsection*{Overview}

Consider an expression like \texttt{fmpz a = b + c * d}. In an expression
template library, this consists of two stages. The first is expression
template \emph{parsing}, and the second is \emph{execution}.
The expression \texttt{b + c
* d} results in a temporary object, the type of which encodes the
operations performed, and the state of which consists of references to the
arguments \texttt{b}, \texttt{c} and \texttt{d}. The assignment then
triggers the execution. This means the library figures out how to most
efficiently evaluate the expression. In general, this may require the
allocation of a temporary \texttt{t}, and then executing \texttt{t = c *
d}, \texttt{a = b + t}. In the case of a newly initialized \texttt{a} as
here, we can avoid \texttt{t} and use \texttt{a} as a temporary.
Additionally, for some data types the C library may support addmul-type
operations, so we can compile down to a single C call.

\subsection*{Expression template representation and execution}

In order to facilitate reuse and extensibility, we decouple and modularize
the steps explained above. The main expression template class has
declaration as follows:

\begin{verbatim}
template<class Derived, class Operation, class Data>
class Expression
...
\end{verbatim}

The template argument \texttt{Operation} specifies which operation is to be
executed (for example ``plus'', ``times'' or even ``exp''), whereas
\texttt{Data} encodes the actual data required to perform the operation.
Part of the body of the class could look like this:

\begin{verbatim}
{
protected:
  Data data;

public:
  typedef Operation op_t;
  typedef Data data_t;

  explicit Expression(Data d) : data (d) {};

  template<class Right>
  typename Derived::template type<Plus, Pair<Expressien, Right> >
    operator+(const Right& r)
  {
    return Derived::template type<Plus, Pair<Expression, Right> >(
      Pair<Expression, Right>(*this, r);
  }
};
\end{verbatim}

There are a few peculiarities to note. Firstly \texttt{Plus} is just
an empty type, which simply serves as a tag. Secondly \texttt{Pair} is just
what it sounds like (essentially \texttt{std::pair}). But more importantly,
the template parameter \texttt{Derived} is used to automatically wrap
expression templates into a convenience derived class. It could look like
this:

\begin{verbatim}
template<class Policy>
struct derived
{
  template<class Operation, class Data>
  struct type
    : Expression<derived, Operation, Data>
  {
    template<class T>
    explicit type(const T& t) : Expression(t) {}

    void foo() {Policy();}
  };
};
\end{verbatim}

We still have not reached the type the user is actually going to
instantiate. For this, we can use a special type of operation that
signifies an immediate data. For example:

\begin{verbatim}
typedef derived<StandardPolicy>::type<Immediate, fmpz_t> Fmpz;
\end{verbatim}

All the indirection allows us to have instances of \texttt{Fmpz} come with a member
function \texttt{foo} which is customised by the ``StandardPolicy''. Note
that currently there are no non-trivial constructors or destructors for
Fmpz. These can be injected either via traits (see below), or via member
template enabling.

Next, we describe the execution stage. For this, we extend the
\texttt{Expression} class by an assignment operator, which uses a traits
library to run the execution. A simple version might look like this:

\begin{verbatim}
template<...> class Expression
{
...
  template<class Other>
  typename Derived::template type<Operation, Data>& operator=(const Other& o)
  {
    typedef traits::evaluate<Other> evaluator;
    traits::assign<Expression, typename evaluator::return_type>::doit(
      *this, evaluator::doit(o));
  }
};

namespace traits {
template<class T> struct evaluate;

template<class Operation, class Data>
struct evaluate_helper;

template<class Policy, class Operation, class Data>
struct evaluate<derived<Policy>::type<Operation, Data> >
{
  typedef evaluate_helper<Operation, Data> helper;
  typedef typename helper::return_type return_type;

  template<class T>
  return_type doit(const T& t){return helper::doit(t.data());}
};

template<>
struct evaluate_helper<Plus, Pair<Fmpz, Fmpz> >
{
  typedef Fmpz return_type;
  Fmpz doit(const Pair<Fmpz, Fmpz>& data)
  {
    ...
  }
};

template<class Operation, class T, class U>
struct evaluate_helper<Operation, Pair<T, U> >
{
  typedef evaluate<T> leval_t;
  typedef evaluate<U> reval_t;
  typedef evaluate_helper<Operation, Pair<
      typename leval_t::return_type, typename reval_t::return_type> > eh_t;
  typedef typename eh_t::return_type return_type;
  return_type doit(const Pair<T, U>& data)
  {
    leval_t::return_type t1 = leval_t::doit(data.left.data());
    reval_t::return_type t2 = reval_t::doit(data.right.data());
    return eh_t::doit(make_pair(t1, t2));
  }
};
}
\end{verbatim}

This code can evaluate arbitrary additions of Fmpz. It should be extended
to use three argument add, etc.

\subsection*{Temporary allocation}

The above design is functional, but results in unnecessarily many
temporaries. Instead, ...

\subsection*{Conflict resolution for traits}

A common problem with partial template specialisation is that instantiation
can fail completely if no partial order can be established. I propose two
ways around this: conditional template enabling, and a priority system. By
conditional template enabling I mean the equivalent of
\texttt{boost.enable\_if}. By a priority system, I mean adding an
additional integer template parameter \texttt{priority}, and having most
traits only enable themselves if the priority is a certain fixed number.
Then, using SFINAE techniques, we can iterate through the priorities and
use the highest-priority match.

This latter technique has the desirable property of being easy to
understand and use by non-experts.

\section*{Future plans}

The following sections contain deliberations about things which I did not
manage to work on this summer.

\subsection*{Class structure for (FLINT) generics}

Corresponding to any ring (or sometimes module),
there will be two classes: the \emph{context}
representing the ring itself, and the \emph{element} representing elements
of the ring. Contexts are immutable, but elements are usually not. Often
the context will be essentially empty (e.g. for $\mathbb{Z}$), but
sometimes it may hold data common to all elements, e.g. the modulus $n$ of
$\mathbb{Z}/n\mathbb{Z}.$ Every element instance holds a reference to a
context. In general arithmetic operations are only supported if the
contexts agree, but this is not usually checked.

We also differentiate between \emph{primitive
types}, \emph{compound types} and \emph{specialised types}. The primitive
types such as $\mathbb{Z}, \mathbb{Q}_p$
are the building blocks and are atomic from the point of view of this
wrapper. Compound types such as $A[T]$ (for any ring $A$) or $Frac(A)$ (for
domains $A$) are built from primitive and compound types. They have
implementations of arithmetic operations in terms of operations in $A,$ and
so are generic. Finally the specialised types are versions of compound
types with particular arithmetic implementations tailored to the particular
type, e.g. $\mathbb{Z}[T]$ (where multiplication can be implemented using
polynomial reconstruction and multiplication in $\mathbb{Z}$).

For the initial implementation, the generic types will not come with
arithmetic implementations. Instead I will focus on wrapping all the
particular implementations in FLINT.

Tables \ref{tab:primitive-types}, \ref{tab:compound-types} and
\ref{tab:specialised-types} list the primitive, compound, and specialised
types of the FLINT wrapper.

\begin{table}[h]
\begin{center}
\begin{tabular}{cc}
FLINT name & representing \\
\hline
fmpz & $\mathbb{Z}$ \\
padic & $\mathbb{Q}_p$ (to fixed accuracy) \\
ulong & $\mathbb{Z}/2^{s}\mathbb{Z}$ (where $s$ is the machine wordsize) \\
\end{tabular}
\end{center}
\caption{Primitive types for the FLINT wrapper.}
\label{tab:primitive-types}
\end{table}

\begin{table}[h]
\begin{center}
\begin{tabular}{cc}
wrapper name & representing \\
\hline
poly & $A[T]$ \\
fraction & $Frac(A)$ ($A$ a domain) \\
PIquotient & $A/aA$ \\
vector (not actually a ring) & $A^n$ (with $n$ large) \\
matrix (not always a ring) & $n \times m$ matrices over $A$ (with $m, n$ large) \\
\end{tabular}
\end{center}
\caption{Compound types for the FLINT wrapper.}
\label{tab:compound-types}
\end{table}

A potential later addition could be fixed-size vectors and matrices with
automatically unrolled operations.

\begin{table}[h]
\begin{center}
\begin{tabular}{cc}
FLINT name & specialising compound type \\
\hline
fmpq & \texttt{fraction<fmpz>} \\
fmpz\_poly\_q & \texttt{fraction<poly<fmpz>>} \\
\\
``mod'' & \texttt{PIquotient<fmpz>} \\
nmod & \texttt{PIquotient<ulong>} \\
\\
fmpz\_poly & \texttt{poly<fmpz>} \\
fmpq\_poly & \texttt{poly<fmpq>} \\
nmod\_poly & \texttt{poly<nmod>} \\
fmpz\_mod\_poly & \texttt{poly<PIquotient<fmpz> } \\
\\
fmpz\_vec & \texttt{vector<fmpz>} \\
nmod\_vec & \texttt{vector<PIquotient<ulong>>} \\
\\
fmpz\_mat & \texttt{matrix<fmpz>} \\
fmpq\_mat & \texttt{matrix<fraction<fmpz>>} \\
fmpz\_poly\_mat & \texttt{matrix<poly<fmpz>>} \\
nmod\_mat & \texttt{matrix<PIquotient<ulong>>} \\
nmod\_poly\_mat & \texttt{matrix<poly<PIquotient<ulong>>>}
\end{tabular}
\end{center}
\caption{Specialised types for the FLINT wrapper.}
\label{tab:specialised-types}
\end{table}

\subsection*{Wrapper class with additional member functions}

The above design already allows for members on every type. But suppose I
want to add a class \texttt{Tmpz} which behaves much
like \texttt{Fmpz}, except that it has different member functions. That is,
suppose I want to create a drop-in replacement for some other library,
leveraging the flint backend. This can be done as follows (in the
simplified notation without temporary avoidance):

\begin{verbatim}
struct derived2
{
  template<class Operation, class Data>
  struct type
    : Expression<derived2, Operation, Data>
  {
    template<class T>
    explicit type(const T& t) : Expression(t) {}

    void bar() {}
  }
};
typedef derived2::type<Immediate, Fmpz> Tmpz;

namespace traits {
template<class Data>
struct convert;

template<class Operation, class Data>
struct convert<derived2::template type<Operation, Data>
{
  typedef derived<StandardPolicy>::template type<Operation,
      typename convert<Data>::return_type> return_type;
  template<class T>
  return_type doit(const T& t)
  {
    return return_type(convert<Data>::doit(t.data()));
  }
};

template<>
struct convert<Tmpz>
{
  typedef Fmpz return_type;

  template<class T>
  return_type doit(const T& t)
  {
    return t.data();
  }
}

template<class Operation, class Data>
struct evaluate<derived2::type<Operation, Data> >
{
  typedef convert<derived2::type<Operation, Data> > eh2_t;
  typedef evaluate<eh2_t::return_type>::return_type ev_t;
  typedef ev_t::return_type return_type;
  template<class T>
  return_type doit(const T& t)
  {
    return ev_t::doit(eh2_t::doit(t));
  }
};
}
\end{verbatim}

That is to say, evaluation of Tmpz first converts the expression template
into the Fmpz equivalent (this does not incur any cost), then evaluates the
Fmpz, and then (using the implementation of assign not shown) wraps into
Tmpz again. The beauty of this approach is that the underlying type is
never leaked accidentally to the user, yet still all the optimizations for
FLINT expression templates apply.

\end{document}