In this post: yet another code-deduplication trick with trade-offs regarding readability and documentation generation. Make of it what you will.

Many useful operations can be derived in terms of indexing functions:

module type Indexable1 = sig
  type 'a t

  val get    : 'a t -> int -> 'a
  val length : _ t -> int
end

module Foldable_of_indexable1 (I : Indexable1) : sig
  open I

  val iter      :        ('a -> unit) -> 'a t -> unit
  val iteri     : (int -> 'a -> bool) -> 'a t -> unit
  val fold_left : ('acc -> 'a -> 'acc) -> 'acc -> 'a t -> 'acc
  val exists    : ('a -> bool) -> 'a t -> bool
  val for_all   : ('a -> bool) -> 'a t -> bool
  val is_empty  : _ t -> bool
  (* ... *)
end

For many types, including array, the get-based definitions are identical to their hand-optimised equivalents (modulo functor application). We can imagine avoiding a lot of standard-library boilerplate – and potential for API inconsistency – by using many such functors 1. We'd end up defining exactly one iter function that suffices for all Indexable types:

let iter f t =
  for i = 0 to length t - 1 do
    f (get t i)
  done

All good so far. Now we reflect on the fact that string values are indexable containers of char values. Indeed, our definition of iter is exactly equal to the one in Stdlib.String.iter. Unfortunately, our Indexable1 module type can only describe parametric containers:

module _ : Indexable1 with type 'a t := string = Stdlib.String
Error: Signature mismatch:
       ...
       Values do not match:
         val get : t -> int -> char
       is not included in
         val get : t -> int -> 'a
       File "string.mli", line 52, characters 0-57: Actual declaration

The problem is that it's impossible to tell the type system something like

'a t = string \quad implies \quad 'a = char

as part of our substitution. Roughly speaking, this means that any indexable type with unboxed elements can't benefit from our Foldable_of_iterable1 definitions, even if their own definitions would be identical! Strings, bytes, unboxed arrays and unboxed vectors all suffer from this problem.

A bad solution

As ever, we could solve the problem with copy-paste: a new Indexable0 module type for non-parametric containers, and a new functor Foldable_of_indexable0 with exactly the same implementations as our previous one.

(* Non-parametric indexable types *)
module type Indexable0 = sig
  type t
  type elt

  val get    : t -> int -> elt
  val length : t -> int
end

module Foldable_of_indexable0 (I : Indexable0) : sig
  open I

  (* All with the same implementation as before... *)
  val iter      :        (elt -> unit) -> t -> unit
  val iteri     : (int -> elt -> bool) -> t -> unit
  val fold_left : ('acc -> elt -> 'acc) -> 'acc -> t -> 'acc
  val exists    : (elt -> bool) -> t -> bool
  val for_all   : (elt -> bool) -> t -> bool
  val is_empty  : t -> bool
  (* ... *)
end

This definition suffers from the dual problem when we try to apply it to parameterised containers like 'a array:

module _ : Indexable0 with type t := 'a array = Stdlib.Array
Error: The type variable 'a is unbound in this type declaration.

This time, we wanted to be able to say something like

elt = 'a \quad implies \quad t = 'a array \quad (where 'a is universally quantified),

which is even less meaningful than our previous attempt since now we're expecting substitution to introduce new equalities and quantifications. We really do need both Indexable0 and Indexable1 with this approach.

A better (?) solution

Interestingly, it's possible to generalise Indexable0 and Indexable1 with another layer of indirection by making elt a type operator:

module type IndexableN = sig
  type 'a t
  type 'a elt

  val get    : 'a t -> int -> 'a elt
  val length : _ t -> int
end

elt carries the type equalities needed for the Indexable1 case, without forbidding the non-parametric implementation needed for the Indexable0 case. Arrays can set 'a elt = 'a, and strings can set 'a elt = char. Indeed, we can do this in the general case:

(** [Indexable0] is a special-case of [IndexableN] *)
module Indexable0_to_N = functor
  (T : Indexable0) ->
  (T : IndexableN with type 'a t := T.t and type 'a elt := elt)

(** [Indexable1] is a special-case of [IndexableN] *)
module Indexable1_to_N = functor
  (T : Indexable1) ->
  (T : IndexableN with type 'a t := 'a T.t and type 'a elt := 'a)

Now we can define a single Foldable_of_indexableN functor (with exactly the same implementations as before), and it will work for polymorphic and monomorphic containers. Neat!

What did it cost?

At time of writing, this trick has a cost in documentation complexity: Odoc doesn't inline the type substitutions introduced by our IndexableX_to_N functors, so the documentation for our Array and String modules will appear more general than necessary. The new Odoc model fixes this issue with better support for type substitutions, meaning that such trickery doesn't need to impact your users.

One unavoidable limitation is in what sort of operations we can put in the Foldable_of_indexable functor. Suppose our initial functor included a sum : int t -> int following function:

val sum : int t -> int

sum relies on the fact that it's possible to put int values in the container, which is clearly not possible for strings. Naturally, the type system prevents us from violating parametricity in this way:

module type Summable =
  type 'a t
  val sum : int t -> int
end

module _ (T : Summable) = (T : Summable with type 'a t := string)
Error: Values do not match:
         val sum : int t -> int
       is not included in
         val sum : string -> int

To state the obvious, we can't rely on parametricity in our implementations if we want them to work on non-parametric containers.

Post-script: Haskell suffers too

The typeclasses in Haskell's base have the same "polymorphic-instances-only" property as our Indexable1 signature (unsurprising, since Base doesn't provide any unboxed container types).

class Indexable1 f where        -- Polymorphic instances only
  get    :: f a -> Int -> a
  length :: f a -> Int

A similar trick can be performed there to generalise the typeclass instances for monomorphic containers like Text:

{-# LANGUAGE TypeFamilies #-}

type family   Elt container     -- Relate containers to their element type
type instance Elt [a]  = a
type instance Elt Text = Char

class IndexableN c where
  get    :: c -> Int -> Elt c
  length :: c -> Int

instance IndexableN [a] where   -- Polymorphic instance
  get    = (!!)
  length = Prelude.length

instance IndexableN Text where  -- Monomorphic instance
  get    = Text.index
  length = Text.length

As in the OCaml version, we use an Elt type operator to carry the equality needed for the monomorphic case. This time we used type families to specify the relations explicitly, but we could have used multi-parameter type classes for something more akin to the OCaml functor implementation. See the mono-traversable package for more of this sort of trickery in Haskell.


  1. This is the approach taken by Jane Street's base, and is very similar to the Haskell notion of building standard libraries from type-class instances.