More on Hakyll

Posted on September 7, 2013

On Personal Websites

I have had a website running, off and on, since the mid-90s. At first, it was a simple collection of HTML files with a few images. Web technologies developed rapidly, though, and it became time-consuming to do anything interesting with my sites.

I’ve used a number of “simplifying” tools over the years, but most of them relied on dynamically generating pages from a SQL database of some sort, and I just don’t want to deal with that right now. I want simple markup again, and I want it to be served directly and efficiently by any old plain web server.

My Basic Requirements

HTML is no longer really a markup language; it’s more of a structured document modeling language. I don’t want to compose text documents in code. I want to write them naturally in my native language. A few stylistic clues in my text formatting should suffice for specifying the structure of these simple text documents.

So, Markdown provides exactly the sort of markup language I want, but a web site requires more structure. Exactly the structure provided by HTML, in fact. So, Pandoc converts Markdown to HTML (among many other formats), which provides the core functionality I need.

But there’s more: a bunch of HTML files doesn’t quite make a cohesive site. The files need some common additional structure, such as headers and footers. They need a navigation structure to link them together. And it would be nice to be able to generate content syndication feeds as well.

What I need is a system that allows me to compose simple text files written in a text-based markup language together with some HTML templates into a static set of interlinked pure-HTML pages, along with some other resources, whenever I write a new document.

Enter Hakyll

The Hakyll system provides a flexible language for describing just that sort of system. It is essentially a static site compiler built from a set of rules for transforming source documents to output documents.

This code snippet shows the rule for compiling top-level pages of my site from Markdown to HTML:

    match "*.markdown" $ do
        route   $ setExtension "html"
        compile $ pandocCompiler
            >>= loadAndApplyTemplate "templates/default.html" defaultContext
            >>= relativizeUrls

The code is written in an embedded domain-specific language (EDSL): it is both a legal Haskell expression and a succinct description of what I want Hakyll to do for this particular case.

The first line describes what files this rule should apply to. It’s expressed in a format similar to UNIX filename glob format. This one says that it applies to all files at the top level of the site source directory that have names ending in ".markdown".
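For context, a rule like this one does not stand alone; it sits inside an ordinary Haskell program. The following is a minimal sketch of such a program, not my actual site configuration. It assumes the hakyll package is installed and that a template exists at templates/default.html; the extra rule for templates is there because loadAndApplyTemplate can only load templates that some rule has compiled.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch of a minimal site.hs; requires the hakyll package.
import Hakyll

main :: IO ()
main = hakyll $ do
    -- Templates must be compiled so that loadAndApplyTemplate can load them.
    match "templates/*" $ compile templateCompiler

    -- The rule from the post: top-level Markdown files become HTML pages.
    match "*.markdown" $ do
        route   $ setExtension "html"
        compile $ pandocCompiler
            >>= loadAndApplyTemplate "templates/default.html" defaultContext
            >>= relativizeUrls
```

The OverloadedStrings pragma is what lets plain string literals such as "*.markdown" stand in for Hakyll's pattern and identifier types.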

Digressions on Haskell

For those not familiar with Haskell, the $ character is a right-associative operator that denotes function application: the left operand is the function and the right operand is the argument. This is in addition to the normal way of expressing function application, which is a left-associative operation denoted by juxtaposing the operands.

Normal function application has a very high precedence in the Haskell grammar, so the $ form is often used in place of parentheses to allow a secondary function application to provide the argument to a primary function application.
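A small example may make the equivalence concrete. This sketch uses only the standard Prelude; the helper shout is a made-up name for illustration and has nothing to do with Hakyll.

```haskell
module Main where

-- A hypothetical helper, named only for this illustration.
shout :: String -> String
shout s = s ++ "!"

-- Parenthesised form: the inner application supplies the argument.
withParens :: Int
withParens = length (shout "html")

-- The same expression, with $ taking the place of the parentheses.
withDollar :: Int
withDollar = length $ shout "html"

-- $ is right-associative, so a chain nests to the right:
-- negate $ length $ x  means  negate (length x).
chained :: Int
chained = negate $ length $ shout "html"

main :: IO ()
main = do
  print withParens
  print withDollar
  print chained
```

Without the $ or the parentheses, length shout "html" would be read as length applied to two arguments, which does not type-check.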

With that digression out of the way, the second line can be seen as a nested function application: the route function is passed the result of applying setExtension to "html". As another digression, there are two interesting things to say about this nested application:

  1. The function application setExtension "html" evaluates to a value that is itself a function: one that takes a filename, possibly with some extension already, and produces a new filename with the extension "html" instead. So two higher-order functions are at work here: setExtension returns a function as its result, and route takes one as its argument.

  2. Haskell evaluates lazily, so the arguments to a function are not necessarily evaluated before the function is called. If the pattern on the first line never matched any files, the setExtension "html" expression would never need to be evaluated. If the pattern matched multiple files, however, the expression would be evaluated only once, yielding the function that sets filename extensions to "html".
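Both points in the list above can be sketched without Hakyll at all. In the following, setExt and applyRoute are hypothetical stand-ins that work on plain file paths; they only model the idea and are not how Hakyll implements setExtension or route. The example assumes the filepath package, which ships with GHC.

```haskell
module Main where

import System.FilePath (replaceExtension)

-- Hypothetical stand-in for setExtension, working on plain file paths.
setExt :: String -> FilePath -> FilePath
setExt ext path = replaceExtension path ext

-- Partial application: supplying only the extension yields a function.
toHtml :: FilePath -> FilePath
toHtml = setExt "html"

-- Hypothetical stand-in for a rule: applies the renaming function to
-- every matched file. With no matches, the function is never forced.
applyRoute :: (FilePath -> FilePath) -> [FilePath] -> [FilePath]
applyRoute f matches = map f matches

main :: IO ()
main = do
  print (applyRoute toHtml ["index.markdown", "about.markdown"])
  -- The function argument is undefined, yet this is safe: with no
  -- matches there is nothing to apply it to, so it is never evaluated.
  print (applyRoute undefined [])
```

In a strict language the second call would fail while evaluating its argument; here it simply produces an empty list.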

Regardless of the language mechanics behind the scene, the effect of the second line is to ensure that when the rule completes, the resulting file will have the "html" extension rather than the "markdown" extension it started with.

Back to the Example

The third line starts the expression that does the bulk of the work. It calls the compile function, and specifies the compiler as the pandocCompiler function with its output piped through a couple of post-processors, loadAndApplyTemplate and relativizeUrls. The >>= operator does the piping: it feeds the result of each step into the next.

The pandocCompiler is built into Hakyll, and it calls the Pandoc markup processor mentioned earlier. In default form, as it’s used here, it translates Markdown documents to HTML documents.

As the name implies, loadAndApplyTemplate loads the template we name and applies it using a Context, which is a data structure that describes the mappings between template variables and the values they should expand to. We use the default one, which provides template variables such as "title", "url", and "body" to the template based on the values from the item that’s being processed.
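To give a feel for what applying a template with a context means, here is a toy model in plain Haskell. It is only a sketch: ToyContext and expand are invented for this illustration, and Hakyll's real Context is richer than a lookup table. The $name$ placeholder syntax does follow the one Hakyll templates use.

```haskell
module Main where

import Data.Maybe (fromMaybe)

-- A toy context: template variable names paired with their values.
type ToyContext = [(String, String)]

-- Expand $name$ placeholders; unknown names are left untouched.
expand :: ToyContext -> String -> String
expand _ [] = []
expand ctx ('$' : rest) =
  case break (== '$') rest of
    (name, '$' : after) ->
      fromMaybe ('$' : name ++ "$") (lookup name ctx) ++ expand ctx after
    _ -> '$' : rest  -- no closing '$': emit the remainder as-is
expand ctx (c : rest) = c : expand ctx rest

main :: IO ()
main = do
  let ctx = [("title", "More on Hakyll"), ("body", "<p>Hello</p>")]
  putStrLn (expand ctx "<h1>$title$</h1>$body$")
```

In the real system, the "body" value is the HTML that pandocCompiler produced in the previous step, which is how the page content ends up wrapped in the site's common layout.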

Finally, relativizeUrls will find all the links in the document and change them from absolute URL form, e.g. "/foo.html", to relative URL form, e.g. "foo.html". This allows us to have absolute URLs for syndication feeds, but relative URLs for the site itself.
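The rewriting rule itself is simple enough to model in a few lines. The sketch below is a simplified stand-in, not Hakyll's code: the real relativizeUrls works on the URL attributes inside an HTML document and works out the path back to the site root from the page's route, whereas this hypothetical relativize takes that path as an explicit argument. It writes the result as "./foo.html", which a browser resolves the same way as "foo.html".

```haskell
module Main where

import Data.List (isPrefixOf)

-- Simplified model: prefix site-root URLs with the path back to the
-- root; leave everything else (external or already relative) alone.
relativize :: String -> String -> String
relativize root url
  | "/" `isPrefixOf` url && not ("//" `isPrefixOf` url) = root ++ url
  | otherwise = url

main :: IO ()
main = do
  putStrLn (relativize "." "/foo.html")            -- page at the site root
  putStrLn (relativize ".." "/css/site.css")       -- page one directory down
  putStrLn (relativize "." "http://example.com/")  -- external link, unchanged
```

Because the prefix depends on where the page lives, the same template can be shared by pages at any depth of the site.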

This example covered only one of eight rules I’m currently using, but hopefully it gives an idea of how simple and flexible the rule-based system is.

Hakyll’s Rule Processing

Like make, Hakyll’s rule processor keeps track of the relationships between source files and build products, and it only runs rules for which the input files are newer than the existing build products. If you just add a new bit of content, such as a blog entry, only a few rules may need to run again. On the other hand, changing a core template may require rebuilding most of the site!
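The make-style staleness test described here can be written as a tiny pure function. This is only a model of the idea, with timestamps reduced to integers and isStale a made-up name; Hakyll's own dependency tracking is more involved than this.

```haskell
module Main where

-- Timestamps modelled as plain integers for illustration.
type Timestamp = Int

-- A build product is stale if it does not exist yet, or if any of
-- its inputs is newer than it.
isStale :: Maybe Timestamp -> [Timestamp] -> Bool
isStale Nothing _ = True
isStale (Just built) inputs = any (> built) inputs

main :: IO ()
main = do
  print (isStale Nothing [1, 2])     -- never built
  print (isStale (Just 10) [5, 12])  -- one input changed after the build
  print (isStale (Just 10) [5, 9])   -- up to date
```

A template counts as an input to every page that uses it, which is why touching a core template marks most of the site as stale.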

Full rebuilds are one of the areas in which Hakyll really shines, though. Since its rules are a language embedded within Haskell, a Hakyll site builder is a compiled and optimized binary program whose single purpose is to rebuild your site as quickly as possible.

By default, it stores all its intermediate work on disk, but if you have the memory to work with, it can also keep all its intermediate work in memory, which makes it even faster. For my own site, which only has a few files so far, rebuilding the entire thing is nearly instant even on an old 32-bit PC, so I haven’t bothered with any optimization.