Trying to Statically Render Maths | sapient's tips, takes, and transhumanism

So, I’m creating this website of mine, with weird and curious content of all types.

I’ve got a relatively clean (if script-modified to be less hardcoded) version of the wonderful simple.css and a nice system of CSS-only filtering of listing page contents. No CDNs or what have you involved, either - as long as whatever URL I am hosting this on is still up, it should remain possible to download the whole site cleanly and look at it in your browser as if it was on your own filesystem without running a single line of JavaScript.

Clean and efficient, most definitely (though some of the Hugo templates I’ve constructed could not be described as such). This isn’t to brag, of course - it’s quite simple to do yourself if you use the right tooling for static site generation. The CSS-based filtering is slightly more complex, but still comparatively trivial.

Being the slightly weird person that I am, with interests in reasoning with infinities and combining seemingly-unrelated abstractions, I want to do more. It shouldn’t be much more, I think to myself. The task does not impose any requirements for dynamic modification, either server or client side. So far, I’ve avoided any dependency on complex, heavy frameworks - either for building my site (unless you count Hugo as heavy and complex; I have my issues with it, but that is not one of them) or for the users of my site to download and run when loading a page (I am philosophically against running code to generate an already statically-defined document).

The particular “more” that I am interested in? Adding mathematics to my pages. A task I know to be possible, as, after all, I have gone on many ADHD-fuelled dives across the depths of Wikipedia to learn and consume information about science and mathematics - and read information pages from many a university. None of these pages impose the strict requirement of no-clientside-javascript-for-static-documents that I do, but at some point in the process chunks of HTML containing mathematics are generated from LaTeX maths.

So, I do some research. The two - to my understanding - primary methods of rendering maths into HTML are KaTeX and MathJax. These are very impressive projects and all of the people who worked on them are wonderful for doing so. The problem? They’re both JavaScript. The way many sites use them is to include a script and then create a configuration JS object to tell them in which elements they should parse maths, usually alongside CSS and font files to enable correct display of the relevant HTML.

The latter two, of course, are fine for my own requirements. However, the client-side JavaScript? I consider that unacceptable for my task. Thankfully, both KaTeX and MathJax provide options for generating HTML from TeX mathematics. Unfortunately, they do so by using npm/Node.js. To my understanding, KaTeX is the only one of the two to provide a clean, simple command that converts maths into HTML from the command line, but it still has the massive issue of being extremely tough to properly integrate, specifically when accounting for the other problems.

What other problems, you may be asking? Well, dear reader, the other problems caused by conflicts between the syntax of Hugo’s Markdown parser and the syntax of LaTeX maths, of course! Anyone even passingly familiar with LaTeX knows that a simple _-character (that is, an underscore) is used to indicate subscripts. Things like x₅ or aᵩbᵨc₊. And anyone who writes in most common forms of Markdown knows that _ can be used to place emphasis or boldness onto a section of text (some variants use * and _ interchangeably and some use them to separate emphasis and boldness, but the point remains that _ is a semantically meaningful character in most Markdown).
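To make the conflict concrete, here is a deliberately naive Python sketch of an emphasis pass. It is an assumption-laden toy, not the algorithm Hugo's Markdown parser actually uses (real Markdown flanking rules are stricter), and `naive_emphasis` is a made-up name. It only shows the shape of the problem: a lone underscore survives, but two subscripts in one expression get paired up as emphasis.

```python
import re


def naive_emphasis(text: str) -> str:
    """Toy Markdown pass: wrap every _..._ pair in <em> tags.

    Assumption: this is NOT the real parser's algorithm, only a crude
    model of "underscores come in pairs and mean emphasis".
    """
    return re.sub(r"_(.+?)_", r"<em>\1</em>", text)


if __name__ == "__main__":
    # A single subscript has no partner underscore, so it is left alone.
    print(naive_emphasis(r"$x_5$"))
    # -> $x_5$

    # Two subscripts are treated as an emphasis pair and the LaTeX is mangled.
    print(naive_emphasis(r"$x^{a}_{b} + y^{c}_{d}$"))
    # -> $x^{a}<em>{b} + y^{c}</em>{d}$
```

Once the underscores have been replaced by tags, whatever renders the maths later no longer sees valid LaTeX.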

Now, what does this mean? In practice, the first thing necessary is getting the maths unscathed into the generated HTML. Preprocessing maths within the markdown is very hacky because of the need to keep copies of the original, non-processed content (among other things).

In a plain HTML document, with mathematics delimited by single or double $-signs (the delimiters usually used with KaTeX and MathJax scripts, since they are convenient and non-meaningful in HTML), this would be unnecessary, but in markdown? If there isn’t some way to get the LaTeX not to be treated as normal text, everything will be mangled.¹

A further problem emerges when you desire to use hugo serve for dynamic preview, though this one is fairly trivially solved with a conditional that inserts the KaTeX JavaScript and renders the maths in the standard, client-side-dynamic way.

Either way, one little $ sign and the associated maths functionality appear to make “clean and fully static sites” go cry in a corner, unless a lot more work (work that makes me sad) goes into making it actually doable. Now, the first option to note: this would be much easier if you could make Hugo transparently pass $...$ and $$...$$ delimited text into the HTML output (though this causes issues with the Hugo security model, I think).

The second option I came up with is one of the cleaner ones: use `- and ```-delimited blocks (which become <code>/<pre> blocks in the HTML) and detect the dollar signs. This does work, but brings the hassle of detecting which <code>/<pre> blocks are for mathematics and which aren’t. Furthermore, it interferes with the little LaTeX output preview I get with one of my vim plugins, which I much prefer to have than not.
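A rough Python sketch of what that detection step might look like, under a few assumptions: the generated HTML uses bare `<code>` tags with no attributes, the contents are HTML-escaped, and a span counts as maths only if it both starts and ends with a dollar sign. The function name `math_in_code_spans` is hypothetical.

```python
import html
import re

# Assumption: inline code is emitted as a bare <code>...</code> pair.
CODE_SPAN = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def math_in_code_spans(page: str) -> list[str]:
    """Return the LaTeX bodies of <code> spans that look like $...$ maths."""
    found = []
    for body in CODE_SPAN.findall(page):
        # Undo HTML escaping such as &lt; before inspecting the text.
        text = html.unescape(body).strip()
        if len(text) > 2 and text.startswith("$") and text.endswith("$"):
            found.append(text.strip("$"))
    return found


if __name__ == "__main__":
    page = (
        "<p>Run <code>ls -l</code>, check <code>$HOME</code>, "
        "then admire <code>$x_5 &lt; y$</code>.</p>"
    )
    print(math_in_code_spans(page))
    # -> ['x_5 < y']
```

The `$HOME` span is the kind of case that makes this a hassle: a shell snippet that happened to end in a dollar sign as well would be misread as maths.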

And third - the option I intend to use to get maths through to the HTML - is a Hugo Shortcode. In particular, a very short shortcode, something like {{ < m "$x = 5^x$" />}} for inline maths and {{ < dm "$x = 5 \times \int _{1} ^\aleph {z^x} dz$" >}} for display maths (spaces to prevent parsing). This preserves the previews and ensures another extremely useful property - while the other two methods require manually specifying in the frontmatter the presence of maths to include things like CSS and HTML, Hugo provides a clean method of checking if a shortcode is used within a page (the .HasShortcode page method).

Now I’ve sorted that first part out, the next real issue is calling KaTeX to process the mathematics into HTML - after the page has already been generated by Hugo. This part is where things are really messy, because KaTeX is an npm package, and this means that all the weight of node and npm (though if I can get away with avoiding npm, I will) is involved. It means a whole new packaging ecosystem - not well integrated with the system packager either - and wrangling that to work in some post-processing scripting mess using npm exec, or similar.
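As a hedged illustration of such a post-processing step (not necessarily how this site ended up doing it), the Python sketch below walks generated HTML, finds maths wrapped in a hypothetical `<span class="math-inline">` or `<span class="math-display">` element that the shortcode is assumed to emit, and swaps each one for rendered output. The default renderer shells out to the KaTeX command-line tool via `npx katex`, reading TeX on standard input; the `--display-mode` flag and the wrapper class names are assumptions to check against your own setup.

```python
import html
import re
import subprocess
from typing import Callable

# Hypothetical wrapper markup; a real shortcode may emit something different.
MATH_SPAN = re.compile(
    r'<span class="math-(inline|display)">(.*?)</span>', re.DOTALL
)


def katex_cli(tex: str, display: bool) -> str:
    """Render TeX with the KaTeX CLI (assumes `npx katex` is installed)."""
    cmd = ["npx", "katex"]
    if display:
        cmd.append("--display-mode")  # assumed flag; verify with --help
    result = subprocess.run(
        cmd, input=tex, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def render_math(page: str, render: Callable[[str, bool], str] = katex_cli) -> str:
    """Replace every maths wrapper in `page` with the renderer's output."""

    def replace(match: re.Match) -> str:
        mode, escaped = match.groups()
        # Undo HTML escaping, then drop the surrounding $ or $$ delimiters.
        tex = html.unescape(escaped).strip().strip("$")
        return render(tex, mode == "display")

    return MATH_SPAN.sub(replace, page)
```

Passing the renderer in as a parameter keeps the HTML-walking part testable without node installed, for example `render_math(page, render=lambda tex, display: f"[{tex}]")`.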

And then, and then, syncing the associated KaTeX CSS and font files statically added to the pages from whatever system is used to download them, with the KaTeX javascript-file version that npm decides to use. What a mess indeed.
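One way to keep the two in step, sketched in Python under the assumption that KaTeX was installed through npm and that its package directory contains `package.json`, `dist/katex.min.css` and `dist/fonts/`: copy the stylesheet and fonts out of that same package into the site's static directory, and record the version next to them. The function name, the `katex` target folder and the `VERSION` marker file are all hypothetical choices.

```python
import json
import shutil
from pathlib import Path


def sync_katex_assets(package_dir: Path, static_dir: Path) -> str:
    """Copy KaTeX CSS and fonts from the installed package into static_dir.

    Returns the package version, so the caller can log or compare it.
    Assumes the layout package_dir/{package.json, dist/katex.min.css, dist/fonts/}.
    """
    version = json.loads((package_dir / "package.json").read_text())["version"]
    dist = package_dir / "dist"
    target = static_dir / "katex"

    # Start from a clean folder so files from an older version never linger.
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)

    shutil.copy2(dist / "katex.min.css", target / "katex.min.css")
    shutil.copytree(dist / "fonts", target / "fonts")

    # Hypothetical marker file recording which version the assets came from.
    (target / "VERSION").write_text(version + "\n")
    return version
```

A call might look like `sync_katex_assets(Path("node_modules/katex"), Path("static"))`, run just before the maths is rendered, so that the CSS, the fonts and the renderer all come from one installed copy.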

And so, a demonstration: $\sum_{k=0}^{\color{red}n-1} x^k = \frac{ x ^ {\color{red}n} - 1}{x - 1}$ , $\forall x \in \reals: x \leq |x|$

Look ma, no JavaScript! $\sum_{k=0}^{n-1}{\color{green} x^k} = \frac{{\color{green} x}^{n} - 1}{x-1}$

How I did it!

¹ There is another option - a very cursed one, which is to say: don’t. It is based around reversing the relevant markdown transformations in the HTML to extract the locations of underscores and other semantically relevant characters, but this has a number of issues - notably its fragility and its breakage of markdown effect previews in the text editor. A similar preview-breaking issue can emerge from dollar signs as well, but is more easily counteracted if passthrough can occur.