Statically Rendering Maths | sapient's tips, takes, and transhumanism

So, in my last blog post I spent a lot of time explaining how maths is typically rendered on the internet and my issues with integrating it into this website (notably with my philosophy on when javascript should be present in static pages - that is, never), as well as being generally irritated. In it I explained roughly what KaTeX is and my ideas on how to integrate KaTeX into my site without running clientside JS while preserving all the nice features of Hugo (what my website is constructed with), some of my LaTeX vim preview plugins, and various other things.

Having successfully integrated KaTeX into my website build, then, I want to discuss the whole (pretty complex) experience, explaining how I did it. If you go to the git repository for my site and clone it, you can also run git diff f7c76a47 98d147c0 to directly see what code changes I made to actually make KaTeX work.

My Workflow and `build.sh`

Before we discuss how I integrated KaTeX into my website, we need to discuss exactly how I’m building that website in the first place. Of course, you can find the full thing on the git repository and examine code directly.

For those who don’t know, my site is built using hugo for a number of reasons (performance especially). But this is an oversimplification of how I build and test the site itself because when I build the site I don’t run hugo directly, but a wrapper bash script build.sh in the root directory of this project, which is where the real magic happens.

The first parts of build.sh are as follows:

#!/usr/bin/env bash
# http://redsymbol.net/articles/unofficial-bash-strict-mode/
# IFS works much better with bash arrays nya

# NOTE: IPNS simply does not work... so don't bother with it nya
set -euo pipefail
IFS=$'\n\t'

This is the so-called bash strict mode, which provides some nice extra error checks. I include it at the start of all my bash scripts, and it will be omitted from now on for clarity. Continuing…

source build-scripts/little-utils.sh
check_for_hugo_root

log "Ensuring KaTeX is appropriately prepared."
source build-scripts/npm-processing.sh
npm_install_components
npm_ensure_katex

This part of build.sh is where the first half of KaTeX configuration occurs, but for the purposes of understanding the build script, that is less important. What’s important is that a script called little-utils.sh is imported, which provides tools to log to stderr similar to how you can do in Rust, as well as a function that checks for the presence of a config.toml in the current directory (to ensure Hugo will work). The next part - after source build-scripts/npm-processing.sh:

Ensures that NPM exists on the system
Defines functions to install all components and to move the required CSS, JS, and font files into a static directory for accessing via the website and other build scripts. We will examine that imported file later when going over NPM stuff.

function postprocess_html {
    # -name before -type to avoid stat() call with -type nya.
    # Also see https://unix.stackexchange.com/questions/156008/is-it-possible-to-use-find-exec-sh-c-safely
    # for why we don't just inject the filename directly into the command nya.
    # the second sh is the command string ^.^
    local temp_parsed="$(mktemp --tmpdir 'tempfile.XXXXXXXXXXXXXXXXXXXX')"
    find ./public -name '*.html' -type f \
        -exec sh -c 'echo "Postprocessing " "$1" 1>&2; cat "$1" | ./build-scripts/maths-feedthrough.py 1>"$2"; cp "$2" "$1"' sh \{\} "$temp_parsed" \;
    rm "$temp_parsed"
}

The function definition in the next chunk of build.sh has a lot of long lines and is pretty horrific, but the function name should give an idea of what exactly this is for - HTML post-processing. This is where stuff that can’t be easily integrated directly into hugo gets done (for now, that’s only KaTeX, and it is where the conversion from LaTeX maths into HTML actually occurs).

Most of it is managing a temporary file (the --tmpdir argument just tells mktemp to create the file under $TMPDIR, or /tmp if that is unset) - as it turns out that redirecting a pipeline’s output back into the file it reads from - for example,

cat file1 | sed s/a/b/g | tee file2 > file1

will actually erase file1 entirely - the shell opens file1 for writing (truncating it) when it sets up the last redirect, which normally happens before cat has had a chance to read anything.

The beef of the processing, however, is a find command that runs through every HTML file in the Hugo output directory, and calls the following inline mini-sh-script with $1=<path to HTML file> and $2=<path to temp file>¹.

echo "Postprocessing " "$1" 1>&2
cat "$1" | ./build-scripts/maths-feedthrough.py 1>"$2"
cp "$2" "$1"

Effectively, it logs the file being processed then passes it through a filter - in this case, just build-scripts/maths-feedthrough.py - and rewrites it with the modified output. That python script - maths-feedthrough.py - is the actual command that finds maths in the HTML files and runs it through the KaTeX CLI.
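
For comparison, the same loop can be sketched in Python instead of find plus sh. This is only an illustration and not what build.sh does: postprocess_tree and its filter_fn parameter are names invented here, and filter_fn is any str-to-str function standing in for maths-feedthrough.py. The result is written to a temporary file beside the page and then swapped in, so the page is never left half-written.

```python
import os
import shutil
import tempfile
from pathlib import Path


def postprocess_tree(root, filter_fn):
    """Rewrite every *.html file under root with filter_fn(text) -> text.

    Hypothetical helper for illustration; filter_fn stands in for
    piping the file through build-scripts/maths-feedthrough.py.
    """
    processed = []
    for path in sorted(Path(root).rglob("*.html")):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        # Write to a temp file in the same directory first, so a crash
        # mid-write cannot leave a truncated page behind.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(filter_fn(original))
        shutil.copymode(path, tmp_name)  # keep the page's permission bits
        os.replace(tmp_name, path)       # swap the new file into place
        processed.append(path)
    return processed
```

Calling postprocess_tree("public", some_filter) would then play the role of postprocess_html, with some_filter being whatever turns raw page text into processed page text.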

The final part of build.sh and hence my workflow (before uploading to the various places I host the site) is the following chunk of bash script.

# Generate associated hugo html, check for username leakage, 
# (and also postprocess that html, before the checks)
function generate_hugo_html {
    # Construct the site in /public meow
    log 'Generating site in ./public with `hugo --minify`...'
    hugo --minify

    log "Entering HTML postprocessing step! nya!"
    postprocess_html

    # Attempt to find any accidental name leakage and quit if found nya
    if [[ $(grep -r "$USER" public | wc -l) -ne '0' ]]; then
        log "Found username in outputted files... INFOLEAK WARNING"
        log 'Printing `grep -r "$USER" public`'
        grep -r "$USER" public
        exit 2
    fi
}

generate_hugo_html

This bit, of course, runs Hugo itself and postprocesses its output. It also performs a neat little security check to terminate violently in case my IRL name is accidentally leaked in either an article or due to some postprocessing step injecting a full directory path or something like that.
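
The leak check is simple enough to sketch outside bash too. The following is a hypothetical Python version of the grep -r "$USER" public step, not code from the repository; find_leaks is an invented name, and it reports line numbers, which the bash version does not.

```python
from pathlib import Path


def find_leaks(root, needle):
    """Return (path, line_number) pairs for every line containing needle.

    Hypothetical stand-in for `grep -r "$USER" public`; files are read
    as bytes so binary assets such as fonts get searched as well.
    """
    needle_bytes = needle.encode("utf-8")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            for number, line in enumerate(handle, start=1):
                if needle_bytes in line:
                    hits.append((path, number))
    return hits
```

A build script would call this with the username from the environment and exit with a nonzero status if the returned list is not empty.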

That’s the workflow. Now to go into how I started to integrate KaTeX, via figuring out npm.

Discovering NPM and pain

The first task at hand when integrating KaTeX into the project (and into any project, of course), is being able to actually get the damn thing into a consistent directory with a known location. For a long while, much exploration was had attempting to delve the depths of ~/.npm/node_modules/_npx/[some random git hash here]/katex and figure out why the fuck it was installed there after doing something like npm install katex or npm exec katex (I still don’t know exactly how it got there).

Then, there was the journey through ~/node_modules, in which I had no damn clue why anything would install in one of these places or the other and in fact how the hell NPM decided to install things anywhere at all. And I promise you, I looked in the docs (and had a nice discussion with one of my sysadmin friends which was worth any amount of NPM-induced pain), though probably not quite as thoroughly as I could have.

The answer - at least when running npm install - appears to be that NPM will search up parent directories until it finds one with a node_modules or package.json in it, and it treats that as the project directory (it may also halt arbitrarily at ~, but I don’t actually know). When it finds it though, it’s generally nice enough to create all of node_modules, package.json, and package-lock.json² if they don’t already exist. I discovered this by just making a node_modules folder in the root of my git repo as a “what the fuck why not maybe it’ll work”, and, well, it did!
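
That upward search can be modelled in a few lines of Python. This is a rough sketch of the behaviour as observed, not npm’s actual algorithm: find_project_root is a name invented for this example, and whatever npm does when nothing is found is not modelled here.

```python
from pathlib import Path


def find_project_root(start):
    """Walk upwards from start; return the first directory containing
    package.json or node_modules, or None if none is found.

    A rough model of the behaviour described above, not npm's real code.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        has_manifest = (candidate / "package.json").is_file()
        has_modules = (candidate / "node_modules").is_dir()
        if has_manifest or has_modules:
            return candidate
    return None
```

Under this model, creating an empty node_modules folder in the repository root is enough to make that root the answer for every subdirectory, which matches what happened in practice.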

So, now that I had NPM actually working, it was time to automate the shit out of it. In particular, this is where I wrote build-scripts/npm-processing.sh:

#### Skipping Safe Mode Prelude
source build-scripts/little-utils.sh

# Allow for obtaining nonzero ret values without crashing nya.
set +e
which npm
NPMWHICH_RET=$?
set -e

if [[ ! $NPMWHICH_RET -eq 0 ]]; then
    log "No NPM found - this is a required dependency for katex preparsing"
    exit 1
fi

# Do an npm-install for the current "project" to get katex into a local `node_modules` nya
function npm_install_components {
    log "Running npm install to ensure the presence of katex - will create a node_modules folder if not present."
    npm install
}

# Ensure that appropriate katex files are copied over into assets/katex/ nya, or
# just present where needed.
function npm_ensure_katex {
    log "Ensuring existence of ./node_modules/.bin/katex"
    if [[ ! -e './node_modules/.bin/katex' ]]; then
        log "File not found."
        return 1
    fi
    log "This script uses the --reflink=auto option for space and time savings."

    log "KaTeX built and ready. Preparing to copy files over into assets/katex/ when necessary"
    mkdir -p "assets/katex"
    cp -p -v -u --reflink=auto "node_modules/katex/dist/katex.min.js" "assets/katex"
    cp -p -v -u --reflink=auto "node_modules/katex/dist/katex.min.css" "assets/katex"
    cp -p -v -R -u --reflink=auto "node_modules/katex/dist/contrib" "assets/katex"

    log "Preparing fonts to copy over to static-katex/fonts"
    mkdir -p "static-katex/fonts"
    cp -p -v -R -u --reflink=auto "node_modules/katex/dist/fonts" -T "static-katex/fonts"

    log "Done ensuring katex is set up."
}

As can be seen, this script attempts to run which npm and hence determine the presence of npm, before defining functions.

`npm_install_components`

The first defined function is essentially trivial. When I first ran npm install katex, NPM automatically listed KaTeX as a dependency in package.json. Therefore running npm install will pull katex from the internet (if it is missing or needs an update) and build it if necessary³. The building-it part is important because that is what gives us the finalised files - fonts, CSS, <script>-importable JS and the commandline interface.

`npm_ensure_katex` and an exploration of the `node_modules/katex/dist` folder

When NPM builds a package, it looks through a number of things in that package, primarily package.json, which in a proper node package (rather than the essentially-nothing that my site’s is) contains a lot of information on stuff like:

An entry point for a CLI
A list of files
Dependencies

The sort of thing you’d expect, to be honest, but the important part is it outputs combined CSS, various bits of JavaScript, and other assets (think fonts) into a folder called dist. This folder, in fact, is where npm_ensure_katex pulls all its files from, and it contains important things like katex.min.js, katex.min.css, and similar combined files for extra modules like the auto-parser script most websites use and a chemistry addon in contrib/.

When the build is completed, NPM also symbolically links the executable component of the package into node_modules/.bin/<package-name> - this is the file that the function checks for at the start.

Our function, npm_ensure_katex, copies the main KaTeX JavaScript and CSS into assets/katex, as well as contrib modules too. This is because we need to be able to access them with the hugo resource fingerprinting system.

Fonts are thrown into static-katex, which is merged with static on build because my config.toml (hugo config file) contains the line staticdir = ["static", "static-katex"], marking static-katex as a second static directory. The fonts cannot be namespaced into a katex subfolder, as the KaTeX CSS code accesses them by path, so instead we at least make a separate folder to stop it polluting manually-placed static content.

The unusual cp command is something to note for performance’s sake:

-p causes preservation of mode and time data (and some other things)
-u only performs the copying if the target files are older than the source files
--reflink=auto is something awesome I discovered which will essentially avoid a copy if your filesystem supports copy-on-write semantics - it creates a reflink. I use BTRFS, so I get this benefit.
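
To make the -p and -u semantics concrete, here is a minimal Python approximation. It is a sketch only: copy_if_newer is an invented helper, it handles single files rather than directories, and it makes no attempt at reflinks.

```python
import shutil
from pathlib import Path


def copy_if_newer(source, target_dir):
    """Copy source into target_dir unless an up-to-date copy is already there.

    Approximates `cp -p -u`: the copy is skipped when the target exists and
    is at least as new as the source. shutil.copy2 preserves timestamps
    and mode bits, much like -p. Reflinks are not attempted here.
    """
    source = Path(source)
    target = Path(target_dir) / source.name
    if target.exists() and target.stat().st_mtime >= source.stat().st_mtime:
        return False  # target is current, nothing to do
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return True
```

Because the timestamp is preserved on copy, a second run over unchanged files does no work at all, which is the whole point of combining the two flags.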

Once I had completed the NPM portions of the build process, I then moved on to the next stage - getting mathematics through to HTML without it being mangled by the markdown parser.

Hugo Shortcodes as a Mathematical Escape Hatch

As mentioned in the previous article, a major difficulty in including LaTeX-formatted mathematics in a hugo website is the conflict between LaTeX syntax (most starkly, _-based subscripting), and markdown syntax (where _ is usually used for some kind of emphasis). Getting around this is tricky if you want to preserve certain nice things.

The simplest solution in the end was to define two shortcodes taking a single positional string parameter containing the mathematics (either delimited by “$” or not, depending on preference). The shortcode m is for inline maths, and dm is for display maths. Implementing them, however, was more complex than it may seem.

Primarily, this is down to the fact that we still want to be able to use hugo serve to preview the site, despite maths rendering normally occurring after the work that Hugo does. However, there is a solution based on the way that most site authors use KaTeX (the “normal” way) - where the maths processing occurs clientside by running katex.js in the browser with the autoload script. All that is required is that this only be added to pages when running under hugo serve.

Hugo provides this functionality, with a simple {{ if .Site.IsServer }} to switch on whether or not the site is being built-and-run by hugo serve, which means that the solution is viable. The shortcodes themselves, at least, are still fairly simple - an implementation of “m” is shown below.

{{ $maths := (.Get 0) }}
{{ if (site.IsServer) }}
\beginMATHSmaths {{ $maths }} \endMATHSmaths
{{ else }}
<raw-maths>{{- $maths -}}</raw-maths> 
{{ end }}

In the case of running on a hugo serve, it provides some fairly unique strings to allow the KaTeX autorender to identify mathematics. Otherwise, it dumps them in invalid html tags.

The complexity comes in the template for <head>, because this is where the scripts are imported. First, we need to detect if mathematics has been used in a page - Hugo provides a fairly obscure function for detecting shortcode usage, and the conditional is as follows:

{{ if (or (.HasShortcode "m") (.HasShortcode "dm")) }}

In this case, we then want to load up the JS for KaTeX in the case that we’re building for a hugo serve, with something like the following little chunk of HTML and JS - important to note is the conditional insertion of <script> tags as well as where the resources are being loaded from⁴:

{{ $katex_style := resources.Get "katex/katex.min.css" | resources.Fingerprint "sha512" }}
{{ $katex_js := resources.Get "katex/katex.min.js" | resources.Fingerprint "sha512" }}
{{ $katex_autoload_js := resources.Get "katex/contrib/auto-render.min.js" | resources.Fingerprint "sha512" }}

<link type="text/css" rel="stylesheet" href="{{- $katex_style.RelPermalink -}}" integrity="{{- $katex_style.Data.Integrity -}}"/>
{{ if .Site.IsServer }}
<script defer src="{{- $katex_js.RelPermalink -}}" integrity="{{- $katex_js.Data.Integrity -}}"></script>
<script defer src="{{- $katex_autoload_js.RelPermalink -}}" integrity="{{- $katex_autoload_js.Data.Integrity -}}"></script>
<script>
    document.addEventListener("DOMContentLoaded", function() {
        renderMathInElement(document.body, {
            // customised options
            // • auto-render specific keys, e.g.:
            // FROM: https://katex.org/docs/autorender.html
            // We use an obscure sequence of characters so it is impossible to *accidentally*
            // trigger the autorender nya. We also allow for a $ on the inside, so that 
            // adding one in the shortcode string (for markdown preview purposes) works all fine.
            delimiters: [
                {left: '\\beginDMATHSmaths $', right: '$ \\endDMATHSmaths', display: true},
                {left: '\\beginDMATHSmaths', right: '\\endDMATHSmaths', display: true},
                {left: '\\beginMATHSmaths $', right: '$ \\endMATHSmaths', display: false},
                {left: '\\beginMATHSmaths', right: '\\endMATHSmaths', display: false}
            ]
        })
    })
</script>
{{ else }}
<PARSE-FOR-MATHS-RENDERING-PLEASE-AND-THANK-YOU-NYAA>
{{ end }}

In that script:

"DOMContentLoaded" only fires after the document has been parsed and the deferred scripts have run, so renderMathInElement is defined by then
renderMathInElement is a hook into the katex autorender script
The delimiters are a little funky because we also add dollar signs for the purposes of ease of editing. Note that it’s important that - of the delimiters with the same prefix - the longest comes first (otherwise the autorender script would match the shorter one and never check to see if more delimiter can be consumed).
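
The ordering point is easier to see with a toy scanner. The code below is not the auto-render source, just a first-match-wins model written for this example (first_match and the two list names are invented); it shows how putting the shorter delimiter first leaves the dollar signs inside the captured maths.

```python
def first_match(text, delimiters):
    """Return (left, content) for the first delimiter pair, in list order,
    whose left marker starts the text; None if nothing matches.

    A toy model of first-match-wins scanning, not the auto-render source.
    """
    for left, right in delimiters:
        if text.startswith(left):
            end = text.find(right, len(left))
            if end != -1:
                return left, text[len(left):end]
    return None


LONG_FIRST = [("\\beginMATHSmaths $", "$ \\endMATHSmaths"),
              ("\\beginMATHSmaths", "\\endMATHSmaths")]
SHORT_FIRST = list(reversed(LONG_FIRST))

sample = "\\beginMATHSmaths $x_1$ \\endMATHSmaths"

print(first_match(sample, LONG_FIRST))   # content is 'x_1'
print(first_match(sample, SHORT_FIRST))  # content is ' $x_1$ '
```

With the long delimiters first, the optional dollar signs are consumed as part of the delimiter; with the short ones first, they are handed to the renderer as if they were part of the equation.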

When not running in a server, then, a fairly random HTML tag is inserted into <head> instead - this allows skipping any files with no maths in them to use less processing power. It’s important to note that Hugo converts all HTML tags to lowercase, though, so case-insensitive detection is a must.
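
A case-insensitive check for that marker could look something like the following. This is a guess at the shape of the filter rather than the code in the repository: needs_maths_pass is an invented name, and only the marker text itself comes from the template above.

```python
MARKER = "parse-for-maths-rendering-please-and-thank-you-nyaa"


def needs_maths_pass(page_html):
    """True if the page carries the marker tag, whatever its letter case."""
    return "<" + MARKER + ">" in page_html.lower()
```

Pages for which this returns False can be written back out untouched, skipping the regex pass and every call to katex.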

Banishing the Javascript - `build-scripts/maths-feedthrough.py`

With this, then, we already have access to KaTeX when running hugo serve. We’ve essentially added it to the site the “normal” way such a thing is done - with the KaTeX autorender script. However, the holy grail of this task was to generate KaTeX fully statically, no clientside JS required.

The script used to do this is written in python (I avoid writing anything too complex in bash, and for shell-script stuff my preferred language is python 3) - it takes input on stdin and spits output to stdout (while logging to stderr).

It uses a fairly complex regex to locate mathematics in the page and capture its contents - the code for producing the regex is as follows. This uses some of Python’s special regex features (which make this problem much simpler).

# Note that these matches *include* the tag - we want to get rid of it in substitution
# after all nya. *? is non-greedy *
INLINE_REGEX_COMPONENT=r"<raw-maths>(?P<inlinemaths>.*?)</raw-maths>"
DISPLAY_REGEX_COMPONENT=r"<raw-display-maths>(?P<displaymaths>.*?)</raw-display-maths>"
# Tags: Multiline, Dot matches all (including \n), Unicode matching nya.
MATHS_TAG_REGEX=r"(?msu)" + r"(?:" + INLINE_REGEX_COMPONENT + r")|(?:" + DISPLAY_REGEX_COMPONENT + r")"

Most of the regex is explained in the comments, but important to note is the exceptionally useful non-greedy multi-character matching which will prevent the regex from consuming any terminating tags in the .* expression as it would if the * was a greedy operation - this is the part most specific for reading data from within pairs of tags, as it makes not consuming the closing tag fairly easy.⁵
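
A small experiment shows the difference between the greedy and non-greedy forms. The page string here is made up for the example; only the tag name matches the real script.

```python
import re

page = "<p><raw-maths>a</raw-maths> and <raw-maths>b</raw-maths></p>"

# Greedy: .* runs to the LAST closing tag, swallowing everything between.
greedy = re.findall(r"(?s)<raw-maths>(.*)</raw-maths>", page)

# Non-greedy: .*? stops at the FIRST closing tag it can.
lazy = re.findall(r"(?s)<raw-maths>(.*?)</raw-maths>", page)

print(greedy)  # ['a</raw-maths> and <raw-maths>b']
print(lazy)    # ['a', 'b']
```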

The second key part of the script is converting characters that were escaped to be HTML safe back into their originals (things like < being encoded as &lt;, amongst many other possibilities including arbitrary unicode codepoints), so that the maths is correctly interpreted. The step that does this also consumes any $ signs which are optional for the maths content.

Fortunately, python includes some HTML stuff in its standard library, making this code almost trivial with an import html:

import html


def maths_preprocessing(raw_tagpair_content):
    """
    Turn raw tag content into something that ./node_modules/.bin/katex can consume nya

    Notably:
    * Unescape HTML characters
    * Then remove trailing and preceding whitespace and then $ signs if present meow.
    """
    return html.unescape(raw_tagpair_content).strip().strip('$')
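
As a quick worked example, here is what that function does to a piece of escaped maths. The function body is repeated so the snippet runs on its own, and the input string is invented for the demonstration.

```python
import html


def maths_preprocessing(raw_tagpair_content):
    # Same one-liner as above, repeated so this snippet is self-contained.
    return html.unescape(raw_tagpair_content).strip().strip('$')


escaped = "  $a &lt; b \\land b &gt; c$ "
cleaned = maths_preprocessing(escaped)
print(cleaned)  # a < b \land b > c
```

One thing worth knowing is that strip('$') removes any number of leading and trailing dollar signs, so content wrapped in $$ comes out bare as well.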

The third key part, then, is of course calling into the KaTeX executable, which is also simple to do with slight variations depending on whether or not we’re making inline or display maths. The commands and arguments used are as follows:

KATEX_BIN_PATH="./node_modules/.bin/katex"
INLINE_KATEX_COMMAND=[KATEX_BIN_PATH, "-t"]
DISPLAY_KATEX_COMMAND=[KATEX_BIN_PATH, "-t", "-d"]

The first line is self-explanatory, and the other two lines pass arguments to katex - first is -t, which tells KaTeX not to crash when there are errors, instead producing HTML indicating the mistake. This is important as it means that a single maths error won’t kill the build system, amongst other things. The second, -d, tells katex to format the maths as display rather than inline (the default). katex does have more options, but they were unnecessary for the system I was building.

Actually calling into katex occurs in the following function, when passed the raw mathematics as input (a similar implementation exists for display maths):

# Subprocess module used here to instantiate the katex executable file
import subprocess, sys

def compile_inline_maths(raw_maths):
    """
    Run KaTeX with appropriate arguments to generate inline maths meow
    """
    log("Converting raw inline maths:\n{}".format(raw_maths))
    output_result = subprocess.run(
        INLINE_KATEX_COMMAND,
        check=True, 
        capture_output=True,
        input=raw_maths,
        text=True,
        encoding="utf-8"
    )
    log(output_result.stderr.strip())
    return output_result.stdout.strip()

This is fairly clear in purpose - the text and encoding stuff is required however, especially as KaTeX outputs UTF-8 for the purposes of using unicode. Without text=True and encoding="utf-8", it would be necessary to manually convert strings to and from raw byte sequences with .encode and .decode to provide input and obtain output. It also crashes if the subprocess spits out an errorcode (which is fine, as we provided -t earlier, so it will only happen with some seriously nasty weird error things).
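
The difference between the two styles can be tried without KaTeX installed. In this sketch a tiny Python child process that echoes stdin stands in for the katex binary; ECHO_COMMAND, run_text and run_bytes are all names invented for the example.

```python
import subprocess
import sys

# Stand-in child process: echoes stdin straight back to stdout. Swap in
# the katex command from above to try this against the real CLI.
ECHO_COMMAND = [sys.executable, "-c",
                "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def run_text(source):
    """Pass and receive str; subprocess handles the UTF-8 conversion."""
    done = subprocess.run(ECHO_COMMAND, check=True, capture_output=True,
                          input=source, text=True, encoding="utf-8")
    return done.stdout


def run_bytes(source):
    """The same call without text/encoding: convert by hand on both sides."""
    done = subprocess.run(ECHO_COMMAND, check=True, capture_output=True,
                          input=source.encode("utf-8"))
    return done.stdout.decode("utf-8")
```

Both functions return the same string for the same input; the first just keeps the encoding details out of the calling code.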

So, we have a regex that can extract html-escaped LaTeX into a regex match object with named groups, we have a function that unescapes LaTeX, and we have another function that can take unescaped LaTeX and turn it into properly formatted HTML via calling out to katex.

The last thing we need is to turn a regex match object into rendered mathematics and perform a global match and substitute. Now that we’ve set up everything else, this is as simple as implementing two functions.

def generate_single_substitution(special_regex_match):
    """
    Function for turning regex match objects into maths meow.
    ?P<inlinemaths> should contain the still-htmlized-and-unstripped inline mathematics to 
    parse, if present.
    ?P<displaymaths> should contain the still-htmlized-and-unstripped display mathematics to
    parse, if present nya.
    """
    if special_regex_match.group("inlinemaths") is not None:
        clean_maths = maths_preprocessing(special_regex_match.group("inlinemaths"))
        return compile_inline_maths(clean_maths)
    if special_regex_match.group("displaymaths") is not None:
        clean_maths = maths_preprocessing(special_regex_match.group("displaymaths"))
        return compile_display_maths(clean_maths)
    log("Match without either inlinemaths or displaymaths group found??")
    return ""

def parse_and_write_maths(maths_section):
    """
    Parse all the maths in a section of a HTML document.
    """
    display_katex_version()
    log("Looking for maths using regex {}".format(MATHS_TAG_REGEX))
    maths_regex_pattern = re.compile(MATHS_TAG_REGEX)
    log("Inline maths KaTeX command: {}".format(" ".join(INLINE_KATEX_COMMAND)))
    log("Display maths KaTeX command: {}".format(" ".join(DISPLAY_KATEX_COMMAND)))
    log()
    return maths_regex_pattern.sub(generate_single_substitution, maths_section)

The first is simple - it just takes the value of the relevant named groups and processes it. The second function does something particularly interesting in that it uses a mildly obscure feature of re.sub or rather the equivalent method on a pattern object - in which, rather than passing a string as the substitution pattern, you pass a function taking a regex match object.

It is this that actually substitutes in the mathematics produced by katex, and is the final step in enabling fully static $\KaTeX$ processing. (I omitted filtering for the maths HTML tag string, but that is comparatively trivial).
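
Here is the function-as-replacement feature in isolation. The renderer is a fake that wraps the maths in a span (fake_render and the sample page are invented for this example), so it runs without katex present.

```python
import re

TAG_PATTERN = re.compile(r"(?s)<raw-maths>(?P<inlinemaths>.*?)</raw-maths>")


def fake_render(match):
    """Stand-in for the KaTeX call: wrap the captured maths in a span."""
    return '<span class="maths">' + match.group("inlinemaths").strip() + "</span>"


page = "<p>Let <raw-maths> x </raw-maths> be <raw-maths>y^2</raw-maths>.</p>"
rendered = TAG_PATTERN.sub(fake_render, page)
print(rendered)
# <p>Let <span class="maths">x</span> be <span class="maths">y^2</span>.</p>
```

A handy side effect of passing a function is that its return value is inserted literally, so backslashes in the generated HTML are not reinterpreted as regex replacement escapes.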

Satisfying Results and Where I’m Going From Here

All in all, this project to render maths with no clientside JS was a fairly solid success even if it took a bit of struggle. I’m extremely happy with it, and I hope other people use what I have written in this article to do this for themselves - feel free to take the python script and use it in your own projects (it’s technically under CC0, at least the parts I wrote in this article).

Just as a little demonstration, here’s the time dependent Schrödinger Equation - $i\hbar{d \over dt}\ket{\Phi(t)} = \widehat{H}\ket{\Phi(t)}$ - and the nonrelativistic spinless Schrödinger equation of a particle in 3D space: $\def\wf{{\color{orange}\Phi(\vec{r}, t)}}i\hbar{\partial \over \partial t} \wf = \left[-{\hbar^2 \over 2m}\nabla^2 + V(\vec{r}, t)\right]\wf$

So, it obviously works well (and I’m very happy for it). However, there are still some flaws I want resolved and things I plan to do soon now I have a framework ready for easy modification to apply the improvements.

Shortcodes are still too long

The major immediate issue is that writing a shortcode (even with how tiny it really is) just to display maths is a massive annoyance and makes including small amounts of maths like $x = 4$ into the document exceptionally annoying, especially considering that a lot of the time the LaTeX does not conflict with markdown syntax (the original reason for adding a shortcode in the first place). As a result, I intend to modify the maths parsing and hugo templates to allow entering maths in the form of $...$⁣; (for inline) and $$...$$⁣; (for display) - note that these two expressions have an invisible unicode character in them to avoid triggering any future scanners.⁶ ⁷

Thankfully, the sequences of characters $; and $$; are very rare, even in code, which is important because one of the major issues with doing this is that trying to write a regex over HTML that avoids matching inside <code> or <pre> tags is a really bad idea, for the same reason that trying to parse HTML with regex for anything more than simple find/replace is a really bad idea. Unfortunately, a delimiter also is nonviable, because it would not work with the autorender script in previews.

Doing this requires a number of changes to the code I’m writing:

head.html needs to have another condition for adding maths to a document - namely using the hugo string functions to count the occurrences of $; in a page.
maths-feedthrough.py needs to have another regex matching the pair of single- and double-$ and extracting maths from inside. Trying to integrate it with the existing regex is extremely difficult because python does not allow duplicate named regex groups, so instead adding a second pass is essential. Furthermore, this new regex must avoid matching across HTML tags so that, for instance, a dollar sign in one code block will not match up with a dollar and a semicolon in the next. This should also occur before unescaping, so that a valid &lt; doesn’t break the maths.
head.html also needs to add $ and $; and $$ and $$; into the $\KaTeX$ autoload script call as well.

As far as changes go, I’m basically going to start on this the moment I’ve finished with this post.
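
One possible shape for that second pass is sketched below. These are hypothetical patterns, not the ones that ended up in maths-feedthrough.py: the names are invented, the pass simply rewrites dollar-delimited maths into the existing raw-maths tags, nested dollar signs inside the maths are not supported, and a complete $...$; pair inside a single code block would still be matched.

```python
import re

# Hypothetical patterns, not the ones in maths-feedthrough.py.
# [^<>$] keeps a match from crossing an HTML tag or another dollar sign.
DISPLAY_DOLLARS = re.compile(r"\$\$(?P<maths>[^<>$]+?)\$\$;")
INLINE_DOLLARS = re.compile(r"\$(?P<maths>[^<>$]+?)\$;")


def tag_dollar_maths(page_html):
    """Rewrite $...$; and $$...$$; into the raw-maths tags used earlier."""
    # Display maths goes first so its $$ is not mistaken for inline maths.
    page_html = DISPLAY_DOLLARS.sub(
        lambda m: "<raw-display-maths>" + m.group("maths") + "</raw-display-maths>",
        page_html)
    return INLINE_DOLLARS.sub(
        lambda m: "<raw-maths>" + m.group("maths") + "</raw-maths>",
        page_html)
```

Running this before the existing tag-based pass, and before any unescaping, means a still-escaped &lt; inside the maths is not mistaken for the start of a tag.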

Lack of per-page or global KaTeX macro definitions

One of the side effects of independently calling into $\KaTeX$ for each equation is that some of the macro functionality integrated into $\KaTeX$ is lost. Luckily, $\KaTeX$’s CLI has parameters to inject macro definitions and load macro definitions from a file on each execution, which means a little poking should be able to enable macro functionality - perhaps with another shortcode for injecting macros into a macro file $\KaTeX$ is able to load, or something similar. The main issue is that if macros are defined as you go down the page, it would depend on certain ordering behaviour in the function being used to perform substitutions.

If it lacks that ordering behaviour, though, I’m ok with page-global macros obtained during another LaTeX pass if that’s the only option. Mildly less flexible, but I already almost never use LaTeX macros anyhow, so it is not a problem.

This isn’t really a pressing issue, though, so it’s not on my priority list really.
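
For page-global macros, the command lists from earlier could be built on the fly. The sketch below assumes the CLI accepts repeated --macro arguments of the form NAME:EXPANSION, which should be checked against katex --help for the installed version; katex_command is an invented helper, not something in the repository.

```python
KATEX_BIN_PATH = "./node_modules/.bin/katex"


def katex_command(macros=None, display=False):
    """Build an argv list for the KaTeX CLI with optional macro definitions.

    macros maps a macro name such as "\\RR" to its expansion. ASSUMPTION:
    the CLI takes repeated --macro NAME:EXPANSION arguments; verify with
    `katex --help` before relying on this.
    """
    command = [KATEX_BIN_PATH, "-t"]
    if display:
        command.append("-d")
    for name, expansion in (macros or {}).items():
        command += ["--macro", name + ":" + expansion]
    return command
```

The resulting list can be handed to subprocess.run in place of the fixed INLINE_KATEX_COMMAND and DISPLAY_KATEX_COMMAND lists.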

Lack of $\KaTeX$ Extensions

The $\KaTeX$ CLI existing is a wonderful thing, as far as I’m concerned. However, it does have one limitation - namely that it lacks access to any of the relevant extensions in node_modules/katex/dist/contrib. Notably, the chemistry extension, which I’m interested in using in the future.

Fortunately, the CLI code (in dist/cli.js and linked to node_modules/.bin/katex) is comparatively simple, but adding extensions to it implies modifying that code somehow (or writing my own CLI, maybe my node project will not be empty in future after all!), and that’s still a moderately intimidating task - enough that I don’t really want to have to mess around until I have to.

Fixing the lack of extensions will come at a later date unless I get bored and want to try doing it immediately, is what I’m saying. Regardless of current limitations, however, being able to statically compile LaTeX into HTML - both with a hugo serve and when running build.sh - is something I’m fairly proud of and I hope others use and replicate my efforts on that front.

Updates

2022-01-06

I fixed a major bug in which katex could not find its font files - since it turns out that those fonts are namespaced into /katex - in commit 4d76957. This bug was interfering with proper rendering of maths equations (things like the size of brackets were broken, for instance).

This eliminates the entire need for /static-katex in the hugo directory, as instead .gitignore can now simply ignore the subdirectory /static/katex. It also means that removing the staticdir=[...] line from config.toml is viable. It also entails a minor modification to the NPM build script to change the target directories of the file copy commands.

2022-01-07

Successfully added maths in $…$; for easier entry, in git commit cb2c61eb22ce1fd43b1816fa3e2e33e081e6c8d2. Also added a script to use python to directly host a local http server for ./public, for the purposes of diagnosing errors in the hugo serve live reload version of the site and for testing if it builds correctly.

¹ The second ‘sh’ passed as the first argument after -c '....long command....' is entirely arbitrary. It essentially provides a value for the zeroth program argument ($0) which would normally be filled in with however the program was called.
² This is presumably like Cargo.lock in that it holds versions of dependencies until they are updated manually.
³ On another note, this means that my repository does not need to keep a whole copy of katex inside itself or as a git submodule.
⁴ Also note that Hugo includes a feature for calculating integrity hashes for the integrity attribute. You should always set this attribute - it ensures that once someone has the HTML document they know they’re getting correct (and presumably, non-malicious) versions of CSS and JS resources that are loaded. If you can verify that the HTML is valid, it ensures that you are getting only the code that the original document writer intended.
⁵ A couple of other very useful python regex features in use here are named groups - in the form of (?P<name> normal group stuff here) - and unnumbered/unnamed groups (?: normal group stuff here), the former being useful for both clarity and more flexible APIs not dependent upon group numbers staying the same when modifying a regex, and the latter for making precedence clearer and avoiding the pollution of numbered groups.
⁶ kitty terminal makes adding that stuff so easy, just ctrl+shift+u - the character I added is Invisible Separator, U+2063.
⁷ Adding the semicolon requirement adds another side benefit by massively simplifying the regex required to allow proper compatibility with $\KaTeX$ commands that need a change of mode (e.g. from textual mode to maths mode or matrix mode or anything else), because that is typically done with dollar signs. See the KaTeX documentation.
