Implementing Unicode Entry for All Programs With kitty's Unicode Kitten

So, kitty is a wonderful terminal emulator that is very keyboard driven. I use it for lots of things, even stuff that other people use a GUI for such as pulseaudio management (via pulsemixer, specifically).

This is because it’s fast, uncluttered, and keyboard driven, with some very nice features (such as allowing me to use powerline symbols without having to mess around with font patching, because kitty allows selecting different fonts for different unicode codepoints).

One particularly nice feature of kitty is the unicode entry kitten (unicode_input), which can be invoked by kitty +kitten unicode_input, or in practical usage, with ctrl+shift+u. For those who don’t know, kitty kittens are essentially mini commandline programs that can be used to script kitty with many of its added features (like, for instance, inserting a character).

The kitty unicode_input kitten in particular is extremely nice. It has searchable characters, favourite characters, emoji, and a clean, minimalist interface where the unicode insertion is just one keypress away once you’ve found the character - simply press Enter.

However, it can only be used within a kitty terminal, which means I don’t get to use this very nice niceness anywhere else, instead having to do janky copy pasting. With how I use unicode characters, it is simply the cleanest interface - even kcharselect is far more clunky due to the weird clipboard and multi-character entry stuff.

At least that’s how it was before today! I figured out a clean method of injecting arbitrarily customised text into the input of your programs with a simple keybind, and I’m very happy with it, because now I can enter arbitrary unicode using my preferred unicode entry method as long as I can get that text into a file and wait for a process to finish.

Hopefully this will make text entry simpler for many of us who use programs that do not support more complex input method frameworks like ibus - many of which require a full desktop environment, and require the app to be built on a popular toolkit like Qt or GTK.

The method I have created does not require any cooperation from the target program. All it needs is a tool that can write raw textual input to the active window (whether X11 or wayland, though I am still on X11 so I can’t speak to the tooling on wayland), and presumably some way to set up a keybinding to initiate the alternate entry method.

Beyond that, it only requires the ability to spawn a process and retrieve its PID. That applies if you run the commands through an asynchronous framework rather than delegating everything to a bash script that is itself run asynchronously, which is often desirable if you want to modify the window parameters of input collector windows that are managed by your X11 window manager or wayland compositor.

Creating a temporary file

First things first, we need to create a temporary file to store the intermediary unicode output in. This is important because it means you do not depend on stdout for communication, which in many cases is a hard requirement. It also allows for truly interesting cases where you edit big chunks of input or compose them fully before passing them through to the active program.

The act of creating a tmpfile that can only be read or written to by the current user is simple with the mktemp command, which outputs the temporary filename on the commandline.

I have the following function in my awesomewm config, that lets me wait for the mktemp process to finish asynchronously and call an arbitrary function with the resulting filename.

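local awful = require("awful")

local actions = {}

-- Make a temporary file securely using the `mktemp` command nya
--
-- Call a callback with the temporary file filename as the only parameter
function actions.make_temporary_file(callback)
    awful.spawn.easy_async("mktemp", function (stdout, _stderr, _exitreason, _exitcode)
        -- mktemp prints a newline after the filename; strip it so the path
        -- can be spliced into shell commands without a stray line break.
        local filename = stdout:gsub("%s+$", "")
        callback(filename)
    end)
end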

With awesomewm you should NEVER wait on command output synchronously, because it blocks the whole WM, which is why the function above uses a callback. In an sh script, you can simply capture the output of mktemp synchronously - this is viable as long as the entire script is itself called asynchronously.

TEMP_FILE=$(mktemp);
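For a standalone script, a slightly more defensive version of that line might look like the sketch below. The EXIT trap is my own addition rather than part of the setup described here; it assumes you want the file removed even if the entry step fails partway through.

```shell
# Create a private temporary file. mktemp creates it readable and writable
# by the current user only, and prints the path on stdout.
TEMP_FILE=$(mktemp) || exit 1

# Assumption: we want the file cleaned up whenever the script exits,
# whether the entry succeeded or not.
trap 'rm -f "$TEMP_FILE"' EXIT
```

Everything after this point in the script can write to "$TEMP_FILE" without worrying about cleanup.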

Note on the case of no intrinsic asynchrony in how you execute things

In fact, the temporary file exists precisely so that asynchronously executed commands have somewhere to put their output when they do not, or cannot, write to stdout. That means there is no (clean) way to do this bootstrapping step asynchronously unless you have a ‘built-in’ way to process a command’s output with a callback, like awesomewm does.

Next, we need to run the unicode entry command such that it writes the entered input into the temporary file generated before. The bulk inner command must be synchronous - as we are going to wait on its termination - but the command itself can be asynchronous, as long as you can obtain the PID.

Note that if you are just doing it in a script, you can probably skip the asynchrony and PID stuff entirely, and synchronously wait for the output. But the asynchrony is extremely useful in the case of window managers or desktop environments, as many of the APIs for configuring windows are asynchronous (at least in the case of awesomewm), and they do not enable callbacks on close (for a reason we’ll get to in a sec).

For me, my unicode entry command is the following, inside a callback to actions.make_temporary_file, with the temporary filepath being tmpfile:

local pid, _ = awful.spawn({
    "kitty",
    "--class", "cli_popup",
    "--single-instance",
    "--override", "remember_window_size=no",
    "--title", 'Single Character Unicode Entry',
    "bash", "-c", 'kitty +kitten unicode_input >' .. tmpfile ..';'
}, {
    width = 640,
    height = 480,
    -- We need to set this again here since the resize seems to happen after
    -- placement? nya
    placement = awful.placement.centered
})

Importantly, here, it retrieves the process ID even as the process itself is asynchronous.

The command arguments are there for the following reasons:

--class cli_popup tags the window in such a way that awesomewm does not tile it (among other things).
--single-instance means that if I ever set up kitty in single-instance mode (for optimisation), this should be much more performant.
--override remember_window_size=no is an attempt to prevent some weird alignment stuff, and it seems to sort of work so far.

The command it runs is simple: it just writes the selected unicode character (if any) to the temporary file and quits. The table of window modification rules after it prevents the kitty window from eating the whole screen, and centres it, which is the main reason I used this command.

Given that there is no (clean, anyhow) way to output to the stdout of the outer kitty window, however, this use of temporary files is necessary regardless of the methods used.

In a bash script, you can do something pretty much the same as this, for some arbitrary command that eventually writes to the given temporary file.

<YOUR_ENTRY_COMMAND_HERE> &
PROCESS_ID=$!

However, if you are running the whole script asynchronously, you can just run the command synchronously anyway with…

<YOUR_ENTRY_COMMAND_HERE>
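To make the fully synchronous variant concrete, here is a minimal sketch of the whole pipeline as one function. The function names (enter_text, default_entry, default_typer) are hypothetical, and the default entry command is only my guess at a script equivalent of the awesomewm spawn above, without the window rules.

```shell
# Hypothetical default: open a kitty window that runs the kitten and
# writes the chosen character to the file given as $1.
default_entry() {
    kitty --title 'Unicode Entry' sh -c 'kitty +kitten unicode_input > "$1"' _ "$1"
}

# Hypothetical default: type the text into the active X11 window.
default_typer() {
    xdotool type "$1"
}

# enter_text [ENTRY_FN] [TYPER_FN]
# Runs in a subshell (note the parentheses) so its variables stay private.
enter_text() (
    entry=${1:-default_entry}
    typer=${2:-default_typer}
    tmp=$(mktemp) || exit 1
    "$entry" "$tmp"              # blocks until the entry command finishes
    text=$(cat "$tmp")
    rm -f "$tmp"
    # If nothing was selected, type nothing.
    if [ -n "$text" ]; then
        "$typer" "$text"
    fi
)
```

Binding a key to a script that calls enter_text should then be enough, provided whatever runs the keybinding launches the script in the background.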

Either way, in the asynchronous case, there is then an interesting problem for anyone who is not just calling a bash script for the entire operation in the background - namely, how do you wait for the process with that PID to terminate?

In bash, you can use wait -f $PID, but in the case of more general runtimes, even if you call out to a fresh bash shell you can’t do this, because wait -f $PID only works for child processes of the shell that runs it.
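You can see the limitation for yourself with a small experiment. This is just a sketch: it starts a sleep as a grandchild (so it is not a child of the current shell) and then tries to wait on it. The variable names are made up for the example.

```shell
# Start a sleep from a short-lived intermediate shell. The intermediate
# shell exits immediately, so the sleep is NOT a child of this shell.
# Its stdout is redirected so the command substitution does not block.
other_pid=$(sh -c 'sleep 2 >/dev/null 2>&1 & echo $!')

# Try to wait on it anyway. The shell refuses straight away rather than
# blocking; bash reports an exit status of 127 here.
wait "$other_pid" 2>/dev/null
wait_status=$?
echo "wait exit status: $wait_status"

# Clean up the stray sleep.
kill "$other_pid" 2>/dev/null
```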

Very annoying, and it turns out implementing “wait for this process to close”, when the process is not a child process, in a crossplatform and asynchronous way is atrociously annoying and difficult - this is why I think it is not common in asynchronous environments to add this feature.

We can see by the answers in this stackoverflow question that it is, well, really janky.
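For a flavour of the jank, one commonly suggested workaround is to poll with kill -0, which sends no signal and only checks whether the PID can be signalled. This is not the method I ended up using, just a sketch of the alternative; wait_for_pid_poll is a hypothetical name, and the fractional sleep assumes GNU sleep.

```shell
# Poll until the process with PID $1 no longer exists.
# kill -0 sends no signal; it only reports whether the PID could be signalled,
# so this only works for processes you are allowed to signal.
wait_for_pid_poll() {
    while kill -0 "$1" 2>/dev/null; do
        sleep 0.05   # assumption: sleep accepts fractional seconds (GNU sleep does)
    done
}
```

It works, but it spawns a new sleep process twenty times a second for as long as the entry window is open.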

The method I used to solve this - and it is really weird, to say the least, because for some reason it’s a GNU extension on the tail command (seriously, what the fuck has that got to do with waiting on PIDs?) - is in the following chunk of lua:

awful.spawn({
    "bash",
    "-c",
    "tail -s 0.05 --pid=" .. pid .. " -f /dev/null;"
     .. "xdotool type \"$(cat " .. tmpfile .. ")\";"
     .. "rm " .. tmpfile .. ";"
})

The important part is tail -s 0.05 --pid=$PID -f /dev/null, which polls every 50ms for new data on /dev/null, and terminates if the process with PID $PID dies (the --pid option only works when -f is present).

We passed it /dev/null, which of course is empty, so all this tail command ends up doing is checking the status of the process with ID $PID every 50ms - the default poll rate is 1 second which is really slow for character entry (as we only enter characters after the process is finished), so it’s worth changing that.
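Pulled out of the lua string, the same trick can be wrapped in a small shell function. This is a sketch that assumes GNU tail (the --pid option is a GNU extension); wait_for_pid is a name I made up for it.

```shell
# Block until the process with PID $1 has exited, even if it is not a
# child of this shell.
wait_for_pid() {
    # -f /dev/null : follow a file that never produces any data
    # --pid="$1"   : stop following once that process is gone
    # -s 0.05      : check on the process roughly every 50ms
    tail -s 0.05 --pid="$1" -f /dev/null
}
```

Calling wait_for_pid "$PROCESS_ID" after backgrounding the entry command then behaves like wait would, without the child-process restriction.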

Once the process is terminated - however you happen to have determined this is the case - you simply feed the contents of your temporary file into xdotool type for X11 (which performs all necessary actions including temporary key remapping to input the passed string), or an equivalent tool for wayland like wtype (this one is actually likely to be way less buggy).
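If you want one script that works on both, something like the following sketch could pick the tool at runtime. I have only tested the xdotool side myself; the wtype invocation assumes it accepts the text as a plain argument, and both function names are hypothetical.

```shell
# Print the name of the typing tool to use, based on the session's
# environment. Returns non-zero if no display is found.
pick_typer() {
    if [ -n "$WAYLAND_DISPLAY" ]; then
        echo wtype
    elif [ -n "$DISPLAY" ]; then
        echo xdotool
    else
        return 1
    fi
}

# Type the contents of the file $1 into the active window.
type_file() {
    text=$(cat "$1") || return 1
    case $(pick_typer) in
        wtype)   wtype "$text" ;;          # assumption: text as a plain argument
        xdotool) xdotool type "$text" ;;
        *)       echo "no X11 or wayland display found" >&2; return 1 ;;
    esac
}
```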

Conclusion

This process is pretty much generic to any method of inputting unicode characters that you can construct to write to a temporary file. This is regardless of application compatibility with some kind of IMF or input method editor, and hence can be used for any graphical application, as long as you can set up a keybinding to enter the alternate input mode.

Of course, for some cases, it may not be viable, but I suspect you could hijack these IMFs or IMEs to use this method for less framework-dependent input capabilities.

Either way, it was very difficult to find straightforward instructions on how to do this, so hopefully this blogpost is helpful.

Follow-Up Idea

This whole process gave me an interesting idea for an alternative to the whole IME/IMF means of unconventional unicode character input, that does not depend on cooperation from applications.

Instead of having every different framework hook into the IMF/IME on the user bus, create a unix socket per-login-session or per-seat, with some kind of basic protocol, which has a daemon that runs xdotool or wtype as appropriate.

Then you can design custom tools - in the style of IME prompts - that can provide arbitrary unicode input to arbitrary programs instead of just those that cooperate with the IME/IMF protocol, simply by sending text into that socket.

Perhaps each session/seat could have an environment variable that encodes the address of the unicode key injector socket.
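To make the idea a bit more tangible, the client side could be as small as the sketch below. Everything here is hypothetical, since none of this exists yet: the UNICODE_INJECTOR_SOCKET variable, the default socket name, and the "protocol" of just sending raw text. It also assumes socat is installed for talking to the unix socket.

```shell
# Resolve the injector socket path: use the (hypothetical) environment
# variable if set, otherwise fall back to a per-user runtime directory.
injector_socket_path() {
    printf '%s\n' "${UNICODE_INJECTOR_SOCKET:-${XDG_RUNTIME_DIR:-/tmp}/unicode-injector.sock}"
}

# Send raw text to the injector daemon, which would then type it with
# xdotool or wtype as appropriate.
send_text() {
    printf '%s' "$1" | socat - "UNIX-CONNECT:$(injector_socket_path)"
}
```

Any entry tool could then finish with send_text "$chosen_character" instead of caring about which display server it is running under.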

This actually seems sufficiently simple that I might do it as a side project, or perhaps someone else will, but it seems like a good idea - maybe even provide some kind of translation layer to allow already existing IMEs to feed into it.

If you have a continuously-open IME, there may be some minor “active window” problems that must be solved, though these could be avoided by applications doing all the input in one go.

Food for thought I suppose.