Blog tag refactor: splitting 'tags' and 'tech tags'

My current collection of blog tags is a complete cesspool, where some framework-based tags such as #git, #jekyll, or #wasm feel like clutter to collect onto the “Tags” page because they only refer to 1-2 posts.

I tried deleting less-frequent tags to make the tags page easier to look at, but felt like I was losing valuable information without making this page look any more comprehensible.

What I wanted was not actually less metadata, but just a more coherent way to look at it – without doing any manual edits to my Jekyll tag pages, which are currently autogenerated.

My current post metadata structure looks something like this, where tags is an array of strings that is later fed into the templating engine.

Tech stacks ≠ concepts

In my personal opinion, descriptions of tech stacks look like meaningless word salad if you aren’t specifically familiar with the frameworks and languages that make them up (and often, even, when you are familiar with them), so they’re not actually that helpful to scan on a tag page. But they are very helpful when you have a specific framework or language in mind, and want to find more posts on that topic.

There are other kinds of tags I find more useful to scan from an “all tags” page, and generally think about as totally compartmentalized from tech stacks. These tend to describe the flavor or mindset of the blog post.

With this in mind, I decided to refactor my “tags” into:

  1. tech tags” for terms I might use to describe a specific project’s stack, such as names of frameworks (#git, #jekyll, #wasm), names of programming languages (#clojure, #rust, #python), or other concrete categories of technologies (#webcam, #web, #firmware, #shaders).
  2. tags” for more conceptual terms, such as blog post flavors (#hot-tips, #tutorial), “soft” topics of discussion (#industry, #workflow, #own-your-data), or fields (#distributed-systems, #scientific-computing).

Quick implementation annoyance, resolved by realizing tags are treated as magic variables in Jekyll

I added a separate array techTags to this post and tried to smash together tags and techTags using the concat filter in Liquid, Jekyll’s templating language. (Liquid is a templating language built by Shopify that can be frustrating to debug at the data interface due to the inherent hackiness and lack of introspection functionality in templating languages.)

Mysteriously, tags and techTags appeared to behave differently, with tags actings as a proper array and techTags seeming to act like a string, even though I was describing them in the exact same format in the front matter.

I eventually read the Jekyll documentation and realized that tags are handled as a special case in the front matter. I had mistakenly assumed Jekyll was parsing all array-like front-matter variable values to arrays, when in reality it was probably singling out tags.

I explicitly parsed the value of techTags to an array, and then just had to refactor some of my automated tag compilation work from way back when I first implemented tags on this blog.

The two types of tags are mostly treated similarly

techTags are still compiled under the tag/ directory/URL, use the same layout, and are smashed together with tags at the top of the post. As a wacky bonus, the way I implemented it also happens to accommodate backwards compatibility. This means that posts won’t get lost from their tag page even if I commit the unforgivable sin of flubbing which tag type my tag is.

Manual refactor

I went back through all of my blog posts and did the manual work of deciding which tags were techTags. This was a deliberate side effect of this refactor in that I wanted to re-brain-index my posts’ concepts separately from their tech stacks.

Much more browseable now!

The compartmentalization feels comprehensible enough that I even went to the luxury of throwing in a bunch of tech tags I would previously have been really stingy about cluttering the tags page with, like #soldering and #regex!