Why are the word counts so wrong?

Published: 2020-02-02
Description: Spoilers: it's because I count some metadata too.
Word count: ~591

Sup'

I use my own homemade static site generator to create this website. A static site generator, in its most basic form (and the form that I've built mine around) turns Markdown files into disgusting HTML files.

Since my personal site generator is basically a nightmare-inducing shell script (you will never see it. It's too horrible for mortal eyes), I can use it to do some magic on the resulting HTML files to build index pages, give you those cool buttons on the bottom of the screen that go to the next/previous post, and summon Shub-Niggurath, black goat of the woods, mother of a thousand young.

Along with bringing forth the end of times, I recently made it so that the generator auto-adds a word count to each post. It also tallies the total word count for the whole blog and puts it at the top of the index! I had to do that one, lest the outer darkness seep into my hard drive again.

Of course, since it's automatic, it's prone to being annoyingly precise even when you don't really want it to be. When a human says "word count" they mean how many words are in the thing. When a computer says "word count" it means the total number of whitespace-separated character strings greater than zero characters long. As you can imagine, this definition catches things that aren't necessarily "words" in the human sense.
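Here's a minimal sketch of the computer's definition, in Python rather than shell. This is not my actual generator code, and `count_words` is a made-up name for illustration only.

```python
def count_words(text: str) -> int:
    # str.split() with no arguments splits on runs of whitespace and
    # drops empty strings, so every non-empty chunk counts as a "word".
    return len(text.split())


print(count_words("Hello   world"))   # 2
print(count_words("-- ## ... ~591"))  # 4, and none of them are words
```

Anything with whitespace on both sides gets counted, whether a human would call it a word or not.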

Furthermore, it also catches things that you wouldn't expect to be counted: the description of the post, the title of the post, and even the publication date. All of that is added to the post's word count.

For example:

This heading is seven "words" long

You may think I've forgotten how to count, but look at what the heading looks like while I'm writing it:

## This heading is seven "words" long

That ## situation tells my site generator (specifically the pandoc part of it, for all those nerds out there) that I want there to be a "size two" heading in that spot. This way it's a lot easier for me to type (and archive) my writings, since I don't need to deal with HTML tags.

(The markup language I use to write is the famous and easy to learn "Markdown". Look it up.)

Anyways, since my computer is a computer, it counts the ## as its own word, because it fits the criteria of "word" in its specific way of counting.
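If you want to check the math on that heading yourself, here's a quick Python sketch. It assumes plain whitespace splitting, which matches the definition of "word" described earlier but is not lifted from my actual script.

```python
rendered = 'This heading is seven "words" long'
source = '## This heading is seven "words" long'

# Whitespace splitting treats the "##" marker as one more token.
print(len(rendered.split()))  # 6
print(len(source.split()))    # 7
```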

There are other examples like this. For example, I have metadata tags in here that you never read that add superfluous word counts, and sometimes the monster from under my bed cuts the cheese, making my CPU cry from the smell and miscount a word or two.

That's why I list the word count as "(~XXX words)", since the tilde (~) means "about this many".

The longer a post is, the less off the word count will be, because the metadata adds roughly the same number of extra "words" no matter how long the post is. Since the previous test post thing is so short, the metadata shit stands out more in the count. Yes, I'm keeping a test post there for the sole purpose of explaining this obscure word count bullshit.
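To put rough numbers on that, here's a small Python sketch. The figure of 20 extra tokens is a made-up assumption for illustration, not a measurement of my real metadata, and `overcount_percent` is a hypothetical helper.

```python
def overcount_percent(real_words: int, extra_tokens: int) -> float:
    # How far off the reported count is, as a percentage of the real count.
    return 100 * extra_tokens / real_words


EXTRA_TOKENS = 20  # hypothetical metadata overhead, same for every post

for real_words in (50, 500, 5000):
    percent = overcount_percent(real_words, EXTRA_TOKENS)
    print(f"{real_words} real words: about {percent:.1f}% too high")
```

Same overhead every time, but it matters a lot less once the post gets long.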

"But why don't you make it so that the site generator only counts relevant words?" You ask.

Because fuck you, that's why. To be more precise: Fuck you.

Thanks for paying attention.

-Tim