The great website refactoring of 2020

Published: 2020-08-08
Description: Tim updates the site to resounding apathy.
Word count: ~2882

Good golly jesus fuck that was a lot of work.

I did a major refactoring of how this website works, mostly to have it be more future proof. The main goal here was to change how the URLs work.

From:

https://www.timtimestim.com/blog/tinnitus-the-game-and-pink-elephants/

To:

https://www.timtimestim.com/b/tinnitus.html

This change is obviously backwards compatible with all the old links, since I'm sort of obsessed with backwards compatibility and it would be absurd to remove the links.

Furthermore, you can change the .html extension on a page to .txt to get the plaintext version of that page:

https://www.timtimestim.com/w/ideas.html

To:

https://www.timtimestim.com/w/ideas.txt

(Also works with directory names by using "index.txt": /w/index.txt)

(Works well with command line programs like curl, if that's your style.)
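If you'd rather do the swap in a script than by hand, here's a rough Python sketch of the rule above. The name plaintext_url is made up for illustration, and it assumes every page URL ends in either ".html" or a trailing slash.

```python
from urllib.parse import urlsplit, urlunsplit


def plaintext_url(url: str) -> str:
    """Return the plaintext twin of a page URL (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/"):
        # Directory URL: the plaintext version lives at index.txt.
        path += "index.txt"
    elif path.endswith(".html"):
        # Regular page: swap the extension.
        path = path[: -len(".html")] + ".txt"
    else:
        raise ValueError(f"not a page URL: {url}")
    return urlunsplit(parts._replace(path=path))
```

Anything that doesn't look like a page raises an error instead of guessing.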

If you just wanted to know what's going on, then you can leave now. If you want to learn my reasoning, then read on.

URLs or: how I learned to worry a lot more than necessary about minutiae

Quick summary: I'm intending this website to last into the long long future. For at least as long as I'm alive and cognizant, which if we're going by the historical average will be about 40-60 more years (or much longer if I can help it).

This is much longer than the average website is made to last for, and so it requires different design than the average website; design made to last until the year 2060 or beyond.

I didn't have this intent when I started the site, and that's where the problems began.

You see, I'm kind of a novice when it comes to websites and servers. I didn't know about rsync until a couple weeks ago, for example. So I didn't really understand that a web server is basically something that points to different files and folders on a filesystem. URLs are just file hierarchies; https://www.timtimestim.com/b/cc0.html says (basically) to load the "cc0.html" file from the "/b/" directory on the timtimestim server.

When a browser is told to load this resource it uses black magic and puppy love to realize that it's an html file, and renders it as a web page. If it was a png file it would load the image. Same with pdf and all that. Browsers are mostly there to render css and html, but a web server can technically serve any kind of file.

But as you might have realized, I didn't really conceptualize this. The web server wasn't something for me to cooperate with, it was something to beat into the form that I wanted. You see, when I started, I didn't want to have the .html file extension that you see on web pages. I didn't want /blog/incipient-insipid-instigation.html, I wanted /blog/incipient-insipid-instigation.

So I copied a shady ass .htaccess config onto the server that stripped the html. That worked for a time.

Later I wanted to remove my reliance on .htaccess, so I employed a weird trick. Basically, when a web server is asked for a directory (like https://www.timtimestim.com/b/) it will look for and serve any index.html file under that directory. So the previous link will actually load https://www.timtimestim.com/b/index.html, but without the extra "index.html" cruft at the end.
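As a minimal sketch of that lookup, here's roughly how a static web server turns a URL path into a file on disk. The resolve function and the "/var/www/site" document root are both invented for the example, not the real server layout, and a real server does more (like refusing ".." tricks).

```python
from pathlib import PurePosixPath


def resolve(url_path: str, doc_root: str = "/var/www/site") -> str:
    """Map a URL path to the file a static server would hand back.

    doc_root is a made-up location used only for illustration.
    """
    if url_path.endswith("/"):
        # Directory request: fall back to that directory's index.html.
        url_path += "index.html"
    return str(PurePosixPath(doc_root) / url_path.lstrip("/"))
```

Under these assumptions, "/b/" and "/b/index.html" end up at the same file, which is the whole trick.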

You can probably see where this is going. The next obvious move to remove all the .html extensions would be to have every page as a directory that loads its own index.html. So /blog/cc0.html would turn into /blog/cc0/index.html, where every call to that page would be made like /blog/cc0/ (with the trailing /). I had a little script that did this all automatically for me.
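The original script isn't shown here, so this is only a guess at its shape: a small Python sketch that moves every foo.html to foo/index.html. The function name and the idea of running it over a build directory are assumptions.

```python
from pathlib import Path


def to_directory_pages(root: Path) -> list[Path]:
    """Move every foo.html under root to foo/index.html; return new paths."""
    moved = []
    # sorted() snapshots the matches first, since files get renamed below.
    for page in sorted(root.rglob("*.html")):
        if page.name == "index.html":
            continue  # already in directory form
        target = page.with_suffix("") / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        page.rename(target)
        moved.append(target)
    return moved
```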

If you look around, there's a lot of websites that use this strategy. They might do it with an index.php or similar, though.

The problem is that this feels like a dirty hack, and dirty hacks aren't "long term" in my eyes. We could debate night and day about how future proof exploiting this quirk of web servers is, but the honest truth is that I felt dirty using it and wanted to stop. Every time I saw a website pointing to .html pages I would think, "I should do that..." and every time I added a new page to my own site there was a nagging disappointment that I chose such a weird hack to be the backbone of my url structure.

So now we're here. Glorious html extensions. You can see that I've used simple 301 redirects to point from the old pages to the new ones, so there should be no loss in functionality. And even if there was some insane situation where 301 redirects stop working, all the old pages are still in the server with clickable links that take you to the new pages. Bask in its redundant backwards compatibility.
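Here's a hedged Python sketch of both layers: a lookup table that answers old paths with a 301, and a stub page with a clickable link as the fallback. The only mapping taken from this post is the tinnitus one. The names MOVED, redirect, and stub_page are invented, and how the real server issues its redirects isn't covered here.

```python
from html import escape

# Old path -> new path. This pair comes from the post; a real table
# would list every moved page.
MOVED = {
    "/blog/tinnitus-the-game-and-pink-elephants/": "/b/tinnitus.html",
}


def redirect(old_path: str) -> tuple:
    """Return (status, location) the way a server-side 301 would."""
    new_path = MOVED.get(old_path)
    return (301, new_path) if new_path else (404, None)


def stub_page(new_path: str) -> str:
    """Build a tiny fallback page that links to the page's new home."""
    href = escape(new_path, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<title>This page has moved</title>\n"
        f'<p>This page now lives at <a href="{href}">{href}</a>.</p>\n'
    )
```

The stub is what's left at the old address if the redirect layer ever goes away.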

But that's not all I changed. I had the opportunity to fix some of my other mistakes in the process.

URLs or: how I learned to worry about minimalism and file extensions

The hardest problem in computer science is naming things and off by one errors. So it makes sense that I'd try to make some sort of guideline for naming the URLs.

The basic idea is that I want URLs that aren't awful. I mean, that's kind of the goal with anything anyone ever does or makes, but what's more important is what "awful" means to me in the hyper specific context of URLs.

The world is on fire and everything is exploding and I'm here talking about designing website URLs. What a lovely life I live.

Anyways, here we are with an example of a URL that I don't like:

https://www.example.com/news/portal/alan-smithee/2019/10/3266-on-the-life-and-death-of-death

In this example, "www.example.com" is a news website reporting on some new medical technology or something. Whatever. The point is that the URL is 5 directories deep when it doesn't need to be, overly long, has some weird ID number in it, and is just generally ugly as balls. I would rewrite this URL as:

https://www.example.com/death.html

www.example.com is a news site, so it doesn't make any sense to put the news in a "/news" directory, since we already know that it's all going to be news anyways. I have some contentious beliefs about leaving in the .html extension, but that doesn't mean "https://www.example.com/death" wouldn't work just as well.

"But the first level of your directory is important!" you say, mtn. dew spewing out of your mouth and seeping into the very fabric of your keyboard, "What if someone else wants to use the death.html slug?"

Yeah. The top level of your site is important. I totally agree. That's why you should put the news stories of your news site on it. It's not like you can't reserve certain critical words like "about.html" for future use.

I'm being a hard ass about this, but that makes sense considering I'm presenting the most extreme example first to anchor your anticipation of my actual proposal. So here's some sanity:

Obviously, this extremely minimal URL structure has enormous issues with any publication that makes more than one post a month. "death.html" could refer to the death of a celebrity, or the death of a nation, or the death of a fucking hamster. Not only that, but if you have a team of people making stories then the global namespace will fill up very quickly; soon you'll get slugs like "alansmitheedeath.html" just so there aren't naming conflicts.

I'm of the mind that most news publications don't actually publish anything worth reading the vast majority of the time, and if they stuck with only important stuff they wouldn't have namespace issues, but whatever. Depending on the size of your company and your future plans, a URL structure like:

https://www.example.com/2019/death.html

Would probably work pretty well. The namespace would refresh every year as the year value increments, while still keeping it minimal and effective. Some people would even argue that it's really useful to have the year of publication in the URL, but I don't agree as much. You should always have the publication date of an article at the very top of your page anyways, and your page should load fast enough that clicking on the link to check the page isn't arduous and annoying.

Just compare these two URLs, and tell me which one is better:

https://www.example.com/news/portal/alan-smithee/2019/10/3266-on-the-life-and-death-of-death

https://www.example.com/2019/death.html

They convey nearly the same amount of information, but one of them is much shorter and more effective at dividing up the namespace. This particular structure scales to nearly any size and time scale, since every year you get a fresh "year" directory to put new posts in.
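To make the "namespace refreshes every year" point concrete, here's a small Python sketch for the hypothetical news site. The yearly_url function and the set that tracks used paths are assumptions for illustration only.

```python
def yearly_url(year: int, slug: str, taken: set[str]) -> str:
    """Build a /YEAR/slug.html path, refusing slugs already used that year."""
    path = f"/{year}/{slug}.html"
    if path in taken:
        raise ValueError(f"slug already used in {year}: {slug}")
    taken.add(path)
    return path
```

The same slug can be used again in a later year without a conflict, but reusing it within one year raises an error.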

But that's for a news publication site. www.timtimestim.com isn't a news publication site; at least, it isn't right now. Let's list some of the design goals of the timtimestim URL structure again:

The intent of this site is to be a collection of my entire life's work. When designing the URLs I have to keep in mind that I'll be adding things to this site 20 years from now. I can't really predict how I'll think in 20 years, so I have to leave room to change things as I see fit. If I somehow do end up wanting to have a news publication part of the site, then I need to allow for that to happen with minimal fuss.

For this reason, I've chosen a small hierarchical structure, instead of putting everything in a flat namespace. What do I mean? Well, let's take a look at the URL for the wiki page on what things I use:

https://www.timtimestim.com/w/using.html

Now, I debated long and hard about having all my URLs be something like:

https://www.timtimestim.com/using.html

(Notice the lack of the /w/ directory)

But I decided against it for the aforementioned longevity reasons. Simply put, it would be too difficult for me to differentiate between what is part of the wiki or the blog or my fiction. The problem would get even worse if I decide to add other things, like an art page or something else I can't even imagine at the moment. I just have too much uncertainty behind the kinds of things I'll want to put on the site to really justify anything less.

As for why I'm using single letter directory names, that's threefold. First off, I wanted to preserve the old pages, which means that their names are already reserved and I needed new ones. Secondly, I wanted to keep it minimal, and this is one of the ways to do it. Thirdness, they can be ambiguous, allowing for more future freedom; if I find I want to make a new namespace for bananas, which would obviously be the /b/ that's already taken by the blog, then I could still have a new /k/ directory that makes sense (because bananas are a wonderful source of potassium).

This design has a bit of an issue in that it looks a bit ugly at times. Kind of like the line noise of a regular expression. I mean, just look at this:

https://www.timtimestim.com/b/cc0.html

I consider this a price to be paid in the name of pragmatism. I could prettify the URL using some tricks into:

https://timtimestim.com/b/cc0

But then I would have to support that URL into the far future. And honestly, I feel like the way I've done it gives me much more freedom and stability in the long term, since I don't need to rely on weird quirks of Apache or some host to have minimally acceptable functionality. Sticking with the defaults as much as possible means that your setup will be supported the longest.

There are some other, much more extreme, situations I've prepared for.

For example, let's assume that the internet transitions away from html. This is unlikely, but it's still possible within the next 60 years. First off, let's hope there's some legacy html support for the... Trillions? Hundreds of trillions? Of html web pages out there, so it probably wouldn't be that big of a deal in the short term (people still use MS-DOS programs for things).

But what about the old .html links? Wouldn't those links be "rotted" if html becomes deprecated? I mean, I could serve a different file format while keeping the .html extension, but that's a dirty hack and there's a more elegant solution.

Remember that I can serve different files with the same content. If about.html becomes deprecated, I can just add a bit of code to my static site generator to make an about.tttl or whatever the new hotness will be called. All the links to the html from the old html internet will still be there, and you can link to the same content with the new format if you need to.

The point is that having a file extension allows me to be explicit in what type of content I'm serving, and the type of interpreter that you need to understand it.
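A rough idea of what "a bit of code" in a static site generator could look like, sketched in Python. Nothing here is the actual generator: RENDERERS and build_page are invented names, and the html renderer is a placeholder that just wraps the source in a pre tag.

```python
from html import escape
from pathlib import Path
from typing import Callable

# Extension -> renderer. The html renderer is a stand-in; a real
# generator would run a proper markup converter.
RENDERERS: dict[str, Callable[[str], str]] = {
    ".txt": lambda source: source,
    ".html": lambda source: f"<pre>{escape(source)}</pre>\n",
}


def build_page(name: str, source: str, out_dir: Path) -> list[Path]:
    """Write one file per registered format for a single page."""
    written = []
    for ext, render in RENDERERS.items():
        target = out_dir / f"{name}{ext}"
        target.write_text(render(source), encoding="utf-8")
        written.append(target)
    return written
```

Under this design, supporting a new format means adding one entry to RENDERERS, and the existing .html and .txt files keep getting built exactly as before.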

Another reason is that I'm already doing something like this! Remember in the first part of this post, where I talked about having plaintext files for every html page? Go ahead, try it; go to your URL bar and change "refactor.html" into "refactor.txt". I'll still be here, promise.

I'm actually really really excited about this idea. Mostly because you can do things like...

curl -s https://www.timtimestim.com/b/refactor.txt | fold -s | less

...To read this blog post in plaintext on your command line. And since it's just as simple as changing ".html" to ".txt", you can reliably get any html page on this site in its original markup.

That's pretty sweet. I'm sure you could do other cool stuff with it too, if you needed. That's the awesomeness of plaintext.

You might notice that I've written this post (and all my pages) in markdown, and yet I use the .txt extension as opposed to the .md one. This is because I don't know if I'll be using markdown in the future, so choosing the generic "this is plaintext" file extension seems like the safest long term bet, even if it's a little confusing.

Also: I make absolutely no guarantees that the "API" of this plaintext shit will be stable. None besides the fact that the files will exist and not change names. I can, and will, change how its content is formatted on a whim. If you need something stable, then just contact me and we can work something out.

Also 2.0: You might be tempted to download this website through this plaintext thing. Please don't. It consumes bandwidth that someone besides myself could pay for. Instead, I've provided an easy way to download my website should you ever feel the need to.

I feel like such a nerd for being so excited about this idea, but fuck it. I can be a nerd if I want. It's my website, after all ;).

URLs or: how I learned to worry about other minor changes

Here's a rapid fire list of other small changes I made in the process:

URLs or: how I learned to stop worrying and give in to the siren call of madness

I've pretty much accepted that I'm a crazy weirdo at this point. I mean, who really cares about pragmatic URL design that's built to be long term and useful and intuitive? Besides me, of course.

But I'm happy with how this has all worked out, and changing everything around deserves a blog post announcing it anyways. Not that anyone reads this shit to begin with, hanging on my every word and nodding along with my insane reasoning and complicated thought processes about a website that hosts stories like the fucking exploding ham phone, not quite believing that someone could actually do the things I do without any hope of reward or feedback, always burning with an intense curiosity at the inner workings of my brain but never wanting to read through my entire oeuvre, running out of breath in your internal voice because of how long this sentence is.

...huh?

Oh right. Thanks for coming along or whatever. I think I need a nap.

-Tim