Long term websites

Published: 2020-03-08
Description: timtimestim.com is built to last for a long time.
Word count: ~2687

Update - Apr 06 2020: .htaccess no longer required. Updated relevant section.

One of my major reasons for making this website is to preserve the content I produce for it. The words I write on this blog, on my stories, and anything else I happen to think of along the way, are meant to last.

That might seem like a lofty goal, and that's because it is. Grade A++ logic there, Tim, really killing it.

Maybe if I tried saying things that were surprising instead of obvious my blog posts wouldn't be more bloated than a celibate sperm whale.

I digress.

I want my writings to be as available as possible for as long as possible. There's a roughly 500% chance that, in the future, I'm going to cringe at this shit so hard that my spine will collapse. Still, if the past version of me decided that putting a sperm whale joke online was important, then that means it's important to preserve that folly.

Furthermore, the internet is perhaps the worst thing on the internet. This means that I want this website to be easily backed up and readable offline with minimal fuss. And, in the worst-case scenario, it should allow for totally decentralized distribution, like it's some sort of disturbing virus that says non-sequitur anecdotes instead of causing pneumonia.

As a concrete goal: I want this site to exist for at least the next 60 years, even if I suddenly die.

Backups

The content of this website is backed up in several places. Here they are in increasing order of absurdity:

  1. On my local laptop, using git version control both to check for data corruption and for archival purposes.
  2. On the web server hosting the site.
  3. On a flash drive I back up once a week.
  4. On my desktop PC that I never use anymore. Backed up once every 4 or so months.
  5. Probably somewhere on my phone. Who knows at this point what's on that thing.
  6. I can vaguely remember a bit of the site, and could probably re-create it imperfectly if needed.
  7. Should I gain god-like powers sometime in the distant future I will be able to re-create this site in my own image on the 7th day.
  8. Some sort of public git repository, without all the spaghetti code and rough drafts. The raw Markdown files and final HTML files are all available there (the link to this repo is on the downloads page).
  9. I plan on making a paper backup and putting it into a long-term storage company at some point. Probably when there's more substantial content or the coronavirus zombie apocalypse begins.

I'll probably start backing this thing up on read-only CDs or something silly like that, just to be contrarian.

There's a hundred million ways to back things up. I'll eventually find easy ways to make other backups as time goes on, and I'll ensure that none of them are added to this list because I can't be fucked to edit this post in the future.
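For the technically curious, here's a minimal sketch of what the "git as a corruption check" idea plus a flash drive copy could look like. It's an illustration under assumptions, not the actual backup routine: the function name `backup_repo`, the `site-<date>.bundle` file name, and the two directory arguments are all made up.

```shell
# Hypothetical helper: check a git repo for corruption, then write a
# single-file copy of its whole history into a backup directory.
# Usage: backup_repo /path/to/site-repo /path/to/flash-drive
backup_repo() (
    repo=$1
    dest=$2
    # Assumed naming scheme: one bundle per day.
    bundle="$dest/site-$(date +%Y-%m-%d).bundle"

    # fsck walks every object and fails if one is missing or corrupt.
    git -C "$repo" fsck --full >/dev/null 2>&1 &&
    # A bundle is one ordinary file holding all refs and their history.
    git -C "$repo" bundle create "$bundle" --all >/dev/null 2>&1 &&
    # Make sure the bundle is readable before trusting it.
    git -C "$repo" bundle verify "$bundle" >/dev/null 2>&1 &&
    # Print where the backup landed.
    printf '%s\n' "$bundle"
)
```

A bundle can be cloned like any other repository (`git clone some.bundle`), which is why it makes a handy single-file backup.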

Static websites

(warning: technical information ahead)

Welcome to the part of the post where Tim complains about software and the internet. Strap yourselves in, lads and ladettes.

I'll just say outright that getting stuff to stay on the internet is pretty horrible. HTML is horrible. CSS is horrible. JavaScript is a legit torture tool made by Nyarlathotep to bring programmers to the brink of madness. And it's all glued together with enough dried scotch tape and hope that I'm surprised I can even pretend to use it.

This site tries to keep everything as simple as reasonably possible. Yes, I could make the site much simpler, but at a certain point I would be getting diminishing returns that just aren't worth it. Cutting away all formatting and presenting raw plaintext files is just silly.

For example: this is a pure static website. That means that everything is just HTML and CSS. There isn't (and never will be) any JavaScript or other server-side or client-side code that has to run for the site to function. WordPress can politely suck my ass. Anything that can handle an HTTPS request and slightly understand HTML tags can read this website.

This means that the site is really fast to load, and doesn't waste your bandwidth. This also means I don't have any sort of analytics outside of server-side log files (which I've been debating about removing entirely).

At the moment I don't have any images on the site, and I'm rather reluctant to add them. A single JPEG of my dong dog might take up more bandwidth than a 100,000 word long blog post, and that just feels dirty to me. Still, this is probably going to change some day. If that's the case then I'm going to hyper compress the images using every dirty trick I can pull off, and use the most ubiquitous file format I can find (which will probably be JPEG).

I'm also striving to have the entire source code of every page be on the page. All the CSS is embedded into the HTML of every page, instead of being in an external file. Even the favicon is embedded into the HTML of every page. You should be able to easily back up and archive any page of this website with a stupidly basic wget or equivalent and have it render exactly the same as if it were online.
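A rough sketch of how that kind of embedding can be done at build time. This is an illustration, not the site's real generator: the function names, the file names `style.css` and `favicon.png`, and the assumption that the favicon is a PNG are all invented for the example.

```shell
# Hypothetical helper: print a file as a base64 "data:" URI so it can
# live inside the HTML instead of being fetched separately.
# Usage: data_uri image/png favicon.png
data_uri() {
    # tr strips the line wrapping some base64 tools add by default.
    printf 'data:%s;base64,%s' "$1" "$(base64 < "$2" | tr -d '\n')"
}

# Print a <head> fragment with the stylesheet and favicon inlined.
# Assumes the favicon is a PNG.
# Usage: inline_head style.css favicon.png
inline_head() {
    printf '<style>\n'
    cat "$1"
    printf '</style>\n'
    printf '<link rel="icon" href="%s">\n' "$(data_uri image/png "$2")"
}
```

With everything inlined like this, a plain `wget` of a page's URL should pull down a single file with nothing else left to fetch.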

An exception is the RSS feeds, which are just XML/Atom feeds written in plaintext.

I'd be stupid to write the content on this website in raw HTML. Just the thought of typing all those <'s and >'s makes me feel seriously ill. I'd have a better time trying to parse the HTML using regular expressions (which, I remind you, is mathematically impossible to do completely).

No. I write this website using Markdown, which is perhaps one of the most beautiful inventions the human race has ever produced. Since it's plaintext, and it's formatted in a way that's human readable in plaintext, I can easily share the raw markdown files and have those act as easy long-term backups of the site. In fact, you can visit such a public backup through the link on my downloads page; it's also a good place to read the site offline if you'd like to download the whole thing.

That git repo also has the finalized HTML of the website, just in case.

If it's in Markdown, then I need to have some sort of static site generator for it. I wrote my own site generator as a shell script, since I simply hate all other site generators and needed something simple and effective. I use pandoc to generate the raw HTML files, but it could be replaced in the future if needed. Markdown isn't that difficult to parse, and there's plenty of programs out there that can do it (and will be able to do it in the future, which is the most important part).
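To give a feel for how small such a generator can be, here's a minimal sketch. It is not the real script: the names `md_to_html` and `build_site` and the flat "every .md becomes a same-named .html" layout are assumptions made for the example. The pandoc call sits in one function of its own, so swapping the converter later means changing one spot.

```shell
# Convert one Markdown file to a standalone HTML page on stdout.
# pandoc is the converter here; replace this one function to swap tools.
md_to_html() {
    pandoc --from markdown --to html5 --standalone \
        --metadata pagetitle="$(basename "$1" .md)" "$1"
}

# Hypothetical layout: every *.md in $1 becomes a same-named *.html in $2.
# Usage: build_site content/ public/
build_site() (
    src=$1
    out=$2
    mkdir -p "$out" || exit 1
    for md in "$src"/*.md; do
        [ -e "$md" ] || continue   # the glob matched nothing
        md_to_html "$md" > "$out/$(basename "$md" .md).html" || exit 1
    done
)
```

Running `build_site content public` (both directory names are placeholders) would fill `public/` with one HTML file per Markdown file.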

There's some "risky" things in my generator like generating RSS feeds and whatnot. I consider these things luxuries. If I need to get rid of them I will. The most important thing to me is keeping the content of the website preserved at all cost.

Easy to back up. Easy to generate. Easy to archive. Pretty good foundation, if I do say so myself.

Hosting and domain name

Since this site is totally static HTML, that means it's host-agnostic.

I don't have to use any content management system to keep the site up. I don't need to manage any security updates (unless there's something as significant as the whole HTTPS thing). I don't need to play by the rules of WordPress or Ghost or something else. I don't need to make sure my hosting provider allows me to run databases or server-side code. I don't need to summon a dread succubus to whip me in punishment for thinking about JavaScript.

The only things a potential hosting provider needs are: Ability to use custom domain names (through their own DNS service or external) and an Apache2 server (since I use a .htaccess file).

Update - Apr 06 2020: I no longer need an .htaccess file! Yes, links still work, even though I was stripping the ".html". More info on how I accomplished this in the previously mentioned git repo, if you're into that kind of technical crap (especially since I've still kept the website entirely static).
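The repo has the real details. As a generic illustration only (not a copy of what's in there), one common way to get extension-less links on a purely static host is to give every page its own directory containing an index.html, so that /some-post/ resolves without any rewrite rules. The function name `prettify_urls` and the flat output directory are assumptions for this sketch, and it relies on the host serving index.html for directory URLs, which most do.

```shell
# Hypothetical helper: turn out/foo.html into out/foo/index.html so the
# URL /foo/ works on hosts that serve index.html for a directory.
# The top-level index.html is left alone.
# Usage: prettify_urls public/
prettify_urls() (
    out=$1
    for page in "$out"/*.html; do
        [ -e "$page" ] || continue          # the glob matched nothing
        name=$(basename "$page" .html)
        [ "$name" = index ] && continue     # keep the home page in place
        mkdir -p "$out/$name" && mv "$page" "$out/$name/index.html" || exit 1
    done
)
```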

That's it.

As for the domain name, I chose a .com domain since it's the domain most likely to exist into the far future. Yes, there are several problems with how it's managed, but there's plenty of incentive for .com to exist well past the point where it makes reasonable sense.

I chose "www.timtimestim.com" instead of "timtimestim.com" since the former is more obviously a website instead of a domain for an email or something. Also it just sounds cool. "timtimestim.com" just redirects to the "www.timtimestim.com" domain anyways.

Currently my hosting provider and domain name provider are the same company. Since either service failing means my website goes down anyway, splitting them up wouldn't buy me much redundancy, so I can justify the ease-of-use benefits of keeping them together. This opinion is subject to change in the future, since I haven't really given it much thought.

My hosting provider is a "prepay for what you might use" provider. I can put $100 into my account and have my website stay operational for approximately 5 or more years with no extra maintenance. It doesn't work on a subscription charge that might be canceled for any number of reasons (credit card failing, subscription service failing, etc). If this provider fails, I'll probably look for a similar one.

And this is the part where I make a funny joke about hosting providers or something. Maybe I'll end this paragraph with a shocking-but-silly metaphor.

Content

All the content of this website is written in English. That's because I only know how to speak English. What a fucking revelation.

English is, at the moment, the most popular and universal language on Earth (lucky me). This provides an aspect of future proofing, since people/computers will be able to draw upon a huge swath of English literature and writings to understand what sick nasty rhymes I'm spitting.

I thought about trying to make the content of this site "evergreen" and "timeless" so that it could be understood in isolation of other things, but fuck that. I'll write what I want, instead of worrying how someone 200 years in the future might interpret it. If I get cryogenically revived in the future and they behead me for disrespecting bananas or something, then I'm already fucked.

There's no hard and fast rules of how the content of this website will be presented. If there were rules, I'd try to break them into pieces anyways. I was that kid that cut open my Mighty Beanz to see the ball bearing inside, if you couldn't tell.

The important thing about content is how it gets updated.

You can't ignore updating content on the internet. Things change, websites change, opinions change, spelling mistakes get corrected, people make mistakes and want to cover them up, and even my socks change from time to time.

I want the freedom to edit things if needed, but I also see the value of preserving my thoughts on things even when my opinions inevitably change. If there's substantial changes made to a post/story I will give sufficient warning and clearly mark the edited areas. If my opinion on something changes enough to be significant, I will probably put some sort of warning on the top of that particular post. Spelling and grammar mistakes won't get this treatment, since they aren't really significant.

Pages that get continually updated (like the about page) probably won't be getting the same treatment. It really just depends on how the page is structured and what kind of content is on there. It'll be handled on a case-by-case basis.

Link rot

Ever visit an old forum or wiki? Ever click on one of the external links? They almost always take you to a dead webpage. It's a god damn pandemic. Even when you go to the Internet Archive you can't always find the dead link, like it was just cast into the void never to return again.

I hate this feeling. It makes the page feel rotten in a way that's hard to describe in words (except for the fact that I just described it with words). If I want my website's content to last, then any of the external pages I link to should also be built to last.

This is easy with websites that I own myself, where I can implement similar longevity tactics.

For other pages I'll mostly just tell you to look something up for yourself, instead of linking to it directly. Web searching will always exist as long as the internet exists, so you should be able to find what you're looking for even far into the future.

Obviously, linking to internal pages works fine, since I'll manage my own links well enough to keep them in line.

If I need to link something externally (that I don't own myself) then I'll make sure that I keep a careful watch on that link while conspiring to find a way to remove the "rotten" link. I'll also provide justification, like this:

Right now the "git backup" is an external link, since the whole point is to have an external backup; if something were to happen to GitLab (the current host of the backup) I can easily move the host and the links to something else. If I die or am otherwise unable to change the link, then I'll be able to rely on the fact that people have been able to download the git repository. The whole point of a git repository is that many people can download it and distribute it, so I feel okay with having an external link in this specific scenario. This justification also provides extra justification for the justification.
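Keeping a careful watch is easier with a list of what to watch. Here's a small sketch that pulls every distinct external URL out of a directory of generated HTML. It's an assumption-laden illustration: the function name `external_links` is invented, it only understands double-quoted `href="http..."` attributes, and it relies on grep's `--include` option (available in GNU and BSD grep).

```shell
# Hypothetical helper: list every distinct external URL linked from the
# HTML files under a directory. Links to the site's own host ($2) and
# relative links are skipped.
# Usage: external_links public/ www.timtimestim.com
external_links() {
    # Pull out href="http(s)://..." attributes from all .html files.
    grep -rhoE --include='*.html' 'href="https?://[^"]+"' "$1" |
        # Strip the href=" prefix and the closing quote.
        sed -e 's/^href="//' -e 's/"$//' |
        # Splitting on "/" puts the host in field 3; drop our own host.
        awk -F/ -v own="$2" '$3 != own' |
        sort -u
}
```

The output is one URL per line, short enough to check by hand every so often.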

The biggest problem is citation. If I ever want to cite something (like an academic paper), then I'll have to constantly monitor the link so it doesn't rot, which would be pretty difficult if I were dead. I could self host the cited article/paper/scroll, but that would be inviting quite a bit of legal litigation that I'd rather not get.

The best I can do is say, "This is the title and author of the cited piece! Figure out where to find it yourself, asshat." This is basically the same strategy I'm taking to mitigate most of the other external links I might need. People will be able to find the information they need through their own means of searching, even in the far future.

As for linking to this website, you're in luck. I plan on never removing any link I publicly put online. If you link to this post on your own website, I plan on supporting the existence of that link for as long as I reasonably can. Combined with my desire to keep my content available, you shouldn't have to worry about links rotting as much.

(The RSS feeds will still be link-able, even if I remove them. They just won't get updated or something.)

Conclusion

There's a ton of things I can do to try and preserve this website's content even more. As it stands, I think I'm already doing more than 99% of website owners, which is pretty good by my standards. Still, in the future I might want to implement better fail-safes for things like a sudden death or financial troubles. It's a constant battle to preserve information, but one that I plan on winning.

I could end this with some sort of impactful quote about the importance of preserving information, but you didn't want to know what I had to say about that anyways, right?

-Tim