https://www.timtimestim.com

FML: Fantastic Markup Language -- Writing HTML in C

2021-03-17

WARNING: Magnum opus programming project

View this page in its original markup or visit the fml.h library to make your own!


What.

Has your empty life finally gotten to the point where you're willing to do anything short of murder for a fleeting moment of joy? Has the constant stream of internet content left you with a gnawing empty feeling, making you believe you'll never again experience true fulfillment? Do you perhaps hear the screams of the damned deep inside your own house?

If you answered "yes" to any of these questions you're in luck because this page was written in C. I don't mean that this page is generated by a C program working on an external file or an enormous string literal, or that I wrote it by writing HTML into C comments, or even that I output something that got post-processed by other utilities. Actual, real, semantically correct, ANSI C "code".

This HTML file is generated by compiling and running a C program.

I sincerely hope you get revenge on whatever cruel twist of fate brought you here.

A little taste

FML is supposed to be an actual markup language, but I didn't want to cheat by ignoring C's language semantics. The middle ground was implementing it using C's most user-friendly feature, a variadic function:

#include "fml.h"

int main(void) {
	fml(stdout,
	".BOLD", "How to lose the will to live by writing your dissertation in C: A comprehensive guide", ".PG",

	".RED", "A fresh hell around every corner", ".PG",
	"Just give up now. There is ", ".ITALIC", "nothing", " left for you here.", ".PG",

	NULL);

	return 0;
}

This will end up looking like...


How to lose the will to live by writing your dissertation in C: A comprehensive guide

A fresh hell around every corner

Just give up now. There is nothing left for you here.


As you can plainly see this entire situation is a good argument in favor of 92nd trimester abortion. Nevertheless I prevail -- for how long is up for debate considering how this is the second sign of the apocalypse, but still.

For more information see the "FML Reference" section somewhere in the mawing pits of this page.

My so-called "reasoning"

Okay officer, I can explain.

You see, my brain is actually a ceaseless spiraling mess of madness that just so happens to seep into reality sometimes. My skull houses a bio-chemical computing substrate capable of things like "insanity" and "unstoppable bloodlust" among other things...

But alas you still ask questions. That was your first mistake. If you ask questions you might get answers.

It was a dark cold night over here in hell and I was feeling bored. Instead of jacking off like a normal 20-something nerd I decided to read "Using a C preprocessor as an HTML authoring tool" by Jukka Korpela. This manifesto instilled in me a deep desire to violate the laws of nature and do something similar.

You see, Korpela's preprocessor-based markup language, while an acceptable abomination, wouldn't produce valid HTML. If you wanted to write a special character like the quote (") mark, you would have to write it as &quot; inside of your macro. You would have to do the same with other characters like (>) and (<) and (').

This problem on its own isn't so bad considering the eldritch horrors he already summoned, but like any good programmer I got that itch to fix the tiny pedantic complaint by making something a hundred times more complex.
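The escaping chore itself is small. What follows is a minimal sketch of it, not the code from fml.h: html_entity() and html_escape() are hypothetical names chosen for illustration, and the caller is assumed to supply the output buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of fml.h: returns the HTML entity for c,
   or NULL if c can be copied through unchanged. */
const char *html_entity(char c) {
	switch (c) {
	case '&':  return "&amp;";
	case '<':  return "&lt;";
	case '>':  return "&gt;";
	case '"':  return "&quot;";
	case '\'': return "&#39;";
	default:   return NULL;
	}
}

/* Writes an escaped copy of src into dst, which holds cap bytes including
   the terminating '\0'. Returns 0 on success, -1 if dst is too small. */
int html_escape(char *dst, size_t cap, const char *src) {
	size_t used = 0;

	if (cap == 0) {
		return -1;
	}
	for (; *src != '\0'; ++src) {
		const char *ent = html_entity(*src);
		size_t len = ent ? strlen(ent) : 1;

		/* Leave room for this chunk plus the final '\0'. */
		if (used + len + 1 > cap) {
			return -1;
		}
		if (ent) {
			memcpy(dst + used, ent, len);
		} else {
			dst[used] = *src;
		}
		used += len;
	}
	dst[used] = '\0';
	return 0;
}
```

Doing this once inside the library, rather than by hand inside every macro, is the whole "tiny pedantic complaint" in a nutshell.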

And so I sat in the mountains and meditated on the important questions in life. How do I sing the song to end the world if I need three mouths to do it? Will we ever solve busy-beaver 69? Is an apple falling forever Turing complete? What's that smell? How do I make a markup language in C?

The first idea to come to me was to make a series of functions that would wrap string literals in tags. For example something like txt("That's "); b("bold"); txt(" of you to assume"); would produce: "That's bold of you to assume."
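A minimal sketch of that first idea, for reference. The names txt() and b() come from the example above; the FILE * parameter and the choice of a plain b tag are assumptions made here so the sketch can write somewhere.

```c
#include <stdio.h>

/* Hypothetical wrappers for the "one function per tag" design.
   The FILE * parameter is an assumption, not something fml.h defines. */
void txt(FILE *out, const char *s) {
	fputs(s, out);
}

void b(FILE *out, const char *s) {
	fputs("<b>", out);
	fputs(s, out);
	fputs("</b>", out);
}
```

Calling txt(stdout, "That's "); b(stdout, "bold"); txt(stdout, " of you to assume"); writes the sentence with the middle word wrapped in bold tags.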

If I were less insane this would have been acceptable. It would be useful considering that you can use your formatting functions in loops and whatnot. The problem is that it wasn't absurd enough for my tastes. I wanted angels and angles to weep. I wanted generations of people born with eyes on their toes from the deep-seated poison of what I created. Having bog-standard functions seemed too pedestrian by comparison.

But what to do? I could use a C-based markdown library and pass a gigantic string literal to it, thus fulfilling the requirement of "writing it in C". A silly thought with interesting implications, but it seemed like cheating. I wanted to rely on C's semantics instead of subverting them. This reasoning also ruled out using comments.

I was watching a bird when it hit me. I dodged a second blow from the bird and pulled a weapon, but not before I had an idea; just like this bird, variadic functions in C are horrifying beyond mortal comprehension. Why not use the blood of this totally-not-my-fault dead bird I found and a variadic function to process my markup language?

With reverence I carved a name into the tombstone of my credibility as a programmer: Fantastic Markup Language.

Stupid technical details nobody cares about

I just wanted to demonstrate the blockquote functionality.

-Me

No. Seriously. FML is an actual thing that exists and I had to solve actual problems implementing it. You know that one image of the spongebob fish guy's eyes burning saying, "MY EYES!"? Yeah.

Image of a spongebob fish with eyes burning screaming, "MY EYES!"

Believe it or not I actually had some goals:

The code for this thing is fly-by-the-seat-of-my-chinny-chin-chin quality. Considering the lightning pace I pinched it out at, I'm surprised it became as coherent as it was. Still, don't use this as an example of "good" C code. Please don't. Don't even look at it. Just pretend you're on the beach somewhere warm listening to the waves tell you a story about C's baroque argument promotion rules or something.

Variadic functions considered harmful

You know printf()? Hold on. It looks a little like this:

#include <stdio.h>

int main(void) {
	int x = 23;
	int y = 3266;

	printf("x is %d, y is %d", x, y);

	return 0;
}

In direct contradiction to the prophecy, printf() somehow manages to be the perfect example of a variadic function. The call above passes two arguments after the format string, but it could take more if it wanted. It's variadic -- it takes a variable number of arguments.

You don't see a lot of variadic function use in C outside of printf, and for good reason. Every time someone designs a new variadic function they have to eat an entire jar of olives and wash it down with mustard. Seriously, it's a lesser known undefined behavior of C (undefined because the C standard doesn't assume what brand of olives you have in your fridge). Ask anyone who's ever tried, they'll say the same thing. It's the price you have to pay to use one of the most dangerous, stupid, poorly designed, and flat out disgusting language features that the C standards committee has ever hurled at us.

I know this post is supposed to be talking about the FML markup language, but it's going to turn into "Tim complains about variadic functions for the next trillion years". You're just going to have to take me at my word that this is relevant. Relevant for my therapy sessions, that is.

Let's look at a very simple variadic function I made to demonstrate the folly of man:

#include <stdarg.h>

int sum(int argc, ...) {
	int i, n = 0;
	va_list args;

	va_start(args, argc);

	for (i = 0; i < argc; ++i) {
		n += va_arg(args, int);
	}

	va_end(args);

	return n;
}

(psst. argc is the count of how many arguments extra you passed to it)

In our arrogance we actually thought we could add up a list of numbers using: sum(3, 3266, 420, 69)

The real magic here comes from the va_arg() macro. It takes a va_list as sacrifice along with the name of the type you expect to be passed. It pulls the next argument in the list, going down the list in consecutive calls.

While I'm busy washing out the taste of olives from my mouth, let me tell you why this state of affairs is far shakier than you might first realize.

Let's imagine that an int is 32 bits and a long is 64 bits here. What happens if you pass a long as an argument to this function? That's right, va_arg() would still try to pull in 32 bits as an int (officially that's undefined behavior), since there's no way whatsoever to check the type of an argument passed variadically.

Okay. There is a way. But I'm not about to make my own compiler with an extension that solves this problem.

It gets worse. Much much worse. Notice how I have the 'argc' variable there. Why would I need to have a count of how many arguments you've passed? Isn't that silly? It is! I'm so glad you noticed! Good thing there's no way whatsoever to check how many arguments have been passed to a variadic function.

If you mess something up your va_arg() calls are going to start blowing through your stack like a puffer fish blows through your intestines after eating it. It'll iterate over whatever else follows it in the stack until it segfaults or compromises your system or both. Every single variadic function needs some sort of stop-gap, or it might as well just be a regular function. printf's stopgap is its format templates like %d, which tell it what types to expect next.
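Besides a count (like sum() above) or a format string (like printf()), a third common stop-gap is a sentinel value that marks the end of the list. A minimal sketch, assuming a hypothetical total_length() function whose callers always finish the list with (char *)NULL:

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: adds up the lengths of every string passed in.
   The argument list MUST end with (char *)NULL, which acts as the sentinel. */
size_t total_length(const char *first, ...) {
	va_list args;
	const char *s;
	size_t total = 0;

	va_start(args, first);
	/* Walk the list until the sentinel shows up. */
	for (s = first; s != NULL; s = va_arg(args, char *)) {
		total += strlen(s);
	}
	va_end(args);

	return total;
}
```

For example, total_length("ab", "cde", (char *)NULL) adds 2 and 3. Forget the sentinel and the loop keeps walking, which is exactly the failure mode described above.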

Imagine you expect a large struct, but you accidentally pass an int to it instead. Say hello to all that extra stack memory! Fuck you! :D

You have to be CRAZY careful with variadic functions, be it printf or otherwise. The function HAS to know in advance what types to expect in what order, and they HAVE to be passed in that order OR ELSE. The security risk makes string buffer overflows look like child's play.

But wait, there's more! You thought it was over? You think you know what the gotchas are now? Ha! Suck va_arg()'s flaming dong and witness this: C's argument promotion rules for variadic functions.

You see, C likes the 'int' type. A lot. It's legit written in the sacred text of the standard that any math operation that can be expressed using an int will be expressed using an int. So that char + char thing you did? Behind the scenes C's nepotism promoted them all the way to int + int for that operation.[1]

Same thing for variadic arguments, with a few special rules because life is pain and flesh is mortal. The inexorability of entropy demands that any types that can be 'promoted' into an int will be if passed to a variadic argument. So all your shorts, chars, _Bools, whatever, all are passed as int. That means you just have to somehow know that you can't use va_arg() with a char, and that you have to tell it to expect an int instead. Good luck if your compiler doesn't give you a warning, sucker.

And let's not forget floating points. All floats are promoted to doubles in variadic argument land. A land of sugar and honey and swarms of angry hornets large enough to fit in the palms of your hands. If you want a float, you better tell va_arg() to take it as a double.
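A minimal sketch of what those promotion rules mean at the va_arg() call site. The function char_plus_float() is hypothetical and exists only to show the two type names you have to use:

```c
#include <stdarg.h>

/* Hypothetical demo: expects exactly one char and then one float after
   the fixed argument, and returns their sum. */
double char_plus_float(int tag, ...) {
	va_list args;
	char c;
	float f;

	va_start(args, tag);
	c = (char)va_arg(args, int);      /* a char argument arrives promoted to int */
	f = (float)va_arg(args, double);  /* a float argument arrives promoted to double */
	va_end(args);

	return (double)c + (double)f;
}
```

Writing va_arg(args, char) or va_arg(args, float) here instead would be undefined behavior, even though the caller really did pass a char and a float.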

In short int: variadic functions suck.

[1] This isn't a bad thing in every case. For one, it allows for quite a bit of optimization, since int is meant to be the natural integer size for the platform you're compiling for. It still causes a lot of issues with unsigned ints and variadic functions, though.

How FML does it

Despite my best efforts to use one of the Top 10 Worst C features (you won't believe what number 3 is :O) I still managed to make it somewhat coherent.

The fml() function takes a single mandatory argument of a FILE pointer, so that you can tell it to print to stdout or whatever other file you fopen(). I know that good libraries are supposed to be fopen() agnostic, but you can't possibly understand how little I care. The variadic part is expected to only be a list of char * (pointers to null-terminated strings) with the last argument being a NULL pointer (called a "sentinel" value, like how C strings end in \0).

This sick nasty setup allows for me to write string literals directly into the function parameters while also being able to tell it to stop with the sentinel NULL pointer. The rules are so easy to understand that there's only about 100 different ways you can mess it up.
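To make the shape of that loop concrete, here is a minimal sketch. It is NOT the real fml.h implementation: mini_fml() is a hypothetical stand-in that understands only ".BOLD" and ".PG", does no escaping, and the HTML it emits for each element is an assumption.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for fml(), not the real thing.
   ".BOLD" wraps the next argument; ".PG" emits a paragraph break.
   The argument list must end with (char *)NULL. */
void mini_fml(FILE *out, ...) {
	va_list args;
	const char *arg;

	va_start(args, out);
	while ((arg = va_arg(args, char *)) != NULL) {
		if (strcmp(arg, ".PG") == 0) {
			fputs("<br><br>\n", out); /* assumed output for a paragraph break */
		} else if (strcmp(arg, ".BOLD") == 0) {
			/* Inline element: consumes the following argument as its text. */
			const char *text = va_arg(args, char *);
			if (text == NULL) {
				break; /* sentinel arrived early; stop rather than read past it */
			}
			fprintf(out, "<b>%s</b>", text);
		} else {
			fputs(arg, out);
		}
	}
	va_end(args);
}
```

One of the hundred ways to mess it up: on a platform where NULL is defined as a plain 0, passing it bare through the variadic part hands the function an int rather than a pointer, so (char *)NULL is the portable spelling of the sentinel.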

What's a string literal? It's what happens when you write text inside double quotes, like "The number is %d". I'm sure you know that already, but it took me so damn long to learn that the official name of these things wasn't "string constant".

Insert funny section-ending joke here.

Memory/storage limits?

You can't be serious. You're worried about memory usage? What? Do you not realize that a single megabyte of memory is able to hold somewhere in the ballpark of 200 thousand ASCII words? Do you know how much text 1 gigabyte of memory could hold? I could store everything you've ever written directly in memory and your computer wouldn't even flinch.

Nobody is going to use FML for that scale of work. At all. Ever. It's stupid to even think about. Which is why I made it work anyways.

An interesting property of string literals is that they're (almost always) stored directly in read-only memory inside of your executable. If you're on a unix/POSIX machine you can use the strings command to print these strings to your 800 year old stone tablet terminal.

Since string literals decay into char pointers this provides a great way to store text inside of your C executable without using up system memory. The internals are set up so that dynamic allocation isn't needed for processing, conditional on the sun exploding in a couple billion years.

One possible limitation is that the char pointers in your arguments are probably stored on the stack. If each pointer is 64 bits large and your stack is 1 MiB large you're going to stack overflow after about 131,000 arguments or so. Or until you reach some other limit in your compiler. Oh no. What a shame. Oh shucks.
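The arithmetic behind that estimate, as a rough sketch. It assumes every argument really does occupy one pointer-sized stack slot and ignores everything else living on the stack; max_pointer_args() is a hypothetical helper.

```c
#include <stddef.h>

/* Rough upper bound on how many pointer arguments fit in a stack of the
   given size, ignoring all other stack usage. */
size_t max_pointer_args(size_t stack_bytes, size_t pointer_bytes) {
	return stack_bytes / pointer_bytes;
}
```

With a 1 MiB stack (1024 * 1024 bytes) and 8-byte pointers this gives 131072, which is where the "about 131,000" figure comes from.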

So just in case you want to write a few hundred gigabytes of text inside of a single function call, you can do that now. I don't know if this is a mercy to you or a curse. I could see both arguments for that.

What about speed?

This question is somehow more absurd than the memory usage one. It's a C program operating on a small amount of ASCII text in trivial ways in a loop. You tell me how fast it is.

We're all C programmers here. We at least know a little bit about how fast computers are at this kind of thing, right? ...Right?

The cracks start to form, faults spreading, splitting down the middle

FML is like the skin of a rotten apple. It presents a false face to the world and only when you bite into it is the rot exposed.

Look at this page. Isn't it so cool that images, text styling, lists, and even code blocks are working? It's so nice.

You might be thinking that FML is able to express HTML in a way that isn't limited in absurd ways. You would be wrong.

For example, due to how block level elements are processed ("""processed""" being used pretty loosely here) you can't have lists with nested elements. It's all one level or nothing.

Or, this is a doozy, any block level element -- a blockquote, code block, or list -- can't have text styling inside of it. No italics in lists, no bold in blockquotes, nothing. I know you didn't notice, but now that you have you should realize how frail this whole system really is.

Inline images? Nope. What about bold and italics at the same time? Nope. Save your marriage? Nope.

This markup language is more limiting than the agonizing feeling of being buried alive, waking up in a casket under tons of dirt, coming to a realization that your space of movement is constrained to a tiny box you can't even sit up in, no one to hear you scream and cry, no one to save you while you suffocate after depleting the small amount of oxygen left, fulfilling the expectations that everyone else already formed about you being dead. I hate it; I hate this markup language so fucking much.

I'll be so glad when I'm done with this post and I never ever have to write in it again. The syntax is like trying to inscribe a tattoo onto my eardrums, the stupid little rules and limitations break me down into a lump of wet sand that can feel only anger, my very existence sours and bruises at the thought of typing out another god damn ".PG", to break a paragraph

Use a better markup language to talk about this one? Not in my house. I have standards, after all.

The parable of the HTML Tags

HTML tags are stupid.

Errors

Just in case FML wasn't horrible enough, I actually have some errors coded into it. Just in case you dare make a mistake that I didn't want to code around.

One funny error is that the string you give to an argument can't be more than 509 characters long. This is because ANSI C has the same limit for string literals, and I thought it would be funny to enforce it by code. Yes I'm that sadistic. Yes I need help. No I won't take my pills, doctor, they control my brains.
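A check like that takes only a few lines. This is a minimal sketch under assumptions: arg_length_ok() and MAX_LITERAL_LEN are hypothetical names, and how the real fml.h reports the error is not shown here.

```c
#include <string.h>

/* 509 is the minimum string literal length a C89 compiler must accept. */
#define MAX_LITERAL_LEN 509

/* Hypothetical check: returns 1 if s is within the limit, 0 if it is too long. */
int arg_length_ok(const char *s) {
	return strlen(s) <= MAX_LITERAL_LEN;
}
```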

Wait... Doesn't that look familiar?

If you look at the markup for this page (linked near the top), you'll notice that it looks similar to another markup language out there: roff

That's right! I fooled you! This has actually been a shitty roff implementation the whole time! Ha ha ha ha!

But yeah. I realized a little too late that this looks and functions a lot like roff except embedded into the arguments of a C function. There are no original thoughts, everything is derivative, we should just embrace the void and transcend our mortal coil via Kool-Aid lobotomy.

What about bugs?

lol

FML Reference

Here lies the FML reference. If you decide your life lacks misery, you can learn what functionality FML has here. Use the markup for this post as reference for how to actually use these things.

Breaking elements are "stand-alone" in the sense that they produce something without extra arguments to help.

Inline elements are applied to the next argument in the argument list.

Block elements have special properties applied to each element inside of them. They are stopped by reaching a .END_BLOCK element.

The end

As the flames of the last burning city fade to soot, as the final hour of the final person looms, once we have all met our good or bad ending, we will know that it happened because I made a markup language in C.

That's the long and the short of it. I had a funny idea and executed on it. And now that I have I can at least communicate to you the magnitude of my mistake so that you might not make it yourself.

I know how you programmer types work. You're going to look at this and start getting ideas. You're going to want to do something similar in whatever insipid language you happen to like this month. You're going to be at work pretending to do something and your brain will helpfully supply you with a way that this could be "improved". You're going to find a bug and feel the need to fix it.

I'm here to tell you to ignore these feelings. Stomp them down like your emotions after a bad breakup. Learn from my folly so that it might have some meaning beyond my self-inflicted suffering.

I have a dozen more important projects I could have worked on in the three days it took me to program FML and write up this post. Maybe more than a dozen, if we're being honest. I got to do programming and writing, which are my two highest passions in this merciless universe, but the cost was so high. My hand literally hurts from the awful ergonomics of typing in this markup language.

And yet, I somehow enjoyed myself. And really, that's all that matters when programming, isn't it? Or, wait. No. It isn't. That's the stupidest thing I've ever said. Why did your program reveal our users' passwords? Because I was enjoying myself :D! Just forget this whole paragraph happened.

Whatever. I'm going to let the words I've already written speak for themselves before I dig this hole any deeper. Peace out.

-Tim

This article is licensed under CC0