XHACK

XHACK is a hypothetical programming language based on what I want out of a language. I have considered making it many times, but it requires more work than I have motivation for. Here's an extremely rough summary of where it's coming from:

Similar to C
Static typing
Completely manual memory management
More organized syntax than C
Cleaner and less error prone macros than C (more similar to inlined functions)
Even more programmer freedom than C
Conveniences that make low level programming feel more high level without actually being any higher level (such as type inferring)
Extremely well defined and transparent (no invisible architecture or behavior will be inserted into the program)

The name "XHACK" is a bit weird. It came from a much older idea about a language that can modify the program itself in realtime without restrictions. That idea eventually transformed into this one, and I never came up with a more suitable name.

Most of this page will describe the language from the perspective of a C programmer who isn't interested enough in other languages to have paid attention. Some ideas here are what certain other languages are probably already doing.

What I want is a language that works at a very similar level as C, or perhaps even lower. Basically I don't want my program to do ANYTHING except the things that I specifically tell it to do, not counting optimizations (which should also be toggleable). I want to have full control of what is happening, and I want to be very specific about what I want. For example, I want static typing with exact-width types ("i32" instead of "int", "u8" instead of "unsigned char", etc) to be the default way of doing things.

I also want minimum-width types that the compiler can optimize into any size as long as it is at least some size. For example, u16_fast would be any unsigned integer that is at least 16 bits, and the compiler/optimizer decides what type to use based on what it thinks is fastest in any given context. It could, for instance, widen that variable to the natural CPU register size (32-bit or 64-bit). I think the type names should be very convenient, but I'm not sure what they should be. *_fast is annoying to type, so another option is to use C type names for this:

char = i8_fast
short = i16_fast
int = i32_fast
long = i64_fast
float = f32_fast
double = f64_fast

The most important thing is to be consistent, well defined, and give the programmer the means to be exact without requiring tricks or additional libraries.

As for booleans, they should mostly behave like 1-bit integers.

C pretty much already has all of this, but you need to include a separate header (<stdint.h>) and typedef a bunch of names, which is not ideal, especially since libraries aren't going to use your names, and when they do use the standard ones you get the hideous default uint32_t syntax. C's version of the _fast names described above (uint_fast16_t and friends) is even uglier, and each one is a fixed typedef per platform rather than something the compiler picks per context. This is all part of the idea of "cleaning up C": I spend a lot of time programming and I don't want to spend that time with an ugly language that has a lot of weird and annoying quirks.
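For reference, this is roughly the typedef boilerplate I mean, as a minimal C99 sketch. The short names (i32, u8, u16_fast and so on) and the BITS_OF helper are made up for illustration. Only the <stdint.h> types they map to are standard, and C picks each "fast" type once per platform instead of per context.

```c
#include <limits.h>
#include <stdint.h>

/* Exact-width names: the short aliases are invented for this sketch,
   the <stdint.h> types they map to are standard C99. */
typedef int8_t   i8;
typedef int16_t  i16;
typedef int32_t  i32;
typedef int64_t  i64;
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* "At least N bits, whatever is fastest": fixed per platform in C. */
typedef int_fast16_t  i16_fast;
typedef uint_fast16_t u16_fast;
typedef int_fast32_t  i32_fast;

/* Storage size of a type in bits, for checking the guarantees above. */
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)
```

Every project ends up carrying some version of this block around, which is exactly the part I want the language to do for me.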

Low level programming, high level systems

Programming languages are usually categorized as either "low level" or "high level" languages. What I want is essentially both. I want a language that is at the lowest level possible, but has the necessary syntax and features to use systems in a very high level way.

Basically: if you want to make a new system (like a UI or graphics library), you use the low level parts, but if you want to make a program, you use libraries whose frontend is designed to be simple and convenient.

There are also many things that can make low level programming much easier without using any higher level language. One of those is type inference. Basically instead of doing this:

Type_you_dont_care_about* foo = get_boring_thing();
explode_boring_thing(foo);

You don't really lose or change anything if you do this instead:

auto foo = get_boring_thing();
explode_boring_thing(foo);

It does the exact same thing, but it is much more comfortable to type the second one. You know what type you want but you don't want to write out the name all the time or start memorizing weird third party library types, you just want to use it.

Syntax

I will be giving some code examples below, and for that purpose here is a summary of the syntax.

To declare variables, instead of the C-like Type name, you use name : Type. There's a few reasons for this. Originally the syntax I used was # name type, with the pound sign essentially denoting a variable declaration (similar to "var" in Javascript), but after experimenting I came to the conclusion that name : Type is cleaner and more consistent.

When the type is on the right side, you can easily stack additional properties like static without making the code look like a mess.
Type on the right is more consistent with functions since the entire function can be on the right side.
name:type is simpler inside tight spaces like loops and function parameters when compared to #name type.
The advantage of the additional symbol (as opposed to just the type) is that it inherently supports type inference for variable declaration: cool_object := new_cool_object();.

Additionally, instead of :, you can use :: when you declare something that isn't a variable, such as a new type, a function, or a constant value (similar to #define in C).

For meta compiler instructions, I decided to use @ instead of # like C. Originally that was because I used # for other things, but I'm not sure if that has relevance anymore.

The syntax is obviously not final because I haven't even finished thinking about all aspects of the language let alone started building it.

Pointer as &

I am not entirely sold on this idea (mostly because it looks ugly and is slightly more annoying to type), but I have a desire to change the pointer symbol from * to &. The reason is simply because I think it's more consistent and descriptive. & means "the address of", so it makes more sense to use it like this:

thing : &Foo;
thing = &something;

Dereferencing with * is a completely different operation, you're not talking about an address, you're talking about the value at that address. A pointer variable is just an address, it doesn't have what's at the destination. So the asterisk is reserved only for dereferencing. Ideally I would use a different symbol for that as well since * is the multiply operator, and it's annoying to look at code like this: x + *foo * *bar. I don't know what it should be replaced with though. Maybe foo[] since that's how you dereference arrays anyway.

No more arrow

thing : &Foo = get_thing();
thing->x = 10; // WRONG
thing.x = 10; // Correct.

That is all. Basically only C and C++ still do this shit so I doubt anyone except the worst fans of those languages would even argue about this.

No headers

mem :: @import("memory.xhx");

main :: () () {
	foo := mem.alloc(123);
	mem.free(foo);
}

Arrays and strings

I have put a lot of consideration into arrays and strings. These are very fundamental concepts to programming, but there is no one true way to do them. For example if you read data from a binary file, the string may be null terminated, its length may be stored before the string data, or the length may be somewhere else entirely (for example as part of some header). The type of the length value may also vary, for example you may want it to be an 8-bit integer when you store a lot of very small strings like names in a database or keys in a dictionary or whatever. For various reasons you may want to use any of these options.

So if there is no one true way, shouldn't the programming language respect that? The idea I came up with was to essentially remove arrays and strings from the language, and instead treat them as a metaprogramming feature. The length and the data location of the array are always separate, but they can be linked with a meta array variable, that variable can then be sent to functions and the necessary conversions will be done automatically. I have many variations of this idea and I haven't come up with a final solution, but it basically looks something like this:

myarray : @array(x, y); // 'myarray' is not a real variable. You can use it like a variable but in reality it just refers to x for the length, and y for the data.
myarray : @inline_array(i32, 100); // Similar to above, except this places the array data right here. Both the length and the data location are known at compile time and thus do not need to be placed anywhere as variables.

The language also does not stop you from modifying either the length value or data pointer, but it is up to you to make sure you don't break anything with it.

Generally speaking you are not meant to send pointers to arrays or modify the length of an array that you did not create. If you wanted to make some kind of dynamic array/string type, you should make a new struct and add a meta array into that:

Coolarray :: struct {
	length : u16;
	data   : & i32;
	array  : @array(.length, .data);
};

.array is not a physical variable, instead it just refers to the other two variables and can be used as an array. For any function that modifies Coolarray, you send the struct, but for any function that wants a vanilla array, you send coolarray.array.

One advantage that I like about this system is that if the length and the location of the data are known at compile time, then this entire system is completely invisible, it has no consequence to your program because the values are already known and do not need to be stored anywhere.

There ARE situations where you might want to send a pointer to a length and a data pointer into a function, which then modifies those values. I haven't thought about this enough to figure out how to deal with such situations. The way I think about this is: if the programmer does not want to use the array system for some purpose that arrays are made for, then there's a problem with the array system. I haven't finished thinking about this so I'm not entirely sure if this is how arrays and strings should ultimately work.
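To make the idea a bit more concrete, here is a rough C sketch of what the meta array would be doing behind the scenes. The names I32View, CoolArray, coolarray_view and sum_i32 are all invented for this example. In XHACK the pairing step would be generated by the compiler instead of written by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* A plain length + pointer pair, standing in for a "vanilla array". */
typedef struct {
    size_t   length;
    int32_t *data;
} I32View;

/* Owner type with its own choice of length width, like Coolarray above. */
typedef struct {
    uint16_t length;
    int32_t *data;
} CoolArray;

/* What `coolarray.array` would do implicitly: pair up the two fields,
   widening the length to whatever the receiving function expects. */
I32View coolarray_view(const CoolArray *a) {
    I32View v = { a->length, a->data };
    return v;
}

/* A function that only wants a vanilla array and knows nothing
   about CoolArray. */
int64_t sum_i32(I32View v) {
    int64_t total = 0;
    for (size_t i = 0; i < v.length; i++) {
        total += v.data[i];
    }
    return total;
}
```

The point is that the view is not stored anywhere in CoolArray. It only exists at the call site, so a struct can keep its length in whatever type and location it likes.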

Enums and bit flags

There's a very simple yet powerful thing you can do with enums:

SOMETHING :: enum {
	FIDDLES;
	SKITTLES;
	GRUG;
};

thing : SOMETHING = .GRUG;

Notice the . before GRUG: the members of an enum can be inferred based on the type of the enum. The same can be done with functions:

give_me_something :: (x:SOMETHING) () {
	...
}

give_me_something(.GRUG);

I cannot describe how much I want this functionality in the language I use. It is so tiresome to type SUPER_LONG_AND_COMPLICATED_ENUM_NAMES all the time just to avoid name conflicts, and to type the stupid enum name at all when I only need to talk about its member.

But that's not quite all, you can also do this for bit masks/flags:

EMOTION :: bitmask {
	HURT;
	HAPPY;
	LAZY;
};

set_emotion :: (emotionflags:EMOTION) () {
	...
}

set_emotion( .LAZY | .HAPPY );
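For comparison, this is roughly what I imagine a bitmask declaration expanding to, written out by hand in C. The EMOTION_ prefixes and the emotion_has helper are my own additions for the sketch. The prefixes are there precisely because C has no way to infer the enum from context.

```c
#include <stdint.h>

/* What a `bitmask` declaration could expand to: each member gets its
   own bit, assigned in declaration order. */
typedef uint32_t Emotion;
enum {
    EMOTION_HURT  = 1u << 0,
    EMOTION_HAPPY = 1u << 1,
    EMOTION_LAZY  = 1u << 2
};

/* Returns nonzero if every bit in `wanted` is set in `flags`. */
int emotion_has(Emotion flags, Emotion wanted) {
    return (flags & wanted) == wanted;
}
```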

Multiple return values

If you read the code examples above, you may have noticed something strange about functions; they have 2 sets of parentheses. After some experiments I came to the conclusion that the best way to support multiple return values is to change the function declaration syntax.

calculate_distance :: (x:i32, y:i32) (error:ERRNUM, distance:i32) {
	dist : i32 = max(x,y) - min(x,y);
	return 0, dist;
}

e:ERRNUM, distance:i32 = calculate_distance(50, 100);
if (e) {
	printl("Error calculating distance!");
}
else {
	printl("Distance is ", distance);
}

This way the return values never get mixed up with something else and are cleanly wrapped in a bundle no matter how you define the function.

The reason I think multiple return values are important is exactly what you see above: returning error codes. I find it very annoying when certain functions return values indirectly by making you send pointers into the function. Multiple return values can eliminate almost all of those cases.
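The nearest thing you can do in C today is to return a small struct, which works but needs a new type for every combination of return values. A minimal sketch, where ErrNum and DistanceResult are names invented for the example:

```c
#include <stdint.h>

typedef int32_t ErrNum;   /* hypothetical error type for this sketch */

/* The "bundle" of return values, spelled out by hand. */
typedef struct {
    ErrNum  error;
    int32_t distance;
} DistanceResult;

/* Same job as the XHACK version: absolute difference of x and y.
   This sketch never fails, so error always stays 0. */
DistanceResult calculate_distance(int32_t x, int32_t y) {
    DistanceResult r = {0};
    r.distance = (x > y) ? (x - y) : (y - x);
    return r;
}
```

With the two-parentheses syntax the compiler would be generating that struct for me, and the caller could unpack it straight into named variables.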

onreturn (defer)

While I prefer manual memory management, there's certain things that make it very annoying. Namely when you allocate a bunch of things and have to free them in the same function:

do_stuff :: () (error:ERRNUM) {
	foo := alloc(100*sizeof(Coolio));
	if (!foo) {
		return 1;
	}
	bar := new_array(500);
	if (!bar) {
		free(foo);
		return 2;
	}
	bizzle := new_dooger();
	if (!bizzle) {
		free(foo);
		free_array(&bar);
		return 3;
	}
	e := do_a_thing();
	if (e) {
		free(foo);
		free_array(&bar);
		free_dooger(&bizzle);
		return 4;
	}
	
	printl("Yay!");

	free(foo);
	free_array(&bar);
	free_dooger(&bizzle);
	return 0;
}

Another way to structure the code is to nest it, though I prefer not to write code like this because of the ridiculous stacks it creates which are also annoying to read and to modify:

do_stuff :: () (error:ERRNUM) {
	error : ERRNUM = 0;
	foo := alloc(100*sizeof(Coolio));
	if (!foo) {
		error = 1;
	}
	else {
		bar := new_array(500);
		if (!bar) {
			error = 2;
		}
		else {
			bizzle := new_dooger();
			if (!bizzle) {
				error = 3;
			}
			else {
				e := do_a_thing();
				if (e) {
					error = 4;
				}
				else {
					printl("Yay!");
				}
				free_dooger(&bizzle);
			}
			free_array(&bar);
		}
		free(foo);
	}
	return error;
}

This kind of code is extremely annoying to maintain, and also confusing to read. If you could just tell the program "do this whenever you return", the code would look like this:

do_stuff :: () (error:ERRNUM) {
	foo := alloc(100*sizeof(Coolio));
	if (!foo) return 1;
	onreturn free(foo);

	bar := new_array(500);
	if (!bar) return 2;
	onreturn free_array(&bar);

	bizzle := new_dooger();
	if (!bizzle) return 3;
	onreturn free_dooger(&bizzle);

	e := do_a_thing();
	if (e) return 4;

	printl("Yay!");
	
	return 0;
}

You're significantly less prone to making errors, the code is more clear, and it's much more comfortable to write. I don't know what the actual implementation of this would look like and what the potential restrictions would be (such as using it inside a loop), but I would be happy enough if all this did was tell the compiler to copypaste the onreturn contents before all following returns.

Perhaps another keyword like onscopeout can be used to do the same when you exit the current scope. This way you could use this functionality inside if cases and loops. Perhaps onreturn doesn't even make sense, and onscopeout should be the only way to do it.

for (x:=0; x<100; x++) {
	bizzle := new_dooger();
	onscopeout free_dooger(&bizzle);
	
	foo := alloc(100*sizeof(Coolio));
	if (!foo) continue;
	onscopeout free(foo);
	
	// do other stuff
}

Anyway, I've never made a programming language so I don't know how this would work exactly, but this is the general idea of what I want in a language. Some languages like Go call this functionality "defer", though personally I find that word weird and would prefer to use something like "onreturn" which more clearly describes what it does.
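For completeness, the usual way to approximate this in plain C is a single cleanup label at the end of the function. Here is a minimal sketch. The function name do_stuff_c and the fail_at parameter are made up so that the error paths can be exercised, and malloc stands in for the various allocators used above.

```c
#include <stdlib.h>

/* Returns 0 on success, or the number of the step that failed.
   `fail_at` is an invented knob that forces a given step to fail. */
int do_stuff_c(int fail_at) {
    int   error = 0;
    char *foo = NULL;
    char *bar = NULL;

    foo = malloc(100);
    if (!foo || fail_at == 1) { error = 1; goto cleanup; }

    bar = malloc(500);
    if (!bar || fail_at == 2) { error = 2; goto cleanup; }

    if (fail_at == 3) { error = 3; goto cleanup; }

    /* the actual work would go here */

cleanup:
    /* free(NULL) is a no-op, so one exit path covers every case. */
    free(bar);
    free(foo);
    return error;
}
```

It avoids both the repeated frees and the nesting, but the cleanup list still lives far away from the allocations and has to be kept in sync by hand, which is the part onreturn would fix.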

Zero initialization by default

I am very much in the "zero initialization" camp of things. When you create a variable or an array, it should all be initialized to 0 by default. I hate bugs so I always have to do that manually:

Foo thing = {0};

It is pretty tiresome to have to do this because I want to do it every single time I do anything at all. Instead, in XHACK it is the opposite. If you want a variable to NOT be initialized for some deliberate performance reason, you have to do something like this:

thing : Foo = ---;

I saw this syntax in Jai and I think it works fine, I haven't thought about it further than that at all. I mostly don't care what the non-initialized syntax looks like because it's extremely rare for me to want it. It could even be thing : Foo @no_initialization;.

Cleaner macros

I use macros in C a bunch, but I don't actually like them. A lot of the things I use macros for could be replaced with better features. One of my most commonly used macros however is the following:

#define STRING_FROM_STRINGS(n, ...) \
	String n = {0}; \
	{ \
		String item[] = { __VA_ARGS__, ((String){0}) }; \
		int i = 0; \
		while (item[i].data) { \
			n.length += item[i].length; \
			i ++; \
		} \
	} \
	n.data = alloca(n.length+1); \
	{ \
		String item[] = { __VA_ARGS__, ((String){0}) }; \
		int pos = 0; \
		int i = 0; \
		while (item[i].data) { \
			mem_copy_to(n.data+pos, item[i].length, item[i].data); \
			pos += item[i].length; \
			i ++; \
		} \
		n.data[n.length] = 0; \
	}

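This has to be done as a macro because what it does is take strings as input and combine them into a stack-allocated string. A function couldn't do that, since memory from alloca is released when the function that called it returns: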

STRING_FROM_STRINGS(foo, "Hello ", "world ", "or ", "whatever.")
printl("(", foo, ")"); // (Hello world or whatever.)
// no need to free(foo) since foo is allocated on the stack

I use this a lot when I want to combine file paths or parse file names and the like. The macro definition however is extremely ugly, unwieldy, hard to read, error prone, stupidly formatted, etc. All I want is to combine strings on the stack. Anyway, in XHACK it would look something like this:

STRING_FROM_STRINGS $ (args : ... String) {
	length : i32;
	for (i:=0; i<args.count; i++) {
		str : &String = args[i].data;
		length += str.length;
	}
	@left_hand.data = alloca(length);
	@left_hand.length = length;
	pos : i32;
	for (i:=0; i<args.count; i++) {
		str : &String = args[i].data;
		mem_copy_to(@left_hand.data+pos, str.length, str.data);
		pos += str.length;
	}
}

There's several things to say about this. Firstly $ is used instead of :: when defining a macro. Both macro arguments and variadic arguments can be type checked, you can give this macro any number of strings, but only strings.

Unlike C macros, macros in XHACK are more like inlined functions with some extra abilities. For one the special @left_hand variable is available. Basically, if you do the following, @left_hand refers to path:

path : String = STRING_FROM_STRINGS("foo", "bar");

Maybe you wonder what "String" is, wasn't there supposed to be a special array+string system? Well, I haven't thought about all the details regarding the array system proposed earlier, and this is a particularly tricky one. There's no way to define the type since arrays are essentially 2 separate variables that are linked with metaprogramming, so in order to work with variable arguments, there would HAVE TO be a type or some way to identify it. Instead of answering this I'll distract you by talking about variable arguments:

Variable arguments

In C, functions can have variable arguments, but it basically sucks and nobody uses it willingly. The problem is that C doesn't give you any information about what was sent into the variadic function. My current thought is that there should be some kind of type identifier:

do_thing :: (args : ...) () {
	for (i:=0; i<args.count; i++) {
		if (args[i].typeid == @typeid(i32)) {
			printl("Value is an int: ", *(&i32)args[i].data);
		}
		else if (args[i].typeid == @typeid(f32)) {
			printl("Value is a float: ", *(&f32)args[i].data);
		}
	}
}

do_thing(155, 6.6, dingo);

If you expanded the above into user code, I imagine it would look something like this:

Arg :: struct {
	typeid : TYPEID;
	data : &void;
}

do_thing :: (args : @array(Arg)) () {
	for (i:=0; i<args.count; i++) {
		if (args[i].typeid == @typeid(i32)) {
			printl("Value is an int: ", *(&i32)args[i].data);
		}
		else if (args[i].typeid == @typeid(f32)) {
			printl("Value is a float: ", *(&f32)args[i].data);
		}
	}
}

a : i32 = 155;
b : f32 = 6.6;

args : @inline_array(Arg, 3);
args[0].typeid = @typeid(a);
args[0].data = &a;
args[1].typeid = @typeid(b);
args[1].data = &b;
args[2].typeid = @typeid(dingo);
args[2].data = &dingo;

do_thing(args);

You can use @typeid() to get the ID of any given type and compare it to Arg.typeid. Type IDs are just integers, each type receives their own automatically.

The actual syntax here isn't too important, it doesn't have to be an array that goes into the function. There's probably better ways to implement the functionality of variable arguments in other languages, but this is how I would approach it if I made my own language since I don't know what the "right way" is.

In this page I use the printl function:

printl("Hello ", 500, "!!");

This is how I would do 99% of my prints, but it's impossible in C. It simply prints each argument in order. If the argument is a number, it prints it as a number. If it's a struct, it prints some kind of object representation like { x=12, y=15 }. The "L" in "printL" refers to "line", it just adds a new line character after the last argument so you don't have to type \n every time you print something. Of course traditional printf would still be available.
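The expanded form above can be written in C today if you are willing to build the argument array by hand, which is of course the annoying part the language should do for you. A rough sketch, with every name in it (TypeId, Arg, format_args) invented for the example. It writes into a buffer instead of printing so the result is easy to check.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { TYPEID_I32, TYPEID_F32, TYPEID_STR } TypeId;

/* One tagged argument: a type id plus a pointer to the value. */
typedef struct {
    TypeId      type_id;
    const void *data;
} Arg;

/* Formats every argument in order into `out`, which is always
   null-terminated when cap > 0. Returns the number of characters
   written, not counting the terminator. */
size_t format_args(char *out, size_t cap, const Arg *args, size_t count) {
    size_t pos = 0;
    if (cap == 0) return 0;
    out[0] = '\0';
    for (size_t i = 0; i < count && pos + 1 < cap; i++) {
        int n = 0;
        switch (args[i].type_id) {
        case TYPEID_I32:
            n = snprintf(out + pos, cap - pos, "%d",
                         (int)*(const int32_t *)args[i].data);
            break;
        case TYPEID_F32:
            n = snprintf(out + pos, cap - pos, "%g",
                         (double)*(const float *)args[i].data);
            break;
        case TYPEID_STR:
            n = snprintf(out + pos, cap - pos, "%s",
                         (const char *)args[i].data);
            break;
        }
        if (n < 0) break;                /* encoding error, stop early */
        pos += (size_t)n;
        if (pos >= cap) pos = cap - 1;   /* output was truncated */
    }
    return pos;
}
```

A printl built on this would just write the buffer out and add the newline. What C can't do is fill in the Arg array automatically at the call site.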

Syntactic methods

There seems to be a strange war between object oriented people and low level programmers regarding this:

foo : Thing = get_thing();
foo.do_stuff(500);
foo.do_things("Hello");

Some programmers tell you that you should do the following instead:

foo : Thing = get_thing();
do_stuff(foo, 500);
do_things(foo, "Hello");

While the idea of "object oriented programming" makes me wince, I think there's a lot of value in the syntax, both for conciseness and in terms of organization. If you have an array, it's very clearly an object that you want to interact with, it makes perfect sense to structure such code as array.append(1). It's also very helpful when you enter someone else's codebase since text editors can help you find functions that are related to things.

So what I want is simply this syntax. It doesn't "do" anything, it's just a normal function with different syntax. You would also define it like a normal function:

Thing :: struct { ... }

Thing.do_stuff :: (this:@this, x:i32) () { ... }

I'm not sure exactly how the this variable should be defined, I'm inclined to believe, for various reasons, that it SHOULD be inserted into the arguments manually. That way you can remove it if it isn't necessary, lets you change the name to whatever you want, it looks more similar to how the function actually works, and makes it more consistent if the function is exported to a system that doesn't support the method syntax (for example calling this function from C).

One of the problems I have with the language that Jonathan Blow is working on (my current top candidate for the language I'll move to from C) is that he seems to have exactly this war mentality, that this syntax is somehow wrong and programmers shouldn't use it, thus he won't add it to the language. Admittedly I don't know the language very closely so I don't know exactly if he has added something like this or not, but I recall hearing him say that it's the wrong way to do things.

Customizability

On the note of programmer wars and opinions, one of the things I want is the ability to customize the language. I believe that it's not the programming language's job to decide what's right and what's wrong, that's the programmer's job, so I want to be able to customize the language.

For example in C++ it is impossible to automatically cast void pointers, you HAVE TO manually cast them every single time and there's no way to change that, unlike in C where you can. This doesn't make any sense to me because the whole point of a void pointer's existence is to be eventually changed into another type, there's no reason to cast it manually because being cast is its purpose in life. And yet, you can find more than plenty of C++ fans who will defend this behavior to the death. They think it is more secure somehow but consistently fail to provide an example where it would save your ass.
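To be specific about the behavior I mean, here is a tiny C sketch. The function name make_counters is invented for the example. The assignment from calloc compiles as-is in C, while a C++ compiler rejects the same line unless you add a cast.

```c
#include <stdint.h>
#include <stdlib.h>

/* Valid C: calloc returns void*, which converts to int32_t* implicitly.
   The same line is an error in C++ without an explicit cast. */
int32_t *make_counters(size_t count) {
    int32_t *counters = calloc(count, sizeof *counters);
    return counters;   /* NULL if the allocation failed */
}
```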

There's a lot of things that programmers are divided on, so I want the programmer to decide how something should work. If you think void pointers should be cast manually, then you can just turn on that behavior. If you think all integer types should be cast manually, you can turn that on too. If you think converting an integer into a bigger one is ok but converting it into a smaller one should require a manual cast, you can configure the language to work that way.

All of this functionality, along with all compiler settings, should be in a build file. You don't give settings to the compiler, instead you start the whole compiler from a build file that tells it everything it needs to know. Perhaps you can give the compiler customizable variables, which can then be interpreted in the build file however you want, similar to the -D option in GCC.

It should be possible to add settings into code files too, that way you can change settings without breaking any libraries, since the libraries depend on their own settings.

Metaprogramming

There's infinite things you could use metaprogramming for, I don't even remember everything that I've wished I could do.

One thing that always stands out in my mind though is profiling and tracking the program call stack. If I could tell the program to insert a little code snippet at the beginning and end of all functions, I could make performance profiling systems without using any external tools or having to mess with my code, it would just be a library that can be imported into any project and it just works. I could also print the current callstack whenever I want just to check something or to debug without a debugger. If I could create and initialize an array at compile time based on how many functions there are, I could make the profiling more efficient.

Compile-time initialization is another thing I can never get enough of. There's many things that the program wouldn't have to do at all if I could make a metaprogram that takes some inputs, and outputs source code or modifies a part of the program. This part of metaprogramming is very appealing to me because it's essentially deleting unnecessary parts of the program and making it faster, it allows you to write simple APIs that get expanded by the metaprogram into something that would be way too unwieldy to do by hand or would otherwise have to be done by the program at runtime.

I don't know exactly what metaprogramming should look like, but I think you'd need something similar to what Jai supposedly has: a scripting language that lets you print out code. Something like this:

@compiletime {
	x : @inline_array(i32) = [1, 2, 3, 4];
	for (i:=0; i<x.count; i++) {
		@pastecode {
			printl("Value = ", @var(x[i]) * 100);
		}
	}
}

Which will get expanded into this:

printl("Value = ", 1 * 100);
printl("Value = ", 2 * 100);
printl("Value = ", 3 * 100);
printl("Value = ", 4 * 100);

Maybe it's possible to put compile time scripts into macros, since the macro gets expanded anyway, the compile time script within a macro would expand into different code depending on how you call the macro.

foo $ (x:@array(i32), y) {
	@compiletime {
		for (i:=0; i<x.count; i++) {
			@pastecode {
				printl("Value = ", @var(x[i]) * y);
			}
		}
	}
}

x : @inline_array(i32) = [1, 2, 3, 4];
foo(x, 100);

These examples look like something that the compiler would do anyway without all these @compiletime shenanigans, but what if you put if/else conditions in it? What if you want to parse a string and output code based on what you find in it? What if you want to have a global array and insert data into it at compile time based on some other inputs and parameters? What if I want to read a file and do something based on what's in it? Or what if I want to embed that file into the executable somewhere? What if I want to add secrets into my game that can't be datamined out of the executable, so I want to encrypt some text in the program?

Furthermore, I want the ability to loop through struct and enum members and do something based on those.

I also want the ability to add new enum members through metaprogramming, for example different parts of the program should be able to add their own error values to a single error number enum. That way you can not only have a unique error value for every error in the program and can pass it along however far you want without it losing its meaning, but you can also get the name of the error with metaprogramming:

ERRORNUM :: enum {
	NONE;
};

something :: () (ERRORNUM) {
	@add_enum_member(ERRORNUM, SOMETHING_NOT_IMPLEMENTED);
	
	return .SOMETHING_NOT_IMPLEMENTED;
}

main :: () () {
	e := something();
	printl("Error when calling something(): ", @member_name(ERRORNUM, e)); // Error when calling something(): SOMETHING_NOT_IMPLEMENTED
}

@add_enum_member is a compile-time action and isn't part of the function, but you may want to put it into the function for clarity about what it's related to.
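The closest I can get to this in C is the "X-macro" trick, where one list is expanded twice: once into the enum and once into a table of names. A minimal sketch, with all of the names in it invented for the example:

```c
/* One list, expanded twice: once into the enum, once into the name
   table. The error names are placeholders for this sketch. */
#define ERROR_LIST(X) \
    X(ERR_NONE) \
    X(ERR_SOMETHING_NOT_IMPLEMENTED) \
    X(ERR_OUT_OF_MEMORY)

#define AS_ENUM(name)   name,
#define AS_STRING(name) #name,

typedef enum { ERROR_LIST(AS_ENUM) ERR_COUNT } ErrorNum;

static const char *const error_names[] = { ERROR_LIST(AS_STRING) };

/* Returns the identifier of an error value as text. */
const char *error_name(ErrorNum e) {
    if ((int)e < 0 || e >= ERR_COUNT) return "UNKNOWN";
    return error_names[e];
}
```

It gives you the names, but every error still has to be listed in that one central macro, so a module can't add its own values from its own file. That is the part @add_enum_member is meant to solve.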

This also makes it easier to plug in modules to other things. For example one thing I've been rewriting over and over for a long time is my UI system because I can't figure out how to do it well. A lot of my problems stem from lack of metaprogramming capabilities in C. One of the many things I want is to be able to write modules for it, for example adding a color selector module, but only include it in my program if I need it. For a program like FilenameWrangler, I will want a special monospace row-based text editor of sorts, I want to write a module like that for my UI system without having to include it into every program I use my UI system for.

More

I'm sure there's more I'm forgetting about, but I'm getting tired of writing this page.

Switch cases should break by default. Obviously. This is obviously what should happen in a switch case so please stop making me break every case manually.

I also like the idea of removing the switch keywords, this is how I prototyped it:

if (x) ... {
... 1: printl("one");
... 2: printl("two");
... 3: printl("three");
... : printl("wut?");
}

People seem to hate static variables for some reason, but I don't. However I would rename "static" into "persist" or "global" to better reflect what it's for. Sometimes I just want a persisting variable, but since its functionality is internal to a single function, making it a global variable is just plain worse than making a local variable whose value persists.
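A small C example of the kind of thing I mean, with the function name next_id invented for the sketch:

```c
/* Hands out increasing ids. The counter persists across calls but is
   invisible outside this function. */
unsigned next_id(void) {
    static unsigned counter = 0;
    counter += 1;
    return counter;
}
```

Nothing else in the program can touch counter, which is exactly why I'd rather have this than a global.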
