
Improving C

C is my favorite programming language, but it would be more accurate to describe it as "the language I hate the least". This page has some changes to C that would allow me to confidently call it my favorite.

A lot of people have tried to "fix C" by making another language, but all of them (that I've seen) change too much from what kind of language C is, or impose some arbitrary restrictions based on personal preference. It's fine to make something different, but then you're not "fixing C" anymore, you're making a different language. I think that C is almost all good; there are just some nuances that make it unnecessarily clumsy or annoying, and most of those could be fixed without changing what C is or even breaking compatibility with the current C.

C++ has some advantages over C, but it also has some unhelpful regressions: for example there's no way to automatically cast void pointers, you're forced to define struct members in the correct order, designated initializers are worse, etc. I also just feel less happy using a language that is so bloated with things that I don't need or want, and that is under the control of a committee that is completely detached from what I care about and keeps adding more and more to the pile of unwanted bloat.

Note: the actual wording and semantics are not important. For example below I propose always_inline as a keyword, but I don't actually care what it's called as long as it exists, it might as well be __always_inline__ or an attribute [[always_inline]] (since C23) or #inline_always or something else.

Some of these may require the compiler to make multiple passes, but I don't know of any good reasons for why that's a problem. It might be an issue for something like TCC, but from my perspective compilers for obscure meta use-cases should have their own restrictions where necessary, it's stupid to gimp one of the most popular languages in the world for everyone just because someone wants to compile C programs with their wrist watch.

"It would require big changes to the compiler" also isn't a valid argument. Compiler developers' laziness isn't a valid argument against making the language better.

"It would make the compiler slower" isn't a good argument because the compilers are already extremely bad and way slower than they should be. The unreleased Jai language is much more complicated than C, yet its compiler is orders of magnitude faster than C compilers. If you are concerned about compile time, you should be complaining about the compilers, not the language design.


Undeniable improvements

As far as I'm concerned, there is no valid argument against these changes.

Nested functions

void check_adjacent (int x, int y) {
int count = 0;
void check (int x, int y) {
...
count ++;
}
check(x, y-1);
check(x, y+1);
check(x-1, y);
check(x+1, y);
}

I do this kind of thing all the time, but it's a GCC compiler extension. You can use goto to do something vaguely similar, but it will always be super clumsy and error prone compared to nested functions.

Nested functions can also be sent elsewhere as function pointers, here's another pattern I use all the time:

void load_assets () {
void callback1 (char* file_path, bool is_folder) {
...
}
read_folder_contents("/assets/images/", callback1);

void callback2 (char* file_path, bool is_folder) {
...
}
read_folder_contents("/assets/things/", callback2);
}

I'm no Assembly pro, but as far as I know there's nothing remarkable or special about this as far as the CPU is concerned, so there shouldn't be any reason to be against this feature.

Automatic casting for struct literals and arrays

This is possible in C++, and I don't understand why it isn't in C.

struct Vec2f {
float x;
float y;
};

void test (Vec2f a, Vec2f b) {
...
}

void main () {
Vec2f v;

// Currently, you have to do this:
v = (Vec2f){5,22};
test(v, (Vec2f){1,2});

// You should be able to do just this:
v = {5,22};
test(v, {1,2});
}

You should also be able to do it with arrays.

void test (int count, int a []) {
...
}
void main () {
test(4, ((int[]){1, 2, 4, 5})); // Before
test(4, {1, 2, 4, 5}); // After
}

Allow dereferencing struct pointers with period

Vec2f* pos = get_pos();
pos->x = 123; // Before
pos.x = 123; // After

Here's some advantages:

As a cherry on top, this can be added to the current C language without breaking anything. The compiler already knows that this is a pointer, because it complains about it instead of just letting you do it. Using a period on a pointer is never valid today, so allowing it can't change the meaning of any existing code.

Optional stricter typedefs

I use a lot of IDs to manage and refer to things, but there's a problem:

typedef unsigned short Entityid;
typedef unsigned short Itemid;

void destroy_entity (Entityid id) {
...
}

void main () {
Itemid id = 1234;
destroy_entity(id); // You've created a bug, but the compiler won't complain.
}

This should not be possible. There's only one way to fix this: typedef a struct with 1 member that is the ID, because structs have stricter type checking. However, that's one of those stupid workarounds that you shouldn't have to do just because the language lacks obviously useful features that would allow you to do it in the obviously correct way.
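For reference, the struct workaround in current C looks roughly like the sketch below. The `value` member name, the `last_destroyed` variable and the `demo` function are made up for illustration; only the two ID names come from the example above.

```c
#include <stdint.h>

/* One-member wrappers: distinct types, so they can't be mixed up silently. */
typedef struct { uint16_t value; } Entityid;
typedef struct { uint16_t value; } Itemid;

/* Hypothetical stand-in for real game state. */
uint16_t last_destroyed = 0;

void destroy_entity(Entityid id) {
    last_destroyed = id.value;
}

void demo(void) {
    Itemid item = { 1234 };
    Entityid entity = { 42 };
    (void)item;
    /* destroy_entity(item);   would not compile: Itemid is not Entityid */
    destroy_entity(entity);
}
```

It works, but every use of the raw number now needs `.value`, which is the clumsiness a strict typedef would avoid.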

From language design perspective, fixing this could be as trivial as a new keyword:

typedef_strict unsigned short Entityid;
typedef_strict unsigned short Itemid;

This is ESPECIALLY important for enum values, although somehow it's less common for me to mix those up.

using

struct Vec2f {
float x;
float y;
};
struct Tree {
using Vec2f position;
};
Tree tree;
tree.x = 123; // Valid because x is inherited from Vec2f position.
tree.position.x = 123; // Also valid, the original member can still be used.

One of those things from C++ that is actually useful. I don't like the word "using" though. I'd prefer something more like #inherit.
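Current C (C11 and later) can get part of the way there with an anonymous struct inside an anonymous union. The following is only a sketch: the `height` member and `tree_demo` are hypothetical, and the promoted members have to be repeated by hand, which is exactly the duplication that `using` would remove.

```c
typedef struct { float x; float y; } Vec2f;

typedef struct {
    union {
        Vec2f position;           /* the original member */
        struct { float x, y; };   /* anonymous struct with the same layout */
    };
    int height;                   /* hypothetical extra member */
} Tree;

float tree_demo(void) {
    Tree tree = {0};
    tree.x = 123.0f;              /* writes through the promoted member */
    return tree.position.x;       /* reads the same storage via the original */
}
```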

Nested break and continue

for (x) {
for (y) {
if (y == 10) break; // Breaks the inner loop.
if (x == 10) break 2; // Breaks 2 loops, i.e. both.
}
}

With labels:

outer: for (x) {
inner: for (y) {
if (y == 10) break inner;
if (x == 10) break outer;
}
}

Labels already exist, but they're only used by goto. You can do it this way in current C:

for (x) {
for (y) {
if (y == 10) goto inner;
if (x == 10) goto outer;
} inner: ; // Before C23 a label must be followed by a statement, hence the empty one.
} outer: ;

It's pretty clumsy and ugly to play with labels this way though. The label needs to be on the other side of the braces for continue, and it interferes with your flow of programming when you have to change your way of thinking like this. This is what it looks like if you want to prepare labels for both continue and break:

for (x) {
for (y) {
...
continue_inner: ;
} break_inner:
...
continue_outer: ;
} break_outer: ;
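To make the goto version concrete, here's a small compilable sketch in current C. The function names and the flat grid layout are made up for illustration; the first function emulates "break 2" and the second emulates continuing the outer loop.

```c
/* Returns the flat index of the first cell equal to target, or -1. */
int find_first(const int *grid, int rows, int cols, int target) {
    int found = -1;
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < cols; x++) {
            if (grid[y * cols + x] == target) {
                found = y * cols + x;
                goto done;            /* acts like "break 2" */
            }
        }
    }
done:
    return found;
}

/* Counts the rows that contain no zero. */
int count_rows_without_zero(const int *grid, int rows, int cols) {
    int count = 0;
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < cols; x++) {
            if (grid[y * cols + x] == 0) goto next_row;  /* "continue outer" */
        }
        count++;
    next_row: ;                       /* empty statement needed before C23 */
    }
    return count;
}
```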

Just let me type break 2. Or alternatively, allow stacking breaks:

for (x) {
for (y) {
if (y == 10) break; // Breaks the inner loop.
if (x == 10) break break; // Breaks both loops.
if (x == 8) break continue; // Breaks the inner loop, and continues on the outer loop.
}
}

Arbitrary length character literals

I often do something like this:

int foo = *(int*)"help";

Exploiting string casting like this is dangerous and weird, and the compiler won't be able to tell when I make a mistake. But if I could just set the value as a character literal (which is precisely my intention), the compiler would know that my intent is to set a 4 byte value onto the integer:

int foo = 'help';

The size of the character literal should be the same as the type, i.e. both 'lol' and 'hello' should give a compiler error because int is 4 bytes.

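Currently, C only properly supports character literals with 1 character. Multi-character constants such as 'help' are accepted, but their value is implementation-defined and compilers tend to warn about them, so you can't rely on them. I can't think of any valid reason why the language couldn't support them properly. These should not be allowed to automatically convert into pointers, otherwise you might accidentally do foo('hello!!!') when you meant to do foo("hello!!!"), especially if you often use a language that allows single quoted strings.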
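In the meantime, a somewhat safer spelling of the pointer cast in current C is to copy the bytes with memcpy, which avoids the alignment and aliasing problems of reading a string through an int pointer. This is just a sketch: `fourcc` is a made-up helper name, and the compiler still won't check that the string has exactly 4 characters, which is the part the proposed literal would fix.

```c
#include <stdint.h>
#include <string.h>

/* Packs the first 4 bytes of tag into a 32-bit value, in memory order. */
uint32_t fourcc(const char *tag) {
    uint32_t value;
    memcpy(&value, tag, sizeof value);
    return value;
}
```

A call like `fourcc("help")` gives the same bytes as the cast would, without relying on how the string literal happens to be aligned.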

New built-in constants: __function_id__, __function_count__

__function_id__ returns a unique integer for each function.

__function_count__ returns an integer for the total number of functions in the program.

These would make it easier to implement profiling and debugging functionality.

Custom string delimiters

When you define a string with a lot of quotes and/or new lines, it becomes a huge mess. The easy solution is custom string delimiters.

char* sometext = #string FOOBAR"Hello world,
this
is "a story" about
coding and stuff"
FOOBAR;

Ideally you should be able to control the indentation of the contents, so you can indent the string with the rest of your code. In PHP the indentation is based on where the closing marker sits (since PHP 7.3). Here it could be based on where the string starts, so as long as the content is indented to the same level as the #string tag, there would be no indentation in the string data.
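For comparison, the closest thing in current C is splitting the text into adjacent string literals, which the compiler concatenates. A sketch of the same text:

```c
/* Adjacent literals are joined into one string at compile time. */
const char *sometext =
    "Hello world,\n"
    "this\n"
    "is \"a story\" about\n"
    "coding and stuff";
```

The indentation problem goes away, but every quote and every newline still has to be escaped by hand, which is the mess a custom delimiter would remove.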

Better ways to inline code

I want easier and cleaner ways to inline a bunch of stuff, but neither macros nor inline functions do it well.

Macros can't return a value, don't do type checking, and having to escape new lines and parenthesize every variable is very ugly and annoying. Inline functions don't actually inline the code the way you'd expect (it won't be optimized as well as properly inlined code would), and they are not guaranteed to be inlined to begin with.

Here's some ideas that could help, there's more approaches but I won't bother listing everything.

Function attribute: static_always_inline

This function doesn't even exist in the program, it only serves to inline its contents wherever it's called from, before any kind of optimizations.

There are compiler extensions that always inline and give an error if they cannot do so, but I don't know if they exhibit the behavior I desire. There are also some stupid limitations to inlining that I don't understand: sometimes the compiler just gives an error and refuses to inline for some arbitrary reason, because it thinks the stack of inlined functions is too long or something..? Better error messages that actually explain what the problem is would be great.

I've had functions fail to inline even though they literally just redirect to another function and have no other content. This is what I mean when I say that inlining doesn't work correctly. It's extremely obvious what should happen here: the original function call should be replaced with the redirected one as if you never called the original.
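The compiler extension mentioned above looks like this in GCC and Clang. This is a sketch only: `ALWAYS_INLINE` and the two functions are made-up names, and it makes no promise about whether the inlining happens before other optimizations the way this section asks for.

```c
#if defined(__GNUC__) || defined(__clang__)
/* GCC/Clang extension: inline regardless of the optimizer's heuristics. */
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
/* Fallback for other compilers: plain inline, which is only a hint. */
#define ALWAYS_INLINE static inline
#endif

ALWAYS_INLINE int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* A thin wrapper; the call to clamp_int is expanded in place. */
int clamp_percent(int v) {
    return clamp_int(v, 0, 100);
}
```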

Compile-time functions

Just like you can mark a function as "inline", you should be able to mark it as "compile_time". Such a function would be guaranteed to produce its value at compile-time. If it cannot do that, it gives a compile error.

constexpr from C++ pretends to do this, but a constexpr function may still run at runtime. As far as I know, the closest thing is consteval from C++20, which requires every call to be evaluated at compile-time.

Multi-line macros

Macros get real ugly when they're long because you have to use backslashes to escape new lines. I don't care how, but there should be a way to define multi-line macros, maybe something like this:

#define_start add_foobar(n)
n += 100;
n *= 2;
#define_end

More flexible struct manipulation

These aren't features that I specifically "want", but I feel that it would be more natural if the language worked this way. I also have a feeling that, if they were possible, I would find some uses for them.

Identical anonymous structs are compatible (and work in function arguments and return values)

void test (struct {int x; int y;} foo) {
...
}

void main () {
test({.x = 123});

struct {
int x;
int y;
} foo = {2, 50};

test(foo);
}

This would benefit greatly from "Automatic casting for struct literals" (see above).

Get struct member directly from return value

Vec2f get_pos () {
return {1, 2};
}
void main () {
float x = get_pos().x;
}
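For what it's worth, the member access itself appears to compile already in current C when the function returns a named struct type; what's missing is the bare `return {1, 2};` and the anonymous struct compatibility. A sketch, with `get_pos_x` as a made-up helper:

```c
typedef struct { float x; float y; } Vec2f;

Vec2f get_pos(void) {
    return (Vec2f){1.0f, 2.0f};   /* current C still needs the type name here */
}

float get_pos_x(void) {
    return get_pos().x;           /* member access directly on the returned value */
}
```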

If you combine both of the features above, you could make your own implementation of multiple return values without requiring explicit support from the language for it:

struct {Vec2f pos; int status;} get_pos () {
if (error) return {.status = 123};
return {.pos = {10,25}};
}
void main () {
struct {Vec2f pos; int status;} result = get_pos();
if (result.status) {
printf("Failed!\n");
}
else {
do_stuff(result.pos);
}

Vec2f pos = get_pos().pos; // I only care about the result, not about the status.

int status = get_pos().status; // I only care about whether it succeeded, not about the result.
}

I don't know how useful this would be in practice, but I feel like there may be some situations where I'd want to use it. It's hard to know because I can't use it and get experience with it. But it's important to note that both of the features that would make this possible are features that, in my opinion, should already be how the language works. This particular use case doesn't necessarily have to be a good idea, it still makes sense to make it possible.

This would be even more clean and practical if you could have an anonymous variable on the receiving end:

struct {Vec2f pos; int status;} = get_pos();
if (status) {
printf("Failed!\n");
}
else {
do_stuff(pos);
}

That might be going too far though... or would it really? I feel like this is still very C-like, it's just more syntactic freedom for interacting with structs.


Obvious improvements

These are things that would obviously improve C, but I can imagine arguments (however weak) against them.

Struct comparison with ==

== should work for structs as long as they have the same type:

Thing foo = {};
Thing bar = {};
if (foo == bar) {
...
}

I deal with position/size/offset values a lot, being able to just check if 2 positions are the same with == would be really nice.

The main argument I can think of against this is: the reason you can check equality on ints is that there are CPU instructions for it. However, some CPUs also have instructions for comparing arbitrary amounts of memory (the kind of thing memcmp does), so I don't think CPU instructions are a good reason to be against this either. This isn't an undeniable improvement because you could argue that C should mirror what the instruction set can do, and I'm not sure how ubiquitous those instructions are and on what CPU architectures.
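Until then, the usual workaround is a hand-written comparison function per struct. The sketch below uses a made-up `Thing` layout; it compares member by member rather than with memcmp because the padding bytes that typically follow `tag` have unspecified contents, so a raw byte comparison could report a difference between two equal values.

```c
#include <stdbool.h>

/* Hypothetical layout; most ABIs insert padding between tag and x. */
typedef struct {
    char tag;
    int x;
    int y;
} Thing;

/* What "foo == bar" would presumably mean: all members are equal. */
bool thing_equal(Thing a, Thing b) {
    return a.tag == b.tag && a.x == b.x && a.y == b.y;
}
```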

Variable assignment must be its own statement

if (foo = bar) { ... }
if (foo = getbar()) { ... }

This shit shouldn't be allowed and I won't pretend to humor any code golfing arguments for it. This is a typo (== vs =) I've made multiple times and the compiler won't tell me about it because it's valid syntax. Runtime bugs caused by this kind of crap make me feel like I'm programming in JavaScript.

The only reason this isn't in the "Undeniable improvements" list is backwards compatibility. People have written code using this stupid syntax and changing it would break all that code.

Compilers will warn about this depending on how they're configured, but this shouldn't be allowed in standard C and anyone who thinks otherwise is objectively wrong.

Default struct values

struct Thing {
int x = 14;
int y = 50;
};
Thing foo; // foo.x and foo.y are undefined.
Thing bar = {}; // bar.x is 14 and bar.y is 50.

Think of it this way: when you initialize a struct with {}, the default values are invisibly added inside the brackets unless you define them yourself.

This would greatly help with anonymous struct variables:

// Before.
struct {
int x;
int y;
} foo = {
.x = 14,
.y = 50
};
// After.
struct {
int x = 14;
int y = 50;
} foo = {};

This requires C23 because in C23 {} will initialize the struct values, but before C23 you had to do {0} which effectively has the same meaning as {.x=0}.

Note: it might be good to have a way to zero-initialize without the defaults, but I don't know what that would be. You could use memset but it's not ideal.

Zero-initialization

Foo test; // All data initialized to zero.
Foo test = __no_init__; // Uninitialized, undefined behavior.

Initializing data to 0 by default is much more error-proof than having to do it yourself. I've had several hard-to-track bugs that were caused by me forgetting to initialize a variable somewhere in the depths of my codebase. It doesn't happen much anymore because I found a compiler option to warn me about it, but you shouldn't have to find compiler flags for this.

Leaving a variable uninitialized should be explicit behavior that you do for performance reasons.

This isn't an undeniable improvement because somehow I feel like this is, in some parts, just an opinion. I do think that zero-initialization is a more desirable default behavior though, because that way you can't get difficult-to-find runtime errors accidentally.

Making this change would not break compatibility with existing C software, in fact it may fix a bunch of bugs from existing software. It might reduce performance a tiny bit too though, but it's very hard to say how much, especially since compilers can optimize some of the zero-initialization away.

This would make "Default struct values" above even better because you wouldn't need the = {} part.

16-bit floats (half floats)

I don't know how complete/consistent the support on most CPUs is for 16-bit floats, but most CPUs and GPUs do support these at least to some extent. I'd use these a lot if they were easily usable.

Apparently C++ supports them now (std::float16_t in C++23, where the platform provides it).

Enums with bitfield values

enum [[bitfield]] {
FOO, // 0x01
BAR, // 0x02
ZYZ, // 0x04
XOR, // 0x08
};

Self-explanatory. You can of course define the values of an enum manually, but that's kind of annoying and error-prone.

This isn't an undeniable improvement because it's kind of a minor nitpick.
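The manual version in current C is usually written with shifts, which at least makes the pattern harder to get wrong than typing out hex values. A sketch, with a made-up `has_flag` helper:

```c
enum {
    FOO = 1 << 0,  /* 0x01 */
    BAR = 1 << 1,  /* 0x02 */
    ZYZ = 1 << 2,  /* 0x04 */
    XOR = 1 << 3,  /* 0x08 */
};

/* Returns nonzero when the given flag bit is set in flags. */
int has_flag(unsigned flags, unsigned flag) {
    return (flags & flag) != 0;
}
```

Inserting a new value in the middle still means renumbering every shift after it by hand, which is the error-prone part.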

Forced optimization settings

[[always_optimize]] static void very_costly_function () {
...
}
void main () {
int i = 0;
#optimize_start(3)
while (i < 100) {
do_stuff(i);
i ++;
}
#optimize_end
}

If you have some incredibly hot code path, you might want the compiler to always optimize it, even in debug builds. I have several of those.

If you added localized optimization settings, you might as well allow the opposite too: you might want a certain part of your code to be pure and not optimized.

Manual function overloading

Function overloading means that 2 functions can have the same name as long as the arguments are different. I actually don't think function overloading should be an inherent part of C the way it is in C++. The fact that there's a "true version" of each function has its strengths, so the overloading should just be added on top of that.

Manual overloading would be a great way to get all the benefits.

void foo_int (int x) {
...
}
void foo_float (float x) {
...
}
#overload(foo, foo_int);
#overload(foo, foo_float);

void main () {
int x = 123;
float y = 1.5;
foo(x); // Calls foo_int
foo(y); // Calls foo_float
}

I could see arguments against this ("you don't know what's really being called"), but most of them went out the window as soon as C added _Generic which enables a worse way to do exactly this. The following is currently valid C code:

void foo_int (int x) {
...
}
void foo_float (float x) {
...
}
#define foo(x) _Generic(x, \
float:foo_float, \
default:foo_int)(x)

void main () {
int x = 123;
float y = 1.5;
foo(x); // Calls foo_int
foo(y); // Calls foo_float
}

You could argue that _Generic is better because you can see all the overloads from the same location, but that's also what makes it worse and less useful than true function overloading. Firstly you can't just include a bunch of data storage libraries with their own overloads of append(), and secondly _Generic doesn't allow you to make overloads with varying number of arguments (well, technically you can, I invented a convoluted workaround macro for it).
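For the varying-number-of-arguments case, one well-known preprocessor trick (not necessarily the workaround mentioned above) is to pick the target function by counting arguments. All names in this sketch are hypothetical:

```c
int area_square(int side) { return side * side; }
int area_rect(int w, int h) { return w * h; }

/* The arguments push the candidate names to the right, so the third slot
   holds area_square for one argument and area_rect for two. */
#define AREA_PICK(a, b, name, ...) name
#define area(...) \
    AREA_PICK(__VA_ARGS__, area_rect, area_square, unused)(__VA_ARGS__)
```

With this, `area(3)` expands to `area_square(3)` and `area(3, 4)` to `area_rect(3, 4)`. It shares the main weakness of _Generic: every overload has to be listed in one place.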


Advanced improvements

These are questionable, start to steer away from C-like language style, and it's hard to say what complications may arise from them.

Enum value namespaces and inferring

If you use enums a lot, having to prefix all the values is really annoying because you'll almost immediately have to start using such long prefixes that they're longer than the actual member name.

typedef own_namespace enum {
BIG,
SMALL,
MEDIUM,
} FACE_SIZE;

void do_stuff (FACE_SIZE size) {
...
}

FACE_SIZE size = BIG; // Error: BIG not defined.
FACE_SIZE size = .BIG; // Valid, BIG is inferred from the type FACE_SIZE.
do_stuff(.SMALL);

This might have unforeseen consequences that I'm not seeing, mainly because this is similar to struct member initialization:

struct Thing {
FACE_SIZE a;
FACE_SIZE b;
};

Thing foo = {1, 2};
Thing foo = {.BIG, .SMALL};
Thing foo = {.a = .BIG};

Anonymous enum variables

struct Dude {
enum {
HAPPY,
ANGRY,
SAD,
} emotion;
};

Dude dude;
dude.emotion = .HAPPY;
dude.emotion = Dude.emotion.HAPPY; // Maybe this if the above won't work..?

Very often I want some simple state that is tied to a specific struct or function like this and doesn't need a separate enum.

This isn't an undeniable improvement because there might be syntax complications that I haven't considered. This probably also requires "Enum value namespaces and inferring" mentioned above.

Union member priority, alternate union definitions

union Box {
struct { int x; int y; int w; int h; };
struct { Vec2i pos; Vec2i size; };
};

Box foo = {10,10,40,60}; // ERROR!
Box foo = {.x=10, .y=10, .w=40, .h=60}; // Valid, but annoying to type.
Box foo = box(10,10,40,60); // I end up creating macros like this a lot, but it feels very unnecessary and doesn't always work because sometimes you have to do ((Box){}) but sometimes you can't use ((Box){}) and have to use {} instead, but the macro can't pick between one and the other.

I use unions like this a lot when defining rectangles and boundaries and vectors and such, but it makes defining the value a pain in the ass. You could probably solve it by giving one union member a priority, so if you don't define a name, it picks that one.

union Box {
#priority struct { int x; int y; int w; int h; };
struct { Vec2i pos; Vec2i size; };
};

Box foo = {10,10,40,60}; // Sets values for the #priority struct.

This isn't an obvious improvement because I feel like there's a smarter way to solve the problem, or a smarter way to define structs so that you don't need these weird unions, for example by using aliases or manual offsets:

struct Box {
int x;
int y;
int w;
int h;
#offset_of(.x) Vec2i pos;
#offset_of(.w) Vec2i size;
};

OR it could just always prioritize the first option in the union.

#defines with correct scopes

When you #define something inside a function, it stays defined for the rest of the file (and for everything included after it), not just inside that function. That's obviously incorrect behavior; I think the reason it happens is that the preprocessor isn't aware of functions or scopes. I don't know what the solution is, all I know is that the current behavior is obviously incorrect and should be corrected.
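In current C the only way to keep a macro local is to #undef it by hand. A small sketch with made-up names; the second function shows how a forgotten #undef would be picked up silently:

```c
int scale_width(int w) {
#define FACTOR 3          /* meant to be local to this function */
    int result = w * FACTOR;
#undef FACTOR             /* without this, FACTOR leaks into everything below */
    return result;
}

int scale_height(int h) {
#ifdef FACTOR
    return h * FACTOR;    /* only reached if the #undef above is forgotten */
#else
    return h * 2;
#endif
}
```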

#on_leave

void main () {
void* data = malloc(1000);
#on_leave{ free(data); }

if (x) return; // free(data) is inserted here.

for (int i=0; i<100; i++) {
Thing* thing = get_thing();
#on_leave{ release_thing(thing); }

if (x) continue; // release_thing(thing) is inserted here.
if (y) break; // release_thing(thing) is inserted here.
if (z) return; // release_thing(thing) and free(data) are inserted here.

// release_thing(thing) is inserted here.
}

// free(data) is inserted here.
}
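The closest idiom in standard C today is a single cleanup label that every exit path jumps to. The function below is a made-up example of that pattern, not a translation of the code above. (GCC and Clang also offer a `cleanup` variable attribute as an extension, which isn't used here.)

```c
#include <stdlib.h>
#include <string.h>

/* Writes src reversed into dst. Returns 0 on success, -1 on failure. */
int reverse_copy(const char *src, char *dst, size_t dst_size) {
    int status = -1;
    size_t len = strlen(src);
    char *scratch = malloc(len + 1);
    if (!scratch) return -1;               /* nothing to release yet */

    if (len + 1 > dst_size) goto cleanup;  /* early exit still frees scratch */

    for (size_t i = 0; i < len; i++) {
        scratch[i] = src[len - 1 - i];
    }
    scratch[len] = '\0';
    memcpy(dst, scratch, len + 1);
    status = 0;

cleanup:
    free(scratch);                         /* the manual "on leave" part */
    return status;
}
```

Every new early exit has to remember to use `goto cleanup` instead of `return`, which is the bookkeeping #on_leave would take over.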

#on_enter_function, #on_leave_function

This would make profiling and debugging related systems much easier to implement.

#on_enter_function{ printf("Hello! %s\n", __func__); }
#on_leave_function{ printf("Bye! %s\n", __func__); }

void testfunc (int x, int y) {
x += y * 2;
if (x > 1000) {
return;
}
printf("x=%i y=%i\n", x, y);
}

The code above would equate to the following:

void testfunc (int x, int y) {
printf("Hello! %s\n", "testfunc");
x += y * 2;
if (x > 1000) {
printf("Bye! %s\n", "testfunc");
return;
}
printf("x=%i y=%i\n", x, y);
printf("Bye! %s\n", "testfunc");
}

This would be especially powerful when combined with __function_id__ mentioned earlier in this page. You could automatically store information about function calls.

Ideally you would have some kind of tagging system that allows you to add arbitrary tags to functions, which can then be used to enable or disable these. I'm not sure exactly what that would look like though.

Metaprogramming through scripting

Imagine you could write a bit of C code that generates code. Something like this perhaps:

enum {
#comptime {
File file = read_file("things.txt");
while (1) {
char* word = read_word(file);
if (!word) break;
#paste_code(word)
#paste_code(",")
}
}
};

This code would read a file and write each word in that file as a member of the enum.

I can't do anything like this so I haven't had the opportunity to think of how exactly this could work. I think there's a couple newer languages that can do this, I haven't looked into how they do it though.
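The nearest thing current C offers is the X-macro pattern, where a list is written once as a macro and expanded several times. It can't read things.txt, so the list below is hard-coded and every name is hypothetical; it's a sketch of the technique rather than of the proposed feature.

```c
/* The single source of truth: one X(...) entry per word. */
#define THINGS(X) \
    X(APPLE)      \
    X(BANANA)     \
    X(CHERRY)

#define AS_ENUM(name)   THING_##name,
#define AS_STRING(name) #name,

/* Expands to THING_APPLE, THING_BANANA, THING_CHERRY, THING_COUNT. */
enum { THINGS(AS_ENUM) THING_COUNT };

/* Expands to "APPLE", "BANANA", "CHERRY". */
const char *const thing_names[] = { THINGS(AS_STRING) };
```

Adding a word to `THINGS` updates the enum and the name table together, but the list still has to live in C source rather than in a plain text file.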


Breaking changes

I wouldn't recommend these for C because they break compatibility too much, but they would nonetheless make a better language. If you made a C clone language, you would do these.

Pointer is a property of the type, not of the variable

int* foo, bar; // Correct.
int *foo, *bar; // Stupid and wrong.

typedef syntax is rotated

// Before:
typedef unsigned int u32;
typedef struct {
int x;
} Foo;

// After:
typedef u32 unsigned int;
typedef Foo struct {
int x;
};

This way is more readable in my opinion, especially when it comes to structs.

This also allows you to universally search your codebase for all type definitions by searching for "typedef Name".

String/array length

Strings would be significantly easier to use if they had a length. However, there isn't an obvious "fix" because there's lots of ways to use strings. You'd just be trading an imperfect string system for another (although perhaps less) imperfect one.

The same applies to arrays, it's a bit silly that you currently need to get the array size separately. But even more than strings, you often define the length of an array with a #define constant, so the language adding a length on its own is completely useless, which may rub some people (me included) the wrong way. I don't want the language to do what I don't want it to do, and that's one of the reasons I like C, because it usually only does exactly the thing I tell it to do and nothing else.

On top of length vs no-length, there's also a question: what type should the length variable be? Should it be 32-bit? 64-bit? Should it be signed or unsigned? Depending on what you're programming (for example an HTTP server with 10 million simultaneous clients), a 64-bit integer for every string may use more space than you want to spend. Perhaps you calculate some offsets and would benefit from a signed length value that won't wrap around when it goes negative, perhaps you like the simplicity of guaranteed positivity so you DON'T want a signed value, or perhaps you have your own storage mechanism that doesn't need a length at all, so it would just end up being a waste of resources.

I almost never use C strings in my code, I typically do something like this:

struct String {
i64 length;
char* data;
};

#define string(s) (String){.length=strlen(s), .data=(s)}

String something = string("Hello world");

This isn't always desirable, but it's usually too cumbersome to do something better.
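For string literals specifically, the length can already be computed at compile time with sizeof instead of strlen. The sketch below adds a hypothetical `string_lit` macro next to the original one; the `"" s ""` concatenation is there so that passing anything other than a literal fails to compile.

```c
#include <stdint.h>
#include <string.h>

typedef int64_t i64;

typedef struct {
    i64 length;
    char *data;
} String;

/* Runtime length: works for any NUL-terminated char pointer. */
#define string(s) ((String){ .length = (i64)strlen(s), .data = (s) })

/* Compile-time length: only accepts string literals. */
#define string_lit(s) \
    ((String){ .length = (i64)sizeof("" s "") - 1, .data = ("" s "") })
```

It doesn't fix the annoyance of wrapping every string in a macro, it only removes the strlen call for literals.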

I think ultimately the best answer to the array length question would be to allow you to define new string/array types: YOU as the programmer decide what the string or array format should look like and the compiler obeys. The main problem with my macro is that it's extremely annoying to wrap every single damn string into the string() macro. Maybe the type could be used as a prefix like this: String foo = String"Hello";. That would make it at least slightly easier, I might even rename the type from String to S.

It's also very difficult to do certain things with just a macro, for example what if I want to put the string length as a 16-bit integer into the beginning of the string? I've used those kinds of strings on several occasions, but I don't know how to do that at compile-time. With customizable string types the compiler could inject the length to the beginning of the data of string literals automatically.

Compile-time programming could be another answer to how to implement this, although I'd prefer the compiler to just understand what I want and do it.

Switch cases break by default, explicit fallthrough, multiple case values

switch (x) {
case 0: fallthrough;
case 1: printf("0 or 1\n"); // No need for break.
case 2: printf("2\n");
case 3: printf("3\n");
}

In 99% of cases you will want to break, so it should be the default behavior. With break no longer needed to end a case, it would also be free to break out of an enclosing loop when you're inside a switch case.

It would be very useful if you could set multiple values to a switch case:

switch (x) {
case 0, 1: printf("0 or 1\n");
case 2: printf("2\n");
case 3: printf("3\n");
}
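For comparison, current C expresses "case 0, 1" by stacking labels with nothing between them. A sketch with a made-up function:

```c
/* Maps x to a description; returning from each case avoids fallthrough. */
const char *describe(int x) {
    switch (x) {
    case 0:   /* stacked labels: the current spelling of "case 0, 1:" */
    case 1:
        return "0 or 1";
    case 2:
        return "2";
    case 3:
        return "3";
    default:
        return "other";
    }
}
```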

Better variable arguments

The problem with C variable arguments is that you can't know what the arguments are from the function-side, you need to explicitly provide some kind of type information when you call the function. I don't have a good design for better varargs, but I assume that some kind of TypeID system would be involved.

You could add a new variable arguments system into current C without breaking compatibility simply by using a special keyword for the new system. I don't know if it's a good idea to have 2 though.

Different base type names

I don't really care since I can just typedef whatever I want in C currently, but I think the default type names are silly and you shouldn't use them if you were to remake C today. Here's what I would use as base types:

Most new languages seem to do something like this, and I think that's a good decision. The abstract names of C might have made more sense in the distant past when CPU register sizes were less consistent, but these days effectively all CPUs or GPUs are very consistent with these.

There may be some merit to having a type called "int" because sometimes you just want a number and don't really care what size it is. It should be an i64. Similarly, having a "char" type might be useful because it communicates better that you're talking about text/string and not a generic data buffer or something. It should be u8 (unlike in C, where plain char is often signed).

There are a few reasons for prefixing signed integers with i instead of s. Firstly, you rarely think of the words "signed integer", I think it's more common to think of just "integer" or "int" which starts with "i". Meanwhile when you think of an unsigned integer, you specify with the word "unsigned" which starts with "u". Secondly, the letter "s" also reminds me of "string" and "struct", while "i" doesn't really remind me of anything else. When I see "s32", it immediately makes me think of some kind of string. I might want a string that uses a 16-bit integer for its length, which is what "s16" sounds like.

Enum members (and struct members?) split by semicolon

enum {
FOO;
BAR;
ZYZ;
};

This is a very minor nitpick, but I typo these very often because everything else is defined like this. If I'm not consciously thinking about it, I'll end up typing semicolons here.

If you define multiple constants, you use semicolons. If you make a struct, the members are split by semicolons. If you make a bitfield with booleans, you use semicolons. But if you make an enum, you use commas... why?

I'd want the same for struct initialization:

Vec2f pos = {
.x = 10;
.y = 5;
};

Other than consistency, there's a big advantage in that if you decide to switch between that and the following (which I end up doing quite often), it's much easier to make the change:

Vec2f pos = {0};
pos.x = 10;
pos.y = 5;

That one might be going too far though, not sure.

Importing and building

There's nothing too interesting to say about this. It's obvious that building programs and including libraries and doing forward declaration in C is demoralizing crap. Most other languages do it better, and the correct solution, at least to me, seems like it goes without saying.

#import "coolarray.c"

Coolarray test;
init_array(&test);

// Or import it into a namespace:
#import "coolarray.c" ca

ca.Coolarray test;
ca.init_array(&test);

When you import a file, the compiler pulls it in and finds all the types and functions from it, and it just works, and keeps that data around in case any other file imports it too. You also wouldn't need to typedef structs or enums anymore.

Some parts of the C macro system stop working if you do this. The most confusing thing is that if you do a #define and then #import something, you can no longer #undef it from the outside: it will get "baked in" to that file and all the other files that it imports. Most of these problems go away when you realize that such defines are effectively compiler settings, "if feature x or setting y is enabled then add this code", which should be a global project-wide setting anyway. It's very rare to #define something for 1 file and then #undef it afterwards. I do that sometimes to rename types, but that would also not be necessary if I could wrap the file into its own namespace so it won't clutter my code with names that I wanted to use myself.

#include should still work the same way, but #import would obviously become the new standard way to include stuff because nobody likes headers.

Here's some other related details:

// Public functions are accessible by anyone who imports this file. This is default.
public void foo () {
...
}
// Private functions are unavailable even if you import this file. Useful if you have a lot of internal functions and don't want them to clutter the namespace.
private void bar () {
...
}
// Same as public, except this will also be visible from an object file or DLL.
exported void zip () {
...
}
// This comes from a compiled library at the linking stage.
external void pop ();
#import shared "coolarray.c" // coolarray.c will be implicitly imported by any other files that import the current file.
#import global "coolarray.c" // coolarray.c will be implicitly imported by all files. Useful if you want some base libraries to always be imported in all project files.
#import force "coolarray.c" // Forcibly imports everything, treating all private functions and variables as public.

For context, I always compile my programs as a single unit and only link with one other thing: a file that has abstractions for OS functions. I have my own way of mostly not needing header files. This makes C programming a lot more comfortable for me, but it's still limited compared to the design above.
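For comparison, here is roughly how three of those visibility levels map onto current C. This is only a sketch: the function bodies and the distance() helper are made up for illustration. "exported" has no standard equivalent; compilers offer extensions for it, such as the visibility attribute in GCC or __declspec(dllexport) in MSVC.

```c
#include <assert.h>

/* "private": static gives the function internal linkage, so other
   translation units cannot see or call it. */
static int bar(void) {
    return 41;
}

/* "public": a plain definition has external linkage and is visible to
   every other translation unit at link time. */
int foo(void) {
    return bar() + 1;
}

/* "external": a declaration only. The definition comes from a library at
   the linking stage, here abs() from the C standard library. */
extern int abs(int value);

/* Hypothetical helper that uses the external function. */
int distance(int a, int b) {
    return abs(a - b);
}
```

The difference from the proposal is that current C still needs a header (or a repeated declaration) in every file that wants to call foo().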


Remaining desirables

Even if all the changes above were applied, there's still neat things I'd like to have. However, these can't easily be added to C without significantly changing the language.

Function syntax

function add_ints (int x, int y) static -> int {
return x + y;
}

function* callback (int x, int y) -> int = add_ints;

function main () {
callback(1, 3);
}

There's numerous benefits from doing something like this.

The biggest downside (besides totally breaking compatibility) is that it's longer to type, and it looks different from what you're used to. It really doesn't "look like C" at all.
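For reference, this is the same example written in current C, where the return type comes first and the name of a function pointer ends up in the middle of its declaration. The callback2 variable and the BinaryOp typedef are additions for this sketch.

```c
#include <assert.h>

/* Current C: the return type comes first and 'static' is a prefix. */
static int add_ints(int x, int y) {
    return x + y;
}

/* Current C: the pointer's name sits in the middle of the declarator. */
static int (*callback)(int x, int y) = add_ints;

/* A typedef is the usual way to make function pointers readable today. */
typedef int BinaryOp(int x, int y);
static BinaryOp *callback2 = add_ints;
```

Calling callback(1, 3) works the same as in the proposed syntax; only the declarations differ.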

Cleaner macros

In many cases I would prefer if macros could do type checking, be more easily contained, be able to return a value, and in general behave more like inline functions.

I don't have a good design or examples at the moment, but I feel like the entire macro system in C could be replaced with a more controllable one without losing any of its current capabilities. For example I think you could add optional type checking into the current macro system without breaking compatibility with current C code.
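The closest thing in current C is probably combining inline functions with C11 _Generic. The sketch below is not the design I'd want, just an illustration of what "type checked" and "evaluates its argument once" look like today; all names in it are made up.

```c
#include <assert.h>

/* A classic macro: no type checking, and the argument is evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline functions: type checked, and the argument is evaluated once. */
static inline int    square_int(int x)       { return x * x; }
static inline double square_double(double x) { return x * x; }

/* C11 _Generic picks a function based on the argument's type. A type that
   is not listed is a compile error instead of a silent expansion. */
#define SQUARE(x) _Generic((x), \
    int:    square_int,         \
    double: square_double)(x)

/* Helper that makes double evaluation visible by counting its calls. */
static int calls = 0;
static int next_value(void) {
    calls++;
    return 3;
}
```

SQUARE_MACRO(next_value()) calls next_value() twice, while SQUARE(next_value()) calls it once, because the controlling expression of _Generic is never evaluated.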

Adding enum members dynamically

enum ERROR {
NONE,
};

ERROR foo () {
if (x) return #unique_enum(ERROR, BAD_THING_HAPPENED);
if (y) return #shared_enum(ERROR, MEMORY_ALLOCATION_FAILED);
return ERROR.NONE;
}
ERROR bar () {
if (x) return #shared_enum(ERROR, MEMORY_ALLOCATION_FAILED);
return ERROR.NONE;
}
void main () {
ERROR e = foo();
if (e) {
printf("Oh no! %i\n", e);
}
}

I've always wanted something like this: global error numbers. An enum that you can expand dynamically. The value could flow all the way from some inner function to top-level code and retain its meaning. I think I would have to build an actual programming language before I can say anything more about it.

With some additional introspection metaprogramming features (for example enum_name_as_string(ERROR,value)), you could even print the error name regardless of where it comes from.
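The nearest approximation in current C is an "X-macro" list, sketched below. The big difference is that every error has to be added to one central list by hand, which is exactly what the proposal would avoid. The Error_ prefix and the error_name() function are assumptions of this sketch, with error_name() standing in for the hypothetical enum_name_as_string.

```c
#include <assert.h>
#include <string.h>

/* Single list of all error names. In current C every module has to add
   its errors here by hand. */
#define ERROR_LIST(X)            \
    X(NONE)                      \
    X(BAD_THING_HAPPENED)        \
    X(MEMORY_ALLOCATION_FAILED)

/* Expand the list once into enum members... */
#define AS_ENUM(name) Error_##name,
typedef enum { ERROR_LIST(AS_ENUM) Error_COUNT } Error;

/* ...and once into a table of matching strings. */
#define AS_STRING(name) #name,
static const char *const error_names[] = { ERROR_LIST(AS_STRING) };

/* Hypothetical stand-in for enum_name_as_string(ERROR, value). */
static const char *error_name(Error e) {
    if ((unsigned)e >= Error_COUNT) return "UNKNOWN";
    return error_names[e];
}
```

With this, top-level code can print error_name(e) for any value that reaches it, as long as the value came from the shared list.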

This could also be useful for other purposes, for example #including a UI module which adds a new UI node type to a UI system.

Struct templates

If you've ever implemented custom arrays or buffers in C, or perhaps certain kinds of math, you may know what I mean.

struct Array {
u32 count;
Something* data;
};

The Something here should be whatever's in the array. There's no good way to re-define this with a different pointer type without tedious macro crap. But here's what it could look like:

struct Array(T) {
u32 count;
T* data;
};

void print_ints (Array(int)* array) {
for (u32 i=0; i<array.count; i++) {
printf("%i = %i\n", i, array.data[i]);
}
}

Array(int) array_of_ints;

At first this may seem icky and un-C-like, but if you really think about it, this isn't much different than the hideous macro hacks that people tend to present when they want to do something unusual. All this does is expand to a different struct based on what the input is, and you can't define a function that takes in Array(T), it has to have a type attached to it.

An interesting note is that you could do this without any additional features if, as mentioned earlier in this page, identical anonymous structs were compatible. Then you could just do something like this:

#define Array(T) struct{ u32 count; T* data; }

Array(int) array_of_ints;
// The line above will expand to the line below:
struct{ u32 count; int* data; } array_of_ints;

But since anonymous structs aren't compatible in current C, you cannot pass this to any function. I guess making them compatible might also slow down compile times, since the compiler would have to interpret and compare tons of anonymous structs everywhere.

Struct templates would also be useful for other purposes. For example I use generic vector structs for all kinds of positions and offsets, I need those structs in basically all base types (for whatever packing reasons). But instead of defining Vec2_u16 along with a thousand other things with macros, I would prefer to use Vec2(u16) and have it interpreted automatically.
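This is the kind of macro approach that struct templates would replace. It's a minimal sketch: the DEFINE_VEC2 macro, the f32 alias and the generated function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;
typedef float    f32;  /* alias name assumed for this sketch */

/* One macro stamps out a struct and an add function per element type.
   The type name has to be a single identifier so that it can be pasted
   into the generated names. */
#define DEFINE_VEC2(T)                                              \
    typedef struct { T x; T y; } Vec2_##T;                          \
    static inline Vec2_##T vec2_##T##_add(Vec2_##T a, Vec2_##T b) { \
        Vec2_##T r = { (T)(a.x + b.x), (T)(a.y + b.y) };            \
        return r;                                                   \
    }

/* Every type that is needed has to be instantiated by hand. */
DEFINE_VEC2(u16)
DEFINE_VEC2(f32)
```

This gives you Vec2_u16 and vec2_u16_add(), but every new type needs its own DEFINE_VEC2 line, and the names cannot be written as Vec2(u16).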

The utility of struct templates is limited by the fact that functions cannot take in an arbitrary struct template. For example you cannot create a generic multiply() function that can multiply any type of vector. For that you would need function templates, which completely destroy the idea of remaining C-like. You CAN use struct templates for arrays though (because all pointers are technically interchangeable), see below for an example.

Function templates

This is a feature that would make my life a LOT easier, but also the feature that is hardest to justify for being added to C.

Depending on how ugly you're willing to go, you can get some template-ish behavior even with current C. Here's one of my own array types that I use:

#define TMFARR(T) \
{ u32 count; u32 maxcount; T* data; }

typedef struct TMFARR(void) Tmfarr;

void _tmfarr_append (Tmfarr* a, u32 itemsize, void* item) {
...
}
#define tmfarr_append(a, item) \
_tmfarr_append((Tmfarr*)(a), sizeof(*(a)->data), (item))
typedef struct TMFARR(int) Tmfarr_int;

void main () {
Tmfarr_int ints = tmfarr_new(Tmfarr_int, 64);
tmfarr_append(&ints, &(int){123});
for (u32 i=0; i<ints.count; i++) {
printf("%i = %i\n", i, ints.data[i]);
}
}

The idea is to have a base array type that uses a void pointer, and then define new arrays for other types. When you use one of the array functions, the array gets cast to the base type, and the element size is extracted from the typed data pointer with sizeof and sent to the function.

The only reason this works is that pointers are interchangeable; the same trick won't work for vector templates.
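To make the mechanism concrete, here is a self-contained sketch of how such an append could be implemented. It is not the Tmfarr code: the names (Arr, arr_append_raw, arr_append) and the doubling growth strategy are assumptions, and it copies the array header with memcpy instead of casting the struct pointer, which sidesteps the aliasing rules. It still assumes that void* and int* have the same representation and that both structs share a layout, which holds on common platforms but is not promised by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t u32;

#define ARR(T) { u32 count; u32 maxcount; T* data; }

typedef struct ARR(void) Arr;      /* untyped base array */
typedef struct ARR(int)  Arr_int;  /* typed array with the same layout */

_Static_assert(sizeof(Arr) == sizeof(Arr_int), "array layouts must match");

/* Appends one item of 'itemsize' bytes. Returns 1 on success, 0 if the
   allocation failed. 'array' points to any struct made with ARR(T). */
static int arr_append_raw(void *array, u32 itemsize, const void *item) {
    Arr a;
    memcpy(&a, array, sizeof a);  /* read the header as the base type */
    if (a.count == a.maxcount) {
        u32 newmax = a.maxcount ? a.maxcount * 2 : 8;
        void *grown = realloc(a.data, (size_t)newmax * itemsize);
        if (!grown) return 0;
        a.data = grown;
        a.maxcount = newmax;
    }
    memcpy((char *)a.data + (size_t)a.count * itemsize, item, itemsize);
    a.count++;
    memcpy(array, &a, sizeof a);  /* write the updated header back */
    return 1;
}

/* The macro extracts the element size from the typed data pointer. */
#define arr_append(a, item) \
    arr_append_raw((a), sizeof(*(a)->data), (item))
```

Usage looks like the example above: start from Arr_int ints = {0}; and call arr_append(&ints, &(int){123});, after which ints.data[0] is a regular int.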

There's a lot of questions about how exactly template functions should work. For example in C++ you sometimes have to add the type separately after the function name when you call the function, as in array_append<int>(...);, because the compiler can only leave it out when it can deduce the type from the arguments. Perhaps this is necessary for some reasons, but when you're programming, you just want to use the array and function as if they're regular structs and functions.

Better error messages

This isn't really a C problem, this is a compiler problem. I use GCC and the error messages are horribly bad.


Other thoughts


Making my own language?

I've tried to make a language on a few occasions. The conclusion I reached is that in order for my own language to be an improvement over C, I need to make a really proper language and not half-ass it. Problem is that making a proper language requires way more work than I'm interested in putting into such a project.

Things I value in a language

I want a language that is just a tool for doing things, and doesn't make any assumptions about how you should do things. Given that philosophy as the bedrock, make it as powerful and flexible as possible at doing things, with as few words/symbols/syntax as possible. Make the obvious and most convenient things happen by default, but let me specify the behavior more precisely where desired.

It should treat manipulation of data as the most important thing. Everything is just plain data that is visible and accessible to you, and you can do whatever you want with it. When you make a string, it should just be N bytes of data, and it's up to you what to do with it and how to interact with it and where/how it's stored. The reason I don't want a language where strings have a length, is because I didn't decide for it to have a length. If I want my string to have a length, it should be up to me where to put that length and how, the language should give me the tools to define that behavior.

I am conflicted about number autocasting. If you have an int and assign its value to a short, there may be loss of information. Most of the time I like the fact that C doesn't care about this because usually I also don't care about it, or I know it can't fail. For example the length variable on my string type is an i64, but I usually roughly know its length, for example it's not realistic for a file path to overflow a short. But on the other hand there have been some situations where I would prefer if there was a warning. I think you can configure compilers to warn about it (GCC and Clang have -Wconversion for this), but I'm not sure what the all-encompassing solution would be.
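As an illustration of the two behaviors, here is a sketch of silent narrowing next to a checked version. The function names and the i16/i64 aliases are made up for this example. For the warning side, -Wconversion in GCC and Clang reports implicit narrowing, though the explicit casts below keep it quiet.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t i16;
typedef int64_t i64;

/* Silent narrowing, as C does it today. For out-of-range values the
   result is implementation-defined; common compilers wrap around. */
static i16 narrow_unchecked(i64 value) {
    return (i16)value;
}

/* Checked narrowing: reports failure instead of losing information.
   Returns 1 and stores the result on success, 0 if it does not fit. */
static int narrow_checked(i64 value, i16 *out) {
    if (value < INT16_MIN || value > INT16_MAX) return 0;
    *out = (i16)value;
    return 1;
}
```

A file path length of 260 passes through either function unchanged, while 70000 is rejected by narrow_checked() and leaves the output untouched.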

Similar to above, there should be options that let you configure the language according to your preferences. For example I wouldn't mind the C++ void pointer casting situation if the compiler let me disable it, but I can't disable it.

I hate it when I want to use a name but the language has reserved it. The worst offender is probably the word 'class', which I've run into in JavaScript. There's infinite things I might want to give some kind of classification to, but I can't name the variable 'class' because that word is reserved by the language. You could prefix some keywords with #, for example #class instead of class, but it's tricky to find the right balance because that makes it more annoying to type them.

I've often heard it said that syntax isn't important in a language, but I think it's very important because it's your direct interface with your program every single second that you're programming. If the syntax is ugly and clumsy, it's like living in an ugly house, it's just going to make you feel bad.