TFD
TFD is my vision for a programming language. It is basically an overhauled version of C, meant to be cleaner and more comfortable to use, mostly by being less pedantic and allowing you to do the same kind of things with less effort. In this page I mostly describe it from the perspective of how it differs from C. This page also functions as documentation for myself.
C is the programming language that is closest to my ideal, but there are a lot of things about it, big and small, that are very clumsy or annoying. Just to give an example, when you define the value of a struct for a variable or function call, you have to type (Somestruct){x,y} even though there isn't any good reason why {x,y} wouldn't be enough.
Some of the problems, such as the one above, are fixed by C++. However, almost all of the problems I have with C also exist in C++, and it introduces some new ones (for example you can't type {.y=y,.x=x}, it has to be in the declaration order {.x=x,.y=y}). It also just feels bad to use a language that is so overcomplicated. It makes me feel like I'm on unstable ground and that the survival and propagation of the language (and thus my codebase) is complex and uncertain.
There are other languages that claim to fix C or be a better version of it, but all of them miss the point of what I actually want and care about.
One of the core philosophies of TFD is that it's not the programming language's place to tell the programmer what's the right way to program. That's why you can configure most behaviors and rules with compiler settings, and it doesn't withhold features for ideological reasons.
I've gone through multiple phases of wanting to make different kinds of languages, or perhaps a simple pre-processor for C, but I always end up feeling like the costs outweigh the advantages. I now believe that the only thing that would be worth doing is a proper and complete new language.
I may make TFD some day when I find the right motivation and time. The biggest problem is that I don't really want to learn to use LLVM, and the only alternative is to transpile into C code, but that comes with its own complications. I'm also considering learning enough x86 that I could just output an executable directly, but that couldn't be optimized at all.
Here's some random sample code:
#module "print.c";
#module "string.c";
#module "memory.c" mem;
#import "fireworks.c";
struct Vec2f {
f32 x;
f32 y;
};
struct Entity {
enum STATE {
NONE;
ALIVE;
DEAD;
INVINCIBLE;
};
STATE state;
STATE state_previous;
#inherit Vec2f pos;
};
function create_entity (Entity.STATE state, Vec2f pos) Entity {
return {
.pos = pos,
.state = state,
.state_previous = .NONE,
};
}
function main (int arg_count, &String args) int {
&Entity enemies = mem.alloc(32*32*sizeof(Entity));
int fireworks_done = 0;
function do_fireworks (&Entity entity) ERROR {
ERROR e = spawn_fireworks(entity.x, entity.y);
if (!e) fireworks_done ++;
return e;
}
for (int y=0; y<32; y++) {
for (int x=0; x<32; x++) {
Entity enemy = create_entity(.INVINCIBLE, {x, y});
ERROR e = do_fireworks(&enemy);
if (e) {
break 2; // Break both loops.
}
enemies[y*32+x] = enemy;
}
}
printf("Did {} fireworks!\n", fireworks_done);
return 0;
}
Base types
u8, u16, u32, u64 // Unsigned integers. The number is the size in bits.
i8, i16, i32, i64 // Signed integers.
f32, f64 // Floating point types.
bool // true or false. 8-bit size
void // No type.
int // i64, this is a convenience integer when you don't want to think about it. It can also communicate that you don't have a specific reason to make the value 64-bit.
uint // u64, see above.
Notable syntax differences from C
Examples of syntax in C, followed by the equivalent in TFD.
- There are no strings by default, see strings for how strings work in TFD.
- Arrays are not equivalent to C arrays, so the two cannot be directly compared; more about them below.
- Pointers are dereferenced with * like in C, however it goes immediately before the member you're dereferencing, and has maximum precedence over all other symbols. A period (.) will dereference once if the variable on the left is a pointer.
// Pointer to int.
int *thing = NULL;
&int thing = NULL;
// Pointer to struct
thing->x = 123;
thing.x = 123;
// Pointer to pointer to struct
(*thing)->x = 123;
*thing.x = 123;
// Struct member pointer
*thing.x = 123;
thing.*x = 123;
// Struct member pointer to struct member pointer
*(*thing.x).y = 123; *thing.x->y = 123;
thing.x.*y = 123;
// Typedef.
typedef int Something;
typedef Something : int;
// Struct.
typedef struct { xxx; } Foo;
struct Foo { xxx; };
// Union.
typedef union { xxx; } Foo;
struct Foo #overlap { xxx; };
// Function.
static int foo () { xxx; }
function foo () int { xxx; }
// Function pointer.
int (*foo) () = NULL;
&function foo () int = NULL;
// Import from pre-defined directories.
#include <foo.h>
#module "foo.c";
// Import from local path.
#include "foo.h"
#import "foo.c";
// Switch.
switch (foo) {
case 1:
case 2: break;
default: break;
}
if (foo) ... {
case 1: #fall_through;
case 2:
case:
}
Value literals
int value = 1222333444555666777; // No need to postfix this kind of number with "LL".
int value = 0xFFAABB; // Hex value.
int value = 0b0000111100001111; // Bit value.
// All number types will completely ignore underscores (except inside the 0x or 0b prefixes). Can be used at your discretion to make the number more readable.
int value = 1_222_333_444_555_666_777;
int value = 0x_FF_AA_BB;
int value = 0b_00001111_00001111;
Character literals.
u32 value = 'X'; // 0x58
u32 value = 'Help'; // 0x706C6548
u32 value = '❤'; // 0xA49DE2
The size of the character literal must be equal to or smaller than the size of the type. 'Hello' gives an error here because u32 is only 4 bytes. If the type is larger than the value, 0s are added to the end. The data is in text byte order, basically the equivalent of this in C:
u32 value = *(u32*)"X\0\0\0";
u32 value = *(u32*)"Help";
u32 value = *(u32*)"❤\0";
Non-hexadecimal numbers can be postfixed with a type if you want to be specific about the type of the value. These aren't usually necessary since TFD interprets the literal as the most appropriate type for its context.
u8 value = 255u8;
i64 value = 1000i64;
f32 value = 0.25f32;
SomeID value = 123SomeID;
SomeID value = 'LOL'SomeID;
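For comparison, here is a minimal C sketch of how a character literal could be packed into a u32. The helper name pack_char_literal is hypothetical. It places the first character in the lowest byte explicitly, which reproduces the values listed above regardless of the machine's byte order (the pointer-cast version gives those values only on a little-endian machine).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: packs up to 4 bytes of text into a 32-bit value with
   the first character in the lowest byte and zeros filling the rest.
   Sets *ok to 0 and returns 0 if the text is longer than 4 bytes. */
uint32_t pack_char_literal(const char *text, int *ok) {
    size_t length = strlen(text);
    uint32_t value = 0;
    *ok = (length <= sizeof(uint32_t));
    if (!*ok) return 0;
    for (size_t i = 0; i < length; i++) {
        value |= (uint32_t)(unsigned char)text[i] << (8 * i);
    }
    return value;
}
```

The size check plays the role of the compile-time error for 'Hello'; a real compiler would reject the literal before the program runs.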
Pointer offsets
&u32 foo;
foo ++; // Moves the pointer by 4 (sizeof(u32)) bytes.
foo &++; // Moves the pointer by 1 byte.
foo[2] = 500; // Modify a value from offset sizeof(u32)*2.
&u32 bar = foo + 2; // Gets an 8-byte (sizeof(u32)*2) offset to foo.
&u32 gar = foo &+ 2; // Gets a 2-byte offset to foo.
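In C the byte-wise offset has no operator of its own and is usually written with a cast to a byte pointer. The sketch below contrasts the two kinds of step; the function names are hypothetical, and the byte-offset helper returns a byte pointer because forming a misaligned uint32_t pointer is not allowed in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Counterpart of TFD's `p + n`: steps by n elements. */
uint32_t *offset_elements(uint32_t *p, ptrdiff_t n) {
    return p + n;
}

/* Counterpart of TFD's `p &+ n`: steps by n bytes. */
unsigned char *offset_bytes(uint32_t *p, ptrdiff_t n) {
    return (unsigned char *)p + n;
}

/* Distance between two addresses inside the same object, in bytes. */
ptrdiff_t distance_in_bytes(const void *from, const void *to) {
    return (const unsigned char *)to - (const unsigned char *)from;
}
```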
Arrays
Arrays are treated the same way as structs: they are passed and copied by value. In C, arrays are treated as a weird fake pointer.
foo.[x] will access a member of the array, foo[x] is an offset pointer dereference (same as in C).
struct Arr4 {
int item0;
int item1;
int item2;
int item3;
};
Arr4 s;
s.item1 = 123;
printf("Size in bytes is {}, it has {} ints\n", sizeof(s), sizeof(s)/sizeof(int));
function test1 (Arr4 x) Arr4 {
return x;
}
s = test1(s);
// These are identical to the above.
[4]int a;
a.[1] = 123;
printf("Size in bytes is {}, it has {} ints\n", sizeof(a), countof(a));
function test2 ([4]int x) [4]int {
return x;
}
a = test2(a);
It's important to internalize the difference from C arrays, because offsetting a pointer to an array will offset the pointer by the whole array size, not the item size.
[4]int array;
&[4]int a = &array;
a[1]; // a &+ sizeof(int)*4, this will overflow the array
a ++; // a &+= sizeof(int)*4
a.[1]; // a &+ sizeof(int), access second item in the array. Like with structs, period will implicitly dereference once if needed.
An array's pointer type is (by default) compatible with the item's pointer type.
function print_floats (int count, &f32 items) {
for (int i=0; i<count; i++) {
printf("{} = {}\n", i, items[i]);
}
}
[100]f32 array;
print_floats(countof(array), &array); // Even though the function wants float pointer, a float array pointer will be accepted too.
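The closest thing C offers to by-value arrays is wrapping the array in a struct, and C's own pointer-to-array type already steps by the whole array. A small sketch of both, with hypothetical names (Int4, set_second, pointer_to_array_step):

```c
#include <stddef.h>

/* Hypothetical wrapper: C copies structs by value, so wrapping the array
   gives it the by-value behaviour described for [4]int. */
typedef struct { int items[4]; } Int4;

/* Receives a copy, changes the copy, and returns it. The caller's value
   is left untouched. */
Int4 set_second(Int4 x, int value) {
    x.items[1] = value;
    return x;
}

/* A pointer to an array of 4 ints advances by the whole array, which is
   the behaviour described for a[1] on an &[4]int. */
size_t pointer_to_array_step(void) {
    int array[2][4];
    int (*a)[4] = array;
    return (size_t)((char *)(a + 1) - (char *)a);
}
```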
Zero initialization
int foo; // Initialized to 0.
#noinit int foo; // Uninitialized.
Multiple switch case values
if (foo) ... {
case 1, 2, 3: print("1, 2, or 3!\n");
case 4: print("4!\n");
case: print("unknown...\n");
}
Nested functions
function check_adjacent (int x, int y) {
int count = 0;
function check (int x, int y) {
...
count ++;
}
check(x, y-1);
check(x, y+1);
check(x-1, y);
check(x+1, y);
}
function load_assets () {
function callback (String file_path, bool is_folder) {
...
}
read_folder_contents("/assets/images/", callback);
// You can shove it directly into function arguments, this is identical to the above except the function doesn't have a name.
read_folder_contents("/assets/things/", function (String file_path, bool is_folder) {
...
});
}
There might be restrictions in some cases, see Additional thoughts.
Type definitions
typedef is used to create duplicate types and aliases. struct and enum create a type automatically if they're followed by a name.
The biggest reason (besides self-documenting code) to use typedef is to define type-checking rules:
typedef Itemid #strict : u32;
typedef Entityid #relaxed : u32;
#strict types are not compatible with anything other than themselves. This is the default for named enums.
#abitstrict types are compatible with relaxed types, but not with other abitstrict types. This is the default for primitive typedefs.
#relaxed types are compatible with everything except strict types. Regular integer/float values are relaxed types. Good if you want your code to be self-documenting, but don't want the compiler to be picky about your ints or whatever.
You can change the default strictnesses with build rules. Integers/floats have additional rules outside of this categorization: by default you can't set a value if there may be loss of information (float -> int, u32 -> u16, signed -> unsigned or vice versa).
#compatible_types can be used to make types compatible with each other. Only works for types with the same size and structure.
struct Vec2i {
int x;
int y;
};
struct Location {
int horiz;
int vert;
};
struct Dimensions {
int width;
int height;
};
#compatible_types Vec2i, Location, Dimensions, [2]int;
function test (Location pos) { ... }
Vec2i vector;
Dimensions size;
[2]int array;
test(vector + size);
test(array);
Enums
enum COLOR : u8 {
RED;
GREEN;
BLUE;
};
enum COOL_BITS #bitfield {
FOO; // 0x01
BAR; // 0x02
ZYZ; // 0x04
XUL; // 0x08
};
function paint_the_wall (COLOR color) {
...
}
function main () {
paint_the_wall(COLOR.BLUE);
paint_the_wall(.BLUE); // Same as above.
COOL_BITS bites = .FOO | .XUL;
}
If enum is immediately followed by a name, it creates a new type. If there's a name at the end, it creates an anonymous enum variable.
enum { RED; GREEN; BLUE; } color = .GREEN;
if (color == .RED) {
color = .BLUE;
}
The name is entirely optional; without one, this basically creates a bunch of constant integer values.
enum { RED; GREEN; BLUE; };
int color = RED;
Switch statements have a special modifier that requires every enum value to have a condition.
if (color) #complete_enum ... {
case .RED: print("Rad!");
case .BLUE: print("Bleu!");
}
// Error: a case for .GREEN is missing from switch.
There are some compile-time constants for enums:
#highest_enum_member(COLOR) expands to the member of the enum with the highest value.
#enum_member_names(COLOR,String) expands to an array of strings (with the specified string macro) containing all the names, mapped to their equivalent integer values. This will give an error if the array exceeds a maximum configured size.
#all_bits(COOL_BITS) expands to a value with all the bits of all the values merged together. Meant for bitfields, but works on normal enums too.
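In C, constants like these are usually approximated with an "X-macro" list. The sketch below is only an illustration of that workaround, not how TFD would implement them: COOL_BITS_LIST, COOL_BITS_ALL and cool_bits_names are hypothetical names, and the name array is in declaration order rather than indexed by value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical X-macro list: one entry per bitfield member and its value. */
#define COOL_BITS_LIST(X) \
    X(FOO, 0x01)          \
    X(BAR, 0x02)          \
    X(ZYZ, 0x04)          \
    X(XUL, 0x08)

enum COOL_BITS {
#define X(name, value) COOL_BITS_##name = value,
    COOL_BITS_LIST(X)
#undef X
};

/* Rough analogue of #all_bits(COOL_BITS): OR every member together. */
enum {
#define X(name, value) | value
    COOL_BITS_ALL = 0 COOL_BITS_LIST(X)
#undef X
};

/* Rough analogue of #enum_member_names: the names in declaration order. */
const char *const cool_bits_names[] = {
#define X(name, value) #name,
    COOL_BITS_LIST(X)
#undef X
};
```

The cost of the C version is that the member list has to be written as a macro up front, which is the kind of ceremony the built-in constants are meant to remove.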
Structs
struct Vec2f {
f32 x;
f32 y;
};
struct Thing #pack(1) { // Members are tightly packed.
u8 foo;
f32 bar;
Vec2f pos;
};
Like enums, structs can be anonymous.
struct { int x; int y; } thingy = {5, 20};
if (thingy.x > 100) {
thingy.x -= 100;
}
Structs don't need to have a name at all, they can be used to describe grouping or get more precise control over padding/offsets of multiple variables.
struct { int foo; int bar; } = {5, 20};
foo = 2000;
There are no "unions"; to get the equivalent of a C union, you must manipulate the offsets of struct members. Here are some examples:
struct Coolthing {
#offset(0) f32 foo;
#offset(0) i64 bar;
};
struct Coolthing #overlap { // Easy way to give all members an offset of 0.
f32 foo;
i64 bar;
};
struct Coolthing {
f32 foo;
struct #overlap {
f32 bar;
i32 zorg;
};
};
struct Coolthing { // Alternate way to define the same as above.
f32 foo;
f32 bar;
#offset(.bar) i32 zorg;
};
struct Splitnumber {
u64 full;
#offset(.full) u32 lower;
#offset(.full+4) u32 upper;
};
A struct can be padded to a specific size. For example, you could make a struct whose size gets rounded up to the next multiple of 32 bytes, making it easier to use 256-bit AVX on it:
struct Simdthing #align_size(256/8) {
u8 x;
f32 y;
};
Inherited struct members:
struct Vec2f {
f32 x;
f32 y;
};
struct Tree {
int leaf_count;
#inherit Vec2f position;
};
Tree tree;
tree.position.x = 123;
tree.x = 123; // Same as above, .x is inherited from Vec2f position.
To be clear, since "inheritance" is a thing in object-oriented programming: this isn't that. This has no effect on the data, the struct, or behavior; it only enables alternate syntax for accessing the child members.
You could also think of an inherited member this way:
struct Vec2f {
f32 x;
f32 y;
};
struct Tree {
int leaf_count;
Vec2f position;
#offset(.position) f32 x;
#offset(.position) f32 y;
};
Structs cannot have "private" members.
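For reference, the overlapping layouts and inherited members described above can be approximated in C11 with unions and anonymous structs. This is a sketch of the C counterpart only, reusing the names from the examples; tree_x_via_alias is a hypothetical helper added to show the aliasing.

```c
#include <stddef.h>
#include <stdint.h>

/* C counterpart of Splitnumber: the two halves overlap the full value.
   Which half holds the low bits depends on the machine's byte order. */
typedef union {
    uint64_t full;
    struct { uint32_t lower; uint32_t upper; };
} Splitnumber;

typedef struct { float x; float y; } Vec2f;

/* C counterpart of "#inherit Vec2f position": an anonymous struct with the
   same layout shares storage with the named member. */
typedef struct {
    int leaf_count;
    union {
        Vec2f position;
        struct { float x; float y; };
    };
} Tree;

/* Writes through the named member and reads through the alias. */
float tree_x_via_alias(float value) {
    Tree tree = {0};
    tree.position.x = value;
    return tree.x;
}
```

The C version has to repeat the member list of Vec2f by hand and keep it in sync, which is what the #inherit modifier avoids.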
Struct members can have default values. Whenever you create a variable with the struct, the members are secretly assigned their default values.
struct Thing {
int x = 14;
int y = 500;
};
Thing foo;
printf("{},{}\n", foo.x, foo.y); // "14,500"
Default values can be particularly helpful when defining anonymous struct variables:
// Without default values:
struct {
int x;
int y;
} foo = {
.x = 14,
.y = 500,
};
// With default values:
struct {
int x = 14;
int y = 500;
} foo;
Braces will be interpreted as the relevant struct/array type based on context; in C you would have to put a type cast before them.
struct Vec2f {
f32 x;
f32 y;
};
function test_vecs (Vec2f a, Vec2f b) { ... }
function test_array ([4]int a) { ... }
Vec2f v;
v = {5,22};
test_vecs(v, {1,2});
test_array({1, 2, 4, 5});
Examples of variable grouping with structs:
function get () struct {int x; int y} {
return {1, 2};
}
function give (struct {int x; int y} foo) {
print("Gave {} and {}!\n", foo.x, foo.y);
}
function main () {
struct {int x; int y} foo = get();
printf("{},{}\n", foo.x, foo.y);
give(foo);
give({.x = 123});
struct { int x; int y; } = get();
printf("{},{}\n", x, y);
y = get().y;
}
Nested types
Types can be nested inside structs. A nested type doesn't do anything by itself, it's just a regular type with a special namespace. You can nest types just for the heck of it, but the intention is to use it on the inside.
struct Vec2f {
f32 x;
f32 y;
};
struct Entity {
enum STATE : u8 {
NONE;
ALIVE;
DEAD;
INVINCIBLE;
};
STATE state;
STATE previous_state;
Vec2f pos;
};
function main () {
// Nested types can be used from the outside like this.
Entity.STATE state = .ALIVE;
state = .DEAD;
// Surprise: ALL members can be used as types, not just nested types.
Entity.pos position = {15, 20};
Vec2f position = {15, 20}; // Same as above.
}
Struct templates
struct Array(T) {
u32 count;
&T data;
};
function print_ints (&Array(int) a) {
for (u32 i=0; i<a.count; i++) {
printf("{} = {}\n", i, a.data[i]);
}
}
function main () {
Array(int) a;
print_ints(&a);
}
Operators on structs and arrays
struct Vec2f {
f32 x;
f32 y;
};
Vec2f foo;
Vec2f bar;
if (foo == bar) { // Compare each member of the structs.
...
}
Vec2f zip = {
.x = foo.x+bar.x,
.y = foo.y+bar.y,
};
// Same as above, just adds each member.
Vec2f zip = foo + bar;
Vec2f zip = {
.x = foo.x*100,
.y = foo.y*100,
};
// Same as above, just multiplies each member.
Vec2f lol = foo * 100;
[2]f32 foo;
[2]f32 bar;
if (foo == bar) { // Compare each member in the array.
...
}
[2]f32 zip = {
.[0] = foo.[0]+bar.[0],
.[1] = foo.[1]+bar.[1],
};
// Same as above, just adds each member.
[2]f32 zip = foo + bar;
// You can also think of it as a loop.
for (int i=0; i<countof(zip); i++) {
zip.[i] = foo.[i] + bar.[i];
}
Operators can be used across types if they are made compatible with #compatible_types and have the same member types/offsets.
struct Vec2i {
int x;
int y;
};
struct Dimensions {
int width;
int height;
};
#compatible_types Vec2i, Dimensions, [2]int;
[2]int array;
Dimensions size;
Vec2i pos = array + size;
if (array == pos) {
...
}
Multi-break and continue
for (...) {
for (...) {
if (y == 10) continue; // Continues the inner loop.
if (x == 10) continue 2; // Breaks the inner loop and continues the outer loop.
if (x+y == 1000) break 2; // Breaks both loops.
}
}
Break works on scopes:
{
{
if (y == 10) break #scope; // Basically goto to the end of the current scope.
if (x == 10) break #scope 2; // Same except the outer scope.
}
}
By putting a label before a loop/scope, you can use break on it.
outer: for (...) {
inner: for (...) {
if (y == 10) break inner;
if (x == 10) break outer;
}
}
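For contrast, C has no multi-level break, so leaving two loops at once is normally done with goto (or an extra flag variable). A minimal sketch with a hypothetical find_cell function:

```c
#include <stdbool.h>

/* Hypothetical search: reports the position of the first cell equal to
   target in a row-major grid. */
bool find_cell(int rows, int cols, const int *grid, int target,
               int *found_row, int *found_col) {
    bool found = false;
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < cols; x++) {
            if (grid[y * cols + x] == target) {
                *found_row = y;
                *found_col = x;
                found = true;
                goto done; /* what "break 2" would express in TFD */
            }
        }
    }
done:
    return found;
}
```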
Strings
TFD has no strings by default. In order to use a string, you must first define what kind of string you want. The language comes with a library (#module "string.c";) which defines a string type; all standard libraries use it too.
String constants are created with a "string macro" system. If the macro is not explicitly used, it is automatically picked based on the context.
String foo = "Hello world"; // Struct with length and data pointer.
Cstring bar = "Hello world"; // Pointer to u8, a 0 is appended to the end of data.
// The string literals above are actually using string macros. The macro is inferred based on the type of the variable, but you can use the macro explicitly:
String foo = Stringxxx"Hello world";
Cstring bar = Cstringxxx"Hello world";
These two string types look something like this:
struct String {
i64 length;
&u8 data;
};
typedef Cstring #strict : &u8;
The macros are created like this (note that these work somewhat differently than regular #macros):
#macro Stringxxx"" \
#inferred_type String \
#place {.length=#length,.data=#data}
#macro Cstringxxx"" \
#inferred_type Cstring \
#place #data \
#append_invisibly u8 0
#inferred_type = the type that causes this macro to be inferred. If this is not defined, then the macro cannot be inferred and must be used explicitly. Two string macros cannot use the same inferred_type, but by typedefing a new one like Cstring here, it won't conflict with a &u8 string macro.
#place = the string constant will be replaced with this.
#length = length of the string, #data = pointer to the string data.
#append, #prepend = append or prepend something to the string data.
#append_invisibly, #prepend_invisibly = same as above, except these do not increase the value of #length.
#align_size = works the same way as with structs, this will pad the data with 0s until it aligns to the desired size.
#length, #data = integer of the length of the string, and pointer to the string data. The data is UTF-8 by default but can be inserted in a few different formats, for example #data_utf16. #data_ascii can be used to enforce ASCII-only content.
Note 1: I added xxx to the macros for clarity, in reality they would use the same name as the type.
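As a point of comparison, a length-plus-pointer string can be built from a literal in C with an ordinary macro. This is a rough sketch, not the TFD mechanism: STRING_LITERAL is a hypothetical name, and the macro has to be written out at every use because C cannot infer it from the variable's type.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the String struct above. */
typedef struct {
    int64_t length;
    const uint8_t *data;
} String;

/* Hypothetical macro: sizeof on a string literal counts the trailing 0,
   so it is subtracted to get the visible length. The 0 is still present
   in the data, much like #append_invisibly describes. */
#define STRING_LITERAL(text) \
    ((String){ (int64_t)(sizeof(text) - 1), (const uint8_t *)(text) })
```

Usage would look like String s = STRING_LITERAL("Hello world");, which gives a length of 11 with a 0 byte sitting just past the counted data.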
Here strings / multiline string literals / custom string delimiters
Two backticks (``) can be used in place of normal quotation marks (") to create a multi-line string. Helpful if you want to include a lot of text, like help texts or GPU shaders or something with a lot of normal quotation marks.
Indentation is ignored up to the same level as the line that the ending backticks are on. One empty line immediately before and after the backticks is also ignored.
String sometext = ``
Hello world,
this is "a story" about
coding and stuff!
``;
// Defining the string above in C would be done like this:
char* sometext =
"Hello world,\n"
"this is \"a story\" about\n"
"coding and stuff!";
The backticks actually create a special delimiter, you can optionally add a word inside of them:
String sometext = `STR`This is a cool `piece of text` with some ``backticks`` all over it, they won't end the string because of the custom delimiter word.`STR`;
Explicit string macros work the same way as with normal strings:
Cstring sometext = Cstringxxx`STR`This is a "cool" piece of text..!`STR`;
Built-in constant values
#line = Integer for the current line number.
#file_name = String for the current file's name.
#file_path = String for the current file's relative path (not including name).
#file_full_path = Same as #file_path except the full system path.
#function_id = Integer that is unique to each function. This is 1 in the first function, 2 in the second function, and so on. 0 is reserved for the main function, but otherwise there's no particular order.
#function_count = Integer for the total number of functions in the program.
#function_name = String for the name of the function.
#function_name_array = Array of Strings containing every function name.
#unique(foo) = Every time this is used, it gives a different integer value for the given key. It starts from 0 and increments each time it's used. This is similar to __COUNTER__ in C except you can have multiple counters by using different keys.
Functions may be removed during compilation (e.g. non-exported functions that aren't used anywhere). Removed functions will have empty names in #function_name_array, or may be removed entirely (in which case they also don't contribute to #function_id or #function_count).
typeid()
int type_of_int = typeid(int);
int type_of_float = typeid(f32);
int type_of_Foo = typeid(Foo);
Returns a different integer value for each type in the program.
Type functions
These are just normal functions with special syntax for accessing them. The first argument's type can be #this (or a pointer to it), which will be treated in a special way.
function f32.add (&#this a, f32 b) {
*a += b;
}
f32 something = 3.14159;
f32.add(&something, 100); // Call the function manually.
something.add(100); // Same as above.
Variable function arguments
There's 2 kinds of variable arguments, array style and minimal style. The array style is recommended, but the program has to do more work to set things up. It basically sends in a struct that has a count, pointer to a list of type IDs, and pointer to a list of pointers to the values.
function print (#args_array args) {
for (int i=0; i<args.count; i++) {
int type = args.typeids[i];
if (type) ... {
case typeid(int): {
&int value = (&int) args.values[i];
}
case typeid(Vec2f): {
&Vec2f value = (&Vec2f) args.values[i];
}
case: {
&void value = args.values[i];
}
}
}
}
The second style should be the most minimal possible implementation. I'm not completely certain how it should work; I'd have to know more assembly to be able to properly design this feature. Maybe it would work like in C, maybe something similar to the array method is possible without a meaningful cost, who knows.
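To make the array style concrete, here is a C sketch of the struct it describes: a count, a list of type IDs, and a list of pointers to the values. The type ID constants and the sum_args function are hypothetical, and in C the caller has to build the arrays by hand, which is the work the compiler would do automatically in TFD.

```c
#include <stddef.h>

/* Hypothetical type IDs standing in for typeid(int) and typeid(f64). */
enum { TYPE_INT = 1, TYPE_DOUBLE = 2 };

/* The three pieces the array style passes in. */
typedef struct {
    int count;
    const int *typeids;
    const void *const *values;
} ArgsArray;

/* Adds up every int and double argument and ignores unknown types. */
double sum_args(ArgsArray args) {
    double total = 0.0;
    for (int i = 0; i < args.count; i++) {
        switch (args.typeids[i]) {
        case TYPE_INT:    total += *(const int *)args.values[i];    break;
        case TYPE_DOUBLE: total += *(const double *)args.values[i]; break;
        default: break;
        }
    }
    return total;
}
```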
Function overloading
Functions must have a unique name, but they can be overloaded separately. There cannot be a function with the same name as the overloaded name.
function foo_int (int x) {
...
}
function foo_float (f32 x) {
...
}
#overload foo foo_int;
#overload foo foo_float;
function main () {
int x = 123;
f32 y = 1.5;
foo(x); // Calls foo_int
foo(y); // Calls foo_float
}
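The nearest C equivalent is C11's _Generic, which also keeps each implementation under its own unique name and selects one from the argument's static type. A minimal sketch; the return values are added here only so the selection is observable.

```c
/* Each implementation keeps its own unique name, as in the TFD example. */
int foo_int(int x) { return x + 1; }
float foo_float(float x) { return x * 2.0f; }

/* _Generic picks an implementation from the static type of the argument,
   loosely comparable to the two #overload lines. */
#define foo(x) _Generic((x), int: foo_int, float: foo_float)(x)
```

Unlike #overload, the C macro must list every implementation in one place, so adding an overload from another file means editing the macro.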
Optional runtime bounds checking
When you access pointers or arrays with a variable whose limits aren't known at compile-time, a bounds check is done at runtime to make sure the index is within bounds. There are 4 relevant modifiers:
#max_index(x) = modifier for pointers to define the limit. Can use any integer variables and constants to calculate the maximum index, for example #max_index(foo*bar-8).
#min_index(x) = same as above, but for a minimum index.
#counted_by(x) = for convenience, a combination of #max_index(x-1) and #min_index(0).
#check_bounds = modifier for struct types and variables, causes it to always be bounds checked even if bounds checking is off.
struct Coolarray {
int count;
#counted_by(.count) &int data;
};
Coolarray a;
a.data = mem.alloc(10*sizeof(int));
a.count = 10;
a.data[10] = 123; // Runtime error: index 10 is out of bounds, maximum is 9.
function test_me (int count, #counted_by(count) &int data) {
data[10] = 123; // Runtime error: index 10 is out of bounds, maximum is 9.
}
test_me(a.count, a.data);
These errors might actually be given at compile-time since in these cases the limit is resolvable at compile-time, but you get the point.
The limits won't be checked if not necessary, for example #min_index(0) does nothing if the indexing variable is an unsigned integer, and #max_index(300) would do nothing if the integer was u8.
Array length is known at compile-time so there's no need to set #max_index for them.
Bounds checking can be toggled on/off whenever. I'm undecided whether this should be on or off by default, but I tend to favor safety and reliability as a default so for now I'm saying the default is enabled.
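As a rough picture of what the check amounts to, this is the code a C programmer would write by hand for a #counted_by(.count) pointer. coolarray_set is a hypothetical helper, and it reports failure with a return value, whereas the text above describes a runtime error.

```c
#include <stdbool.h>

typedef struct {
    int count;
    int *data;
} Coolarray;

/* Hypothetical checked store: writes only when 0 <= index < count, the
   range that #counted_by(.count) describes. Returns false otherwise. */
bool coolarray_set(Coolarray *a, long index, int value) {
    if (index < 0 || index >= a->count) return false;
    a->data[index] = value;
    return true;
}
```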
Casting/converting variables
// Conversion (pick only 1).
f32 x = (f32 #convert)y; // Properly converts y into the closest equivalent x.
f32 x = (f32 #placebits)y; // This just slaps the bits in without converting anything, this probably won't actually become a valid f32.
// float -> int rounding (pick only 1). NOTE: these only work when converting floats to ints, it does nothing for float-to-float or int-to-int conversions.
i32 x = (i32 #floor)y; // Will floor floats.
i32 x = (i32 #ceil)y; // Will ceil floats.
i32 x = (i32 #round)y; // Will round floats.
// Range checking (pick only 1).
u8 x = (u8 #check)y; // Makes sure u8 can contain the information from y.
u8 x = (u8 #nocheck)y; // This does not do aforementioned check.
u8 x = (u8 #clamp)y; // y will be clamped to the range of u8, so if y is 300, it will become 255.
// Lazy cast, casts to whatever type is appropriate, in this case into an int.
int x = (*)y;
int x = (int)y;
int x = (int #convert #nocheck #floor)y; // Same as above because these are the default casting settings. They can be changed with compiler options.
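For readers coming from C, the sketch below shows what a few of these modifiers correspond to when written by hand. The function names are hypothetical, the integer versions take an i64 source for simplicity, and floor_to_i32 assumes the value fits in an i32. Note that a plain C cast truncates toward zero, which differs from #floor for negative numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rough analogue of the test behind (u8 #check)y. */
bool fits_in_u8(int64_t y) {
    return y >= 0 && y <= UINT8_MAX;
}

/* Rough analogue of (u8 #clamp)y. */
uint8_t clamp_to_u8(int64_t y) {
    if (y < 0) return 0;
    if (y > UINT8_MAX) return UINT8_MAX;
    return (uint8_t)y;
}

/* Rough analogue of (i32 #floor)y: truncate, then step down by one if
   truncation rounded a negative value up. */
int32_t floor_to_i32(double y) {
    int32_t truncated = (int32_t)y;
    return (truncated > y) ? truncated - 1 : truncated;
}
```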
Importing and building
To build a program, simply give the compiler a starting code file. All the options relevant to building a program must be defined through special global variables (list of options below).
There's no forward declaration or headers like in C, so some considerations must be made as a result. Code files are imported directly with #import, which gives access to that file from the current file, but unlike C's #include which just copy-pastes the file contents, this is a self-contained unit.
#import "coolarray.c";
Coolarray test;
init_array(&test);
You can also import a file into its own namespace:
#import "coolarray.c" ca;
Coolarray test; // Error: Coolarray is undefined.
init_array(&test); // Error: init_array is undefined.
ca.Coolarray test;
ca.init_array(&test);
#module works the same way except the path is relative to library directories (mostly the compiler's standard library directory) instead of your project's directory. It also implicitly has #no_everywheres enabled, more about that in Name visibility. #module is meant for libraries that are separate from your project code, #import is meant for your project's files.
#module "string.c";
String test = "Hello world";
#paste can be used if behavior identical to #include from C is desired. It acts exactly as if the file's contents were inserted here, which means you can use this multiple times, and macros and everything will directly collide with its contents. This cannot have a name.
#paste "some_code.c";
#file can be used to place a file as a u8 array. This must have a name.
#file "something.txt" file;
for (int i=0; i<sizeof(file); i++) {
printf("Char {char}\n", file.[i]);
}
All of the above can be used in any normal scope.
function foo () {
#import "coolarray.c";
Coolarray test;
}
function bar () {
Coolarray test; // Error: Coolarray is undefined.
}
Name visibility
The visibility of every global name in a file can be controlled with 3 keywords which affect whatever name is defined after them.
#public names are visible when someone imports this file. This is the default.
#private names are not visible (unless this file is imported with #force).
#everywhere names are visible to files that are imported from the current file, so it effectively injects the name into other files.
#private struct Thing {
int x;
int y;
};
#private function test () {
print("Testing a thing {}\n", foo);
}
#private int foo = 123;
#everywhere #module "string.c";
#everywhere #macro program_name "My awesome project"
// cool_array.c will implicitly import string.c, and the program_name macro will be usable there. The same is true for all files that cool_array.c imports.
#import "cool_array.c";
// boring_array.c will neither import string.c, nor have access to program_name.
#import #no_everywheres "boring_array.c";
// #module implicitly has #no_everywheres.
#module "third_party_library.c";
// ...but you can change that.
#module #include_everywheres "hackable_library.c";
Note: When a file is #imported for the first time, the current #everywheres will get "baked in" to it. #importing it again with different #everywhere names doesn't have any effect. This can cause confusion since some other file might have already imported a file. #everywhere names are intended for project-wide settings and libraries that you want available everywhere, and although you can, you probably should avoid haphazardly defining them all over the place the same way you might in C. Perhaps #everywhere is better compared to the -D compiler option in C, not to #define.
If some library is over-using #private or you want to get more access than was intended, you can forcibly get access to all the names by using #force. This will treat all #private names as #public.
#import #force "coolarray.c";
Functions can be linked from a pre-compiled library.
// This function will be visible if you compile an object file or DLL.
function foo () #export { ... }
// The other side of #export: this function comes from a compiled library at the linking stage.
function bar () #external;
If you're creating a pre-compiled library, a header file will be needed to use it. For this purpose you can import a file with #validate_header. This will compare all #external functions in the file with #export functions from the current file, and give an error if there's a mismatch.
#import #validate_header "coolarray.c";
Macros
A macro is a text replacement: it gets replaced with its contents wherever it is used. For the most part, macros work exactly like in C. TFD macros are local to their own scope.
#macro something "Global!"
function foo () {
#macro something "Local!"
print(something); // "Local!"
}
function bar () {
print(something); // "Global!"
}
The arguments can optionally have types.
#macro something(foo, f32 x, f32 y) (x*y + foo)
Unlike C, the arguments are captured more similarly to function arguments.
// C
#define foo(x) ...
foo({1, 2, 3}) // Error: foo takes 1 argument, but 3 provided.
// TFD
#macro foo(x) ...
foo({1, 2, 3}) // No problem.
Multi-line macros can be made by "escaping" line breaks with \, or by wrapping the contents inside #{ and #}.
#macro complicated_macro_1 \
if (x) { \
foo(); \
bar(); \
}
#macro complicated_macro_2 #{
if (x) {
foo();
bar();
}
#}
Arguments can also be wrapped with #{ #} if you want to input arbitrary text/code into the macro.
#macro funny_loop(condition, increment, inner_code) #{
for (int i=0; condition; i+=increment) {
inner_code
}
#}
funny_loop(i<100, 2, #{
printf("This is a macro loop!\n");
printf("The number is {}\n", i);
#})
## can be used as a void space to isolate arguments without separating them with a space. It's mostly used to connect arguments to something else and to dynamically create names.
#macro foo(x, y) 999##x##y
printf("I ate {} cakes.\n", foo(100, 50)); // "I ate 99910050 cakes."
#x can be used to place an argument as a string.
#macro foo(x) printf("{} = {}\n", #x, x)
foo(2 * 10); // "2 * 10 = 20"
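For reference, a small C sketch of the three behaviors this section compares against: token pasting with ##, stringification with #, and the way C splits a brace list into several macro arguments unless the macro is variadic. The macro and function names (PASTE_CAKES, AS_TEXT, SUM3, sum3) are made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Token pasting: 999 ## 100 ## 50 forms the single number 99910050. */
#define PASTE_CAKES(x, y) 999##x##y

/* Stringification: #x turns the argument's source text into a string. */
#define AS_TEXT(x) #x

/* C splits macro arguments at every top-level comma, braces included,
   so a brace list only survives if the macro is variadic. */
#define SUM3(...) sum3((const int[3])__VA_ARGS__)

static int sum3(const int *v) {
    assert(v != NULL);
    return v[0] + v[1] + v[2];
}

int cakes_eaten(void)        { return PASTE_CAKES(100, 50); }
const char *shown_expr(void) { return AS_TEXT(2 * 10); }
int brace_list_total(void)   { return SUM3({1, 2, 3}); }
```

The variadic workaround is what the TFD argument capture would make unnecessary: with a single named parameter, SUM3({1, 2, 3}) would fail in C with an argument-count error.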
Parsing of macros, #ifs, #imports
The biggest technical difference between C and TFD is that since TFD has no header files, it must be compiled non-linearly. The compiler may run into a name that it doesn't know about yet (it's later in the same file, or in another file), so it has to defer that part until later.
While functions and types are order-independent, #macros and #ifs are order-dependent because they change how the code after them should be interpreted. When the compiler finds an #if, the condition must be solvable at that time; it cannot be deferred to later.
#import #last "macro_that_makes_thing_true.c";
#if (thing) // Error: thing is undefined.
#endif
#macro thing true
If 2 files require each other to be imported, neither can be parsed first. This is a problem because a macro from one may change how the other must be interpreted, and it can cause a paradox. If either file contains a non-#private macro, you have to either use #first or #last to control which file gets imported first, or use #no_macros to import a file without macros (this can be made the default behavior with Build rules).
TODO: Can the modifiers be omitted as long as a #public #macro from one file never affects the other, even if they do have such macros? Also, could simple macros be used as long as they don't change the structure of the code? For example a macro that's just a constant number. If you use a global variable, the compiler would have to defer resolving the value anyway.
Although not fully linear, #everywhere names must be solved before any #imports.
Foo thing = 100;
#import "something.c";
typedef Foo : u16;

#everywhere Foo thing = 100;
#import "something.c"; // Error: #everywhere variables must be fully defined before any subsequent #imports. Type 'Foo' is undefined.
typedef Foo : u16;

#everywhere Foo thing = 100;
typedef Foo : u16;
#import "something.c";
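The order dependence of #if is the same one C has. A minimal C sketch, assuming a hypothetical macro called THING: the same condition gives a different answer depending on whether it appears before or after the definition, which is why it can't be deferred.

```c
#include <assert.h>

/* First check: THING has not been defined yet at this point in the file. */
#if defined(THING)
#define SEEN_BEFORE_DEFINITION 1
#else
#define SEEN_BEFORE_DEFINITION 0
#endif

#define THING 1

/* Second check: the same condition now evaluates differently. */
#if defined(THING)
#define SEEN_AFTER_DEFINITION 1
#else
#define SEEN_AFTER_DEFINITION 0
#endif

_Static_assert(SEEN_AFTER_DEFINITION == 1, "THING is visible after its definition");

int seen_before_definition(void) { return SEEN_BEFORE_DEFINITION; }
int seen_after_definition(void)  { return SEEN_AFTER_DEFINITION; }
```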
#on_leave, #on_enter_function, #on_leave_function
#on_leave is a special variation of #macro that automatically places its contents everywhere that the scope ends.
function main () {
&void data = alloc(1000);
#on_leave free(data);
if (x) return; // free(data) is inserted here.
for (int i=0; i<100; i++) {
&Thing thing = get_thing();
#on_leave release_thing(thing);
if (x) continue; // release_thing(thing) is inserted here.
if (y) break; // release_thing(thing) is inserted here.
if (z) return; // release_thing(thing) and free(data) are inserted here.
// release_thing(thing) is inserted here.
}
// free(data) is inserted here.
}
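For comparison, the closest thing in standard C is the manual "goto cleanup" pattern, sketched below. The names process, release_buffer and released_count are hypothetical; the counter exists only so the cleanup can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter so the cleanup can be observed from outside. */
static int buffers_released = 0;

static void release_buffer(void *buffer) {
    assert(buffer != NULL);
    free(buffer);
    buffers_released++;
}

int released_count(void) { return buffers_released; }

/* Returns 0 on success, -1 if allocation failed or the caller bailed early. */
int process(int bail_early) {
    int status = -1;
    void *data = malloc(1000);
    if (data == NULL) {
        return status;          /* nothing to clean up yet */
    }
    if (bail_early) {
        goto cleanup;           /* every early exit must remember this jump */
    }
    status = 0;
cleanup:
    release_buffer(data);       /* written once, reached from every exit */
    return status;
}
```

The difference from #on_leave is that the jump has to be written by hand at every exit, and nested scopes (like the loop in the example above) need their own labels.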
#on_enter_function and #on_leave_function are also special variations of #macro. They automatically place their contents at the beginning and end of functions.
#on_enter_function printf("Hello! {}\n", #function_name);
#on_leave_function printf("Bye! {}\n", #function_name);
function testfunc (int x, int y) {
x += y * 2;
if (x > 1000) {
return;
}
printf("x={} y={}\n", x, y);
}
The code above would equate to the following:
function testfunc (int x, int y) {
printf("Hello! {}\n", #function_name);
x += y * 2;
if (x > 1000) {
printf("Bye! {}\n", #function_name);
return;
}
printf("x={} y={}\n", x, y);
printf("Bye! {}\n", #function_name);
}
The above example isn't 100% comparable in practice, because if you do return foo(), the #on_leave_function contents must be placed after the function call, so you can't just put them before the whole return statement.
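A hand-expanded C sketch of that ordering, assuming the function body is just return foo(): the return expression is evaluated into a temporary first, then the leave code runs, then the value is returned. The event log and the names record, traced and events are hypothetical stand-ins for the printf calls.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event log standing in for the printf calls. */
static char event_log[64];

static void record(const char *event) {
    assert(strlen(event_log) + strlen(event) + 2 <= sizeof event_log);
    strcat(event_log, event);
    strcat(event_log, ";");
}

static int foo(void) {
    record("foo");
    return 7;
}

/* Hand-expanded version of a function whose body is `return foo();`. */
int traced(void) {
    record("enter");
    int result = foo();   /* the return expression is evaluated first */
    record("leave");      /* the leave code runs after it */
    return result;        /* and only then does the function return */
}

const char *events(void) { return event_log; }
```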
Build rules
These are effectively #everywhere #compile_time variables that the compiler uses directly to control its behavior. They can be changed at any time, and like macros, they're bound to scopes, so changing them at the start of a function will revert them at the end of the function, and putting one at the end of a file doesn't do anything. Some of them, like #exe_name, are unique in that they only have an effect when set in the starting file, but the value can be read from other files.
#exe_name = "coolprogram.exe";
#exe_path = "release/bin";
#exe_icon = "res/icon.png";
#add_linked_library("Gdi32"); // Equivalent of -lGdi32 in GCC.
#add_linked_library_path("/foobar/lib");
#add_module_path("/foobar/include");
#optimization_level = .MAX_SPEED;
#remove_unused_functions = true; // Will delete any functions from the program that aren't called from anywhere else and that don't have #export.
#remove_unreached_functions = false; // Similar to above, but a search is done starting from the main function to check if functions are reached from it.
If you want custom values, use global variables or macros with #everywhere:
#everywhere #compile_time #read_only int program_version = 123;
#everywhere #macro program_name "Cool Program"
These are language rules that can be modified according to your preference. It's not recommended to change these, but you can. One of the core principles of TFD is that it's not the language designer's job to tell the programmer what's the right way to program.
#default_type_visibility = #public;
#default_macro_visibility = #public;
#default_function_visibility = #public;
#default_global_variable_visibility = #private;
#default_import_visibility = #private;
#default_macro_importing = #include_macros;
#default_typedef_strictness = #abitstrict;
#default_enum_strictness = #strict;
#allow_int_signedness_loss = false; // i32 -> u32
#allow_int_size_loss = false; // u64 -> u32, u32 -> i32, (excludes u16 -> i32)
#allow_float_size_loss = false; // f64 -> f32
#allow_float_to_int_loss = false; // f32 -> i32
#allow_int_to_float_loss = false; // i32 -> f32
#allow_int_to_float_lossless = true; // i8 -> f32
#bool_technical_type = u8; // Bool is technically similar to this type.
#treat_bool_as_int = false; // bools accept arbitrary integer values, not just true and false.
#auto_cast_from_void_pointers = true; // &void -> &int
#auto_cast_to_void_pointers = true; // &int -> &void
#auto_cast_from_array_pointer = true; // &[]int -> &int
#auto_cast_to_array_pointer = true; // &int -> &[]int
#runtime_bounds_checking = true;
#default_casting_behavior = #convert #nocheck #floor;
#allow_lazy_cast = true; // x = (*)y
#default_struct_packing = .C_LIKE;
#untyped_enum_size = .SMALLEST; // What size should enums be if they don't have a type.
#undefined_string_macro = String; // In some cases the string type can't be inferred from context.
#max_enum_member_names_count = 16384; // If you use #enum_member_names and the array would be longer than this, you get an error.
#max_enum_member_names_size = 524_288; // Same as above except byte size.
#zero_initialize_arrays = true;
#zero_initialize_structs = true;
#zero_initialize_primitive_types = true;
#allow_variable_length_arrays = false; // [x]int foo;
#allow_assignment_in_conditions = false; // if (x = foo()) ...
#allow_increment_decrement_inside_statement = false; // x = foo[y++];
#allow_variable_assignment_inside_statement = false; // x = foo[y=foo()];
#allow_redeclaration_of_name_of_same_type_from_parent_scope = true;
#allow_redeclaration_of_name_of_different_type_from_parent_scope = true;
#allow_redeclaration_of_name_of_same_type_in_scope = false;
#allow_redeclaration_of_name_of_different_type_in_scope = false;
#allow_unused_variables = true;
#allow_unreachable_code_after_return = true;
#allow_unicode_in_comments = true; // /*🔥*/
#allow_unicode_in_strings = true; // "🔥"
#allow_unicode_in_character_literals = true; // '🔥'
#allow_unicode_in_code = true; // int 🔥 = 123;
#enforce_indentation_character = .NONE; // .TABS or .SPACES
#enforce_indentation_length = -1;
// UPPERCASE = 0x1 HELLOWORLD
// LOWERCASE = 0x2 helloworld
// BEGINUPPER = 0x4 Helloworld
// BEGINLOWER = 0x8 helloworld
// example: .UPPERCASE|.BEGINLOWER = hELLOWORLD
#enforce_struct_capitalization = .NONE;
#enforce_enum_capitalization = .NONE;
#enforce_macro_capitalization = .NONE;
#enforce_function_capitalization = .NONE;
#enforce_variable_capitalization = .NONE;
#enforce_typedef_capitalization = .NONE;
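To make the conversion rules above concrete, here is a small C sketch of three of the lossy conversions, which C performs implicitly and silently. The function names are hypothetical; the behavior shown is C's, not a statement about how TFD would convert.

```c
#include <assert.h>
#include <stdint.h>

/* i32 -> u32: the sign is lost, -1 wraps to the largest u32. */
uint32_t lose_signedness(int32_t value) { return value; }

/* u64 -> u32: the upper 32 bits are discarded. */
uint32_t lose_size(uint64_t value) { return value; }

/* f32 -> i32: the fractional part is dropped (C truncates toward zero). */
int32_t lose_fraction(float value) {
    /* Converting an out-of-range float to int is undefined in C. */
    assert(value > -2147483648.0f && value < 2147483648.0f);
    return value;
}
```

With the defaults listed above (#allow_int_signedness_loss, #allow_int_size_loss and #allow_float_to_int_loss all false), each of these return statements would presumably need an explicit cast in TFD.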
Miscellaneous modifiers and mechanics
If some C attribute equivalent isn't here, then I probably don't normally use it. That doesn't necessarily mean it shouldn't be in the language, I just never think about it. I'd need someone more knowledgeable about assembly and stuff to tell me what kind of modifiers and features are useful.
#inline, #noinline - #inline functions behave identically to macros in that the function call is replaced with the function's contents. Unlike in C, #inline functions will always inline; if one cannot be inlined (for example if it recursively calls itself), you get a compiler error. In C, the inline keyword is just a suggestion and isn't guaranteed to do anything. The compiler may inline functions that aren't marked as #inline if it thinks it's a good idea; #noinline prevents that. You can also use this from the calling site.
function foo () #inline { ... }
function bar () { ... }
function main () {
foo();
#inline bar();
}
I don't know much about linkers, so I don't know how inlining of external library functions works. I don't really care to be honest, I feel that the benefit of inlining linked functions is smaller than the benefit of inlining function code because the latter has much greater potential for optimization.
#must_receive - If the caller of a #must_receive function doesn't receive the return value, you get a compiler error. Useful when the function returns some allocated memory/object that's expected to be freed from the outside.
#optimize(.MAX_SPEED) - Used to set optimization level for an individual function. This will override the compiler setting.
#align(x) - Aligns a type or variable in memory, useful for AVX. Same syntax as #offset() except this can be a modifier to a type too.
#aligned(x) - Tells the compiler that a pointer's value is aligned, which may help it do optimizations. May cause a crash if the pointer value isn't aligned.
#persist - Variable in a function whose value persists across function calls, i.e. a global variable that's only accessible from the function. In C you would use static.
#thread_local - Variables marked with this are unique per thread.
#read_only - For variables, the data cannot be written to. Basically the same as const in C.
#warning"x", #error"x" - Can be attached to functions, macros, types, global variables, enum values, or globally to a file. Causes the compiler to give a warning or an error if they are used, useful for things that are deprecated or unfinished or broken.
typeof(x) - Expands to the type of x.
In C, ++ and -- work differently from +=1 and -=1. In TFD they're different syntax for the same thing.
&int foo;
*foo += 1; // Increments the int.
*foo ++; // Increments the int. In C this would shift the pointer and dereference for no reason.
A semicolon after most braces (functions, ifs, loops...) is optional and will not do anything.
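The ++ difference mentioned above can be shown with a short C sketch. The function names are hypothetical; the point is only that in C the parentheses decide whether the int or the pointer gets incremented.

```c
#include <assert.h>
#include <stddef.h>

/* `(*p)++` increments the pointed-to int, which is what `*foo ++` means in TFD. */
int increment_target(int *p) {
    assert(p != NULL);
    (*p)++;
    return *p;
}

/* `*p++` in C advances the pointer instead; the int is only read.
   Returns how many elements the pointer moved. */
ptrdiff_t advance_pointer(int *values) {
    assert(values != NULL);
    int *p = values;
    (void)*p++;
    return p - values;
}
```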
Additional thoughts
-
Data segments. I'm not too familiar with them, but there should be a mechanism for deciding where some data is stored, and what its properties are (real properties like whether it's read-only and what its size is, and fake properties like how data in it should be aligned, or whether it's hot or cold which could be used by the compiler for optimizations). My knowledge starts and ends at the introduction of the Wikipedia page, which seems kind of incomplete to me. String macros should probably also be able to control where their data is stored.
Ideally you should be able to create arbitrary data sections that you can read/write from as you please. You could create a big u8 array for that purpose, but that sounds like a hack compared to having a blank data section that you can get a pointer to.
Perhaps you could give the compiler some kind of layout of your desired data sections and their organization.
There's probably a few things I would know how to design better, or things I would change my mind on, if I was more knowledgeable about assembly and how programs are structured.
-
Variable groups. I'm about 85% sure that this should be in the language, but I haven't had enough time to feel it yet, and there might be a deeper reason why it's not as simple as it looks.
function test (int x) int, int {
if (x > 1000) {
return 0, 0;
}
else {
return x*10, x*20;
}
}
int a, int b = test(50);
int c = 15;
int d = 200;
c, d = d, c;
d, c = test(a);
This is basically a better version of unnamed anonymous structs. If unnamed structs work, then I imagine there shouldn't be much reason why variable groups couldn't.
function test (int x) struct { int; int; } {
if (x > 1000) {
return {0, 0};
}
else {
return {x*10, x*20};
}
}
struct { int a; int b; } = test(50);
printf("{}, {}\n", a, b);
-
I don't have a full picture of how this works at assembly level, but it's probably necessary to control whether it's valid to access variables from the parent scope from a callback.
function main () {
int count = 0;
function callback () {
count ++;
}
do_stuff(callback);
}
This is a bit ambiguous and weird and possibly error prone (especially if it dereferences pointers) because you have no idea where this callback is going, and it may be stored and called later, and you just can't use fixed memory offsets from a callback like this.
If the nested function is only called directly from the parent function, then it's a lot more straightforward, it's the callbacks that are problematic.
Perhaps function pointers could have a #synchronous modifier which communicates that the function is not stored long-term nor called asynchronously. Without that property, you can't use variables from parent scope.
function do_stuff (&function callback ()) {
callback();
}
function do_stuff_sync (#synchronous &function callback ()) {
callback();
}
function main () {
int count = 0;
function callback () {
count ++;
}
do_stuff(callback); // Error: callback may be asynchronous, variables from parent scope cannot be used.
do_stuff_sync(callback);
}
I'm trying to design something without full understanding of how it works though, so maybe this is all nonsense.
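For comparison, this is roughly how standard C handles the same situation today: since there are no nested functions, the parent's variable is passed explicitly as a context pointer. This is a sketch with hypothetical names (callback_fn, do_stuff_sync, count_up, run_counter), and it has the same lifetime problem as the idea above: it is only safe because the callback is called synchronously and not stored.

```c
#include <assert.h>
#include <stddef.h>

/* A callback plus an explicit pointer to the state it is allowed to touch. */
typedef void (*callback_fn)(void *context);

/* Calls the callback synchronously three times and never stores it. */
void do_stuff_sync(callback_fn callback, void *context) {
    assert(callback != NULL);
    for (int i = 0; i < 3; i++) {
        callback(context);
    }
}

static void count_up(void *context) {
    int *count = context;
    (*count)++;
}

int run_counter(void) {
    int count = 0;                    /* lives on run_counter's stack */
    do_stuff_sync(count_up, &count);  /* safe only because the call is synchronous */
    return count;
}
```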
-
C's text-replacement macros are known to be problematic for various reasons. TFD improves them slightly, but some of the fundamental problems like the difficulty of debugging them aren't fixed. The good thing about them is that they're very simple, which keeps the compiler simple, and doesn't require the user to know much about them. They're also extremely powerful because they can generate almost any syntax.
Ideally I would prefer a more structured macro/metaprogramming system, but I don't really know what that would look like. C-style macros combine extremes of capability and simplicity, it's very very hard to compete with them. Inline functions and compile-time functions can solve some of the things macros are needed for, but it's not quite enough.
An idea I'm most interested in is compile-time scripting. The easiest example is a function that returns its value at compile-time:
function wowza (int x, int y) #compile_time {
for (int i=0; i<10; i++) {
x *= y;
}
return x;
}
Types and maybe even variables can use #compile_time.
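To show what this would replace, here is a C sketch of the same computation. C can't run the loop at compile time, so a compile-time constant has to be written out as one expression in a macro. The macro WOWZA and the constant WOWZA_2_3 are hypothetical names for this example.

```c
#include <assert.h>

/* Runtime equivalent of the TFD function: multiplies x by y ten times. */
int wowza(int x, int y) {
    for (int i = 0; i < 10; i++) {
        x *= y;
    }
    return x;
}

/* C cannot run that loop at compile time, so the constant has to be
   spelled out as a single expression. */
#define WOWZA(x, y) \
    ((x) * (y) * (y) * (y) * (y) * (y) * (y) * (y) * (y) * (y) * (y))

/* An enum constant must be a compile-time constant expression. */
enum { WOWZA_2_3 = WOWZA(2, 3) };

_Static_assert(WOWZA_2_3 == 118098, "2 * 3^10 should be 118098");
```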
-
There's a trick I sometimes do with C arrays. It still works, but I'm bothered by the fact that TFD doesn't improve it: sending variable-length array literals to functions. In fact this requires 1 more character than in C, so in a way this is a downgrade.
function foo (&int a) {
while (*a) {
printf("Number {}\n", *a);
a ++;
}
}
foo(&([]int){1, 2, 3, 0}); // Works but ugly and annoying.
foo(&{1, 2, 3, 0}); // This would be much better, but the function doesn't receive an array so this can't be interpreted as one (if the int was some struct then this would look like a pointer to the struct).
function foo (&[]int a) { ... } // Maybe the answer?
The last example could probably work, but it's a bit weird because normally you'd expect the array type to modify the size, but this is like the array disappears and it becomes just an int pointer. It makes more sense in C because arrays are treated as pointers so the argument can be either a pointer or an array, both work. I'm not sure yet if this is a good idea, and you'd have to use the long syntax anyway if the function isn't written that way.
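For reference, the C version of the trick looks like this. It is a sketch with hypothetical function names; the compound literal decays to a pointer on its own, which is why C gets away without the extra & character.

```c
#include <assert.h>
#include <stddef.h>

/* Walks a zero-terminated int array and returns the sum of its elements. */
int sum_until_zero(const int *a) {
    assert(a != NULL);
    int total = 0;
    while (*a) {
        total += *a;
        a++;
    }
    return total;
}

/* The compound literal decays to a pointer, so no `&` is needed in C. */
int sum_of_literal(void) {
    return sum_until_zero((int[]){1, 2, 3, 0});
}
```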
-
I want 16-bit floats, but I'm unsure what their status is as far as CPU support goes. It would be a bit weird to have that type if it's not well supported and consistent. But if it's left out and the user must implement them, then the language should also have operator overloading. I haven't thought about operator overloading much, maybe that should be in the language too.
-
I may want to get rid of the ternary operator syntax and replace it with just an if/else. This would also let you use 'else if's more naturally.
int foo = if (bar) x else y;
int foo = if (bar) x else if (zip) y else z;
-
What should the size of bool be? I have a desire to think of it as a 1-bit integer, but you should have the ability to point a pointer into a boolean value. 8 bits is enough and gives quite a bit of flexibility about where the bool is. But maybe for some kind of alignment reasons, or setting an integer into a bool, it would be better if it was 32-bit or 64-bit? I think I'd have to do way more assembly or build an optimizer or language internals to be able to say. For now I'll just assume it's best to treat it as 8 bits.
-
There's some merit to a type called char because it communicates that something is meant to be text better than u8. However, from my experience it doesn't really work that way in practice. I often want to just read/write text from/to random binary, and UTF-8 is a variable-sized format, so I usually end up thinking of text as a blob of bytes, not as an array of characters.
There's also a question of type safety (&u8 shouldn't auto-cast into &char), but I don't necessarily see that as being an issue if your strings use a type like Cstring anyway.
-
There's a few reasons for prefixing signed integer types with i instead of s. Firstly, you rarely think of the words "signed integer", I think it's more common to think of just "integer" or "int" which starts with "i". Meanwhile when you think of an unsigned integer, you specify with the word "unsigned" which starts with "u". Secondly, the letter "s" also reminds of "string" and "struct", while "i" doesn't really remind of anything else. When I see "s32", it immediately makes me think of some kind of string. I might want to make a string macro that uses a 16-bit integer for its length, that's what "s16" sounds like to me.
-
I originally wanted structs and enums to not need a semicolon, but there's a problem with unnamed enums:
// Anonymous enum, semicolon needed because this is a variable.
enum {
FOO;
BAR;
} foo;
// Named enum, semicolon not needed, the compiler knows from the name that this is a type.
enum SOMETHING {
FOO;
BAR;
}
// Nameless enum.
enum {
FOO;
BAR;
}
x = 2 // Is this supposed to be an anonymous enum like the first example, or an assignment to another variable?
-
The syntax for creating string macros is basically the first idea I came up with. Sometimes the simplest answer is the best, but there's probably things to improve about it.
-
It's always annoying when you want to use a name but it's reserved by someone else. This is why most keywords are prefixed with a #. This is also the reason for renaming switch() into if()..., replacing default: with an empty case:, and part of the reason for removing the union keyword (the main reason is that I want to think of structs and unions as the same thing). I am also considering whether it would be worth it to replace case X: with some kind of symbol, such as ... X: or -> X:.
I kind of want to use # for all keywords, but it's a little annoying to type that character so I'm leaving it for more esoteric keywords and macros and modifiers.
I would probably rename struct into plex (?) if I made this language since there's a much higher chance that you might want to use "struct" as a name, but I left it as struct for now.
I don't like adding the function keyword, it feels too generic, but I can't think of anything that feels better.
-
There might be value in some kind of tagging system that could be used to toggle build rules and control behavior in certain ways, or do some kind of introspection/metaprogramming. For example something like this:
#zero_initialize_arrays @"super_hot_function" = false;
#zero_initialize_structs @"super_hot_function" = false;
#zero_initialize_primitive_types @"super_hot_function" = false;
#optimization_level @"super_hot_function" = .MAX_SPEED;
#set_property @"super_hot_function" = #inline;
function crunch_some_numbers () @"super_hot_function" {
...
}
#on_enter_function @"profiler" { profiler_start(rdtsc(), #function_id); }
#on_leave_function @"profiler" { profiler_end(rdtsc(), #function_id); }
function something () @"profiler" {
...
}
It could also be used for types, maybe even local scopes. Or just as a generic way of toggling settings and modifiers for any other thing.
I haven't thought about the details of this idea very far though.
-
TFD stands for "Tool For Doing".
Questionable ideas
Adding enum members dynamically
Something I've always wished I had was global error numbers. The value could flow all the way from some inner function to top-level code and retain its meaning. One interesting way to enable that would be to allow enums to be expanded dynamically, you could even use #enum_member_names to print the error name.
enum ERROR {
NONE,
}
function fooler () ERROR {
if (x) return #unique_enum(ERROR, BAD_THING_HAPPENED_IN_FOOLER);
if (y) return #shared_enum(ERROR, MEMORY_ALLOCATION_FAILED);
return .NONE;
}
function bar () ERROR {
if (x) return #shared_enum(ERROR, MEMORY_ALLOCATION_FAILED);
return .NONE;
}
function main () {
ERROR e = fooler();
if (e) {
[]String error_names = #enum_member_names;
printf("Error occurred: {}\n", error_names[e]);
}
}
This could potentially have uses in other places, for example plugins/customizability/extensibility, like adding a new UI module to a UI library.
I'm not sure if this is a good idea, it certainly sounds like it would complicate the compiler since no enum could be completed until all files in the program have been parsed. Maybe the language should just have an error type that has unique mechanics.
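For context, the closest C gets to this today is an X-macro list, sketched below with hypothetical names (ERROR_LIST, Error, error_name). It gives you the enum and the matching name table from one definition, but every error still has to be added to a single central list, which is exactly the part that dynamic enum members would remove.

```c
#include <assert.h>
#include <string.h>

/* The single, central list that every new error must be added to. */
#define ERROR_LIST(X) \
    X(NONE) \
    X(BAD_THING_HAPPENED_IN_FOOLER) \
    X(MEMORY_ALLOCATION_FAILED)

#define AS_ENUM_MEMBER(name) ERROR_##name,
#define AS_NAME_STRING(name) #name,

typedef enum { ERROR_LIST(AS_ENUM_MEMBER) ERROR_COUNT } Error;

static const char *const error_names[] = { ERROR_LIST(AS_NAME_STRING) };

_Static_assert(sizeof error_names / sizeof error_names[0] == ERROR_COUNT,
               "one name per enum member");

/* Returns the member name for an error value, or "UNKNOWN" if out of range. */
const char *error_name(Error e) {
    if ((int)e < 0 || (int)e >= (int)ERROR_COUNT) {
        return "UNKNOWN";
    }
    return error_names[e];
}
```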
Syntax highlighting test.
Comments: /* There's something to say */
Strings: "Hello world!"
Primitive literals: 12345, 'X', 0xffff00ff
Named constants: true, NULL, #function_id ...
Control flow: if, while, case, break, return, goto ...
Macros and compile-time: #macro, #if, #else, #import ...
Types/names/constructs: function, struct, enum, typedef
Modifiers: #private, #inline, #pack(1) ...