TFD
TFD is my vision for a programming language. It is basically an overhauled version of C, meant to be cleaner and more comfortable to use, mostly by being less pedantic and allowing you to do the same kind of things with less effort, without adding complex features or things that aren't "C-like". In this page I mostly describe it from the perspective of how it differs from C. This page also functions as documentation for myself.
C is the programming language that is closest to my ideal, but there are a lot of things about it, big and small, that are very clumsy or annoying. Just to give an example, when you define the value of a struct for a variable or function call, you have to type (Somestruct){x,y} even though there isn't any good reason why {x,y} wouldn't suffice.
Some of the problems, such as the problem above, are fixed by C++. However, almost all of the problems I have with C also exist in C++, it introduces some new problems (for example you can't type {.y=y,.x=x}, it has to be in the correct order {.x=x,.y=y}), and it just feels bad to use a language that is so overcomplicated. It makes me feel like I'm on unstable ground and that the survival and propagation of the language (and thus my codebase and programs) is complex and uncertain.
There's other languages that proclaim to fix C or be a better version of it, but all of them miss the point of what I actually want and care about.
One of the core philosophies of TFD is that it's not the programming language's place to tell the programmer what's the right way to program. That's why you can configure most behaviors and rules with compiler settings, and it doesn't withhold features or impose restrictions for ideological reasons. The main reason why a feature wouldn't exist is because it complicates the language too much (templates) or because it violates core assumptions about what kind of language TFD is and how it's going to work (garbage collection).
I want my language to be very explicit, do nothing behind my back, and be as consistent and simple (syntactically and technically) as possible.
I've gone through multiple phases of wanting to make different kinds of languages, or a simple pre-processor for C, but I always end up feeling like the costs outweigh the advantages. I am trying to implement some features with SBC in a way that doesn't require a proper language.
I may make TFD some day when I find the right motivation and time. The biggest problem is that I don't really want to learn to use LLVM, and the only alternative is to transpile into C code, but that comes with its own complications. I'm also considering learning enough x86 that I could just output an executable directly, but that couldn't be optimized at all.
Quick links:
- Base types
- Notable syntax differences from C
- Value literals
- Pointer offsets
- Arrays
- Zero initialization
- Multiple switch case values
- Nested functions
- Type definitions
- Enums
- Structs
- Nested types
- Struct templates
- Operators on structs and arrays
- Multi-break and continue
- Strings
- Here strings / multiline string literals / custom string delimiters
- Built-in constant values
- Default and named function arguments
- Type functions
- Function overloading
- Bounds checking
- Casting/converting variables
- Macros
- #on_leave, #on_enter_proc, #on_leave_proc
- Importing and building
- Macros and order of compilation
- Name visibility
- Build rules
- Miscellaneous modifiers and mechanics
- Additional thoughts and ideas
Here's some random sample code:
#import "basedefs.tfd";
#import "print.tfd";
#import "memory.tfd" mem;
struct Vec2f : {
f32 x;
f32 y;
};
struct Entity : {
enum STATE : u8 {
NONE;
ALIVE;
DEAD;
INVINCIBLE;
};
STATE state;
STATE state_previous;
Vec2f pos #inherit;
};
proc create_entity : Entity (Entity.STATE state, Vec2f pos) {
return {
.pos = pos,
.state = state,
.state_previous = .NONE,
};
}
proc main : int (int arg_count, &String args) {
#module "fireworks.tfd";
// &Entity enemies = mem.alloc(32*32*#sizeof(Entity));
&[32][32]Entity enemies = mem.alloc(#sizeof([32][32]Entity));
int fireworks_done = 0;
proc do_fireworks : ERRNUM (&Entity entity) {
ERRNUM e = spawn_fireworks(entity.x, entity.y);
if (!e) fireworks_done ++;
return e;
}
for (int y=0; y<32; y++) {
for (int x=0; x<32; x++) {
Entity enemy = create_entity(.INVINCIBLE, {(*)x, (*)y});
ERRNUM e = do_fireworks(&enemy);
if (e) {
break 2; // Break both loops.
}
// enemies[y*32+x] = enemy;
enemies.[y].[x] = enemy;
}
}
printf("Did {} fireworks!\n", fireworks_done);
return 0;
}
NOTE: ALL keywords are prefixed with # by default; basedefs.tfd defines un-prefixed names for the most common keywords. All the code below on this page will use the raw syntax, but in reality you're expected to use basedefs.tfd (or base.tfd which includes a bunch of other basics) or define names according to your preference.
Base types
u8, u16, u32, u64 // Unsigned integers. The number is the size in bits.
i8, i16, i32, i64 // Signed integers.
f16, f32, f64 // Floating point types.
void // No type.
bool // #true (1) or #false (0). Unsigned integer with the smallest directly addressable size (i.e. you must be able to point a pointer to it), effectively always u8. Booleans are not compatible with integers during type checking.
int // Integer that matches the biggest or most natural register size (usually general purpose register), almost always i64 (i32 on 32-bit machines).
uint // Unsigned version of the above.
Notable syntax differences from C
Examples of syntax in C, followed by the equivalent in TFD.
- There are no strings by default, see strings for how strings work in TFD.
- Arrays are not equivalent so they cannot be directly compared, more about them below.
- Pointers are dereferenced with * like in C, however it goes immediately before the member you're dereferencing, and has maximum precedence over all other symbols. A period (.) will dereference once if the variable on the left is a pointer.
- Think of & as "address of".
// Pointer to int.
int *thing = NULL;
&int thing = #null;
// Pointer to struct
thing->x = 123;
thing.x = 123;
// Pointer to pointer to struct
(*thing)->x = 123;
*thing.x = 123;
// Struct member pointer
*thing.x = 123;
thing.*x = 123;
// Struct member pointer to struct member pointer
*(*thing.x).y = 123; *thing.x->y = 123;
thing.x.*y = 123;
// Typedef.
typedef int Thing;
#typedef Thing : int;
// Struct.
typedef struct { int x; } Thing;
#struct Thing : { int x; };
// Union.
typedef union { int x; } Thing;
#struct Thing : #overlap { int x; };
// Enum.
typedef enum : u16 { THING_A, THING_B, THING_C } THING;
#enum THING : u16 { A; B; C; };
// Using enums.
THING thing = THING_B;
THING thing = .B; // Or THING.B
// Anonymous struct.
struct { int x; } thing;
#struct { int x; } thing;
// Anonymous enum.
enum : u16 { THING_A, THING_B, THING_C } thing = THING_B;
#enum u16 { A; B; C; } thing = .B;
// Function.
static int foo () { xxx; }
#proc foo : int () { xxx; }
// Function pointer.
int (*foo) () = NULL;
&#proc int () foo = #null;
// Import from pre-defined directories.
#include <foo.h>
#import "foo.tfd";
// Import from local path.
#include "foo.h"
#module "foo.tfd";
// Switch.
switch (foo) {
case 1:
case 2: break;
default: break;
}
#if (foo) ... {
#case 1; #next_case;
#case 2;
#case;
}
Value literals
int value = 1222333444555666777; // No need to postfix this kind of number with "LL".
int value = 0xFFAABB; // Hex value.
int value = 0b0000111100001111; // Bit value.
// All number types will completely ignore underscores (except inside the 0x or 0b prefixes). Can be used at your discretion to make the number more readable.
int value = 1_222_333_444_555_666_777;
int value = 0x_FF_AA_BB;
int value = 0b_00001111_00001111;
Character literals.
u32 value = 'X'; // 0x58
u32 value = 'Help'; // 0x706C6548
u32 value = '❤'; // 0xA49DE2
The size of the character literal must be equal to or smaller than the type. 'Hello' would give an error here because u32 is only 4 bytes. If the type is larger than the value, 0s are added to the end. The data is in text byte order, basically the equivalent of this in C:
u32 value = *(u32*)"X\0\0\0";
u32 value = *(u32*)"Help";
u32 value = *(u32*)"❤\0";
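The padding rule above can be sketched in C with a small helper. `pack_char_literal` is a hypothetical name made up for this illustration, not something TFD defines; it copies up to 4 bytes of text into a `uint32_t` in text byte order and leaves the remaining bytes as 0. It uses `memcpy` rather than the pointer cast shown above, which sidesteps alignment concerns.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: packs at most 4 bytes of text into a u32 in text
 * byte order (first character at the lowest address). Unused bytes stay 0.
 * Returns 0 on success, -1 if the text does not fit (where TFD would
 * report a compile-time error instead). */
static int pack_char_literal(const char *text, uint32_t *out)
{
    size_t len = strlen(text);
    if (len > sizeof *out)
        return -1;
    *out = 0;
    memcpy(out, text, len);
    return 0;
}
```

On a little-endian machine, packing "Help" this way gives 0x706C6548, which matches the value in the comment above.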
Pointer offsets
&u32 foo;
foo ++; // Moves the pointer by 4 (#sizeof(u32)) bytes.
foo &++; // Moves the pointer by 1 byte.
foo[2] = 500; // Modify a value from offset #sizeof(u32)*2.
&u32 bar = foo + 2; // Gets an 8-byte (#sizeof(u32)*2) offset to foo.
&u32 gar = foo &+ 2; // Gets a 2-byte offset to foo.
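For comparison, here is roughly what the two kinds of offset look like in C today. The function names are made up for this sketch. The byte-wise version has to go through a byte pointer cast, which is the clumsiness that `&+` is meant to remove.

```c
#include <stddef.h>
#include <stdint.h>

/* C counterpart of TFD's `foo + n`: moves by n elements (n * 4 bytes). */
static uint32_t *offset_elements(uint32_t *p, ptrdiff_t n)
{
    return p + n;
}

/* C counterpart of TFD's `foo &+ n`: moves by n bytes.
 * Returned as a byte pointer because the result may be misaligned
 * for uint32_t when n is not a multiple of 4. */
static unsigned char *offset_bytes(uint32_t *p, ptrdiff_t n)
{
    return (unsigned char *)p + n;
}
```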
Arrays
Arrays are treated the same way as structs, they are passed and copied by value. In C arrays are treated as a weird fake pointer.
foo.[x] will access a member of the array, foo[x] is an offset pointer dereference (same as in C). If accessing a member with a fixed number offset like 2, you can omit the brackets and just do foo.2
[4]int a;
a.1 = 123;
printf("Size in bytes is {}, it has {} ints\n", #sizeof(a), #countof(a));
#proc test2 : [4]int ([4]int x) {
#return x;
}
a = test2(a);
// The above code is identical to this:
#struct Arr4 : {
int item0;
int item1;
int item2;
int item3;
};
Arr4 s;
s.item1 = 123;
printf("Size in bytes is {}, it has {} ints\n", #sizeof(s), #sizeof(s)/#sizeof(s.item0));
#proc test1 : Arr4 (Arr4 x) {
#return x;
}
s = test1(s);
It's important to internalize the difference to C arrays because offsetting a pointer to an array will offset the pointer by the whole array size, not the item size.
[4]int array;
&[4]int a = &array;
a[1]; // a &+ #sizeof(int)*4, this will overflow the array
a ++; // a &+= #sizeof(int)*4
a.[1]; // a &+ #sizeof(int), access second item in the array. Like with structs, period will implicitly dereference once if needed.
Array's pointer type is (by default) compatible with the item's pointer type.
#proc print_floats : (int count, &f32 items) {
#for (int i=0; i<count; i++) {
printf("{} = {}\n", i, items[i]);
}
}
[100]f32 array;
print_floats(#countof(array), &array); // Even though the function wants float pointer, a float array pointer will be accepted too.
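C already has the whole-array stride described above, but only for the rarely used pointer-to-array type `int (*)[4]`. The sketch below (function names are illustrative) measures both strides, which may help when comparing `a ++` and `a.[1]` in TFD.

```c
#include <stddef.h>

/* Bytes skipped when a pointer to a whole 4-int array is incremented,
 * like `a ++` on a TFD &[4]int. */
static ptrdiff_t array_pointer_stride(void)
{
    int arrays[2][4] = {{0}};
    int (*a)[4] = &arrays[0];
    return (char *)(a + 1) - (char *)a;
}

/* Bytes between two neighbouring items inside one array,
 * like going from a.[0] to a.[1] in TFD. */
static ptrdiff_t item_stride(void)
{
    int array[4] = {0};
    int (*a)[4] = &array;
    return (char *)&(*a)[1] - (char *)&(*a)[0];
}
```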
Zero initialization
int foo; // Initialized to 0.
int foo #no_init; // Uninitialized.
Multiple switch case values
#if (foo) ... {
#case 1, 2, 3; print("1, 2, or 3!\n");
#case 4 ... 9; print("4, 9, or somewhere inbetween!\n");
#case; print("unknown...\n");
}
Nested functions
#proc check_adjacent : (int x, int y) {
int count = 0;
#proc check : (int x, int y) {
...
count ++;
}
check(x, y-1);
check(x, y+1);
check(x-1, y);
check(x+1, y);
}
#proc load_assets : () {
#proc callback : (String file_path, bool is_folder) {
...
}
read_folder_contents("/assets/images/", callback);
// You can shove it directly into function arguments, this is identical to the above except the function is anonymous (doesn't have a name).
read_folder_contents("/assets/things/", #proc (String file_path, bool is_folder) {
...
});
}
There might be restrictions in some cases, see Additional thoughts.
Type definitions
#typedef is used to create duplicate types and aliases. #struct and #enum create a type automatically if they're followed by a name.
The biggest reason (besides self-documenting code) to use typedef is to define type-checking rules:
#typedef Itemid : #strict u32;
#typedef Entityid : #relaxed u32;
- #strict types are not compatible with anything other than itself. This is the default for named enums.
- #abitstrict types are compatible with relaxed types, but not with other abitstrict or strict types.
- #relaxed types are compatible with everything except strict types. Regular integer/float values are relaxed types. Good if you want your code to be self-documenting, but don't want the compiler to be picky about your ints or whatever.
You can change the default strictnesses with build rules. Integers/floats have additional rules outside of this categorization, by default you can't set a value if there may be loss of information (f32 -> int, u32 -> u16, signed -> unsigned or vice versa).
#compatible_types can be used to override strictness rules and make types compatible with each other. Only works for types with the same size and structure.
#struct Vec2i : {
int x;
int y;
};
#struct Location : {
int horiz;
int vert;
};
#struct Dimensions : {
int width;
int height;
};
#compatible_types Vec2i, Location, Dimensions, [2]int;
#proc test : (Location pos) { ... }
Vec2i vector;
Dimensions size;
[2]int array;
test(vector + size);
test(array);
Enums
#enum COLOR : u8 {
RED;
GREEN;
BLUE;
};
#enum COOL_BITS : #bitfield {
FOO; // 0x01
BAR; // 0x02
ZYZ; // 0x04
XUL; // 0x08
};
#proc paint_the_wall : (COLOR color) {
...
}
paint_the_wall(COLOR.BLUE);
paint_the_wall(.BLUE); // Same as above.
COOL_BITS bites = .FOO | .XUL;
If #enum is immediately followed by a name, it creates a new type. If there's a name at the end, it creates an enum variable.
#enum { RED; GREEN; BLUE; } color = .GREEN;
#if (color == .RED) {
color = .BLUE;
}
Name is entirely optional, this basically creates a bunch of constant integer values.
#enum { RED; GREEN; BLUE; };
int color = RED;
Switch statements have a special modifier that requires every enum value to have a condition.
#if (color) #complete_enum ... {
#case .RED; print("Rad!");
#case .BLUE; print("Bleu!");
}
// Error: a case for .GREEN is missing from switch.
There's some compile-time constants for enums:
- #highest_enum_member(COLOR): Expands to the member of the enum with the highest value.
- #enum_member_names(COLOR,String): Expands to an array of strings (with the specified string macro) containing all the names, mapped to their equivalent integer values. This will give an error if the array exceeds a maximum configured size.
- #all_bits(COOL_BITS): Expands to a value with all the bits of all the values merged together. Meant for bitfields, but works on normal enums too.
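To show what these constants save you from, here is how the same information is usually produced in C with an "X-macro" list. Every name below (COLOR_LIST, AS_ENUM, color_names and so on) is invented for this sketch. The highest-member trick only works because the values are sequential from 0.

```c
/* Single source of truth: each X(name) entry becomes one enum member. */
#define COLOR_LIST(X) X(RED) X(GREEN) X(BLUE)

#define AS_ENUM(name) COLOR_##name,
#define AS_NAME(name) #name,
#define AS_BITS(name) | COLOR_##name

enum Color { COLOR_LIST(AS_ENUM) COLOR_MEMBER_COUNT };

/* Rough counterpart of #enum_member_names: names indexed by value. */
static const char *const color_names[] = { COLOR_LIST(AS_NAME) };

/* Rough counterpart of #highest_enum_member (sequential values only). */
enum { COLOR_HIGHEST = COLOR_MEMBER_COUNT - 1 };

/* Rough counterpart of #all_bits: every value OR-ed together. */
enum { COLOR_ALL_BITS = 0 COLOR_LIST(AS_BITS) };
```

The cost of this approach is that the enum has to be declared through a macro list, which is the kind of ceremony the built-in constants are meant to avoid.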
Structs
#struct Vec2f : {
f32 x;
f32 y;
};
#struct Thing : #pack(1) { // Members are tightly packed.
u8 foo;
f32 bar;
Vec2f pos;
};
Like with enums, you can create struct variables.
#struct { int x; int y; } thingy = {5, 20};
#if (thingy.x > 100) {
thingy.x -= 100;
}
Structs don't need to have a name at all, they can be used to describe grouping or get more precise control over padding/offsets of multiple variables.
#struct { int foo; int bar; } = {5, 20};
foo = 2000;
There are no "unions", to get the equivalent of a C union, you must manipulate the offsets of struct members. Here's some examples:
#struct Coolthing : {
f32 foo #offset(0);
i64 bar #offset(0);
};
#struct Coolthing : #overlap { // Easy way to give all members an offset of 0.
f32 foo;
i64 bar;
};
#struct Coolthing : {
f32 foo;
#struct #overlap {
f32 bar;
i32 zorg;
};
};
#struct Coolthing : { // Alternate way to define the same as above.
f32 foo;
f32 bar;
i32 zorg #offset(.bar);
};
#struct Splitnumber : {
u64 full;
u32 lower #offset(.full);
u32 upper #offset(.full+4);
};
A struct can be padded to a specific size. For example you could make a struct whose size gets increased to the nearest 32 bytes, making it easier to use 256-bit AVX on it:
#struct Simdthing : #align_size(256/8) {
u8 x;
f32 y;
};
Inherited struct members:
#struct Vec2f : {
f32 x;
f32 y;
};
#struct Tree : {
int leaf_count;
Vec2f position #inherit;
};
Tree tree;
tree.position.x = 123;
tree.x = 123; // Same as above, .x is inherited from Vec2f position.
To be clear since "inheritance" is a thing in object oriented programming, this isn't that. This has no effect on the data or struct or behavior, it only enables alternate syntax for accessing the child members.
You could also think of an inherited member this way:
#struct Vec2f : {
f32 x;
f32 y;
};
#struct Tree : {
int leaf_count;
Vec2f position;
f32 x #offset(.position.x);
f32 y #offset(.position.y);
};
Structs cannot have "private" members.
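As a point of comparison for the inherited members described above, C11 can get a similar effect by overlapping the named member with an anonymous struct inside an anonymous union. This is only a sketch of the idea, with an illustrative helper function; unlike #inherit, the member list has to be written out a second time by hand.

```c
typedef struct { float x; float y; } Vec2f;

typedef struct {
    int leaf_count;
    union {
        Vec2f position;          /* full member: tree.position.x */
        struct { float x, y; };  /* anonymous (C11): enables tree.x, tree.y */
    };
} Tree;

/* Writes through the long path and reads back through the short one. */
static float set_x_through_position(float value)
{
    Tree tree = {0};
    tree.position.x = value;
    return tree.x;
}
```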
Struct members can have default values. Whenever you create a variable with the struct, the members are secretly assigned to the default values.
#struct Thing : {
int x = 14;
int y = 500;
};
Thing foo;
printf("{},{}\n", foo.x, foo.y); // "14,500"
Default values can be particularly helpful when defining struct variables:
// Without default values (C-style):
#struct {
int x;
int y;
} foo = {
.x = 14,
.y = 500,
};
// With default values:
#struct {
int x = 14;
int y = 500;
} foo;
Braces will be interpreted as the relevant struct/array type based on context, in C you would have to put a type cast before them.
#struct Vec2f : {
f32 x;
f32 y;
};
#proc test_vecs : (Vec2f a, Vec2f b) { ... }
#proc test_array : ([4]int a) { ... }
Vec2f v;
v = {5,22};
test_vecs(v, {1,2});
test_array({1, 2, 4, 5});
Members can be unnamed, this can be used for alignment.
#struct Test : {
u16 a;
u16; // Occupies space but has no name.
u32 b;
};
Miscellaneous examples of variable grouping with structs:
#proc get : #struct { int x; int y; } () {
#return {1, 2};
}
#proc give : (#struct { int x; int y; } foo) {
print("Gave {} and {}!\n", foo.x, foo.y);
}
#proc main : () {
#struct { int x; int y; } foo = get();
printf("{},{}\n", foo.x, foo.y);
give(foo);
give({.x = 123});
#struct { int x; int y; } = get();
printf("{},{}\n", x, y);
y = get().y;
}
Nested types
Types can be nested inside structs and enums. A nested type doesn't do anything by itself, it's just a regular type with a special namespace. You can nest types just for the heck of it, but the intention is to use it for the members of the parent struct.
#struct Vec2f : {
f32 x;
f32 y;
};
#struct Entity : {
#enum STATE : u8 {
NONE;
ALIVE;
DEAD;
INVINCIBLE;
};
STATE state;
STATE previous_state;
Vec2f pos;
};
// Nested types can be used from the outside like this.
Entity.STATE state = .ALIVE;
state = .DEAD;
// Surprise: ALL members can be used as types, not just nested types.
Entity.pos position = {15, 20};
Vec2f position = {15, 20}; // Same as above.
Struct templates
#struct Array : (T) {
u64 count;
&T data;
};
#proc print_ints : (&Array(int) a) {
#for (u64 i=0; i<a.count; i++) {
printf("{} = {}\n", i, a.data[i]);
}
}
#proc main : () {
Array(int) a;
print_ints(&a);
}
Note: this is only a convenient way to define alternate versions of a struct, there are no template functions.
Operators on structs and arrays
#struct Vec2f : {
f32 x;
f32 y;
};
Vec2f foo;
Vec2f bar;
#if (foo == bar) {
...
}
Vec2f zip = foo + bar;
Vec2f lol = foo * 100;
// Below are the manual equivalents to the above.
Vec2f foo;
Vec2f bar;
#if (foo.x == bar.x && foo.y == bar.y) {
...
}
Vec2f zip = {
.x = foo.x + bar.x,
.y = foo.y + bar.y,
};
Vec2f lol = {
.x = foo.x * 100,
.y = foo.y * 100,
};
[2]f32 foo;
[2]f32 bar;
#if (foo == bar) {
...
}
[2]f32 zip = foo + bar;
// Below are the manual equivalents to the above.
[2]f32 foo;
[2]f32 bar;
#if (foo.0 == bar.0 && foo.1 == bar.1) {
...
}
[2]f32 zip = {
.0 = foo.0 + bar.0,
.1 = foo.1 + bar.1,
};
// You could also think of it as a loop:
#for (int i=0; i<#countof(zip); i++) {
zip.[i] = foo.[i] + bar.[i];
}
Operators can be used across types if they are made compatible with #compatible_types and have the same member types/offsets.
#struct Vec2i : {
int x;
int y;
};
#struct Dimensions : {
int width;
int height;
};
#compatible_types Vec2i, Dimensions, [2]int;
[2]int array;
Dimensions size;
Vec2i pos = array + size;
#if (array == pos) {
...
}
Multi-break and continue
#for (...) {
#for (...) {
#if (y == 10) #continue; // Continues the inner loop.
#if (x == 10) #continue 2; // Breaks the inner loop and continues the outer loop.
#if (x+y == 1000) #break 2; // Breaks both loops.
}
}
Break works on scopes:
{
{
#if (y == 10) #break #scope; // Basically goto to the end of the current scope.
#if (x == 10) #break #scope 2; // Same except the outer scope.
}
}
By putting a label before a loop/scope, you can use break on it.
looplers: {
outer: #for (...) {
inner: #for (...) {
#if (y == 10) #break inner;
#if (x == 10) #break outer;
#if (x+y == 1000) #break looplers;
}
}
}
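In C, the usual stand-in for breaking out of more than one loop is a goto to a label placed after the outer loop. The function below is a made-up example of that pattern, shown only to make clear what `#break 2` replaces.

```c
/* Counts the cells visited before x + y first reaches `limit`,
 * scanning a size-by-size grid row by row. */
static int cells_before_sum(int size, int limit)
{
    int visited = 0;
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            if (x + y >= limit)
                goto done;   /* what `#break 2` would express in TFD */
            visited++;
        }
    }
done:
    return visited;
}
```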
Strings
TFD has no strings by default. In order to use a string, you must first define what kind of string you want. The language comes with a library #import "string.tfd"; which defines a string type, all standard libraries use it too.
String constants are created with a "string macro" system. If the macro is not explicitly used, it is automatically picked based on the context.
String foo = "Hello world"; // Struct with length and data pointer.
Cstring bar = "Hello world"; // Pointer to u8, a 0 is appended to the end of data.
// The string literals above are actually using string macros. The macro is inferred based on the type of the variable, but you can use the macro explicitly:
String foo = Stringxxx"Hello world";
Cstring bar = Cstringxxx"Hello world";
These two string types look something like this:
#struct String : {
i64 length;
&u8 data;
};
#typedef Cstring : #strict &u8;
The macros are created like this (note that these work somewhat differently from regular #macros):
#macro Stringxxx"" \
#inferred_type String \
#place {.length=#bytes,.data=#data}
#macro Cstringxxx"" \
#inferred_type Cstring \
#place #data \
#append_invisibly u8 0
- #inferred_type = the type that causes this macro to be inferred. If this is not defined, then the macro must always be used explicitly. 2 string macros cannot use the same inferred_type, but by typedefing a new one like Cstring here, it is only inferred from Cstring type and thus won't conflict with a string macro that's inferred from &u8.
- #place = the string constant will be replaced with this.
- #append #prepend = Append or prepend something to the string data.
- #append_invisibly #prepend_invisibly = Same as above, except these do not increase the value of #bytes or #characters.
- #align_size = Works the same way as with structs, this will pad the data with 0s until it aligns to the desired size.
- #bytes #characters #data = integer for the number of bytes, integer of the number of characters, and pointer to the string data. The data is UTF-8 by default but can be inserted in a few different formats, for example #data_utf16. #data_ascii can be used to enforce ascii-only content.
Note 1: I added "xxx" to the macro names for clarity, in reality they would use the same name as the type.
Note 2: This syntax for defining a string macro is just the first idea I came up with. Sometimes the simplest answer is the best, but there's probably a better way to do it.
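For reference, the closest thing C offers to the String macro above is a function-like macro around a compound literal. `STRING_LIT` and `hello_length` are names invented for this sketch. It has to be written out at every use, because C cannot infer it from the variable's type the way #inferred_type would.

```c
#include <stdint.h>

typedef struct {
    int64_t length;             /* byte count, excluding the terminator */
    const unsigned char *data;
} String;

/* Hypothetical counterpart of the String macro. Works on string literals
 * only: sizeof must see the array, not a pointer to it. */
#define STRING_LIT(s) \
    ((String){ (int64_t)sizeof(s) - 1, (const unsigned char *)(s) })

static int64_t hello_length(void)
{
    String s = STRING_LIT("Hello world");
    return s.length;
}
```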
Here strings / multi-line string literals / custom string delimiters
2 backticks `` can be used in place of normal quotations " to create a multi-line string. Helpful if you want to include a lot of text, like help texts or GPU shaders or something with a lot of normal quotation marks.
Indentation is ignored up to the same level as the line that the ending backticks are on. 1 empty line immediately before and after the backticks is also ignored.
String sometext = ``
Hello world,
this is "a story" about
coding and stuff!
``;
// Defining the string above in C would be done like this:
char* sometext =
"Hello world,\n"
"this is \"a story\" about\n"
"coding and stuff!";
The backticks actually create a special delimiter, you can optionally add a word inside of them:
String sometext = `STR`This is a cool `piece of text` with some ``backticks`` all over it, they won't end the string because of the custom delimiter word.`STR`;
Explicit string macros work the same way as with normal strings:
Cstring sometext = Cstringxxx`STR`This is a "cool" piece of text..!`STR`;
Built-in constant values
- #line: Integer for the current line number.
- #file_name: String for the current file's name.
- #file_path: String for the current file's relative path (not including name).
- #file_full_path: Same as #file_path except the full system path.
- #proc_id: Integer that is unique to each function. This is 1 in the first function, 2 in the second function, and so on. There's no particular order, the numbers are likely in the order that the compiler finds the functions in.
- #proc_count: Integer for the total number of functions in the program.
- #proc_name: String for the name of the current function.
- #proc_name_array: Array of Strings containing every function name, indexed according to the function's #proc_id.
- #unique(foo): Every time this is used in source code, it resolves into a different integer value for the given key. It starts from 0 and increments each time it's used. This is similar to __COUNTER__ in C except you have multiple counters by using different keys. To be clear, this is compile-time, the values do not change at runtime. You can optionally add a value, #unique(foo,12), which will set the counter to the given value.
Functions may be removed during compilation (e.g. non-exported functions that aren't used anywhere). Removed functions will have empty names in #proc_name_array, or may be removed entirely (in which case they also don't contribute to #proc_id or #proc_count).
Default and named function arguments
#proc foo : int (int a, int b = 100, int c = 1337) {
#return a * b * c;
}
// All of the following function calls are the same.
int hi = foo(5, 100, 1337);
int hi = foo(5, 100);
int hi = foo(5);
int hi = foo(5,, 1337);
int hi = foo(5, .c=1337);
Type functions
These are just normal functions with special syntax for accessing them. The first argument's type can be #base (or a pointer to it), which will be treated in a special way.
#proc f32.add : (&#base a, f32 b) {
*a += b;
}
f32 something = 3.14159;
f32.add(&something, 100); // Call the function manually.
something.add(100); // Same as above, the actual purpose of a type function is to use it like this.
Type functions can be chained from the function's return value.
#proc f32.mul : f32 (#base a, f32 b) {
#return a * b;
}
f32 a = 3.14159;
f32 b = a.mul(10).mul(5).mul(123);
Function overloading
Every function in a namespace must have a unique name, but they can be overloaded separately. There cannot be a function with the same name as the overloaded name.
#proc foo_int : (int x) {
...
}
#proc foo_float : (f32 x) {
...
}
#overload foo foo_int;
#overload foo foo_float;
#proc main : () {
int x = 123;
f32 y = 1.5;
foo(x); // Calls foo_int
foo(y); // Calls foo_float
}
This is similar to using a _Generic macro in C, the main difference is that with #overload you can define each overload separately.
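For readers who haven't used it, this is roughly what the C11 _Generic version of the example looks like. The functions return a tag number here purely so the dispatch is easy to observe; that detail is an addition for this sketch.

```c
static int foo_int(int x)     { (void)x; return 1; }
static int foo_float(float x) { (void)x; return 2; }

/* Every overload must be listed in this one macro, which is the
 * limitation that #overload is meant to lift. */
#define foo(x) _Generic((x), int: foo_int, float: foo_float)(x)
```

Adding a third overload means editing the macro itself, so a library cannot extend `foo` from another file without redefining it.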
Bounds checking
When you access pointers or arrays, the bounds will be checked at compile-time and you get an error if you overflow the length. If the index and/or length aren't known at compile-time, a bounds check is done at runtime to make sure the index is within bounds. Runtime bounds checks can be disabled with compilation options.
#struct Coolarray : {
int count;
&int data #counted_by(.count);
};
Coolarray a;
a.data = mem.alloc(10*#sizeof(int));
a.count = 10;
a.data[10] = 123; // Compile-time error: index 10 is out of bounds.
#proc test_me : (int count, &int data #counted_by(count)) {
data[10] = 123; // Runtime error: index 10 is out of bounds.
}
test_me(a.count, a.data);
All bounds checks are attempted at compile-time regardless of compiler settings, and if possible, it won't be done at runtime.
- #index = can be used in the following conditions, this is substituted with the index that you access the array/pointer with.
- #bounds_check(condition) = modifier for pointers, the condition must pass or else you get an error. Can use any integer variables and constants, for example #bounds_check(#index <= foo*bar-8).
- #counted_by(x) = equivalent to #bounds_check(#index<x). This is just for convenience since this is the most common use-case.
- #always_check_bounds, #never_check_bounds = modifier for struct types and variables, can be used to override the compiler setting for whether to check bounds.
Arrays automatically have #counted_by(#countof(a)).
Bounds checking is enabled by default, but can be toggled on/off whenever, in a single file, in a single function, in a single scope... You can also force compile-time checking off if you really want to.
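As a rough picture of what a runtime #counted_by check amounts to, here is the same Coolarray with the check written out by hand in C. `coolarray_set` is a hypothetical helper, and it returns an error code where TFD would raise a runtime error. The negative-index test is an assumption of this sketch, since the text only defines the check as #index<x.

```c
typedef struct {
    int count;
    int *data;      /* conceptually: #counted_by(.count) */
} Coolarray;

/* Hypothetical checked store: returns 0 on success, -1 when the index
 * is outside [0, count). */
static int coolarray_set(Coolarray *a, int index, int value)
{
    if (index < 0 || index >= a->count)
        return -1;
    a->data[index] = value;
    return 0;
}
```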
Casting/converting variables
// Conversion (pick only 1).
f32 x = (f32 #convert)y; // Properly converts y into the closest equivalent x.
f32 x = (f32 #place_bits)y; // This just slaps the bits in without converting anything, this probably won't actually become a valid f32. This disables all the other rules below.
// float -> int rounding (pick only 1). NOTE: these only work when converting floats to ints, it does nothing for float-to-float or int-to-int conversions.
i32 x = (i32 #floor)y; // Will floor floats.
i32 x = (i32 #ceil)y; // Will ceil floats.
i32 x = (i32 #round)y; // Will round floats.
i32 x = (i32 #cut_decimals)y; // Will round towards 0 by removing the decimals, basically positive and negative values round differently.
// Range checking (pick only 1).
u8 x = (u8 #check)y; // Makes sure u8 can contain the information from y.
u8 x = (u8 #no_check)y; // This does not do aforementioned check.
u8 x = (u8 #clamp)y; // y will be clamped to the range of u8, so if y is 300, it will become 255.
// Lazy cast, casts to whatever type makes the compiler not complain, in this case into an int.
int x = (*)y;
int x = (int)y;
int x = (int #convert #no_check #cut_decimals)y; // Same as above because these are the default casting settings. They can be changed with compiler options.
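To make a few of the modifiers above concrete, here is what they would have to expand to if written by hand in C. All function names are invented for this sketch, and the #check variant returns a flag where TFD would report an error.

```c
#include <stdint.h>

/* Rough equivalent of (u8 #clamp)y for an int source. */
static uint8_t clamp_to_u8(int y)
{
    if (y < 0)   return 0;
    if (y > 255) return 255;
    return (uint8_t)y;
}

/* Rough equivalent of (u8 #check)y: returns 1 and stores the value
 * if it fits in a u8, otherwise returns 0 and leaves *out alone. */
static int checked_to_u8(int y, uint8_t *out)
{
    if (y < 0 || y > 255)
        return 0;
    *out = (uint8_t)y;
    return 1;
}

/* (i32 #cut_decimals)y: a plain C cast already truncates toward zero. */
static int32_t cut_decimals(float y)
{
    return (int32_t)y;
}

/* (i32 #floor)y: truncate, then step down when truncation rounded up
 * (negative non-integers). Assumes y is within the range of int32_t. */
static int32_t floor_to_i32(float y)
{
    int32_t t = (int32_t)y;
    return ((float)t > y) ? t - 1 : t;
}
```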
Macros
A macro is a text replacement, it gets replaced with its contents wherever it is used. For the most part, macros work exactly like in C. TFD macros are local to their own scope.
#macro something "Global!"
#proc foo : () {
#macro something "Local!"
print(something); // "Local!"
}
#proc bar : () {
print(something); // "Global!"
}
The arguments can optionally have types.
#macro something(foo, f32 x, f32 y) (x*y + foo)
Unlike C, the arguments are captured more similarly to function arguments.
// C
#define foo(x) ...
foo({1, 2, 3}) // Error: foo takes 1 argument, but 3 provided.
// TFD
#macro foo(x) ...
foo({1, 2, 3}) // No problem.
Multi-line macros can be made by "escaping" line breaks with \, or by wrapping the contents inside #{ and #}.
#macro complicated_macro_1 \
#if (x) { \
foo(); \
bar(); \
}
#macro complicated_macro_2 #{
#if (x) {
foo();
bar();
}
#}
Arguments can also be wrapped with #{ #} to more easily input arbitrary text/code into the macro.
#macro funny_loop(condition, increment, inner_code) #{
#for (int i=0; condition; i+=increment) {
inner_code
}
#}
funny_loop(i<100, 2, #{
printf("This is a macro loop!\n");
printf("The number is {}\n", i);
#})
## can be used as a void space to isolate arguments without separating them with a space, it's mostly used to connect arguments to something else and dynamically creating names.
#macro foo(x, y) 999##x##y
printf("I ate {} cakes.\n", foo(100, 50)); // "I ate 99910050 cakes."
#x can be used to place an argument as a string.
#macro foo(x) printf("{} = {}\n", #x, x)
foo(2 * 10); // "2 * 10 = 20"
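Both operators behave the same way in the C preprocessor, so the two examples above can be checked against plain C. The macro names below are made up for this sketch.

```c
/* Token pasting: 999, x and y are glued into a single number token. */
#define PASTE_CAKES(x, y) 999##x##y

/* Stringification: #x turns the argument's source text into a string. */
#define SOURCE_TEXT(x) #x
```

With these definitions, `PASTE_CAKES(100, 50)` is the integer 99910050 and `SOURCE_TEXT(2 * 10)` is the string "2 * 10".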
#on_leave, #on_enter_proc, #on_leave_proc
#on_leave is a special variation of #macro that automatically places its contents everywhere that the scope ends.
#proc main : () {
&void data = mem.alloc(1000);
#on_leave mem.free(data);
#if (x) #return; // mem.free(data) is inserted here.
#for (int i=0; i<100; i++) {
&Thing thing = get_thing();
#on_leave release_thing(thing);
#if (x) #continue; // release_thing(thing) is inserted here.
#if (y) #break; // release_thing(thing) is inserted here.
#if (z) #return; // release_thing(thing) and mem.free(data) are inserted here.
// release_thing(thing) is inserted here.
}
// mem.free(data) is inserted here.
}
#on_enter_proc and #on_leave_proc are also special variations of #macro, they automatically place their contents at the beginning and the end of functions.
#on_enter_proc printf("Hello! {}\n", #proc_name);
#on_leave_proc printf("Bye! {}\n", #proc_name);
#proc testfunc : (int x, int y) {
x += y * 2;
#if (x > 1000) {
#return;
}
printf("x={} y={}\n", x, y);
}
The code above would equate to the following:
#proc testfunc : (int x, int y) {
printf("Hello! {}\n", #proc_name);
x += y * 2;
#if (x > 1000) {
printf("Bye! {}\n", #proc_name);
#return;
}
printf("x={} y={}\n", x, y);
printf("Bye! {}\n", #proc_name);
}
The above example is slightly misleading because if you do #return foo(), the #on_leave_proc contents must be placed after the function call, so you can't just think of it as going before the whole return statement.
This code will not be added to functions marked as #inline or #no_inject.
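For comparison, the common way to get the effect of #on_leave in plain C is to route every exit through a cleanup label by hand. The function and its parameters below are invented for this sketch; `fail_early` stands in for the `#if (x) #return;` path and `frees_done` only exists so the cleanup can be observed.

```c
#include <stdlib.h>

/* Returns 0 on success, -1 on failure. */
static int do_work(int fail_early, int *frees_done)
{
    int result = -1;
    void *data = malloc(1000);
    if (data == NULL)
        return -1;            /* nothing to clean up yet */

    if (fail_early)
        goto leave;           /* early exit still reaches the cleanup */

    result = 0;

leave:
    free(data);               /* what `#on_leave mem.free(data);` inserts */
    (*frees_done)++;
    return result;
}
```

The manual version works, but each new early exit has to remember to jump to the label, which is the bookkeeping #on_leave takes over.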
Importing and building
To build a program, simply give the compiler a starting code file. All the options relevant to building a program must be defined through special global variables (list of options below).
There's no forward declaration or headers like in C. Code files are imported directly with #module, which gives access to that file from the current file. Unlike C's #include, which just copy-pastes the file contents, the file becomes a self-contained object.
#module "coolarray.tfd";
Coolarray test;
init_array(&test);
You can also import a file into its own namespace:
#module "coolarray.tfd" ca;
Coolarray test; // Error: Coolarray is undefined.
init_array(&test); // Error: init_array is undefined.
ca.Coolarray test;
ca.init_array(&test);
#import works the same way, except the path is relative to library directories (mostly the compiler's standard library directory) instead of your project's directory. It also implicitly has #no_everywheres enabled; more about that in Name visibility. #import is meant for re-usable/third-party libraries that you're "importing" into your program, while #module is meant for separating different parts of your own project.
#import "string.tfd";
String test = "Hello world";
#paste can be used if behavior identical to #include from C is desired. It acts exactly as if the file's contents were inserted here, which means you can use it multiple times.
#paste "some_code.tfd";
#file can be used to place the contents of a file into an array or string.
[]u8 file = #file "something.txt";
#for (int i=0; i<#countof(file); i++) {
printf("Char {c}\n", file.[i]);
}
All of the above can be used in any normal scope.
#proc foo : () {
#module "coolarray.tfd";
Coolarray test;
}
#proc bar : () {
Coolarray test; // Error: Coolarray is undefined.
}
Macros and order of compilation
Since TFD has no header files or forward declaration, it must be compiled non-linearly. The compiler may run into a name that it doesn't know about yet (it's later in the same file, or in another file), so it has to defer that part until later.
The biggest complication is compile-time ifs (#ct_if), because they change what code actually becomes part of the program and what doesn't. TFD has a "pre-processor" like C, but it encompasses more things and is also non-linear.
#ct_if FOO < 32
#struct Thing : {
int x;
};
#ct_else
#struct Thing : {
f32 x;
};
#ct_end
If FOO is not defined, the compiler either stops at the #ct_if or jumps over it and solves it later when more macros have been defined.
#defined checks the presence of the macro itself. Since the compiler doesn't know whether the name is waiting to be defined in another file, or not defined at all, this can lead to #ct_ifs being parsed in the wrong order. To aid with this, #defined must be preceded by a source hint: either #now, #maybe, or #from.
// Will be resolved immediately. FOO is assumed to be either defined by the time the compiler gets here (mostly global build settings), or not defined at all.
#ct_if #now #defined FOO
#ct_end
// If FOO is not defined, this will wait for the other files that are imported into this file. This will only resolve into false if none of the other files can continue parsing (they're waiting on a similar condition) and FOO has still not been defined.
#ct_if #maybe #defined FOO
#ct_end
// Same as #maybe, except only waits for the given file, not all the other files. This can be used for all macro checks, not just #defined.
#ct_if #from "something.tfd" #defined FOO
#ct_end
The compiler remembers every #defined that previously resolved into false, and gives an error if that macro is defined later. This should only happen if the files were parsed in the wrong order or there's some kind of circular paradox.
TODO: This is a bit confusing for the user of the language, because it adjusts the compiler's parsing order, which has nothing to do with your program, or even with the compiler's behavior in the way a setting would.
Having to always define a hint seems annoying, but making them optional would make it much more likely for a bunch of problems to arise. Global settings may toggle parts of the code that have highly cascading effects, but are never defined. Meanwhile, non-global macros probably have a fairly good chance of not being defined by the time the compiler runs into a #ct_if that asks for them.
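For contrast, C's pre-processor is strictly linear, so the same check can silently give different answers depending on where it sits in the file, and an undefined name in a plain #if quietly evaluates to 0. A small sketch in C; the FOO_SEEN_* and BAR_IS_SMALL names and the accessor functions are made up for illustration.

```c
/* FOO is not defined yet, so this check resolves to "not defined". */
#if defined(FOO)
#define FOO_SEEN_EARLY 1
#else
#define FOO_SEEN_EARLY 0
#endif

#define FOO 16

/* The same check now gives the opposite answer, with no warning. */
#if defined(FOO)
#define FOO_SEEN_LATE 1
#else
#define FOO_SEEN_LATE 0
#endif

/* BAR is never defined; inside #if it is treated as 0, so 0 < 32 is true. */
#if BAR < 32
#define BAR_IS_SMALL 1
#else
#define BAR_IS_SMALL 0
#endif

int foo_seen_early(void) { return FOO_SEEN_EARLY; }
int foo_seen_late(void)  { return FOO_SEEN_LATE; }
int bar_is_small(void)   { return BAR_IS_SMALL; }
```

This is the situation the #now, #maybe and #from hints are meant to make explicit: in C nothing tells you that the first check was answered too early.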
Name visibility
The visibility of every global name in a file can be controlled with 3 keywords which affect whatever is defined after them.
#public names are visible when someone imports this file. This is the default.
#private names are not visible (unless this file is imported with #force).
#everywhere names are visible to files that are imported from the current file, so it effectively injects the name into other files. You could also think of this as defining a new compiler-constant.
#private #struct Thing : {
int x;
int y;
};
#private #proc test : () {
print("Testing a thing {}\n", foo);
}
#private int foo = 123;

#everywhere #module "string.tfd";
#everywhere #macro program_name "My awesome project"
// cool_array.tfd will implicitly import string.tfd, and the program_name macro will be usable there. The same is true for all files that cool_array.tfd imports.
#module "cool_array.tfd";
// boring_array.tfd will neither import string.tfd, nor have access to program_name.
#module "boring_array.tfd" #no_everywheres;
// #import implicitly has #no_everywheres.
#import "third_party_library.tfd";
// ...but you can change that.
#import "hackable_library.tfd" #include_everywheres;
Note: #everywheres will get "baked in" to a file when the file is first parsed. Importing the same file again does nothing, so you can't import it with different #everywheres later. #everywhere #macros also cannot be #undefined later. #everywhere names are intended for project-wide settings and libraries that you want available everywhere. It's better to compare them to the -D compiler option in C, not to #define.
If some library is over-using #private or you want to get more access than was intended, you can forcibly get access to all the names by using #force. This will treat all #private names as #public.
#import "coolarray.tfd" #force;
Functions can be linked from a pre-compiled library.
// This function will be visible if you compile an object file or DLL.
#proc foo : () #export { ... }
// The other side of #export: this function comes from a compiled library at the linking stage.
#proc bar : () #external;
If both a function with #export and a function declaration with #external exist in the same program, the latter is assumed to be a header for the former and they must be identical. If you're making a library, you can separate the types, function declarations and macros into a separate file from the functions, just like with C.
Build rules
These are similar to compile-time #everywhere variables that the compiler uses directly to control its behavior. They can be changed at any time, and like macros, they're bound to scopes, so changing them at the start of a function will revert them at the end of the function, and putting one at the end of a file doesn't do anything.
The following options only have effect when used in the starting file, but the values can be read from other files.
#exe_name = "coolprogram"; // .exe or .dll may be added on Windows depending on #build_type.
#exe_path = "release/bin";
#exe_icon = "res/icon.png";
#build_type = .EXECUTABLE; // .EXECUTABLE .DYNAMIC_LIBRARY .STATIC_LIBRARY
#add_linked_library("Gdi32", "User32"); // Equivalent to -lGdi32 -lUser32 in GCC.
#add_linked_library_path("/foobar/lib"); // Equivalent to -L"/foobar/lib" in GCC.
#add_import_path("/foobar/include"); // Equivalent to -I"/foobar/include" in GCC.
#remove_unused_functions = #true; // Will delete any functions from the program that aren't called from anywhere else and that don't have #export.
#remove_unreached_functions = #false; // Similar to above, but a search is done starting from the main function to check if functions are reached from it.
If you want custom values, use macros with #everywhere before importing any other files:
#everywhere #macro program_version 123
#everywhere #macro program_name "Cool Program"
These affect how the program is compiled and what's in it. These are scoped, so you can change them inside an individual function or any scope. Files imported from this file will inherit the current value.
#optimization_level = .MAX_SPEED;
#runtime_bounds_checking = #true;
These are default preference-based rules that can be modified according to your personal preference. It's not recommended to change these, but you can. One of the core principles of TFD is that it's not the language designer's job to tell the programmer what's the right way to program. It can only make suggestions through the default values.
Unlike the settings above, libraries imported with #import will have their own self-contained rules for everything below unless you import them with #include_everywheres.
#default_type_visibility = #public;
#default_macro_visibility = #public;
#default_function_visibility = #public;
#default_global_variable_visibility = #private;
#default_import_visibility = #private;
#default_typedef_strictness = #abitstrict;
#default_enum_strictness = #strict;
#allow_int_signedness_loss = #false; // i32 -> u32
#allow_int_signedness_lossless = #true; // u8 -> i16
#allow_int_size_loss = #false; // u64 -> u32, u32 -> i32
#allow_int_size_lossless = #true; // u16 -> u32
#allow_float_size_loss = #false; // f64 -> f32
#allow_float_size_lossless = #true; // f32 -> f64
#allow_float_to_int_loss = #false; // f32 -> i32
#allow_int_to_float_loss = #false; // i32 -> f32
#allow_int_to_float_lossless = #true; // i8 -> f32
#auto_cast_from_void_pointers = #true; // &void -> &int
#auto_cast_to_void_pointers = #true; // &int -> &void
#auto_cast_from_array_pointer = #true; // &[]int -> &int
#auto_cast_to_array_pointer = #true; // &int -> &[]int
#bool_only_accepts_true_false = #true; // If true, bool will not accept integers, improves type checking when calling functions.
#treat_true_false_as_int = #false; // If true, 'true' and 'false' can be used more freely, for example you can set them into an 'int' variable without casting.
#default_casting_behavior = #convert #no_check #cut_decimals;
#allow_lazy_cast = #true; // x = (*)y
#completion_of_enums_on_cases = #partial_enum; // #partial_enum, #complete_enum
#untyped_enum_size = .SMALLEST; // What size should enums be if they don't have a type.
#aligned_members_must_match_struct_align = #true; // If you have a struct member with align(16) in a struct whose size is not divisible by 16, you get an error.
#default_struct_packing = .POWER_LESS_OR_EQUAL_8_OR_8; // If the variable is 2 bytes, it's aligned to 2 bytes, if 4 then aligned to 4, if 3 then to 4, if bigger than 8 then to 8.
// .POWER always aligns to the next power of 2, for example a 17-byte struct aligns to 32. .LESS_OR_EQUAL_X aligns to any size, not just powers of 2.
#non_inferrable_string_macro = String; // In some situations the string type can't be inferred from context, in that case this string macro is used.
#max_enum_member_names_count = 16_384; // If you use #enum_member_names and the array would be longer than this, you get an error.
#max_enum_member_names_size = 1_000_000; // Same as above except byte size.
#zero_initialize_arrays = #true;
#zero_initialize_structs = #true;
#zero_initialize_primitive_types = #true;
#maintain_zeroed_struct_padding = #true; // If false, struct padding may be left uninitialized, and arbitrary data may be written to it, depending on what the compiler thinks is faster to do.
#assume_zeroed_struct_padding = #true; // If true, some optimizations may be made, for example equality of structs can be checked across members even if there's padding in-between them. Programmer must ensure that uninitialized data isn't used for structs.
#allow_stack_alloc = #true; // void* foo = #stack_alloc(x);
#allow_variable_length_arrays = #true; // [x]int foo; where x is a variable, not a compile-time constant.
#allow_assignment_in_conditions = #false; // if (x = foo()) ...
#allow_assignment_in_statement = #false; // x = foo[y=foo()];
#allow_increment_decrement_in_conditions = #false; // if (x++) ...
#allow_increment_decrement_in_statement = #false; // x = foo[y++];
#allow_redeclaration_of_name_from_parent_scope = .YES; // .YES or .NO or .SAME_TYPE_ONLY
#allow_redeclaration_of_name_from_same_scope = .NO; // .YES or .NO or .SAME_TYPE_ONLY
#allow_unused_variables = #true;
#allow_unreachable_code_after_return = #true;
#allow_unicode_in_comments = #true; // /*🔥*/
#allow_unicode_in_strings = #true; // "🔥"
#allow_unicode_in_character_literals = #true; // '🔥'
#allow_unicode_in_code = #true; // int 🔥 = 123;

#enforce_indentation_character = .NONE; // .TABS or .SPACES
#enforce_indentation_length = 0;
#enforce_open_brace_placement = .NONE; // .SAME_LINE or .NEW_LINE
#enforce_else_placement = .NONE; // .SAME_LINE or .NEW_LINE
#enforce_close_brace_placement = .NONE; // .MATCH_OPEN_BRACE_LINE
#enforce_unbraced_code_placement = .NONE; // .SAME_LINE or .NEXT_LINE_PLUS_INDENT or .FORBID, this refers to foo in if (x) foo;
// UPPER_CASE = 0x1 HELLOWORLD
// LOWER_CASE = 0x2 helloworld
// BEGIN_UPPER = 0x4 Helloworld
// BEGIN_LOWER = 0x8 helloworld
// FORBID_UNDERSCORE = 0x10
// example: .UPPER_CASE|.BEGIN_LOWER = hELLOWORLD
#enforce_struct_capitalization = .NONE;
#enforce_enum_capitalization = .NONE;
#enforce_macro_capitalization = .NONE;
#enforce_function_capitalization = .NONE;
#enforce_variable_capitalization = .NONE;
#enforce_typedef_capitalization = .NONE;
Miscellaneous modifiers and mechanics
If some C attribute equivalent isn't here, then I probably don't normally use it. That doesn't necessarily mean it shouldn't be in the language, I just never think about it. I'd need someone more knowledgeable about assembly and compiler optimizations and stuff to tell me what kind of modifiers and features are useful.
#inline, #noinline - #inline functions behave identically to macros in that the function call is replaced with the function's contents. Unlike C, #inline functions will always inline; if one cannot (for example if it recursively calls itself), you get a compiler error. In C, the inline keyword is just a suggestion and isn't guaranteed to do anything. The compiler may inline functions that aren't marked as #inline if it thinks it's a good idea; #noinline prevents that. You can also use these from the calling site.
#proc foo : () #inline { ... }
#proc bar : () { ... }
#proc main : () {
foo();
#inline bar();
}
I don't know much about linkers, so I don't know how inlining of external library functions works. I don't really care to be honest, I feel that the benefit of inlining linked functions is lesser than the benefit of inlining function code because the latter has much greater potential for optimization.
#align(x) - Can be used for type definitions or variable definitions; aligns it in memory. Same syntax as #offset() except this can be a modifier to a type too. Can also be used for values behind pointers, for example & f32 #align(32) would mean that the pointer address is aligned to 32 bytes, allowing the usage of 256-bit SIMD operations on it. May cause a crash if the address isn't aligned. & #align(32) f32 would mean that the pointer variable itself is aligned, but its value may not be. Unlike most modifiers whose location is not important, this modifier must come immediately after the thing being aligned.
#persist - Variable (inside a function) whose value persists across function calls, i.e. a global variable that's only accessible from the function. In C you would use static.
#thread_local - Variables marked with this are unique per thread.
#read_only - For variables, the data cannot be written to. Basically the same as const in C.
#warning "x", #error "x" - Can be attached to functions, macros, types, global variables, enum values, or globally to a file. Causes the compiler to give a warning or an error if they are used, useful for things that are deprecated or unfinished or broken.
#stack_alloc(x) - Similar to alloca() in C.
#typeof(x) - Expands to the type of x.
In C, ++ and -- work differently from += 1 and -= 1 (postfix ++ binds tighter than *). In TFD they're different syntax for the same thing.
&int foo;
*foo += 1; // Increments the int.
*foo ++; // Increments the int. In C this would shift the pointer and dereference for no reason.
A semicolon after most braces (functions, ifs, loops...) is optional and will not do anything.
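The C behavior of *foo++ mentioned above can be checked with a few lines of C. This is only an illustration; read_and_advance and increment_target are made-up helper names.

```c
/* Shows what *p++ does in C: it is parsed as *(p++). */
int read_and_advance(int **cursor) {
    int *p = *cursor;
    int value = *p++; /* reads the int, then moves the pointer; the int is unchanged */
    *cursor = p;
    return value;
}

/* Increments the pointed-to int, which is what "*foo ++" means in TFD. */
void increment_target(int *p) {
    (*p)++; /* the parentheses are required in C */
}
```

Given int values[2] = {10, 20} and a cursor pointing at values[0], read_and_advance returns 10 and leaves the cursor at values[1] with values[0] untouched, while increment_target(values) turns values[0] into 11.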
Syntax struggles
Syntax matters. It's what you spend all your time looking at, it's how you interpret the code. You can barely keep all the specifics of a single function in active memory, let alone the entire codebase, so you have to continuously re-read it and understand the meaning. The easier it is to do this, the more of your brain's L1 cache and processing power can be allocated to thinking about the program instead of parsing syntax and trying to find and understand things.
C syntax is almost perfectly concise so it feels very bad to add anything to it, but there IS a bunch of ambiguities and inconsistencies and problematic aspects to it. Functions are a whole mess of their own and it's extremely difficult to fix without adding a new keyword, so I'll take the #proc keyword as granted here.
In an ideal world the definition of the type/function would always be the same regardless of where it is and whether it's a type or anonymous, and there's no ambiguity about what the syntax means even when you don't know what word is a type and what isn't. This is very difficult to achieve without adding a bunch of extra symbols. I'm less bothered by them on type/function definitions than in variables, so I would really like to avoid changing the latter.
One of the most annoying small Achilles' heel inconsistencies is with enums. Is this an enum called foo, or an anonymous enum with the type foo? enum foo {}; This seems to be unsolvable without adding a new symbol or keyword. If you add a colon like in C (enum : foo), then it's inconsistent because you don't need it if there's no specified type. The optional-ness and unusualness of this colon, and its similarity to labels, bother me, and I would probably choose different syntax if this problem didn't exist.
One of the most concise ways to differentiate definitions from variables is to take the variable syntax and flip the order for definitions:
// Variables
u32 thing;
#enum u8 {...} thing;
#struct {...} thing;
&#proc int () thing;
// Definitions
Thing #typedef u32;
THING #enum u8 {...}
Thing #struct {...}
thing #proc int () {...}
The main flaw of this is that it goes against the preference of having the name after the keyword. Another option is to use Pascal syntax, which adds colons between name and type. Jai and Odin do something similar to this:
// Definitions
Foo :: u32;
TIP :: #enum u8 {...}
Bar :: #struct {...}
do_thing :: #proc int () {...}
// Variables
Foo : u32;
TIP : #enum u8 {...};
Bar : #struct {...};
do_thing : &#proc int ();
This is quite unambiguous, and has the added benefit of being consistent with labels (think of it as "location of data" for variables and "location in code" for gotos).
My biggest grievance with this syntax is with normal variables, especially in function arguments. The : adds noise that makes back-to-back variables annoying to read, and it's just more verbose and annoying to type.
#proc thing : (int foo, f32 bar, Thingy thingy_dingy) {...}
#for (int i=0; i<y; i++) {...}
thing :: #proc (foo: int, bar: f32, thingy_dingy: Thingy) {...}
#for (i:int=0; i<y; i++) {...}
It can also make structs look more confusing depending on what member names are like, usually most members are primitive types so the type is short but name is long, which looks much better with C-like syntax. It only looks fine if you align all the types, but that's kind of annoying to do. More syntax highlighting would also help, but I don't like having much of it.
#struct Thing : {
bool:1 visible;
bool:1 has_thingy_dingy;
bool:1 was_destroyed;
u32 id;
Vec2 position;
Vec2 size;
};
Thing :: #struct {
visible : bool:1;
has_thingy_dingy : bool:1;
was_destroyed : bool:1;
id : u32;
position : Vec2;
size : Vec2;
};
#struct Thing : {
bool:1 visible;
bool:1 has_thingy_dingy;
bool:1 was_destroyed;
u32    id;
Vec2   position;
Vec2   size;
};
Thing :: #struct {
visible          : bool:1;
has_thingy_dingy : bool:1;
was_destroyed    : bool:1;
id               : u32;
position         : Vec2;
size             : Vec2;
};
Here's a bunch of other ways to add a symbol or keyword to the type:
Foo: u32;
TIP: #enum u8 {...}
Bar: #struct {...}
do_thing: #proc int () {...}

#typedef:Foo u32;
#enum:TIP u8 {...}
#struct:Bar {...}
#proc:do_thing int () {...}

#typedef Foo: u32;
#enum TIP: u8 {...}
#struct Bar: {...}
#proc do_thing: int () {...}

#typedef Foo = u32;
#enum TIP = u8 {...}
#struct Bar = {...}
#proc do_thing = int () {...}

+#typedef Foo u32;
+#enum TIP u8 {...}
+#struct Bar {...}
+#proc do_thing int () {...}

#define Foo u32;
#define TIP #enum u8 {...}
#define Bar #struct {...}
#define do_thing #proc #must_receive int () {...}

#new Foo u32;
#new #enum TIP u8 {...}
#new #struct Bar {...}
#new #proc do_thing #must_receive int () {...}

#typedef #as Foo u32;
#enum #as TIP u8 {...}
#struct #as Bar {...}
#proc #as do_thing #must_receive int () {...}

Foo #is u32;
TIP #is #enum u8 {...}
Bar #is #struct {...}
do_thing #is #proc #must_receive int () {...}
As a C programmer it feels very natural to start typing "struct Name" and to read it later, but I find it very difficult to make it work well, especially because of the enum type problem.
Foo :: is easier to search for because it's independent from what's being defined, but I find type-after-name harder to read when browsing code (could just be a matter of what I'm used to), and I do a lot more reading than searching. That said, #struct Foo: is in a way the best of both worlds since you can also search for Foo:.
I actually find that when the name is before keyword, it's easier to read the code when they're separated very clearly, so Foo :: #struct is actually more easily readable than Foo: #struct or Foo #struct.
One cool part of Pascal syntax is that it's compatible with labels. In C you can use foo: to give a name to a location in code and then goto into it, but you could expand your understanding of it to be the location of anything, or more precisely the location of data, which is what a variable is.
When deciding on a consistent unambiguous syntax, you have to also keep in mind statements that are neither variables nor definitions, like foo[4] = 123; ((int*)foo+5)[4] = 123; #inline dingler();, and also that it's useful to be lenient about parentheses due to macros and type casts and the like. Perhaps the biggest advantage of Pascal syntax is that it very clearly labels definitions and variables.
Shoving a function definition into another function definition looks very confusing, so it's useful to be able to define functions as types. #typedef foo : #proc int (); Removing the typedef keyword is tempting depending on syntax, but you kind of lose this ability if you remove it.
The most complicated part of syntax is functions. The problem is that functions can return other things, including pointers to other functions, and it all goes to hell if you want to support multiple return values because then it becomes very difficult to tell who a type belongs to. Adding even more symbols or keywords to separate things would suck.
Type should be before parentheses because otherwise function pointer syntax becomes more ambiguous since the return type and variable name are next to each other. Is this a function pointer that returns something, or a function pointer called "something" that returns nothing?
&#proc (Vec2f pos) something;
Requiring a return type like in C (void for nothing) would fix this, but it feels stupid because there's way more functions without return than there are function pointers, so the latter imposing extra syntax on the former is dumb.
Type should be after parentheses because then returning complicated types (like anonymous structs or function pointers) won't make the function as hard to read, and the function syntax is less ambiguous because all return-type-related things are dumped after the parens.
#proc something : (Vec2f pos, Vec2f size) int { ... }
#proc something : (Vec2f pos, Vec2f size) [4]Thing { ... }
#proc something : (Vec2f pos, Vec2f size) #enum { ERROR; SUCCESS; } { ... }
#proc something : (Vec2f pos, Vec2f size) #struct { f32 foo; f32 bar; } { ... }
#proc something : (Vec2f pos, Vec2f size) &#proc (int, f32) int { ... }
&#proc (Vec2f pos, Vec2f size) int something;
&#proc (Vec2f pos, Vec2f size) [4]Thing something;
&#proc (Vec2f pos, Vec2f size) #enum { ERROR; SUCCESS; } something;
&#proc (Vec2f pos, Vec2f size) #struct { f32 foo; f32 bar; } something;
&#proc (Vec2f pos, Vec2f size) &#proc (int, f32) int something;

#proc something : int (Vec2f pos, Vec2f size) { ... }
#proc something : [4]Thing (Vec2f pos, Vec2f size) { ... }
#proc something : #enum { ERROR; SUCCESS; } (Vec2f pos, Vec2f size) { ... }
#proc something : #struct { f32 foo; f32 bar; } (Vec2f pos, Vec2f size) { ... }
#proc something : &#proc (int, f32) int (Vec2f pos, Vec2f size) { ... }
&#proc int (Vec2f pos, Vec2f size) something;
&#proc [4]Thing (Vec2f pos, Vec2f size) something;
&#proc #enum { ERROR; SUCCESS; } (Vec2f pos, Vec2f size) something;
&#proc #struct { f32 foo; f32 bar; } (Vec2f pos, Vec2f size) something;
&#proc &#proc (int, f32) int (Vec2f pos, Vec2f size) something;
Ideally it should be easy to copypaste the function definition and create function call syntax from it, this is one of the good parts of C syntax but it's hard to retain it.
#inline int foo() ... int thing = foo()
#proc foo : int () #inline ... int thing = foo()
#proc foo : #inline () int ... int thing = foo()
#proc something : (Vec2f pos, Vec2f size) &int #inline { ... }
#proc something : &int (Vec2f pos, Vec2f size) #inline { ... }
#proc something : (Vec2f pos, Vec2f size) &int #const #inline { ... }
#proc something : &int #const (Vec2f pos, Vec2f size) #inline { ... }
#proc something : #inline (Vec2f pos, Vec2f size) &int { ... }
#proc something : #inline &int (Vec2f pos, Vec2f size) { ... }
#proc something : #inline (Vec2f pos, Vec2f size) &int #const { ... }
#proc something : #inline &int #const (Vec2f pos, Vec2f size) { ... }
#inline #proc something : (Vec2f pos, Vec2f size) &int { ... }
#inline #proc something : &int (Vec2f pos, Vec2f size) { ... }
#inline #proc something : (Vec2f pos, Vec2f size) &int #const { ... }
#inline #proc something : &int #const (Vec2f pos, Vec2f size) { ... }
#proc something : &int #const, &int #const (Vec2f pos, Vec2f size) #inline { ... }
#proc something : (Vec2f pos, Vec2f size) &int #const, &int #const #inline { ... }
The best solution would probably be to wrap the return type into parentheses. The problem with that is that you don't need it most of the time so it would be annoying to use, and technically you may never need it, so it would just exist to optionally make the code and intent easier to read. Perhaps it would be optional, but become mandatory if you have multiple return values. In this case it definitely should be after the function arguments.
#proc something : #inline (Vec2f pos) (int, f32) { ... }
#proc something : #inline (Vec2f pos) (#aligned(8) &int) { ... }
#proc something : #inline (Vec2f pos) (&#proc int (int, f32)) { ... }
&#proc #inline (Vec2f pos) (int, f32) something;
&#proc #inline (Vec2f pos) (#aligned(8) &int) something;
&#proc #inline (Vec2f pos) (&#proc int (int, f32)) something;
Additional thoughts and ideas
Here goes stuff that I'm unsure about, or otherwise think is worth mentioning.
-
There's value in keeping the language simple. The easier it is to implement the compiler, the less likely it is for the language's survival to be dependent on a single compiler author/source or be restricted to certain platforms. It makes me feel more secure.
I don't think it's bad to have lots of modifiers for everything (especially if the program can be compiled and works even if the modifier doesn't do anything, such as #inline for functions), but it is a negative to add features that are complicated to implement.
It's also useful if it's easy to parse/analyze parts of the source code, even if you don't want to implement a compiler for the language, so ideally the syntax should be easy to interpret.
-
It would be extremely convenient if you could access pointers in structs as arrays.
#struct Int_array : {
int* data #array_access;
int count;
}
Int_array a = ...;
for (int i=0; i<a.count; i++) {
print(a[i]); // Note: no need for a.data[i]
}
The only reason this isn't unambiguously part of the language is because the syntax for accessing pointers and arrays is already a little confusing when you start combining arrays and pointers and structs a lot and having pointers to pointers. It may not actually be a problem, but I'll have to think about it more.
-
Struct member expectations (tagged unions). It could be possible to give struct members a condition, and any scope/function where that member is used will check for the condition and give an error if it's false.
#enum TYPE : {
FOO;
BAR;
ZIP;
};
#struct Thing : {
TYPE type;
int x #expects(.type==.FOO);
int y #expects(.type==.FOO);
f32 a #expects(.type==.BAR);
f32 b #expects(.type==.BAR||.type==.ZIP);
};
#proc do_the_thing : (Thing thing) {
#if (thing.type == .BAR) {
thing.a = 1.5;
thing.b = 0.1;
thing.x = 123; // Error: .x expects (.type==.FOO), but checked that (.type==.BAR) on line 15.
}
}
I'm not sure how useful this would actually be in practice, and whether there's a better way to do this. There has been at least one time where I had a bug caused by using the wrong thing from a union.
This would work similarly to bounds checking in that it happens at compile-time whenever possible, and you can turn off runtime checks.
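For reference, this is roughly what the runtime half of such a check looks like when written by hand in C today. It is a simplified sketch, not a proposal for how #expects would be implemented: the union layout, the TYPE_* names and the thing_get_x accessor are assumptions made for the example.

```c
typedef enum { TYPE_FOO, TYPE_BAR, TYPE_ZIP } Type;

typedef struct {
    Type type;
    union {
        struct { int x, y; } foo;   /* meant to be used when type == TYPE_FOO */
        struct { float a, b; } bar; /* meant to be used when type == TYPE_BAR */
    } as;
} Thing;

/* Checked accessor: returns 1 and writes *out only if the tag matches. */
int thing_get_x(const Thing *thing, int *out) {
    if (thing->type != TYPE_FOO) return 0; /* the check #expects would generate */
    *out = thing->as.foo.x;
    return 1;
}
```

The difference is that in C nothing forces you to go through thing_get_x, and nothing connects the tag to the member at compile time, which is the gap the #expects idea tries to close.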
Maybe #must_expect() could be used to require a condition: you will get an error if the compiler cannot figure out at compile-time whether the condition is true.
-
Variable groups. I'm about 85% sure that this should be in the language, but I haven't had enough time to feel it yet, and there might be a reason why it's not as simple as it looks, especially in regards to syntax.
#proc test : int, int (int x) {
#if (x > 1000) {
#return 0, 0;
}
#else {
#return x*10, x*20;
}
}
int a, int b = test(50);
int c = 15;
int d = 200;
c, d = d, c;
c, d = test(a);
This is basically a less verbose version of unnamed structs. Since unnamed structs can work, I imagine there isn't a reason why variable groups couldn't, unless there's some case of ambiguous syntax somewhere.
#proc test : #struct { int; int; } (int x) {
#if (x > 1000) {
#return {0, 0};
}
#else {
#return {x*10, x*20};
}
}
#struct { int a; int b; } = test(50);
printf("{}, {}\n", a, b);
That said, if you receive variables in a different order than they're defined, it's no longer the same as a struct.
#proc test : int, int () {
#return 1, 5;
}
int a, int b;
b, a = test();
-
Are the names of #import and #module backwards? My thinking is that "module" is a part of your program, and "import" is you importing something from a third party.
-
#endif. Sometimes it's annoying how switch increases indentation by 2 levels, and trying to lower it just feels weird because of the braces.
#if (foo) ...;
#case 1;
print("1\n");
#case 2;
print("2\n");
#case 3; {
int foo = 123;
something(foo);
print("3!\n");
print("This is a pretty cool number.\n");
}
#case;
print("Idk what's happening.\n");
#endif;
You could maybe also use it for normal ifs, but I'm not sure what the utility would be. I suppose you could just prefer it.
#if (foo == bar);
int foo = 123;
something(foo);
print("Idk what's happening.\n");
#endif;
I'm not 100% sure if this feature is really worth having though, there's something about it that feels off.
-
#typeid()
int type_of_int = #typeid(int);
int type_of_float = #typeid(f32);
int type_of_Foo = #typeid(Foo);
Returns a different integer value for each type in the program.
I'm unsure about this feature because I don't know how/if it would work if you link with pre-compiled libraries. I'm not familiar with how linking works, but it shouldn't be hard to add a table of type ID locations into the pre-compiled library and just swap them out during linking. Maybe the hardest part is matching type conflicts, like how do you know that Foo in library 1 is the same type as Foo in library 2?
Even worse, this probably cannot work properly with dynamically linked libraries. You'd have to use some weird runtime translation or type info tables or something, and that sounds way more complicated than what I want.
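To make the "table of type IDs" idea a bit more concrete, here is a rough sketch of how far C11 gets you today with _Generic. This is not TFD and not how #typeid would be implemented. The IDs are assigned by hand, which is exactly the part the TFD compiler would do for you, and TYPEID_OF, typeid_name and the TYPEID_ constants are names made up for this illustration.

```c
#include <assert.h>

/* Hypothetical hand-assigned type IDs (a TFD compiler would generate these). */
enum { TYPEID_UNKNOWN = 0, TYPEID_INT = 1, TYPEID_FLOAT = 2, TYPEID_FOO = 3, TYPEID_COUNT = 4 };

typedef struct { int x; int y; } Foo;

/* C11 _Generic selects a branch from the static type of the expression. */
#define TYPEID_OF(expr) _Generic((expr), \
    int:     TYPEID_INT,                 \
    float:   TYPEID_FLOAT,               \
    Foo:     TYPEID_FOO,                 \
    default: TYPEID_UNKNOWN)

/* Maps an ID back to a printable name. */
const char *typeid_name(int id) {
    static const char *const names[TYPEID_COUNT] = { "unknown", "int", "float", "Foo" };
    assert(id >= 0 && id < TYPEID_COUNT);
    return names[id];
}
```

Two separately compiled libraries would each need an identical copy of this table to agree on which integer means Foo, which is the type matching problem described above.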
-
I want better variable function arguments, but I also don't want to make functions like printf more expensive by making them more internally complicated and bloated. If the function can operate with minimal information then the compiler shouldn't set up a bunch of crap that the function doesn't need. I'd have to know more about how variable arguments are implemented in C to be able to design a better system.
An example of what I'd want is basically sending a struct that has a count, pointer to a list of type IDs, and pointer to a list of pointers to the values.
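As a point of comparison, this is roughly what that struct could look like if you built it by hand in plain C. It's only a sketch under my own assumptions: ArgsArray, format_args and the TYPEID_ constants are hypothetical names, and the caller has to fill in the arrays manually, which is the work the compiler would take over.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type IDs for this sketch only. */
enum { TYPEID_INT = 1, TYPEID_DOUBLE = 2 };

/* A count, a list of type IDs, and a list of pointers to the values. */
typedef struct {
    int count;
    const int *typeids;
    const void *const *values;
} ArgsArray;

/* Writes each argument followed by a space into out.
   Returns the number of characters written, or -1 if out is too small. */
int format_args(ArgsArray args, char *out, size_t out_size) {
    size_t used = 0;
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (int i = 0; i < args.count; i++) {
        size_t room = out_size - used;
        int n;
        switch (args.typeids[i]) {
        case TYPEID_INT:
            n = snprintf(out + used, room, "%d ", *(const int *)args.values[i]);
            break;
        case TYPEID_DOUBLE:
            n = snprintf(out + used, room, "%.1f ", *(const double *)args.values[i]);
            break;
        default: /* unknown type: the value is only reachable as a void pointer */
            n = snprintf(out + used, room, "? ");
            break;
        }
        if (n < 0 || (size_t)n >= room) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

The TFD version of the receiving function, which is what I'd actually want to write, looks like this: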
#proc print : (#args_array args) {
#for (int i=0; i<args.count; i++) {
#if (args.typeids[i]) ... {
#case #typeid(int);
&int value = args.values[i];
#case #typeid(Vec2f);
&Vec2f value = args.values[i];
#case;
&void value = args.values[i];
}
}
}
-
I don't have a full picture of how this works at assembly level, but it may be necessary to control whether it's valid to access variables from the parent scope from a callback.
#proc main : () {
int count = 0;
#proc callback : () {
count ++;
}
do_stuff(callback);
}
This is a bit ambiguous and weird and possibly error prone (especially if it dereferences pointers) because you have no idea where this callback is going, and it may be stored and called later, and you can't just use fixed stack memory offsets from a callback like this.
If the nested function is only called directly from the parent function, then it's a lot more straightforward, it's the callbacks that are problematic.
Perhaps function pointers could have a #synchronous modifier which communicates that the function is not stored long-term nor called asynchronously. Without that property, you can't use variables from parent scope. Or alternatively invert the default and add #asynchronous.
#proc do_stuff : (&#proc () callback) {
callback();
}
#proc do_stuff_sync : (&#proc () #synchronous callback) {
callback();
}
#proc main : () {
int count = 0;
#proc callback : () {
count ++;
}
do_stuff(callback); // Error: callback may be asynchronous, variables from parent scope cannot be used.
do_stuff_sync(callback);
}
I'm trying to design something without full understanding of how it works though, so maybe this is all nonsense.
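For reference, this is how the same thing is usually done by hand in C: the parent's variable is passed to the callback explicitly through a context pointer. It's a sketch with made-up names (Callback, do_stuff_sync, do_stuff_later, increment), not a description of how TFD would implement nested functions.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*Callback)(void *context);

/* "Synchronous": calls the callback right away and never stores it,
   so a pointer into the caller's stack frame is still valid. */
void do_stuff_sync(Callback callback, void *context) {
    assert(callback != NULL);
    callback(context);
}

/* "Asynchronous" stand-in: keeps the callback and context for later.
   If context points at a local variable, it dangles once the caller returns. */
Callback stored_callback = NULL;
void *stored_context = NULL;

void do_stuff_later(Callback callback, void *context) {
    stored_callback = callback;
    stored_context = context;
}

/* Plays the role of the nested callback that touches the parent's count. */
void increment(void *context) {
    int *count = context;
    (*count)++;
}

int count_with_sync_callback(void) {
    int count = 0;
    do_stuff_sync(increment, &count);
    do_stuff_sync(increment, &count);
    return count;
}
```

Passing &count to do_stuff_later instead would compile just as happily in C, which is the kind of mistake a #synchronous marker could catch.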
-
C's text-replacement macros are problematic for various reasons. TFD improves them slightly, but some of the fundamental problems, like the difficulty of debugging them, aren't fixed. The good thing about them is that they're very simple, which keeps the compiler simple, and doesn't require the user to know much about them. They're also extremely powerful because they can generate almost any syntax.
Ideally I would prefer a more structured macro/metaprogramming system, but I don't really know what that would look like. C-style macros combine extremes of capability and simplicity, it's very very hard to compete with them. Inline functions and compile-time functions can solve some of the things macros are needed for, but it's not quite enough.
An idea I'm most interested in is compile-time scripting. The easiest example is a function that runs and returns its value at compile-time:
#proc wowza : int (int x, int y) #compile_time {
#for (int i=0; i<10; i++) {
x *= y;
}
#return x;
}
Ideally this should be able to do almost anything that a normal function can. Types and maybe even variables can use #compile_time.
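To show the gap this would fill, here is the same function in plain C next to what C can evaluate at compile time on its own. In C the loop version only runs at runtime (an optimizer may fold it, but nothing guarantees that), while a true compile-time constant has to be written out by hand as a constant expression. WOWZA_2_3 and check_wowza are names invented for this sketch.

```c
#include <assert.h>

/* Plain C translation of wowza: multiplies x by y ten times, at runtime. */
int wowza(int x, int y) {
    for (int i = 0; i < 10; i++) {
        x *= y;
    }
    return x;
}

/* The compile-time version of wowza(2, 3) has to be unrolled by hand,
   because a C constant expression can't contain a loop or a function call. */
enum { WOWZA_2_3 = 2 * 3*3*3*3*3 * 3*3*3*3*3 };
_Static_assert(WOWZA_2_3 == 118098, "hand-unrolled constant should be 2 * 3^10");

/* Runtime check that both versions agree. */
void check_wowza(void) {
    assert(wowza(2, 3) == WOWZA_2_3);
}
```

With #compile_time, the loop version itself could be used wherever the enum constant is needed.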
-
Data segments. I'm not too familiar with them, but there should be a mechanism for deciding where some data is stored, and what its properties are (real properties like whether it's read-only and what its size is, and fake properties like how data in it should be aligned, or whether it's hot or cold, which could be used by the compiler for optimizations). My knowledge starts and ends at the introduction of the Wikipedia page, which seems incomplete to me. String macros should probably also be able to control where their data is stored by default.
Ideally you should be able to create arbitrary data sections that you can read/write from as you please. You could create a big u8 array for that purpose, but that sounds like a hack compared to having a blank data section that you can get a pointer to.
Perhaps you could give the compiler some kind of layout of your desired data sections and their organization.
There's probably a few things I would know how to design better, or things I would change my mind on, if I was more knowledgeable about assembly and how programs are structured.
-
There's a trick I sometimes do with C arrays: sending variable length array literals to functions (with 0 at the end to denote the end of the array). It still works, but I'm bothered by the fact that TFD doesn't improve it. In fact this requires 1 more character than in C, so in a way this is a downgrade.
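For reference, the C version of the trick looks roughly like this, using a compound literal as the argument. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Walks a zero-terminated int array and adds up the elements. */
int sum_until_zero(const int *a) {
    int sum = 0;
    assert(a != NULL);
    for (; *a; a++) {
        sum += *a;
    }
    return sum;
}

/* The array literal is written inline at the call site; the trailing 0 marks the end. */
int demo(void) {
    return sum_until_zero((int[]){1, 2, 3, 0});
}
```

The TFD version: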
#proc foo : (&int a) {
#for (*a) {
printf("Number {}\n", *a);
a ++;
}
}
foo(&([]int){1, 2, 3, 0}); // Works but ugly and annoying to write and read.
foo(&{1, 2, 3, 0}); // This would be much better, but the function doesn't receive an array so this can't be interpreted as one (if the int was some struct then this would look like a pointer to the struct).
-
I may want to get rid of the ternary operator syntax and replace it with just an if/else.
int foo = #if (bar) x #else y;
int foo = #if (bar) x #else #if (zip) y #else z;
It's somewhat annoying to look at this when you're used to the old syntax though:
int foo = (bar) ? x : y;
int foo = (bar) ? x : (zip) ? y : z;
The idea of using an if/else is attractive because it reduces the number of concepts in the language. The old syntax is kind of weird and only exists for this one purpose, and you could easily replace it with the same syntax that you use everywhere else. The old syntax also uses ? which isn't used anywhere else, so you could repurpose it for something else.
-
Use case for unused symbols? $ @.
-
There would be value in some kind of tagging system that could be used to toggle build rules and control behavior in certain ways, or do some kind of introspection/metaprogramming. For example something like this:
#zero_initialize_arrays = #false;
#zero_initialize_structs = #false;
#zero_initialize_primitive_types = #false;
#optimization_level = .MAX_SPEED;
#set_property = #inline;
#proc crunch_some_numbers : () {
...
}
#on_enter_proc { profiler_start(rdtsc(), #proc_id); }
#on_leave_proc { profiler_end(rdtsc(), #proc_id); }
#proc test_one : () {
...
}
#proc test_two : () {
...
}
It could also be used for types, maybe even local scopes. Or just as a generic way of toggling settings and modifiers for any other thing in a more convenient and coherent way, then use #ct_if to centrally change what it does.
I haven't thought about this idea very much so I'm not 100% sure about the details of how it would work.
-
There's some merit to a type called char because it communicates that something is meant to be text better than u8. However, from my experience it doesn't really work that way in practice. I often want to just read/write text from/to binary chunks, and UTF-8 is a variable-sized format, so I usually end up thinking of text as a blob of bytes, not as an array of characters. The String type communicates that something is a string anyway; this is mostly a concern in C, which doesn't have any explicit string types.
-
There's a few reasons for prefixing signed integer types with i instead of s. Firstly, you rarely think of the words "signed integer", I think it's more common to think of just "integer" or "int" which starts with "i". Meanwhile when you think of an unsigned integer, you specify with the word "unsigned" which starts with "u". Secondly, the letter "s" also reminds me of "string" and "struct", while "i" doesn't really remind me of anything else. When I see "s32", it immediately makes me think of some kind of string. I might want to make a string macro that uses a 16-bit integer for its length, that's what "s16" sounds like to me. Also, I want to type "int" for the default integer, not "sint", so "i" is just more consistent.
-
It's always annoying when you want to use a name but it's reserved by something else. I seem to run into this problem consistently enough that I decided to prefix ALL keywords with #, a bit like an inverse of PHP (where all variables are prefixed with $, which I find extremely annoying to code with). I'm not 100% sure if I like this, but I did it for now.
This is also part of the reason for renaming switch() into if()..., replacing default with an empty case, and removing the union keyword (the main reason is because I want to think of structs and unions as the same thing and because it's easier to text-search types that way). I have a desire to replace #case X with some kind of symbol, such as ... X or -> X, however I like the fact that #case can have similar syntax highlighting as #if so I probably won't change it.
I'm concerned that basedefs.tfd creates 2 separate worlds for this language, one where raw syntax is used, and another where basedefs is used (plus a third world where the user defines their own names). I want to also prefix primitive types, but I really really really really hate the idea of ever seeing #-prefixed types being used in code. People might be encouraged to use them if they prefer the raw syntax or otherwise don't like basedefs.tfd.
-
Something I've always wished I had was global error numbers. The value could flow all the way from some inner function to top-level code and retain its meaning. One interesting way to enable that would be to allow enums to be expanded dynamically; you could even use #enum_member_names to print the error name.
#enum ERRNUM : {
NONE;
};
#proc fooler : ERRNUM () {
#if (x) #return #unique_enum(ERRNUM, BAD_THING_HAPPENED_IN_FOOLER);
#if (y) #return #shared_enum(ERRNUM, MEMORY_ALLOCATION_FAILED);
#return .NONE;
}
#proc bar : ERRNUM () {
#if (x) #return #shared_enum(ERRNUM, MEMORY_ALLOCATION_FAILED);
#return .NONE;
}
#proc main : () {
ERRNUM e = fooler();
#if (e) {
[]String error_names = #enum_member_names(ERRNUM);
printf("Error occurred: {}\n", error_names[e]);
}
}
This could potentially have uses in other places, for example plugins/customizability/extensibility, like adding a new UI module to a UI library.
Maybe some enum property like #expandable to go with this would be helpful.
I'm not sure if this is a good idea. It certainly sounds like it would complicate the compiler, since no value can be assumed to be invalid until all files in the program have been parsed (the value may come from an expanded enum). Maybe the language should just have a hard-coded error type that can be expanded.
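The closest thing I know of in C is the "X-macro" pattern, sketched below: one central list generates both the enum and the table of names. It shows what the idea above would improve on, because every error name still has to be added to that one list by hand instead of being declared where it's used. ERRNUM_LIST, AS_ENUM, AS_STRING and errnum_name are names made up for this sketch, and the x/y parameters stand in for whatever conditions fooler checks.

```c
#include <assert.h>

/* Hypothetical central list: every error in the program must be named here. */
#define ERRNUM_LIST(X)              \
    X(NONE)                         \
    X(MEMORY_ALLOCATION_FAILED)     \
    X(BAD_THING_HAPPENED_IN_FOOLER)

/* Expand the list once into enum members... */
#define AS_ENUM(name) ERRNUM_##name,
typedef enum { ERRNUM_LIST(AS_ENUM) ERRNUM_COUNT } ERRNUM;

/* ...and once into a matching table of member names. */
#define AS_STRING(name) #name,
static const char *const errnum_names[] = { ERRNUM_LIST(AS_STRING) };

const char *errnum_name(ERRNUM e) {
    assert((int)e >= 0 && (int)e < ERRNUM_COUNT);
    return errnum_names[e];
}

ERRNUM fooler(int x, int y) {
    if (x) return ERRNUM_BAD_THING_HAPPENED_IN_FOOLER;
    if (y) return ERRNUM_MEMORY_ALLOCATION_FAILED;
    return ERRNUM_NONE;
}
```

Here the set of values is closed as soon as the header is parsed, so the compiler never has to wait for other files, but a library can't add its own errors without editing the shared list.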
-
Ideally it should be possible to associate every major name and type with a single letter.
Struct collides with String, which could be fixed by renaming Struct into Plex. Function collides with Float, but Procedure would collide with Plex, and there are no other good names for it. Struct could be Class for cultural reasons, but not only is it bad at describing what it does, it's also a word that definitely will be in user code so it would become impractical to remove the # prefix. Group would be super descriptive, but even worse than Class for user code.
Boolean Define Enumerator Float Function Integer Macro Procedure Plex Pointer String Struct Signed Type(def) Unsigned Union
I lean slightly more towards "proc" than "function"/"func" as the function keyword because it's less likely to cause naming conflicts. The downside is that I never actually use or think about the word "procedure"; it's always a "function" to me.
-
TFD stands for "Tool For Doing".
Syntax highlighting test.
Comments: /* There's something to say. */
Primitive literals: 12345, 'X', 0xffff00ff
Strings: "Hello world!"
Named constants: #true, #false, #null, #proc_id ...
Control flow: #if, #for, #case, #break, #return, #goto ...
Macros and compile-time: #macro, #import, #ct_if ...
Types/names/constructs: #proc, #struct, #enum, #typedef
Modifiers: #private, #inline, #pack(1) ...