C programming guide
Comprehensive intro to the C language
If you're following this guide properly, you should already have compiled a program. From now on I will only talk about code, I expect that you can change your own code and compile it yourself in order to try out things from this page.
This is a comprehensive intro that explains things thoroughly, if you'd rather get a fast run through then check the quick intro to C.
This guide is only comprehensive with explaining the basics, not comprehensive about everything there is to know about C. The contents of this page are a bit mixed, they're organized in a way that makes sense when reading from top to bottom without skipping around.
Do not expect to understand all of this at once, I didn't understand things from programming tutorials at the beginning either. This is all very simple stuff, but it might take some practice before things start to click, so don't hesitate to move on to the following parts of this guide after reading this.
Statements and variables
int thingy = 45;
This is called a "statement", a single part of the code that does something. Statements must always end in a semicolon ( ; ). This statement creates a new variable for a 32bit integer and sets the value to 45.
int
A data type determines how much data there is, and how the compiler treats it. C has several pre-determined data types, int is a type that refers to a 32 bit (4 bytes) integer (the exact size is technically up to the compiler, but it's 32 bits on practically everything you're likely to use). "Integer" is just a fancy word for "number without decimals", in other words an integer cannot be "1.25", only "1" or "2" or some other whole number.
thingy = 45
This part sets the value of the "thingy" integer to 45.
Whenever you type "thingy" after declaring it, it refers to the data that the variable has, in this case the 4 bytes that represent an integer. You can send those bytes elsewhere, place them into another pile of data, insert some other data into them to change the value, or do various operations on them. We can modify thingy with additional statements and math:
thingy = thingy + 45;
This sets thingy to its own value with 45 added to it, so now the value is 90.
thingy += 45;
This is a more convenient and clear way to do the same thing.
You can of course use multiple variables and do operations with them:
int thingy = 45; int multiplier = 3; thingy = thingy * multiplier;
This creates 2 values, and multiplies one with the other, in the end the value of 'thingy' is 135.
Basic types
Other than int, there's various common data types:
char = an 8 bit integer, in other words a single byte. Often used to represent text characters.
short = a 16 bit integer.
long = a 32 or 64 bit integer (depends how/where you compile the program).
long long = a 64 bit integer.
float = a 32 bit number that, unlike an integer, can have decimals.
double = a 64 bit version of float, it's more precise and supports larger numbers.
void = a special type that refers to the lack of type; it's nothing. You can't make a void variable, but you can use it for other things.
You don't need to remember these yet, just know that int is a number and float is a number when you need decimals. char will be used for text and sometimes other kinds of data.
If you mix different number types in operations, the compiler will automatically convert from one type to another. You should avoid this if you can though, it can lead to mistakes because some types are smaller and may not be able to contain the expected value properly, and there might be inaccuracy because integers and floats work differently. It can also lead to slowness if you do it in very large quantities.
Integers have alternate versions that are slightly different, called "unsigned integers". You don't need to care about these either yet. Normal integers are secretly "signed" integers, what this means is that the first bit of the data, referred to as the "sign bit", is used to determine whether the value is positive or negative. Unsigned integers don't have that, so they can use the first bit for extra storage, allowing the value to be twice as high but not allowing the value to be negative.
As an example, the numerical value of a signed 'char' can be in the range -128 to 127, while the numerical value of an 'unsigned char' can be in the range 0 to 255 (whether a plain 'char' is signed or unsigned depends on the compiler). In case you expected +128 and +256, you must remember that 0 also counts as a value.
Comments
int x = 12; // this is a horizontal position...
int y = 50; // ...and this is a vertical position!
/* This program is free software!
   Donate or I'll die */
Anything after // in a single line will be ignored. This is a "comment", you can use it to describe your code and leave notes. If you want to write lots of text or put a comment in the middle of a line, you can start a comment with /* and end it with */ .
Comments are very valuable because they help you remember what everything does when you come back to your code a month later. If you're working with other people or releasing your code, comments also help other people understand the code.
Comments aren't always good though, if you add too many comments or add comments that aren't very useful, your code can easily become cluttered and more confusing. You may also need to keep updating comments if you keep changing the code, which can become a chore if you have paragraphs of text describing everything.
Another use for comments is to turn parts of your code into comments in order to disable them, without requiring you to remove them entirely.
Functions
A function, or a "procedure", is a bundle of code that we can jump into from anywhere else in the program. Jumping into the function is called "calling the function". The program will do whatever is in the function, and returns back when the function ends.
do_the_thing(9000);
A name followed by parentheses is a function call. It calls a function, and sends whatever is inside the parentheses into the function. In this case we're calling a function named do_the_thing and sending the value 9000 into it.
int something = gibs_me_dat(9000);
Functions can also return values back to you, in this statement we're calling a function that returns an integer, and we store that integer into a variable.
Here's how to define a new function:
int gibs_me_dat (int x) { return x * 2; }
Declaring a function is similar to declaring a variable, except you put parentheses ( ) after the name, and curly brackets { } after the parentheses. You do not have to put a semicolon after the curly brackets. Most compilers will let you add one anyway, but there's no reason to.
Inside the parentheses you can declare some variables, these are what the function takes in. When we called gibs_me_dat above, the compiler recognized that the function wants a 32bit integer, thus the value 9000 is sent as a 32bit integer. If we sent in another variable with a different type, such as a 'char', it would be converted into a 32bit integer first. If you send a variable that can't be easily converted to the correct type, the compiler will give you an error.
Inside the curly brackets are the actual contents of the function. You can put any statements you want in there.
return
"return" is a special keyword in C, it stops the function and returns back to whoever called it. It also sends back the value that you put after it, in this case we take the value that was sent in, and send back the same value multiplied by 2, in other words if we send 9000 into this function, it gives back 18000. You can put return anywhere into the function, not just at the end.
void do_something (void) {}
This is a function that doesn't take anything and doesn't return anything. Putting void into the parentheses isn't strictly necessary, but I recommend doing it because otherwise the function has some weird properties.
void do_something (int x, int y, float abc) {}
If you want a function to take multiple values, separate them by commas.
do_something(9000, 50, 1.34);
Similarly, to send multiple values into a function, separate them by commas.
NOTE: a function called "main" is special. When your program starts, it starts from the main function.
The main function normally returns an integer, returning 0 from main indicates that your program ended successfully. Returning another value indicates that it exited because of some problem, for example if a compiler fails to compile your code, it returns a non-0 value from its main function, that's how our build script knows whether it was successful or not. Some compilers allow you to set the return type of main to "void" (i.e. nothing) and not return anything, but that isn't standard C and you can't rely on what value your program ends up returning, so stick with int.
Our example program didn't put anything into the main function inputs, you can leave it empty for convenience, but the main function actually does receive some values (we'll look into those when we make a program later). When you leave the parentheses of a function empty, you can send any and all values that you want into it, and the compiler won't complain. It doesn't make any sense though since you can't use the inputs, there's some obsolete historical reason for why it works that way. (The newest version of the language, C23, finally makes empty parentheses mean the same thing as void.) If you add void into the parentheses, the compiler will complain if you try to send something into the function, this is recommended since it can help you avoid mistakes and there's no downside to doing it.
Libraries (#include)
We're skipping ahead a little so we can start understanding the original program that we compiled and printing our variables. Let's look at that again:
#include <stdio.h>

int main () {
    printf("Bag of biscuits\n");
    return 0;
}
Most of this should be fairly clear by now, but you might wonder where the printf() function comes from. The answer is:
#include <stdio.h>
Lines starting with # are special instructions for the compiler, they won't actually be part of your program. #include is an instruction that basically says "place the contents of this file here".
This imports C's "standard IO" (data In/Out) library, thus making its functionality usable from your code. stdio.h gives you ways to print text onto the console and read text from it, for example printf() is a function that puts text onto the console, it's declared somewhere in stdio.h.
A library is essentially just another file like your main.c, it has a bunch of things in it and when you #include it, you can use those things. The included file can also include other files. You'll want to eventually split your own code into multiple files for better organization, and you can then #include them from other files.
Libraries may be in a pre-compiled form though, so you can't necessarily find the actual code from the included file. The included file usually only has some basic information that describes what's in the compiled library so your compiler can make appropriate decisions, for example stdio.h only has the "function header" for printf. A function header is the top of the function, the part before the curly brackets. You CAN put code into the included file if you want to though.
There are 2 ways to include a library:
#include "stdio.h"
#include <stdio.h>
The difference is that the compiler searches "stdio.h" from the folder that your code files are at, while <stdio.h> searches it from folders that the compiler has been informed about. The compiler already knows where to find its own files like stdio.h, but you can give it additional information from the command line to let it know where to look for files from. If you want to include your own files, then you probably want to use the first version. You can also use a path to the file, not just the file name.
The extension .h stands for "header", .c stands for, well, "C". They're basically the same thing though, you could also #include a .c file.
There's much more to know about libraries (such as how does the program use functions when the code isn't in the included file), but we'll continue on that topic later when we look at libraries more in depth.
Printing text with printf()
Printf is one of the functions you'll probably use most, it is helpful to learn it early so you can see the results of your code.
printf("Bag of biscuits\n");
Any text that you send into printf gets printed into the command line console. Text inside double quotes will become a string, we'll look at strings more in the next section. You can use a backslash \ to insert some special symbols into text in C, for example \n will add a 'new line' character. If you don't add \n, your prints will go back-to-back and look confusing. If you want to print a backslash itself, you can backslash another backslash \\ .
printf("His power is %i, perhaps more!\n", 9000);
You can print variables by sending them after the string and adding a tag in the text. Tags start with %, for example %i will read the next variable as a 32bit integer. The above would print His power is 9000, perhaps more! into the console.
printf("%i - %f - %s\n", 9000, 1.32, "test");
You can send any number of variables to print everything you need. %f will print a float, %s will print a string. There's many more tags, but you can look them up yourself if you want to know. If you see weird values or broken text, you may be sending variables that are incompatible with the tags.
Note: normally functions can't take in an arbitrary number of variables like this. If you make a function that takes in 2 integers, you must always send it 2 integers when you call it, no more, no less. printf can do it because it's a "variadic function". Variadic functions are more complicated than I want to go into here, you can search for it yourself if you care, look for "variadic arguments".
You don't need variadic functions and should avoid making them in general when using C. They cannot take in your data correctly (integers will get converted to a different type), they make things unnecessarily complicated, and you can't detect what the type of each input is. That's why printf requires you to identify each tag manually; because there's no way for printf to know what the types of the input variables are. Normally if you want a function to process many things, you would send them in an array. The only function that really benefits from the variadic nature is printf because you want to call it in many different ways all the time.
Text, arrays, pointers, data
That might sound like a lot of things at once, but they're all together because they are very closely related.
This is where the ideas of memory and data start to become important. You could think of your computer's memory as a very long line, with stuff placed onto it. All of your variables and data are slotted somewhere onto that line. In fact, even functions are placed onto that line, and when you call a function, you're telling the CPU to move to where the function is and start executing the code there. Everything about programs is about data.
An array is just many values placed back-to-back on the line, and text (from now on referred to as strings) is just an array of characters.
A pointer is an integer whose value determines some distance along that line where you'll find something. For example if the pointer's value is 1000, then it points to the 1001st byte in memory.
"WAIT, 1001st? Why not 1000?"
The pointer is an offset from the beginning. If the pointer's value is 0, then it's pointing to the beginning, in other words the first byte, because you're not moving from the beginning at all. If you increase the pointer to 1, now you're moving 1 byte away from the beginning, and in effect pointing to the second byte. This is why an offset of 1000 would point to the 1001st byte.
This is called 0 indexing, you'll have to get comfortable with it in programming because arrays are basically always 0 indexed.
When you create an array or a string, the array data is placed somewhere onto the line, and then you get a pointer that points to the location where the array begins. When you use an array or a string in C, you're actually just using a pointer. Then if you want to access items in the array, you simply move the pointer's value accordingly.
For example, if the letters of "Hello" were stored starting at memory location 6 and you had a 'char' pointer with its value set to 6, you could use it as a string that says "Hello". If you increased the pointer value by 1, it would then become a string that says "ello".
If you're curious enough, you may wonder what happens if you move the pointer backwards or otherwise point to an unexpected location. The answer is "who knows". The program doesn't really care what you point your pointers into, it will just treat the bytes in there as ASCII characters, which may result in weird garbled nonsense if the bytes there haven't been set to represent proper text. Or if your pointer is an integer pointer, the program may interpret the memory as some crazy random number like -412,320,657.
If the pointer points into a location that your program isn't allowed to access and you try to access the data there, the OS will crash your program with no survivors in order to protect everything else that's running on the computer.
Basic pointers
int* pointy = NULL;
The above defines a pointer variable. It looks misleadingly like we're declaring an integer, but there actually isn't an integer anywhere. There is only a pointer, as denoted by the asterisk ( * ) in between the type and the name. Furthermore, we're setting the pointer's value to NULL, NULL is just a special word for 0. You cannot use a 0 pointer, but it's useful because if a pointer is 0, you know that it doesn't point to anything. ALWAYS initialize your pointers to NULL if you don't have anything else to put into them immediately.
A null pointer isn't valid to use because operating systems typically keep memory location 0 off-limits to normal programs, so trying to access it crashes the program. Therefore NULL is used to denote that a pointer doesn't point to any data.
int thingy = 60;
int* pointy = &thingy;
'&' before a variable allows you to get an integer that represents the location of that variable's data. The code above changes pointy's value to be the position in memory where thingy's data can be found: pointy points to thingy's data. If you went to where pointy points to and modified the data there, it would change thingy's value:
*pointy = 5000;
Now the value of 'thingy' is 5000 even though we didn't do anything with 'thingy'. To get or modify the data at a pointer's destination, put an asterisk before it. This is called dereferencing the pointer.
pointy[0] = 5000;
This is identical to the above. It's just a different way of dereferencing the pointer, this will become more relevant soon.
Before we move on:
int* pointy = 60;
This is NOT valid and the compiler will warn you about this. If you did this, you'd be making a pointer that points to memory location 60. This doesn't make any sense because you have no idea what's in there, it's most likely forbidden for your program to access that location.
Arrays
int fiddle[20];
The syntax here is a bit confusing, because now it REALLY looks like we're defining an integer. However, in reality the variable we declare here is just a pointer to an integer again. The main difference between this and a normal pointer is that this actually does create the integers, and then sets the pointer to point into them. Imagine defining 20 separate integer variables back-to-back, and then defining a pointer variable that points to the first one. This is that pointer. However, the compiler won't let you change this pointer's value, it will always only point to the first integer. (Strictly speaking the array is the block of integers itself and the compiler hands you the pointer whenever you use the array's name, but for now you can treat them as the same thing.)
fiddle[0] = 1337;
Remember how we dereferenced a pointer like this earlier? You may already guess what the point (heh) of it is.
fiddle[1] = 9001;
The value inside square brackets is an offset from the pointer's destination. If it's 0, then we're just dereferencing the first integer. But if you change it to 1, we're now dereferencing the second integer.
The compiler will recognize a pointer's type (in this case an 'int'), and when you offset from the pointer by 1, you're not actually moving by 1 byte, but rather (1 * size_of_type) bytes. Since an integer is 4 bytes, each time you move by 1, the compiler actually changes the pointer position by 4 bytes.
int fiddle[20] = {1, 2, 3, 4, 5};
If you declare a variable but don't set it to anything with = , then the value may randomly be anything at all. When you declare an array, the pointer will always be set to point to the first integer, but the integers themselves will be random. To set the data of "complex" variables like arrays, you use curly brackets along with some values in between. The first 5 integers will now be set as described, and anything that you didn't define in the brackets will be set to 0. It is typical to just do {0} to set everything to 0.
Setting values to 0 in the beginning is called "zero initialization". This is a good habit to get into because if the values are random, they may cause unexpected problems in places where you didn't expect.
int fiddle[] = {1, 2, 3, 4, 5};
If you don't define a size for the array, then the size will be based on the values you add into the brackets, this array will have a size of 5. You MUST add some values into the brackets if you don't define the size manually.
WARNING: You cannot return an array like this from a function. We'll find out why in the "Memory management" section below and how to do it properly.
Pointer arithmetic
Pointer arithmetic means changing a pointer's value in order to make it point to different things in memory.
int fiddle[20];
int* pointy = fiddle;
This creates a duplicate pointer: since we're setting the value of pointy into the value of fiddle, both 'fiddle' and 'pointy' point into the same place. However, unlike fiddle which was defined as an array, the compiler will allow us to change the value of pointy:
pointy += 1; pointy[0] = 9001;
If you modify the pointer itself, you're changing where it points to. Now pointy points to the second integer in the array, thus dereferencing with [0] will actually dereference the second value in the array. The first integer is now behind the pointer's target and would have to be accessed with [-1] .
Again, even though you're only adding 1 to the pointer, the compiler recognizes that it's a pointer to an integer, and adds 4 to the pointer's actual value.
Try to use the array notation with square brackets instead of pointer arithmetic whenever you can. It is very easy to make mistakes and cause bugs with pointer arithmetic. It's not something that you should never do though, it can be very useful, but it's one of those "taking off the training wheels" things so just be careful with it.
Strings
Strings are arrays of 'char's. That's it. There's 3 ways to define text characters.
char fiddle1[] = {72, 101, 108, 108, 111, 0};
char fiddle2[] = {'H', 'e', 'l', 'l', 'o', '\0'};
char fiddle3[] = "Hello";
All of these do the exact same thing. A character inside single quotes gets translated into the appropriate ASCII value, for example 'H' is the same as 72 . You can also use double quotes to just type a bunch of characters in a row. Usually you use the last method when defining strings.
Strings in C are "null terminated", which means that the last character must have a value of 0. If there's no 0, then it's impossible to know where the text ends and it can cause serious problems if used. There's no printable ASCII character with the value 0, but we can use \0 to define the value 0 in a place where the compiler expects a character. A double quoted string will automatically have a 0 at the end. Don't confuse this with the text character "0" that you see when printing numbers, that one has the ASCII value 48.
You can type more than just ASCII characters into the strings, for example "おはよ". However, it typically gets stored with UTF-8 encoding, which may not be handled correctly by everything that touches the text: the console might not display it properly, and strlen will count bytes rather than visible characters. UTF-8 is beyond the scope of this guide though.
When you call a function like printf("Hello") , you're not actually giving that function any text, you're only sending a pointer to where that text is stored. So if you want to make a function that receives the string, it should take in a pointer to a 'char':
#include <stdio.h>
#include <string.h>

void print_me (char* text) {
    int text_length = (int)strlen(text); // number of characters before the 0
    printf("Your string (%s) is %i characters long.\n", text, text_length);
}
To find out how long a string is, you can use the function strlen . strlen takes in a char pointer, finds the 0 value at the end of the data, and returns the distance from the pointer to the 0. The result is the number of characters in the string. You need the library string.h to use strlen.
You should consider storing the length in a variable so you won't need to use strlen many times for the same string.
NOTE: when you call strlen, you get the number of characters without the 0 at the end. If you need the total amount of data used by the string, you need to add 1 into the length.
You cannot use strlen to get the size of an array. Instead, you need to keep track of the array length yourself by storing it into an integer variable or something, and sending both the array pointer and the length integer to other functions. We'll look at structs later, they will help with keeping related values together.
Memory management
You already know that when you declare a variable, the data for its value will be somewhere in the memory. The answer to "where" is that it's somewhere on the "stack".
There's 2 places where your program puts data into: the stack and the heap. The stack is like a tower, every time you want something, you put it to the top. And when you don't need it anymore, you remove it. For example when you return from a function, you remove everything that was placed onto the tower after you entered the function. The good part of the stack is that it's easy, you can just create variables and forget about them because they get automatically cleaned up when you return from the functions. It's also fast in terms of performance.
However there's a big problem with that. If you create an array and return it from the function, the array data will just be tossed off the tower and can't be properly used, all you will be left with is the pointer value that was returned from the function.
The solution to this of course is the heap. The heap doesn't do anything by itself, instead you can manually request the operating system to give you some amount of memory from the heap. The operating system will reserve the requested amount of space and then give you a pointer to it, then you can do whatever you want with that memory.
int* fiddle = malloc(2000);
To get memory from the heap, we must "allocate" it. We can do it with the function malloc() . Malloc takes in a number that represents how many bytes of memory you want, it will reserve that amount of memory, and return a pointer to it.
'fiddle' can now be used in exactly the same way as the arrays and pointers that we used before. However this data will never go away unless you release it yourself.
Allocating 2000 bytes isn't necessarily very helpful though, what if we want to allocate space for 20 integers? We know that an integer is 4 bytes, so we could just allocate 20*4 bytes. However there's a better way:
int* fiddle = malloc(20 * sizeof(int));
We can use a special C function sizeof() to find out how big something is (the size is in bytes, not bits), if you give it 'int', it will return 4 (on any system where an int is 32 bits). This way we can easily find out the size of the type that we want without having to worry about making mistakes.
Note that the memory that malloc returns is uninitialized, so if we read integers from it their values may be completely random. There's an alternative way to allocate memory that will set all the data to 0:
int* fiddle = calloc(20, sizeof(int));
calloc is exactly like malloc, with 2 differences. Firstly calloc will set all of the bytes to 0. Secondly calloc wants 2 things: how many things you want to store, and how big is a single thing (in bytes).
You could think of the malloc example as int fiddle[20]; , and the calloc example as int fiddle[20] = {0}; . calloc is slightly slower because it has to make sure the data is 0s, but it's better to use calloc in almost all situations in order to avoid strange bugs.
free(fiddle);
malloc() and calloc() come paired with free(). It will release the memory that you previously requested and allows something else on the computer to use that memory. The pointer that free() takes in must have the same value as you initially got from malloc() or calloc(), otherwise your program will most likely crash. So if you use pointer arithmetic with it, you'll have to return it to the original value first.
When your program is closed, your operating system will just throw out all the memory that you had allocated, you do not need to "clean up" the memory yourself. In fact, if you do that, it will just make your program slower to close and more complicated to develop and maintain.
It is important to keep track of your pointers. If you allocate memory and then just forget about the pointer, it will cause a memory leak in your program. A memory leak will keep using more and more memory until eventually your operating system runs out of it and makes a mess.
malloc, calloc, and free are not in stdio.h , instead they come from stdlib.h . sizeof isn't actually a real function, it's more like #include, you can always use it without any libraries.
NOTE: Allocating memory is relatively slow, so you should avoid using malloc and the like for every little thing if you can. It can also cause your data to become fragmented (related data existing in different places) which will slow down your program.
It might seem annoying or weird that you have to allocate and free things yourself, but it's not bad when you get used to thinking of things as bundles of data and designing your program appropriately. Later we'll look into examples of how to use data and allocations in practice, and how you should structure things.
Structs & unions
Structs
struct Position { int x; int y; };
Structs are multiple variables grouped together in order to make them easier to use as a single "object". The variables in a struct are called "members" of the struct.
When you create a struct, you're creating a new type, and can then use it anywhere that you can use other types like int in. You can create variables and arrays with it, return it from functions, get its total size with sizeof() , and so on:
struct Position hello = {0}; hello.x = 50; hello.y = 920;
Remember how we set array data to 0 with {0} ? You can do the same with structs, doing it is even more important here. Un-initialized structs are dangerous because the variables in it may be set to random values, which can easily lead to bugs that are hard to track. Always set structs to 0 when you create them unless you have a very good reason not to.
After creating a variable with the struct type, we can use a period ( . ) to modify the members inside of it.
You can also initialize the struct by defining the members inside the curly brackets:
struct Position hello = { .x = 50, .y = 920 };
You don't have to set all the variables:
struct Position hello = {
.x = 50
};
This will set hello.x to 50, and everything else to 0. As long as you're setting the value with curly brackets somehow, you will be safe from weird un-initialized data bugs.
Typing "struct Position" is a little annoying, it's much more convenient if we can just type "Position".
typedef struct Position Position;
typedef is another special keyword in C, it can be used to define new types. The last word is the type's name, and in the middle is its definition. Now when we type "Position" it has the same meaning as "struct Position":
Position hello = {0}; hello.x = 50; hello.y = 920;
You can also typedef a struct directly at the same time when you define the struct, it looks a little weird because typedef wants the name to come last:
typedef struct {
int x;
int y;
} Position;
Unions
union Value { char x; int y; float z; Position pos; };
Unions are very similar to structs, except all the members are overlapped in memory. The data that's used for one member is also used for all the other members. What this means is that if you change one of the variables, the other variables will change unpredictably. If you sizeof() this union, the size would be 8 because the biggest member inside of it is Position, which itself is 2 integers. You can fit any one of the members into 8 bytes of storage.
The point of a union is to let you conveniently store different kinds of things in the same space so you can avoid wasting memory. You will need to use some other way to determine which member of the union to use.
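One common way to keep track of the active member is to store a small "kind" number next to the union. This is only a sketch: the names TaggedValue, make_int, make_float and to_float are made up for this example, and it uses if, which is explained in the next section.

```c
// Hypothetical struct: 'kind' remembers which union member is in use.
// kind 0 means the int is valid, kind 1 means the float is valid.
struct TaggedValue {
    int kind;
    union {
        int   i;
        float f;
    } value;
};

// Stores an int and marks the int member as the valid one.
struct TaggedValue make_int (int i) {
    struct TaggedValue v = {0};
    v.kind = 0;
    v.value.i = i;
    return v;
}

// Stores a float and marks the float member as the valid one.
struct TaggedValue make_float (float f) {
    struct TaggedValue v = {0};
    v.kind = 1;
    v.value.f = f;
    return v;
}

// Reads the value back as a float, checking 'kind' to pick the right member.
float to_float (struct TaggedValue v) {
    if (v.kind == 0) return v.value.i;
    return v.value.f;
}
```

The int and the float share the same bytes, so reading the wrong member would give garbage. Checking kind first is what keeps it safe.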
if / else / switch
if / else
In order to program actually useful things, you need to do more than just direct statements.
int x = 5; if (x == 100) { printf("x is 100!!! woah!\n"); } else if (x < 10) { printf("x is smaller than 10, lame.\n"); } else { printf("I dunno what x is. Not 100, but at least it isn't less than 10.\n"); }
This should be pretty self-explanatory. if
and else
are special keywords that let you make branches in the code. if
is followed by parentheses with a condition, and if the condition is true, the section marked with curly brackets will activate. If the condition is not true, we move onto the else
section.
"else if" is technically not a thing. There's only if
and else
, it simply looks like "else if" because there aren't curly brackets after the first else
. This is a more accurate representation of the above code:
if (x == 100) { printf("x is 100!!! woah!\n"); } else { if (x < 10) { printf("x is smaller than 10, lame.\n"); } else { printf("I dunno what x is. Not 100, but at least it isn't less than 10.\n"); } }
What you might derive from this is that the curly brackets aren't necessary. You can simply add a statement after the if/else:
int x = 5; if (x < 10) printf("x is smaller than 10.\n"); else printf("x is 10 or bigger.\n");
In this case the section will end at the first semicolon, you can't add more than 1 statement per section this way. Only use this format for very simple and straightforward things in order to avoid mistakes.
More specifically, the if
section will be activated if the value inside the parentheses is anything but 0. <
and ==
and similar conditions are a bit like functions that return either 1 or 0 depending on if the condition is true. Knowing this, you can just add a variable into the parentheses:
int x = 0;
int y = 1337;
int x_is_y = (x == y); // see explanation below
if (x) {
    printf("This section will NOT activate because x is 0.\n");
}
if (y) {
    printf("This section WILL activate because y is not 0.\n");
}
if (x_is_y) {
    printf("This section will not activate.\n");
}
The value of x_is_y is 0. As mentioned earlier, the conditions are basically functions that return 1 or 0. Since x is not the same as y, the condition is not true, therefore it returns the value 0. The parentheses aren't necessary, but it's good to add them for clarity. You can add parentheses basically wherever you want for clarity, or to change which condition or math operation happens first.
The conditions you can check for are:
== Equal
!= Not equal
! Not (opposite; if true then false, if false then true)
< Less than
> Greater than
<= Equal or less than
>= Equal or greater than
if (x == 10 || x == 100) {
    printf("x is 10 or 100, nice round numbers.\n");
} else if (x > 1000 && x < 2000) {
    printf("x is in between 1000 and 2000.\n");
}
You can check for multiple conditions with ||
and &&
.
||
(or) checks if either condition is true, and &&
(and) checks if both conditions are true.
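Since conditions are just values that are 1 or 0, you can also return them straight from a function. A minimal sketch, with the made-up function names in_range and out_of_range:

```c
// Hypothetical helper: returns 1 if x is between low and high (inclusive), otherwise 0.
int in_range (int x, int low, int high) {
    return (x >= low && x <= high);
}

// Hypothetical helper: returns 1 if x is below low or above high, otherwise 0.
int out_of_range (int x, int low, int high) {
    return (x < low || x > high);
}
```

With these you could write if (in_range(x, 1000, 2000)) { ... } instead of typing out both comparisons every time.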
switch
int x = 2;
switch (x) {
    case 1:
        printf("x is 1.\n");
        break;
    case 2:
        printf("x is 2.\n");
        // notice the lack of 'break' below this
    case 3:
    case 4:
        printf("x is 2, 3, or 4.\n");
        break;
    default:
        printf("x is some other number.\n");
        break;
}
Switch is basically a simplified way to write a long if/else chain. You put some value into the parentheses, and use the case
keyword to check for what it might be. default
is a special case that activates if none of the other ones did.
Note that when a case activates, it will keep going to all the following cases until it hits a break
. Break stops the switch chain. Going to multiple cases at the same time is called "falling through", sometimes you may want to do it intentionally, but it's very easy to do it by mistake by forgetting break
.
switch (x) { case 1: { printf("x is 1.\n"); break; } default: { printf("x is some other number.\n"); break; } }
You can use curly brackets for the cases if you want, it may help with clarity if the cases have a lot of code inside them. You can also put the break
after the curly brackets if you want.
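Here's a sketch of falling through on purpose, where several cases share the same code. The function days_in_month is made up for this example, it ignores leap years, and it treats any number it doesn't recognize as a 31 day month:

```c
// Hypothetical example: returns how many days a month has (ignoring leap years).
int days_in_month (int month) {
    int days = 0;
    switch (month) {
        case 2:
            days = 28;
            break;
        case 4:
        case 6:
        case 9:
        case 11:
            // Cases 4, 6 and 9 have no code or break of their own,
            // so they fall through to the code here.
            days = 30;
            break;
        default:
            // Every other value ends up here.
            days = 31;
            break;
    }
    return days;
}
```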
Loops
Any time you want to do something with many items in an array or other kind of list, you will want to use a loop to do it.
while loop
int x = 0; while (x < 20) { printf("%i\n", x); x += 1; }
The while
loop is basically the same thing as an if
block, except that when it ends, it goes back to the start and checks the condition again. Since x starts from 0 and gets increased by 1 in the loop, this code will print numbers from 0 to 19.
int stuff[20] = {0}; int x = 0; while (x < 20) { printf("Value at position %i is %i\n", x, stuff[x]); x += 1; }
Remember how you can dereference a pointer with []? You can also put a variable into it. Here we're looping through an array and printing the values from it. All the values in the array are 0 though, so this will just print a bunch of 0s.
This is when you need to start being careful. If x goes higher than 19, you'll dereference a location that's beyond the array's data, and may crash the program or cause a memory corruption bug. This is called a "buffer overflow", it often happens if a string doesn't end with 0. Before we look at methods for dealing with it, there's a second loop type.
for loop
int stuff[20] = {0}; for ( int x = 0; x < 20; x += 1 ) { printf("Value at position %i is %i\n", x, stuff[x]); }
This is exactly like a while loop except you can put all the relevant components into the parentheses, separated by 2 semicolons. First you declare a variable, then you have a condition, and finally you modify the variable after each loop. There's nothing much else to say about it, except that each part is optional; the only mandatory things are the 2 semicolons that separate the 3 parts. Some people create infinite loops like this: for (;;)
for (int x=0; 1; x+=1) {
    if (x == 123456) break;
    if (x >= 10) continue;
    printf("This is one of the first 10 values.\n");
}
This loop's condition is "1" which is always true, causing it to just loop infinitely. However when x reaches 123456, we use break which forcibly stops the loop.
At the end of the loop we print some text. However right before it, we use continue if x is 10 or greater. Continue is a bit like break, except it doesn't stop the loop, instead it just stops the current iteration and skips to the next. As a result, the text only gets printed for the first 10 values of x (0 to 9).
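As another sketch of break and continue working together, here's a made-up function called sum_small_values. It assumes that a 0 in the array marks the end of the useful data and that values over 100 should be ignored:

```c
// Hypothetical helper: adds up the values in an array, with two special rules.
int sum_small_values (int* data, int count) {
    int sum = 0;
    for (int x = 0; x < count; x += 1) {
        if (data[x] == 0) break;       // a 0 marks the end: stop the whole loop
        if (data[x] > 100) continue;   // too big: skip just this one value
        sum += data[x];
    }
    return sum;
}
```

Anything that comes after the first 0 never gets looked at, while a big value only skips its own iteration.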
Bonus: remember how I said that there's no such thing as 'else if' because it's actually just 'else' without brackets? This is also valid:
if (something > 10) {
    // do something here
} else for (int x=0; x<10; x+=1) {
    printf("X is %i.\n", x);
}
I don't recommend doing this in actual programs though, it makes your code more confusing.
Miscellaneous
Array safety
Dereferencing a pointer outside of an array's data is called "buffer overflow", and will lead to bugs and crashes. To make arrays safer, you should make sure your loops go through the same number of items as there are in your array.
#define STUFF_COUNT 20 int stuff[STUFF_COUNT] = {0}; int x = 0; while (x < STUFF_COUNT) { ... }
#define
is another command for the compiler, it defines a new text replacement "macro". Now whenever you type "STUFF_COUNT", the compiler will replace it with "20". Now the loop and the array are definitely the same size, and even if you are looping through this array in many places in your program, you can easily change the array size by changing the #define macro.
However using macros like this can be pretty restrictive, because you have to hard-code the number into the code and can't make it flexible.
int stuff_count = 20; int stuff[stuff_count]; memset(stuff, 0, stuff_count*sizeof(int)); int x = 0; while (x < stuff_count) { ... }
This time we're using another variable to define the array size. This is more flexible because we can receive the size variable from somewhere else or dynamically store different size variables for different arrays.
However there's one caveat to this. If you use a variable to define an array size, C won't let you initialize the data to 0 (don't ask me why). You have to do it manually.
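Here's a sketch of what "receiving the size from somewhere else" can look like: a function that gets both a pointer to the array and the count, so the same code works for arrays of any size. The name find_biggest is made up for this example, and it assumes the array has at least 1 item.

```c
// Hypothetical helper: returns the biggest value in an int array.
// Assumes count is at least 1.
int find_biggest (int* data, int count) {
    int biggest = data[0];
    for (int x = 1; x < count; x += 1) {
        if (data[x] > biggest) biggest = data[x];
    }
    return biggest;
}
```

The loop can never go past the end of the array as long as the caller sends in the right count.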
memset
To set memory to 0 manually we can use another function that the default C libraries provide us: memset
. Memset takes in a 'pointer', a 'byte', and a 'count'. It goes to 'pointer' and starts writing 'byte' into memory until it has done it 'count' times. You could make your own memset like this:
void memset (char* data, char byte, int count) { for (int x=0; x<count; x+=1) { data[x] = byte; } }
There's a couple things to talk about here.
memset is in fact doing exactly what we were setting up for: it's taking in a pointer to an array and a variable that has the array's size, and then loops through that array. You don't need a new memset function every time you make a different sized array.
As a side note, memset is very old, it was invented before certain conventions were made. You should put the array size before the array pointer, this practice is recommended by the C standards committee in its charter for newer versions of the language. If you invented memset today, you would reorder the inputs like this: void memset (int count, char* data, char byte) {}
There's another problem though; it's taking in a char
pointer even though we have an int
array. I wrote it this way for clarity of what the function does, but this actually won't work because the compiler recognizes that the pointer types are different, and will give you an error in order to prevent a horrible mistake from happening. But we want this function to take all kinds of pointers, how do we do it? There's 3 things we can do about this.
Casting
The first solution is to "cast" the pointer. Casting is a way to convert values from one form to another, and to tell the compiler that you know what you're doing and to let you do it. To cast to another type, put the type in parentheses before the value you want to cast: (char*)stuff
. So to send our own int array into our own memset, we'd change:
memset(stuff, 0, stuff_count*sizeof(int));
Into:
memset((char*)stuff, 0, stuff_count*sizeof(int));
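A sketch of the cast in action. The hand-written function is called my_memset here (a made-up name) so that it doesn't clash with the real memset, and clear_ints is another made-up helper that does the cast for an int array:

```c
// Hypothetical stand-in for the hand-written memset from this guide.
void my_memset (char* data, char byte, int count) {
    for (int x = 0; x < count; x += 1) {
        data[x] = byte;
    }
}

// Sets every integer in an int array to 0.
// The cast tells the compiler we really do want to treat the ints as plain bytes.
void clear_ints (int* stuff, int stuff_count) {
    my_memset((char*)stuff, 0, stuff_count * sizeof(int));
}
```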
Macro with arguments
The second solution is macros. We already used a text replacement macro before, it is possible to give it values and make it look like a function:
void actual_memset (char* data, char byte, int count) { ... }
#define memset(data, byte, count) actual_memset((char*)data, byte, count)
Now when we call memset, we're actually triggering a text replacement that calls a secret real memset function, and slips in the cast in the middle. You could also create a memset that's specifically meant for integer arrays:
#define memset_int(data, byte, count) actual_memset((char*)data, byte, count*sizeof(int))
With memset_int, you no longer need to multiply the count with sizeof(int)
yourself since it's hard coded into the text replacement. You can use macros to do all kinds of wacky and creative things.
Wait, hold my beer:
#define memset(array, value, count) \
for (int x=0; x<count; x+=1) { \
array[x] = value; \
}
Don't worry if this looks confusing, you don't really need to understand it or do something like this.
This will insert the contents of the whole function in when you trigger the text replacement. We don't even need a real memset function since the text replacement causes the array to be looped through on the spot. As a result, this memset can be used to set numbers to any array type. We can again leave out the sizeof(int)
part when calling this, because this code uses the real array directly instead of changing it to a char pointer.
#define does not end in a semicolon, it ends in a new line. In order to make multi-line macros, we can use backslash at the end of the line to cancel out the new line.
The macros I just showed aren't very reliable though. Imagine you only want to fill in the integers from the 10th one onward, and you call it like this: memset(stuff+9, 0, stuff_count-9). We're offsetting the pointer when we send it in, but this would get converted into stuff+9[x] = 0; inside the macro and the compiler will get confused because it looks like you're dereferencing the number 9. To fix this, you should always add parentheses around the values in macros:
#define memset(array, value, count) \
for (int x=0; x<(count); x+=1) { \
(array)[x] = (value); \
}
All of that is probably confusing and you don't really need to understand it, but I'm including it here to be comprehensive.
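If you want a smaller case of the same parentheses problem, here's a sketch with two made-up macros, DOUBLE_BAD and DOUBLE_GOOD. Both are supposed to multiply a value by 2, but only one of them survives being given "1 + 2":

```c
// Hypothetical macros for illustration.
#define DOUBLE_BAD(x)  x * 2
#define DOUBLE_GOOD(x) ((x) * 2)

// Expands to: 1 + 2 * 2, and multiplication happens first.
int bad_result (void) {
    return DOUBLE_BAD(1 + 2);
}

// Expands to: ((1 + 2) * 2), which is what we actually meant.
int good_result (void) {
    return DOUBLE_GOOD(1 + 2);
}
```

The text replacement doesn't know anything about math, it just pastes your text in, so the parentheses have to be part of the macro.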
Void pointer
Macros can feel confusing though, they aren't always viable, and it's a little annoying if you have to write macros for everything all the time.
The third solution is to replace the char
pointer with a void
pointer. A void pointer is just a normal pointer, except the compiler treats it as a pointer to unknown data, and doesn't care what other pointer you put into it (or what other pointer you put it into). Functions like malloc
and calloc
return a void pointer as well, which is why the compiler didn't care when we put it straight into an int pointer earlier.
The compiler will complain if we try to dereference it though (since it doesn't know what type it is), so we have to create a char pointer inside of the function:
void memset (void* data, char byte, int count) { char* d = data; for (int x=0; x<count; x+=1) { d[x] = byte; } }
Some compilers (and even GCC with certain settings) will still complain if you mix void pointers with another pointer. Personally I think that's a mistake, because the whole point of a void pointer is to cast it to something else. Having to do it manually is just tedious.
We could also cast the void pointer inside of the loop directly like this: ((char*)data)[x] = byte;
, this way we don't have to create a new variable, but it looks pretty messy. There's many ways of handling the same problem, and it's up to you which one you want to use.
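To show why void pointers are handy, here's a sketch of a function that swaps the contents of any two variables of the same size. The name swap_bytes is made up for this example, and it's up to the caller to send in the right size:

```c
// Hypothetical helper: swaps 'size' bytes between two pieces of memory.
// Works with any pointer type because the inputs are void pointers.
void swap_bytes (void* a, void* b, int size) {
    char* pa = a;   // char pointers so we can dereference byte by byte
    char* pb = b;
    for (int x = 0; x < size; x += 1) {
        char temp = pa[x];
        pa[x] = pb[x];
        pb[x] = temp;
    }
}
```

The same function can be called with two ints as swap_bytes(&a, &b, sizeof(int)), or with two floats or two structs, without any casts at the call site.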
Constant values, enumerators
A constant is a variable whose value does not change. A common convention is to type constant names in all UPPERCASE. There's 3 kinds of constants.
#define THINGY 123
This one we already know about, it's a text replacement macro. Whenever we type THINGY, the compiler replaces it with 123. This isn't actually a variable or even a real thing though, it doesn't exist anywhere in your program. It's only an instruction for the compiler to replace your text. Macros don't care about functions or curly brackets: if you #define something inside a function, it will be available in all the code that comes after it in the file, including every function below it, so using it inside functions is not practical.
const int THINGY = 123;
Variables can be defined as being constant. This is exactly like a normal variable except with 1 difference: the compiler won't let you modify the value. The compiler may be able to optimize certain things better if you use constants where appropriate, though it may just cause headaches. Unlike #defined values, const variables have a type and otherwise behave like variables, which may be preferable in some situations. Using const variables may also give more helpful error messages than #defined values, so it may be preferable for debugging reasons.
enum {
THANG,
THONG,
THOOONG = 1000
};
This is an "enumerator". Each name inside the curly brackets gets a unique value that's different from every other value inside the enumerator. The values start from 0 and go up to 1, 2, 3, and so on. You can also set a value manually if you want one of them to have a specific value.
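A sketch of how the numbering works out, using made-up fruit names. Note that after a manually set value, the counting continues from that value:

```c
// Hypothetical enumerator for illustration.
enum {
    APPLE,          // 0
    BANANA,         // 1
    CHERRY = 1000,  // set manually
    DATE            // 1001, counting continues from the previous value
};

// Hypothetical helper: returns a name for each value, anything else is "unknown".
const char* fruit_name (int fruit) {
    switch (fruit) {
        case APPLE:  return "apple";
        case BANANA: return "banana";
        case CHERRY: return "cherry";
        case DATE:   return "date";
        default:     return "unknown";
    }
}
```

Enum names work nicely as switch cases, which is one of the most common places to use them.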
typedef enum {
BIG,
SMALL
} SIZE;
SIZE benis = BIG;
You can give the enum a name and typedef it just like a struct. However unlike structs, the enum type isn't a group of many values, the type of the enum and the values in it is an integer of some kind, most likely int
, it is meant to represent one of the options in the enum.
The main reason to typedef your enum is to make your code more clear, because now you know that the value of this variable is supposed to be one of the values in the SIZE enumerator, and not any other number.
SIZE only has 2 possible values here, so if you need to store lots of SIZE values, the enum being an int may be a waste of memory. Unfortunately C (at least before the C23 standard) does not let you define a type for the enum so you can't control its size, but there's a workaround that I tend to use:
typedef char SIZE; enum {
BIG,
SMALL
};
With this we're using a char
to define a new type, and then defining a nameless enumerator. Now when you use SIZE as a type, the type will internally be a char
and thus only 1 byte. Do note though that a 'char' may be signed, in which case it only goes up to 127, so if the enumerator has values bigger than that, a 'char' isn't guaranteed to be enough to represent them.
Exact size integers
If you want an integer that's exactly 32 bits and your life depends on it, you can't really use any of the normal integers, because they may be different on some future computer or unusual device or change based on compiler/settings. Some types are already different depending on if you compile a 32bit or 64bit program. C comes with a library that provides types with exact sizes: #include <stdint.h>
By including this library, you get access to the following types:
int8_t = integer that's exactly 8 bits (1 byte)
int16_t
int32_t
int64_t
uint8_t = unsigned integer that's exactly 8 bits (1 byte)
uint16_t
uint32_t
uint64_t
If you're baffled by the weird names, you could typedef them into something cleaner: typedef int32_t s32;
typedef uint8_t u8;
etc.
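A sketch of these types in use, with the short typedef names from above. The helper next_u8 is made up for this example, it shows that an 8 bit unsigned integer can only hold 0 to 255 and wraps back around to 0 after that:

```c
#include <stdint.h>

typedef int32_t s32;
typedef uint8_t u8;

// Hypothetical helper: adds 1 to a u8 value.
// 255 + 1 doesn't fit into 8 bits, so the result wraps around to 0.
u8 next_u8 (u8 value) {
    return (u8)(value + 1);
}

// Hypothetical helper: returns how many bytes an s32 takes.
int size_of_s32 (void) {
    return (int)sizeof(s32);
}
```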
Bitwise operators
int x = 5; x = ~x; // invert the bits in x
Bitwise operators let you do advanced manipulation of bits. Here's a table of different bitwise operators:
symbol | name | explanation | operation | result |
---|---|---|---|---|
~ | not | Inverts bits. | ~1101 | 0010 |
& | and | Bit becomes 1 if both sides have a 1 bit. | 0110 & 1100 | 0100 |
\| | or | Bit becomes 1 if either side has a 1 bit. | 0110 \| 1100 | 1110 |
^ | xor | Bit becomes 1 if the sides are different. | 0110 ^ 1100 | 1010 |
<< | left shift | Moves bits left. | 0001 << 2 | 0100 |
>> | right shift | Moves bits right. | 1000 >> 2 | 0010 |
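A common use for bitwise operators is storing many on/off settings ("flags") inside a single integer, one bit per setting. This is only a sketch, and all the names in it (FLAG_VISIBLE, set_flag and so on) are made up for this example:

```c
// Hypothetical flags: each one uses a different bit.
#define FLAG_VISIBLE (1 << 0)   // 0001
#define FLAG_SOLID   (1 << 1)   // 0010
#define FLAG_MOVING  (1 << 2)   // 0100

// Turns a flag on: 'or' sets the bit and leaves the others alone.
int set_flag (int flags, int flag) {
    return flags | flag;
}

// Turns a flag off: 'and' with the inverted flag clears only that bit.
int clear_flag (int flags, int flag) {
    return flags & ~flag;
}

// Flips a flag: 'xor' changes the bit to the opposite of what it was.
int toggle_flag (int flags, int flag) {
    return flags ^ flag;
}

// Returns 1 if the flag is on, 0 if it's off.
int has_flag (int flags, int flag) {
    return (flags & flag) != 0;
}
```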
Function pointers
typedef struct {
    int saw_delicious_banana;
    int ate_banana;
    int exploded_from_eating_too_much;
} Person;

int (*state) (Person); // The function pointer. This syntax is awkward but that's just the way it is.

int moving_towards_banana_state (Person me); // Declared early so that idle_state can refer to it.

int idle_state (Person me) {
    if (me.saw_delicious_banana) state = moving_towards_banana_state;
    return 0;
}

int moving_towards_banana_state (Person me) {
    if (me.ate_banana) state = idle_state;
    if (me.exploded_from_eating_too_much) return 1;
    return 0;
}

int main () {
    state = idle_state; // State starts out as idle.
    Person person = {0};
    while (1) {
        int result = state(person);
        if (result == 1) break; // If exploded from eating too many bananas, stop the loop.
    }
    return 0;
}
A function pointer is what it sounds like: a pointer that points to a function. You can use the pointer to call the function, and can change the pointer to a different function at any time. This can be used for example to make state machines, as demonstrated above. (Note: the code above doesn't do anything useful, it's just an example. Nothing ever changes the person, so the loop would never end.)
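For a smaller sketch of the same idea, here's a function that takes a function pointer as an input and calls whatever it points to. The names add, multiply and apply are made up for this example:

```c
// Two ordinary functions with the same inputs and return type.
int add (int a, int b) {
    return a + b;
}

int multiply (int a, int b) {
    return a * b;
}

// Hypothetical helper: 'operation' is a pointer to a function that
// takes two ints and returns an int. We call it like a normal function.
int apply (int (*operation)(int, int), int a, int b) {
    return operation(a, b);
}
```

apply(add, 3, 4) calls add, and apply(multiply, 3, 4) calls multiply, even though apply itself has no idea which function it was given.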
How do you open a window?
Let's ignore Linux for now.
Windows does not have a function called "malloc". But then how can malloc request memory from Windows? Could we make our own version of malloc()? The answer is yes, malloc is a function that comes with C that in the end just calls a Windows function that gets memory, such as VirtualAlloc. C does not have anything that tells the operating system to open a window though, so you'll have to get the relevant functions from the operating system yourself.
Calling the operating system and then redoing the same thing for different operating systems is more work than most people want to sign up for. So instead of doing it, we can get a third party library that will do what C does for us: call the operating system in our stead. SDL is a commonly used library for this purpose, SDL makes it much easier to open a window and draw things onto it, and it works on multiple operating systems.
Calling the operating system yourself gives you more control over its systems. I will show you an example of opening a window and drawing pixels onto it by calling the OS manually, but the rest of this guide will use SDL.
If you're interested in writing more OS calls yourself, Handmade Hero may be a good place to get guidance.
TODO TODO TODO TODO TODO
Strings initialized with char* instead of char[].
++ (increment) and --, replace some of the +=1's with ++.
Put macros in their own section. the sanic version is ok