Data-oriented C programming guide


NOTE: this guide is VERY unfinished. I made it public because it's better than nothing, but there are still a lot of additions, improvements, and rewriting that I need to do.

What is this guide?

This is a C programming guide that aims to help you understand programming more clearly, and at the same time make programming simpler and easier. I believe the way people approach programming these days adds complication that doesn't need to be there, and takes you further and further from truly understanding what you're doing.

This guide will teach you to make a C program within minutes, and to learn programming from a "data-oriented" perspective: understanding how the computer actually works, as opposed to just plugging words together.

The programming in this guide is for "serious" software programming. If you want to make things with minimal effort or don't care how efficient your program is, you should go read tutorials for Python instead.

This guide is partly split into 2 halves: one that goes through things as fast as possible (referred to as Sanic), and another that gives thorough explanations of everything (referred to as Full or Comprehensive). You might want to read the Sanic version first to get an overall view of things, and then read the comprehensive version afterwards to solidify your understanding.

Who is it for?

This guide is intended for people who are (in order of importance):

This guide is NOT a complete C tutorial, and it doesn't go into all the ways in which your computer handles things (for example thread or instruction scheduling). It's meant to teach you everything necessary so you can start writing programs on your own and understand what you're doing.

Crash course on computers

How do you make a program?

To make a program, you need to give code to a compiler. The compiler reads your code, and creates a program based on it.

What actually IS the compiler? What is a program?

Everything in your computer is just programs, and the compiler is no exception. Conceptually, it isn't even a very complicated program: a compiler takes your files, reads the text in them, and writes out a new file containing CPU instructions that tell the CPU to do the same things your code says. You could think of the compiler as a translator from one language to another: from a programming language into CPU language.

A CPU instruction is just a few bytes of data (see 'What is data?' below). Your operating system first loads the program into memory. When the CPU then reads the bytes of an instruction, it does something based on what that instruction says. A "program" is any file that has a bunch of valid CPU instructions in it.

If you're interested in this, search for "cpu instruction set". An instruction set is a set of rules that define which bits represent which actions in the CPU (for example moving data between 2 places or adding 2 values together). x86 (today mostly its 64-bit version, x86-64) is the instruction set used by most desktop and laptop CPUs, and ARM is the one used in smartphone and tablet CPUs. It's not particularly important to learn any of it for programming purposes, but it can help if you want to truly understand your program.

Your program file also contains some information for your OS, for example telling the OS where in the file to find the instructions and data. You can search for the specification of your operating system's executable format to find out what else is in the program file: PE for .exe files on Windows, ELF on Linux, Mach-O on macOS. That also isn't very useful, though, unless you want to make a compiler or modify executables for some reason.

What is data?

A bit is a single electric signal or state somewhere in your computer, and it is either 1 (high) or 0 (low). You don't need to care how the computer is able to store bits (think of them as microscopic electronic light switches). The important thing to know is that data is 1s and 0s. The storage of bits is called memory.

A bit alone can't do very much, so memory is usually accessed in bytes: groups of 8 bits. Each of the 8 bits can be 0 or 1, which gives 2^8 = 256 combinations, so a byte can be made to represent 256 different things. The most common example is to make it represent text characters. For example, the ASCII specification says that the bits 01000101 mean "E", the bits 00110110 mean "6", and so on. If you wanted to make a program that supports ASCII, you would have to look up what bytes ASCII defines for each character. Thankfully, almost all text in computers is ASCII or some derivative of it (such as UTF-8), so your compiler and other programs will handle it automatically.

You could write your own text system that uses completely different bits to represent "6" and "E"; ASCII is just a commonly agreed-upon way to do it. Some byte values don't represent useful ASCII characters, and the game Dwarf Fortress, for example, uses those values to represent its graphics. It's completely up to you how to use your data to do interesting things.

You can go further by grouping bytes, for example 4 bytes can be used to represent a typical integer; a number with 4,294,967,296 different possible values. When you're writing programs, it is important to be at least somewhat aware of what kind of data you're using: when you put an integer somewhere, you're putting in 4 bytes worth of data. If there's another integer next to it in memory then it would be 4 bytes away, if you want space for 10 integers then you would need 40 bytes of memory...

All of this will make much more sense when we actually write a program that uses data.

Why C?

This guide will be using the C language. You may wonder why: C is old, and you might have heard that it's difficult or dangerous.

It is true that C is "dangerous", just like it is true that riding a bike without training wheels is dangerous. C is dangerous in part because it gives you a lot of freedom: it doesn't stop you from doing dangerous things, such as reading past the end of an array. Throughout this guide I will try to mention ways to help you avoid mistakes.

While C is indeed very old, it's still widely used because it's simple and nothing has directly replaced it. C doesn't pile countless unnecessary features and (often bad) ideas on top of the basics the way languages like C++ do, nor does it try to hide the computer from you like so many newer languages do. Many of the most performant tools and libraries are written in C, and a lot of the systems we use (operating system kernels such as Linux, for example) are at least partially written in C.

There is no perfect language, and C isn't one either. There are plenty of ways in which C could be improved, and C++ actually does some of those things. But C knowledge is certainly valuable. It works at a very fundamental level, and many popular languages (C++, Java, C#, JavaScript) borrow heavily from its syntax. It also carries over almost directly to C++, which is the standard language in much of the software industry, especially for performance-critical software such as games.

C is particularly useful for learning because if you read about C, you're reading about low-level programming. If you read about C++, you'll be drowning in language features and possibly a lot of bad opinions about how you "should" program, because C++ supports so many different ways of programming.

Making your program fast usually involves going back to the basics and programming the way the computer wants you to instead of relying on "conveniences" that someone thought would make coding easier. Learning C will almost inherently teach you the right practices because that's how you're meant to program C.