C Programming Basics

Preface

C is a programming language made by Dennis Ritchie at Bell Labs, building on Ken Thompson’s earlier B language, for systems programming of the UNIX operating system.

The way I see it, C is the closest you can get to assembly language among high-level programming languages, and can in fact be compiled into assembly code using the compiler’s -S option. It is also one of the two main ingredients of both C++ (the other being Simula 67) and Objective-C (a combination of C and Smalltalk), and an important vitamin if you want to prevent scurvy or develop film using coffee and washing soda.

This says a bit about what sort of programming language it is compared to, say, JavaScript, which was made to run inside a web browser, or Java, which was made to run in a virtual machine.

There are many good books written on programming C already, and I write this tutorial more as an exercise in writing and to remind myself of the things I’ve learned from these books.

If you want a better source for learning C I suggest “The C Programming Language” by Kernighan and Ritchie if you already know a bit about programming and “C Primer Plus” by Stephen Prata if you want a comprehensive tutorial where you start from the very beginning.

From now on I assume you know the very basics of running simple commands in a terminal and locating a directory in the terminal / command-line interface. If not, here’s a list of Terminal 101 tutorials:

If you know one UNIX terminal, you know them all, since they’re all pretty standardised. This is where Windows can get a little annoying, but with PowerShell they too seem to have caught up with the standard command-line tools.

A Quick Note about Comments

Before we start, I’d like to add that single line comments in C (since the C99 standard) are written with two slashes like this:

//This is a single line comment

And that multiline comments are made with a slash and an asterisk like this:

/*
This is a multiline comment.
*/

This tutorial will be filled to the brim with code examples, so I thought I’d get that out of the way first.


Hello World

So, before we start, here’s the basics you need to write in C.

  • A computer
  • A compiler
  • A terminal emulator
  • An editor

And that’s it. These can be set up in the major platforms thusly:

Mac

Both Mac and Linux are UNIX-like systems. OS X / macOS was based on BSD which was a version of UNIX released to the public (much to the chagrin of UNIX System Laboratories) and GNU/Linux was a UNIX-inspired kernel released around the same time as BSD went public.

Because of this, a terminal emulator comes as part of the package on both Mac and Linux, and all you need to do to get a C compiler on a Mac is install Xcode (found in the App Store) and the Xcode command line tools.

To install command-line tools, open the terminal and execute:

xcode-select --install

You can also install gcc using Homebrew. Because so many programming tools are easy to install using Homebrew, I suggest you pick it up right away.

To install Homebrew, open the terminal, paste this line and hit enter:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then to install emacs (an editor) using Homebrew, run this command in the terminal:

brew install emacs

If you want gcc and not clang (Xcode uses clang and installs a gcc command that makes you think you’re running gcc when you’re really just compiling with clang), you can install gcc with:

brew install gcc

You now have a compiler, editor and terminal emulator.

Linux

For Debian-based Linux flavours (Debian, Ubuntu, Mint, CrunchBang), install the compiler by running:

sudo apt-get install build-essential

For the emacs editor without the GUI (the way I like it):

sudo apt-get install emacs-nox

To run emacs in terminal, either run

emacs [filename.c]

or

emacs -nw [filename.c]

if you installed regular emacs.

Windows

Because Windows seems to prefer you use their Visual Studio IDE for all things programming, traditional introductions to C don’t really fit their way of doing it. To make up for this, you have three options (that I’m aware of).

  1. VirtualBox with Linux
  2. Cygwin
  3. Windows Subsystem for Linux (WSL) with Ubuntu

All three are decent choices. With a virtual machine you have the freedom to really trash your OS if you feel like it; once it’s set up, just follow the instructions for Linux. With Cygwin you pick the packages you want (such as gcc) in the Cygwin installer itself, at the same time as you install Cygwin. And with WSL set up, you just follow the instructions I wrote for Linux.

Editor

You can use any editor you like, but as you might have noticed, I suggest using emacs in terminal mode. Another simple terminal based editor is nano, which also has the benefit of coming preinstalled on a lot of UNIX and UNIX-like systems (as of yet I’ve found nano preinstalled on every version of Ubuntu and OS X I’ve tried).

If you don’t want to use a terminal based editor, and don’t want to install anything, I suggest using regular Notepad on Windows. macOS comes with an editor called TextEdit that can edit plain text if you click Format -> Make Plain Text, and Linux has Gedit.

If you want a good GUI editor for code, I think both Notepad++ for Windows and Atom for macOS and Linux are great choices, both of which are free.

Ye Olde “Hello, World!”

According to computer lore, the tradition of first writing a “Hello, World!” program started with the book “The C Programming Language” by K&R. So here it is:

#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("Hello, World!\n");
    return 0;
}

The original “Hello, World!” was simpler and went something like this:

#include <stdio.h>

main( )
{
        printf("hello, world\n");
}

As you can see, I’ve added some rather unnecessary but pretty standard stuff I like to put in any C program, such as declaring the argument count “argc” and the argument vector “argv”. If I wanted to, I could also declare the main function this way:

int main(int argc, char *argv[], char *envp[])

And get the environment variables for some information about the system. This third parameter is a common extension on UNIX-like systems rather than part of standard C, and it isn’t as common as the two preceding parameters.

I also had the main function return an int, and have that int be 0 upon successful completion. This is somewhat of a convention in C programming, where returning 0 means that the function finished normally and -1 or another non-zero number means that something went wrong.

To run this program:

  • Copy or write the program by hand into a file called hello.c
  • Run the cc or gcc compiler with gcc hello.c 
  • Run the program by adding ./ in front of the compiled file’s name. In this case you run it with the ./a.out command, because the compiler names its output a.out if nothing else is specified.

You can give this program a nicer name after compilation by adding the option -o [filename], for example, if I wanted the compiled filename to be hello I could compile it like this:

gcc hello.c -o hello

And then run it using the ./hello command.

You need to cd (change directory) into the directory you put the file in before running these commands in the terminal. If you don’t know how to use the terminal yet, go back and check the links at the beginning of the page.

So now I’ve explained that the main function can have three parameters, that is, the number of arguments (there’s at least one, where the first one is the name of the program), the argument vector, and the environment pointer (which is also a vector).

The main function is where the program starts executing. For most larger programs, that’s where the initialising functions are called and probably some form of program loop is started. Like with Java and C++, every C program needs a main function so it has somewhere to start running from.

But what about the stuff before the main function?

Well in this case, that’s where we include the header files. These declare the functions of ready-made libraries that help us avoid reinventing the wheel. For this simple program all we need is the <stdio.h> header, which declares standard input and output functions such as printf().

And what about printf("Hello, World!\n");

After main we run the print formatted output function called printf(). This function takes a format string, in this case "Hello, World!\n", and prints it along with any other strings, numbers or character variables we want to add. The "\n" is an escape sequence meaning newline. Without it, the command line will look ugly, with the text sitting directly in front of your prompt on the same line after the program finishes.

The printf() function call is one statement, and any single statement is terminated by the semicolon character.

Then there’s the return 0; statement.

As mentioned, this marks the end of the function. Since this return statement wasn’t put inside a conditional statement such as “if” or “switch”, nothing other than the } symbol or a comment should come after it.

And we’re done with “Hello, World!”, so how about we introduce some variables and loops then.


Variables

Now saying hello to your computer shows you how to run gcc and print stuff to the terminal, but how about actually adding some user input or working on some large amounts of data? You know, not useless stuff.

So what are variables in C?

As you might know already, a variable is kind of like a “name” or “a box” that represents a value that might change over time – hence the name variable.

C isn’t an object-oriented language and is fairly low-level compared to other languages, so in a C context, a variable represents a place in memory and the value stored at that particular place. This is true for any other programming language as well, but when working with pointers and memory it becomes even more clear in C.

It’s also worth noting that variables in C aren’t objects the same way they are in Java, Python or JavaScript, but simply binary numbers with a particular way of being interpreted. Because of this, variables don’t come prepackaged with a set of methods, and the methods you might be used to from higher-level languages are simply library functions that exist apart from the programming language itself.

But enough theorising, how about some real world examples eh?

We’ll start with a simple calculator.

This simple calculator uses the argument vector to add two numbers:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  if(argc < 3) {
    printf("Usage ./adder [addend] [addend]\n");
    return -1;
  }
  // Accesses second element of argument vector and parses to int
  int a = atoi(argv[1]); 
  // Accesses third element of argument vector and parses to int
  int b = atoi(argv[2]); 
  int c = a + b;
  printf("%d + %d = %d\n", a, b, c);
  return 0;
}

Compile this and run by running

gcc adder.c -o adder

./adder 2 2

This sums up the arguments and should print something like:

2 + 2 = 4

If you try to run it by just typing and executing ./adder you should get a usage message:

Usage ./adder [addend] [addend]

In this case the if (argc < 3) statement checks to see if we have too few command-line arguments for our program (argc is the length of the char *argv[] array). If there are fewer than 3, we print the usage message and return from main, which is equivalent to exiting the program.

Now let’s look at the variables in this program.

As you might have noticed, I’ve already introduced three variables in the “Hello, World!” program: int argc, char *argv[] and char *envp[].

The first such variable is an integer, the second and third are arrays of strings. These variables are parameters of the main function. When you run the program, the shell executes the compiled program, and main is called with the list of arguments written in the terminal.

The other three variables, int a, int b and int c, are part of the function body and are local to the main function. If we wanted to, we could declare these variables outside the main function, and the scope of the variables would be global, making it possible to access them from any function in the program. The downside is that since a variable can only be declared once per scope, no other variable within the same scope can have the same name, and in any larger program, global variables will become a PITA to keep track of.

The integer data type is normally a 32 bit signed integer with maximum and minimum values that you can print out using this program:

#include <stdio.h>
#include <limits.h>
int main(int argc, char *argv[]) {
  printf("Int32 min: %d \nInt32 max: %d \n", INT_MIN, INT_MAX);
}

Which prints:

Int32 min: -2147483648 
Int32 max: 2147483647 

So we have signed and unsigned variables. What do we mean by this?

Well, a signed variable can hold negative as well as positive numbers. To get this, one bit is sacrificed for the sign (if the most significant bit is 0, the number is zero or positive, while a 1 means negative), which is why a signed type tops out at about half the maximum of its unsigned counterpart.

The primitive data types in C are:

  • char
    a single character, usually 8 bits, can be signed or unsigned
  • short
    a short integer, often 16 bits, can be signed or unsigned
  • int
    a regular integer, often 32 bits, can be signed or unsigned
  • long
    a large integer, often 64 bits (but 32 bits on Windows), can be signed or unsigned
  • float
    a single precision floating point number, usually 32 bits
  • double
    a double precision floating point number, usually 64 bits
  • pointer
    like the char, this is an integer that is interpreted in a special way. A char can be seen as the “address” of a character in a table, while a pointer is the address of a place in memory.

If not otherwise specified, short, int and long default to signed, meaning that you need to say so when you want them to be unsigned, while you don’t have to specify signed. Plain char is the exception: whether it is signed or unsigned is up to the compiler and platform.

If you work across platforms, you should not trust any of this, but then again, if you work across platforms, you shouldn’t be using the primitive types either, but rather types such as uint32_t defined in the <stdint.h> header file or some other standardised library.

A variable is declared this way:

datatype variable_name;

Example:

int a;

And a variable is declared and initialised using the assignment operator =

datatype variable_name = expression_value;

Note that since we use the equals sign as an assignment operator, a single equals sign in a statement such as “a = b” should be read as an assignment and never as “a equals b”. For that type of comparison we use the equality operator ==, as in “a == b”.

Example:

int a = 42;

Or:

double b = (my_function_name() + 42) / 37;

As with any programming language, there are conventions to variable names.

To keep things simple, always start with a letter, stick to mostly alphanumeric characters and underscores, use UPPER_CASE for constant names, and unless you’re planning on writing the C library, don’t start names with an underscore such as _foo or _bar.

I use the term expression value, since the right hand side of the assignment can in itself be an expression, but that expression must result in a value compatible with the type declared. If the expression results in a similar, but still different datatype than the type declared, you can try and use a cast. For example, float may be converted to double precision floating point values by using the (double) cast in front of the expression value, and any integral number may be converted to int using the (int) cast.

You can also convert floating point numbers into integers, and look at the address stored in a pointer by printing it with the %p format specifier.

The following code:

#include <stdio.h>

int main(int argc, char *argv[]) {
  double a_dbl = 3.5;
  int a_int = (int) a_dbl;
  // A pointer may be wider than an int, so print it with %p instead of casting it to int
  void *a_dbl_addr = (void *) &a_dbl;
  printf("a_dbl: %f \na_int: %d\na_dbl_addr: %p\n", a_dbl, a_int, a_dbl_addr);
  return 0;
}

Will print something like this to the terminal (the address differs between systems and runs):

a_dbl: 3.500000 
a_int: 3
a_dbl_addr: 0x7ffee92d3a78

If all this talk about pointers and addresses confuses you, don’t think too much about it. This is one of those low-level quirks that make C stand apart from other programming languages, and in nearly every tutorial on C, at least one chapter is dedicated to the topic of pointers and memory management.

Notice how the floating point number loses its fraction when cast to int. This is because the cast doesn’t consider anything beyond the decimal point and simply cuts it away, so 3.5 becomes 3 and -3.5 becomes -3. If you want your program to round in a mathematically consistent manner, you can use one of the many useful functions found in the <math.h> header file.

For example, to round up, we can use the ceil() function in math.h like this:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char *argv[]) {
  if(argc < 2) {
    fprintf(stderr, "Usage ./round [float_number]\n");
    return -1;
  }
  double i = atof(argv[1]);
  int q = (int) ceil(i);
  printf("%f ≈ %d\n", i, q);
  return 0;
}

Which will print:

5.500000 ≈ 6

When supplied with the value 5.5 as the first argument after the program name. On Linux you may need to add -lm at the end of the gcc command to link the math library.

The variables mentioned thus far live either in static memory or are automatic variables, such as the ones declared inside main and in its parameter list. Programs that only use these aren’t very flexible, since only a fixed number of bytes is available: the user can’t hand the program an arbitrary amount of new data at runtime and have it stored.

Before we can use dynamic memory in C we need to understand how pointers, arrays and memory allocation work. We’ll look at that in later chapters.

 


Control Flow

Now that we’ve had a quick look at variables, which are important for collecting and manipulating data, how about acting on that data in different ways? And what about loops?

Loops and recursion are key to getting things done in programming. They’re how we avoid writing the same thing over and over again, but before we can explain how loops work, we need to understand control flow, because all loops involve one form of control flow or another.

So what is control flow?

Control flow is the set of routes a program takes through its code, chosen by evaluating boolean expressions against conditions set by the programmer.

Let’s start with an example from a little joke about computer programmers:

A programmer is going to the grocery store and his wife tells him, “Buy a gallon of milk, and if there are eggs, buy a dozen.” So the programmer goes, buys everything, and drives back to his house. Upon arrival, his wife angrily asks him, “Why did you get 13 gallons of milk?” The programmer says, “There were eggs!”

In C terms the code (according to the programmer) might look something like this:

void buy_milk(int gallons);
void buy_eggs(int eggs);
int eggs_exist();

int main(int argc, char *argv[]) {
    buy_milk(1);
    if(eggs_exist()) {
        buy_milk(12);
    }
}

/* 
This is a multiline comment
This is where I would implement the functions buy_milk, buy_eggs and eggs_exist.
*/

Notice the if statement here, as well as the return value of the eggs_exist() function. This is one example of control flow in C.

An if statement can be extended with an else if and else statement. So to exemplify this, let’s extend the calculator to do subtraction, multiplication and division.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  if(argc < 4) {
    fprintf(stderr, "Usage: ./calc [number] [operator] [number]\n");
    return -1;
  }
  int a = atoi(argv[1]);
  int b = atoi(argv[3]);
  char opr = argv[2][0];
  if(opr == '+') {
    printf("%d %c %d = %d\n", a, opr, b, a + b);
  } else if (opr == '-') {
    printf("%d %c %d = %d\n", a, opr, b, a - b);
  } else if (opr == 'x') {
    printf("%d %c %d = %d\n", a, opr, b, a * b);
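  } else if (opr == '/') {
    if(b == 0) {
      fprintf(stderr, "Cannot divide by zero\n");
      return -1;
    }
    // Integer division, the remainder is discarded
    printf("%d %c %d = %d\n", a, opr, b, a / b);
  } else {
    fprintf(stderr, "Illegal operator %c \nUse one of + - x or / \n\n", opr);
    return -1;
  }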
  return 0;
}

Save as calc.c and compile with gcc calc.c -o calc. I used x instead of * because * has special meaning in the shell.

As you can see, an if statement evaluates the expression and if the condition is met (i.e. the expression evaluates to true), it will execute the statements inside the code block delimited by the curly braces.

If the if statement’s condition isn’t met, it will try each else if expression in turn until one of them evaluates to true.

If none of the if or else if expressions are true, it will check to see if there’s an else clause and execute that. If no else clause exists, it will simply continue running after all the code blocks of the if and else if statements.

In some programming languages, like Java, the condition inside the if parentheses must be a boolean expression, but the thing about C is that it traditionally doesn’t have a boolean datatype (C99 added one through the <stdbool.h> header, but underneath it is still just an integer).

So what is true in C, and can you handle it?

C is simple in that any value that evaluates to 0 is considered false, and everything else is true. This is why the function eggs_exist() was set to return a value of type int.

A conditional statement might consist of a set of different conditions. These can be slapped together using the boolean AND, OR and NOT operators in C.

NAME         Operator  Example              Explanation
BOOLEAN AND  &&        a == 5 && b == 9     a equals 5 AND b equals 9
BOOLEAN OR   ||        a == 5 || b == 9     a equals 5 OR b equals 9
BOOLEAN NOT  !         !(a == 5 && b == 9)  NOT (a equals 5 AND b equals 9)

For example, the condition below reads as “if (a AND NOT b) OR c, then execute this”, since && binds tighter than ||.

int a = some_function();
int b = some_other_function();
int c = some_other_other_function();

if(a && !b || c) {
// DO STUFF
}

Like with other expressions, you can always use parentheses to explicitly state or manipulate the operator precedence. For example.

(a || b) && (c && !d) is different from a || ((b && c) && !d).

C also has a switch statement, which is particularly useful for many rows of different conditions and evaluating enumerated types.

#include <stdio.h>

int main(int argc, char *argv[]) {
  if(argc < 2) {
    fprintf(stderr, "Usage: ./doremi [note]\n");
    return -1;
  }
  char note = argv[1][0];
  switch(note) {
  case 'c':
    printf("do\n");
    break;
  case 'd':
    printf("re\n");
    break;
  case 'e':
    printf("mi\n");
    break;
  case 'f':
    printf("fa\n");
    break;
  case 'g':
    printf("so\n");
    break;
  case 'a':
    printf("la\n");
    break;
  case 'h':
  case 'b':
    printf("ti\n");
    break;
  default:
    printf("not a note\n");
  }
  return 0;
}

As you can see, the switch first takes a single expression, then checks whether its value matches one of a set of predefined constants. Execution jumps to the matching case and continues until it hits a break statement, so you end each case with a break unless you want it to fall through into the next one, the way 'h' shares its code with 'b' above. It is good programming practice to always have a default clause in your switch statements – unless you have a very good reason not to.

In addition to the boolean operators used to slap together expressions in an if statement we also have comparison operators, these are:

Name                   C operator  Example  Explanation
equals                 ==          a == b   a equals b
not equal              !=          a != b   a does not equal b
greater than           >           a > b    a is greater than b
equal or greater than  >=          a >= b   a is greater than or equal to b
less than              <           a < b    a is less than b
equal or less than     <=          a <= b   a is less than or equal to b

 


Loops

Now that we have if, else if, else and switch explained, we can finally start using loops.

Now a loop in C is any series of statements you want to have executed more than once, and to control that, an expression is evaluated (just like in an if statement) for every iteration.

C has three loop constructs:

  • while
  • do-while
  • for

The while loop first evaluates an expression and then executes the code in between the curly braces if that expression evaluates to true. If the expression is false to begin with, the loop body will never be executed.

Example:

int a = 0;
while(1) {
    printf("%d THIS IS AN INFINITE LOOP!\n", a);
}

Run this at your own peril, you can terminate the program by pressing the keys ctrl and c simultaneously.

The do-while loop executes the code once before it evaluates the expression at all, and if the expression is true after that first iteration, it will execute the loop over and over until the expression is false. The do-while loop is intended for situations where you need the code to be executed at least once.

int a = some_function();
do {
// DO STUFF AT LEAST ONCE
} while (a != 0);

Note that since control flow evaluates 0 as false and anything else as true, we don’t need to write while(a != 0) and could just as well have written while(a) and get the same effect.

The for loop is a specialised while loop, usually with a counter variable, where the loop header consists of three expressions: initialisation, evaluation and incrementation.

Older versions of C (before C99) don’t support declaring variables inside the for loop header. Because of this, it can be a good idea to declare the variable outside the for statement.

#include <stdio.h>
int main(int argc, char *argv[]) {
  int i;
  for(i = 32; i < 127; i++) {
    printf("%d = \"%c\"\t", i, i);
    if(!((i-31) % 8)) {
      printf("\n");
    }
  }
  printf("\n");
}

This program prints all the printable characters in the ASCII table.

The if(!((i-31) % 8)) part simply prints a newline after every eighth entry. It uses the modulo operator and the fact that for every eighth entry the remainder is 0, i.e. false, which the ! then turns into true. The modulo operator calculates the remainder of an integer division and is an extremely useful operator for checking whether a value has a certain characteristic.

The initialisation in the for loop is i = 32;

The evaluation in the loop checks to see if i < 127; meaning that i is less than 127.

In the incrementation part we increment the i variable by one using the incrementation operator i++.

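We could also have used ++i. The difference is in the value of the expression itself: ++i increments i and evaluates to the new value, while i++ evaluates to the old value and increments i as a side effect. In this loop the result is thrown away so both behave the same, but it makes a big difference in function calls as we’ll see later on.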


Arrays

Now that we’ve looked at loops and variables we know how to work on individual values and working on that value multiple times, but how about collections of values?

While some programming languages support variable size lists, associative arrays / dictionaries / hashes, tuples and all sorts of collections, C doesn’t go out of its way to give you those data structures.

What we do have are arrays, structs and unions.

So what is an array?

An array is a collection of variables with a predetermined length, and where every variable has the same data type as the others.

An array consists of a list of elements where you access each element using the name of the array and the index of the element. The index is simply an integral value that may be anything from 0 to the length of the array minus one.

Earlier in this text I’ve referred to arrays as vectors. The two terms are often used interchangeably in C, although in C++ a vector usually means the resizable std::vector container.

Arrays can be declared without initial values, or initialised with a set of values.

#include <stdio.h>

int main(int argc, char *argv[]) {
  // Declares an array without initialising it
  int arr_one[7];
  // Declares an array initialised with a set of values
  int arr_two[] = {8, 6, 7, 5, 3, 0, 9};
  int i;
  for(i = 0; i < 7; i++) {
    printf("arr_one[i] = %d\t", arr_one[i]);
  }
  printf("\n");
  for(i = 0; i < (sizeof(arr_two) / sizeof(int)); i++) {
    printf("arr_two[i] = %d\t", arr_two[i]);
  }
  printf("\n");
  return 0;
}

The printout when I run this program is:

arr_one[i] = 0	arr_one[i] = 0	arr_one[i] = 0	arr_one[i] = 0	arr_one[i] = 0	arr_one[i] = 0	arr_one[i] = 0	
arr_two[i] = 8	arr_two[i] = 6	arr_two[i] = 7	arr_two[i] = 5	arr_two[i] = 3	arr_two[i] = 0	arr_two[i] = 9

Unlike in certain other programming languages, such as Java, you can never be sure that an uninitialised array is filled with 0 values. Uninitialised local variables in C might be filled with garbage, meaning whatever information happened to be stored at that particular memory address. The zeros above are just luck.

For strings, or any collection of byte values, the <string.h> header has the function memset(), which sets every byte of the array to a given value, and the older <strings.h> header has bzero(), which zeroes them out.

Another detail worth noting is that C doesn’t do any bounds checking on arrays, so accessing the tenth element (arr[9]) of an array of five ints simply means that you’re trying to access the memory 4 * sizeof(int) bytes beyond the end of the array. The behaviour is undefined, and when you reach memory you don’t have access to, the program will most likely die with a segmentation fault (or segfault for short).

Let’s try it out:

#include <stdio.h>

int main(int argc, char *argv[]) {
  int arr[5];
  int i = 0;
  while(1) {
    printf("%d\t", arr[i]);
    i++;
  }
}

Running it ends with:

Segmentation fault: 11

Because there’s a chance that you might be able to read arr[9] of an array of five elements without anything visibly going wrong, there’s no guarantee that an out-of-bounds access will crash. This program crashes because it runs in an infinite loop, constantly trying to access the next address after the last, until it hits memory it isn’t allowed to read.

Arrays can also be multidimensional and may therefore be used to create or emulate tables.

#include <stdio.h>

int main(int argc, char *argv[]) {
  int tbl[3][7] = {
    {0, 0, 2, 4, 6, 0, 1},
    {42, 42, 42, 42, 42, 42, 42},
    {5, 5, 5, 2, 3, 6, 8}
  };
  int i, q;
  for(i = 0; i < 3; i++) {
    for(q = 0; q < sizeof(tbl[i])/sizeof(int); q++) {
      printf("%d\t", tbl[i][q]);
    }
    printf("\n");
  }
}

This prints:

0	0	2	4	6	0	1	
42	42	42	42	42	42	42	
5	5	5	2	3	6	8

When working with both pointers and arrays, you will most likely find yourself doing dynamic memory allocation. Also note that C isn’t too keen on letting you send arrays back and forth between functions: when you pass an array to a function, what actually arrives is a pointer to its first element. An array of strings such as argv is really an array of pointers, so it arrives as a pointer to pointer.

In fact, the char *argv[] parameter in the main function can also be written as char **argv, but not as char argv[][] (because even though both strings and arrays may be understood in a pointer context, they’re still not entirely equivalent to pointers).

This program will compile and run just as well as the one with *argv[] and *envp[].

#include <stdio.h>

int main(int argc, char **argv, char **envp) {
  int i;
  for(i = 0; i < argc; i++) {
    printf("%s\n", argv[i]);
  }
}

 


Strings

We’ve looked at arrays, but how do strings fit into all this?

Earlier I went through the different data types of C, and as you might have noticed, there’s no mention of strings there.

That is because C doesn’t see strings as a basic datatype, but rather as a null terminated array of characters. Null terminated means that for a character array to be interpreted as a string, that array needs to have a 0 as its last element. Mind you that the 0 value does not mean the character ‘0’ which is 48 according to the ASCII table, but the number 0.

The difference can be seen in the following code sample.

char notzero = '0';
char zero = 0;
printf("notzero: %d\n", notzero); // This prints "notzero: 48"
printf("zero: %d\n", zero); // This prints "zero: 0"

Strings are usually handled through a pointer to char, since pointers and arrays are very closely related.

#include <stdio.h>

int main(int argc, char *argv[]) {
// THIS IS A STRING CONSTANT
  char *hello = "Hello, World!\n";
// THIS IS AN ARRAY OF CHARACTERS USING CHARACTER NOTATION
  char hello_two[] = {'H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!', '\n', 0};
// THIS IS AN ARRAY OF CHARACTERS USING THE CHARACTERS INTEGRAL VALUES
  char hello_three[] = { 72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33, 10, 0 };
  printf("%s", hello);
  printf("%s", &hello_two[0]);
  printf("%s", hello_three);
  return 0;
}

This program prints:

Hello, World!
Hello, World!
Hello, World!

All three are mostly equivalent. The ampersand in &hello_two[0] is the address-of operator and gets the address of the first element in the array (in effect it returns a value that may be used as a pointer), but as you can see from the hello_three variable, it isn’t necessary, since an array name used like this already turns into a pointer to its first element.

The difference between a string constant and a character array is that a string constant should be treated as immutable: trying to replace its individual elements is undefined behaviour, and on many systems the program crashes.

For example:

#include <stdio.h>

int main(int argc, char *argv[]) {
  char *hello_one = "hello";
  char hello_two[] = {'h', 'e', 'l', 'l', 'o', 0};
  // THIS IS LEGAL
  hello_two[1] = 'a';
  // THIS IS NOT
  hello_one[1] = 'a';
  return 0;
}

Compiles, but running it on my system results in an error:

Bus error: 10

For portability, you should use the character notation 'c' or '\n' and avoid using the integer values of the characters if you can. The backslash \ represents the escape character and is used to enter non-printable characters or other characters that need an escape sequence.

Here’s a little table showing the escaped character sequences we have in C.

Newline \n
Tab \t
Vertical Tab \v
Alert \a
Backspace \b
Form Feed \f
Carriage Return \r
Double Quote \"
Single Quote \'
Backslash \\
Question Mark \?
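Byte value in octal \377 (one to three octal digits, from \0 to \377; \377 is 255 as an unsigned char and typically -1 as a signed char)
Byte value in hexadecimal \xFF (from \x00 to \xFF; \xFF is likewise 255 unsigned and typically -1 signed)
Escape character \e (a compiler extension in GCC and Clang, not standard C; \033 or \x1B does the same on ASCII systems)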
å  \U000000E5  Unicode value with eight hexadecimal digits
å  \u00E5  Unicode value with four hexadecimal digits

As we saw earlier, both the argv and envp arguments of the main function are arrays of strings (envp is a common extension rather than part of standard C). We can use this knowledge to print out information about the program and the environment it’s in.

#include <stdio.h>

int main(int argc, char *argv[], char *envp[]) {
  int i = 0;
  while(envp[i]) {
    printf("%s\n", envp[i++]);
  }
  return 0;
}

This prints some information about my system:

TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
TERM=xterm-256color

And so on and so forth (I don’t really feel like sharing all the intimate details about my system, so I won’t share it here).

Notice the while(envp[i]) condition. It works because envp is a null terminated array: its last element is guaranteed to be a null pointer, which evaluates as false.

Because strings are null terminated, we can easily iterate over the individual characters of a string using loops:

#include <stdio.h>

int main(int argc, char *argv[]) {
  char *hello = "this string is null terminated\n";
  int i = 0;
  while(hello[i]) {
    putc(hello[i++], stdout);
  }
}

The functions putc(int c, FILE *stream) and fputc(int c, FILE *stream) simply put a character into a file. If that file is stdout, the result is somewhat similar to using printf, except that we only handle one character at a time and can’t produce formatted output.


Pointers

While working with strings and arrays, you might have noticed the use of the asterisk and the ampersand * and &.

In a variable declaration, the asterisk is used to declare a pointer, as in:

int *pointer;
char *str = "this is a string";

In other situations, it is used as the dereferencing operator, while the ampersand is known as the address-of operator.

So what are pointers?

I mentioned earlier that variables in C have certain characteristics, and those characteristics are:

  • Data type
  • Size
  • Address
  • Value

The datatype can be things like int, double, char or a pointer, that is, the primitive types, although it may also be a struct or a union. The datatype has an associated size. You can find the size (that is, the number of bytes) of a type using the sizeof operator, so these two are closely related.

In addition to having a size, the datatype has a specific way of being interpreted. A float is interpreted differently than a char, and a char may be interpreted differently than an int, although you can just as easily use chars to store byte-sized integers in the hope of using less memory.

A variable also has an address and a value. You can get the address of a variable like this:

#include <stdio.h>

int main(int argc, char *argv[]) {
  int a = 42;
  int *b = &a;
  printf("val: %d\t address: %u\n", a, (int) b);
}

Running this program I get:

val: 42	 address: 3952404972

The second variable here, b, is what’s known as a pointer, and we got a pointer to the value stored in a by using the address-of operator.

Using the sizeof operator I can find that an integer takes four bytes, i.e. 32 bits of memory, on my system.

This line:

printf("int size: %zu\tchar size: %zu\n", sizeof(int), sizeof(char));

Prints:

int size: 4 char size: 1

Now that we know how to print addresses of variables, let’s see if the elements of an array of four chars and of an array of four ints are contiguous (i.e. that they occupy adjacent memory addresses). We’ll do that using the following program:

#include <stdio.h>

int main(int argc, char *argv[]) {
  char carr[4] = {'a', 'b', 'c', 'd'};
  int iarr[4] = {1, 2, 3, 4};
  int i;
  for(i = 0; i < 4; i++) {
    printf("%c : %u\t", carr[i], (int) &carr[i]);
    printf("%d : %u\n", iarr[i], (int) &iarr[i]);
  }
  return 0;
}

This prints:

a : 3963566556	1 : 3963566576
b : 3963566557	2 : 3963566580
c : 3963566558	3 : 3963566584
d : 3963566559	4 : 3963566588

As you can see, every char has an address exactly one higher than the previous one, while the ints follow each other in steps of four.

Because addresses are contiguous, we can use pointer arithmetic to map the bytes of an int to individual chars. To do this, let’s experiment with the byte values (in hexadecimal) 41, 42, 43 and 44, which correspond to A, B, C and D in the ASCII table. Since each pair of hex digits maps to exactly one byte, the sequence ABCD corresponds to the hexadecimal number 0x41424344, which would be 1094861636 in decimal.

#include <stdio.h>

int main(int argc, char *argv[]) {
  int i, a = 0x41424344; //ABCD
  char *b = (char *) &a;
  for(i = 0; i < 4; i++) {
    printf("Address: %lu\tCharacter: %c\tNumber %02x\n", (unsigned long) &b[i], b[i], b[i]);
  }
  printf("\n");
}

This prints

Address: 140732728916456	Character: D	Number 44
Address: 140732728916457	Character: C	Number 43
Address: 140732728916458	Character: B	Number 42
Address: 140732728916459	Character: A	Number 41

But why does it end up in the opposite direction?

This is because my Intel processor uses a byte order called little endian, meaning that the least significant (rightmost) byte is stored at the lowest address. This is worth noting if you plan on writing portable code or doing any network programming.

We can make pointers to pointers if we like:

#include <stdio.h>

int main(int argc, char *argv[]) {
  int a = 42;
  int *b = &a;
  int **c = &b;
  int ***d = &c;
  printf("&a: %u \ta: %d\n", (int) &a, (int) a);
  printf("&b: %u \tb: %u\t*b: %d\n", (int) &b, (int) b, (int) *b);
  printf("&c: %u \tc: %u \t*c: %u\t**c: %d\n", (int) &c, (int) c, (int) *c, (int) **c);
  printf("&d: %u \td: %u \t*d: %u \t**d: %u\t***d: %d\n", (int) &d, (int) d, (int) *d, (int) **d, (int) ***d);
  return 0;
}

This prints

&a: 3911977468 	a: 42
&b: 3911977456 	b: 3911977468	*b: 42
&c: 3911977448 	c: 3911977456 	*c: 3911977468	**c: 42
&d: 3911977440 	d: 3911977448 	*d: 3911977456 	**d: 3911977468	***d: 42

The usefulness of this will become apparent when working with functions and multidimensional arrays.

To sum up.

You declare pointers by putting an asterisk in front of the variable name:

int a = 42;
int *pointer = &a;

And you access the value the pointer points to using the dereferencing operator.

int a_val = *pointer;

Because pointers only store addresses, a pointer needs to refer to allocated memory, either freshly allocated or belonging to an existing variable, before you dereference it. Otherwise you risk an illegal memory access (and a segmentation fault).

// Allocates enough space for one integer
int *pointer = malloc(sizeof(int)); 
// Allocates space for 32 integers, similar to an array
int *arr = calloc(32, sizeof(int));

Working with memory deserves its own chapter, so I’ll leave it at that.


Functions

Functions are how we pack code together into manageable chunks of functionality.

When we combine functions with headers containing function prototypes, we can split up our code into multiple files so that we avoid long text files of source code.

We’ve already talked about using functions from the C standard library such as atoi, malloc, calloc, ceil and printf, as well as how every C program contains at least one function called main. But how do we create our own functions?

Functions have the following properties in the order that they appear in C:

  • A return type
  • A name
  • Parameters
  • A function body

For example, the function:

int power(int num, int pow) {
  int retval = 1;
  for(; pow > 0; pow--) {
    retval *= num;
  }
  return retval;
}

It has the return type int, the name of the function is power, and it takes two parameters, num and pow. The function body consists of the statements inside the curly braces.

Functions only know of the existence of functions declared earlier in the file. Therefore this code:

#include <stdio.h>

int main(int argc, char *argv[]) {
  my_func();
  return 0;
}

void my_func() {
  printf("OK Then\n");
}

Produces the following compiler warning:

comperr.c:4:3: warning: implicit declaration of function 'my_func' is invalid in C99 [-Wimplicit-function-declaration]

  my_func();

You can fix this by always making sure that functions are declared in the proper order (don’t worry, there is a better way to do things than that).

The following program estimates the number e based on an integer supplied by the user. For precision, I could try to use ULONG_MAX or UINT_MAX from the <limits.h> header instead of a user supplied value, but since the power function multiplies one step at a time, that could make the program run a little slow.

#include <stdio.h>
#include <stdlib.h>

double power(double num, long pow) {
  long i; // same type as pow, so a large exponent can't overflow the counter
  double retval = 1;
  for(i = 0; i < pow; i++) {
    retval = retval * num;
  }
  return retval;
}

double e_num(long num) {
  double retval = (1.0 + (1.0 / num));
  return power(retval, num);
}

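int main(int argc, char *argv[]) {
  // Bail out if the user didn't supply a number
  if(argc < 2) {
    fprintf(stderr, "usage: %s number\n", argv[0]);
    return 1;
  }
  long iter = atol(argv[1]);
  printf("e : %f\ti : %ld \n", e_num(iter), iter);
  return 0;
}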

With an input of one million this program prints:

e : 2.718280	i : 1000000

Since the function e_num uses the function power, power needs to be declared before e_num, and because main uses e_num, e_num is declared before main. But this is annoying, so to fix this, we have function prototypes. With a function prototype, the compiler gets to know the name, return type and parameter types of a function, and we can write the body of the function later on.

#include <stdio.h>
#include <stdlib.h>

double e_num(long num);
double power(double, long);

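int main(int argc, char *argv[]) {
  // Bail out if the user didn't supply a number
  if(argc < 2) {
    fprintf(stderr, "usage: %s number\n", argv[0]);
    return 1;
  }
  long iter = atol(argv[1]);
  printf("e : %f\ti : %ld \n", e_num(iter), iter);
  return 0;
}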

double e_num(long num) {
  double retval = (1.0 + (1.0 / num));
  return power(retval, num);
}

double power(double num, long pow) {
  long i; // same type as pow, so a large exponent can't overflow the counter
  double retval = 1;
  for(i = 0; i < pow; i++) {
    retval = retval * num;
  }
  return retval;
}

Now we can structure our code and write functions in the order we please. As you can see, we can choose to name the parameters in the prototype, or we can leave the names out. The only thing that matters here is that we declare the type of each parameter, as well as the return type and name of the function.