the C language continued

similarities and dissimilarities to Javascript

A body of statements in {} is actually a compound statement, itself a kind of statement. (However, the {} that delimit a function body or the members of a struct are not themselves statements.)

C has the ?: conditional operator, the ++ and -- operators, compound assignment operators (+=, *=, etc.), and the bitwise operators (^, |, &, ~, <<, >>).

C has a for loop the same as in Javascript:

// call foo 30 times
for (int i = 0; i < 30; i++) {
    foo();
}

(C has no equivalent of for-in.)

C has break and continue statements, which are written and function the same.

C has traditionally had no boolean type (C99 added _Bool, but these operators still produce int), so C’s comparison operators (<, >, <=, >=) and equality operators (==, !=) return integers (1 for true, 0 for false).

Only numeric values and pointer values have truth value: 0, 0.0, and null pointers are false, while all other numbers and pointers are true.

In C, && and || take numeric or pointer operands and return integers (1 for true, 0 for false).

The && and || operators “short-circuit”, meaning they evaluate only the first operand when evaluating the second operand isn’t needed to get their answer, e.g. (foo() || bar()) returns 1 without evaluating bar() if foo() returns a true value, because the truth value of bar() at that point cannot change the answer.
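As a minimal sketch of short-circuiting, we can make the skipped call observable with a counter. The functions foo and bar, the counter bar_calls, and short_circuit_demo are hypothetical names used only for this illustration:

```c
#include <stdio.h>

static int bar_calls = 0;   /* counts how often bar() actually runs */

static int foo(void) { return 5; }               /* any nonzero value is true */
static int bar(void) { bar_calls++; return 0; }  /* false, and records the call */

/* Evaluates foo() || bar(); bar() is skipped because foo() is already true. */
int short_circuit_demo(void) {
    int result = foo() || bar();
    printf("result = %d, bar called %d time(s)\n", result, bar_calls);
    return result;
}
```

Calling short_circuit_demo() returns 1 (not 5, since || always yields 0 or 1) and leaves bar_calls at 0.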

The value of a switch must always be an integer, and the value of each case must be an integer expression which can be evaluated at compile time, e.g. (35 + 9), rather than something which requires runtime action, e.g. (foo() + x):

case 35 + 9:     // valid
case foo() + x:  // invalid: compiler can’t know what foo() + x will return
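A small sketch of a complete switch whose case labels are all compile-time integer expressions. The function classify and the enum constant BASE are hypothetical names chosen for this example:

```c
enum { BASE = 35 };   /* an enum constant counts as a compile-time integer constant */

/* Returns a small code describing which case label matched. */
int classify(int code) {
    switch (code) {
    case BASE + 9:    /* 44: constant arithmetic is allowed in a case label */
        return 1;
    case 'A':         /* 65: a character literal is an integer constant too */
        return 2;
    default:          /* taken when no case label matches */
        return 0;
    }
}
```

Here classify(44) returns 1, classify('A') returns 2, and any other argument returns 0.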

literals

Number literals come in several different forms:

35             // an int
35L            // a long
2.3            // a double
2.3F           // a float
0x6AC2         // the int value hex 6AC2
0x6AC2L        // the long value hex 6AC2
027            // the int value octal 27
027L           // the long value octal 27
662.25E-4      // the double value 662.25 * 10^-4
662.25E-4F     // the float value 662.25 * 10^-4
'A'            // the char value A (the integer value 65)
'\t'           // the char value tab (the integer value 9)

(So strings are always written in double-quotes while chars are written in single-quotes.)

array initialization

We can initialize (populate) an array with a list of values in {}:

int arr[] = {3, 6, 7}; // OK

This syntax doesn’t work in non-initializing assignments:

arr = {2, 67, -11};    // illegal

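This restriction is not merely stylistic: in C, an array cannot be assigned to as a whole, so after its declaration it must be filled element by element (or with a function such as memcpy).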

We can specify a dimension that is greater than or equal to the number of items in the {}; the listed values fill the first elements, and any remaining elements are set to zero:

int a[15] = {3, 6, 7};  // OK: first three items as listed, the other twelve are 0
int b[3] = {3, 6, 7};   // OK
int c[2] = {3, 6, 7};   // illegal

For multi-dimensional arrays, we nest the inner arrays as another pair of {}:

int arr[][3] = {{3, 6, 7}, {11, 2, 66}};

Only the leftmost dimension can be left unspecified. Again, we can specify dimensions greater than the number of items in the {}:

int arr[4][5] = {{3, 7}, {11, 2, 66}};

This creates 4 arrays of 5 ints each: the first two values of the first array are 3 and 7; the first three values of the second array are 11, 2, and 66; all other ints are initialized to 0.
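To check how a partial initializer behaves, here is a minimal sketch that sums every element of such an array. Under the C standard, elements without an explicit initializer are set to zero, so they contribute nothing to the sum. The function name sum_partial is hypothetical:

```c
/* Returns the sum of all 20 elements of a partially initialized 4x5 array. */
int sum_partial(void) {
    int arr[4][5] = {{3, 7}, {11, 2, 66}};   /* unlisted elements become 0 */
    int total = 0;
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 5; j++) {
            total += arr[i][j];
        }
    }
    return total;   /* 3 + 7 + 11 + 2 + 66 = 89 */
}
```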

We can initialize a char array with a string literal:

char message[] = "hello";

In most other contexts, a string literal (itself an array of char) is converted to a pointer-to-char value, so we can use string literals to initialize a char pointer array:

char *messages[] = {"hello", "goodbye", "moo"};

Recall that, in C, a multi-dimensional array is always one contiguous block, and the sub-arrays of each dimension are all the same size, so when declaring arrays, we always specify every dimension:

int arr[3][6][2];   //  in C, creates a block that is (3 * 6 * 2) ints in size

the [] operator

You’ve seen how [] is used to declare arrays in Java and C, and you’ve seen [] used as an operator to retrieve and set members of arrays in Java. What you haven’t seen yet is the use of [] to retrieve and set members of arrays in C.

In C, we don’t need [] for this purpose because we can use pointer arithmetic and the dereference operator, like so:

int x[3];
*x = 8;
*(x + 1) = 9;
*(x + 2) = 10;

However, this is inconvenient, so more commonly we use [] instead:

int x[3];
x[0] = 8;
x[1] = 9;
x[2] = 10;

In truth, the expression a[b] in C is syntactic sugar for *(a + b). This is evident from the fact that it doesn’t matter whether we write a[b] or b[a]:

int x[3];
0[x] = 8;      // *(0 + x) = 8;

In Java, this would be illegal because Java’s [] operator strictly expects an array expression before the brackets and a number expression inside. Similarly, Javascript’s [] operator expects an object or array before the brackets and a number or string inside.

When we write a[b][c][d] in C, this translates into *(*(*(a + b) + c) + d). It’s just the same rewriting applied left-to-right.
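The equivalence between subscripting and pointer arithmetic can be checked directly. This is a small sketch; the function name subscript_forms_agree and the arrays in it are made up for the example:

```c
/* Returns 1 if the subscript forms and the pointer forms all agree. */
int subscript_forms_agree(void) {
    int x[3] = {8, 9, 10};
    int grid[2][3] = {{1, 2, 3}, {4, 5, 6}};

    return x[1] == *(x + 1)                      /* a[b] is *(a + b) */
        && 1[x] == x[1]                          /* b[a] is *(b + a), the same element */
        && grid[1][2] == *(*(grid + 1) + 2);     /* applied left-to-right for a[b][c] */
}
```

The function returns 1, since each pair of expressions names the same element.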

subscopes

A variable exists in the scope in which it is declared and in all subscopes thereof.

In Javascript, each function is a unique scope, and the other functions contained therein are subscopes of that function. So a variable declared within a function exists in that function and all other functions therein.

In C, functions cannot be nested, but each control structure within a function constitutes a subscope. So if you, say, declare a variable inside a while block, that variable exists only in that while block and its subscopes, not the rest of the function. So this function will give a compile error:

void april() {
    if (something) {
        int december = 3;
    }
    june(december);   // compile error: december doesn’t exist here
}

The solution here is to simply declare the variable in the outer scope where it can be used everywhere it is needed:

void april() {
    int december = 0;   // initialized so june never receives an indeterminate value
    if (something) {
        december = 3;
    }
    june(december);   // OK
}

This rule of subscopes serves two purposes:

1)      Though writing long, complicated functions is generally frowned upon, it’s done often enough that some programmers would like to be able to reuse variable names within a function, especially for commonly used names like i as a loop counter. The rule of subscopes helps in such cases because it effectively divides a function into multiple namespaces.

2)      When execution leaves a scope, the variables in that subscope are no longer needed and can be discarded. Discarding variables as soon as possible may allow the compiler to do certain optimizations, so dividing a function into multiple scopes may increase the number of opportunities for such optimizations.

As previously discussed, a compound statement {} can generally be used any place a statement can. A {} constitutes its own subscope, so variables declared within a pair of {} don’t exist outside that pair of {}:

{
    int december;
}
june(december);   // compile error: december doesn’t exist here

While using {} just to create a subscope is legal, doing so at best only serves a stylistic purpose, so it’s rarely done.

Within a subscope, you can declare a variable with the same name as a variable from an outer scope. When you do this, the outer scope variable of that name is “hidden” in that subscope and so can’t be used there:

{
    // in this scope, january refers to the int variable declared here
    int january = 3;
    …
    {
        // in this scope, january refers to the char variable declared here
        char january;
        …
    }
}

If you need an outer scope variable, simply use different names for your inner scope variables so as not to hide that outer scope variable.
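A minimal sketch of hiding in action: the inner variable is a separate object, so changing it leaves the outer one untouched. The function name shadow_demo is hypothetical:

```c
/* Returns the value of the outer january after an inner january is modified. */
int shadow_demo(void) {
    int january = 3;
    {
        int january = 10;   /* hides the outer january inside these braces */
        january += 1;       /* changes only the inner variable */
    }
    return january;         /* the outer variable is still 3 */
}
```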

goto

In C, we have goto, which is much like a jump instruction in assembly:

goto label;

When a goto is executed, execution jumps to the statement with the specified label. (Labels have the form  identifier:) So for example, instead of writing:

int x = 0;
while (x < 5) {
    foo();
    x++;
}

…we could write:

int x = 0;
ted:
foo();
x++;
if (x < 5) {
    goto ted;
}

(Note that our goto version of the code functions the same but only tests the condition after the first time through the “loop”.)

Using goto here is not sensible—we should use the regular control flow mechanisms wherever we can.  A goto statement should be reserved for the few cases where it can help simplify complicated control flow or can perhaps serve as a neat optimization.

A goto can only jump to labels within the current function. (Allowing jumps into other functions would raise puzzling questions about parameters and the call stack.)
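One place where goto is commonly considered reasonable is funnelling every failure path of a function through a single block of cleanup code. The following is only a sketch of that idiom; the function copy_twice, its parameters, and the label cleanup are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Writes src repeated twice into out. Returns 0 on success, -1 on failure. */
int copy_twice(const char *src, char *out, size_t out_size) {
    int status = -1;          /* assume failure until everything has worked */
    char *first = NULL;
    char *second = NULL;
    size_t len = strlen(src);

    first = malloc(len + 1);
    if (first == NULL) goto cleanup;
    second = malloc(2 * len + 1);
    if (second == NULL) goto cleanup;
    if (2 * len + 1 > out_size) goto cleanup;   /* result would not fit in out */

    strcpy(first, src);
    strcpy(second, first);
    strcat(second, first);
    strcpy(out, second);
    status = 0;

cleanup:
    free(second);   /* free(NULL) is allowed and does nothing */
    free(first);
    return status;
}
```

Without goto, each failure check would need its own copy of the free calls or a deeper level of nested if statements.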

varargs

A “varargs” function is a function which can take a variable number of arguments—hence “varargs”.

In C, a varargs function is declared with an ellipsis as its last parameter; you don’t specify a type or name for the variable arguments:

void foo(int a, ...);

Here, foo takes as arguments first an int and then zero or more arguments of any type. How the variable arguments are passed is up to the implementation; a useful mental model is that, when a varargs function is called, they get placed in a contiguous block. The number of arguments in the block and their types are not knowable by looking at the block itself, so that information must be conveyed by other means, such as by other parameters.

To access the arguments, you declare a variable of type va_list and point it to the start of the block using the macro va_start; then you can invoke the macro va_arg to get each successive argument; before the function returns, you need to invoke the macro va_end to clean up. (The va_list type and the macros va_start, va_arg, and va_end are specified in the standard library in the header stdarg.h.)

In this example, sum returns the sum of any number of ints:

#include <stdarg.h>

int sum(int numArgs, ...) {
    va_list va;      // refers to the block of variable arguments
    va_start(va, numArgs);  // va now points to the block
    int sum = 0;
    for (int i = 0; i < numArgs; i++) {
        sum += va_arg(va, int);
    }
    va_end(va);
    return sum;
}

We can then invoke sum with any number of ints, though we must pass the number of ints as the first argument:

sum(5, 7, 1, 3, 4, 5)   // returns 20

When invoking va_start, we pass the name of the last parameter because the macro uses this to locate the block of varargs.

Also notice that, when invoking va_arg, we must specify the type of the next argument we expect. This is necessary so that the macro knows what kind of pointer cast to perform and how far ahead it must look to find the next argument.
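When the variable arguments are not all of one type, the types themselves have to be conveyed somehow. One possible approach, sketched here, is a string with one letter per argument, loosely in the spirit of printf. The function sum_mixed and its letter codes ('i' for int, 'd' for double) are invented for this example, and the sketch assumes the caller passes only those two letters and matching arguments:

```c
#include <stdarg.h>

/* Sums the variable arguments; types holds one letter per argument:
   'i' means the next argument is an int, 'd' means it is a double. */
double sum_mixed(const char *types, ...) {
    va_list va;
    double total = 0.0;

    va_start(va, types);
    for (const char *t = types; *t != '\0'; t++) {
        if (*t == 'i') {
            total += va_arg(va, int);      /* read the next argument as an int */
        } else if (*t == 'd') {
            total += va_arg(va, double);   /* read the next argument as a double */
        }
    }
    va_end(va);
    return total;
}
```

For example, sum_mixed("idi", 1, 2.5, 3) returns 6.5. If the letters do not match the arguments actually passed, the behavior is undefined, which is the price of the types not being recorded in the block itself.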

Before the C23 standard, a varargs function in C must take at least one regular parameter, so we can’t create a function with just ellipses:

int foo(...);  // illegal before C23

const

Often in code, we need to use constants, values that never change. The value of pi, for example, never changes. Good style dictates that we create variables to represent constants to make code more readable. For example:

double area(double radius) {
    return 3.1415 * radius * radius;
}

…would be better written:

double area(double radius) {
    double PI = 3.1415;
    return PI * radius * radius;
}

By making constants into variables, we avoid scattering hard-to-read magic numbers everywhere in code. (Note that, by convention, variables for constants are given uppercase names.)

To make sure such variables don’t accidentally get modified, C has the const modifier, which disallows assignments to the variable after its declaration, so the value must be supplied by the initializer:

double area(double radius) {
    const double PI = 3.1415;
    PI = 1.3;   // compile error: can’t reassign const
    return PI * radius * radius;
}

Parameters can also be declared const:

double area(const double radius) {
    radius = 2.9;   // compile error: can’t reassign const
    return 3.1415 * radius * radius;
}

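When a const variable is declared at file scope (outside any function) in C, its initializer must be a compile-time constant expression, so this is illegal there. (A const local variable, by contrast, may be initialized from a runtime value.)

const int FOO = bar();   // compile error at file scope: a function call can’t be run at compile-time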
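In standard C, the compile-time requirement applies to variables with static storage duration, such as those declared at file scope; a const local variable may take its value from a runtime expression. A short sketch of both cases, where circle_area and r2 are hypothetical names:

```c
const double PI = 3.1415;   /* file scope: the initializer must be a constant */

/* Returns the area of a circle using a const local computed at run time. */
double circle_area(double radius) {
    const double r2 = radius * radius;   /* allowed: a local const initialized at run time */
    return PI * r2;
}
```

Neither PI nor r2 can be assigned to after its declaration.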

signed vs. unsigned

The basic integer types in C all come in two forms: signed and unsigned. The form is specified by the modifiers signed and unsigned; by default, the types are signed (except plain char, whose signedness is left to the implementation):

signed int x;     // a signed int
int y;            // also signed
unsigned z;       // an unsigned int

The utility of the unsigned types is that they have a larger positive range. So, for example, whereas a signed char has the range -128 to +127, an unsigned char has the range 0 to +255.
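The exact ranges depend on the platform, so rather than memorizing them we can ask the implementation. A minimal sketch using the macros from the standard header limits.h; the function name print_ranges is made up for this example:

```c
#include <limits.h>
#include <stdio.h>

/* Prints the ranges of a few signed and unsigned integer types on this platform. */
void print_ranges(void) {
    printf("signed char:   %d to %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char: 0 to %d\n", UCHAR_MAX);
    printf("int:           %d to %d\n", INT_MIN, INT_MAX);
    printf("unsigned int:  0 to %u\n", UINT_MAX);
}
```

On a platform with 8-bit chars, the first two lines show -128 to 127 and 0 to 255.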

TODO effect upon arithmetic operations

pseudo-array parameters

C functions cannot have array parameters. However, pointer parameters can be declared using empty [] in place of *. So for example, this:

void foo(int a, char **b);

… can instead be written:

void foo(int a, char *b[]);

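There is no difference in meaning: in a parameter declaration, the [] form is adjusted to the pointer form, so b is an ordinary pointer parameter that can even be reassigned inside the function. The [] form merely signals to the reader that the pointer is expected to point to the first element of an array.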

Be clear that, while we can’t have array parameters, we can have array pointer parameters, so you’ll sometimes see [] with numbers in them in parameter declarations, e.g.:

void foo(double (*a)[5][6]);

Here, the parameter is a pointer to an array (5) of arrays (6) of doubles.
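A sketch of how such a parameter might be used: the function dereferences the pointer to reach the whole array and then subscripts it as usual. The function name sum_grid is hypothetical:

```c
/* Sums all 30 doubles in the 5-by-6 array that a points to. */
double sum_grid(double (*a)[5][6]) {
    double total = 0.0;
    for (int i = 0; i < 5; i++) {
        for (int j = 0; j < 6; j++) {
            total += (*a)[i][j];   /* *a is the array itself */
        }
    }
    return total;
}
```

A caller with an array declared as double g[5][6] would pass its address, as in sum_grid(&g).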

struct variable declaration and initialization

When declaring a struct, we can include declarations of variables along with the type itself by listing them after the end brace but before the semi-colon:

struct cat {
    float x;
    char y;
} foo, bar;

This is really just shorthand for:

struct cat {
    float x;
    char y;
};
struct cat foo, bar;

When we define struct variables this way, we don’t have to give the struct itself a name:

struct {
    float x;
    char y;
} foo, bar;

This is only really useful when nesting the struct inside another named struct:

struct dog {
    int a;
    struct {
        float x;
        char y;
    } foo, bar;
} spot;
spot.foo.x = 7.2;

This works because the instances of the interior struct have names even if the interior struct itself doesn’t.

When declaring instances of a struct, you can initialize them with values in {}. For example:

struct rabbit {
    char *name;
    double weight;
} r = {"flopsy", 3.5};
struct rabbit s = {"cotton", 2.9};

When declaring an array of structs, you needn’t put each element’s values in their own pair of {}:

struct rabbit bunnies[2] = {
    "cotton", 2.9,
    "flopsy", 3.5
};

…but if you insist on being verbose, you can keep the {} explicit:

struct rabbit bunnies[2] = {
    {"cotton", 2.9},
    {"flopsy", 3.5}
};

the -> operator

When dealing with pointers to structs, it’s bothersome having to write (*x).y (the parentheses are needed because . binds more tightly than *), so as a convenience we can use the -> operator to write x->y.

struct rabbit bunny;
struct rabbit *p = &bunny;
p->name = "flopsy";         // (*p).name = "flopsy";

Like the . operator, the -> operator is left-to-right associative, so we can neatly use it in a chain, e.g. the expression x->y->z returns the member z of the struct pointed to by y in the struct pointed to by x.
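A small sketch of -> used both alone and in a chain. The struct owner, the variant of struct rabbit with an extra owner member, and the function arrow_demo are all hypothetical, introduced only for this illustration:

```c
#include <string.h>

struct owner {
    const char *name;
};

struct rabbit {
    const char *name;
    double weight;
    struct owner *owner;   /* pointer member, so we can chain -> through it */
};

/* Returns 1 if the -> forms and the (*p). forms reach the same members. */
int arrow_demo(void) {
    struct owner alice = {"alice"};
    struct rabbit bunny = {"flopsy", 3.5, &alice};
    struct rabbit *p = &bunny;

    p->weight = 4.0;                                 /* same as (*p).weight = 4.0; */
    return (*p).weight == 4.0
        && strcmp(p->owner->name, "alice") == 0;     /* same as (*(*p).owner).name */
}
```

The function returns 1: p->owner->name first follows p to bunny, then follows bunny’s owner pointer to alice.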
