Tab-size independent source code formatting

Executive summary

Indentation and alignment should be tab-size independent. Use one Tab for each indentation level. Use spaces for aligning portions of code.

Explanation and examples

Indentation and alignment are two related, but different things. To indent means to set a section of code in from the left margin, to facilitate distinguishing “contained” sections (such as a sequence of lines inside an if statement) from their “containers” (in this case, the if statement itself). For nested constructs, indentation is carried out in several levels:

No indentation
Begin 1
        One level of indentation
        Begin 2
                Two levels of indentation
                etc.
        End 2
End 1

To align, on the other hand, means to make parts of the text start on the same column on the screen, to make it look more tidy and clear.

Here is an example that features both indentation and alignment:

class person
{
        string name;  // person's full name
        int age;      // age in years
};

The inner part of the class definition (the part between the braces) is indented by one level. The comments at the end of these two lines are aligned to put them directly below each other on the screen.

Now why are we making such a fuss about the difference between these two? This is where the tab-size independent part comes in.

Both indentation and alignment are often done with Tab characters ('\t' or ASCII 0x09). In theory, that would be the logical choice. Unfortunately, there is no general agreement on where the tab stops are placed.

Traditionally, tab stops are every 8 characters. Many programmers indent with Tabs because it's only one keypress, but they feel that a tab-size of 8 pushes the code too far to the right, so they change it to 4 characters (or some other value) in their editors. When alignment is also performed with Tabs, the code appears misaligned unless the reader's tab-size is set to exactly the value the author used.

Take the person class definition from above as an example. Assume that we had done both indentation and alignment with Tabs, with a tab-size of 8 (here and in the following, the ruler line above each listing marks the tab stops):

|-------|-------|-------|-------|-------|-------|------- <- tab stops
class person
{
        string name;    // person's full name
        int age;        // age in years
};

Now somebody who prefers a tab-size of 4 looks at the code:

|---|---|---|---|---|---|---|---|---|---|---|---|---|--- <- tab stops
class person
{
    string name;    // person's full name
    int age;    // age in years
};

The indentation is still correct, but the two comments are now misaligned.
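
The effect can be reproduced mechanically. The following is a minimal sketch, not part of GiNaC: the helper names expand_tabs and comment_column are made up for this illustration. It expands Tabs the way an editor would for a given tab-size and reports the screen column at which the comment starts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: expand every Tab in `line` to spaces,
// assuming tab stops every `tab_size` columns.
std::string expand_tabs(const std::string &line, std::size_t tab_size)
{
	assert(tab_size > 0);
	std::string out;
	for (char c : line) {
		if (c == '\t') {
			// advance to the next tab stop (always at least one column)
			std::size_t pad = tab_size - out.size() % tab_size;
			out.append(pad, ' ');
		} else {
			out.push_back(c);
		}
	}
	return out;
}

// Hypothetical helper: 0-based screen column at which the "//" comment
// starts once the Tabs have been expanded.
std::size_t comment_column(const std::string &line, std::size_t tab_size)
{
	return expand_tabs(line, tab_size).find("//");
}
```

Fed with the two member declarations entered as "\tstring name;\t// ..." and "\tint age;\t// ...", comment_column yields 24 for both lines at a tab-size of 8, but 20 and 16 at a tab-size of 4, which is exactly the misalignment shown above.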

The default indentation mode of the Emacs editor is even worse: it mixes Tabs (which it assumes to be of size 8) and spaces for both indentation and alignment, with an effective amount of 2 character widths per indentation level. The resulting code usually looks like a complete mess for any tab-size setting other than 8.

So, how do you make it tab-size independent? One solution would be to not use any Tab characters at all. This, however, would hard-code the amount of space used for indentation, something which so many people disagree about.

Instead, we adopted a different approach in GiNaC: Tabs are used exclusively for indentation (one Tab per level); spaces are used for alignment. This gets you the best of both worlds: every programmer can change the tab-size (and thus the visual amount of indentation) to their own taste, and the code still looks OK at any setting.

This is how our class definition should be entered using this scheme:

|-------|-------|-------|-------|-------|-------|------- <- tab stops
class person
{
        string name;  // person's full name
        int age;      // age in years
};

Is 8 characters of indentation too much for you? No problem. Just change the tab-size, and it still looks good:

|---|---|---|---|---|---|---|---|---|---|---|---|---|--- <- tab stops
class person
{
    string name;  // person's full name
    int age;      // age in years
};
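
One consequence of the scheme is easy to check automatically: since Tabs are used only for indentation, no Tab may appear after the first non-Tab character of a line. A minimal sketch of such a check follows; the function name tabs_only_lead is hypothetical and not part of GiNaC. Note that this is only a necessary condition: it cannot tell whether leading spaces were wrongly used for indentation rather than for alignment.

```cpp
#include <cassert>  // only needed for checks such as those in the note below
#include <cstddef>
#include <string>

// Hypothetical helper: true if every Tab in `line` belongs to the
// leading run of Tabs, i.e. Tabs are used for indentation only.
bool tabs_only_lead(const std::string &line)
{
	std::size_t first_non_tab = line.find_first_not_of('\t');
	if (first_non_tab == std::string::npos)
		return true;  // empty line, or a line made of Tabs only
	// any Tab at or after the first non-Tab character breaks the rule
	return line.find('\t', first_non_tab) == std::string::npos;
}
```

For example, assert(tabs_only_lead("\tint age;      // age in years")) holds, while the Tab-aligned variant "\tint age;\t// age in years" is rejected.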

Some more examples (shown with a tab-size of 4):

// here, we have aligned the parameter declarations
int foo(int i1, int i2, int i3,
        string s1, string s2,
        vector<int> &result)
{
    // inside the function, one level of indentation
    if (i1 == i2) {
        // inside the "if", two levels of indentation
        return 0;
    }
    // outside the "if", one level again

    // indentation is also used here:
    static int fibonacci[] = {
        1, 2, 3, 5, 8, 13,
        21, 34, 55, 89, 144
    };

    // and here:
    int x = bar(
        i1 - i2,
        i2 - i3,
        i3 - i1
    );

    // continuation lines, however, are aligned, not indented:
    cout << "i1 = " << i1 << ", i2 = " << i2 << ", i3 = " << i3
         << ", string1 = " << s1
         << ", string2 = " << s2 << endl;

    if (s1 == s2)
        return i1;       // these two comments
    else
        return i2 + i3;  // are also aligned
}
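
The rule for continuation lines can be stated as a small recipe: repeat the Tabs of the first line, then pad with spaces up to the character to align with. The sketch below illustrates this under the assumption that the first line contains Tabs only in its leading indentation; the function name continuation_prefix is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build the whitespace prefix for a continuation
// line that should start directly below the character at index
// `align_at` of `first_line`.
// Assumption: `first_line` contains Tabs only in its leading indentation.
std::string continuation_prefix(const std::string &first_line, std::size_t align_at)
{
	// number of leading Tabs = indentation level of the first line
	std::size_t tabs = first_line.find_first_not_of('\t');
	if (tabs == std::string::npos)
		tabs = first_line.size();
	assert(align_at >= tabs);  // the target must lie beyond the indentation
	// same indentation (Tabs), then one space per character up to the target
	return std::string(tabs, '\t') + std::string(align_at - tabs, ' ');
}
```

For the cout statement above, the first line is entered as one Tab followed by "cout << ...", so the first "<<" sits at index 6 and the prefix is one Tab plus five spaces. Because both lines start with the same number of Tabs, the "<<" operators stay below each other at any tab-size.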