ernstsson.net
Boolean Parameter Elimination

One commonly used anti-pattern increasing function complexity reported by Arqua is the use of boolean parameters. Let’s have a look at an example of this:

int calculateSum(int a, int b, int log)
{
    int sum = a + b;
    if (log) {
        printf("result:%i\n", sum);
    }
    return sum;
}

The function calculateSum is used like this:

int main(void)
{
    calculateSum(4, 5, TRUE);
    calculateSum(3, 2, FALSE);
}

Here the first call to calculateSum is logging and the second is not. The parameter log controls if the function should log or not. Of course this is a simplified example, containing only one line doing the calculation. More often the calculation is done over many lines with logging embedded between these lines:

int calculateComplexThings(int a, int b, int log)
{
    //Do initial calculations

    ...

    //Log a value
    if (log) {
        printf("result:%i\n", value);
    }

    //Do more calculations

    ...

    //Log a value
    if (log) {
        printf("result:%i\n", value);
    }

    //Calculate even more

    ...


    return value;
}

When we look at calculateSum we can see that either we want to log or we don’t. A boolean parameter is just creating one function out of two use cases. The extra branch in calculateSum can be removed by creating one function per use case instead:

int calculateSumWithLog(int a, int b)
{
    int sum = a + b;
    printf("result:%i\n", sum);
    return sum;
}

int calculateSum(int a, int b)
{
    int sum = a + b;
    return sum;
}

Now the branch is removed and we have two linear functions instead. When using the function we call the one we want. The call graph looks like this:

Multiple Functions

However, there is a drawback to this solution; we have now duplicated our algorithm sum = a + b. We could solve this by letting calculateSumWithLog call calculateSum:

int calculateSumWithLog(int a, int b)
{
    int sum = calculateSum(a, b);
    printf("result:%i\n", sum);
    return sum;
}

int calculateSum(int a, int b)
{
    int sum = a + b;
    return sum;
}

This gives us the following call graph:

Multiple Functions Single Algorithm

This is good enough for a simple example like this, however, for a more complex calculation such as calculateComplexThings this is not possible. The solution is using function pointers instead:

static void emptyLogger(int res)
{
}

static void resultLogger(int res)
{
    printf("result:%i\n", res);
}

static int calculateInnerSum(int a, int b, void(*logger)(int))
{
    int sum = a + b;
    logger(sum);
    return sum;
}

int calculateSumWithLog(int a, int b)
{
    return calculateInnerSum(a, b, resultLogger);
}

int calculateSum(int a, int b)
{
    return calculateInnerSum(a, b, emptyLogger);
}

In the solution above we still have the two functions calculateSum and calculateSumWithLog. We have also added three more functions:

  • emptyLogger - logging nothing.
  • resultLogger - logging the result of the calculation.
  • calculateInnerSum - containing the actual calculation.

The two external functions calculateSum and calculateSumWithLog now only calls calculateInnerSum, each with a different logger. Let’s have a look at the call graph:

Function Pointers

We are now using many small functions instead of one function with branches.

Log Independent Client Code

Sometimes we do not want to statically decide when to use logging or not. This is probably one of the most common cases for using boolean parameters since we then can choose to just pass on a boolean parameter to new functions:

int calculateSumSquare(int a, int b, int log)
{
    int res = calculateSum(a, b, log);
    return calculateSquare(res, log);
}

How can this be done using the function pointer solution above? We simply repeat the pattern:

typedef struct {
    int(*square)(int);
    int(*sum)(int, int);
} Chain;

static const Chain calculateChain = {
    .square = calculateSquare,
    .sum = calculateSum
};

static const Chain calculateLogChain = {
    .square = calculateSquareWithLog,
    .sum = calculateSumWithLog
};

static int calculateInnerSumSquare(int a, int b, Chain chain)
{
    int res = chain.sum(a,b);
    return chain.square(res);
}

int calculateSumSquareWithLog(int a, int b)
{
    return calculateInnerSumSquare(a, b, calculateLogChain);
}

int calculateSumSquare(int a, int b)
{
    return calculateInnerSumSquare(a, b, calculateChain);
}

In this case we have two functions to call in order to to do our chained calculation, calculateSquare and calculateSum. Both have their logging counterparts. Just as before we need to implement an inner function with the actual algorithm, and use this with the appropriate function pointers as input parameters. Since we have multiple function pointers it is reasonable to bundle these together in a table. One with logging calculateLogChain, and one without calculateChain.

Linux Kernel Examples

Let’s have a look at a few real examples of boolean parameters from the Linux kernel and how these parameters and their related branches can be removed.

Example - Function Extraction

Our first example is from kernel/time/timekeeping.c. Here we have the following function:

static void timekeeping_update(bool clearntp)
{
    if (clearntp) {
        timekeeper.ntp_error = 0;
        ntp_clear();
    }
    update_vsyscall(&timekeeper.xtime, &timekeeper.wall_to_monotonic,
             timekeeper.clock, timekeeper.mult);
}

Here we do not have to use function pointers at all. The caller in this case tells the function to do two things, clear and update. The boolean parameter should thus be replaced by a new function doing an ntp clear:

static void timekeeping_ntp_clear(void)
{
    timekeeper.ntp_error = 0;
    ntp_clear();
}

static void timekeeping_update(void)
{
    update_vsyscall(&timekeeper.xtime, &timekeeper.wall_to_monotonic,
             timekeeper.clock, timekeeper.mult);
}

The old function using a boolean parameter was called like this when we also wanted to clear:

timekeeping_update(true);

This now needs to be replaced calling both the new clear function and the modified update function:

timekeeping_ntp_clear();
timekeeping_update();

The client is still responsible of deciding when clear should be called, just as before, only now this is done with an explicit call.

Example - Function Pointers

The next example from arch/x86/kernel/cpu/common.c is a bit more complicated. Here we have the following function with the branch related to the boolean parameter in a loop:

static void __cpuinit filter_cpuid_features(struct cpuinfo_x86 *c,
        bool warn)
{
    const struct cpuid_dependent_feature *df;

    for (df = cpuid_dependent_features; df->feature; df++) {

        if (!cpu_has(c, df->feature))
            continue;

        if (!((s32)df->level < 0 ?
             (u32)df->level > (u32)c->extended_cpuid_level :
             (s32)df->level > (s32)c->cpuid_level))
            continue;

        clear_cpu_cap(c, df->feature);
        if (!warn)
            continue;

        printk(KERN_WARNING
               "CPU: CPU feature %s disabled, no CPUID level 0x%x\n",
                x86_cap_flags[df->feature], df->level);
    }
}

The boolean parameter controls if we should print a warning in each iteration in the loop or not. Here we can use our function pointer pattern by writing a new inner function:

static void __cpuinit filter_cpuid_features_inner(struct cpuinfo_x86 *c,
        void(*warn)(const structcpuid_dependent_feature *))
{
    const struct cpuid_dependent_feature *df;

    for (df = cpuid_dependent_features; df->feature; df++) {

        if (!cpu_has(c, df->feature))
            continue;

        if (!((s32)df->level < 0 ?
             (u32)df->level > (u32)c->extended_cpuid_level :
             (s32)df->level > (s32)c->cpuid_level))
            continue;

        clear_cpu_cap(c, df->feature);
        warn(df);
    }
}

The inner function now always makes a call to the function pointer warn and is called by two new functions:

static void __cpuinit filter_cpuid_features_warn(struct cpuinfo_x86 *c)
{
    filter_cpuid_features_inner(c, filter_cpuid_features_print_warning);
}

static void __cpuinit filter_cpuid_features(struct cpuinfo_x86 *c)
{
    filter_cpuid_features_inner(c, filter_cpuid_features_print_empty);
}

These new functions use the new warn functions:

static void __cpuinit filter_cpuid_features_print_warning(
    const struct cpuid_dependent_feature *df)
{
    printk(KERN_WARNING
        "CPU: CPU feature %s disabled, no CPUID level 0x%x\n",
            x86_cap_flags[df->feature], df->level);
}

static void __cpuinit filter_cpuid_features_print_empty(
    const struct cpuid_dependent_feature *df)
{
}

Example - If Else

The last example from fs/fuse/file.c uses an else statement in the branch related to the boolean parameter:

static void fuse_file_put(struct fuse_file *ff, bool sync)
{
    if (atomic_dec_and_test(&ff->count)) {
        struct fuse_req *req = ff->reserved_req;

        if (sync) {
            fuse_request_send(ff->fc, req);
            path_put(&req->misc.release.path);
            fuse_put_request(ff->fc, req);
        } else {
            req->end = fuse_release_end;
            fuse_request_send_background(ff->fc, req);
        }
        kfree(ff);
    }
}

The boolean parameter controls if the call should be synchronous or asynchronous. Here we can use the same pattern as in the previous example, only with a real implementation in the alternate function instead of having it empty. The inner function is written like this:

static void fuse_file_put_inner(struct fuse_file *ff,
        void(*put)(struct fuse_conn *fc,
        struct fuse_req *req))
{
    if (atomic_dec_and_test(&ff->count)) {
        put(ff->reserved_req, ff->fc);
        kfree(ff);
    }
}

It now makes a call to the function pointer put instead of branching on the boolean parameter. The put function pointer needs to point to either a synchronous or asynchronous put function. The inner function is called by the new functions where the names explicitly tells us if the function is synchronous or asynchronous:

static void fuse_file_put_async(struct fuse_file *ff)
{
    fuse_file_put_inner(ff, fuse_file_put_inner_async);
}

static void fuse_file_put_sync(struct fuse_file *ff)
{
    fuse_file_put_inner(ff, fuse_file_put_inner_sync);
}

These functions uses the functions doing the actual synchronous or asynchronous put:

static void fuse_file_put_inner_async(struct fuse_conn *fc,
        struct fuse_req *req)
{
    req->end = fuse_release_end;
    fuse_request_send_background(fc, req);
}

static void fuse_file_put_inner_sync(struct fuse_conn *fc,
        struct fuse_req *req)
{
    fuse_request_send(fc, req);
    path_put(&req->misc.release.path);
    fuse_put_request(fc, req);
}

Summary

There is never any real good reason to use boolean parameters in a function. For every use case there is an alternative solution, either by just extracting a function in order to let each function have a single responsibility or by creating a higher order function. Usually these solutions results in more lines of code. However, this is mainly caused by the syntactical overhead of creating a function. The amount of statements does not increase but the number of possible paths in each function is decreasing.


Blog comments powered by Disqus