ernstsson.net
Switch Elimination

A few weeks ago I wrote about Boolean Parameter Elimination, using function pointers to reduce the number of branches in a function. The same method can also be used to eliminate switches. Let’s have a look at an example of a function switching on an enum parameter:

void pointMove(Point *point, Direction direction, int step)
{
    switch (direction) {
        case NORTH:
            point->x += step;
            break;
        case SOUTH:
            point->x -= step;
            break;
        case EAST:
            point->y += step;
            break;
        case WEST:
            point->y -= step;
            break;
        default:
            assert(0);
    }
    printf("Point: x=%i, y=%i\n", point->x, point->y);
}

The function pointMove operates on a Point struct, changing its coordinates based on a cardinal direction enum parameter. After the direction-specific instruction has been executed, a common instruction, in this case printf, is executed for every direction.

Extracting Functions

An initial simple improvement to this would be extracting functions for each case:

void pointMove(Point *point, Direction direction, int step)
{
    switch (direction) {
        case NORTH:
            pointMoveNorth(point, step);
            break;
        case SOUTH:
            pointMoveSouth(point, step);
            break;
        case EAST:
            pointMoveEast(point, step);
            break;
        case WEST:
            pointMoveWest(point, step);
            break;
        default:
            assert(0);
    }
    pointPrint(point);
}

Each block of code in the switch is now extracted into a function. The common code at the end has also been extracted:

static void pointMoveNorth(Point *point, int step)
{
    point->x += step;
}

static void pointMoveSouth(Point *point, int step)
{
    point->x -= step;
}

static void pointMoveEast(Point *point, int step)
{
    point->y += step;
}

static void pointMoveWest(Point *point, int step)
{
    point->y -= step;
}

void pointPrint(Point *point)
{
    printf("Point: x=%i, y=%i\n", point->x, point->y);
}

The example here is of course kept very simple with only one instruction per case in the switch. It is sadly not uncommon to see code where each case in the switch contains tens or even hundreds of lines of code. In this simple example the code complexity after the change is roughly the same with each case still containing one line. For more complex functions this restructuring is the least one should do.
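As a possible next step in the direction the introduction hints at, the switch itself could be replaced by a table of function pointers indexed by the enum. The following is only a minimal sketch and not necessarily the approach the full post takes. The definitions of Point and Direction are not shown in the post, so they are assumed here, and the DIRECTION_COUNT sentinel, the PointMoveFunction type and the pointMovers table are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed definitions: the post does not show Point or Direction. */
typedef struct {
    int x;
    int y;
} Point;

typedef enum {
    NORTH,
    SOUTH,
    EAST,
    WEST,
    DIRECTION_COUNT /* hypothetical sentinel, used to size the table */
} Direction;

/* Hypothetical function pointer type shared by all move functions. */
typedef void (*PointMoveFunction)(Point *point, int step);

/* Same coordinate convention as the extracted functions in the post. */
static void pointMoveNorth(Point *point, int step) { point->x += step; }
static void pointMoveSouth(Point *point, int step) { point->x -= step; }
static void pointMoveEast(Point *point, int step)  { point->y += step; }
static void pointMoveWest(Point *point, int step)  { point->y -= step; }

/* One entry per direction; designated initializers (C99) keep the
   mapping readable even if the enum order changes. */
static const PointMoveFunction pointMovers[DIRECTION_COUNT] = {
    [NORTH] = pointMoveNorth,
    [SOUTH] = pointMoveSouth,
    [EAST]  = pointMoveEast,
    [WEST]  = pointMoveWest
};

void pointPrint(const Point *point)
{
    printf("Point: x=%i, y=%i\n", point->x, point->y);
}

void pointMove(Point *point, Direction direction, int step)
{
    /* The range check takes over the role of the default case. */
    assert((unsigned)direction < DIRECTION_COUNT);
    pointMovers[direction](point, step);
    pointPrint(point);
}
```

With this layout, adding a direction means adding one enum value, one function and one table entry, while pointMove itself stays free of branches.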

Read More



Recursive and Iterative Functions

Using recursive functions is in my opinion a very elegant way of solving a lot of problems, given that the language and environment are made for it. Unfortunately C is not. Especially in embedded environments or in limited operating systems, recursion can be harmful. In this post we will have a look at recursion in C and how to avoid it.

Recursive Functions

Let’s consider the following implementation of a function returning the factorial of a number:

unsigned long long factorial(unsigned long long n)
{
    if (n == 0 || n == 1) {
        return 1; /* 0! and 1! are both 1 */
    }

    return factorial(n - 1) * n;
}

The factorial function calls itself to calculate the factorial of n-1 and then multiplies the result by n. For example, since the factorial of 5 is 1*2*3*4*5, this function asks itself to calculate the factorial of 4 (1*2*3*4) and then multiplies this by 5. When we get down to calculating the factorial of one, we simply return one.

The effect of this function is a stack that grows with the input. Since we need to wait for the call to factorial of n-1 to return before we can multiply its result by n, each recursion keeps its own stack frame, holding a return address and its own copy of n. This uses up a lot of stack. In many embedded systems this is unwanted since stacks are often statically sized and can simply run out of memory. Even if we do have enough memory for our problem, configuring a static stack size for code using recursive functions is hard. This is especially problematic if the number of recursions changes based on some external input. Of course, in non-embedded systems with dynamic stacks it is still desirable to save memory if possible.
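For comparison, here is a minimal sketch of an iterative version, which uses a loop and a single accumulator, so the stack usage does not depend on n. The name factorialIterative is hypothetical and only chosen to distinguish it from the recursive function above.

```c
/* Iterative factorial: one stack frame regardless of n.
   The result wraps around for n > 20, since 21! no longer
   fits in a 64-bit unsigned long long. */
unsigned long long factorialIterative(unsigned long long n)
{
    unsigned long long result = 1;
    unsigned long long i;

    /* Multiply 2 * 3 * ... * n; for n of 0 or 1 the loop
       body never runs and the result stays 1. */
    for (i = 2; i <= n; i++) {
        result *= i;
    }

    return result;
}
```

The loop does the same multiplications as the recursive version, but in ascending order and without any pending calls waiting on the stack.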

Read More



Boolean Parameter Elimination

One commonly used anti-pattern increasing function complexity reported by Arqua is the use of boolean parameters. Let’s have a look at an example of this:

int calculateSum(int a, int b, int log)
{
    int sum = a + b;
    if (log) {
        printf("result:%i\n", sum);
    }
    return sum;
}

The function calculateSum is used like this:

int main(void)
{
    calculateSum(4, 5, TRUE);
    calculateSum(3, 2, FALSE);
}

Here the first call to calculateSum is logging and the second is not. The parameter log controls if the function should log or not. Of course this is a simplified example, containing only one line doing the calculation. More often the calculation is done over many lines with logging embedded between these lines:

int calculateComplexThings(int a, int b, int log)
{
    //Do initial calculations

    ...

    //Log a value
    if (log) {
        printf("result:%i\n", value);
    }

    //Do more calculations

    ...

    //Log a value
    if (log) {
        printf("result:%i\n", value);
    }

    //Calculate even more

    ...


    return value;
}
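The Switch Elimination post describes this technique as using function pointers to reduce the number of branches. A minimal sketch of that idea for calculateSum could look like the following. The LogFunction type and the names logResult, logNothing, calculateSumWithLogger and calculateSumAndLog are hypothetical, and the full post may structure this differently.

```c
#include <stdio.h>

/* Hypothetical callback type: what to do with a value worth logging. */
typedef void (*LogFunction)(int value);

/* Logs the value in the same format as the original function. */
static void logResult(int value)
{
    printf("result:%i\n", value);
}

/* Deliberately does nothing; used when logging is not wanted. */
static void logNothing(int value)
{
    (void)value;
}

/* The calculation no longer branches on a flag; it always calls the
   logger it was given. */
static int calculateSumWithLogger(int a, int b, LogFunction log)
{
    int sum = a + b;
    log(sum);
    return sum;
}

/* Two intention-revealing entry points replace the boolean parameter. */
int calculateSumAndLog(int a, int b)
{
    return calculateSumWithLogger(a, b, logResult);
}

int calculateSum(int a, int b)
{
    return calculateSumWithLogger(a, b, logNothing);
}
```

The two calls from main would then read calculateSumAndLog(4, 5) and calculateSum(3, 2), which says what each call does without the reader having to look up the meaning of a third argument.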

Read More



Android Jelly Bean C/C++ Components Structural Analysis

Last week the source code for Android 4.1 Jelly Bean was released. Since Android contains a lot of interesting open source components I could not help but integrate Arqua into the Android build system and generate Arqua visualisations of all the C/C++ components. All the C/C++ components built into Executables, Shared Libraries and Static Libraries have been analyzed and uploaded to analysis.ernstsson.net as interactive, clickable and zoomable UML-like diagrams. Might this even be the largest collection of open source project UML diagrams out there?

Components

Some components analyzed are more Android related such as libaudioflinger, libsurfaceflinger, libpixelflinger and libstagefright.

libstagefright

Others are commonly used in other environments as well such as libpng, libjpeg, sshd, libxml2 and the libsqlite amalgamation.

libjpeg

Notes

The Android C/C++ source modules can be built into three different kinds of targets:

  • Executables
  • Shared Libraries
  • Static Libraries

All of these have been analyzed separately. Only the source in each build has been included in the analysis so for instance static libraries included in an executable will not be visualized as a part of the executable.

C++ files are analyzed but since the analysis is done on RTL files, viewing function symbols within files might be a bit messy. Also because of this, the quality score for C++ files will be misleading.

analysis.ernstsson.net is not tested with Internet Explorer nor with mobile browsers. It is recommended to use Firefox, Chrome or Safari to view the diagrams.



Dependency Inversion in C Using Function Pointers

The recent Arqua analysis of the Linux kernel has prompted a few questions on how to untangle tangled dependencies. Here are a few ways to invert dependencies in C using function pointers.

Let’s have a look at a simple example of a system with two files:

Client.c

void clientNotify(int notification)
{
    printf("Notification %i\n", notification);
}

static void clientDoAction()
{
    serverDoAction(4);
}

int main()
{
    clientDoAction();

    return 0;
}

Server.c

void serverDoAction(int notifications)
{
    int i;

    for (i = 0; i < notifications; i++) {
        clientNotify(i);
    }
}

The server here notifies the client of an action as many times as specified in the parameter to serverDoAction. When running Arqua on this system we get the following result:

Tangled System

As we can see in the image, the two files Server and Client have a tangled dependency. This is because the notification back to the client is a static, direct call to the client function clientNotify. This type of dependency is unhealthy for several reasons:

Read More



The Linux Kernel Structure Analysis Revisited - Interactive Map

I got lots of interesting feedback on my structure analysis of the Linux kernel last week. This has resulted in an improved version of Arqua as well as a new web page where the new Linux kernel analysis can be found in an interactive version, with pan/zoom support and links down to lower abstraction layers. The page can be found here:

http://analysis.ernstsson.net

The top level of the Linux kernel now looks a bit different and has an improved quality value, 50%:

Linux Kernel

Arqua Improvements

Arqua has been improved based on the suggestions made last week:

  • The quality value Q had some problems and has now been renormalized. The Q value was previously not evenly distributed between 0% and 100%; this has now been fixed.
  • The function parser had problems in some cases, this has been improved.
  • The visualisation is now color coded, darker green means more problems, brighter means fewer.
  • Arqua can now output URLs to lower abstractions.

The Linux Kernel Analysis - Revisited

A new analysis using the new version of Arqua has also been made. The biggest difference for the analysis is the changed normalization of the quality value Q. This resulted in a new value for the Linux kernel, 50%, very different from last week’s value, 3%. The adjustment to the function parser also impacted the numbers somewhat. In the end the list of smells looks almost the same, with the same order and the same magnitude:

  • linux: Complexity Smells:110883
  • linux: Tangle Smells:85241
  • linux/fs: Complexity Smells:66276
  • linux/kernel: Complexity Smells:50253
  • linux/mm: Complexity Smells:36791

The full analysis can now be found on analysis.ernstsson.net together with an initial try to analyze git the same way. I am hoping to upload more projects here and possibly new versions/configurations of Linux as they are released.

Any comments or suggestions on improvements on the model, visualisation or projects to analyze are welcome. Any help at taming the combination of svg and iframes used on analysis.ernstsson.net is also welcome.



The Linux Kernel Challenge

I was meaning to write about switches vs. function pointers this week but got sidetracked by a friend challenging me to run an Arqua analysis on the Linux kernel. After cloning the kernel git and building an “unconfigured” kernel for arm with the appropriate Makefile patches I was struck by a storm of bugs in Arqua. After days of fixing my own unpleasantly unstructured code and adding new visualisation elements to Arqua, I managed to produce the following image of the recently tagged v3.5-rc4:

Linux Kernel Analysis

Moving in even further we get this:

Linux Kernel Detailed Analysis

Now, I am extremely naive when it comes to the Linux kernel, so this is strictly an objective analysis of the structure only. The result also assumes that the directory structure is a reflection of the architecture. This is of course not necessarily true, but it is the way Arqua operates (and in my personal opinion a sound way to structure source code).

Read More



C Code Complexity, Part 2: Measuring

There are a few tools that can be very helpful measuring complexity. My favorite is Structure 101 from Headway Software. I have been using this a lot analysing the complexity and the abstractions of Java projects. I have not used it for C/C++ or .Net but according to their webpage this should also be possible. Since I have had the need to analyse embedded C projects recently I have done a simplified analysis tool called Arqua (Architectural Quality Analyser). It lacks the ease of use and many features you get from Structure 101 but is still useful to visualize abstractions and identify areas of your code with high complexity.

Using Arqua

Arqua is a tool used to visualize software structure and complexity according to the model described in part 1. The implementation is based on the same ideas as the call graph generator Egypt done by Andreas Gustafsson.

Arqua is a perl script. Installation is a copy into any directory in your execution path. The script can be found on github.

Just like Egypt, Arqua parses RTL, an intermediate format used by gcc, and generates a graph in the dot language to be parsed by Graphviz. Let’s look at CalcFile.c from part 1:

#include "StringFile.h"

int doStringCalculation(const char *string)
{
    if (doOtherCalculation(string)) {
        return stringLength(string);
    }
    return 0;
}

int doCalculation(int param, const char *string)
{
    switch (param) {
    case 4:
        return 5;
    case 5:
        return doOtherCalculation(string);
    default:
        return stringLength(string);
    }
}

int doOtherCalculation(const char *string)
{
    if (stringCompare(string, "hello")) {
        return 0;
    }
    return stringLength(string);
}

Use the following command to get gcc to write an RTL file when compiling:

gcc -c -fdump-rtl-expand CalcFile.c

This generates an RTL file, usually with the name CalcFile.c.104r.expand (the name may vary from system to system). Now use Arqua and generate a dot file:

arqua -start 0 -stop 1 CalcFile.c.104r.expand > CalcFile.dot

The start and stop options control what Arqua should generate; I will explain how they work soon. Arqua writes to stdout, so make sure to redirect the output to where you want the result (in the example, CalcFile.dot). The file can now be viewed directly with a dot file viewer, or you can use the Graphviz tools to generate an image, for instance a png image:

dot CalcFile.dot -Tpng -oCalcFile.png

Open the new png and have a look at the result:

Arqua visualization of CalcFile

Read More



C Code Complexity, Part 1: Complexity and Abstraction

Recently I have been working with a few software systems seemingly lacking or at least having a limited level of abstraction. My response to this was creating a simple tool measuring complexity caused by lack of abstraction. This tool was then used to prioritize my efforts to decrease the complexity. In part 1 I will write about my thoughts on how to reduce complexity through abstraction. In part 2 I will describe what tools and methods to use to measure this.

Complexity and Actual Complexity

Let’s start with the basics and define complexity: The complexity of an implementation correlates to the number of decisions and interactions made. So, let’s take a look at a problem described in C, and (subjectively) analyze its complexity:

int doCalculation(int param, const char *string)
{
    const char *s;

    switch (param) {
        case 4:
            return 5;
        case 5:
            return 3;
        default:
            for (s = string; *s; ++s);
            return (s - string);
    }
}

The function doCalculation has a few decisions to make. First it looks at param and directs us based on this value. Secondly, for all values of param other than four and five, we calculate a value based on string. Two decisions are made here, one based on param and one based on string. There are no interactions in our solution to the problem, so its complexity depends only on the number of decisions.

Read More



Code coverage for C using gcc and lcov

Of all the C / embedded development projects I have worked on, I have so far not come across a single one that uses code coverage tools. This is somewhat odd since many of them have been using a toolchain supporting this, namely gcc. Hoping that the reason for this is lack of knowledge (and not lack of motivation to use code coverage and do unit testing), I felt the need to write a short guide on how to present code coverage data from gcc using lcov, building the whole thing using make. The guide was made on OS X, but should work on any Unix-like system (OS X, Linux, Cygwin etc.).

Consider a simple program, main.c:

//main.c

#include <stdio.h>
#include <string.h>

static int paramGTThreeFunction()
{
    return 1;
}

static int paramLTFourFunction()
{
    return 2;
}

static int someFunction(int param)
{
    if (param > 3) {
        return paramGTThreeFunction();
    } else {
        return paramLTFourFunction();
    }
}

int main()
{
    int length = strlen("hey");
    if (someFunction(length) > 1) {
        printf("hello\n");
    }
    return 0;
}

Since it’s a short program (and perhaps slightly challenged on the functional side), it’s fairly easy to see which code is being run or not. Still, let’s use this as the basis for our code coverage example.

Read More