Using ccache on distcc server

In a large C-family project (C, C++, Objective-C, etc.), building is time-consuming. There are two simple ways to speed it up:

  1. ccache: caches compiled results (object files) locally and reuses them when the source code has not changed, avoiding unnecessary compilation. This is a huge improvement for incremental builds.
  2. distcc: distributes compilation tasks over the network to a pool of machines (like a cloud). Build performance scales roughly with the number of machines you have (although it is still bound by network I/O).

Better still, you can easily combine ccache with distcc by setting CCACHE_PREFIX, as this article described. ccache then caches build results locally: for each compilation task, it first checks whether a locally cached result can be used, and dispatches the task to the distcc machine pool only if it has to recompile. Pretty neat, huh?
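As a rough sketch of that client-side setup (the host names build1 and build2 are placeholders, not taken from the article), the environment could look like this:

```shell
# Hypothetical distcc pool; replace build1 and build2 with your own hosts.
export DISTCC_HOSTS="localhost build1 build2"
# Ask ccache to run the real compiler through distcc on a cache miss.
export CCACHE_PREFIX=distcc
```

With these set, an ordinary build that invokes the compiler through ccache consults the local cache first and reaches distcc only on a miss.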

There is a further improvement we can make: let all the remote distcc machines use ccache too, so that they benefit from their own caches! The flow will be:

(local)                  (remote)
ccache > distcc ==> distcc > ccache > hit?  > return
                                    > miss? > gcc -> cache & return

The concept is straightforward, but it took me a while to find a working solution. Before describing it, let me first explain how ccache works. After you install ccache, its directory is put first in the search path: PATH=/usr/local/ccache:$PATH. This is a path masquerade trick: when you run g++ main.cc -c, you actually run

/usr/local/ccache/g++ main.cc -c

which is a symlink to /usr/bin/ccache. ccache then looks up the real compiler in $PATH, skipping /usr/local/ccache to avoid recursion, and finds something like /usr/bin/g++. So eventually your command is translated to:

/usr/bin/g++ main.cc -c

Before executing that command, ccache compares the preprocessed source with its local cache and uses the cached result if one is available. If not, it executes the above command and saves the result to the local cache.
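To make the lookup step concrete, here is a simplified model of it as a shell function. This is only an illustration of the idea described above, not ccache's actual code; the function name find_real_compiler is made up for this sketch.

```shell
# find_real_compiler NAME SKIP_DIR
# Print the first executable called NAME found in $PATH, ignoring SKIP_DIR
# (the masquerade directory). Returns 1 if nothing is found.
# Hypothetical helper: a simplified model, not what ccache really runs.
find_real_compiler() {
    name=$1
    skip_dir=$2
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        IFS=$old_ifs
        # Skip the masquerade directory to avoid finding ourselves again.
        [ "$dir" = "$skip_dir" ] && continue
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, find_real_compiler g++ /usr/local/ccache would print a path such as /usr/bin/g++ on a machine set up as described in this post.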

Back to our problem. When you use CCACHE_PREFIX=distcc, as in the example above, ccache takes control first, expands the command, looks up the local cache, and dispatches the expanded command to the remote servers. This is what causes the trouble. On the remote side, the distcc server receives a command whose compiler is an absolute path, so it does not invoke ccache, and nothing is stored in the remote cache.

So the point is: we want the command sent to the distcc server not to be expanded. We don’t want the server to receive /usr/bin/g++ main.cc -c; we want it to receive g++ main.cc -c or /usr/local/ccache/g++ main.cc -c. This means we need to modify the command that ccache has expanded.

Is that achievable? Yes. The key is the environment variable DISTCC_CMDLIST. From the man page:

DISTCC_CMDLIST

If the environment variable DISTCC_CMDLIST is set, load a list of supported commands from the file named by DISTCC_CMDLIST, and refuse to serve any command whose last DISTCC_CMDLIST_MATCHWORDS last words do not match those of a command in that list. See the comments in src/serve.c.

What it actually does is let the distcc server map a compiler's absolute path to another path, something like

/usr/bin/g++  -> /usr/local/ccache/g++
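The mapping can be pictured with a small shell model. This is an assumption-laden sketch, not distcc's code: it assumes only the last path component is compared, and the function name remap_compiler is invented here. The comments in src/serve.c remain the authority on the real rules.

```shell
# remap_compiler REQUESTED LISTFILE
# Print the entry of LISTFILE whose last path component equals that of
# REQUESTED; return 1 if no entry matches (the command would be refused).
# Hypothetical helper: assumes matching on the final path component only.
remap_compiler() {
    requested=$1
    listfile=$2
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue          # ignore blank lines
        if [ "${entry##*/}" = "${requested##*/}" ]; then
            printf '%s\n' "$entry"
            return 0
        fi
    done < "$listfile"
    return 1
}
```

Under that assumption, a list containing /usr/local/ccache/g++ turns a request for /usr/bin/g++ into /usr/local/ccache/g++, while a request for a compiler that is not listed is rejected.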

Bingo! That’s what we want. The instructions are as follows:

1. create a file /home/.distcc/DISTCC_CMDLIST with this line:

/usr/local/ccache/g++

2. in /etc/default/distcc, append the following lines:

export DISTCC_CMDLIST=/home/.distcc/DISTCC_CMDLIST
export CCACHE_DIR=/home/.ccache
export PATH=/usr/local/ccache:$PATH

Line 1 tells the distcc server to use the DISTCC_CMDLIST file for the mapping.
Line 2 is necessary so that the child processes spawned by distcc know where the ccache directory is.
Line 3 tells the distcc server to use the ccache compiler masquerade.

3. change ccache directory permission

sudo chmod 777 /home/.ccache

4. restart distcc server

sudo /etc/init.d/distcc restart
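If the remote cache stays empty after the restart, a quick sanity check of the three settings can help. The function below is a hypothetical helper written for this post, not part of distcc or ccache. It inspects the environment of the shell it runs in, so it only approximates what the distcc daemon itself sees.

```shell
# check_setup CMDLIST CACHE_DIR MASQ_DIR
# Report obvious misconfigurations; returns 0 only if all checks pass.
# Hypothetical helper for illustration.
check_setup() {
    cmdlist=$1
    cache_dir=$2
    masq_dir=$3
    status=0
    if [ ! -r "$cmdlist" ]; then
        echo "cannot read command list: $cmdlist"
        status=1
    fi
    if [ ! -d "$cache_dir" ] || [ ! -w "$cache_dir" ]; then
        echo "cache directory is missing or not writable: $cache_dir"
        status=1
    fi
    # The masquerade directory must come first in PATH.
    case "$PATH" in
        "$masq_dir" | "$masq_dir":*) ;;
        *)
            echo "PATH does not start with $masq_dir"
            status=1
            ;;
    esac
    return "$status"
}
```

With the paths used in this post, the call would be check_setup /home/.distcc/DISTCC_CMDLIST /home/.ccache /usr/local/ccache, run in a shell that has the three export lines from step 2 applied.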

Good to go!

Now we get the benefits of caching, not just on the local machine but also on the remote servers! A cache hit is much faster than compiling: in my tests, even a simple hello-world program sped up from 20 ms to 1 ms.

#include <iostream> 
using namespace std;
int main() {
 cout << "hello world" << endl;
}

I posted a short version of this solution on Stack Overflow, too. I hope this is helpful to you. Let me know if there are any mistakes: wilson100hong@gmail.com. Thanks!

 
