Farey Sequence

While solving this problem, I learned Farey sequence and its definition. There are two interesting properties of this sequence. Les said  Farey sequence F(n), where n is is the biggest denominator in all terms. Then for every successive term a/b, c/d we have:

b*c – a*d = 1 . (1)

And to generate F(n+1) from F(n), we only need to insert mediant of each successive terms, if its denominator is less or equal to n+1. In other words, if F(n+1) has a new term p/q between a/b and c/d,  then:

p = a+c .

q= b+d . (2)

Using above two properties we can easily generate any Farey sequence F(n). However, Wikipedia does not provide the proof of these properties. After couple search I found a blog with very good explanations:

https://sumidiot.wordpress.com/2009/11/03/farey-sequences/

https://sumidiot.wordpress.com/2009/11/05/neighbors-in-farey-sequences/

http://www.ww.ingeniousmathstat.org/sites/default/files/pdf/upload_library/22/Allendoerfer/1982/0025570x.di021121.02p0002y.pdf

TODO: describe the proof

Catalan Number

When solving this combinatorial problem, I learned a new thing called Catalan number. The problem is to find all isomorphic forms of binary trees using N nodes.

With simple divide and conquer, we can have the form:

F(0) = 1,

F(N) = sigma(0, N-1, F(i)*F(N-1-i))

Actually, F(N)  is indeed the famous Catalan number. Catalan number has closed form representation:

C(N) = (2N, N)

However, it is non-trivial to get the closed form. The proof requires the knowledge of using generating function and non-integer binomial expansion to solve sequence sum. I found great resources of the proof.

http://math.stackexchange.com/…/simplifying-catalan-number-…http://www.cs.nthu.edu.tw/…/algo08-tut…/tutorial-catalan.pdf#catalan_number

 

Catalan number has other interesting applications

TODO

[Linux] Use bash environment variables via SSH command

Recently I hit a problem when I setting our Jenkins testing system: we need to run a script via SSH command on a remote machine. However, the script itself depends on many environment variables which is set in .bashrc. Simple SSH command like this won’t work:

ssh user@ip "sh foo.sh"

because SSH command it is not a login terminal, nor an interactive terminal. Thus bash is not guaranteed to be invoke and environment variables won’t be load. Stack Overflow has a clear explanation. This cause our shell script fail because of unset variables.

However, the solutions posted on Stack Overflow does not work for me. After couple few hours exploration, I find a workable (but not very elegant) solution based on changing config of sshd

1. modify /etc/ssh/sshd_config, and insert these lines into the file

add PermitUserEnvironment yes
ForceCommand /etc/ssh/bar

here /etc/ssh/bar is the file we will add

2. add file /etc/ssh/bar, and set environment variables in the file. Or you can source other shell script. Example

export foofoo="FOOFOO"
export barbar="BARBAR"
source /home/user/env.sh

That’s it. The lesson I learn from this is don’t get environment variables and shell as granted, especially you are operating via ssh terminal.

Using ccache on distcc server

In large c-family project (C, C++, Object-C, etc), building is time consuming. There are simple way to improve it:

  1. ccache: cache compiled result (object file) locally, and reuse it when source code is not changed, to avoid unnecessary  compilation. Huge improve for incremental build.
  2. distcc: distribute compilation tasks among a pool of machines via network (like a cloud). This scale up the build performance linearly to the number of machines you have (sure, it is bound by network I/O, still).

Even more, you can easily combine ccache with distcc by using CCACHE_PREFIX, as this article described. This will use ccache to cached pre-built result locally. For each compilation task, it first check if local cache result can be used, then dispatch to distcc machine pool if it has to re-compile it. Pretty nit, nah?

There is a further improvement we can do — let all distcc remote machines also use ccache to benefits from their own cache! The flow will be:

(local)                  (remote)
ccache > distcc ==> distcc > ccache > hit?  > return
                                    > miss? > gcc -> cache & return

The concept is so straightforward, but it took me a while to find the valid solution. Before describe the solution, first let me introduce how ccache works. After you install ccache, it will overwrite PATH=/usr/local/ccache:$PATH. This is a path masquerade trick: when you run  g++ main.cc -c, you actually run

/usr/local/ccache/g++ main.cc -c

which is a symlink to /usr/bin/ccache, which then look up for the real compiler in $PATH but not in usr/local/ccache to avoid recursion,  like /usr/bin/g++. So eventually your command is translated to:

/usr/bin/g++ main.cc -c

Before execute that command, ccache will compare the preprocessed result with local ccache, and use it if available. If not, it will execute the above command and save to local cache.

Back to our problem. When you use CCACHE_PREFIX=distcc, like the above example, ccache will take over the control first, expand the command, look up local cache, and dispatch the “expanded command” to remote servers — this cause the trouble. On remote server side, the distcc server will receive the command which compiler is an absolute path, so it will not invoke ccache, and it will not cache anything on remote cache.

So the point is: we want the command send to distcc server is not expanded — we don’t want server receive /usr/bin/g++ main.cc -c, we want server receive g++ main.cc -c, or /usr/local/ccache/g++ main.cc -c. Meaning that we need to modify the expanded command edited by ccache.

Is that achievable? Yes, the key is to use flag DISTCC_CMDLIST, from man page:

DISTCC_CMDLIST

If the environment variable DISTCC_CMDLIST is set, load a list of supported commands from the file named by DISTCC_CMDLIST, and refuse to serve any command whose last DISTCC_CMDLIST_MATCHWORDS last words do not match those of a command in that list. See the comments in src/serve.c.

What it actually do is to allow the distcc server to map a compiler absolute path to another path, some thing like

/usr/bin/g++  -> /usr/local/ccache/g++

Bingo! That’s what we want. The instructions is as follows:

1. create a file /home/.distcc/DISTCC_CMDLIST with this line:

/usr/local/ccache/g++

2. in /etc/default/distcc, append the following lines:

export DISTCC_CMDLIST=/home/.distcc/DISTCC_CMDLIST
export CCACHE_DIR=/home/.ccache
export PATH=/usr/local/ccache:$PATH

line 1 tells distcc server to use DISTCC_CMDLIST file for the mapping.
line 2 is necessary to make sure the child processing spawning in distcc knows where is the ccache directory.
line 3 tells distcc server to use ccache compiler masquerade

3. change ccache directory permission

sudo chmod 777 /home/.ccache

4. restart distcc server

sudo /etc/init.d/distcc restart

Good to go!

Now we get the benefits of cache, not just on local machine,  but also the remote servers! Cache is much  faster than compiling: from my test result, even a simple helloworld program can speed up from 20 ms to 1ms.

#include <iostream> 
using namespace std;
int main() {
 cout << "hello world" << endl;
}

I put short solution on Stack Overflow, too. Hope this is helpful to you, and let me know if there is any mistake: wilson100hong@gmail.com. Thanks!