Thursday, June 28, 2012

Programming languages start-up times

I don't remember the last time I wrote a substantial program in anything but Perl. It fills almost all of my needs as a sysadmin and dilettante programmer. But there are situations in which I write bash scripts. Two, to be precise.

One situation in which I'd rather use bash than Perl is when the script is small and functions as a driver to invoke other programs. The syntax bash (or any shell, in fact) uses to invoke other programs is more succinct than Perl's. As you know, succinctness is power.

The other situation is when the script in question is short and is going to be invoked many times. In this case, I worry about its start-up time, because it may very well dominate overall system performance. I always assumed that Perl's start-up time was much longer than any shell's. However, as Knuth wisely said, "premature optimization is the root of all evil." You should always verify your assumptions with a profiler or, at least, a stopwatch before investing in any optimization work.

These days I'm studying the implementation of Git hooks, and I keep struggling to decide whether to write them in Perl or bash. Hooks tend to be invoked frequently when you set up a Git server for lots of developers.

So I decided to check exactly that: what is the real difference between the start-up times of Perl and bash? My test harness is bash itself. I simply timed one thousand invocations of bash and of Perl, telling each to do nothing. This is what I got on my Dell Latitude E6410 laptop running Ubuntu 12.04:

$ (time for i in `seq 1 1000`; do bash -c :; done) 2>&1 | grep real
real 0m2.858s

$ (time for i in `seq 1 1000`; do perl -e 0; done) 2>&1 | grep real
real 0m3.326s

Not that different at all, is it? Perl takes just 16% more time than bash to do nothing. I really was expecting Perl to take much longer than bash. Of course, a script that does nothing isn't that useful, although it can inspire a blog post. To do useful work, a bash script needs to invoke other programs. A Perl script, by contrast, can do many things in a single process just by useing (sic) Perl modules. So I guess that an equivalent Perl script starts up slightly behind a bash script but then catches up and finishes first almost every time.
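To put a rough number on the per-process cost of "invoking other programs", here is a minimal sketch. It assumes bash and an external `true` binary somewhere in `PATH`. The helper name `time_loop` is hypothetical and used only for illustration. The sketch compares 1000 calls of the builtin `:`, which runs inside the shell, with 1000 calls of the external `true`. Each external call pays the fork-plus-exec start-up cost, and that is the cost a Perl script avoids by loading modules instead of spawning programs.

```shell
#!/bin/bash
# Sketch: what one extra process costs a bash script.
# A builtin such as ':' runs inside the current shell; an external program
# needs fork + exec, so it pays a start-up cost on every call.

# time_loop N CMD [ARGS...]  (hypothetical helper)
# Prints the real (wall-clock) seconds taken by N runs of CMD.
time_loop() {
    local n=$1 i
    shift
    local TIMEFORMAT=%R      # make bash's `time` report only real seconds
    # `time` reports on stderr, so redirect the whole group to capture it;
    # the command's own output is discarded.
    { time for ((i = 0; i < n; i++)); do "$@" >/dev/null 2>&1; done; } 2>&1
}

ext_true=$(type -P true)     # path of the external `true` program
echo "1000 x builtin ':'     : $(time_loop 1000 :) s"
echo "1000 x $ext_true : $(time_loop 1000 "$ext_true") s"
```

The builtin loop typically finishes in a few milliseconds. The external loop costs roughly as much as the "do nothing" runs measured above.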

I found this very interesting. So much so that I decided to extend my investigations to other scripting and compiled languages as well. Just out of curiosity. But the results were startling.

The other three main scripting languages fared much worse than Perl. I wasn't expecting such a huge difference:

$ (time for i in `seq 1 1000`; do ruby -e 0; done) 2>&1 | grep real
real 0m5.628s

$ (time for i in `seq 1 1000`; do python -c 0; done) 2>&1 | grep real
real 0m27.373s

$ (time for i in `seq 1 1000`; do echo exit | tclsh; done) 2>&1 | grep real
real 0m10.991s

Ruby is 1.7 times slower than Perl, Tcl is 3.3 times slower, and Python is 8.2 times slower!
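Each of those ratios is simply a language's total divided by Perl's 3.326 s. The small sketch below recomputes them from the measured totals. The `ratio` helper is hypothetical and used only for illustration. Since bash has no floating-point arithmetic, awk does the division.

```shell
#!/bin/bash
# Recompute the "times slower than Perl" figures from the measured totals
# (seconds for 1000 runs each, copied from the runs above).

# ratio SECONDS BASELINE  (hypothetical helper)
# Prints SECONDS / BASELINE rounded to one decimal place.
ratio() {
    awk -v s="$1" -v b="$2" 'BEGIN { printf "%.1f\n", s / b }'
}

perl_secs=3.326
for entry in ruby:5.628 tclsh:10.991 python:27.373; do
    name=${entry%%:*}    # text before the colon
    secs=${entry#*:}     # text after the colon
    echo "$name: $(ratio "$secs" "$perl_secs") times Perl"
done
```

It prints 1.7 for ruby, 3.3 for tclsh and 8.2 for python.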

What about compiled languages? They should be faster, right? Of course they are. Let's C:

$ cat >null.c <<EOF
#include <stdlib.h>
int main()
{
    exit(0);
}
EOF

$ gcc -O -o null null.c

$ (time for i in `seq 1 1000`; do ./null; done) 2>&1 | grep real
real 0m1.185s

This is interesting, because I reckon this C program must have one of the shortest possible start-up times. So we can use it as a yardstick against which to compare every other language.
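Because each total covers 1000 runs, the total in seconds is numerically the cost of a single start in milliseconds. The sketch below uses the C figure as the yardstick. For each runtime it prints three numbers: the cost per start, its multiple of the C cost, and the milliseconds spent beyond spawning a trivial process. The helper name `compare` is hypothetical. The last column is only a rough estimate of each runtime's own initialization work, since every total also includes the overhead of the driving bash loop.

```shell
#!/bin/bash
# Express each measured total relative to the C "do nothing" program.
# A total in seconds for 1000 runs equals the cost in ms for one run.
c_ms=1.185

# compare NAME TOTAL_SECONDS  (hypothetical helper)
# Prints: name, ms per run, multiple of the C cost, ms beyond the C cost.
compare() {
    awk -v n="$1" -v t="$2" -v c="$c_ms" 'BEGIN {
        printf "%-6s %7.3f %5.1f %7.3f\n", n, t, t / c, t - c
    }'
}

printf '%-6s %7s %5s %7s\n' name ms/run xC extra
for entry in bash:2.858 perl:3.326 ruby:5.628 tclsh:10.991 python:27.373; do
    compare "${entry%%:*}" "${entry#*:}"
done
```

For example, it shows Python at about 23 times the C baseline, spending roughly 26 ms per start beyond what the trivial C program needs.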

What about Java? I don't speak Java, so I googled "java helloworld", found a good example and stripped out everything non-essential:

$ cat >Null.java <<EOF
public class Null
{
    public static void main(String args[])
    {
    }
}
EOF

$ javac Null.java

$ (time for i in `seq 1 1000`; do java Null; done) 2>&1 | grep real
real 0m58.231s

What?!? Almost one minute to do nothing one thousand times? I ran it again and again just to be sure. I realize Java isn't a vanilla compiled language, at least not like C: the Java compiler generates bytecode that is interpreted by the JVM. But scripting languages in general have to convert source to bytecode right before interpreting it, so I thought Java would be at least a little faster than most of them. So much for enterprise languages...

So, to sum it all up, here is the final score of the game: