Thursday, October 11, 2007

Defensive Programming - What is that ?

Another rant on how to think in the programming language that you are using. In most languages we use, we strive to handle exceptions as close as possible to the site of occurrence. Be it a runtime exception or a checked one, we take great care to ensure that the system does not crash. This is a form of defensive programming. Erlang does not espouse this idea - the philosophy of Erlang is to "let it crash". Of course it is not as trivial as that. You have a recovery plan, but the recovery semantics are totally decoupled (yes, I mean it - physically) from the crash itself.

I found this posting on the Erlang mailing list, where Joe Armstrong, the inventor of Erlang, explains the philosophy. Some highlights of his premises:
In C etc. you have to write *something* if you detect an error - in Erlang it's easy - don't even bother to write code that checks for errors - "just let it crash".
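
To make the contrast concrete, here is a minimal sketch in Python rather than Erlang (my choice for illustration, it is not from Joe's post). The function names and the default port are hypothetical. The first function is the defensive style, the second codes only the happy path and lets bad input blow up.

```python
def read_port_defensive(text, default=8080):
    """Defensive style: anticipate each failure at the call site and patch it."""
    if text is None:
        return default
    try:
        port = int(text)
    except ValueError:
        return default
    if not 0 < port < 65536:
        return default
    return port


def read_port_let_it_crash(text):
    """Let-it-crash style: write only the happy path.

    Bad input raises ValueError (or TypeError for None) and is not caught
    here; recovery is left to whoever watches this code from the outside.
    """
    return int(text)
```

The second version is only safe if something outside the function notices the failure and recovers, which is exactly the part Erlang builds into its process structure.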

Of course he explains how to handle a crash in Erlang. It is very much related to the basic idiom of concurrency and fault tolerance that forms the backbone of Erlang's process structure. In Erlang, you can link processes, so that the linked process can keep an eye on the health of the other process. Once linked, the processes will implicitly monitor each other, and if one of them crashes, the other process will be signalled. To handle the crash, Erlang suggests using linked processes to correct the error. These linked processes need not run on the same processor as the original process, and this is where the Erlang philosophy of make-everything-distributable comes in. Joe mentions in this thread:
Why was error handling designed like this?

Easy - to make fault-tolerant systems you need TWO processors. You can never ever make a fault tolerant system using just one processor - because if that processor crashes you are scomblonked.

One physical processor does the job - another separated physical processor watches the first processor and fixes errors if the first processor crashes - this is the simplest possible way of making a fault-tolerant system.

This is also an example of separation of concerns: crash handling lives apart from the code that crashed, in a separate process that may even be running on another machine. You do not write code to check for errors - you let it crash, and the process structure gives you a built-in mechanism of recovery. It can feel a bit unsettling to deploy your production system in a language that follows the "let-it-crash" philosophy, but the amazing track record of Erlang in designing fault tolerant distributed systems speaks volumes about the robustness and reliability of the underlying engine.
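
The worker/watcher split can be roughly sketched as follows, in Python rather than Erlang. This is only an analogy under stated assumptions: an OS subprocess stands in for an Erlang process, a non-zero exit code stands in for the exit signal a linked process would receive, and both run on the same machine. The worker source, the simulated fault and the restart limit are all hypothetical.

```python
import subprocess
import sys

# Hypothetical worker: no error handling at all. It simulates a transient
# fault by crashing with an uncaught exception on its first attempt.
WORKER_SOURCE = """
import sys
attempt = int(sys.argv[1])
if attempt == 0:
    raise RuntimeError("simulated transient fault")
print("job done")
"""


def run_worker(attempt):
    """Start the worker in a separate OS process and wait for it to exit."""
    return subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE, str(attempt)],
        capture_output=True,
        text=True,
    )


def supervise(max_restarts=3):
    """Watcher: detect a crashed worker (non-zero exit code) and restart it.

    Returns the worker's output and the number of restarts that were needed.
    The restart policy (a fixed retry limit) is an assumption for this sketch.
    """
    for attempt in range(max_restarts + 1):
        result = run_worker(attempt)
        if result.returncode == 0:
            return result.stdout.strip(), attempt
    raise RuntimeError("worker kept crashing; giving up")
```

Calling `supervise()` runs the worker, sees the first crash, restarts it and returns the output of the second run. Note that the worker contains no recovery code at all; everything about recovery lives in the watcher.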

4 comments:

Anonymous said...

"amazing track record of Erlang in designing fault tolerant distributed systems"

Can you back up here a little bit and provide me a few links where I can see this amazing track record for myself?

Unknown said...

@Dilip: Erlang has been the backbone of Ericsson's suite of products for a very long time. Erlang/OTP is what powers the Ericsson AXD301 ATM switch, which reportedly has an uptime of 99.9999999% (nine 9s). Search for "Erlang telecom" and you will get lots of results in support of the claim made in the post.

Anonymous said...

It would be more correct to refer to "design by contract" and Eiffel for this philosophy.

Unknown said...

@Anonymous: Very true, Bertrand Meyer has also emphasized the "let-it-crash" philosophy, but strictly during development. Erlang is possibly the only language which espouses this philosophy in production systems. And this is purely because of the way they have designed the process structure. It is extremely simple to make processing distributed in Erlang. Refer to the Programming Erlang book by Joe Armstrong.