Schemer's way

This is an English translation of a Japanese article I wrote to explain Scheme's merits to non-Scheme programmers.
Original article first published on 10/5/2001.
Revised 6/10/2002.

Programming languages should be designed not by piling feature on top of feature,
but by removing the weaknesses and restrictions that make additional features appear necessary.
(Revised(5) Report on the Algorithmic Language Scheme [1]).

Language features and libraries

Programmers who have been programming in the mainstream languages may, when they see Scheme, feel like ``how can I program with such a language with little features?''

The Scheme standard is only 50 pages thin. Scheme programmers claim it's the proof of simplicity of the language. Ok. But it has minimal I/O functions and you can't even delete files you've created. String operations are even inferior than C standard libraries, and without regular expression it is almost useless as a scripting language. I browse thorough the standard document and found unnecessarily complicated loop syntax, which can't be easily ``break''. What? I should use recursion? No joke. I'm writing production code, not a CS class assignment. Oh, well, and where's object oriented features? Nowadays even BASIC has an OO flavor. How can you recommend a language lacking of it? If you want Scheme to be used widely, should you extend the language to meet the modern programming style?

These claims, although they have some truth, miss the point. It is true that Scheme doesn't have enough library. No Scheme advocate will write a production code with only the functions in the standard. On the other hand, most Scheme programmers don't want to make Scheme bigger [2]. This is not a contradiction. Scheme programmers think the features of the language are different from the features of the library.

This difference is not a kind of thing that I can define formally, but you could get the idea: To extend a library, you can write functions in the language. To extend a language, you have to change the implementation of the language.

Of course distinction of these two are blurry, and depends on the language you're discussing. A lot of languages require langauge-level extension to add new operators. If you want to add Perl's `x' operator to Java, you should start rewriting syntactic definitions. The ``feature'' to overload new semantics to the existing operaters are built in to the most languages.

This has never been an issue in Lisp-family languages, for they do not distinguish operators and function calls. If you want a new operator, or to extend an existing operator, you can always go ahead to write your own.

How about the more basic features, such as calling a procedure? Probably this feature is intrinsic in most existing programming languages. If you have such a low level language that doesn't support a procedure call, you can emulate it with some string-substituting macros. However, such trick tends not to be transparent. In some assembly languages, a procedure call may be a macro definition of saving current PC to the stack, increment SP and jump to the specified address. This works until a careless programmer changes SP in unexpected ways.

In order to provide a procedure call as a robust feature without caring how that is implemented, such an operation must be handled in the language itself, and no careless programmer can break it. Locally scoped variables and recursions are such features of the language as well.

So, what's the feature set that is necessary and sufficient to abstract computers and allow programmers to express various range of algorithms easily? Here's some answers Scheme is trying to provide. (of course it's not all).

Built-in garbage collection
Lexical scoping
First-class closure
First-class continuation
Optimized tail calls
Syntactic macros

Don't be scared if you haven't heard some of the terms. I'm not trying to perplex you by using those terms. I need to use them so that we can discuss in abstract level without concerning implementation details.

Suppose somebody's discussing the necessity of do-while syntax. Some say for-syntax is enough, and some argue there are cases that do-while is handy. However, the point is the necessity of syntax specialized for loops. And it is known that if you have such things as ``optimized tail calls'', and with ``syntactic marcros'', any programmer who wants a special loop syntax can implement his own, and that will be as if it is built-in to the language. That is, a careless programmer won't break things in any way, and it runs as efficient as the built-in features. This fact makes the discussion of suitable loop syntax irrelevant.

In other words, Scheme provides a basis of language that you can adapt to whatever problem domain you want to solve. If you want a specialized language to describe your problem, you can have it by just writing libraries, without modifying the language itself. Actually, writing mini-language for particular problem domain is pretty common in the Lisp world, and Scheme is no exception[3]. A bare Scheme seems to provide so little, but it has enough features to grow into your ideal language.

This flexibility is double-edged, however. People write their own libraries and the compability between those code is lost. Lisp re-integrates those libraries into CommonLisp. In Scheme, there's an effort to define a recommended set of libraries [4].

A Legend of Object Oriented Programming

The popular languages claims they support object oriented paradigm. Even some debate that their support is ``purer'' OO than others. They claim, by writing in object oriented programming language, you can have benefits such as reusable and understandable code. Why doesn't Scheme have OOP?

I think there are two answers.

Answer 1. Object oriented paradigm is not so intrinsic for a programming language as to become its feature.

Answer 2. There are lots of ways to implement object oriented paradigm on a programming language, and the optimal implementation may depend on the problem. It is better to let programmers select the best fit implementation, rather than to tie programmers to one specific implementation.

Let's start from explaining the answer 2.

Think about the way of OOP implementation on C++ or Java. You get those features.

You can define a class on your code.
An instance is created at run time, from a class as a template.
Each instance allocates a chunk of memory where it keeps its state. When state changes, it rewrites the memory location.
You define a set of methods that can access internals of the instance.
Some states of instance can't be accessed except by the defined methods.
You can inherit class to change the behavior of the class or adding new state.

Which features are intrinsic to the OOP? Some object oriented languages don't have restriction of access, which is very useful when you develop interactively. CLOS is well used to large scale development, still doesn't restrict access to object slots. If you really want to restrict the access, you can write so by extending class definitions using metaobject protocol.

Some languages don't have class and instance. Self[5] has only one level, objects, and you can create another object by using the original one as a template, which is instancing and/or subclassing. In some languages, a class is just an instance of another class.

There's no reason to bound instance to its class permanently. In some languages, you can change class of an instance. It is extremely useful when you have a persistent instance, and you change its classes a lot during development.

Methods need not to belong to a class; see CLOS. It has some benefits to make methods belong to class, for it creates a namespace of methods by classes. But namespace is a different issue from OOP implementation.

In the functional OOP, instances does not have fixed memory to keep state. Such OOP doesn't modify the memory location to change state; rather, it creates a new instance when state changes, with keeping instance identity for some way.

Oleg Kiselyov showed that if you have OOP with inheritance and you allow subclasses to replace the superclass's behavior, it is practically impossible to ``separate interface from implementation'' [6]. He also showed how to avoid it, using a sort of functional approach.

Henry Baker pointed out the ``iterators'' in the OOP languages like C++ and Java shows fundamental weakness of the design, and there are problems that iterators can't solve, which pure functional approach can[7].

I'm not trying to dispute object oriented languages. I just want to emphasize there are lots of ways to implement object oriented paradigm, and each of them has its own strengths and weakness. It is not a problem if you choose one of them knowing its merits. However, no single implementation solves all domains of problems.

It is arguable that, at least it is handy to provide an OOP implementation if it can solve some of the problem domains. Here we come back to the answer 1. Do you really need to extend the language to implement OOP?

Scheme has more than half dozens of object systems written entirely in Scheme [8]. They implement inheritance and method dispatching; some are pure functional, some have a sophisticated metaobject protocol. All of them are very concise.

Note that ``written in Scheme'' doesn't mean it's slow. You can compile it, for example. Or, most Scheme system allows you to extend the system by writing native library, so you can tune the OO system by writing such modules. It is equivalent to write a string processing library in native code to improve performance [9].

The reason that you can implement an object system on Scheme so easily is because of the first-class closure. In Scheme you can create a function inside another function, and such functions can refer to the variables it can see. Those variable bindings are valid even you return from the first function. This is called a first-class closure.

Suppose you're writing a filter program that reads data from a source and writes result to a sink. In OOP terms, you define abstract base classes, one is that you can read from, another is that you can write to, and your code takes instances of those classes. Then, you can take any input and output, no matter if they are associated to files or sockets or on-memory objects, as far as they inherit those base classes.

In Scheme, it is much simpler. You just define your filter program to take two functions; one returns new data for each invocation, and the other you can call with the result. The function can encapsulate what it is operating on. This effectively allows programmers to create unnamed base class at run time, and you don't need to think about inheritance.

Programming in Scheme

I'm not claiming Scheme is perfect. As I wrote above, Scheme is just a seed unless you grow it toward your problem domain by adding libraries.

This explains why there are so many Scheme implementations. The basic design strategy may differ among problem domain. One implementation can't cover all the cases.

Nevertheless, it helps a lot to have a single standard. You can easily understand the Scheme code which is written for another Scheme implementation, and usually it is easy to port it unless the code uses very weird extention of the specific implementation.

If you want to write production code in Scheme, I recommend you to choose one implementation that is most suitable to your problem domain, and forget about porting. You can use all the extensions the implementation provides. Some implementations provides tons of libraries, although each of them covers different domains. It is silly to limit yourself in the greatest common divisor of all the implementations.

If what the implementation provides is not enough, you can extend it by writing more libraries (remember, you can even define your own loop syntax as a library extension). It is even better if you feed back your extension to the implementation developers. If you want your library to be used across many implementations, then, you go through the code to factor out implementation-dependent parts; usually it should be small.

Fortunately, the basic libraries become standardized by SRFI. Especially, SRFI-1 (List library), SRFI-13 (String library) and SRFI-14 (Character-set library) include rich set of functions most Scheme programmers have had written partially before. If you start now, you should use these SRFIs. Lots of active implementations support them.

If you know your library is big and nice to be cross-platform, you can take a look at other implementation to see if they have similar interface. If SLIB's implementation doesn't run efficiently on your implementation, it is good to write SLIB-compatible extention on your implementation; the other code that depends on yours may run on other implementations using SLIB.

In ``Being Popular'', Paul Graham says it will be important for languages to have well designed rich libraries[10]. I don't know if Scheme will grow towards that; but I can say Scheme won't limit such growth by its language design; it is the spirit of Scheme, shown in the epigram at the beginning.

References and footnotes

[1]: Richard Kelsey, William Clinger and Jonathan Rees (Ed.) Revised(5) Report on the Algorithmic Language Scheme, http://www.schemers.org/Documents/Standards/R5RS/
[2]: Frankly speaking, everybody has ideas he wants to add to Scheme. However, Scheme has a tradition to take time to see whether such a feature is essential or not before including it to the specification. According to the Scheme FAQ, there's no plan for the next specification. The ideas will be realized in SRFIs and each implementations.
[3]: The first Scheme is implemented in Lisp.
[4]: Library standarization efforts is done at Scheme Requests For Implementation. http://srfi.schemers.org/.
[5]: David Ungar and Randall B. Smith, Self: The Power of Simplicity, OOPSLA '87 Conference Proceedings, pp. 227-241, October 1987. You can obtain papers and the implementation from http://www.sun.com/research/self/.
[6]: Oleg Kiselyov: Subtyping, Subclassing and Trouble with OOP, http://pobox.com/~oleg/ftp/Computation/Subtyping/.
[7]: Henry Baker, Iterators: Signs of Weakness in Object-Oriented Languages, ACM OOPS Messenger 4(3) pp.18-25, July 1993. http://linux.rice.edu/~rahul/hbaker/Iterator.html.
[8]: You can find some of OOP in Scheme at: http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/scheme/oop/0.html.
[9]: Of course, if you concerning the efficiency, it is better to build the message passing architecture into the evaluator of the language, and it is debatable that whether such extension is library-level or language-level. I see it is just another way of tuning and not a fundamental change of the language.
[10]: Paul Graham, Being Popular, May 2001. http://www.paulgraham.com/popular.html.

Addendum (6/10/2002)

I originally translated my own japanese article to English just to show the idea to my English-speaking colleagues. It was very rough translation but I left it alone and even forgot that I had put it on the Web. Then some curious mind came to find this, and made me revise the translation and decide to open it finally. Thank you, Oleg.

The ambiguity of the term "object oriented" described here is explained more clearly in an e-mail Jonathan Rees sent to Paul Graham.

Japanese version of this article started an interesting discussion about difference between closures and objects. If you can read Japanese, visit this page and this page.

The author is not an English native speaker. Please let me know if you find errors.