Chris Lamb


As we know, Xing Is Not Gaming. Last night mulletron, Odd_Bloke and I spent a good 8 hours peering at the newly released source for Sun's javac. I had personally been putting off looking at the code, not only because it has an odd signup procedure but also because it could so easily distract me from finishing my own compiler project.

Our goal was to implement 'map' functionality similar to the way map works in Python. Our reasoning was that if we could do this, then we could add other higher-order functions such as filter and fold.

We rejected a number of possible syntaxes that added more reserved words to the language (very bad) or violated Wall's First Law of Programming Language Redesign before settling on for <Iterable> do <method>. I like this syntax because the infix 'do' between the operands is easy to syntax highlight. Inside 'real' code, one might encounter it like so:

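import java.util.*;

public class Xing {
    public static void main(String args[]) {
        List<Integer> myList = new ArrayList<Integer>();
        for myList do print;  // proposed syntax: call print on each element
    }

    public static void print(int i) {
        System.out.println(i);  // body assumed for illustration
    }
}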

We've encountered a few problems though. Most people know that Java's new-style (enhanced) for loop is actually shorthand for writing out the old-style for loop in conjunction with an Iterator object, and the modern syntax is converted to the older one by a process known as de-sugaring. We planned to implement map functionality by de-sugaring our syntax into the new-style for loop (and then let the existing code de-sugar that to the old-style).
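To make the two de-sugaring steps concrete, here is a rough sketch in ordinary Java. The class name DesugarSketch, the out list and the method names are made up for illustration, and the first step is only our intended rewrite of for myList do print, not anything javac does today. The second step follows the general shape the language specification gives for the enhanced for loop over an Iterable.

```java
import java.util.*;

public class DesugarSketch {
    // Hypothetical stand-in for the post's print method; it records
    // each value so the two loop forms can be compared.
    static final List<String> out = new ArrayList<String>();

    static void print(int i) {
        out.add(String.valueOf(i));
    }

    // Step 1 (assumption): what `for myList do print;` would be rewritten to.
    static void enhancedFor(List<Integer> myList) {
        for (Integer i : myList) {
            print(i);
        }
    }

    // Step 2: roughly what the enhanced for loop is shorthand for,
    // an old-style for loop driven by an Iterator.
    static void iteratorFor(List<Integer> myList) {
        for (Iterator<Integer> it = myList.iterator(); it.hasNext(); ) {
            Integer i = it.next();
            print(i);
        }
    }

    public static void main(String[] args) {
        List<Integer> myList = Arrays.asList(1, 2, 3);
        enhancedFor(myList);
        iteratorFor(myList);
        System.out.println(out); // prints [1, 2, 3, 1, 2, 3]
    }
}
```

Both methods visit the elements in the same order, which is why chaining the two rewrites looked like the least invasive way in.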

However, the enhanced for loop is de-sugared after dataflow analysis and semantic checking have occurred, which means that we have to implement these (and various other features) for the map functionality as well, which is decidedly non-trivial in something like javac. Hopefully I can make it to Qing this evening to finish it off. Qing Is Not Gaming either, if you hadn't guessed.

Anyway, it turns out that the javac code is messy. Really really messy. It's a source of great amusement though, not only because of the scary number of no-op casts, misleading indenting and undocumented functions, but also because the lexical token for the '@' symbol is 'MONKEYS_AT'. No, we have no idea either.


Chris Lamb is a freelance software developer and the current Debian Project Leader. You can read other posts by me, see software I have written or read more about me. You can also follow me @lolamby.


Saturday 13th January 2007
