Java and XML: Let's not use them together

My dissatisfaction with both Java and XML is fairly well documented in previous posts. To recap two major points:

  • Java isn't flexible enough, both syntactically and with respect to its type system. (Let's leave aside the lack of a reasonable lambda-style syntax for the moment, which is higher on my List of Things That Make Me Cuss When I'm Programming In Java, but is not as relevant for this article.)
  • XML is, by design, horrifically redundant. (This is almost acceptable for what it was intended to be: a non-human readable data format that didn't have the negative connotations associated with s-expressions. It is totally unacceptable now that people are forced to look at it all day long.)

What I'd like to look at in this post is why I think XML has become such a huge part of Java development and why I think that is unfortunate. The short answer to the first part is:


People need Domain Specific Languages (DSLs)



Why do people need DSLs? Because there are whole chunks of applications that don't need a full, general programming language, and for which a full, general programming language is poorly suited. O/R mapping is a good and common enough example. Build tools are another: you want a syntax that encapsulates the common operations so you don't end up generating reams of general code to do basic activities.

As noted in this paper, there is a continuum between libraries and DSLs. So, why have DSLs at all? Why not simply design libraries?

The answer to that question is: syntax. As much as academics might scoff at it, syntax matters, and it matters a lot.
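To make the library end of that continuum concrete, here is a minimal sketch of roughly how far a plain Java library can lean toward a DSL for something like O/R mapping. Everything in it (`TableMapping`, `column`, `selectSql`, the table and column names) is a hypothetical name invented for this illustration, not the API of any real mapping tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical fluent mapping API; the names are illustrative only.
final class TableMapping {
    private final String table;
    // Field name -> column name, kept in insertion order.
    private final Map<String, String> columns = new LinkedHashMap<>();

    private TableMapping(String table) {
        this.table = table;
    }

    static TableMapping table(String name) {
        return new TableMapping(name);
    }

    // Maps a Java field name to a column name; returns this so calls can chain.
    TableMapping column(String field, String column) {
        columns.put(field, column);
        return this;
    }

    // Renders a SELECT that aliases each column to its field name.
    String selectSql() {
        StringBuilder sql = new StringBuilder("SELECT ");
        boolean first = true;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            if (!first) {
                sql.append(", ");
            }
            sql.append(e.getValue()).append(" AS ").append(e.getKey());
            first = false;
        }
        return sql.append(" FROM ").append(table).toString();
    }
}

class MappingSketch {
    public static void main(String[] args) {
        TableMapping person = TableMapping.table("PERSON")
                .column("id", "PERSON_ID")
                .column("name", "FULL_NAME");
        System.out.println(person.selectSql());
    }
}
```

The chained calls are about as declarative as Java's syntax allows here. The quotes, dots and parentheses that remain are exactly the noise a purpose-built syntax would drop, and the field names are still plain strings that the compiler cannot check against any class.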

This is precisely why Ruby is enjoying so much success right now: Ruby's syntax and evaluation rules are so flexible that they allow you to create minimal, expressive DSLs with very little effort. You simply have to get your head around how meta-programming in Ruby works, and you are off to the races. Ruby on Rails is a DSL for building web applications, and a pretty darned good one.

Java is, of course, much more locked down than Ruby. And this isn't necessarily a bad thing. The Java designers were coming from a world replete with horrible C macro-kludges, so it's understandable that they decided to leave out syntactic extensions. If every man is a language designer, you end up with a ton of badly designed languages. But you also end up with a few very well designed ones. And I'm not entirely convinced that the vast majority of useful, small DSLs aren't simply badly designed languages that answer a particular specific need, akin to German's relationship with soldiering.

In any event, that's wandering a bit off point. The facts on the ground today are that Java developers have been in need of a way to design and implement DSLs for a while now (even when they don't call them that) and the accepted way to do it has become XML. Why?

My theory is this: in Java, DSLs usually start out as a library, then progress to a library with a smidgen of configuration. XML became the de facto standard for config files during the .com boom, property files apparently not being cool enough, so config information ended up in XML files. Additionally, XSDs give us a rudimentary tool for verifying a language's syntax (though not its semantics).

All fine and well. I might pick another syntax for structured configuration (say, YAML), but whatever. XML is reasonably suited for simple declarative programming.

But then we Java developers started doing more and more in those config files and, at some point, they began to cross over that invisible line and become semantically crucial parts of our applications. They no longer simply contained a few flags used to slightly modify runtime behavior. They became an XML-based programming language for crucial subsystems.
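A rough sketch of what crossing that line can look like, under stated assumptions: the `<greeter class="..."/>` element, the `Greeter` interface and the `XmlWiringSketch` loader are all hypothetical, made up for this illustration rather than taken from any real framework. Only the JDK's own XML parser and reflection calls are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical plug-in interface that the XML is meant to pick an implementation of.
interface Greeter {
    String greet(String name);
}

class PoliteGreeter implements Greeter {
    public String greet(String name) {
        return "Good day, " + name;
    }
}

class XmlWiringSketch {
    // Reads <greeter class="..."/> and instantiates the named class reflectively.
    static Greeter load(String xml) {
        Object instance;
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            String className = root.getAttribute("class");
            // The class name is just a string; nothing has checked it before this point.
            instance = Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalStateException("Bad wiring: " + e, e);
        }
        // The cast is the first moment the type is checked, and it happens at run time.
        return (Greeter) instance;
    }

    public static void main(String[] args) {
        // getName() is used so the sketch works regardless of the package it is compiled in.
        String good = "<greeter class=\"" + PoliteGreeter.class.getName() + "\"/>";
        System.out.println(load(good).greet("world"));

        try {
            // Well-formed XML naming a real class, but one of the wrong type.
            load("<greeter class=\"java.lang.String\"/>");
        } catch (ClassCastException e) {
            System.out.println("Wrong type, discovered only at run time");
        }
    }
}
```

The XML file here decides which code runs, yet the compiler never sees it: a well-formed document that names the wrong class gets as far as the cast before anything complains.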

This is unfortunate, for many reasons. Among them:

  • We have traded Java, a language that, while certainly not beautiful, is at least plausible, for one that was never designed for human consumption. XML is utterly miserable to use in large quantities. See Ant build files, and weep.
  • We now have to think in two different syntaxes. I maintain that this is a difficult transition for a significant portion of the programming populace.
  • We cannot have any sort of locality with related Java code. Again, the syntax is so utterly foreign that it is like mixing Japanese and English. Even if we could put it in the same file, or add IDE support to navigate from one to the other, it wouldn't work well.
  • And, most interestingly to me, it becomes difficult to communicate whatever type information we have built into our DSL to Java. We have two choices I can see:


So, that outlines why I think we ended up with so much XML in our Java applications, and why I view that as an unfortunate thing. Now the hard part: what can be done about it?

Frankly, I have no idea.

My first reaction is that we need to open up Java with a type-safe macro language to allow for syntactic extensions. But as nonchalant as Ruby has made me about language extensions, it still seems insane in Java. The macro (meta?) language needs the ability to communicate with the Java type system easily, so that it can generate coherent error messages.

I realize, of course, that I may simply be saying something as absurd as "let's make hard problems easy," but I have to believe that there is a better way than the current state of things.

I'm going to spend some quality time with OCaml/camlp4 over the next month and see if I get anything out of it.

Related Links:
  • Wikipedia link on DSLs
  • Camlp4 - OCaml's DSL support framework