Java and XML: Let's not use them together
- Java isn't flexible enough, both syntactically and with respect to its type system. (Let's leave aside the lack of a reasonable lambda-style syntax for the moment, which is higher on my List of Things That Make Me Cuss When I'm Programming In Java, but is not as relevant for this article.)
- XML is, by design, horrifically redundant. (This is almost acceptable for what it was intended to be: a non-human-readable data format that didn't have the negative connotations associated with s-expressions. It is totally unacceptable now that people are forced to look at it all day long.)
What I'd like to look at in this post is why I think XML has become such a huge part of Java development and why I think that is unfortunate. The short answer to the first part is:
People need Domain Specific Languages (DSLs).
Why do people need DSLs? Because there are whole chunks of applications that don't need a full, general-purpose programming language, and for which a full, general-purpose programming language is poorly suited. O/R mapping is a good and common enough example. Build tools are another: you want a syntax that encapsulates the common operations so you don't end up generating reams of general-purpose code to do basic activities.
As noted in this paper, there is a continuum between libraries and DSLs. So, why have DSLs at all? Why not simply design libraries?
The answer to that question is: syntax. As much as academics might scoff at it, syntax matters, and it matters a lot.
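To make the library-to-DSL continuum a bit more concrete, here's a minimal Java sketch of one O/R mapping declared two ways: first as plain library calls, then through a fluent, DSL-flavored interface. The classes (EntityMapping, ColumnMapping) and the entity, table and column names are hypothetical, invented for illustration rather than taken from any real O/R mapper.

```java
import java.util.ArrayList;
import java.util.List;

class MappingExample {
    public static void main(String[] args) {
        // Library style: correct, but the intent is buried in bookkeeping.
        EntityMapping a = new EntityMapping("Policy");
        a.setTable("POLICY");
        a.addColumn(new ColumnMapping("number", "POLICY_NUM"));
        a.addColumn(new ColumnMapping("premium", "PREMIUM_AMT"));

        // DSL style: the same mapping, reading closer to a declaration.
        EntityMapping b = EntityMapping.map("Policy")
                .toTable("POLICY")
                .field("number", "POLICY_NUM")
                .field("premium", "PREMIUM_AMT");

        System.out.println(a.toString().equals(b.toString())); // true
    }
}

// Hypothetical mapping model: one entity field mapped to one table column.
class ColumnMapping {
    final String field;
    final String column;

    ColumnMapping(String field, String column) {
        this.field = field;
        this.column = column;
    }
}

class EntityMapping {
    final String entity;
    String table;
    final List<ColumnMapping> columns = new ArrayList<>();

    EntityMapping(String entity) {
        this.entity = entity;
    }

    // Library style: ordinary setters and adders.
    void setTable(String table) {
        this.table = table;
    }

    void addColumn(ColumnMapping column) {
        columns.add(column);
    }

    // DSL style: a static entry point, and each call returns 'this' for chaining.
    static EntityMapping map(String entity) {
        return new EntityMapping(entity);
    }

    EntityMapping toTable(String table) {
        this.table = table;
        return this;
    }

    EntityMapping field(String field, String column) {
        columns.add(new ColumnMapping(field, column));
        return this;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(entity + " -> " + table);
        for (ColumnMapping c : columns) {
            sb.append("\n  ").append(c.field).append(" -> ").append(c.column);
        }
        return sb.toString();
    }
}
```

The fluent version is roughly as far as Java's syntax will stretch: you can chain method calls, but you can't introduce new keywords or control structures the way Ruby's metaprogramming lets you.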
This is precisely why Ruby is enjoying so much success right now: Ruby's syntax and evaluation rules are flexible enough that you can create minimal, expressive DSLs with very little effort. You simply have to get your head around how metaprogramming in Ruby works, and you are off to the races. Ruby on Rails is a DSL for building web applications, and a pretty darned good one.
Java is, of course, much more locked down than Ruby. And this isn't necessarily a bad thing. The Java designers were coming from a world replete with horrible C macro kludges, so it's understandable that they decided to leave out syntactic extensions. If every man is a language designer, you end up with a ton of badly designed languages. But you also end up with a few very well designed ones. And I'm not entirely convinced that the vast majority of useful, small DSLs aren't simply badly designed languages that answer a particular, specific need, akin to German's relationship with soldiering.
In any event, that's wandering a bit off point. The facts on the ground today are that Java developers have needed a way to design and implement DSLs for a while now (even when they don't call it that), and the accepted way to do it has become XML.
Why?
My theory is this: in Java, DSLs usually start out as a library, then progress to a library with a smidgen of configuration. XML became the de facto standard for config files during the dot-com boom (property files apparently not being cool enough), so config information ended up in XML files. Additionally, XSDs give us rudimentary syntactic (though not semantic) verification tools.
All fine and well. I might pick another syntax for structured configuration (say, YAML), but whatever. XML is reasonably suited for simple declarative programming.
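For the "library with a smidgen of configuration" stage, XML really is fine. Here's a minimal sketch of that benign case, assuming a made-up config format of flag elements with name and value attributes, read with the JDK's built-in DOM parser (javax.xml.parsers). The format, the ConfigFlags class and the flag names are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ConfigFlags {
    // Parses <config><flag name="..." value="..."/>...</config> into a name -> boolean map.
    static Map<String, Boolean> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList flags = doc.getElementsByTagName("flag");
        Map<String, Boolean> result = new HashMap<>();
        for (int i = 0; i < flags.getLength(); i++) {
            Element flag = (Element) flags.item(i);
            // Boolean.parseBoolean treats anything other than "true" (case-insensitive) as false.
            result.put(flag.getAttribute("name"), Boolean.parseBoolean(flag.getAttribute("value")));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<flag name=\"cacheEnabled\" value=\"true\"/>"
                + "<flag name=\"verboseLogging\" value=\"false\"/>"
                + "</config>";
        Map<String, Boolean> flags = parse(xml);
        System.out.println(flags.get("cacheEnabled"));   // true
        System.out.println(flags.get("verboseLogging")); // false
    }
}
```

Nothing in a file like this is crucial to the application's semantics; the flags just nudge runtime behavior.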
But then we Java developers started doing more and more in those config files and, at some point, they began to cross over that invisible line and become semantically crucial parts of our applications. They no longer simply contained a few flags used to slightly modify runtime behavior. They became an XML-based programming language for crucial subsystems.
This is unfortunate, for many reasons. Among them:
- We have traded Java, a language that, while certainly not beautiful, is at least plausible, for one that was never designed for human consumption. XML is utterly miserable to use in large quantities. See Ant build files and weep.
- We now have to think in two different syntaxes. I maintain that this is a difficult transition for a significant portion of the programming populace.
- We cannot have any sort of locality with related Java code. Again, the syntax is so utterly foreign that it is like mixing Japanese and English. Even if we could put it in the same file, or add IDE support to navigate from one to the other, it wouldn't work well.
- And, most interestingly to me, it becomes difficult to communicate to Java whatever type information we have built into our DSL. I can see two choices:
  - We can do Java code generation off of our XML-based DSLs, which everyone hates. Among other things, it takes time, requires a lot of infrastructure work and can introduce some nasty build dependencies. We do a fair amount of this at Guidewire.
  - We can communicate with our DSL via non-type-safe mechanisms (usually hashes and strings). This is the preferred mechanism because it is the easiest. Simply do nothing! But then one wonders why we spend so much time crucifying ourselves on the cross of type safety in Java, when increasing amounts of our application code reside across this great type-unsafe divide.
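Here's a minimal sketch of both choices, assuming a hypothetical XML-based DSL that declares an entity's fields and their Java types. FieldDsl.parse stands in for the second choice: everything comes back as strings in a map, so a misspelled field name compiles fine and only shows up at runtime as a null. FieldDsl.generateSource is a toy version of the first choice: it emits Java source for a typed class, which the build would then have to write out and compile as an extra step. The XML format, the FieldDsl class and the entity and field names are all invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class FieldDsl {
    // Choice two: read the DSL into a string-keyed map (field name -> declared type name).
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> fields = new LinkedHashMap<>(); // preserves declaration order
        NodeList nodes = doc.getElementsByTagName("field");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            fields.put(e.getAttribute("name"), e.getAttribute("type"));
        }
        return fields;
    }

    // Choice one: generate Java source with one typed public field per DSL field.
    // The result is just a String; a real build would write it to disk and compile it.
    static String generateSource(String className, Map<String, String> fields) {
        StringBuilder src = new StringBuilder("public class " + className + " {\n");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            src.append("  public ").append(f.getValue()).append(' ').append(f.getKey()).append(";\n");
        }
        return src.append("}\n").toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entity name=\"Policy\">"
                + "<field name=\"number\" type=\"String\"/>"
                + "<field name=\"premium\" type=\"java.math.BigDecimal\"/>"
                + "</entity>";
        Map<String, String> fields = parse(xml);

        // Type-unsafe: the typo compiles and simply yields null at runtime.
        System.out.println(fields.get("premiun")); // null

        // Code generation: a typo against the generated class would be a compile
        // error, but only after an extra generate-then-compile step in the build.
        System.out.print(generateSource("Policy", fields));
    }
}
```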
So, that outlines why I think we ended up with so much XML in our java applications, and why I view that as an unfortunate thing. Now the hard part: what can be done about it.
Frankly, I have no idea.
My first reaction is that we need to open up Java with a type-safe macro language to allow for syntactic extensions. But as nonchalant as Ruby has made me about language extensions, it still seems insane in Java. The macro (meta?) language would need to be able to communicate easily with the Java type system, so that it could generate coherent error messages.
I realize, of course, that I may simply be saying something as absurd as "let's make hard problems easy," but I have to believe that there is a better way than the current state of things.
I'm going to spend some quality time with OCaml/camlp4 over the next month and see if I get anything out of it.