I started experimenting with Domain-Specific Language (DSL) implementation two weeks ago. The tool I use is ANTLR v3. We know that Xtext, TCS, and Boo are good DSL toolkits. The reason we don't use them for now is that you may be restricted to their available features. For example, Xtext is more of a Domain-Specific Modeling tool than a DSL tool, though the two overlap. I will have to wait for future Xtext versions, as I can see our requirements going beyond its current features.
I now have some experience that may be useful, drawn from coding and debugging. The prototype I implemented is a dynamic scripting language. It has its own file structure, which defines input, process, and output options and rules.
The input part defines rules that assign domain meaning to input data, based on evaluating conditions.
The process part supports arithmetic and string computation. It helps derive and update data from the input.
The output part supports dumping data into a certain format, such as XML. Nested (closure) output is supported, since XML is hierarchical by nature.
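As a rough illustration only (the concrete syntax below is invented for this post, not the prototype's actual syntax), a script with these three parts might look like:

```
input {
    // assign domain meaning to raw data when a condition holds
    when column(1) matches "[0-9]+" then field "orderId";
}
process {
    // arithmetic and string computation on the mapped fields
    total = price * quantity;
    label = "Order-" + orderId;
}
output {
    // nested (closure) output maps naturally to XML
    xml order { attr id = orderId; element amount = total; }
}
```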
The following technologies are used in the prototype:
- ANTLR parser – to construct the AST as an intermediate format
- ANTLR tree walkers – to traverse the AST for specific tasks, like variable mapping, math computation, and string operations.
- XML pretty printing – I spent no time embedding this feature in any tree walker; instead, a utility class handles pretty printing.
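A pretty-printing utility can be as simple as re-indenting a serialized XML string. Here is a minimal sketch (the class and method names are mine, not the prototype's; it ignores edge cases like comments or angle brackets inside attribute values):

```java
// Minimal XML pretty-printer sketch: re-indents a flat XML string.
// Class/method names are illustrative, not from the prototype.
// Edge cases (comments, CDATA, '<'/'>' in attribute values) are not handled.
public class XmlPrettyPrinter {
    public static String pretty(String xml, String indent) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        // Break the string apart so each tag or text run lands on its own line.
        String[] parts = xml.replace(">", ">\n").replace("<", "\n<").split("\n");
        for (String raw : parts) {
            String part = raw.trim();
            if (part.isEmpty()) continue;
            if (part.startsWith("</")) depth--;          // closing tag: dedent first
            for (int i = 0; i < depth; i++) out.append(indent);
            out.append(part).append('\n');
            if (part.startsWith("<") && !part.startsWith("</")
                    && !part.endsWith("/>") && !part.startsWith("<?")) {
                depth++;                                  // opening tag: indent what follows
            }
        }
        return out.toString();
    }
}
```

In a real project you could also lean on `javax.xml.transform.Transformer` with the `INDENT` output property instead of hand-rolling this.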
ANTLRWorks is a great GUI tool for grammar authoring and debugging. There are Eclipse plug-ins for ANTLR, but I haven't had time to test them. More importantly, let's talk about the grammars.
At antlr.org, we can download sample grammars for many popular languages. However, a parser grammar by itself only tells you whether an input is valid in the language. The main work of implementing a language is the compiler/interpreter part. Generally there are two kinds of work here: (1) the supporting platform/framework to run the language; (2) the tree walker(s) to do the staged jobs, normally compiling or interpreting tasks. Simply put: make the language runnable.
I defined four grammars: one for the parser/lexer, and three for AST tree walking. When your DSL gets complicated, it helps to define separate groups of functions in different grammar files. For example, we define an emitter class that only deals with outputting XML results. You may object that walking the AST multiple times causes a performance problem. I don't have benchmark data so far, and we will see. At least walking an in-memory data structure should be quick, and the time complexity only increases linearly with the number of passes.
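To see why multiple passes stay linear, consider this plain-Java sketch of two separate walks over the same tree, one for variable mapping and one for math computation (the node and method names are invented for illustration; they are not ANTLR's API):

```java
import java.util.*;

// Sketch of multi-pass AST walking (names are illustrative, not ANTLR's).
// Each walker visits every node exactly once, so k passes cost O(k * n):
// still linear in the size of the tree.
class Node {
    final String text;
    final List<Node> children = new ArrayList<>();
    Node(String text, Node... kids) { this.text = text; children.addAll(Arrays.asList(kids)); }
}

public class MultiPassDemo {
    // Pass 1: variable mapping -- collect identifier leaves.
    static void collectVars(Node n, Set<String> vars) {
        if (n.children.isEmpty() && n.text.matches("[a-zA-Z]\\w*")) vars.add(n.text);
        for (Node c : n.children) collectVars(c, vars);
    }

    // Pass 2: math computing -- evaluate + and * over the same tree.
    static int eval(Node n, Map<String, Integer> env) {
        switch (n.text) {
            case "+": return eval(n.children.get(0), env) + eval(n.children.get(1), env);
            case "*": return eval(n.children.get(0), env) * eval(n.children.get(1), env);
            default:
                return env.containsKey(n.text) ? env.get(n.text) : Integer.parseInt(n.text);
        }
    }

    public static void main(String[] args) {
        // AST for: a + 2 * b
        Node ast = new Node("+", new Node("a"), new Node("*", new Node("2"), new Node("b")));
        Set<String> vars = new TreeSet<>();
        collectVars(ast, vars);                              // first walk
        Map<String, Integer> env = Map.of("a", 1, "b", 3);
        System.out.println(vars + " -> " + eval(ast, env));  // second walk
    }
}
```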
The expression calculation lives in a separate grammar file, and the corresponding tree parser is a utility class. The statement executor is defined by another grammar file. This separation helps refine the expression definition and makes it easy and quick to reuse in another project. In a production environment it may make sense to define them in the same tree walker, but separating them keeps things clear in the current phase.
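For a sense of what such a dedicated expression grammar looks like, here is a sketch in ANTLR v3 tree-grammar notation (the grammar, rule, and token names are invented for this post, not taken from the prototype):

```
// ExprWalker.g -- tree grammar for expressions only (illustrative sketch).
tree grammar ExprWalker;
options { tokenVocab = MyDsl; ASTLabelType = CommonTree; }

expr returns [int value]
    : ^('+' a=expr b=expr) { $value = $a.value + $b.value; }
    | ^('*' a=expr b=expr) { $value = $a.value * $b.value; }
    | INT                  { $value = Integer.parseInt($INT.text); }
    ;
```

Because this file knows nothing about statements or output, the generated walker can be dropped into another project that shares the same token vocabulary.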
A beginner may encounter many issues going through the process. One likely issue is generating a runnable lexer/parser/walker: since you embed action code (Java, C#, …) in grammar files, you may have to go back and forth several times to eliminate compilation errors. A good IDE (Eclipse, Visual Studio) helps a lot in locating such issues.
Another issue is understanding how it all works. I'd recommend: (1) debugging through the generated parser/walker code to get familiar with it; (2) reading and running the examples at antlr.org to explore a wider problem space.
I hope the above helps someone in some way. Feel free to comment if you have any questions.
P.S. I found a bug in CommonTreeNodeStream#index() of ANTLR v3.1.3. A bug report was filed at antlr.org a few days ago. The bug affects the following tasks:
- short-circuit evaluation (&&, ||)
- the conditional expression (? :)
- if-else statements (depending on your action implementation)
- function definitions
- maybe others (I have only explored a subset so far).
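These cases all share one trait: the walker must decide at runtime to skip over a subtree (the right operand, the untaken branch, a function body), which in a stream-based tree walker means recording the stream position and seeking past the subtree. That is exactly where a broken index() hurts. A plain-Java sketch of the short-circuit case (illustrative only, not ANTLR's API):

```java
// Sketch of short-circuit (&&, ||) evaluation over a tiny AST.
// Illustrative only -- in an ANTLR tree walker the same effect requires
// seeking past the right subtree, which is what an index() bug breaks.
public class ShortCircuitDemo {
    static int visits = 0;  // counts leaf evaluations, to show the skip

    // op is "&&", "||", or a boolean literal ("true"/"false") for leaves
    record Node(String op, Node left, Node right) {}

    static boolean eval(Node n) {
        if (n.left == null) { visits++; return Boolean.parseBoolean(n.op); }
        boolean l = eval(n.left);
        if (n.op.equals("&&") && !l) return false;  // skip the right subtree
        if (n.op.equals("||") && l)  return true;   // skip the right subtree
        return eval(n.right);
    }

    public static void main(String[] args) {
        Node expr = new Node("&&", new Node("false", null, null),
                                   new Node("true", null, null));
        // Only the left leaf is visited: the right subtree is never evaluated.
        System.out.println(eval(expr) + ", leaves visited: " + visits);
    }
}
```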