May 26

I started experimenting with Domain-Specific Language (DSL) implementation two weeks ago. The tool I use is ANTLR v3. Xtext, TCS, and Boo are all good DSL toolkits. The reason we don’t use them for now is that they confine you to the functions they provide. For example, Xtext is more a domain-specific modeling tool than a DSL tool, though the two overlap. I will have to wait for future Xtext versions, as our requirements go beyond its current features.

I now have some experience, gained from coding and debugging, that may be useful. The prototype I implemented is a dynamic scripting language. It has its own file structure, which defines input, process, and output options and rules.

The input part defines rules that will assign domain meaning to input data, based on evaluating conditions.

The process part supports arithmetic and string computation. It helps derive further data from the input and update existing data.

The output part supports dumping data into a given format such as XML. Closure (nested) output is supported, since XML is nested by nature.

The following technologies are used in the prototype:

  1. ANTLR parser - to construct an AST as the intermediate format
  2. ANTLR tree walkers - to walk the AST for specific tasks, such as variable mapping, math computation, and string operations.
  3. XML pretty printing - I spent no time embedding this feature in any tree walker. Instead, a utility class handles pretty printing.
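
To illustrate item 3, here is a minimal sketch in Python (not the prototype's actual code, which is built on ANTLR) of an emitter that walks a tree and leaves the pretty printing to a separate utility function. The tuple-based tree and the names `emit` and `pretty_xml` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def pretty_xml(element):
    """Utility: serialize an element and re-indent it (kept out of the walker)."""
    raw = ET.tostring(element, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="  ")


def emit(node):
    """Walk a (tag, children_or_text) tuple tree and build XML elements."""
    tag, content = node
    element = ET.Element(tag)
    if isinstance(content, list):
        # Nested output: each child subtree becomes a nested element.
        for child in content:
            element.append(emit(child))
    else:
        element.text = str(content)
    return element


# Hypothetical output tree, standing in for what a tree walker would produce.
tree = ("order", [("id", 42), ("item", [("name", "widget"), ("qty", 3)])])
xml_text = pretty_xml(emit(tree))
print(xml_text)
```

The walker only builds the structure; indentation is handled in one place, so it can be swapped out without touching any walker.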

ANTLRWorks is a great GUI tool for grammar authoring and debugging. There are Eclipse plug-ins for ANTLR, but I haven’t had time to test them. More importantly, let’s talk about the grammars.

At antlr.org, we can download many sample grammars for popular languages. However, a parser grammar only tells you whether an input is valid in the language. The main work in implementing a language is the compiler/interpreter part. Generally it involves two kinds of work: (1) the supporting platform/framework to run the language; (2) the tree walker(s) that do the staged jobs, normally compiling or interpreting. Simply put, this is what makes the language runnable.

I defined four grammars: one for the parser/lexer and three for AST tree walking. When your DSL gets complicated, it helps to define separate groups of functions in different grammar files. For example, we define an emitter class that only deals with outputting XML results. You may object that walking the AST multiple times causes a performance problem. I don’t have benchmark data so far, so we will see. At least walking an in-memory data structure should be quick, and the total cost grows only linearly with the number of passes.

The expression calculation is in a separate grammar file, and the corresponding tree parser is a utility class. The statement executor is defined by another grammar file. This separation helps refine the expression definition and makes it quick and easy to reuse in another project. In a production environment it may pay to merge them into the same tree walker, but keeping them apart makes things clearer in the current phase.
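
The split between the two walkers can be sketched in plain Python. This is only an illustration under assumptions: the tuple-shaped AST, the `assign` statement, and the function names `eval_expr` and `exec_stmts` are all invented here and are not taken from the prototype's grammars.

```python
import operator

# Arithmetic operators supported by the expression walker.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_expr(node, env):
    """Expression walker: numbers, variable names, or (op, left, right) tuples."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    return OPS[op](eval_expr(left, env), eval_expr(right, env))


def exec_stmts(stmts, env):
    """Statement walker: knows only ("assign", name, expr) and delegates expressions."""
    for kind, name, expr in stmts:
        if kind != "assign":
            raise ValueError(f"unknown statement: {kind}")
        env[name] = eval_expr(expr, env)
    return env


# x = 1 + 2; y = x * 4
program = [("assign", "x", ("+", 1, 2)), ("assign", "y", ("*", "x", 4))]
result = exec_stmts(program, {})
print(result)
```

Because `eval_expr` knows nothing about statements, it can be lifted into another project as is, which is the reuse benefit described above.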

A beginner may encounter many issues along the way. One is generating a runnable lexer/parser/walker. Since you embed action code (Java, C#, …) in grammar files, you may have to go back and forth several times to eliminate compilation errors. A good IDE (Eclipse, Visual Studio) helps a lot in locating such issues.

Another issue is understanding how it all works. I’d recommend: (1) debugging through the parser/walker code to get familiar with it; (2) reading and running the examples at antlr.org to explore a wider problem space.

I hope the above helps someone in some way. Feel free to comment if you have any questions.

P.S. I found a bug in CommonTreeNodeStream#index() of ANTLR v3.1.3. A bug report was filed at antlr.org a few days ago. The bug affects the following tasks:

  1. short-circuit evaluation (&&, ||)
  2. conditional expression ( ? : )
  3. if-else statement (depends on your action implementation)
  4. function definition
  5. maybe others (I have only explored a subset so far).
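
What the first two items have in common is that the interpreter must skip a subtree instead of walking it. The sketch below shows that behavior in plain Python over a made-up tuple AST; it does not use the ANTLR node stream API and does not reproduce the bug. The `trace` list is only there to show which operands were actually visited.

```python
def evaluate(node, env, trace):
    """Evaluate a tuple AST; `trace` records every variable that gets looked up."""
    if isinstance(node, bool):
        return node
    if isinstance(node, str):
        trace.append(node)
        return env[node]
    op = node[0]
    if op == "&&":
        # The right operand is walked only if the left one is true.
        return evaluate(node[1], env, trace) and evaluate(node[2], env, trace)
    if op == "||":
        # The right operand is walked only if the left one is false.
        return evaluate(node[1], env, trace) or evaluate(node[2], env, trace)
    if op == "?:":
        # Only the chosen branch is walked; the other subtree is skipped.
        chosen = node[2] if evaluate(node[1], env, trace) else node[3]
        return evaluate(chosen, env, trace)
    raise ValueError(f"unknown operator: {op}")


trace = []
value = evaluate(("&&", "a", "b"), {"a": False, "b": True}, trace)
print(value, trace)
```

In a walker driven by a node stream, skipping a subtree means repositioning the stream, which is presumably where an incorrect index would hurt.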
Feb 27

Part 1 of the notes is located at http://www.frankdu.com/weblog/archives/46

The related presentation slides are available at openArchitectureWare.org.

23. In Xtext, you start by defining the concrete syntax.

24. For an existing metamodel, use the importMetamodel directive. Use preventMMGeneration to prevent any metamodel generation.

25. Simple Editor Customization in Xtext
- Xtend, the expression language used throughout oAW
- Constraint checks: the oAW Check language, based on Xtend
- Outline view customization: override label(…) and image(…) for meta types
- Content Assist
- Customizing the font style for keywords (keywords only)

26. Xtext instantiates Ecore metamodels, which means that its models can be processed with any EMF tool.
- Within an oAW workflow: the only Xtext-specific aspect is using the generated parser. The Xpand template language is a powerful code generation tool, and the Xtend language makes it easy to traverse the model/metamodel.
- EMF way: EMF’s native resource mechanism (what are the details?)
- Your own code: use the generated parser.

27. NodeUtil with generated parser
- Typically you only work with the AST (Ecore file)
- NodeUtil helps obtain information from the parse tree: element location, element text, and the parse tree node at a given offset.

28. Two phases for doing your DSL:
- designing your language
- building language tools
Xtext focuses on the second phase. Apart from these phases, it is also important to provide a framework that runs the tasks defined in your DSL.

29. oAW Xtext has become part of the TMF project. The first release is expected in the second half of 2009.

30. An Xtext parser limitation: it’s impossible to add custom action code in the parser. This sometimes results in ugly metamodels, especially when building expression languages.

Feb 26

The related presentation slides are available at openArchitectureWare.org.

Part 2 of the notes is located at http://www.frankdu.com/weblog/archives/52

Below are my reading notes on textual DSLs and text modeling in Eclipse. I haven’t finished the slides yet, so this is only part one.

1. EMF serves as the foundation. It provides the Ecore metamodel and framework tools for:
- editing
- transactions
- validation
- query
- distribution/persistence

2. GMF is used for building custom graphical editors based on EMF metamodels. It is industry-proven technology, based on GEF.

3. TMF is used for building custom textual editors. It is in the incubation phase. There are two implementations: Xtext and TCS.

4. M2M (Model-to-Model) delivers an extensible framework for M2M transformation languages. ATL is an M2M language from INRIA. QVT is an OMG standard, and the project includes an implementation of it.

5. M2T (Model-to-Text) focuses on transforming models into text (code generation, model serialization). For example, you may want to convert in-memory models into XML files for persistence or transport, and then use a parser to convert the XML files back into models. There are two such frameworks:

- JET is a code generation tool used by EMF
- Xpand is a code generation tool that is part of the M2T releases.

6. Xtext originally comes from openArchitectureWare (oAW). It integrates well with Eclipse. oAW uses EMF as its basis, builds its graphical editors on GMF, and bases all its tooling on Eclipse. Since Xtext has become part of Eclipse TMF, there are two versions of Xtext: oAW Xtext and TMF Xtext. The former is relatively mature. The latter is under active development, with a first release expected sometime this year (2009).

7. A DSL is a focused, processable language for addressing specific concerns in a specific domain. It aims to be a simple tool for a relatively complex domain. Therefore, in most cases, DSLs are readable by domain experts without any training. Popular DSL examples are SQL and Excel.

8. DSLs can be classified in many ways:
- configuration vs. customization
- internal vs. external
- graphical vs. textual

9. Xtext is a framework for building external, textual customization DSLs.

10. Is it possible to edit the same model through both textual and graphical editing interfaces?
It might be possible. Consider one of the following:
a. Visualize a subset of the model using Graphviz or prefuse. This is typically read-only.
b. Use different perspectives, some of which use a graphical editor. This requires cross references between the textual and graphical models.
c. Edit the same model both textually and graphically, with the textual format serving as the serialization format of the graphical model. This requires both models to be writable and kept in sync!

11. Textual DSLs typically leverage one of many parser generators (ANTLR, JavaCC, Lex/Yacc), which generate a parser from a grammar definition. The parser then tries to match the text and build a parse tree.

12. Typically, textual DSLs are transformed into an Abstract Syntax Tree (AST). It is often a binary tree. For example, the AST for 1 + 2*3:

Literal[1] AddExpression ( Literal[2] MultiplyExpression Literal[3])

AddExpression and MultiplyExpression are binary nodes; the Literal nodes are their leaves.
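
As a small sketch of what this AST looks like as data (in Python, with class names borrowed from the note above and an `evaluate` helper that is my own addition), the tree for 1 + 2*3 can be built and walked like this:

```python
from dataclasses import dataclass


@dataclass
class Literal:
    value: int


@dataclass
class AddExpression:
    left: object
    right: object


@dataclass
class MultiplyExpression:
    left: object
    right: object


def evaluate(node):
    """Walk the tree bottom-up; operator precedence is already encoded in its shape."""
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, AddExpression):
        return evaluate(node.left) + evaluate(node.right)
    if isinstance(node, MultiplyExpression):
        return evaluate(node.left) * evaluate(node.right)
    raise TypeError(f"unexpected node: {node!r}")


# 1 + 2*3: the multiplication sits deeper in the tree, so it is evaluated first.
tree = AddExpression(Literal(1), MultiplyExpression(Literal(2), Literal(3)))
total = evaluate(tree)
print(total)  # 7
```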

13. The AST can be treated as a model. But textual DSLs are written without regard for the AST, and their structure can even run counter to it.

14. Challenges in Xtext DSL implementation:
- Writing a parser is non-trivial.
- A parser generator makes life easier, but it is still not a complete solution.
- A parser generator only creates a matcher and/or a simplistic AST. You still need to transform the model into an easily processable form, and create an editor with syntax highlighting, code completion, etc.

Xtext is designed to ease burdens like these.

15. Xtext is based on an EBNF grammar (what’s that? Why does it make a difference?). From it, Xtext generates:
- ANTLR-based parser
- EMF-based metamodel
- Eclipse editor with, or extensible for, syntax highlighting, code completion, code folding, constraint checking, and so on.

16. Different Kinds of Xtext Rules:
- Type Rule
- String Rule
- Enum Rule
- Native Rule

17. Built-in Lexer Types in Xtext:
- ID
- STRING
- INT
- Comments (single-line and multi-line)
- Whitespace
The content of those rules is not transformed into the metamodel. (Why does that matter?)

18. Built-in Reference Types in Xtext:
- Reference
- File Reference/Import

19. Abstract Type Rules are implicitly declared as a collection of OR-ed alternatives: R1 | R2 | R3. They are mapped to an abstract metaclass. The alternatives become its subclasses, and their common properties are lifted into the abstract superclass.
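
To picture the mapping in item 19, here is a rough Python analogy (Xtext itself generates an Ecore metamodel, not Python classes). The rule `Element: Entity | DataType;` and the property `name` are hypothetical examples, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Abstract superclass derived from the rule `Element: Entity | DataType;`."""
    name: str  # property shared by all alternatives, lifted into the superclass

    def __post_init__(self):
        # Emulate an abstract metaclass: only the alternatives can be instantiated.
        if type(self) is Element:
            raise TypeError("Element is abstract")


@dataclass
class Entity(Element):
    features: list = field(default_factory=list)  # specific to this alternative


@dataclass
class DataType(Element):
    pass


person = Entity("Person", ["firstName", "lastName"])
string_type = DataType("String")
print(person.name, string_type.name)
```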

20. String Rules are declared in the format: String [rule_name]: [rule_definition];

21. An Enum Rule is mapped to an Enum in the metamodel. Its format: Enum [rule_name]: [token_name="string"]+;

22. Native Rule. Example:
Native SL_COMMENT:
"'#' ~('\n'|'\r')* '\r'? '\n'";
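
The lexer expression in this Native Rule reads: a '#', then any number of characters that are neither '\n' nor '\r', then an optional '\r' and a mandatory '\n'. A roughly equivalent regular expression in Python (my own translation, for illustration only) behaves like this:

```python
import re

# '#' ~('\n'|'\r')* '\r'? '\n'  translated to a regular expression
SL_COMMENT = re.compile(r"#[^\n\r]*\r?\n")

source = "a = 1 # first\n# whole line\r\nb = 2\n"

comments = SL_COMMENT.findall(source)
stripped = SL_COMMENT.sub("\n", source)  # drop the comments, keep the line breaks

print(comments)
print(repr(stripped))
```

Note that, like the original rule, this pattern requires a trailing newline, so a comment on the very last line of a file without one would not match.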

Terms:
1. EMF - Eclipse Modeling Framework
2. GMF - Graphical Modeling Framework
3. GEF - Graphical Editing Framework
4. TMF - Textual Modeling Framework
5. DSL - Domain Specific Language
6. JET - Java Emitter Templates
7. AST - Abstract Syntax Tree

Continue to read: Part 2 of the notes is located at http://www.frankdu.com/weblog/archives/52

Jan 28

These three videos are very helpful:

http://www.openarchitectureware.org/screencasts/externaldsl_part1.htm

http://www.openarchitectureware.org/screencasts/externaldsl_part2.htm

http://www.openarchitectureware.org/screencasts/externaldsl_part3.htm

Jan 16

I have been looking into DSLs (domain-specific languages) recently. Below are a few links, bookmarked for my own convenience:

  1. Language Oriented Programming: The Next Programming Paradigm
    A good introductory paper on DSLs. The language is plain, vivid, and easy to understand. Really a good starting point.
  2. An introductory example of domain specific languages - by Martin Fowler
    It’s a very good starting point. Martin explains by example, which makes everything easy to understand. There are examples in XML, Ruby-like, and VBScript-like forms.
  3. The Pragmatic Code Generator Programmer by Sven Efftinge et al.
    Another article on using openArchitectureWare Xtext.
  4. The Help Documentation coming along with the Xtext Framework
    Definitely good documentation to start with Xtext.
  5. Sven Efftinge’s Blog
    Sven plays an important role in developing Xtext, which is a DSL framework.
  6. More blogs ……
    More blogs at ohloh……
  7. The Xtext Framework: TMF Xtext and oAW Xtext
    The former is a subproject of Eclipse TMF (Textual Modeling Framework). The latter is under the umbrella of openArchitectureWare. The two are clearly connected: they are developed by the same group of people.
  8. MPS Project - Developed by a famous Russian company located in Saint Petersburg, which also published the paper in item #1. (Many thanks to Andrey for the correction :-)