Recently I have noticed many new languages being released or emerging, such as Go, Noop, Simple for Android, and the one by the Vim author (sorry, I cannot remember its name).
A few days ago, Google released a new language named 'Go'. It is available at http://golang.org.
This is interesting, because Google has recently released a couple of language tools. I am wondering: will Go be natively supported by the upcoming Google Chrome OS?
I looked around the language grammar and noticed something interesting:
1. Is Go a dynamic language, or statically typed? Look at the variable declarations. The keyword 'var' might suggest dynamic typing; however, you can still specify the variable types!
var i int; // declares an integer. Doesn't it look strange?
var x, y float = -1, -2; // declares two floating-point numbers. The initializers are even stranger.
var a, b, c = 2.0, 1, "hello, world"; // they have different types. This looks like a tuple in Python.
2. The authors try to reduce keyboard typing.
If you look at the if-statement or for-statement, you will notice that the parentheses following if/for are gone, although you can still use them (because if/for statements expect an expression, and a parenthesized expression is still an expression).
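To see this in action, here is a small Go sketch (written against the syntax published at golang.org; the helper names are mine):

```go
package main

import "fmt"

// classify shows the if statement without parentheses around the
// condition; the braces are mandatory instead.
func classify(n int) string {
	if n < 0 {
		return "negative"
	}
	return "non-negative"
}

// sumTo shows the for statement, also without parentheses.
func sumTo(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

func main() {
	fmt.Println(classify(-1), sumTo(3))
}
```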
3. The semicolon plays an important role. It is used heavily as a statement delimiter. They don't use NEWLINE as a terminator; a NEWLINE is treated as a space character.
4. The switch statement is interesting. It helps reduce the ugliness of an if-else-if-else chain. With their grammar, the code formatting can be prettier:
case a < b:
case a == b:
case a > b:
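A complete condition-only switch might look like the following sketch (the function name is illustrative):

```go
package main

import "fmt"

// compare uses a switch with no switch expression: each case is a
// boolean condition, which reads like a tidy if-else-if chain.
func compare(a, b int) string {
	switch {
	case a < b:
		return "less"
	case a == b:
		return "equal"
	default:
		return "greater"
	}
}

func main() {
	fmt.Println(compare(1, 2), compare(2, 2), compare(3, 2))
}
```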
5. There is no while statement, nor a do-while statement. They don't even reserve 'while' as a keyword!
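Instead, the for statement covers the while use case. A minimal sketch (the helper name is mine):

```go
package main

import "fmt"

// doubleUntil keeps doubling n while it is below limit; Go's
// single-condition for loop plays the role of while here.
func doubleUntil(limit int) int {
	n := 1
	for n < limit {
		n *= 2
	}
	return n
}

func main() {
	fmt.Println(doubleUntil(100))
}
```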
6. The package import syntax suggests dynamic package loading. In Java/C#/Python/C++, you generally use a qualified identifier to denote a package name, such as: import java.util.Date.
In Go, it's different: a string literal is used instead.
I think this implies that they want dynamic loading of packages. It would be much easier to compute a string at runtime and then import the package it denotes.
The above is what I have found so far. I haven't installed the language environment yet; I probably will in December, after my CFA examination.
A friend mentioned that mainstream compiler research nowadays is not about lexical analysis or parsing, but about optimization. The former is pretty mature today, while optimization still has room for continual improvement. For example, JVM implementations may have different optimization strategies. We seldom notice, because it is transparent by design; therefore, Java developers rarely have to answer questions the way C++ programmers do: 'it depends on the compiler implementation…'
DSLs are becoming popular this year, and many tools are emerging on the horizon. For these tools, I think the next likely improvement is optimization. Since DSL frameworks and tools sit at a higher abstraction layer, the optimization space is wider, balancing between productivity and cost.
I started to experiment with Domain-Specific Language (DSL) implementation two weeks ago. The tool I use is ANTLR v3. We know that Xtext, TCS, and Boo are good DSL toolkits. The reason we don't use them for now is that you may be restricted to their available functions. For example, Xtext is more a domain-specific modeling tool than a DSL tool, though the two overlap. I will have to wait for future Xtext versions, as I can see our requirements going beyond its current features.
Right now I have some experience that may be useful, drawn from coding and debugging. The prototype I implemented is a dynamic script language. It has its own file structure, which defines input, process, and output options and rules.
The input part defines rules that assign domain meaning to input data, based on evaluating conditions.
The process part supports arithmetic and string computation. It helps further derive and update data from the input.
The output part supports dumping data into formats such as XML. Nested output is supported, since XML is hierarchical by nature.
The following technologies are used in the prototype:
- ANTLR parser – to construct the AST as an intermediate format
- ANTLR tree walkers – to go through the AST for specific tasks, such as variable mapping, math computation, and string operations
- XML pretty printing – I spent no time embedding this feature in any tree walker; instead, a utility class is used for pretty printing
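As a rough illustration of the pretty-printing idea (sketched here in Go rather than the prototype's Java; all names are invented for the example):

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Record is an illustrative structure, not one from the prototype.
type Record struct {
	XMLName xml.Name `xml:"record"`
	Name    string   `xml:"name"`
	Value   int      `xml:"value"`
}

// prettyXML marshals a value with two-space indentation.
func prettyXML(v interface{}) string {
	out, err := xml.MarshalIndent(v, "", "  ")
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	fmt.Println(prettyXML(Record{Name: "x", Value: 42}))
}
```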
ANTLRWorks is a great GUI tool for grammar authoring and debugging. There are Eclipse plug-ins for ANTLR, but I haven't had time to test them. More importantly, let's talk about the grammars.
At antlr.org, we can download many sample grammars for popular languages. However, a parser grammar is only useful for telling whether an input is valid in the language. The main work in implementing a language is defining the compiler/interpreter part. Generally there are two kinds of work here: (1) the supporting platform/framework to run the language; (2) the tree walker(s) to do the staged jobs, normally compiling or interpreting tasks. Simply put: make the language runnable.
I defined 4 grammars: one for the parser/lexer, and 3 for AST tree walking. When your DSL grows complicated, it helps to define separate groups of functions in different grammar files. For example, we define an emitter class which only deals with outputting XML results. You may object that walking the AST multiple times causes a performance problem. I don't have benchmark data so far, so we will see; at least walking an in-memory data structure should be quick, and its time complexity only increases linearly.
The expression calculation is in a separate grammar file, and the corresponding tree parser is a utility class. The statement executor is defined in another grammar file. This approach helps refine the expression definition and makes it easy and quick to reuse in another project. In a production environment it may make sense to merge them into the same tree walker, but keeping them separate makes things clearer in the current phase.
A beginner may encounter many issues going through this process. One likely issue is generating a runnable lexer/parser/walker: since you embed action code (Java, C#, …) in grammar files, you may have to go back and forth multiple times to eliminate compilation errors. A good IDE (Eclipse, Visual Studio) helps a lot in locating such issues.
Another issue is understanding how it all works. I'd recommend: (1) stepping through the parser/walker code in a debugger to get familiar with it; (2) reading and running the examples at antlr.org to explore a wider problem space.
I hope the above helps someone in some way. Feel free to comment with any questions.
p.s. I found a bug in CommonTreeNodeStream#index() of ANTLR v3.1.3. A bug report was filed at antlr.org a few days ago. The bug affects the following tasks:
- short-circuit evaluation (&&, ||)
- conditional expression ( ? : )
- if-else statement (depending on your action implementation)
- function definition
- maybe others (I have only explored a subset so far)
I watched a good demo of the M language from Microsoft. The link is listed below:
Modeling: Transformation and Constraints
As I watched the demo video, I began to realize that DSLs and domain modeling may be two different things, but with a certain intersection. For example, if you really devise a computer language, it should be able to execute programs. However, in the demo, Oslo looks more like a way to describe your data or model, and it lacks execution support.
Another example is Xtext, a subproject of the Eclipse Textual Modeling Framework; its first version will be released in June this year. I had a quick talk with some of its people. So far, to me, it looks more like a way to describe the structure of your data or domain model. In comparison, ANTLR v3 supports actions, templated transformations, ASTs, and tree-based interpretation/transformation. It can therefore do more than just describe your stuff: you can define a tree grammar to rewrite or translate your model into another model, or even into executable artifacts (bytecode, Java code, C# code). The Xtext people said they plan to support actions in the next version. It sounds like they are going to become more like ANTLR v3. An interesting roadmap!
I am playing with DSLs. As a developer without much language/compiler experience, it is not easy to get started. Fortunately, my friend Yu has a lot of experience, and the Internet is another good resource. So my evening learning becomes a bit easier.
Based on my recent reading, I would recommend the following resources for kicking off. If you are a DSL beginner without much experience (like me), they may be helpful:
1. There are two papers
a. On the Specification of Textual Syntaxes for Models
b. TCS: a DSL for the Specification of Textual Concrete Syntaxes in Model Engineering
The papers are written by committers of the Eclipse TCS project. They give you a look and feel from an academic perspective. I think it is good to know the basic methodological framework and a few of the terms used in implementing a DSL.
2. ANTLR: get your hands dirty
Having knowledge of ANTLR will help your language-definition journey. It is quite easy to pick up ANTLR with its well-prepared documentation:
a. ANTLR Getting Started
b. Five-minute introduction to ANTLR v3
c. ANTLR v3 by Mark Volkmann
d. Some DSL posts in the Articles section.
3. Some Links for Domain Specific Languages
This is a post I wrote earlier. It contains a few useful links:
Part 1 of the notes is located at http://www.frankdu.com/weblog/archives/46
The related presentation slides are located at openArchitectureWare.org.
23. In Xtext, you start by defining the concrete syntax.
24. For an existing metamodel, use the importMetamodel directive. Use preventMMGeneration to prevent any metamodel generation.
25. Simple Editor Customization in Xtext
- Xtend, the expression language used throughout oAW
- Constraint checks: the oAW Check language, based on Xtend
- Outline view customization: override label(…) and image(…) for meta types
- Content assist
- Customizing the font style for keywords (keywords only)
26. Xtext instantiates Ecore metamodels, which means that it can be processed with any EMF tool.
- Within an oAW workflow: the only Xtext-specific aspect is using the generated parser. The Xpand template language is a powerful code generation tool; you can easily traverse the model/metamodel using the Xtend language.
- EMF way: EMF’s native resource mechanism (what are the details?)
- Your own code: use the generated parser.
27. NodeUtil with generated parser
- Typically you only work with the AST (the ecore file)
- Helps obtain information from the parse tree: element location, element text, and the parse-tree node at a certain offset
28. Two phases for doing your DSL:
- designing your language
- building language tools
Xtext focuses on the second phase. Apart from these two phases, it is also important to provide a framework that runs the tasks defined by your DSL.
29. oAW Xtext has become part of the TMF project. The first release is expected in the second half of 2009.
30. An Xtext parser limitation: it is impossible to add custom action code in the parser. This sometimes results in ugly metamodels, especially when building expression languages.
Part 2 of the notes is located at http://www.frankdu.com/weblog/archives/52
Below are my reading notes for Textual DSLs and Textual Modeling in Eclipse. I haven't finished the slides yet; therefore, this is only part one.
1. EMF serves as the foundation. It provides the Ecore metamodel and framework tools like:
2. GMF is used for building custom graphical editors based on EMF metamodels. It is industry-proven technology, based on GEF.
3. TMF is used for building custom textual editors. It is in incubation phase. There are two implementations: Xtext and TCS.
4. M2M (Model-to-Model) delivers an extensible framework for model-to-model transformation languages. ATL is an M2M language from INRIA; QVT is an implementation.
5. M2T (Model-to-Text) focuses on transforming models into text (code generation, model serialization). For example, you may want to convert in-memory models into XML files for persistence or transport, and use a parser to convert the XML files back into models. There are two such frameworks:
- JET is a code generation tool used by EMF
- Xpand is a code generation tool that is part of the M2T releases
6. Xtext originally comes from openArchitectureWare. It integrates well with Eclipse: oAW uses EMF as a basis, bases graphical editors on GMF, and all tooling is based on Eclipse. Since Xtext has become part of Eclipse TMF, there are two versions of Xtext: oAW Xtext and TMF Xtext. The former is relatively mature; the latter is under active development, with its first release expected sometime this year, namely in 2009.
7. A DSL is a focused, processable language for addressing specific concerns in a specific domain. It is meant to be a simple tool for a relatively complex domain. Therefore, in most cases, DSLs are readable by domain experts without any training. Popular DSL examples are SQL and Excel.
8. DSLs can be classified in many ways:
- configuration vs. customization
- internal vs. external
- graphical vs. textual
9. Xtext is a framework for building external, textual, customization DSLs.
10. Is it possible to edit same model with both textual and graphical editing interfaces?
It might be possible. Consider one of the following:
a. Visualize a subset of the model, using Graphviz or Prefuse; this is typically read-only.
b. Use different perspectives, some of them using a graphical editor (this requires cross-references between the textual and graphical models).
c. Edit the same model both textually and graphically, with the textual format used as the serialization format for the graphical model (this requires writability and synchronization of both models!).
11. Typically, textual DSLs leverage one of many parser generators (ANTLR, JavaCC, lex/yacc), which generate a parser from a grammar definition. The parser then tries to match the text and create a parse tree.
12. Typically, textual DSLs are transformed into an Abstract Syntax Tree (AST). It is often a binary tree. For example, the AST for 1 + 2*3:
AddExpression( Literal(1), MultiplyExpression( Literal(2), Literal(3) ) )
AddExpression and MultiplyExpression are binary nodes; Literal is a leaf.
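To make the tree concrete, here is a tiny interpreter-style sketch in Go, reusing the node names from the slides (everything else is illustrative):

```go
package main

import "fmt"

// Expr is the common interface for all AST nodes.
type Expr interface{ Eval() int }

type Literal struct{ Value int }
type AddExpression struct{ Left, Right Expr }
type MultiplyExpression struct{ Left, Right Expr }

func (l Literal) Eval() int            { return l.Value }
func (a AddExpression) Eval() int      { return a.Left.Eval() + a.Right.Eval() }
func (m MultiplyExpression) Eval() int { return m.Left.Eval() * m.Right.Eval() }

func main() {
	// 1 + 2*3: multiplication binds tighter, so it sits deeper in the tree.
	ast := AddExpression{Literal{1}, MultiplyExpression{Literal{2}, Literal{3}}}
	fmt.Println(ast.Eval())
}
```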
13. The AST can be treated as a model. But textual DSLs are written without awareness of the AST; they can even work against it.
14. Challenges in Xtext DSL implementation:
- Writing a parser is non-trivial.
- A parser generator makes life easier, but is still not one-size-fits-all.
- A parser generator only creates a matcher and/or a simplistic AST. You still need to transform the model further into an easily processable form, and create an editor with syntax highlighting, code completion, etc.
Xtext is designed to ease this kind of unbearable burden.
15. Xtext is based on an EBNF grammar (what’s that? Why will it make a difference?). Xtext will generate:
- ANTLR-based parser
- EMF-based metamodel
- Eclipse editor with or extensible for: syntax highlighting, code completion, code folding, constraint checking, and so on.
16. Different Kinds of Xtext Rules:
- Type Rule
- String Rule
- Enum Rule
- Native Rule
17. Built-in Lexer Types in Xtext:
- Comments (single-line and multi-line)
The content of these rules is not transformed into the metamodel. (Why does this matter?)
18. Built-in Reference Types in Xtext:
- File Reference/Import
19. Abstract Type Rules are implicitly declared as a collection of OR-ed alternatives: R1 | R2 | R3. They are mapped to an abstract metaclass; the alternatives become subclasses, and common properties are lifted into the abstract superclass.
20. String Rules are declared in the format: String [rule_name]: [rule_definition];
21. Enum Rule is mapped to Enum in metamodel. Its format: Enum [rule_name]: [token_name="string"]+;
22. Native Rule. Example:
'#' ~('\n'|'\r')* '\r'? '\n';
1. EMF – Eclipse Modeling Framework
2. GMF – Graphical Modeling Framework
3. GEF – Graphical Editing Framework
4. TMF – Textual Modeling Framework
5. DSL – Domain-Specific Language
6. JET – Java Emitter Templates
7. AST – Abstract Syntax Tree
Continue to read: Part 2 of the notes is located at http://www.frankdu.com/weblog/archives/52
I have been looking into DSLs (domain-specific languages) recently. A few links are below, bookmarked for my own convenience:
- Language Oriented Programming: The Next Programming Paradigm
A good introductory paper on DSLs. The language is plain, vivid, and easy to understand. Really a good starting point.
- An introductory example of domain specific languages – by Martin Fowler
It's a very good starting point. Martin explains by example, which makes everything clear and easy to understand. There are examples in XML, Ruby-like, and VBScript-like forms.
- The Pragmatic Code Generator Programmer by Sven Efftinge et. al.
Another article about using openArchitectureWare Xtext.
- The Help Documentation coming along with the Xtext Framework
Definitely the right documentation to start with for Xtext.
- Sven Efftinge’s Blog
Sven plays an important role in developing Xtext, which is a DSL framework.
- More blogs ……
More blogs at ohloh……
- The Xtext Framework: TMF Xtext and oAW Xtext
The former is a subproject of Eclipse TMF (the Textual Modeling Framework); the latter is under the umbrella of openArchitectureWare. There is definitely a connection between them: yes, they are built by the same group of people.
- MPS Project – developed by a famous European company, which published the paper in item #1 above.