Jan 29

It is easy to get line and column information from a lexer token via its line and pos properties.

With an Abstract Syntax Tree (AST), however, rewrite rules may reorder the tokens inside a tree node. How do you retrieve the line numbers then?

This matters a lot when executing a language, because users need to know which line caused an error.

The answer is simple: retrieve the information from the underlying lexer tokens. A tree grammar rule may contain multiple tokens, so it’s up to you to decide which token to look at. A naive method is to look at the first token of a tree node, using the start property.

Another method is described in the article “Recovering line and column numbers in your Antlr AST”.
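The first-token idea can be sketched without any ANTLR code. The Go snippet below is only an illustration under stated assumptions: `Token`, `Node`, `before`, and `startToken` are hypothetical stand-ins, not ANTLR runtime types. It shows that even when a rewrite rule moves the operator to the root, the earliest source position can still be recovered from the tokens kept on the nodes.

```go
package main

import "fmt"

// Token is a hypothetical stand-in for a lexer token that remembers
// where it came from in the source text.
type Token struct {
	Text string
	Line int // 1-based line number
	Col  int // 0-based column within the line
}

// Node is a hypothetical AST node. Imaginary nodes may have a nil Tok.
type Node struct {
	Tok      *Token
	Children []*Node
}

// before reports whether token a appears earlier in the source than b.
func before(a, b *Token) bool {
	if a.Line != b.Line {
		return a.Line < b.Line
	}
	return a.Col < b.Col
}

// startToken returns the earliest token found in the subtree rooted at n,
// or nil if the subtree carries no tokens at all.
func startToken(n *Node) *Token {
	if n == nil {
		return nil
	}
	best := n.Tok
	for _, child := range n.Children {
		if t := startToken(child); t != nil && (best == nil || before(t, best)) {
			best = t
		}
	}
	return best
}

func main() {
	// Source "x = 1" on line 3, rewritten so that '=' becomes the root.
	x := &Token{"x", 3, 0}
	eq := &Token{"=", 3, 2}
	one := &Token{"1", 3, 4}
	assign := &Node{Tok: eq, Children: []*Node{&Node{Tok: x}, &Node{Tok: one}}}

	t := startToken(assign)
	fmt.Printf("%d:%d\n", t.Line, t.Col) // prints 3:0
}
```

Scanning the whole subtree costs more than reading a single token, so a real implementation would probably cache the result on the node.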

Jan 23

While writing code, programmers run into issue after issue. As examples, I will describe the issues I encountered, in the hope that this helps others who run into similar ones. I wrote a program to monitor certain web pages. It has to download web pages in the correct encoding, in a multi-threaded environment, and save each page to a back-end database. The issues I encountered:

1. Threading issue

HttpClient from Apache is used as the web client. Its documentation page shows no issue in the Multithreading section, and the multithreading example runs without problems. However, you will run into problems if the HttpClient instance was not created in the current thread or its parent thread. This issue is not documented there, and it is very confusing to new users.

2. Implicit Dependency

I use Hibernate to persist data, and Maven 2 for dependency management. Maven 2 is awesome. However, what if a dependency is missing? Hibernate needs Antlr v2.7.5H3 for HQL parsing, but the Hibernate Maven guide doesn’t state this clearly. The error only pops up when you start to run HQL. Additionally, this Antlr dependency is not available in the official Maven 2 repository; you will need to use the JBoss repository.

3. Session cache

After the data is saved to the database, another thread takes it out for further processing. However, if that thread calls getCurrentSession() on the SessionFactory instance, the query returns nothing. If you stop the program and run it again, you get the data from the last run, but not the data from the new run. The issue is caused by the Session cache, which is designed to improve performance. The solution is to call openSession() on the SessionFactory instance (or openStatelessSession() if you hate the cache).

I encountered other issues as well, such as encoding issues. However, the preceding ones are the typical confusing ones.

Why is software engineering so complicated?

It’s because people ask software to address the whole world, which is far more complicated.

Tire production can use an assembly line, because the requirement is simple: I need a product for my Toyota car.

Car production can use an assembly line, because the requirement is simple as well: I need a means of transportation from city A to city B, on land.

What if clients ask for a vehicle that travels between any two points in three-dimensional space? What if clients add the dimension of time?

People want software to handle situations more intelligently, so that they can be lazier. There is nothing wrong with that. But it becomes a question of feasibility if the requirements have no boundary.

Jan 10

By default, the source level is 1.3 in Maven 2. To use Java 1.6, we need to configure the Maven compiler plugin. Add the following snippet to your pom:

<build>
        <plugins>
                <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-compiler-plugin</artifactId>
                        <configuration>
                                <source>1.6</source>
                                <target>1.6</target>
                        </configuration>
                </plugin>
        </plugins>
</build>

Jan 08

If you are working with Antlr v3 for Java, you may be very interested in this series of video tutorials. It demonstrates how to develop with the Antlr 3 IDE, and walks through the process of building a small language. Credit goes to Marcel for the recommendation. Check it out at:

ANTLR 3.x Tutorials:
http://vimeo.com/groups/29150/videos

Dec 21

The Eclipse source code can be checked out with CVS or Subversion. I assume that you are running Eclipse. We will use its CVS tool.

1. Open the CVS view: Window -> Show View -> Other -> CVS -> CVS Repositories

2. Add a new repository location, using the settings below:
Host: dev.eclipse.org
Repository path: /cvsroot/eclipse
User: anonymous
Connection Type: pserver
Port: use the default. If it doesn’t work or you are behind a firewall, use 80.

And that’s it! It should work now.

Dec 21

Since Eclipse 3.5, Mac OS X users will find three versions to choose from: Carbon, Cocoa, and Cocoa 64-bit.

In short: install the Carbon version on Leopard and earlier. Try Cocoa 64-bit on Snow Leopard.

It took me a couple of minutes to figure this out, hence this quick note.

Nov 17

Recently I noticed that many new languages have been released or have emerged, such as Go, NOOP, Simple for Android, and the one by the Vim author (sorry, I cannot remember its name).

A few days ago, Google released a new language named ‘Go’. It is available at http://golang.org.

This is interesting, because Google has recently released a couple of language tools. I am wondering: will it be natively supported by the possible upcoming Google Chrome OS?

I looked through the language grammar, and noticed several interesting things:

1. Is Go dynamically typed or statically typed? Look at the variable declarations. The keyword ‘var’ looks like dynamic typing at first, yet you can specify the variable types!! In fact Go is statically typed: when the type is omitted, the compiler infers it from the initializer.

var i int; // define an integer. Doesn’t it look strange?

var j = 10; // j is an integer. This is like JavaScript

var x, y float = -1, -2; // define 2 floating numbers. The initializers are even stranger.

var a, b, c = 2.0, 1, "hello, world"; // they have different types. This looks like tuple assignment in Python.
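To check how these declarations are typed, here is a minimal sketch, assuming a current Go toolchain. `typeName` is a hypothetical helper introduced only for this illustration, and `float64` replaces `float`, which current Go no longer has. Every variable gets a fixed type at compile time, whether the type is written out or inferred.

```go
package main

import "fmt"

// typeName is a hypothetical helper that reports the type of a value.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var i int                            // explicit type, zero value 0
	var j = 10                           // no type written: inferred as int
	var x, y float64 = -1, -2            // one type shared by both variables
	var a, b, c = 2.0, 1, "hello, world" // each variable gets its own inferred type

	fmt.Println(typeName(i), typeName(j), typeName(x), typeName(y)) // int int float64 float64
	fmt.Println(typeName(a), typeName(b), typeName(c))              // float64 int string

	// j = "ten" // would not compile: a string cannot be assigned to an int
}
```

The commented-out assignment is the key difference from JavaScript: once `j` is an int, it stays an int.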

2. The authors try to reduce typing.

If you look at the if statement or the for statement, you will notice that the parentheses following if/for are gone. You can still write them, because if/for expect an expression, and a parenthesized expression is still an expression.

3. The semicolon plays an important role. In this release it is used a lot as a statement delimiter. NEWLINE is not a delimiter; it is treated as whitespace.

4. The switch statement is interesting. It helps to avoid ugly if-else-if-else-if-else chains. With this grammar, the code can be formatted more nicely:

switch {
case a < b:
    return -1
case a == b:
    return 0
case a > b:
    return 1
}

5. There is no while statement, and no do-while statement either. They don’t even reserve while as a keyword! The for statement covers these cases.
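As a rough sketch of how for stands in for both loops, assuming a current Go toolchain (`countdown` and `doubleUntil` are hypothetical names chosen for this example): a for with only a condition behaves like while, and a bare for with a break at the end of the body behaves like do-while. Note that neither condition is wrapped in parentheses.

```go
package main

import "fmt"

// countdown uses the condition-only form of for, which plays the role
// of a while loop.
func countdown(n int) []int {
	var out []int
	for n > 0 {
		out = append(out, n)
		n--
	}
	return out
}

// doubleUntil emulates do-while: the body runs once before the first test.
func doubleUntil(limit int) int {
	p := 1
	for {
		p *= 2
		if p >= limit {
			break
		}
	}
	return p
}

func main() {
	fmt.Println(countdown(3))    // [3 2 1]
	fmt.Println(doubleUntil(10)) // 16
}
```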

6. Package imports use strings. In Java/C#/Python/C++, you generally use a qualified name to denote a package, such as: import java.util.Date.

In Go, it’s different: the import path is a string:

import (

"flag";
"http";
"io";

)

At first I thought this implied dynamic loading of packages, since it would be easy to compute a string at runtime and then import the package it denotes. However, the import path has to be a string literal, and it is resolved at compile time.
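One caveat about the string form is worth checking: as far as the language specification goes, an import path is a string literal fixed at compile time, not a value that can be computed. The sketch below assumes a current Go toolchain (Go 1.16 or later for `io.ReadAll`); `readAll` and `echo` are hypothetical helpers written for this example.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// The following is NOT valid Go, because an import path must be a
// string literal that the compiler can resolve:
//
//	path := "io"
//	import path

// readAll drains a reader using the statically imported io package.
// io.ReadAll requires Go 1.16 or later.
func readAll(r io.Reader) (string, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// echo wraps a string in a reader and reads it back.
func echo(s string) (string, error) {
	return readAll(strings.NewReader(s))
}

func main() {
	text, err := echo("hello")
	fmt.Println(text, err) // hello <nil>
}
```

Every imported package also has to be used, otherwise the program does not compile, which again points to compile-time resolution rather than dynamic loading.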

That is what I have found so far. I haven’t installed the language environment yet. I will probably do it in December, after my CFA examination. :-)

Nov 02

GEF involves many concepts, because it establishes its own MVC convention. You may find that the tutorials on the official Eclipse site spend most of their paragraphs explaining concepts.

Here is a very nice tutorial that focuses on a step-by-step application. It’s not long, around 80 pages in the PDF version:

http://www.psykokwak.com/blog/index.php/tag/gef

Oct 19

Last week, I read an article about a famous IM company. One statement is very interesting: a company’s genes are carried by its people. That company has more internet genes, while Microsoft, by comparison, has more software genes.

Internet services generally require, or allow, different development and shipping cycles. They can ship at a finer granularity, e.g. a small release every one to three days. This upends the whole process of traditional software, so production around internet services is organized to fit such cycles. As a result, actively engaged people matter more, while they face a more intense working life. :)

Oct 19

A friend mentioned that the main research in compilation nowadays is not about lexical analysis or parsing, but about optimization. The former are pretty mature today, while optimization still has room for continual improvement. For example, JVM implementations may use different optimization strategies. We seldom notice this because it is transparent by design. Therefore, Java developers seldom have to answer questions the way C++ programmers do: ‘it depends on the compiler implementation….’

DSLs are becoming popular this year, and many tools are appearing on the horizon. For these tools, I think the next likely improvement is optimization. Since DSL frameworks and tools sit at a higher abstraction layer, the effect of optimization can vary over a wider range, balancing productivity against cost.
