July 01, 2021

#scala

A Tour of Implementing TAPL in Scala

Last week I implemented sub-type algorithms in Rust and put the source code on GitHub. In this week, I rewrite the project with Scala, because I am going to work in Scala. This post is not a step-by-step explanation of the project, but more like a topic-based notes during the implementation.

Singleton Object

A newly created Scala project has code like this.

object Main extends App {
  println("Hello, world!")
}

I have several questions in my first glance.

What the heck is Main? Is it an object, or a type (class)?
Why and how could Main extend App?
Why println can be placed in Main? Does Main works like a function?

Here are my answers.

Main is an object. It is categorized as a value. It is not a type. You can try this.
```
def test(main: Main) = 0 // not found: type Main
```
I think Main has a hidden and inaccessible type. Let’s call it T. So it actually means that T extends App. In this way, Main has everything in App.
This is syntax inherited from classes, I think. Each class declaration is actually a constructor declaration (I think Kotlin learned from this). In the primary class constructor, people can put method declarations, statements and expressions.

Another interesting yet familiar thing is companion object. It means you can create a singleton object called the same name of a class. I heard that static methods in Scala are mostly done in this way.

While Statements

I will make an REPL. So I need something like

while (true) {
  line = readLine()
  reply(line)
}

Soon I found Scala is a expression-based language, so while expressions are typed. And they are typed to Unit since the halt problem. That’s okay.

Then, how about break and continue? To my surprise, they are not included in the language. I read the Scala Book but I didn’t notice this point. I found scala.util.control.Breaks, but it seems be implemented with exceptions.

Scala Standard Library 2.13.3 - scala.util.control.Breaks

So I put the while expression in a method and use return instead. Here is the skeleton of the REPL.

object Main extends App {
  val context = new Context()

  def repl: Unit = while (true) {
    val command = readLine("> ")
    command match {
      case "exit" => return
      case command => println(command)
    }
  }

  println("Welcome to TAPL Scala!")
  repl
}

Deal with `Option`

I’m dealing with Option in Scala. There are few ways I know so far.

Pattern matching. x match { Some(y) => z None => a }.
Methods from Iterable. Because Option is actually a container of a single value, so what people can do with Iterable also works here.
Some specific methods. Like getOrElse, isEmpty, isDefined.

Derive with Exceptions

When type inference fails, we need to handle errors. In my Rust implementation, errors are contained by Result<TermType, String>. Rust offers some syntactic sugars dealing with them, for example, if the return type of surrounding function is also the same Result type, appending an ? to the expression will return the error part if it is an Err, just like the lift in Monad.

In Scala, we can use exception, just like how we do it in Java. Therefore, we need custom errors. I found a solution on StackOverflow: Custom Exception in scala.

Operator Overloading

Because I was learning the book Types and Programming Languages. I wanted to define <: as the “is subtype of” operator. Soon I found this is impossible because <: is reserved as the sub-type operator. Therefore, I chose <:: as the operator.

Operator overloading is easy in Scala. The syntax does not differ from defining functions.

trait Type {
  override def toString: String
  def <::(that: Type): Boolean
  def ==(other: Type): Boolean
}

But, I found another problem. When I test the expression if true then {} else { a: 0 }, the result type is Record { a: Int }. What the heck? I thought I messed up some order, for example, reversing the order of if branches or something like that. But the logs indicate that a <:: b is actually equivalent to b.<::(a). Hmm, interesting.

I started searching in Scala documentation and Scala Book, but they told me nothing. Soon I found the answer in Scala specifications. I like reading specifications, because they are mostly well-structured so I can locate what I want immediately. I remembered I read ECMA-262 for thousands times because it is the only document telling me what’s really happening in the execution context.

Okay, let’s back to the topic. Here is what I found from this page: Expressions.

The associativity of an operator is determined by the operator's last character. Operators ending in a colon `:' are right-associative. All other operators are left-associative.

Because I don’t want to change my code too much, I just rename the operator :<. It looks like an unhappy face. Just like my mood. (I’m just joking.) I have to manually rename the operator because Metals says it will break my code. Even if I had replace all occurrences of infix notations with member function notations.

Finally, everything about operator overloading is done.

Parsing Rules

In the Rust implementation, I use LALRPOP, which generates Rust code at the build time from a DSL. Because it uses LALR, it can inform me of ambiguity when there is a one. It even show you a nice diagram in case you don’t know what it is talking about.

Fastparse is handy. I can write a parser in less than 20 lines of code. But it is based on backtracking strategies, which means sometimes I find the problem at the runtime.

Here is my grammar without any modification.

term          -> apply
              |  project
              |  true
              |  false
              |  int
              |  if
              |  abstract
              |  variable
              |  record
              |  parenthesized
apply         -> term "(" term ")"
project       -> term "." ident
true          -> "true"
false         -> "false"
int           -> [0-9]+
if            -> "if" term "then" term "else" term
abstract      -> "(" ident ":" type ")" "=>" term
variable      -> ident
record        -> "{" entry ("," entry)* "}" where entry -> ident ":" term
parenthesized -> "(" term ")"

There are some obvious left-recursive rules: apply and project, so we can make them as a postfix rule. Replace the first 3 rules with these 4 rules.

term          -> term' (apply-tail | project-tail)*
term'         -> true
              |  false
              |  int
              |  if
              |  abstract
              |  variable
              |  record
              |  parenthesized
apply-tail    -> "(" term ")"
project-tail  -> "." ident

Then we will spot some “dangling-else” problems (not the else in the if rule because else clause is required). They are if and abstract. For example, (x: Int) => x.y can be interpreted into either ((x: Int) => x).y or (x: Int) => (x.y. The solution is to give apply-tail and project-tail higher binding weights than if and abstract have.

term    -> simple (apply-tail | project-tail)* | complex
simple  -> true | false | int | variable | record | parenthesized
complex -> if | abstract

In the new rules above, term is either a simple term (terms without “dangling-else” problems) with any number of tails, or a complex term with no tails. By far the parser is almost done. However, I stepped into other two rabbit holes.

Should cut (operator ~)at appropriate position, or the parser might fail even if your grammar is correct! To cut is to tell the parser never look back.
When using whitespace consumers, remember to parse tokens with repX rather than rep. For example, I spent a whole hour on debugging ident = CharIn("a-zA-Z").rep(1)!

Conclusion

Finally, I had finish the project and uploaded to GitHub. You can check it out here.

Besides, I also want to address some off-topic discoveries about language servers. I’m writing Scala with Visual Studio Code. I felt that the Scala language server is very stable and ergonomic. It could expand complicated macro in seconds. It made the development smoother than I had expected.

Before I write Scala in Visual Studio Code, I often use Rust and TypeScript.

The TypeScript language server works pretty well. But there are still bunch of times when it stuck because there are too many type assertions and sub-types, especially when I was writing a large tree visitor with Babel. Rust language server sometimes are lazy to check ownership so everything seems fine in the editor but you get bunch of errors when compiling. There are also times the language server failed to expanded macro from newly installed crates.

Anyway, I’m not complaining about the language servers of TypeScript and Rust. They are also very great.