Modern programming languages such as Haskell and Python use indentation to structure code. Haskell’s layout rules give the programmer considerable freedom in arranging her code, at the cost that programs are not stable under some simple operations, such as renaming an identifier. Consider the Haskell snippet:
Renaming arg to argument will produce the text
which is no longer valid Haskell code.
Haskell has so-called layout keywords, such as do (Haskell 2010 also has let, where and of), which start a new block. The column of the first token after the layout keyword is the reference for grouping declarations into the block: a line whose first token starts at exactly that column begins a new declaration, and the block closes if the first token on a line is indented less than the reference token. This makes the first snippet parse as
whereas the second is parsed as
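The grouping rule can be sketched as a small layout resolver that makes the block structure explicit by inserting braces and semicolons, loosely following the layout algorithm of the Haskell report. This is a simplified illustration, not the framework the thesis asks for: the type Tok, the functions tokenize and resolve, and the snippet f arg = do foo arg are hypothetical. It assumes tokens are separated by spaces and ignores tabs, explicit braces and the report's parse-error(t) rule. Each open block keeps its reference column on a stack; a line starting at that column begins a new item (;), and a line starting further left closes the block (}).

```haskell
-- Hypothetical sketch: make layout explicit by inserting "{", ";" and "}".
-- Assumptions: tokens are separated by spaces, there are no tabs or explicit
-- braces, and the parse-error(t) rule of the Haskell report is not modelled.

-- A token with its 1-based line and column.
data Tok = Tok { tokLine :: Int, tokCol :: Int, tokText :: String }
  deriving (Eq, Show)

-- The layout keywords of Haskell 2010.
layoutKeywords :: [String]
layoutKeywords = ["let", "where", "do", "of"]

-- Naive tokenizer: every maximal run of non-space characters is one token.
tokenize :: String -> [Tok]
tokenize src = concat (zipWith lineToks [1 ..] (lines src))
  where
    lineToks n = scan 1
      where
        scan _ [] = []
        scan c s@(ch : rest)
          | ch == ' ' = scan (c + 1) rest
          | otherwise =
              let (w, rest') = break (== ' ') s
               in Tok n c w : scan (c + length w) rest'

-- Resolve layout with a stack of reference columns (innermost block first).
resolve :: [Tok] -> [String]
resolve = go [] True 0 -- the first token opens the implicit top-level block
  where
    go stack _ _ [] = map (const "}") stack -- end of input closes all blocks
    go stack opening prevLine (t : ts)
      -- The token after a layout keyword fixes the new block's reference column.
      | opening = "{" : tokText t : next (tokCol t : stack) t ts
      -- First token on a line: close every block whose reference column lies
      -- to its right, then start a new item if it sits exactly at the column.
      | tokLine t /= prevLine =
          let (closed, stack') = span (> tokCol t) stack
              sep = [";" | take 1 stack' == [tokCol t]]
           in map (const "}") closed ++ sep ++ tokText t : next stack' t ts
      | otherwise = tokText t : next stack t ts
    -- Continue after t; the following token opens a block iff t is a layout keyword.
    next stack t = go stack (tokText t `elem` layoutKeywords) (tokLine t)

-- Hypothetical snippet before and after renaming arg to argument;
-- bar is aligned with foo (column 12) in the first version.
before, after :: String
before = unlines ["f arg = do foo arg", replicate 11 ' ' ++ "bar"]
after = unlines ["f argument = do foo argument", replicate 11 ' ' ++ "bar"]

-- Expected output:
--   { f arg = do { foo arg ; bar } }
--   { f argument = do { foo argument } bar }
demo :: IO ()
demo = mapM_ (putStrLn . unwords . resolve . tokenize) [before, after]
```

Loaded into GHCi, demo shows the effect of the renaming: in the first version bar becomes a second statement of the do block, while after renaming foo moves to column 17 but bar stays at column 12, so the block is closed before bar and bar no longer belongs to it.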
The following rules make layout robust with regard to renaming identifiers (such as arg to argument):
The layout keyword must be the last token on its line, or
a layout keyword that starts a new block spanning several lines may be preceded on the same line only by other keywords or punctuation characters.
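Both rules work because renaming changes only the width of identifiers. Keywords and punctuation have a fixed width, so if nothing but such tokens precedes a layout keyword on its line, renaming cannot move the block's reference column. A minimal checker for the rules might look as follows. It is a hypothetical sketch (the names violations and fixedWidth are illustrative) and assumes space-separated tokens. It treats Haskell 2010 reserved words, special characters and reserved operators as fixed-width, counting user-defined operators as identifiers because they can be renamed too. It also approximates "the block spans several lines" by whether the next line starts at or to the right of the reference column.

```haskell
-- Hypothetical checker for the two robustness rules. Assumptions: tokens are
-- separated by spaces (no tabs), and a block spans several lines exactly when
-- the next non-blank line starts at or to the right of its reference column.
import Data.Function (on)
import Data.List (groupBy)

-- A token with its 1-based line and column.
data Tok = Tok { tokLine :: Int, tokCol :: Int, tokText :: String }
  deriving (Eq, Show)

-- Naive tokenizer: every maximal run of non-space characters is one token.
tokenize :: String -> [Tok]
tokenize src = concat (zipWith lineToks [1 ..] (lines src))
  where
    lineToks n = scan 1
      where
        scan _ [] = []
        scan c s@(ch : rest)
          | ch == ' ' = scan (c + 1) rest
          | otherwise =
              let (w, rest') = break (== ' ') s
               in Tok n c w : scan (c + length w) rest'

layoutKeywords :: [String]
layoutKeywords = ["let", "where", "do", "of"]

-- Tokens whose width cannot change under renaming: Haskell 2010 reserved
-- words, special characters and reserved operators.
fixedWidth :: Tok -> Bool
fixedWidth t = tokText t `elem` (reservedIds ++ punctuation)
  where
    reservedIds =
      [ "case", "class", "data", "default", "deriving", "do", "else", "foreign"
      , "if", "import", "in", "infix", "infixl", "infixr", "instance", "let"
      , "module", "newtype", "of", "then", "type", "where", "_" ]
    punctuation =
      [ "(", ")", ",", ";", "[", "]", "`", "{", "}"
      , "..", ":", "::", "=", "\\", "|", "<-", "->", "@", "~", "=>" ]

-- Layout keywords that are not last on their line, open a block continuing
-- on the next line, and are preceded by at least one renameable token.
violations :: String -> [Tok]
violations src = concat (zipWith check lns (map Just (drop 1 lns) ++ [Nothing]))
  where
    lns = groupBy ((==) `on` tokLine) (tokenize src)
    check line nextLine =
      [ kw
      | i <- [0 .. length line - 1]
      , (before, kw : ref : _) <- [splitAt i line] -- kw is not last on its line
      , tokText kw `elem` layoutKeywords
      , spansLines (tokCol ref) nextLine
      , not (all fixedWidth before)
      ]
    spansLines r (Just (t : _)) = tokCol t >= r
    spansLines _ _ = False

-- Hypothetical inputs.
bad, good, aligned :: String
bad = unlines ["f arg = do foo arg", replicate 11 ' ' ++ "bar"]
good = unlines ["f arg = do", "  foo arg", "  bar"]
aligned = unlines ["f arg =", "  do foo arg", replicate 5 ' ' ++ "bar"]

-- Only the do of bad (line 1, column 9) should be reported.
demo :: IO ()
demo = mapM_ (print . violations) [bad, good, aligned]
```

In bad, the do is preceded by f, arg and = while its block continues on the next line, so renaming arg would shift foo. In good the keyword ends its line, and in aligned nothing precedes it on its line, so both survive the renaming unchanged.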
The task of this Master’s thesis is to design a framework for layout-sensitive grammars together with a parser generator for them. Traditionally, layout is handled during lexical analysis; this thesis should instead investigate handling it at the level of parsing.