CPSC425 Spring 2006
HW 10: A Lexical Analyzer in ML

Your assignment is to write a simple lexical analyzer in ML for tokens in the language PB as described here. Lexers are often table-driven, taking the lexical grammar for the language as input, but you will use some unique features of ML to take a somewhat different approach.

Input and Output

You should provide a top-level function lex that returns the tokens it finds as a list of strings.

You should also provide a function printList : string list -> unit that takes a list of strings and prints them, one per line. This is convenient for looking at the list that lex returns.
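As an illustration of the required type only, a minimal sketch of printList might look like the following. The use of List.app is one possible choice, not a required approach.

```sml
(* printList : string list -> unit
   Prints each string in the list on its own line. *)
fun printList (xs : string list) : unit =
  List.app (fn s => print (s ^ "\n")) xs
```

For example, printList ["x1", "<=", "23.4"] prints the three strings on three separate lines.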

Program Structure

Lexical analysis is usually quite tedious, with much of the processing devoted to getting and checking characters to determine where one token stops and the next begins. But this sort of processing can be minimized by strategic use of the following built-in functions from ML's String structure:

String.translate : (char -> string) -> string -> string
String.tokens : (char -> bool) -> string -> string list
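To get a feel for String.translate and String.tokens from the Standard ML Basis Library, the following small sketch may help. The input strings and the names spaced and pieces are made up for illustration and have nothing to do with PB itself.

```sml
(* String.translate : (char -> string) -> string -> string
   Replaces every character with the string the function returns for it. *)
val spaced =
  String.translate (fn c => if c = #"+" then " + " else str c) "a+b"
(* spaced = "a + b" *)

(* String.tokens : (char -> bool) -> string -> string list
   Splits the string at delimiter characters; runs of delimiters
   never produce empty tokens. *)
val pieces = String.tokens Char.isSpace "  x1   23.4 "
(* pieces = ["x1", "23.4"] *)
```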

The strategy for the lexer is to use String.translate to separate out characters that need to be dealt with in special ways, leaving characters that will ultimately stay together (e.g., in numbers and identifiers) intact. That is, the delimiters that are needed by String.tokens can be inserted by String.translate before the string is passed to String.tokens. The list of strings returned by String.tokens will still need some work, of course (e.g., distinguishing between < and <=, and between 23 and 23.4), but it's a good start.
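The strategy described above could be sketched roughly as follows. This is only an outline under stated assumptions: the set of special characters in isSpecial is hypothetical (the real set comes from the PB description), and the helper names pad, split, and merge are invented for this example.

```sml
(* isSpecial : char -> bool
   ASSUMPTION: a made-up set of single-character symbols, not PB's real set. *)
fun isSpecial c = Char.contains "<=+()" c

(* pad : char -> string
   Surrounds a special character with spaces; leaves other characters alone. *)
fun pad c = if isSpecial c then " " ^ str c ^ " " else str c

(* split : string -> string list
   Inserts delimiters with String.translate, then splits with String.tokens. *)
fun split s = String.tokens Char.isSpace (String.translate pad s)

(* merge : string list -> string list
   Post-processing pass: rejoins an adjacent "<" and "=" into "<=". *)
fun merge ("<" :: "=" :: rest) = "<=" :: merge rest
  | merge (t :: rest) = t :: merge rest
  | merge [] = []

val example = merge (split "x1<=23.4+(y<z)")
(* example = ["x1", "<=", "23.4", "+", "(", "y", "<", "z", ")"] *)
```

Because the period is not treated as special here, 23.4 survives as a single string. One limitation of this sketch is that merge would also join a < and = that were separated by whitespace in the input; whether that is acceptable depends on the PB token rules.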

General Comments

Be sure that you use good ML style throughout, pattern matching as appropriate and so on. Your program should be clearly written but should still be well documented. In addition to a brief description, be sure to include a type signature with each function.