CPSC425 Spring 2008
Lexical Analysis in ML

Your assignment is to write a simple lexical analyzer in ML for tokens in the language Simple as described here. Lexers are often table-driven, taking the lexical grammar for the language as input, but you will use some unique features of ML to take a somewhat different approach.

Input and Output

You should provide the following top-level function:

    lex : string -> string list

which takes the name of a file and returns the list of tokens in that file, in order, each as a string. You may find it convenient to view your result using the function printTokens : string list -> unit, which takes a list of strings and prints the strings one per line. That is, typing printTokens (lex "testfile") will give you nicer-looking output than lex "testfile" alone.
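For concreteness, here is one possible way printTokens might be written; the use of List.app is just one choice, not a required implementation:

```sml
(* printTokens : string list -> unit
   Print each token on its own line. *)
fun printTokens ts = List.app (fn t => print (t ^ "\n")) ts
```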

Program Structure

Lexical analysis is usually quite tedious, with much of the processing devoted to getting and checking characters to determine where one token stops and the next begins. But this sort of processing can be minimized by strategic use of the following built-in functions from ML's String structure:

    String.translate : (char -> string) -> string -> string
    String.tokens    : (char -> bool) -> string -> string list

The strategy for the lexer is to use String.translate to separate out characters that need to be dealt with in special ways, leaving characters that will ultimately stay together (e.g., in numbers and identifiers) intact. That is, the delimiters that are needed by String.tokens can be inserted by String.translate before the string is passed to String.tokens. The strings returned by String.tokens will still need some work, of course (e.g., distinguishing between < and <=, and between 23 and 23.4), but it's a good start.
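A minimal sketch of this two-pass idea follows. The character set in special and the two-character operators handled by glue are assumptions for illustration, not the actual token set of Simple:

```sml
val special = "+-*/()<>=;"

(* pad : char -> string
   Surround each special character with spaces so that String.tokens
   can split on whitespace; all other characters pass through. *)
fun pad c =
    if Char.contains special c then " " ^ String.str c ^ " "
    else String.str c

(* coarse : string -> string list
   First pass: split the translated string on whitespace.
   For example, coarse "x<=23" gives ["x", "<", "=", "23"]. *)
fun coarse s = String.tokens Char.isSpace (String.translate pad s)

(* glue : string list -> string list
   Second pass: reassemble two-character tokens such as "<=". *)
fun glue ("<" :: "=" :: rest) = "<=" :: glue rest
  | glue (">" :: "=" :: rest) = ">=" :: glue rest
  | glue (t :: rest)          = t :: glue rest
  | glue []                   = []
```

Here glue (coarse "x<=23") yields ["x", "<=", "23"]; a full solution would also handle the numeric cases (23 versus 23.4) in this second pass.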

General Comments

Be sure that you use good ML style throughout, pattern matching as appropriate and so on. Your program should be clearly written and should also be well documented. In addition to a brief description, be sure to include a type signature with each function.
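The expected documentation convention might look like this; isIdent is a hypothetical helper, shown only to illustrate the comment style (type signature plus a brief description):

```sml
(* isIdent : string -> bool
   True when s is a letter followed by zero or more
   letters or digits. *)
fun isIdent s =
    case String.explode s of
        []      => false
      | c :: cs => Char.isAlpha c andalso List.all Char.isAlphaNum cs
```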