Top-down (Predictive) Parsing

Top-down parsing creates a leftmost derivation, producing a parse tree in preorder (hence "top-down"). Top-down parser's job:
Given a sentential form in a leftmost derivation, find (predict) 
the next senetential form in that leftmost derivation.  Use one
token of lookahead.
Let For top-down parsing, the grammar must be such that you can predict what production to expand to given one character of lookahead -- such a grammar is said to be LL(1). Another way to ask this question is this: for a production A->alpha, what set of lookahead tokens predict this production (as opposed to another production for A)? This set is called Predict(A), and it can be computed as follows:
Predict: Production -> set of terminals
Predict(A->alpha) = First(alpha) (+ Follow(A) if alpha can derive the empty string)

First: String of terminals and/or nonterminals -> set of terminals
First(alpha) = the set of terminals that begin strings derived from alpha 

Follow: Nonterminal -> set of terminals
Follow(A) = the set of terminals that can appear immediately to the right of
A in some sentential form

IMPORTANT: If (1)A->alpha and (2)A->beta and Predict(1) intersect Predict(2) is not empty, then the grammar is not LL(1). How to make it LL(1)?

Making a grammar LL(1)

Usual problems:
  1. Rule above violated (common prefix)
  2. Left recursion
  1. To fix common prefix problem, need to left factor:
    A -> alpha beta
    A -> alpha gamma
    becomes
    A -> alpha B
    B -> beta
    B -> gamma
    
    Example:
    <stmt> -> if <exp> then <stmt> 
    <stmt> -> if <exp> then <stmt> else <stmt>
    becomes
    <stmt> -> if <exp> then <stmt> <tail>
    <tail> -> empty
    <tail> -> else <stmt>
    
    (Look how this would be different if it had an END IF at the end.)

  2. To fix left recursion, note that after left factoring at most one production for a given nonterminal can be left-recursive. So must have
    A -> A alpha
    A -> beta_1
    ...
    A -> beta_n
    
    Create a new nonterminal T:
    T -> alpha T | epislon
    
    now
    
    A -> beta_1 T 
    A -> beta_2 T 
    ...
    A -> beta_n T 
    
    Why does this work? Because eventually one of the non-left-recursive productions will have to be applied for A, at which point the start of the derived string will be determined. So if choose A alpha three times, get (((A alpha)alpha)alpha). If the fourth time choose beta_1, now have (beta_1 alpha alpha alpha). Example:
    <exp> -> <exp> + <term> | <term>
    <term> -> <term> * <factor> | <factor>
    <factor> -> <id>
    
    Rewrite to remove left recursion:
    
    <exp> -> <term> <exptail>
    <exptail> -> + <term> <exptail>| epsilon
    
    <term> -> <factor> <termtail>
    <termtail> -> * <factor><termtail> | epsilon