Top-down (Predictive) Parsing

Top-down parsing creates a leftmost derivation, producing a parse tree in preorder (hence "top-down"). Top-down parser's job:

Given a sentential form in a leftmost derivation, find (predict) 
the next senetential form in that leftmost derivation.  Use one
token of lookahead.

Let

a,b,c = terminal
A,B,C = nonterminal
alpha, beta, etc = mixed string of terminals or nonterminals

For top-down parsing, the grammar must be such that you can predict what production to expand to given one character of lookahead -- such a grammar is said to be LL(1). Another way to ask this question is this: for a production A->alpha, what set of lookahead tokens predict this production (as opposed to another production for A)? This set is called Predict(A), and it can be computed as follows:

Predict: Production -> set of terminals
Predict(A->alpha) = First(alpha) (+ Follow(A) if alpha can derive the empty string)

First: String of terminals and/or nonterminals -> set of terminals
First(alpha) = the set of terminals that begin strings derived from alpha 

Follow: Nonterminal -> set of terminals
Follow(A) = the set of terminals that can appear immediately to the right of
A in some sentential form

IMPORTANT: If (1)A->alpha and (2)A->beta and Predict(1) intersect Predict(2) is not empty, then the grammar is not LL(1). How to make it LL(1)?

Making a grammar LL(1)

Usual problems:

Rule above violated (common prefix)
Left recursion>

To fix common prefix problem, need to left factor:

A -> alpha beta
A -> alpha gamma
becomes
A -> alpha B
B -> beta
B -> gamma

Example:

<stmt> -> if <exp> then <stmt> 
<stmt> -> if <exp> then <stmt> else <stmt>
becomes
<stmt> -> if <exp> then <stmt> <tail>
<tail> -> empty
<tail> -> else <stmt>

(Look how this would be different if it had an END IF at the end.)

To fix left recursion, note that after left factoring at most one production for a given nonterminal can be left-recursive. So must have
```
A -> A alpha
A -> beta_1
...
A -> beta_n
```
Create a new nonterminal T:
```
T -> alpha T | epislon

now

A -> beta_1 T 
A -> beta_2 T 
...
A -> beta_n T 
```
Why does this work? Because eventually one of the non-left-recursive productions will have to be applied for A, at which point the start of the derived string will be determined. So if choose A alpha three times, get (((A alpha)alpha)alpha). If the fourth time choose beta_1, now have (beta_1 alpha alpha alpha). Example:
```
<exp> -> <exp> + <term> | <term>
<term> -> <term> * <factor> | <factor>
<factor> -> <id>

Rewrite to remove left recursion:

<exp> -> <term> <exptail>
<exptail> -> + <term> <exptail>| epsilon

<term> -> <factor> <termtail>
<termtail> -> * <factor><termtail> | epsilon
```