## Top-down (Predictive) Parsing

Top-down parsing creates a leftmost derivation, producing a parse tree in preorder (hence "top-down"). Top-down parser's job:
```Given a sentential form in a leftmost derivation, find (predict)
the next senetential form in that leftmost derivation.  Use one
```
Let
• a,b,c = terminal
• A,B,C = nonterminal
• alpha, beta, etc = mixed string of terminals or nonterminals
For top-down parsing, the grammar must be such that you can predict what production to expand to given one character of lookahead -- such a grammar is said to be LL(1). Another way to ask this question is this: for a production A->alpha, what set of lookahead tokens predict this production (as opposed to another production for A)? This set is called Predict(A), and it can be computed as follows:
```Predict: Production -> set of terminals
Predict(A->alpha) = First(alpha) (+ Follow(A) if alpha can derive the empty string)

First: String of terminals and/or nonterminals -> set of terminals
First(alpha) = the set of terminals that begin strings derived from alpha

Follow: Nonterminal -> set of terminals
Follow(A) = the set of terminals that can appear immediately to the right of
A in some sentential form

```
IMPORTANT: If (1)A->alpha and (2)A->beta and Predict(1) intersect Predict(2) is not empty, then the grammar is not LL(1). How to make it LL(1)?

### Making a grammar LL(1)

Usual problems:
1. Rule above violated (common prefix)
2. Left recursion
1. To fix common prefix problem, need to left factor:
```A -> alpha beta
A -> alpha gamma
becomes
A -> alpha B
B -> beta
B -> gamma
```
Example:
```<stmt> -> if <exp> then <stmt>
<stmt> -> if <exp> then <stmt> else <stmt>
becomes
<stmt> -> if <exp> then <stmt> <tail>
<tail> -> empty
<tail> -> else <stmt>
```
(Look how this would be different if it had an END IF at the end.)

2. To fix left recursion, note that after left factoring at most one production for a given nonterminal can be left-recursive. So must have
```A -> A alpha
A -> beta_1
...
A -> beta_n
```
Create a new nonterminal T:
```T -> alpha T | epislon

now

A -> beta_1 T
A -> beta_2 T
...
A -> beta_n T
```
Why does this work? Because eventually one of the non-left-recursive productions will have to be applied for A, at which point the start of the derived string will be determined. So if choose A alpha three times, get (((A alpha)alpha)alpha). If the fourth time choose beta_1, now have (beta_1 alpha alpha alpha). Example:
```<exp> -> <exp> + <term> | <term>
<term> -> <term> * <factor> | <factor>
<factor> -> <id>

Rewrite to remove left recursion:

<exp> -> <term> <exptail>
<exptail> -> + <term> <exptail>| epsilon

<term> -> <factor> <termtail>
<termtail> -> * <factor><termtail> | epsilon
```