Context free grammars and predictive parsing

Transcrição

Context free grammars and predictive parsing
Context free grammars
and predictive parsing
Programming Language Concepts and Implementation
Fall 2012, Lecture 6
Rasmus Ejlers Møgelberg
Overview
•
•
•
•
•
•
Context free grammars
Describing programming language syntax
Ambiguities and eliminating these
The parser generator coco/R
Predictive parsing: Under the hood of coco/R
Next week: LR parsing
Rasmus Ejlers Møgelberg
2
Context free grammars
•
An example and a derivation
Expr =
|
|
|
•
•
Expr + Expr
Expr * Expr
(Expr)
num
Expr
=> Expr + Expr
=> Expr + Expr * Expr
=> 2 + 3*4
Think of it asGrammar
regular
+ recursion
3.1 expressions
is an example of a grammar
for straight-line programs. The start symbol is S
Terminology:
-
(when the start symbol is not written explicitly it is conventional to assume that the left-hand
nonterminal in the first production is the start symbol). The terminal symbols are
id print num, + ( ) := ;
1 non-terminal
Expr
GRAMMAR
3.1: A syntax for straight-line programs.
5 terminals (tokens):
1. S ! S; +,
S *, (, ), num
2. S ! id := E
4 productions (right hand sides)
3. S ! print (L)
4. E ! id
5. E ! num
6. E ! E + E
7. E ! (S, E)
8. L ! E
9. L ! L, E
Terminals and nonterminals collectively are symbols
and the nonterminals are S, E, and L. One sentence in the language of this grammar is
Rasmus Ejlers Møgelberg
id := num; id := id + (id := num + num, id)
3
where the source text (before lexical analysis) might have been
a : = 7;
b : = c + (d : = 5 + 6, d)
The token-types (terminal symbols) are id, num, :=, and so on; the names (a,b,c,d) and
numbers (7, 5, 6) are semantic values associated with some of the tokens.
•
DERIVATIONS
Another example
Straight line programs
(from
To show that this
sentencebook)
is in the language of the grammar, we can perform a derivation:
S =
|
|
E =
|
|
|
L =
|
S;S
id := E
print(L)
id
num
E + E
(S,E)
E
L,E
Start with the start symbol, then repeatedly replace any nonterminal by one of its right-hand
sides, as shown in Derivation 3.2.
DERIVATION 3.2
!
!
!
!
!
!
!
!
!
!
!
!
!
S
S;S
S ; id := E
id := E; id := E
id := num ; id := E
id := num ; id := E + E
id := num ; id := E + (S, E)
id := num ; id := id + (S, E)
id := num ; id := id + (id := E, E)
id := num ; id := id + (id := E + E, E)
id := num ; id := id + (id := E + E, id )
id := num ; id := id + (id := num + E, id)
id := num ; id := id + (id := num + num, id)
Rasmus Ejlers Møgelberg
4
Official definition
•
•
•
A context free grammar consists of
-
A finite set of nonterminals
A finite set of terminals
A finite set of productions
A choice of start symbol (a non-terminal)
A production consists of
-
A nonterminal (called the left hand side)
A string of symbols (terminals or nonterminals)
This is called Backus-Naur Form (BNF)
Rasmus Ejlers Møgelberg
5
Example: Mini Java
•
From MCIJ (note mixed notation)
Rasmus Ejlers Møgelberg
6
SQL specification (in extended BNF)
...
<query specification> ::=
! ! SELECT [ <set quantifier> ] <select list> <table expression>
<select list> ::=
! ! <asterisk>
! |! <select sublist> [ { <comma> <select sublist> }... ]
<select sublist> ::= <derived column> | <qualifier> <period> <asterisk>
<derived column> ::= <value expression> [ <as clause> ]
<as clause> ::= [ AS ] <column name>
<table
! !
! !
! !
! !
expression> ::=
<from clause>
[ <where clause> ]
[ <group by clause> ]
[ <having clause> ]
http://savage.net.au/
SQL/sql-92.bnf
<from clause> ::= FROM <table reference> [ { <comma> <table
reference> }... ]
...
Rasmus Ejlers Møgelberg
Ambiguity
7
Ambiguity
Expr =
|
|
|
Expr + Expr
Expr * Expr
(Expr)
num
Expr
Expr
Expr
+
Expr
2
Expr
⇤
3
Expr
Expr
4
2
Expr
=> Expr + Expr
=> Expr + Expr * Expr
=> 2 + 3*4
Expr
⇤
Expr
+
Expr
4
3
Expr
=> Expr * Expr
=> Expr + Expr * Expr
=> 2 + 3*4
Rasmus Ejlers Møgelberg
9
Encoding operator precedence
•
Multiplication has higher precedence (binds
stronger) than addition
•
One nonterminal per precedence level
Expr = Expr + Expr
| Term
•
Term = Term * Term
| (Expr)
| num
Exercise:
-
How many ways can you parse 2+3*4?
How about 2 + 3 + 4?
Rasmus Ejlers Møgelberg
10
Ambiguity and associativity
Expr = Expr - Expr
| num
Expr
Expr
5
•
Expr
Expr
Expr
Expr
Expr
Expr
Expr
Expr
3
2
5
3
2
Forcing left associativity
Expr = Expr - num
| num
Rasmus Ejlers Møgelberg
11
Exercise
•
What ambiguities exist in the following grammar,
and how do we get rid of them?
Expr =
|
|
|
|
|
Expr +
Expr *
Expr Expr /
(Expr)
num
Expr
Expr
Expr
Expr
Rasmus Ejlers Møgelberg
12
Exercise
•
•
•
What ambiguities exist in the following grammar,
and how do we get rid of them?
Expr =
|
|
|
|
|
Expr +
Expr *
Expr Expr /
(Expr)
num
Expr
Expr
Expr
Expr
* and / have higher precedence than -,+
All operators associate to the left, e.g.,
-
6-3-2 = (6-3)-2 ≠ 6-(3-2)
6/3*2 = (6/3)*2 ≠ 6/(3*2)
6-3+2 = (6-3)+2 ≠ 6-(3+2)
Rasmus Ejlers Møgelberg
13
Exercise
•
•
•
Encoding operator precedence
Expr =
|
|
|
|
|
Expr +
Expr *
Expr Expr /
(Expr)
num
Expr
Expr
Expr
Expr
Expr =
|
|
Term =
|
|
|
Expr +
Expr Term
Term *
Term /
(Expr)
num
Expr
Expr
Expr =
|
|
Term =
|
|
Prim =
|
Expr +
Expr Term
Term *
Term /
Prim
(Expr)
num
Term
Term
Term
Term
Use one non-terminal per precedence level
Encoding associativity
Expr =
|
|
Term =
|
|
|
|
|
Expr +
Expr Term
Term *
Term *
Term /
Term /
(Expr)
num
Term
Term
num
(Expr)
num
(Expr)
or(better)
Rasmus Ejlers Møgelberg
Prim
Prim
14
Associativity of operators
•
Most binary operators are left associative, e.g., +, -,
*, /
•
Few are right associative, e.g. = in C: x = y = 2
parsed as x = (y = 2)
•
Forcing right associativity
Expr = ident ‘=’ Expr
| ident
| num
•
Some are non-associative, e.g., 1<2<3 is not legal
LogExpr = Expr < Expr
Expr = ...
Rasmus Ejlers Møgelberg
15
Ambiguity: Dangling else
•
•
Consider the grammar
Stmt = if Expr then Stmt else Stmt
| if Expr then Stmt
| id = Expr
Amguity: How to parse
if Expr then if Expr then id = Expr else id = Expr
•
?
Rasmus Ejlers Møgelberg
16
Ambiguity: Dangling else
•
•
Consider the grammar
Stmt = if Expr then Stmt else Stmt
| if Expr then Stmt
| id = Expr
Amguity: How to parse
if Expr then if Expr then id = Expr else id = Expr
•
Resolving the ambiguity
Stmt = Matched_Stmt
| Unmatched_Stmt
Better to handle
this using parser
tricks. See later
Matched_Stmt = if Expr then Matched_Stmt else Matched_Stmt
| id = Expr
Unmatched_Stmt = if Expr then Stmt
| if Expr then Matched_Stmt else Unmatched_Stmt
Rasmus Ejlers Møgelberg
The parser generator
Coco/R
17
Extended BNF
•
Example
Expr = Term
{
‘+’ Term
| ‘-’ Term
}
Term = num {‘*’ num}
•
•
Extra symbols
-
{α} means zero, one or many α
[α] means zero or one α
(α) is used for grouping
EBNF is no more expressive than BNF, only more
convenient
Rasmus Ejlers Møgelberg
19
Using coco/R
COMPILER Expressions
...
PRODUCTIONS
/*-------------------------------------------------------------------*/
Expr
= Term
{
'+' Term
|
'-' Term
}.
Term
= number
{
'*' number
}.
Expressions
= Expr.
Specification of start
symbol
END Expressions.
Rasmus Ejlers Møgelberg
20
Using coco/R
Rasmus Ejlers Møgelberg
21
Semantic actions in coco/R
COMPILER Expressions
public int res;
...
PRODUCTIONS
/*-------------------------------------------------------------------*/
Expr<out int n>
(. int n1, n2; .)
= Term<out n1>
(. n = n1; .)
{
'+' Term<out n2> (. n = n+n2; .)
|
'-' Term<out n2> (. n = n-n2; .)
}
.
Term<out int n>
= number
{
'*' number
}
.
Expressions
= Expr<out n>
.
END Expressions.
(. n = Convert.ToInt32(t.val); .)
(. n = n*Convert.ToInt32(t.val); .)
(. int n; .)
(. res = n; .)
Rasmus Ejlers Møgelberg
22
The generated parser
•
In resulting Parser.cs
Pass by reference,
similar to ref
void Expr(out int n) {
!
int n1, n2;
!
Term(out n1);
Method for
!
n = n1;
parsing! while (la.kind == 3 || la.kind == 4) {
!
!
if (la.kind == 3) {
expressions
!
!
!
Get();
!
!
!
Term(out n2);
!
!
!
n = n+n2;
!
!
} else {
!
!
!
Get();
!
!
!
Term(out n2);
!
!
!
n = n-n2;
!
!
}
!
}
}
If next token is ‘+’
Rasmus Ejlers Møgelberg
23
Using coco/R with semantic actions
Rasmus Ejlers Møgelberg
24
Suppose S is the start symbol of a grammar. To indicate that $ must come after a complete Sphrase, we augment the grammar with a new start symbol S! and a new production S! " S$.
Predictive parsing
In Grammar 3.8, E is the start symbol, so an augmented grammar is Grammar 3.10.
Top-down parsing method aka LL-parsing
•GRAMMAR
3.10
a production based on the next token
• Guess
S"E$
T"T*F
generates LL parsers
• coco/R
E"E+T
T"T/F
derivations
E " E # left-most
T
T"F
• Produces
E"T
Example
grammar 3.11
•
!
!
!
!
!
!
!
!
F " num
!
!
!
!
F " (E)
!
3.2 PREDICTIVE PARSING
S =
|
|
L =
|
E =
|
F " id
if E then S else S
begin S L
print E
end
; S L
num
ident
Some grammars are easy to parse using a simple algorithm known as recursive descent. In
essence,
each grammar
production
turns into one clause of a recursive function. We illustrate
Example
parsing
on board
this by writing a recursive-descent parser for Grammar 3.11.
•
GRAMMAR 3.11
!
Rasmus
S " if E then S else
S Ejlers Møgelberg
!
S " begin S L
!
!
S " print E
!
L " end
!
!
L";SL
!
E " num = num
25
Parser implementation
A recursive-descent parser for this language has one function for each nonterminal and one
clause for each production.
final int IF=1, THEN=2, ELSE=3, BEGIN=4, END=5, PRINT=6,
SEMI=7, NUM=8, EQ=9;
int tok = getToken();
void advance() {tok=getToken();}
void eat(int t) {if (tok==t) advance(); else error();}
void S() {switch(tok) {
case IF: eat(IF); E(); eat(THEN); S();
eat(ELSE); S(); break;
case BEGIN: eat(BEGIN); S(); L(); break;
case PRINT: eat(PRINT); E(); break;
default: error();
}}
void L() {switch(tok) {
case END: eat(END); break;
case SEMI: eat(SEMI); S(); L(); break;
default: error();
47
Rasmus Ejlers Møgelberg
26
Parsing table
|
S
|
L
|
E
---------------------------------------------------if
| S->if E then S else S |
|
begin | S->begin S L
|
|
print | S->print E
|
|
end
|
| L->end |
;
|
| L->;S L |
num
|
|
| E->num
ident |
|
| E->ident
S =
|
|
L =
|
E =
|
if E then S else S
begin S L
print E
end
; S L
num
ident
Rasmus Ejlers Møgelberg
27
Intended learning outcomes
•
•
•
Construct grammars for programming languages
Eliminate ambiguity by
-
Encoding operator precedence
Encoding operator associativity
Use coco/R to create parsers and lexers
Rasmus Ejlers Møgelberg
28

Documentos relacionados

O que há de novo no C# 4 e no Visual Basic 10 (1,08

O que há de novo no C# 4 e no Visual Basic 10 (1,08 object res = calcType.InvokeMember("Add", BindingFlags.InvokeMethod, null, new object[] { 10, 20 }); int sum = Convert.ToInt32(res);

Leia mais