Context free grammars and predictive parsing
Transcrição
Context free grammars and predictive parsing
Context free grammars and predictive parsing Programming Language Concepts and Implementation Fall 2012, Lecture 6 Rasmus Ejlers Møgelberg Overview • • • • • • Context free grammars Describing programming language syntax Ambiguities and eliminating these The parser generator coco/R Predictive parsing: Under the hood of coco/R Next week: LR parsing Rasmus Ejlers Møgelberg 2 Context free grammars • An example and a derivation Expr = | | | • • Expr + Expr Expr * Expr (Expr) num Expr => Expr + Expr => Expr + Expr * Expr => 2 + 3*4 Think of it asGrammar regular + recursion 3.1 expressions is an example of a grammar for straight-line programs. The start symbol is S Terminology: - (when the start symbol is not written explicitly it is conventional to assume that the left-hand nonterminal in the first production is the start symbol). The terminal symbols are id print num, + ( ) := ; 1 non-terminal Expr GRAMMAR 3.1: A syntax for straight-line programs. 5 terminals (tokens): 1. S ! S; +, S *, (, ), num 2. S ! id := E 4 productions (right hand sides) 3. S ! print (L) 4. E ! id 5. E ! num 6. E ! E + E 7. E ! (S, E) 8. L ! E 9. L ! L, E Terminals and nonterminals collectively are symbols and the nonterminals are S, E, and L. One sentence in the language of this grammar is Rasmus Ejlers Møgelberg id := num; id := id + (id := num + num, id) 3 where the source text (before lexical analysis) might have been a : = 7; b : = c + (d : = 5 + 6, d) The token-types (terminal symbols) are id, num, :=, and so on; the names (a,b,c,d) and numbers (7, 5, 6) are semantic values associated with some of the tokens. • DERIVATIONS Another example Straight line programs (from To show that this sentencebook) is in the language of the grammar, we can perform a derivation: S = | | E = | | | L = | S;S id := E print(L) id num E + E (S,E) E L,E Start with the start symbol, then repeatedly replace any nonterminal by one of its right-hand sides, as shown in Derivation 3.2. DERIVATION 3.2 ! ! ! ! ! ! ! ! ! ! ! ! ! S S;S S ; id := E id := E; id := E id := num ; id := E id := num ; id := E + E id := num ; id := E + (S, E) id := num ; id := id + (S, E) id := num ; id := id + (id := E, E) id := num ; id := id + (id := E + E, E) id := num ; id := id + (id := E + E, id ) id := num ; id := id + (id := num + E, id) id := num ; id := id + (id := num + num, id) Rasmus Ejlers Møgelberg 4 Official definition • • • A context free grammar consists of - A finite set of nonterminals A finite set of terminals A finite set of productions A choice of start symbol (a non-terminal) A production consists of - A nonterminal (called the left hand side) A string of symbols (terminals or nonterminals) This is called Backus-Naur Form (BNF) Rasmus Ejlers Møgelberg 5 Example: Mini Java • From MCIJ (note mixed notation) Rasmus Ejlers Møgelberg 6 SQL specification (in extended BNF) ... <query specification> ::= ! ! SELECT [ <set quantifier> ] <select list> <table expression> <select list> ::= ! ! <asterisk> ! |! <select sublist> [ { <comma> <select sublist> }... ] <select sublist> ::= <derived column> | <qualifier> <period> <asterisk> <derived column> ::= <value expression> [ <as clause> ] <as clause> ::= [ AS ] <column name> <table ! ! ! ! ! ! ! ! expression> ::= <from clause> [ <where clause> ] [ <group by clause> ] [ <having clause> ] http://savage.net.au/ SQL/sql-92.bnf <from clause> ::= FROM <table reference> [ { <comma> <table reference> }... ] ... Rasmus Ejlers Møgelberg Ambiguity 7 Ambiguity Expr = | | | Expr + Expr Expr * Expr (Expr) num Expr Expr Expr + Expr 2 Expr ⇤ 3 Expr Expr 4 2 Expr => Expr + Expr => Expr + Expr * Expr => 2 + 3*4 Expr ⇤ Expr + Expr 4 3 Expr => Expr * Expr => Expr + Expr * Expr => 2 + 3*4 Rasmus Ejlers Møgelberg 9 Encoding operator precedence • Multiplication has higher precedence (binds stronger) than addition • One nonterminal per precedence level Expr = Expr + Expr | Term • Term = Term * Term | (Expr) | num Exercise: - How many ways can you parse 2+3*4? How about 2 + 3 + 4? Rasmus Ejlers Møgelberg 10 Ambiguity and associativity Expr = Expr - Expr | num Expr Expr 5 • Expr Expr Expr Expr Expr Expr Expr Expr 3 2 5 3 2 Forcing left associativity Expr = Expr - num | num Rasmus Ejlers Møgelberg 11 Exercise • What ambiguities exist in the following grammar, and how do we get rid of them? Expr = | | | | | Expr + Expr * Expr Expr / (Expr) num Expr Expr Expr Expr Rasmus Ejlers Møgelberg 12 Exercise • • • What ambiguities exist in the following grammar, and how do we get rid of them? Expr = | | | | | Expr + Expr * Expr Expr / (Expr) num Expr Expr Expr Expr * and / have higher precedence than -,+ All operators associate to the left, e.g., - 6-3-2 = (6-3)-2 ≠ 6-(3-2) 6/3*2 = (6/3)*2 ≠ 6/(3*2) 6-3+2 = (6-3)+2 ≠ 6-(3+2) Rasmus Ejlers Møgelberg 13 Exercise • • • Encoding operator precedence Expr = | | | | | Expr + Expr * Expr Expr / (Expr) num Expr Expr Expr Expr Expr = | | Term = | | | Expr + Expr Term Term * Term / (Expr) num Expr Expr Expr = | | Term = | | Prim = | Expr + Expr Term Term * Term / Prim (Expr) num Term Term Term Term Use one non-terminal per precedence level Encoding associativity Expr = | | Term = | | | | | Expr + Expr Term Term * Term * Term / Term / (Expr) num Term Term num (Expr) num (Expr) or(better) Rasmus Ejlers Møgelberg Prim Prim 14 Associativity of operators • Most binary operators are left associative, e.g., +, -, *, / • Few are right associative, e.g. = in C: x = y = 2 parsed as x = (y = 2) • Forcing right associativity Expr = ident ‘=’ Expr | ident | num • Some are non-associative, e.g., 1<2<3 is not legal LogExpr = Expr < Expr Expr = ... Rasmus Ejlers Møgelberg 15 Ambiguity: Dangling else • • Consider the grammar Stmt = if Expr then Stmt else Stmt | if Expr then Stmt | id = Expr Amguity: How to parse if Expr then if Expr then id = Expr else id = Expr • ? Rasmus Ejlers Møgelberg 16 Ambiguity: Dangling else • • Consider the grammar Stmt = if Expr then Stmt else Stmt | if Expr then Stmt | id = Expr Amguity: How to parse if Expr then if Expr then id = Expr else id = Expr • Resolving the ambiguity Stmt = Matched_Stmt | Unmatched_Stmt Better to handle this using parser tricks. See later Matched_Stmt = if Expr then Matched_Stmt else Matched_Stmt | id = Expr Unmatched_Stmt = if Expr then Stmt | if Expr then Matched_Stmt else Unmatched_Stmt Rasmus Ejlers Møgelberg The parser generator Coco/R 17 Extended BNF • Example Expr = Term { ‘+’ Term | ‘-’ Term } Term = num {‘*’ num} • • Extra symbols - {α} means zero, one or many α [α] means zero or one α (α) is used for grouping EBNF is no more expressive than BNF, only more convenient Rasmus Ejlers Møgelberg 19 Using coco/R COMPILER Expressions ... PRODUCTIONS /*-------------------------------------------------------------------*/ Expr = Term { '+' Term | '-' Term }. Term = number { '*' number }. Expressions = Expr. Specification of start symbol END Expressions. Rasmus Ejlers Møgelberg 20 Using coco/R Rasmus Ejlers Møgelberg 21 Semantic actions in coco/R COMPILER Expressions public int res; ... PRODUCTIONS /*-------------------------------------------------------------------*/ Expr<out int n> (. int n1, n2; .) = Term<out n1> (. n = n1; .) { '+' Term<out n2> (. n = n+n2; .) | '-' Term<out n2> (. n = n-n2; .) } . Term<out int n> = number { '*' number } . Expressions = Expr<out n> . END Expressions. (. n = Convert.ToInt32(t.val); .) (. n = n*Convert.ToInt32(t.val); .) (. int n; .) (. res = n; .) Rasmus Ejlers Møgelberg 22 The generated parser • In resulting Parser.cs Pass by reference, similar to ref void Expr(out int n) { ! int n1, n2; ! Term(out n1); Method for ! n = n1; parsing! while (la.kind == 3 || la.kind == 4) { ! ! if (la.kind == 3) { expressions ! ! ! Get(); ! ! ! Term(out n2); ! ! ! n = n+n2; ! ! } else { ! ! ! Get(); ! ! ! Term(out n2); ! ! ! n = n-n2; ! ! } ! } } If next token is ‘+’ Rasmus Ejlers Møgelberg 23 Using coco/R with semantic actions Rasmus Ejlers Møgelberg 24 Suppose S is the start symbol of a grammar. To indicate that $ must come after a complete Sphrase, we augment the grammar with a new start symbol S! and a new production S! " S$. Predictive parsing In Grammar 3.8, E is the start symbol, so an augmented grammar is Grammar 3.10. Top-down parsing method aka LL-parsing •GRAMMAR 3.10 a production based on the next token • Guess S"E$ T"T*F generates LL parsers • coco/R E"E+T T"T/F derivations E " E # left-most T T"F • Produces E"T Example grammar 3.11 • ! ! ! ! ! ! ! ! F " num ! ! ! ! F " (E) ! 3.2 PREDICTIVE PARSING S = | | L = | E = | F " id if E then S else S begin S L print E end ; S L num ident Some grammars are easy to parse using a simple algorithm known as recursive descent. In essence, each grammar production turns into one clause of a recursive function. We illustrate Example parsing on board this by writing a recursive-descent parser for Grammar 3.11. • GRAMMAR 3.11 ! Rasmus S " if E then S else S Ejlers Møgelberg ! S " begin S L ! ! S " print E ! L " end ! ! L";SL ! E " num = num 25 Parser implementation A recursive-descent parser for this language has one function for each nonterminal and one clause for each production. final int IF=1, THEN=2, ELSE=3, BEGIN=4, END=5, PRINT=6, SEMI=7, NUM=8, EQ=9; int tok = getToken(); void advance() {tok=getToken();} void eat(int t) {if (tok==t) advance(); else error();} void S() {switch(tok) { case IF: eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break; case BEGIN: eat(BEGIN); S(); L(); break; case PRINT: eat(PRINT); E(); break; default: error(); }} void L() {switch(tok) { case END: eat(END); break; case SEMI: eat(SEMI); S(); L(); break; default: error(); 47 Rasmus Ejlers Møgelberg 26 Parsing table | S | L | E ---------------------------------------------------if | S->if E then S else S | | begin | S->begin S L | | print | S->print E | | end | | L->end | ; | | L->;S L | num | | | E->num ident | | | E->ident S = | | L = | E = | if E then S else S begin S L print E end ; S L num ident Rasmus Ejlers Møgelberg 27 Intended learning outcomes • • • Construct grammars for programming languages Eliminate ambiguity by - Encoding operator precedence Encoding operator associativity Use coco/R to create parsers and lexers Rasmus Ejlers Møgelberg 28
Documentos relacionados
O que há de novo no C# 4 e no Visual Basic 10 (1,08
object res = calcType.InvokeMember("Add", BindingFlags.InvokeMethod, null, new object[] { 10, 20 }); int sum = Convert.ToInt32(res);
Leia mais