Grammars - Columbus State University

Download Report

Transcript Grammars - Columbus State University

Grammars

CPSC 5135

Formal Definitions

• A symbol is a character. It represents an abstract entity that has no inherent meaning • Examples: a, A, 3, *, - ,=

Formal Definitions

• An alphabet is a finite set of symbols.

• Examples: A = { a, b, c } B = { 0, 1 }

Formal Definitions

• A string (or word) is a finite sequence of symbols from a given alphabet.

• Examples: S = { 0, 1 } is a alphabet 0, 1, 11010, 101, 111 are strings from S A = { a, b, c ,d } is an alphabet bad, cab, dab, d, aaaaa are strings from A

Formal Definitions

• A language is a set of strings from an alphabet.

• The set can be finite or infinite.

• Examples: A = { 0, 1} L1 = { 00, 01, 10, 11 } L2 = { 010, 0110, 01110,011110, …}

Formal Definitions

• A grammar is a quadruple G = (V, Σ, R, S) where 1) V is a finite set of variables (non-terminals), 2) Σ is a finite set of terminals, disjoint from V, 3) R is a finite set of rules. The left side of each rule is a string of one or more elements from V U Σ and whose right side is a string of 0 or more elements from V U Σ 4) S is an element of V and is called the start symbol

Formal Definitions

• Example grammar: • G = (V, Σ, R, S) V = { S, A } Σ = { a, b } R = { S → A → aA bA A → a }

Derivations

R = S → A → aA bA A → a • A derivation is a sequence of replacements , beginning with the start symbol, and replacing a substring matching the left side of a rule with the string on the right side of a rule S → aA → abA → abbA → abba

Derivations

• What strings can be generated from the following grammar?

S → aBa B → aBa B → b

Formal Definitions

• The language generated by a grammar is the set of all strings of terminal symbols which are derivable from S in 0 or more steps.

• What is the language generated by this grammar?

• S → a S → aB B → aB B → a

Kleene Closure

• Let Σ be a set of strings. Σ* is called the Kleene closure of Σ and represents the set of all concatenations of 0 or more strings in Σ.

• Examples Σ* = { 1 }* = { ø, 1, 11, 111, 1111, …} Σ* = { 01 }* = { ø, 01, 0101, 010101, …} Σ* = { 0 + 1 }* = set of all possible strings of 0’s and 1’s. (+ means union)

Formal Definitions

• A grammar G = (V,Σ, R, S) is right-linear if all rules are of the form: A → xB A → x where A, B ε V and x ε Σ*

Right-linear Grammar

• G = { V, Σ, R, S } V = { S, B } Σ = { a, b } R = { S → aS , S → B , B → bB , B → ε } What language is generated?

Formal Definitions

• A grammar G = (V,Σ, R, S) is left-linear if all rules are of the form: A → Bx A → x where A, B ε V and x ε Σ*

Formal Definitions

• A regular grammar is one that is either right or left linear.

• Let Q be a finite set and let Σ be a finite set of symbols. Also let δ be a function from Q x Σ to Q, let q 0 be a state in Q and let A be a subset of Q. We call each element of Q a state, δ the transition function, q 0 the initial state and A the set of accepting states. Then a deterministic finite automaton (DFA) is a 5 tuple < Q , Σ , q 0 , δ , A > • Every regular grammar is equivalent to a DFA

Language Definition

• Recognition – a machine is constructed that reads a string and pronounces whether the string is in the language or not. (Compiler) • Generation – a device is created to generate strings that belong to the language. (Grammar)

Chomsky Hierarchy

• Noam Chomsky (1950’s) described 4 classes of grammars 1) Type 0 – unrestricted grammars 2) Type 1 – Context sensitive grammars 3) Type 2 – Context free grammars 4) Type 3 – Regular grammars

Grammars

• Context-free and regular grammars have application in computing • Context-free grammar – each rule or production has a left side consisting of a single non-terminal

Backus-Naur form (BNF)

• BNF was used to describe programming language syntax and is similar to Chomsky’s context free grammars • A meta-language is a language used to describe another language • BNF is a meta-language for computer languages

BNF

• Consists of nonterminal symbols, terminal symbols (lexemes and tokens), and rules or productions • → if then → if then else → if then | if then else

A Small Grammar

 begin end | ; =  A | B | C + | - |

A Derivation

 begin end  begin end  begin = end  begin A = end  begin A = + end  begin A = B + end  begin A = B + C end

Terms

• Each of the strings in a derivation is called a sentential form.

• If the leftmost non-terminal is always the one selected for replacement, the derivation is a leftmost derivation.

• Derivations can be leftmost, rightmost, or neither • Derivation order has no effect on the language generated by the grammar

Derivations Yield Parse Trees

  begin end begin end  begin = end  begin A = end  begin A = + end  begin A = B + end  begin A = B + C end begin end = A + B C

Parse Trees

• Parse trees describe the hierarchical structure of the sentences of the language they define.

• A grammar that generates a sentence for which there are two or more distinct parse trees is ambiguous.

An Ambiguous Grammar

=  A | B | C + | * | ( ) |

Two Parse Trees – Same Sentence

= A + * B C A = A * + A B C

Derivation 1

=  A =  A = +  A = +  A = B +  A = B + *  A = B + *  A = B + C *  A = B + C *  A = B + C * A

Derivation 2

=  A =  A = *  A = + *  A = + *  A = B + *  A = B + *  A = B + C *  A = B + C *  A = B + C * A

Ambiguity

• Parse trees are used to determine the semantics of a sentence • Ambiguous grammars lead to semantic ambiguity - this is intolerable in a computer language • Often, ambiguity in a grammar can be removed

Unambiguous Grammar

=  A | B | C + | * |  ( ) | • This grammar makes multiplication take precedence over addition

Associativity of Operators

=  A | B | C + | * |  ( ) | Addition operators associate from left to right = A + + A C B

BNF

• A BNF rule that has its left hand side appearing at the beginning of its right hand side is left recursive .

• Left recursion specifies left associativity • Right recursion is usually used for associating exponetiation operators ** |  ( ) |

Ambiguous If Grammar

 if then | if then else • Consider the sentential form: if then if then else

Parse Trees for an If Statement

If then else If then if then if then else

Unambiguous Grammar for If Statements

|  if then else | any non-if statement  if then | if then else

Extended BNF (EBNF)

• Optional part denoted by […]  if ( ) [ else ] • Braces used to indicate the enclosed part can be repeated indefinitely or left out { , } • Multiple choice options are put in parentheses and separated by the or operator |  for := (to | downto) do

BNF vs EBNF for Expressions

BNF: + | - | * | / | EBNF: { (+ | - ) } { ( * | / )