including precedence, associativity,
error recovery, and interfacing to Lex.
123¯123¯123¯123¯123¯123
- The Ayacc specification file -
%token '(' ')' NUMBER IDENTIFIER NEW_LINE
%right '='
%left '+' '-'
%left '*' '/'
%right DUMMY
%nonassoc EXP
{
type key_type is (Cval, Ival, Empty);
type YYSType (Key : Key_Type := Empty) is
record
case Key is
when Cval = >
Register : Character;
when Ival = >
Value : Integer;
when Empty = >
null;
end case;
end record;
}
%%
statements : statements statement
|
;
statement : expr NEW_LINE
{ Put_Line(Integer'Image($1.value)); }
| error NEW_LINE
{ Put_Line("Try again");
YYErrOK;
}
statement
;
12345¯12¯1¯1¯123¯123
expr : IDENTIFIER '=' expr
{ registers($1.register) := $3.value;
$$ := (key = > ival, value = > $3.value); }
| expr '+' expr
{ $$ := (key = > ival, value = > $1.value + $3.value); }
| expr '-' expr
{ $$ := (key = > ival, value = > $1.value - $3.value); }
| expr '*' expr
{ $$ := (key = > ival, value = > $1.value * $3.value); }
| expr '/' expr
{ $$ := (key = > ival, value = > $1.value / $3.value); }
| expr EXP expr
{ $$ := (key = > ival,
value = > Integer(float($1.value) ** $3.value)); }
| '-' expr %prec DUMMY
{ $$ := (key = > ival, value = > - $2.value ); }
| '(' expr ')'
{ $$ := (key = > ival, value = > $2.value); }
| NUMBER
{ $$ := (key = > ival, value = > $1.value) ; }
| IDENTIFIER
{ $$ := (key = > ival, value = > registers($1.register)); }
;
%%
123¯123¯123¯123¯123
package Calculator is
procedure YYParse;
end Calculator;
with Calculator_Tokens,
Calculator_Shift_Reduce,
Calculator_Goto,
Text_IO;
use Calculator_Tokens,
Calculator_Shift_Reduce,
Calculator_Goto,
Text_IO;
package body Calculator is
Registers : array('A'..'Z') of Integer;
procedure YYError(Text : in String) is
begin
Put_Line(Text);
end;
##
end Calculator;
/* The Lex specification file */
%{
#include "calculator.h"
static char reg;
static int value;
%}
%%
[A-Z] { reg = yytext[0]; return IDENTIFIER; }
[a-z] { reg = yytext[0] - 'a' + 'A'; return IDENTIFIER; }
[0-9]+ { value = atoi(yytext); return NUMBER; }
"**" { return EXP; }
[()+/*=-] { return yytext[0]; }
\n { return NEW_LINE; }
[\t ]+ {}
%%
yywrap()
{
return 1;
}
get_token()
{
return yylex();
}
get_register()
{
return reg;
}
get_value()
{
return value;
}
123¯123¯123¯123¯123456789012345678901234¯
-A modified version of the C_Lex package - |
with Calculator_Tokens; use Calculator_Tokens;
package Calculator_C_Lex is
function YYLex return Token;
end Calculator_C_Lex;
package body Calculator_C_Lex is
function GET_TOKEN return Integer;
pragma INTERFACE(C, GET_TOKEN);
-These four declarations have been added
function get_register return Character;
function get_value return Integer;
pragma interface(c, get_register);
pragma interface(c, get_value);
type Table is array(0..255) of token;
Literals : constant Table := Table'( 0 = > END_OF_INPUT,
40 = > '(',
41 = > ')',
61 = > '=',
43 = > '+',
45 = > '-',
42 = > '*',
47 = > '/',
others = > ERROR);
-Continued on next page ...
123¯123¯123¯123¯123456789012345678901234¯
function YYLex return TOKEN is
X : Integer;
begin
X := GET_TOKEN;
if X > 255 then
-This case statement has been added to assign appropriate
-values to YYLVal.
case TOKEN'VAL(X-256) is
when number = >
YYLVal := (key = > ival, value = > get_value);
when identifier = >
YYLVal := (key = > cval, register = > get_register);
when others = > null;
end case;
return TOKEN'VAL(X-256);
else
return LITERALS(X);
end if;
end YYLex;
end Calculator_C_Lex;
B Using the Verbose and Debug Options
We will introduce the concept of an item to describe the output file
created by the Verbose option. An item consists of a rule with
an underscore at some position on the right side. The underscore denotes
the amount of input seen by the parser. For example, an item of the form
X : A B _ C
shows that the parser is examining input corresponding to the rule
X : A B C.
The underscore states that the parser has has already seen
A B
and is expecting input that will match
C.
Items provide a means of finitely representing the possible
legal inputs to the parser. Each state contains a set of items corresponding
to configurations indistinguishable to the parser. For technical reasons,
the set of items associated with a state fall into two categories termed
the kernel and closure. Users familiar with LR parsers
may be interested in both the kernel and closure items, however the
typical user need only be concerned with items in the closure.
The verbose file contains a list of the states along with their corresponding
item sets and parse actions. A sample verbose file for the simple
grammar,
%token id
%%
E : E '+' T
| T
;
T : T '*' id
| id
;
is shown below.
------------------
State 0
Kernel
( 0) $accept : _ E END_OF_INPUT
Closure
( 0) $accept : _ E END_OF_INPUT
( 1) E : _ E '+' T
( 2) E : _ T
( 3) T : _ T '*' ID
( 4) T : _ ID
T goto 2
E goto 1
ID shift 3
default error
------------------
State 1
Kernel
( 0) $accept : E _ END_OF_INPUT
( 1) E : E _ '+' T
Closure
( 0) $accept : E _ END_OF_INPUT
( 1) E : E _ '+' T
END_OF_INPUT accept
'+' shift 5
default error
------------------
State 2
Kernel
( 2) E : T _
( 3) T : T _ '*' ID
Closure
( 2) E : T _
( 3) T : T _ '*' ID
'*' shift 6
default reduce 2
------------------
State 3
Kernel
( 4) T : ID _
Closure
( 4) T : ID _
default reduce 4
------------------
State 4
Kernel
( 0) $accept : E END_OF_INPUT _
Closure
( 0) $accept : E END_OF_INPUT _
default error
------------------
State 5
Kernel
( 1) E : E '+' _ T
Closure
( 1) E : E '+' _ T
( 3) T : _ T '*' ID
( 4) T : _ ID
T goto 7
ID shift 3
default error
------------------
State 6
Kernel
( 3) T : T '*' _ ID
Closure
( 3) T : T '*' _ ID
ID shift 8
default error
------------------
State 7
Kernel
( 1) E : E '+' T _
( 3) T : T _ '*' ID
Closure
( 1) E : E '+' T _
( 3) T : T _ '*' ID
'*' shift 6
default reduce 1
------------------
State 8
Kernel
( 3) T : T '*' ID _
Closure
( 3) T : T '*' ID _
default reduce 3
Using the verbose file it is possible to trace how the parser will
process a string of tokens. For example the string `ID * ID + ID' would
be treated as follows:
State Stack Input
0 ID * ID + ID END_OF_INPUT
The verbose entry for state 0 shows that on a lookahead token of
ID the action is a shift and the new state is 3. Thus, the new
configuration of the parser becomes
State Stack Input
3
0 * ID + ID END_OF_INPUT
For state 3, there is no explicit action associated with the lookahead
token and therefore the default action is consulted. The default
action associated
with state 3 is a reduction by rule 4) T : ID. Recall that a reduction
consists of two steps. First state 3 is popped leaving state 0
on top of the stack. Next the parser determines a new state
by consulting the current top of
the stack and the right hand side of the rule. This new state is
signified by a
goto
entry in the current state. The verbose entry
for state 0 and symbol T is a goto to state 2 producing the
new configuration:
State Stack Input
2
0 * ID + ID END_OF_INPUT
The remaining configurations in the parse are shown below:
Shift 6
State Stack Input
6
2
0 ID + ID END_OF_INPUT
Shift 8
State Stack Input
8
6
2
0 + ID END_OF_INPUT
Default Reduce 3) T : T '*' ID
Pop 8 6 2
Goto state 2
State Stack Input
2
0 + ID END_OF_INPUT
Default Reduce 2) E : T
Pop 2
Goto state 1
State Stack Input
1
0 + ID END_OF_INPUT
Shift 5
State Stack Input
5
1
0 ID END_OF_INPUT
Shift 3
State Stack Input
3
5
1
0 END_OF_INPUT
Default reduce 4) T : ID
Pop 3
Goto 7
State Stack Input
7
5
1
0 END_OF_INPUT
Default reduce 1) E : E + T
Pop 7 3 5
Goto 1
State Stack Input
1
0 END_OF_INPUT
Accept END_OF_INPUT
Parse completes successfully
C Automatic Error Recovery
If the Error_Recovery command line parameter is set to On
then Ayacc will generate an extension for automated syntax error
correction. Note that the lexical analyzer must contain additional
functions which give the line number and column for each token. This
can be done by giving the -E option to Aflex.
Ayacc generates an additional file named
base_error_report.a and the user may specify another optional
section in the user specification file. Here is a description of that
section.
C.1 User Error-correction Messages section
This is a section for the user to supply routines if he/she wishes to
control the reporting of correction messages. Ayacc-extension supports
automatic correction of some syntactic errors, as explained later.
Along with each correction, a message is printed. In addition to the
default message printed, the user is able to report a message of
his/her own. Here is the spec of the message-reporting procedure which
is generated:
procedure Report_Continuable_Error(Line_Number : in Natural;
Offset : in Natural;
Finish : in Natural;
Message : in String;
Error : in Boolean);
Line_Number is the line at which the error occurred; Offset is the index
into the line of the start of the error; Finish is the index into the
line of the end of the error; Message is a string describing the error.
Error is true for all genuine syntax errors, when it is false it indicates
a syntax warning.
This section resembles the Token and Stack Element section syntactically
in that its entries begin with the '%' character. There are the options
available in this section:
- %with ....;
Generates a line 'With ....;' in the error report file (package body)
- %use ....;
Generates a line 'Use ....;' in the error report file (package body)
- %initialize_error_report
Following this line there should be the body of a no-argument procedure
which will be called once at the beginning of yyparse. It can be used to
initialize any data structures used by the user's error report.
- %terminate_error_report
Following this line there should be the body of a no-argument procedure
which will be called once at the end of yyparse. It can be used to close
any data ports utilized by the user's error reporting mechanism.
- %report_error
Following this line there should be the body of the procedure
Report_Continuable_Error described above (i.e. with those arguments).
The %with and %use lines, if present, should precede the others. An example
is shown below.
-- The other declarations would go up here.
%%
%with text_io;
%initialize_error_report
begin
text_io.put_line("Initializing Error Report...");
end;
%terminate_error_report
begin
text_io.put_line("Finishing Error Report...");
end;
%report_error
begin
text_io.put_line("Error at line" & natural'image(line_number)
& ": " & message);
end;
C.2 Change in running the Parser
If Error_Recovery is set to On, the generated parser has more power
in error recovery. Here is the additional power:
A run of YYParse will produce a listing file which records the parsed lines
of the input text and indicates where errors occurred. If the specification file
is named base.y, then the listing file will be named base.lis. The rest of the
output of the program is dependent on the action routines; as arbitrary code
these are free to perform output operations.
C.2.2 Error Recovery
When the Ayacc generated parser encounters a syntax error, it tries to
correct it. To correct the error it will try either to insert a legal
token before the error, to change the error to a legal token, or to
delete the error from the token stream. When a correction can be made,
a message is printed describing the correction. Even when it is
possible to correct an error, the correction will only be syntactic.
For example, say the user is parsing a Pascal program and the input is
missing a semicolon at the end of a statement. The parser may be able
to detect that and insert the semicolon. In all likelihood, the
semicolon has no semantic value, and the grammar rule in which it
appears would not reference it in its semantic action. However,
consider a case where the parser decides to insert an identifier token
in order to correct a syntax error. It is very likely that an action
routine for the grammar rule using the identifier would reference it
semantically, to find out what characters are in the string making up
the identifier. However, the parser has no way of knowing which
identifier would be best to insert at the point of insertion, so it
cannot provide this information. The action routine would therefore
probably make a mistake, since it relies on the semantic information
being present. If executing an action routine following a syntax error
raises an exception, YYParse handles the exception and stops
performing the code in action routines for the remainder of the parse.
Even before action routines are stopped, any actions following a
syntax error should not be trusted. To emphasize the abortive nature
of a run of YYParse with syntax error, the parser raises the exception
Syntax_Error at the end of an input with syntax errors, even if all
have been corrected. Here is a sample listing file from a two line
calculator program.
-----------------------------------------------------------------------
1 1 * 2 * 3 * 2 3
Error ^
token deleted
2 4 + 28 / 14 + 2
Ayacc.YYParse : 1 syntax error found.
-----------------------------------------------------------------------
In the listing file, non-blank lines of text are listed with their
line number at the left. The last token of the first line is a
syntactic error; evidently an infix notation was specified in this
grammar. Errors are indicated by a line that begins Ërror" and then
contains a caret character "^" underneath the start of the erroneous
token. YYParse can continue its parse if it deletes this token from
the token stream.
C.2.3 User Error Messages
As previously mentioned, in addition to the default messages produced during
error recovery, the user, in the last section of the specification file, can
provide routines for reporting error messages in his/her own way. Also, the
user is given an interface to those routines which he/she can use even when
there is no syntactic error as defined by the grammar rules. This is useful in
cases where there is input which parses properly but which "really" represents a
syntax error in the input file. For this the following package is provided:
package user_defined_errors is
procedure parser_error(Message : in String);
procedure parser_warning(Message : in String);
end user_defined_errors;
This package is automatically visible to code in the user's action
routines. Calling user_defined_errors.parser_error will increase the
count of total syntax errors and call the procedure
report_continuable_error, from the Üser Error-correction Messages"
section. The Message argument to parser_error becomes the Message
argument to
report_continuable_error, and the Error argument to
report_continuable_error is given the value True. The rest of the
arguments to report_continuable_error are taken from context.
User_defined_errors.parser_warning is similar, except that the Error
argument in the generated call to report_continuable_error is False.
It also increments a counter of syntax warnings rather than syntax
errors. There is an exception Syntax_Warning like Syntax_Error
mentioned above which will be raised if during the parse there is no
syntax errors but there are warnings. Note that the procedures from
package user_defined_errors can be called even if the procedure
report_continuable_error is not defined by the user; in that case
there will be no reporting of the Message, but the incrementing of the
proper counter will still take place.
D Differences between Yacc and Ayacc
Ayacc was modeled after Yacc and adheres to most of the conventions and
features of its C analogue. Most of the differences between the
two programs are minor, but some differences will make it difficult
to convert Yacc specifications into Ayacc
counterparts. Some of the most important differences
are listed below:
- Ayacc identifiers are case insensitive.
- Ayacc does not provide a feature analogous to the %union and %type
constructs of Yacc. At some sacrifice of convenience, similar
functionality may be obtained by declaring YYSType as a
variant record.
- Ayacc requires the user to define YYSType.
- There are no default actions in Ayacc .
- Ayacc does not support the old and discouraged features of
Yacc. In particular, %binary and %term are not allowed, actions
cannot be specified using the {
delimiter and all rules must end in a
semicolon. In addition, Ayacc uses braces rather than,
%{ and
%},
to denote declarations that should be written to the tokens package.
- In Ayacc the tokens are an enumeration type rather than integers.
- Ayacc does not permit escape characters to be entered as literals.
- Yacc generates a file containing the
parser and parse tables and another file containing the
macro definitions of the tokens declaration. Ayacc generates four
separate files corresponding to
the parser, the two parse tables, and the tokens package. In addition,
Ayacc can generate the parser as a procedure
or as a package depending on the user's specification
file.
E Ayacc Specification File Guidelines
The key to preparing efficient and readable Ayacc specification files is
to make each part of the specification distinguishable from the rest. This
appendix is provided to give the new Ayacc user suggestions on how to
prepare specification files which are:
- Efficient.
- Easy to read.
- Easy to modify.
Specification File Format
- Group grammar rules with a common left hand side together, and line up the
right hand sides with the `:', `|', and `;'. This enhances readability and
makes adding new rules or actions easier.
- Use upper case for Tokens returned by the lexical analyzer and lower
case for Non-Terminals of the grammar. This makes the distinction
between terminals and non-terminals very clear.
- Place grammar rules and their corresponding actions on separate
lines. This enhances readability and maintainability.
The rule fragment shown in Figure 10 defines the rules/actions for Ada
identifier lists and is intended to exemplify the guidelines
listed above.
identifier_list : IDENTIFIER
{
$$ := (Tag => Identifier_List,
List => Empty_List);
Insert (Item => $1,
Into => $$.List);
}
| identifier_list ',' IDENTIFIER
{
$$ := $1;
Insert (Item => $3,
Into => $$.List);
}
;
Figure 10. Identifier List Grammar and Actions |
F How the Parser Works
The parser generated by Ayacc belongs to the class of parsers technically
known as LALR(1). Although the use of Ayacc does not require extensive
knowledge of LALR grammars, an intuitive understanding of
how the parser works will help the user to resolve ambiguities in the
grammar specification and to write more efficient parsers.
The parser generated by Ayacc can be seen as a finite state
machine with a stack of states. The current state is always on the
top of the stack; initially the stack contains state 0. At any given
moment during the parse, the parser uses its current state and the
value of the next lookahead token (obtained by calling
YYLex)
to determine its next action. Based on the current state and
lookahead token, the parser performs one of four actions:
1234¯
ACCEPT
The parser accepts the grammar, completing the parse.
ERROR
The parser detects a syntax error and calls YYError. If no error recovery
is specified the parser aborts.
SHIFT
The parser uses the current state and the lookahead token to choose a new
state that is pushed onto the stack. The lookahead token is advanced
to the next token in the input.
REDUCE
Reduce actions occur when the parser recognizes the right hand side of a
rule. In a reduce action, the parser pops a state for every symbol on the
right hand side of the rule. The parser then uses the current state
uncovered by the succession of pops and the nonterminal on the left hand
side of the rule to choose a new state that is pushed onto the stack. The
lookahead token is unaffected by a reduce action.
G Porting Ayacc to Other Systems
Installing Ayacc
Ayacc was developed using the Verdix Ada compiler (version v04.06)
running under UNIX 4.2 BSD and enhanced using the Dec Ada Compiler
(version 1.2-15) running under VAX/VMS 4.3. If you are using a different
system, you may have to make a minor change to the Ayacc source.
Current Ayacc development is done using the SunAda (version 1.1i) on
Sun workstations. Ports done by users for many other compilers are
included in the distribution.
Reading arguments from the command line
The Verdix compiler uses a
library package U_Env
to provide C-like facilities for reading arguments from the command
line. If your machine uses a different mechanism for passing parameters to
the program, you will have to modify the subunit
Command_Line_Interface.Read_Command_Line
from the Command_Line_Interface package.