Yacc
NAME
yacc - Generates an LR(1) parsing program from input consisting of a
context-free grammar specification
SYNOPSIS
yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix] [-P pathname]
grammar
STANDARDS
Interfaces documented on this reference page conform to industry standards
as follows:
yacc: XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more information about
industry standards and associated tags.
OPTIONS
-b prefix
Uses prefix instead of y as the prefix for all output filenames
(prefix.tab.c, prefix.tab.h, and prefix.output).
-d
Produces the <y.tab.h> file, which contains the #define statements that
associate the yacc-assigned token codes with your token names. This
allows source files other than y.tab.c to access the token codes by
including this header file.
-l
Does not include any #line constructs in y.tab.c.
-N number
[Tru64 UNIX] Provides yacc with extra storage for building its LALR
tables, which may be necessary when compiling very large grammars. The
number should be larger than 40,000 when you use this option.
-p symbol_prefix
Uses symbol_prefix instead of yy as the prefix for the external names
produced by yacc.
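-t
Enables the compilation of run-time debugging code in the parser.
-v
Produces the y.output file, which contains a readable description of the
parsing tables and a report on conflicts generated by grammar
ambiguities.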
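To illustrate the -d option described in this section, the following sketch shows the kind of #define lines such a header might contain, together with a helper that could live in a source file other than y.tab.c. The token names DIGIT and LETTER, the codes 257 and 258, and the token_name() helper are assumptions made for illustration; the codes that yacc actually assigns depend on the grammar.

```c
/* Stand-in for the contents of a header produced with -d.
 * The names and values here are assumed, not taken from a real run. */
#define DIGIT  257
#define LETTER 258

/* Hypothetical helper in a separate source file: maps a token code
 * back to a printable name for diagnostics. */
const char *token_name(int code)
{
    switch (code) {
    case DIGIT:
        return "DIGIT";
    case LETTER:
        return "LETTER";
    default:
        return "character literal or unknown";
    }
}
```

In a real build, the two #define lines would be replaced by an #include of the generated header.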
OPERANDS
grammar
The pathname of a file containing input instructions. The format of
this file is described in the DESCRIPTION section.
DESCRIPTION
The yacc command converts a context-free grammar specification into a set
of tables for a simple automaton that executes an LR(1) parsing algorithm.
The yacc grammar can be ambiguous; specified precedence rules are used to
break ambiguities.
You must compile the y.tab.c output file with a C language compiler to
produce the yyparse() function. This function must be loaded with a yylex
lexical analyzer function, as well as two routines that you must provide,
main() and an error-handling routine, yyerror(). The lex command is useful
for creating lexical analyzers usable by yacc.
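When lex is not used, yylex() can be written by hand. The following is a minimal sketch, not the analyzer that lex would generate: it reads from an in-memory string, returns 0 at the end of input, and passes a value to the parser through yylval. The token codes, the set_input() helper, and the choice of tokens are assumptions made for illustration.

```c
#include <ctype.h>

/* Assumed token codes; a real build would take these from the
 * header that yacc writes when run with -d. */
#define DIGIT  257
#define LETTER 258

int yylval;                      /* value handed to the parser */
static const char *input = "";   /* hypothetical in-memory input */

/* Hypothetical helper: points the lexer at a string. */
void set_input(const char *s)
{
    input = s;
}

/* Returns the next token code, or 0 at the end of input. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        input++;                 /* skip blanks */
    if (*input == '\0')
        return 0;                /* 0 signals the end of input */

    unsigned char c = (unsigned char)*input++;
    if (isdigit(c)) {
        yylval = c - '0';        /* numeric value of the digit */
        return DIGIT;
    }
    if (islower(c)) {
        yylval = c - 'a';        /* index 0..25 for a..z in the C locale */
        return LETTER;
    }
    return c;                    /* any other character stands for itself */
}
```

For the input "a = 3" this sketch returns LETTER, '=', DIGIT, and then 0.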
The yacc program reads its skeleton parser from the file
/usr/ccs/lib/yaccpar. Use the environment variable YACCPAR to specify
another location for the yacc program to read from. If you use this
environment variable, the -P option is ignored, if specified.
The general format of the yacc input file is as follows:
[definitions]
%%
rules
[%%
[user subroutines]]
where
definitions
Is the section where you define the variables to be used later in the
grammar, such as in the rules section. It is also where files are
included (#include) and processing conditions are defined. This
section is optional.
rules
Is the section that contains grammar rules for the parser. A yacc
input file must have a rules section.
user subroutines
Is the section that contains user-supplied subroutines that can be used
by the actions in the rules section. This section is optional.
Comments, in C syntax, can appear anywhere in the user subroutines section
or the definitions section. In the rules section, comments can appear
wherever a symbol is allowed. Blank lines or lines consisting of white
space can be inserted anywhere in the file, and are ignored. The null
character (NUL) must not be used in grammar rules or literals.
%{ code %}
Copies the specified code into the global definitions of the output
file. Such lines commonly include preprocessor directives and
declarations of external variables and functions.
%token [<type>] token [number] [token [number]... ]
Lists tokens or terminal symbols to be used in the rest of the input
file. This line is needed for tokens that do not appear in other %
definitions. If type is present, the C type for all tokens on this line
is declared to be the type referenced by type. If a positive integer
number follows a token, that value is assigned to the token.
%left [<type>] token [number] [token [number]... ]
Indicates that each token is an operator, that all tokens in this
definition have equal precedence, and that a succession of the
operators listed in this definition is evaluated left to right.
%right [<type>] token [number] [token [number]... ]
Indicates that each token is an operator, that all tokens in this
definition have equal precedence, and that a succession of the
operators listed in this definition is evaluated right to left.
%nonassoc [<type>] token [number] [token [number] ...]
Indicates that each token is an operator, and that the operators listed
in this definition cannot be used associatively; that is, they cannot
appear in succession.
%start symbol
Indicates the highest-level production rule to be reduced; in other
words, the rule where the parser can consider its work done and can
terminate processing. If this definition is not included, the parser
uses the first production rule. The symbol must be non-terminal (not a
token).
%type <type> symbol [symbol ...]
Defines each symbol as data type type, to resolve ambiguities. If this
construct is present, yacc performs type checking; otherwise, it assumes
all symbols to be of type integer.
%union union-def
Defines the yylval global variable as a union, where union-def is a
standard C definition in the format:
{ type member ; [type member ; ...] }
At least one member should be an int. Any valid C data type can be
defined, including structures. When you run yacc with the -d option,
the definition of yylval is placed in the <y.tab.h> file and can be
referred to in a lex input file.
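As an illustration of %union, the sketch below shows roughly what a declaration such as %union { int ival; char name[16]; } might look like in C and how a lexical analyzer could fill in one member before returning a token. The member names and the set_number() and set_name() helpers are hypothetical; YYSTYPE is the conventional name for the type of yylval.

```c
#include <string.h>

/* Hypothetical value type, as might result from
 *     %union { int ival; char name[16]; }
 */
typedef union {
    int  ival;
    char name[16];
} YYSTYPE;

YYSTYPE yylval;

/* Hypothetical lexer helper: stores a numeric token value. */
void set_number(int n)
{
    yylval.ival = n;
}

/* Hypothetical lexer helper: stores an identifier, truncating it
 * if necessary so that the member stays null-terminated. */
void set_name(const char *s)
{
    strncpy(yylval.name, s, sizeof yylval.name - 1);
    yylval.name[sizeof yylval.name - 1] = '\0';
}
```

Because the members share storage, only the member that was set most recently holds a meaningful value.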
Every token (terminal symbol) must be listed in one of the preceding %
definitions. Multiple tokens can be separated by white space or commas. All
the tokens in %left, %right, and %nonassoc definitions are assigned a
precedence with tokens in later definitions having precedence over those in
earlier definitions.
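The effect of precedence and associativity on grouping can be illustrated outside of yacc. The sketch below is a small precedence-climbing evaluator, which is not the table-driven algorithm that yacc generates, but it groups operators the same way as the hypothetical declarations %left '+' '-', %left '*' '/', and %right '^' given in that order. The '^' operator, the table layout, and all function names are assumptions made for illustration; operands are single digits with no white space.

```c
/* Operators from later definitions get a higher level. */
struct op {
    char sym;
    int  level;
    int  right_assoc;
};

static const struct op ops[] = {
    {'+', 1, 0}, {'-', 1, 0},    /* %left '+' '-'  */
    {'*', 2, 0}, {'/', 2, 0},    /* %left '*' '/'  */
    {'^', 3, 1},                 /* %right '^'     */
};

static const struct op *find_op(char c)
{
    for (unsigned i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (ops[i].sym == c)
            return &ops[i];
    return 0;
}

static long ipow(long b, long e)
{
    long r = 1;
    while (e-- > 0)
        r *= b;
    return r;
}

static long apply(char sym, long a, long b)
{
    switch (sym) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': return b ? a / b : 0;   /* division by zero yields 0 here */
    default:  return ipow(a, b);      /* '^' */
    }
}

static long parse(const char **p, int min_level)
{
    long lhs = *(*p)++ - '0';         /* single-digit operand */
    const struct op *o;

    while ((o = find_op(**p)) != 0 && o->level >= min_level) {
        (*p)++;
        /* Left associative: the right side may contain only tighter
         * operators. Right associative: the same level is allowed. */
        long rhs = parse(p, o->right_assoc ? o->level : o->level + 1);
        lhs = apply(o->sym, lhs, rhs);
    }
    return lhs;
}

long eval(const char *s)
{
    return parse(&s, 1);
}
```

With this table, eval("8-3-2") groups as (8-3)-2, eval("2^3^2") groups as 2^(3^2), and eval("2+3*4") multiplies before adding.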
In addition to symbols, a token can be a literal character enclosed in single
quotes. The following escape sequences are recognized in literals:
\a      Alert
\n      Newline
\t      Tab
\v      Vertical tab
\r      Carriage Return
\b      Backspace
\f      Form Feed
\\      Backslash
\'      Single Quote
\?      Question mark
\nnn    One or more octal digits specifying the integer value of the character
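The value of a literal token is the value of the character it names. As a hedged illustration, the following sketch decodes a quoted literal, including the escape sequences listed above, into its integer value. The literal_value() function is hypothetical, and the limit of three octal digits follows the C language convention, which is an assumption here.

```c
/* Returns the character value named by a quoted literal such as
 * "'+'", "'\\n'", or "'\\101'", or -1 if the text is not understood. */
int literal_value(const char *lit)
{
    if (lit[0] != '\'' || lit[1] == '\0')
        return -1;
    if (lit[1] != '\\')                      /* plain character: 'x' */
        return lit[2] == '\'' ? (unsigned char)lit[1] : -1;

    const char *p = lit + 2;                 /* character after the backslash */
    int value;

    if (*p >= '0' && *p <= '7') {            /* octal form, up to 3 digits */
        value = 0;
        for (int n = 0; n < 3 && *p >= '0' && *p <= '7'; n++)
            value = value * 8 + (*p++ - '0');
    } else {
        switch (*p++) {
        case 'a':  value = '\a'; break;
        case 'n':  value = '\n'; break;
        case 't':  value = '\t'; break;
        case 'v':  value = '\v'; break;
        case 'r':  value = '\r'; break;
        case 'b':  value = '\b'; break;
        case 'f':  value = '\f'; break;
        case '\\': value = '\\'; break;
        case '\'': value = '\''; break;
        case '?':  value = '?';  break;
        default:   return -1;                /* unknown escape */
        }
    }
    return *p == '\'' ? value : -1;          /* closing quote required */
}
```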
A minimal yyerror() error-handling routine can be written as follows:
int yyerror(char *s)
{
        fprintf(stderr,"%s\n",s);
        return (0);
}
NOTES
The LANG and LC_* variables affect the execution of the yacc command as
stated. The main() function defined by yacc issues the following call:
setlocale(LC_ALL, "")
As a result, the program generated by yacc will also be affected by the
contents of these variables at run time.
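A user-supplied main() can make the same call. The sketch below wraps it in a hypothetical init_locale() helper that falls back to the "C" locale when the environment names a locale that is not available; the fallback is an assumption about reasonable behavior, not something yacc itself is documented to do.

```c
#include <locale.h>

/* Selects the locale named by LANG and the LC_* variables. If that
 * locale is unavailable, falls back to the "C" locale. Returns the
 * name of the locale now in effect. */
const char *init_locale(void)
{
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL)
        name = setlocale(LC_ALL, "C");
    return name;
}
```

Calling init_locale() once, before yyparse(), is enough for the rest of the program to see the selected locale.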
The lex program can be compiled as a C program with -std0, -std, or -std1
mode. It can also be compiled as a C++ program. If YY_NOPROTO is defined on
the compilation command line, function prototypes are not generated.
EXIT STATUS
The following exit values are returned:
0
Successful completion.
>0
An error occurred.
EXAMPLES
This section describes the example programs for the lex and yacc commands,
which together create a simple desk calculator program that performs
addition, subtraction, multiplication, and division operations. The
calculator program also allows you to assign values to variables (each
designated by a single lowercase ASCII letter), and then use the variables
in calculations. The files that contain the program are as follows:
calc.l
The lex specification file that defines the lexical analysis rules.
calc.y
The yacc grammar file that defines the parsing rules and calls the
yylex() function created by lex to provide input.
The remaining text expects that the current directory is the directory that
contains the lex and yacc example program files.
Process the yacc grammar file using the -d option. The -d option tells
yacc to create a file that defines the tokens it uses in addition to
creating the C language source code file.
yacc -d calc.y
The following files are created:
y.tab.c
The C language source file that yacc created for the parser.
<y.tab.h>
A header file containing #define statements for the tokens used by
the parser.
(The *.o files are created temporarily and then removed.)
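Contents of calc.y:
%{
#include <stdio.h>
int regs[26];
int base;
%}
%start list
%token DIGIT LETTER
%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS /*supplies precedence for unary minus */
%%
list    : /*empty */
        | list stat '\n'
        | list error '\n'
                { yyerrok; }
        ;
stat    : expr
                { printf("%d\n",$1); }
        | LETTER '=' expr
                { regs[$1] = $3; }
        ;
expr    : '(' expr ')'
                { $$ = $2; }
        | expr '*' expr
                { $$ = $1 * $3; }
        | expr '/' expr
                { $$ = $1 / $3; }
        | expr '%' expr
                { $$ = $1 % $3; }
        | expr '+' expr
                { $$ = $1 + $3; }
        | expr '-' expr
                { $$ = $1 - $3; }
        | expr '&' expr
                { $$ = $1 & $3; }
        | expr '|' expr
                { $$ = $1 | $3; }
        | '-' expr %prec UMINUS
                { $$ = -$2; }
        | LETTER
                { $$ = regs[$1]; }
        | number
        ;
number  : DIGIT
                { $$ = $1; base = ($1==0) ? 8 : 10; }
        | number DIGIT
                { $$ = base * $1 + $2; }
        ;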
%%
/* beginning of user subroutines section */
int main(void)
{
        return(yyparse());
}
int yyerror(char *s)
{
        fprintf(stderr,"%s\n",s);
        return(0);
}
int yywrap(void)
{
        return(1);
}
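The actions regs[$1] = $3 and $$ = regs[$1] in the grammar treat the value of a LETTER token as an index into an array of variables. The following sketch shows that storage scheme in isolation. The array size of 26 (one slot per lowercase ASCII letter) and the store() and load() helpers are assumptions made for illustration.

```c
#include <assert.h>

static int regs[26];    /* assumed: one slot per letter a..z */

/* Converts a lowercase ASCII letter to its slot number. */
static int slot(char letter)
{
    assert(letter >= 'a' && letter <= 'z');
    return letter - 'a';
}

/* Corresponds to the action for: LETTER '=' expr */
void store(char letter, int value)
{
    regs[slot(letter)] = value;
}

/* Corresponds to the action for: LETTER used inside an expression */
int load(char letter)
{
    return regs[slot(letter)];
}
```

A variable that has never been assigned reads as 0, because the array has static storage duration.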
The Lexical Analyzer Source Code
The file calc.l contains the lexical analyzer source code. It contains the
rules used to generate the tokens from the input stream. It also contains
include statements for standard input and output, as well as for the
<y.tab.h> file. The yacc program generates the <y.tab.h> file from the yacc
grammar file information, if you use the -d option with the yacc command.
The file <y.tab.h> contains definitions for the tokens that the parser
program uses.
Contents of calc.l:
%{
#include <stdio.h>
#include "y.tab.h"
int c;
ENVIRONMENT VARIABLES
The following environment variables affect the execution of yacc:
LANG
Provides a default value for the internationalization variables that
are unset or null. If LANG is unset or null, the corresponding value
from the default locale is used. If any of the internationalization
variables contain an invalid setting, the utility behaves as if none of
the variables had been defined.
LC_ALL
If set to a non-empty string value, overrides the values of all the
other internationalization variables.
LC_CTYPE
Determines the locale for the interpretation of sequences of bytes of
text data as characters (for example, single-byte as opposed to multibyte characters in arguments and input files).
LC_MESSAGES
Determines the locale for the format and contents of diagnostic
messages written to standard error.
NLSPATH
Determines the location of message catalogs for the processing of
LC_MESSAGES.
FILES
y.output
A readable description of parsing tables and a report on conflicts
generated by grammar ambiguities
y.tab.c
Output file
<y.tab.h>
Definitions for token names
yacc.tmp
Temporary file
yacc.debug
Temporary file
yacc.acts
Temporary file
/usr/ccs/lib/yaccpar
Default skeleton parser for C programs
/usr/ccs/lib/liby.a
The yacc library
SEE ALSO
Commands: lex(1)
Standards: standards(5)