NAME

  yacc - Generates an LR(1) parsing program from input consisting of a context-free grammar specification

SYNOPSIS

  yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix] [-P pathname] grammar

STANDARDS

  Interfaces documented on this reference page conform to industry standards as follows:

  yacc:  XPG4, XPG4-UNIX

  Refer to the standards(5) reference page for more information about industry standards and associated tags.

OPTIONS

  -b prefix
      Uses prefix instead of y as the prefix for all output filenames (prefix.tab.c, prefix.tab.h, and prefix.output).

  -d  Produces the <y.tab.h> file, which contains the #define statements that associate the yacc-assigned token codes with your token names. This allows source files other than y.tab.c to access the token codes by including this header file.

  -l  Includes no #line constructs in y.tab.c. Use this only after the grammar and associated actions are fully debugged.

  -N number
      [Tru64 UNIX] Provides yacc with extra storage for building its LALR tables, which may be necessary when compiling very large grammars. The number should be larger than 40,000 when you use this option.

  -p symbol_prefix
      Allows multiple yacc parsers to be linked together. Use symbol_prefix instead of yy to prefix global symbols.

  -P pathname
      [Tru64 UNIX] Specifies an alternative skeleton parser. The pathname specifies the file to be used in place of /usr/ccs/lib/yaccpar.

  -s  [Tru64 UNIX] Breaks the yyparse() function into several smaller functions. Because the size of yyparse() is roughly proportional to that of the grammar, it can otherwise become too large to compile, optimize, or execute efficiently.

  -t  Compiles run-time debugging code. By default, this code is not included when y.tab.c is compiled. If YYDEBUG has a nonzero value, the C compiler (cc) includes the debugging code, whether or not the -t option was used. Without this code, yyparse() runs more quickly.

  -v  Produces the y.output file, which contains a readable description of the parsing tables and a report on conflicts generated by grammar ambiguities.

OPERANDS

  grammar
      The pathname of a file containing input instructions. The format of this file is described in the DESCRIPTION section.

DESCRIPTION

  The yacc command converts a context-free grammar specification into a set of tables for a simple automaton that executes an LR(1) parsing algorithm. The yacc grammar can be ambiguous; specified precedence rules are used to break ambiguities.

  You must compile the y.tab.c output file with a C language compiler to produce the yyparse() function. This function must be loaded with a yylex() lexical analyzer function, as well as two routines that you must provide: main() and an error-handling routine, yyerror(). The lex command is useful for creating lexical analyzers usable by yacc.

  The yacc program reads its skeleton parser from the file /usr/ccs/lib/yaccpar. Use the environment variable YACCPAR to specify another location for the yacc program to read from. If you use this environment variable, the -P option is ignored.

  The general format of the yacc input file is as follows:

      [definitions]
      %%
      rules
      [%%
      [user subroutines]]

  where:

  definitions
      Is the section where you define the variables to be used later in the grammar, such as in the rules section. It is also where files are included (#include) and processing conditions are defined. This section is optional.

  rules
      Is the section that contains grammar rules for the parser. A yacc input file must have a rules section.

  user subroutines
      Is the section that contains user-supplied subroutines that can be used by the actions in the rules section. This section is optional.

  The NULL character must not be used in grammar rules or literals.

  Comments, in C syntax, can appear anywhere in the user subroutines section or the definitions section. In the rules section, comments can appear wherever a symbol is allowed. Blank lines or lines consisting of white space can be inserted anywhere in the file, and are ignored.

Definitions Section of Input File

  The definitions section of a yacc input file contains entries that perform the following functions:

  ·  Includes standard I/O header file.
  ·  Defines global variables.
  ·  Defines the list rule as the place to start processing.
  ·  Defines the tokens used by the parser.
  ·  Defines the operators and their precedence.

  Each line in the definitions section can be:

  %{
  %}  When placed on lines by themselves, these enclose C code to be passed into the global definitions of the output file. Such lines commonly include preprocessor directives and declarations of external variables and functions.

  %token [<type>] token [number] [token [number] ...]
      Lists tokens or terminal symbols to be used in the rest of the input file. This line is needed for tokens that do not appear in other % definitions. If type is present, the C type for all tokens on this line is declared to be the type referenced by type. If a positive integer number follows a token, that value is assigned to the token.

  %left [<type>] token [number] [token [number] ...]
      Indicates that each token is an operator, that all tokens in this definition have equal precedence, and that a succession of the operators listed in this definition are evaluated left to right.

  %right [<type>] token [number] [token [number] ...]
      Indicates that each token is an operator, that all tokens in this definition have equal precedence, and that a succession of the operators listed in this definition are evaluated right to left.

  %nonassoc [<type>] token [number] [token [number] ...]
      Indicates that each token is an operator and that the operators listed in this definition cannot appear in succession; that is, the token cannot be used associatively.

  %start symbol
      Indicates the highest-level production rule to be reduced; in other words, the rule where the parser can consider its work done and can terminate processing. The symbol must be non-terminal (not a token). If this definition is not included, the parser uses the first production rule.

  %type <type> symbol [symbol ...]
      Defines each symbol as data type type, to resolve ambiguities. If this construct is present, yacc performs type checking; otherwise, it assumes all symbols to be of type integer.

  %union union-def
      Defines the yylval global variable as a union, where union-def is a standard C definition in the format:

          { type member ; [type member ; ...] }

      At least one member should be an int. Any valid C data type can be defined, including structures. When you run yacc with the -d option, the definition of yylval is placed in the <y.tab.h> file and can be referred to in a lex input file.

  Every token (terminal symbol) must be listed in one of the preceding % definitions. Multiple tokens can be separated by white space or commas. All the tokens in %left, %right, and %nonassoc definitions are assigned a precedence, with tokens in later definitions having precedence over those in earlier definitions.

  In addition to symbols, a token can be a literal character enclosed in single quotes. (Multibyte characters are recognized by the lexical analyzer and returned as tokens.) The following special characters can be used, just as in C programs:

      \a    Alert
      \n    Newline
      \t    Tab
      \v    Vertical tab
      \r    Carriage return
      \b    Backspace
      \f    Form feed
      \\    Backslash
      \'    Single quote
      \?    Question mark
      \nnn  One or more octal digits specifying the integer value of the character
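  The escape sequences above are interpreted as in C. As a minimal sketch (decode_escape is a hypothetical helper, not part of yacc or liby.a), the following C function converts one such sequence into the integer character value that the literal token stands for. It assumes the C convention of reading at most three octal digits in the \nnn form.

```c
/* Hypothetical helper (not part of yacc or liby.a): decodes the escape
 * sequence that begins at s, where s[0] is a backslash, and returns the
 * integer value of the character it denotes. *end is set to the first
 * character after the sequence. Returns -1 for an unrecognized escape. */
int decode_escape(const char *s, const char **end)
{
    int value;

    switch (s[1]) {
    case 'a':  value = '\a'; break;
    case 'n':  value = '\n'; break;
    case 't':  value = '\t'; break;
    case 'v':  value = '\v'; break;
    case 'r':  value = '\r'; break;
    case 'b':  value = '\b'; break;
    case 'f':  value = '\f'; break;
    case '\\': value = '\\'; break;
    case '\'': value = '\''; break;
    case '?':  value = '?';  break;
    default:
        if (s[1] >= '0' && s[1] <= '7') {
            /* \nnn form: up to three octal digits, as in C */
            const char *p = s + 1;
            int i;

            value = 0;
            for (i = 0; i < 3 && *p >= '0' && *p <= '7'; i++, p++)
                value = value * 8 + (*p - '0');
            *end = p;
            return value;
        }
        *end = s;
        return -1;
    }
    *end = s + 2;
    return value;
}
```

  For example, the literal '\101' denotes the character whose value is 65, which is also the token number the lexical analyzer returns for it.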

Rules Section of Input File

  The rules section of a yacc input file defines the rules that parse the input stream. It consists of a series of production rules that the parser tries to reduce. The format of each production rule is:

      symbol : symbol-sequence [action] [| symbol-sequence [action] ...] ;

  A symbol-sequence consists of zero or more symbols separated by white space. The first symbol must be the first character of the line, but newlines and other white space can appear anywhere else in the rule. All terminal symbols must be declared in %token definitions.

  Each symbol-sequence represents an alternative way of reducing the rule. A symbol can appear recursively in its own rule. Always use left-recursion (where the recursive symbol appears before the terminating case in symbol-sequence).

  The specially defined token error matches any unrecognized sequence of input. This token causes the parser to invoke the yyerror function. By default, the parser tries to synchronize with the input and continue processing it by reading and discarding all input up to the symbol following error. (You can override this behavior through the yyerrok action.) If no error token appears in the yacc input file, the parser exits with an error message upon encountering unrecognized input.

  The following sequence indicates that the current sequence of symbols is to be preferred over others, at the level of precedence assigned to token in the definitions section of the input file:

      %prec token
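  yacc applies the %left, %right, %nonassoc, and %prec declarations while it builds its parsing tables. To illustrate only the grouping those declarations produce, the following C sketch uses a different technique, precedence climbing in a hand-written recursive parser. Its operator table is assumed to mirror the calc.y example in EXAMPLES: '|' lowest, then '&', then '+' and '-', then '*', '/', and '%', all left-associative, with unary minus (%prec UMINUS) binding tightest. The names calc_eval, levels, and pos are hypothetical, and the sketch accepts only decimal digits, operators, and parentheses, with no white space.

```c
#include <string.h>

/* Operator table in declaration order, mirroring the %left lines of calc.y:
 * a later entry binds more tightly. All levels are left-associative. */
static const char *const levels[] = { "|", "&", "+-", "*/%" };
#define NLEVELS (int) (sizeof levels / sizeof levels[0])

static const char *pos;   /* current position in the input string */

/* Returns 1..NLEVELS for a binary operator, 0 for anything else. */
static int prec_of(char op)
{
    int i;

    for (i = 0; i < NLEVELS; i++)
        if (op != '\0' && strchr(levels[i], op) != NULL)
            return i + 1;
    return 0;
}

static long parse_expr(int min_prec);

static long parse_primary(void)
{
    long v = 0;

    if (*pos == '-') {            /* unary minus: tighter than any binary op */
        pos++;
        return -parse_primary();
    }
    if (*pos == '(') {
        pos++;
        v = parse_expr(1);
        if (*pos == ')')
            pos++;
        return v;
    }
    while (*pos >= '0' && *pos <= '9')
        v = v * 10 + (*pos++ - '0');
    return v;
}

/* Parses operators whose level is at least min_prec. Recursing with
 * p + 1 for the right operand makes equal-level operators group left. */
static long parse_expr(int min_prec)
{
    long lhs = parse_primary();
    int p;

    while ((p = prec_of(*pos)) != 0 && p >= min_prec) {
        char op = *pos++;
        long rhs = parse_expr(p + 1);

        switch (op) {
        case '|': lhs |= rhs; break;
        case '&': lhs &= rhs; break;
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;   /* no division-by-zero check, as in calc.y */
        case '%': lhs %= rhs; break;
        }
    }
    return lhs;
}

/* Evaluates a complete expression string. */
long calc_eval(const char *s)
{
    pos = s;
    return parse_expr(1);
}
```

  For example, calc_eval("6|1&3") groups as 6|(1&3) and yields 7, because '&' is declared on a later %left line than '|'; calc_eval("10-4-3") groups as (10-4)-3 and yields 3, because both operators share one %left line.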

  The parser always executes action after encountering the symbol that precedes it. Thus, an action can appear in the middle of a symbol-sequence, after each symbol-sequence, or after multiple instances of symbol-sequence. In the last case, action is executed when the parser matches any of the sequences.

  The action consists of standard C code within braces and can also take the following values, variables, and keywords:

  yylval
      If the token returned by the yylex function is associated with a significant value, yylex should place the value in this global variable. By default, yylval is of type long. The definitions section can include a %union definition to associate with other data types, including structures. If you run yacc with the -d option, the full yylval definition is passed into the <y.tab.h> file for access by lex.

  $$ or $<type>$
      Refers to the value returned by the matched symbol-sequence and used for the matched symbol when reducing other rules. The symbol-sequence generally assigns a value to $$. The <type> syntax (non-standard) allows the value to be cast to a specific data type; type is the name of one of the union members listed in the %union directive in the definitions section. Note that you will rarely need to use the type syntax.

  $n or $<type>n
      Refers to symbol n, a token index in the production, counting from the beginning of the production rule, where the first symbol after the colon is $1. The <type> syntax works as described for $$.

  yyerrok
      Causes the parser to start parsing tokens immediately after an erroneous sequence, instead of performing the default action of reading and discarding tokens up to a synchronization token. The yyerrok action should appear immediately after the error token.

User Subroutines Section of Input File

  The user subroutines section of the yacc input file contains user-supplied functions. Because these functions are included in this file, you do not need to use the yacc library when processing this file. If you supply a lexical analyzer (yylex) to the parser, it must be contained in the user subroutines section.

  The following functions, which are contained in the user subroutines section, are invoked within the yyparse function generated by yacc:

  yylex()
      The lexical analyzer called by yyparse to recognize each token of input. Usually this function is created by lex. yylex reads input, recognizes expressions within the input, and returns a token number representing the kind of token read. The function returns an int value. A return value of 0 (zero) means the end of input.

      If the parser and yylex do not agree on these token numbers, reliable communication between them cannot occur. For one-character literals, the token is simply the numeric value of the character in the current character set. The numbers for other tokens can be chosen by either yacc or the user. In either case, the #define construct of C is used to allow yylex() to return these numbers symbolically. The #define statements are put into the code file, and into the header file if that file is requested. The set of characters permitted by yacc in an identifier is larger than that permitted by C; token names found to contain such characters are not included in the #define declarations.

      If the token numbers are chosen by yacc, those tokens other than literals are assigned numbers greater than 256, although no order is implied. A token can be explicitly assigned a number by following its first appearance in the declaration section with a number. Names and literals not defined in this way retain their default definition. All assigned token numbers are unique and distinct from the token numbers used for literals. If duplicate token numbers cause conflicts in parser generation, yacc reports an error; otherwise, it is unspecified whether the token assignment is accepted or an error is reported.

      The end of the input is marked by a special token called the endmarker, which has a token number that is zero or negative. All lexical analyzers return zero or negative as a token number upon reaching the end of their input. If the tokens up to, but excluding, the endmarker form a structure that matches the start symbol, the parser accepts the input. If the endmarker is seen in any other context, it is considered an error.

  yyerror(string)
      The function that the parser calls upon encountering an input error. The default function, defined in liby.a, simply prints string to the standard error. The user can redefine the function. The function's type is void.

  yywrap()
      The wrap-up routine that returns a value of 1 when the end of input occurs.

  The liby.a library contains default main() and yyerror() functions. (main() is the required main program that calls yyparse() to start the program.) These routines look like the following, respectively:

      main()
      {
          setlocale(LC_ALL, "");
          (void) yyparse();
          return(0);
      }

      int yyerror(s)
      char *s;
      {
          fprintf(stderr, "%s\n", s);
          return (0);
      }
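  The yylex() conventions described above (token numbers, yylval, and a 0 return at end of input) can be illustrated with a minimal hand-written lexer for the tokens of the calc example in EXAMPLES. In this sketch the DIGIT and LETTER values are hypothetical stand-ins for the #define lines that yacc -d would write to <y.tab.h> (a real lexer should include that header rather than choose numbers itself), yylval is declared long to match the default type given in the yylval entry, and the lex_input string pointer is an assumption used in place of standard input.

```c
/* Hypothetical token numbers standing in for the #define lines that
 * yacc -d writes to <y.tab.h>; a real lexer includes that header. */
#define DIGIT  257
#define LETTER 258

long yylval;                  /* in a real build, defined in y.tab.c */
const char *lex_input = "";   /* assumption: input comes from a string */

int yylex(void)
{
    int c;

    while (*lex_input == ' ')             /* skip blanks, as calc.l does */
        lex_input++;
    c = (unsigned char) *lex_input;
    if (c == '\0')
        return 0;                         /* endmarker: end of input */
    lex_input++;
    if (c >= 'a' && c <= 'z') {
        yylval = c - 'a';                 /* register index 0..25 */
        return LETTER;
    }
    if (c >= '0' && c <= '9') {
        yylval = c - '0';                 /* value of the digit */
        return DIGIT;
    }
    return c;                             /* literal: its own character code */
}
```

  With lex_input set to "m=4\n", successive calls return LETTER (with yylval 12), '=', DIGIT (with yylval 4), '\n', and finally 0.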

NOTES

  The LANG and LC_* variables affect the execution of the yacc command as stated. The main() function defined by yacc issues the following call:

      setlocale(LC_ALL, "")

  As a result, the program generated by yacc will also be affected by the contents of these variables at run time.

  The lex program can be compiled as a C program in -std0, -std, or -std1 mode. It can also be compiled as a C++ program. If YY_NOPROTO is defined on the compilation command line, function prototypes are not generated.

EXIT STATUS

  The following exit values are returned:

  0   Successful completion.
  >0  An error occurred.

EXAMPLES

  This section describes the example programs for the lex and yacc commands, which together create a simple desk calculator program that performs addition, subtraction, multiplication, and division operations (the grammar also supports %, &, |, and unary minus). The calculator program also allows you to assign values to variables (each designated by a single lowercase ASCII letter), and then use the variables in calculations. The files that contain the program are as follows:

  calc.l
      The lex specification file that defines the lexical analysis rules.

  calc.y
      The yacc grammar file that defines the parsing rules and calls the yylex() function created by lex to provide input.

  The remaining text expects that the current directory is the directory that contains the lex and yacc example program files.

Compiling the Example Program

  Perform the following steps to create the example program using lex and yacc:

  1.  Process the yacc grammar file using the -d option. The -d option tells yacc to create a file that defines the tokens it uses in addition to creating the C language source code file.

          yacc -d calc.y

      The following files are created:

      y.tab.c
          The C language source file that yacc created for the parser.
      <y.tab.h>
          A header file containing #define statements for the tokens used by the parser.

  2.  Process the lex specification file:

          lex calc.l

      The following file is created:

      lex.yy.c
          The C language source file that lex created for the lexical analyzer.

  3.  Compile and link the two C language source files:

          cc -o calc y.tab.c lex.yy.c

      The following files are created:

      y.tab.o
          The object file for y.tab.c.
      lex.yy.o
          The object file for lex.yy.c.
      calc
          The executable program file.

      (The *.o files are created temporarily and then removed.)

  You can then run the program directly by entering:

      calc

  Then, enter numbers and operators in calculator fashion. After you press <Return>, the program displays the result of the operation. If you assign a value to a variable as follows, the cursor moves to the next line:

      m=4 <Return>
      _

  You can then use the variable in calculations and it will have the value assigned to it:

      m+5 <Return>
      9

The Parser Source Code

  The file calc.y has entries in all three of the sections of a yacc grammar file--declarations, rules, and user subroutines. It contains the following source code:

      %{
      #include <stdio.h>

      int regs[26];
      int base;
      %}

      %start list

      %token DIGIT LETTER

      %left '|'
      %left '&'
      %left '+' '-'
      %left '*' '/' '%'
      %left UMINUS      /* supplies precedence for unary minus */

      %%                /* beginning of rules section */

      list :            /* empty */
           | list stat '\n'
           | list error '\n'
                 { yyerrok; }
           ;

      stat : expr
                 { printf("%ld\n", (long) $1); }
           | LETTER '=' expr
                 { regs[$1] = $3; }
           ;

      expr : '(' expr ')'
                 { $$ = $2; }
           | expr '*' expr
                 { $$ = $1 * $3; }
           | expr '/' expr
                 { $$ = $1 / $3; }
           | expr '%' expr
                 { $$ = $1 % $3; }
           | expr '+' expr
                 { $$ = $1 + $3; }
           | expr '-' expr
                 { $$ = $1 - $3; }
           | expr '&' expr
                 { $$ = $1 & $3; }
           | expr '|' expr
                 { $$ = $1 | $3; }
           | '-' expr %prec UMINUS
                 { $$ = -$2; }
           | LETTER
                 { $$ = regs[$1]; }
           | number
           ;

      number : DIGIT
                 { $$ = $1; base = ($1==0) ? 8 : 10; }
             | number DIGIT
                 { $$ = base * $1 + $2; }
             ;

      %%                /* beginning of user subroutines section */

      int main()
      {
          return(yyparse());
      }

      int yyerror(s)
      char *s;
      {
          fprintf(stderr, "%s\n", s);
          return(0);
      }

      int yywrap()
      {
          return(1);
      }

The Lexical Analyzer Source Code

  The file calc.l contains the lexical analyzer source code. It contains the rules used to generate the tokens from the input stream. It also contains include statements for standard input and output, as well as for the <y.tab.h> file. The yacc program generates the <y.tab.h> file from the yacc grammar file information if you use the -d option with the yacc command. The file <y.tab.h> contains definitions for the tokens that the parser program uses.

  Contents of calc.l:

      %{
      #include <stdio.h>
      #include "y.tab.h"
      int c;
      #if !defined (YYSTYPE)
      #define YYSTYPE long
      #endif
      extern YYSTYPE yylval;
      %}
      %%
      " "         ;
      [a-z]       {
                      c = yytext[0];
                      yylval = c - 'a';
                      return(LETTER);
                  }
      [0-9]       {
                      c = yytext[0];
                      yylval = c - '0';
                      return(DIGIT);
                  }
      [^a-z 0-9]  {
                      c = yytext[0];
                      return(c);
                  }
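  The number rule in calc.y builds a multidigit value one DIGIT token at a time: the first digit selects base 8 if it is 0 and base 10 otherwise, and each further digit computes base * $1 + $2. The following C sketch (fold_number is a hypothetical helper, not part of the example) replays those two actions over a string of digits, assuming the string starts with at least one digit.

```c
/* Replays the semantic actions of calc.y's rules
 *   number : DIGIT        { $$ = $1; base = ($1==0) ? 8 : 10; }
 *          | number DIGIT { $$ = base * $1 + $2; }
 * over a string of decimal digits. Hypothetical helper for illustration;
 * digits must be non-empty and start with '0'..'9'. */
long fold_number(const char *digits)
{
    long value = digits[0] - '0';           /* number : DIGIT */
    int base = (value == 0) ? 8 : 10;

    for (const char *p = digits + 1; *p >= '0' && *p <= '9'; p++)
        value = base * value + (*p - '0');  /* number : number DIGIT */
    return value;
}
```

  So "17" yields 17 and "017" yields 15. As in the grammar, the digits 8 and 9 are not rejected after a leading 0, so "019" yields 8*1 + 9 = 17.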

ENVIRONMENT VARIABLES

  The following environment variables affect the execution of yacc:

  LANG
      Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the default locale is used. If any of the internationalization variables contain an invalid setting, the utility behaves as if none of the variables had been defined.

  LC_ALL
      If set to a non-empty string value, overrides the values of all the other internationalization variables.

  LC_CTYPE
      Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multibyte characters in arguments and input files).

  LC_MESSAGES
      Determines the locale for the format and contents of diagnostic messages written to standard error.

  NLSPATH
      Determines the location of message catalogs for the processing of LC_MESSAGES.
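  The lookup order above can be summarized for one locale category, such as LC_MESSAGES, by the following C sketch. effective_locale is a hypothetical helper; it returns NULL to stand for the implementation's default locale and does not model the rule for invalid settings. A program normally obtains this behavior from setlocale(LC_ALL, ""), as described in NOTES.

```c
#include <stddef.h>

/* Nonzero if the variable is set to a non-empty string. */
static int is_set(const char *s)
{
    return s != NULL && *s != '\0';
}

/* Hypothetical helper: picks the value that governs one locale category.
 * NULL means "use the implementation's default locale". */
const char *effective_locale(const char *lc_all, const char *lc_category,
                             const char *lang)
{
    if (is_set(lc_all))
        return lc_all;          /* LC_ALL overrides all other variables */
    if (is_set(lc_category))
        return lc_category;     /* the category's own variable */
    if (is_set(lang))
        return lang;            /* LANG supplies the default value */
    return NULL;                /* default locale */
}
```

  For example, effective_locale(getenv("LC_ALL"), getenv("LC_MESSAGES"), getenv("LANG")) selects the value that governs yacc's diagnostic messages (getenv is declared in <stdlib.h>).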

c Output file <y.a The yacc library SEE ALSO Commands: Standards: lex(1) standards(5) Programming Support Tools .y.h> Definitions for token names yacc.output A readable description of parsing tables and a report on conflicts generated by grammar ambiguities y.acts Temporary file /usr/ccs/lib/yaccpar Default skeleton parser for C programs /usr/ccs/lib/liby.debug Temporary file yacc.tmp Temporary file yacc.tab.tab.
