(Do not be afraid of

)

PHP Compiler Internals
Sebastian Bergmann
June 13th 2009

Who I Am

Sebastian Bergmann
Involved in the PHP project since 2000
Creator of PHPUnit
Co-Founder and Principal Consultant with thePHP.cc

Under PHP's Hood
Extensions (date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …)

PHP Core (Request Management, File and Network Operations)

Zend Engine (Compilation and Execution, Memory and Resource Allocation)

Server API (SAPI) (mod_php, FastCGI, CLI, ...)

This slide contains material by Sara Golemon

How PHP executes code

Lexical Analysis
Converts the source from a sequence of characters into a sequence of tokens

How PHP executes code

Lexical Analysis -> Syntax Analysis
Analyzes a sequence of tokens to determine their grammatical structure

How PHP executes code

Lexical Analysis -> Syntax Analysis -> Bytecode Generation
Generates bytecode based on the information gathered by analyzing the source code

How PHP executes code

Lexical Analysis -> Syntax Analysis -> Bytecode Generation -> Bytecode Execution

Lexical Analysis
Scan a sequence of characters
<?php
if (TRUE) {
    print '*';
}
?>

Lexical Analysis
Scan a sequence of characters

Line  Source        Tokens
1     <?php         T_OPEN_TAG
2     if (TRUE) {   T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
3     print '*';    T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE
4     }             } T_WHITESPACE
5     ?>            T_CLOSE_TAG

Token                        Source text
T_OPEN_TAG                   <?php
T_IF                         if
T_STRING                     TRUE
T_PRINT                      print
T_CONSTANT_ENCAPSED_STRING   '*'
T_CLOSE_TAG                  ?>

Lexical Analysis
Scan a sequence of characters
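
To make the token stream concrete, here is a minimal hand-written sketch in C that scans the five-line example into token kinds. It is not PHP's scanner: the names toy_token, toy_next_token and toy_scan are hypothetical, only the handful of tokens from the example are recognized, and escape sequences inside strings are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds for this sketch only; the real scanner knows many more. */
typedef enum {
    TOK_OPEN_TAG, TOK_CLOSE_TAG, TOK_IF, TOK_PRINT, TOK_STRING,
    TOK_CONSTANT_ENCAPSED_STRING, TOK_WHITESPACE, TOK_CHAR, TOK_END
} toy_token;

/* Returns non-zero if the text at p starts with the literal lit. */
static int starts_with(const char *p, const char *lit)
{
    return strncmp(p, lit, strlen(lit)) == 0;
}

/* Scans one token at *pos, advances *pos past it and returns its kind. */
static toy_token toy_next_token(const char *src, size_t *pos)
{
    const char *p = src + *pos;
    size_t len = 1;
    toy_token kind = TOK_CHAR;  /* default: a single character such as ( or ; */

    if (*p == '\0') {
        return TOK_END;
    }
    if (starts_with(p, "<?php")) {
        len = 5;
        if (isspace((unsigned char)p[len])) len++;  /* tag takes one whitespace */
        kind = TOK_OPEN_TAG;
    } else if (starts_with(p, "?>")) {
        len = 2;
        kind = TOK_CLOSE_TAG;
    } else if (isspace((unsigned char)*p)) {
        while (isspace((unsigned char)p[len])) len++;
        kind = TOK_WHITESPACE;
    } else if (*p == '\'') {
        while (p[len] != '\0' && p[len] != '\'') len++;
        if (p[len] == '\'') len++;                  /* include closing quote */
        kind = TOK_CONSTANT_ENCAPSED_STRING;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)p[len]) || p[len] == '_') len++;
        if (len == 2 && strncmp(p, "if", 2) == 0) kind = TOK_IF;
        else if (len == 5 && strncmp(p, "print", 5) == 0) kind = TOK_PRINT;
        else kind = TOK_STRING;
    }
    *pos += len;
    return kind;
}

/* Scans src into out (capacity max) and returns the number of tokens found. */
size_t toy_scan(const char *src, toy_token *out, size_t max)
{
    size_t pos = 0, n = 0;
    toy_token t;

    while ((t = toy_next_token(src, &pos)) != TOK_END) {
        assert(n < max);
        out[n++] = t;
    }
    return n;
}
```

Calling toy_scan() on the example source yields the same sequence of kinds as the token list for the example, with TOK_CHAR standing in for the single-character tokens.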

Lexical Analysis
Scanner Generators

You do not want to write a scanner by hand

At least when the code for the scanner should be efficient and maintainable

Tools such as flex or re2c generate the code for a scanner from a set of rules

<ST_IN_SCRIPTING>"if" {
    return T_IF;
}

Lexical Analysis
PHP Tokens

T_ABSTRACT T_AND_EQUAL T_ARRAY T_ARRAY_CAST T_AS T_BAD_CHARACTER T_BOOLEAN_AND T_BOOLEAN_OR T_BOOL_CAST T_BREAK T_CASE T_CATCH T_CHARACTER T_CLASS T_CLASS_C T_CLONE T_CLOSE_TAG T_COMMENT

T_CONCAT_EQUAL T_CONST T_CONSTANT_ENCAPSED_STRING T_CONTINUE T_CURLY_OPEN T_DEC T_DECLARE T_DEFAULT T_DIR T_DIV_EQUAL T_DNUMBER T_DOC_COMMENT T_DO T_DOLLAR_OPEN_CURLY_BRACES T_DOUBLE_ARROW T_DOUBLE_CAST T_DOUBLE_COLON T_ECHO

T_ELSE T_ELSEIF T_EMPTY T_ENCAPSED_AND_WHITESPACE T_ENDDECLARE T_ENDFOR T_ENDFOREACH T_ENDIF T_ENDSWITCH T_ENDWHILE T_END_HEREDOC T_EVAL T_EXIT T_EXTENDS T_FILE T_FINAL T_FOR T_FOREACH

T_FUNCTION T_FUNC_C T_GLOBAL T_GOTO T_HALT_COMPILER T_IF T_IMPLEMENTS T_INC T_INCLUDE T_INCLUDE_ONCE T_INLINE_HTML T_INSTANCEOF T_INT_CAST T_INTERFACE T_ISSET T_IS_EQUAL T_IS_GREATER_OR_EQUAL T_IS_IDENTICAL

Lexical Analysis
PHP Tokens

T_IS_NOT_EQUAL T_IS_NOT_IDENTICAL T_IS_SMALLER_OR_EQUAL T_LINE T_LIST T_LNUMBER T_LOGICAL_AND T_LOGICAL_OR T_LOGICAL_XOR T_METHOD_C T_MINUS_EQUAL T_ML_COMMENT T_MOD_EQUAL T_MUL_EQUAL T_NAMESPACE T_NS_C T_NEW T_NUM_STRING

T_OBJECT_CAST T_OBJECT_OPERATOR T_OLD_FUNCTION T_OPEN_TAG T_OPEN_TAG_WITH_ECHO T_OR_EQUAL T_PAAMAYIM_NEKUDOTAYIM T_PLUS_EQUAL T_PRINT T_PRIVATE T_PUBLIC T_PROTECTED T_REQUIRE T_REQUIRE_ONCE T_RETURN T_SL T_SL_EQUAL T_SR

T_SR_EQUAL T_START_HEREDOC T_STATIC T_STRING T_STRING_CAST T_STRING_VARNAME T_SWITCH T_THROW T_TRY T_UNSET T_UNSET_CAST T_USE T_VAR T_VARIABLE T_WHILE T_WHITESPACE T_XOR_EQUAL

Syntax Analysis
Analyze a sequence of tokens

Syntax Analysis
Parser Generators

You do not want to write a parser by hand

At least when the code for the parser should be efficient and maintainable

Tools such as bison or lemon generate the code for a parser from a set of rules

T_IF '(' expr ')' { ... }
    statement { ... }
    elseif_list else_single { ... }
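
To illustrate what such a rule encodes, the following C sketch checks a token stream against a much smaller grammar by recursive descent. This is an assumption-laden toy, not the code bison generates: the functions eat, parse_expr, parse_statement and toy_parse are hypothetical, the token values are chosen for this sketch, and elseif_list and else_single are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Named tokens get values above the char range so that single-character
 * tokens such as '(' can stand for themselves. 0 marks the end of input. */
enum { T_END = 0, T_IF = 258, T_PRINT, T_STRING, T_CONSTANT_ENCAPSED_STRING };

typedef struct {
    const int *tokens;  /* token stream, terminated by T_END */
    size_t pos;         /* index of the next unread token */
} toy_parser;

/* Consumes the next token if it equals tok; returns 1 on success, else 0. */
static int eat(toy_parser *p, int tok)
{
    if (p->tokens[p->pos] == tok) {
        p->pos++;
        return 1;
    }
    return 0;
}

/* expr: T_STRING | T_CONSTANT_ENCAPSED_STRING */
static int parse_expr(toy_parser *p)
{
    return eat(p, T_STRING) || eat(p, T_CONSTANT_ENCAPSED_STRING);
}

/* statement: T_IF '(' expr ')' statement
 *          | '{' statement* '}'
 *          | T_PRINT expr ';'
 */
static int parse_statement(toy_parser *p)
{
    if (eat(p, T_IF)) {
        return eat(p, '(') && parse_expr(p) && eat(p, ')')
            && parse_statement(p);
    }
    if (eat(p, '{')) {
        while (p->tokens[p->pos] != '}' && p->tokens[p->pos] != T_END) {
            if (!parse_statement(p)) {
                return 0;
            }
        }
        return eat(p, '}');
    }
    if (eat(p, T_PRINT)) {
        return parse_expr(p) && eat(p, ';');
    }
    return 0;
}

/* Returns 1 if the whole token stream is exactly one valid statement. */
int toy_parse(const int *tokens)
{
    toy_parser p = { tokens, 0 };

    assert(tokens != NULL);
    return parse_statement(&p) && p.tokens[p.pos] == T_END;
}
```

Whitespace tokens are assumed to have been dropped before parsing, so the example statement arrives as T_IF ( T_STRING ) { T_PRINT T_CONSTANT_ENCAPSED_STRING ; }.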

PHP Bytecode
Disassembling with vld
<?php
if (TRUE) {
    print '*';
}
?>

sb@thinkpad ~ % php -dextension=vld.so -dvld.active=1 -dvld.execute=0 if.php
filename:       /home/sb/if.php
function name:  (null)
number of ops:  8
compiled vars:  none
line     #  op          fetch  ext  return  operands
------------------------------------------------------------------------------
   2     0  EXT_STMT
         1  JMPZ                            true, ->6
   3     2  EXT_STMT
         3  PRINT                   ~0      '%2A'
         4  FREE                            ~0
   4     5  JMP                             ->6
   6     6  EXT_STMT
         7  RETURN                          1

PHP Bytecode
Disassembling with bytekit-cli
<?php
if (TRUE) {
    print '*';
}
?>

sb@thinkpad ~ % bytekit if.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /home/sb/if.php
Function:           main
Number of oplines:  8

line  #  opcode    result  operands
----------------------------------------------------------------------------
2     0  EXT_STMT
      1  JMPZ              true, ->6
3     2  EXT_STMT
      3  PRINT     ~0      '*'
      4  FREE              ~0
4     5  JMP               ->6
6     6  EXT_STMT
      7  RETURN            1

PHP Bytecode
Bytecode visualization with bytekit-cli
<?php
if (TRUE) {
    print '*';
}
?>

sb@thinkpad ~ % bytekit --graph /tmp --format svg if.php

PHP Bytecode
Disassembling with bytekit-cli
<?php
$a = 1;
$b = 2;
print $a + $b;
?>

sb@thinkpad ~ % bytekit add.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:            /home/sb/add.php
Function:            main
Number of oplines:   10
Compiled variables:  !0 = $a, !1 = $b

line  #  opcode    result  operands
----------------------------------------------------------------------------
2     0  EXT_STMT
      1  ASSIGN            !0, 1
3     2  EXT_STMT
      3  ASSIGN            !1, 2
4     4  EXT_STMT
      5  ADD       ~2      !0, !1
      6  PRINT     ~3      ~2
      7  FREE              ~3
6     8  EXT_STMT
      9  RETURN            1
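
To show how a listing like this can be executed, here is a small C sketch of an interpreter loop for the ten oplines above. It is a toy under stated assumptions, not the Zend Engine executor: all types and functions (toy_op, toy_vm, toy_execute, toy_add_php) are hypothetical, values are plain integers, and PRINT records its operand in the VM instead of writing to the output.

```c
#include <assert.h>
#include <stddef.h>

/* Opcodes used by the add.php listing; numeric values are arbitrary here. */
typedef enum { OP_EXT_STMT, OP_ASSIGN, OP_ADD, OP_PRINT, OP_FREE, OP_RETURN } toy_opcode;

/* Operand kinds: unused, compiled variable (!n), temporary (~n), literal. */
typedef enum { K_UNUSED, K_CV, K_TMP, K_CONST } toy_kind;

typedef struct { toy_kind kind; long value; } toy_operand;

typedef struct {
    toy_opcode opcode;
    toy_operand result, op1, op2;
} toy_op;

typedef struct {
    long cv[4];    /* compiled variables !0, !1, ... */
    long tmp[4];   /* temporaries ~0, ~1, ... */
    long printed;  /* last value passed to PRINT */
} toy_vm;

/* Reads the current value of an operand. */
static long load(const toy_vm *vm, toy_operand o)
{
    switch (o.kind) {
    case K_CV:    return vm->cv[o.value];
    case K_TMP:   return vm->tmp[o.value];
    case K_CONST: return o.value;
    default:      return 0;
    }
}

/* Writes v into a compiled variable or temporary; other kinds are ignored. */
static void store(toy_vm *vm, toy_operand o, long v)
{
    if (o.kind == K_CV)  vm->cv[o.value] = v;
    if (o.kind == K_TMP) vm->tmp[o.value] = v;
}

/* Executes the oplines in order and returns the operand of RETURN. */
long toy_execute(toy_vm *vm, const toy_op *ops, size_t count)
{
    assert(vm != NULL && ops != NULL);
    for (size_t i = 0; i < count; i++) {
        const toy_op *op = &ops[i];
        switch (op->opcode) {
        case OP_ASSIGN: store(vm, op->op1, load(vm, op->op2)); break;
        case OP_ADD:    store(vm, op->result, load(vm, op->op1) + load(vm, op->op2)); break;
        case OP_PRINT:  vm->printed = load(vm, op->op1);
                        store(vm, op->result, 1);   /* print evaluates to 1 */
                        break;
        case OP_RETURN: return load(vm, op->op1);
        case OP_EXT_STMT:
        case OP_FREE:   break;                      /* nothing to do in this toy */
        }
    }
    return 0;
}

/* The ten oplines of the add.php listing, written out by hand. */
const toy_op toy_add_php[10] = {
    { OP_EXT_STMT, {K_UNUSED, 0}, {K_UNUSED, 0}, {K_UNUSED, 0} },
    { OP_ASSIGN,   {K_UNUSED, 0}, {K_CV, 0},     {K_CONST, 1}  },  /* $a = 1 */
    { OP_EXT_STMT, {K_UNUSED, 0}, {K_UNUSED, 0}, {K_UNUSED, 0} },
    { OP_ASSIGN,   {K_UNUSED, 0}, {K_CV, 1},     {K_CONST, 2}  },  /* $b = 2 */
    { OP_EXT_STMT, {K_UNUSED, 0}, {K_UNUSED, 0}, {K_UNUSED, 0} },
    { OP_ADD,      {K_TMP, 2},    {K_CV, 0},     {K_CV, 1}     },  /* ~2 = !0 + !1 */
    { OP_PRINT,    {K_TMP, 3},    {K_TMP, 2},    {K_UNUSED, 0} },  /* ~3 = print ~2 */
    { OP_FREE,     {K_UNUSED, 0}, {K_TMP, 3},    {K_UNUSED, 0} },
    { OP_EXT_STMT, {K_UNUSED, 0}, {K_UNUSED, 0}, {K_UNUSED, 0} },
    { OP_RETURN,   {K_UNUSED, 0}, {K_CONST, 1},  {K_UNUSED, 0} },
};
```

Running toy_execute() over toy_add_php with a zeroed toy_vm leaves 1 and 2 in the two compiled variables and records 3 as the printed value.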

PHP Bytecode
List of Opcodes

NOP ADD SUB MUL DIV MOD SL SR CONCAT BW_OR BW_AND BW_XOR BW_NOT BOOL_NOT BOOL_XOR IS_IDENTICAL IS_NOT_IDENTICAL IS_EQUAL

IS_NOT_EQUAL IS_SMALLER IS_SMALLER_OR_EQUAL CAST QM_ASSIGN ASSIGN_ADD ASSIGN_SUB ASSIGN_MUL ASSIGN_DIV ASSIGN_MOD ASSIGN_SL ASSIGN_SR ASSIGN_CONCAT ASSIGN_BW_OR ASSIGN_BW_AND ASSIGN_BW_XOR PRE_INC PRE_DEC

POST_INC POST_DEC ASSIGN ASSIGN_REF ECHO PRINT JMPZ JMPNZ JMPZNZ JMPZ_EX JMPNZ_EX CASE SWITCH_FREE BRK BOOL INIT_STRING ADD_CHAR ADD_STRING

ADD_VAR BEGIN_SILENCE END_SILENCE INIT_FCALL_BY_NAME DO_FCALL DO_FCALL_BY_NAME RETURN RECV RECV_INIT SEND_VAL SEND_VAR SEND_REF NEW FREE INIT_ARRAY ADD_ARRAY_ELEMENT INCLUDE_OR_EVAL UNSET_VAR

UNSET_DIM UNSET_OBJ FE_RESET FE_FETCH EXIT FETCH_R FETCH_DIM_R FETCH_OBJ_R FETCH_W FETCH_DIM_W FETCH_OBJ_W FETCH_RW FETCH_DIM_RW FETCH_OBJ_RW FETCH_IS FETCH_DIM_IS FETCH_OBJ_IS FETCH_FUNC_ARG

PHP Bytecode
List of Opcodes

FETCH_DIM_FUNC_ARG FETCH_OBJ_FUNC_ARG FETCH_UNSET FETCH_DIM_UNSET FETCH_OBJ_UNSET FETCH_DIM_TMP_VAR FETCH_CONSTANT EXT_STMT EXT_FCALL_BEGIN EXT_FCALL_END EXT_NOP TICKS SEND_VAR_NO_REF CATCH THROW FETCH_CLASS CLONE INIT_METHOD_CALL

INIT_STATIC_METHOD_CALL ISSET_ISEMPTY_VAR ISSET_ISEMPTY_DIM_OBJ PRE_INC_OBJ PRE_DEC_OBJ POST_INC_OBJ POST_DEC_OBJ ASSIGN_OBJ INSTANCEOF DECLARE_CLASS DECLARE_INHERITED_CLASS DECLARE_FUNCTION RAISE_ABSTRACT_ERROR ADD_INTERFACE VERIFY_ABSTRACT_CLASS ASSIGN_DIM ISSET_ISEMPTY_PROP_OBJ HANDLE_EXCEPTION

Extending the Compiler

Test First!
Zend/tests/unless.phpt
--TEST--
unless statement
--FILE--
<?php
unless (FALSE) {
    print 'unless FALSE is TRUE, this is printed';
}

unless (TRUE) {
    print 'unless TRUE is TRUE, this is printed';
}
?>
--EXPECT--
unless FALSE is TRUE, this is printed

Extending the Compiler

Add token for unless to the scanner
Add rule for unless to the parser
Generate bytecode for unless in the compiler
Add token for unless to ext/tokenizer

Add unless scanner token
Zend/zend_language_scanner.l
<ST_IN_SCRIPTING>"if" {
    return T_IF;
}

<ST_IN_SCRIPTING>"unless" {
    return T_UNLESS;
}

<ST_IN_SCRIPTING>"elseif" {
    return T_ELSEIF;
}

<ST_IN_SCRIPTING>"endif" {
    return T_ENDIF;
}

<ST_IN_SCRIPTING>"else" {
    return T_ELSE;
}

Add unless parser rule
Zend/zend_language_parser.y
%token T_NAMESPACE
%token T_NS_C
%token T_DIR
%token T_NS_SEPARATOR
%token T_UNLESS
.
.
unticked_statement:
        '{' inner_statement_list '}'
    |   T_IF '(' expr ')' {
.
.
    |   T_UNLESS '(' expr ')' { zend_do_unless_cond(&$3, &$4 TSRMLS_CC); }
        statement { zend_do_if_after_statement(&$4, 1 TSRMLS_CC); }
        { zend_do_if_end(TSRMLS_C); }
.
.

How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
}

typedef struct _znode {
    int op_type;
    union {
        zval constant;
        zend_uint var;
        zend_uint opline_num;
        zend_op_array *op_array;
        zend_op *jmp_addr;
        struct {
            zend_uint var;
            zend_uint type;
        } EA;
    } u;
} znode;

zend_do_if_cond() is called when an if statement is compiled

How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
}

struct _zend_op {
    opcode_handler_t handler;
    znode result;
    znode op1;
    znode op2;
    ulong extended_value;
    uint lineno;
    zend_uchar opcode;
};

Allocate a new opline in the current oparray

How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
}

Set the opcode of the new opline to JMPZ (jump if zero)

How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
    opline->op1 = *cond;
}

Set the first operand of the new opline to the if condition

How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
    opline->op1 = *cond;
    closing_bracket_token->u.opline_num = if_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}

Perform bookkeeping tasks: remember the number of the new opline in the closing bracket token, mark the second operand of the new opline as unused, and increment the backpatching counter for the current oparray

Add unless to compiler
Zend/zend_compile.c
void zend_do_unless_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int unless_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPNZ;
    opline->op1 = *cond;
    closing_bracket_token->u.opline_num = unless_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}

Compared to generating code for the if statement, all we have to do to generate code for the unless statement is to use the JMPNZ (jump if not zero) opcode instead of the JMPZ (jump if zero) opcode
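
The effect of swapping the opcode can be tried out in a self-contained C sketch. It is a simplified model, not Zend Engine code: jmp_op, jmp_op_array, emit, toy_compile_conditional and toy_run are hypothetical names, the condition is a constant 0 or 1, and the jump target is assumed to be filled in after the block has been emitted, which is the idea behind recording the opline number for later backpatching.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_JMPZ, TOY_JMPNZ, TOY_JMP, TOY_PRINT, TOY_RETURN } jmp_opcode;

typedef struct {
    jmp_opcode opcode;
    int operand;      /* condition value for JMPZ/JMPNZ, unused otherwise */
    size_t target;    /* jump target for JMPZ/JMPNZ/JMP */
} jmp_op;

typedef struct {
    jmp_op ops[16];
    size_t count;
} jmp_op_array;

/* Appends a new opline and returns its index. */
static size_t emit(jmp_op_array *a, jmp_opcode opcode, int operand)
{
    assert(a->count < 16);
    a->ops[a->count].opcode = opcode;
    a->ops[a->count].operand = operand;
    a->ops[a->count].target = 0;    /* not known yet, patched later */
    return a->count++;
}

/*
 * Compiles "if (cond) { print; }" when negate is 0 and
 * "unless (cond) { print; }" when negate is 1.
 * The only difference between the two is the conditional jump opcode.
 */
void toy_compile_conditional(jmp_op_array *a, int cond, int negate)
{
    /* Emit the conditional jump first; its target is not known yet. */
    size_t cond_jump = emit(a, negate ? TOY_JMPNZ : TOY_JMPZ, cond);

    emit(a, TOY_PRINT, 0);                   /* body of the block */
    size_t skip_else = emit(a, TOY_JMP, 0);  /* jump over an (absent) else */

    /* Backpatch: both jumps land on the opline that follows the block. */
    a->ops[cond_jump].target = a->count;
    a->ops[skip_else].target = a->count;
    emit(a, TOY_RETURN, 0);
}

/* Runs the oplines and returns how many times PRINT was executed. */
int toy_run(const jmp_op_array *a)
{
    int prints = 0;
    size_t pc = 0;

    while (pc < a->count) {
        const jmp_op *op = &a->ops[pc];
        switch (op->opcode) {
        case TOY_JMPZ:   pc = op->operand ? pc + 1 : op->target; break;
        case TOY_JMPNZ:  pc = op->operand ? op->target : pc + 1; break;
        case TOY_JMP:    pc = op->target; break;
        case TOY_PRINT:  prints++; pc++; break;
        case TOY_RETURN: return prints;
        }
    }
    return prints;
}
```

With this model, if (TRUE) and unless (FALSE) both execute the PRINT opline once, while unless (TRUE) jumps straight to RETURN.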

Add unless to compiler
The generated bytecode
<?php
unless (FALSE) {
    print '*';
}
?>

sb@thinkpad ~ % bytekit unless.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /home/sb/unless.php
Function:           main
Number of oplines:  8

line  #  opcode    result  operands
----------------------------------------------------------------------------
2     0  EXT_STMT
      1  JMPNZ             true, ->6
3     2  EXT_STMT
      3  PRINT     ~0      '*'
      4  FREE              ~0
4     5  JMP               ->6
6     6  EXT_STMT
      7  RETURN            1

Run the test
sb@thinkpad php-5.3-unless % make test TESTS=Zend/tests/unless.phpt

Build complete.
Don't forget to run 'make test'.

=====================================================================
PHP         : /usr/local/src/php/php-5.3-unless/sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.3.0RC3-dev
ZEND_VERSION: 2.3.0
PHP_OS      : Linux 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009 i686 GNU/Linux
INI actual  : /usr/local/src/php/php-5.3-unless/tmp-php.ini
More .INIs  :
CWD         : /usr/local/src/php/php-5.3-unless
Extra dirs  :
VALGRIND    : Not used
=====================================================================
Running selected tests.
PASS unless statement [Zend/tests/unless.phpt]
=====================================================================
Number of tests :    1                 1
Tests skipped   :    0 (  0.0%) -------
Tests warned    :    0 (  0.0%) (  0.0%)
Tests failed    :    0 (  0.0%) (  0.0%)
Expected fail   :    0 (  0.0%) (  0.0%)
Tests passed    :    1 (100.0%) (100.0%)
---------------------------------------------------------------------
Time taken      :    0 seconds
=====================================================================

Add unless to ext/tokenizer
ext/tokenizer/tokenizer_data.c
sb@thinkpad tokenizer % ./tokenizer_data_gen.sh
Wrote tokenizer_data.c

The End
Thank you for your interest!
These slides will be linked soon from http://sebastian-bergmann.de/
You can vote for this talk on http://joind.in/582

Acknowledgements

Thomas Lee, whose Python Language Internals presentation at OSDC 2008 inspired this presentation
Stefan Esser, for creating the Bytekit extension that provides PHP bytecode access and analysis features
Derick Rethans, David Soria Parra, and Scott MacVicar, for reviewing these slides

References

http://www.php.net/manual/en/tokens.php
http://www.zapt.info/opcodes.html
Sara Golemon: "Extending and Embedding PHP"
http://derickrethans.nl/vld.php
http://bytekit.org/
http://github.com/sebastianbergmann/bytekit-cli/

License

This presentation material is published under the Attribution-Share Alike 3.0 Unported license. You are free:

to Share – to copy, distribute and transmit the work.
to Remix – to adapt the work.

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

For any reuse or distribution, you must make clear to others the license terms of this work. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.
