.. original =================================== Bytecode Interpreter =================================== =================================== バイトコードインタプリタ =================================== .. contents:: .. original Introduction and Overview =============================== 導入と概要 =============================== .. original This document describes the implementation of PyPy's Bytecode Interpreter and related Virtual Machine functionalities. このドキュメントではPyPyのバイトコードインタプリタと仮想マシンの機能関連について説明します。 .. original PyPy's bytecode interpreter has a structure reminiscent of CPython's Virtual Machine: It processes code objects parsed and compiled from Python source code. It is implemented in the `pypy/interpreter/`_ directory. People familiar with the CPython implementation will easily recognize similar concepts there. The major differences are the overall usage of the `object space`_ indirection to perform operations on objects, and the organization of the built-in modules (described `here`_). PyPyのバイトコードインタプリタはCPythonの仮想マシンによく似ている構造をしています(バイトコードインタプリタはPythonソースコードを解析しコンパイルしたコードオブジェクトを処理します)。この実装は `pypy/interpreter/`_ ディレクトリに存在します。 CPythonの実装に精通している人々はそれらが類似している概念だと容易にわかるでしょう。大きな違いといえば、オブジェクト上の命令の間接的な実行をする `object space`_ 全般の使い方や、組み込みモジュールの機構(詳細は `here`_)です。 .. original Code objects are a nicely preprocessed, structured representation of source code, and their main content is *bytecode*. We use the same compact bytecode format as CPython 2.7, with minor differences in the bytecode set. Our bytecode compiler is implemented as a chain of flexible passes (tokenizer, lexer, parser, abstract syntax tree builder, bytecode generator). The latter passes are based on the ``compiler`` package from the standard library of CPython, with various improvements and bug fixes. The bytecode compiler (living under `pypy/interpreter/astcompiler/`_) is now integrated and is translated with the rest of PyPy. コードオブジェクトは入念に前処理され、構造化されたソースコードの表現で、それらの主な内容は*バイトコード*です。私たちはCPython 2.7とほぼ同じ(バイトコード集合に若干の際がある)形式のコンパクトなバイトコードを使用します。このバイトコードコンパイラは柔軟な設計の各工程(字句分割[tokenizer]、字句解析[lexer]、構文解析[parser]、抽象構文木生成[AST]、バイトコード生成)をつないで実装されています。後半の工程は様々な改良とバグ修正を行ったCPythonの標準ライブラリの ``compiler`` パッケージが元になっています。そのバイトコードコンパイラー(`pypy/interpreter/astcompiler/`_ 以下に存在しています)は現在では統合され、PyPyの他のものと一緒に変換されました。 .. original Code objects contain condensed information about their respective functions, class and module body source codes. Interpreting such code objects means instantiating and initializing a `Frame class`_ and then calling its ``frame.eval()`` method. This main entry point initialize appropriate namespaces and then interprets each bytecode instruction. Python's standard library contains the `lib-python/2.7/dis.py`_ module which allows to view the Virtual's machine bytecode instructions:: コードオブジェクトは各関数、クラス、モジュール本体のソースコードについての凝縮された情報を含んでいます。コードオブジェクトの解釈はインスタンス化し、 `Frame class`_ を初期化し、そのインスタンスの ``frame.eval()`` メソッドを呼び出すことを意味します。このメインエントリーポイントは適切な名前空間を初期化し、それぞれのバイトコード命令解釈します。Pythonの標準ライブラリは仮想マシンのバイトコード命令を可視化する `lib-python/2.7/dis.py`_ モジュールを持っています。 >>> import dis >>> def f(x): ... return x + 1 >>> dis.dis(f) 2 0 LOAD_FAST 0 (x) 3 LOAD_CONST 1 (1) 6 BINARY_ADD 7 RETURN_VALUE .. original CPython as well as PyPy are stack-based virtual machines, i.e. they don't have registers but put object to and pull objects from a stack. The bytecode interpreter is only responsible for implementing control flow and putting and pulling black box objects to and from this value stack. The bytecode interpreter does not know how to perform operations on those black box (`wrapped`_) objects for which it delegates to the `object space`_. In order to implement a conditional branch in a program's execution, however, it needs to gain minimal knowledge about a wrapped object. Thus, each object space has to offer a ``is_true(w_obj)`` operation which returns an interpreter-level boolean value. CPythonもPyPyもスタックベース仮想マシンです。すなわちレジスタを持たず、スタックにオブジェクトを置いたり取ったりします。バイトコードインタプリタはブラックボックスオブジェクトのバリュースタックからの取り出しまたは挿入の制御手順の実行にのみ関与します。バイトコードインタプリタはこれらブラックボックスオブジェクト(`wrapped`_)上の命令がどのように `object space`_ に移譲され、実行されるかを知りません。プログラム実行中の条件分岐を実装するためにはどのような形であれ、ラップされたオブジェクトについての最低限の情報を得る必要があります。それ故に、それぞれのオブジェクトスペースはインタプリタレベルでの真偽値を返す ``is_true(w_obj)`` のような操作を提供するものを持っています。 .. original For the understanding of the interpreter's inner workings it is crucial to recognize the concepts of `interpreter-level and application-level`_ code. In short, interpreter-level is executed directly on the machine and invoking application-level functions leads to an bytecode interpretation indirection. However, special care must be taken regarding exceptions because application level exceptions are wrapped into ``OperationErrors`` which are thus distinguished from plain interpreter-level exceptions. See `application level exceptions`_ for some more information on ``OperationErrors``. インタプリタの内部動作についての理解するためには `inerpreter-level and application-level`_ のコードの概念について納得することがとても重要です。端的に言えば、インタプリタレベルはバイトコードの関節的な解釈をもたらアプリケーションレベルの関数をマシン上で呼び出して直接実行します。また一方で、例外に関しては得に慎重に扱わなければならりません。なぜならアプリケーションレベルの例外は ``OperationErrors`` にラップされており、それらは素のインタプリタレベルの例外と区別されます。 ``OperationErrors`` についてのさらに詳しい情報については `application level exceptions`_ を参照してください。 .. original The interpreter implementation offers mechanisms to allow a caller to be unaware if a particular function invocation leads to bytecode interpretation or is executed directly at interpreter-level. The two basic kinds of `Gateway classes`_ expose either an interpreter-level function to application-level execution (``interp2app``) or allow transparent invocation of application-level helpers (``app2interp``) at interpreter-level. 個々の関数の呼び出しがバイトコードに解釈されるか、インタープリタレベルで直接実行されるならば、インタープリタの実装は呼び出し側の不注意を許可するメカニズムを提供します。 2の基本的な `Gateway classes`_ はインタープリタレベルの関数からアプリケーションレベルの実行(``interp2app``)かインタプリタレベルでのアプリケーションレベルヘルパ(``app2interp``)の透過的呼び出し許可をさらします。 .. original Another task of the bytecode interpreter is to care for exposing its basic code, frame, module and function objects to application-level code. Such runtime introspection and modification abilities are implemented via `interpreter descriptors`_ (also see Raymond Hettingers `how-to guide for descriptors`_ in Python, PyPy uses this model extensively). バイトコードインタプリタの他のタスクは基底コード、フレーム、モジュール、そして関数オブジェクトをアプリケーションレベルのコードへ昇華させることです。このようなランタイムイントロスペクションや修正能力は `interpreter descriptors`_ を使用して実装されます。(PyPyはこのモデルを広く使用しています。また、Raymond HettingersによるPythonの `how-to guide for descriptors`_ でも見れます。) .. original A significant complexity lies in `function argument parsing`_. Python as a language offers flexible ways of providing and receiving arguments for a particular function invocation. Not only does it take special care to get this right, it also presents difficulties for the `annotation pass`_ which performs a whole-program analysis on the bytecode interpreter, argument parsing and gatewaying code in order to infer the types of all values flowing across function calls. とても重要で複雑な問題は `function argument parsing`_ にあります。Pythonは個々の関数の実行時に柔軟な方法で、引数を渡したり受け取ったりする言語です。それはこの権限を得るために最新の注意を払うだけでなく、バイトコードインタプリタ上のプログラム全体の解析を実行する `annotation pass`_ の難しさも示し、関数呼び出しは交差し、流れ、すべての値の型を推論をするための引数解析とゲートウェイ接続をプログラムします。 .. original It is for this reason that PyPy resorts to generate specialized frame classes and functions at `initialization time`_ in order to let the annotator only see rather static program flows with homogeneous name-value assignments on function invocations. それはこの理由のためにあり、PyPyはアノテータに関数呼び出し上の一様な名前と値の参照をもった静的プログラムの動きのみ参照させるために `initialization time`_ 時の特殊なフレームクラス群と関数群の生成に頼っています。 .. _`how-to guide for descriptors`: http://users.rcn.com/python/download/Descriptor.htm .. _`annotation pass`: translation.html#the-annotation-pass .. _`initialization time`: translation.html#initialization-time .. _`interpreter-level and application-level`: coding-guide.html#interpreter-level .. _`wrapped`: coding-guide.html#wrapping-rules .. _`object space`: objspace.html .. _`application level exceptions`: coding-guide.html#applevel-exceptions .. _`here`: coding-guide.html#modules .. original Bytecode Interpreter Implementation Classes ================================================ バイトコードインタプリタの実装クラス ==================================== .. _`Frame class`: .. _`Frame`: .. original Frame classes ----------------- フレームクラス -------------- .. original The concept of Frames is pervasive in executing programs and on virtual machines in particular. They are sometimes called *execution frame* because they hold crucial information regarding the execution of a Code_ object, which in turn is often directly related to a Python `Function`_. Frame instances hold the following state: フレームのコンセプトはプログラムの実行時や仮想マシンの詳細まで広きに渡ります。それらは時々 *execution frame* と呼ばれます。なぜならば、 Code_ オブジェクトの実行に関してのクリティカルな情報を保持しているからで、その次は、しばしば直接実行されるPythonの `Function`_です。フレームインスタンスは状態の流れを保持しています。 # TODO:next - the local scope holding name-value bindings, usually implemented via a "fast scope" which is an array of wrapped objects - a blockstack containing (nested) information regarding the control flow of a function (such as ``while`` and ``try`` constructs) - a value stack where bytecode interpretation pulls object from and puts results on. - a reference to the *globals* dictionary, containing module-level name-value bindings - debugging information from which a current line-number and file location can be constructed for tracebacks Moreover the Frame class itself has a number of methods which implement the actual bytecodes found in a code object. The methods of the ``PyFrame`` class are added in various files: - the class ``PyFrame`` is defined in `pypy/interpreter/pyframe.py`_. - the file `pypy/interpreter/pyopcode.py`_ add support for all Python opcode. - nested scope support is added to the ``PyFrame`` class in `pypy/interpreter/nestedscope.py`_. .. _Code: Code Class ------------ PyPy's code objects contain the same information found in CPython's code objects. They differ from Function_ objects in that they are only immutable representations of source code and don't contain execution state or references to the execution environment found in `Frames`. Frames and Functions have references to a code object. Here is a list of Code attributes: * ``co_flags`` flags if this code object has nested scopes/generators * ``co_stacksize`` the maximum depth the stack can reach while executing the code * ``co_code`` the actual bytecode string * ``co_argcount`` number of arguments this code object expects * ``co_varnames`` a tuple of all argument names pass to this code object * ``co_nlocals`` number of local variables * ``co_names`` a tuple of all names used in the code object * ``co_consts`` a tuple of prebuilt constant objects ("literals") used in the code object * ``co_cellvars`` a tuple of Cells containing values for access from nested scopes * ``co_freevars`` a tuple of Cell names from "above" scopes * ``co_filename`` source file this code object was compiled from * ``co_firstlineno`` the first linenumber of the code object in its source file * ``co_name`` name of the code object (often the function name) * ``co_lnotab`` a helper table to compute the line-numbers corresponding to bytecodes In PyPy, code objects also have the responsibility of creating their Frame_ objects via the `'create_frame()`` method. With proper parser and compiler support this would allow to create custom Frame objects extending the execution of functions in various ways. The several Frame_ classes already utilize this flexibility in order to implement Generators and Nested Scopes. .. _Function: Function and Method classes ---------------------------- The PyPy ``Function`` class (in `pypy/interpreter/function.py`_) represents a Python function. A ``Function`` carries the following main attributes: * ``func_doc`` the docstring (or None) * ``func_name`` the name of the function * ``func_code`` the Code_ object representing the function source code * ``func_defaults`` default values for the function (built at function definition time) * ``func_dict`` dictionary for additional (user-defined) function attributes * ``func_globals`` reference to the globals dictionary * ``func_closure`` a tuple of Cell references ``Functions`` classes also provide a ``__get__`` descriptor which creates a Method object holding a binding to an instance or a class. Finally, ``Functions`` and ``Methods`` both offer a ``call_args()`` method which executes the function given an `Arguments`_ class instance. .. _Arguments: .. _`function argument parsing`: Arguments Class -------------------- The Argument class (in `pypy/interpreter/argument.py`_) is responsible for parsing arguments passed to functions. Python has rather complex argument-passing concepts: - positional arguments - keyword arguments specified by name - default values for positional arguments, defined at function definition time - "star args" allowing a function to accept remaining positional arguments - "star keyword args" allow a function to accept additional arbitrary name-value bindings Moreover, a Function_ object can get bound to a class or instance in which case the first argument to the underlying function becomes the bound object. The ``Arguments`` provides means to allow all this argument parsing and also cares for error reporting. .. _`Module`: Module Class ------------------- A ``Module`` instance represents execution state usually constructed from executing the module's source file. In addition to such a module's global ``__dict__`` dictionary it has the following application level attributes: * ``__doc__`` the docstring of the module * ``__file__`` the source filename from which this module was instantiated * ``__path__`` state used for relative imports Apart from the basic Module used for importing application-level files there is a more refined ``MixedModule`` class (see `pypy/interpreter/mixedmodule.py`_) which allows to define name-value bindings both at application level and at interpreter level. See the ``__builtin__`` module's `pypy/module/__builtin__/__init__.py`_ file for an example and the higher level `chapter on Modules in the coding guide`_. .. _`__builtin__ module`: https://bitbucket.org/pypy/pypy/src/tip/pypy/module/__builtin__/ .. _`chapter on Modules in the coding guide`: coding-guide.html#modules .. _`Gateway classes`: Gateway classes ---------------------- A unique PyPy property is the ability to easily cross the barrier between interpreted and machine-level code (often referred to as the difference between `interpreter-level and application-level`_). Be aware that the according code (in `pypy/interpreter/gateway.py`_) for crossing the barrier in both directions is somewhat involved, mostly due to the fact that the type-inferring annotator needs to keep track of the types of objects flowing across those barriers. .. _typedefs: Making interpreter-level functions available at application-level +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ In order to make an interpreter-level function available at application level, one invokes ``pypy.interpreter.gateway.interp2app(func)``. Such a function usually takes a ``space`` argument and any number of positional arguments. Additionally, such functions can define an ``unwrap_spec`` telling the ``interp2app`` logic how application-level provided arguments should be unwrapped before the actual interpreter-level function is invoked. For example, `interpreter descriptors`_ such as the ``Module.__new__`` method for allocating and constructing a Module instance are defined with such code:: Module.typedef = TypeDef("module", __new__ = interp2app(Module.descr_module__new__.im_func, unwrap_spec=[ObjSpace, W_Root, Arguments]), __init__ = interp2app(Module.descr_module__init__), # module dictionaries are readonly attributes __dict__ = GetSetProperty(descr_get_dict, cls=Module), __doc__ = 'module(name[, doc])\n\nCreate a module object...' ) The actual ``Module.descr_module__new__`` interpreter-level method referenced from the ``__new__`` keyword argument above is defined like this:: def descr_module__new__(space, w_subtype, __args__): module = space.allocate_instance(Module, w_subtype) Module.__init__(module, space, None) return space.wrap(module) Summarizing, the ``interp2app`` mechanism takes care to route an application level access or call to an internal interpreter-level object appropriately to the descriptor, providing enough precision and hints to keep the type-inferring annotator happy. Calling into application level code from interpreter-level +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Application level code is `often preferable`_. Therefore, we often like to invoke application level code from interpreter-level. This is done via the Gateway's ``app2interp`` mechanism which we usually invoke at definition time in a module. It generates a hook which looks like an interpreter-level function accepting a space and an arbitrary number of arguments. When calling a function at interpreter-level the caller side does usually not need to be aware if its invoked function is run through the PyPy interpreter or if it will directly execute on the machine (after translation). Here is an example showing how we implement the Metaclass finding algorithm of the Python language in PyPy:: app = gateway.applevel(r''' def find_metaclass(bases, namespace, globals, builtin): if '__metaclass__' in namespace: return namespace['__metaclass__'] elif len(bases) > 0: base = bases[0] if hasattr(base, '__class__'): return base.__class__ else: return type(base) elif '__metaclass__' in globals: return globals['__metaclass__'] else: try: return builtin.__metaclass__ except AttributeError: return type ''', filename=__file__) find_metaclass = app.interphook('find_metaclass') The ``find_metaclass`` interpreter-level hook is invoked with five arguments from the ``BUILD_CLASS`` opcode implementation in `pypy/interpreter/pyopcode.py`_:: def BUILD_CLASS(f): w_methodsdict = f.valuestack.pop() w_bases = f.valuestack.pop() w_name = f.valuestack.pop() w_metaclass = find_metaclass(f.space, w_bases, w_methodsdict, f.w_globals, f.space.wrap(f.builtin)) w_newclass = f.space.call_function(w_metaclass, w_name, w_bases, w_methodsdict) f.valuestack.push(w_newclass) Note that at a later point we can rewrite the ``find_metaclass`` implementation at interpreter-level and we would not have to modify the calling side at all. .. _`often preferable`: coding-guide.html#app-preferable .. _`interpreter descriptors`: Introspection and Descriptors ------------------------------ Python traditionally has a very far-reaching introspection model for bytecode interpreter related objects. In PyPy and in CPython read and write accesses to such objects are routed to descriptors. Of course, in CPython those are implemented in ``C`` while in PyPy they are implemented in interpreter-level Python code. All instances of a Function_, Code_, Frame_ or Module_ classes are also ``Wrappable`` instances which means they can be represented at application level. These days, a PyPy object space needs to work with a basic descriptor lookup when it encounters accesses to an interpreter-level object: an object space asks a wrapped object for its type via a ``getclass`` method and then calls the type's ``lookup(name)`` function in order to receive a descriptor function. Most of PyPy's internal object descriptors are defined at the end of `pypy/interpreter/typedef.py`_. You can use these definitions as a reference for the exact attributes of interpreter classes visible at application level. .. include:: _ref.txt