diff options
Diffstat (limited to 'docs/develop/memorymgt.rst')
-rw-r--r-- | docs/develop/memorymgt.rst | 141 |
1 files changed, 141 insertions, 0 deletions
diff --git a/docs/develop/memorymgt.rst b/docs/develop/memorymgt.rst new file mode 100644 index 000000000..5b1690cc8 --- /dev/null +++ b/docs/develop/memorymgt.rst @@ -0,0 +1,141 @@ +.. _memorymanagement: + +Memory Management +================= + +Unlike programming languages such as C/C++, MicroPython hides memory management +details from the developer by supporting automatic memory management. +Automatic memory management is a technique used by operating systems or applications to automatically manage +the allocation and deallocation of memory. This eliminates challenges such as forgetting to +free the memory allocated to an object. Automatic memory management also avoids the critical issue of using memory +that is already released. Automatic memory management takes many forms, one of them being +garbage collection (GC). + +The garbage collector usually has two responsibilities; + +#. Allocate new objects in available memory. +#. Free unused memory. + +There are many GC algorithms but MicroPython uses the +`Mark and Sweep <https://en.wikipedia.org/wiki/Tracing_garbage_collection#Basic_algorithm>`_ +policy for managing memory. This algorithm has a mark phase that traverses the heap marking all +live objects while the sweep phase goes through the heap reclaiming all unmarked objects. + +Garbage collection functionality in MicroPython is available through the ``gc`` built-in +module: + +.. code-block:: bash + + >>> x = 5 + >>> x + 5 + >>> import gc + >>> gc.enable() + >>> gc.mem_alloc() + 1312 + >>> gc.mem_free() + 2071392 + >>> gc.collect() + 19 + >>> gc.disable() + >>> + +Even when ``gc.disable()`` is invoked, collection can be triggered with ``gc.collect()``. + +The object model +---------------- + +All MicroPython objects are referred to by the ``mp_obj_t`` data type. +This is usually word-sized (i.e. the same size as a pointer on the target architecture), +and can be typically 32-bit (STM32, nRF, ESP32, Unix x86) or 64-bit (Unix x64). +It can also be greater than a word-size for certain object representations, for +example ``OBJ_REPR_D`` has a 64-bit sized ``mp_obj_t`` on a 32-bit architecture. + +An ``mp_obj_t`` represents a MicroPython object, for example an integer, float, type, dict or +class instance. Some objects, like booleans and small integers, have their value stored directly +in the ``mp_obj_t`` value and do not require additional memory. Other objects have their value +store elsewhere in memory (for example on the garbage-collected heap) and their ``mp_obj_t`` contains +a pointer to that memory. A portion of ``mp_obj_t`` is the tag which tells what type of object it is. + +See ``py/mpconfig.h`` for the specific details of the available representations. + +**Pointer tagging** + +Because pointers are word-aligned, when they are stored in an ``mp_obj_t`` the +lower bits of this object handle will be zero. For example on a 32-bit architecture +the lower 2 bits will be zero: + +``********|********|********|******00`` + +These bits are reserved for purposes of storing a tag. The tag stores extra information as +opposed to introducing a new field to store that information in the object, which may be +inefficient. In MicroPython the tag tells if we are dealing with a small integer, interned +(small) string or a concrete object, and different semantics apply to each of these. + +For small integers the mapping is this: + +``********|********|********|*******1`` + +Where the asterisks hold the actual integer value. For an interned string or an immediate +object (e.g. ``True``) the layout of the ``mp_obj_t`` value is, respectively: + +``********|********|********|*****010`` + +``********|********|********|*****110`` + +While a concrete object that is none of the above takes the form: + +``********|********|********|******00`` + +The stars here correspond to the address of the concrete object in memory. + +Allocation of objects +---------------------- + +The value of a small integer is stored directly in the ``mp_obj_t`` and will be +allocated in-place, not on the heap or elsewhere. As such, creation of small +integers does not affect the heap. Similarly for interned strings that already have +their textual data stored elsewhere, and immediate values like ``None``, ``False`` +and ``True``. + +Everything else which is a concrete object is allocated on the heap and its object structure is such that +a field is reserved in the object header to store the type of the object. + +.. code-block:: bash + + +++++++++++ + + + + + type + object header + + + + +++++++++++ + + + object items + + + + + + + +++++++++++ + +The heap's smallest unit of allocation is a block, which is four machine words in +size (16 bytes on a 32-bit machine, 32 bytes on a 64-bit machine). +Another structure also allocated on the heap tracks the allocation of +objects in each block. This structure is called a *bitmap*. + +.. image:: img/bitmap.png + +The bitmap tracks whether a block is "free" or "in use" and use two bits to track this state +for each block. + +The mark-sweep garbage collector manages the objects allocated on the heap, and also +utilises the bitmap to mark objects that are still in use. +See `py/gc.c <https://github.com/micropython/micropython/blob/master/py/gc.c>`_ +for the full implementation of these details. + +**Allocation: heap layout** + +The heap is arranged such that it consists of blocks in pools. A block +can have different properties: + +- *ATB(allocation table byte):* If set, then the block is a normal block +- *FREE:* Free block +- *HEAD:* Head of a chain of blocks +- *TAIL:* In the tail of a chain of blocks +- *MARK :* Marked head block +- *FTB(finaliser table byte):* If set, then the block has a finaliser |