Diffstat (limited to 'docs/develop')
-rw-r--r-- docs/develop/index.rst 1
-rw-r--r-- docs/develop/qstr.rst 112
2 files changed, 113 insertions, 0 deletions
diff --git a/docs/develop/index.rst b/docs/develop/index.rst
index 64dbc4661..fff3e43d7 100644
--- a/docs/develop/index.rst
+++ b/docs/develop/index.rst
@@ -10,3 +10,4 @@ See the `getting started guide
:maxdepth: 1
cmodules.rst
+ qstr.rst
diff --git a/docs/develop/qstr.rst b/docs/develop/qstr.rst
new file mode 100644
index 000000000..1b3b9f903
--- /dev/null
+++ b/docs/develop/qstr.rst
@@ -0,0 +1,112 @@
+MicroPython string interning
+============================
+
+MicroPython uses `string interning`_ to save both RAM and ROM. This avoids
+having to store duplicate copies of the same string. Primarily, this applies to
+identifiers in your code, as something like a function or variable name is very
+likely to appear in multiple places in the code. In MicroPython an interned
+string is called a QSTR (uniQue STRing).
+
+A QSTR value (with type ``qstr``) is an index into a linked list of QSTR pools.
+QSTRs store their length and a hash of their contents for fast comparison during
+the de-duplication process. All bytecode operations that work with strings use
+a QSTR argument.
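The pool lookup described above can be sketched in Python. This is a hypothetical model, not MicroPython's C implementation: ``QstrPool``, ``qstr_find`` and ``qstr_str`` are names invented for illustration, and the 16-bit djb2-style hash is an assumption (the real hash width depends on the port configuration). Pools are chained from newest to oldest, and each pool records how many QSTRs precede it so that a global index can be mapped back to one pool entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def qstr_hash(data: bytes) -> int:
    # Assumed djb2-style hash (h * 33 XOR byte), truncated to 16 bits.
    h = 5381
    for b in data:
        h = ((h * 33) ^ b) & 0xFFFF
    return h or 1  # keep 0 free to mean "no hash computed"


@dataclass
class QstrPool:
    # Hypothetical model of one pool in the linked list.
    prev: Optional["QstrPool"]  # next-older pool in the chain, or None
    total_prev_len: int  # number of QSTRs stored in all older pools
    # Each entry is (hash, length, data), mirroring what a QSTR stores.
    entries: List[Tuple[int, int, bytes]] = field(default_factory=list)

    def add(self, data: bytes) -> int:
        self.entries.append((qstr_hash(data), len(data), data))
        return self.total_prev_len + len(self.entries) - 1


def qstr_find(pool: Optional[QstrPool], data: bytes) -> Optional[int]:
    """Return the QSTR index of data, searching the newest pool first."""
    h = qstr_hash(data)
    while pool is not None:
        for i, (eh, elen, edata) in enumerate(pool.entries):
            # Compare hash and length first; compare bytes only if both match.
            if eh == h and elen == len(data) and edata == data:
                return pool.total_prev_len + i
        pool = pool.prev
    return None


def qstr_str(pool: QstrPool, q: int) -> bytes:
    """Map a QSTR index back to its string data."""
    while q < pool.total_prev_len:
        pool = pool.prev
    return pool.entries[q - pool.total_prev_len][2]


# A constant pool followed by one run-time pool chained onto it.
const_pool = QstrPool(prev=None, total_prev_len=0)
for s in (b"", b"print", b"len"):
    const_pool.add(s)
rt_pool = QstrPool(prev=const_pool, total_prev_len=len(const_pool.entries))
foo = rt_pool.add(b"foo")
print(foo, qstr_find(rt_pool, b"len"), qstr_str(rt_pool, foo))  # 3 2 b'foo'
```

Checking the cheap hash and length fields before the bytes themselves is what makes the comparison fast during de-duplication.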
+
+Compile-time QSTR generation
+----------------------------
+
+In the MicroPython C code, any strings that should be interned in the final
+firmware are written as ``MP_QSTR_Foo``. At compile time this evaluates to a
+``qstr`` value that is the index of ``"Foo"`` in the QSTR pool.
+
+A multi-step process in the ``Makefile`` makes this work. In summary, this
+process has three parts:
+
+1. Find all ``MP_QSTR_Foo`` tokens in the code.
+
+2. Generate a static QSTR pool containing all the string data (including lengths
+   and hashes).
+
+3. Replace each ``MP_QSTR_Foo`` (via the preprocessor) with its corresponding
+   index.
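The three parts can be illustrated with a toy Python model. It is only a sketch: ``build_pool`` and ``substitute`` are hypothetical names, the token regular expression is an assumption, and in the real build step 3 is carried out by the C preprocessor through generated headers rather than by text substitution. Indices come from sorted order here purely to make the output deterministic.

```python
import re

# Assumed pattern for an MP_QSTR_Foo token; captures the "Foo" part.
QSTR_TOKEN = re.compile(r"\bMP_QSTR_([A-Za-z0-9_]+)")


def build_pool(sources):
    """Hypothetical steps 1 and 2: find every MP_QSTR_Foo and give each name an index."""
    names = set()
    for text in sources:
        names.update(QSTR_TOKEN.findall(text))
    # Duplicates collapse into one entry; sorting only makes the toy deterministic.
    return {name: i for i, name in enumerate(sorted(names))}


def substitute(text, pool):
    """Hypothetical step 3: replace each MP_QSTR_Foo token with its index."""
    return QSTR_TOKEN.sub(lambda m: str(pool[m.group(1)]), text)


srcs = ["x = MP_QSTR_len;", "y = MP_QSTR_print; z = MP_QSTR_len;"]
pool = build_pool(srcs)
print(pool)  # {'len': 0, 'print': 1}
print(substitute(srcs[1], pool))  # y = 1; z = 0;
```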
+
+``MP_QSTR_Foo`` tokens are searched for in two sources:
+
+1. All files referenced in ``$(SRC_QSTR)``. This covers all of MicroPython's
+   own C code (e.g. ``py``, ``extmod``, ``ports/stm32``) but not third-party
+   code such as ``lib``.
+
+2. Additional ``$(QSTR_GLOBAL_DEPENDENCIES)`` (which includes ``mpconfig*.h``).
+
+*Note:* ``frozen_mpy.c`` (generated by ``mpy-tool.py``) has its own QSTR
+generation and pool.
+
+Some additional strings that can't be expressed using the ``MP_QSTR_Foo`` syntax
+(e.g. because they contain non-alphanumeric characters) are provided explicitly
+in ``qstrdefs.h`` and ``qstrdefsport.h`` via the ``$(QSTR_DEFS)`` variable.
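Whether a string can use the ``MP_QSTR_Foo`` syntax comes down to whether ``Foo`` can form the tail of a C identifier. The helper below, ``needs_explicit_qdef``, is a hypothetical sketch of that rule of thumb, not a function from the MicroPython build scripts, and it only considers ASCII identifier characters.

```python
import re

# Hypothetical check. MP_QSTR_ already supplies a leading letter, so the rest
# of the token may be any run of ASCII letters, digits and underscores
# (including an empty run).
_IDENT_TAIL = re.compile(r"[A-Za-z0-9_]*")


def needs_explicit_qdef(s: str) -> bool:
    """True if s cannot be written as MP_QSTR_<s> and must be listed explicitly."""
    return _IDENT_TAIL.fullmatch(s) is None


for s in ("__init__", "len", "<module>", "foo.bar"):
    print(s, needs_explicit_qdef(s))
```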
+
+Processing happens in the following stages:
+
+1. ``qstr.i.last`` is the concatenated output of running every input file
+   through the C preprocessor. Conditionally disabled code is removed and
+   macros are expanded, so strings that won't be used in the final firmware
+   are never added to the pool. Because there is no definition for
+   ``MP_QSTR_Foo`` at this stage (thanks to the ``NO_QSTR`` macro added by
+   ``QSTR_GEN_EXTRA_CFLAGS``), these tokens pass through unaffected. The file
+   also contains line markers from the preprocessor that record file and
+   line-number information. Note that this step only processes files that have
+   changed since the last build, so ``qstr.i.last`` contains data from those
+   files only.
+
+2. ``qstr.split`` is an empty file created after running ``makeqstrdefs.py split``
+   on ``qstr.i.last``; it serves only as a dependency marking that the step ran.
+   The script writes one file per input C file, ``genhdr/qstr/...file.c.qstr``,
+   containing only the matched QSTRs, each printed as ``Q(Foo)``. This step is
+   necessary to merge the existing per-file data with the new data from the
+   incremental update in ``qstr.i.last``.
+
+3. ``qstrdefs.collected.h`` is the output of concatenating ``genhdr/qstr/*``
+   using ``makeqstrdefs.py cat``. This is the full set of ``MP_QSTR_Foo``
+   tokens found in the code, formatted as ``Q(Foo)``, one per line, with
+   duplicates still present. The file is only updated if the set of QSTRs has
+   changed: a hash of the QSTR data is written to a separate file
+   (``qstrdefs.collected.h.hash``) so that changes can be tracked across builds.
+
+4. ``qstrdefs.preprocessed.h`` adds in the QSTRs from ``qstrdefs*.h``. It
+   concatenates ``qstrdefs.collected.h`` with ``qstrdefs*.h``, then transforms
+   each line from ``Q(Foo)`` to ``"Q(Foo)"`` so that it passes through the
+   preprocessor unchanged. The preprocessor is then run to handle any
+   conditional compilation in ``qstrdefs*.h``. Finally, the transformation is
+   undone back to ``Q(Foo)`` and the result is saved as
+   ``qstrdefs.preprocessed.h``.
+
+5. ``qstrdefs.generated.h`` is the output of ``makeqstrdata.py``. For each
+   ``Q(Foo)`` in ``qstrdefs.preprocessed.h`` (plus some extra hard-coded ones),
+   it outputs ``QDEF(MP_QSTR_Foo, (const byte*)"hash" "Foo")``.
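Stages 2 to 4 can be modelled in a few lines of Python. This is a simplified sketch, not the code of ``makeqstrdefs.py``: the function names are hypothetical, GCC-style line markers (``# 1 "py/obj.c"``) are assumed to identify the source file, and MD5 is our own choice of hash for change detection. The quoting helpers mirror the ``Q(Foo)`` to ``"Q(Foo)"`` round trip described in stage 4.

```python
import hashlib
import re

LINE_MARKER = re.compile(r'^# \d+ "([^"]+)"')  # assumed GCC-style marker
QSTR_TOKEN = re.compile(r"\bMP_QSTR_([A-Za-z0-9_]+)")


def split_by_file(preprocessed: str) -> dict:
    """Hypothetical stage 2: map each file named by a line marker to its Q(...) lines."""
    per_file = {}
    current = None
    for line in preprocessed.splitlines():
        m = LINE_MARKER.match(line)
        if m:
            current = m.group(1)  # following lines came from this file
            continue
        if current is not None:
            for name in QSTR_TOKEN.findall(line):
                per_file.setdefault(current, []).append("Q(%s)" % name)
    return per_file


def collect(per_file: dict, old_hash: str):
    """Hypothetical stage 3: concatenate every list (duplicates kept) and hash it.

    Returns (text, new_hash, changed); the caller rewrites the collected
    header, and the stored hash, only when changed is True.
    """
    text = "".join(q + "\n" for f in sorted(per_file) for q in per_file[f])
    new_hash = hashlib.md5(text.encode()).hexdigest()
    return text, new_hash, new_hash != old_hash


def protect(line: str) -> str:
    """Hypothetical stage 4: wrap Q(Foo) in quotes so the preprocessor leaves it alone."""
    return '"%s"' % line if line.startswith("Q(") else line


def unprotect(line: str) -> str:
    """Hypothetical stage 4: undo protect() once preprocessing is finished."""
    if line.startswith('"Q(') and line.endswith(')"'):
        return line[1:-1]
    return line


pre = (
    '# 1 "py/a.c"\n'
    "x = MP_QSTR_len;\n"
    '# 1 "py/b.c"\n'
    "y = MP_QSTR_len + MP_QSTR_print;\n"
)
parts = split_by_file(pre)
text, h, changed = collect(parts, old_hash="")
print(parts)  # {'py/a.c': ['Q(len)'], 'py/b.c': ['Q(len)', 'Q(print)']}
print(collect(parts, old_hash=h)[2])  # False: nothing to rewrite
```

Skipping the rewrite when the hash is unchanged keeps the header's timestamp stable, so files that depend on it are not rebuilt needlessly.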
+
+Then in the main compile, two things happen with ``qstrdefs.generated.h``:
+
+1. In ``qstr.h``, each ``QDEF`` becomes an entry in an enum, which makes
+   ``MP_QSTR_Foo`` available to code and equal to the index of that string in
+   the QSTR table.
+
+2. In ``qstr.c``, the actual QSTR data table is generated as the elements of
+   ``mp_qstr_const_pool->qstrs``.
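The output of stage 5 and its two uses in the main compile can be sketched together in Python. The encoding is an assumption: a djb2-style 16-bit hash emitted as two little-endian bytes followed by a one-byte length, whereas the real widths depend on the port configuration; the extra hard-coded entries are left out. ``make_qdef`` and ``build_enum_and_table`` are hypothetical names. The point is that one list is expanded twice, in the same order, so each enum value is exactly the position of its string in the table.

```python
def qstr_hash(data: bytes) -> int:
    # Assumed djb2-style hash (h * 33 XOR byte), truncated to 16 bits.
    h = 5381
    for b in data:
        h = ((h * 33) ^ b) & 0xFFFF
    return h or 1


def make_qdef(name: str) -> str:
    """Hypothetical stage 5: render Q(name) as a QDEF(...) line.

    Assumes name is identifier-safe and at most 255 bytes long.
    """
    data = name.encode()
    h = qstr_hash(data)
    # Assumed layout: hash low byte, hash high byte, then a 1-byte length.
    prefix = "".join("\\x%02x" % b for b in (h & 0xFF, h >> 8, len(data)))
    # The string data goes in its own literal so that a leading hex digit in it
    # (as in "abc") is not absorbed into the preceding \x escape.
    return 'QDEF(MP_QSTR_%s, (const byte*)"%s" "%s")' % (name, prefix, name)


def build_enum_and_table(names):
    """Hypothetical main compile: the same QDEF list feeds qstr.h and qstr.c."""
    # qstr.h: each QDEF becomes an enum member whose value is its position.
    enum = {"MP_QSTR_" + n: i for i, n in enumerate(names)}
    # qstr.c: the same list, in the same order, fills the constant pool.
    table = [n.encode() for n in names]
    return enum, table


names = ["len", "print"]
for n in names:
    print(make_qdef(n))
enum, table = build_enum_and_table(names)
print(table[enum["MP_QSTR_print"]])  # b'print'
```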
+
+.. _`string interning`: https://en.wikipedia.org/wiki/String_interning
+
+Run-time QSTR generation
+------------------------
+
+Additional QSTR pools can be created at runtime so that strings can be added to
+them. For example, the code::
+
+    getattr(foo, x)
+
+will need to create a QSTR for the value of ``x`` so it can be used for the
+attribute lookup (the same operation the "load attr" bytecode performs).
+
+Also, when compiling Python code, identifiers and literals need to have QSTRs
+created. Note: only literals shorter than 10 characters become QSTRs. This is
+because a regular string on the heap always takes up a minimum of 16 bytes (one
+GC block), whereas QSTRs allow them to be packed more efficiently into the pool.
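The trade-off can be put into numbers with a short sketch. The 16-byte GC block and the 10-character limit come from the text above; the per-QSTR overhead of 4 bytes (a 2-byte hash, a 1-byte length and a NUL terminator) is an assumption for illustration, as are the helper names.

```python
GC_BLOCK_BYTES = 16  # one GC block, the minimum heap allocation
MAX_QSTR_LITERAL = 10  # literals shorter than this become QSTRs
QSTR_OVERHEAD = 2 + 1 + 1  # assumed: 2-byte hash, 1-byte length, NUL


def literal_becomes_qstr(literal: bytes) -> bool:
    # Hypothetical helper applying the length rule from the text.
    return len(literal) < MAX_QSTR_LITERAL


def heap_bytes(n: int) -> int:
    """Heap cost of n bytes of string data: whole GC blocks, at least one."""
    blocks = max(1, -(-n // GC_BLOCK_BYTES))  # ceiling division
    return blocks * GC_BLOCK_BYTES


def qstr_pool_bytes(n: int) -> int:
    """Bytes used when the same data is packed into a QSTR pool (assumed layout)."""
    return n + QSTR_OVERHEAD


for lit in (b"ok", b"a longer literal"):
    print(lit, literal_becomes_qstr(lit), heap_bytes(len(lit)), qstr_pool_bytes(len(lit)))
```

For a 2-byte literal the heap would spend a full 16-byte block, while the pool (under the assumed layout) needs only 6 bytes; for long literals the saving disappears, which is why only short ones are interned.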
+
+QSTR pools (and the underlying "chunks" that store the string data) are allocated
+on demand on the heap, with a minimum size.
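Run-time interning into a growing chain of pools can be modelled as below. The class name, the minimum pool size of 16 entries and the doubling growth policy are all assumptions made for the sketch; the text above only states that pools are allocated on demand with a minimum size. Interning a string that is already present returns its existing index, which is what avoids storing duplicate copies of a name such as the value of ``x``.

```python
class RuntimeQstrTable:
    """Hypothetical model: a constant pool plus run-time pools chained after it."""

    MIN_POOL_ENTRIES = 16  # assumed minimum size of a newly allocated pool

    def __init__(self, const_strings):
        # Pool 0 holds the compile-time QSTRs and is exactly full.
        self.pools = [list(const_strings)]
        self.capacity = [len(const_strings)]

    def intern(self, s: bytes) -> int:
        """Return the QSTR index for s, adding it to the newest pool if needed."""
        base = 0
        for pool in self.pools:
            if s in pool:
                return base + pool.index(s)  # already interned: reuse it
            base += len(pool)
        # Not found: allocate a new pool on demand if the newest one is full.
        # Assumed policy: double the previous capacity, never below the minimum.
        if len(self.pools[-1]) >= self.capacity[-1]:
            new_capacity = max(self.MIN_POOL_ENTRIES, 2 * self.capacity[-1])
            self.pools.append([])
            self.capacity.append(new_capacity)
        self.pools[-1].append(s)
        return base  # base equals the number of QSTRs that existed before s


table = RuntimeQstrTable([b"", b"len", b"print"])
x = b"foo"  # e.g. a name only known at run time
q1 = table.intern(x)
q2 = table.intern(b"foo")
print(q1, q2, len(table.pools), table.capacity)  # 3 3 2 [3, 16]
```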