path: root/tools/mpy-tool.py
Age | Commit message | Author
2025-03-05 | tools/mpy-tool.py: Support calling main() from an external script. | Volodymyr Shymanskyy
Signed-off-by: Volodymyr Shymanskyy <vshymanskyi@gmail.com> Signed-off-by: Angus Gratton <angus@redyak.com.au>
2025-03-05 | tools/mpy-tool.py: Add support for self-hosting of mpy-tool. | Volodymyr Shymanskyy
This allows running mpy-tool using MicroPython itself. An appropriate test is added to CI to make sure it continues to work. Signed-off-by: Volodymyr Shymanskyy <vshymanskyi@gmail.com> Signed-off-by: Angus Gratton <angus@redyak.com.au>
2024-09-04 | tools/mpy-tool.py: Support freezing rv32imc native code. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2024-06-27 | tools/mpy-tool.py: Implement freezing of long-long ints. | Yoctopuce
Allow inclusion of large integer constants in frozen files using long-long representation (mpy-cross option -mlongint-impl=longlong). Signed-off-by: Yoctopuce <dev@yoctopuce.com>
2024-06-21 | mpy-cross: Add RISC-V RV32IMC support in MPY files. | Alessandro Gatti
MPY files can now hold generated RV32IMC native code. This can be accomplished by passing the `-march=rv32imc` flag to mpy-cross. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2024-03-28 | py/persistentcode: Bump .mpy sub-version to 6.3. | Damien George
This is required because the .mpy native ABI was changed by the introduction of `mp_proto_fun_t`, see commits:
- 416465d81e911b088836f4e7c37fac2bc0f67917
- 5e3006f1172d0eabbbefeb3268dfb942ec7cf9cd
- e2ff00e81113d7a3f32f860652017644b5d68bf1
And three `mp_binary` functions were added to `mp_fun_table` in commit d2276f0d41c2fa66a224725fdb2411846c91cf1a.
Signed-off-by: Damien George <damien@micropython.org>
2024-03-19 | tools/mpy-tool.py: Fix merging of more than 128 mpy files. | Damien George
The argument to MP_BC_MAKE_FUNCTION (raw code index) was being encoded as a byte instead of a variable unsigned int. That meant that if there were more than 128 merged mpy files the encoding would be invalid. Fix that by using `mp_encode_uint(idx)` to encode the raw code index. And also use `Opcode` constants for the opcode values to make it easier to understand the code. Signed-off-by: Damien George <damien@micropython.org>
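To illustrate why a single byte stops working at index 128, here is a minimal sketch of a variable-length unsigned encoding, assuming the common scheme of 7 payload bits per byte with the top bit marking "more bytes follow". The names `encode_uint` and `decode_uint` are hypothetical stand-ins; the real `mp_encode_uint` in mpy-tool.py may differ in detail.

```python
def encode_uint(val):
    """Encode a non-negative int as 7-bit groups, most significant group first.

    Assumption of this sketch: every byte except the last has bit 7 set.
    """
    if val < 0:
        raise ValueError("expected a non-negative integer")
    out = bytearray([val & 0x7F])  # final byte, continuation bit clear
    val >>= 7
    while val:
        out.insert(0, 0x80 | (val & 0x7F))  # earlier bytes carry the continuation bit
        val >>= 7
    return bytes(out)


def decode_uint(data, pos=0):
    """Decode one value starting at pos; return (value, next_pos)."""
    val = 0
    while True:
        b = data[pos]
        pos += 1
        val = (val << 7) | (b & 0x7F)
        if not b & 0x80:
            return val, pos
```

Under this scheme indices 0 to 127 fit in one byte, while index 128 needs two, which is where writing the index as a raw byte would produce an invalid stream.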
2024-02-16 | py/emitglue: Include fun_data_len in mp_raw_code_t only when saving. | Damien George
Reduces the size of mp_raw_code_t in the case when MICROPY_DEBUG_PRINTERS is enabled. Signed-off-by: Damien George <damien@micropython.org>
2024-02-16 | tools/mpy-tool.py: Skip generating frozen mp_raw_code_t when possible. | Damien George
This reduces frozen code size by using the bytecode directly as the `mp_proto_fun_t`. Signed-off-by: Damien George <damien@micropython.org>
2024-02-16 | py/emitglue: Introduce mp_proto_fun_t as a more general mp_raw_code_t. | Damien George
Allows bytecode itself to be used instead of an mp_raw_code_t in the simple and common cases of a bytecode function without any children. This can be used to further reduce frozen code size, and has the potential to optimise other areas like importing. Signed-off-by: Damien George <damien@micropython.org>
2024-02-16 | py/emitglue: Simplify mp_raw_code_t's kind and scope_flags members. | Damien George
To simplify their access and reduce code size. The `scope_flags` member is only ever used to determine if a function is a generator or not, so make it reflect that fact as a bool type. Signed-off-by: Damien George <damien@micropython.org>
2024-02-16 | py/emitglue: Provide a truncated mp_raw_code_t for non-asm code. | Damien George
The `asm_n_pos_args` and `asm_type_sig` members of `mp_raw_code_t` are only used for raw codes of type MP_CODE_NATIVE_ASM, which are rare, for example in frozen code. So using a truncated `mp_raw_code_t` in these cases helps to reduce frozen code size on targets that have MICROPY_EMIT_INLINE_ASM enabled. With this, change in firmware size of RPI_PICO builds is -648. Signed-off-by: Damien George <damien@micropython.org>
2024-02-16 | py/emitglue: Reorder and resize members of mp_raw_code_t. | Damien George
The mp_raw_code_t struct has been reordered and some members resized. The `n_pos_args` member is renamed to `asm_n_pos_args`, and `type_sig` renamed to `asm_type_sig` to indicate that these are used only for the inline-asm emitters. These two members are also grouped together in the struct.
The justifications for resizing the members are:
- `fun_data_len` can be 32-bits without issue
- `n_children` is already limited to 16-bits by `mp_emit_common_t::ct_cur_child`
- `scope_flags` is already limited to 16-bits by `scope_t::scope_flags`
- `prelude_offset` is already limited to 16-bits by the argument to `mp_emit_glue_assign_native()`
- it's reasonable to limit the maximum number of inline-asm arguments to 12 (24 bits for `asm_type_sig` divided by 2)
This change helps to reduce frozen code size (and in some cases RAM usage) in the following cases:
- 64-bit targets
- builds with MICROPY_PY_SYS_SETTRACE enabled
- builds with MICROPY_EMIT_MACHINE_CODE enabled but MICROPY_EMIT_INLINE_ASM disabled
With this change, unix 64-bit builds are -4080 bytes in size. Bare-metal ports like rp2 are unchanged (because mp_raw_code_t is still 32 bytes on those 32-bit targets).
Signed-off-by: Damien George <damien@micropython.org>
2024-02-12 | tools/mpy-tool.py: Fix static qstrs when freezing without qstr header. | Damien George
It's rare to freeze .mpy files without specifying a qstr header from a firmware build, but it can be useful for testing, eg `mpy-tool.py -f test.mpy`. Fix this case so static qstrs are properly excluded from the frozen qstr list. Signed-off-by: Damien George <damien@micropython.org>
2024-01-25 | py/qstr: Add support for MICROPY_QSTR_BYTES_IN_HASH=0. | Jim Mussared
This disables using qstr hashes altogether, which saves RAM and flash (two bytes per interned string on a typical build) as well as code size. On PYBV11 this is worth over 3k flash. qstr comparison will now be done just by length then data. This affects qstr_find_strn although this has a negligible performance impact as, for a given comparison, the length and first character will ~usually be different anyway. String hashing (e.g. builtin `hash()` and map.c) now need to compute the hash dynamically, and for the map case this does come at a performance cost. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-11-03 | all: Update Python formatting to ruff-format. | Jim Mussared
This updates a small number of files that change with ruff-format's (vs black's) rules. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-10-30 | py/qstr: Add support for sorted qstr pools. | Jim Mussared
This provides a significant performance boost for qstr_find_strn, which is called a lot during parsing and loading of .mpy files, as well as interning of string objects (which happens in most string methods that return new strings). Also adds comments to explain the "static" qstrs. These are part of the .mpy ABI and avoid needing to duplicate string data for QSTRs known to already be in the firmware. The static pool isn't currently sorted, but in the future we could either split the static pool into the sorted regions, or in the next .mpy version just sort them. Based on initial work done by @amirgon in #6896. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
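The speed-up from a sorted pool comes from replacing a linear scan with a binary search. The sketch below shows the idea in Python using the standard `bisect` module. It assumes the pool is sorted in plain byte order, which the commit message does not specify, and `find_linear` and `find_sorted` are hypothetical names rather than the C functions in py/qstr.c.

```python
import bisect


def find_linear(pool, s):
    """Unsorted pool: compare against every entry until a match is found."""
    for i, q in enumerate(pool):
        if q == s:
            return i
    return None


def find_sorted(sorted_pool, s):
    """Sorted pool: binary search, O(log n) comparisons instead of O(n)."""
    i = bisect.bisect_left(sorted_pool, s)
    if i < len(sorted_pool) and sorted_pool[i] == s:
        return i
    return None
```

Both functions return the same index when given the same sorted list; only the number of comparisons differs.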
2023-10-16 | py/persistentcode: Bump .mpy sub-version. | Damien George
This is required because the previous commit changed the .mpy native ABI. Signed-off-by: Damien George <damien@micropython.org>
2023-08-16 | tools/mpy-tool.py: Ignore linter failure in Python 2 compatibility code. | Angus Gratton
Found by Ruff checking F821. Signed-off-by: Angus Gratton <angus@redyak.com.au>
2023-08-09 | tools/mpy-tool.py: Use isinstance() for type checking. | Angus Gratton
Ruff version 283 expanded E721 to fail when making direct comparison against a built-in type. Change the code to use isinstance() as suggested; these usages appear to have equivalent functionality. Signed-off-by: Angus Gratton <angus@redyak.com.au>
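As a general illustration of the two styles (not code taken from mpy-tool.py), the sketch below contrasts a direct type comparison with `isinstance()`. The helper names are hypothetical. The two checks agree for exact instances and differ only for subclasses, which is presumably why the change could be treated as equivalent for the values the tool handles.

```python
def is_int_by_type(x):
    # Direct comparison: the style flagged by Ruff rule E721.
    return type(x) == int  # noqa: E721


def is_int_by_isinstance(x):
    # Preferred style: also accepts subclasses of int.
    return isinstance(x, int)
```

For a plain `int` both return `True`; for `True` (a `bool`, which subclasses `int`) only the `isinstance()` version does.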
2023-05-02 | all: Fix cases of Python variable assigned but never used. | Christian Clauss
This fixes ruff rule F841.
2023-02-01 | tools/mpy-tool.py: Initialize line_info_top. | Martin Milata
Without it the line number mapping doesn't work. Signed-off-by: Martin Milata <martin@martinmilata.cz>
2022-10-25 | py/persistentcode: Only emit sub-version if generated code has native. | Jim Mussared
In order for v1.19.1 to load a .mpy, the formerly-feature-flags which are now used for the sub-version must be zero. The sub-version is only used to indicate a native version change, so it should be zero when emitting bytecode-only .mpy files. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-09-19 | py/persistentcode: Introduce .mpy sub-version. | Jim Mussared
The intent is to allow us to make breaking changes to the native ABI (e.g. changes to dynruntime.h) without needing the bytecode version to increment. With this commit the two bits previously used for the feature flags (but now unused as of .mpy version 6) encode a sub-version. A bytecode-only .mpy file can be loaded as long as MPY_VERSION matches, but a native .mpy (i.e. one with an arch set) must also match MPY_SUB_VERSION. This allows 3 additional updates to the native ABI per bytecode revision. The sub-version is set to 1 because the previous commits that changed the layout of mp_obj_type_t have changed the native ABI. Signed-off-by: Jim Mussared <jim.mussared@gmail.com> Signed-off-by: Damien George <damien@micropython.org>
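The loading rule described above can be sketched as a small compatibility check. This is an illustration only: the bit layout (sub-version in the low two bits of the feature byte, architecture in the bits above) is an assumption of the sketch, and `can_load`, `ARCH_NONE` and the constant values are hypothetical names chosen to match the numbers in the commit message.

```python
MPY_VERSION = 6      # bytecode version, per the commit message
MPY_SUB_VERSION = 1  # value set by this commit
ARCH_NONE = 0        # assumed marker for a bytecode-only file


def can_load(version, feature_byte,
             fw_version=MPY_VERSION, fw_sub_version=MPY_SUB_VERSION):
    """Return True if a .mpy header is acceptable to this (toy) firmware."""
    sub_version = feature_byte & 0x03  # assumed: two low bits
    arch = feature_byte >> 2           # assumed: remaining bits
    if version != fw_version:
        return False
    if arch == ARCH_NONE:
        return True  # bytecode-only: sub-version is not checked
    return sub_version == fw_sub_version
```

Two bits give the values 0 to 3, which matches the stated budget of three native ABI updates (sub-versions 1, 2 and 3) per bytecode revision.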
2022-06-07 | tools/mpy-tool.py: Improve generated frozen identifiers. | Damien George
Frozen identifiers now include their full name hierarchy, eg their class name. This makes it easier to understand the generated code. Signed-off-by: Damien George <damien@micropython.org>
2022-06-07 | tools/mpy-tool.py: Rework .mpy merging feature. | Damien George
Now that the native qstr link table is gone, merging a native .mpy file with a bytecode .mpy file is not as simple as concatenating the .mpy data. The qstr_table and obj_table tables from all merged .mpy files must now be joined together, because they are global to the .mpy file (and hence global to the merged .mpy file). This means the bytecode needs to be decoded, qstr_table and obj_table indices updated to point to the correct entries in the new tables, and then the bytecode re-encoded. This commit makes this change to the merging feature in mpy-tool.py. This can now merge an arbitrary number of bytecode .mpy files, and up to one native .mpy file. Signed-off-by: Damien George <damien@micropython.org>
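The join-and-remap step can be sketched as follows. This is a simplified model rather than the tool's implementation: instructions are represented as already-decoded `(opcode, table_index)` tuples, `merge_tables` and `remap_instructions` are hypothetical names, and deduplicating equal entries is a choice made for the sketch.

```python
def merge_tables(tables):
    """Join per-file tables into one; return (merged, per-file index remaps)."""
    merged = []
    index_of = {}
    remaps = []
    for table in tables:
        remap = []
        for entry in table:
            if entry not in index_of:
                index_of[entry] = len(merged)
                merged.append(entry)
            remap.append(index_of[entry])  # old local index -> new merged index
        remaps.append(remap)
    return merged, remaps


def remap_instructions(instructions, remap):
    """Rewrite table indices in decoded (opcode, index) pairs.

    An index of None marks an instruction that has no table argument.
    """
    return [
        (op, None if idx is None else remap[idx])
        for op, idx in instructions
    ]
```

After remapping, each file's instructions would be re-encoded against the merged table; that encoding step is outside the scope of this sketch.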
2022-06-07 | py/bc: Remove unused mp_opcode_format function. | Damien George
This was made redundant by f2040bfc7ee033e48acef9f289790f3b4e6b74e5, which also did not update this function for the change to qstr-opcode encoding, so it does not work correctly anyway. Signed-off-by: Damien George <damien@micropython.org>
2022-06-07 | py/persistentcode: Remove remaining native qstr linking support. | Damien George
Support for architecture-specific qstr linking was removed in d4d53e9e114d779523e382c4ea38f0398e880aae, where native code was changed to access qstr values via qstr_table. The only remaining use for the special qstr link table in persistentcode.c is to support native modules written in C, linked via mpy_ld.py. But native modules can also use the standard module-level qstr_table (and obj_table) which was introduced in the .mpy file reworking in f2040bfc7ee033e48acef9f289790f3b4e6b74e5. This commit removes the remaining native qstr linking support in persistentcode.c's load_raw_code function, and adds two new relocation options for constants.qstr_table and constants.obj_table. mpy_ld.py is updated to use these relocation options instead of the native qstr link table. Signed-off-by: Damien George <damien@micropython.org>
2022-05-26 | tools/mpy-tool.py: Remove obsolete unicode flag in .mpy header. | Damien George
This was removed in c49d5207e9437755be364639632be31c001955a8. Signed-off-by: Damien George <damien@micropython.org>
2022-05-23 | py/emitnative: Access qstr values using indirection table qstr_table. | Damien George
This changes the native emitter to access qstr values using the qstr indirection table qstr_table, but only when generating native code that will be saved to a .mpy file. This makes the resulting native code fully static, ie it does not require any fix-ups or rewriting when it is imported.
The performance of native code is more or less unchanged. Benchmark results on PYBv1.0 (using --via-mpy and --emit native) are:
    N=100 M=100 baseline -> this-commit diff diff% (error%)
    bm_chaos.py 407.16 -> 411.85 : +4.69 = +1.152% (+/-0.01%)
    bm_fannkuch.py 100.89 -> 101.20 : +0.31 = +0.307% (+/-0.01%)
    bm_fft.py 3521.17 -> 3441.72 : -79.45 = -2.256% (+/-0.00%)
    bm_float.py 6707.29 -> 6644.83 : -62.46 = -0.931% (+/-0.00%)
    bm_hexiom.py 55.91 -> 55.41 : -0.50 = -0.894% (+/-0.00%)
    bm_nqueens.py 5343.54 -> 5326.17 : -17.37 = -0.325% (+/-0.00%)
    bm_pidigits.py 603.89 -> 632.79 : +28.90 = +4.786% (+/-0.33%)
    core_qstr.py 64.18 -> 64.09 : -0.09 = -0.140% (+/-0.01%)
    core_yield_from.py 313.61 -> 311.11 : -2.50 = -0.797% (+/-0.03%)
    misc_aes.py 654.29 -> 659.75 : +5.46 = +0.834% (+/-0.02%)
    misc_mandel.py 4205.10 -> 4272.08 : +66.98 = +1.593% (+/-0.01%)
    misc_pystone.py 3077.79 -> 3128.39 : +50.60 = +1.644% (+/-0.01%)
    misc_raytrace.py 388.45 -> 393.71 : +5.26 = +1.354% (+/-0.01%)
    viper_call0.py 576.83 -> 566.76 : -10.07 = -1.746% (+/-0.05%)
    viper_call1a.py 550.39 -> 540.12 : -10.27 = -1.866% (+/-0.11%)
    viper_call1b.py 438.32 -> 432.09 : -6.23 = -1.421% (+/-0.11%)
    viper_call1c.py 442.96 -> 436.11 : -6.85 = -1.546% (+/-0.08%)
    viper_call2a.py 536.31 -> 527.37 : -8.94 = -1.667% (+/-0.04%)
    viper_call2b.py 378.99 -> 377.50 : -1.49 = -0.393% (+/-0.08%)
Signed-off-by: Damien George <damien@micropython.org>
2022-05-17 | py/emitnative: Put a pointer to the native prelude in child_table array. | Damien George
Some architectures (like esp32 xtensa) cannot read byte-wise from executable memory. This means the prelude for native functions -- which is usually located after the machine code for the native function -- must be placed in separate memory that can be read byte-wise. Prior to this commit this was achieved by enabling N_PRELUDE_AS_BYTES_OBJ for the emitter and MICROPY_EMIT_NATIVE_PRELUDE_AS_BYTES_OBJ for the runtime. The prelude was then placed in a bytes object, pointed to by the module's constant table.
This behaviour is changed by this commit so that a pointer to the prelude is stored either in mp_obj_fun_bc_t.child_table, or in mp_obj_fun_bc_t.child_table[num_children] if num_children > 0. The reasons for doing this are:
1. It decouples the native emitter from runtime requirements, the emitted code no longer needs to know if the system it runs on can/can't read byte-wise from executable memory.
2. It makes all ports have the same emitter behaviour, there is no longer the N_PRELUDE_AS_BYTES_OBJ option.
3. The module's constant table is now used only for actual constants in the Python code. This allows further optimisations to be done with the constants (eg constant deduplication).
Code size change for those ports that enable the native emitter:
    unix x64: +80 +0.015%
    stm32: +24 +0.004% PYBV10
    esp8266: +88 +0.013% GENERIC
    esp32: -20 -0.002% GENERIC[incl -112(data)]
    rp2: +32 +0.005% PICO
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 | tools/mpy-tool.py: Intern more strings when freezing. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 | tools/mpy-tool.py: Optimise freezing of str when str data is a qstr. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 | tools/mpy-tool.py: Make global qstr list a dedicated class. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 | tools/mpy-tool.py: Optimise freezing of empty str and bytes objects. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 | tools/mpy-tool.py: Optimise freezing of ints that can fit a small int. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 | tools/mpy-tool.py: Support freezing tuples and other consts. | Damien George
This also simplifies how constants are frozen. Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 | tools/mpy-tool.py: Support loading tuples from .mpy files. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2022-04-14 | py/persistentcode: Define enum values for obj types instead of letters. | Damien George
To keep the separate parts of the code that use these values in sync. And make it easier to add new object types. Signed-off-by: Damien George <damien@micropython.org>
2022-03-28 | py: Change jump-if-x-or-pop opcodes to have unsigned offset argument. | Damien George
These jumps are always forwards, and it's more efficient in the VM to decode an unsigned argument. These opcodes are already optimised versions of the sequence "dup-top pop-jump-if-x pop" so it doesn't hurt generality to optimise them further. Signed-off-by: Damien George <damien@micropython.org>
2022-03-28 | py: Change jump opcodes to emit 1-byte jump offset when possible. | Damien George
This commit introduces changes:
- All jump opcodes are changed to have variable length arguments, of either 1 or 2 bytes (previously they were fixed at 2 bytes). In most cases only 1 byte is needed to encode the short jump offset, saving bytecode size.
- The bytecode emitter now selects 1 byte jump arguments when the jump offset is guaranteed to fit in 1 byte. This is achieved by checking if the code size changed during the last pass and, if it did (if it shrank), then requesting that the compiler make another pass to get the correct offsets of the now-smaller code. This can continue multiple times until the code stabilises. The code can only ever shrink so this iteration is guaranteed to complete. In most cases no extra passes are needed, the original 4 passes are enough to get it right by the 4th pass (because the 2nd pass computes roughly the correct labels and the 3rd pass computes the correct size for the jump argument).
This change to the jump opcode encoding reduces the size of .mpy files and RAM usage (when bytecode is in RAM) by about 2% on average. The performance of the VM is not impacted, at least within the measurement error of the performance benchmark suite. Code size is reduced for builds that include a decent amount of frozen bytecode. ARM Cortex-M builds without any frozen code increase by about 350 bytes.
Signed-off-by: Damien George <damien@micropython.org>
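The "repeat until the code stops shrinking" idea can be modelled with a toy layout loop. This is a simplified sketch, not the emitter's algorithm: instructions are one opcode byte plus an optional jump argument, jumps target instruction indices, and the 1-byte range of -64 to 63 is an assumption made for illustration. The name `layout_jumps` is hypothetical.

```python
def layout_jumps(instrs, fits_one_byte=lambda off: -64 <= off <= 63):
    """Shrink jump arguments from 2 bytes to 1 until the layout is stable.

    instrs: list of ("op",) or ("jump", target_index).
    Returns (total_size, arg_sizes, passes).
    """
    arg_size = [2 if ins[0] == "jump" else 0 for ins in instrs]
    passes = 0
    while True:
        passes += 1
        # Byte offset of the start of every instruction, plus the end offset.
        offsets = [0]
        for size in arg_size:
            offsets.append(offsets[-1] + 1 + size)
        changed = False
        for i, ins in enumerate(instrs):
            if ins[0] == "jump" and arg_size[i] == 2:
                # Offset is relative to the end of the jump instruction.
                rel = offsets[ins[1]] - offsets[i + 1]
                if fits_one_byte(rel):
                    arg_size[i] = 1
                    changed = True
        if not changed:
            return offsets[-1], arg_size, passes
```

Because arguments only ever go from 2 bytes to 1, the total size is non-increasing and the loop must terminate, mirroring the argument made in the commit message.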
2022-02-28 | tools/mpy-tool.py: Fix frozen comment generation to escape chars. | robert-hh
That caused the compile of frozen_content.c to fail if characters like backslash were in a short string. Thanks to @hippy for identifying the spot to change.
2022-02-24 | py: Rework bytecode and .mpy file format to be mostly static data. | Damien George
Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk.
But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM).
The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated).
This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place.
With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware).
The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function).
In more detail, in the VM what used to be (schematically):
    qst = DECODE_QSTR_VALUE;
is now (schematically):
    idx = DECODE_QSTR_INDEX;
    qst = qstr_table[idx];
That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded.
Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before.
The following changes are measured for this commit compared to the previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before
The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is:
    diff of scores (higher is better)
    N=100 M=100 baseline -> this-commit diff diff% (error%)
    bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%)
    bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%)
    bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%)
    bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%)
    bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%)
    bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%)
    bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%)
    core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%)
    core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%)
    core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%)
    core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%)
    misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%)
    misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%)
    misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%)
    misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%)
    viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%)
    viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%)
    viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%)
    viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%)
    viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%)
    viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%)
And for unix on x64:
    diff of scores (higher is better)
    N=2000 M=2000 baseline -> this-commit diff diff% (error%)
    bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%)
    bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%)
    bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%)
    bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%)
    bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%)
    bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%)
    bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%)
    misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%)
    misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%)
    misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%)
    misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%)
The code size change is (firmware with a lot of frozen code benefits the most):
    bare-arm: +396 +0.697%
    minimal x86: +1595 +0.979% [incl +32(data)]
    unix x64: +2408 +0.470% [incl +800(data)]
    unix nanbox: +1396 +0.309% [incl -96(data)]
    stm32: -1256 -0.318% PYBV10
    cc3200: +288 +0.157%
    esp8266: -260 -0.037% GENERIC
    esp32: -216 -0.014% GENERIC[incl -1072(data)]
    nrf: +116 +0.067% pca10040
    rp2: -664 -0.135% PICO
    samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS
As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files.
In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory-mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications.
Signed-off-by: Damien George <damien@micropython.org>
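The qstr indirection described in this commit message can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: `GlobalQstrPool`, `link_qstr_table` and `load_names` are hypothetical names, and the "bytecode" here is just a sequence of local qstr indices rather than real MicroPython opcodes.

```python
class GlobalQstrPool:
    """Stand-in for the firmware's global qstr pool (hypothetical)."""

    def __init__(self, initial=()):
        self._strings = list(initial)
        self._index = {s: i for i, s in enumerate(self._strings)}

    def intern(self, s):
        """Return the global qstr number for s, adding it if needed."""
        if s not in self._index:
            self._index[s] = len(self._strings)
            self._strings.append(s)
        return self._index[s]

    def lookup(self, qst):
        return self._strings[qst]


def link_qstr_table(pool, module_qstrs):
    """Import-time step: map local index -> global qstr number."""
    return [pool.intern(s) for s in module_qstrs]


def load_names(bytecode, qstr_table, pool):
    """Toy VM step: idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]."""
    return [pool.lookup(qstr_table[idx]) for idx in bytecode]
```

The point of the model is that `bytecode` is never modified: only the small `qstr_table` list is built at import time, so the bytecode itself could stay in read-only memory.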
2022-02-11 | py/qstr: Use `const` consistently to avoid a cast. | Artyom Skrobov
Originally at adafruit#4707 Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
2022-02-11 | py/qstr: Separate hash and len from string data. | Artyom Skrobov
This allows the compiler to merge strings: e.g. "update", "difference_update" and "symmetric_difference_update" will all point to the same memory. No functional change.
The size reduction depends on the number of qstrs in the build. The change this commit brings is:
    bare-arm: -4 -0.007%
    minimal x86: +150 +0.092% [incl +48(data)]
    unix x64: -608 -0.118%
    unix nanbox: -572 -0.126% [incl +32(data)]
    stm32: -1392 -0.352% PYBV10
    cc3200: -448 -0.244%
    esp8266: -1208 -0.173% GENERIC
    esp32: -1028 -0.068% GENERIC[incl -1020(data)]
    nrf: -440 -0.252% pca10040
    rp2: -1072 -0.217% PICO
    samd: -368 -0.264% ADAFRUIT_ITSYBITSY_M4_EXPRESS
Performance is also improved (on bare metal at least) for the core_import_mpy_multi.py, core_import_mpy_single.py and core_qstr.py performance benchmarks.
Originally at adafruit#4583
Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
2021-12-18 | py: Only search frozen modules when '.frozen' is found in sys.path. | Jim Mussared
This changes makemanifest.py & mpy-tool.py to merge string and mpy names into the same list (now mp_frozen_names). The various paths for loading a frozen module (mp_find_frozen_module) and checking existence of a frozen module (mp_frozen_stat) use a common function that searches this list. In addition, the frozen lookup will now only take place if the path starts with ".frozen", which needs to be added to sys.path. This fixes issues #1804, #2322, #3509, #6419. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
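The lookup rule can be sketched as a single function that first checks the `.frozen` prefix and then searches one shared list of names. This is an illustration only: the real `mp_frozen_stat` and `mp_find_frozen_module` are C functions, and `frozen_stat`, `FROZEN_NAMES` and the `"file"`/`"dir"` return values are hypothetical choices for this sketch.

```python
# Hypothetical contents of the merged name list (mp_frozen_names in the commit).
FROZEN_NAMES = ("main.py", "pkg/__init__.py", "pkg/util.py")


def frozen_stat(path, names=FROZEN_NAMES):
    """Return "file", "dir" or None for a path, searching only under .frozen."""
    prefix = ".frozen/"
    if not path.startswith(prefix):
        return None  # frozen modules are not considered for other paths
    rel = path[len(prefix):]
    if rel in names:
        return "file"
    if any(name.startswith(rel + "/") for name in names):
        return "dir"
    return None
```

With this rule, removing `.frozen` from `sys.path` is enough to stop frozen modules from being found, since no other path prefix reaches the shared list.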
2021-09-16 | all: Remove MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE. | Jim Mussared
This commit removes all parts of code associated with the existing MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the -mcache-lookup-bc option to mpy-cross.
This feature originally provided a significant performance boost for Unix, but wasn't able to be enabled for MCU targets (due to frozen bytecode), and added significant extra complexity to generating and distributing .mpy files.
The equivalent performance gain is now provided by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has been enabled on the unix port in the previous commit).
It's hard to provide precise performance numbers, but tests have been run on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V, xtensa) and they all generally agree on the qualitative improvements seen by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE.
For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined with MAP_LOOKUP_CACHE is:
    diff of scores (higher is better)
    N=2000 M=2000 bccache -> attrmapcache diff diff% (error%)
    bm_chaos.py 13742.56 -> 13905.67 : +163.11 = +1.187% (+/-3.75%)
    bm_fannkuch.py 60.13 -> 61.34 : +1.21 = +2.012% (+/-2.11%)
    bm_fft.py 113083.20 -> 114793.68 : +1710.48 = +1.513% (+/-1.57%)
    bm_float.py 256552.80 -> 243908.29 : -12644.51 = -4.929% (+/-1.90%)
    bm_hexiom.py 521.93 -> 625.41 : +103.48 = +19.826% (+/-0.40%)
    bm_nqueens.py 197544.25 -> 217713.12 : +20168.87 = +10.210% (+/-3.01%)
    bm_pidigits.py 8072.98 -> 8198.75 : +125.77 = +1.558% (+/-3.22%)
    misc_aes.py 17283.45 -> 16480.52 : -802.93 = -4.646% (+/-0.82%)
    misc_mandel.py 99083.99 -> 128939.84 : +29855.85 = +30.132% (+/-5.88%)
    misc_pystone.py 83860.10 -> 82592.56 : -1267.54 = -1.511% (+/-2.27%)
    misc_raytrace.py 21490.40 -> 22227.23 : +736.83 = +3.429% (+/-1.88%)
This shows that the new optimisations are at least as good as the existing inline-bytecode-caching, and are sometimes much better (because the new ones apply caching to a wider variety of map lookups).
The new optimisations can also benefit code generated by the native emitter, because they apply to the runtime rather than the generated code. The improvement for the native emitter when LOAD_ATTR_FAST_PATH and MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):
    diff of scores (higher is better)
    N=2000 M=2000 native -> nat-attrmapcache diff diff% (error%)
    bm_chaos.py 14130.62 -> 15464.68 : +1334.06 = +9.441% (+/-7.11%)
    bm_fannkuch.py 74.96 -> 76.16 : +1.20 = +1.601% (+/-1.80%)
    bm_fft.py 166682.99 -> 168221.86 : +1538.87 = +0.923% (+/-4.20%)
    bm_float.py 233415.23 -> 265524.90 : +32109.67 = +13.756% (+/-2.57%)
    bm_hexiom.py 628.59 -> 734.17 : +105.58 = +16.796% (+/-1.39%)
    bm_nqueens.py 225418.44 -> 232926.45 : +7508.01 = +3.331% (+/-3.10%)
    bm_pidigits.py 6322.00 -> 6379.52 : +57.52 = +0.910% (+/-5.62%)
    misc_aes.py 20670.10 -> 27223.18 : +6553.08 = +31.703% (+/-1.56%)
    misc_mandel.py 138221.11 -> 152014.01 : +13792.90 = +9.979% (+/-2.46%)
    misc_pystone.py 85032.14 -> 105681.44 : +20649.30 = +24.284% (+/-2.25%)
    misc_raytrace.py 19800.01 -> 23350.73 : +3550.72 = +17.933% (+/-2.79%)
In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.
See #7680 for further discussion. And see also #7653 for a discussion about simplifying mpy-cross options.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-05-26 | tools/mpy-tool.py: Support relocating ARMv6 arch. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2021-01-29 | tools/mpy-tool.py: List frozen modules in MICROPY_FROZEN_LIST_ITEM. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2020-09-09 | tools/mpy-tool.py: Fix merge of multiple mpy files to POP_TOP correctly. | Damien George
MP_BC_CALL_FUNCTION will leave the result on the Python stack, so that result must be discarded by MP_BC_POP_TOP. Signed-off-by: Damien George <damien@micropython.org>