summaryrefslogtreecommitdiff
path: root/tools/mpy_ld.py
AgeCommit message (Collapse)Author
2021-09-16all: Remove MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE.Jim Mussared
This commit removes all parts of code associated with the existing MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the -mcache-lookup-bc option to mpy-cross. This feature originally provided a significant performance boost for Unix, but wasn't able to be enabled for MCU targets (due to frozen bytecode), and added significant extra complexity to generating and distributing .mpy files. The equivalent performance gain is now provided by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has been enabled on the unix port in the previous commit). It's hard to provide precise performance numbers, but tests have been run on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V, xtensa) and they all generally agree on the qualitative improvements seen by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE. For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined with MAP_LOOKUP_CACHE is: diff of scores (higher is better) N=2000 M=2000 bccache -> attrmapcache diff diff% (error%) bm_chaos.py 13742.56 -> 13905.67 : +163.11 = +1.187% (+/-3.75%) bm_fannkuch.py 60.13 -> 61.34 : +1.21 = +2.012% (+/-2.11%) bm_fft.py 113083.20 -> 114793.68 : +1710.48 = +1.513% (+/-1.57%) bm_float.py 256552.80 -> 243908.29 : -12644.51 = -4.929% (+/-1.90%) bm_hexiom.py 521.93 -> 625.41 : +103.48 = +19.826% (+/-0.40%) bm_nqueens.py 197544.25 -> 217713.12 : +20168.87 = +10.210% (+/-3.01%) bm_pidigits.py 8072.98 -> 8198.75 : +125.77 = +1.558% (+/-3.22%) misc_aes.py 17283.45 -> 16480.52 : -802.93 = -4.646% (+/-0.82%) misc_mandel.py 99083.99 -> 128939.84 : +29855.85 = +30.132% (+/-5.88%) misc_pystone.py 83860.10 -> 82592.56 : -1267.54 = -1.511% (+/-2.27%) misc_raytrace.py 21490.40 -> 22227.23 : +736.83 = +3.429% (+/-1.88%) This shows that the new optimisations are at least as good as the existing inline-bytecode-caching, and are sometimes much better (because the new ones apply caching to a wider variety of map lookups). The new optimisations can also benefit code generated by the native emitter, because they apply to the runtime rather than the generated code. The improvement for the native emitter when LOAD_ATTR_FAST_PATH and MAP_LOOKUP_CACHE are enabled is (same Linux environment as above): diff of scores (higher is better) N=2000 M=2000 native -> nat-attrmapcache diff diff% (error%) bm_chaos.py 14130.62 -> 15464.68 : +1334.06 = +9.441% (+/-7.11%) bm_fannkuch.py 74.96 -> 76.16 : +1.20 = +1.601% (+/-1.80%) bm_fft.py 166682.99 -> 168221.86 : +1538.87 = +0.923% (+/-4.20%) bm_float.py 233415.23 -> 265524.90 : +32109.67 = +13.756% (+/-2.57%) bm_hexiom.py 628.59 -> 734.17 : +105.58 = +16.796% (+/-1.39%) bm_nqueens.py 225418.44 -> 232926.45 : +7508.01 = +3.331% (+/-3.10%) bm_pidigits.py 6322.00 -> 6379.52 : +57.52 = +0.910% (+/-5.62%) misc_aes.py 20670.10 -> 27223.18 : +6553.08 = +31.703% (+/-1.56%) misc_mandel.py 138221.11 -> 152014.01 : +13792.90 = +9.979% (+/-2.46%) misc_pystone.py 85032.14 -> 105681.44 : +20649.30 = +24.284% (+/-2.25%) misc_raytrace.py 19800.01 -> 23350.73 : +3550.72 = +17.933% (+/-2.79%) In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options: - are simpler; - take less code size; - are faster (generally); - work with code generated by the native emitter; - can be used on embedded targets with a small and constant RAM overhead; - allow the same .mpy bytecode to run on all targets. See #7680 for further discussion. And see also #7653 for a discussion about simplifying mpy-cross options. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-05-20tools/mpy_ld.py: Support R_X86_64_GOTPCREL reloc for x86-64 arch.Damien George
This can be treated by the linker the same as R_X86_64_REX_GOTPCRELX, according to https://reviews.llvm.org/D18301. Signed-off-by: Damien George <damien@micropython.org>
2020-08-29all: Update Python code to conform to latest black formatting.Damien George
Updating to Black v20.8b1 there are two changes that affect the code in this repository: - If there is a trailing comma in a list (eg [], () or function call) then that list is now written out with one line per element. So remove such trailing commas where the list should stay on one line. - Spaces at the start of """ doc strings are removed. Signed-off-by: Damien George <damien@micropython.org>
2020-02-28all: Reformat C and Python source code with tools/codeformat.py.Damien George
This is run with uncrustify 0.70.1, and black 19.10b0.
2019-12-17py/persistentcode: Move loading of rodata/bss to before obj/raw-code.Damien George
This makes the loading of viper-code-with-relocations a bit neater and easier to understand, by treating the rodata/bss like a special object to be loaded into the constant table (which is how it behaves).
2019-12-12py/dynruntime: Add support for float API to make/get floats.Damien George
We don't want to add a feature flag to .mpy files that indicate float support because it will get complex and difficult to use. Instead the .mpy is built using whatever precision it chooses (float or double) and the native glue API will convert between this choice and what the host runtime actually uses.
2019-12-12tools/mpy_ld.py: Add new mpy_ld.py tool and associated build files.Damien George
This commit adds a new tool called mpy_ld.py which is essentially a linker that builds .mpy files directly from .o files. A new header file (dynruntime.h) and makefile fragment (dynruntime.mk) are also included which allow building .mpy files from C source code. Such .mpy files can then be dynamically imported as though they were a normal Python module, even though they are implemented in C. Converting .o files directly (rather than pre-linked .elf files) allows the resulting .mpy to be more efficient because it has more control over the relocations; for example it can skip PLT indirection. Doing it this way also allows supporting more architectures, such as Xtensa which has specific needs for position-independent code and the GOT. The tool supports targets of x86, x86-64, ARM Thumb and Xtensa (windowed and non-windowed). BSS, text and rodata sections are supported, with relocations to all internal sections and symbols, as well as relocations to some external symbols (defined by dynruntime.h), and linking of qstrs.