path: root/py/parsenum.c
Age | Commit message | Author
7 days | py/formatfloat: Improve accuracy of float formatting code. | Yoctopuce dev
Following discussions in PR #16666, this commit updates the float formatting code to improve `repr` reversibility, i.e. the percentage of valid floating-point numbers that parse back to the same number when formatted by `repr` (in CPython it's 100%). This new code offers a choice of three float conversion methods, depending on the desired tradeoff between code size and conversion precision:
- BASIC method has the smallest code footprint.
- APPROX method uses an iterative method to approximate the exact representation, which is a bit slower but does not have a big impact on code size. It provides `repr` reversibility in >99.8% of cases in double precision, and in >98.5% in single precision (except with REPR_C, where reversibility is 100% as the last two bits are not taken into account).
- EXACT method uses higher-precision floats during conversion, which provides perfect results but has a higher impact on code size. It is faster than the APPROX method, and faster than the equivalent CPython implementation. It is however not available on all compilers when using FLOAT_IMPL_DOUBLE.
Here is the table comparing the impact of the three conversion methods on code footprint on PYBV10 (using single-precision floats) and the reversibility rate for both single-precision and double-precision floats. The table includes the current situation as a baseline for the comparison:
          PYBV10    REPR_C    FLOAT     DOUBLE
current   364688    12.9%     27.6%     37.9%
basic     364812    85.6%     60.5%     85.7%
approx    365080    100.0%    98.5%     99.8%
exact     366408    100.0%    100.0%    100.0%
Signed-off-by: Yoctopuce dev <dev@yoctopuce.com>
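A rough way to picture the reversibility metric in the table: format a value with a given number of significant digits, parse the text back, and check whether the identical double comes back. The sketch below uses the host C library's `snprintf`/`strtod`, not MicroPython's own formatter, and the helper name `round_trips` is hypothetical; it assumes IEEE 754 binary64 `double` and a correctly rounded `strtod`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not MicroPython API): returns true if printing x
   with `digits` significant digits and parsing the text back yields
   exactly the same double. */
bool round_trips(double x, int digits) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}
```

For example, `0.1 + 0.2` printed with 15 digits becomes "0.3", which parses to a different double, so `round_trips(0.1 + 0.2, 15)` is false. With 17 digits it is true, since 17 significant digits always suffice to round-trip a binary64 value. CPython's `repr` instead picks the shortest string that round-trips.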
7 days | py/parsenum: Refactor float parsing code. | Yoctopuce dev
This commit extracts two functions from the current float parsing code that could be reused elsewhere in MicroPython. The code used to multiply a float x by a power of 10 is also simplified by applying the binary exponent separately from the power of 5. This avoids the risk of overflow in the intermediate stage, before multiplying by x. Signed-off-by: Yoctopuce dev <dev@yoctopuce.com>
2025-07-18 | py/parsenum: Extend mp_parse_num_integer() to parse long long. | Angus Gratton
If big integer support is 'long long' then mp_parse_num_integer() can parse to it directly instead of failing over from small int. This means strtoll() is no longer pulled in, and it fixes some bugs parsing long long integers (i.e. it can now parse negative values correctly, and can now parse values which aren't null-terminated). The (default) smallint parsing compiled code should stay the same here; macros and a typedef are used to abstract some parts of it out. When bigint is long long we parse to 'unsigned long long' first (to avoid the code size hit of pulling in signed 64-bit math routines) and then convert to signed at the end. One tricky case this routine correctly overflows on is int("9223372036854775808"), which is one more than LLONG_MAX in decimal. No unit test case was added for this as it's too hard to detect 64-bit long integer mode. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
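The "parse unsigned first, convert to signed at the end" approach can be sketched in portable C. This is an illustration only, not the MicroPython routine; `parse_ll` is a hypothetical name, and it assumes a 64-bit `long long`.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: parse an optionally signed decimal integer of
   exactly `len` bytes (no terminator needed). The magnitude is
   accumulated as unsigned long long, then range-checked and converted. */
bool parse_ll(const char *s, size_t len, long long *out) {
    size_t i = 0;
    bool neg = false;
    if (i < len && (s[i] == '+' || s[i] == '-')) {
        neg = (s[i] == '-');
        i++;
    }
    if (i == len) {
        return false; /* no digits */
    }
    unsigned long long mag = 0;
    for (; i < len; i++) {
        unsigned d = (unsigned)(s[i] - '0');
        if (d > 9) {
            return false; /* not a decimal digit */
        }
        if (mag > (ULLONG_MAX - d) / 10) {
            return false; /* mag * 10 + d would wrap around */
        }
        mag = mag * 10 + d;
    }
    /* |LLONG_MIN| is one more than LLONG_MAX. */
    unsigned long long limit = (unsigned long long)LLONG_MAX + (neg ? 1u : 0u);
    if (mag > limit) {
        return false; /* e.g. "9223372036854775808" without a minus sign */
    }
    /* Negate without ever forming an out-of-range signed value. */
    *out = !neg ? (long long)mag : (mag == 0 ? 0 : -(long long)(mag - 1) - 1);
    return true;
}
```

The asymmetric `limit` is what makes "-9223372036854775808" succeed while "9223372036854775808" overflows, and taking an explicit length means the input does not need a terminator.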
2025-07-18 | py/smallint: Update mp_small_int_mul_overflow() to perform the multiply. | Angus Gratton
Makes it compatible with the __builtin_mul_overflow() syntax, used in follow-up commit. Includes optimisation in runtime.c to minimise the code size impact from additional param. Signed-off-by: Damien George <damien@micropython.org> Signed-off-by: Angus Gratton <angus@redyak.com.au>
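For reference, the `__builtin_mul_overflow(a, b, &res)` shape means "do the multiply, store the result, and return whether it overflowed". A portable helper with that shape might look like the sketch below. The name `mul_overflow_long` is hypothetical, and it checks against the range of `long`; a real small-int version would instead check against MicroPython's narrower small-int range.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper shaped like GCC/Clang's __builtin_mul_overflow:
   always performs the multiply, stores the (wrapped) product in *res and
   returns true if the exact product does not fit in a long. */
bool mul_overflow_long(long a, long b, long *res) {
    bool ovf;
    if (a > 0) {
        ovf = (b > 0) ? (a > LONG_MAX / b) : (b < LONG_MIN / a);
    } else if (b > 0) {
        ovf = a < LONG_MIN / b;
    } else {
        ovf = (a != 0) && (b < LONG_MAX / a);
    }
    /* Unsigned arithmetic wraps modulo 2^N with no undefined behaviour;
       converting back to long is implementation-defined, and yields the
       two's-complement result on GCC and Clang. */
    *res = (long)((unsigned long)a * (unsigned long)b);
    return ovf;
}
```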
2025-06-10 | py/parsenum: Fix parsing complex literals with negative real part. | Jeff Epler
If a complex literal had a negative real part and a positive imaginary part, it was not parsed properly because the imaginary part also came out negative. Includes a test of complex parsing, which fails without this fix. Co-authored-by: ComplexSymbol <141301057+ComplexSymbol@users.noreply.github.com> Signed-off-by: Jeff Epler <jepler@gmail.com>
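The essence of the fix is that the imaginary part's sign must come from its own leading `+`/`-`, not from state left over after parsing the real part. A simplified sketch using `strtod` for each component follows. `parse_complex` is a hypothetical name, and this is not MicroPython's parser: `strtod` also accepts forms Python rejects (hex floats, leading whitespace), and the sketch rejects valid Python forms such as "1+j".

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sketch: parse "a", "bj" or "a+bj"/"a-bj". Each component's
   sign is parsed fresh by strtod, so "-1+2j" gives re = -1, im = +2. */
bool parse_complex(const char *s, double *re, double *im) {
    char *end;
    double a = strtod(s, &end);
    if (end == s) {
        return false; /* no number at all */
    }
    if (*end == '\0') { /* purely real */
        *re = a;
        *im = 0.0;
        return true;
    }
    if (end[0] == 'j' && end[1] == '\0') { /* purely imaginary */
        *re = 0.0;
        *im = a;
        return true;
    }
    if (*end != '+' && *end != '-') {
        return false;
    }
    const char *p = end;
    double b = strtod(p, &end); /* sign taken from this component only */
    if (end == p || end[0] != 'j' || end[1] != '\0') {
        return false;
    }
    *re = a;
    *im = b;
    return true;
}
```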
2025-06-10 | py/parsenum: Further reduce code size in check for inf/nan. | Jeff Epler
A few more bytes can be saved by not using nested `if`s (4 bytes for `build-MICROBIT/py/parsenum.o`, 8 bytes for RPI_PICO firmware). This commit is better viewed with whitespace changes hidden, because two blocks were reindented (e.g., `git show -b`). Signed-off-by: Jeff Epler <jepler@gmail.com>
2025-06-10 | py/parsenum: Reduce code size in check for inf/nan. | Jeff Epler
By avoiding two different checks of the string length, code size is reduced without changing behavior: some invalid float/complex strings like "ix" will instead get handled just like "xx" in the main number literal parsing code. The optimizer alone couldn't remove the redundant comparisons because it couldn't make a transformation that let an invalid string like "ix" pass into the generic number parsing code. Signed-off-by: Jeff Epler <jepler@gmail.com>
2025-05-13 | all: Rename the "NORETURN" macro to "MP_NORETURN". | Alessandro Gatti
This commit renames the NORETURN macro, which indicates to the compiler that a function does not return, to MP_NORETURN to follow the same naming convention as other similar macros. To maintain compatibility with existing code, NORETURN is aliased to MP_NORETURN, but it is also deprecated for MicroPython v2. This changeset was created using a similar process to decf8e6a8bb940d5829ca3296790631fcece7b21 ("all: Remove the "STATIC" macro and just use "static" instead."), with no documentation or Python scripts to change to reflect the new macro name. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
2025-02-28 | py/parsenum: Reduce code footprint of mp_parse_num_float. | Yoctopuce dev
The mantissa parsing code uses a floating point variable to accumulate digits. Using an `mp_float_uint_t` variable instead and casting to `mp_float_t` at the very end reduces code size. In some cases, it also improves the rounding behaviour as extra digits are taken into account by the int-to-float conversion code. An extra test case handles the special case where mantissa overflow occurs while processing deferred trailing zeros. Signed-off-by: Yoctopuce dev <dev@yoctopuce.com>
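The integer-accumulator idea can be sketched as below. `uint64_t` stands in for MicroPython's `mp_float_uint_t` (an assumption for illustration), and `digits_to_double` is a hypothetical name. This sketch simply truncates digits that no longer fit and reports them as an extra power of ten; it does not reproduce the commit's handling of deferred trailing zeros.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: accumulate decimal digits in an unsigned integer
   and convert to double only once, at the end. Digits that no longer fit
   are dropped and reported via *extra_exp10, which the caller would add
   to the decimal exponent. Assumes `digits` holds only '0'-'9'. */
double digits_to_double(const char *digits, size_t n, int *extra_exp10) {
    uint64_t mant = 0;
    int dropped = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned d = (unsigned)(digits[i] - '0');
        /* Once one digit has been dropped, all later ones must be too,
           otherwise they would land in the wrong decimal position. */
        if (dropped == 0 && mant <= (UINT64_MAX - d) / 10) {
            mant = mant * 10 + d;
        } else {
            dropped++;
        }
    }
    *extra_exp10 = dropped;
    return (double)mant; /* a single int-to-float conversion */
}
```

Because the only float operation is the final conversion, the rounding happens once and takes all accumulated digits into account, instead of accumulating error at every `* 10 + d` step on a float.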
2025-01-26 | py/parsenum: Throw an exception for invalid int literals like "01". | Jeff Epler
This includes making int("01") parse in base 10 like standard Python. When a base of 0 is specified it means auto-detect based on the prefix, and literals beginning with 0 (except when the literal is all 0's) like "01" are then invalid and now throw an exception. The new error message is different from CPython's. It says e.g. `SyntaxError: invalid syntax for integer with base 0: '09'`. Additional test cases were added to cover the changed & added code. Co-authored-by: Damien George <damien@micropython.org> Signed-off-by: Jeff Epler <jepler@gmail.com>
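The base-0 rule described here (prefix selects the base; otherwise decimal, with a leading zero only allowed when every digit is zero) can be sketched as follows. `detect_base` is a hypothetical helper, not the MicroPython function; sign handling and underscores are omitted.

```c
#include <stddef.h>

/* Hypothetical sketch of base auto-detection for base 0:
   "0x"/"0o"/"0b" (any case) select base 16/8/2 and *prefix_len is set
   to 2; the caller must still check that digits follow. Anything else
   is base 10, where a leading zero is only allowed if all digits are
   zero. Returns -1 for literals such as "01" or "09". */
int detect_base(const char *s, size_t n, size_t *prefix_len) {
    *prefix_len = 0;
    if (n >= 2 && s[0] == '0') {
        char c = (char)(s[1] | 0x20); /* ASCII lowercase fold */
        if (c == 'x') { *prefix_len = 2; return 16; }
        if (c == 'o') { *prefix_len = 2; return 8; }
        if (c == 'b') { *prefix_len = 2; return 2; }
        for (size_t i = 1; i < n; i++) {
            if (s[i] != '0') {
                return -1; /* leading zero before a non-zero digit */
            }
        }
    }
    return 10;
}
```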
2024-03-07 | all: Remove the "STATIC" macro and just use "static" instead. | Angus Gratton
The STATIC macro was introduced a very long time ago in commit d5df6cd44a433d6253a61cb0f987835fbc06b2de. The original reason for this was to have the option to define it to nothing so that all static functions become global functions and therefore visible to certain debug tools, so one could do function size comparison and other things. This STATIC feature is rarely (if ever) used. And with the use of LTO and heavy inline optimisation, analysing the size of individual functions when they are not static is not a good representation of the size of code when fully optimised. So the macro does not have much use and it's simpler to just remove it. Then you know exactly what it's doing. For example, newcomers don't have to learn what the STATIC macro is and why it exists. Reading the code is also less "loud" with a lowercase static. One other minor point in favour of removing it is that it stops bugs with `STATIC inline`, which should always be `static inline`. Methodology for this commit was:
1) git ls-files | egrep '\.[ch]$' | \
   xargs sed -Ei "s/(^| )STATIC($| )/\1static\2/"
2) Do some manual cleanup in the diff by searching for the word STATIC in comments and changing those back.
3) "git-grep STATIC docs/", manually fixed those cases.
4) "rg -t python STATIC", manually fixed codegen lines that used STATIC.
This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
2023-06-14 | py/parsenum: Fix typo in #endif comment. | David Lechner
This fixes a `#endif` comment to exactly match the `#if`. Signed-off-by: David Lechner <david@pybricks.com>
2022-08-26 | py/objstr: Optimise mp_obj_new_str_from_vstr for known-safe strings. | Jim Mussared
The new `mp_obj_new_str_from_utf8_vstr` can be used when you know you already have a unicode-safe string. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-08-26 | py/objstr: Split mp_obj_str_from_vstr into bytes/str versions. | Jim Mussared
Previously the desired output type was specified. Now make the type part of the function name. Because this function is used in a few places this saves code size due to smaller call-site. This makes `mp_obj_new_str_type_from_vstr` a private function of objstr.c (which is almost the only place where the output type isn't a compile-time constant). This saves ~140 bytes on PYBV11. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-08-12 | py/parsenum: Ensure that trailing zeros lead to identical results. | Dan Ellis
Prior to this commit, parsenum would calculate "1e-20" as 1.0*pow(10, -20), and "1.000e-20" as 1000.0*pow(10, -23); in certain cases, this could make seemingly-identical values compare as not equal. This commit watches for trailing zeros as a special case, and ignores them when appropriate, so "1.000e-20" is also calculated as 1.0*pow(10, -20). Fixes issue #5831.
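The normalisation can be pictured as moving trailing zeros of the integer mantissa into the decimal exponent before the final scaling. The commit tracks trailing zeros while parsing; the sketch below is a simpler after-the-fact variant with a hypothetical name, `strip_trailing_zeros`, shown only to illustrate why both spellings end up identical.

```c
#include <stdint.h>

/* Hypothetical post-processing step: move trailing zeros of the integer
   mantissa into the decimal exponent, so that "1.000e-20" (1000 x 10^-23)
   and "1e-20" (1 x 10^-20) reach the final scaling step as the same pair. */
void strip_trailing_zeros(uint64_t *mant, int *exp10) {
    while (*mant != 0 && *mant % 10 == 0) {
        *mant /= 10;
        *exp10 += 1;
    }
}
```

Since both inputs reduce to mantissa 1 and exponent -20, the same `1.0 * pow(10, -20)` computation runs for each, so they cannot compare unequal.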
2022-06-23 | py/parsenum: Optimise when building with complex disabled. | Damien George
To reduce code size when MICROPY_PY_BUILTINS_COMPLEX is disabled. Signed-off-by: Damien George <damien@micropython.org>
2022-06-23 | py/parsenum: Fix parsing of complex "j" and also "nanj", "infj". | Damien George
Prior to this commit, complex("j") would return 0j, and complex("nanj") would return nan+0j. This commit makes sure "j" is tested for after parsing the number (nan, inf or a decimal), and also supports the case of "j" on its own. Signed-off-by: Damien George <damien@micropython.org>
2022-06-23 | py/parsenum: Support parsing complex numbers of the form "a+bj". | Jim Mussared
To conform with CPython. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-04-27 | py: Add option to compile without any error messages at all. | Damien George
This introduces a new option, MICROPY_ERROR_REPORTING_NONE, which completely disables all error messages. To be used in cases where MicroPython needs to fit in very limited systems. Signed-off-by: Damien George <damien@micropython.org>
2020-04-18 | all: Fix implicit floating point promotion. | stijn
Initially some of these were found building the unix coverage variant on MacOS because that build uses clang and has -Wdouble-promotion enabled, and clang performs more vigorous promotion checks than gcc. Additionally the codebase has been compiled with clang and msvc (the latter with warning level 3), and with MICROPY_FLOAT_IMPL_FLOAT to find the rest of the conversions. Fixes are implemented either as explicit casts, or by using the correct type, or by using one of the utility functions to handle floating point casting; these have been moved from nativeglue.c to the public API.
2020-04-18 | Revert "all: Fix implicit casts of float/double, and signed comparison." | stijn
This reverts commit a2110bd3fca59df8b16a2b5fe4645a4af30b06ed. There's nothing inherently wrong with it, but upcoming commits will apply similar fixes in a slightly different way.
2020-04-05 | all: Use MP_ERROR_TEXT for all error messages. | Jim Mussared
2020-04-05 | py: Use preprocessor to detect error reporting level (terse/detailed). | Jim Mussared
Instead of compiler-level if-logic. This is necessary to know what error strings are included in the build at the preprocessor stage, so that string compression can be implemented.
2020-03-30 | all: Fix implicit casts of float/double, and signed comparison. | David Lechner
These were found by building the unix coverage variant on macOS (so with the clang compiler). Mostly, these fix implicit casts of float/double to mp_float_t (which is one of those two), plus one mp_int_t to size_t fix for good measure.
2020-02-28 | all: Reformat C and Python source code with tools/codeformat.py. | Damien George
This is run with uncrustify 0.70.1, and black 19.10b0.
2019-09-26 | py: Rename MP_QSTR_NULL to MP_QSTRnull to avoid intern collisions. | Josh Lloyd
Fixes #5140.
2018-09-20 | py/parsenum: Avoid rounding errors with negative powers-of-10. | Romain Goyet
This patch avoids multiplying by negative powers of 10 when parsing floating-point values, when those powers of 10 can be exactly represented as a positive power. When represented as a positive power and used to divide, the only rounding error in the result is the one introduced by the division itself. The issue is that mp_parse_num_decimal would sometimes not give the closest floating-point representation of the input string. E.g. for "0.3", which can't be represented exactly in floating point, mp_parse_num_decimal gives a slightly high (by 1 LSB) result. This is because it computes the answer as 3 * 0.1, and since 0.1 also can't be represented exactly, multiplying by 3 multiplies up the rounding error in the 0.1. Computing it as 3 / 10, as now done by the change in this commit, gives an answer which is as close to the true value of "0.3" as possible.
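The difference between multiplying by an inexact 0.1 and dividing by an exact 10 is easy to reproduce. The sketch below uses a hypothetical helper, `scale_pow10`, that divides for negative exponents. It assumes IEEE 754 binary64 with round-to-nearest and no excess precision (e.g. SSE on x86-64). Powers of ten up to 10^22 are exactly representable in that format.

```c
/* Hypothetical sketch: apply a decimal exponent to an exact mantissa.
   For e < 0 it divides by 10^-e instead of multiplying by the inexact
   10^e, so for |e| <= 22 the only rounding is the final * or /. */
double scale_pow10(double mant, int e) {
    int n = e < 0 ? -e : e;
    double p = 1.0;
    for (int i = 0; i < n; i++) {
        p *= 10.0; /* each step is exact while p <= 1e22 */
    }
    return e < 0 ? mant / p : mant * p;
}
```

Here `scale_pow10(3.0, -1)` yields the double nearest to 0.3, whereas `3.0 * 0.1` rounds to 0.30000000000000004, one LSB high, matching the behaviour described in the commit.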
2018-06-12 | py/lexer: Add support for underscores in numeric literals. | Damien George
This is a very convenient feature introduced in Python 3.6 by PEP 515.
2018-05-22 | py/parsenum: Adjust braces so they are balanced. | Damien George
2018-05-21 | py/parsenum: Avoid undefined behavior parsing floats with large exponents. | Jeff Epler
Fuzz testing combined with the undefined behavior sanitizer found that parsing unreasonable float literals like 1e+9999999999999 resulted in undefined behavior due to overflow in signed integer arithmetic, and a wrong result being returned.
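One common way to make exponent parsing immune to signed overflow is to saturate: once the exponent is large enough that the result must be infinity or zero anyway, further digits are ignored. The sketch below is illustrative only. `parse_exponent` and the cap `EXP_SATURATE` are hypothetical, and it assumes an `int` of at least 32 bits.

```c
#include <stddef.h>

/* Hypothetical cap: any decimal exponent this large already forces an
   IEEE 754 double result to infinity or zero. */
#define EXP_SATURATE 100000

/* Hypothetical sketch: accumulate exponent digits in a plain int without
   ever overflowing it (signed overflow is undefined behaviour in C).
   Once the value reaches EXP_SATURATE, further digits are ignored, so e
   never exceeds 999999. Assumes `s` holds only '0'-'9'. */
int parse_exponent(const char *s, size_t n) {
    int e = 0;
    for (size_t i = 0; i < n; i++) {
        if (e < EXP_SATURATE) {
            e = e * 10 + (s[i] - '0');
        }
    }
    return e;
}
```

With this, an input like 1e+9999999999999 produces a large but well-defined exponent, and the later scaling step returns infinity rather than a wrong value.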
2018-05-21 | py/parsenum: Use int instead of mp_int_t for parsing float exponent. | Damien George
There is no need to use the mp_int_t type which may be 64-bits wide, there is enough bit-width in a normal int to parse reasonable exponents. Using int helps to reduce code size for 64-bit ports, especially nan-boxing builds. (Similarly for the "dig" variable which is now an unsigned int.)
2018-02-08 | py/parsenum: Fix parsing of floats that are close to subnormal. | Damien George
Prior to this patch, a float literal that was close to subnormal would have a loss of precision when parsed. The worst case was something like float('10000000000000000000e-326') which returned 0.0.
2017-11-27 | py/parsenum: Improve parsing of floating point numbers. | Damien George
This patch improves parsing of floating point numbers by converting all the digits (integer and fractional) together into a number 1 or greater, and then applying the correct power of 10 at the very end. In particular the multiple "multiply by 0.1" operations to build a fraction are now combined together and applied at the same time as the exponent, at the very end. This helps to retain precision during parsing of floats, and also includes a check that the number doesn't overflow during the parsing. One benefit is that a float will have the same value no matter where the decimal point is located, eg 1.23 == 123e-2.
2017-07-31 | all: Use the name MicroPython consistently in comments | Alexander Steffen
There were several different spellings of MicroPython present in comments, when there should be only one.
2017-03-28 | py: Use mp_raise_TypeError/mp_raise_ValueError helpers where possible. | Damien George
Saves 168 bytes on bare-arm.
2016-12-28 | py/parsenum: Fix warning for signed/unsigned comparison. | Damien George
2016-12-28 | py/parsenum: Simplify and generalise decoding of digit values. | Damien George
This function should be able to parse integers with any value for the base, because it is called by int('xxx', base).
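A generalised digit decoder for arbitrary bases (as needed by `int('xxx', base)`) can be as simple as the sketch below. `digit_value` is a hypothetical name, not the MicroPython function, and it assumes ASCII input and bases 2 to 36.

```c
/* Hypothetical sketch: map an ASCII character to its digit value in
   bases 2..36 ('0'-'9' are 0-9, 'a'/'A' to 'z'/'Z' are 10-35).
   Returns -1 if the character is not a valid digit for `base`. */
int digit_value(char c, int base) {
    int v;
    if (c >= '0' && c <= '9') {
        v = c - '0';
    } else {
        int lower = c | 0x20; /* fold ASCII upper case to lower case */
        if (lower >= 'a' && lower <= 'z') {
            v = lower - 'a' + 10;
        } else {
            return -1;
        }
    }
    return v < base ? v : -1;
}
```

Decoding the value first and comparing it against `base` afterwards keeps a single code path for every base, rather than special-casing 2, 8, 10 and 16.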
2016-11-03 | py: Add MICROPY_FLOAT_CONST macro for defining float constants. | Damien George
All float constants in the core should use this macro to prevent unnecessary creation of double-precision floats, which makes code less efficient.
2016-10-17 | py: Use mp_raise_msg helper function where appropriate. | Damien George
Saves the following number of bytes of code space: 176 for bare-arm, 352 for minimal, 272 for unix x86-64, 140 for stmhal, 120 for esp8266.
2016-03-29 | py/parsenum: Use pow function to apply exponent to decimal number. | Damien George
Pow is already a dependency when compiling with floats, so may as well use it here to reduce code size and speed up the conversion for most cases.
2016-03-14 | py/parsenum: Fix compiler warnings for no decl and signed comparison. | Damien George
2016-03-14 | py/parsenum: Use size_t to count bytes, and int for type of base arg. | Damien George
size_t is the proper type to count number of bytes in a string. The base argument does not need to be a full mp_uint_t, int is enough.
2015-11-29 | py: Wrap all obj-ptr conversions in MP_OBJ_TO_PTR/MP_OBJ_FROM_PTR. | Damien George
This allows the mp_obj_t type to be configured to something other than a pointer-sized primitive type. This patch also includes additional changes to allow the code to compile when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of mp_uint_t, and various casts.
2015-10-01 | py/parsenum: Provide detailed error for int parsing with escaped bytes. | Damien George
This patch adds more fine-grained error message control for errors when parsing integers (it now has terse, normal and detailed). When detailed is enabled, the error now escapes bytes when printing them so they can be more easily seen.
2015-06-23 | py: Clarify comment in parsenum.c about ValueError vs SyntaxError. | Damien George
2015-06-23 | py: Change exception type to ValueError when error reporting is terse. | Daniel Campora
Addresses issue #1347
2015-05-30 | py/parsenum.c: Rename "raise" func to "raise_exc" to avoid name clash. | Damien George
"raise" is a common word that was found to exist in a vendor's stdlib.
2015-03-16 | py: Fix printing of error message when parsing malformed integer. | Damien George
2015-02-08 | py: Parse big-int/float/imag constants directly in parser. | Damien George
Prior to this patch, a big-int, float or imag constant was interned (made into a qstr) and then parsed at runtime to create an object each time it was needed. This is wasteful in RAM and not efficient. Now, these constants are parsed straight away in the parser and turned into objects. This allows constants with large numbers of digits (so addresses issue #1103) and takes us a step closer to #722.
2015-01-01 | py: Move to guarded includes, everywhere in py/ core. | Damien George
Addresses issue #1022.