path: root/py/objstr.c
Age  Commit message  Author
2023-05-19  py/objstr: Return unsupported binop instead of raising TypeError.  Damien George
So that user types can implement reverse operators and have them work with str on the left-hand-side, eg `"a" + UserType()`. Signed-off-by: Damien George <damien@micropython.org>
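As an illustration of the reverse-operator case described above, here is a minimal Python sketch. `UserType` and its return value are hypothetical names chosen for the example, not code from the commit.

```python
class UserType:
    """Hypothetical user type that only implements the reflected add."""

    def __radd__(self, other):
        # Reached only if the left operand reports the operation as unsupported.
        return f"{other}+UserType"


result = "a" + UserType()
print(result)
```

If the left-hand str raised TypeError itself instead of reporting the operation as unsupported, `__radd__` would never get a chance to run.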
2022-12-06  py: Remove the word "yet" from exception messages.  Damien George
These unimplemented features may never be implemented, and having the word "yet" there takes up space. Signed-off-by: Damien George <damien@micropython.org>
2022-11-08  py/objarray: Detect bytearray(str) without an encoding.  Jim Mussared
This prevents a very subtle bug caused by writing e.g. `bytearray('\xfd')` which gives you `(0xc3, 0xbd)`. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
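A short sketch of why the missing encoding matters, written against CPython's behaviour (which rejects the call without an encoding). The variable names are illustrative only.

```python
text = "\xfd"  # a single code point, U+00FD

# Without an encoding the constructor is rejected.
try:
    bytearray(text)
    raised = False
except TypeError:
    raised = True

# With an explicit encoding the result depends on which one is chosen.
utf8 = bytearray(text, "utf-8")      # two bytes: 0xc3, 0xbd
latin1 = bytearray(text, "latin-1")  # one byte: 0xfd
print(raised, list(utf8), list(latin1))
```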
2022-10-11  py/objstr: Add a helper to set mp_obj_str_t data.  Jim Mussared
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-09-26  py/objstr: Don't treat bytes as unicode in str.count.  Jim Mussared
`b'\xaa \xaa'.count(b'\xaa')` now (correctly) returns 2 instead of 1. Fixes issue #9404. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
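The expected result can be checked with a byte-wise count. The helper below is a hypothetical sketch of non-overlapping counting over raw bytes (it assumes a non-empty needle) and is not the C implementation.

```python
def count_bytes(haystack, needle):
    """Count non-overlapping occurrences, comparing raw bytes only."""
    count = 0
    pos = haystack.find(needle)
    while pos >= 0:
        count += 1
        # Advance by the needle length in bytes, never by a decoded character width.
        pos = haystack.find(needle, pos + len(needle))
    return count


data = b"\xaa \xaa"
builtin = data.count(b"\xaa")
manual = count_bytes(data, b"\xaa")
print(builtin, manual)
```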
2022-09-19  py/obj: Optimise code size and performance for make_new as a slot.  Jim Mussared
The check for make_new (i.e. used to determine something's type) is now more complicated due to the slot access. This commit changes the inlining of a few frequently-used helpers to overall improve code size and performance.
2022-09-19  py/obj: Convert make_new into a mp_obj_type_t slot.  Jim Mussared
Instead of being an explicit field, it's now a slot like all the other methods. This is a marginal code size improvement because most types have a make_new (100/138 on PYBV11), however it improves consistency in how types are declared, removing the special case for make_new. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-09-19  py/obj: Merge getiter and iternext mp_obj_type_t slots.  Jim Mussared
The goal here is to remove a slot (making way to turn make_new into a slot) as well as reduce code size by the ~40 references to mp_identity_getiter and mp_stream_unbuffered_iter.
This introduces two new type flags:
- MP_TYPE_FLAG_ITER_IS_ITERNEXT: This means that the "iter" slot in the type is "iternext", and should use the identity getiter.
- MP_TYPE_FLAG_ITER_IS_CUSTOM: This means that the "iter" slot is a pointer to a mp_getiter_iternext_custom_t instance, which then defines both getiter and iternext.
And a third flag that is the OR of both, MP_TYPE_FLAG_ITER_IS_STREAM: This means that the type should use the identity getiter, and mp_stream_unbuffered_iter as iternext.
Finally, MP_TYPE_FLAG_ITER_IS_GETITER is defined as a no-op flag to give the default case where "iter" is "getiter".
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
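The relationship between the four flags can be modelled in a few lines of Python. The bit values and the `describe_iter_slot` helper are assumptions made for illustration; only the flag names and their meanings come from the message above.

```python
# Hypothetical bit values, chosen only for this sketch.
ITER_IS_GETITER = 0x0                               # default: "iter" slot is getiter
ITER_IS_ITERNEXT = 0x1                              # "iter" slot is iternext
ITER_IS_CUSTOM = 0x2                                # "iter" slot points at a custom struct
ITER_IS_STREAM = ITER_IS_ITERNEXT | ITER_IS_CUSTOM  # OR of both


def describe_iter_slot(flags):
    """Return (getiter, iternext) descriptions for a type's flag word."""
    kind = flags & ITER_IS_STREAM  # mask away unrelated flag bits
    if kind == ITER_IS_STREAM:
        return ("identity", "mp_stream_unbuffered_iter")
    if kind == ITER_IS_ITERNEXT:
        return ("identity", "slot")
    if kind == ITER_IS_CUSTOM:
        return ("custom struct", "custom struct")
    return ("slot", "none")


print(describe_iter_slot(ITER_IS_STREAM))
```

Because the stream case is the OR of the other two bits, it has to be tested before either single-bit case.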
2022-09-19  all: Remove unnecessary locals_dict cast.  Jim Mussared
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-09-19  all: Make all mp_obj_type_t defs use MP_DEFINE_CONST_OBJ_TYPE.  Jim Mussared
In preparation for upcoming rework of mp_obj_type_t layout. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-09-19  all: Simplify buffer protocol to just a "get buffer" callback.  Jim Mussared
The buffer protocol type only has a single member, and this existing layout creates problems for the upcoming split/slot-index mp_obj_type_t layout optimisations. If we need to make the buffer protocol more sophisticated in the future either we can rely on the mp_obj_type_t optimisations to just add additional slots to mp_obj_type_t or re-visit the buffer protocol then. This change is a no-op in terms of generated code. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-08-26  py/objstr: Always validate utf-8 for mp_obj_new_str.  Jim Mussared
All uses of this are either tiny strings or not-known-to-be-safe. Update comments for mp_obj_new_str_copy and mp_obj_new_str_of_type. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-08-26  py/objstr: Optimise mp_obj_new_str_from_vstr for known-safe strings.  Jim Mussared
The new `mp_obj_new_str_from_utf8_vstr` can be used when you know you already have a unicode-safe string. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-08-26  py/objstr: Always ensure mp_obj_str_from_vstr is unicode-safe.  Jim Mussared
Now that we have `mp_obj_new_str_type_from_vstr` (private helper used by objstr.c) split from the public API (`mp_obj_new_str_from_vstr`), we can enforce a unicode check at the public API without incurring a performance cost on the various objstr.c methods (which are already working on known unicode-safe strings). Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-08-26  py/objstr: Split mp_obj_str_from_vstr into bytes/str versions.  Jim Mussared
Previously the desired output type was specified. Now make the type part of the function name. Because this function is used in a few places this saves code size due to smaller call-site. This makes `mp_obj_new_str_type_from_vstr` a private function of objstr.c (which is almost the only place where the output type isn't a compile-time constant). This saves ~140 bytes on PYBV11. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-08-12  py/objstr: Add hex/fromhex to bytes/memoryview/bytearray.  Jim Mussared
These were added in Python 3.5. Enabled via MICROPY_PY_BUILTINS_BYTES_HEX, and enabled by default for all ports that currently have ubinascii. Rework ubinascii to use the implementation of these methods. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
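For reference, this is how the methods behave in CPython. The sketch uses the standard `binascii` module for comparison, whereas the message refers to MicroPython's `ubinascii`.

```python
import binascii

raw = bytes.fromhex("deadbeef")

# hex() is available on all three buffer types.
from_bytes = raw.hex()
from_bytearray = bytearray(raw).hex()
from_memoryview = memoryview(raw).hex()

# binascii.hexlify produces the same digits, but as bytes rather than str.
hexlified = binascii.hexlify(raw)
print(from_bytes, hexlified)
```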
2022-08-11  py/objstr: Consolidate methods for str/bytes/bytearray/array.  Andrew Leech
This commit adds the bytes methods to bytearray, matching CPython. The existing implementations of these methods for str/bytes are reused for bytearray with minor updates to match CPython return types. For details on the CPython behaviour see https://docs.python.org/3/library/stdtypes.html#bytes-and-bytearray-operations The work to merge locals tables for str/bytes/bytearray/array was done by @jimmo. Because of this merging of locals the change in code size for this commit is mostly negative:
   bare-arm:    +0 +0.000%
minimal x86:   +29 +0.018%
   unix x64:  -792 -0.128% standard[incl -448(data)]
unix nanbox:  -436 -0.078% nanbox[incl -448(data)]
      stm32:   -40 -0.010% PYBV10
     cc3200:   -32 -0.017%
    esp8266:   -28 -0.004% GENERIC
      esp32:   -72 -0.005% GENERIC[incl -200(data)]
     mimxrt:   -40 -0.011% TEENSY40
 renesas-ra:   -40 -0.006% RA6M2_EK
        nrf:   -16 -0.009% pca10040
        rp2:   -64 -0.013% PICO
       samd:  +148 +0.105% ADAFRUIT_ITSYBITSY_M4_EXPRESS
2022-07-18  py/obj: Add static safety checks to mp_obj_is_type().  Yonatan Goldschmidt
Commit d96cfd13e3a464862c introduced a regression by breaking existing users of mp_obj_is_type(.., &mp_obj_bool). This function (and associated helpers like mp_obj_is_int()) have some specific nuances, and mistakes like this one can happen again. This commit adds mp_obj_is_exact_type() which behaves like the old mp_obj_is_type(). The new mp_obj_is_type() has the same prototype but it attempts to statically assert that it's not called with types which should be checked using mp_obj_is_type(). If called with any of these types: int, str, bool, NoneType - it will cause a compilation error. Additional checked types (e.g. function types) can be added in the future. Existing users of mp_obj_is_type() with the now "invalid" types were translated to use mp_obj_is_exact_type(). The use of MP_STATIC_ASSERT() is not bulletproof - usually GCC (and other compilers) can't statically check conditions that are only known during link-time (like variables' addresses comparison). However, in this case, GCC is able to statically detect these conditions, probably because it's the exact same object - `&mp_type_int == &mp_type_int` is detected. Misuses of this function with runtime-chosen types (e.g. `mp_obj_type_t *x = ...; mp_obj_is_type(..., x);`) won't be detected. MSC is unable to detect this, so we use MP_STATIC_ASSERT_NOT_MSC(). Compiling with this commit and without the fix for d96cfd13e3a464862c shows that it detects the problem. Signed-off-by: Yonatan Goldschmidt <yon.goldschmidt@gmail.com>
2022-05-03  all: Use mp_obj_malloc everywhere it's applicable.  Jim Mussared
This replaces occurrences of `foo_t *foo = m_new_obj(foo_t); foo->base.type = &foo_type;` with `foo_t *foo = mp_obj_malloc(foo_t, &foo_type);`. Excludes any places where base is a sub-field or when new0/memset is used. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-01-19  py/objstr: Support '{:08}'.format("Jan") like Python 3.10.  Jeff Epler
The new test has an .exp file, because it is not compatible with Python 3.9 and lower. See CPython version of the issue at https://bugs.python.org/issue27772 Signed-off-by: Jeff Epler <jepler@gmail.com>
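A small sketch of the Python 3.10 behaviour the title refers to. The `ljust` fallback is only there so the snippet also runs on older interpreters, and is an assumption about how to emulate the same output.

```python
import sys

if sys.version_info >= (3, 10):
    # A leading 0 in the width zero-fills, and str keeps its default left alignment.
    padded = "{:08}".format("Jan")
else:
    # Older interpreters reject this spec for str, so emulate the result.
    padded = "Jan".ljust(8, "0")

print(padded)
```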
2021-07-15  py: Introduce and use mp_raise_type_arg helper.  Damien George
To reduce code size. Signed-off-by: Damien George <damien@micropython.org>
2021-04-27  py: Add option to compile without any error messages at all.  Damien George
This introduces a new option, MICROPY_ERROR_REPORTING_NONE, which completely disables all error messages. To be used in cases where MicroPython needs to fit in very limited systems. Signed-off-by: Damien George <damien@micropython.org>
2020-12-07  py/mpprint: Fix length calculation for strings with precision-modifier.  Joris Peeraer
Two issues are tackled:
1. The calculation of the correct length to print is fixed to treat the precision as a maximum length instead of as the exact length. This is done for both qstr (%q) and for regular str (%s).
2. The incorrect use of mp_printf("%.*s") is replaced with mp_print_strn(). Because of the fix for the above issue, some test cases that print an embedded null byte (^@ in test output) would now fail. The bug here is that "%s" was used to print null bytes. Instead, mp_print_strn is used to make sure all bytes are output and the exact length is respected.
Test cases are added for both %s and %q with a combination of precision and padding specifiers.
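The "precision is a maximum length" rule is the same one Python's `%` operator follows, so it can be illustrated without the C code. This is an analogy using Python string formatting, not a call into mp_printf.

```python
# Precision truncates when the string is longer than the precision...
truncated = "%.3s" % "abcdef"

# ...but never pads or over-reads when the string is shorter.
unchanged = "%.10s" % "abc"

# Width and precision combine: truncate to 3, then right-align in 8 columns.
padded = "%8.3s" % "abcdef"

print(repr(truncated), repr(unchanged), repr(padded))
```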
2020-09-24  py/objstr: Make bytes(bytes_obj) return bytes_obj.  Iyassou Shimels
Calling the bytes constructor on a bytes object returns the original bytes object. This saves allocating a new instance, and matches CPython. Signed-off-by: Iyassou Shimels <s.iyassou@gmail.com>
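The CPython behaviour being matched can be observed with an identity check. Note that returning the same object is an implementation detail of CPython for exact bytes objects, which is safe because bytes are immutable.

```python
original = b"abc"

# No copy is made for an exact bytes object.
same_object = bytes(original) is original

# A mutable source still produces a new, independent bytes object.
source = bytearray(b"abc")
converted = bytes(source)
source[0] = ord("x")

print(same_object, converted)
```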
2020-04-23  all: Format code to add space after C++-style comment start.  stijn
Note: the uncrustify configuration is explicitly set to 'add' instead of 'force' in order not to alter the comments which use extra spaces after // as a means of indenting text for clarity.
2020-04-05  all: Use MP_ERROR_TEXT for all error messages.  Jim Mussared
2020-04-05  py: Use preprocessor to detect error reporting level (terse/detailed).  Jim Mussared
Instead of compiler-level if-logic. This is necessary to know what error strings are included in the build at the preprocessor stage, so that string compression can be implemented.
2020-03-11  py/objstr: Remove duplicate % in error string.  Tom Collins
The double-% was added in 11de8399fe5f9ef54589b14470faf8d4fcc5ccaa (Jun 2014) when such errors were formatted with printf. But then 55830dd9bf4fee87c0a6d3f38c51614fea0eb483 (Dec 2018) changed mp_obj_new_exception_msg() to not format the message, as discussed in #3004. So such error strings are no longer formatted and a % is just that.
2020-02-28  all: Reformat C and Python source code with tools/codeformat.py.  Damien George
This is run with uncrustify 0.70.1, and black 19.10b0.
2020-02-13  py: Add mp_raise_msg_varg helper and use it where appropriate.  Damien George
This commit adds mp_raise_msg_varg(type, fmt, ...) as a helper for nlr_raise(mp_obj_new_exception_msg_varg(type, fmt, ...)). It makes the C-level API for raising exceptions more consistent, and reduces code size on most ports:
   bare-arm:   +28 +0.042%
minimal x86:  +100 +0.067%
   unix x64:   -56 -0.011%
unix nanbox:  -300 -0.068%
      stm32:  -204 -0.054% PYBV10
     cc3200:    +0 +0.000%
    esp8266:   -64 -0.010% GENERIC
      esp32:  -104 -0.007% GENERIC
        nrf:  -136 -0.094% pca10040
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
2020-01-24  py/obj.h: Add and use mp_obj_is_bool() helper.  Yonatan Goldschmidt
Commit d96cfd13e3a464862cecffb2718c6286b52c77b0 introduced a regression in testing for bool objects, that such objects were in some cases no longer recognised as bools, eg when using mp_obj_is_type(o, &mp_type_bool), or mp_obj_is_integer(o). This commit fixes that problem by adding mp_obj_is_bool(o). Builds with MICROPY_OBJ_IMMEDIATE_OBJS enabled check if the object is any of the const True or False objects. Builds without it use the old method of ->type checking, which compiles to smaller code (compared with the former mentioned method). Fixes #5538.
2020-01-09  py: Make mp_obj_get_type() return a const ptr to mp_obj_type_t.  Damien George
Most types are in rodata/ROM, and mp_obj_base_t.type is a constant pointer, so enforce this const-ness throughout the code base. If a type ever needs to be modified (eg a user type) then a simple cast can be used.
2019-12-27  py/objstr: Don't use inline GET_STR_DATA_LEN for object-repr D.  Damien George
Changing to use the helper function mp_obj_str_get_data_no_check() reduces code size of nan-boxing builds by about 1000 bytes.
2019-10-22  py/objstr: Size-optimise failure path for mp_obj_str_get_buffer.  Jim Mussared
These fields are never looked at if the function returns non-zero.
2019-09-26  py: Rename MP_QSTR_NULL to MP_QSTRnull to avoid intern collisions.  Josh Lloyd
Fixes #5140.
2019-02-12  py: Downcase all MP_OBJ_IS_xxx macros to make a more consistent C API.  Damien George
These macros could in principle be (inline) functions so it makes sense to have them lower case, to match the other C API functions. The remaining macros that are upper case are:
- MP_OBJ_TO_PTR, MP_OBJ_FROM_PTR
- MP_OBJ_NEW_SMALL_INT, MP_OBJ_SMALL_INT_VALUE
- MP_OBJ_NEW_QSTR, MP_OBJ_QSTR_VALUE
- MP_OBJ_FUN_MAKE_SIG
- MP_DECLARE_CONST_xxx
- MP_DEFINE_CONST_xxx
These must remain macros because they are used when defining const data (at least, MP_OBJ_NEW_SMALL_INT is, so it makes sense to have MP_OBJ_SMALL_INT_VALUE also a macro). For those macros that have been made lower case, compatibility macros are provided for the old names so that users do not need to change their code immediately.
2019-02-06  py: Update my copyright info on some files.  Paul Sokolovsky
Based on git history.
2018-10-22  py/objstr: Make str.count() method configurable.  Paul Sokolovsky
Configurable via MICROPY_PY_BUILTINS_STR_COUNT. Default is enabled. Disabled for bare-arm, minimal, unix-minimal and zephyr ports. Disabling it saves 408 bytes on x86.
2018-09-26  py/objstr: format: Return bytes result for bytes format string.  Paul Sokolovsky
This is an improvement over previous behavior when str was returned for both str and bytes input format. This new behaviour is also consistent with how the % operator works, as well as many other str/bytes methods. It should be noted that it's not how current versions of CPython work, where there's a gap in the functionality and bytes.format() is not supported.
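The difference from CPython can be probed at runtime. In the sketch below the `hasattr` branch is an assumption about how portable code might detect the feature; on CPython it takes the `else` path because bytes has no format method.

```python
# The % operator on bytes yields bytes (CPython 3.5+).
percent = b"%d-%s" % (7, b"x")

if hasattr(bytes, "format"):
    # Expected only on runtimes that provide bytes.format.
    braces = b"{}-{}".format(7, "x")
else:
    braces = None

print(percent, braces)
```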
2018-09-20  py/objstr: Make % (__mod__) formatting operator configurable.  Paul Sokolovsky
Default is enabled, disabled for minimal builds. Saves 1296 bytes on x86, 976 bytes on ARM.
2018-09-20  py: Shorten error messages by using contractions and some rewording.  Damien George
2018-07-30  py/objstr: In format error message, use common string with %s for type.  Damien George
This error message did not consume all of its variable args, a bug introduced long ago in baf6f14deb567ab626c1b05213af346108f41700. By fixing it to use %s (instead of keeping the string as-is and deleting the last arg) the same error message string is now reused three times in this format function and gives a code size reduction of around 130 bytes. It also now gives a better error message when a non-string is passed in as an argument to format, eg '{:d}'.format([]).
2018-04-05  py/objstr: In find/rfind, don't crash when end < start.  Jeff Epler
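For comparison, the well-defined result in CPython when the end index precedes the start index is simply "not found". The strings below are arbitrary examples.

```python
text = "hello"

# An empty search window (end < start) can never contain the needle.
found = text.find("l", 4, 1)
rfound = text.rfind("l", 4, 1)

# The same needle is found once the window is valid.
normal = text.find("l", 1, 4)

print(found, rfound, normal)
```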
2018-03-30  py/runtime: Check that keys in dicts passed as ** args are strings.  Damien George
Prior to this patch the code would crash if a key in a ** dict was anything other than a str or qstr. This is because mp_setup_code_state() assumes that keys in kwargs are qstrs (for efficiency). Thanks to @jepler for finding the bug.
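A minimal reproduction of the case being guarded against, as it behaves in CPython. The function `show` is a hypothetical helper for the example.

```python
def show(**kwargs):
    """Return the keyword names that were received."""
    return sorted(kwargs)


accepted = show(**{"a": 1, "b": 2})

# A non-string key must be rejected with an exception rather than crash.
try:
    show(**{1: 2})
    rejected = False
except TypeError:
    rejected = True

print(accepted, rejected)
```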
2018-02-20  py/objstr: Remove unnecessary check for positive splits variable.  Damien George
At this point in the code the variable "splits" is guaranteed to be positive due to the check for "splits == 0" above it.
2018-02-19  py/objstr: Protect against creating bytes(n) with n negative.  Damien George
Prior to this patch uPy (on a 32-bit arch) would have severe issues when calling bytes(-1): such a call would call vstr_init_len(vstr, -1) which would then +1 on the len and call vstr_init(vstr, 0), which would then round this up and allocate a small amount of memory for the vstr. The bytes constructor would then attempt to zero out all this memory, thinking it had allocated 2^32-1 bytes.
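The guarded behaviour matches what CPython does for a negative count, shown here as a short sketch.

```python
# A non-negative count allocates that many zero bytes.
zeros = bytes(3)

# A negative count is rejected up front instead of reaching the allocator.
try:
    bytes(-1)
    raised = False
except ValueError:
    raised = True

print(zeros, raised)
```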
2018-02-14  py/unicode: Clean up utf8 funcs and provide non-utf8 inline versions.  Damien George
This patch provides inline versions of the utf8 helper functions for the case when unicode is disabled (MICROPY_PY_BUILTINS_STR_UNICODE set to 0). This saves code size. The unichar_charlen function is also renamed to utf8_charlen to match the other utf8 helper functions, and the signature of this function is adjusted for consistency (const char* -> const byte*, mp_uint_t -> size_t).
2017-11-29  py: Annotate func defs with NORETURN when their corresp decls have it.  Damien George
2017-11-24  py/runtime: Add MP_BINARY_OP_CONTAINS as reverse of MP_BINARY_OP_IN.  Damien George
Before this patch MP_BINARY_OP_IN had two meanings: coming from bytecode it meant that the args needed to be swapped, but coming from within the runtime meant that the args were already in the correct order. This led to some confusion in the code and comments stating how args were reversed. It also led to 2 bugs: 1) containment for a subclass of a native type didn't work; 2) the expression "{True} in True" would illegally succeed and return True. In both of these cases it was because the args to MP_BINARY_OP_IN ended up being reversed twice. To fix these things this patch introduces MP_BINARY_OP_CONTAINS which corresponds exactly to the __contains__ special method, and this is the operator that built-in types should implement. MP_BINARY_OP_IN is now only emitted by the compiler and is converted to MP_BINARY_OP_CONTAINS by swapping the arguments.
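At the Python level the argument swap looks like this: `item in container` dispatches to `container.__contains__(item)`. `Box` is a hypothetical class used only to make the order visible.

```python
class Box:
    """Hypothetical container that records what it was asked about."""

    def __init__(self, items):
        self.items = list(items)
        self.asked = []

    def __contains__(self, item):
        # Receives the LEFT operand of "in": the operands arrive swapped.
        self.asked.append(item)
        return item in self.items


box = Box([1, 2, 3])
present = 3 in box

# A bool is not a container, so the second bug's expression must fail.
container, item = True, {True}
try:
    item in container
    failed = False
except TypeError:
    failed = True

print(present, box.asked, failed)
```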
2017-11-16  py/objstr: When constructing str from bytes, check for existing qstr.  Damien George
This patch uses existing qstr data where possible when constructing a str from a bytes object.