path: root/py/objstr.c
Age | Commit message | Author
2018-09-20 | py: Shorten error messages by using contractions and some rewording. | Damien George
2018-07-30 | py/objstr: In format error message, use common string with %s for type. | Damien George
This error message did not consume all of its variable args, a bug introduced long ago in baf6f14deb567ab626c1b05213af346108f41700. By fixing it to use %s (instead of keeping the string as-is and deleting the last arg) the same error message string is now reused three times in this format function and gives a code size reduction of around 130 bytes. It also now gives a better error message when a non-string is passed in as an argument to format, eg '{:d}'.format([]).
2018-04-05 | py/objstr: In find/rfind, don't crash when end < start. | Jeff Epler
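A minimal sketch of the behaviour this fix is expected to give, assuming MicroPython matches CPython here: a search window with end < start is empty, so both methods report "not found" instead of crashing. The helper name `find_in_window` is hypothetical.

```python
def find_in_window(text, sub, start, end):
    """Return the (find, rfind) results for the window [start, end)."""
    return text.find(sub, start, end), text.rfind(sub, start, end)


# end < start describes an empty window, so nothing can match.
print(find_in_window("hello", "l", 4, 1))
# A normal window still finds the first and last occurrence.
print(find_in_window("hello", "l", 0, 5))
```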
2018-03-30 | py/runtime: Check that keys in dicts passed as ** args are strings. | Damien George
Prior to this patch the code would crash if a key in a ** dict was anything other than a str or qstr. This is because mp_setup_code_state() assumes that keys in kwargs are qstrs (for efficiency). Thanks to @jepler for finding the bug.
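The check can be illustrated from the Python side. This is a sketch that assumes the patched runtime raises TypeError for non-string keys, as CPython does; `collect` and `call_with_mapping` are hypothetical names.

```python
def collect(**kwargs):
    """Return the keyword arguments that were received."""
    return kwargs


def call_with_mapping(mapping):
    """Call collect(**mapping); return None if the keys are rejected."""
    try:
        return collect(**mapping)
    except TypeError:
        # Raised when a key in the ** dict is not a string.
        return None


print(call_with_mapping({"a": 1}))
print(call_with_mapping({1: 2}))
```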
2018-02-20 | py/objstr: Remove unnecessary check for positive splits variable. | Damien George
At this point in the code the variable "splits" is guaranteed to be positive due to the check for "splits == 0" above it.
2018-02-19 | py/objstr: Protect against creating bytes(n) with n negative. | Damien George
Prior to this patch uPy (on a 32-bit arch) would have severe issues when calling bytes(-1): such a call would call vstr_init_len(vstr, -1) which would then +1 on the len and call vstr_init(vstr, 0), which would then round this up and allocate a small amount of memory for the vstr. The bytes constructor would then attempt to zero out all this memory, thinking it had allocated 2^32-1 bytes.
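A small sketch of the guarded behaviour, assuming the constructor now raises ValueError for a negative count as CPython does (the exact exception MicroPython raises is an assumption here). `make_zeroed` is a hypothetical helper.

```python
def make_zeroed(n):
    """Return n zero bytes, or None if the count is rejected."""
    try:
        return bytes(n)
    except ValueError:
        # A negative count is rejected instead of being treated as a huge size.
        return None


print(make_zeroed(3))
print(make_zeroed(-1))
```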
2018-02-14 | py/unicode: Clean up utf8 funcs and provide non-utf8 inline versions. | Damien George
This patch provides inline versions of the utf8 helper functions for the case when unicode is disabled (MICROPY_PY_BUILTINS_STR_UNICODE set to 0). This saves code size. The unichar_charlen function is also renamed to utf8_charlen to match the other utf8 helper functions, and the signature of this function is adjusted for consistency (const char* -> const byte*, mp_uint_t -> size_t).
2017-11-29 | py: Annotate func defs with NORETURN when their corresp decls have it. | Damien George
2017-11-24 | py/runtime: Add MP_BINARY_OP_CONTAINS as reverse of MP_BINARY_OP_IN. | Damien George
Before this patch MP_BINARY_OP_IN had two meanings: coming from bytecode it meant that the args needed to be swapped, but coming from within the runtime meant that the args were already in the correct order. This led to some confusion in the code and comments stating how args were reversed. It also led to 2 bugs: 1) containment for a subclass of a native type didn't work; 2) the expression "{True} in True" would illegally succeed and return True. In both of these cases it was because the args to MP_BINARY_OP_IN ended up being reversed twice. To fix these things this patch introduces MP_BINARY_OP_CONTAINS which corresponds exactly to the __contains__ special method, and this is the operator that built-in types should implement. MP_BINARY_OP_IN is now only emitted by the compiler and is converted to MP_BINARY_OP_CONTAINS by swapping the arguments.
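The argument order and the two bug cases can be sketched at the Python level. This only illustrates the language semantics (the `in` operator versus `__contains__`), not the C implementation; `MyList`, `is_member` and `safe_in` are hypothetical names.

```python
import operator


class MyList(list):
    """A subclass of a native type, as in the first bug described above."""


def is_member(item, container):
    # `item in container` corresponds to container.__contains__(item);
    # operator.contains takes the container first, then the item.
    return operator.contains(container, item)


def safe_in(item, container):
    """Evaluate `item in container`, returning None if it is not allowed."""
    try:
        return item in container
    except TypeError:
        # For example a bool is not a container, so `{True} in True` fails.
        return None


print(is_member(2, MyList([1, 2, 3])))
print(safe_in({True}, True))
```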
2017-11-16 | py/objstr: When constructing str from bytes, check for existing qstr. | Damien George
This patch uses existing qstr data where possible when constructing a str from a bytes object.
2017-11-16 | py/objstr: Make mp_obj_new_str_of_type check for existing interned qstr. | Damien George
The function mp_obj_new_str_of_type is a general str object constructor used in many places in the code to create either a str or bytes object. When creating a str it should first check if the string data already exists as an interned qstr, and if so then return the qstr object. This patch makes the function have such behaviour, which helps to reduce heap usage by reusing existing interned data where possible. The old behaviour of mp_obj_new_str_of_type (which didn't check for existing interned data) is made available through the function mp_obj_new_str_copy, but should only be used in very special cases. One consequence of this patch is that the following expression is now True: 'abc' is ' abc '.split()[0]
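As a rough analogy only: CPython does not intern the results of split() automatically, but its `sys.intern` function shows the same idea of reusing one stored copy of equal string data. The sketch below is written for CPython and is not a description of MicroPython's qstr machinery; `same_interned_object` is a hypothetical helper.

```python
import sys


def same_interned_object(a, b):
    """Return True if interning both strings yields the very same object."""
    return sys.intern(a) is sys.intern(b)


# Equal string data maps to a single interned object.
print(same_interned_object("abc", " abc ".split()[0]))
```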
2017-11-16 | py/objstr: Remove "make_qstr_if_not_already" arg from mp_obj_new_str. | Damien George
This patch simplifies the str creation API to favour the common case of creating a str object that is not forced to be interned. To force interning of a new str the new mp_obj_new_str_via_qstr function is added, and should only be used if warranted. Apart from simplifying the mp_obj_new_str function (and making it have the same signature as mp_obj_new_bytes), this patch also reduces code size by a bit (-16 bytes for bare-arm and roughly -40 bytes on the bare-metal archs).
2017-10-04 | py/objstr: Make empty bytes object have a null-terminating byte. | Damien George
A lot of string processing functions assume there is a null-terminating byte so that they can work in an efficient way. Fixes issue #3334.
2017-10-04 | all: Remove inclusion of internal py header files. | Damien George
Header files that are considered internal to the py core and should not normally be included directly are:
    py/nlr.h - internal nlr configuration and declarations
    py/bc0.h - contains bytecode macro definitions
    py/runtime0.h - contains basic runtime enums
Instead, the top-level header files to include are one of:
    py/obj.h - includes runtime0.h and defines everything to use the mp_obj_t type
    py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h, and defines everything to use the general runtime support functions
Additional, specific headers (eg py/objlist.h) can be included if needed.
2017-09-19 | py/objstr: strip: Don't strip "\0" by default. | Paul Sokolovsky
The issue was due to incorrectly taking the size of the default strip character set.
2017-09-06 | py/objstr: Add check for valid UTF-8 when making a str from bytes. | tll
This patch adds a function utf8_check() to check for a valid UTF-8 encoded string, and calls it when constructing a str from raw bytes. The feature is selectable at compile time via MICROPY_PY_BUILTINS_STR_UNICODE_CHECK and is enabled if unicode is enabled. It costs about 110 bytes on Thumb-2, 150 bytes on Xtensa and 170 bytes on x86-64.
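Seen from Python, the effect of such a check is that invalid input is rejected when a str is built from raw bytes. The sketch below assumes the error is a ValueError subclass (in CPython it is UnicodeDecodeError); `decode_checked` is a hypothetical helper and does not reproduce the C function utf8_check().

```python
def decode_checked(raw):
    """Decode raw bytes as UTF-8, returning None if they are not valid."""
    try:
        return str(raw, "utf-8")
    except ValueError:
        # UnicodeError and UnicodeDecodeError are subclasses of ValueError.
        return None


print(decode_checked(b"caf\xc3\xa9"))  # valid two-byte sequence
print(decode_checked(b"\xff"))         # 0xff can never appear in UTF-8
```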
2017-08-29 | all: Convert mp_uint_t to mp_unary_op_t/mp_binary_op_t where appropriate | Damien George
The unary-op/binary-op enums are already defined, and there are no arithmetic tricks used with these types, so it makes sense to use the correct enum type for arguments that take these values. It also reduces code size quite a bit for nan-boxing builds.
2017-08-29 | py/objstr: startswith, endswith: Check arg to be a string. | Paul Sokolovsky
Otherwise, these methods silently return an incorrect result for other value types, including the CPython tuple form such as "foo.png".endswith(("png", "jpg")) (which MicroPython doesn't support, to avoid bloat).
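Code that needs the tuple form on both implementations can test each suffix individually. This is a minimal sketch; `endswith_any` is a hypothetical helper, not part of either implementation.

```python
def endswith_any(text, suffixes):
    """Return True if text ends with at least one of the given suffixes."""
    # Each call passes a single str, which is the only form described
    # above as supported by MicroPython.
    return any(text.endswith(suffix) for suffix in suffixes)


print(endswith_any("foo.png", ("png", "jpg")))
print(endswith_any("foo.gif", ("png", "jpg")))
```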
2017-08-13 | all: Raise exceptions via mp_raise_XXX | Javier Candeira
- Changed: ValueError, TypeError, NotImplementedError
- OSError invocations unchanged, because the corresponding utility function takes ints, not strings like the long form invocation.
- OverflowError, IndexError and RuntimeError etc. not changed for now until we decide whether to add new utility functions.
2017-08-09 | py/objstr: Raise an exception for wrong type on RHS of str binary op. | Damien George
The main case to catch is invalid types for the containment operator, of the form str.__contains__(non-str).
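A sketch of the containment case, assuming the exception raised is TypeError as in CPython (the commit message does not name the exception type). `contains_checked` is a hypothetical helper.

```python
def contains_checked(haystack, needle):
    """Evaluate `needle in haystack`, returning None for an invalid type."""
    try:
        return needle in haystack
    except TypeError:
        # A str can only contain another str.
        return None


print(contains_checked("abc", "b"))
print(contains_checked("abc", 1))
```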
2017-07-31 | all: Use the name MicroPython consistently in comments | Alexander Steffen
There were several different spellings of MicroPython present in comments, when there should be only one.
2017-07-04 | py/objstr: Remove unnecessary "sign" variable in formatting code. | Damien George
2017-07-02 | py/objstr: Move uPy function wrappers to just after the C function. | Damien George
This matches the coding/layout style of all the other objects.
2017-06-08 | py/objstr: Allow to compile with obj-repr D, and unicode disabled. | Damien George
2017-06-02 | py/objstr: Catch case of negative "maxsplit" arg to str.rsplit(). | Damien George
Negative values mean no limit on the number of splits so should delegate to the .split() method.
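The intended equivalence can be checked directly. This is a minimal sketch; `rsplit_fields` is a hypothetical helper.

```python
def rsplit_fields(text, maxsplit=-1):
    """Split on whitespace from the right; a negative maxsplit means no limit."""
    return text.rsplit(None, maxsplit)


# With no limit, splitting from the right gives the same fields as split().
print(rsplit_fields("a b c") == "a b c".split())
# With a limit of one, only the rightmost field is separated.
print(rsplit_fields("a b c", 1))
```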
2017-05-29 | various: Spelling fixes | Ville Skyttä
2017-04-02 | py/objstr: Use MICROPY_FULL_CHECKS for range checking when constructing bytes. | Paul Sokolovsky
Split this setting from MICROPY_CPYTHON_COMPAT. The idea is to be able to keep MICROPY_CPYTHON_COMPAT disabled, but still pass more of the regression test suite. In particular, this fixes the last failing test in basics/ for the Zephyr port.
2017-03-29 | py: Change mp_uint_t to size_t for mp_obj_str_get_data len arg. | Damien George
2017-03-29 | py: Convert mp_uint_t to size_t for tuple/list accessors. | Damien George
This patch changes mp_uint_t to size_t for the len argument of the following public facing C functions:
    mp_obj_tuple_get
    mp_obj_list_get
    mp_obj_get_array
These functions take a pointer to the len argument (to be filled in by the function) and callers of these functions should update their code so the type of len is changed to size_t. For ports that don't use nan-boxing there should be no change in generated code because the size of the type remains the same (word sized), and in a lot of cases there won't even be a compiler warning if the type remains as mp_uint_t. The reason for this change is to standardise on the use of size_t for variables that count memory (or memory related) sizes/lengths. It helps builds that use nan-boxing.
2017-03-23 | py: Use size_t as len argument and return type of mp_get_index. | Damien George
These values are used to compute memory addresses and so size_t is the more appropriate type to use.
2017-03-20 | py/objstr: Use better msg in bad implicit str/bytes conversion exception | stijn
Instead of always reporting that some object cannot be implicitly converted to a 'str', even when it is a 'bytes' object, adjust the logic so that when trying to convert str to bytes it is shown like that. This will still report a bad implicit conversion from e.g. 'int to bytes' as 'int to str', but it will no longer result in the confusing 'can't convert 'str' object to str implicitly' for calls like b'somestring'.count('a').
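The kind of call that triggers this message can be sketched as follows. Only the exception type is relied on, since the message text differs between implementations; `count_checked` is a hypothetical helper.

```python
def count_checked(data, sub):
    """Return data.count(sub), or None when str and bytes are mixed."""
    try:
        return data.count(sub)
    except TypeError:
        # Raised for an implicit str/bytes conversion in either direction.
        return None


print(count_checked(b"somestring", b"s"))
print(count_checked(b"somestring", "a"))
```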
2017-03-16 | py/objstr: Fix eager optimisation of str/bytes addition. | Damien George
The RHS can only be returned if it is the same type as the LHS.
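A sketch of the cases the type check has to keep correct, assuming the fixed behaviour matches CPython; `concat` is a hypothetical helper.

```python
def concat(lhs, rhs):
    """Return lhs + rhs, or None if the two types cannot be added."""
    try:
        return lhs + rhs
    except TypeError:
        return None


# Same type on both sides: the non-empty operand can be returned as-is.
print(concat(b"", b"abc"))
# Mixed str/bytes must still fail even though the LHS is empty.
print(concat("", b"abc"))
# bytes + bytearray must produce bytes, not hand back the bytearray.
print(type(concat(b"", bytearray(b"abc"))))
```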
2017-03-07 | py: Use mp_obj_get_array where sequence may be a tuple or a list. | Krzysztof Blazewicz
2017-02-16 | py: Add iter_buf to getiter type method. | Damien George
Allows iterating over the following without allocating on the heap:
- tuple
- list
- string, bytes
- bytearray, array
- dict (not dict.keys, dict.values, dict.items)
- set, frozenset
Allows calling the following without heap memory:
- all, any, min, max, sum
TODO: still need to allocate stack memory in bytecode for iter_buf.
2017-02-16 | py/objstr: Convert mp_uint_t to size_t (and use int) where appropriate. | Damien George
2017-02-03 | py/objstr: Convert some instances of mp_uint_t to size_t. | Damien George
2017-02-03 | py/objstr: Give correct behaviour when passing a dict to %-formatting. | Damien George
This patch fixes two main things:
- dicts can be printed directly using '%s' % dict
- %-formatting should not crash when passed a non-dict to, eg, '%(foo)s'
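Both cases can be sketched as below, assuming the non-dict case raises TypeError as it does in CPython; `percent_format` is a hypothetical helper.

```python
def percent_format(template, value):
    """Apply %-formatting, returning None if the argument type is rejected."""
    try:
        return template % value
    except TypeError:
        # For example a named field such as %(foo)s requires a mapping.
        return None


print(percent_format("%s", {"a": 1}))            # dict printed directly
print(percent_format("%(foo)s", {"foo": "bar"}))  # dict used as a mapping
print(percent_format("%(foo)s", 1))               # non-dict is rejected
```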
2017-01-27 | py/objstr: Optimize string concatenation with empty string. | Paul Sokolovsky
In this case, don't allocate a copy, just return the non-empty string. This helps with a standard pattern of buffering data in case of short reads:

    buf = b""
    while ...:
        s = f.read(...)
        buf += s
        ...

For a typical case when a single read returns all the data needed, there won't be an extra allocation. This optimization helps uasyncio.
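A runnable version of the buffering pattern, as a sketch only: `io.BytesIO` stands in for the stream, and `read_all` and `chunk_size` are hypothetical names not taken from the commit.

```python
import io


def read_all(stream, chunk_size=64):
    """Accumulate data from a stream that may return short reads."""
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # On the first pass buf is empty, which is the case the
        # optimisation above targets: no extra copy should be needed.
        buf += chunk
    return buf


print(read_all(io.BytesIO(b"hello world"), chunk_size=4))
```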
2016-09-27 | py/objstr: Remove unreachable function used only for terse error msgs. | Damien George
2016-09-02 | py: If str/bytes hash is 0 then explicitly compute it. | Damien George
2016-08-14 | py/objstr: Use mp_raise_{Type,Value}Error instead of mp_raise_msg. | Damien George
This patch does further refactoring using the new mp_raise_TypeError and mp_raise_ValueError functions.
2016-08-12 | py: Get rid of assert() in method argument checking functions. | Paul Sokolovsky
Checks for the number of args are removed where guaranteed by the function descriptor; self checking is replaced with mp_check_self(). In a few cases, an exception is raised instead of an assert.
2016-08-12 | py/runtime: Factor out exception raising helpers. | Paul Sokolovsky
Introduce mp_raise_msg(), mp_raise_ValueError(), mp_raise_TypeError() instead of the previous pattern nlr_raise(mp_obj_new_exception_msg(...)). This saves a few bytes on each call, of which there are many.
2016-08-07 | py/objstr,objstrunicode: Fix inconsistent #if indentation. | Paul Sokolovsky
2016-08-07 | py/objstr: Make .partition()/.rpartition() methods configurable. | Paul Sokolovsky
Default is disabled, enabled for unix port. Saves 600 bytes on x86.
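On a build where these methods are disabled, portable code may need a fallback. The following is a sketch of one possible substitute built on str.find(); `partition_compat` is a hypothetical helper and not part of MicroPython.

```python
def partition_compat(text, sep):
    """Split text at the first occurrence of sep, like str.partition()."""
    if not sep:
        raise ValueError("empty separator")
    index = text.find(sep)
    if index < 0:
        # Separator not found: the whole text is the head.
        return (text, "", "")
    return (text[:index], sep, text[index + len(sep):])


print(partition_compat("key=value=x", "="))
print(partition_compat("abc", "="))
```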
2016-05-22 | py/objstr: Fix mix-signed comparison in str.center(). | Paul Sokolovsky
2016-05-22 | py/objstr*: Properly ifdef str.center(). | Dave Hylands
2016-05-22 | py/objstr: Implement str.center(). | Paul Sokolovsky
Disabled by default, enabled in the unix port. The need for this method easily pops up when working with text UI/reporting, and coding a workalike manually again and again is counter-productive.
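A small example of the text-reporting use case mentioned above, assuming a build with str.center() enabled; `banner` is a hypothetical helper.

```python
def banner(title, width):
    """Centre a title within a fixed-width report line."""
    # Strings already at least as wide as the line are returned unchanged.
    return title.center(width)


print("[" + banner("ab", 6) + "]")
print("[" + banner("abcdef", 4) + "]")
```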
2016-05-13 | py/objstr: Make dedicated splitlines function, supporting diff newlines. | Damien George
It now supports \n, \r and \r\n as newline separators. Adds 56 bytes to stmhal and 80 bytes to unix x86-64. Fixes issue #1689.
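The three separators can be exercised in one string. This is a minimal sketch; `split_mixed_lines` is a hypothetical helper.

```python
def split_mixed_lines(text):
    """Split text on \\n, \\r and \\r\\n line endings."""
    return text.splitlines()


# \r\n counts as a single separator, so no empty line appears before "four".
print(split_mixed_lines("one\ntwo\rthree\r\nfour"))
```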
2016-05-09 | Revert "py/objstr: .format(): Avoid call to vstr_null_terminated_str()." | Paul Sokolovsky
This reverts commit 6de8dbb4880e58c68a08205cb2b9c15940143439. The change was incorrect (correct change would require comparing with end pointer in each if statement in the block).