path: root/py/lexer.c
Age | Commit message | Author
2023-11-03 | py/lexer: Change token position for new lines. | Mathieu Serandour
Set the position of new line tokens as the end of the preceding line instead of the beginning of the next line. This is done by first moving the pointer to the end of the current line to skip any whitespace, recording the position for the token, then finally skipping any other lines and whitespace. The previous behavior was to skip every new line and whitespace, including the indent of the next line, before recording the token position. (Note that both lex->emit_dent and lex->nested_bracket_level equal 0 if had_physical_newline == true, which allows simplifying the if-logic for MP_TOKEN_NEWLINE.) And update the cmd_parsetree.py test expected output, because the position of the new-line token has changed. Fixes issue #12792. Signed-off-by: Mathieu Serandour <mathieu.serandour@numworks.fr>
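The difference between the two behaviours can be modelled in a few lines of Python. This is only a sketch of the idea, not the C code in py/lexer.c: the helper names `line_col`, `newline_pos_old` and `newline_pos_new` are hypothetical, and reporting positions as 1-based (line, column) pairs is an assumption.

```python
def line_col(src, idx):
    """Convert a character index into a 1-based (line, column) pair."""
    line = src.count("\n", 0, idx) + 1
    col = idx - (src.rfind("\n", 0, idx) + 1) + 1
    return line, col


def newline_pos_old(src, idx):
    """Old model: skip all whitespace and newlines, then record the position."""
    while idx < len(src) and src[idx] in " \t\n":
        idx += 1
    return line_col(src, idx)


def newline_pos_new(src, idx):
    """New model: skip only trailing whitespace on the current line, then record."""
    while idx < len(src) and src[idx] in " \t":
        idx += 1
    return line_col(src, idx)


src = "x = 1  \n\n    y = 2\n"
after_token = 5  # index just past the "1" on the first line

print(newline_pos_old(src, after_token))  # lands on the "y" of the next statement
print(newline_pos_new(src, after_token))  # lands at the end of the first line
```

In this model the old position points into the following statement, while the new one stays on the line that the newline actually terminates.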
2023-10-12 | py/builtinevex: Handle invalid filenames for execfile. | Jim Mussared
If a non-string buffer was passed to execfile, then it would be passed as a non-null-terminated char* to mp_lexer_new_from_file. This changes mp_lexer_new_from_file to take a qstr instead (as in almost all cases a qstr will be created from this input anyway to set the `__file__` attribute on the module). This now makes execfile require a string (not generic buffer) argument, which is probably a good fix to make anyway. Fixes issue #12522. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-09-29 | py/lexer: Add missing initialisation for fstring_args_idx. | Jim Mussared
This was missed in 692d36d779192f32371f7f9daa845b566f26968d. Probably never noticed because everything enables `MICROPY_GC_CONSERVATIVE_CLEAR`, but found via ASAN thanks to @gwangmu & @chibinz. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2023-06-14 | py/lexer: Allow conversion specifiers in f-strings (e.g. !r). | Jared Hancock
PEP-498 allows conversion specifiers like !r and !s, which pass the expression declared in braces through repr() and str() respectively. This updates the logic that detects the end of the expression to also stop when it sees "![rs]" that is either at the end of the f-string or before the ":" indicating the start of the format specifier. The "![rs]" is now retained in the format string, whereas previously it stayed on the end of the expression, leading to a syntax error.
Previously: `f"{x!y:z}"` --> `"{:z}".format(x!y)`
Now: `f"{x!y:z}"` --> `"{!y:z}".format(x)`
Note that "!a" is not supported by `str.format` as MicroPython has no `ascii()`, but now this will raise the correct error. Updated cpydiff and added tests. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
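As a quick illustration of the translation described above, the following snippet checks that an f-string with a conversion and a format specifier matches the equivalent `str.format` call. It is a sketch run against standard Python semantics; the variable names are made up for the example.

```python
x = "hi"

# The "!r" conversion and the ":>8" format specifier stay in the template,
# while the bare expression becomes the argument to format().
translated = "{!r:>8}".format(x)

print(f"{x!r:>8}" == translated)
```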
2023-01-20 | py/lexer: Wrap in parenthesis all f-string arguments passed to format. | Jim Mussared
This is important for literal tuples, e.g. f"{a,b,}, {c}" --> "{}, {}".format((a,b,), (c),) which would otherwise result in either a syntax error or the wrong result. Fixes issue #9635. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
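The effect of the wrapping can be seen by writing both forms of the `str.format` call by hand. This is a sketch using ordinary Python, with made-up variable names; it does not exercise the lexer itself.

```python
a, b, c = 1, 2, 3

# Arguments pasted in unwrapped: the tuple's commas become argument separators,
# so the call receives three arguments and silently ignores the last one.
unwrapped = "{}, {}".format(a, b, c)

# Each argument wrapped in parentheses, so the tuple stays a single argument.
wrapped = "{}, {}".format((a, b,), (c),)

print(unwrapped)
print(wrapped)
print(f"{a,b}, {c}" == wrapped)
```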
2022-05-17 | py/persistentcode: Remove unicode feature flag from .mpy file. | Damien George
Prior to this commit, even with unicode disabled .py and .mpy files could contain unicode characters, eg by entering them directly in a string as utf-8 encoded. The only thing the compiler disallowed (with unicode disabled) was using \uxxxx and \Uxxxxxxxx notation to specify a character within a string with value >= 0x100; that would give a SyntaxError. With this change mpy-cross will now accept \u and \U notation to insert a character with value >= 0x100 into a string (because the -mno-unicode option is now gone, there's no way to forbid this). The runtime will happily work with strings with such characters, just like it already works with strings with characters that were utf-8 encoded directly. This change simplifies things because there are no longer any feature flags in .mpy files, and any bytecode .mpy will now run on any target. Signed-off-by: Damien George <damien@micropython.org>
2021-11-25 | py/lexer: Support nested [] and {} characters within f-string params. | Damien George
Signed-off-by: Damien George <damien@micropython.org>
2021-08-19 | py/lexer: Clear fstring_args vstr on lexer free. | Jim Mussared
This was missed in 692d36d779192f32371f7f9daa845b566f26968d. It's not strictly necessary as the GC will clean it anyway, but it's good to pre-emptively gc_free() all the blocks used in lexing/parsing. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-08-14 | py: Implement partial PEP-498 (f-string) support. | Jim Mussared
This implements (most of) the PEP-498 spec for f-strings and is based on https://github.com/micropython/micropython/pull/4998 by @klardotsh. It is implemented in the lexer as a syntax translation to `str.format`:
    f"{a}" --> "{}".format(a)
It also supports:
    f"{a=}" --> "a={}".format(a)
This is done by extracting the arguments into a temporary vstr buffer, then after the string has been tokenized, the lexer input queue is saved and the contents of the temporary vstr buffer are injected into the lexer instead.
There are four main limitations:
- raw f-strings (`fr` or `rf` prefixes) are not supported and will raise `SyntaxError: raw f-strings are not supported`.
- literal concatenation of f-strings with adjacent strings will fail:
    "{}" f"{a}" --> "{}{}".format(a) (str.format will incorrectly use the braces from the non-f-string)
    f"{a}" f"{a}" --> "{}".format(a) "{}".format(a) (cannot concatenate)
- PEP-498 requires the full parser to understand the interpolated argument, however because this entirely runs in the lexer it cannot resolve nested braces in expressions like f"{'}'}"
- The !r, !s, and !a conversions are not supported.
Includes tests and cpydiffs. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
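The translation idea can be sketched in Python as follows. This is a deliberately naive model and not the lexer's C implementation: `translate_fstring` and `render` are hypothetical names, the first closing brace is assumed to end a field, and `eval` stands in for the lexer feeding the extracted arguments back into the token stream.

```python
def translate_fstring(body):
    """Split an f-string body into a str.format template and argument sources."""
    template = []
    args = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "{":
            end = body.index("}", i)  # naive: the first "}" closes the field
            expr = body[i + 1:end]
            if expr.endswith("="):
                # f"{a=}" keeps the expression text and "=" in the template
                expr = expr[:-1]
                template.append(expr + "=")
            template.append("{}")
            args.append(expr)
            i = end + 1
        else:
            template.append(ch)
            i += 1
    return "".join(template), args


def render(body, namespace):
    """Evaluate the extracted arguments and apply them to the template."""
    template, args = translate_fstring(body)
    return template.format(*(eval(arg, {}, namespace) for arg in args))


print(translate_fstring("{a=} and {b}"))
print(render("{a=} and {b}", {"a": 1, "b": 2}))

# The nested-brace limitation: the quoted "}" is mistaken for the end of the field.
print(translate_fstring("{'}'}"))
```

The last call shows why a purely lexical scan struggles with f"{'}'}": the extracted argument is an unterminated string literal.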
2020-10-22 | py, extmod: Introduce and use MP_FALLTHROUGH macro. | Emil Renner Berthing
Newer GCC versions are able to warn about switch cases that fall through. This is usually a sign of a forgotten break statement, but in the few cases where a fall through is intended we annotate it with this macro to avoid the warning.
2020-06-16 | py/compile: Implement PEP 572, assignment expressions with := operator. | Damien George
The syntax matches CPython and the semantics are equivalent except that, unlike CPython, MicroPython allows using := to assign to comprehension iteration variables, because disallowing this would take a lot of code to check for it. The new compile-time option MICROPY_PY_ASSIGN_EXPR selects this feature and is enabled by default, following MICROPY_PY_ASYNC_AWAIT.
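The snippet below shows an ordinary assignment expression and then probes the comprehension case mentioned above. It is a sketch: the variable names are made up, and the result of the probe depends on the interpreter. CPython rejects the rebinding with a SyntaxError, whereas the commit message states that MicroPython accepts it.

```python
# Ordinary use of an assignment expression.
data = [3, 1, 4, 1, 5]
if (n := len(data)) > 3:
    summary = f"{n} items"

# Probe whether := may rebind a comprehension iteration variable.
src = "[(i := 0) for i in range(3)]"
try:
    compile(src, "<demo>", "eval")
    rebinding_accepted = True
except SyntaxError:
    rebinding_accepted = False

print(summary, rebinding_accepted)
```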
2020-04-05 | all: Use MP_ERROR_TEXT for all error messages. | Jim Mussared
2020-02-28 | all: Reformat C and Python source code with tools/codeformat.py. | Damien George
This is run with uncrustify 0.70.1, and black 19.10b0.
2019-09-26 | py: Add support for matmul operator @ as per PEP 465. | Damien George
To make progress towards MicroPython supporting Python 3.5, adding the matmul operator is important because it's a really "low level" part of the language, being a new token and modifications to the grammar. It doesn't make sense to make it configurable because 1) it would make the grammar and lexer complicated/messy; 2) no other operators are configurable; 3) it's not a feature that can be "dynamically plugged in" via an import. And matmul can be useful as a general purpose user-defined operator, it doesn't have to be just for numpy use. Based on work done by Jim Mussared.
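To illustrate the point that matmul can serve as a general purpose user-defined operator, here is a small sketch in which `@` composes functions. The `Pipe` class is hypothetical and exists only for this example.

```python
class Pipe:
    """Hypothetical wrapper where '@' means function composition."""

    def __init__(self, func):
        self.func = func

    def __matmul__(self, other):
        # (f @ g)(x) applies g first, then f
        return Pipe(lambda x: self.func(other.func(x)))

    def __call__(self, x):
        return self.func(x)


double = Pipe(lambda x: x * 2)
increment = Pipe(lambda x: x + 1)

double_after_increment = double @ increment
print(double_after_increment(5))
```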
2018-06-12 | py/lexer: Add support for underscores in numeric literals. | Damien George
This is a very convenient feature introduced in Python 3.6 by PEP 515.
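A few literals in the PEP 515 style, as a brief sketch. The values are arbitrary and the checks reflect standard Python semantics.

```python
population = 7_900_000_000  # decimal integer with digit grouping
mask = 0xFF_FF              # underscores also work in hex literals
ratio = 1_000.5             # and in floats

print(population == 7900000000, mask == 65535, ratio == 1000.5)
```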
2017-10-04 | all: Remove inclusion of internal py header files. | Damien George
Header files that are considered internal to the py core and should not normally be included directly are:
    py/nlr.h - internal nlr configuration and declarations
    py/bc0.h - contains bytecode macro definitions
    py/runtime0.h - contains basic runtime enums
Instead, the top-level header files to include are one of:
    py/obj.h - includes runtime0.h and defines everything to use the mp_obj_t type
    py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h, and defines everything to use the general runtime support functions
Additional, specific headers (eg py/objlist.h) can be included if needed.
2017-08-13 | all: Raise exceptions via mp_raise_XXX | Javier Candeira
- Changed: ValueError, TypeError, NotImplementedError
- OSError invocations unchanged, because the corresponding utility function takes ints, not strings like the long form invocation.
- OverflowError, IndexError and RuntimeError etc. not changed for now until we decide whether to add new utility functions.
2017-07-31 | all: Use the name MicroPython consistently in comments | Alexander Steffen
There were several different spellings of MicroPython present in comments, when there should be only one.
2017-07-07 | py,extmod: Some casts and minor refactors to quiet compiler warnings. | Tom Collins
2017-05-12 | py/lexer: Process CR earlier to allow newlines checks on chr1. | Tom Collins
Resolves an issue where lexer failed to accept CR after line continuation character. It also simplifies the code.
2017-05-09 | py/lexer: Simplify lexer startup by using dummy bytes and next_char(). | Tom Collins
Now consistently uses the EOL processing ("\r" and "\r\n" convert to "\n") and EOF processing (ensure "\n" before EOF) provided by next_char(). In particular the lexer can now correctly handle input that starts with CR.
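The EOL and EOF rules named above can be modelled as a standalone function. This is a sketch only: `normalize_eol` is a hypothetical name, it works on a whole string rather than one character at a time as next_char() does, and adding a newline to empty input is an assumption.

```python
def normalize_eol(src):
    """Convert "\\r" and "\\r\\n" to "\\n" and ensure the text ends with "\\n"."""
    out = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\r":
            out.append("\n")
            # Treat "\r\n" as a single line ending by skipping the "\n".
            if i + 1 < len(src) and src[i + 1] == "\n":
                i += 1
        else:
            out.append(ch)
        i += 1
    # Ensure a newline before EOF.
    if not out or out[-1] != "\n":
        out.append("\n")
    return "".join(out)


print(repr(normalize_eol("a\r\nb\rc")))
print(repr(normalize_eol("\rx = 1")))  # input that starts with CR
```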
2017-03-29 | py/lexer: Simplify and reduce code size for operator tokenising. | Damien George
By removing the 'E' code from the operator token encoding mini-language the tokenising can be simplified. The 'E' code was only used for the != operator which is now handled as a special case; the optimisations for the general case more than make up for the addition of this single, special case. Furthermore, the . and ... operators can be handled in the same way as != which reduces the code size a little further. This simplification also removes a "goto". Changes in code size for this patch are (measured in bytes):
    bare-arm: -48
    minimal x86: -64
    unix x86-64: -112
    unix nanbox: -64
    stmhal: -48
    cc3200: -48
    esp8266: -76
2017-03-23 | py/lexer: Remove obsolete comment, since lexer can now raise exceptions. | Damien George
2017-03-14 | py: Allow lexer to raise exceptions during construction. | Damien George
This patch refactors the error handling in the lexer, to simplify it (ie reduce code size). A long time ago, when the lexer/parser/compiler were first written, the lexer and parser were designed so they didn't use exceptions (ie nlr) to report errors but rather returned an error code. Over time that has gradually changed, the parser in particular has more and more ways of raising exceptions. Also, the lexer never really handled all errors without raising, eg there were some memory errors which could raise an exception (and in these rare cases one would get a fatal nlr-not-handled fault). This patch accepts the fact that the lexer can raise exceptions in some cases and allows it to raise exceptions to handle all its errors, which are for the most part just out-of-memory errors during construction of the lexer. This makes the lexer a bit simpler, and also the persistent code stuff is simplified. What this means for users of the lexer is that calls to it must be wrapped in a nlr handler. But all uses of the lexer already have such an nlr handler for the parser (and compiler) so that doesn't put any extra burden on the callers.
2017-02-17 | py/lexer: Convert mp_uint_t to size_t where appropriate. | Damien George
2017-02-17 | py: Do adjacent str/bytes literal concatenation in lexer, not compiler. | Damien George
It's much more efficient in RAM and code size to do implicit literal string concatenation in the lexer, as opposed to the compiler. RAM usage is reduced because the concatenation can be done right away in the tokeniser by just accumulating the string/bytes literals into the lexer's vstr. Prior to this patch adjacent strings/bytes would create a parse tree (one node per string/bytes) and then in the compiler a whole new chunk of memory was allocated to store the concatenated string, which used more than double the memory compared to just accumulating in the lexer. This patch also significantly reduces code size:
    bare-arm: -204
    minimal: -204
    unix x64: -328
    stmhal: -208
    esp8266: -284
    cc3200: -224
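The idea of accumulating adjacent literals at the token level can be sketched as below. This models the approach on a list of (kind, value) tuples; the token names and the `merge_adjacent_strings` helper are hypothetical, and the real lexer appends into its vstr while tokenising rather than post-processing a list.

```python
def merge_adjacent_strings(tokens):
    """Fold runs of adjacent STRING tokens into a single STRING token."""
    merged = []
    for kind, value in tokens:
        if kind == "STRING" and merged and merged[-1][0] == "STRING":
            # Accumulate into the previous token instead of emitting a new one.
            merged[-1] = ("STRING", merged[-1][1] + value)
        else:
            merged.append((kind, value))
    return merged


tokens = [
    ("NAME", "x"),
    ("OP", "="),
    ("STRING", "ab"),
    ("STRING", "cd"),
    ("NEWLINE", ""),
]

print(merge_adjacent_strings(tokens))
print("ab" "cd" == "abcd")  # the language-level behaviour being implemented
```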
2017-02-17 | py/lexer: Simplify handling of line-continuation error. | Damien George
Previous to this patch there was an explicit check for errors with line continuation (where backslash was not immediately followed by a newline). But this check is not necessary: if there is an error then the remaining logic of the tokeniser will reject the backslash and correctly produce a syntax error.
2017-02-17 | py/lexer: Use strcmp to make keyword searching more efficient. | Damien George
Since the table of keywords is sorted, we can use strcmp to do the search and stop part way through the search if the comparison is less-than. Because all tokens that are names are subject to this search, this optimisation will improve the overall speed of the lexer when processing a script. The change also decreases code size by a little bit because we now use strcmp instead of the custom str_strn_equal function.
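A Python model of the early-exit search is sketched below. The keyword table here is a small made-up subset, `find_keyword` is a hypothetical name, and Python's string ordering stands in for strcmp. The function also returns how many table entries it examined, to make the early exit visible.

```python
# A small, sorted subset of keywords (illustrative, not the lexer's real table).
KEYWORDS = sorted(["and", "def", "else", "for", "if", "in", "not", "or", "return", "while"])


def find_keyword(name):
    """Return (is_keyword, entries_examined) using a sorted linear scan."""
    examined = 0
    for kw in KEYWORDS:
        examined += 1
        if name == kw:
            return True, examined
        if name < kw:
            # The table is sorted, so no later entry can match.
            return False, examined
    return False, examined


print(find_keyword("count"))  # stops as soon as it passes where "count" would sort
print(find_keyword("while"))  # last entry, so the whole table is scanned
```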
2017-02-17 | py/lexer: Move check for keyword to name-tokenising block. | Damien George
Keywords only need to be searched for if the token is a MP_TOKEN_NAME, so we can move the search to the part of the code that does the tokenising for MP_TOKEN_NAME.
2017-02-17 | py/lexer: Simplify handling of indenting of very first token. | Damien George
2017-02-16 | py/lexer: Don't generate string representation for period or ellipsis. | Damien George
It's not needed.
2017-01-30 | extmod/vfs_fat: Remove MICROPY_READER_FATFS component. | Damien George
2017-01-27 | extmod: Add generic VFS sub-system. | Damien George
This provides mp_vfs_XXX functions (eg mount, open, listdir) which are agnostic to the underlying filesystem type, and just require an object with the relevant filesystem-like methods (eg .mount, .open, .listdir) which can then be mounted. These mp_vfs_XXX functions would typically be used by a port to implement the "uos" module, and mp_vfs_open would be the builtin open function. This feature is controlled by MICROPY_VFS, disabled by default.
2016-12-22 | py/lexer: Permanently disable the mp_lexer_show_token function. | Damien George
The lexer is very mature and this debug function is no longer used. If it's really needed one can uncomment it and recompile.
2016-12-22 | py/lexer: Remove unnecessary check for EOF in lexer's next_char func. | Damien George
This check always fails (ie chr0 is never EOF) because the callers of this function never call it past the end of the input stream. And even if they did it would be harmless because 1) reader.readbyte must continue to return an EOF char if the stream is exhausted; 2) next_char would just count the subsequent EOF's as characters worth 1 column.
2016-12-22 | py/lexer: Remove unreachable code in string tokeniser. | Damien George
2016-12-22 | tests/basics/lexer: Add a test for newline-escaping within a string. | Damien George
2016-11-16 | py/lexer: Make lexer use an mp_reader as its source. | Damien George
2016-11-16 | py/lexer: Rewrite mp_lexer_new_from_fd in terms of mp_reader. | Damien George
2016-11-16 | py/lexer: Provide generic mp_lexer_new_from_file based on mp_reader. | Damien George
If a port defines MICROPY_READER_POSIX or MICROPY_READER_FATFS then lexer.c now provides an implementation of mp_lexer_new_from_file using the mp_reader_new_file function.
2016-11-16 | py/lexer: Rewrite mp_lexer_new_from_str_len in terms of mp_reader_mem. | Damien George
2016-10-12 | py/lexer: Remove unnecessary code, and unreachable code. | Damien George
Setting emit_dent=0 is unnecessary because arriving in that part of the if-logic will guarantee that emit_dent is already zero. The block to check indent_top(lex)>0 is unreachable because a newline is always inserted at the end of the input stream, and hence dedents are always processed before EOF.
2016-09-19 | py/vstr: Remove vstr.had_error flag and inline basic vstr functions. | Damien George
The vstr.had_error flag was a relic from the very early days which assumed that the malloc functions (eg m_new, m_renew) returned NULL if they failed to allocate. But that's no longer the case: these functions will raise an exception if they fail. Since it was impossible for had_error to be set, this patch introduces no change in behaviour. An alternative option would be to change the malloc calls to the _maybe variants, which return NULL instead of raising, but then a lot of code will need to explicitly check if the vstr had an error and raise if it did. The code-size savings for this patch are, in bytes: bare-arm:188, minimal:456, unix(NDEBUG,x86-64):368, stmhal:228, esp8266:360.
2016-05-20 | py: Declare constant data as properly constant. | Damien George
Otherwise some compilers (eg without optimisation) will put this read-only data in RAM instead of ROM.
2016-04-13 | py: add async/await/async for/async with syntax | pohmelie
They are sugar for marking a function as a generator, for "yield from", and for the PEP 492 "semantically equivalent" Python code, respectively. @dpgeorge was the original author of this patch, but @pohmelie made changes to implement `async for` and `async with`.
2016-02-25 | py: Add MICROPY_DYNAMIC_COMPILER option to config compiler at runtime. | Damien George
This new compile-time option allows the bytecode compiler to be configured at runtime by setting the fields in the mp_dynamic_compiler structure. By using this feature, the compiler can generate bytecode that targets any MicroPython runtime/VM, regardless of the host and target compile-time settings. Options so far that fall under this dynamic setting are:
- maximum number of bits that a small int can hold;
- whether caching of lookups is used in the bytecode;
- whether to use unicode strings or not (lexer behaviour differs, and therefore generated string constants differ).
2015-12-18 | py: Add MICROPY_ENABLE_COMPILER and MICROPY_PY_BUILTINS_EVAL_EXEC opts. | Damien George
MICROPY_ENABLE_COMPILER can be used to enable/disable the entire compiler, which is useful when only loading of pre-compiled bytecode is supported. It is enabled by default. MICROPY_PY_BUILTINS_EVAL_EXEC controls support of eval and exec builtin functions. By default they are only included if MICROPY_ENABLE_COMPILER is enabled. Disabling both options saves about 40k of code size on 32-bit x86.
2015-09-07 | py/lexer: Properly classify floats that look like hex numbers. | Damien George
Eg 0e0 almost looks like a hex number but in fact is a float.
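The two literals differ by a single character, which the following sketch makes concrete. The checks reflect standard Python semantics; the variable names are made up.

```python
float_literal = 0e0  # zero times ten to the power zero: a float
hex_literal = 0xe0   # hexadecimal E0: an int

print(type(float_literal).__name__, float_literal == 0.0)
print(type(hex_literal).__name__, hex_literal)
```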
2015-09-07 | py/lexer: Raise SyntaxError when unicode char point out of range. | Damien George
2015-09-07 | py/lexer: Raise NotImplError for unicode name escape, instead of assert. | Damien George