author | Damien George <damien.p.george@gmail.com> | 2020-04-19 23:47:22 +1000 |
---|---|---|
committer | Damien George <damien.p.george@gmail.com> | 2020-04-20 10:32:49 +1000 |
commit | 388d419ba39b061923d2568814195e8bf73330d4 (patch) | |
tree | bf381425e4a1d36fa77954a217e74af11c30520c /tools/metrics.py | |
parent | 1b1ceb67b25e0ea56c1e972514a48468fe478ad3 (diff) |
py/makecompresseddata.py: Make compression deterministic.
Error string compression is not deterministic in certain cases: it depends
on the Python version (whether dicts are ordered by default or not) and
probably also on the order in which files are passed to this script, which
changes which words are included in the top 128 most common.
The changes in this commit use OrderedDict to keep parsed lines in a known
order and, when computing how many bytes a given word saves, use the word
itself to break ties (which would otherwise be "random").
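The two fixes described above can be illustrated with a minimal sketch. This is not the code from py/makecompresseddata.py: the function name top_words, the whitespace tokenisation, and the cost model in bytes_saved are assumptions made for illustration only. It shows the general technique of keeping input in an OrderedDict and sorting on a (score, word) key so that equal scores always resolve the same way.

```python
from collections import OrderedDict


def top_words(lines, n=128):
    """Return the n words that save the most bytes, in a reproducible order.

    Hypothetical helper: the real script's parsing and cost model may differ.
    """
    # OrderedDict preserves insertion order on every Python version, so the
    # parsed lines are always visited in the order they were read.
    parsed = OrderedDict((line, line.split()) for line in lines)

    # Count occurrences of each word, again in a known order.
    counts = OrderedDict()
    for words in parsed.values():
        for word in words:
            counts[word] = counts.get(word, 0) + 1

    def bytes_saved(item):
        word, count = item
        # Assumed cost model: each occurrence shrinks to a single byte.
        saved = count * (len(word) - 1)
        # Negate so larger savings sort first; the word itself breaks ties.
        return (-saved, word)

    ranked = sorted(counts.items(), key=bytes_saved)
    return [word for word, _ in ranked[:n]]
```

With the word as the second element of the sort key, two words that save the same number of bytes are ranked alphabetically, so the result no longer depends on dict iteration order or on the order of the input files.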
Diffstat (limited to 'tools/metrics.py')
0 files changed, 0 insertions, 0 deletions