summaryrefslogtreecommitdiff
path: root/t/unit-tests/t-basic.c
diff options
context:
space:
mode:
authorJeff King <peff@peff.net>2024-02-28 17:37:13 -0500
committerJunio C Hamano <gitster@pobox.com>2024-02-28 14:42:01 -0800
commitfae9627470615df5d40f2892e518447058567316 (patch)
treeec3d4c12d60b0ecd1a0c3b02781d2846df169c15 /t/unit-tests/t-basic.c
parent3c2a3fdc388747b9eaf4a4a4f2035c1c9ddb26d0 (diff)
upload-pack: drop separate v2 "haves" array
When upload-pack sees a "have" line in the v0 protocol, it immediately calls got_oid() with its argument and potentially produces an ACK response. In the v2 protocol, we simply record the argument in an oid_array, and only later process all of the "have" objects by calling the equivalent of got_oid() on the contents of the array. This makes some sense, as v2 is a pure request/response protocol, as opposed to v0's asynchronous negotiation phase. But there's a downside: a client can send us an infinite number of garbage "have" lines, which we'll happily slurp into the array, consuming memory. Whereas in v0, they are limited by the number of objects in the repository (because got_oid() only records objects we have ourselves, and we avoid duplicates by setting a flag on the object struct). We can make v2 behave more like v0 by also calling got_oid() directly when v2 parses a "have" line. Calling it early like this is OK because got_oid() itself does not interact with the client; it only confirms that we have the object and sets a few flags. Note that unlike v0, v2 does not ever (before or after this patch) check the return code of got_oid(), which lets the caller know whether we have the object. But again, that makes sense; v0 is using it to asynchronously tell the client to stop sending. In v2's synchronous protocol, we just discard those entries (and decide how to ACK at the end of each round). There is one slight tweak we need, though. In v2's state machine, we reach the SEND_ACKS state if the other side sent us any "have" lines, whether they were useful or not. Right now we do that by checking whether the "have" array had any entries, but if we record only the useful ones, that doesn't work. Instead, we can add a simple boolean that tells us whether we saw any have line (even if it was useless). This lets us drop the "haves" array entirely, as we're now placing objects directly into the "have_obj" object array (which is where got_oid() put them in the long run anyway). And as a bonus, we can drop the secondary "common" array used in process_haves_and_send_acks(). It was essentially a copy of "haves" minus the objects we do not have. But now that we are using "have_obj" directly, we know everything in it is useful. So in addition to protecting ourselves against malicious input, we should slightly lower our memory usage for normal inputs. Note that there is one user-visible effect. The trace2 output records the number of "haves". Previously this was the total number of "have" lines we saw, but now is the number of useful ones. We could retain the original meaning by keeping a separate counter, but it doesn't seem worth the effort; this trace info is for debugging and metrics, and arguably the count of common oids is at least as useful as the total count. Reported-by: Benjamin Flesch <benjaminflesch@icloud.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/unit-tests/t-basic.c')
0 files changed, 0 insertions, 0 deletions