summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--Documentation/Makefile1
-rw-r--r--Documentation/technical/large-object-promisors.adoc64
-rw-r--r--Documentation/technical/meson.build1
3 files changed, 34 insertions, 32 deletions
diff --git a/Documentation/Makefile b/Documentation/Makefile
index a3fbd29744..a3ba25e659 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -122,6 +122,7 @@ TECH_DOCS += technical/bundle-uri
TECH_DOCS += technical/commit-graph
TECH_DOCS += technical/directory-rename-detection
TECH_DOCS += technical/hash-function-transition
+TECH_DOCS += technical/large-object-promisors
TECH_DOCS += technical/long-running-process-protocol
TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri
diff --git a/Documentation/technical/large-object-promisors.adoc b/Documentation/technical/large-object-promisors.adoc
index dea8dafa66..2aa815e023 100644
--- a/Documentation/technical/large-object-promisors.adoc
+++ b/Documentation/technical/large-object-promisors.adoc
@@ -34,8 +34,8 @@ a new object representation for large blobs as discussed in:
https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/
-0) Non goals
-------------
+Non goals
+---------
- We will not discuss those client side improvements here, as they
would require changes in different parts of Git than this effort.
@@ -90,8 +90,8 @@ later in this document:
even more to host content with larger blobs or more large blobs
than currently.
-I) Issues with the current situation
-------------------------------------
+I Issues with the current situation
+-----------------------------------
- Some statistics made on GitLab repos have shown that more than 75%
of the disk space is used by blobs that are larger than 1MB and
@@ -138,8 +138,8 @@ I) Issues with the current situation
complaining that these tools require significant effort to set up,
learn and use correctly.
-II) Main features of the "Large Object Promisors" solution
-----------------------------------------------------------
+II Main features of the "Large Object Promisors" solution
+---------------------------------------------------------
The main features below should give a rough overview of how the
solution may work. Details about needed elements can be found in
@@ -166,7 +166,7 @@ format. They should be used along with main remotes that contain the
other objects.
Note 1
-++++++
+^^^^^^
To clarify, a LOP is a normal promisor remote, except that:
@@ -178,7 +178,7 @@ To clarify, a LOP is a normal promisor remote, except that:
itself.
Note 2
-++++++
+^^^^^^
Git already makes it possible for a main remote to also be a promisor
remote storing both regular objects and large blobs for a client that
@@ -186,13 +186,13 @@ clones from it with a filter on blob size. But here we explicitly want
to avoid that.
Rationale
-+++++++++
+^^^^^^^^^
LOPs aim to be good at handling large blobs while main remotes are
already good at handling other objects.
Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
Git already has support for multiple promisor remotes, see
link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation].
@@ -213,19 +213,19 @@ remote helper (see linkgit:gitremote-helpers[7]) which makes the
underlying object storage appear like a remote to Git.
Note
-++++
+^^^^
A LOP can be a promisor remote accessed using a remote helper by
both some clients and the main remote.
Rationale
-+++++++++
+^^^^^^^^^
This looks like the simplest way to create LOPs that can cheaply
handle many large blobs.
Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
Remote helpers are quite easy to write as shell scripts, but it might
be more efficient and maintainable to write them using other languages
@@ -247,7 +247,7 @@ The underlying object storage that a LOP uses could also serve as
storage for large files handled by Git LFS.
Rationale
-+++++++++
+^^^^^^^^^
This would simplify the server side if it wants to both use a LOP and
act as a Git LFS server.
@@ -259,7 +259,7 @@ On the server side, a main remote should have a way to offload to a
LOP all its blobs with a size over a configurable threshold.
Rationale
-+++++++++
+^^^^^^^^^
This makes it easy to set things up and to clean things up. For
example, an admin could use this to manually convert a repo not using
@@ -268,7 +268,7 @@ some users would sometimes push large blobs, a cron job could use this
to regularly make sure the large blobs are moved to the LOP.
Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
Using something based on `git repack --filter=...` to separate the
blobs we want to offload from the other Git objects could be a good
@@ -284,13 +284,13 @@ should have ways to prevent oversize blobs to be fetched, and also
perhaps pushed, into it.
Rationale
-+++++++++
+^^^^^^^^^
A main remote containing many oversize blobs would defeat the purpose
of LOPs.
Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
The way to offload to a LOP discussed in 4) above can be used to
regularly offload oversize blobs. About preventing oversize blobs from
@@ -326,18 +326,18 @@ large blobs directly from the LOP and the server would not need to
fetch those blobs from the LOP to be able to serve the client.
Note
-++++
+^^^^
For fetches instead of clones, a protocol negotiation might not always
happen, see the "What about fetches?" FAQ entry below for details.
Rationale
-+++++++++
+^^^^^^^^^
Security, configurability and efficiency of setting things up.
Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
A "promisor-remote" protocol v2 capability looks like a good way to
implement this. The way the client and server use this capability
@@ -356,7 +356,7 @@ the client should be able to offload some large blobs it has fetched,
but might not need anymore, to the LOP.
Note
-++++
+^^^^
It might depend on the context if it should be OK or not for clients
to offload large blobs they have created, instead of fetched, directly
@@ -367,13 +367,13 @@ This should be discussed and refined when we get closer to
implementing this feature.
Rationale
-+++++++++
+^^^^^^^^^
On the client, the easiest way to deal with unneeded large blobs is to
offload them.
Implementation
-++++++++++++++
+^^^^^^^^^^^^^^
This is very similar to what 4) above is about, except on the client
side instead of the server side. So a good solution to 4) could likely
@@ -385,8 +385,8 @@ when cloning (see 6) above). Also if the large blobs were fetched from
a LOP, it is likely, and can easily be confirmed, that the LOP still
has them, so that they can just be removed from the client.
-III) Benefits of using LOPs
----------------------------
+III Benefits of using LOPs
+--------------------------
Many benefits are related to the issues discussed in "I) Issues with
the current situation" above:
@@ -406,8 +406,8 @@ the current situation" above:
- Reduced storage needs on the client side.
-IV) FAQ
--------
+IV FAQ
+------
What about using multiple LOPs on the server and client side?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -533,7 +533,7 @@ some objects it already knows about but doesn't have because they are
on a promisor remote.
Regular fetch
-+++++++++++++
+^^^^^^^^^^^^^
In a regular fetch, the client will contact the main remote and a
protocol negotiation will happen between them. It's a good thing that
@@ -551,7 +551,7 @@ new fetch will happen in the same way as the previous clone or fetch,
using, or not using, the same LOP(s) as last time.
"Backfill" or "lazy" fetch
-++++++++++++++++++++++++++
+^^^^^^^^^^^^^^^^^^^^^^^^^^
When there is a backfill fetch, the client doesn't necessarily contact
the main remote first. It will try to fetch from its promisor remotes
@@ -576,8 +576,8 @@ from the client when it fetches from them. The client could get the
token when performing a protocol negotiation with the main remote (see
section II.6 above).
-V) Future improvements
-----------------------
+V Future improvements
+---------------------
It is expected that at the beginning using LOPs will be mostly worth
it either in a corporate context where the Git version that clients
diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build
index a13aafcfbb..34b5ebe5c3 100644
--- a/Documentation/technical/meson.build
+++ b/Documentation/technical/meson.build
@@ -13,6 +13,7 @@ articles = [
'commit-graph.adoc',
'directory-rename-detection.adoc',
'hash-function-transition.adoc',
+ 'large-object-promisors.adoc',
'long-running-process-protocol.adoc',
'multi-pack-index.adoc',
'packfile-uri.adoc',