| author | Andrew Morton <akpm@digeo.com> | 2002-10-29 23:35:32 -0800 |
|---|---|---|
| committer | Linus Torvalds <torvalds@penguin.transmeta.com> | 2002-10-29 23:35:32 -0800 |
| commit | 38e419f5b012874111de20e4589c718421dc43ab | |
| tree | 158b7b3e906922349dc3362c998bfc55aa7959aa /kernel | |
| parent | afce7191a73f632a138f5511cbe245d39c526331 | |
[PATCH] hot-n-cold pages: bulk page allocator
This is the hot-n-cold-pages series. It introduces a per-cpu lockless
LIFO pool in front of the page allocator, for three reasons:
1: To reduce lock contention on the buddy lock: we allocate and free
pages in, typically, 16-page chunks.
2: To return cache-warm pages to page allocation requests.
3: As infrastructure for a page reservation API which can be used to
ensure that the GFP_ATOMIC radix-tree node and pte_chain allocations
cannot fail. That code is not complete, and does not absolutely
require hot-n-cold pages. It'll work OK though.
We add two queues per CPU. The "hot" queue contains pages which the
freeing code thought were likely to be cache-hot. By default, new
allocations are satisfied from this queue.
The "cold" queue contains pages which the freeing code expected to be
cache-cold. The cold queue is mainly for lock amortisation, although
it is possible to explicitly allocate cold pages. The readahead code
does that.
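
The two-queue arrangement described above can be pictured with a small user-space sketch. This is only an illustration under stated assumptions, not the code from this series: the names `per_cpu_pages`, `pcp_queue`, `pcp_free`, `pcp_alloc` and the `PCP_CAPACITY` limit are hypothetical, and a real per-cpu pool would also refill from and drain to the buddy lists in batches.

```c
#include <stddef.h>

#define PCP_CAPACITY 64          /* hypothetical per-queue limit */

struct page { int id; };         /* stand-in for the kernel's struct page */

/* One LIFO stack of free pages. */
struct pcp_queue {
    struct page *pages[PCP_CAPACITY];
    size_t count;
};

enum { PCP_HOT = 0, PCP_COLD = 1 };

/* Two queues per CPU: one for likely cache-hot pages, one for cold ones. */
struct per_cpu_pages {
    struct pcp_queue q[2];
};

/*
 * Free a page to the hot or cold queue, as chosen by the freeing code.
 * Returns 0 on success, -1 if the queue is full (a real implementation
 * would hand a batch back to the buddy allocator at that point).
 */
static int pcp_free(struct per_cpu_pages *pcp, struct page *pg, int cold)
{
    struct pcp_queue *q = &pcp->q[cold ? PCP_COLD : PCP_HOT];

    if (q->count == PCP_CAPACITY)
        return -1;
    q->pages[q->count++] = pg;
    return 0;
}

/*
 * Allocate a page. LIFO order means the most recently freed page,
 * the one most likely to still be in cache, is returned first.
 * Returns NULL if the queue is empty (a real implementation would
 * refill it in bulk from the buddy lists).
 */
static struct page *pcp_alloc(struct per_cpu_pages *pcp, int cold)
{
    struct pcp_queue *q = &pcp->q[cold ? PCP_COLD : PCP_HOT];

    if (q->count == 0)
        return NULL;
    return q->pages[--q->count];
}
```

In this sketch a default allocation would pass `cold = 0`, while a caller such as readahead, which does not expect to touch the page contents soon, would pass `cold = 1`.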
I have been hot and cold on these patches for quite some time - the
benefit is not great.
- 4% speedup in Randy Hron's benching of the autoconf regression
tests on a 4-way. Most of this came from savings in pte_alloc and
pmd_alloc: the pagetable clearing code liked the warmer pages (some
architectures still have the pgt_cache, and can perhaps do away with
it).
- 1% to 2% speedup in kernel compiles on my 4-way and Martin's 32-way.
- 60% speedup in a little test program which writes 80 kbytes to a
file and ftruncates it to zero again. Ran four instances of that on
a 4-way and it loved the cache warmth.
- 2.5% speedup in Specweb testing on an 8-way.
- The thing which won me over: an 11% increase in throughput of the
SDET benchmark on an 8-way PIII:
  with hot & cold:
    RESULT for 8 users is 17971 +12.1%
    RESULT for 16 users is 17026 +12.0%
    RESULT for 32 users is 17009 +10.4%
    RESULT for 64 users is 16911 +10.3%
  without:
    RESULT for 8 users is 16038
    RESULT for 16 users is 15200
    RESULT for 32 users is 15406
    RESULT for 64 users is 15331
SDET is a very old SPEC test which simulates a development
environment with a large number of users. Lots of users running a
mix of shell commands, basically.
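
The write-and-ftruncate test program mentioned in the list above is not included in this message. A rough guess at its shape is sketched below, using only standard POSIX calls. The function name `write_truncate_loop`, the iteration count parameter, and the reading of "80 kbytes" as 80 * 1024 bytes are all assumptions.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_BYTES (80 * 1024)  /* assumed meaning of "80 kbytes" */

/*
 * Repeatedly write CHUNK_BYTES to a file, then truncate it back to
 * zero length. Each pass allocates pagecache pages and frees them
 * again, which exercises the page allocator's alloc/free path.
 * Returns 0 on success, -1 on any error.
 */
static int write_truncate_loop(const char *path, long iterations)
{
    static char buf[CHUNK_BYTES];
    long i;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 'x', sizeof(buf));

    for (i = 0; i < iterations; i++) {
        size_t done = 0;

        /* write() may be partial, so loop until the chunk is out. */
        while (done < sizeof(buf)) {
            ssize_t n = write(fd, buf + done, sizeof(buf) - done);

            if (n <= 0) {
                close(fd);
                return -1;
            }
            done += (size_t)n;
        }

        /* Drop the pages just written and rewind for the next pass. */
        if (ftruncate(fd, 0) < 0 || lseek(fd, 0, SEEK_SET) < 0) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}
```

To approximate the run described above, one would call this from `main()` with a distinct file per instance and start one instance per CPU.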
These patches were written by Martin Bligh and myself.
This one implements rmqueue_bulk() - a function for removing multiple
pages of a given order from the buddy lists.
This is for lock amortisation: take the highly-contended zone->lock
with less frequency, do more work once it has been acquired.
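
The lock amortisation idea can be shown with a user-space sketch. It is not the rmqueue_bulk() from this patch: it uses a pthread mutex and a plain singly linked free list in place of zone->lock and the buddy lists, and the names `zone_sketch`, `bulk_remove` and `lock_acquisitions` are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

struct page { struct page *next; };  /* stand-in for struct page */

/* A toy "zone": one lock guarding one free list. */
struct zone_sketch {
    pthread_mutex_t lock;
    struct page *free_list;
    unsigned long lock_acquisitions; /* counts how often the lock is taken */
};

/*
 * Remove up to `count` pages from the zone under a single lock
 * acquisition and store them in `out`. Returns the number of pages
 * actually removed, which may be fewer than requested if the free
 * list runs dry.
 */
static unsigned long bulk_remove(struct zone_sketch *z, unsigned long count,
                                 struct page **out)
{
    unsigned long n = 0;

    pthread_mutex_lock(&z->lock);
    z->lock_acquisitions++;
    while (n < count && z->free_list != NULL) {
        struct page *pg = z->free_list;

        z->free_list = pg->next;
        pg->next = NULL;
        out[n++] = pg;
    }
    pthread_mutex_unlock(&z->lock);
    return n;
}
```

With a batch of 16, filling a per-cpu queue costs one lock round trip where 16 single-page removals would cost 16, which is the saving the description above is after.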
