diff options
| author | Andrew Morton <akpm@zip.com.au> | 2002-06-02 03:22:17 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@home.transmeta.com> | 2002-06-02 03:22:17 -0700 |
| commit | 02eaba7ffd145ef1389f4adc420f94af20ab3068 (patch) | |
| tree | 3272dee889c99b2ce974258bc899f853504a0a9d /include/linux | |
| parent | 0f2b38d5e227d968384c4f97cc67dabb341552af (diff) | |
[PATCH] fix swapcache packing in the radix tree
First some terminology: this patch introduces a kernel-wide `pgoff_t'
type. It is the index of a page into the pagecache. The thing at
page->index. For most mappings it is also the offset of the page into
that mapping. This type has a very distinct function in the kernel and
it needs a name. I don't have any particular plans to go and migrate
everything so we can support 64-bit pagecache indices on x86, but this
would be the way to do it.
This patch improves the packing density of swapcache pages in the radix
tree.
A swapcache page is identified by the `swap type' (indexes the swap
device) and the `offset' (into that swap device). These two numbers
are encoded into a `swp_entry_t' machine word in arch-specific code
because the resulting number is placed into pagetables in a form which
will generate a fault.
The kernel also need to generate a pgoff_t for that page to index it
into the swapper_space radix tree. That pgoff_t is usually
bitwise-identical to the swp_entry_t. That worked OK when the
pagecache was using a hash. But with a radix tree, it produces
catastrophically bad results.
x86 (and many other architectures) place the `type' field into the
low-order bits of the swp_entry_t. So *all* swapcache pages are
basically identical in the eight low-order bits. This produces a very
sparse radix tree for swapcache. I'm observing packing densities of 1%
to 2%: so the typical 128-slot radix tree node has only one or two
pages in it.
The end result is that the kernel needs to allocate approximately one
new radix-tree node for each page which is added to the swapcache. So
no wonder we're having radix-tree node exhaustion during swapout!
(It's actually quite encouraging that the kernel works as well as it
does).
The patch changes the encoding of the swp_entry_t so that its
most-significant bits contain the `type' field and the
least-significant bits contain the `offset' field, right-aligned.
That is: the encoding in swp_entry_t is now arch-independent. The new
file <linux/swapops.h> has conversion functions which convert the
swp_entry_t to and from its machine pte representation.
Packing density in the swapper_space mapping goes up to around 90%
(observed) and the kernel is tons happier under swap load.
An alternative approach would be to create new conversion functions
which convert an arch-specific swp_entry_t to and from a pgoff_t. I
tried that. It worked, but I liked it less.
Diffstat (limited to 'include/linux')
| -rw-r--r-- | include/linux/swap.h | 11 | ||||
| -rw-r--r-- | include/linux/swapops.h | 68 | ||||
| -rw-r--r-- | include/linux/types.h | 8 |
3 files changed, 86 insertions, 1 deletions
diff --git a/include/linux/swap.h b/include/linux/swap.h index 7e20b3016c7f..d0160265e3c5 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -11,7 +11,16 @@ #define SWAP_FLAG_PRIO_MASK 0x7fff #define SWAP_FLAG_PRIO_SHIFT 0 -#define MAX_SWAPFILES 32 +/* + * MAX_SWAPFILES defines the maximum number of swaptypes: things which can + * be swapped to. The swap type and the offset into that swap type are + * encoded into pte's and into pgoff_t's in the swapcache. Using five bits + * for the type means that the maximum number of swapcache pages is 27 bits + * on 32-bit-pgoff_t architectures. And that assumes that the architecture packs + * the type/offset into the pte as 5/27 as well. + */ +#define MAX_SWAPFILES_SHIFT 5 +#define MAX_SWAPFILES (1 << MAX_SWAPFILES_SHIFT) /* * Magic header for a swap area. The first part of the union is diff --git a/include/linux/swapops.h b/include/linux/swapops.h new file mode 100644 index 000000000000..500f9960d939 --- /dev/null +++ b/include/linux/swapops.h @@ -0,0 +1,68 @@ +/* + * swapcache pages are stored in the swapper_space radix tree. We want to + * get good packing density in that tree, so the index should be dense in + * the low-order bits. + * + * We arrange the `type' and `offset' fields so that `type' is at the five + * high-order bits of the smp_entry_t and `offset' is right-aligned in the + * remaining bits. + * + * swp_entry_t's are *never* stored anywhere in their arch-dependent format. + */ +#define SWP_TYPE_SHIFT(e) (sizeof(e.val) * 8 - MAX_SWAPFILES_SHIFT) +#define SWP_OFFSET_MASK(e) ((1 << SWP_TYPE_SHIFT(e)) - 1) + +/* + * Store a type+offset into a swp_entry_t in an arch-independent format + */ +static inline swp_entry_t swp_entry(unsigned type, pgoff_t offset) +{ + swp_entry_t ret; + + ret.val = (type << SWP_TYPE_SHIFT(ret)) | + (offset & SWP_OFFSET_MASK(ret)); + return ret; +} + +/* + * Extract the `type' field from a swp_entry_t. The swp_entry_t is in + * arch-independent format + */ +static inline unsigned swp_type(swp_entry_t entry) +{ + return (entry.val >> SWP_TYPE_SHIFT(entry)) & + ((1 << MAX_SWAPFILES_SHIFT) - 1); +} + +/* + * Extract the `offset' field from a swp_entry_t. The swp_entry_t is in + * arch-independent format + */ +static inline pgoff_t swp_offset(swp_entry_t entry) +{ + return entry.val & SWP_OFFSET_MASK(entry); +} + +/* + * Convert the arch-dependent pte representation of a swp_entry_t into an + * arch-independent swp_entry_t. + */ +static inline swp_entry_t pte_to_swp_entry(pte_t pte) +{ + swp_entry_t arch_entry; + + arch_entry = __pte_to_swp_entry(pte); + return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry)); +} + +/* + * Convert the arch-independent representation of a swp_entry_t into the + * arch-dependent pte representation. + */ +static inline pte_t swp_entry_to_pte(swp_entry_t entry) +{ + swp_entry_t arch_entry; + + arch_entry = __swp_entry(swp_type(entry), swp_offset(entry)); + return __swp_entry_to_pte(arch_entry); +} diff --git a/include/linux/types.h b/include/linux/types.h index 211461bc97c0..c102bcf8be83 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -124,6 +124,14 @@ typedef u64 sector_t; typedef unsigned long sector_t; #endif +/* + * The type of an index into the pagecache. Use a #define so asm/types.h + * can override it. + */ +#ifndef pgoff_t +#define pgoff_t unsigned long +#endif + #endif /* __KERNEL_STRICT_NAMES */ /* |
