user/sven/linux.git/include/linux/hugetlb.h, branch v4.4.2

mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status

2015-11-06T03:34:48Z

Currently there's no easy way to get per-process usage of hugetlb pages, which is inconvenient because userspace applications which use hugetlb typically want to control their processes on the basis of how much memory (including hugetlb) they use. So this patch simply provides easy access to the info via /proc/PID/status. Signed-off-by: Naoya Horiguchi Acked-by: Joern Engel Acked-by: David Rientjes Acked-by: Michal Hocko Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

hugetlbfs: add hugetlbfs_fallocate()

2015-09-08T22:35:28Z

This is based on the shmem version, but it has diverged quite a bit. We have no swap to worry about, nor the new file sealing. Add synchronication via the fault mutex table to coordinate page faults, fallocate allocation and fallocate hole punch. What this allows us to do is move physical memory in and out of a hugetlbfs file without having it mapped. This also gives us the ability to support MADV_REMOVE since it is currently implemented using fallocate(). MADV_REMOVE lets madvise() remove pages from the middle of a hugetlbfs file, which wasn't possible before. hugetlbfs fallocate only operates on whole huge pages. Based on code by Dave Hansen. Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: Dave Hansen Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

hugetlbfs: New huge_add_to_page_cache helper routine

2015-09-08T22:35:28Z

Currently, there is only a single place where hugetlbfs pages are added to the page cache. The new fallocate code be adding a second one, so break the functionality out into its own helper. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

hugetlbfs: truncate_hugepages() takes a range of pages

2015-09-08T22:35:28Z

Modify truncate_hugepages() to take a range of pages (start, end) instead of simply start. If an end value of LLONG_MAX is passed, the current "truncate" functionality is maintained. Existing callers are modified to pass LLONG_MAX as end of range. By keying off end == LLONG_MAX, the routine behaves differently for truncate and hole punch. Page removal is now synchronized with page allocation via faults by using the fault mutex table. The hole punch case can experience the rare region_del error and must handle accordingly. Add the routine hugetlb_fix_reserve_counts to fix up reserve counts in the case where region_del returns an error. Since the routine handles more than just the truncate case, it is renamed to remove_inode_hugepages(). To be consistent, the routine truncate_huge_page() is renamed remove_huge_page(). Downstream of remove_inode_hugepages(), the routine hugetlb_unreserve_pages() is also modified to take a range of pages. hugetlb_unreserve_pages is modified to detect an error from region_del and pass it back to the caller. Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: Dave Hansen Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/hugetlb: expose hugetlb fault mutex for use by fallocate

2015-09-08T22:35:28Z

hugetlb page faults are currently synchronized by the table of mutexes (htlb_fault_mutex_table). fallocate code will need to synchronize with the page fault code when it allocates or deletes pages. Expose interfaces so that fallocate operations can be synchronized with page faults. Minor name changes to be more consistent with other global hugetlb symbols. Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: Dave Hansen Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/hugetlb: add cache of descriptors to resv_map for region_add

2015-09-08T22:35:28Z

hugetlbfs is used today by applications that want a high degree of control over huge page usage. Often, large hugetlbfs files are used to map a large number huge pages into the application processes. The applications know when page ranges within these large files will no longer be used, and ideally would like to release them back to the subpool or global pools for other uses. The fallocate() system call provides an interface for preallocation and hole punching within files. This patch set adds fallocate functionality to hugetlbfs. fallocate hole punch will want to remove a specific range of pages. When pages are removed, their associated entries in the region/reserve map will also be removed. This will break an assumption in the region_chg/region_add calling sequence. If a new region descriptor must be allocated, it is done as part of the region_chg processing. In this way, region_add can not fail because it does not need to attempt an allocation. To prepare for fallocate hole punch, create a "cache" of descriptors that can be used by region_add if necessary. region_chg will ensure there are sufficient entries in the cache. It will be necessary to track the number of in progress add operations to know a sufficient number of descriptors reside in the cache. A new routine region_abort is added to adjust this in progress count when add operations are aborted. vma_abort_reservation is also added for callers creating reservations with vma_needs_reservation/vma_commit_reservation. [akpm@linux-foundation.org: fix typo in comment, use more cols] Signed-off-by: Mike Kravetz Reviewed-by: Naoya Horiguchi Acked-by: Hillf Danton Cc: Dave Hansen Cc: David Rientjes Cc: Hugh Dickins Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Christoph Hellwig Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: hugetlb: allow hugepages_supported to be architecture specific

2015-07-17T23:39:52Z

s390 has a constant hugepage size, by setting HPAGE_SHIFT we also change e.g. the pageblock_order, which should be independent in respect to hugepage support. With this patch every architecture is free to define how to check for hugepage support. Signed-off-by: Dominik Dingel Acked-by: Martin Schwidefsky Cc: Heiko Carstens Cc: Christian Borntraeger Cc: Michael Holzheu Cc: Gerald Schaefer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: hugetlb: cleanup using paeg_huge_active()

2015-04-15T23:35:19Z

Now we have an easy access to hugepages' activeness, so existing helpers to get the information can be cleaned up. [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/] Signed-off-by: Naoya Horiguchi Cc: Hugh Dickins Reviewed-by: Michal Hocko Cc: Mel Gorman Cc: Johannes Weiner Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

hugetlbfs: accept subpool min_size mount option and setup accordingly

2015-04-15T23:35:18Z

Make 'min_size=' be an option when mounting a hugetlbfs. This option takes the same value as the 'size' option. min_size can be specified without specifying size. If both are specified, min_size must be less that or equal to size else the mount will fail. If min_size is specified, then at mount time an attempt is made to reserve min_size pages. If the reservation fails, the mount fails. At umount time, the reserved pages are released. Signed-off-by: Mike Kravetz Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Joonsoo Kim Cc: Andi Kleen Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

hugetlbfs: add minimum size tracking fields to subpool structure

2015-04-15T23:35:17Z

hugetlbfs allocates huge pages from the global pool as needed. Even if the global pool contains a sufficient number pages for the filesystem size at mount time, those global pages could be grabbed for some other use. As a result, filesystem huge page allocations may fail due to lack of pages. Applications such as a database want to use huge pages for performance reasons. hugetlbfs filesystem semantics with ownership and modes work well to manage access to a pool of huge pages. However, the application would like some reasonable assurance that allocations will not fail due to a lack of huge pages. At application startup time, the application would like to configure itself to use a specific number of huge pages. Before starting, the application can check to make sure that enough huge pages exist in the system global pools. However, there are no guarantees that those pages will be available when needed by the application. What the application wants is exclusive use of a subset of huge pages. Add a new hugetlbfs mount option 'min_size=' to indicate that the specified number of pages will be available for use by the filesystem. At mount time, this number of huge pages will be reserved for exclusive use of the filesystem. If there is not a sufficient number of free pages, the mount will fail. As pages are allocated to and freeed from the filesystem, the number of reserved pages is adjusted so that the specified minimum is maintained. This patch (of 4): Add a field to the subpool structure to indicate the minimimum number of huge pages to always be used by this subpool. This minimum count includes allocated pages as well as reserved pages. If the minimum number of pages for the subpool have not been allocated, pages are reserved up to this minimum. An additional field (rsv_hpages) is used to track the number of pages reserved to meet this minimum size. The hstate pointer in the subpool is convenient to have when reserving and unreserving the pages. Signed-off-by: Mike Kravetz Cc: Davidlohr Bueso Cc: Aneesh Kumar Cc: Joonsoo Kim Cc: Andi Kleen Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds