user/sven/linux.git/include/linux/dax.h, branch v4.10.4

mm: Invalidate DAX radix tree entries only if appropriate

2016-12-27T04:29:24Z

Currently invalidate_inode_pages2_range() and invalidate_mapping_pages() just delete all exceptional radix tree entries they find. For DAX this is not desirable as we track cache dirtiness in these entries and when they are evicted, we may not flush caches although it is necessary. This can for example manifest when we write to the same block both via mmap and via write(2) (to different offsets) and fsync(2) then does not properly flush CPU caches when modification via write(2) was the last one. Create appropriate DAX functions to handle invalidation of DAX entries for invalidate_inode_pages2_range() and invalidate_mapping_pages() and wire them up into the corresponding mm functions. Acked-by: Johannes Weiner Reviewed-by: Ross Zwisler Signed-off-by: Jan Kara Signed-off-by: Dan Williams

mm: move handling of COW faults into DAX code

2016-12-15T00:04:09Z

Move final handling of COW faults from generic code into DAX fault handler. That way generic code doesn't have to be aware of peculiarities of DAX locking so remove that knowledge and make locking functions private to fs/dax.c. Link: http://lkml.kernel.org/r/1479460644-25076-11-git-send-email-jack@suse.cz Signed-off-by: Jan Kara Acked-by: Kirill A. Shutemov Reviewed-by: Ross Zwisler Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

dax: rip out get_block based IO support

2016-11-21T01:48:36Z

No one uses functions using the get_block callback anymore. Rip them out and update documentation. Reviewed-by: Ross Zwisler Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o

dax: add struct iomap based DAX PMD support

2016-11-08T00:34:45Z

DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based locking. This patch allows DAX PMDs to participate in the DAX radix tree based locking scheme so that they can be re-enabled using the new struct iomap based fault handlers. There are currently three types of DAX 4k entries: 4k zero pages, 4k DAX mappings that have an associated block allocation, and 4k DAX empty entries. The empty entries exist to provide locking for the duration of a given page fault. This patch adds three equivalent 2MiB DAX entries: Huge Zero Page (HZP) entries, PMD DAX entries that have associated block allocations, and 2 MiB DAX empty entries. Unlike the 4k case where we insert a struct page* into the radix tree for 4k zero pages, for HZP we insert a DAX exceptional entry with the new RADIX_DAX_HZP flag set. This is because we use a single 2 MiB zero page in every 2MiB hole mapping, and it doesn't make sense to have that same struct page* with multiple entries in multiple trees. This would cause contention on the single page lock for the one Huge Zero Page, and it would break the page->index and page->mapping associations that are assumed to be valid in many other places in the kernel. One difficult use case is when one thread is trying to use 4k entries in radix tree for a given offset, and another thread is using 2 MiB entries for that same offset. The current code handles this by making the 2 MiB user fall back to 4k entries for most cases. This was done because it is the simplest solution, and because the use of 2MiB pages is already opportunistic. If we were to try to upgrade from 4k pages to 2MiB pages for a given range, we run into the problem of how we lock out 4k page faults for the entire 2MiB range while we clean out the radix tree so we can insert the 2MiB entry. We can solve this problem if we need to, but I think that the cases where both 2MiB entries and 4K entries are being used for the same range will be rare enough and the gain small enough that it probably won't be worth the complexity. Signed-off-by: Ross Zwisler Reviewed-by: Jan Kara Signed-off-by: Dave Chinner

dax: move RADIX_DAX_* defines to dax.h

2016-11-08T00:33:35Z

The RADIX_DAX_* defines currently mostly live in fs/dax.c, with just RADIX_DAX_ENTRY_LOCK being in include/linux/dax.h so it can be used in mm/filemap.c. When we add PMD support, though, mm/filemap.c will also need access to the RADIX_DAX_PTE type so it can properly construct a 4k sized empty entry. Instead of shifting the defines between dax.c and dax.h as they are individually used in other code, just move them wholesale to dax.h so they'll be available when we need them. Signed-off-by: Ross Zwisler Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Signed-off-by: Dave Chinner

dax: correct dax iomap code namespace

2016-11-08T00:32:46Z

The recently added DAX functions that use the new struct iomap data structure were named iomap_dax_rw(), iomap_dax_fault() and iomap_dax_actor(). These are actually defined in fs/dax.c, though, so should be part of the "dax" namespace and not the "iomap" namespace. Rename them to dax_iomap_rw(), dax_iomap_fault() and dax_iomap_actor() respectively. Signed-off-by: Ross Zwisler Suggested-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Signed-off-by: Dave Chinner

dax: remove dax_pmd_fault()

2016-11-08T00:32:35Z

dax_pmd_fault() is the old struct buffer_head + get_block_t based 2 MiB DAX fault handler. This fault handler has been disabled for several kernel releases, and support for PMDs will be reintroduced using the struct iomap interface instead. Signed-off-by: Ross Zwisler Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Signed-off-by: Dave Chinner

dax: coordinate locking for offsets in PMD range

2016-11-08T00:32:20Z

DAX radix tree locking currently locks entries based on the unique combination of the 'mapping' pointer and the pgoff_t 'index' for the entry. This works for PTEs, but as we move to PMDs we will need to have all the offsets within the range covered by the PMD to map to the same bit lock. To accomplish this, for ranges covered by a PMD entry we will instead lock based on the page offset of the beginning of the PMD entry. The 'mapping' pointer is still used in the same way. Signed-off-by: Ross Zwisler Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Signed-off-by: Dave Chinner

dax: provide an iomap based fault handler

2016-09-19T01:24:50Z

Very similar to the existing dax_fault function, but instead of using the get_block callback we rely on the iomap_ops vector from iomap.c. That also avoids having to do two calls into the file system for write faults. Signed-off-by: Christoph Hellwig Reviewed-by: Ross Zwisler Signed-off-by: Dave Chinner

dax: provide an iomap based dax read/write path

2016-09-19T01:24:49Z

This is a much simpler implementation of the DAX read/write path that makes use of the iomap infrastructure. It does not try to mirror the direct I/O calling conventions and thus doesn't have to deal with i_dio_count or the end_io handler, but instead leaves locking and filesystem-specific I/O completion to the caller. Signed-off-by: Christoph Hellwig Reviewed-by: Ross Zwisler Signed-off-by: Dave Chinner