| Age | Commit message (Collapse) | Author |
|
Pull smb server and smbdirect updates from Steve French:
- Fix tcp connection leak
- Fix potential use after free when freeing multichannel
- Fix locking problem in showing channel list
- Locking improvement for tree connection
- Fix infinite loop when signing errors
- Add /proc interface for monitoring server state
- Fixes to avoid mixing iWarp and InfiniBand/RoCEv1/RoCEv2
port ranges used for smbdirect
- Fixes for smbdirect credit handling problems, these make
the connections more reliable
* tag 'v7.0-rc-part1-ksmbd-and-smbdirect-fixes' of git://git.samba.org/ksmbd: (32 commits)
ksmbd: fix non-IPv6 build
ksmbd: convert tree_conns_lock to rw_semaphore
ksmbd: fix missing chann_lock while iterating session channel list
ksmbd: add chann_lock to protect ksmbd_chann_list xarray
smb: server: correct value for smb_direct_max_fragmented_recv_size
smb: client: correct value for smbd_max_fragmented_recv_size
smb: server: fix leak of active_num_conn in ksmbd_tcp_new_connection()
ksmbd: add procfs interface for runtime monitoring and statistics
ksmbd: fix infinite loop caused by next_smb2_rcv_hdr_off reset in error paths
smb: server: make use of rdma_restrict_node_type()
smb: client: make use of rdma_restrict_node_type()
RDMA/core: introduce rdma_restrict_node_type()
smb: client: let send_done handle a completion without IB_SEND_SIGNALED
smb: client: let smbd_post_send_negotiate_req() use smbd_post_send()
smb: client: fix last send credit problem causing disconnects
smb: client: make use of smbdirect_socket.send_io.bcredits
smb: client: use smbdirect_send_batch processing
smb: client: introduce and use smbd_{alloc, free}_send_io()
smb: client: split out smbd_ib_post_send()
smb: client: port and use the wait_for_credits logic used by server
...
|
|
Pull nfsd updates from Chuck Lever:
"Neil Brown and Jeff Layton contributed a dynamic thread pool sizing
mechanism for NFSD. The sunrpc layer now tracks minimum and maximum
thread counts per pool, and NFSD adjusts running thread counts based
on workload: idle threads exit after a timeout when the pool exceeds
its minimum, and new threads spawn automatically when all threads are
busy. Administrators control this behavior via the nfsdctl netlink
interface.
Rick Macklem, FreeBSD NFS maintainer, generously contributed server-
side support for the POSIX ACL extension to NFSv4, as specified in
draft-ietf-nfsv4-posix-acls. This extension allows NFSv4 clients to
get and set POSIX access and default ACLs using native NFSv4
operations, eliminating the need for sideband protocols. The feature
is gated by a Kconfig option since the IETF draft has not yet been
ratified.
Chuck Lever delivered numerous improvements to the xdrgen tool. Error
reporting now covers parsing, AST transformation, and invalid
declarations. Generated enum decoders validate incoming values against
valid enumerator lists. New features include pass-through line support
for embedding C directives in XDR specifications, 16-bit integer
types, and program number definitions. Several code generation issues
were also addressed.
When an administrator revokes NFSv4 state for a filesystem via the
unlock_fs interface, ongoing async COPY operations referencing that
filesystem are now cancelled, with CB_OFFLOAD callbacks notifying
affected clients.
The remaining patches in this pull request are clean-ups and minor
optimizations. Sincere thanks to all contributors, reviewers, testers,
and bug reporters who participated in the v7.0 NFSD development cycle"
* tag 'nfsd-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits)
NFSD: Add POSIX ACL file attributes to SUPPATTR bitmasks
NFSD: Add POSIX draft ACL support to the NFSv4 SETATTR operation
NFSD: Add support for POSIX draft ACLs for file creation
NFSD: Add support for XDR decoding POSIX draft ACLs
NFSD: Refactor nfsd_setattr()'s ACL error reporting
NFSD: Do not allow NFSv4 (N)VERIFY to check POSIX ACL attributes
NFSD: Add nfsd4_encode_fattr4_posix_access_acl
NFSD: Add nfsd4_encode_fattr4_posix_default_acl
NFSD: Add nfsd4_encode_fattr4_acl_trueform_scope
NFSD: Add nfsd4_encode_fattr4_acl_trueform
Add RPC language definition of NFSv4 POSIX ACL extension
NFSD: Add a Kconfig setting to enable support for NFSv4 POSIX ACLs
xdrgen: Implement pass-through lines in specifications
nfsd: cancel async COPY operations when admin revokes filesystem state
nfsd: add controls to set the minimum number of threads per pool
nfsd: adjust number of running nfsd threads based on activity
sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSY
sunrpc: split new thread creation into a separate function
sunrpc: introduce the concept of a minimum number of threads per pool
sunrpc: track the max number of requested threads in a pool
...
|
|
Add a blank line after variable declarations in fatent.c and file.c.
This improves readability and makes code style more consistent
across the exfat subsystem.
Signed-off-by: William Hansen-Baird <william.hansen.baird@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Else-branch is unnecessary after return statement in if-branch.
Remove to enhance readability and reduce indentation.
Signed-off-by: William Hansen-Baird <william.hansen.baird@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
This patch introduces a count parameter to exfat_get_cluster, which
serves as an input parameter for the caller to specify the desired
number of clusters, and as an output parameter to store the length
of consecutive clusters.
This patch can improve read performance by reducing the number of
get_block calls in sequential read scenarios. speacially in small
cluster size.
According to my test data, the performance improvement is
approximately 10% when read FAT_CHAIN file with 512 bytes of
cluster size.
454 MB/s -> 511 MB/s
Suggested-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Change exfat_cache_lookup to return the cluster number of the last
cluster before the next cache (i.e., the end of the current cache range)
or the given 'end' if there is no next cache. This allows the caller to
know whether the next cluster after the current cache is cached.
The function signature is changed to accept an 'end' parameter, which
is the upper bound of the search range. The function now stops early
if it finds a cache that starts within the current cache's tail, meaning
caches are contiguous. The return value is the cluster number at which
the next cache starts (minus one) or the original 'end' if no next cache
is found.
The new behavior is illustrated as follows:
cache: [ccccccc-------ccccccccc]
search: [..................]
return: ^
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
The current cache mechanism does not support reading clusters starting
from a file offset of zero. This patch enables that feature in
preparation for subsequent reads of contiguous clusters from offset zero.
1. support finding clusters with zero offset.
2. allow clusters with zero offset to be cached.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
This patch introduces a parameter 'count' to support fetching multiple
clusters in exfat_map_cluster. The returned 'count' indicates the number
of consecutive clusters, or 0 when the input cluster offset is past EOF.
And the 'count' is also an input parameter for the caller to specify the
required number of clusters.
Only NO_FAT_CHAIN files enable multi-cluster fetching in this patch.
After this patch, the time proportion of exfat_get_block has decreased,
The performance data is as follows:
Cluster size: 512 bytes
Sequential read of a 30GB NO_FAT_CHAIN file:
2.4GB/s -> 2.5 GB/s
proportion of exfat_get_block:
10.8% -> 0.02%
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Yuezhang said: "exfat_map_cluster() is only used for files. The code
in this 'else' block is never executed and can be cleaned up."
Suggested-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Since exfat_ent_get supports cache buffer head, we can use this option to
reduce sb_bread calls when fetching consecutive entries.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Remove parameter 'fclus' and 'allow_eof':
- The fclus parameter is changed to a local variable as it is not
needed to be returned.
- The passed allow_eof parameter was always 1, remove it and the
associated error handling.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
The cache_id remains unchanged on a cache miss; its value is always
exactly what was set by cache_init. Therefore, checking this value
again is meaningless.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
The infinite cluster chain loop check is not work because the
loop will terminate when fclus reaches the parameter cluster,
and the parameter cluster value is never greater than
ei->valid_size.
The following relationship holds:
'fclus' < 'cluster' ≤ ei->valid_size ≤ sb->num_clusters
The check would only be triggered if a cluster number greater than
sb->num_clusters is passed, but no caller currently does this.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Since exfat_ent_get support cache buffer head, let's apply it to
exfat_find_last_cluster.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Since exfat_ent_get support cache buffer head, let's apply it to
exfat_count_num_clusters.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
This patch is part 2 of cached buffer head for exfat_ent_get,
it introduces an argument for exfat_ent_get, and make sure this
routine releases buffer head refcount when any error return.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
When multiple entries are obtained consecutively, these entries are mostly
stored adjacent to each other. this patch introduces a "last" parameter to
cache the last opened buffer head, and reuse it when possible, which
reduces the number of sb_bread() calls.
When the passed parameter "last" is NULL, it means cache option is
disabled, the behavior unchanged as it was.
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
During mmap write, exfat_page_mkwrite() currently extends
valid_size to the end of the VMA range. For a large mapping,
this can push valid_size far beyond the page that actually
triggered the fault, resulting in unnecessary writes.
valid_size only needs to extend to the end of the page
being written.
Signed-off-by: Yuling Dong <yuling-dong@qq.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Change the type of 'ret' from unsigned int to int in
exfat_find_empty_entry(). Although the implicit type conversion
(int -> unsigned int -> int) does not cause actual bugs in
practice, using int directly is more appropriate for storing
error codes returned by exfat_alloc_cluster().
This improves code clarity and consistency with standard error
handling practices.
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"Bus:
- Ensure bus->match() is consistently called with the device lock
held
- Improve type safety of bus_find_device_by_acpi_dev()
Devtmpfs:
- Parse 'devtmpfs.mount=' boot parameter with kstrtoint() instead of
simple_strtoul()
- Avoid sparse warning by making devtmpfs_context_ops static
IOMMU:
- Do not register the qcom_smmu_tbu_driver in arm_smmu_device_probe()
MAINTAINERS:
- Add the new driver-core mailing list (driver-core@lists.linux.dev)
to all relevant entries
- Add missing tree location for "FIRMWARE LOADER (request_firmware)"
- Add driver-model documentation to the "DRIVER CORE" entry
- Add missing driver-core maintainers to the "AUXILIARY BUS" entry
Misc:
- Change return type of attribute_container_register() to void; it
has always been infallible
- Do not export sysfs_change_owner(), sysfs_file_change_owner() and
device_change_owner()
- Move devres_for_each_res() from the public devres header to
drivers/base/base.h
- Do not use a static struct device for the faux bus; allocate it
dynamically
Revocable:
- Patches for the revocable synchronization primitive have been
scheduled for v7.0-rc1, but have been reverted as they need some
more refinement
Rust:
- Device:
- Support dev_printk on all device types, not just the core Device
struct; remove now-redundant .as_ref() calls in dev_* print
calls
- Devres:
- Introduce an internal reference count in Devres<T> to avoid a
deadlock condition in case of (indirect) nesting
- DMA:
- Allow drivers to tune the maximum DMA segment size via
dma_set_max_seg_size()
- I/O:
- Introduce the concept of generic I/O backends to handle
different kinds of device shared memory through a common
interface.
This enables higher-level concepts such as register
abstractions, I/O slices, and field projections to be built
generically on top.
In a first step, introduce the Io, IoCapable<T>, and IoKnownSize
trait hierarchy for sharing a common interface supporting offset
validation and bound-checking logic between I/O backends.
- Refactor MMIO to use the common I/O backend infrastructure
- Misc:
- Add __rust_helper annotations to C helpers for inlining into
Rust code
- Use "kernel vertical" style for imports
- Replace kernel::c_str! with C string literals
- Update ARef imports to use sync::aref
- Use pin_init::zeroed() for struct auxiliary_device_id and
debugfs file_operations initialization
- Use LKMM atomic types in debugfs doc-tests
- Various minor comment and documentation fixes
- PCI:
- Implement PCI configuration space accessors using the common I/O
backend infrastructure
- Document pci::Bar device endianness assumptions
- SoC:
- Abstractions for struct soc_device and struct soc_device_attribute
- Sample driver for soc::Device"
* tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (79 commits)
rust: devres: fix race condition due to nesting
rust: dma: add missing __rust_helper annotations
samples: rust: pci: Remove some additional `.as_ref()` for `dev_*` print
Revert "revocable: Revocable resource management"
Revert "revocable: Add Kunit test cases"
Revert "selftests: revocable: Add kselftest cases"
driver core: remove device_change_owner() export
sysfs: remove exports of sysfs_*change_owner()
driver core: disable revocable code from build
revocable: Add KUnit test for concurrent access
revocable: fix SRCU index corruption by requiring caller-provided storage
revocable: Add KUnit test for provider lifetime races
revocable: Fix races in revocable_alloc() using RCU
driver core: fix inverted "locked" suffix of driver_match_device()
rust: io: move MIN_SIZE and io_addr_assert to IoKnownSize
rust: pci: re-export ConfigSpace
rust: dma: allow drivers to tune max segment size
gpu: tyr: remove redundant `.as_ref()` for `dev_*` print
rust: auxiliary: use `pin_init::zeroed()` for device ID
rust: debugfs: use pin_init::zeroed() for file_operations
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- The percpu sheaves caching layer was introduced as opt-in in 6.18 and
now we enable it for all caches and remove the previous cpu (partial)
slab caching mechanism.
Besides the lower locking overhead and much more likely fastpath when
freeing, this removes the rather complicated code related to the cpu
slab lockless fastpaths (using this_cpu_try_cmpxchg128/64) and all
its complications for PREEMPT_RT or kmalloc_nolock().
The lockless slab freelist+counters update operation using
try_cmpxchg128/64 remains and is crucial for freeing remote NUMA
objects, and to allow flushing objects from sheaves to slabs mostly
without the node list_lock (Vlastimil Babka)
- Eliminate slabobj_ext metadata overhead when possible. Instead of
using kmalloc() to allocate the array for memcg and/or allocation
profiling tag pointers, use leftover space in a slab or per-object
padding due to alignment (Harry Yoo)
- Various followup improvements to the above (Hao Li)
* tag 'slab-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (39 commits)
slub: let need_slab_obj_exts() return false if SLAB_NO_OBJ_EXT is set
mm/slab: only allow SLAB_OBJ_EXT_IN_OBJ for unmergeable caches
mm/slab: place slabobj_ext metadata in unused space within s->size
mm/slab: move [__]ksize and slab_ksize() to mm/slub.c
mm/slab: save memory by allocating slabobj_ext array from leftover
mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
mm/slab: use stride to access slabobj_ext
mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
ext4: specify the free pointer offset for ext4_inode_cache
mm/slab: allow specifying free pointer offset when using constructor
mm/slab: use unsigned long for orig_size to ensure proper metadata align
slub: clarify object field layout comments
mm/slab: avoid allocating slabobj_ext array from its own slab
slub: avoid list_lock contention from __refill_objects_any()
mm/slub: cleanup and repurpose some stat items
mm/slub: remove DEACTIVATE_TO_* stat items
slab: remove frozen slab checks from __slab_free()
slab: update overview comments
slab: refill sheaves from all nodes
slab: remove unused PREEMPT_RT specific macros
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild/Kconfig updates from Nathan Chancellor:
"Kbuild:
- Drop '*_probe' pattern from modpost section check allowlist, which
hid legitimate warnings (Johan Hovold)
- Disable -Wtype-limits altogether, instead of enabling at W=2
(Vincent Mailhol)
- Improve UAPI testing to skip testing headers that require a libc
when CONFIG_CC_CAN_LINK is not set, opening up testing of headers
with no libc dependencies to more environments (Thomas Weißschuh)
- Update gendwarfksyms documentation with required dependencies
(Jihan LIN)
- Reject invalid LLVM= values to avoid unintentionally falling back
to system toolchain (Thomas Weißschuh)
- Add a script to help run the kernel build process in a container
for consistent environments and testing (Guillaume Tucker)
- Simplify kallsyms by getting rid of the relative base (Ard
Biesheuvel)
- Performance and usability improvements to scripts/make_fit.py
(Simon Glass)
- Minor various clean ups and fixes
Kconfig:
- Move XPM icons to individual files, clearing up GTK deprecation
warnings (Rostislav Krasny)
- Support
depends on FOO if BAR
as syntactic sugar for
depends on FOO || !BAR
(Nicolas Pitre, Graham Roff)
- Refactor merge_config.sh to use awk over shell/sed/grep,
dramatically speeding up processing large number of config
fragments (Anders Roxell, Mikko Rapeli)"
* tag 'kbuild-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (39 commits)
kbuild: remove dependency of run-command on config
scripts/make_fit: Compress dtbs in parallel
scripts/make_fit: Support a few more parallel compressors
kbuild: Support a FIT_EXTRA_ARGS environment variable
scripts/make_fit: Move dtb processing into a function
scripts/make_fit: Support an initial ramdisk
scripts/make_fit: Speed up operation
rust: kconfig: Don't require RUST_IS_AVAILABLE for rustc-option
MAINTAINERS: Add scripts/install.sh into Kbuild entry
modpost: Amend ppc64 save/restfpr symnames for -Os build
MIPS: tools: relocs: Ship a definition of R_MIPS_PC32
streamline_config.pl: remove superfluous exclamation mark
kbuild: dummy-tools: Add python3
scripts: kconfig: merge_config.sh: warn on duplicate input files
scripts: kconfig: merge_config.sh: use awk in checks too
scripts: kconfig: merge_config.sh: refactor from shell/sed/grep to awk
kallsyms: Get rid of kallsyms relative base
mips: Add support for PC32 relocations in vmlinux
Documentation: dev-tools: add container.rst page
scripts: add tool to run containerized builds
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset changes:
- Continue separating v1 and v2 implementations by moving more
v1-specific logic into cpuset-v1.c
- Improve partition handling. Sibling partitions are no longer
invalidated on cpuset.cpus conflict, cpuset.cpus changes no longer
fail in v2, and effective_xcpus computation is made consistent
- Fix partition effective CPUs overlap that caused a warning on
cpuset removal when sibling partitions shared CPUs
- Increase the maximum cgroup subsystem count from 16 to 32 to
accommodate future subsystem additions
- Misc cleanups and selftest improvements including switching to
css_is_online() helper, removing dead code and stale documentation
references, using lockdep_assert_cpuset_lock_held() consistently, and
adding polling helpers for asynchronously updated cgroup statistics
* tag 'cgroup-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: fix overlap of partition effective CPUs
cgroup: increase maximum subsystem count from 16 to 32
cgroup: Remove stale cpu.rt.max reference from documentation
cpuset: replace direct lockdep_assert_held() with lockdep_assert_cpuset_lock_held()
cgroup/cpuset: Move the v1 empty cpus/mems check to cpuset1_validate_change()
cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
cgroup/cpuset: Don't fail cpuset.cpus change in v2
cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
cgroup/cpuset: Streamline rm_siblings_excl_cpus()
cpuset: remove dead code in cpuset-v1.c
cpuset: remove v1-specific code from generate_sched_domains
cpuset: separate generate_sched_domains for v1 and v2
cpuset: move update_domain_attr_tree to cpuset_v1.c
cpuset: add cpuset1_init helper for v1 initialization
cpuset: add cpuset1_online_css helper for v1-specific operations
cpuset: add lockdep_assert_cpuset_lock_held helper
cpuset: Remove unnecessary checks in rebuild_sched_domains_locked
cgroup: switch to css_is_online() helper
selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants
selftests: cgroup: make test_memcg_sock robust against delayed sock stats
...
|
|
If `locked_pages` is zero, the page array must not be allocated:
ceph_process_folio_batch() uses `locked_pages` to decide when to
allocate `pages`, and redundant allocations trigger
ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
writeback stall) or even a kernel panic. Consequently, the main loop in
ceph_writepages_start() assumes that the lifetime of `pages` is confined
to a single iteration.
This expectation is currently not clear enough, as evidenced by the
recent patch which fixed an oops caused by `pages` persisting into
the next loop iteration:
- "ceph: do not propagate page array emplacement errors as batch errors"
Use an explicit BUG_ON() at the top of the loop to assert the loop's
preexisting expectation that `pages` is cleaned up by the previous
iteration. Because this is closely tied to `locked_pages`, also make it
the previous iteration's responsibility to guarantee its reset, and
verify with a second new BUG_ON() instead of handling (and masking)
failures to do so.
This patch does not change invariants, behavior, or failure modes.
The added BUG_ON() lines catch conditions that would already trigger oops,
but do so earlier for easier debugging and programmer clarity.
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Following an earlier commit, ceph_process_folio_batch() no longer
returns errors because the writeback loop cannot handle them.
Since this function already indicates failure to lock any pages by
leaving `ceph_wbc.locked_pages == 0`, and the writeback loop has no way
to handle abandonment of a locked batch, change the return type of
ceph_process_folio_batch() to `void` and remove the pathological goto in
the writeback loop. The lack of a return code emphasizes that
ceph_process_folio_batch() is designed to be abort-free: that is, once
it commits a folio for writeback, it will not later abandon it or
propagate an error for that folio. Any future changes requiring "abort"
logic should follow this invariant by cleaning up its array and
resetting ceph_wbc.locked_pages appropriately.
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
CephFS stores file data across multiple RADOS objects. An object is the
atomic unit of storage, so the writeback code must clean only folios
that belong to the same object with each OSD request.
CephFS also supports RAID0-style striping of file contents: if enabled,
each object stores multiple unbroken "stripe units" covering different
portions of the file; if disabled, a "stripe unit" is simply the whole
object. The stripe unit is (usually) reported as the inode's block size.
Though the writeback logic could, in principle, lock all dirty folios
belonging to the same object, its current design is to lock only a
single stripe unit at a time. Ever since this code was first written,
it has determined this size by checking the inode's block size.
However, the relatively-new fscrypt support needed to reduce the block
size for encrypted inodes to the crypto block size (see 'fixes' commit),
which causes an unnecessarily high number of write operations (~1024x as
many, with 4MiB objects) and correspondingly degraded performance.
Fix this (and clarify intent) by using i_layout.stripe_unit directly in
ceph_define_write_size() so that encrypted inodes are written back with
the same number of operations as if they were unencrypted.
This patch depends on the preceding commit ("ceph: do not propagate page
array emplacement errors as batch errors") for correctness. While it
applies cleanly on its own, applying it alone will introduce a
regression. This dependency is only relevant for kernels where
ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method") has
been applied; stable kernels without that commit are unaffected.
Cc: stable@vger.kernel.org
Fixes: 94af0470924c ("ceph: add some fscrypt guardrails")
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When fscrypt is enabled, move_dirty_folio_in_page_array() may fail
because it needs to allocate bounce buffers to store the encrypted
versions of each folio. Each folio beyond the first allocates its bounce
buffer with GFP_NOWAIT. Failures are common (and expected) under this
allocation mode; they should flush (not abort) the batch.
However, ceph_process_folio_batch() uses the same `rc` variable for its
own return code and for capturing the return codes of its routine calls;
failing to reset `rc` back to 0 results in the error being propagated
out to the main writeback loop, which cannot actually tolerate any
errors here: once `ceph_wbc.pages` is allocated, it must be passed to
ceph_submit_write() to be freed. If it survives until the next iteration
(e.g. due to the goto being followed), ceph_allocate_page_array()'s
BUG_ON() will oops the worker.
Note that this failure mode is currently masked due to another bug
(addressed next in this series) that prevents multiple encrypted folios
from being selected for the same write.
For now, just reset `rc` when redirtying the folio to prevent errors in
move_dirty_folio_in_page_array() from propagating. Note that
move_dirty_folio_in_page_array() is careful never to return errors on
the first folio, so there is no need to check for that. After this
change, ceph_process_folio_batch() no longer returns errors; its only
remaining failure indicator is `locked_pages == 0`, which the caller
already handles correctly.
Cc: stable@vger.kernel.org
Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Extend the resctrl machinery to support telemetry monitoring on
Intel (Tony Luck)
The practical usage of this is being able to tell how much energy or
how much work can be attributed to a group of tasks tracked under a
single idenitifier. Prepend this work with proper refactoring of
resctrl domains handling code.
* tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
x86,fs/resctrl: Update documentation for telemetry events
x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
fs/resctrl: Move RMID initialization to first mount
x86,fs/resctrl: Compute number of RMIDs as minimum across resources
fs/resctrl: Move allocation/free of closid_num_dirty_rmid[]
x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG
x86/resctrl: Add energy/perf choices to rdt boot option
x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp()
fs/resctrl: Refactor mkdir_mondata_subdir()
x86/resctrl: Read telemetry events
x86/resctrl: Find and enable usable telemetry events
x86,fs/resctrl: Add architectural event pointer
x86,fs/resctrl: Fill in details of events for performance and energy GUIDs
x86/resctrl: Discover hardware telemetry events
fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains
x86,fs/resctrl: Add and initialize a resource for package scope monitoring
x86,fs/resctrl: Add an architectural hook called for first mount
x86,fs/resctrl: Support binary fixed point event counters
x86,fs/resctrl: Handle events that can be read from any CPU
...
|
|
This patch introduces /sys/fs/f2fs/<disk>/critical_task_priority, w/
this new sysfs interface, we can tune priority of f2fs_ckpt thread and
f2fs_gc thread.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Lock debugging:
- Implement compiler-driven static analysis locking context checking,
using the upcoming Clang 22 compiler's context analysis features
(Marco Elver)
We removed Sparse context analysis support, because prior to
removal even a defconfig kernel produced 1,700+ context tracking
Sparse warnings, the overwhelming majority of which are false
positives. On an allmodconfig kernel the number of false positive
context tracking Sparse warnings grows to over 5,200... On the plus
side of the balance actual locking bugs found by Sparse context
analysis is also rather ... sparse: I found only 3 such commits in
the last 3 years. So the rate of false positives and the
maintenance overhead is rather high and there appears to be no
active policy in place to achieve a zero-warnings baseline to move
the annotations & fixers to developers who introduce new code.
Clang context analysis is more complete and more aggressive in
trying to find bugs, at least in principle. Plus it has a different
model to enabling it: it's enabled subsystem by subsystem, which
results in zero warnings on all relevant kernel builds (as far as
our testing managed to cover it). Which allowed us to enable it by
default, similar to other compiler warnings, with the expectation
that there are no warnings going forward. This enforces a
zero-warnings baseline on clang-22+ builds (Which are still limited
in distribution, admittedly)
Hopefully the Clang approach can lead to a more maintainable
zero-warnings status quo and policy, with more and more subsystems
and drivers enabling the feature. Context tracking can be enabled
for all kernel code via WARN_CONTEXT_ANALYSIS_ALL=y (default
disabled), but this will generate a lot of false positives.
( Having said that, Sparse support could still be added back,
if anyone is interested - the removal patch is still
relatively straightforward to revert at this stage. )
Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng)
- Add support for Atomic<i8/i16/bool> and replace most Rust native
AtomicBool usages with Atomic<bool>
- Clean up LockClassKey and improve its documentation
- Add missing Send and Sync trait implementation for SetOnce
- Make ARef Unpin as it is supposed to be
- Add __rust_helper to a few Rust helpers as a preparation for
helper LTO
- Inline various lock related functions to avoid additional function
calls
WW mutexes:
- Extend ww_mutex tests and other test-ww_mutex updates (John
Stultz)
Misc fixes and cleanups:
- rcu: Mark lockdep_assert_rcu_helper() __always_inline (Arnd
Bergmann)
- locking/local_lock: Include more missing headers (Peter Zijlstra)
- seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap)
- rust: sync: Replace `kernel::c_str!` with C-Strings (Tamir
Duberstein)"
* tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (90 commits)
locking/rwlock: Fix write_trylock_irqsave() with CONFIG_INLINE_WRITE_TRYLOCK
rcu: Mark lockdep_assert_rcu_helper() __always_inline
compiler-context-analysis: Remove __assume_ctx_lock from initializers
tomoyo: Use scoped init guard
crypto: Use scoped init guard
kcov: Use scoped init guard
compiler-context-analysis: Introduce scoped init guards
cleanup: Make __DEFINE_LOCK_GUARD handle commas in initializers
seqlock: fix scoped_seqlock_read kernel-doc
tools: Update context analysis macros in compiler_types.h
rust: sync: Replace `kernel::c_str!` with C-Strings
rust: sync: Inline various lock related methods
rust: helpers: Move #define __rust_helper out of atomic.c
rust: wait: Add __rust_helper to helpers
rust: time: Add __rust_helper to helpers
rust: task: Add __rust_helper to helpers
rust: sync: Add __rust_helper to helpers
rust: refcount: Add __rust_helper to helpers
rust: rcu: Add __rust_helper to helpers
rust: processor: Add __rust_helper to helpers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Support associating BPF program with struct_ops (Amery Hung)
- Switch BPF local storage to rqspinlock and remove recursion detection
counters which were causing false positives (Amery Hung)
- Fix live registers marking for indirect jumps (Anton Protopopov)
- Introduce execution context detection BPF helpers (Changwoo Min)
- Improve verifier precision for 32bit sign extension pattern
(Cupertino Miranda)
- Optimize BTF type lookup by sorting vmlinux BTF and doing binary
search (Donglin Peng)
- Allow states pruning for misc/invalid slots in iterator loops (Eduard
Zingerman)
- In preparation for ASAN support in BPF arenas teach libbpf to move
global BPF variables to the end of the region and enable arena kfuncs
while holding locks (Emil Tsalapatis)
- Introduce support for implicit arguments in kfuncs and migrate a
number of them to new API. This is a prerequisite for cgroup
sub-schedulers in sched-ext (Ihor Solodrai)
- Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen)
- Fix ORC stack unwind from kprobe_multi (Jiri Olsa)
- Speed up fentry attach by using single ftrace direct ops in BPF
trampolines (Jiri Olsa)
- Require frozen map for calculating map hash (KP Singh)
- Fix lock entry creation in TAS fallback in rqspinlock (Kumar
Kartikeya Dwivedi)
- Allow user space to select cpu in lookup/update operations on per-cpu
array and hash maps (Leon Hwang)
- Make kfuncs return trusted pointers by default (Matt Bobrowski)
- Introduce "fsession" support where single BPF program is executed
upon entry and exit from traced kernel function (Menglong Dong)
- Allow bpf_timer and bpf_wq use in all programs types (Mykyta
Yatsenko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei
Starovoitov)
- Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their
definition across the tree (Puranjay Mohan)
- Allow BPF arena calls from non-sleepable context (Puranjay Mohan)
- Improve register id comparison logic in the verifier and extend
linked registers with negative offsets (Puranjay Mohan)
- In preparation for BPF-OOM introduce kfuncs to access memcg events
(Roman Gushchin)
- Use CFI compatible destructor kfunc type (Sami Tolvanen)
- Add bitwise tracking for BPF_END in the verifier (Tianci Cao)
- Add range tracking for BPF_DIV and BPF_MOD in the verifier (Yazhou
Tang)
- Make BPF selftests work with 64k page size (Yonghong Song)
* tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits)
selftests/bpf: Fix outdated test on storage->smap
selftests/bpf: Choose another percpu variable in bpf for btf_dump test
selftests/bpf: Remove test_task_storage_map_stress_lookup
selftests/bpf: Update task_local_storage/task_storage_nodeadlock test
selftests/bpf: Update task_local_storage/recursion test
selftests/bpf: Update sk_storage_omem_uncharge test
bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}
bpf: Support lockless unlink when freeing map or local storage
bpf: Prepare for bpf_selem_unlink_nofail()
bpf: Remove unused percpu counter from bpf_local_storage_map_free
bpf: Remove cgroup local storage percpu counter
bpf: Remove task local storage percpu counter
bpf: Change local_storage->lock and b->lock to rqspinlock
bpf: Convert bpf_selem_unlink to failable
bpf: Convert bpf_selem_link_map to failable
bpf: Convert bpf_selem_unlink_map to failable
bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage
selftests/xsk: fix number of Tx frags in invalid packet
selftests/xsk: properly handle batch ending in the middle of a packet
bpf: Prevent reentrance into call_rcu_tasks_trace()
...
|
|
The newly added procfs code fails to build when CONFIG_IPv6 is disabled:
fs/smb/server/connection.c: In function 'proc_show_clients':
fs/smb/server/connection.c:47:58: error: 'struct ksmbd_conn' has no member named 'inet6_addr'; did you mean 'inet_addr'?
47 | seq_printf(m, "%-20pI6c", &conn->inet6_addr);
| ^~~~~~~~~~
| inet_addr
make[7]: *** [scripts/Makefile.build:279: fs/smb/server/connection.o] Error 1
fs/smb/server/mgmt/user_session.c: In function 'show_proc_sessions':
fs/smb/server/mgmt/user_session.c:215:65: error: 'struct ksmbd_conn' has no member named 'inet6_addr'; did you mean 'inet_addr'?
215 | seq_printf(m, " %-40pI6c", &chan->conn->inet6_addr);
| ^~~~~~~~~~
| inet_addr
Rearrange the condition to allow adding a simple preprocessor conditional.
Fixes: b38f99c1217a ("ksmbd: add procfs interface for runtime monitoring and statistics")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"Mostly small cleanups and various scattered annotations and flex array
warning fixes that we reviewed by unlanded in other trees. Introduces
new annotation for expanding counted_by to pointer members, now that
compiler behavior between GCC and Clang has been normalized.
- Various missed __counted_by annotations (Thorsten Blum)
- Various missed -Wflex-array-member-not-at-end fixes (Gustavo A. R.
Silva)
- Avoid leftover tempfiles for interrupted compile-time FORTIFY tests
(Nicolas Schier)
- Remove non-existant CONFIG_UBSAN_REPORT_FULL from docs (Stefan
Wiehler)
- fortify: Use C arithmetic not FIELD_xxx() in FORTIFY_REASON defines
(David Laight)
- Add __counted_by_ptr attribute, tests, and first user (Bill
Wendling, Kees Cook)
- Update MAINTAINERS file to make hardening section not include
pstore"
* tag 'hardening-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
MAINTAINERS: pstore: Remove L: entry
nfp: tls: Avoid -Wflex-array-member-not-at-end warnings
carl9170: Avoid -Wflex-array-member-not-at-end warning
coredump: Use __counted_by_ptr for struct core_name::corename
lkdtm/bugs: Add __counted_by_ptr() test PTR_BOUNDS
compiler_types.h: Attributes: Add __counted_by_ptr macro
fortify: Cleanup temp file also on non-successful exit
fortify: Rename temporary file to match ignore pattern
fortify: Use C arithmetic not FIELD_xxx() in FORTIFY_REASON defines
ecryptfs: Annotate struct ecryptfs_message with __counted_by
fs/xattr: Annotate struct simple_xattr with __counted_by
crypto: af_alg - Annotate struct af_alg_iv with __counted_by
Kconfig.ubsan: Remove CONFIG_UBSAN_REPORT_FULL from documentation
drm/nouveau: fifo: Avoid -Wflex-array-member-not-at-end warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
- Catch unlikely NULL return from vmap() (Ruipeng Qi)
- Handle corner case of past incomplete buffer fills causing heap
overflow (Sai Ritvik Tanksalkar)
* tag 'pstore-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: fix buffer overflow in persistent_ram_save_old()
pstore: ram_core: fix incorrect success return when vmap() fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve update from Kees Cook:
- drop duplicate bprm_stack_limits test vectors (Titouan Ameline de
Cadeville)
* tag 'execve-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
fs/tests: exec: drop duplicate bprm_stack_limits test vectors
|
|
The smb client code recently started generating the error mapping table
from a common header, but didn't tell git about it, so then git ends up
thinking maybe it should be committed.
Let's fix that.
Fixes: c527e13a7a66 ("cifs: Autogenerate SMB2 error mapping table")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Cosmetic change. Unlike all other similar helpers task_ppid_nr_ns() doesn't
have a _vnr() version; add one for consistency.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://patch.msgid.link/20251015123633.GB9456@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This paves the way for scalable PID allocation later.
The 32 bit variant merely takes a spinlock for simplicity, the 64 bit
variant uses a scalable scheme.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20260120184539.1480930-1-mjguzik@gmail.com
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Mateusz reported performance penalties [1] during task creation because
pidfs uses pidmap_lock to add elements into the rbtree. Switch to an
rhashtable to have separate fine-grained locking and to decouple from
pidmap_lock moving all heavy manipulations outside of it.
Convert the pidfs inode-to-pid mapping from an rb-tree with seqcount
protection to an rhashtable. This removes the global pidmap_lock
contention from pidfs_ino_get_pid() lookups and allows the hashtable
insert to happen outside the pidmap_lock.
pidfs_add_pid() is split. pidfs_prepare_pid() allocates inode number and
initializes pid fields and is called inside pidmap_lock. pidfs_add_pid()
inserts pid into rhashtable and is called outside pidmap_lock. Insertion
into the rhashtable can fail and memory allocation may happen so we need
to drop the spinlock.
To guard against accidently opening an already reaped task
pidfs_ino_get_pid() uses additional checks beyond pid_vnr(). If
pid->attr is PIDFS_PID_DEAD or NULL the pid either never had a pidfd or
it already went through pidfs_exit() aka the process as already reaped.
If pid->attr is valid check PIDFS_ATTR_BIT_EXIT to figure out whether
the task has exited.
This slightly changes visibility semantics: pidfd creation is denied
after pidfs_exit() runs, which is just before the pid number is removed
from the via free_pid(). That should not be an issue though.
Link: https://lore.kernel.org/20251206131955.780557-1-mjguzik@gmail.com [1]
Link: https://patch.msgid.link/20260120-work-pidfs-rhashtable-v2-1-d593c4d0f576@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull bounce buffer dio for stable pages from Jens Axboe:
"This adds support for bounce buffering of dio for stable pages. This
was all done by Christoph. In his words:
This series tries to address the problem that under I/O pages can be
modified during direct I/O, even when the device or file system
require stable pages during I/O to calculate checksums, parity or data
operations. It does so by adding block layer helpers to bounce buffer
an iov_iter into a bio, then wires that up in iomap and ultimately
XFS.
The reason that the file system even needs to know about it, is
because reads need a user context to copy the data back, and the
infrastructure to defer ioends to a workqueue currently sits in XFS.
I'm going to look into moving that into ioend and enabling it for
other file systems. Additionally btrfs already has it's own
infrastructure for this, and actually an urgent need to bounce buffer,
so this should be useful there and could be wire up easily. In fact
the idea comes from patches by Qu that did this in btrfs.
This patch fixes all but one xfstests failures on T10 PI capable
devices (generic/095 seems to have issues with a mix of mmap and
splice still, I'm looking into that separately), and make qemu VMs
running Windows, or Linux with swap enabled fine on an XFS file on a
device using PI.
Performance numbers on my (not exactly state of the art) NVMe PI test
setup:
Sequential reads using io_uring, QD=16.
Bandwidth and CPU usage (usr/sys):
| size | zero copy | bounce |
+------+--------------------------+--------------------------+
| 4k | 1316MiB/s (12.65/55.40%) | 1081MiB/s (11.76/49.78%) |
| 64K | 3370MiB/s ( 5.46/18.20%) | 3365MiB/s ( 4.47/15.68%) |
| 1M | 3401MiB/s ( 0.76/23.05%) | 3400MiB/s ( 0.80/09.06%) |
+------+--------------------------+--------------------------+
Sequential writes using io_uring, QD=16.
Bandwidth and CPU usage (usr/sys):
| size | zero copy | bounce |
+------+--------------------------+--------------------------+
| 4k | 882MiB/s (11.83/33.88%) | 750MiB/s (10.53/34.08%) |
| 64K | 2009MiB/s ( 7.33/15.80%) | 2007MiB/s ( 7.47/24.71%) |
| 1M | 1992MiB/s ( 7.26/ 9.13%) | 1992MiB/s ( 9.21/19.11%) |
+------+--------------------------+--------------------------+
Note that the 64k read numbers look really odd to me for the baseline
zero copy case, but are reproducible over many repeated runs.
The bounce read numbers should further improve when moving the PI
validation to the file system and removing the double context switch,
which I have patches for that will sent out soon"
* tag 'for-7.0/block-stable-pages-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
xfs: use bounce buffering direct I/O when the device requires stable pages
iomap: add a flag to bounce buffer direct I/O
iomap: support ioends for direct reads
iomap: rename IOMAP_DIO_DIRTY to IOMAP_DIO_USER_BACKED
iomap: free the bio before completing the dio
iomap: share code between iomap_dio_bio_end_io and iomap_finish_ioend_direct
iomap: split out the per-bio logic from iomap_dio_bio_iter
iomap: simplify iomap_dio_bio_iter
iomap: fix submission side handling of completion side errors
block: add helpers to bounce buffer an iov_iter into bios
block: remove bio_release_page
iov_iter: extract a iov_iter_extract_bvecs helper from bio code
block: open code bio_add_page and fix handling of mismatching P2P ranges
block: refactor get_contig_folio_len
block: add a BIO_MAX_SIZE constant and use it
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- Support for batch request processing for ublk, improving the
efficiency of the kernel/ublk server communication. This can yield
nice 7-12% performance improvements
- Support for integrity data for ublk
- Various other ublk improvements and additions, including a ton of
selftests additions and updated
- Move the handling of blk-crypto software fallback from below the
block layer to above it. This reduces the complexity of dealing with
bio splitting
- Series fixing a number of potential deadlocks in blk-mq related to
the queue usage counter and writeback throttling and rq-qos debugfs
handling
- Add an async_depth queue attribute, to resolve a performance
regression that's been around for a qhilw related to the scheduler
depth handling
- Only use task_work for IOPOLL completions on NVMe, if it is necessary
to do so. An earlier fix for an issue resulted in all these
completions being punted to task_work, to guarantee that completions
were only run for a given io_uring ring when it was local to that
ring. With the new changes, we can detect if it's necessary to use
task_work or not, and avoid it if possible.
- rnbd fixes:
- Fix refcount underflow in device unmap path
- Handle PREFLUSH and NOUNMAP flags properly in protocol
- Fix server-side bi_size for special IOs
- Zero response buffer before use
- Fix trace format for flags
- Add .release to rnbd_dev_ktype
- MD pull requests via Yu Kuai
- Fix raid5_run() to return error when log_init() fails
- Fix IO hang with degraded array with llbitmap
- Fix percpu_ref not resurrected on suspend timeout in llbitmap
- Fix GPF in write_page caused by resize race
- Fix NULL pointer dereference in process_metadata_update
- Fix hang when stopping arrays with metadata through dm-raid
- Fix any_working flag handling in raid10_sync_request
- Refactor sync/recovery code path, improve error handling for
badblocks, and remove unused recovery_disabled field
- Consolidate mddev boolean fields into mddev_flags
- Use mempool to allocate stripe_request_ctx and make sure
max_sectors is not less than io_opt in raid5
- Fix return value of mddev_trylock
- Fix memory leak in raid1_run()
- Add Li Nan as mdraid reviewer
- Move phys_vec definitions to the kernel types, mostly in preparation
for some VFIO and RDMA changes
- Improve the speed for secure erase for some devices
- Various little rust updates
- Various other minor fixes, improvements, and cleanups
* tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
blk-mq: ABI/sysfs-block: fix docs build warnings
selftests: ublk: organize test directories by test ID
block: decouple secure erase size limit from discard size limit
block: remove redundant kill_bdev() call in set_blocksize()
blk-mq: add documentation for new queue attribute async_dpeth
block, bfq: convert to use request_queue->async_depth
mq-deadline: covert to use request_queue->async_depth
kyber: covert to use request_queue->async_depth
blk-mq: add a new queue sysfs attribute async_depth
blk-mq: factor out a helper blk_mq_limit_depth()
blk-mq-sched: unify elevators checking for async requests
block: convert nr_requests to unsigned int
block: don't use strcpy to copy blockdev name
blk-mq-debugfs: warn about possible deadlock
blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs()
blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos()
blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
blk-rq-qos: fix possible debugfs_mutex deadlock
blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs 'struct filename' updates from Al Viro:
"[Mostly] sanitize struct filename handling"
* tag 'pull-filename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (68 commits)
sysfs(2): fs_index() argument is _not_ a pathname
alpha: switch osf_mount() to strndup_user()
ksmbd: use CLASS(filename_kernel)
mqueue: switch to CLASS(filename)
user_statfs(): switch to CLASS(filename)
statx: switch to CLASS(filename_maybe_null)
quotactl_block(): switch to CLASS(filename)
chroot(2): switch to CLASS(filename)
move_mount(2): switch to CLASS(filename_maybe_null)
namei.c: switch user pathname imports to CLASS(filename{,_flags})
namei.c: convert getname_kernel() callers to CLASS(filename_kernel)
do_f{chmod,chown,access}at(): use CLASS(filename_uflags)
do_readlinkat(): switch to CLASS(filename_flags)
do_sys_truncate(): switch to CLASS(filename)
do_utimes_path(): switch to CLASS(filename_uflags)
chdir(2): unspaghettify a bit...
do_fchownat(): unspaghettify a bit...
fspick(2): use CLASS(filename_flags)
name_to_handle_at(): use CLASS(filename_uflags)
vfs_open_tree(): use CLASS(filename_uflags)
...
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- multichannel improvements, including making add channel async at
mount time
- fix potential double free in open path
- retry fixes
- locking improvements
- fix potential directory lease races
- cleanup patches for client headers
- patches to better split out SMB1 code
- minor cleanup of structs for gcc 14 warnings
- error handling improvements
* tag 'v7.0-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (74 commits)
cifs: Fix the copyright banner on smb1maperror.c
smb: common: add header guards to fs/smb/common/smb2status.h
smb: client: Avoid a dozen -Wflex-array-member-not-at-end warnings
smb/client: remove useless comment in mapping_table_ERRSRV
smb/client: remove some literal NT error codes from ntstatus_to_dos_map
smb/client: add NT_STATUS_VOLUME_NOT_UPGRADED
smb/client: add NT_STATUS_NO_USER_KEYS
smb/client: add NT_STATUS_WRONG_EFS
smb/client: add NT_STATUS_NO_EFS
smb/client: add NT_STATUS_NO_RECOVERY_POLICY
smb/client: add NT_STATUS_RANGE_NOT_FOUND
smb/client: add NT_STATUS_DECRYPTION_FAILED
smb/client: add NT_STATUS_ENCRYPTION_FAILED
smb/client: add NT_STATUS_DIRECTORY_IS_A_REPARSE_POINT
smb/client: add NT_STATUS_VOLUME_DISMOUNTED
smb/client: add NT_STATUS_BIOS_FAILED_TO_CONNECT_INTERRUPT
smb/client: add NT_STATUS_VARIABLE_NOT_FOUND
smb/client: rename ERRinvlevel to ERRunknownlevel
smb/client: add NT_STATUS_OS2_INVALID_LEVEL
smb/client: map NT_STATUS_INVALID_INFO_CLASS to ERRbadpipe
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This includes several minor code cleanups, and one notable fix for
recovery of in-progress lock conversions which would lead to a the
convert operation never completing"
* tag 'dlm-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: Avoid -Wflex-array-member-not-at-end warning
fs/dlm/dir: remove unuse variable count_match
dlm: Constify struct configfs_item_operations and configfs_group_operations
fs/dlm: use list_add_tail() instead of open-coding list insertion
dlm: validate length in dlm_search_rsb_tree
dlm: fix recovery pending middle conversion
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Prevent rename() from failing with -ESTALE when there are locking
conflicts and retry the operation instead
- Don't fail when fiemap triggers a page fault (xfstest generic/742)
- Fix another locking request cancellation bug
- Minor other fixes and cleanups
* tag 'gfs2-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: fiemap page fault fix
gfs2: fix memory leaks in gfs2_fill_super error path
gfs2: Fix use-after-free in iomap inline data write path
gfs2: Fix slab-use-after-free in qd_put
gfs2: Introduce glock_{type,number,sbd} helpers
gfs2: gfs2_glock_hold cleanup
gfs: Use fixed GL_GLOCK_MIN_HOLD time
gfs2: Fix gfs2_log_get_bio argument type
gfs2: gfs2_chain_bio start sector fix
gfs2: Initialize bio->bi_opf early
gfs2: Rename gfs2_log_submit_{bio -> write}
gfs2: Do not cancel internal demote requests
gfs2: run_queue cleanup
gfs2: Retries missing in gfs2_{rename,exchange}
gfs2: glock cancelation flag fix
|
|
Pull xfs updates from Carlos Maiolino:
"This contains several improvements to zoned device support,
performance improvements for the parent pointers, and a new health
monitoring feature. There are some improvements in the journaling code
too but no behavior change expected.
Last but not least, some code refactoring and bug fixes are also
included in this series"
* tag 'xfs-merge-7.0' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (67 commits)
xfs: add sysfs stats for zoned GC
xfs: give the defer_relog stat a xs_ prefix
xfs: add zone reset error injection
xfs: refactor zone reset handling
xfs: don't mark all discard issued by zoned GC as sync
xfs: allow setting errortags at mount time
xfs: use WRITE_ONCE/READ_ONCE for m_errortag
xfs: move the guts of XFS_ERRORTAG_DELAY out of line
xfs: don't validate error tags in the I/O path
xfs: allocate m_errortag early
xfs: fix the errno sign for the xfs_errortag_{add,clearall} stubs
xfs: validate log record version against superblock log version
xfs: fix spacing style issues in xfs_alloc.c
xfs: remove xfs_zone_gc_space_available
xfs: use a seprate member to track space availabe in the GC scatch buffer
xfs: check for deleted cursors when revalidating two btrees
xfs: fix UAF in xchk_btree_check_block_owner
xfs: check return value of xchk_scrub_create_subord
xfs: only call xf{array,blob}_destroy if we have a valid pointer
xfs: get rid of the xchk_xfile_*_descr calls
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, inode page cache sharing among filesystems on the same
machine is now supported, which is particularly useful for
high-density hosts running tens of thousands of containers.
In addition, we fully isolate the EROFS core on-disk format from other
optional encoded layouts since the core on-disk part is designed to be
simple, effective, and secure. Users can use the core format to build
unique golden immutable images and import their filesystem trees
directly from raw block devices via DMA, page-mapped DAX devices,
and/or file-backed mounts without having to worry about unnecessary
intrinsic consistency issues found in other generic filesystems by
design. However, the full vision is still working in progress and will
spend more time to achieve final goals.
There are other improvements and bug fixes as usual, as listed below:
- Support inode page cache sharing among filesystems
- Formally separate optional encoded (aka compressed) inode layouts
(and the implementations) from the EROFS core on-disk aligned plain
format for future zero-trust security usage
- Improve performance by caching the fact that an inode does not have
a POSIX ACL
- Improve LZ4 decompression error reporting
- Enable LZMA by default and promote DEFLATE and Zstandard algorithms
out of EXPERIMENTAL status
- Switch to inode_set_cached_link() to cache symlink lengths
- random bugfixes and minor cleanups"
* tag 'erofs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (31 commits)
erofs: fix UAF issue for file-backed mounts w/ directio option
erofs: update compression algorithm status
erofs: fix inline data read failure for ztailpacking pclusters
erofs: avoid some unnecessary #ifdefs
erofs: handle end of filesystem properly for file-backed mounts
erofs: separate plain and compressed filesystems formally
erofs: use inode_set_cached_link()
erofs: mark inodes without acls in erofs_read_inode()
erofs: implement .fadvise for page cache share
erofs: support compressed inodes for page cache share
erofs: support unencoded inodes for page cache share
erofs: pass inode to trace_erofs_read_folio
erofs: introduce the page cache share feature
erofs: using domain_id in the safer way
erofs: add erofs_inode_set_aops helper to set the aops
erofs: support user-defined fingerprint name
erofs: decouple `struct erofs_anon_fs_type`
fs: Export alloc_empty_backing_file
erofs: tidy up erofs_init_inode_xattrs()
erofs: add missing documentation about `directio` mount option
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs
Pull hfs/hfsplus updates from Viacheslav Dubeyko:
"This pull request contains several fixes of syzbot reported issues and
HFS+ fixes of xfstests failures.
- fix an issue reported by syzbot triggering BUG_ON() in the case of
corrupted superblock, replacing the BUG_ON()s with proper error
handling (Jori Koolstra)
- fix memory leaks in the mount logic of HFS/HFS+ file systems. When
HFS/HFS+ were converted to the new mount api a bug was introduced
by changing the allocation pattern of sb->s_fs_info (Mehdi Ben Hadj
Khelifa)
- fix hfs_bnode_create() by returning ERR_PTR(-EEXIST) instead of
the node pointer when it's already hashed. This avoids a double
unload_nls() on mount failure (suggested by Shardul Bankar)
- set inode's mode as regular file for system inodes (Tetsuo Handa)
The rest fix failures in generic/020, generic/037, generic/062,
generic/480, and generic/498 xfstests for the case of HFS+ file
system. Currently, only 30 xfstests' test-cases experience failures
for HFS+ file system (initially, it was around 100 failed xfstests)"
* tag 'hfs-v7.0-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
hfsplus: avoid double unload_nls() on mount failure
hfsplus: fix warning issue in inode.c
hfsplus: fix generic/062 xfstests failure
hfsplus: fix generic/037 xfstests failure
hfsplus: pretend special inodes as regular files
hfsplus: return error when node already exists in hfs_bnode_create
hfs: Replace BUG_ON with error handling for CNID count checks
hfsplus: fix generic/020 xfstests failure
hfsplus: fix volume corruption issue for generic/498
hfsplus: fix volume corruption issue for generic/480
hfsplus: ensure sb->s_fs_info is always cleaned up
hfs: ensure sb->s_fs_info is always cleaned up
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2
Pull nilfs2 updates from Viacheslav Dubeyko:
- Fix potential block overflow that cause system hang
When executing the FITRIM command, an underflow can occur in the
calculation of nblocks. This ultimately leads to the block layer
function __blkdev_issue_discard() taking an excessively long time
to process the bio chain, and the ns_segctor_sem lock remains held
for a long period.
This prevents other tasks from acquiring the ns_segctor_sem lock,
resulting in a hang reported by syzbot (Edward Adam Davis)
- Fix missing struct keywords in nilfs2_api.h kernel-doc (Ryusuke
Konishi)
- Convert nilfs_super_block to kernel-doc
Eliminate 40+ kernel-doc warnings in nilfs2_ondisk.h by converting
all of the struct member comments to kernel-doc comments (Randy
Dunlap)
* tag 'nilfs2-v7.0-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2:
nilfs2: fix missing struct keywords in nilfs2_api.h kernel-doc
nilfs2: convert nilfs_super_block to kernel-doc
nilfs2: Fix potential block overflow that cause system hang
|
|
Converts tree_conns_lock to an rw_semaphore to allow sleeping while
the lock is held. Additionally, it simplifies the locking logic in
ksmbd_tree_conn_session_logoff() and introduces
__ksmbd_tree_conn_disconnect() to avoid redundant locking.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|