user/sven/linux.git/drivers/nvme, branch v5.16

nvmet-tcp: fix possible list corruption for unexpected command failure

2021-12-08T15:36:58Z

nvmet_tcp_handle_req_failure needs to understand weather to prepare for incoming data or the next pdu. However if we misidentify this, we will wait for 0-length data, and queue the response although nvmet_req_init already did that. The particular command was namespace management command with no data, which was incorrectly categorized as a command with incapsule data. Also, add a code comment of what we are trying to do here. Signed-off-by: Sagi Grimberg Reviewed-by: Keith Busch Signed-off-by: Christoph Hellwig

nvme: fix use after free when disconnecting a reconnecting ctrl

2021-12-07T17:21:16Z

A crash happens when trying to disconnect a reconnecting ctrl: 1) The network was cut off when the connection was just established, scan work hang there waiting for some IOs complete. Those I/Os were retried because we return BLK_STS_RESOURCE to blk in reconnecting. 2) After a while, I tried to disconnect this connection. This procedure also hangs because it tried to obtain ctrl->scan_lock. It should be noted that now we have switched the controller state to NVME_CTRL_DELETING. 3) In nvme_check_ready(), we always return true when ctrl->state is NVME_CTRL_DELETING, so those retrying I/Os were issued to the bottom device which was already freed. To fix this, when ctrl->state is NVME_CTRL_DELETING, issue cmd to bottom device only when queue state is live. If not, return host path error to the block layer Signed-off-by: Ruozhu Li Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig

nvme-multipath: set ana_log_size to 0 after free ana_log_buf

2021-12-07T17:19:28Z

Set ana_log_size to 0 when ana_log_buf is freed to make sure nvme_mpath_init_identify will do the right thing when retrying after an earlier failure. Signed-off-by: Hou Tao Reviewed-by: Keith Busch Signed-off-by: Christoph Hellwig

nvme: report write pointer for a full zone as zone start + zone len

2021-12-06T07:52:08Z

The write pointer in NVMe ZNS is invalid for a zone in zone state full. The same also holds true for ZAC/ZBC. The current behavior for NVMe is to simply propagate the wp reported by the drive, even for full zones. Since the wp is invalid for a full zone, the wp reported by the drive may be any value. The way that the sd_zbc driver handles a full zone is to always report the wp as zone start + zone len, regardless of what the drive reported. null_blk also follows this convention. Do the same for NVMe, so that a BLKREPORTZONE ioctl reports the write pointer for a full zone in a consistent way, regardless of the interface of the underlying zoned block device. blkzone report before patch: start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0xfffffffffffbfff8 reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)] blkzone report after patch: start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0x040000 reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)] Signed-off-by: Niklas Cassel Reviewed-by: Keith Busch Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Signed-off-by: Christoph Hellwig

nvme: disable namespace access for unsupported metadata

2021-12-06T07:52:08Z

The only fabrics target that supports metadata handling through the separate integrity buffer is RDMA. It is currently usable only if the size is 8B per block and formatted for protection information. If an rdma target were to export a namespace with a different format (ex: 4k+64B), the driver will not be able to submit valid read/write commands for that namespace. Suppress setting the metadata feature in the namespace so that the gendisk capacity will be set to 0. This will prevent read/write access through the block stack, but will continue to allow ioctl passthrough commands. Cc: Max Gurtovoy Cc: Sagi Grimberg Signed-off-by: Keith Busch Signed-off-by: Christoph Hellwig

nvme: show subsys nqn for duplicate cntlids

2021-12-06T07:52:08Z

The driver assigned nvme handle isn't persistent across reboots, so is not enough information to match up where the collisions are occuring. Add the subsys nqn string to the output so that it can more easily be identified later. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215099 Signed-off-by: Keith Busch Signed-off-by: Christoph Hellwig

nvmet: use IOCB_NOWAIT only if the filesystem supports it

2021-11-25T14:02:40Z

Submit I/O requests with the IOCB_NOWAIT flag set only if the underlying filesystem supports it. Fixes: 50a909db36f2 ("nvmet: use IOCB_NOWAIT for file-ns buffered I/O") Signed-off-by: Maurizio Lombardi Reviewed-by: Chaitanya Kulkarni Signed-off-by: Christoph Hellwig

nvme: fix write zeroes pi

2021-11-23T16:22:41Z

Write Zeroes sets PRACT when block integrity is enabled (as it should), but neglects to also set the reftag which is expected by reads. This causes protection errors on reads. Fix this by setting the reftag for type 1 and 2 (for type 3, reads will not check the reftag). Signed-off-by: Klaus Jensen Reviewed-by: Martin K. Petersen Signed-off-by: Christoph Hellwig

nvme-fabrics: ignore invalid fast_io_fail_tmo values

2021-11-23T16:22:41Z

Valid fast_io_fail_tmo values are integers >= 0 or -1 (disabled). Prevent userspace from setting arbitrary negative values. Signed-off-by: Maurizio Lombardi Reviewed-by: Sagi Grimberg Reviewed-by: Chaitanya Kulkarni Signed-off-by: Christoph Hellwig

nvme-pci: add NO APST quirk for Kioxia device

2021-11-23T16:22:41Z

This particular Kioxia device times out and aborts I/O during any load, but it's more easily observable with discards (fstrim). The device gets to a state that is also not possible to use "nvme set-feature" to disable APST. Booting with nvme_core.default_ps_max_latency=0 solves the issue. We had a dozen or so of these devices behaving this same way in customer environments. Signed-off-by: Enzo Matsumiya Signed-off-by: Christoph Hellwig