commit 2dd2a1740ee19cd2636d247276cf27bfa434b0e2 upstream.
A recent change to ndctl to attempt to reconfigure namespaces in place
uncovered a label accounting problem in block-window-type namespaces.
The ndctl "create.sh" test is able to trigger this signature:
WARNING: CPU: 34 PID: 9167 at drivers/nvdimm/label.c:1100 __blk_label_update+0x9a3/0xbc0 [libnvdimm]
[..]
RIP: 0010:__blk_label_update+0x9a3/0xbc0 [libnvdimm]
[..]
Call Trace:
uuid_store+0x21b/0x2f0 [libnvdimm]
kernfs_fop_write+0xcf/0x1c0
vfs_write+0xcc/0x380
ksys_write+0x68/0xe0
When allocated capacity for a namespace is renamed (new UUID) the labels
with the old UUID need to be deleted. The ndctl behavior to always
destroy namespaces on reconfiguration hid this problem.
The immediate impact of this bug is limited since block-window-type
namespaces only seem to exist in the specification and not in any
shipping products. However, the label handling code is being reused for
other technologies like CXL region labels, so there is a benefit to
making sure both vertical labels sets (block-window) and horizontal
label sets (pmem) have a functional reference implementation in
libnvdimm.
Fixes: c4703ce11c ("libnvdimm/namespace: Fix label tracking error")
Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c46764733c85b82c07e9559b39da4d00a7dd659 ]
Forget to set error code when nd_label_alloc_slot failed, and we
add it to avoid overwritten error code.
Fixes: 0ba1c63489 ("libnvdimm: write blk label set")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20201205115056.2076523-1-zhangqilong3@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
There's no strict requirement why slot_valid() needs to check for page alignment
and it would seem to actively hurt cross-page-size compatibility. Let's
delete the check and rely on checksum validation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20190905154603.10349-6-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of version 2 of the gnu general public license as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 64 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Several places (dimm_devs.c, core.c etc) include label.h but only
label.c uses NSINDEX_SIGNATURE, so move its definition to label.c
instead.
In file included from drivers/nvdimm/dimm_devs.c:23:
drivers/nvdimm/label.h:41:19: warning: 'NSINDEX_SIGNATURE' defined but
not used [-Wunused-const-variable=]
Also, some places abuse "/**" which is only reserved for the kernel-doc.
drivers/nvdimm/bus.c:648: warning: cannot understand function prototype:
'struct attribute_group nd_device_attribute_group = '
drivers/nvdimm/bus.c:677: warning: cannot understand function prototype:
'struct attribute_group nd_numa_attribute_group = '
Those are just some member assignments for the "struct attribute_group"
instances and it can't be expressed in the kernel-doc.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Users have reported intermittent occurrences of DIMM initialization
failures due to duplicate allocations of address capacity detected in
the labels, or errors of the form below, both have the same root cause.
nd namespace1.4: failed to track label: 0
WARNING: CPU: 17 PID: 1381 at drivers/nvdimm/label.c:863
RIP: 0010:__pmem_label_update+0x56c/0x590 [libnvdimm]
Call Trace:
? nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
uuid_store+0x17e/0x190 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x57/0xd0
do_syscall_64+0x60/0x210
Unfortunately those reports were typically with a busy parallel
namespace creation / destruction loop making it difficult to see the
components of the bug. However, Jane provided a simple reproducer using
the work-in-progress sub-section implementation.
When ndctl is reconfiguring a namespace it may take an existing defunct
/ disabled namespace and reconfigure it with a new uuid and other
parameters. Critically namespace_update_uuid() takes existing address
resources and renames them for the new namespace to use / reconfigure as
it sees fit. The bug is that this rename only happens in the resource
tracking tree. Existing labels with the old uuid are not reaped leading
to a scenario where multiple active labels reference the same span of
address range.
Teach namespace_update_uuid() to flag any references to the old uuid for
reaping at the next label update attempt.
Cc: <stable@vger.kernel.org>
Fixes: bf9bccc14c ("libnvdimm: pmem label sets and namespace instantiation")
Link: https://github.com/pmem/ndctl/issues/91
Reported-by: Jane Chu <jane.chu@oracle.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Erwin Tsaur <erwin.tsaur@oracle.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
As Dexuan reports the NVDIMM_FAMILY_HYPERV platform is incompatible with
the existing Linux namespace implementation because it uses
NSLABEL_FLAG_LOCAL for x1-width PMEM interleave sets. Quirk it as an
platform / DIMM that does not provide BLK-aperture access. Allow the
libnvdimm core to assume no potential for aliasing. In case other
implementations make the same mistake, provide a "noblk" module
parameter to force-enable the quirk.
Link: https://lkml.kernel.org/r/PU1P153MB0169977604493B82B662A01CBF920@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The UEFI 2.7 specification sets expectations that the 'updating' flag is
eventually cleared. To date, the libnvdimm core has never adhered to
that protocol. The policy of the core matches the policy of other
multi-device info-block formats like MD-Software-RAID that expect
administrator intervention on inconsistent info-blocks, not automatic
invalidation.
However, some pre-boot environments may unfortunately attempt to "clean
up" the labels and invalidate a set when it fails to find at least one
"non-updating" label in the set. Clear the updating flag after set
updates to minimize the window of vulnerability to aggressive pre-boot
environments.
Ideally implementations would not write to the label area outside of
creating namespaces.
Note that this only minimizes the window, it does not close it as the
system can still crash while clearing the flag and the set can be
subsequently deleted / invalidated by the pre-boot environment.
Fixes: f524bf271a ("libnvdimm: write pmem label set")
Cc: <stable@vger.kernel.org>
Cc: Kelly Couch <kelly.j.couch@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The kbuild robot reports:
drivers/nvdimm/label.c:500:32: warning: restricted __le32 degrades to integer
...read 'nslot' into a local u32.
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds logic that is meant to make use of the namespace index data
to reduce the number of reads that are needed to initialize a given
namespace. The general idea is that once we have enough data to validate
the namespace index we do so and then proceed to fetch only those labels
that are not listed as being "free". By doing this I am seeing a total time
reduction from about 4-5 seconds to 2-3 seconds for 24 NVDIMM modules each
with 128K of label config area.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch splits the initialization of the label data into two functions.
One for doing the init, and another for reading the actual configuration
data. The idea behind this is that by doing this we create a symmetry
between the getting and setting of config data in that we have a function
for both. In addition it will make it easier for us to identify the bits
that are related to init versus the pieces that are a wrapper for reading
data from the ACPI interface.
So for example by splitting things out like this it becomes much more
obvious that we were performing checks that weren't necessarily related to
the set/get operations such as relying on ndd->data being present when the
set and get ops should not care about a locally cached copy of the label
area.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch removes an empty statement from an if expression and promotes
the else statement to the if expression with the expression logic reversed.
I feel this is more readable as the empty statement can lead to issues if
any additional logic was ever added.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When working on the label code I found it rather confusing to see several
spots that reference a minimum label size of 256 while working with labels
that are 128 bytes in size.
This patch is meant to provide a clarification on one of the comments that
was at the heart of the issue. Specifically for version 1.2 and later of
the namespace specification the minimum label size is 256, prior to that
the minimum label size was 128. So we should state that as such to avoid
confusion.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds validation for the labeloff field in the indexes.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
sizeof_namespace_index() fails when NVDIMM devices have the minimum
1024 bytes label storage area. nvdimm_num_label_slots() returns 3
slots while the area is only big enough for 2 slots.
Change nvdimm_num_label_slots() to calculate a number of label slots
according to UEFI 2.7 spec.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dynamic debug can be instructed to add the function name to the debug
output using the +f switch, so there is no need for the libnvdimm
modules to do it again. If a user decides to add the +f switch for
libnvdimm's dynamic debug this results in double prints of the function
name.
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The set of valid sequence numbers is {1,2,3}. The specification
indicates that an implementation should consider 0 a sign of a critical
error:
UEFI 2.7: 13.19 NVDIMM Label Protocol
Software never writes the sequence number 00, so a correctly
check-summed Index Block with this sequence number probably indicates a
critical error. When software discovers this case it treats it as an
invalid Index Block indication.
While the expectation is that the invalid block is just thrown away, the
Robustness Principle says we should fix this to make both sequence
numbers valid.
Fixes: f524bf271a ("libnvdimm: write pmem label set")
Cc: <stable@vger.kernel.org>
Reported-by: Juston Li <juston.li@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The old calculation assumed that the label space was 128k and the label
size is 128. With v1.2 labels where the label size is 256 this
calculation will return zero. We are saved by the fact that the
nsindex_size is always pre-initialized from a previous 128 byte
assumption and we are lucky that the index sizes turn out the same.
Fix this going forward in case we start encountering different
geometries of label areas besides 128k.
Since the label size can change from one call to the next, drop the
caching of nsindex_size.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Commit f979b13c3c "libnvdimm, label: honor the lba size specified in
v1.2 labels") neglected to update the 'lbasize' in the label when the
namespace sector_size attribute was written. We need this value in the
label for inter-OS / pre-OS compatibility.
Fixes: f979b13c3c ("libnvdimm, label: honor the lba size specified in v1.2 labels")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The UEFI 2.7 specification defines an updated BTT metadata format,
bumping the revision to 2.0. Add support for the new format, while
retaining compatibility for the old 1.1 format.
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The rules for which version of the label specification are in effect at
any given point in time are as follows:
1/ If a DIMM has an existing / valid index block then the version
specified is used regardless if it is a previous version.
2/ By default when the kernel is initializing new index blocks the
latest specification version (v1.2 at time of writing) is used.
3/ An environment that wants to force create v1.1 label-sets must
arrange for userspace to disable all active regions / namespaces /
dimms and write a valid set of v1.1 index blocks to the dimms.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Starting with v1.2 labels, 'address abstractions' can be hinted via an
address abstraction id that implies an info-block format. The standard
address abstraction in the specification is the v2 format of the
Block-Translation-Table (BTT). Support for that is saved for a later
patch, for now we add support for the Linux supported address
abstractions BTT (v1), PFN, and DAX.
The new 'holder_class' attribute for namespace devices is added for
tooling to specify the 'abstraction_guid' to store in the namespace label.
For v1.1 labels this field is undefined and any setting of
'holder_class' away from the default 'none' value will only have effect
until the driver is unloaded. Setting 'holder_class' requires that
whatever device tries to claim the namespace must be of the specified
class.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The v1.2 namespace label specification adds a fletcher checksum to each
label instance. Add generation and validation support for the new field.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The v1.2 namespace label specification requires 'nlabel' and 'position'
to be valid for the first ("lowest dpa") label in the set. It also
requires all non-first labels to set those fields to 0xff.
Linux does not much care if these values are correct, because we can
just trust the count of labels with the matching uuid like the v1.1
case. However, we set them correctly in case other environments care.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Starting with the v1.2 definition of namespace labels, the isetcookie
field is populated and validated for blk-aperture namespaces. This adds
some safety against inadvertent copying of namespace labels from one
DIMM-device to another.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The type_guid refers to the "Address Range Type GUID" for the region
backing a namespace as defined the ACPI NFIT (NVDIMM Firmware Interface
Table). This 'type' identifier specifies an access mechanism for the
given namespace. This capability replaces the confusing usage of the
'NSLABEL_FLAG_LOCAL' flag to indicate a block-aperture-mode namespace.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The interleave-set-cookie algorithm is extended to incorporate all the
same components that are used to generate an nvdimm unique-id. For
backwards compatibility we still maintain the old v1.1 definition.
Reported-by: Nicholas Moulin <nicholas.w.moulin@intel.com>
Reported-by: Kaushik Kanetkar <kaushik.a.kanetkar@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In support of improved interoperability between operating systems and pre-boot
environments the Intel proposed NVDIMM Namespace Specification [1], has been
adopted and modified to the the UEFI 2.7 NVDIMM Label Protocol [2].
Update the definitions of the namespace label data structures so that the new
format can be supported alongside the existing label format.
The new specification changes the default label size to 256 bytes, so
everywhere that relied on sizeof(struct nd_namespace_label) must now use the
sizeof_namespace_label() helper.
There should be no functional differences from these changes as the
default is still the v1.1 128-byte format. Future patches will move the
default to the v1.2 definition.
[1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
[2]: http://www.uefi.org/sites/default/files/resources/UEFI_Spec_2_7.pdf
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In order to test that the name of a resource begins with "pmem", call
strncmp() with 4 as length instead of 3 to match the whole prefix.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Instead of assuming that there will only ever be one allocated range at
the start of the region, account for additional namespaces that might
start at an offset from the region base.
After this change pmem namespaces now have a reason to carry an array of
resources similar to blk. Unifying the resource tracking infrastructure
in nd_namespace_common is a future cleanup candidate.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for enabling multiple namespaces per pmem region, convert
the label tracking to use a linked list. In particular this will allow
select_pmem_id() to move labels from the unvalidated state to the
validated state. Currently we only track one validated set per-region.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
NVDIMM namespaces, in addition to accepting "struct bio" based requests,
also have the capability to perform byte-aligned accesses. By default
only the bio/block interface is used. However, if another driver can
make effective use of the byte-aligned capability it can claim namespace
interface and use the byte-aligned ->rw_bytes() interface.
The BTT driver is the initial first consumer of this mechanism to allow
adding atomic sector update semantics to a pmem or blk namespace. This
patch is the sysfs infrastructure to allow configuring a BTT instance
for a namespace. Enabling that BTT and performing i/o is in a
subsequent patch.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
After 'uuid', 'size', 'sector_size', and optionally 'alt_name' have been
set to valid values the labels on the dimm can be updated. The
difference with the pmem case is that blk namespaces are limited to one
dimm and can cover discontiguous ranges in dpa space.
Also, after allocating label slots, it is useful for userspace to know
how many slots are left. Export this information in sysfs.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
After 'uuid', 'size', and optionally 'alt_name' have been set to valid
values the labels on the dimms can be updated.
Write procedure is:
1/ Allocate and write new labels in the "next" index
2/ Free the old labels in the working copy
3/ Write the bitmap and the label space on the dimm
4/ Write the index to make the update valid
Label ranges directly mirror the dpa resource values for the given
label_id of the namespace.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A complete label set is a PMEM-label per-dimm per-interleave-set where
all the UUIDs match and the interleave set cookie matches the hosting
interleave set.
Present sysfs attributes for manipulation of a PMEM-namespace's
'alt_name', 'uuid', and 'size' attributes. A later patch will make
these settings persistent by writing back the label.
Note that PMEM allocations grow forwards from the start of an interleave
set (lowest dimm-physical-address (DPA)). BLK-namespaces that alias
with a PMEM interleave set will grow allocations backward from the
highest DPA.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This on media label format [1] consists of two index blocks followed by
an array of labels. None of these structures are ever updated in place.
A sequence number tracks the current active index and the next one to
write, while labels are written to free slots.
+------------+
| |
| nsindex0 |
| |
+------------+
| |
| nsindex1 |
| |
+------------+
| label0 |
+------------+
| label1 |
+------------+
| |
....nslot...
| |
+------------+
| labelN |
+------------+
After reading valid labels, store the dpa ranges they claim into
per-dimm resource trees.
[1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>