.. SPDX-License-Identifier: GPL-2.0

=================
Inline Encryption
=================

Background
==========

Inline encryption hardware sits logically between memory and the disk, and can
en/decrypt data as it goes in/out of the disk. Inline encryption hardware has a
fixed number of "keyslots" - slots into which encryption contexts (i.e. the
encryption key, encryption algorithm, data unit size) can be programmed by the
kernel at any time. Each request sent to the disk can be tagged with the index
of a keyslot (and also a data unit number to act as an encryption tweak), and
the inline encryption hardware will en/decrypt the data in the request with the
encryption context programmed into that keyslot. This is very different from
full disk encryption solutions like self-encrypting drives/TCG OPAL/ATA
Security standards, since with inline encryption, any block on disk could be
encrypted with any encryption context the kernel chooses.


Objective
=========

We want to support inline encryption (IE) in the kernel.
To allow for testing, we also want a crypto API fallback when actual
IE hardware is absent. We also want IE to work with layered devices
like dm and loopback (i.e. we want to be able to use the IE hardware
of the underlying devices if present, or else fall back to crypto API
en/decryption).


Constraints and notes
=====================

- IE hardware has a limited number of "keyslots" that can be programmed
  with an encryption context (key, algorithm, data unit size, etc.) at any time.
  One can specify a keyslot in a data request made to the device, and the
  device will en/decrypt the data using the encryption context programmed into
  that specified keyslot. When possible, we want to make multiple requests with
  the same encryption context share the same keyslot.

- We need a way for upper layers like filesystems to specify an encryption
  context to use for en/decrypting a struct bio, and a device driver (like UFS)
  needs to be able to use that encryption context when it processes the bio.

- We need a way for device drivers to expose their inline encryption
  capabilities in a unified way to the upper layers.


Design
======

We add a struct bio_crypt_ctx to struct bio that can
represent an encryption context, because we need to be able to pass this
encryption context from the upper layers (like the fs layer) to the
device driver to act upon.

While IE hardware works on the notion of keyslots, the FS layer has no
knowledge of keyslots - it simply wants to specify an encryption context to
use while en/decrypting a bio.

We introduce a keyslot manager (KSM) that handles the translation from
encryption contexts specified by the FS to keyslots on the IE hardware.
This KSM also serves as the way IE hardware can expose its capabilities to
upper layers. The generic mode of operation is: each device driver that wants
to support IE will construct a KSM and set it up in its struct request_queue.
Upper layers that want to use IE on this device can then use this KSM in
the device's struct request_queue to translate an encryption context into
a keyslot. The presence of the KSM in the request queue shall be used to mean
that the device supports IE.

The KSM uses refcounts to track which keyslots are idle (either they have no
encryption context programmed, or there are no in-flight struct bios
referencing that keyslot). When a new encryption context needs a keyslot, the
KSM tries to find a keyslot that has already been programmed with the same
encryption context, and if there is no such keyslot, it evicts the least
recently used idle keyslot and programs the new encryption context into that
one. If no idle keyslots are available, then the caller will sleep until there
is at least one.
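
As a rough illustration of the lookup and eviction policy described above, the
following user-space sketch models a keyslot manager with two keyslots. It is
not the kernel implementation: ``toy_ksm``, ``toy_slot`` and both functions are
hypothetical names, the 4-byte key stands in for a full encryption context,
recency is tracked with a simple counter, and where the kernel would sleep the
sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NUM_SLOTS 2 /* hypothetical hardware with two keyslots */
#define TOY_KEY_SIZE  4 /* toy key size, for illustration only */

struct toy_slot {
	int programmed;                  /* does the slot hold a context? */
	unsigned char key[TOY_KEY_SIZE]; /* stands in for the encryption context */
	int refcount;                    /* in-flight users; 0 means idle */
	unsigned long last_used;         /* logical clock value used for LRU */
};

struct toy_ksm {
	struct toy_slot slots[TOY_NUM_SLOTS];
	unsigned long clock;
};

/*
 * Return the index of a keyslot holding 'key' with a reference taken,
 * or -1 if every slot is busy (the kernel would sleep here instead).
 */
static int toy_ksm_get_slot(struct toy_ksm *ksm, const unsigned char *key)
{
	int i, victim = -1;

	/* 1. Reuse a slot that is already programmed with this context. */
	for (i = 0; i < TOY_NUM_SLOTS; i++) {
		struct toy_slot *s = &ksm->slots[i];

		if (s->programmed && memcmp(s->key, key, TOY_KEY_SIZE) == 0) {
			s->refcount++;
			s->last_used = ++ksm->clock;
			return i;
		}
	}

	/* 2. Otherwise pick the least recently used idle slot. */
	for (i = 0; i < TOY_NUM_SLOTS; i++) {
		struct toy_slot *s = &ksm->slots[i];

		if (s->refcount != 0)
			continue;
		if (victim < 0 || s->last_used < ksm->slots[victim].last_used)
			victim = i;
	}
	if (victim < 0)
		return -1;

	/* 3. Evict whatever was there and program the new context. */
	memcpy(ksm->slots[victim].key, key, TOY_KEY_SIZE);
	ksm->slots[victim].programmed = 1;
	ksm->slots[victim].refcount = 1;
	ksm->slots[victim].last_used = ++ksm->clock;
	return victim;
}

/* Drop one reference; the slot becomes idle when the count reaches 0. */
static void toy_ksm_put_slot(struct toy_ksm *ksm, int slot)
{
	assert(ksm->slots[slot].refcount > 0);
	ksm->slots[slot].refcount--;
}
```

Note that an idle slot keeps its key until it is chosen as a victim, so a later
request with the same context can still reuse it without reprogramming.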


blk-mq changes, other block layer changes and blk-crypto-fallback
=================================================================

We add a pointer to a ``bi_crypt_context`` and a ``keyslot`` to
struct request. These will be referred to as the ``crypto fields``
for the request. This ``keyslot`` is the keyslot into which the
``bi_crypt_context`` has been programmed in the KSM of the ``request_queue``
that this request is being sent to.

We introduce ``block/blk-crypto-fallback.c``, which allows upper layers to remain
blissfully unaware of whether or not real inline encryption hardware is present
underneath. When a bio is submitted with a target ``request_queue`` that doesn't
support the encryption context specified with the bio, the block layer will
en/decrypt the bio with the blk-crypto-fallback.

If the bio is a ``WRITE`` bio, a bounce bio is allocated, and the data in the bio
is encrypted and stored in the bounce bio - blk-mq will then proceed to process
the bounce bio as if it were not encrypted at all (except when blk-integrity is
concerned). ``blk-crypto-fallback`` sets the bounce bio's ``bi_end_io`` to an
internal function that cleans up the bounce bio and ends the original bio.

If the bio is a ``READ`` bio, the bio's ``bi_end_io`` (and also ``bi_private``)
is saved and overwritten by ``blk-crypto-fallback`` to
``bio_crypto_fallback_decrypt_bio``. The bio's ``bi_crypt_context`` is also
overwritten with ``NULL``, so that to the rest of the stack, the bio looks
as if it were a regular bio that never had an encryption context specified.
``bio_crypto_fallback_decrypt_bio`` will decrypt the bio, restore the original
``bi_end_io`` (and also ``bi_private``) and end the bio again.
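
The save/overwrite/restore sequence for ``READ`` bios can be sketched in user
space as follows. This is only a model under stated assumptions: ``toy_bio``,
``toy_saved`` and the ``toy_*`` functions are hypothetical, the payload is a
single byte, and a XOR with a one-byte key stands in for real decryption.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bio;
typedef void (*toy_end_io_t)(struct toy_bio *bio);

struct toy_bio {
	toy_end_io_t end_io;   /* models bi_end_io */
	void *priv;            /* models bi_private */
	const void *crypt_ctx; /* models bi_crypt_context */
	unsigned char data;    /* one-byte payload, for illustration */
};

/* State the fallback keeps while the read is in flight. */
struct toy_saved {
	toy_end_io_t end_io;
	void *priv;
	unsigned char key;     /* toy XOR key, not a real cipher */
};

/* Completion handler installed by the fallback. */
static void toy_fallback_decrypt_end_io(struct toy_bio *bio)
{
	struct toy_saved *saved = bio->priv;

	bio->data ^= saved->key;     /* "decrypt" the data in place */
	bio->end_io = saved->end_io; /* restore the original handler */
	bio->priv = saved->priv;     /* restore the original private data */
	bio->end_io(bio);            /* end the bio again */
}

/* Called at submission time for a READ bio handled by the fallback. */
static void toy_fallback_prep_read(struct toy_bio *bio,
				   struct toy_saved *saved,
				   unsigned char key)
{
	saved->end_io = bio->end_io;
	saved->priv = bio->priv;
	saved->key = key;
	bio->end_io = toy_fallback_decrypt_end_io;
	bio->priv = saved;
	bio->crypt_ctx = NULL;       /* lower layers see a plain bio */
}

/* Example completion handler of the original submitter. */
static void toy_owner_end_io(struct toy_bio *bio)
{
	int *completions = bio->priv;

	(*completions)++;
}
```

The original submitter's handler runs exactly once, and only after the data has
been decrypted, which is why the fallback has to sit in the completion path
rather than the submission path for reads.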

Regardless of whether real inline encryption hardware is used or the
blk-crypto-fallback is used, the ciphertext written to disk (and hence the
on-disk format of data) will be the same (assuming the hardware's implementation
of the algorithm being used adheres to spec and functions correctly).

If a ``request_queue``'s inline encryption hardware claimed to support the
encryption context specified with a bio, then the bio will not be handled by
the ``blk-crypto-fallback``. We will eventually reach a point in blk-mq when a
struct request needs to be allocated for that bio. At that point,
blk-mq tries to program the encryption context into the ``request_queue``'s
keyslot manager and obtain a keyslot, which it stores in the request's newly
added ``keyslot`` field. This keyslot is released when the request is completed.

When the first bio is added to a request, ``blk_crypto_rq_bio_prep`` is called,
which sets the request's ``crypt_ctx`` to a copy of the bio's
``bi_crypt_context``. ``bio_crypt_do_front_merge`` is called whenever a
subsequent bio is merged to the front of the request, which updates the
``crypt_ctx`` of the request so that it matches the newly merged bio's
``bi_crypt_context``. In particular, the request keeps a copy of the
``bi_crypt_context`` of the first bio in its bio-list (blk-mq needs to be
careful to maintain this invariant during bio and request merges).
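
The invariant described above (the request's copy always matches the context of
the first bio in its list) can be modelled with a small user-space sketch. All
types and functions here are hypothetical stand-ins rather than the kernel's
own, and a context is reduced to a key pointer plus a starting data unit number.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_crypt_ctx {
	const void *key;        /* which key the data uses */
	unsigned long long dun; /* data unit number of the first data unit */
};

struct toy_mbio {
	struct toy_crypt_ctx ctx; /* models bi_crypt_context */
	struct toy_mbio *next;
};

struct toy_mrequest {
	struct toy_mbio *head;    /* first bio in the bio-list */
	struct toy_crypt_ctx ctx; /* models the request's crypt_ctx copy */
};

/* First bio added: copy its context into the request. */
static void toy_rq_bio_prep(struct toy_mrequest *rq, struct toy_mbio *bio)
{
	bio->next = NULL;
	rq->head = bio;
	rq->ctx = bio->ctx;
}

/* Front merge: the new bio becomes the first bio, so refresh the copy. */
static void toy_front_merge(struct toy_mrequest *rq, struct toy_mbio *bio)
{
	bio->next = rq->head;
	rq->head = bio;
	rq->ctx = bio->ctx;
}

/* Back merge: the first bio is unchanged, so the copy stays as it is. */
static void toy_back_merge(struct toy_mrequest *rq, struct toy_mbio *bio)
{
	struct toy_mbio *tail = rq->head;

	while (tail->next)
		tail = tail->next;
	tail->next = bio;
	bio->next = NULL;
}

/* The invariant from the text: the copy matches the first bio's context. */
static int toy_invariant_holds(const struct toy_mrequest *rq)
{
	return rq->ctx.key == rq->head->ctx.key &&
	       rq->ctx.dun == rq->head->ctx.dun;
}
```

Only the front merge touches the request's copy; a back merge leaves the first
bio in place, so nothing needs to be updated.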

To make it possible for inline encryption to work with request queue based
layered devices, when a request is cloned, its ``crypto fields`` are cloned as
well. When the cloned request is submitted, blk-mq programs the
``bi_crypt_context`` of the request into the clone's request_queue's keyslot
manager, and stores the returned keyslot in the clone's ``keyslot``.


API presented to users of the block layer
=========================================

``struct blk_crypto_key`` represents a crypto key (the raw key, size of the
key, the crypto algorithm to use, the data unit size to use, and the number of
bytes required to represent data unit numbers that will be specified with the
``bi_crypt_context``).

``blk_crypto_init_key`` allows upper layers to initialize such a
``blk_crypto_key``.

``bio_crypt_set_ctx`` should be called on any bio that a user of
the block layer wants en/decrypted via inline encryption (or the
blk-crypto-fallback, if hardware support isn't available for the desired
crypto configuration). This function takes the ``blk_crypto_key`` and the
data unit number (DUN) to use when en/decrypting the bio.

``blk_crypto_config_supported`` allows upper layers to query whether or not
an encryption context can be handled by blk-crypto for a given request queue
(either by real inline encryption hardware, or by the blk-crypto-fallback).
This is useful e.g. when blk-crypto-fallback is disabled, and the upper layer
wants to use an algorithm that may not be supported by hardware - this function
lets the upper layer know ahead of time that the algorithm isn't supported,
and the upper layer can fall back to something else if appropriate.

``blk_crypto_start_using_key`` - Upper layers must call this function on a
``blk_crypto_key`` and a ``request_queue`` before using the key with any bio
headed for that ``request_queue``. This function ensures that either the
hardware supports the key's crypto settings, or the crypto API fallback has
transforms for the needed mode allocated and ready to go. Note that this
function may allocate an ``skcipher``, and must not be called from the data
path, since allocating ``skciphers`` from the data path can deadlock.

``blk_crypto_evict_key`` *must* be called by upper layers before a
``blk_crypto_key`` is freed. Further, it *must* be called only once
there are no more in-flight requests that use that ``blk_crypto_key``.
``blk_crypto_evict_key`` will ensure that a key is removed from any keyslots in
inline encryption hardware that the key might have been programmed into (or the
blk-crypto-fallback).

API presented to device drivers
===============================

A struct blk_keyslot_manager should be set up by device drivers in
the ``request_queue`` of the device. The device driver needs to call
``blk_ksm_init`` on the ``blk_keyslot_manager``, specifying the number of
keyslots supported by the hardware.

The device driver also needs to tell the KSM how to actually manipulate the
IE hardware in the device to do things like programming the crypto key into
a particular keyslot of the IE hardware. All this is achieved through the
struct blk_ksm_ll_ops field in the KSM that the device driver
must fill in after initializing the ``blk_keyslot_manager``.

The KSM also handles runtime power management for the device when applicable
(e.g. when it wants to program a crypto key into the IE hardware, the device
must be runtime powered on) - so the device driver must also set the ``dev``
field in the KSM to point to the struct device for the KSM to use for runtime
power management.

``blk_ksm_reprogram_all_keys`` can be called by device drivers if the device
needs each and every one of its keyslots to be reprogrammed with the key it
"should have" at the point in time when the function is called. This is useful
e.g. if a device loses all its keys on runtime power down/up.
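
The relationship between the low-level operations a driver supplies and the
reprogramming step can be sketched in user space as below. This is an
illustrative model only: ``toy_ll_ops``, ``toy_hw``, ``toy_dev_ksm`` and the
functions are hypothetical, keys are reduced to integer ids (0 meaning "no
key"), and the real struct blk_ksm_ll_ops callbacks take different arguments.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 3

/* What the (toy) hardware currently holds in each keyslot; 0 = empty. */
struct toy_hw {
	int programmed_key[TOY_SLOTS];
};

/* Driver-supplied operations, in the spirit of low-level KSM ops. */
struct toy_ll_ops {
	int (*keyslot_program)(struct toy_hw *hw, int key_id, unsigned int slot);
	int (*keyslot_evict)(struct toy_hw *hw, unsigned int slot);
};

/* Software view: which key each slot "should have"; 0 = none. */
struct toy_dev_ksm {
	const struct toy_ll_ops *ops;
	struct toy_hw *hw;
	int wanted_key[TOY_SLOTS];
};

static int toy_hw_program(struct toy_hw *hw, int key_id, unsigned int slot)
{
	hw->programmed_key[slot] = key_id;
	return 0;
}

static int toy_hw_evict(struct toy_hw *hw, unsigned int slot)
{
	hw->programmed_key[slot] = 0;
	return 0;
}

static const struct toy_ll_ops toy_ops = {
	.keyslot_program = toy_hw_program,
	.keyslot_evict = toy_hw_evict,
};

/* Push the software view back into the hardware, slot by slot. */
static void toy_reprogram_all_keys(struct toy_dev_ksm *ksm)
{
	unsigned int slot;

	for (slot = 0; slot < TOY_SLOTS; slot++) {
		if (ksm->wanted_key[slot] == 0)
			continue; /* nothing should be in this slot */
		ksm->ops->keyslot_program(ksm->hw, ksm->wanted_key[slot], slot);
	}
}
```

Because the software view survives a power cycle while the hardware state does
not, replaying it through the driver's program callback is enough to bring the
two back in sync.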

``blk_ksm_destroy`` should be called to free all resources that were allocated
for a keyslot manager by ``blk_ksm_init``, once the ``blk_keyslot_manager`` is
no longer needed.


Layered Devices
===============

Request queue based layered devices like dm-rq that wish to support IE need to
create their own keyslot manager for their request queue, and expose whatever
functionality they choose. When a layered device wants to pass a clone of a
request to another ``request_queue``, blk-crypto will initialize and prepare the
clone as necessary - see ``blk_crypto_insert_cloned_request`` in
``blk-crypto.c``.


Future Optimizations for layered devices
========================================

Creating a keyslot manager for a layered device uses up memory for each
keyslot, and in general, a layered device merely passes the request on to a
"child" device, so the keyslots in the layered device itself are completely
unused, and don't need any refcounting or keyslot programming. We can instead
define a new type of KSM, the "passthrough KSM", that layered devices can use
to advertise an unlimited number of keyslots, and support for any encryption
algorithms they choose, while not actually using any memory for each keyslot.
Another use case for the "passthrough KSM" is for IE devices that do not have a
limited number of keyslots.


Interaction between inline encryption and blk integrity
=======================================================

At the time of this patch, there is no real hardware that supports both these
features. However, these features do interact with each other, and it's not
completely trivial to make them both work together properly. In particular,
when a WRITE bio wants to use inline encryption on a device that supports both
features, the bio will have an encryption context specified, after which
its integrity information is calculated (using the plaintext data, since
the encryption will happen while data is being written), and the data and
integrity info are sent to the device. Obviously, the integrity info must be
verified before the data is encrypted. After the data is encrypted, the device
must not store the integrity info that it received with the plaintext data
since that might reveal information about the plaintext data. As such, it must
re-generate the integrity info from the ciphertext data and store that on disk
instead. Another issue with storing the integrity info of the plaintext data is
that it changes the on-disk format depending on whether hardware inline
encryption support is present or the kernel crypto API fallback is used (since
if the fallback is used, the device will receive the integrity info of the
ciphertext, not that of the plaintext).

Because there isn't any real hardware yet, it seems prudent to assume that
hardware implementations might not implement both features together correctly,
and disallow the combination for now. Whenever a device supports integrity, the
kernel will pretend that the device does not support hardware inline encryption
(by essentially setting the keyslot manager in the request_queue of the device
to NULL). When the crypto API fallback is enabled, this means that all bios with
an encryption context will use the fallback, and IO will complete as usual.
When the fallback is disabled, a bio with an encryption context will be failed.
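
The resulting choice of path for a bio that carries an encryption context can
be summarised with a small decision function. This is a simplified user-space
sketch, not kernel code: ``toy_queue`` and ``toy_choose_path`` are hypothetical,
and the three flags are assumed to be known up front.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_path { TOY_PATH_HARDWARE, TOY_PATH_FALLBACK, TOY_PATH_FAIL };

struct toy_queue {
	bool hw_supports_ctx;  /* hardware claims support for the bio's context */
	bool has_integrity;    /* device supports blk-integrity */
	bool fallback_enabled; /* crypto API fallback is enabled */
};

static enum toy_path toy_choose_path(const struct toy_queue *q)
{
	/* With integrity, the kernel acts as if there were no keyslot manager. */
	bool hw_usable = q->hw_supports_ctx && !q->has_integrity;

	if (hw_usable)
		return TOY_PATH_HARDWARE;
	if (q->fallback_enabled)
		return TOY_PATH_FALLBACK;
	return TOY_PATH_FAIL; /* the bio is failed */
}
```

For example, a device with both inline encryption and integrity support ends up
on the fallback path when the fallback is enabled, and has its encrypted bios
failed otherwise.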