It's always to same, so no need to put in the PTE every time we're
about to run. Keep a flag to track whether the pagetable has the
Switcher entries allocated, and when allocating always initialize the
Switcher text PTE.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently use the whole top PGD entry for the switcher, so we
simply share a fixed page of PTEs between all guests (actually, it's
one per Host CPU, to ensure isolation between guests).
Changes to a scheme where every guest has its own mappings.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want a separate find_pte() function so we can call it for populating the
switcher PTE entries.
We can also use it in page_writable().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a bit neater: we can immediately return if a PTE/PGD/PMD entry
is invalid (which also kills the guest). It means we don't risk using
invalid entries as we reshuffle the code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
ie. SHARED_SWITCHER_PAGES == 1. It is well under a page, and it's a
minor simplification: it's nice to have *one* simplification in a
patch series!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently assume that the Switcher the top pgd; we want to remove
this assumption, so check that vaddr is OK, rather then checking pgd
index.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We currently use the whole top PGD entry for the switcher, but that's
hitting the fixmap in some configurations (mainly, large NR_CPUS).
Introduce a variable, currently set to the constant.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Returning EMFILE (process has too many open files) is incorrect to
indicate a port is already open by another process. Use EBUSY for that.
This does change what we report to userspace, but I believe userspace
can look at it this way: it gets EBUSY, a new error code, instead of
EMFILE. It's still an error, and that's not changing.
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add hot cpu notifier to reset the request virtqueue affinity
when doing cpu hotplug.
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch adds queue steering to virtio-scsi. When a target is sent
multiple requests, we always drive them to the same queue so that FIFO
processing order is kept. However, if a target was idle, we can choose
a queue arbitrarily. In this case the queue is chosen according to the
current VCPU, so the driver expects the number of request queues to be
equal to the number of VCPUs. This makes it easy and fast to select
the queue, and also lets the driver optimize the IRQ affinity for the
virtqueues (each virtqueue's affinity is set to the CPU that "owns"
the queue).
The speedup comes from improving cache locality and giving CPU affinity
to the virtqueues, which is why this scheme was selected. Assuming that
the thread that is sending requests to the device is I/O-bound, it is
likely to be sleeping at the time the ISR is executed, and thus executing
the ISR on the same processor that sent the requests is cheap.
However, the kernel will not execute the ISR on the "best" processor
unless you explicitly set the affinity. This is because in practice
you will have many such I/O-bound processes and thus many otherwise
idle processors. Then the kernel will execute the ISR on a random
processor, rather than the one that is sending requests to the device.
The alternative to per-CPU virtqueues is per-target virtqueues. To
achieve the same locality, we could dynamically choose the virtqueue's
affinity based on the CPU of the last task that sent a request. This
is less appealing because we do not set the affinity directly---we only
provide a hint to the irqbalanced running in userspace. Dynamically
changing the affinity only works if the userspace applies the hint
fast enough.
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
Tested-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Avoid duplicated code in all of the callers.
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will be needed soon in order to retrieve the per-target
struct.
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_scsi_target_state is now empty. We will find new uses for it in
the next few patches, so this patch does not drop it completely.
And as James suggested, we use entries target_alloc and target_destroy
in the host template to allocate and destroy the virtio_scsi_target_state
of each target, attach this struct to scsi_target->hostdata. Now
we can get at it from the sdev with scsi_target(sdev)->hostdata.
No messing around with fixed size arrays and bulk memory allocation
and no need to pass in the maximum target size as a parameter because
everything should now happen dynamically.
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Those symbols only used within this file, and should be static.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some head files were split or moved to uapi/ without
updating MAINTAINERS.
Signed-off-by: Amos Kong <kongjianjun@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_balloon.h exports "u16" and "u64" to userspace. Use "__u16" and
"__u64" instead.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Check that vringh_config is not NULL before using it.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Check on the correct return value from
vringh_notify_enable_kern(). It returns false if
more packets are available, not true.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_add_buf() is going away, replaced with virtio_add_sgs() which
takes multiple terminated scatterlists.
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We never add buffers with input and output parts, so use the new accessors.
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We never add buffers with input and output parts, so use the new accessors.
Cc: Sjur Brendeland <sjur.brandeland@stericsson.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We never add buffers with input and output parts, so use the new accessors.
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We never add buffers with input and output parts, so use the new accessors.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Asias He <asias@redhat.com>
We never add buffers with input and output parts, so use the new accessors.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a bit cleaner to hand multiple sgs, rather than one big one.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Tested-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Using the new virtqueue_add_sgs function lets us simplify the queueing
path. In particular, all data protected by the tgt_lock is just gone
(multiqueue will find a new use for the lock).
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's simply a flag as to whether we have data now, so make it an
explicit function parameter rather than a member of struct
virtblk_req.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Asias He <asias@redhat.com>
(This is a respin of Paolo Bonzini's patch, but it calls
virtqueue_add_sgs() instead of his multi-part API).
This is similar to the previous patch, but a bit more radical
because the bio and req paths now share the buffer construction
code. Because the req path doesn't use vbr->sg, however, we
need to add a couple of arguments to __virtblk_add_req.
We also need to teach __virtblk_add_req how to build SCSI command
requests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Asias He <asias@redhat.com>
(This is a respin of Paolo Bonzini's patch, but it calls
virtqueue_add_sgs() instead of his multi-part API).
Move the creation of the request header and response footer to
__virtblk_add_req. vbr->sg only contains the data scatterlist,
the header/footer are added separately using virtqueue_add_sgs().
With this change, virtio-blk (with use_bio) is not relying anymore on
the virtio functions ignoring the end markers in a scatterlist.
The next patch will do the same for the other path.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Asias He <asias@redhat.com>
Right now, both virtblk_add_req and virtblk_add_req_wait call
virtqueue_add_buf. To prepare for the next patches, abstract the call
to virtqueue_add_buf into a new function __virtblk_add_req, and include
the waiting logic directly in virtblk_add_req.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These are specialized versions of virtqueue_add_buf(), which cover
over 80% of cases and are far clearer.
In particular, the scatterlists passed to these functions don't have
to be clean (ie. we ignore end markers).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_scsi can really use this, to avoid the current hack of copying
the whole sg array. Some other things get slightly neater, too.
This causes a slowdown in virtqueue_add_buf(), which is implemented as
a wrapper. This is addressed in the next patches.
for i in `seq 50`; do /usr/bin/time -f 'Wall time:%e' ./vringh_test --indirect --eventidx --parallel --fast-vringh; done 2>&1 | stats --trim-outliers:
Before:
Using CPUS 0 and 3
Guest: notified 0, pinged 39009-39063(39062)
Host: notified 39009-39063(39062), pinged 0
Wall time:1.700000-1.950000(1.723542)
After:
Using CPUS 0 and 3
Guest: notified 0, pinged 39062-39063(39063)
Host: notified 39062-39063(39063), pinged 0
Wall time:1.760000-2.220000(1.789167)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
This is useful in places that recycle the same scatterlist multiple
times, and do not want to incur the cost of sg_init_table every
time in hot paths.
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add the CAIF Virtio shared memory driver for talking
to a modem.
This CAIF Link layer communicates to the modem over
shared memory. It is implemented as a virtio_driver.
The underlying virtio device is managed by the remoteproc
framework. The Virtio queue is used for transmitting data
to the modem, and the new vringh is used for receiving data.
Genalloc is used for managing the shared memory used for TX
data. The default dma-alloc-coherent allocator can only
allocate whole pages, and this wastes too much shared memory.
Flow control is implemented by stopping the TX-queues if the
virtio queues go full or we run out of memory. Queued are
reopened when queues are below the watermark.
NAPI is used in RX path, and a dedicated tasklet is used
for releasing TX buffers.
Signed-off-by: Erwan Yvin <erwan.yvin@stericsson.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor fixes)
Add wrappers for the host vrings to support loose
coupling between the virtio device and driver.
A new struct vringh_config_ops with the functions
find_vrhs() and del_vrhs() is added to the virtio_device
struct. This enables virtio drivers to manage virtio
host rings without detailed knowledge of how the
vrings are created and deleted.
The function vringh_notify() is added so vringh clients
can notify the other side that buffers are added to the
used-ring.
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (constified vringh_config)
This is mainly to test the drivers/vhost/vringh.c code, but it also
uses the drivers/virtio/virtio_ring.c code for the guest side.
Usage for testing the basic implementation:
./vringh_test
# Test with indirect descriptors
./vringh_test --indirect
# Test with indirect descriptors and event indexex
./vringh_test --indirect --eventidx
You can run a parallel stress test by adding --parallel to any of the
above options.
eg ./vringh_test --parallel:
Using CPUS 0 and 3
Guest: notified 10107974, pinged 107970
Host: notified 108158, pinged 3172148
./vringh_test --indirect --eventidx --parallel:
Using CPUS 0 and 3
Guest: notified 156357, pinged 156251
Host: notified 156251, pinged 78179
Average of 50 times doing ./vringh_test --indirect --eventidx --parallel:
2.840000-3.040000(2.927292)user
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).
This abstracts the business of dealing with the virtio ring layout
from the access (userspace or direct); to do this, we use function
pointers, which gcc inlines correctly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
This makes them a bit more like the kernel headers, so we can include more
real kernel headers in our tests.
In addition this means that we don't break tools/virtio with the next
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When virtio-blk device is resized from host (using block_resize from QEMU) emit
KOBJ_CHANGE uevent to notify guest about such change. This allows user to have
custom udev rules which would take whatever action if such event occurs. As a
proof of concept I've created simple udev rule that automatically resize
filesystem on virtio-blk device.
ACTION=="change", KERNEL=="vd*", \
ENV{RESIZE}=="1", \
ENV{ID_FS_TYPE}=="ext[3-4]", \
RUN+="/sbin/resize2fs /dev/%k"
ACTION=="change", KERNEL=="vd*", \
ENV{RESIZE}=="1", \
ENV{ID_FS_TYPE}=="LVM2_member", \
RUN+="/sbin/pvresize /dev/%k"
Signed-off-by: Milos Vyletel <milos.vyletel@sde.cz>
Tested-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor simplification)