Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  KVM: IOMMU: Disable device assignment without interrupt remapping
  KVM: MMU: trace mmio page fault
  KVM: MMU: mmio page fault support
  KVM: MMU: reorganize struct kvm_shadow_walk_iterator
  KVM: MMU: lockless walking shadow page table
  KVM: MMU: do not need atomicly to set/clear spte
  KVM: MMU: introduce the rules to modify shadow page table
  KVM: MMU: abstract some functions to handle fault pfn
  KVM: MMU: filter out the mmio pfn from the fault pfn
  KVM: MMU: remove bypass_guest_pf
  KVM: MMU: split kvm_mmu_free_page
  KVM: MMU: count used shadow pages on prepareing path
  KVM: MMU: rename 'pt_write' to 'emulate'
  KVM: MMU: cleanup for FNAME(fetch)
  KVM: MMU: optimize to handle dirty bit
  KVM: MMU: cache mmio info on page fault path
  KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code
  KVM: MMU: do not update slot bitmap if spte is nonpresent
  KVM: MMU: fix walking shadow page table
  KVM guest: KVM Steal time registration
  ...
This commit is contained in:
Linus Torvalds 2011-07-24 09:07:03 -07:00
commit 5fabc487c9
102 changed files with 12375 additions and 3734 deletions

View File

@ -1159,10 +1159,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
for all guests.
Default is 1 (enabled) if in 64bit or 32bit-PAE mode
kvm-intel.bypass_guest_pf=
[KVM,Intel] Disables bypassing of guest page faults
on Intel chips. Default is 1 (enabled)
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
@ -1737,6 +1733,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
fault handling.
no-steal-acc [X86,KVM] Disable paravirtualized steal time accounting.
steal time is computed, but won't influence scheduler
behaviour
nolapic [X86-32,APIC] Do not enable or use the local APIC.
nolapic_timer [X86-32,APIC] Do not use the local APIC timer.

View File

@ -180,6 +180,19 @@ KVM_CHECK_EXTENSION ioctl() to determine the value for max_vcpus at run-time.
If the KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is 4
cpus max.
On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
threads in one or more virtual CPU cores. (This is because the
hardware requires all the hardware threads in a CPU core to be in the
same partition.) The KVM_CAP_PPC_SMT capability indicates the number
of vcpus per virtual core (vcore). The vcore id is obtained by
dividing the vcpu id by the number of vcpus per vcore. The vcpus in a
given vcore will always be in the same physical core as each other
(though that might be a different physical core from time to time).
Userspace can control the threading (SMT) mode of the guest by its
allocation of vcpu ids. For example, if userspace wants
single-threaded guest vcpus, it should make all vcpu ids be a multiple
of the number of vcpus per vcore.
4.8 KVM_GET_DIRTY_LOG (vm ioctl)
Capability: basic
@ -1143,15 +1156,10 @@ Assigns an IRQ to a passed-through device.
struct kvm_assigned_irq {
__u32 assigned_dev_id;
__u32 host_irq;
__u32 host_irq; /* ignored (legacy field) */
__u32 guest_irq;
__u32 flags;
union {
struct {
__u32 addr_lo;
__u32 addr_hi;
__u32 data;
} guest_msi;
__u32 reserved[12];
};
};
@ -1239,8 +1247,10 @@ Type: vm ioctl
Parameters: struct kvm_assigned_msix_nr (in)
Returns: 0 on success, -1 on error
Set the number of MSI-X interrupts for an assigned device. This service can
only be called once in the lifetime of an assigned device.
Set the number of MSI-X interrupts for an assigned device. The number is
reset again by terminating the MSI-X assignment of the device via
KVM_DEASSIGN_DEV_IRQ. Calling this service more than once at any earlier
point will fail.
struct kvm_assigned_msix_nr {
__u32 assigned_dev_id;
@ -1291,6 +1301,135 @@ Returns the tsc frequency of the guest. The unit of the return value is
KHz. If the host has unstable tsc this ioctl returns -EIO instead as an
error.
4.56 KVM_GET_LAPIC
Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (out)
Returns: 0 on success, -1 on error
#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
char regs[KVM_APIC_REG_SIZE];
};
Reads the Local APIC registers and copies them into the input argument. The
data format and layout are the same as documented in the architecture manual.
4.57 KVM_SET_LAPIC
Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (in)
Returns: 0 on success, -1 on error
#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
char regs[KVM_APIC_REG_SIZE];
};
Copies the input argument into the the Local APIC registers. The data format
and layout are the same as documented in the architecture manual.
4.58 KVM_IOEVENTFD
Capability: KVM_CAP_IOEVENTFD
Architectures: all
Type: vm ioctl
Parameters: struct kvm_ioeventfd (in)
Returns: 0 on success, !0 on error
This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
within the guest. A guest write in the registered address will signal the
provided event instead of triggering an exit.
struct kvm_ioeventfd {
__u64 datamatch;
__u64 addr; /* legal pio/mmio address */
__u32 len; /* 1, 2, 4, or 8 bytes */
__s32 fd;
__u32 flags;
__u8 pad[36];
};
The following flags are defined:
#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign)
If datamatch flag is set, the event will be signaled only if the written value
to the registered address is equal to datamatch in struct kvm_ioeventfd.
4.62 KVM_CREATE_SPAPR_TCE
Capability: KVM_CAP_SPAPR_TCE
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_create_spapr_tce (in)
Returns: file descriptor for manipulating the created TCE table
This creates a virtual TCE (translation control entry) table, which
is an IOMMU for PAPR-style virtual I/O. It is used to translate
logical addresses used in virtual I/O into guest physical addresses,
and provides a scatter/gather capability for PAPR virtual I/O.
/* for KVM_CAP_SPAPR_TCE */
struct kvm_create_spapr_tce {
__u64 liobn;
__u32 window_size;
};
The liobn field gives the logical IO bus number for which to create a
TCE table. The window_size field specifies the size of the DMA window
which this TCE table will translate - the table will contain one 64
bit TCE entry for every 4kiB of the DMA window.
When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
table has been created using this ioctl(), the kernel will handle it
in real mode, updating the TCE table. H_PUT_TCE calls for other
liobns will cause a vm exit and must be handled by userspace.
The return value is a file descriptor which can be passed to mmap(2)
to map the created TCE table into userspace. This lets userspace read
the entries written by kernel-handled H_PUT_TCE calls, and also lets
userspace update the TCE table directly which is useful in some
circumstances.
4.63 KVM_ALLOCATE_RMA
Capability: KVM_CAP_PPC_RMA
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_allocate_rma (out)
Returns: file descriptor for mapping the allocated RMA
This allocates a Real Mode Area (RMA) from the pool allocated at boot
time by the kernel. An RMA is a physically-contiguous, aligned region
of memory used on older POWER processors to provide the memory which
will be accessed by real-mode (MMU off) accesses in a KVM guest.
POWER processors support a set of sizes for the RMA that usually
includes 64MB, 128MB, 256MB and some larger powers of two.
/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
__u64 rma_size;
};
The return value is a file descriptor which can be passed to mmap(2)
to map the allocated RMA into userspace. The mapped area can then be
passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
RMA for a virtual machine. The size of the RMA in bytes (which is
fixed at host kernel boot time) is returned in the rma_size field of
the argument structure.
The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
is supported; 2 if the processor requires all virtual machines to have
an RMA, or 1 if the processor can use an RMA but doesn't require it,
because it supports the Virtual RMA (VRMA) facility.
5. The kvm_run structure
Application code obtains a pointer to the kvm_run structure by
@ -1473,6 +1612,23 @@ Userspace can now handle the hypercall and when it's done modify the gprs as
necessary. Upon guest entry all guest GPRs will then be replaced by the values
in this struct.
/* KVM_EXIT_PAPR_HCALL */
struct {
__u64 nr;
__u64 ret;
__u64 args[9];
} papr_hcall;
This is used on 64-bit PowerPC when emulating a pSeries partition,
e.g. with the 'pseries' machine type in qemu. It occurs when the
guest does a hypercall using the 'sc 1' instruction. The 'nr' field
contains the hypercall number (from the guest R3), and 'args' contains
the arguments (from the guest R4 - R12). Userspace should put the
return code in 'ret' and any extra returned values in args[].
The possible hypercalls are defined in the Power Architecture Platform
Requirements (PAPR) document available from www.power.org (free
developer registration required to access it).
/* Fix the size of the union. */
char padding[256];
};

View File

@ -165,6 +165,10 @@ Shadow pages contain the following information:
Contains the value of efer.nxe for which the page is valid.
role.cr0_wp:
Contains the value of cr0.wp for which the page is valid.
role.smep_andnot_wp:
Contains the value of cr4.smep && !cr0.wp for which the page is valid
(pages for which this is true are different from other pages; see the
treatment of cr0.wp=0 below).
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
@ -317,6 +321,20 @@ on fault type:
(user write faults generate a #PF)
In the first case there is an additional complication if CR4.SMEP is
enabled: since we've turned the page into a kernel page, the kernel may now
execute it. We handle this by also setting spte.nx. If we get a user
fetch or read fault, we'll change spte.u=1 and spte.nx=gpte.nx back.
To prevent an spte that was converted into a kernel page with cr0.wp=0
from being written by the kernel after cr0.wp has changed to 1, we make
the value of cr0.wp part of the page role. This means that an spte created
with one value of cr0.wp cannot be used when cr0.wp has a different value -
it will simply be missed by the shadow page lookup code. A similar issue
exists when an spte created with cr0.wp=0 and cr4.smep=0 is used after
changing cr4.smep to 1. To avoid this, the value of !cr0.wp && cr4.smep
is also made a part of the page role.
Large pages
===========

View File

@ -185,3 +185,37 @@ MSR_KVM_ASYNC_PF_EN: 0x4b564d02
Currently type 2 APF will be always delivered on the same vcpu as
type 1 was, but guest should not rely on that.
MSR_KVM_STEAL_TIME: 0x4b564d03
data: 64-byte alignment physical address of a memory area which must be
in guest RAM, plus an enable bit in bit 0. This memory is expected to
hold a copy of the following structure:
struct kvm_steal_time {
__u64 steal;
__u32 version;
__u32 flags;
__u32 pad[12];
}
whose data will be filled in by the hypervisor periodically. Only one
write, or registration, is needed for each VCPU. The interval between
updates of this structure is arbitrary and implementation-dependent.
The hypervisor may update this structure at any time it sees fit until
anything with bit0 == 0 is written to it. Guest is required to make sure
this structure is initialized to zero.
Fields have the following meanings:
version: a sequence counter. In other words, guest has to check
this field before and after grabbing time information and make
sure they are both equal and even. An odd version indicates an
in-progress update.
flags: At this point, always zero. May be used to indicate
changes in this structure in the future.
steal: the amount of time in which this vCPU did not run, in
nanoseconds. Time during which the vcpu is idle, will not be
reported as steal time.

View File

@ -0,0 +1,251 @@
Nested VMX
==========
Overview
---------
On Intel processors, KVM uses Intel's VMX (Virtual-Machine eXtensions)
to easily and efficiently run guest operating systems. Normally, these guests
*cannot* themselves be hypervisors running their own guests, because in VMX,
guests cannot use VMX instructions.
The "Nested VMX" feature adds this missing capability - of running guest
hypervisors (which use VMX) with their own nested guests. It does so by
allowing a guest to use VMX instructions, and correctly and efficiently
emulating them using the single level of VMX available in the hardware.
We describe in much greater detail the theory behind the nested VMX feature,
its implementation and its performance characteristics, in the OSDI 2010 paper
"The Turtles Project: Design and Implementation of Nested Virtualization",
available at:
http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf
Terminology
-----------
Single-level virtualization has two levels - the host (KVM) and the guests.
In nested virtualization, we have three levels: The host (KVM), which we call
L0, the guest hypervisor, which we call L1, and its nested guest, which we
call L2.
Known limitations
-----------------
The current code supports running Linux guests under KVM guests.
Only 64-bit guest hypervisors are supported.
Additional patches for running Windows under guest KVM, and Linux under
guest VMware server, and support for nested EPT, are currently running in
the lab, and will be sent as follow-on patchsets.
Running nested VMX
------------------
The nested VMX feature is disabled by default. It can be enabled by giving
the "nested=1" option to the kvm-intel module.
No modifications are required to user space (qemu). However, qemu's default
emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
explicitly enabled, by giving qemu one of the following options:
-cpu host (emulated CPU has all features of the real CPU)
-cpu qemu64,+vmx (add just the vmx feature to a named CPU type)
ABIs
----
Nested VMX aims to present a standard and (eventually) fully-functional VMX
implementation for the a guest hypervisor to use. As such, the official
specification of the ABI that it provides is Intel's VMX specification,
namely volume 3B of their "Intel 64 and IA-32 Architectures Software
Developer's Manual". Not all of VMX's features are currently fully supported,
but the goal is to eventually support them all, starting with the VMX features
which are used in practice by popular hypervisors (KVM and others).
As a VMX implementation, nested VMX presents a VMCS structure to L1.
As mandated by the spec, other than the two fields revision_id and abort,
this structure is *opaque* to its user, who is not supposed to know or care
about its internal structure. Rather, the structure is accessed through the
VMREAD and VMWRITE instructions.
Still, for debugging purposes, KVM developers might be interested to know the
internals of this structure; This is struct vmcs12 from arch/x86/kvm/vmx.c.
The name "vmcs12" refers to the VMCS that L1 builds for L2. In the code we
also have "vmcs01", the VMCS that L0 built for L1, and "vmcs02" is the VMCS
which L0 builds to actually run L2 - how this is done is explained in the
aforementioned paper.
For convenience, we repeat the content of struct vmcs12 here. If the internals
of this structure changes, this can break live migration across KVM versions.
VMCS12_REVISION (from vmx.c) should be changed if struct vmcs12 or its inner
struct shadow_vmcs is ever changed.
typedef u64 natural_width;
struct __packed vmcs12 {
/* According to the Intel spec, a VMCS region must start with
* these two user-visible fields */
u32 revision_id;
u32 abort;
u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
u32 padding[7]; /* room for future expansion */
u64 io_bitmap_a;
u64 io_bitmap_b;
u64 msr_bitmap;
u64 vm_exit_msr_store_addr;
u64 vm_exit_msr_load_addr;
u64 vm_entry_msr_load_addr;
u64 tsc_offset;
u64 virtual_apic_page_addr;
u64 apic_access_addr;
u64 ept_pointer;
u64 guest_physical_address;
u64 vmcs_link_pointer;
u64 guest_ia32_debugctl;
u64 guest_ia32_pat;
u64 guest_ia32_efer;
u64 guest_pdptr0;
u64 guest_pdptr1;
u64 guest_pdptr2;
u64 guest_pdptr3;
u64 host_ia32_pat;
u64 host_ia32_efer;
u64 padding64[8]; /* room for future expansion */
natural_width cr0_guest_host_mask;
natural_width cr4_guest_host_mask;
natural_width cr0_read_shadow;
natural_width cr4_read_shadow;
natural_width cr3_target_value0;
natural_width cr3_target_value1;
natural_width cr3_target_value2;
natural_width cr3_target_value3;
natural_width exit_qualification;
natural_width guest_linear_address;
natural_width guest_cr0;
natural_width guest_cr3;
natural_width guest_cr4;
natural_width guest_es_base;
natural_width guest_cs_base;
natural_width guest_ss_base;
natural_width guest_ds_base;
natural_width guest_fs_base;
natural_width guest_gs_base;
natural_width guest_ldtr_base;
natural_width guest_tr_base;
natural_width guest_gdtr_base;
natural_width guest_idtr_base;
natural_width guest_dr7;
natural_width guest_rsp;
natural_width guest_rip;
natural_width guest_rflags;
natural_width guest_pending_dbg_exceptions;
natural_width guest_sysenter_esp;
natural_width guest_sysenter_eip;
natural_width host_cr0;
natural_width host_cr3;
natural_width host_cr4;
natural_width host_fs_base;
natural_width host_gs_base;
natural_width host_tr_base;
natural_width host_gdtr_base;
natural_width host_idtr_base;
natural_width host_ia32_sysenter_esp;
natural_width host_ia32_sysenter_eip;
natural_width host_rsp;
natural_width host_rip;
natural_width paddingl[8]; /* room for future expansion */
u32 pin_based_vm_exec_control;
u32 cpu_based_vm_exec_control;
u32 exception_bitmap;
u32 page_fault_error_code_mask;
u32 page_fault_error_code_match;
u32 cr3_target_count;
u32 vm_exit_controls;
u32 vm_exit_msr_store_count;
u32 vm_exit_msr_load_count;
u32 vm_entry_controls;
u32 vm_entry_msr_load_count;
u32 vm_entry_intr_info_field;
u32 vm_entry_exception_error_code;
u32 vm_entry_instruction_len;
u32 tpr_threshold;
u32 secondary_vm_exec_control;
u32 vm_instruction_error;
u32 vm_exit_reason;
u32 vm_exit_intr_info;
u32 vm_exit_intr_error_code;
u32 idt_vectoring_info_field;
u32 idt_vectoring_error_code;
u32 vm_exit_instruction_len;
u32 vmx_instruction_info;
u32 guest_es_limit;
u32 guest_cs_limit;
u32 guest_ss_limit;
u32 guest_ds_limit;
u32 guest_fs_limit;
u32 guest_gs_limit;
u32 guest_ldtr_limit;
u32 guest_tr_limit;
u32 guest_gdtr_limit;
u32 guest_idtr_limit;
u32 guest_es_ar_bytes;
u32 guest_cs_ar_bytes;
u32 guest_ss_ar_bytes;
u32 guest_ds_ar_bytes;
u32 guest_fs_ar_bytes;
u32 guest_gs_ar_bytes;
u32 guest_ldtr_ar_bytes;
u32 guest_tr_ar_bytes;
u32 guest_interruptibility_info;
u32 guest_activity_state;
u32 guest_sysenter_cs;
u32 host_ia32_sysenter_cs;
u32 padding32[8]; /* room for future expansion */
u16 virtual_processor_id;
u16 guest_es_selector;
u16 guest_cs_selector;
u16 guest_ss_selector;
u16 guest_ds_selector;
u16 guest_fs_selector;
u16 guest_gs_selector;
u16 guest_ldtr_selector;
u16 guest_tr_selector;
u16 host_es_selector;
u16 host_cs_selector;
u16 host_ss_selector;
u16 host_ds_selector;
u16 host_fs_selector;
u16 host_gs_selector;
u16 host_tr_selector;
};
Authors
-------
These patches were written by:
Abel Gordon, abelg <at> il.ibm.com
Nadav Har'El, nyh <at> il.ibm.com
Orit Wasserman, oritw <at> il.ibm.com
Ben-Ami Yassor, benami <at> il.ibm.com
Muli Ben-Yehuda, muli <at> il.ibm.com
With contributions by:
Anthony Liguori, aliguori <at> us.ibm.com
Mike Day, mdday <at> us.ibm.com
Michael Factor, factor <at> il.ibm.com
Zvi Dubitzky, dubi <at> il.ibm.com
And valuable reviews by:
Avi Kivity, avi <at> redhat.com
Gleb Natapov, gleb <at> redhat.com
Marcelo Tosatti, mtosatti <at> redhat.com
Kevin Tian, kevin.tian <at> intel.com
and others.

View File

@ -68,9 +68,11 @@ page that contains parts of supervisor visible register state. The guest can
map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
With this hypercall issued the guest always gets the magic page mapped at the
desired location in effective and physical address space. For now, we always
map the page to -4096. This way we can access it using absolute load and store
functions. The following instruction reads the first field of the magic page:
desired location. The first parameter indicates the effective address when the
MMU is enabled. The second parameter indicates the address in real mode, if
applicable to the target. For now, we always map the page to -4096. This way we
can access it using absolute load and store functions. The following
instruction reads the first field of the magic page:
ld rX, -4096(0)

View File

@ -281,6 +281,10 @@ paravirt_init_missing_ticks_accounting(int cpu)
pv_time_ops.init_missing_ticks_accounting(cpu);
}
struct jump_label_key;
extern struct jump_label_key paravirt_steal_enabled;
extern struct jump_label_key paravirt_steal_rq_enabled;
static inline int
paravirt_do_steal_accounting(unsigned long *new_itm)
{

View File

@ -634,6 +634,8 @@ struct pv_irq_ops pv_irq_ops = {
* pv_time_ops
* time operations
*/
struct jump_label_key paravirt_steal_enabled;
struct jump_label_key paravirt_steal_rq_enabled;
static int
ia64_native_do_steal_accounting(unsigned long *new_itm)

View File

@ -179,8 +179,9 @@ extern const char *powerpc_base_platform;
#define LONG_ASM_CONST(x) 0
#endif
#define CPU_FTR_HVMODE_206 LONG_ASM_CONST(0x0000000800000000)
#define CPU_FTR_HVMODE LONG_ASM_CONST(0x0000000200000000)
#define CPU_FTR_ARCH_201 LONG_ASM_CONST(0x0000000400000000)
#define CPU_FTR_ARCH_206 LONG_ASM_CONST(0x0000000800000000)
#define CPU_FTR_CFAR LONG_ASM_CONST(0x0000001000000000)
#define CPU_FTR_IABR LONG_ASM_CONST(0x0000002000000000)
#define CPU_FTR_MMCRA LONG_ASM_CONST(0x0000004000000000)
@ -401,9 +402,10 @@ extern const char *powerpc_base_platform;
CPU_FTR_MMCRA | CPU_FTR_CP_USE_DCBTZ | \
CPU_FTR_STCX_CHECKS_ADDRESS)
#define CPU_FTRS_PPC970 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_201 | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_CAN_NAP | CPU_FTR_MMCRA | \
CPU_FTR_CP_USE_DCBTZ | CPU_FTR_STCX_CHECKS_ADDRESS)
CPU_FTR_CP_USE_DCBTZ | CPU_FTR_STCX_CHECKS_ADDRESS | \
CPU_FTR_HVMODE)
#define CPU_FTRS_POWER5 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
@ -417,13 +419,13 @@ extern const char *powerpc_base_platform;
CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_CFAR)
#define CPU_FTRS_POWER7 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_HVMODE_206 |\
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_ICSWX | CPU_FTR_CFAR)
CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE)
#define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \

View File

@ -61,19 +61,22 @@
#define EXC_HV H
#define EXC_STD
#define EXCEPTION_PROLOG_1(area) \
#define __EXCEPTION_PROLOG_1(area, extra, vec) \
GET_PACA(r13); \
std r9,area+EX_R9(r13); /* save r9 - r12 */ \
std r10,area+EX_R10(r13); \
std r11,area+EX_R11(r13); \
std r12,area+EX_R12(r13); \
BEGIN_FTR_SECTION_NESTED(66); \
mfspr r10,SPRN_CFAR; \
std r10,area+EX_CFAR(r13); \
END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66); \
GET_SCRATCH0(r9); \
std r9,area+EX_R13(r13); \
mfcr r9
mfcr r9; \
extra(vec); \
std r11,area+EX_R11(r13); \
std r12,area+EX_R12(r13); \
GET_SCRATCH0(r10); \
std r10,area+EX_R13(r13)
#define EXCEPTION_PROLOG_1(area, extra, vec) \
__EXCEPTION_PROLOG_1(area, extra, vec)
#define __EXCEPTION_PROLOG_PSERIES_1(label, h) \
ld r12,PACAKBASE(r13); /* get high part of &label */ \
@ -85,13 +88,65 @@
mtspr SPRN_##h##SRR1,r10; \
h##rfid; \
b . /* prevent speculative execution */
#define EXCEPTION_PROLOG_PSERIES_1(label, h) \
#define EXCEPTION_PROLOG_PSERIES_1(label, h) \
__EXCEPTION_PROLOG_PSERIES_1(label, h)
#define EXCEPTION_PROLOG_PSERIES(area, label, h) \
EXCEPTION_PROLOG_1(area); \
#define EXCEPTION_PROLOG_PSERIES(area, label, h, extra, vec) \
EXCEPTION_PROLOG_1(area, extra, vec); \
EXCEPTION_PROLOG_PSERIES_1(label, h);
#define __KVMTEST(n) \
lbz r10,HSTATE_IN_GUEST(r13); \
cmpwi r10,0; \
bne do_kvm_##n
#define __KVM_HANDLER(area, h, n) \
do_kvm_##n: \
ld r10,area+EX_R10(r13); \
stw r9,HSTATE_SCRATCH1(r13); \
ld r9,area+EX_R9(r13); \
std r12,HSTATE_SCRATCH0(r13); \
li r12,n; \
b kvmppc_interrupt
#define __KVM_HANDLER_SKIP(area, h, n) \
do_kvm_##n: \
cmpwi r10,KVM_GUEST_MODE_SKIP; \
ld r10,area+EX_R10(r13); \
beq 89f; \
stw r9,HSTATE_SCRATCH1(r13); \
ld r9,area+EX_R9(r13); \
std r12,HSTATE_SCRATCH0(r13); \
li r12,n; \
b kvmppc_interrupt; \
89: mtocrf 0x80,r9; \
ld r9,area+EX_R9(r13); \
b kvmppc_skip_##h##interrupt
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#define KVMTEST(n) __KVMTEST(n)
#define KVM_HANDLER(area, h, n) __KVM_HANDLER(area, h, n)
#define KVM_HANDLER_SKIP(area, h, n) __KVM_HANDLER_SKIP(area, h, n)
#else
#define KVMTEST(n)
#define KVM_HANDLER(area, h, n)
#define KVM_HANDLER_SKIP(area, h, n)
#endif
#ifdef CONFIG_KVM_BOOK3S_PR
#define KVMTEST_PR(n) __KVMTEST(n)
#define KVM_HANDLER_PR(area, h, n) __KVM_HANDLER(area, h, n)
#define KVM_HANDLER_PR_SKIP(area, h, n) __KVM_HANDLER_SKIP(area, h, n)
#else
#define KVMTEST_PR(n)
#define KVM_HANDLER_PR(area, h, n)
#define KVM_HANDLER_PR_SKIP(area, h, n)
#endif
#define NOTEST(n)
/*
* The common exception prolog is used for all except a few exceptions
* such as a segment miss on a kernel address. We have to be prepared
@ -164,57 +219,58 @@
.globl label##_pSeries; \
label##_pSeries: \
HMT_MEDIUM; \
DO_KVM vec; \
SET_SCRATCH0(r13); /* save r13 */ \
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common, EXC_STD)
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
EXC_STD, KVMTEST_PR, vec)
#define STD_EXCEPTION_HV(loc, vec, label) \
. = loc; \
.globl label##_hv; \
label##_hv: \
HMT_MEDIUM; \
DO_KVM vec; \
SET_SCRATCH0(r13); /* save r13 */ \
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common, EXC_HV)
SET_SCRATCH0(r13); /* save r13 */ \
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
EXC_HV, KVMTEST, vec)
#define __MASKABLE_EXCEPTION_PSERIES(vec, label, h) \
HMT_MEDIUM; \
DO_KVM vec; \
SET_SCRATCH0(r13); /* save r13 */ \
GET_PACA(r13); \
std r9,PACA_EXGEN+EX_R9(r13); /* save r9, r10 */ \
std r10,PACA_EXGEN+EX_R10(r13); \
#define __SOFTEN_TEST(h) \
lbz r10,PACASOFTIRQEN(r13); \
mfcr r9; \
cmpwi r10,0; \
beq masked_##h##interrupt; \
GET_SCRATCH0(r10); \
std r10,PACA_EXGEN+EX_R13(r13); \
std r11,PACA_EXGEN+EX_R11(r13); \
std r12,PACA_EXGEN+EX_R12(r13); \
ld r12,PACAKBASE(r13); /* get high part of &label */ \
ld r10,PACAKMSR(r13); /* get MSR value for kernel */ \
mfspr r11,SPRN_##h##SRR0; /* save SRR0 */ \
LOAD_HANDLER(r12,label##_common) \
mtspr SPRN_##h##SRR0,r12; \
mfspr r12,SPRN_##h##SRR1; /* and SRR1 */ \
mtspr SPRN_##h##SRR1,r10; \
h##rfid; \
b . /* prevent speculative execution */
#define _MASKABLE_EXCEPTION_PSERIES(vec, label, h) \
__MASKABLE_EXCEPTION_PSERIES(vec, label, h)
beq masked_##h##interrupt
#define _SOFTEN_TEST(h) __SOFTEN_TEST(h)
#define SOFTEN_TEST_PR(vec) \
KVMTEST_PR(vec); \
_SOFTEN_TEST(EXC_STD)
#define SOFTEN_TEST_HV(vec) \
KVMTEST(vec); \
_SOFTEN_TEST(EXC_HV)
#define SOFTEN_TEST_HV_201(vec) \
KVMTEST(vec); \
_SOFTEN_TEST(EXC_STD)
#define __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra) \
HMT_MEDIUM; \
SET_SCRATCH0(r13); /* save r13 */ \
__EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec); \
EXCEPTION_PROLOG_PSERIES_1(label##_common, h);
#define _MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra) \
__MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra)
#define MASKABLE_EXCEPTION_PSERIES(loc, vec, label) \
. = loc; \
.globl label##_pSeries; \
label##_pSeries: \
_MASKABLE_EXCEPTION_PSERIES(vec, label, EXC_STD)
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
EXC_STD, SOFTEN_TEST_PR)
#define MASKABLE_EXCEPTION_HV(loc, vec, label) \
. = loc; \
.globl label##_hv; \
label##_hv: \
_MASKABLE_EXCEPTION_PSERIES(vec, label, EXC_HV)
_MASKABLE_EXCEPTION_PSERIES(vec, label, \
EXC_HV, SOFTEN_TEST_HV)
#ifdef CONFIG_PPC_ISERIES
#define DISABLE_INTS \

View File

@ -29,6 +29,10 @@
#define H_LONG_BUSY_ORDER_100_SEC 9905 /* Long busy, hint that 100sec \
is a good time to retry */
#define H_LONG_BUSY_END_RANGE 9905 /* End of long busy range */
/* Internal value used in book3s_hv kvm support; not returned to guests */
#define H_TOO_HARD 9999
#define H_HARDWARE -1 /* Hardware error */
#define H_FUNCTION -2 /* Function not supported */
#define H_PRIVILEGE -3 /* Caller not privileged */
@ -100,6 +104,7 @@
#define H_PAGE_SET_ACTIVE H_PAGE_STATE_CHANGE
#define H_AVPN (1UL<<(63-32)) /* An avpn is provided as a sanity test */
#define H_ANDCOND (1UL<<(63-33))
#define H_LOCAL (1UL<<(63-35))
#define H_ICACHE_INVALIDATE (1UL<<(63-40)) /* icbi, etc. (ignored for IO pages) */
#define H_ICACHE_SYNCHRONIZE (1UL<<(63-41)) /* dcbst, icbi, etc (ignored for IO pages */
#define H_COALESCE_CAND (1UL<<(63-42)) /* page is a good candidate for coalescing */

View File

@ -22,6 +22,10 @@
#include <linux/types.h>
/* Select powerpc specific features in <linux/kvm.h> */
#define __KVM_HAVE_SPAPR_TCE
#define __KVM_HAVE_PPC_SMT
struct kvm_regs {
__u64 pc;
__u64 cr;
@ -272,4 +276,15 @@ struct kvm_guest_debug_arch {
#define KVM_INTERRUPT_UNSET -2U
#define KVM_INTERRUPT_SET_LEVEL -3U
/* for KVM_CAP_SPAPR_TCE */
struct kvm_create_spapr_tce {
__u64 liobn;
__u32 window_size;
};
/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
__u64 rma_size;
};
#endif /* __LINUX_KVM_POWERPC_H */

View File

@ -64,8 +64,12 @@
#define BOOK3S_INTERRUPT_PROGRAM 0x700
#define BOOK3S_INTERRUPT_FP_UNAVAIL 0x800
#define BOOK3S_INTERRUPT_DECREMENTER 0x900
#define BOOK3S_INTERRUPT_HV_DECREMENTER 0x980
#define BOOK3S_INTERRUPT_SYSCALL 0xc00
#define BOOK3S_INTERRUPT_TRACE 0xd00
#define BOOK3S_INTERRUPT_H_DATA_STORAGE 0xe00
#define BOOK3S_INTERRUPT_H_INST_STORAGE 0xe20
#define BOOK3S_INTERRUPT_H_EMUL_ASSIST 0xe40
#define BOOK3S_INTERRUPT_PERFMON 0xf00
#define BOOK3S_INTERRUPT_ALTIVEC 0xf20
#define BOOK3S_INTERRUPT_VSX 0xf40

View File

@ -24,20 +24,6 @@
#include <linux/kvm_host.h>
#include <asm/kvm_book3s_asm.h>
struct kvmppc_slb {
u64 esid;
u64 vsid;
u64 orige;
u64 origv;
bool valid : 1;
bool Ks : 1;
bool Kp : 1;
bool nx : 1;
bool large : 1; /* PTEs are 16MB */
bool tb : 1; /* 1TB segment */
bool class : 1;
};
struct kvmppc_bat {
u64 raw;
u32 bepi;
@ -67,11 +53,22 @@ struct kvmppc_sid_map {
#define VSID_POOL_SIZE (SID_CONTEXTS * 16)
#endif
struct hpte_cache {
struct hlist_node list_pte;
struct hlist_node list_pte_long;
struct hlist_node list_vpte;
struct hlist_node list_vpte_long;
struct rcu_head rcu_head;
u64 host_va;
u64 pfn;
ulong slot;
struct kvmppc_pte pte;
};
struct kvmppc_vcpu_book3s {
struct kvm_vcpu vcpu;
struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
struct kvmppc_sid_map sid_map[SID_MAP_NUM];
struct kvmppc_slb slb[64];
struct {
u64 esid;
u64 vsid;
@ -81,7 +78,6 @@ struct kvmppc_vcpu_book3s {
struct kvmppc_bat dbat[8];
u64 hid[6];
u64 gqr[8];
int slb_nr;
u64 sdr1;
u64 hior;
u64 msr_mask;
@ -93,7 +89,13 @@ struct kvmppc_vcpu_book3s {
u64 vsid_max;
#endif
int context_id[SID_CONTEXTS];
ulong prog_flags; /* flags to inject when giving a 700 trap */
struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE];
struct hlist_head hpte_hash_pte_long[HPTEG_HASH_NUM_PTE_LONG];
struct hlist_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE];
struct hlist_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG];
int hpte_cache_count;
spinlock_t mmu_lock;
};
#define CONTEXT_HOST 0
@ -110,8 +112,10 @@ extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong ea_mask)
extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end);
extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
extern void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr);
extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu);
extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
@ -123,19 +127,22 @@ extern int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte);
extern int kvmppc_mmu_hpte_sysinit(void);
extern void kvmppc_mmu_hpte_sysexit(void);
extern int kvmppc_mmu_hv_init(void);
extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
extern void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags);
extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
bool upper, u32 val);
extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu);
extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
extern ulong kvmppc_trampoline_lowmem;
extern ulong kvmppc_trampoline_enter;
extern void kvmppc_handler_lowmem_trampoline(void);
extern void kvmppc_handler_trampoline_enter(void);
extern void kvmppc_rmcall(ulong srr0, ulong srr1);
extern void kvmppc_hv_entry_trampoline(void);
extern void kvmppc_load_up_fpu(void);
extern void kvmppc_load_up_altivec(void);
extern void kvmppc_load_up_vsx(void);
@ -147,15 +154,32 @@ static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
return container_of(vcpu, struct kvmppc_vcpu_book3s, vcpu);
}
static inline ulong dsisr(void)
extern void kvm_return_point(void);
/* Also add subarch specific defines */
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
#include <asm/kvm_book3s_32.h>
#endif
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#include <asm/kvm_book3s_64.h>
#endif
#ifdef CONFIG_KVM_BOOK3S_PR
static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
{
ulong r;
asm ( "mfdsisr %0 " : "=r" (r) );
return r;
return to_book3s(vcpu)->hior;
}
extern void kvm_return_point(void);
static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu);
static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
unsigned long pending_now, unsigned long old_pending)
{
if (pending_now)
vcpu->arch.shared->int_pending = 1;
else if (old_pending)
vcpu->arch.shared->int_pending = 0;
}
static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
{
@ -244,6 +268,120 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
return to_svcpu(vcpu)->fault_dar;
}
static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
{
ulong crit_raw = vcpu->arch.shared->critical;
ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
bool crit;
/* Truncate crit indicators in 32 bit mode */
if (!(vcpu->arch.shared->msr & MSR_SF)) {
crit_raw &= 0xffffffff;
crit_r1 &= 0xffffffff;
}
/* Critical section when crit == r1 */
crit = (crit_raw == crit_r1);
/* ... and we're in supervisor mode */
crit = crit && !(vcpu->arch.shared->msr & MSR_PR);
return crit;
}
#else /* CONFIG_KVM_BOOK3S_PR */
static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
{
return 0;
}
static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
unsigned long pending_now, unsigned long old_pending)
{
}
static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
{
vcpu->arch.gpr[num] = val;
}
static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
{
return vcpu->arch.gpr[num];
}
static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
{
vcpu->arch.cr = val;
}
static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.cr;
}
static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
{
vcpu->arch.xer = val;
}
static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
{
return vcpu->arch.xer;
}
static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
{
vcpu->arch.ctr = val;
}
static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.ctr;
}
static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
{
vcpu->arch.lr = val;
}
static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.lr;
}
static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
{
vcpu->arch.pc = val;
}
static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
{
return vcpu->arch.pc;
}
static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
{
ulong pc = kvmppc_get_pc(vcpu);
/* Load the instruction manually if it failed to do so in the
* exit path */
if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
return vcpu->arch.last_inst;
}
static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
{
return vcpu->arch.fault_dar;
}
static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
{
return false;
}
#endif
/* Magic register values loaded into r3 and r4 before the 'sc' assembly
* instruction for the OSI hypercalls */
#define OSI_SC_MAGIC_R3 0x113724FA
@ -251,12 +389,4 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
#define INS_DCBZ 0x7c0007ec
/* Also add subarch specific defines */
#ifdef CONFIG_PPC_BOOK3S_32
#include <asm/kvm_book3s_32.h>
#else
#include <asm/kvm_book3s_64.h>
#endif
#endif /* __ASM_KVM_BOOK3S_H__ */

View File

@ -20,9 +20,13 @@
#ifndef __ASM_KVM_BOOK3S_64_H__
#define __ASM_KVM_BOOK3S_64_H__
#ifdef CONFIG_KVM_BOOK3S_PR
static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
{
return &get_paca()->shadow_vcpu;
}
#endif
#define SPAPR_TCE_SHIFT 12
#endif /* __ASM_KVM_BOOK3S_64_H__ */

View File

@ -60,6 +60,36 @@ kvmppc_resume_\intno:
#else /*__ASSEMBLY__ */
/*
* This struct goes in the PACA on 64-bit processors. It is used
* to store host state that needs to be saved when we enter a guest
* and restored when we exit, but isn't specific to any particular
* guest or vcpu. It also has some scratch fields used by the guest
* exit code.
*/
struct kvmppc_host_state {
ulong host_r1;
ulong host_r2;
ulong host_msr;
ulong vmhandler;
ulong scratch0;
ulong scratch1;
u8 in_guest;
#ifdef CONFIG_KVM_BOOK3S_64_HV
struct kvm_vcpu *kvm_vcpu;
struct kvmppc_vcore *kvm_vcore;
unsigned long xics_phys;
u64 dabr;
u64 host_mmcr[3];
u32 host_pmc[8];
u64 host_purr;
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
#endif
};
struct kvmppc_book3s_shadow_vcpu {
ulong gpr[14];
u32 cr;
@ -73,17 +103,12 @@ struct kvmppc_book3s_shadow_vcpu {
ulong shadow_srr1;
ulong fault_dar;
ulong host_r1;
ulong host_r2;
ulong handler;
ulong scratch0;
ulong scratch1;
ulong vmhandler;
u8 in_guest;
#ifdef CONFIG_PPC_BOOK3S_32
u32 sr[16]; /* Guest SRs */
struct kvmppc_host_state hstate;
#endif
#ifdef CONFIG_PPC_BOOK3S_64
u8 slb_max; /* highest used guest slb entry */
struct {

View File

@ -93,4 +93,8 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
return vcpu->arch.fault_dear;
}
static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.shared->msr;
}
#endif /* __ASM_KVM_BOOKE_H__ */

View File

@ -1,5 +1,5 @@
/*
* Copyright (C) 2008 Freescale Semiconductor, Inc. All rights reserved.
* Copyright (C) 2008-2011 Freescale Semiconductor, Inc. All rights reserved.
*
* Author: Yu Liu, <yu.liu@freescale.com>
*
@ -29,17 +29,25 @@ struct tlbe{
u32 mas7;
};
#define E500_TLB_VALID 1
#define E500_TLB_DIRTY 2
struct tlbe_priv {
pfn_t pfn;
unsigned int flags; /* E500_TLB_* */
};
struct vcpu_id_table;
struct kvmppc_vcpu_e500 {
/* Unmodified copy of the guest's TLB. */
struct tlbe *guest_tlb[E500_TLB_NUM];
/* TLB that's actually used when the guest is running. */
struct tlbe *shadow_tlb[E500_TLB_NUM];
/* Pages which are referenced in the shadow TLB. */
struct page **shadow_pages[E500_TLB_NUM];
struct tlbe *gtlb_arch[E500_TLB_NUM];
unsigned int guest_tlb_size[E500_TLB_NUM];
unsigned int shadow_tlb_size[E500_TLB_NUM];
unsigned int guest_tlb_nv[E500_TLB_NUM];
/* KVM internal information associated with each guest TLB entry */
struct tlbe_priv *gtlb_priv[E500_TLB_NUM];
unsigned int gtlb_size[E500_TLB_NUM];
unsigned int gtlb_nv[E500_TLB_NUM];
u32 host_pid[E500_PID_NUM];
u32 pid[E500_PID_NUM];
@ -53,6 +61,10 @@ struct kvmppc_vcpu_e500 {
u32 mas5;
u32 mas6;
u32 mas7;
/* vcpu id table */
struct vcpu_id_table *idt;
u32 l1csr0;
u32 l1csr1;
u32 hid0;

View File

@ -25,15 +25,23 @@
#include <linux/interrupt.h>
#include <linux/types.h>
#include <linux/kvm_types.h>
#include <linux/threads.h>
#include <linux/spinlock.h>
#include <linux/kvm_para.h>
#include <linux/list.h>
#include <linux/atomic.h>
#include <asm/kvm_asm.h>
#include <asm/processor.h>
#define KVM_MAX_VCPUS 1
#define KVM_MAX_VCPUS NR_CPUS
#define KVM_MAX_VCORES NR_CPUS
#define KVM_MEMORY_SLOTS 32
/* memory slots that does not exposed to userspace */
#define KVM_PRIVATE_MEM_SLOTS 4
#ifdef CONFIG_KVM_MMIO
#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
#endif
/* We don't currently support large pages. */
#define KVM_HPAGE_GFN_SHIFT(x) 0
@ -57,6 +65,10 @@ struct kvm;
struct kvm_run;
struct kvm_vcpu;
struct lppaca;
struct slb_shadow;
struct dtl;
struct kvm_vm_stat {
u32 remote_tlb_flush;
};
@ -133,9 +145,74 @@ struct kvmppc_exit_timing {
};
};
struct kvm_arch {
struct kvmppc_pginfo {
unsigned long pfn;
atomic_t refcnt;
};
struct kvmppc_spapr_tce_table {
struct list_head list;
struct kvm *kvm;
u64 liobn;
u32 window_size;
struct page *pages[0];
};
struct kvmppc_rma_info {
void *base_virt;
unsigned long base_pfn;
unsigned long npages;
struct list_head list;
atomic_t use_count;
};
struct kvm_arch {
#ifdef CONFIG_KVM_BOOK3S_64_HV
unsigned long hpt_virt;
unsigned long ram_npages;
unsigned long ram_psize;
unsigned long ram_porder;
struct kvmppc_pginfo *ram_pginfo;
unsigned int lpid;
unsigned int host_lpid;
unsigned long host_lpcr;
unsigned long sdr1;
unsigned long host_sdr1;
int tlbie_lock;
int n_rma_pages;
unsigned long lpcr;
unsigned long rmor;
struct kvmppc_rma_info *rma;
struct list_head spapr_tce_tables;
unsigned short last_vcpu[NR_CPUS];
struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
#endif /* CONFIG_KVM_BOOK3S_64_HV */
};
/*
* Struct for a virtual core.
* Note: entry_exit_count combines an entry count in the bottom 8 bits
* and an exit count in the next 8 bits. This is so that we can
* atomically increment the entry count iff the exit count is 0
* without taking the lock.
*/
struct kvmppc_vcore {
int n_runnable;
int n_blocked;
int num_threads;
int entry_exit_count;
int n_woken;
int nap_count;
u16 pcpu;
u8 vcore_running;
u8 in_guest;
struct list_head runnable_threads;
spinlock_t lock;
};
#define VCORE_ENTRY_COUNT(vc) ((vc)->entry_exit_count & 0xff)
#define VCORE_EXIT_COUNT(vc) ((vc)->entry_exit_count >> 8)
struct kvmppc_pte {
ulong eaddr;
u64 vpage;
@ -163,16 +240,18 @@ struct kvmppc_mmu {
bool (*is_dcbz32)(struct kvm_vcpu *vcpu);
};
struct hpte_cache {
struct hlist_node list_pte;
struct hlist_node list_pte_long;
struct hlist_node list_vpte;
struct hlist_node list_vpte_long;
struct rcu_head rcu_head;
u64 host_va;
u64 pfn;
ulong slot;
struct kvmppc_pte pte;
struct kvmppc_slb {
u64 esid;
u64 vsid;
u64 orige;
u64 origv;
bool valid : 1;
bool Ks : 1;
bool Kp : 1;
bool nx : 1;
bool large : 1; /* PTEs are 16MB */
bool tb : 1; /* 1TB segment */
bool class : 1;
};
struct kvm_vcpu_arch {
@ -187,6 +266,9 @@ struct kvm_vcpu_arch {
ulong highmem_handler;
ulong rmcall;
ulong host_paca_phys;
struct kvmppc_slb slb[64];
int slb_max; /* 1 + index of last valid entry in slb[] */
int slb_nr; /* total number of entries in SLB */
struct kvmppc_mmu mmu;
#endif
@ -195,13 +277,19 @@ struct kvm_vcpu_arch {
u64 fpr[32];
u64 fpscr;
#ifdef CONFIG_SPE
ulong evr[32];
ulong spefscr;
ulong host_spefscr;
u64 acc;
#endif
#ifdef CONFIG_ALTIVEC
vector128 vr[32];
vector128 vscr;
#endif
#ifdef CONFIG_VSX
u64 vsr[32];
u64 vsr[64];
#endif
#ifdef CONFIG_PPC_BOOK3S
@ -209,22 +297,27 @@ struct kvm_vcpu_arch {
u32 qpr[32];
#endif
#ifdef CONFIG_BOOKE
ulong pc;
ulong ctr;
ulong lr;
ulong xer;
u32 cr;
#endif
#ifdef CONFIG_PPC_BOOK3S
ulong shadow_msr;
ulong hflags;
ulong guest_owned_ext;
ulong purr;
ulong spurr;
ulong dscr;
ulong amr;
ulong uamor;
u32 ctrl;
ulong dabr;
#endif
u32 vrsave; /* also USPRG0 */
u32 mmucr;
ulong shadow_msr;
ulong sprg4;
ulong sprg5;
ulong sprg6;
@ -249,6 +342,7 @@ struct kvm_vcpu_arch {
u32 pvr;
u32 shadow_pid;
u32 shadow_pid1;
u32 pid;
u32 swap_pid;
@ -258,6 +352,9 @@ struct kvm_vcpu_arch {
u32 dbcr1;
u32 dbsr;
u64 mmcr[3];
u32 pmc[8];
#ifdef CONFIG_KVM_EXIT_TIMING
struct mutex exit_timing_lock;
struct kvmppc_exit_timing timing_exit;
@ -272,8 +369,12 @@ struct kvm_vcpu_arch {
struct dentry *debugfs_exit_timing;
#endif
#ifdef CONFIG_PPC_BOOK3S
ulong fault_dar;
u32 fault_dsisr;
#endif
#ifdef CONFIG_BOOKE
u32 last_inst;
ulong fault_dear;
ulong fault_esr;
ulong queued_dear;
@ -288,25 +389,47 @@ struct kvm_vcpu_arch {
u8 dcr_is_write;
u8 osi_needed;
u8 osi_enabled;
u8 hcall_needed;
u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
struct hrtimer dec_timer;
struct tasklet_struct tasklet;
u64 dec_jiffies;
u64 dec_expires;
unsigned long pending_exceptions;
u16 last_cpu;
u8 ceded;
u8 prodded;
u32 last_inst;
struct lppaca *vpa;
struct slb_shadow *slb_shadow;
struct dtl *dtl;
struct dtl *dtl_end;
struct kvmppc_vcore *vcore;
int ret;
int trap;
int state;
int ptid;
wait_queue_head_t cpu_run;
struct kvm_vcpu_arch_shared *shared;
unsigned long magic_page_pa; /* phys addr to map the magic page to */
unsigned long magic_page_ea; /* effect. addr to map the magic page to */
#ifdef CONFIG_PPC_BOOK3S
struct hlist_head hpte_hash_pte[HPTEG_HASH_NUM_PTE];
struct hlist_head hpte_hash_pte_long[HPTEG_HASH_NUM_PTE_LONG];
struct hlist_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE];
struct hlist_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG];
int hpte_cache_count;
spinlock_t mmu_lock;
#ifdef CONFIG_KVM_BOOK3S_64_HV
struct kvm_vcpu_arch_shared shregs;
struct list_head run_list;
struct task_struct *run_task;
struct kvm_run *kvm_run;
#endif
};
#define KVMPPC_VCPU_BUSY_IN_HOST 0
#define KVMPPC_VCPU_BLOCKED 1
#define KVMPPC_VCPU_RUNNABLE 2
#endif /* __POWERPC_KVM_HOST_H__ */

View File

@ -33,6 +33,9 @@
#else
#include <asm/kvm_booke.h>
#endif
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#include <asm/paca.h>
#endif
enum emulation_result {
EMULATE_DONE, /* no further processing */
@ -42,6 +45,7 @@ enum emulation_result {
EMULATE_AGAIN, /* something went wrong. go again */
};
extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
extern char kvmppc_handlers_start[];
extern unsigned long kvmppc_handler_len;
@ -109,6 +113,27 @@ extern void kvmppc_booke_exit(void);
extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
extern long kvmppc_alloc_hpt(struct kvm *kvm);
extern void kvmppc_free_hpt(struct kvm *kvm);
extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
extern void kvmppc_map_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm,
struct kvm_allocate_rma *rma);
extern struct kvmppc_rma_info *kvm_alloc_rma(void);
extern void kvm_release_rma(struct kvmppc_rma_info *ri);
extern int kvmppc_core_init_vm(struct kvm *kvm);
extern void kvmppc_core_destroy_vm(struct kvm *kvm);
extern int kvmppc_core_prepare_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
extern void kvmppc_core_commit_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
/*
* Cuts out inst bits with ordering according to spec.
@ -151,4 +176,20 @@ int kvmppc_set_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid);
#ifdef CONFIG_KVM_BOOK3S_64_HV
static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
{
paca[cpu].kvm_hstate.xics_phys = addr;
}
extern void kvm_rma_init(void);
#else
static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
{}
static inline void kvm_rma_init(void)
{}
#endif
#endif /* __POWERPC_KVM_PPC_H__ */

View File

@ -90,13 +90,19 @@ extern char initial_stab[];
#define HPTE_R_PP0 ASM_CONST(0x8000000000000000)
#define HPTE_R_TS ASM_CONST(0x4000000000000000)
#define HPTE_R_KEY_HI ASM_CONST(0x3000000000000000)
#define HPTE_R_RPN_SHIFT 12
#define HPTE_R_RPN ASM_CONST(0x3ffffffffffff000)
#define HPTE_R_FLAGS ASM_CONST(0x00000000000003ff)
#define HPTE_R_RPN ASM_CONST(0x0ffffffffffff000)
#define HPTE_R_PP ASM_CONST(0x0000000000000003)
#define HPTE_R_N ASM_CONST(0x0000000000000004)
#define HPTE_R_G ASM_CONST(0x0000000000000008)
#define HPTE_R_M ASM_CONST(0x0000000000000010)
#define HPTE_R_I ASM_CONST(0x0000000000000020)
#define HPTE_R_W ASM_CONST(0x0000000000000040)
#define HPTE_R_WIMG ASM_CONST(0x0000000000000078)
#define HPTE_R_C ASM_CONST(0x0000000000000080)
#define HPTE_R_R ASM_CONST(0x0000000000000100)
#define HPTE_R_KEY_LO ASM_CONST(0x0000000000000e00)
#define HPTE_V_1TB_SEG ASM_CONST(0x4000000000000000)
#define HPTE_V_VRMA_MASK ASM_CONST(0x4001ffffff000000)

View File

@ -147,8 +147,11 @@ struct paca_struct {
struct dtl_entry *dtl_curr; /* pointer corresponding to dtl_ridx */
#ifdef CONFIG_KVM_BOOK3S_HANDLER
#ifdef CONFIG_KVM_BOOK3S_PR
/* We use this to store guest state in */
struct kvmppc_book3s_shadow_vcpu shadow_vcpu;
#endif
struct kvmppc_host_state kvm_hstate;
#endif
};

View File

@ -150,18 +150,22 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
#define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
#define SAVE_2EVRS(n,s,base) SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
#define SAVE_4EVRS(n,s,base) SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
#define SAVE_8EVRS(n,s,base) SAVE_4EVRS(n,s,base); SAVE_4EVRS(n+4,s,base)
#define SAVE_16EVRS(n,s,base) SAVE_8EVRS(n,s,base); SAVE_8EVRS(n+8,s,base)
#define SAVE_32EVRS(n,s,base) SAVE_16EVRS(n,s,base); SAVE_16EVRS(n+16,s,base)
#define REST_EVR(n,s,base) lwz s,THREAD_EVR0+4*(n)(base); evmergelo n,s,n
#define REST_2EVRS(n,s,base) REST_EVR(n,s,base); REST_EVR(n+1,s,base)
#define REST_4EVRS(n,s,base) REST_2EVRS(n,s,base); REST_2EVRS(n+2,s,base)
#define REST_8EVRS(n,s,base) REST_4EVRS(n,s,base); REST_4EVRS(n+4,s,base)
#define REST_16EVRS(n,s,base) REST_8EVRS(n,s,base); REST_8EVRS(n+8,s,base)
#define REST_32EVRS(n,s,base) REST_16EVRS(n,s,base); REST_16EVRS(n+16,s,base)
/*
* b = base register for addressing, o = base offset from register of 1st EVR
* n = first EVR, s = scratch
*/
#define SAVE_EVR(n,s,b,o) evmergehi s,s,n; stw s,o+4*(n)(b)
#define SAVE_2EVRS(n,s,b,o) SAVE_EVR(n,s,b,o); SAVE_EVR(n+1,s,b,o)
#define SAVE_4EVRS(n,s,b,o) SAVE_2EVRS(n,s,b,o); SAVE_2EVRS(n+2,s,b,o)
#define SAVE_8EVRS(n,s,b,o) SAVE_4EVRS(n,s,b,o); SAVE_4EVRS(n+4,s,b,o)
#define SAVE_16EVRS(n,s,b,o) SAVE_8EVRS(n,s,b,o); SAVE_8EVRS(n+8,s,b,o)
#define SAVE_32EVRS(n,s,b,o) SAVE_16EVRS(n,s,b,o); SAVE_16EVRS(n+16,s,b,o)
#define REST_EVR(n,s,b,o) lwz s,o+4*(n)(b); evmergelo n,s,n
#define REST_2EVRS(n,s,b,o) REST_EVR(n,s,b,o); REST_EVR(n+1,s,b,o)
#define REST_4EVRS(n,s,b,o) REST_2EVRS(n,s,b,o); REST_2EVRS(n+2,s,b,o)
#define REST_8EVRS(n,s,b,o) REST_4EVRS(n,s,b,o); REST_4EVRS(n+4,s,b,o)
#define REST_16EVRS(n,s,b,o) REST_8EVRS(n,s,b,o); REST_8EVRS(n+8,s,b,o)
#define REST_32EVRS(n,s,b,o) REST_16EVRS(n,s,b,o); REST_16EVRS(n+16,s,b,o)
/* Macros to adjust thread priority for hardware multithreading */
#define HMT_VERY_LOW or 31,31,31 # very low priority

View File

@ -189,6 +189,9 @@
#define SPRN_CTR 0x009 /* Count Register */
#define SPRN_DSCR 0x11
#define SPRN_CFAR 0x1c /* Come From Address Register */
#define SPRN_AMR 0x1d /* Authority Mask Register */
#define SPRN_UAMOR 0x9d /* User Authority Mask Override Register */
#define SPRN_AMOR 0x15d /* Authority Mask Override Register */
#define SPRN_ACOP 0x1F /* Available Coprocessor Register */
#define SPRN_CTRLF 0x088
#define SPRN_CTRLT 0x098
@ -232,22 +235,28 @@
#define LPCR_VPM0 (1ul << (63-0))
#define LPCR_VPM1 (1ul << (63-1))
#define LPCR_ISL (1ul << (63-2))
#define LPCR_VC_SH (63-2)
#define LPCR_DPFD_SH (63-11)
#define LPCR_VRMA_L (1ul << (63-12))
#define LPCR_VRMA_LP0 (1ul << (63-15))
#define LPCR_VRMA_LP1 (1ul << (63-16))
#define LPCR_VRMASD_SH (63-16)
#define LPCR_RMLS 0x1C000000 /* impl dependent rmo limit sel */
#define LPCR_RMLS_SH (63-37)
#define LPCR_ILE 0x02000000 /* !HV irqs set MSR:LE */
#define LPCR_PECE 0x00007000 /* powersave exit cause enable */
#define LPCR_PECE0 0x00004000 /* ext. exceptions can cause exit */
#define LPCR_PECE1 0x00002000 /* decrementer can cause exit */
#define LPCR_PECE2 0x00001000 /* machine check etc can cause exit */
#define LPCR_MER 0x00000800 /* Mediated External Exception */
#define LPCR_LPES 0x0000000c
#define LPCR_LPES0 0x00000008 /* LPAR Env selector 0 */
#define LPCR_LPES1 0x00000004 /* LPAR Env selector 1 */
#define LPCR_LPES_SH 2
#define LPCR_RMI 0x00000002 /* real mode is cache inhibit */
#define LPCR_HDICE 0x00000001 /* Hyp Decr enable (HV,PR,EE) */
#define SPRN_LPID 0x13F /* Logical Partition Identifier */
#define LPID_RSVD 0x3ff /* Reserved LPID for partn switching */
#define SPRN_HMER 0x150 /* Hardware m? error recovery */
#define SPRN_HMEER 0x151 /* Hardware m? enable error recovery */
#define SPRN_HEIR 0x153 /* Hypervisor Emulated Instruction Register */
@ -298,6 +307,7 @@
#define SPRN_HASH1 0x3D2 /* Primary Hash Address Register */
#define SPRN_HASH2 0x3D3 /* Secondary Hash Address Resgister */
#define SPRN_HID0 0x3F0 /* Hardware Implementation Register 0 */
#define HID0_HDICE_SH (63 - 23) /* 970 HDEC interrupt enable */
#define HID0_EMCP (1<<31) /* Enable Machine Check pin */
#define HID0_EBA (1<<29) /* Enable Bus Address Parity */
#define HID0_EBD (1<<28) /* Enable Bus Data Parity */
@ -353,6 +363,13 @@
#define SPRN_IABR2 0x3FA /* 83xx */
#define SPRN_IBCR 0x135 /* 83xx Insn Breakpoint Control Reg */
#define SPRN_HID4 0x3F4 /* 970 HID4 */
#define HID4_LPES0 (1ul << (63-0)) /* LPAR env. sel. bit 0 */
#define HID4_RMLS2_SH (63 - 2) /* Real mode limit bottom 2 bits */
#define HID4_LPID5_SH (63 - 6) /* partition ID bottom 4 bits */
#define HID4_RMOR_SH (63 - 22) /* real mode offset (16 bits) */
#define HID4_LPES1 (1 << (63-57)) /* LPAR env. sel. bit 1 */
#define HID4_RMLS0_SH (63 - 58) /* Real mode limit top bit */
#define HID4_LPID1_SH 0 /* partition ID top 2 bits */
#define SPRN_HID4_GEKKO 0x3F3 /* Gekko HID4 */
#define SPRN_HID5 0x3F6 /* 970 HID5 */
#define SPRN_HID6 0x3F9 /* BE HID 6 */
@ -802,28 +819,28 @@
mfspr rX,SPRN_SPRG_PACA; \
FTR_SECTION_ELSE_NESTED(66); \
mfspr rX,SPRN_SPRG_HPACA; \
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE, 66)
#define SET_PACA(rX) \
BEGIN_FTR_SECTION_NESTED(66); \
mtspr SPRN_SPRG_PACA,rX; \
FTR_SECTION_ELSE_NESTED(66); \
mtspr SPRN_SPRG_HPACA,rX; \
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE, 66)
#define GET_SCRATCH0(rX) \
BEGIN_FTR_SECTION_NESTED(66); \
mfspr rX,SPRN_SPRG_SCRATCH0; \
FTR_SECTION_ELSE_NESTED(66); \
mfspr rX,SPRN_SPRG_HSCRATCH0; \
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE, 66)
#define SET_SCRATCH0(rX) \
BEGIN_FTR_SECTION_NESTED(66); \
mtspr SPRN_SPRG_SCRATCH0,rX; \
FTR_SECTION_ELSE_NESTED(66); \
mtspr SPRN_SPRG_HSCRATCH0,rX; \
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE, 66)
#else /* CONFIG_PPC_BOOK3S_64 */
#define GET_SCRATCH0(rX) mfspr rX,SPRN_SPRG_SCRATCH0

View File

@ -318,6 +318,7 @@
#define ESR_ILK 0x00100000 /* Instr. Cache Locking */
#define ESR_PUO 0x00040000 /* Unimplemented Operation exception */
#define ESR_BO 0x00020000 /* Byte Ordering */
#define ESR_SPV 0x00000080 /* Signal Processing operation */
/* Bit definitions related to the DBCR0. */
#if defined(CONFIG_40x)

View File

@ -128,6 +128,7 @@ int main(void)
DEFINE(ICACHEL1LINESPERPAGE, offsetof(struct ppc64_caches, ilines_per_page));
/* paca */
DEFINE(PACA_SIZE, sizeof(struct paca_struct));
DEFINE(PACA_LOCK_TOKEN, offsetof(struct paca_struct, lock_token));
DEFINE(PACAPACAINDEX, offsetof(struct paca_struct, paca_index));
DEFINE(PACAPROCSTART, offsetof(struct paca_struct, cpu_start));
DEFINE(PACAKSAVE, offsetof(struct paca_struct, kstack));
@ -187,7 +188,9 @@ int main(void)
DEFINE(LPPACASRR1, offsetof(struct lppaca, saved_srr1));
DEFINE(LPPACAANYINT, offsetof(struct lppaca, int_dword.any_int));
DEFINE(LPPACADECRINT, offsetof(struct lppaca, int_dword.fields.decr_int));
DEFINE(LPPACA_PMCINUSE, offsetof(struct lppaca, pmcregs_in_use));
DEFINE(LPPACA_DTLIDX, offsetof(struct lppaca, dtl_idx));
DEFINE(LPPACA_YIELDCOUNT, offsetof(struct lppaca, yield_count));
DEFINE(PACA_DTL_RIDX, offsetof(struct paca_struct, dtl_ridx));
#endif /* CONFIG_PPC_STD_MMU_64 */
DEFINE(PACAEMERGSP, offsetof(struct paca_struct, emergency_sp));
@ -198,11 +201,6 @@ int main(void)
DEFINE(PACA_USER_TIME, offsetof(struct paca_struct, user_time));
DEFINE(PACA_SYSTEM_TIME, offsetof(struct paca_struct, system_time));
DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
DEFINE(PACA_KVM_SVCPU, offsetof(struct paca_struct, shadow_vcpu));
DEFINE(SVCPU_SLB, offsetof(struct kvmppc_book3s_shadow_vcpu, slb));
DEFINE(SVCPU_SLB_MAX, offsetof(struct kvmppc_book3s_shadow_vcpu, slb_max));
#endif
#endif /* CONFIG_PPC64 */
/* RTAS */
@ -397,67 +395,160 @@ int main(void)
DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid));
DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
DEFINE(VCPU_VRSAVE, offsetof(struct kvm_vcpu, arch.vrsave));
DEFINE(VCPU_FPRS, offsetof(struct kvm_vcpu, arch.fpr));
DEFINE(VCPU_FPSCR, offsetof(struct kvm_vcpu, arch.fpscr));
#ifdef CONFIG_ALTIVEC
DEFINE(VCPU_VRS, offsetof(struct kvm_vcpu, arch.vr));
DEFINE(VCPU_VSCR, offsetof(struct kvm_vcpu, arch.vscr));
#endif
#ifdef CONFIG_VSX
DEFINE(VCPU_VSRS, offsetof(struct kvm_vcpu, arch.vsr));
#endif
DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer));
DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, arch.ctr));
DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr));
DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr));
DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.pc));
#ifdef CONFIG_KVM_BOOK3S_64_HV
DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.shregs.msr));
DEFINE(VCPU_SRR0, offsetof(struct kvm_vcpu, arch.shregs.srr0));
DEFINE(VCPU_SRR1, offsetof(struct kvm_vcpu, arch.shregs.srr1));
DEFINE(VCPU_SPRG0, offsetof(struct kvm_vcpu, arch.shregs.sprg0));
DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
#endif
DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4));
DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5));
DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6));
DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7));
DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid));
DEFINE(VCPU_SHADOW_PID1, offsetof(struct kvm_vcpu, arch.shadow_pid1));
DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared));
DEFINE(VCPU_SHARED_MSR, offsetof(struct kvm_vcpu_arch_shared, msr));
DEFINE(VCPU_SHADOW_MSR, offsetof(struct kvm_vcpu, arch.shadow_msr));
/* book3s */
#ifdef CONFIG_KVM_BOOK3S_64_HV
DEFINE(KVM_LPID, offsetof(struct kvm, arch.lpid));
DEFINE(KVM_SDR1, offsetof(struct kvm, arch.sdr1));
DEFINE(KVM_HOST_LPID, offsetof(struct kvm, arch.host_lpid));
DEFINE(KVM_HOST_LPCR, offsetof(struct kvm, arch.host_lpcr));
DEFINE(KVM_HOST_SDR1, offsetof(struct kvm, arch.host_sdr1));
DEFINE(KVM_TLBIE_LOCK, offsetof(struct kvm, arch.tlbie_lock));
DEFINE(KVM_ONLINE_CPUS, offsetof(struct kvm, online_vcpus.counter));
DEFINE(KVM_LAST_VCPU, offsetof(struct kvm, arch.last_vcpu));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
#endif
#ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_KVM, offsetof(struct kvm_vcpu, kvm));
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
DEFINE(VCPU_HOST_RETIP, offsetof(struct kvm_vcpu, arch.host_retip));
DEFINE(VCPU_HOST_MSR, offsetof(struct kvm_vcpu, arch.host_msr));
DEFINE(VCPU_SHADOW_MSR, offsetof(struct kvm_vcpu, arch.shadow_msr));
DEFINE(VCPU_PURR, offsetof(struct kvm_vcpu, arch.purr));
DEFINE(VCPU_SPURR, offsetof(struct kvm_vcpu, arch.spurr));
DEFINE(VCPU_DSCR, offsetof(struct kvm_vcpu, arch.dscr));
DEFINE(VCPU_AMR, offsetof(struct kvm_vcpu, arch.amr));
DEFINE(VCPU_UAMOR, offsetof(struct kvm_vcpu, arch.uamor));
DEFINE(VCPU_CTRL, offsetof(struct kvm_vcpu, arch.ctrl));
DEFINE(VCPU_DABR, offsetof(struct kvm_vcpu, arch.dabr));
DEFINE(VCPU_TRAMPOLINE_LOWMEM, offsetof(struct kvm_vcpu, arch.trampoline_lowmem));
DEFINE(VCPU_TRAMPOLINE_ENTER, offsetof(struct kvm_vcpu, arch.trampoline_enter));
DEFINE(VCPU_HIGHMEM_HANDLER, offsetof(struct kvm_vcpu, arch.highmem_handler));
DEFINE(VCPU_RMCALL, offsetof(struct kvm_vcpu, arch.rmcall));
DEFINE(VCPU_HFLAGS, offsetof(struct kvm_vcpu, arch.hflags));
DEFINE(VCPU_DEC, offsetof(struct kvm_vcpu, arch.dec));
DEFINE(VCPU_DEC_EXPIRES, offsetof(struct kvm_vcpu, arch.dec_expires));
DEFINE(VCPU_PENDING_EXC, offsetof(struct kvm_vcpu, arch.pending_exceptions));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa));
DEFINE(VCPU_MMCR, offsetof(struct kvm_vcpu, arch.mmcr));
DEFINE(VCPU_PMC, offsetof(struct kvm_vcpu, arch.pmc));
DEFINE(VCPU_SLB, offsetof(struct kvm_vcpu, arch.slb));
DEFINE(VCPU_SLB_MAX, offsetof(struct kvm_vcpu, arch.slb_max));
DEFINE(VCPU_SLB_NR, offsetof(struct kvm_vcpu, arch.slb_nr));
DEFINE(VCPU_LAST_CPU, offsetof(struct kvm_vcpu, arch.last_cpu));
DEFINE(VCPU_FAULT_DSISR, offsetof(struct kvm_vcpu, arch.fault_dsisr));
DEFINE(VCPU_FAULT_DAR, offsetof(struct kvm_vcpu, arch.fault_dar));
DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
DEFINE(VCPU_TRAP, offsetof(struct kvm_vcpu, arch.trap));
DEFINE(VCPU_PTID, offsetof(struct kvm_vcpu, arch.ptid));
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCPU_SVCPU, offsetof(struct kvmppc_vcpu_book3s, shadow_vcpu) -
offsetof(struct kvmppc_vcpu_book3s, vcpu));
DEFINE(SVCPU_CR, offsetof(struct kvmppc_book3s_shadow_vcpu, cr));
DEFINE(SVCPU_XER, offsetof(struct kvmppc_book3s_shadow_vcpu, xer));
DEFINE(SVCPU_CTR, offsetof(struct kvmppc_book3s_shadow_vcpu, ctr));
DEFINE(SVCPU_LR, offsetof(struct kvmppc_book3s_shadow_vcpu, lr));
DEFINE(SVCPU_PC, offsetof(struct kvmppc_book3s_shadow_vcpu, pc));
DEFINE(SVCPU_R0, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[0]));
DEFINE(SVCPU_R1, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[1]));
DEFINE(SVCPU_R2, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[2]));
DEFINE(SVCPU_R3, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[3]));
DEFINE(SVCPU_R4, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[4]));
DEFINE(SVCPU_R5, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[5]));
DEFINE(SVCPU_R6, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[6]));
DEFINE(SVCPU_R7, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[7]));
DEFINE(SVCPU_R8, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[8]));
DEFINE(SVCPU_R9, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[9]));
DEFINE(SVCPU_R10, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[10]));
DEFINE(SVCPU_R11, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[11]));
DEFINE(SVCPU_R12, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[12]));
DEFINE(SVCPU_R13, offsetof(struct kvmppc_book3s_shadow_vcpu, gpr[13]));
DEFINE(SVCPU_HOST_R1, offsetof(struct kvmppc_book3s_shadow_vcpu, host_r1));
DEFINE(SVCPU_HOST_R2, offsetof(struct kvmppc_book3s_shadow_vcpu, host_r2));
DEFINE(SVCPU_VMHANDLER, offsetof(struct kvmppc_book3s_shadow_vcpu,
vmhandler));
DEFINE(SVCPU_SCRATCH0, offsetof(struct kvmppc_book3s_shadow_vcpu,
scratch0));
DEFINE(SVCPU_SCRATCH1, offsetof(struct kvmppc_book3s_shadow_vcpu,
scratch1));
DEFINE(SVCPU_IN_GUEST, offsetof(struct kvmppc_book3s_shadow_vcpu,
in_guest));
DEFINE(SVCPU_FAULT_DSISR, offsetof(struct kvmppc_book3s_shadow_vcpu,
fault_dsisr));
DEFINE(SVCPU_FAULT_DAR, offsetof(struct kvmppc_book3s_shadow_vcpu,
fault_dar));
DEFINE(SVCPU_LAST_INST, offsetof(struct kvmppc_book3s_shadow_vcpu,
last_inst));
DEFINE(SVCPU_SHADOW_SRR1, offsetof(struct kvmppc_book3s_shadow_vcpu,
shadow_srr1));
#ifdef CONFIG_PPC_BOOK3S_32
DEFINE(SVCPU_SR, offsetof(struct kvmppc_book3s_shadow_vcpu, sr));
#endif
DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
#ifdef CONFIG_PPC_BOOK3S_64
#ifdef CONFIG_KVM_BOOK3S_PR
# define SVCPU_FIELD(x, f) DEFINE(x, offsetof(struct paca_struct, shadow_vcpu.f))
#else
# define SVCPU_FIELD(x, f)
#endif
# define HSTATE_FIELD(x, f) DEFINE(x, offsetof(struct paca_struct, kvm_hstate.f))
#else /* 32-bit */
# define SVCPU_FIELD(x, f) DEFINE(x, offsetof(struct kvmppc_book3s_shadow_vcpu, f))
# define HSTATE_FIELD(x, f) DEFINE(x, offsetof(struct kvmppc_book3s_shadow_vcpu, hstate.f))
#endif
SVCPU_FIELD(SVCPU_CR, cr);
SVCPU_FIELD(SVCPU_XER, xer);
SVCPU_FIELD(SVCPU_CTR, ctr);
SVCPU_FIELD(SVCPU_LR, lr);
SVCPU_FIELD(SVCPU_PC, pc);
SVCPU_FIELD(SVCPU_R0, gpr[0]);
SVCPU_FIELD(SVCPU_R1, gpr[1]);
SVCPU_FIELD(SVCPU_R2, gpr[2]);
SVCPU_FIELD(SVCPU_R3, gpr[3]);
SVCPU_FIELD(SVCPU_R4, gpr[4]);
SVCPU_FIELD(SVCPU_R5, gpr[5]);
SVCPU_FIELD(SVCPU_R6, gpr[6]);
SVCPU_FIELD(SVCPU_R7, gpr[7]);
SVCPU_FIELD(SVCPU_R8, gpr[8]);
SVCPU_FIELD(SVCPU_R9, gpr[9]);
SVCPU_FIELD(SVCPU_R10, gpr[10]);
SVCPU_FIELD(SVCPU_R11, gpr[11]);
SVCPU_FIELD(SVCPU_R12, gpr[12]);
SVCPU_FIELD(SVCPU_R13, gpr[13]);
SVCPU_FIELD(SVCPU_FAULT_DSISR, fault_dsisr);
SVCPU_FIELD(SVCPU_FAULT_DAR, fault_dar);
SVCPU_FIELD(SVCPU_LAST_INST, last_inst);
SVCPU_FIELD(SVCPU_SHADOW_SRR1, shadow_srr1);
#ifdef CONFIG_PPC_BOOK3S_32
SVCPU_FIELD(SVCPU_SR, sr);
#endif
#ifdef CONFIG_PPC64
SVCPU_FIELD(SVCPU_SLB, slb);
SVCPU_FIELD(SVCPU_SLB_MAX, slb_max);
#endif
HSTATE_FIELD(HSTATE_HOST_R1, host_r1);
HSTATE_FIELD(HSTATE_HOST_R2, host_r2);
HSTATE_FIELD(HSTATE_HOST_MSR, host_msr);
HSTATE_FIELD(HSTATE_VMHANDLER, vmhandler);
HSTATE_FIELD(HSTATE_SCRATCH0, scratch0);
HSTATE_FIELD(HSTATE_SCRATCH1, scratch1);
HSTATE_FIELD(HSTATE_IN_GUEST, in_guest);
#ifdef CONFIG_KVM_BOOK3S_64_HV
HSTATE_FIELD(HSTATE_KVM_VCPU, kvm_vcpu);
HSTATE_FIELD(HSTATE_KVM_VCORE, kvm_vcore);
HSTATE_FIELD(HSTATE_XICS_PHYS, xics_phys);
HSTATE_FIELD(HSTATE_MMCR, host_mmcr);
HSTATE_FIELD(HSTATE_PMC, host_pmc);
HSTATE_FIELD(HSTATE_PURR, host_purr);
HSTATE_FIELD(HSTATE_SPURR, host_spurr);
HSTATE_FIELD(HSTATE_DSCR, host_dscr);
HSTATE_FIELD(HSTATE_DABR, dabr);
HSTATE_FIELD(HSTATE_DECEXP, dec_expires);
#endif /* CONFIG_KVM_BOOK3S_64_HV */
#else /* CONFIG_PPC_BOOK3S */
DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr));
DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer));
DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr));
@ -467,7 +558,7 @@ int main(void)
DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
#endif /* CONFIG_PPC_BOOK3S */
#endif
#endif /* CONFIG_KVM */
#ifdef CONFIG_KVM_GUEST
DEFINE(KVM_MAGIC_SCRATCH1, offsetof(struct kvm_vcpu_arch_shared,
@ -497,6 +588,13 @@ int main(void)
DEFINE(TLBCAM_MAS7, offsetof(struct tlbcam, MAS7));
#endif
#if defined(CONFIG_KVM) && defined(CONFIG_SPE)
DEFINE(VCPU_EVR, offsetof(struct kvm_vcpu, arch.evr[0]));
DEFINE(VCPU_ACC, offsetof(struct kvm_vcpu, arch.acc));
DEFINE(VCPU_SPEFSCR, offsetof(struct kvm_vcpu, arch.spefscr));
DEFINE(VCPU_HOST_SPEFSCR, offsetof(struct kvm_vcpu, arch.host_spefscr));
#endif
#ifdef CONFIG_KVM_EXIT_TIMING
DEFINE(VCPU_TIMING_EXIT_TBU, offsetof(struct kvm_vcpu,
arch.timing_exit.tv32.tbu));

View File

@ -45,12 +45,12 @@ _GLOBAL(__restore_cpu_power7)
blr
__init_hvmode_206:
/* Disable CPU_FTR_HVMODE_206 and exit if MSR:HV is not set */
/* Disable CPU_FTR_HVMODE and exit if MSR:HV is not set */
mfmsr r3
rldicl. r0,r3,4,63
bnelr
ld r5,CPU_SPEC_FEATURES(r4)
LOAD_REG_IMMEDIATE(r6,CPU_FTR_HVMODE_206)
LOAD_REG_IMMEDIATE(r6,CPU_FTR_HVMODE)
xor r5,r5,r6
std r5,CPU_SPEC_FEATURES(r4)
blr
@ -61,19 +61,23 @@ __init_LPCR:
* LPES = 0b01 (HSRR0/1 used for 0x500)
* PECE = 0b111
* DPFD = 4
* HDICE = 0
* VC = 0b100 (VPM0=1, VPM1=0, ISL=0)
* VRMASD = 0b10000 (L=1, LP=00)
*
* Other bits untouched for now
*/
mfspr r3,SPRN_LPCR
ori r3,r3,(LPCR_LPES0|LPCR_LPES1)
xori r3,r3, LPCR_LPES0
li r5,1
rldimi r3,r5, LPCR_LPES_SH, 64-LPCR_LPES_SH-2
ori r3,r3,(LPCR_PECE0|LPCR_PECE1|LPCR_PECE2)
li r5,7
sldi r5,r5,LPCR_DPFD_SH
andc r3,r3,r5
li r5,4
sldi r5,r5,LPCR_DPFD_SH
or r3,r3,r5
rldimi r3,r5, LPCR_DPFD_SH, 64-LPCR_DPFD_SH-3
clrrdi r3,r3,1 /* clear HDICE */
li r5,4
rldimi r3,r5, LPCR_VC_SH, 0
li r5,0x10
rldimi r3,r5, LPCR_VRMASD_SH, 64-LPCR_VRMASD_SH-5
mtspr SPRN_LPCR,r3
isync
blr

View File

@ -76,7 +76,7 @@ _GLOBAL(__setup_cpu_ppc970)
/* Do nothing if not running in HV mode */
mfmsr r0
rldicl. r0,r0,4,63
beqlr
beq no_hv_mode
mfspr r0,SPRN_HID0
li r11,5 /* clear DOZE and SLEEP */
@ -90,7 +90,7 @@ _GLOBAL(__setup_cpu_ppc970MP)
/* Do nothing if not running in HV mode */
mfmsr r0
rldicl. r0,r0,4,63
beqlr
beq no_hv_mode
mfspr r0,SPRN_HID0
li r11,0x15 /* clear DOZE and SLEEP */
@ -109,6 +109,14 @@ load_hids:
sync
isync
/* Try to set LPES = 01 in HID4 */
mfspr r0,SPRN_HID4
clrldi r0,r0,1 /* clear LPES0 */
ori r0,r0,HID4_LPES1 /* set LPES1 */
sync
mtspr SPRN_HID4,r0
isync
/* Save away cpu state */
LOAD_REG_ADDR(r5,cpu_state_storage)
@ -117,11 +125,21 @@ load_hids:
std r3,CS_HID0(r5)
mfspr r3,SPRN_HID1
std r3,CS_HID1(r5)
mfspr r3,SPRN_HID4
std r3,CS_HID4(r5)
mfspr r4,SPRN_HID4
std r4,CS_HID4(r5)
mfspr r3,SPRN_HID5
std r3,CS_HID5(r5)
/* See if we successfully set LPES1 to 1; if not we are in Apple mode */
andi. r4,r4,HID4_LPES1
bnelr
no_hv_mode:
/* Disable CPU_FTR_HVMODE and exit, since we don't have HV mode */
ld r5,CPU_SPEC_FEATURES(r4)
LOAD_REG_IMMEDIATE(r6,CPU_FTR_HVMODE)
andc r5,r5,r6
std r5,CPU_SPEC_FEATURES(r4)
blr
/* Called with no MMU context (typically MSR:IR/DR off) to

View File

@ -40,7 +40,6 @@ __start_interrupts:
.globl system_reset_pSeries;
system_reset_pSeries:
HMT_MEDIUM;
DO_KVM 0x100;
SET_SCRATCH0(r13)
#ifdef CONFIG_PPC_P7_NAP
BEGIN_FTR_SECTION
@ -50,82 +49,73 @@ BEGIN_FTR_SECTION
* state loss at this time.
*/
mfspr r13,SPRN_SRR1
rlwinm r13,r13,47-31,30,31
cmpwi cr0,r13,1
bne 1f
b .power7_wakeup_noloss
1: cmpwi cr0,r13,2
bne 1f
b .power7_wakeup_loss
rlwinm. r13,r13,47-31,30,31
beq 9f
/* waking up from powersave (nap) state */
cmpwi cr1,r13,2
/* Total loss of HV state is fatal, we could try to use the
* PIR to locate a PACA, then use an emergency stack etc...
* but for now, let's just stay stuck here
*/
1: cmpwi cr0,r13,3
beq .
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE_206)
bgt cr1,.
GET_PACA(r13)
#ifdef CONFIG_KVM_BOOK3S_64_HV
lbz r0,PACAPROCSTART(r13)
cmpwi r0,0x80
bne 1f
li r0,0
stb r0,PACAPROCSTART(r13)
b kvm_start_guest
1:
#endif
beq cr1,2f
b .power7_wakeup_noloss
2: b .power7_wakeup_loss
9:
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
#endif /* CONFIG_PPC_P7_NAP */
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common, EXC_STD)
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common, EXC_STD,
NOTEST, 0x100)
. = 0x200
_machine_check_pSeries:
HMT_MEDIUM
DO_KVM 0x200
SET_SCRATCH0(r13)
EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common, EXC_STD)
machine_check_pSeries_1:
/* This is moved out of line as it can be patched by FW, but
* some code path might still want to branch into the original
* vector
*/
b machine_check_pSeries
. = 0x300
.globl data_access_pSeries
data_access_pSeries:
HMT_MEDIUM
DO_KVM 0x300
SET_SCRATCH0(r13)
#ifndef CONFIG_POWER4_ONLY
BEGIN_FTR_SECTION
GET_PACA(r13)
std r9,PACA_EXSLB+EX_R9(r13)
std r10,PACA_EXSLB+EX_R10(r13)
mfspr r10,SPRN_DAR
mfspr r9,SPRN_DSISR
srdi r10,r10,60
rlwimi r10,r9,16,0x20
mfcr r9
cmpwi r10,0x2c
beq do_stab_bolted_pSeries
ld r10,PACA_EXSLB+EX_R10(r13)
std r11,PACA_EXGEN+EX_R11(r13)
ld r11,PACA_EXSLB+EX_R9(r13)
std r12,PACA_EXGEN+EX_R12(r13)
GET_SCRATCH0(r12)
std r10,PACA_EXGEN+EX_R10(r13)
std r11,PACA_EXGEN+EX_R9(r13)
std r12,PACA_EXGEN+EX_R13(r13)
EXCEPTION_PROLOG_PSERIES_1(data_access_common, EXC_STD)
FTR_SECTION_ELSE
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, data_access_common, EXC_STD)
ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_SLB)
b data_access_check_stab
data_access_not_stab:
END_MMU_FTR_SECTION_IFCLR(MMU_FTR_SLB)
#endif
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, data_access_common, EXC_STD,
KVMTEST_PR, 0x300)
. = 0x380
.globl data_access_slb_pSeries
data_access_slb_pSeries:
HMT_MEDIUM
DO_KVM 0x380
SET_SCRATCH0(r13)
GET_PACA(r13)
EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380)
std r3,PACA_EXSLB+EX_R3(r13)
mfspr r3,SPRN_DAR
std r9,PACA_EXSLB+EX_R9(r13) /* save r9 - r12 */
mfcr r9
#ifdef __DISABLED__
/* Keep that around for when we re-implement dynamic VSIDs */
cmpdi r3,0
bge slb_miss_user_pseries
#endif /* __DISABLED__ */
std r10,PACA_EXSLB+EX_R10(r13)
std r11,PACA_EXSLB+EX_R11(r13)
std r12,PACA_EXSLB+EX_R12(r13)
GET_SCRATCH0(r10)
std r10,PACA_EXSLB+EX_R13(r13)
mfspr r12,SPRN_SRR1 /* and SRR1 */
mfspr r12,SPRN_SRR1
#ifndef CONFIG_RELOCATABLE
b .slb_miss_realmode
#else
@ -147,24 +137,16 @@ data_access_slb_pSeries:
.globl instruction_access_slb_pSeries
instruction_access_slb_pSeries:
HMT_MEDIUM
DO_KVM 0x480
SET_SCRATCH0(r13)
GET_PACA(r13)
EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480)
std r3,PACA_EXSLB+EX_R3(r13)
mfspr r3,SPRN_SRR0 /* SRR0 is faulting address */
std r9,PACA_EXSLB+EX_R9(r13) /* save r9 - r12 */
mfcr r9
#ifdef __DISABLED__
/* Keep that around for when we re-implement dynamic VSIDs */
cmpdi r3,0
bge slb_miss_user_pseries
#endif /* __DISABLED__ */
std r10,PACA_EXSLB+EX_R10(r13)
std r11,PACA_EXSLB+EX_R11(r13)
std r12,PACA_EXSLB+EX_R12(r13)
GET_SCRATCH0(r10)
std r10,PACA_EXSLB+EX_R13(r13)
mfspr r12,SPRN_SRR1 /* and SRR1 */
mfspr r12,SPRN_SRR1
#ifndef CONFIG_RELOCATABLE
b .slb_miss_realmode
#else
@ -184,26 +166,46 @@ instruction_access_slb_pSeries:
hardware_interrupt_pSeries:
hardware_interrupt_hv:
BEGIN_FTR_SECTION
_MASKABLE_EXCEPTION_PSERIES(0x500, hardware_interrupt, EXC_STD)
_MASKABLE_EXCEPTION_PSERIES(0x502, hardware_interrupt,
EXC_HV, SOFTEN_TEST_HV)
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0x502)
FTR_SECTION_ELSE
_MASKABLE_EXCEPTION_PSERIES(0x502, hardware_interrupt, EXC_HV)
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_HVMODE_206)
_MASKABLE_EXCEPTION_PSERIES(0x500, hardware_interrupt,
EXC_STD, SOFTEN_TEST_HV_201)
KVM_HANDLER(PACA_EXGEN, EXC_STD, 0x500)
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
STD_EXCEPTION_PSERIES(0x600, 0x600, alignment)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x600)
STD_EXCEPTION_PSERIES(0x700, 0x700, program_check)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x700)
STD_EXCEPTION_PSERIES(0x800, 0x800, fp_unavailable)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x800)
MASKABLE_EXCEPTION_PSERIES(0x900, 0x900, decrementer)
MASKABLE_EXCEPTION_HV(0x980, 0x980, decrementer)
MASKABLE_EXCEPTION_HV(0x980, 0x982, decrementer)
STD_EXCEPTION_PSERIES(0xa00, 0xa00, trap_0a)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xa00)
STD_EXCEPTION_PSERIES(0xb00, 0xb00, trap_0b)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xb00)
. = 0xc00
.globl system_call_pSeries
system_call_pSeries:
HMT_MEDIUM
DO_KVM 0xc00
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
SET_SCRATCH0(r13)
GET_PACA(r13)
std r9,PACA_EXGEN+EX_R9(r13)
std r10,PACA_EXGEN+EX_R10(r13)
mfcr r9
KVMTEST(0xc00)
GET_SCRATCH0(r13)
#endif
BEGIN_FTR_SECTION
cmpdi r0,0x1ebe
beq- 1f
@ -220,6 +222,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
rfid
b . /* prevent speculative execution */
KVM_HANDLER(PACA_EXGEN, EXC_STD, 0xc00)
/* Fast LE/BE switch system call */
1: mfspr r12,SPRN_SRR1
xori r12,r12,MSR_LE
@ -228,6 +232,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
b .
STD_EXCEPTION_PSERIES(0xd00, 0xd00, single_step)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xd00)
/* At 0xe??? we have a bunch of hypervisor exceptions, we branch
* out of line to handle them
@ -262,30 +267,93 @@ vsx_unavailable_pSeries_1:
#ifdef CONFIG_CBE_RAS
STD_EXCEPTION_HV(0x1200, 0x1202, cbe_system_error)
KVM_HANDLER_PR_SKIP(PACA_EXGEN, EXC_HV, 0x1202)
#endif /* CONFIG_CBE_RAS */
STD_EXCEPTION_PSERIES(0x1300, 0x1300, instruction_breakpoint)
KVM_HANDLER_PR_SKIP(PACA_EXGEN, EXC_STD, 0x1300)
#ifdef CONFIG_CBE_RAS
STD_EXCEPTION_HV(0x1600, 0x1602, cbe_maintenance)
KVM_HANDLER_PR_SKIP(PACA_EXGEN, EXC_HV, 0x1602)
#endif /* CONFIG_CBE_RAS */
STD_EXCEPTION_PSERIES(0x1700, 0x1700, altivec_assist)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x1700)
#ifdef CONFIG_CBE_RAS
STD_EXCEPTION_HV(0x1800, 0x1802, cbe_thermal)
KVM_HANDLER_PR_SKIP(PACA_EXGEN, EXC_HV, 0x1802)
#endif /* CONFIG_CBE_RAS */
. = 0x3000
/*** Out of line interrupts support ***/
/* moved from 0x200 */
machine_check_pSeries:
.globl machine_check_fwnmi
machine_check_fwnmi:
HMT_MEDIUM
SET_SCRATCH0(r13) /* save r13 */
EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common,
EXC_STD, KVMTEST, 0x200)
KVM_HANDLER_SKIP(PACA_EXMC, EXC_STD, 0x200)
#ifndef CONFIG_POWER4_ONLY
/* moved from 0x300 */
data_access_check_stab:
GET_PACA(r13)
std r9,PACA_EXSLB+EX_R9(r13)
std r10,PACA_EXSLB+EX_R10(r13)
mfspr r10,SPRN_DAR
mfspr r9,SPRN_DSISR
srdi r10,r10,60
rlwimi r10,r9,16,0x20
#ifdef CONFIG_KVM_BOOK3S_PR
lbz r9,HSTATE_IN_GUEST(r13)
rlwimi r10,r9,8,0x300
#endif
mfcr r9
cmpwi r10,0x2c
beq do_stab_bolted_pSeries
mtcrf 0x80,r9
ld r9,PACA_EXSLB+EX_R9(r13)
ld r10,PACA_EXSLB+EX_R10(r13)
b data_access_not_stab
do_stab_bolted_pSeries:
std r11,PACA_EXSLB+EX_R11(r13)
std r12,PACA_EXSLB+EX_R12(r13)
GET_SCRATCH0(r10)
std r10,PACA_EXSLB+EX_R13(r13)
EXCEPTION_PROLOG_PSERIES_1(.do_stab_bolted, EXC_STD)
#endif /* CONFIG_POWER4_ONLY */
KVM_HANDLER_PR_SKIP(PACA_EXGEN, EXC_STD, 0x300)
KVM_HANDLER_PR_SKIP(PACA_EXSLB, EXC_STD, 0x380)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x400)
KVM_HANDLER_PR(PACA_EXSLB, EXC_STD, 0x480)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x900)
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0x982)
.align 7
/* moved from 0xe00 */
STD_EXCEPTION_HV(., 0xe00, h_data_storage)
STD_EXCEPTION_HV(., 0xe20, h_instr_storage)
STD_EXCEPTION_HV(., 0xe40, emulation_assist)
STD_EXCEPTION_HV(., 0xe60, hmi_exception) /* need to flush cache ? */
STD_EXCEPTION_HV(., 0xe02, h_data_storage)
KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0xe02)
STD_EXCEPTION_HV(., 0xe22, h_instr_storage)
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe22)
STD_EXCEPTION_HV(., 0xe42, emulation_assist)
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe42)
STD_EXCEPTION_HV(., 0xe62, hmi_exception) /* need to flush cache ? */
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe62)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., 0xf00, performance_monitor)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf00)
STD_EXCEPTION_PSERIES(., 0xf20, altivec_unavailable)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf20)
STD_EXCEPTION_PSERIES(., 0xf40, vsx_unavailable)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf40)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
@ -317,14 +385,6 @@ masked_Hinterrupt:
hrfid
b .
.align 7
do_stab_bolted_pSeries:
std r11,PACA_EXSLB+EX_R11(r13)
std r12,PACA_EXSLB+EX_R12(r13)
GET_SCRATCH0(r10)
std r10,PACA_EXSLB+EX_R13(r13)
EXCEPTION_PROLOG_PSERIES_1(.do_stab_bolted, EXC_STD)
#ifdef CONFIG_PPC_PSERIES
/*
* Vectors for the FWNMI option. Share common code.
@ -334,14 +394,8 @@ do_stab_bolted_pSeries:
system_reset_fwnmi:
HMT_MEDIUM
SET_SCRATCH0(r13) /* save r13 */
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common, EXC_STD)
.globl machine_check_fwnmi
.align 7
machine_check_fwnmi:
HMT_MEDIUM
SET_SCRATCH0(r13) /* save r13 */
EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common, EXC_STD)
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common, EXC_STD,
NOTEST, 0x100)
#endif /* CONFIG_PPC_PSERIES */
@ -376,7 +430,11 @@ slb_miss_user_pseries:
/* KVM's trampoline code needs to be close to the interrupt handlers */
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#ifdef CONFIG_KVM_BOOK3S_PR
#include "../kvm/book3s_rmhandlers.S"
#else
#include "../kvm/book3s_hv_rmhandlers.S"
#endif
#endif
.align 7

View File

@ -656,7 +656,7 @@ load_up_spe:
cmpi 0,r4,0
beq 1f
addi r4,r4,THREAD /* want THREAD of last_task_used_spe */
SAVE_32EVRS(0,r10,r4)
SAVE_32EVRS(0,r10,r4,THREAD_EVR0)
evxor evr10, evr10, evr10 /* clear out evr10 */
evmwumiaa evr10, evr10, evr10 /* evr10 <- ACC = 0 * 0 + ACC */
li r5,THREAD_ACC
@ -676,7 +676,7 @@ load_up_spe:
stw r4,THREAD_USED_SPE(r5)
evlddx evr4,r10,r5
evmra evr4,evr4
REST_32EVRS(0,r10,r5)
REST_32EVRS(0,r10,r5,THREAD_EVR0)
#ifndef CONFIG_SMP
subi r4,r5,THREAD
stw r4,last_task_used_spe@l(r3)
@ -787,13 +787,11 @@ _GLOBAL(giveup_spe)
addi r3,r3,THREAD /* want THREAD of task */
lwz r5,PT_REGS(r3)
cmpi 0,r5,0
SAVE_32EVRS(0, r4, r3)
SAVE_32EVRS(0, r4, r3, THREAD_EVR0)
evxor evr6, evr6, evr6 /* clear out evr6 */
evmwumiaa evr6, evr6, evr6 /* evr6 <- ACC = 0 * 0 + ACC */
li r4,THREAD_ACC
evstddx evr6, r4, r3 /* save off accumulator */
mfspr r6,SPRN_SPEFSCR
stw r6,THREAD_SPEFSCR(r3) /* save spefscr register value */
beq 1f
lwz r4,_MSR-STACK_FRAME_OVERHEAD(r5)
lis r3,MSR_SPE@h

View File

@ -73,7 +73,6 @@ _GLOBAL(power7_idle)
b .
_GLOBAL(power7_wakeup_loss)
GET_PACA(r13)
ld r1,PACAR1(r13)
REST_NVGPRS(r1)
REST_GPR(2, r1)
@ -87,7 +86,6 @@ _GLOBAL(power7_wakeup_loss)
rfid
_GLOBAL(power7_wakeup_noloss)
GET_PACA(r13)
ld r1,PACAR1(r13)
ld r4,_MSR(r1)
ld r5,_NIP(r1)

View File

@ -167,7 +167,7 @@ void setup_paca(struct paca_struct *new_paca)
* if we do a GET_PACA() before the feature fixups have been
* applied
*/
if (cpu_has_feature(CPU_FTR_HVMODE_206))
if (cpu_has_feature(CPU_FTR_HVMODE))
mtspr(SPRN_SPRG_HPACA, local_paca);
#endif
mtspr(SPRN_SPRG_PACA, local_paca);

View File

@ -96,6 +96,7 @@ void flush_fp_to_thread(struct task_struct *tsk)
preempt_enable();
}
}
EXPORT_SYMBOL_GPL(flush_fp_to_thread);
void enable_kernel_fp(void)
{
@ -145,6 +146,7 @@ void flush_altivec_to_thread(struct task_struct *tsk)
preempt_enable();
}
}
EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
@ -186,6 +188,7 @@ void flush_vsx_to_thread(struct task_struct *tsk)
preempt_enable();
}
}
EXPORT_SYMBOL_GPL(flush_vsx_to_thread);
#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
@ -213,6 +216,7 @@ void flush_spe_to_thread(struct task_struct *tsk)
#ifdef CONFIG_SMP
BUG_ON(tsk != current);
#endif
tsk->thread.spefscr = mfspr(SPRN_SPEFSCR);
giveup_spe(tsk);
}
preempt_enable();

View File

@ -375,6 +375,9 @@ void __init check_for_initrd(void)
int threads_per_core, threads_shift;
cpumask_t threads_core_mask;
EXPORT_SYMBOL_GPL(threads_per_core);
EXPORT_SYMBOL_GPL(threads_shift);
EXPORT_SYMBOL_GPL(threads_core_mask);
static void __init cpu_init_thread_core_maps(int tpc)
{

View File

@ -63,6 +63,7 @@
#include <asm/kexec.h>
#include <asm/mmu_context.h>
#include <asm/code-patching.h>
#include <asm/kvm_ppc.h>
#include "setup.h"
@ -580,6 +581,8 @@ void __init setup_arch(char **cmdline_p)
/* Initialize the MMU context management stuff */
mmu_context_init();
kvm_rma_init();
ppc64_boot_msg(0x15, "Setup Done");
}

View File

@ -243,6 +243,7 @@ void smp_send_reschedule(int cpu)
if (likely(smp_ops))
smp_ops->message_pass(cpu, PPC_MSG_RESCHEDULE);
}
EXPORT_SYMBOL_GPL(smp_send_reschedule);
void arch_send_call_function_single_ipi(int cpu)
{

View File

@ -1387,10 +1387,7 @@ void SPEFloatingPointException(struct pt_regs *regs)
int code = 0;
int err;
preempt_disable();
if (regs->msr & MSR_SPE)
giveup_spe(current);
preempt_enable();
flush_spe_to_thread(current);
spefscr = current->thread.spefscr;
fpexc_mode = current->thread.fpexc_mode;

View File

@ -387,8 +387,10 @@ static void kvmppc_44x_invalidate(struct kvm_vcpu *vcpu,
}
}
void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 old_msr)
{
int usermode = vcpu->arch.shared->msr & MSR_PR;
vcpu->arch.shadow_pid = !usermode;
}

View File

@ -20,7 +20,6 @@ config KVM
bool
select PREEMPT_NOTIFIERS
select ANON_INODES
select KVM_MMIO
config KVM_BOOK3S_HANDLER
bool
@ -28,16 +27,22 @@ config KVM_BOOK3S_HANDLER
config KVM_BOOK3S_32_HANDLER
bool
select KVM_BOOK3S_HANDLER
select KVM_MMIO
config KVM_BOOK3S_64_HANDLER
bool
select KVM_BOOK3S_HANDLER
config KVM_BOOK3S_PR
bool
select KVM_MMIO
config KVM_BOOK3S_32
tristate "KVM support for PowerPC book3s_32 processors"
depends on EXPERIMENTAL && PPC_BOOK3S_32 && !SMP && !PTE_64BIT
select KVM
select KVM_BOOK3S_32_HANDLER
select KVM_BOOK3S_PR
---help---
Support running unmodified book3s_32 guest kernels
in virtual machines on book3s_32 host processors.
@ -50,8 +55,8 @@ config KVM_BOOK3S_32
config KVM_BOOK3S_64
tristate "KVM support for PowerPC book3s_64 processors"
depends on EXPERIMENTAL && PPC_BOOK3S_64
select KVM
select KVM_BOOK3S_64_HANDLER
select KVM
---help---
Support running unmodified book3s_64 and book3s_32 guest kernels
in virtual machines on book3s_64 host processors.
@ -61,10 +66,34 @@ config KVM_BOOK3S_64
If unsure, say N.
config KVM_BOOK3S_64_HV
bool "KVM support for POWER7 and PPC970 using hypervisor mode in host"
depends on KVM_BOOK3S_64
---help---
Support running unmodified book3s_64 guest kernels in
virtual machines on POWER7 and PPC970 processors that have
hypervisor mode available to the host.
If you say Y here, KVM will use the hardware virtualization
facilities of POWER7 (and later) processors, meaning that
guest operating systems will run at full hardware speed
using supervisor and user modes. However, this also means
that KVM is not usable under PowerVM (pHyp), is only usable
on POWER7 (or later) processors and PPC970-family processors,
and cannot emulate a different processor from the host processor.
If unsure, say N.
config KVM_BOOK3S_64_PR
def_bool y
depends on KVM_BOOK3S_64 && !KVM_BOOK3S_64_HV
select KVM_BOOK3S_PR
config KVM_440
bool "KVM support for PowerPC 440 processors"
depends on EXPERIMENTAL && 44x
select KVM
select KVM_MMIO
---help---
Support running unmodified 440 guest kernels in virtual machines on
440 host processors.
@ -89,6 +118,7 @@ config KVM_E500
bool "KVM support for PowerPC E500 processors"
depends on EXPERIMENTAL && E500
select KVM
select KVM_MMIO
---help---
Support running unmodified E500 guest kernels in virtual machines on
E500 host processors.

View File

@ -38,24 +38,42 @@ kvm-e500-objs := \
e500_emulate.o
kvm-objs-$(CONFIG_KVM_E500) := $(kvm-e500-objs)
kvm-book3s_64-objs := \
$(common-objs-y) \
kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \
../../../virt/kvm/coalesced_mmio.o \
fpu.o \
book3s_paired_singles.o \
book3s.o \
book3s_pr.o \
book3s_emulate.o \
book3s_interrupts.o \
book3s_mmu_hpte.o \
book3s_64_mmu_host.o \
book3s_64_mmu.o \
book3s_32_mmu.o
kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-objs)
kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
book3s_hv.o \
book3s_hv_interrupts.o \
book3s_64_mmu_hv.o
kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
book3s_hv_rm_mmu.o \
book3s_64_vio_hv.o \
book3s_hv_builtin.o
kvm-book3s_64-module-objs := \
../../../virt/kvm/kvm_main.o \
powerpc.o \
emulate.o \
book3s.o \
$(kvm-book3s_64-objs-y)
kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-module-objs)
kvm-book3s_32-objs := \
$(common-objs-y) \
fpu.o \
book3s_paired_singles.o \
book3s.o \
book3s_pr.o \
book3s_emulate.o \
book3s_interrupts.o \
book3s_mmu_hpte.o \
@ -70,3 +88,4 @@ obj-$(CONFIG_KVM_E500) += kvm.o
obj-$(CONFIG_KVM_BOOK3S_64) += kvm.o
obj-$(CONFIG_KVM_BOOK3S_32) += kvm.o
obj-y += $(kvm-book3s_64-builtin-objs-y)

File diff suppressed because it is too large Load Diff

View File

@ -41,36 +41,36 @@ static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu)
}
static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe(
struct kvmppc_vcpu_book3s *vcpu_book3s,
struct kvm_vcpu *vcpu,
gva_t eaddr)
{
int i;
u64 esid = GET_ESID(eaddr);
u64 esid_1t = GET_ESID_1T(eaddr);
for (i = 0; i < vcpu_book3s->slb_nr; i++) {
for (i = 0; i < vcpu->arch.slb_nr; i++) {
u64 cmp_esid = esid;
if (!vcpu_book3s->slb[i].valid)
if (!vcpu->arch.slb[i].valid)
continue;
if (vcpu_book3s->slb[i].tb)
if (vcpu->arch.slb[i].tb)
cmp_esid = esid_1t;
if (vcpu_book3s->slb[i].esid == cmp_esid)
return &vcpu_book3s->slb[i];
if (vcpu->arch.slb[i].esid == cmp_esid)
return &vcpu->arch.slb[i];
}
dprintk("KVM: No SLB entry found for 0x%lx [%llx | %llx]\n",
eaddr, esid, esid_1t);
for (i = 0; i < vcpu_book3s->slb_nr; i++) {
if (vcpu_book3s->slb[i].vsid)
for (i = 0; i < vcpu->arch.slb_nr; i++) {
if (vcpu->arch.slb[i].vsid)
dprintk(" %d: %c%c%c %llx %llx\n", i,
vcpu_book3s->slb[i].valid ? 'v' : ' ',
vcpu_book3s->slb[i].large ? 'l' : ' ',
vcpu_book3s->slb[i].tb ? 't' : ' ',
vcpu_book3s->slb[i].esid,
vcpu_book3s->slb[i].vsid);
vcpu->arch.slb[i].valid ? 'v' : ' ',
vcpu->arch.slb[i].large ? 'l' : ' ',
vcpu->arch.slb[i].tb ? 't' : ' ',
vcpu->arch.slb[i].esid,
vcpu->arch.slb[i].vsid);
}
return NULL;
@ -81,7 +81,7 @@ static u64 kvmppc_mmu_book3s_64_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
{
struct kvmppc_slb *slb;
slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), eaddr);
slb = kvmppc_mmu_book3s_64_find_slbe(vcpu, eaddr);
if (!slb)
return 0;
@ -180,7 +180,7 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
return 0;
}
slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr);
slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu, eaddr);
if (!slbe)
goto no_seg_found;
@ -320,10 +320,10 @@ static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
esid_1t = GET_ESID_1T(rb);
slb_nr = rb & 0xfff;
if (slb_nr > vcpu_book3s->slb_nr)
if (slb_nr > vcpu->arch.slb_nr)
return;
slbe = &vcpu_book3s->slb[slb_nr];
slbe = &vcpu->arch.slb[slb_nr];
slbe->large = (rs & SLB_VSID_L) ? 1 : 0;
slbe->tb = (rs & SLB_VSID_B_1T) ? 1 : 0;
@ -344,38 +344,35 @@ static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
static u64 kvmppc_mmu_book3s_64_slbmfee(struct kvm_vcpu *vcpu, u64 slb_nr)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
struct kvmppc_slb *slbe;
if (slb_nr > vcpu_book3s->slb_nr)
if (slb_nr > vcpu->arch.slb_nr)
return 0;
slbe = &vcpu_book3s->slb[slb_nr];
slbe = &vcpu->arch.slb[slb_nr];
return slbe->orige;
}
static u64 kvmppc_mmu_book3s_64_slbmfev(struct kvm_vcpu *vcpu, u64 slb_nr)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
struct kvmppc_slb *slbe;
if (slb_nr > vcpu_book3s->slb_nr)
if (slb_nr > vcpu->arch.slb_nr)
return 0;
slbe = &vcpu_book3s->slb[slb_nr];
slbe = &vcpu->arch.slb[slb_nr];
return slbe->origv;
}
static void kvmppc_mmu_book3s_64_slbie(struct kvm_vcpu *vcpu, u64 ea)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
struct kvmppc_slb *slbe;
dprintk("KVM MMU: slbie(0x%llx)\n", ea);
slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, ea);
slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu, ea);
if (!slbe)
return;
@ -389,13 +386,12 @@ static void kvmppc_mmu_book3s_64_slbie(struct kvm_vcpu *vcpu, u64 ea)
static void kvmppc_mmu_book3s_64_slbia(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
int i;
dprintk("KVM MMU: slbia()\n");
for (i = 1; i < vcpu_book3s->slb_nr; i++)
vcpu_book3s->slb[i].valid = false;
for (i = 1; i < vcpu->arch.slb_nr; i++)
vcpu->arch.slb[i].valid = false;
if (vcpu->arch.shared->msr & MSR_IR) {
kvmppc_mmu_flush_segments(vcpu);
@ -464,7 +460,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid,
ulong mp_ea = vcpu->arch.magic_page_ea;
if (vcpu->arch.shared->msr & (MSR_DR|MSR_IR)) {
slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea);
slb = kvmppc_mmu_book3s_64_find_slbe(vcpu, ea);
if (slb)
gvsid = slb->vsid;
}

View File

@ -0,0 +1,180 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
*/
#include <linux/types.h>
#include <linux/string.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/hugetlb.h>
#include <asm/tlbflush.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
#include <asm/mmu-hash64.h>
#include <asm/hvcall.h>
#include <asm/synch.h>
#include <asm/ppc-opcode.h>
#include <asm/cputable.h>
/* For now use fixed-size 16MB page table */
#define HPT_ORDER 24
#define HPT_NPTEG (1ul << (HPT_ORDER - 7)) /* 128B per pteg */
#define HPT_HASH_MASK (HPT_NPTEG - 1)
/* Pages in the VRMA are 16MB pages */
#define VRMA_PAGE_ORDER 24
#define VRMA_VSID 0x1ffffffUL /* 1TB VSID reserved for VRMA */
/* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
#define MAX_LPID_970 63
#define NR_LPIDS (LPID_RSVD + 1)
unsigned long lpid_inuse[BITS_TO_LONGS(NR_LPIDS)];
long kvmppc_alloc_hpt(struct kvm *kvm)
{
unsigned long hpt;
unsigned long lpid;
hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN,
HPT_ORDER - PAGE_SHIFT);
if (!hpt) {
pr_err("kvm_alloc_hpt: Couldn't alloc HPT\n");
return -ENOMEM;
}
kvm->arch.hpt_virt = hpt;
do {
lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS);
if (lpid >= NR_LPIDS) {
pr_err("kvm_alloc_hpt: No LPIDs free\n");
free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
return -ENOMEM;
}
} while (test_and_set_bit(lpid, lpid_inuse));
kvm->arch.sdr1 = __pa(hpt) | (HPT_ORDER - 18);
kvm->arch.lpid = lpid;
pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid);
return 0;
}
void kvmppc_free_hpt(struct kvm *kvm)
{
clear_bit(kvm->arch.lpid, lpid_inuse);
free_pages(kvm->arch.hpt_virt, HPT_ORDER - PAGE_SHIFT);
}
void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
{
unsigned long i;
unsigned long npages = kvm->arch.ram_npages;
unsigned long pfn;
unsigned long *hpte;
unsigned long hash;
struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo;
if (!pginfo)
return;
/* VRMA can't be > 1TB */
if (npages > 1ul << (40 - kvm->arch.ram_porder))
npages = 1ul << (40 - kvm->arch.ram_porder);
/* Can't use more than 1 HPTE per HPTEG */
if (npages > HPT_NPTEG)
npages = HPT_NPTEG;
for (i = 0; i < npages; ++i) {
pfn = pginfo[i].pfn;
if (!pfn)
break;
/* can't use hpt_hash since va > 64 bits */
hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK;
/*
* We assume that the hash table is empty and no
* vcpus are using it at this stage. Since we create
* at most one HPTE per HPTEG, we just assume entry 7
* is available and use it.
*/
hpte = (unsigned long *) (kvm->arch.hpt_virt + (hash << 7));
hpte += 7 * 2;
/* HPTE low word - RPN, protection, etc. */
hpte[1] = (pfn << PAGE_SHIFT) | HPTE_R_R | HPTE_R_C |
HPTE_R_M | PP_RWXX;
wmb();
hpte[0] = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
(i << (VRMA_PAGE_ORDER - 16)) | HPTE_V_BOLTED |
HPTE_V_LARGE | HPTE_V_VALID;
}
}
int kvmppc_mmu_hv_init(void)
{
unsigned long host_lpid, rsvd_lpid;
if (!cpu_has_feature(CPU_FTR_HVMODE))
return -EINVAL;
memset(lpid_inuse, 0, sizeof(lpid_inuse));
if (cpu_has_feature(CPU_FTR_ARCH_206)) {
host_lpid = mfspr(SPRN_LPID); /* POWER7 */
rsvd_lpid = LPID_RSVD;
} else {
host_lpid = 0; /* PPC970 */
rsvd_lpid = MAX_LPID_970;
}
set_bit(host_lpid, lpid_inuse);
/* rsvd_lpid is reserved for use in partition switching */
set_bit(rsvd_lpid, lpid_inuse);
return 0;
}
void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
{
}
static void kvmppc_mmu_book3s_64_hv_reset_msr(struct kvm_vcpu *vcpu)
{
kvmppc_set_msr(vcpu, MSR_SF | MSR_ME);
}
static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data)
{
return -ENOENT;
}
void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu)
{
struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
if (cpu_has_feature(CPU_FTR_ARCH_206))
vcpu->arch.slb_nr = 32; /* POWER7 */
else
vcpu->arch.slb_nr = 64;
mmu->xlate = kvmppc_mmu_book3s_64_hv_xlate;
mmu->reset_msr = kvmppc_mmu_book3s_64_hv_reset_msr;
vcpu->arch.hflags |= BOOK3S_HFLAG_SLB;
}

View File

@ -0,0 +1,73 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
* Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
*/
#include <linux/types.h>
#include <linux/string.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/hugetlb.h>
#include <linux/list.h>
#include <asm/tlbflush.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
#include <asm/mmu-hash64.h>
#include <asm/hvcall.h>
#include <asm/synch.h>
#include <asm/ppc-opcode.h>
#include <asm/kvm_host.h>
#include <asm/udbg.h>
#define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
unsigned long ioba, unsigned long tce)
{
struct kvm *kvm = vcpu->kvm;
struct kvmppc_spapr_tce_table *stt;
/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
/* liobn, ioba, tce); */
list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
if (stt->liobn == liobn) {
unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
struct page *page;
u64 *tbl;
/* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p window_size=0x%x\n", */
/* liobn, stt, stt->window_size); */
if (ioba >= stt->window_size)
return H_PARAMETER;
page = stt->pages[idx / TCES_PER_PAGE];
tbl = (u64 *)page_address(page);
/* FIXME: Need to validate the TCE itself */
/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
tbl[idx % TCES_PER_PAGE] = tce;
return H_SUCCESS;
}
}
/* Didn't find the liobn, punt it to userspace */
return H_TOO_HARD;
}

View File

@ -20,8 +20,11 @@
#include <linux/module.h>
#include <asm/kvm_book3s.h>
EXPORT_SYMBOL_GPL(kvmppc_trampoline_enter);
EXPORT_SYMBOL_GPL(kvmppc_trampoline_lowmem);
#ifdef CONFIG_KVM_BOOK3S_64_HV
EXPORT_SYMBOL_GPL(kvmppc_hv_entry_trampoline);
#else
EXPORT_SYMBOL_GPL(kvmppc_handler_trampoline_enter);
EXPORT_SYMBOL_GPL(kvmppc_handler_lowmem_trampoline);
EXPORT_SYMBOL_GPL(kvmppc_rmcall);
EXPORT_SYMBOL_GPL(kvmppc_load_up_fpu);
#ifdef CONFIG_ALTIVEC
@ -30,3 +33,5 @@ EXPORT_SYMBOL_GPL(kvmppc_load_up_altivec);
#ifdef CONFIG_VSX
EXPORT_SYMBOL_GPL(kvmppc_load_up_vsx);
#endif
#endif

1269
arch/powerpc/kvm/book3s_hv.c Normal file

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,155 @@
/*
* Copyright 2011 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*/
#include <linux/kvm_host.h>
#include <linux/preempt.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/bootmem.h>
#include <linux/init.h>
#include <asm/cputable.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
/*
* This maintains a list of RMAs (real mode areas) for KVM guests to use.
* Each RMA has to be physically contiguous and of a size that the
* hardware supports. PPC970 and POWER7 support 64MB, 128MB and 256MB,
* and other larger sizes. Since we are unlikely to be allocate that
* much physically contiguous memory after the system is up and running,
* we preallocate a set of RMAs in early boot for KVM to use.
*/
static unsigned long kvm_rma_size = 64 << 20; /* 64MB */
static unsigned long kvm_rma_count;
static int __init early_parse_rma_size(char *p)
{
if (!p)
return 1;
kvm_rma_size = memparse(p, &p);
return 0;
}
early_param("kvm_rma_size", early_parse_rma_size);
static int __init early_parse_rma_count(char *p)
{
if (!p)
return 1;
kvm_rma_count = simple_strtoul(p, NULL, 0);
return 0;
}
early_param("kvm_rma_count", early_parse_rma_count);
static struct kvmppc_rma_info *rma_info;
static LIST_HEAD(free_rmas);
static DEFINE_SPINLOCK(rma_lock);
/* Work out RMLS (real mode limit selector) field value for a given RMA size.
Assumes POWER7 or PPC970. */
static inline int lpcr_rmls(unsigned long rma_size)
{
switch (rma_size) {
case 32ul << 20: /* 32 MB */
if (cpu_has_feature(CPU_FTR_ARCH_206))
return 8; /* only supported on POWER7 */
return -1;
case 64ul << 20: /* 64 MB */
return 3;
case 128ul << 20: /* 128 MB */
return 7;
case 256ul << 20: /* 256 MB */
return 4;
case 1ul << 30: /* 1 GB */
return 2;
case 16ul << 30: /* 16 GB */
return 1;
case 256ul << 30: /* 256 GB */
return 0;
default:
return -1;
}
}
/*
* Called at boot time while the bootmem allocator is active,
* to allocate contiguous physical memory for the real memory
* areas for guests.
*/
void kvm_rma_init(void)
{
unsigned long i;
unsigned long j, npages;
void *rma;
struct page *pg;
/* Only do this on PPC970 in HV mode */
if (!cpu_has_feature(CPU_FTR_HVMODE) ||
!cpu_has_feature(CPU_FTR_ARCH_201))
return;
if (!kvm_rma_size || !kvm_rma_count)
return;
/* Check that the requested size is one supported in hardware */
if (lpcr_rmls(kvm_rma_size) < 0) {
pr_err("RMA size of 0x%lx not supported\n", kvm_rma_size);
return;
}
npages = kvm_rma_size >> PAGE_SHIFT;
rma_info = alloc_bootmem(kvm_rma_count * sizeof(struct kvmppc_rma_info));
for (i = 0; i < kvm_rma_count; ++i) {
rma = alloc_bootmem_align(kvm_rma_size, kvm_rma_size);
pr_info("Allocated KVM RMA at %p (%ld MB)\n", rma,
kvm_rma_size >> 20);
rma_info[i].base_virt = rma;
rma_info[i].base_pfn = __pa(rma) >> PAGE_SHIFT;
rma_info[i].npages = npages;
list_add_tail(&rma_info[i].list, &free_rmas);
atomic_set(&rma_info[i].use_count, 0);
pg = pfn_to_page(rma_info[i].base_pfn);
for (j = 0; j < npages; ++j) {
atomic_inc(&pg->_count);
++pg;
}
}
}
struct kvmppc_rma_info *kvm_alloc_rma(void)
{
struct kvmppc_rma_info *ri;
ri = NULL;
spin_lock(&rma_lock);
if (!list_empty(&free_rmas)) {
ri = list_first_entry(&free_rmas, struct kvmppc_rma_info, list);
list_del(&ri->list);
atomic_inc(&ri->use_count);
}
spin_unlock(&rma_lock);
return ri;
}
EXPORT_SYMBOL_GPL(kvm_alloc_rma);
void kvm_release_rma(struct kvmppc_rma_info *ri)
{
if (atomic_dec_and_test(&ri->use_count)) {
spin_lock(&rma_lock);
list_add_tail(&ri->list, &free_rmas);
spin_unlock(&rma_lock);
}
}
EXPORT_SYMBOL_GPL(kvm_release_rma);

View File

@ -0,0 +1,166 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright 2011 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
*
* Derived from book3s_interrupts.S, which is:
* Copyright SUSE Linux Products GmbH 2009
*
* Authors: Alexander Graf <agraf@suse.de>
*/
#include <asm/ppc_asm.h>
#include <asm/kvm_asm.h>
#include <asm/reg.h>
#include <asm/page.h>
#include <asm/asm-offsets.h>
#include <asm/exception-64s.h>
#include <asm/ppc-opcode.h>
/*****************************************************************************
* *
* Guest entry / exit code that is in kernel module memory (vmalloc) *
* *
****************************************************************************/
/* Registers:
* r4: vcpu pointer
*/
_GLOBAL(__kvmppc_vcore_entry)
/* Write correct stack frame */
mflr r0
std r0,PPC_LR_STKOFF(r1)
/* Save host state to the stack */
stdu r1, -SWITCH_FRAME_SIZE(r1)
/* Save non-volatile registers (r14 - r31) */
SAVE_NVGPRS(r1)
/* Save host DSCR */
BEGIN_FTR_SECTION
mfspr r3, SPRN_DSCR
std r3, HSTATE_DSCR(r13)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
/* Save host DABR */
mfspr r3, SPRN_DABR
std r3, HSTATE_DABR(r13)
/* Hard-disable interrupts */
mfmsr r10
std r10, HSTATE_HOST_MSR(r13)
rldicl r10,r10,48,1
rotldi r10,r10,16
mtmsrd r10,1
/* Save host PMU registers and load guest PMU registers */
/* R4 is live here (vcpu pointer) but not r3 or r5 */
li r3, 1
sldi r3, r3, 31 /* MMCR0_FC (freeze counters) bit */
mfspr r7, SPRN_MMCR0 /* save MMCR0 */
mtspr SPRN_MMCR0, r3 /* freeze all counters, disable interrupts */
isync
ld r3, PACALPPACAPTR(r13) /* is the host using the PMU? */
lbz r5, LPPACA_PMCINUSE(r3)
cmpwi r5, 0
beq 31f /* skip if not */
mfspr r5, SPRN_MMCR1
mfspr r6, SPRN_MMCRA
std r7, HSTATE_MMCR(r13)
std r5, HSTATE_MMCR + 8(r13)
std r6, HSTATE_MMCR + 16(r13)
mfspr r3, SPRN_PMC1
mfspr r5, SPRN_PMC2
mfspr r6, SPRN_PMC3
mfspr r7, SPRN_PMC4
mfspr r8, SPRN_PMC5
mfspr r9, SPRN_PMC6
BEGIN_FTR_SECTION
mfspr r10, SPRN_PMC7
mfspr r11, SPRN_PMC8
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
stw r3, HSTATE_PMC(r13)
stw r5, HSTATE_PMC + 4(r13)
stw r6, HSTATE_PMC + 8(r13)
stw r7, HSTATE_PMC + 12(r13)
stw r8, HSTATE_PMC + 16(r13)
stw r9, HSTATE_PMC + 20(r13)
BEGIN_FTR_SECTION
stw r10, HSTATE_PMC + 24(r13)
stw r11, HSTATE_PMC + 28(r13)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
31:
/*
* Put whatever is in the decrementer into the
* hypervisor decrementer.
*/
mfspr r8,SPRN_DEC
mftb r7
mtspr SPRN_HDEC,r8
extsw r8,r8
add r8,r8,r7
std r8,HSTATE_DECEXP(r13)
/*
* On PPC970, if the guest vcpu has an external interrupt pending,
* send ourselves an IPI so as to interrupt the guest once it
* enables interrupts. (It must have interrupts disabled,
* otherwise we would already have delivered the interrupt.)
*/
BEGIN_FTR_SECTION
ld r0, VCPU_PENDING_EXC(r4)
li r7, (1 << BOOK3S_IRQPRIO_EXTERNAL)
oris r7, r7, (1 << BOOK3S_IRQPRIO_EXTERNAL_LEVEL)@h
and. r0, r0, r7
beq 32f
mr r31, r4
lhz r3, PACAPACAINDEX(r13)
bl smp_send_reschedule
nop
mr r4, r31
32:
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
/* Jump to partition switch code */
bl .kvmppc_hv_entry_trampoline
nop
/*
* We return here in virtual mode after the guest exits
* with something that we can't handle in real mode.
* Interrupts are enabled again at this point.
*/
.global kvmppc_handler_highmem
kvmppc_handler_highmem:
/*
* Register usage at this point:
*
* R1 = host R1
* R2 = host R2
* R12 = exit handler id
* R13 = PACA
*/
/* Restore non-volatile host registers (r14 - r31) */
REST_NVGPRS(r1)
addi r1, r1, SWITCH_FRAME_SIZE
ld r0, PPC_LR_STKOFF(r1)
mtlr r0
blr

View File

@ -0,0 +1,370 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* Copyright 2010-2011 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
*/
#include <linux/types.h>
#include <linux/string.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <linux/hugetlb.h>
#include <asm/tlbflush.h>
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
#include <asm/mmu-hash64.h>
#include <asm/hvcall.h>
#include <asm/synch.h>
#include <asm/ppc-opcode.h>
/* For now use fixed-size 16MB page table */
#define HPT_ORDER 24
#define HPT_NPTEG (1ul << (HPT_ORDER - 7)) /* 128B per pteg */
#define HPT_HASH_MASK (HPT_NPTEG - 1)
#define HPTE_V_HVLOCK 0x40UL
static inline long lock_hpte(unsigned long *hpte, unsigned long bits)
{
unsigned long tmp, old;
asm volatile(" ldarx %0,0,%2\n"
" and. %1,%0,%3\n"
" bne 2f\n"
" ori %0,%0,%4\n"
" stdcx. %0,0,%2\n"
" beq+ 2f\n"
" li %1,%3\n"
"2: isync"
: "=&r" (tmp), "=&r" (old)
: "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
: "cc", "memory");
return old == 0;
}
long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
long pte_index, unsigned long pteh, unsigned long ptel)
{
unsigned long porder;
struct kvm *kvm = vcpu->kvm;
unsigned long i, lpn, pa;
unsigned long *hpte;
/* only handle 4k, 64k and 16M pages for now */
porder = 12;
if (pteh & HPTE_V_LARGE) {
if (cpu_has_feature(CPU_FTR_ARCH_206) &&
(ptel & 0xf000) == 0x1000) {
/* 64k page */
porder = 16;
} else if ((ptel & 0xff000) == 0) {
/* 16M page */
porder = 24;
/* lowest AVA bit must be 0 for 16M pages */
if (pteh & 0x80)
return H_PARAMETER;
} else
return H_PARAMETER;
}
lpn = (ptel & HPTE_R_RPN) >> kvm->arch.ram_porder;
if (lpn >= kvm->arch.ram_npages || porder > kvm->arch.ram_porder)
return H_PARAMETER;
pa = kvm->arch.ram_pginfo[lpn].pfn << PAGE_SHIFT;
if (!pa)
return H_PARAMETER;
/* Check WIMG */
if ((ptel & HPTE_R_WIMG) != HPTE_R_M &&
(ptel & HPTE_R_WIMG) != (HPTE_R_W | HPTE_R_I | HPTE_R_M))
return H_PARAMETER;
pteh &= ~0x60UL;
ptel &= ~(HPTE_R_PP0 - kvm->arch.ram_psize);
ptel |= pa;
if (pte_index >= (HPT_NPTEG << 3))
return H_PARAMETER;
if (likely((flags & H_EXACT) == 0)) {
pte_index &= ~7UL;
hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
for (i = 0; ; ++i) {
if (i == 8)
return H_PTEG_FULL;
if ((*hpte & HPTE_V_VALID) == 0 &&
lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID))
break;
hpte += 2;
}
} else {
i = 0;
hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
if (!lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID))
return H_PTEG_FULL;
}
hpte[1] = ptel;
eieio();
hpte[0] = pteh;
asm volatile("ptesync" : : : "memory");
atomic_inc(&kvm->arch.ram_pginfo[lpn].refcnt);
vcpu->arch.gpr[4] = pte_index + i;
return H_SUCCESS;
}
static unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
unsigned long pte_index)
{
unsigned long rb, va_low;
rb = (v & ~0x7fUL) << 16; /* AVA field */
va_low = pte_index >> 3;
if (v & HPTE_V_SECONDARY)
va_low = ~va_low;
/* xor vsid from AVA */
if (!(v & HPTE_V_1TB_SEG))
va_low ^= v >> 12;
else
va_low ^= v >> 24;
va_low &= 0x7ff;
if (v & HPTE_V_LARGE) {
rb |= 1; /* L field */
if (cpu_has_feature(CPU_FTR_ARCH_206) &&
(r & 0xff000)) {
/* non-16MB large page, must be 64k */
/* (masks depend on page size) */
rb |= 0x1000; /* page encoding in LP field */
rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */
rb |= (va_low & 0xfe); /* AVAL field (P7 doesn't seem to care) */
}
} else {
/* 4kB page */
rb |= (va_low & 0x7ff) << 12; /* remaining 11b of VA */
}
rb |= (v >> 54) & 0x300; /* B field */
return rb;
}
#define LOCK_TOKEN (*(u32 *)(&get_paca()->lock_token))
static inline int try_lock_tlbie(unsigned int *lock)
{
unsigned int tmp, old;
unsigned int token = LOCK_TOKEN;
asm volatile("1:lwarx %1,0,%2\n"
" cmpwi cr0,%1,0\n"
" bne 2f\n"
" stwcx. %3,0,%2\n"
" bne- 1b\n"
" isync\n"
"2:"
: "=&r" (tmp), "=&r" (old)
: "r" (lock), "r" (token)
: "cc", "memory");
return old == 0;
}
long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long pte_index, unsigned long avpn,
unsigned long va)
{
struct kvm *kvm = vcpu->kvm;
unsigned long *hpte;
unsigned long v, r, rb;
if (pte_index >= (HPT_NPTEG << 3))
return H_PARAMETER;
hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
while (!lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
if ((hpte[0] & HPTE_V_VALID) == 0 ||
((flags & H_AVPN) && (hpte[0] & ~0x7fUL) != avpn) ||
((flags & H_ANDCOND) && (hpte[0] & avpn) != 0)) {
hpte[0] &= ~HPTE_V_HVLOCK;
return H_NOT_FOUND;
}
if (atomic_read(&kvm->online_vcpus) == 1)
flags |= H_LOCAL;
vcpu->arch.gpr[4] = v = hpte[0] & ~HPTE_V_HVLOCK;
vcpu->arch.gpr[5] = r = hpte[1];
rb = compute_tlbie_rb(v, r, pte_index);
hpte[0] = 0;
if (!(flags & H_LOCAL)) {
while(!try_lock_tlbie(&kvm->arch.tlbie_lock))
cpu_relax();
asm volatile("ptesync" : : : "memory");
asm volatile(PPC_TLBIE(%1,%0)"; eieio; tlbsync"
: : "r" (rb), "r" (kvm->arch.lpid));
asm volatile("ptesync" : : : "memory");
kvm->arch.tlbie_lock = 0;
} else {
asm volatile("ptesync" : : : "memory");
asm volatile("tlbiel %0" : : "r" (rb));
asm volatile("ptesync" : : : "memory");
}
return H_SUCCESS;
}
long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
unsigned long *args = &vcpu->arch.gpr[4];
unsigned long *hp, tlbrb[4];
long int i, found;
long int n_inval = 0;
unsigned long flags, req, pte_index;
long int local = 0;
long int ret = H_SUCCESS;
if (atomic_read(&kvm->online_vcpus) == 1)
local = 1;
for (i = 0; i < 4; ++i) {
pte_index = args[i * 2];
flags = pte_index >> 56;
pte_index &= ((1ul << 56) - 1);
req = flags >> 6;
flags &= 3;
if (req == 3)
break;
if (req != 1 || flags == 3 ||
pte_index >= (HPT_NPTEG << 3)) {
/* parameter error */
args[i * 2] = ((0xa0 | flags) << 56) + pte_index;
ret = H_PARAMETER;
break;
}
hp = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
while (!lock_hpte(hp, HPTE_V_HVLOCK))
cpu_relax();
found = 0;
if (hp[0] & HPTE_V_VALID) {
switch (flags & 3) {
case 0: /* absolute */
found = 1;
break;
case 1: /* andcond */
if (!(hp[0] & args[i * 2 + 1]))
found = 1;
break;
case 2: /* AVPN */
if ((hp[0] & ~0x7fUL) == args[i * 2 + 1])
found = 1;
break;
}
}
if (!found) {
hp[0] &= ~HPTE_V_HVLOCK;
args[i * 2] = ((0x90 | flags) << 56) + pte_index;
continue;
}
/* insert R and C bits from PTE */
flags |= (hp[1] >> 5) & 0x0c;
args[i * 2] = ((0x80 | flags) << 56) + pte_index;
tlbrb[n_inval++] = compute_tlbie_rb(hp[0], hp[1], pte_index);
hp[0] = 0;
}
if (n_inval == 0)
return ret;
if (!local) {
while(!try_lock_tlbie(&kvm->arch.tlbie_lock))
cpu_relax();
asm volatile("ptesync" : : : "memory");
for (i = 0; i < n_inval; ++i)
asm volatile(PPC_TLBIE(%1,%0)
: : "r" (tlbrb[i]), "r" (kvm->arch.lpid));
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
kvm->arch.tlbie_lock = 0;
} else {
asm volatile("ptesync" : : : "memory");
for (i = 0; i < n_inval; ++i)
asm volatile("tlbiel %0" : : "r" (tlbrb[i]));
asm volatile("ptesync" : : : "memory");
}
return ret;
}
long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long pte_index, unsigned long avpn,
unsigned long va)
{
struct kvm *kvm = vcpu->kvm;
unsigned long *hpte;
unsigned long v, r, rb;
if (pte_index >= (HPT_NPTEG << 3))
return H_PARAMETER;
hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
while (!lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
if ((hpte[0] & HPTE_V_VALID) == 0 ||
((flags & H_AVPN) && (hpte[0] & ~0x7fUL) != avpn)) {
hpte[0] &= ~HPTE_V_HVLOCK;
return H_NOT_FOUND;
}
if (atomic_read(&kvm->online_vcpus) == 1)
flags |= H_LOCAL;
v = hpte[0];
r = hpte[1] & ~(HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
HPTE_R_KEY_HI | HPTE_R_KEY_LO);
r |= (flags << 55) & HPTE_R_PP0;
r |= (flags << 48) & HPTE_R_KEY_HI;
r |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
rb = compute_tlbie_rb(v, r, pte_index);
hpte[0] = v & ~HPTE_V_VALID;
if (!(flags & H_LOCAL)) {
while(!try_lock_tlbie(&kvm->arch.tlbie_lock))
cpu_relax();
asm volatile("ptesync" : : : "memory");
asm volatile(PPC_TLBIE(%1,%0)"; eieio; tlbsync"
: : "r" (rb), "r" (kvm->arch.lpid));
asm volatile("ptesync" : : : "memory");
kvm->arch.tlbie_lock = 0;
} else {
asm volatile("ptesync" : : : "memory");
asm volatile("tlbiel %0" : : "r" (rb));
asm volatile("ptesync" : : : "memory");
}
hpte[1] = r;
eieio();
hpte[0] = v & ~HPTE_V_HVLOCK;
asm volatile("ptesync" : : : "memory");
return H_SUCCESS;
}
static unsigned long reverse_xlate(struct kvm *kvm, unsigned long realaddr)
{
long int i;
unsigned long offset, rpn;
offset = realaddr & (kvm->arch.ram_psize - 1);
rpn = (realaddr - offset) >> PAGE_SHIFT;
for (i = 0; i < kvm->arch.ram_npages; ++i)
if (rpn == kvm->arch.ram_pginfo[i].pfn)
return (i << PAGE_SHIFT) + offset;
return HPTE_R_RPN; /* all 1s in the RPN field */
}
long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long pte_index)
{
struct kvm *kvm = vcpu->kvm;
unsigned long *hpte, r;
int i, n = 1;
if (pte_index >= (HPT_NPTEG << 3))
return H_PARAMETER;
if (flags & H_READ_4) {
pte_index &= ~3;
n = 4;
}
for (i = 0; i < n; ++i, ++pte_index) {
hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
r = hpte[1];
if ((flags & H_R_XLATE) && (hpte[0] & HPTE_V_VALID))
r = reverse_xlate(kvm, r & HPTE_R_RPN) |
(r & ~HPTE_R_RPN);
vcpu->arch.gpr[4 + i * 2] = hpte[0];
vcpu->arch.gpr[5 + i * 2] = r;
}
return H_SUCCESS;
}

File diff suppressed because it is too large Load Diff

View File

@ -29,8 +29,7 @@
#define ULONG_SIZE 8
#define FUNC(name) GLUE(.,name)
#define GET_SHADOW_VCPU(reg) \
addi reg, r13, PACA_KVM_SVCPU
#define GET_SHADOW_VCPU_R13
#define DISABLE_INTERRUPTS \
mfmsr r0; \
@ -43,8 +42,8 @@
#define ULONG_SIZE 4
#define FUNC(name) name
#define GET_SHADOW_VCPU(reg) \
lwz reg, (THREAD + THREAD_KVM_SVCPU)(r2)
#define GET_SHADOW_VCPU_R13 \
lwz r13, (THREAD + THREAD_KVM_SVCPU)(r2)
#define DISABLE_INTERRUPTS \
mfmsr r0; \
@ -85,7 +84,7 @@
* r3: kvm_run pointer
* r4: vcpu pointer
*/
_GLOBAL(__kvmppc_vcpu_entry)
_GLOBAL(__kvmppc_vcpu_run)
kvm_start_entry:
/* Write correct stack frame */
@ -107,18 +106,12 @@ kvm_start_entry:
/* Load non-volatile guest state from the vcpu */
VCPU_LOAD_NVGPRS(r4)
GET_SHADOW_VCPU(r5)
/* Save R1/R2 in the PACA */
PPC_STL r1, SVCPU_HOST_R1(r5)
PPC_STL r2, SVCPU_HOST_R2(r5)
/* XXX swap in/out on load? */
PPC_LL r3, VCPU_HIGHMEM_HANDLER(r4)
PPC_STL r3, SVCPU_VMHANDLER(r5)
kvm_start_lightweight:
GET_SHADOW_VCPU_R13
PPC_LL r3, VCPU_HIGHMEM_HANDLER(r4)
PPC_STL r3, HSTATE_VMHANDLER(r13)
PPC_LL r10, VCPU_SHADOW_MSR(r4) /* r10 = vcpu->arch.shadow_msr */
DISABLE_INTERRUPTS

View File

@ -21,7 +21,6 @@
#include <linux/kvm_host.h>
#include <linux/hash.h>
#include <linux/slab.h>
#include "trace.h"
#include <asm/kvm_ppc.h>
#include <asm/kvm_book3s.h>
@ -29,6 +28,8 @@
#include <asm/mmu_context.h>
#include <asm/hw_irq.h>
#include "trace.h"
#define PTE_SIZE 12
static struct kmem_cache *hpte_cache;
@ -58,30 +59,31 @@ static inline u64 kvmppc_mmu_hash_vpte_long(u64 vpage)
void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
{
u64 index;
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
trace_kvm_book3s_mmu_map(pte);
spin_lock(&vcpu->arch.mmu_lock);
spin_lock(&vcpu3s->mmu_lock);
/* Add to ePTE list */
index = kvmppc_mmu_hash_pte(pte->pte.eaddr);
hlist_add_head_rcu(&pte->list_pte, &vcpu->arch.hpte_hash_pte[index]);
hlist_add_head_rcu(&pte->list_pte, &vcpu3s->hpte_hash_pte[index]);
/* Add to ePTE_long list */
index = kvmppc_mmu_hash_pte_long(pte->pte.eaddr);
hlist_add_head_rcu(&pte->list_pte_long,
&vcpu->arch.hpte_hash_pte_long[index]);
&vcpu3s->hpte_hash_pte_long[index]);
/* Add to vPTE list */
index = kvmppc_mmu_hash_vpte(pte->pte.vpage);
hlist_add_head_rcu(&pte->list_vpte, &vcpu->arch.hpte_hash_vpte[index]);
hlist_add_head_rcu(&pte->list_vpte, &vcpu3s->hpte_hash_vpte[index]);
/* Add to vPTE_long list */
index = kvmppc_mmu_hash_vpte_long(pte->pte.vpage);
hlist_add_head_rcu(&pte->list_vpte_long,
&vcpu->arch.hpte_hash_vpte_long[index]);
&vcpu3s->hpte_hash_vpte_long[index]);
spin_unlock(&vcpu->arch.mmu_lock);
spin_unlock(&vcpu3s->mmu_lock);
}
static void free_pte_rcu(struct rcu_head *head)
@ -92,16 +94,18 @@ static void free_pte_rcu(struct rcu_head *head)
static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
trace_kvm_book3s_mmu_invalidate(pte);
/* Different for 32 and 64 bit */
kvmppc_mmu_invalidate_pte(vcpu, pte);
spin_lock(&vcpu->arch.mmu_lock);
spin_lock(&vcpu3s->mmu_lock);
/* pte already invalidated in between? */
if (hlist_unhashed(&pte->list_pte)) {
spin_unlock(&vcpu->arch.mmu_lock);
spin_unlock(&vcpu3s->mmu_lock);
return;
}
@ -115,14 +119,15 @@ static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
else
kvm_release_pfn_clean(pte->pfn);
spin_unlock(&vcpu->arch.mmu_lock);
spin_unlock(&vcpu3s->mmu_lock);
vcpu->arch.hpte_cache_count--;
vcpu3s->hpte_cache_count--;
call_rcu(&pte->rcu_head, free_pte_rcu);
}
static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
struct hpte_cache *pte;
struct hlist_node *node;
int i;
@ -130,7 +135,7 @@ static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu)
rcu_read_lock();
for (i = 0; i < HPTEG_HASH_NUM_VPTE_LONG; i++) {
struct hlist_head *list = &vcpu->arch.hpte_hash_vpte_long[i];
struct hlist_head *list = &vcpu3s->hpte_hash_vpte_long[i];
hlist_for_each_entry_rcu(pte, node, list, list_vpte_long)
invalidate_pte(vcpu, pte);
@ -141,12 +146,13 @@ static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu)
static void kvmppc_mmu_pte_flush_page(struct kvm_vcpu *vcpu, ulong guest_ea)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
struct hlist_head *list;
struct hlist_node *node;
struct hpte_cache *pte;
/* Find the list of entries in the map */
list = &vcpu->arch.hpte_hash_pte[kvmppc_mmu_hash_pte(guest_ea)];
list = &vcpu3s->hpte_hash_pte[kvmppc_mmu_hash_pte(guest_ea)];
rcu_read_lock();
@ -160,12 +166,13 @@ static void kvmppc_mmu_pte_flush_page(struct kvm_vcpu *vcpu, ulong guest_ea)
static void kvmppc_mmu_pte_flush_long(struct kvm_vcpu *vcpu, ulong guest_ea)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
struct hlist_head *list;
struct hlist_node *node;
struct hpte_cache *pte;
/* Find the list of entries in the map */
list = &vcpu->arch.hpte_hash_pte_long[
list = &vcpu3s->hpte_hash_pte_long[
kvmppc_mmu_hash_pte_long(guest_ea)];
rcu_read_lock();
@ -203,12 +210,13 @@ void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask)
/* Flush with mask 0xfffffffff */
static void kvmppc_mmu_pte_vflush_short(struct kvm_vcpu *vcpu, u64 guest_vp)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
struct hlist_head *list;
struct hlist_node *node;
struct hpte_cache *pte;
u64 vp_mask = 0xfffffffffULL;
list = &vcpu->arch.hpte_hash_vpte[kvmppc_mmu_hash_vpte(guest_vp)];
list = &vcpu3s->hpte_hash_vpte[kvmppc_mmu_hash_vpte(guest_vp)];
rcu_read_lock();
@ -223,12 +231,13 @@ static void kvmppc_mmu_pte_vflush_short(struct kvm_vcpu *vcpu, u64 guest_vp)
/* Flush with mask 0xffffff000 */
static void kvmppc_mmu_pte_vflush_long(struct kvm_vcpu *vcpu, u64 guest_vp)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
struct hlist_head *list;
struct hlist_node *node;
struct hpte_cache *pte;
u64 vp_mask = 0xffffff000ULL;
list = &vcpu->arch.hpte_hash_vpte_long[
list = &vcpu3s->hpte_hash_vpte_long[
kvmppc_mmu_hash_vpte_long(guest_vp)];
rcu_read_lock();
@ -261,6 +270,7 @@ void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 guest_vp, u64 vp_mask)
void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
struct hlist_node *node;
struct hpte_cache *pte;
int i;
@ -270,7 +280,7 @@ void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end)
rcu_read_lock();
for (i = 0; i < HPTEG_HASH_NUM_VPTE_LONG; i++) {
struct hlist_head *list = &vcpu->arch.hpte_hash_vpte_long[i];
struct hlist_head *list = &vcpu3s->hpte_hash_vpte_long[i];
hlist_for_each_entry_rcu(pte, node, list, list_vpte_long)
if ((pte->pte.raddr >= pa_start) &&
@ -283,12 +293,13 @@ void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end)
struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
struct hpte_cache *pte;
pte = kmem_cache_zalloc(hpte_cache, GFP_KERNEL);
vcpu->arch.hpte_cache_count++;
vcpu3s->hpte_cache_count++;
if (vcpu->arch.hpte_cache_count == HPTEG_CACHE_NUM)
if (vcpu3s->hpte_cache_count == HPTEG_CACHE_NUM)
kvmppc_mmu_pte_flush_all(vcpu);
return pte;
@ -309,17 +320,19 @@ static void kvmppc_mmu_hpte_init_hash(struct hlist_head *hash_list, int len)
int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu)
{
/* init hpte lookup hashes */
kvmppc_mmu_hpte_init_hash(vcpu->arch.hpte_hash_pte,
ARRAY_SIZE(vcpu->arch.hpte_hash_pte));
kvmppc_mmu_hpte_init_hash(vcpu->arch.hpte_hash_pte_long,
ARRAY_SIZE(vcpu->arch.hpte_hash_pte_long));
kvmppc_mmu_hpte_init_hash(vcpu->arch.hpte_hash_vpte,
ARRAY_SIZE(vcpu->arch.hpte_hash_vpte));
kvmppc_mmu_hpte_init_hash(vcpu->arch.hpte_hash_vpte_long,
ARRAY_SIZE(vcpu->arch.hpte_hash_vpte_long));
struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
spin_lock_init(&vcpu->arch.mmu_lock);
/* init hpte lookup hashes */
kvmppc_mmu_hpte_init_hash(vcpu3s->hpte_hash_pte,
ARRAY_SIZE(vcpu3s->hpte_hash_pte));
kvmppc_mmu_hpte_init_hash(vcpu3s->hpte_hash_pte_long,
ARRAY_SIZE(vcpu3s->hpte_hash_pte_long));
kvmppc_mmu_hpte_init_hash(vcpu3s->hpte_hash_vpte,
ARRAY_SIZE(vcpu3s->hpte_hash_vpte));
kvmppc_mmu_hpte_init_hash(vcpu3s->hpte_hash_vpte_long,
ARRAY_SIZE(vcpu3s->hpte_hash_vpte_long));
spin_lock_init(&vcpu3s->mmu_lock);
return 0;
}

1029
arch/powerpc/kvm/book3s_pr.c Normal file

File diff suppressed because it is too large Load Diff

View File

@ -36,41 +36,44 @@
#if defined(CONFIG_PPC_BOOK3S_64)
#define LOAD_SHADOW_VCPU(reg) GET_PACA(reg)
#define SHADOW_VCPU_OFF PACA_KVM_SVCPU
#define MSR_NOIRQ MSR_KERNEL & ~(MSR_IR | MSR_DR)
#define FUNC(name) GLUE(.,name)
kvmppc_skip_interrupt:
/*
* Here all GPRs are unchanged from when the interrupt happened
* except for r13, which is saved in SPRG_SCRATCH0.
*/
mfspr r13, SPRN_SRR0
addi r13, r13, 4
mtspr SPRN_SRR0, r13
GET_SCRATCH0(r13)
rfid
b .
kvmppc_skip_Hinterrupt:
/*
* Here all GPRs are unchanged from when the interrupt happened
* except for r13, which is saved in SPRG_SCRATCH0.
*/
mfspr r13, SPRN_HSRR0
addi r13, r13, 4
mtspr SPRN_HSRR0, r13
GET_SCRATCH0(r13)
hrfid
b .
#elif defined(CONFIG_PPC_BOOK3S_32)
#define LOAD_SHADOW_VCPU(reg) \
mfspr reg, SPRN_SPRG_THREAD; \
lwz reg, THREAD_KVM_SVCPU(reg); \
/* PPC32 can have a NULL pointer - let's check for that */ \
mtspr SPRN_SPRG_SCRATCH1, r12; /* Save r12 */ \
mfcr r12; \
cmpwi reg, 0; \
bne 1f; \
mfspr reg, SPRN_SPRG_SCRATCH0; \
mtcr r12; \
mfspr r12, SPRN_SPRG_SCRATCH1; \
b kvmppc_resume_\intno; \
1:; \
mtcr r12; \
mfspr r12, SPRN_SPRG_SCRATCH1; \
tophys(reg, reg)
#define SHADOW_VCPU_OFF 0
#define MSR_NOIRQ MSR_KERNEL
#define FUNC(name) name
#endif
.macro INTERRUPT_TRAMPOLINE intno
.global kvmppc_trampoline_\intno
kvmppc_trampoline_\intno:
SET_SCRATCH0(r13) /* Save r13 */
mtspr SPRN_SPRG_SCRATCH0, r13 /* Save r13 */
/*
* First thing to do is to find out if we're coming
@ -78,19 +81,28 @@ kvmppc_trampoline_\intno:
*
* To distinguish, we check a magic byte in the PACA/current
*/
LOAD_SHADOW_VCPU(r13)
PPC_STL r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH0)(r13)
mfspr r13, SPRN_SPRG_THREAD
lwz r13, THREAD_KVM_SVCPU(r13)
/* PPC32 can have a NULL pointer - let's check for that */
mtspr SPRN_SPRG_SCRATCH1, r12 /* Save r12 */
mfcr r12
stw r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH1)(r13)
lbz r12, (SHADOW_VCPU_OFF + SVCPU_IN_GUEST)(r13)
cmpwi r13, 0
bne 1f
2: mtcr r12
mfspr r12, SPRN_SPRG_SCRATCH1
mfspr r13, SPRN_SPRG_SCRATCH0 /* r13 = original r13 */
b kvmppc_resume_\intno /* Get back original handler */
1: tophys(r13, r13)
stw r12, HSTATE_SCRATCH1(r13)
mfspr r12, SPRN_SPRG_SCRATCH1
stw r12, HSTATE_SCRATCH0(r13)
lbz r12, HSTATE_IN_GUEST(r13)
cmpwi r12, KVM_GUEST_MODE_NONE
bne ..kvmppc_handler_hasmagic_\intno
/* No KVM guest? Then jump back to the Linux handler! */
lwz r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH1)(r13)
mtcr r12
PPC_LL r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH0)(r13)
GET_SCRATCH0(r13) /* r13 = original r13 */
b kvmppc_resume_\intno /* Get back original handler */
lwz r12, HSTATE_SCRATCH1(r13)
b 2b
/* Now we know we're handling a KVM guest */
..kvmppc_handler_hasmagic_\intno:
@ -112,9 +124,6 @@ INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_MACHINE_CHECK
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_DATA_STORAGE
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_INST_STORAGE
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_EXTERNAL
#ifdef CONFIG_PPC_BOOK3S_64
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_EXTERNAL_HV
#endif
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_ALIGNMENT
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_PROGRAM
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_FP_UNAVAIL
@ -124,14 +133,6 @@ INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_TRACE
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_PERFMON
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_ALTIVEC
/* Those are only available on 64 bit machines */
#ifdef CONFIG_PPC_BOOK3S_64
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_DATA_SEGMENT
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_INST_SEGMENT
INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_VSX
#endif
/*
* Bring us back to the faulting code, but skip the
* faulting instruction.
@ -143,8 +144,8 @@ INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_VSX
*
* R12 = free
* R13 = Shadow VCPU (PACA)
* SVCPU.SCRATCH0 = guest R12
* SVCPU.SCRATCH1 = guest CR
* HSTATE.SCRATCH0 = guest R12
* HSTATE.SCRATCH1 = guest CR
* SPRG_SCRATCH0 = guest R13
*
*/
@ -156,13 +157,14 @@ kvmppc_handler_skip_ins:
mtsrr0 r12
/* Clean up all state */
lwz r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH1)(r13)
lwz r12, HSTATE_SCRATCH1(r13)
mtcr r12
PPC_LL r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH0)(r13)
PPC_LL r12, HSTATE_SCRATCH0(r13)
GET_SCRATCH0(r13)
/* And get back into the code */
RFI
#endif
/*
* This trampoline brings us back to a real mode handler
@ -251,12 +253,4 @@ define_load_up(altivec)
define_load_up(vsx)
#endif
.global kvmppc_trampoline_lowmem
kvmppc_trampoline_lowmem:
PPC_LONG kvmppc_handler_lowmem_trampoline - CONFIG_KERNEL_START
.global kvmppc_trampoline_enter
kvmppc_trampoline_enter:
PPC_LONG kvmppc_handler_trampoline_enter - CONFIG_KERNEL_START
#include "book3s_segment.S"

View File

@ -22,7 +22,7 @@
#if defined(CONFIG_PPC_BOOK3S_64)
#define GET_SHADOW_VCPU(reg) \
addi reg, r13, PACA_KVM_SVCPU
mr reg, r13
#elif defined(CONFIG_PPC_BOOK3S_32)
@ -71,6 +71,10 @@ kvmppc_handler_trampoline_enter:
/* r3 = shadow vcpu */
GET_SHADOW_VCPU(r3)
/* Save R1/R2 in the PACA (64-bit) or shadow_vcpu (32-bit) */
PPC_STL r1, HSTATE_HOST_R1(r3)
PPC_STL r2, HSTATE_HOST_R2(r3)
/* Move SRR0 and SRR1 into the respective regs */
PPC_LL r9, SVCPU_PC(r3)
mtsrr0 r9
@ -78,36 +82,36 @@ kvmppc_handler_trampoline_enter:
/* Activate guest mode, so faults get handled by KVM */
li r11, KVM_GUEST_MODE_GUEST
stb r11, SVCPU_IN_GUEST(r3)
stb r11, HSTATE_IN_GUEST(r3)
/* Switch to guest segment. This is subarch specific. */
LOAD_GUEST_SEGMENTS
/* Enter guest */
PPC_LL r4, (SVCPU_CTR)(r3)
PPC_LL r5, (SVCPU_LR)(r3)
lwz r6, (SVCPU_CR)(r3)
lwz r7, (SVCPU_XER)(r3)
PPC_LL r4, SVCPU_CTR(r3)
PPC_LL r5, SVCPU_LR(r3)
lwz r6, SVCPU_CR(r3)
lwz r7, SVCPU_XER(r3)
mtctr r4
mtlr r5
mtcr r6
mtxer r7
PPC_LL r0, (SVCPU_R0)(r3)
PPC_LL r1, (SVCPU_R1)(r3)
PPC_LL r2, (SVCPU_R2)(r3)
PPC_LL r4, (SVCPU_R4)(r3)
PPC_LL r5, (SVCPU_R5)(r3)
PPC_LL r6, (SVCPU_R6)(r3)
PPC_LL r7, (SVCPU_R7)(r3)
PPC_LL r8, (SVCPU_R8)(r3)
PPC_LL r9, (SVCPU_R9)(r3)
PPC_LL r10, (SVCPU_R10)(r3)
PPC_LL r11, (SVCPU_R11)(r3)
PPC_LL r12, (SVCPU_R12)(r3)
PPC_LL r13, (SVCPU_R13)(r3)
PPC_LL r0, SVCPU_R0(r3)
PPC_LL r1, SVCPU_R1(r3)
PPC_LL r2, SVCPU_R2(r3)
PPC_LL r4, SVCPU_R4(r3)
PPC_LL r5, SVCPU_R5(r3)
PPC_LL r6, SVCPU_R6(r3)
PPC_LL r7, SVCPU_R7(r3)
PPC_LL r8, SVCPU_R8(r3)
PPC_LL r9, SVCPU_R9(r3)
PPC_LL r10, SVCPU_R10(r3)
PPC_LL r11, SVCPU_R11(r3)
PPC_LL r12, SVCPU_R12(r3)
PPC_LL r13, SVCPU_R13(r3)
PPC_LL r3, (SVCPU_R3)(r3)
@ -125,56 +129,63 @@ kvmppc_handler_trampoline_enter_end:
.global kvmppc_handler_trampoline_exit
kvmppc_handler_trampoline_exit:
.global kvmppc_interrupt
kvmppc_interrupt:
/* Register usage at this point:
*
* SPRG_SCRATCH0 = guest R13
* R12 = exit handler id
* R13 = shadow vcpu - SHADOW_VCPU_OFF [=PACA on PPC64]
* SVCPU.SCRATCH0 = guest R12
* SVCPU.SCRATCH1 = guest CR
* R13 = shadow vcpu (32-bit) or PACA (64-bit)
* HSTATE.SCRATCH0 = guest R12
* HSTATE.SCRATCH1 = guest CR
*
*/
/* Save registers */
PPC_STL r0, (SHADOW_VCPU_OFF + SVCPU_R0)(r13)
PPC_STL r1, (SHADOW_VCPU_OFF + SVCPU_R1)(r13)
PPC_STL r2, (SHADOW_VCPU_OFF + SVCPU_R2)(r13)
PPC_STL r3, (SHADOW_VCPU_OFF + SVCPU_R3)(r13)
PPC_STL r4, (SHADOW_VCPU_OFF + SVCPU_R4)(r13)
PPC_STL r5, (SHADOW_VCPU_OFF + SVCPU_R5)(r13)
PPC_STL r6, (SHADOW_VCPU_OFF + SVCPU_R6)(r13)
PPC_STL r7, (SHADOW_VCPU_OFF + SVCPU_R7)(r13)
PPC_STL r8, (SHADOW_VCPU_OFF + SVCPU_R8)(r13)
PPC_STL r9, (SHADOW_VCPU_OFF + SVCPU_R9)(r13)
PPC_STL r10, (SHADOW_VCPU_OFF + SVCPU_R10)(r13)
PPC_STL r11, (SHADOW_VCPU_OFF + SVCPU_R11)(r13)
PPC_STL r0, SVCPU_R0(r13)
PPC_STL r1, SVCPU_R1(r13)
PPC_STL r2, SVCPU_R2(r13)
PPC_STL r3, SVCPU_R3(r13)
PPC_STL r4, SVCPU_R4(r13)
PPC_STL r5, SVCPU_R5(r13)
PPC_STL r6, SVCPU_R6(r13)
PPC_STL r7, SVCPU_R7(r13)
PPC_STL r8, SVCPU_R8(r13)
PPC_STL r9, SVCPU_R9(r13)
PPC_STL r10, SVCPU_R10(r13)
PPC_STL r11, SVCPU_R11(r13)
/* Restore R1/R2 so we can handle faults */
PPC_LL r1, (SHADOW_VCPU_OFF + SVCPU_HOST_R1)(r13)
PPC_LL r2, (SHADOW_VCPU_OFF + SVCPU_HOST_R2)(r13)
PPC_LL r1, HSTATE_HOST_R1(r13)
PPC_LL r2, HSTATE_HOST_R2(r13)
/* Save guest PC and MSR */
#ifdef CONFIG_PPC64
BEGIN_FTR_SECTION
andi. r0,r12,0x2
beq 1f
mfspr r3,SPRN_HSRR0
mfspr r4,SPRN_HSRR1
andi. r12,r12,0x3ffd
b 2f
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
#endif
1: mfsrr0 r3
mfsrr1 r4
2:
PPC_STL r3, (SHADOW_VCPU_OFF + SVCPU_PC)(r13)
PPC_STL r4, (SHADOW_VCPU_OFF + SVCPU_SHADOW_SRR1)(r13)
PPC_STL r3, SVCPU_PC(r13)
PPC_STL r4, SVCPU_SHADOW_SRR1(r13)
/* Get scratch'ed off registers */
GET_SCRATCH0(r9)
PPC_LL r8, (SHADOW_VCPU_OFF + SVCPU_SCRATCH0)(r13)
lwz r7, (SHADOW_VCPU_OFF + SVCPU_SCRATCH1)(r13)
PPC_LL r8, HSTATE_SCRATCH0(r13)
lwz r7, HSTATE_SCRATCH1(r13)
PPC_STL r9, (SHADOW_VCPU_OFF + SVCPU_R13)(r13)
PPC_STL r8, (SHADOW_VCPU_OFF + SVCPU_R12)(r13)
stw r7, (SHADOW_VCPU_OFF + SVCPU_CR)(r13)
PPC_STL r9, SVCPU_R13(r13)
PPC_STL r8, SVCPU_R12(r13)
stw r7, SVCPU_CR(r13)
/* Save more register state */
@ -184,11 +195,11 @@ kvmppc_handler_trampoline_exit:
mfctr r8
mflr r9
stw r5, (SHADOW_VCPU_OFF + SVCPU_XER)(r13)
PPC_STL r6, (SHADOW_VCPU_OFF + SVCPU_FAULT_DAR)(r13)
stw r7, (SHADOW_VCPU_OFF + SVCPU_FAULT_DSISR)(r13)
PPC_STL r8, (SHADOW_VCPU_OFF + SVCPU_CTR)(r13)
PPC_STL r9, (SHADOW_VCPU_OFF + SVCPU_LR)(r13)
stw r5, SVCPU_XER(r13)
PPC_STL r6, SVCPU_FAULT_DAR(r13)
stw r7, SVCPU_FAULT_DSISR(r13)
PPC_STL r8, SVCPU_CTR(r13)
PPC_STL r9, SVCPU_LR(r13)
/*
* In order for us to easily get the last instruction,
@ -218,7 +229,7 @@ ld_last_inst:
/* Set guest mode to 'jump over instruction' so if lwz faults
* we'll just continue at the next IP. */
li r9, KVM_GUEST_MODE_SKIP
stb r9, (SHADOW_VCPU_OFF + SVCPU_IN_GUEST)(r13)
stb r9, HSTATE_IN_GUEST(r13)
/* 1) enable paging for data */
mfmsr r9
@ -232,13 +243,13 @@ ld_last_inst:
sync
#endif
stw r0, (SHADOW_VCPU_OFF + SVCPU_LAST_INST)(r13)
stw r0, SVCPU_LAST_INST(r13)
no_ld_last_inst:
/* Unset guest mode */
li r9, KVM_GUEST_MODE_NONE
stb r9, (SHADOW_VCPU_OFF + SVCPU_IN_GUEST)(r13)
stb r9, HSTATE_IN_GUEST(r13)
/* Switch back to host MMU */
LOAD_HOST_SEGMENTS
@ -248,7 +259,7 @@ no_ld_last_inst:
* R1 = host R1
* R2 = host R2
* R12 = exit handler id
* R13 = shadow vcpu - SHADOW_VCPU_OFF [=PACA on PPC64]
* R13 = shadow vcpu (32-bit) or PACA (64-bit)
* SVCPU.* = guest *
*
*/
@ -258,7 +269,7 @@ no_ld_last_inst:
ori r7, r7, MSR_IR|MSR_DR|MSR_RI|MSR_ME /* Enable paging */
mtsrr1 r7
/* Load highmem handler address */
PPC_LL r8, (SHADOW_VCPU_OFF + SVCPU_VMHANDLER)(r13)
PPC_LL r8, HSTATE_VMHANDLER(r13)
mtsrr0 r8
RFI

View File

@ -13,6 +13,7 @@
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
* Copyright 2010-2011 Freescale Semiconductor, Inc.
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
* Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
@ -78,6 +79,60 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu)
}
}
#ifdef CONFIG_SPE
void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu)
{
preempt_disable();
enable_kernel_spe();
kvmppc_save_guest_spe(vcpu);
vcpu->arch.shadow_msr &= ~MSR_SPE;
preempt_enable();
}
static void kvmppc_vcpu_enable_spe(struct kvm_vcpu *vcpu)
{
preempt_disable();
enable_kernel_spe();
kvmppc_load_guest_spe(vcpu);
vcpu->arch.shadow_msr |= MSR_SPE;
preempt_enable();
}
static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.shared->msr & MSR_SPE) {
if (!(vcpu->arch.shadow_msr & MSR_SPE))
kvmppc_vcpu_enable_spe(vcpu);
} else if (vcpu->arch.shadow_msr & MSR_SPE) {
kvmppc_vcpu_disable_spe(vcpu);
}
}
#else
static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu)
{
}
#endif
/*
* Helper function for "full" MSR writes. No need to call this if only
* EE/CE/ME/DE/RI are changing.
*/
void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
{
u32 old_msr = vcpu->arch.shared->msr;
vcpu->arch.shared->msr = new_msr;
kvmppc_mmu_msr_notify(vcpu, old_msr);
if (vcpu->arch.shared->msr & MSR_WE) {
kvm_vcpu_block(vcpu);
kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
};
kvmppc_vcpu_sync_spe(vcpu);
}
static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
unsigned int priority)
{
@ -257,6 +312,19 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
vcpu->arch.shared->int_pending = 0;
}
int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
{
int ret;
local_irq_disable();
kvm_guest_enter();
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
kvm_guest_exit();
local_irq_enable();
return ret;
}
/**
* kvmppc_handle_exit
*
@ -344,10 +412,16 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
case BOOKE_INTERRUPT_SPE_UNAVAIL:
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_UNAVAIL);
#ifdef CONFIG_SPE
case BOOKE_INTERRUPT_SPE_UNAVAIL: {
if (vcpu->arch.shared->msr & MSR_SPE)
kvmppc_vcpu_enable_spe(vcpu);
else
kvmppc_booke_queue_irqprio(vcpu,
BOOKE_IRQPRIO_SPE_UNAVAIL);
r = RESUME_GUEST;
break;
}
case BOOKE_INTERRUPT_SPE_FP_DATA:
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_DATA);
@ -358,6 +432,28 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_ROUND);
r = RESUME_GUEST;
break;
#else
case BOOKE_INTERRUPT_SPE_UNAVAIL:
/*
* Guest wants SPE, but host kernel doesn't support it. Send
* an "unimplemented operation" program check to the guest.
*/
kvmppc_core_queue_program(vcpu, ESR_PUO | ESR_SPV);
r = RESUME_GUEST;
break;
/*
* These really should never happen without CONFIG_SPE,
* as we should never enable the real MSR[SPE] in the guest.
*/
case BOOKE_INTERRUPT_SPE_FP_DATA:
case BOOKE_INTERRUPT_SPE_FP_ROUND:
printk(KERN_CRIT "%s: unexpected SPE interrupt %u at %08lx\n",
__func__, exit_nr, vcpu->arch.pc);
run->hw.hardware_exit_reason = exit_nr;
r = RESUME_HOST;
break;
#endif
case BOOKE_INTERRUPT_DATA_STORAGE:
kvmppc_core_queue_data_storage(vcpu, vcpu->arch.fault_dear,
@ -392,6 +488,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
gpa_t gpaddr;
gfn_t gfn;
#ifdef CONFIG_KVM_E500
if (!(vcpu->arch.shared->msr & MSR_PR) &&
(eaddr & PAGE_MASK) == vcpu->arch.magic_page_ea) {
kvmppc_map_magic(vcpu);
kvmppc_account_exit(vcpu, DTLB_VIRT_MISS_EXITS);
r = RESUME_GUEST;
break;
}
#endif
/* Check the guest TLB. */
gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr);
if (gtlb_index < 0) {
@ -514,6 +621,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
vcpu->arch.pc = 0;
vcpu->arch.shared->msr = 0;
vcpu->arch.shadow_msr = MSR_USER | MSR_DE | MSR_IS | MSR_DS;
kvmppc_set_gpr(vcpu, 1, (16<<20) - 8); /* -8 for the callee-save LR slot */
vcpu->arch.shadow_pid = 1;
@ -770,6 +878,26 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
return -ENOTSUPP;
}
int kvmppc_core_prepare_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem)
{
return 0;
}
void kvmppc_core_commit_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem)
{
}
int kvmppc_core_init_vm(struct kvm *kvm)
{
return 0;
}
void kvmppc_core_destroy_vm(struct kvm *kvm)
{
}
int __init kvmppc_booke_init(void)
{
unsigned long ivor[16];

View File

@ -52,24 +52,19 @@
extern unsigned long kvmppc_booke_handlers;
/* Helper function for "full" MSR writes. No need to call this if only EE is
* changing. */
static inline void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
{
if ((new_msr & MSR_PR) != (vcpu->arch.shared->msr & MSR_PR))
kvmppc_mmu_priv_switch(vcpu, new_msr & MSR_PR);
vcpu->arch.shared->msr = new_msr;
if (vcpu->arch.shared->msr & MSR_WE) {
kvm_vcpu_block(vcpu);
kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
};
}
void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr);
void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 old_msr);
int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int inst, int *advance);
int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt);
int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs);
/* low-level asm code to transfer guest state */
void kvmppc_load_guest_spe(struct kvm_vcpu *vcpu);
void kvmppc_save_guest_spe(struct kvm_vcpu *vcpu);
/* high-level function, manages flags, host state */
void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
#endif /* __KVM_BOOKE_H__ */

View File

@ -13,6 +13,7 @@
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
* Copyright 2011 Freescale Semiconductor, Inc.
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/
@ -24,8 +25,6 @@
#include <asm/page.h>
#include <asm/asm-offsets.h>
#define KVMPPC_MSR_MASK (MSR_CE|MSR_EE|MSR_PR|MSR_DE|MSR_ME|MSR_IS|MSR_DS)
#define VCPU_GPR(n) (VCPU_GPRS + (n * 4))
/* The host stack layout: */
@ -192,6 +191,12 @@ _GLOBAL(kvmppc_resume_host)
lwz r3, VCPU_HOST_PID(r4)
mtspr SPRN_PID, r3
#ifdef CONFIG_FSL_BOOKE
/* we cheat and know that Linux doesn't use PID1 which is always 0 */
lis r3, 0
mtspr SPRN_PID1, r3
#endif
/* Restore host IVPR before re-enabling interrupts. We cheat and know
* that Linux IVPR is always 0xc0000000. */
lis r3, 0xc000
@ -241,6 +246,14 @@ _GLOBAL(kvmppc_resume_host)
heavyweight_exit:
/* Not returning to guest. */
#ifdef CONFIG_SPE
/* save guest SPEFSCR and load host SPEFSCR */
mfspr r9, SPRN_SPEFSCR
stw r9, VCPU_SPEFSCR(r4)
lwz r9, VCPU_HOST_SPEFSCR(r4)
mtspr SPRN_SPEFSCR, r9
#endif
/* We already saved guest volatile register state; now save the
* non-volatiles. */
stw r15, VCPU_GPR(r15)(r4)
@ -342,6 +355,14 @@ _GLOBAL(__kvmppc_vcpu_run)
lwz r30, VCPU_GPR(r30)(r4)
lwz r31, VCPU_GPR(r31)(r4)
#ifdef CONFIG_SPE
/* save host SPEFSCR and load guest SPEFSCR */
mfspr r3, SPRN_SPEFSCR
stw r3, VCPU_HOST_SPEFSCR(r4)
lwz r3, VCPU_SPEFSCR(r4)
mtspr SPRN_SPEFSCR, r3
#endif
lightweight_exit:
stw r2, HOST_R2(r1)
@ -350,6 +371,11 @@ lightweight_exit:
lwz r3, VCPU_SHADOW_PID(r4)
mtspr SPRN_PID, r3
#ifdef CONFIG_FSL_BOOKE
lwz r3, VCPU_SHADOW_PID1(r4)
mtspr SPRN_PID1, r3
#endif
#ifdef CONFIG_44x
iccci 0, 0 /* XXX hack */
#endif
@ -405,20 +431,17 @@ lightweight_exit:
/* Finish loading guest volatiles and jump to guest. */
lwz r3, VCPU_CTR(r4)
lwz r5, VCPU_CR(r4)
lwz r6, VCPU_PC(r4)
lwz r7, VCPU_SHADOW_MSR(r4)
mtctr r3
lwz r3, VCPU_CR(r4)
mtcr r3
mtcr r5
mtsrr0 r6
mtsrr1 r7
lwz r5, VCPU_GPR(r5)(r4)
lwz r6, VCPU_GPR(r6)(r4)
lwz r7, VCPU_GPR(r7)(r4)
lwz r8, VCPU_GPR(r8)(r4)
lwz r3, VCPU_PC(r4)
mtsrr0 r3
lwz r3, VCPU_SHARED(r4)
lwz r3, (VCPU_SHARED_MSR + 4)(r3)
oris r3, r3, KVMPPC_MSR_MASK@h
ori r3, r3, KVMPPC_MSR_MASK@l
mtsrr1 r3
/* Clear any debug events which occurred since we disabled MSR[DE].
* XXX This gives us a 3-instruction window in which a breakpoint
@ -430,3 +453,24 @@ lightweight_exit:
lwz r3, VCPU_GPR(r3)(r4)
lwz r4, VCPU_GPR(r4)(r4)
rfi
#ifdef CONFIG_SPE
_GLOBAL(kvmppc_save_guest_spe)
cmpi 0,r3,0
beqlr-
SAVE_32EVRS(0, r4, r3, VCPU_EVR)
evxor evr6, evr6, evr6
evmwumiaa evr6, evr6, evr6
li r4,VCPU_ACC
evstddx evr6, r4, r3 /* save acc */
blr
_GLOBAL(kvmppc_load_guest_spe)
cmpi 0,r3,0
beqlr-
li r4,VCPU_ACC
evlddx evr6,r4,r3
evmra evr6,evr6 /* load acc */
REST_32EVRS(0, r4, r3, VCPU_EVR)
blr
#endif

View File

@ -1,5 +1,5 @@
/*
* Copyright (C) 2008 Freescale Semiconductor, Inc. All rights reserved.
* Copyright (C) 2008-2011 Freescale Semiconductor, Inc. All rights reserved.
*
* Author: Yu Liu, <yu.liu@freescale.com>
*
@ -41,6 +41,11 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
{
kvmppc_e500_tlb_put(vcpu);
#ifdef CONFIG_SPE
if (vcpu->arch.shadow_msr & MSR_SPE)
kvmppc_vcpu_disable_spe(vcpu);
#endif
}
int kvmppc_core_check_processor_compat(void)

View File

@ -81,8 +81,12 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
kvmppc_set_pid(vcpu, spr_val);
break;
case SPRN_PID1:
if (spr_val != 0)
return EMULATE_FAIL;
vcpu_e500->pid[1] = spr_val; break;
case SPRN_PID2:
if (spr_val != 0)
return EMULATE_FAIL;
vcpu_e500->pid[2] = spr_val; break;
case SPRN_MAS0:
vcpu_e500->mas0 = spr_val; break;

File diff suppressed because it is too large Load Diff

View File

@ -1,5 +1,5 @@
/*
* Copyright (C) 2008 Freescale Semiconductor, Inc. All rights reserved.
* Copyright (C) 2008-2011 Freescale Semiconductor, Inc. All rights reserved.
*
* Author: Yu Liu, yu.liu@freescale.com
*
@ -55,6 +55,7 @@ extern void kvmppc_e500_tlb_load(struct kvm_vcpu *, int);
extern int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *);
extern void kvmppc_e500_tlb_uninit(struct kvmppc_vcpu_e500 *);
extern void kvmppc_e500_tlb_setup(struct kvmppc_vcpu_e500 *);
extern void kvmppc_e500_recalc_shadow_pid(struct kvmppc_vcpu_e500 *);
/* TLB helper functions */
static inline unsigned int get_tlb_size(const struct tlbe *tlbe)
@ -110,6 +111,16 @@ static inline unsigned int get_cur_pid(struct kvm_vcpu *vcpu)
return vcpu->arch.pid & 0xff;
}
static inline unsigned int get_cur_as(struct kvm_vcpu *vcpu)
{
return !!(vcpu->arch.shared->msr & (MSR_IS | MSR_DS));
}
static inline unsigned int get_cur_pr(struct kvm_vcpu *vcpu)
{
return !!(vcpu->arch.shared->msr & MSR_PR);
}
static inline unsigned int get_cur_spid(
const struct kvmppc_vcpu_e500 *vcpu_e500)
{

View File

@ -30,6 +30,7 @@
#include <asm/uaccess.h>
#include <asm/kvm_ppc.h>
#include <asm/tlbflush.h>
#include <asm/cputhreads.h>
#include "timing.h"
#include "../mm/mmu_decl.h"
@ -38,8 +39,12 @@
int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
{
#ifndef CONFIG_KVM_BOOK3S_64_HV
return !(v->arch.shared->msr & MSR_WE) ||
!!(v->arch.pending_exceptions);
#else
return !(v->arch.ceded) || !!(v->arch.pending_exceptions);
#endif
}
int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
@ -73,7 +78,8 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
}
case HC_VENDOR_KVM | KVM_HC_FEATURES:
r = HC_EV_SUCCESS;
#if defined(CONFIG_PPC_BOOK3S) /* XXX Missing magic page on BookE */
#if defined(CONFIG_PPC_BOOK3S) || defined(CONFIG_KVM_E500)
/* XXX Missing magic page on 44x */
r2 |= (1 << KVM_FEATURE_MAGIC_PAGE);
#endif
@ -147,7 +153,7 @@ void kvm_arch_check_processor_compat(void *rtn)
int kvm_arch_init_vm(struct kvm *kvm)
{
return 0;
return kvmppc_core_init_vm(kvm);
}
void kvm_arch_destroy_vm(struct kvm *kvm)
@ -163,6 +169,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm->vcpus[i] = NULL;
atomic_set(&kvm->online_vcpus, 0);
kvmppc_core_destroy_vm(kvm);
mutex_unlock(&kvm->lock);
}
@ -180,10 +189,13 @@ int kvm_dev_ioctl_check_extension(long ext)
#else
case KVM_CAP_PPC_SEGSTATE:
#endif
case KVM_CAP_PPC_PAIRED_SINGLES:
case KVM_CAP_PPC_UNSET_IRQ:
case KVM_CAP_PPC_IRQ_LEVEL:
case KVM_CAP_ENABLE_CAP:
r = 1;
break;
#ifndef CONFIG_KVM_BOOK3S_64_HV
case KVM_CAP_PPC_PAIRED_SINGLES:
case KVM_CAP_PPC_OSI:
case KVM_CAP_PPC_GET_PVINFO:
r = 1;
@ -191,6 +203,21 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_COALESCED_MMIO:
r = KVM_COALESCED_MMIO_PAGE_OFFSET;
break;
#endif
#ifdef CONFIG_KVM_BOOK3S_64_HV
case KVM_CAP_SPAPR_TCE:
r = 1;
break;
case KVM_CAP_PPC_SMT:
r = threads_per_core;
break;
case KVM_CAP_PPC_RMA:
r = 1;
/* PPC970 requires an RMA */
if (cpu_has_feature(CPU_FTR_ARCH_201))
r = 2;
break;
#endif
default:
r = 0;
break;
@ -211,7 +238,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
int user_alloc)
{
return 0;
return kvmppc_core_prepare_memory_region(kvm, mem);
}
void kvm_arch_commit_memory_region(struct kvm *kvm,
@ -219,7 +246,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
struct kvm_memory_slot old,
int user_alloc)
{
return;
kvmppc_core_commit_memory_region(kvm, mem);
}
@ -287,6 +314,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
tasklet_init(&vcpu->arch.tasklet, kvmppc_decrementer_func, (ulong)vcpu);
vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
vcpu->arch.dec_expires = ~(u64)0;
#ifdef CONFIG_KVM_EXIT_TIMING
mutex_init(&vcpu->arch.exit_timing_lock);
@ -313,6 +341,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
mtspr(SPRN_VRSAVE, vcpu->arch.vrsave);
#endif
kvmppc_core_vcpu_load(vcpu, cpu);
vcpu->cpu = smp_processor_id();
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@ -321,6 +350,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
#ifdef CONFIG_BOOKE
vcpu->arch.vrsave = mfspr(SPRN_VRSAVE);
#endif
vcpu->cpu = -1;
}
int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
@ -492,15 +522,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
for (i = 0; i < 32; i++)
kvmppc_set_gpr(vcpu, i, gprs[i]);
vcpu->arch.osi_needed = 0;
} else if (vcpu->arch.hcall_needed) {
int i;
kvmppc_set_gpr(vcpu, 3, run->papr_hcall.ret);
for (i = 0; i < 9; ++i)
kvmppc_set_gpr(vcpu, 4 + i, run->papr_hcall.args[i]);
vcpu->arch.hcall_needed = 0;
}
kvmppc_core_deliver_interrupts(vcpu);
local_irq_disable();
kvm_guest_enter();
r = __kvmppc_vcpu_run(run, vcpu);
kvm_guest_exit();
local_irq_enable();
r = kvmppc_vcpu_run(run, vcpu);
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &sigsaved, NULL);
@ -518,6 +551,8 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
if (waitqueue_active(&vcpu->wq)) {
wake_up_interruptible(&vcpu->wq);
vcpu->stat.halt_wakeup++;
} else if (vcpu->cpu != -1) {
smp_send_reschedule(vcpu->cpu);
}
return 0;
@ -633,6 +668,29 @@ long kvm_arch_vm_ioctl(struct file *filp,
break;
}
#ifdef CONFIG_KVM_BOOK3S_64_HV
case KVM_CREATE_SPAPR_TCE: {
struct kvm_create_spapr_tce create_tce;
struct kvm *kvm = filp->private_data;
r = -EFAULT;
if (copy_from_user(&create_tce, argp, sizeof(create_tce)))
goto out;
r = kvm_vm_ioctl_create_spapr_tce(kvm, &create_tce);
goto out;
}
case KVM_ALLOCATE_RMA: {
struct kvm *kvm = filp->private_data;
struct kvm_allocate_rma rma;
r = kvm_vm_ioctl_allocate_rma(kvm, &rma);
if (r >= 0 && copy_to_user(argp, &rma, sizeof(rma)))
r = -EFAULT;
break;
}
#endif /* CONFIG_KVM_BOOK3S_64_HV */
default:
r = -ENOTTY;
}

View File

@ -56,15 +56,6 @@ static void add_exit_timing(struct kvm_vcpu *vcpu, u64 duration, int type)
{
u64 old;
do_div(duration, tb_ticks_per_usec);
if (unlikely(duration > 0xFFFFFFFF)) {
printk(KERN_ERR"%s - duration too big -> overflow"
" duration %lld type %d exit #%d\n",
__func__, duration, type,
vcpu->arch.timing_count_type[type]);
return;
}
mutex_lock(&vcpu->arch.exit_timing_lock);
vcpu->arch.timing_count_type[type]++;

View File

@ -103,7 +103,7 @@ TRACE_EVENT(kvm_gtlb_write,
* Book3S trace points *
*************************************************************************/
#ifdef CONFIG_PPC_BOOK3S
#ifdef CONFIG_KVM_BOOK3S_PR
TRACE_EVENT(kvm_book3s_exit,
TP_PROTO(unsigned int exit_nr, struct kvm_vcpu *vcpu),
@ -252,7 +252,7 @@ TRACE_EVENT(kvm_book3s_mmu_flush,
),
TP_fast_assign(
__entry->count = vcpu->arch.hpte_cache_count;
__entry->count = to_book3s(vcpu)->hpte_cache_count;
__entry->p1 = p1;
__entry->p2 = p2;
__entry->type = type;

View File

@ -37,7 +37,7 @@
#define HPTE_LOCK_BIT 3
static DEFINE_RAW_SPINLOCK(native_tlbie_lock);
DEFINE_RAW_SPINLOCK(native_tlbie_lock);
static inline void __tlbie(unsigned long va, int psize, int ssize)
{
@ -51,7 +51,7 @@ static inline void __tlbie(unsigned long va, int psize, int ssize)
va &= ~0xffful;
va |= ssize << 8;
asm volatile(ASM_FTR_IFCLR("tlbie %0,0", PPC_TLBIE(%1,%0), %2)
: : "r" (va), "r"(0), "i" (CPU_FTR_HVMODE_206)
: : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
: "memory");
break;
default:
@ -61,7 +61,7 @@ static inline void __tlbie(unsigned long va, int psize, int ssize)
va |= ssize << 8;
va |= 1; /* L */
asm volatile(ASM_FTR_IFCLR("tlbie %0,1", PPC_TLBIE(%1,%0), %2)
: : "r" (va), "r"(0), "i" (CPU_FTR_HVMODE_206)
: : "r" (va), "r"(0), "i" (CPU_FTR_ARCH_206)
: "memory");
break;
}

View File

@ -167,7 +167,7 @@ BEGIN_FTR_SECTION
std r12,PACA_EXGEN+EX_R13(r13)
EXCEPTION_PROLOG_ISERIES_1
FTR_SECTION_ELSE
EXCEPTION_PROLOG_1(PACA_EXGEN)
EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0)
EXCEPTION_PROLOG_ISERIES_1
ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_SLB)
b data_access_common

View File

@ -39,7 +39,7 @@
label##_iSeries: \
HMT_MEDIUM; \
mtspr SPRN_SPRG_SCRATCH0,r13; /* save r13 */ \
EXCEPTION_PROLOG_1(area); \
EXCEPTION_PROLOG_1(area, NOTEST, 0); \
EXCEPTION_PROLOG_ISERIES_1; \
b label##_common
@ -48,7 +48,7 @@ label##_iSeries: \
label##_iSeries: \
HMT_MEDIUM; \
mtspr SPRN_SPRG_SCRATCH0,r13; /* save r13 */ \
EXCEPTION_PROLOG_1(PACA_EXGEN); \
EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0); \
lbz r10,PACASOFTIRQEN(r13); \
cmpwi 0,r10,0; \
beq- label##_iSeries_masked; \

View File

@ -17,6 +17,7 @@
#include <linux/cpu.h>
#include <linux/of.h>
#include <linux/spinlock.h>
#include <linux/module.h>
#include <asm/prom.h>
#include <asm/io.h>
@ -24,6 +25,7 @@
#include <asm/irq.h>
#include <asm/errno.h>
#include <asm/xics.h>
#include <asm/kvm_ppc.h>
struct icp_ipl {
union {
@ -139,6 +141,12 @@ static void icp_native_cause_ipi(int cpu, unsigned long data)
icp_native_set_qirr(cpu, IPI_PRIORITY);
}
void xics_wake_cpu(int cpu)
{
icp_native_set_qirr(cpu, IPI_PRIORITY);
}
EXPORT_SYMBOL_GPL(xics_wake_cpu);
static irqreturn_t icp_native_ipi_action(int irq, void *dev_id)
{
int cpu = smp_processor_id();
@ -185,6 +193,7 @@ static int __init icp_native_map_one_cpu(int hw_id, unsigned long addr,
}
icp_native_regs[cpu] = ioremap(addr, size);
kvmppc_set_xics_phys(cpu, addr);
if (!icp_native_regs[cpu]) {
pr_warning("icp_native: Failed ioremap for CPU %d, "
"interrupt server #0x%x, addr %#lx\n",

View File

@ -529,6 +529,18 @@ menuconfig PARAVIRT_GUEST
if PARAVIRT_GUEST
config PARAVIRT_TIME_ACCOUNTING
bool "Paravirtual steal time accounting"
select PARAVIRT
default n
---help---
Select this option to enable fine granularity task steal time
accounting. Time spent executing other tasks in parallel with
the current vCPU is discounted from the vCPU power. To account for
that, there can be a small performance impact.
If in doubt, say N here.
source "arch/x86/xen/Kconfig"
config KVM_CLOCK

View File

@ -229,37 +229,6 @@ struct read_cache {
unsigned long end;
};
struct decode_cache {
u8 twobyte;
u8 b;
u8 intercept;
u8 lock_prefix;
u8 rep_prefix;
u8 op_bytes;
u8 ad_bytes;
u8 rex_prefix;
struct operand src;
struct operand src2;
struct operand dst;
bool has_seg_override;
u8 seg_override;
unsigned int d;
int (*execute)(struct x86_emulate_ctxt *ctxt);
int (*check_perm)(struct x86_emulate_ctxt *ctxt);
unsigned long regs[NR_VCPU_REGS];
unsigned long eip;
/* modrm */
u8 modrm;
u8 modrm_mod;
u8 modrm_reg;
u8 modrm_rm;
u8 modrm_seg;
bool rip_relative;
struct fetch_cache fetch;
struct read_cache io_read;
struct read_cache mem_read;
};
struct x86_emulate_ctxt {
struct x86_emulate_ops *ops;
@ -280,7 +249,35 @@ struct x86_emulate_ctxt {
struct x86_exception exception;
/* decode cache */
struct decode_cache decode;
u8 twobyte;
u8 b;
u8 intercept;
u8 lock_prefix;
u8 rep_prefix;
u8 op_bytes;
u8 ad_bytes;
u8 rex_prefix;
struct operand src;
struct operand src2;
struct operand dst;
bool has_seg_override;
u8 seg_override;
unsigned int d;
int (*execute)(struct x86_emulate_ctxt *ctxt);
int (*check_perm)(struct x86_emulate_ctxt *ctxt);
/* modrm */
u8 modrm;
u8 modrm_mod;
u8 modrm_reg;
u8 modrm_rm;
u8 modrm_seg;
bool rip_relative;
unsigned long _eip;
/* Fields above regs are cleared together. */
unsigned long regs[NR_VCPU_REGS];
struct fetch_cache fetch;
struct read_cache io_read;
struct read_cache mem_read;
};
/* Repeat String Operation Prefix */
@ -373,6 +370,5 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
u16 tss_selector, int reason,
bool has_error_code, u32 error_code);
int emulate_int_real(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops, int irq);
int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
#endif /* _ASM_X86_KVM_X86_EMULATE_H */

View File

@ -48,7 +48,7 @@
(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
| X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
| X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \
| X86_CR4_OSXSAVE \
| X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_RDWRGSFS \
| X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))
#define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
@ -205,6 +205,7 @@ union kvm_mmu_page_role {
unsigned invalid:1;
unsigned nxe:1;
unsigned cr0_wp:1;
unsigned smep_andnot_wp:1;
};
};
@ -227,15 +228,17 @@ struct kvm_mmu_page {
* in this shadow page.
*/
DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
bool multimapped; /* More than one parent_pte? */
bool unsync;
int root_count; /* Currently serving as active root */
unsigned int unsync_children;
union {
u64 *parent_pte; /* !multimapped */
struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
};
unsigned long parent_ptes; /* Reverse mapping for parent_pte */
DECLARE_BITMAP(unsync_child_bitmap, 512);
#ifdef CONFIG_X86_32
int clear_spte_count;
#endif
struct rcu_head rcu;
};
struct kvm_pv_mmu_op_buffer {
@ -269,8 +272,6 @@ struct kvm_mmu {
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
struct x86_exception *exception);
gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access);
void (*prefetch_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *page);
int (*sync_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp);
void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva);
@ -346,8 +347,7 @@ struct kvm_vcpu_arch {
* put it here to avoid allocation */
struct kvm_pv_mmu_op_buffer mmu_op_buffer;
struct kvm_mmu_memory_cache mmu_pte_chain_cache;
struct kvm_mmu_memory_cache mmu_rmap_desc_cache;
struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
@ -393,6 +393,15 @@ struct kvm_vcpu_arch {
unsigned int hw_tsc_khz;
unsigned int time_offset;
struct page *time_page;
struct {
u64 msr_val;
u64 last_steal;
u64 accum_steal;
struct gfn_to_hva_cache stime;
struct kvm_steal_time steal;
} st;
u64 last_guest_tsc;
u64 last_kernel_ns;
u64 last_tsc_nsec;
@ -419,6 +428,11 @@ struct kvm_vcpu_arch {
u64 mcg_ctl;
u64 *mce_banks;
/* Cache MMIO info */
u64 mmio_gva;
unsigned access;
gfn_t mmio_gfn;
/* used for guest single stepping over the given code position */
unsigned long singlestep_rip;
@ -441,6 +455,7 @@ struct kvm_arch {
unsigned int n_used_mmu_pages;
unsigned int n_requested_mmu_pages;
unsigned int n_max_mmu_pages;
unsigned int indirect_shadow_pages;
atomic_t invlpg_counter;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
@ -477,6 +492,8 @@ struct kvm_arch {
u64 hv_guest_os_id;
u64 hv_hypercall;
atomic_t reader_counter;
#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
#endif
@ -559,7 +576,7 @@ struct kvm_x86_ops {
void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu);
void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
void (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
int (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
@ -636,7 +653,6 @@ void kvm_mmu_module_exit(void);
void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
int kvm_mmu_create(struct kvm_vcpu *vcpu);
int kvm_mmu_setup(struct kvm_vcpu *vcpu);
void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte);
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
u64 dirty_mask, u64 nx_mask, u64 x_mask);
@ -830,11 +846,12 @@ enum {
asmlinkage void kvm_spurious_fault(void);
extern bool kvm_rebooting;
#define __kvm_handle_fault_on_reboot(insn) \
#define ____kvm_handle_fault_on_reboot(insn, cleanup_insn) \
"666: " insn "\n\t" \
"668: \n\t" \
".pushsection .fixup, \"ax\" \n" \
"667: \n\t" \
cleanup_insn "\n\t" \
"cmpb $0, kvm_rebooting \n\t" \
"jne 668b \n\t" \
__ASM_SIZE(push) " $666b \n\t" \
@ -844,6 +861,9 @@ extern bool kvm_rebooting;
_ASM_PTR " 666b, 667b \n\t" \
".popsection"
#define __kvm_handle_fault_on_reboot(insn) \
____kvm_handle_fault_on_reboot(insn, "")
#define KVM_ARCH_WANT_MMU_NOTIFIER
int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
int kvm_age_hva(struct kvm *kvm, unsigned long hva);

View File

@ -21,6 +21,7 @@
*/
#define KVM_FEATURE_CLOCKSOURCE2 3
#define KVM_FEATURE_ASYNC_PF 4
#define KVM_FEATURE_STEAL_TIME 5
/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
@ -30,10 +31,23 @@
#define MSR_KVM_WALL_CLOCK 0x11
#define MSR_KVM_SYSTEM_TIME 0x12
#define KVM_MSR_ENABLED 1
/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
#define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00
#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
#define MSR_KVM_ASYNC_PF_EN 0x4b564d02
#define MSR_KVM_STEAL_TIME 0x4b564d03
struct kvm_steal_time {
__u64 steal;
__u32 version;
__u32 flags;
__u32 pad[12];
};
#define KVM_STEAL_ALIGNMENT_BITS 5
#define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
#define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1 ) << 1)
#define KVM_MAX_MMU_OP_BATCH 32
@ -178,6 +192,7 @@ void __init kvm_guest_init(void);
void kvm_async_pf_task_wait(u32 token);
void kvm_async_pf_task_wake(u32 token);
u32 kvm_read_and_reset_pf_reason(void);
extern void kvm_disable_steal_time(void);
#else
#define kvm_guest_init() do { } while (0)
#define kvm_async_pf_task_wait(T) do {} while(0)
@ -186,6 +201,11 @@ static inline u32 kvm_read_and_reset_pf_reason(void)
{
return 0;
}
static inline void kvm_disable_steal_time(void)
{
return;
}
#endif
#endif /* __KERNEL__ */

View File

@ -441,6 +441,18 @@
#define MSR_IA32_VMX_VMCS_ENUM 0x0000048a
#define MSR_IA32_VMX_PROCBASED_CTLS2 0x0000048b
#define MSR_IA32_VMX_EPT_VPID_CAP 0x0000048c
#define MSR_IA32_VMX_TRUE_PINBASED_CTLS 0x0000048d
#define MSR_IA32_VMX_TRUE_PROCBASED_CTLS 0x0000048e
#define MSR_IA32_VMX_TRUE_EXIT_CTLS 0x0000048f
#define MSR_IA32_VMX_TRUE_ENTRY_CTLS 0x00000490
/* VMX_BASIC bits and bitmasks */
#define VMX_BASIC_VMCS_SIZE_SHIFT 32
#define VMX_BASIC_64 0x0001000000000000LLU
#define VMX_BASIC_MEM_TYPE_SHIFT 50
#define VMX_BASIC_MEM_TYPE_MASK 0x003c000000000000LLU
#define VMX_BASIC_MEM_TYPE_WB 6LLU
#define VMX_BASIC_INOUT 0x0040000000000000LLU
/* AMD-V MSRs */

View File

@ -230,6 +230,15 @@ static inline unsigned long long paravirt_sched_clock(void)
return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
}
struct jump_label_key;
extern struct jump_label_key paravirt_steal_enabled;
extern struct jump_label_key paravirt_steal_rq_enabled;
static inline u64 paravirt_steal_clock(int cpu)
{
return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
}
static inline unsigned long long paravirt_read_pmc(int counter)
{
return PVOP_CALL1(u64, pv_cpu_ops.read_pmc, counter);

View File

@ -89,6 +89,7 @@ struct pv_lazy_ops {
struct pv_time_ops {
unsigned long long (*sched_clock)(void);
unsigned long long (*steal_clock)(int cpu);
unsigned long (*get_tsc_khz)(void);
};

View File

@ -59,6 +59,7 @@
#define X86_CR4_OSFXSR 0x00000200 /* enable fast FPU save and restore */
#define X86_CR4_OSXMMEXCPT 0x00000400 /* enable unmasked SSE exceptions */
#define X86_CR4_VMXE 0x00002000 /* enable VMX virtualization */
#define X86_CR4_RDWRGSFS 0x00010000 /* enable RDWRGSFS support */
#define X86_CR4_OSXSAVE 0x00040000 /* enable xsave and xrestore */
#define X86_CR4_SMEP 0x00100000 /* enable SMEP support */

View File

@ -132,6 +132,8 @@ enum vmcs_field {
GUEST_IA32_PAT_HIGH = 0x00002805,
GUEST_IA32_EFER = 0x00002806,
GUEST_IA32_EFER_HIGH = 0x00002807,
GUEST_IA32_PERF_GLOBAL_CTRL = 0x00002808,
GUEST_IA32_PERF_GLOBAL_CTRL_HIGH= 0x00002809,
GUEST_PDPTR0 = 0x0000280a,
GUEST_PDPTR0_HIGH = 0x0000280b,
GUEST_PDPTR1 = 0x0000280c,
@ -144,6 +146,8 @@ enum vmcs_field {
HOST_IA32_PAT_HIGH = 0x00002c01,
HOST_IA32_EFER = 0x00002c02,
HOST_IA32_EFER_HIGH = 0x00002c03,
HOST_IA32_PERF_GLOBAL_CTRL = 0x00002c04,
HOST_IA32_PERF_GLOBAL_CTRL_HIGH = 0x00002c05,
PIN_BASED_VM_EXEC_CONTROL = 0x00004000,
CPU_BASED_VM_EXEC_CONTROL = 0x00004002,
EXCEPTION_BITMAP = 0x00004004,
@ -426,4 +430,43 @@ struct vmx_msr_entry {
u64 value;
} __aligned(16);
/*
* Exit Qualifications for entry failure during or after loading guest state
*/
#define ENTRY_FAIL_DEFAULT 0
#define ENTRY_FAIL_PDPTE 2
#define ENTRY_FAIL_NMI 3
#define ENTRY_FAIL_VMCS_LINK_PTR 4
/*
* VM-instruction error numbers
*/
enum vm_instruction_error_number {
VMXERR_VMCALL_IN_VMX_ROOT_OPERATION = 1,
VMXERR_VMCLEAR_INVALID_ADDRESS = 2,
VMXERR_VMCLEAR_VMXON_POINTER = 3,
VMXERR_VMLAUNCH_NONCLEAR_VMCS = 4,
VMXERR_VMRESUME_NONLAUNCHED_VMCS = 5,
VMXERR_VMRESUME_AFTER_VMXOFF = 6,
VMXERR_ENTRY_INVALID_CONTROL_FIELD = 7,
VMXERR_ENTRY_INVALID_HOST_STATE_FIELD = 8,
VMXERR_VMPTRLD_INVALID_ADDRESS = 9,
VMXERR_VMPTRLD_VMXON_POINTER = 10,
VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID = 11,
VMXERR_UNSUPPORTED_VMCS_COMPONENT = 12,
VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT = 13,
VMXERR_VMXON_IN_VMX_ROOT_OPERATION = 15,
VMXERR_ENTRY_INVALID_EXECUTIVE_VMCS_POINTER = 16,
VMXERR_ENTRY_NONLAUNCHED_EXECUTIVE_VMCS = 17,
VMXERR_ENTRY_EXECUTIVE_VMCS_POINTER_NOT_VMXON_POINTER = 18,
VMXERR_VMCALL_NONCLEAR_VMCS = 19,
VMXERR_VMCALL_INVALID_VM_EXIT_CONTROL_FIELDS = 20,
VMXERR_VMCALL_INCORRECT_MSEG_REVISION_ID = 22,
VMXERR_VMXOFF_UNDER_DUAL_MONITOR_TREATMENT_OF_SMIS_AND_SMM = 23,
VMXERR_VMCALL_INVALID_SMM_MONITOR_FEATURES = 24,
VMXERR_ENTRY_INVALID_VM_EXECUTION_CONTROL_FIELDS_IN_EXECUTIVE_VMCS = 25,
VMXERR_ENTRY_EVENTS_BLOCKED_BY_MOV_SS = 26,
VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID = 28,
};
#endif

View File

@ -51,6 +51,15 @@ static int parse_no_kvmapf(char *arg)
early_param("no-kvmapf", parse_no_kvmapf);
static int steal_acc = 1;
static int parse_no_stealacc(char *arg)
{
steal_acc = 0;
return 0;
}
early_param("no-steal-acc", parse_no_stealacc);
struct kvm_para_state {
u8 mmu_queue[MMU_QUEUE_SIZE];
int mmu_queue_len;
@ -58,6 +67,8 @@ struct kvm_para_state {
static DEFINE_PER_CPU(struct kvm_para_state, para_state);
static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
static int has_steal_clock = 0;
static struct kvm_para_state *kvm_para_state(void)
{
@ -441,6 +452,21 @@ static void __init paravirt_ops_setup(void)
#endif
}
static void kvm_register_steal_time(void)
{
int cpu = smp_processor_id();
struct kvm_steal_time *st = &per_cpu(steal_time, cpu);
if (!has_steal_clock)
return;
memset(st, 0, sizeof(*st));
wrmsrl(MSR_KVM_STEAL_TIME, (__pa(st) | KVM_MSR_ENABLED));
printk(KERN_INFO "kvm-stealtime: cpu %d, msr %lx\n",
cpu, __pa(st));
}
void __cpuinit kvm_guest_cpu_init(void)
{
if (!kvm_para_available())
@ -457,6 +483,9 @@ void __cpuinit kvm_guest_cpu_init(void)
printk(KERN_INFO"KVM setup async PF for cpu %d\n",
smp_processor_id());
}
if (has_steal_clock)
kvm_register_steal_time();
}
static void kvm_pv_disable_apf(void *unused)
@ -483,6 +512,31 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
};
static u64 kvm_steal_clock(int cpu)
{
u64 steal;
struct kvm_steal_time *src;
int version;
src = &per_cpu(steal_time, cpu);
do {
version = src->version;
rmb();
steal = src->steal;
rmb();
} while ((version & 1) || (version != src->version));
return steal;
}
void kvm_disable_steal_time(void)
{
if (!has_steal_clock)
return;
wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
}
#ifdef CONFIG_SMP
static void __init kvm_smp_prepare_boot_cpu(void)
{
@ -500,6 +554,7 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)
static void kvm_guest_cpu_offline(void *dummy)
{
kvm_disable_steal_time();
kvm_pv_disable_apf(NULL);
apf_task_wake_all();
}
@ -548,6 +603,11 @@ void __init kvm_guest_init(void)
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
x86_init.irqs.trap_init = kvm_apf_trap_init;
if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
has_steal_clock = 1;
pv_time_ops.steal_clock = kvm_steal_clock;
}
#ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
register_cpu_notifier(&kvm_cpu_notifier);
@ -555,3 +615,15 @@ void __init kvm_guest_init(void)
kvm_guest_cpu_init();
#endif
}
static __init int activate_jump_labels(void)
{
if (has_steal_clock) {
jump_label_inc(&paravirt_steal_enabled);
if (steal_acc)
jump_label_inc(&paravirt_steal_rq_enabled);
}
return 0;
}
arch_initcall(activate_jump_labels);

View File

@ -160,6 +160,7 @@ static void __cpuinit kvm_setup_secondary_clock(void)
static void kvm_crash_shutdown(struct pt_regs *regs)
{
native_write_msr(msr_kvm_system_time, 0, 0);
kvm_disable_steal_time();
native_machine_crash_shutdown(regs);
}
#endif
@ -167,6 +168,7 @@ static void kvm_crash_shutdown(struct pt_regs *regs)
static void kvm_shutdown(void)
{
native_write_msr(msr_kvm_system_time, 0, 0);
kvm_disable_steal_time();
native_machine_shutdown();
}

View File

@ -202,6 +202,14 @@ static void native_flush_tlb_single(unsigned long addr)
__native_flush_tlb_single(addr);
}
struct jump_label_key paravirt_steal_enabled;
struct jump_label_key paravirt_steal_rq_enabled;
static u64 native_steal_clock(int cpu)
{
return 0;
}
/* These are in entry.S */
extern void native_iret(void);
extern void native_irq_enable_sysexit(void);
@ -307,6 +315,7 @@ struct pv_init_ops pv_init_ops = {
struct pv_time_ops pv_time_ops = {
.sched_clock = native_sched_clock,
.steal_clock = native_steal_clock,
};
struct pv_irq_ops pv_irq_ops = {

View File

@ -31,6 +31,7 @@ config KVM
select KVM_ASYNC_PF
select USER_RETURN_NOTIFIER
select KVM_MMIO
select TASK_DELAY_ACCT
---help---
Support hosting fully virtualized guest machines using hardware
virtualization extensions. You will need a fairly recent

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -49,6 +49,8 @@
#define PFERR_FETCH_MASK (1U << 4)
int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
@ -76,4 +78,27 @@ static inline int is_present_gpte(unsigned long pte)
return pte & PT_PRESENT_MASK;
}
static inline int is_writable_pte(unsigned long pte)
{
return pte & PT_WRITABLE_MASK;
}
static inline bool is_write_protection(struct kvm_vcpu *vcpu)
{
return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
}
static inline bool check_write_user_access(struct kvm_vcpu *vcpu,
bool write_fault, bool user_fault,
unsigned long pte)
{
if (unlikely(write_fault && !is_writable_pte(pte)
&& (user_fault || is_write_protection(vcpu))))
return false;
if (unlikely(user_fault && !(pte & PT_USER_MASK)))
return false;
return true;
}
#endif

View File

@ -99,18 +99,6 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
"level = %d\n", sp, level);
return;
}
if (*sptep == shadow_notrap_nonpresent_pte) {
audit_printk(vcpu->kvm, "notrap spte in unsync "
"sp: %p\n", sp);
return;
}
}
if (sp->role.direct && *sptep == shadow_notrap_nonpresent_pte) {
audit_printk(vcpu->kvm, "notrap spte in direct sp: %p\n",
sp);
return;
}
if (!is_shadow_present_pte(*sptep) || !is_last_spte(*sptep, level))

View File

@ -196,6 +196,54 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page,
TP_ARGS(sp)
);
DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_delay_free_pages,
TP_PROTO(struct kvm_mmu_page *sp),
TP_ARGS(sp)
);
TRACE_EVENT(
mark_mmio_spte,
TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access),
TP_ARGS(sptep, gfn, access),
TP_STRUCT__entry(
__field(void *, sptep)
__field(gfn_t, gfn)
__field(unsigned, access)
),
TP_fast_assign(
__entry->sptep = sptep;
__entry->gfn = gfn;
__entry->access = access;
),
TP_printk("sptep:%p gfn %llx access %x", __entry->sptep, __entry->gfn,
__entry->access)
);
TRACE_EVENT(
handle_mmio_page_fault,
TP_PROTO(u64 addr, gfn_t gfn, unsigned access),
TP_ARGS(addr, gfn, access),
TP_STRUCT__entry(
__field(u64, addr)
__field(gfn_t, gfn)
__field(unsigned, access)
),
TP_fast_assign(
__entry->addr = addr;
__entry->gfn = gfn;
__entry->access = access;
),
TP_printk("addr:%llx gfn %llx access %x", __entry->addr, __entry->gfn,
__entry->access)
);
TRACE_EVENT(
kvm_mmu_audit,
TP_PROTO(struct kvm_vcpu *vcpu, int audit_point),

View File

@ -101,11 +101,15 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return (ret != orig_pte);
}
static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte,
bool last)
{
unsigned access;
access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
if (last && !is_dirty_gpte(gpte))
access &= ~ACC_WRITE_MASK;
#if PTTYPE == 64
if (vcpu->arch.mmu.nx)
access &= ~(gpte >> PT64_NX_SHIFT);
@ -113,6 +117,24 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
return access;
}
static bool FNAME(is_last_gpte)(struct guest_walker *walker,
struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
pt_element_t gpte)
{
if (walker->level == PT_PAGE_TABLE_LEVEL)
return true;
if ((walker->level == PT_DIRECTORY_LEVEL) && is_large_pte(gpte) &&
(PTTYPE == 64 || is_pse(vcpu)))
return true;
if ((walker->level == PT_PDPE_LEVEL) && is_large_pte(gpte) &&
(mmu->root_level == PT64_ROOT_LEVEL))
return true;
return false;
}
/*
* Fetch a guest pte for a guest virtual address
*/
@ -125,18 +147,17 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
gfn_t table_gfn;
unsigned index, pt_access, uninitialized_var(pte_access);
gpa_t pte_gpa;
bool eperm, present, rsvd_fault;
int offset, write_fault, user_fault, fetch_fault;
write_fault = access & PFERR_WRITE_MASK;
user_fault = access & PFERR_USER_MASK;
fetch_fault = access & PFERR_FETCH_MASK;
bool eperm;
int offset;
const int write_fault = access & PFERR_WRITE_MASK;
const int user_fault = access & PFERR_USER_MASK;
const int fetch_fault = access & PFERR_FETCH_MASK;
u16 errcode = 0;
trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
fetch_fault);
walk:
present = true;
eperm = rsvd_fault = false;
retry_walk:
eperm = false;
walker->level = mmu->root_level;
pte = mmu->get_cr3(vcpu);
@ -144,10 +165,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
if (walker->level == PT32E_ROOT_LEVEL) {
pte = kvm_pdptr_read_mmu(vcpu, mmu, (addr >> 30) & 3);
trace_kvm_mmu_paging_element(pte, walker->level);
if (!is_present_gpte(pte)) {
present = false;
if (!is_present_gpte(pte))
goto error;
}
--walker->level;
}
#endif
@ -170,42 +189,31 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
PFERR_USER_MASK|PFERR_WRITE_MASK);
if (unlikely(real_gfn == UNMAPPED_GVA)) {
present = false;
break;
}
if (unlikely(real_gfn == UNMAPPED_GVA))
goto error;
real_gfn = gpa_to_gfn(real_gfn);
host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
if (unlikely(kvm_is_error_hva(host_addr))) {
present = false;
break;
}
if (unlikely(kvm_is_error_hva(host_addr)))
goto error;
ptep_user = (pt_element_t __user *)((void *)host_addr + offset);
if (unlikely(__copy_from_user(&pte, ptep_user, sizeof(pte)))) {
present = false;
break;
}
if (unlikely(__copy_from_user(&pte, ptep_user, sizeof(pte))))
goto error;
trace_kvm_mmu_paging_element(pte, walker->level);
if (unlikely(!is_present_gpte(pte))) {
present = false;
break;
}
if (unlikely(!is_present_gpte(pte)))
goto error;
if (unlikely(is_rsvd_bits_set(&vcpu->arch.mmu, pte,
walker->level))) {
rsvd_fault = true;
break;
errcode |= PFERR_RSVD_MASK | PFERR_PRESENT_MASK;
goto error;
}
if (unlikely(write_fault && !is_writable_pte(pte)
&& (user_fault || is_write_protection(vcpu))))
eperm = true;
if (unlikely(user_fault && !(pte & PT_USER_MASK)))
if (!check_write_user_access(vcpu, write_fault, user_fault,
pte))
eperm = true;
#if PTTYPE == 64
@ -213,39 +221,35 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
eperm = true;
#endif
if (!eperm && !rsvd_fault
&& unlikely(!(pte & PT_ACCESSED_MASK))) {
if (!eperm && unlikely(!(pte & PT_ACCESSED_MASK))) {
int ret;
trace_kvm_mmu_set_accessed_bit(table_gfn, index,
sizeof(pte));
ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index,
pte, pte|PT_ACCESSED_MASK);
if (unlikely(ret < 0)) {
present = false;
break;
} else if (ret)
goto walk;
if (unlikely(ret < 0))
goto error;
else if (ret)
goto retry_walk;
mark_page_dirty(vcpu->kvm, table_gfn);
pte |= PT_ACCESSED_MASK;
}
pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
walker->ptes[walker->level - 1] = pte;
if ((walker->level == PT_PAGE_TABLE_LEVEL) ||
((walker->level == PT_DIRECTORY_LEVEL) &&
is_large_pte(pte) &&
(PTTYPE == 64 || is_pse(vcpu))) ||
((walker->level == PT_PDPE_LEVEL) &&
is_large_pte(pte) &&
mmu->root_level == PT64_ROOT_LEVEL)) {
if (FNAME(is_last_gpte)(walker, vcpu, mmu, pte)) {
int lvl = walker->level;
gpa_t real_gpa;
gfn_t gfn;
u32 ac;
/* check if the kernel is fetching from user page */
if (unlikely(pte_access & PT_USER_MASK) &&
kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
if (fetch_fault && !user_fault)
eperm = true;
gfn = gpte_to_gfn_lvl(pte, lvl);
gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) >> PAGE_SHIFT;
@ -266,12 +270,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
break;
}
pt_access = pte_access;
pt_access &= FNAME(gpte_access)(vcpu, pte, false);
--walker->level;
}
if (unlikely(!present || eperm || rsvd_fault))
if (unlikely(eperm)) {
errcode |= PFERR_PRESENT_MASK;
goto error;
}
if (write_fault && unlikely(!is_dirty_gpte(pte))) {
int ret;
@ -279,17 +285,17 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index,
pte, pte|PT_DIRTY_MASK);
if (unlikely(ret < 0)) {
present = false;
if (unlikely(ret < 0))
goto error;
} else if (ret)
goto walk;
else if (ret)
goto retry_walk;
mark_page_dirty(vcpu->kvm, table_gfn);
pte |= PT_DIRTY_MASK;
walker->ptes[walker->level - 1] = pte;
}
pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true);
walker->pt_access = pt_access;
walker->pte_access = pte_access;
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
@ -297,19 +303,14 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
return 1;
error:
errcode |= write_fault | user_fault;
if (fetch_fault && (mmu->nx ||
kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
errcode |= PFERR_FETCH_MASK;
walker->fault.vector = PF_VECTOR;
walker->fault.error_code_valid = true;
walker->fault.error_code = 0;
if (present)
walker->fault.error_code |= PFERR_PRESENT_MASK;
walker->fault.error_code |= write_fault | user_fault;
if (fetch_fault && mmu->nx)
walker->fault.error_code |= PFERR_FETCH_MASK;
if (rsvd_fault)
walker->fault.error_code |= PFERR_RSVD_MASK;
walker->fault.error_code = errcode;
walker->fault.address = addr;
walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu;
@ -336,16 +337,11 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, u64 *spte,
pt_element_t gpte)
{
u64 nonpresent = shadow_trap_nonpresent_pte;
if (is_rsvd_bits_set(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL))
goto no_present;
if (!is_present_gpte(gpte)) {
if (!sp->unsync)
nonpresent = shadow_notrap_nonpresent_pte;
if (!is_present_gpte(gpte))
goto no_present;
}
if (!(gpte & PT_ACCESSED_MASK))
goto no_present;
@ -353,7 +349,7 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
return false;
no_present:
drop_spte(vcpu->kvm, spte, nonpresent);
drop_spte(vcpu->kvm, spte);
return true;
}
@ -369,9 +365,9 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
return;
pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte, true);
pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
if (is_error_pfn(pfn)) {
if (mmu_invalid_pfn(pfn)) {
kvm_release_pfn_clean(pfn);
return;
}
@ -381,7 +377,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1).
*/
mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
is_dirty_gpte(gpte), NULL, PT_PAGE_TABLE_LEVEL,
NULL, PT_PAGE_TABLE_LEVEL,
gpte_to_gfn(gpte), pfn, true, true);
}
@ -432,12 +428,11 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
unsigned pte_access;
gfn_t gfn;
pfn_t pfn;
bool dirty;
if (spte == sptep)
continue;
if (*spte != shadow_trap_nonpresent_pte)
if (is_shadow_present_pte(*spte))
continue;
gpte = gptep[i];
@ -445,18 +440,18 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
continue;
pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte,
true);
gfn = gpte_to_gfn(gpte);
dirty = is_dirty_gpte(gpte);
pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
(pte_access & ACC_WRITE_MASK) && dirty);
if (is_error_pfn(pfn)) {
pte_access & ACC_WRITE_MASK);
if (mmu_invalid_pfn(pfn)) {
kvm_release_pfn_clean(pfn);
break;
}
mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
dirty, NULL, PT_PAGE_TABLE_LEVEL, gfn,
NULL, PT_PAGE_TABLE_LEVEL, gfn,
pfn, true, true);
}
}
@ -467,12 +462,11 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct guest_walker *gw,
int user_fault, int write_fault, int hlevel,
int *ptwrite, pfn_t pfn, bool map_writable,
int *emulate, pfn_t pfn, bool map_writable,
bool prefault)
{
unsigned access = gw->pt_access;
struct kvm_mmu_page *sp = NULL;
bool dirty = is_dirty_gpte(gw->ptes[gw->level - 1]);
int top_level;
unsigned direct_access;
struct kvm_shadow_walk_iterator it;
@ -480,9 +474,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
if (!is_present_gpte(gw->ptes[gw->level - 1]))
return NULL;
direct_access = gw->pt_access & gw->pte_access;
if (!dirty)
direct_access &= ~ACC_WRITE_MASK;
direct_access = gw->pte_access;
top_level = vcpu->arch.mmu.root_level;
if (top_level == PT32E_ROOT_LEVEL)
@ -540,8 +532,8 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
link_shadow_page(it.sptep, sp);
}
mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access,
user_fault, write_fault, dirty, ptwrite, it.level,
mmu_set_spte(vcpu, it.sptep, access, gw->pte_access,
user_fault, write_fault, emulate, it.level,
gw->gfn, pfn, prefault, map_writable);
FNAME(pte_prefetch)(vcpu, gw, it.sptep);
@ -575,7 +567,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
int user_fault = error_code & PFERR_USER_MASK;
struct guest_walker walker;
u64 *sptep;
int write_pt = 0;
int emulate = 0;
int r;
pfn_t pfn;
int level = PT_PAGE_TABLE_LEVEL;
@ -585,6 +577,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
if (unlikely(error_code & PFERR_RSVD_MASK))
return handle_mmio_page_fault(vcpu, addr, error_code,
mmu_is_nested(vcpu));
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
@ -623,9 +619,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
&map_writable))
return 0;
/* mmio */
if (is_error_pfn(pfn))
return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn);
if (handle_abnormal_pfn(vcpu, mmu_is_nested(vcpu) ? 0 : addr,
walker.gfn, pfn, walker.pte_access, &r))
return r;
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
@ -636,19 +632,19 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
if (!force_pt_level)
transparent_hugepage_adjust(vcpu, &walker.gfn, &pfn, &level);
sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
level, &write_pt, pfn, map_writable, prefault);
level, &emulate, pfn, map_writable, prefault);
(void)sptep;
pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
sptep, *sptep, write_pt);
pgprintk("%s: shadow pte %p %llx emulate %d\n", __func__,
sptep, *sptep, emulate);
if (!write_pt)
if (!emulate)
vcpu->arch.last_pt_write_count = 0; /* reset fork detector */
++vcpu->stat.pf_fixed;
trace_kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
spin_unlock(&vcpu->kvm->mmu_lock);
return write_pt;
return emulate;
out_unlock:
spin_unlock(&vcpu->kvm->mmu_lock);
@ -665,6 +661,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
u64 *sptep;
int need_flush = 0;
vcpu_clear_mmio_info(vcpu, gva);
spin_lock(&vcpu->kvm->mmu_lock);
for_each_shadow_entry(vcpu, gva, iterator) {
@ -688,11 +686,11 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
if (is_shadow_present_pte(*sptep)) {
if (is_large_pte(*sptep))
--vcpu->kvm->stat.lpages;
drop_spte(vcpu->kvm, sptep,
shadow_trap_nonpresent_pte);
drop_spte(vcpu->kvm, sptep);
need_flush = 1;
} else
__set_spte(sptep, shadow_trap_nonpresent_pte);
} else if (is_mmio_spte(*sptep))
mmu_spte_clear_no_track(sptep);
break;
}
@ -752,36 +750,6 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
return gpa;
}
static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp)
{
int i, j, offset, r;
pt_element_t pt[256 / sizeof(pt_element_t)];
gpa_t pte_gpa;
if (sp->role.direct
|| (PTTYPE == 32 && sp->role.level > PT_PAGE_TABLE_LEVEL)) {
nonpaging_prefetch_page(vcpu, sp);
return;
}
pte_gpa = gfn_to_gpa(sp->gfn);
if (PTTYPE == 32) {
offset = sp->role.quadrant << PT64_LEVEL_BITS;
pte_gpa += offset * sizeof(pt_element_t);
}
for (i = 0; i < PT64_ENT_PER_PAGE; i += ARRAY_SIZE(pt)) {
r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, pt, sizeof pt);
pte_gpa += ARRAY_SIZE(pt) * sizeof(pt_element_t);
for (j = 0; j < ARRAY_SIZE(pt); ++j)
if (r || is_present_gpte(pt[j]))
sp->spt[i+j] = shadow_trap_nonpresent_pte;
else
sp->spt[i+j] = shadow_notrap_nonpresent_pte;
}
}
/*
* Using the cached information from sp->gfns is safe because:
* - The spte has a reference to the struct page, so the pfn for a given gfn
@ -817,7 +785,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
gpa_t pte_gpa;
gfn_t gfn;
if (!is_shadow_present_pte(sp->spt[i]))
if (!sp->spt[i])
continue;
pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
@ -826,26 +794,30 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
sizeof(pt_element_t)))
return -EINVAL;
gfn = gpte_to_gfn(gpte);
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
vcpu->kvm->tlbs_dirty++;
continue;
}
gfn = gpte_to_gfn(gpte);
pte_access = sp->role.access;
pte_access &= FNAME(gpte_access)(vcpu, gpte, true);
if (sync_mmio_spte(&sp->spt[i], gfn, pte_access, &nr_present))
continue;
if (gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i],
shadow_trap_nonpresent_pte);
drop_spte(vcpu->kvm, &sp->spt[i]);
vcpu->kvm->tlbs_dirty++;
continue;
}
nr_present++;
pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn,
PT_PAGE_TABLE_LEVEL, gfn,
spte_to_pfn(sp->spt[i]), true, false,
host_writable);
}

View File

@ -1496,11 +1496,14 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
update_cr0_intercept(svm);
}
static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
unsigned long host_cr4_mce = read_cr4() & X86_CR4_MCE;
unsigned long old_cr4 = to_svm(vcpu)->vmcb->save.cr4;
if (cr4 & X86_CR4_VMXE)
return 1;
if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
svm_flush_tlb(vcpu);
@ -1510,6 +1513,7 @@ static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
cr4 |= host_cr4_mce;
to_svm(vcpu)->vmcb->save.cr4 = cr4;
mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
return 0;
}
static void svm_set_segment(struct kvm_vcpu *vcpu,

View File

@ -675,12 +675,12 @@ TRACE_EVENT(kvm_emulate_insn,
),
TP_fast_assign(
__entry->rip = vcpu->arch.emulate_ctxt.decode.fetch.start;
__entry->rip = vcpu->arch.emulate_ctxt.fetch.start;
__entry->csbase = kvm_x86_ops->get_segment_base(vcpu, VCPU_SREG_CS);
__entry->len = vcpu->arch.emulate_ctxt.decode.eip
- vcpu->arch.emulate_ctxt.decode.fetch.start;
__entry->len = vcpu->arch.emulate_ctxt._eip
- vcpu->arch.emulate_ctxt.fetch.start;
memcpy(__entry->insn,
vcpu->arch.emulate_ctxt.decode.fetch.data,
vcpu->arch.emulate_ctxt.fetch.data,
15);
__entry->flags = kei_decode_mode(vcpu->arch.emulate_ctxt.mode);
__entry->failed = failed;
@ -698,6 +698,29 @@ TRACE_EVENT(kvm_emulate_insn,
#define trace_kvm_emulate_insn_start(vcpu) trace_kvm_emulate_insn(vcpu, 0)
#define trace_kvm_emulate_insn_failed(vcpu) trace_kvm_emulate_insn(vcpu, 1)
TRACE_EVENT(
vcpu_match_mmio,
TP_PROTO(gva_t gva, gpa_t gpa, bool write, bool gpa_match),
TP_ARGS(gva, gpa, write, gpa_match),
TP_STRUCT__entry(
__field(gva_t, gva)
__field(gpa_t, gpa)
__field(bool, write)
__field(bool, gpa_match)
),
TP_fast_assign(
__entry->gva = gva;
__entry->gpa = gpa;
__entry->write = write;
__entry->gpa_match = gpa_match
),
TP_printk("gva %#lx gpa %#llx %s %s", __entry->gva, __entry->gpa,
__entry->write ? "Write" : "Read",
__entry->gpa_match ? "GPA" : "GVA")
);
#endif /* _TRACE_KVM_H */
#undef TRACE_INCLUDE_PATH

File diff suppressed because it is too large Load Diff

View File

@ -347,6 +347,7 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
vcpu->arch.cr2 = fault->address;
kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
}
EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
{
@ -579,6 +580,22 @@ static bool guest_cpuid_has_xsave(struct kvm_vcpu *vcpu)
return best && (best->ecx & bit(X86_FEATURE_XSAVE));
}
static bool guest_cpuid_has_smep(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_SMEP));
}
static bool guest_cpuid_has_fsgsbase(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_FSGSBASE));
}
static void update_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@ -598,14 +615,20 @@ static void update_cpuid(struct kvm_vcpu *vcpu)
int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
unsigned long old_cr4 = kvm_read_cr4(vcpu);
unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE;
unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE |
X86_CR4_PAE | X86_CR4_SMEP;
if (cr4 & CR4_RESERVED_BITS)
return 1;
if (!guest_cpuid_has_xsave(vcpu) && (cr4 & X86_CR4_OSXSAVE))
return 1;
if (!guest_cpuid_has_smep(vcpu) && (cr4 & X86_CR4_SMEP))
return 1;
if (!guest_cpuid_has_fsgsbase(vcpu) && (cr4 & X86_CR4_RDWRGSFS))
return 1;
if (is_long_mode(vcpu)) {
if (!(cr4 & X86_CR4_PAE))
return 1;
@ -615,11 +638,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
kvm_read_cr3(vcpu)))
return 1;
if (cr4 & X86_CR4_VMXE)
if (kvm_x86_ops->set_cr4(vcpu, cr4))
return 1;
kvm_x86_ops->set_cr4(vcpu, cr4);
if ((cr4 ^ old_cr4) & pdptr_bits)
kvm_mmu_reset_context(vcpu);
@ -787,12 +808,12 @@ EXPORT_SYMBOL_GPL(kvm_get_dr);
* kvm-specific. Those are put in the beginning of the list.
*/
#define KVM_SAVE_MSRS_BEGIN 8
#define KVM_SAVE_MSRS_BEGIN 9
static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
MSR_STAR,
#ifdef CONFIG_X86_64
@ -1388,7 +1409,7 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
return 1;
kvm_x86_ops->patch_hypercall(vcpu, instructions);
((unsigned char *)instructions)[3] = 0xc3; /* ret */
if (copy_to_user((void __user *)addr, instructions, 4))
if (__copy_to_user((void __user *)addr, instructions, 4))
return 1;
kvm->arch.hv_hypercall = data;
break;
@ -1415,7 +1436,7 @@ static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data)
HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT);
if (kvm_is_error_hva(addr))
return 1;
if (clear_user((void __user *)addr, PAGE_SIZE))
if (__clear_user((void __user *)addr, PAGE_SIZE))
return 1;
vcpu->arch.hv_vapic = data;
break;
@ -1467,6 +1488,35 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
}
}
static void accumulate_steal_time(struct kvm_vcpu *vcpu)
{
u64 delta;
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
vcpu->arch.st.accum_steal = delta;
}
static void record_steal_time(struct kvm_vcpu *vcpu)
{
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
return;
vcpu->arch.st.steal.steal += vcpu->arch.st.accum_steal;
vcpu->arch.st.steal.version += 2;
vcpu->arch.st.accum_steal = 0;
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
}
int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
switch (msr) {
@ -1549,6 +1599,33 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
if (kvm_pv_enable_async_pf(vcpu, data))
return 1;
break;
case MSR_KVM_STEAL_TIME:
if (unlikely(!sched_info_on()))
return 1;
if (data & KVM_STEAL_RESERVED_MASK)
return 1;
if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.st.stime,
data & KVM_STEAL_VALID_BITS))
return 1;
vcpu->arch.st.msr_val = data;
if (!(data & KVM_MSR_ENABLED))
break;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
preempt_disable();
accumulate_steal_time(vcpu);
preempt_enable();
kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
break;
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
@ -1834,6 +1911,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_KVM_ASYNC_PF_EN:
data = vcpu->arch.apf.msr_val;
break;
case MSR_KVM_STEAL_TIME:
data = vcpu->arch.st.msr_val;
break;
case MSR_IA32_P5_MC_ADDR:
case MSR_IA32_P5_MC_TYPE:
case MSR_IA32_MCG_CAP:
@ -2145,6 +2225,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_migrate_timers(vcpu);
vcpu->cpu = cpu;
}
accumulate_steal_time(vcpu);
kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@ -2283,6 +2366,13 @@ static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
entry->flags = 0;
}
static bool supported_xcr0_bit(unsigned bit)
{
u64 mask = ((u64)1 << bit);
return mask & (XSTATE_FP | XSTATE_SSE | XSTATE_YMM) & host_xcr0;
}
#define F(x) bit(X86_FEATURE_##x)
static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
@ -2328,7 +2418,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
0 /* Reserved, DCA */ | F(XMM4_1) |
F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
F(F16C);
F(F16C) | F(RDRAND);
/* cpuid 0x80000001.ecx */
const u32 kvm_supported_word6_x86_features =
F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
@ -2342,6 +2432,10 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
F(PMM) | F(PMM_EN);
/* cpuid 7.0.ebx */
const u32 kvm_supported_word9_x86_features =
F(SMEP) | F(FSGSBASE) | F(ERMS);
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
do_cpuid_1_ent(entry, function, index);
@ -2376,7 +2470,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
}
break;
}
/* function 4 and 0xb have additional index. */
/* function 4 has additional index. */
case 4: {
int i, cache_type;
@ -2393,6 +2487,22 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
}
break;
}
case 7: {
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
/* Mask ebx against host capbability word 9 */
if (index == 0) {
entry->ebx &= kvm_supported_word9_x86_features;
cpuid_mask(&entry->ebx, 9);
} else
entry->ebx = 0;
entry->eax = 0;
entry->ecx = 0;
entry->edx = 0;
break;
}
case 9:
break;
/* function 0xb has additional index. */
case 0xb: {
int i, level_type;
@ -2410,16 +2520,17 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
break;
}
case 0xd: {
int i;
int idx, i;
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
for (i = 1; *nent < maxnent && i < 64; ++i) {
if (entry[i].eax == 0)
for (idx = 1, i = 1; *nent < maxnent && idx < 64; ++idx) {
do_cpuid_1_ent(&entry[i], function, idx);
if (entry[i].eax == 0 || !supported_xcr0_bit(idx))
continue;
do_cpuid_1_ent(&entry[i], function, i);
entry[i].flags |=
KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
++*nent;
++i;
}
break;
}
@ -2438,6 +2549,10 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_CLOCKSOURCE2) |
(1 << KVM_FEATURE_ASYNC_PF) |
(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
entry->ebx = 0;
entry->ecx = 0;
entry->edx = 0;
@ -2451,6 +2566,24 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
entry->ecx &= kvm_supported_word6_x86_features;
cpuid_mask(&entry->ecx, 6);
break;
case 0x80000008: {
unsigned g_phys_as = (entry->eax >> 16) & 0xff;
unsigned virt_as = max((entry->eax >> 8) & 0xff, 48U);
unsigned phys_as = entry->eax & 0xff;
if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
entry->ebx = entry->edx = 0;
break;
}
case 0x80000019:
entry->ecx = entry->edx = 0;
break;
case 0x8000001a:
break;
case 0x8000001d:
break;
/*Add support for Centaur's CPUID instruction*/
case 0xC0000000:
/*Just support up to 0xC0000004 now*/
@ -2460,10 +2593,16 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
entry->edx &= kvm_supported_word5_x86_features;
cpuid_mask(&entry->edx, 5);
break;
case 3: /* Processor serial number */
case 5: /* MONITOR/MWAIT */
case 6: /* Thermal management */
case 0xA: /* Architectural Performance Monitoring */
case 0x80000007: /* Advanced power management */
case 0xC0000002:
case 0xC0000003:
case 0xC0000004:
/*Now nothing to do, reserved for the future*/
default:
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
break;
}
@ -3817,7 +3956,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
exception);
}
static int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
gva_t addr, void *val, unsigned int bytes,
struct x86_exception *exception)
{
@ -3827,6 +3966,7 @@ static int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
exception);
}
EXPORT_SYMBOL_GPL(kvm_read_guest_virt);
static int kvm_read_guest_virt_system(struct x86_emulate_ctxt *ctxt,
gva_t addr, void *val, unsigned int bytes,
@ -3836,7 +3976,7 @@ static int kvm_read_guest_virt_system(struct x86_emulate_ctxt *ctxt,
return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, exception);
}
static int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
gva_t addr, void *val,
unsigned int bytes,
struct x86_exception *exception)
@ -3868,6 +4008,42 @@ static int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
out:
return r;
}
EXPORT_SYMBOL_GPL(kvm_write_guest_virt_system);
static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
gpa_t *gpa, struct x86_exception *exception,
bool write)
{
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
if (vcpu_match_mmio_gva(vcpu, gva) &&
check_write_user_access(vcpu, write, access,
vcpu->arch.access)) {
*gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
(gva & (PAGE_SIZE - 1));
trace_vcpu_match_mmio(gva, *gpa, write, false);
return 1;
}
if (write)
access |= PFERR_WRITE_MASK;
*gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
if (*gpa == UNMAPPED_GVA)
return -1;
/* For APIC access vmexit */
if ((*gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
return 1;
if (vcpu_match_mmio_gpa(vcpu, *gpa)) {
trace_vcpu_match_mmio(gva, *gpa, write, true);
return 1;
}
return 0;
}
static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
unsigned long addr,
@ -3876,8 +4052,8 @@ static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
struct x86_exception *exception)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
gpa_t gpa;
int handled;
gpa_t gpa;
int handled, ret;
if (vcpu->mmio_read_completed) {
memcpy(val, vcpu->mmio_data, bytes);
@ -3887,13 +4063,12 @@ static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;
}
gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, exception);
ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, false);
if (gpa == UNMAPPED_GVA)
if (ret < 0)
return X86EMUL_PROPAGATE_FAULT;
/* For APIC access vmexit */
if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
if (ret)
goto mmio;
if (kvm_read_guest_virt(ctxt, addr, val, bytes, exception)
@ -3944,16 +4119,16 @@ static int emulator_write_emulated_onepage(unsigned long addr,
struct x86_exception *exception,
struct kvm_vcpu *vcpu)
{
gpa_t gpa;
int handled;
gpa_t gpa;
int handled, ret;
gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, true);
if (gpa == UNMAPPED_GVA)
if (ret < 0)
return X86EMUL_PROPAGATE_FAULT;
/* For APIC access vmexit */
if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
if (ret)
goto mmio;
if (emulator_write_phys(vcpu, gpa, val, bytes))
@ -4473,9 +4648,24 @@ static void inject_emulated_exception(struct kvm_vcpu *vcpu)
kvm_queue_exception(vcpu, ctxt->exception.vector);
}
static void init_decode_cache(struct x86_emulate_ctxt *ctxt,
const unsigned long *regs)
{
memset(&ctxt->twobyte, 0,
(void *)&ctxt->regs - (void *)&ctxt->twobyte);
memcpy(ctxt->regs, regs, sizeof(ctxt->regs));
ctxt->fetch.start = 0;
ctxt->fetch.end = 0;
ctxt->io_read.pos = 0;
ctxt->io_read.end = 0;
ctxt->mem_read.pos = 0;
ctxt->mem_read.end = 0;
}
static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
{
struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
int cs_db, cs_l;
/*
@ -4488,40 +4678,38 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu);
vcpu->arch.emulate_ctxt.mode =
(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
? X86EMUL_MODE_VM86 : cs_l
? X86EMUL_MODE_PROT64 : cs_db
? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
vcpu->arch.emulate_ctxt.guest_mode = is_guest_mode(vcpu);
memset(c, 0, sizeof(struct decode_cache));
memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
ctxt->eflags = kvm_get_rflags(vcpu);
ctxt->eip = kvm_rip_read(vcpu);
ctxt->mode = (!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
(ctxt->eflags & X86_EFLAGS_VM) ? X86EMUL_MODE_VM86 :
cs_l ? X86EMUL_MODE_PROT64 :
cs_db ? X86EMUL_MODE_PROT32 :
X86EMUL_MODE_PROT16;
ctxt->guest_mode = is_guest_mode(vcpu);
init_decode_cache(ctxt, vcpu->arch.regs);
vcpu->arch.emulate_regs_need_sync_from_vcpu = false;
}
int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
{
struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
int ret;
init_emulate_ctxt(vcpu);
vcpu->arch.emulate_ctxt.decode.op_bytes = 2;
vcpu->arch.emulate_ctxt.decode.ad_bytes = 2;
vcpu->arch.emulate_ctxt.decode.eip = vcpu->arch.emulate_ctxt.eip +
inc_eip;
ret = emulate_int_real(&vcpu->arch.emulate_ctxt, &emulate_ops, irq);
ctxt->op_bytes = 2;
ctxt->ad_bytes = 2;
ctxt->_eip = ctxt->eip + inc_eip;
ret = emulate_int_real(ctxt, irq);
if (ret != X86EMUL_CONTINUE)
return EMULATE_FAIL;
vcpu->arch.emulate_ctxt.eip = c->eip;
memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
ctxt->eip = ctxt->_eip;
memcpy(vcpu->arch.regs, ctxt->regs, sizeof ctxt->regs);
kvm_rip_write(vcpu, ctxt->eip);
kvm_set_rflags(vcpu, ctxt->eflags);
if (irq == NMI_VECTOR)
vcpu->arch.nmi_pending = false;
@ -4582,21 +4770,21 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
int insn_len)
{
int r;
struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
bool writeback = true;
kvm_clear_exception_queue(vcpu);
if (!(emulation_type & EMULTYPE_NO_DECODE)) {
init_emulate_ctxt(vcpu);
vcpu->arch.emulate_ctxt.interruptibility = 0;
vcpu->arch.emulate_ctxt.have_exception = false;
vcpu->arch.emulate_ctxt.perm_ok = false;
ctxt->interruptibility = 0;
ctxt->have_exception = false;
ctxt->perm_ok = false;
vcpu->arch.emulate_ctxt.only_vendor_specific_insn
ctxt->only_vendor_specific_insn
= emulation_type & EMULTYPE_TRAP_UD;
r = x86_decode_insn(&vcpu->arch.emulate_ctxt, insn, insn_len);
r = x86_decode_insn(ctxt, insn, insn_len);
trace_kvm_emulate_insn_start(vcpu);
++vcpu->stat.insn_emulation;
@ -4612,7 +4800,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
}
if (emulation_type & EMULTYPE_SKIP) {
kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip);
kvm_rip_write(vcpu, ctxt->_eip);
return EMULATE_DONE;
}
@ -4620,11 +4808,11 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
changes registers values during IO operation */
if (vcpu->arch.emulate_regs_need_sync_from_vcpu) {
vcpu->arch.emulate_regs_need_sync_from_vcpu = false;
memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
memcpy(ctxt->regs, vcpu->arch.regs, sizeof ctxt->regs);
}
restart:
r = x86_emulate_insn(&vcpu->arch.emulate_ctxt);
r = x86_emulate_insn(ctxt);
if (r == EMULATION_INTERCEPTED)
return EMULATE_DONE;
@ -4636,7 +4824,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
return handle_emulation_failure(vcpu);
}
if (vcpu->arch.emulate_ctxt.have_exception) {
if (ctxt->have_exception) {
inject_emulated_exception(vcpu);
r = EMULATE_DONE;
} else if (vcpu->arch.pio.count) {
@ -4655,13 +4843,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
r = EMULATE_DONE;
if (writeback) {
toggle_interruptibility(vcpu,
vcpu->arch.emulate_ctxt.interruptibility);
kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
toggle_interruptibility(vcpu, ctxt->interruptibility);
kvm_set_rflags(vcpu, ctxt->eflags);
kvm_make_request(KVM_REQ_EVENT, vcpu);
memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
memcpy(vcpu->arch.regs, ctxt->regs, sizeof ctxt->regs);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
kvm_rip_write(vcpu, ctxt->eip);
} else
vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
@ -4878,6 +5065,30 @@ void kvm_after_handle_nmi(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_after_handle_nmi);
static void kvm_set_mmio_spte_mask(void)
{
u64 mask;
int maxphyaddr = boot_cpu_data.x86_phys_bits;
/*
* Set the reserved bits and the present bit of an paging-structure
* entry to generate page fault with PFER.RSV = 1.
*/
mask = ((1ull << (62 - maxphyaddr + 1)) - 1) << maxphyaddr;
mask |= 1ull;
#ifdef CONFIG_X86_64
/*
* If reserved bit is not supported, clear the present bit to disable
* mmio page fault.
*/
if (maxphyaddr == 52)
mask &= ~1ull;
#endif
kvm_mmu_set_mmio_spte_mask(mask);
}
int kvm_arch_init(void *opaque)
{
int r;
@ -4904,10 +5115,10 @@ int kvm_arch_init(void *opaque)
if (r)
goto out;
kvm_set_mmio_spte_mask();
kvm_init_msr_list();
kvm_x86_ops = ops;
kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0);
@ -5082,8 +5293,7 @@ int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
kvm_x86_ops->patch_hypercall(vcpu, instruction);
return emulator_write_emulated(&vcpu->arch.emulate_ctxt,
rip, instruction, 3, NULL);
return emulator_write_emulated(ctxt, rip, instruction, 3, NULL);
}
static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i)
@ -5384,6 +5594,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
r = 1;
goto out;
}
if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
record_steal_time(vcpu);
}
r = kvm_mmu_reload(vcpu);
@ -5671,8 +5884,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
* that usually, but some bad designed PV devices (vmware
* backdoor interface) need this to work
*/
struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
memcpy(vcpu->arch.regs, ctxt->regs, sizeof ctxt->regs);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
}
regs->rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
@ -5801,21 +6014,20 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
bool has_error_code, u32 error_code)
{
struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
int ret;
init_emulate_ctxt(vcpu);
ret = emulator_task_switch(&vcpu->arch.emulate_ctxt,
tss_selector, reason, has_error_code,
error_code);
ret = emulator_task_switch(ctxt, tss_selector, reason,
has_error_code, error_code);
if (ret)
return EMULATE_FAIL;
memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
memcpy(vcpu->arch.regs, ctxt->regs, sizeof ctxt->regs);
kvm_rip_write(vcpu, ctxt->eip);
kvm_set_rflags(vcpu, ctxt->eflags);
kvm_make_request(KVM_REQ_EVENT, vcpu);
return EMULATE_DONE;
}
@ -6093,12 +6305,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
if (r == 0)
r = kvm_mmu_setup(vcpu);
vcpu_put(vcpu);
if (r < 0)
goto free_vcpu;
return 0;
free_vcpu:
kvm_x86_ops->vcpu_free(vcpu);
return r;
}
@ -6126,6 +6333,7 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_make_request(KVM_REQ_EVENT, vcpu);
vcpu->arch.apf.msr_val = 0;
vcpu->arch.st.msr_val = 0;
kvmclock_reset(vcpu);

View File

@ -75,10 +75,54 @@ static inline u32 bit(int bitno)
return 1 << (bitno & 31);
}
static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
gva_t gva, gfn_t gfn, unsigned access)
{
vcpu->arch.mmio_gva = gva & PAGE_MASK;
vcpu->arch.access = access;
vcpu->arch.mmio_gfn = gfn;
}
/*
* Clear the mmio cache info for the given gva,
* specially, if gva is ~0ul, we clear all mmio cache info.
*/
static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
{
if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
return;
vcpu->arch.mmio_gva = 0;
}
static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
{
if (vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
return true;
return false;
}
static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
{
if (vcpu->arch.mmio_gfn && vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
return true;
return false;
}
void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data);
int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
gva_t addr, void *val, unsigned int bytes,
struct x86_exception *exception);
int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
gva_t addr, void *val, unsigned int bytes,
struct x86_exception *exception);
#endif

View File

@ -161,6 +161,7 @@ struct kvm_pit_config {
#define KVM_EXIT_NMI 16
#define KVM_EXIT_INTERNAL_ERROR 17
#define KVM_EXIT_OSI 18
#define KVM_EXIT_PAPR_HCALL 19
/* For KVM_EXIT_INTERNAL_ERROR */
#define KVM_INTERNAL_ERROR_EMULATION 1
@ -264,6 +265,11 @@ struct kvm_run {
struct {
__u64 gprs[32];
} osi;
struct {
__u64 nr;
__u64 ret;
__u64 args[9];
} papr_hcall;
/* Fix the size of the union. */
char padding[256];
};
@ -544,6 +550,9 @@ struct kvm_ppc_pvinfo {
#define KVM_CAP_TSC_CONTROL 60
#define KVM_CAP_GET_TSC_KHZ 61
#define KVM_CAP_PPC_BOOKE_SREGS 62
#define KVM_CAP_SPAPR_TCE 63
#define KVM_CAP_PPC_SMT 64
#define KVM_CAP_PPC_RMA 65
#ifdef KVM_CAP_IRQ_ROUTING
@ -746,6 +755,9 @@ struct kvm_clock_data {
/* Available with KVM_CAP_XCRS */
#define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs)
#define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs)
#define KVM_CREATE_SPAPR_TCE _IOW(KVMIO, 0xa8, struct kvm_create_spapr_tce)
/* Available with KVM_CAP_RMA */
#define KVM_ALLOCATE_RMA _IOR(KVMIO, 0xa9, struct kvm_allocate_rma)
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
@ -773,20 +785,14 @@ struct kvm_assigned_pci_dev {
struct kvm_assigned_irq {
__u32 assigned_dev_id;
__u32 host_irq;
__u32 host_irq; /* ignored (legacy field) */
__u32 guest_irq;
__u32 flags;
union {
struct {
__u32 addr_lo;
__u32 addr_hi;
__u32 data;
} guest_msi;
__u32 reserved[12];
};
};
struct kvm_assigned_msix_nr {
__u32 assigned_dev_id;
__u16 entry_nr;

View File

@ -47,6 +47,7 @@
#define KVM_REQ_DEACTIVATE_FPU 10
#define KVM_REQ_EVENT 11
#define KVM_REQ_APF_HALT 12
#define KVM_REQ_STEAL_UPDATE 13
#define KVM_USERSPACE_IRQ_SOURCE_ID 0
@ -326,12 +327,17 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }
extern struct page *bad_page;
extern struct page *fault_page;
extern pfn_t bad_pfn;
extern pfn_t fault_pfn;
int is_error_page(struct page *page);
int is_error_pfn(pfn_t pfn);
int is_hwpoison_pfn(pfn_t pfn);
int is_fault_pfn(pfn_t pfn);
int is_noslot_pfn(pfn_t pfn);
int is_invalid_pfn(pfn_t pfn);
int kvm_is_error_hva(unsigned long addr);
int kvm_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
@ -381,6 +387,8 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
unsigned long len);
int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
void *data, unsigned long len);
int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
int offset, int len);
int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,

View File

@ -890,6 +890,7 @@ sigset_from_compat (sigset_t *set, compat_sigset_t *compat)
case 1: set->sig[0] = compat->sig[0] | (((long)compat->sig[1]) << 32 );
}
}
EXPORT_SYMBOL_GPL(sigset_from_compat);
asmlinkage long
compat_sys_rt_sigtimedwait (compat_sigset_t __user *uthese,

View File

@ -19,8 +19,10 @@
#include <linux/time.h>
#include <linux/sysctl.h>
#include <linux/delayacct.h>
#include <linux/module.h>
int delayacct_on __read_mostly = 1; /* Delay accounting turned on/off */
EXPORT_SYMBOL_GPL(delayacct_on);
struct kmem_cache *delayacct_cache;
static int __init delayacct_setup_disable(char *str)

View File

@ -75,6 +75,9 @@
#include <asm/tlb.h>
#include <asm/irq_regs.h>
#include <asm/mutex.h>
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#endif
#include "sched_cpupri.h"
#include "workqueue_sched.h"
@ -528,6 +531,12 @@ struct rq {
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
u64 prev_irq_time;
#endif
#ifdef CONFIG_PARAVIRT
u64 prev_steal_time;
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
u64 prev_steal_time_rq;
#endif
/* calc_load related fields */
unsigned long calc_load_update;
@ -1921,10 +1930,28 @@ void account_system_vtime(struct task_struct *curr)
}
EXPORT_SYMBOL_GPL(account_system_vtime);
#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
#ifdef CONFIG_PARAVIRT
static inline u64 steal_ticks(u64 steal)
{
if (unlikely(steal > NSEC_PER_SEC))
return div_u64(steal, TICK_NSEC);
return __iter_div_u64_rem(steal, TICK_NSEC, &steal);
}
#endif
static void update_rq_clock_task(struct rq *rq, s64 delta)
{
s64 irq_delta;
/*
* In theory, the compile should just see 0 here, and optimize out the call
* to sched_rt_avg_update. But I don't trust it...
*/
#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
s64 steal = 0, irq_delta = 0;
#endif
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
/*
@ -1947,12 +1974,35 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
rq->prev_irq_time += irq_delta;
delta -= irq_delta;
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_branch((&paravirt_steal_rq_enabled))) {
u64 st;
steal = paravirt_steal_clock(cpu_of(rq));
steal -= rq->prev_steal_time_rq;
if (unlikely(steal > delta))
steal = delta;
st = steal_ticks(steal);
steal = st * TICK_NSEC;
rq->prev_steal_time_rq += steal;
delta -= steal;
}
#endif
rq->clock_task += delta;
if (irq_delta && sched_feat(NONIRQ_POWER))
sched_rt_avg_update(rq, irq_delta);
#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
if ((irq_delta + steal) && sched_feat(NONTASK_POWER))
sched_rt_avg_update(rq, irq_delta + steal);
#endif
}
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
static int irqtime_account_hi_update(void)
{
struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
@ -1987,12 +2037,7 @@ static int irqtime_account_si_update(void)
#define sched_clock_irqtime (0)
static void update_rq_clock_task(struct rq *rq, s64 delta)
{
rq->clock_task += delta;
}
#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
#endif
#include "sched_idletask.c"
#include "sched_fair.c"
@ -3845,6 +3890,25 @@ void account_idle_time(cputime_t cputime)
cpustat->idle = cputime64_add(cpustat->idle, cputime64);
}
static __always_inline bool steal_account_process_tick(void)
{
#ifdef CONFIG_PARAVIRT
if (static_branch(&paravirt_steal_enabled)) {
u64 steal, st = 0;
steal = paravirt_steal_clock(smp_processor_id());
steal -= this_rq()->prev_steal_time;
st = steal_ticks(steal);
this_rq()->prev_steal_time += st * TICK_NSEC;
account_steal_time(st);
return st;
}
#endif
return false;
}
#ifndef CONFIG_VIRT_CPU_ACCOUNTING
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
@ -3876,6 +3940,9 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
if (steal_account_process_tick())
return;
if (irqtime_account_hi_update()) {
cpustat->irq = cputime64_add(cpustat->irq, tmp);
} else if (irqtime_account_si_update()) {
@ -3929,6 +3996,9 @@ void account_process_tick(struct task_struct *p, int user_tick)
return;
}
if (steal_account_process_tick())
return;
if (user_tick)
account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))

View File

@ -61,9 +61,9 @@ SCHED_FEAT(LB_BIAS, 1)
SCHED_FEAT(OWNER_SPIN, 1)
/*
* Decrement CPU power based on irq activity
* Decrement CPU power based on time not spent running tasks
*/
SCHED_FEAT(NONIRQ_POWER, 1)
SCHED_FEAT(NONTASK_POWER, 1)
/*
* Queue remote wakeups on the target CPU and process them

View File

@ -617,7 +617,7 @@ static int kvm_vm_ioctl_set_msix_nr(struct kvm *kvm,
if (adev->entries_nr == 0) {
adev->entries_nr = entry_nr->entry_nr;
if (adev->entries_nr == 0 ||
adev->entries_nr >= KVM_MAX_MSIX_PER_DEV) {
adev->entries_nr > KVM_MAX_MSIX_PER_DEV) {
r = -EINVAL;
goto msix_nr_out;
}

Some files were not shown because too many files have changed in this diff Show More