Wei Yongjun [Thu, 17 Jun 2010 09:33:55 +0000 (17:33 +0800)]
KVM: x86 emulator: fix group3 instruction decoding
Group 3 instruction with ModRM reg field as 001 is
defined as test instruction under AMD arch, and
emulate_grp3() is ready for emulate it, so fix the
decoding.
static inline int emulate_grp3(...)
{
...
switch (c->modrm_reg) {
case 0 ... 1: /* test */
emulate_2op_SrcV("test", c->src, c->dst, ctxt->eflags);
...
}
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Asias He [Sat, 19 Jun 2010 08:52:12 +0000 (16:52 +0800)]
KVM: PPC: fix uninitialized variable warning in kvm_ppc_core_deliver_interrupts
Fixes:
arch/powerpc/kvm/booke.c: In function 'kvmppc_core_deliver_interrupts':
arch/powerpc/kvm/booke.c:147: warning: 'msr_mask' may be used uninitialized in this function
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jason Wang [Thu, 17 Jun 2010 08:49:22 +0000 (16:49 +0800)]
KVM: Fix typos in Documentation/kvm/mmu.txt
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Chris Lalancette [Wed, 16 Jun 2010 21:11:13 +0000 (17:11 -0400)]
KVM: x86: In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's
Otherwise we might try to deliver a timer interrupt to a cpu that
can't possibly handle it.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Chris Lalancette [Wed, 16 Jun 2010 21:11:12 +0000 (17:11 -0400)]
KVM: x86: Allow any LAPIC to accept PIC interrupts
If the guest wants to accept timer interrupts on a CPU other
than the BSP, we need to remove this gate.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Chris Lalancette [Wed, 16 Jun 2010 21:11:11 +0000 (17:11 -0400)]
KVM: x86: Introduce a workqueue to deliver PIT timer interrupts
We really want to "kvm_set_irq" during the hrtimer callback,
but that is risky because that is during interrupt context.
Instead, offload the work to a workqueue, which is a bit safer
and should provide most of the same functionality.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Tue, 15 Jun 2010 01:03:33 +0000 (09:03 +0800)]
KVM: x86 emulator: fix pusha instruction emulation
emulate pusha instruction only writeback the last
EDI register, but the other registers which need
to be writeback is ignored. This patch fixed it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Mon, 14 Jun 2010 21:42:15 +0000 (11:42 -1000)]
KVM: x86: fix -DDEBUG oops
Fix a slight error with assertion in local APIC code.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 11 Jun 2010 13:35:15 +0000 (21:35 +0800)]
KVM: MMU: don't walk every parent pages while mark unsync
While we mark the parent's unsync_child_bitmap, if the parent is already
unsynced, it no need walk it's parent, it can reduce some unnecessary
workload
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 11 Jun 2010 13:34:04 +0000 (21:34 +0800)]
KVM: MMU: clear unsync_child_bitmap completely
In current code, some page's unsync_child_bitmap is not cleared completely
in mmu_sync_children(), for example, if two PDPEs shard one PDT, one of
PDPE's unsync_child_bitmap is not cleared.
Currently, it not harm anything just little overload, but it's the prepare
work for the later patch
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 11 Jun 2010 13:32:34 +0000 (21:32 +0800)]
KVM: MMU: cleanup for __mmu_unsync_walk()
Decrease sp->unsync_children after clear unsync_child_bitmap bit
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 11 Jun 2010 13:31:38 +0000 (21:31 +0800)]
KVM: MMU: don't mark pte notrap if it's just sync transient
If the sync-sp just sync transient, don't mark its pte notrap
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 11 Jun 2010 13:30:36 +0000 (21:30 +0800)]
KVM: MMU: avoid double write protected in sync page path
The sync page is already write protected in mmu_sync_children(), don't
write protected it again
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 11 Jun 2010 13:29:42 +0000 (21:29 +0800)]
KVM: MMU: cleanup for dirty page judgment
Using wrap function to cleanup page dirty judgment
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 11 Jun 2010 13:28:14 +0000 (21:28 +0800)]
KVM: MMU: rename 'page' and 'shadow_page' to 'sp'
Rename 'page' and 'shadow_page' to 'sp' to better fit the context
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Sun, 13 Jun 2010 09:29:39 +0000 (17:29 +0800)]
KVM: x86: XSAVE/XRSTOR live migration support
This patch enable save/restore of xsave state.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Denis Kirjanov [Fri, 11 Jun 2010 11:23:26 +0000 (11:23 +0000)]
KVM: PPC: fix build warning in kvm_arch_vcpu_ioctl_run
Fix compile warning:
CC [M] arch/powerpc/kvm/powerpc.o
arch/powerpc/kvm/powerpc.c: In function 'kvm_arch_vcpu_ioctl_run':
arch/powerpc/kvm/powerpc.c:290: warning: 'gpr' may be used uninitialized in this function
arch/powerpc/kvm/powerpc.c:290: note: 'gpr' was declared here
Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 10 Jun 2010 14:02:16 +0000 (17:02 +0300)]
KVM: Fix mov cr3 #GP at wrong instruction
On Intel, we call skip_emulated_instruction() even if we injected a #GP,
resulting in the #GP pointing at the wrong address.
Fix by injecting the exception and skipping the instruction at the same place,
so we can do just one or the other.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 10 Jun 2010 14:02:15 +0000 (17:02 +0300)]
KVM: Fix mov cr4 #GP at wrong instruction
On Intel, we call skip_emulated_instruction() even if we injected a #GP,
resulting in the #GP pointing at the wrong address.
Fix by injecting the exception and skipping the instruction at the same place,
so we can do just one or the other.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 10 Jun 2010 14:02:14 +0000 (17:02 +0300)]
KVM: Fix mov cr0 #GP at wrong instruction
On Intel, we call skip_emulated_instruction() even if we injected a #GP,
resulting in the #GP pointing at the wrong address.
Fix by injecting the exception and skipping the instruction at the same place,
so we can do just one or the other.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dexuan Cui [Thu, 10 Jun 2010 03:27:12 +0000 (11:27 +0800)]
KVM: VMX: Enable XSAVE/XRSTOR for guest
This patch enable guest to use XSAVE/XRSTOR instructions.
We assume that host_xcr0 would use all possible bits that OS supported.
And we loaded xcr0 in the same way we handled fpu - do it as late as we can.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 10 Jun 2010 14:21:29 +0000 (17:21 +0300)]
KVM: VMX: Fix incorrect rcu deref in rmode_tss_base()
Signed-off-by: Avi Kivity <avi@redhat.com>
Andi Kleen [Thu, 10 Jun 2010 11:10:55 +0000 (13:10 +0200)]
KVM: Fix unused but set warnings
No real bugs in this one.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Andi Kleen [Thu, 10 Jun 2010 11:10:47 +0000 (13:10 +0200)]
KVM: Fix KVM_SET_SIGNAL_MASK with arg == NULL
When the user passed in a NULL mask pass this on from the ioctl
handler.
Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Tue, 8 Jun 2010 12:07:01 +0000 (20:07 +0800)]
KVM: MMU: delay local tlb flush
delay local tlb flush until enter guest moden, it can reduce vpid flush
frequency and reduce remote tlb flush IPI(if KVM_REQ_TLB_FLUSH bit is
already set, IPI is not sent)
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Tue, 8 Jun 2010 12:05:57 +0000 (20:05 +0800)]
KVM: MMU: use wrapper function to flush local tlb
Use kvm_mmu_flush_tlb() function instead of calling
kvm_x86_ops->tlb_flush(vcpu) directly.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Tue, 8 Jun 2010 12:05:05 +0000 (20:05 +0800)]
KVM: MMU: remove unnecessary remote tlb flush
This remote tlb flush is no necessary since we have synced while
sp is zapped
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Tue, 8 Jun 2010 02:15:51 +0000 (10:15 +0800)]
KVM: VMX: fix rcu usage warning in init_rmode()
fix:
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
include/linux/kvm_host.h:258 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by qemu-system-x86/3796:
#0: (&vcpu->mutex){+.+.+.}, at: [<
ffffffffa0217fd8>] vcpu_load+0x1a/0x66 [kvm]
stack backtrace:
Pid: 3796, comm: qemu-system-x86 Not tainted 2.6.34 #25
Call Trace:
[<
ffffffff81070ed1>] lockdep_rcu_dereference+0x9d/0xa5
[<
ffffffffa0214fdf>] gfn_to_memslot_unaliased+0x65/0xa0 [kvm]
[<
ffffffffa0216139>] gfn_to_hva+0x22/0x4c [kvm]
[<
ffffffffa0216217>] kvm_write_guest_page+0x2a/0x7f [kvm]
[<
ffffffffa0216286>] kvm_clear_guest_page+0x1a/0x1c [kvm]
[<
ffffffffa0278239>] init_rmode+0x3b/0x180 [kvm_intel]
[<
ffffffffa02786ce>] vmx_set_cr0+0x350/0x4d3 [kvm_intel]
[<
ffffffffa02274ff>] kvm_arch_vcpu_ioctl_set_sregs+0x122/0x31a [kvm]
[<
ffffffffa021859c>] kvm_vcpu_ioctl+0x578/0xa3d [kvm]
[<
ffffffff8106624c>] ? cpu_clock+0x2d/0x40
[<
ffffffff810f7d86>] ? fget_light+0x244/0x28e
[<
ffffffff810709b9>] ? trace_hardirqs_off_caller+0x1f/0x10e
[<
ffffffff8110501b>] vfs_ioctl+0x32/0xa6
[<
ffffffff81105597>] do_vfs_ioctl+0x47f/0x4b8
[<
ffffffff813ae654>] ? sub_preempt_count+0xa3/0xb7
[<
ffffffff810f7da8>] ? fget_light+0x266/0x28e
[<
ffffffff810f7c53>] ? fget_light+0x111/0x28e
[<
ffffffff81105617>] sys_ioctl+0x47/0x6a
[<
ffffffff81002c1b>] system_call_fastpath+0x16/0x1b
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Mon, 7 Jun 2010 02:33:27 +0000 (10:33 +0800)]
KVM: VMX: rename vpid_sync_vcpu_all() to vpid_sync_vcpu_single()
The name "pid_sync_vcpu_all" isn't appropriate since it just affect
a single vpid, so rename it to vpid_sync_vcpu_single().
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Mon, 7 Jun 2010 02:32:29 +0000 (10:32 +0800)]
KVM: VMX: Add all-context INVVPID type support
Add all-context INVVPID type support.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:56:59 +0000 (21:56 +0800)]
KVM: MMU: reduce remote tlb flush in kvm_mmu_pte_write()
collect remote tlb flush in kvm_mmu_pte_write() path
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:56:11 +0000 (21:56 +0800)]
KVM: MMU: traverse sp hlish safely
Now, we can safely to traverse sp hlish
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:55:29 +0000 (21:55 +0800)]
KVM: MMU: gather remote tlb flush which occurs during page zapped
Using kvm_mmu_prepare_zap_page() and kvm_mmu_zap_page() instead of
kvm_mmu_zap_page() that can reduce remote tlb flush IPI
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:54:38 +0000 (21:54 +0800)]
KVM: MMU: don't get free page number in the loop
In the later patch, we will modify sp's zapping way like below:
kvm_mmu_prepare_zap_page A
kvm_mmu_prepare_zap_page B
kvm_mmu_prepare_zap_page C
....
kvm_mmu_commit_zap_page
[ zaped multiple sps only need to call kvm_mmu_commit_zap_page once ]
In __kvm_mmu_free_some_pages() function, the free page number is
getted form 'vcpu->kvm->arch.n_free_mmu_pages' in loop, it will
hinders us to apply kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page()
since kvm_mmu_prepare_zap_page() not free sp.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:53:54 +0000 (21:53 +0800)]
KVM: MMU: split the operations of kvm_mmu_zap_page()
Using kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() to
split kvm_mmu_zap_page() function, then we can:
- traverse hlist safely
- easily to gather remote tlb flush which occurs during page zapped
Those feature can be used in the later patches
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:53:07 +0000 (21:53 +0800)]
KVM: MMU: introduce some macros to cleanup hlist traverseing
Introduce for_each_gfn_sp() and for_each_gfn_indirect_valid_sp() to
cleanup hlist traverseing
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 4 Jun 2010 13:52:17 +0000 (21:52 +0800)]
KVM: MMU: skip invalid sp when unprotect page
In kvm_mmu_unprotect_page(), the invalid sp can be skipped
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Fri, 4 Jun 2010 00:51:39 +0000 (08:51 +0800)]
KVM: VMX: Make sure single type invvpid is supported before issuing invvpid instruction
According to SDM, we need check whether single-context INVVPID type is supported
before issuing invvpid instruction.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Lai Jiangshan [Wed, 2 Jun 2010 09:06:03 +0000 (17:06 +0800)]
KVM: x86: use linux/uaccess.h instead of asm/uaccess.h
Should use linux/uaccess.h instead of asm/uaccess.h
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Lai Jiangshan [Wed, 2 Jun 2010 09:01:23 +0000 (17:01 +0800)]
KVM: cleanup "*new.rmap" type
The type of '*new.rmap' is not 'struct page *', fix it
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Wed, 2 Jun 2010 06:05:24 +0000 (14:05 +0800)]
KVM: VMX: Enforce EPT pagetable level checking
We only support 4 levels EPT pagetable now.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Glauber Costa [Tue, 1 Jun 2010 12:22:48 +0000 (08:22 -0400)]
KVM: Add Documentation/kvm/msr.txt
This patch adds a file that documents the usage of KVM-specific
MSRs.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Andreas Schwab [Mon, 31 May 2010 19:59:13 +0000 (21:59 +0200)]
KVM: PPC: elide struct thread_struct instances from stack
Instead of instantiating a whole thread_struct on the stack use only the
required parts of it.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mohammed Gamal [Mon, 31 May 2010 19:40:54 +0000 (22:40 +0300)]
KVM: VMX: Properly return error to userspace on vmentry failure
The vmexit handler returns KVM_EXIT_UNKNOWN since there is no handler
for vmentry failures. This intercepts vmentry failures and returns
KVM_FAIL_ENTRY to userspace instead.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gui Jianfeng [Mon, 31 May 2010 09:11:39 +0000 (17:11 +0800)]
KVM: MMU: Don't calculate quadrant if tdp_enabled
There's no need to calculate quadrant if tdp is enabled.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 27 May 2010 13:44:12 +0000 (16:44 +0300)]
KVM: MMU: Document large pages
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 27 May 2010 11:46:04 +0000 (14:46 +0300)]
KVM: MMU: Document cr0.wp emulation
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 27 May 2010 11:22:51 +0000 (14:22 +0300)]
KVM: MMU: Allow spte.w=1 for gpte.w=0 and cr0.wp=0 only in shadow mode
When tdp is enabled, the guest's cr0.wp shouldn't have any effect on spte
permissions.
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 25 May 2010 14:01:50 +0000 (16:01 +0200)]
KVM: x86: Propagate fpu_alloc errors
Memory allocation may fail. Propagate such errors.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Zachary Amsden [Thu, 27 May 2010 01:09:43 +0000 (15:09 -1000)]
KVM: SVM: Fix EFER.LME being stripped
Must set VCPU register to be the guest notion of EFER even if that
setting is not valid on hardware. This was masked by the set in
set_efer until
7657fd5ace88e8092f5f3a84117e093d7b893f26 broke that.
Fix is simply to set the VCPU register before stripping bits.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Thu, 27 May 2010 08:09:48 +0000 (16:09 +0800)]
KVM: MMU: don't check PT_WRITABLE_MASK directly
Since we have is_writable_pte(), make use of it.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Wed, 26 May 2010 08:48:19 +0000 (16:48 +0800)]
KVM: MMU: calculate correct gfn for small host pages backing large guest pages
In Documentation/kvm/mmu.txt:
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
But in function FNAME(fetch)(), sp->gfn is incorrect when one of following
situations occurred:
1) guest is 32bit paging and the guest PDE maps a 4-MByte page
(backed by 4k host pages), FNAME(fetch)() miss handling the quadrant.
And if guest use pse-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);"
is incorrect.
2) guest is long mode paging and the guest PDPTE maps a 1-GByte page
(backed by 4k or 2M host pages).
So we fix it to suit to the document and suit to the code which
requires sp->gfn correct when sp->role.direct=1.
We use the goal mapping gfn(gw->gfn) to calculate the base page frame
for linear translations, it is simple and easy to be understood.
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Wed, 26 May 2010 08:48:25 +0000 (16:48 +0800)]
KVM: MMU: Calculate correct base gfn for direct non-DIR level
In Document/kvm/mmu.txt:
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
But in __direct_map(), the base gfn calculation is incorrect,
it does not calculate correctly when level=3 or 4.
Fix by using PT64_LVL_ADDR_MASK() which accounts for all levels correctly.
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Wed, 26 May 2010 08:49:59 +0000 (16:49 +0800)]
KVM: MMU: Don't allocate gfns page for direct mmu pages
When sp->role.direct is set, sp->gfns does not contain any essential
information, leaf sptes reachable from this sp are for a continuous
guest physical memory range (a linear range).
So sp->gfns[i] (if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL)
Obviously, it is not essential information, we can calculate it when need.
It means we don't need sp->gfns when sp->role.direct=1,
Thus we can save one page usage for every kvm_mmu_page.
Note:
Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
or kvm_mmu_page_set_gfn().
It is only exposed in FNAME(sync_page).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Sun, 23 May 2010 22:01:04 +0000 (01:01 +0300)]
KVM: VMX: Add constant for invalid guest state exit reason
For the sake of completeness, this patch adds a symbolic
constant for VMX exit reason 0x21 (invalid guest state).
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Mon, 24 May 2010 07:41:33 +0000 (15:41 +0800)]
KVM: MMU: allow more page become unsync at getting sp time
Allow more page become asynchronous at getting sp time, if need create new
shadow page for gfn but it not allow unsync(level > 1), we should unsync all
gfn's unsync page
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Mon, 24 May 2010 07:40:07 +0000 (15:40 +0800)]
KVM: MMU: allow more page become unsync at gfn mapping time
In current code, shadow page can become asynchronous only if one
shadow page for a gfn, this rule is too strict, in fact, we can
let all last mapping page(i.e, it's the pte page) become unsync,
and sync them at invlpg or flush tlb time.
This patch allow more page become asynchronous at gfn mapping time
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 23 May 2010 15:37:00 +0000 (18:37 +0300)]
KVM: Update Red Hat copyrights
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 23 May 2010 11:28:26 +0000 (14:28 +0300)]
KVM: SVM: correctly trace irq injection
On SVM interrupts are injected by svm_set_irq() not svm_inject_irq().
The later is used only to wait for irq window.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 15 May 2010 10:53:35 +0000 (18:53 +0800)]
KVM: MMU: only update unsync page in invlpg path
Only unsync pages need updated at invlpg time since other shadow
pages are write-protected
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 15 May 2010 10:52:34 +0000 (18:52 +0800)]
KVM: MMU: don't write-protect if have new mapping to unsync page
Two cases maybe happen in kvm_mmu_get_page() function:
- one case is, the goal sp is already in cache, if the sp is unsync,
we only need update it to assure this mapping is valid, but not
mark it sync and not write-protect sp->gfn since it not broke unsync
rule(one shadow page for a gfn)
- another case is, the goal sp not existed, we need create a new sp
for gfn, i.e, gfn (may)has another shadow page, to keep unsync rule,
we should sync(mark sync and write-protect) gfn's unsync shadow page.
After enabling multiple unsync shadows, we sync those shadow pages
only when the new sp not allow to become unsync(also for the unsyc
rule, the new rule is: allow all pte page become unsync)
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 15 May 2010 10:51:24 +0000 (18:51 +0800)]
KVM: MMU: split kvm_sync_page() function
Split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient()
to clarify the code address Avi's suggestion
kvm_sync_page_transient() function only update shadow page but not mark
it sync and not write protect sp->gfn. it will be used by later patch
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Mon, 17 May 2010 09:08:28 +0000 (17:08 +0800)]
KVM: x86: Use FPU API
Convert KVM to use generic FPU API.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Mon, 17 May 2010 09:08:27 +0000 (17:08 +0800)]
KVM: x86: Use unlazy_fpu() for host FPU
We can avoid unnecessary fpu load when userspace process
didn't use FPU frequently.
Derived from Avi's idea.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Mon, 17 May 2010 09:22:23 +0000 (17:22 +0800)]
x86: Export FPU API for KVM use
Also add some constants.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 09:35:17 +0000 (12:35 +0300)]
KVM: Consolidate arch specific vcpu ioctl locking
Now that all arch specific ioctls have centralized locking, it is easy to
move it to the central dispatcher.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 09:30:43 +0000 (12:30 +0300)]
KVM: PPC: Centralize locking of arch specific vcpu ioctls
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 09:21:46 +0000 (12:21 +0300)]
KVM: s390: Centrally lock arch specific vcpu ioctls
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 08:53:06 +0000 (11:53 +0300)]
KVM: x86: Lock arch specific vcpu ioctls centrally
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 13 May 2010 08:25:04 +0000 (11:25 +0300)]
KVM: move vcpu locking to dispatcher for generic vcpu ioctls
All vcpu ioctls need to be locked, so instead of locking each one specifically
we lock at the generic dispatcher.
This patch only updates generic ioctls and leaves arch specific ioctls alone.
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Thu, 13 May 2010 02:09:57 +0000 (10:09 +0800)]
KVM: x86: cleanup unused local variable
fix:
arch/x86/kvm/x86.c: In function ‘handle_emulation_failure’:
arch/x86/kvm/x86.c:3844: warning: unused variable ‘ctxt’
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Thu, 13 May 2010 02:08:08 +0000 (10:08 +0800)]
KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page
sp->gfns[] contain unaliased gfns, but gpte might contain pointer
to aliased region.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Thu, 13 May 2010 02:07:00 +0000 (10:07 +0800)]
KVM: MMU: remove rmap before clear spte
Remove rmap before clear spte otherwise it will trigger BUG_ON() in
some functions such as rmap_write_protect().
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Thu, 13 May 2010 02:06:02 +0000 (10:06 +0800)]
KVM: MMU: use proper cache object freeing function
Use kmem_cache_free to free objects allocated by kmem_cache_alloc.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alex Williamson [Wed, 12 May 2010 13:46:31 +0000 (09:46 -0400)]
KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq
Remove this check in an effort to allow kvm guests to run without
root privileges. This capability check doesn't seem to add any
security since the device needs to have already been added via the
assign device ioctl and the io actually occurs through the pci
sysfs interface.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Wed, 12 May 2010 08:40:42 +0000 (16:40 +0800)]
KVM: VMX: Only reset MMU when necessary
Only modifying some bits of CR0/CR4 needs paging mode switch.
Modify EFER.NXE bit would result in reserved bit updates.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Wed, 12 May 2010 08:40:41 +0000 (16:40 +0800)]
KVM: x86: Clean up duplicate assignment
mmu.free() already set root_hpa to INVALID_PAGE, no need to do it again in the
destory_kvm_mmu().
kvm_x86_ops->set_cr4() and set_efer() already assign cr4/efer to
vcpu->arch.cr4/efer, no need to do it again later.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mohammed Gamal [Tue, 11 May 2010 22:39:22 +0000 (01:39 +0300)]
KVM: x86 emulator: Add missing decoder flags for xor instructions
This adds missing decoder flags for xor instructions (opcodes 0x34 - 0x35)
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mohammed Gamal [Tue, 11 May 2010 22:39:21 +0000 (01:39 +0300)]
KVM: x86 emulator: Add missing decoder flags for sub instruction
This adds missing decoder flags for sub instructions (opcodes 0x2c - 0x2d)
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mohammed Gamal [Tue, 11 May 2010 19:22:40 +0000 (22:22 +0300)]
KVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)
This adds test acc, imm instruction to the x86 emulator
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Thu, 13 May 2010 00:00:35 +0000 (21:00 -0300)]
KVM: pass correct parameter to kvm_mmu_free_some_pages
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dongxiao Xu [Tue, 11 May 2010 10:29:48 +0000 (18:29 +0800)]
KVM: VMX: VMXON/VMXOFF usage changes
SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.
Therefore in vmm coexistence case, we should firstly call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dongxiao Xu [Tue, 11 May 2010 10:29:45 +0000 (18:29 +0800)]
KVM: VMX: VMCLEAR/VMPTRLD usage changes
Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
support hosted VMM coexistance, VMCLEAR is executed on vcpu
schedule out, and VMPTRLD is executed on vcpu schedule in.
This could also eliminate the IPI when doing VMCLEAR.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dongxiao Xu [Tue, 11 May 2010 10:29:42 +0000 (18:29 +0800)]
KVM: VMX: Some minor changes to code structure
Do some preparations for vmm coexistence support.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dongxiao Xu [Tue, 11 May 2010 10:29:38 +0000 (18:29 +0800)]
KVM: VMX: Define new functions to wrapper direct call of asm code
Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm
code. Also move VMXE bit operation out of kvm_cpu_vmxoff().
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gui Jianfeng [Tue, 11 May 2010 06:36:58 +0000 (14:36 +0800)]
KVM: update mmu documetation for role.nxe
There's no member "cr4_nxe" in struct kvm_mmu_page_role, it names "nxe" now.
Update mmu document.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 10 May 2010 09:09:56 +0000 (12:09 +0300)]
KVM: MMU: Fix free memory accounting race in mmu_alloc_roots()
We drop the mmu lock between freeing memory and allocating the roots; this
allows some other vcpu to sneak in and allocate memory.
While the race is benign (resulting only in temporary overallocation, not oom)
it is simple and easy to fix by moving the freeing close to the allocation.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 10 May 2010 08:16:56 +0000 (11:16 +0300)]
KVM: inject #UD if instruction emulation fails and exit to userspace
Do not kill VM when instruction emulation fails. Inject #UD and report
failure to userspace instead. Userspace may choose to reenter guest if
vcpu is in userspace (cpl == 3) in which case guest OS will kill
offending process and continue running.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 29 Apr 2010 09:12:57 +0000 (12:12 +0300)]
KVM: Document KVM_SET_BOOT_CPU_ID
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 29 Apr 2010 09:08:56 +0000 (12:08 +0300)]
KVM: Document KVM_SET_IDENTITY_MAP ioctl
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Wed, 5 May 2010 01:03:49 +0000 (09:03 +0800)]
KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed
Currently, kvm_mmu_zap_page() returning the number of freed children sp.
This might confuse the caller, because caller don't know the actual freed
number. Let's make kvm_mmu_zap_page() return the number of pages it actually
freed.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Wed, 5 May 2010 01:58:33 +0000 (09:58 +0800)]
KVM: MMU: Fix debug output error in walk_addr()
Fix a debug output error in walk_addr
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gui Jianfeng [Wed, 5 May 2010 01:09:21 +0000 (09:09 +0800)]
KVM: MMU: mark page table dirty when a pte is actually modified
Sometime cmpxchg_gpte doesn't modify gpte, in such case, don't mark
page table page as dirty.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 5 May 2010 14:04:44 +0000 (16:04 +0200)]
KVM: SVM: Allow EFER.LMSLE to be set with nested svm
This patch enables setting of efer bit 13 which is allowed
in all SVM capable processors. This is necessary for the
SLES11 version of Xen 4.0 to boot with nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 5 May 2010 14:04:42 +0000 (16:04 +0200)]
KVM: SVM: Dump vmcb contents on failed vmrun
This patch adds a function to dump the vmcb into the kernel
log and calls it after a failed vmrun to ease debugging.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 3 May 2010 13:54:48 +0000 (16:54 +0300)]
KVM: Get rid of KVM_REQ_KICK
KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation. This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.
Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:44 +0000 (19:15 +0300)]
KVM: x86 emulator: do not inject exception directly into vcpu
Return exception as a result of instruction emulation and handle
injection in KVM code.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:43 +0000 (19:15 +0300)]
KVM: x86 emulator: move interruptibility state tracking out of emulator
Emulator shouldn't access vcpu directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:42 +0000 (19:15 +0300)]
KVM: x86 emulator: handle shadowed registers outside emulator
Emulator shouldn't access vcpu directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 28 Apr 2010 16:15:41 +0000 (19:15 +0300)]
KVM: x86 emulator: use shadowed register in emulate_sysexit()
emulate_sysexit() should use shadowed registers copy instead of
looking into vcpu state directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>