openwrt/staging/blogic.git
14 years agoKVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu
Joerg Roedel [Fri, 10 Sep 2010 15:30:45 +0000 (17:30 +0200)]
KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu

This patch changes is_rsvd_bits_set() function prototype to
take only a kvm_mmu context instead of a full vcpu.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: Introduce kvm_init_shadow_mmu helper function
Joerg Roedel [Fri, 10 Sep 2010 15:30:44 +0000 (17:30 +0200)]
KVM: MMU: Introduce kvm_init_shadow_mmu helper function

Some logic of the init_kvm_softmmu function is required to
build the Nested Nested Paging context. So factor the
required logic into a seperate function and export it.
Also make the whole init path suitable for more than one mmu
context.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: Introduce inject_page_fault function pointer
Joerg Roedel [Fri, 10 Sep 2010 15:30:43 +0000 (17:30 +0200)]
KVM: MMU: Introduce inject_page_fault function pointer

This patch introduces an inject_page_fault function pointer
into struct kvm_mmu which will be used to inject a page
fault. This will be used later when Nested Nested Paging is
implemented.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: Introduce get_cr3 function pointer
Joerg Roedel [Fri, 10 Sep 2010 15:30:42 +0000 (17:30 +0200)]
KVM: MMU: Introduce get_cr3 function pointer

This function pointer in the MMU context is required to
implement Nested Nested Paging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: X86: Introduce a tdp_set_cr3 function
Joerg Roedel [Fri, 10 Sep 2010 15:30:41 +0000 (17:30 +0200)]
KVM: X86: Introduce a tdp_set_cr3 function

This patch introduces a special set_tdp_cr3 function pointer
in kvm_x86_ops which is only used for tpd enabled mmu
contexts. This allows to remove some hacks from svm code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: Make set_cr3 a function pointer in kvm_mmu
Joerg Roedel [Fri, 10 Sep 2010 15:30:40 +0000 (17:30 +0200)]
KVM: MMU: Make set_cr3 a function pointer in kvm_mmu

This is necessary to implement Nested Nested Paging. As a
side effect this allows some cleanups in the SVM nested
paging code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: Make tdp_enabled a mmu-context parameter
Joerg Roedel [Fri, 10 Sep 2010 15:30:39 +0000 (17:30 +0200)]
KVM: MMU: Make tdp_enabled a mmu-context parameter

This patch changes the tdp_enabled flag from its global
meaning to the mmu-context and renames it to direct_map
there. This is necessary for Nested SVM with emulation of
Nested Paging where we need an extra MMU context to shadow
the Nested Nested Page Table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: Check for root_level instead of long mode
Joerg Roedel [Fri, 10 Sep 2010 15:30:38 +0000 (17:30 +0200)]
KVM: MMU: Check for root_level instead of long mode

The walk_addr function checks for !is_long_mode in its 64
bit version. But what is meant here is a check for pae
paging. Change the condition to really check for pae paging
so that it also works with nested nested paging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: x86: Emulate MSR_EBC_FREQUENCY_ID
Jes Sorensen [Thu, 9 Sep 2010 10:06:46 +0000 (12:06 +0200)]
KVM: x86: Emulate MSR_EBC_FREQUENCY_ID

Some operating systems store data about the host processor at the
time of installation, and when booted on a more uptodate cpu tries
to read MSR_EBC_FREQUENCY_ID. This has been found with XP.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agox86: Define MSR_EBC_FREQUENCY_ID
Jes Sorensen [Thu, 9 Sep 2010 10:06:45 +0000 (12:06 +0200)]
x86: Define MSR_EBC_FREQUENCY_ID

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: SVM: Clean up rip handling in vmrun emulation
Roedel, Joerg [Fri, 3 Sep 2010 12:21:40 +0000 (14:21 +0200)]
KVM: SVM: Clean up rip handling in vmrun emulation

This patch changes the rip handling in the vmrun emulation
path from using next_rip to the generic kvm register access
functions.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: SVM: Restore correct registers after sel_cr0 intercept emulation
Joerg Roedel [Thu, 2 Sep 2010 15:29:46 +0000 (17:29 +0200)]
KVM: SVM: Restore correct registers after sel_cr0 intercept emulation

This patch implements restoring of the correct rip, rsp, and
rax after the svm emulation in KVM injected a selective_cr0
write intercept into the guest hypervisor. The problem was
that the vmexit is emulated in the instruction emulation
which later commits the registers right after the write-cr0
instruction. So the l1 guest will continue to run with the
l2 rip, rsp and rax resulting in unpredictable behavior.

This patch is not the final word, it is just an easy patch
to fix the issue. The real fix will be done when the
instruction emulator is made aware of nested virtualization.
Until this is done this patch fixes the issue and provides
an easy way to fix this in -stable too.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: MMU: Fix 32 bit legacy paging with NPT
Joerg Roedel [Thu, 2 Sep 2010 15:29:45 +0000 (17:29 +0200)]
KVM: MMU: Fix 32 bit legacy paging with NPT

This patch fixes 32 bit legacy paging with NPT enabled. The
mmu_check_root call on the top-level of the loop causes
root_gfn to take values (in the tdp_enabled path) which are
outside of guest memory. So the mmu_check_root call fails at
some point in the loop interation causing the guest to
tiple-fault.
This patch changes the mmu_check_root calls to the places
where they are really necessary. As a side-effect it
introduces a check for the root of a pae page table too.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: PPC: Move of include to __KERNEL__ section
Alexander Graf [Fri, 3 Sep 2010 08:22:19 +0000 (10:22 +0200)]
KVM: PPC: Move of include to __KERNEL__ section

We have to protect the include for linux/of.h by __KERNEL__ so it doesn't
accidently get referenced outside.

This patch fixes this and makes the tree compile again.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Add documentation for magic page enhancements
Alexander Graf [Tue, 31 Aug 2010 02:25:39 +0000 (04:25 +0200)]
KVM: PPC: Add documentation for magic page enhancements

This documents how to detect additional features inside the magic
page when a guest maps it.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Fix compile error in e500_tlb.c
Alexander Graf [Tue, 31 Aug 2010 01:45:39 +0000 (03:45 +0200)]
KVM: PPC: Fix compile error in e500_tlb.c

The e500_tlb.c file didn't compile for me due to the following error:

arch/powerpc/kvm/e500_tlb.c: In function ‘kvmppc_e500_shadow_map’:
arch/powerpc/kvm/e500_tlb.c:300: error: format ‘%lx’ expects type ‘long unsigned int’, but argument 2 has type ‘gfn_t’

So let's explicitly cast the argument to make printk happy.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: e500_tlb: Fix a minor copy-paste tracing bug
Kyle Moffett [Mon, 30 Aug 2010 15:38:39 +0000 (11:38 -0400)]
KVM: PPC: e500_tlb: Fix a minor copy-paste tracing bug

The kvmppc_e500_stlbe_invalidate() function was trying to pass too many
parameters to trace_kvm_stlb_inval().  This appears to be a bad
copy-paste from a call to trace_kvm_stlb_write().

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Document KVM_INTERRUPT ioctl
Alexander Graf [Tue, 31 Aug 2010 00:03:32 +0000 (02:03 +0200)]
KVM: PPC: Document KVM_INTERRUPT ioctl

This adds some documentation for the KVM_INTERRUPT special cases that
PowerPC now implements.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Implement level interrupts for BookE
Alexander Graf [Mon, 30 Aug 2010 12:03:24 +0000 (14:03 +0200)]
KVM: PPC: Implement level interrupts for BookE

BookE also wants to support level based interrupts, so let's implement
all the necessary logic there. We need to trick a bit here because the
irqprios are 1:1 assigned to architecture defined values. But since there
is some space left there, we can just pick a random one and move it later
on - it's internal anyways.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Expose level based interrupt cap
Alexander Graf [Mon, 30 Aug 2010 11:50:45 +0000 (13:50 +0200)]
KVM: PPC: Expose level based interrupt cap

Now that we have all the level interrupt magic in place, let's
expose the capability to user space, so it can make use of it!

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Implement Level interrupts on Book3S
Alexander Graf [Mon, 30 Aug 2010 08:44:15 +0000 (10:44 +0200)]
KVM: PPC: Implement Level interrupts on Book3S

The current interrupt logic is just completely broken. We get a notification
from user space, telling us that an interrupt is there. But then user space
expects us that we just acknowledge an interrupt once we deliver it to the
guest.

This is not how real hardware works though. On real hardware, the interrupt
controller pulls the external interrupt line until it gets notified that the
interrupt was received.

So in reality we have two events: pulling and letting go of the interrupt line.

To maintain backwards compatibility, I added a new request for the pulling
part. The letting go part was implemented earlier already.

With this in place, we can now finally start guests that do not randomly stall
and stop to work at random times.

This patch implements above logic for Book3S.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Enable napping only for Book3s_64
Alexander Graf [Tue, 17 Aug 2010 20:08:39 +0000 (22:08 +0200)]
KVM: PPC: Enable napping only for Book3s_64

Before I incorrectly enabled napping also for BookE, which would result in
needless dcache flushes. Since we only need to force enable napping on
Book3s_64 because it doesn't go into MSR_POW otherwise, we can just #ifdef
that code to this particular platform.

Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: allow ppc440gp to pass the compatibility check
Hollis Blanchard [Sat, 7 Aug 2010 17:33:58 +0000 (10:33 -0700)]
KVM: PPC: allow ppc440gp to pass the compatibility check

Match only the first part of cur_cpu_spec->platform.

440GP (the first 440 processor) is identified by the string "ppc440gp", while
all later 440 processors use simply "ppc440".

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: fix compilation of "dump tlbs" debug function
Hollis Blanchard [Sat, 7 Aug 2010 17:33:57 +0000 (10:33 -0700)]
KVM: PPC: fix compilation of "dump tlbs" debug function

Missing local variable.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: initialize IVORs in addition to IVPR
Hollis Blanchard [Sat, 7 Aug 2010 17:33:56 +0000 (10:33 -0700)]
KVM: PPC: initialize IVORs in addition to IVPR

Developers can now tell at a glace the exact type of the premature interrupt,
instead of just knowing that there was some premature interrupt.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Don't put MSR_POW in MSR
Alexander Graf [Sun, 15 Aug 2010 06:39:19 +0000 (08:39 +0200)]
KVM: PPC: Don't put MSR_POW in MSR

On Book3S a mtmsr with the MSR_POW bit set indicates that the OS is in
idle and only needs to be waked up on the next interrupt.

Now, unfortunately we let that bit slip into the stored MSR value which
is not what the real CPU does, so that we ended up executing code like
this:

r = mfmsr();
/* r containts MSR_POW */
mtmsr(r | MSR_EE);

This obviously breaks, as we're going into idle mode in code sections that
don't expect to be idling.

This patch masks MSR_POW out of the stored MSR value on wakeup, making
guests happy again.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Implement correct SID mapping on Book3s_32
Alexander Graf [Sun, 15 Aug 2010 06:04:24 +0000 (08:04 +0200)]
KVM: PPC: Implement correct SID mapping on Book3s_32

Up until now we were doing segment mappings wrong on Book3s_32. For Book3s_64
we were using a trick where we know that a single mmu_context gives us 16 bits
of context ids.

The mm system on Book3s_32 instead uses a clever algorithm to distribute VSIDs
across the available range, so a context id really only gives us 16 available
VSIDs.

To keep at least a few guest processes in the SID shadow, let's map a number of
contexts that we can use as VSID pool. This makes the code be actually correct
and shouldn't hurt performance too much.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Force enable nap on KVM
Alexander Graf [Tue, 17 Aug 2010 09:41:44 +0000 (11:41 +0200)]
KVM: PPC: Force enable nap on KVM

There are some heuristics in the PPC power management code that try to find
out if the particular hardware we're running on supports proper power management
or just hangs the machine when going into nap mode.

Since we know that KVM is safe with nap, let's force enable it in the PV code
once we're certain that we are on a KVM VM.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Make PV mtmsrd L=1 work with r30 and r31
Alexander Graf [Thu, 5 Aug 2010 13:44:41 +0000 (15:44 +0200)]
KVM: PPC: Make PV mtmsrd L=1 work with r30 and r31

We had an arbitrary limitation in mtmsrd L=1 that kept us from using r30 and
r31 as input registers. Let's get rid of that and get more potential speedups!

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Update int_pending also on dequeue
Alexander Graf [Thu, 5 Aug 2010 10:24:40 +0000 (12:24 +0200)]
KVM: PPC: Update int_pending also on dequeue

When having a decrementor interrupt pending, the dequeuing happens manually
through an mtdec instruction. This instruction simply calls dequeue on that
interrupt, so the int_pending hint doesn't get updated.

This patch enables updating the int_pending hint also on dequeue, thus
correctly enabling guests to stay in guest contexts more often.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Make PV mtmsr work with r30 and r31
Alexander Graf [Thu, 5 Aug 2010 09:26:04 +0000 (11:26 +0200)]
KVM: PPC: Make PV mtmsr work with r30 and r31

So far we've been restricting ourselves to r0-r29 as registers an mtmsr
instruction could use. This was bad, as there are some code paths in
Linux actually using r30.

So let's instead handle all registers gracefully and get rid of that
stupid limitation

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Add mtsrin PV code
Alexander Graf [Tue, 3 Aug 2010 08:39:35 +0000 (10:39 +0200)]
KVM: PPC: Add mtsrin PV code

This is the guest side of the mtsr acceleration. Using this a guest can now
call mtsrin with almost no overhead as long as it ensures that it only uses
it with (MSR_IR|MSR_DR) == 0. Linux does that, so we're good.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Put segment registers in shared page
Alexander Graf [Tue, 3 Aug 2010 00:29:27 +0000 (02:29 +0200)]
KVM: PPC: Put segment registers in shared page

Now that the actual mtsr doesn't do anything anymore, we can move the sr
contents over to the shared page, so a guest can directly read and write
its sr contents from guest context.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Interpret SR registers on demand
Alexander Graf [Mon, 2 Aug 2010 23:06:11 +0000 (01:06 +0200)]
KVM: PPC: Interpret SR registers on demand

Right now we're examining the contents of Book3s_32's segment registers when
the register is written and put the interpreted contents into a struct.

There are two reasons this is bad. For starters, the struct has worse real-time
performance, as it occupies more ram. But the more important part is that with
segment registers being interpreted from their raw values, we can put them in
the shared page, allowing guests to mess with them directly.

This patch makes the internal representation of SRs be u32s.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Move BAT handling code into spr handler
Alexander Graf [Mon, 2 Aug 2010 21:23:04 +0000 (23:23 +0200)]
KVM: PPC: Move BAT handling code into spr handler

The current approach duplicates the spr->bat finding logic and makes it harder
to reuse the actually used variables. So let's move everything down to the spr
handler.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Add feature bitmap for magic page
Alexander Graf [Tue, 3 Aug 2010 09:32:56 +0000 (11:32 +0200)]
KVM: PPC: Add feature bitmap for magic page

We will soon add SR PV support to the shared page, so we need some
infrastructure that allows the guest to query for features KVM exports.

This patch adds a second return value to the magic mapping that
indicated to the guest which features are available.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Remove unused define
Alexander Graf [Mon, 2 Aug 2010 20:05:00 +0000 (22:05 +0200)]
KVM: PPC: Remove unused define

The define VSID_ALL is unused. Let's remove it.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Revert "KVM: PPC: Use kernel hash function"
Alexander Graf [Mon, 2 Aug 2010 19:48:53 +0000 (21:48 +0200)]
KVM: PPC: Revert "KVM: PPC: Use kernel hash function"

It turns out the in-kernel hash function is sub-optimal for our subtle
hash inputs where every bit is significant. So let's revert to the original
hash functions.

This reverts commit 05340ab4f9a6626f7a2e8f9fe5397c61d494f445.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Move slb debugging to tracepoints
Alexander Graf [Mon, 2 Aug 2010 19:25:33 +0000 (21:25 +0200)]
KVM: PPC: Move slb debugging to tracepoints

This patch moves debugging printks for shadow SLB debugging over to tracepoints.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Make invalidation code more reliable
Alexander Graf [Mon, 2 Aug 2010 19:24:48 +0000 (21:24 +0200)]
KVM: PPC: Make invalidation code more reliable

There is a race condition in the pte invalidation code path where we can't
be sure if a pte was invalidated already. So let's move the spin lock around
to get rid of the race.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Don't flush PTEs on NX/RO hit
Alexander Graf [Mon, 2 Aug 2010 18:11:39 +0000 (20:11 +0200)]
KVM: PPC: Don't flush PTEs on NX/RO hit

When hitting a no-execute or read-only data/inst storage interrupt we were
flushing the respective PTE so we're sure it gets properly overwritten next.

According to the spec, this is unnecessary though. The guest issues a tlbie
anyways, so we're safe to just keep the PTE around and have it manually removed
from the guest, saving us a flush.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Preload magic page when in kernel mode
Alexander Graf [Mon, 2 Aug 2010 14:08:22 +0000 (16:08 +0200)]
KVM: PPC: Preload magic page when in kernel mode

When the guest jumps into kernel mode and has the magic page mapped, theres a
very high chance that it will also use it. So let's detect that scenario and
map the segment accordingly.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Add tracepoints for generic spte flushes
Alexander Graf [Mon, 2 Aug 2010 11:40:30 +0000 (13:40 +0200)]
KVM: PPC: Add tracepoints for generic spte flushes

The different ways of flusing shadow ptes have their own debug prints which use
stupid old printk.

Let's move them to tracepoints, making them easier available, faster and
possible to activate on demand

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Fix sid map search after flush
Alexander Graf [Mon, 2 Aug 2010 11:38:18 +0000 (13:38 +0200)]
KVM: PPC: Fix sid map search after flush

After a flush the sid map contained lots of entries with 0 for their gvsid and
hvsid value. Unfortunately, 0 can be a real value the guest searches for when
looking up a vsid so it would incorrectly find the host's 0 hvsid mapping which
doesn't belong to our sid space.

So let's also check for the valid bit that indicated that the sid we're
looking at actually contains useful data.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Move pte invalidate debug code to tracepoint
Alexander Graf [Mon, 2 Aug 2010 10:55:19 +0000 (12:55 +0200)]
KVM: PPC: Move pte invalidate debug code to tracepoint

This patch moves the SPTE flush debug printk over to tracepoints.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Add tracepoint for generic mmu map
Alexander Graf [Mon, 2 Aug 2010 10:51:07 +0000 (12:51 +0200)]
KVM: PPC: Add tracepoint for generic mmu map

This patch moves the generic mmu map debugging over to tracepoints.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Move book3s_64 mmu map debug print to trace point
Alexander Graf [Mon, 2 Aug 2010 09:38:54 +0000 (11:38 +0200)]
KVM: PPC: Move book3s_64 mmu map debug print to trace point

This patch moves Book3s MMU debugging over to tracepoints.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: PPC: Move EXIT_DEBUG partially to tracepoints
Alexander Graf [Mon, 2 Aug 2010 09:06:26 +0000 (11:06 +0200)]
KVM: PPC: Move EXIT_DEBUG partially to tracepoints

We have a debug printk on every exit that is usually #ifdef'ed out. Using
tracepoints makes a lot more sense here though, as they can be dynamically
enabled.

This patch converts the most commonly used debug printks of EXIT_DEBUG to
tracepoints.

Signed-off-by: Alexander Graf <agraf@suse.de>
14 years agoKVM: ia64: define kvm_lapic_enabled() to fix a compile error
Takuya Yoshikawa [Thu, 2 Sep 2010 08:55:00 +0000 (17:55 +0900)]
KVM: ia64: define kvm_lapic_enabled() to fix a compile error

The following patch

  commit 57ce1659316f4ca298919649f9b1b55862ac3826
  KVM: x86: In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's

ignored the fact that kvm_irq_delivery_to_apic() was also used by ia64.

We define kvm_lapic_enabled() to fix a compile error caused by this.
This will have the same effect as reverting the problematic patch for ia64.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: lower the aduit frequency
Xiao Guangrong [Mon, 30 Aug 2010 10:26:33 +0000 (18:26 +0800)]
KVM: MMU: lower the aduit frequency

The audit is very high overhead, so we need lower the frequency to assure
the guest is running.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: improve spte audit
Xiao Guangrong [Mon, 30 Aug 2010 10:25:51 +0000 (18:25 +0800)]
KVM: MMU: improve spte audit

Both audit_mappings() and audit_sptes_have_rmaps() need to walk vcpu's page
table, so we can do these checking in a spte walking

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: improve active sp audit
Xiao Guangrong [Mon, 30 Aug 2010 10:25:03 +0000 (18:25 +0800)]
KVM: MMU: improve active sp audit

Both audit_rmap() and audit_write_protection() need to walk all active sp, so
we can do these checking in a sp walking

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: move audit to a separate file
Xiao Guangrong [Mon, 30 Aug 2010 10:24:10 +0000 (18:24 +0800)]
KVM: MMU: move audit to a separate file

Move the audit code from arch/x86/kvm/mmu.c to arch/x86/kvm/mmu_audit.c

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: support disable/enable mmu audit dynamicly
Xiao Guangrong [Mon, 30 Aug 2010 10:22:53 +0000 (18:22 +0800)]
KVM: MMU: support disable/enable mmu audit dynamicly

Add a r/w module parameter named 'mmu_audit', it can control audit
enable/disable:

enable:
  echo 1 > /sys/module/kvm/parameters/mmu_audit

disable:
  echo 0 > /sys/module/kvm/parameters/mmu_audit

This patch not change the logic

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: Fix guest kernel crash on MSR_K7_CLK_CTL
Jes Sorensen [Wed, 1 Sep 2010 09:42:04 +0000 (11:42 +0200)]
KVM: Fix guest kernel crash on MSR_K7_CLK_CTL

MSR_K7_CLK_CTL is a no longer documented MSR, which is only relevant
on said old AMD CPU models. This change returns the expected value,
which the Linux kernel is expecting to avoid writing back the MSR,
plus it ignores all writes to the MSR.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: i8259: Make ICW1 conform to spec
Avi Kivity [Mon, 30 Aug 2010 09:18:24 +0000 (12:18 +0300)]
KVM: i8259: Make ICW1 conform to spec

ICW is not a full reset, instead it resets a limited number of registers
in the PIC.  Change ICW1 emulation to only reset those registers.

Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: x86 emulator: clean up control flow in x86_emulate_insn()
Avi Kivity [Mon, 30 Aug 2010 14:12:28 +0000 (17:12 +0300)]
KVM: x86 emulator: clean up control flow in x86_emulate_insn()

x86_emulate_insn() is full of things like

    if (rc != X86EMUL_CONTINUE)
        goto done;
    break;

consolidate all of those at the end of the switch statement.

Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: x86 emulator: fix group 11 decoding for reg != 0
Avi Kivity [Tue, 3 Aug 2010 12:05:46 +0000 (15:05 +0300)]
KVM: x86 emulator: fix group 11 decoding for reg != 0

These are all undefined.

Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: x86 emulator: use single stage decoding for mov instructions
Avi Kivity [Tue, 3 Aug 2010 11:46:56 +0000 (14:46 +0300)]
KVM: x86 emulator: use single stage decoding for mov instructions

Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: Don't save/restore MSR_IA32_PERF_STATUS
Avi Kivity [Wed, 1 Sep 2010 07:23:35 +0000 (10:23 +0300)]
KVM: Don't save/restore MSR_IA32_PERF_STATUS

It is read/only; restoring it only results in annoying messages.

Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: SVM: init_vmcb should reset vcpu->efer
Marcelo Tosatti [Tue, 31 Aug 2010 22:13:14 +0000 (19:13 -0300)]
KVM: SVM: init_vmcb should reset vcpu->efer

Otherwise EFER_LMA bit is retained across a SIPI reset.

Fixes guest cpu onlining.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: SVM: reset mmu context in init_vmcb
Marcelo Tosatti [Tue, 31 Aug 2010 22:13:13 +0000 (19:13 -0300)]
KVM: SVM: reset mmu context in init_vmcb

Since commit aad827034e419fa no mmu reinitialization is performed
via init_vmcb.

Zero vcpu->arch.cr0 and pass the reset value as a parameter to
kvm_set_cr0.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: Fix pio trace direction
Avi Kivity [Mon, 30 Aug 2010 07:46:56 +0000 (10:46 +0300)]
KVM: Fix pio trace direction

out = write, in = read, not the other way round.

Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: remove count_rmaps()
Xiao Guangrong [Sat, 28 Aug 2010 11:25:09 +0000 (19:25 +0800)]
KVM: MMU: remove count_rmaps()

Nothing is checked in count_rmaps(), so remove it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: rewrite audit_mappings_page() function
Xiao Guangrong [Sat, 28 Aug 2010 11:24:13 +0000 (19:24 +0800)]
KVM: MMU: rewrite audit_mappings_page() function

There is a bugs in this function, we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in
atomic context(kvm_mmu_audit() is called under the spinlock(mmu_lock)'s protection).

This patch fix it by:
- introduce gfn_to_pfn_atomic instead of gfn_to_pfn
- get the mapping gfn from kvm_mmu_page_get_gfn()

And it adds 'notrap' ptes check in unsync/direct sps

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: fix wrong not write protected sp report
Xiao Guangrong [Sat, 28 Aug 2010 11:22:46 +0000 (19:22 +0800)]
KVM: MMU: fix wrong not write protected sp report

The audit code reports some sp not write protected in current code, it's just the
bug in audit_write_protection(), since:

- the invalid sp not need write protected
- using uninitialize local variable('gfn')
- call kvm_mmu_audit() out of mmu_lock's protection

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: check rmap for every spte
Xiao Guangrong [Sat, 28 Aug 2010 11:20:47 +0000 (19:20 +0800)]
KVM: MMU: check rmap for every spte

The read-only spte also has reverse mapping, so fix the code to check them,
also modify the function name to fit its doing

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: MMU: fix compile warning in audit code
Xiao Guangrong [Sat, 28 Aug 2010 11:19:42 +0000 (19:19 +0800)]
KVM: MMU: fix compile warning in audit code

fix:

arch/x86/kvm/mmu.c: In function ‘kvm_mmu_unprotect_page’:
arch/x86/kvm/mmu.c:1741: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c:1745: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_unshadow’:
arch/x86/kvm/mmu.c:1761: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘set_spte’:
arch/x86/kvm/mmu.c:2005: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_set_spte’:
arch/x86/kvm/mmu.c:2033: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 7 has type ‘gfn_t’

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: pit: Do not check pending pit timer in vcpu thread
Jason Wang [Fri, 27 Aug 2010 09:15:06 +0000 (17:15 +0800)]
KVM: pit: Do not check pending pit timer in vcpu thread

Pit interrupt injection was done by workqueue, so no need to check
pending pit timer in vcpu thread which could lead unnecessary
unblocking of vcpu.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: PPC: Fix CONFIG_KVM_GUEST && !CONFIG_KVM case
Alexander Graf [Mon, 30 Aug 2010 10:01:56 +0000 (12:01 +0200)]
KVM: PPC: Fix CONFIG_KVM_GUEST && !CONFIG_KVM case

When CONFIG_KVM_GUEST is selected, but CONFIG_KVM is not, we were missing
some defines in asm-offsets.c and included too many headers at other places.

This patch makes above configuration work.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: x86 emulator: simplify ALU opcode block decode further
Avi Kivity [Thu, 26 Aug 2010 15:34:55 +0000 (18:34 +0300)]
KVM: x86 emulator: simplify ALU opcode block decode further

The ALU opcode block is very regular; introduce D6ALU() to define decode
flags for 6 instructions at a time.

Suggested by Paolo Bonzini.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: Fix build error due to 64-bit division in nsec_to_cycles()
Avi Kivity [Thu, 26 Aug 2010 10:38:03 +0000 (13:38 +0300)]
KVM: Fix build error due to 64-bit division in nsec_to_cycles()

Use do_div() instead.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: trap and propagate #DE from DIV and IDIV
Avi Kivity [Thu, 26 Aug 2010 08:59:01 +0000 (11:59 +0300)]
KVM: x86 emulator: trap and propagate #DE from DIV and IDIV

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: add macros for executing instructions that may trap
Avi Kivity [Thu, 26 Aug 2010 08:59:00 +0000 (11:59 +0300)]
KVM: x86 emulator: add macros for executing instructions that may trap

Like DIV and IDIV.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: simplify instruction decode flags for opcodes 0F 00-FF
Avi Kivity [Thu, 26 Aug 2010 08:56:13 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes 0F 00-FF

Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: simplify instruction decode flags for opcodes E0-FF
Avi Kivity [Thu, 26 Aug 2010 08:56:12 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes E0-FF

Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: simplify instruction decode flags for opcodes C0-DF
Avi Kivity [Thu, 26 Aug 2010 08:56:11 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes C0-DF

Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: simplify instruction decode flags for opcodes A0-AF
Avi Kivity [Thu, 26 Aug 2010 08:56:10 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes A0-AF

Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: simplify instruction decode flags for opcodes 80-8F
Avi Kivity [Thu, 26 Aug 2010 08:56:09 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes 80-8F

Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: simplify string instruction decode flags
Avi Kivity [Thu, 26 Aug 2010 08:56:08 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify string instruction decode flags

Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: simplify ALU block (opcodes 00-3F) decode flags
Avi Kivity [Thu, 26 Aug 2010 08:56:07 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify ALU block (opcodes 00-3F) decode flags

Use the new byte/word dual opcode decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: support byte/word opcode pairs
Avi Kivity [Thu, 26 Aug 2010 08:56:06 +0000 (11:56 +0300)]
KVM: x86 emulator: support byte/word opcode pairs

Many x86 instructions come in byte and word variants distinguished with bit
0 of the opcode.  Add macros to aid in defining them.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: refuse SrcMemFAddr (e.g. LDS) with register operand
Avi Kivity [Thu, 26 Aug 2010 08:06:15 +0000 (11:06 +0300)]
KVM: x86 emulator: refuse SrcMemFAddr (e.g. LDS) with register operand

SrcMemFAddr is not defined with the modrm operand designating a register
instead of a memory address.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: get rid of "restart" in emulation context.
Gleb Natapov [Wed, 25 Aug 2010 09:47:43 +0000 (12:47 +0300)]
KVM: x86 emulator: get rid of "restart" in emulation context.

x86_emulate_insn() will return 1 if instruction can be restarted
without re-entering a guest.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: move string instruction completion check into separate function
Gleb Natapov [Wed, 25 Aug 2010 09:47:42 +0000 (12:47 +0300)]
KVM: x86 emulator: move string instruction completion check into separate function

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: Rename variable that shadows another local variable.
Gleb Natapov [Wed, 25 Aug 2010 09:47:41 +0000 (12:47 +0300)]
KVM: x86 emulator: Rename variable that shadows another local variable.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)
Wei Yongjun [Wed, 25 Aug 2010 06:10:53 +0000 (14:10 +0800)]
KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: S390: Export kvm_virtio.h
Alexander Graf [Tue, 24 Aug 2010 13:48:52 +0000 (15:48 +0200)]
KVM: S390: Export kvm_virtio.h

As suggested by Christian, we should expose headers to user space with
information that might be valuable there. The s390 virtio interface is
one of those cases. It defines an ABI between hypervisor and guest, so
it should be exposed to user space.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: S390: Add virtio hotplug add support
Alexander Graf [Tue, 24 Aug 2010 13:48:51 +0000 (15:48 +0200)]
KVM: S390: Add virtio hotplug add support

The one big missing feature in s390-virtio was hotplugging. This is no more.
This patch implements hotplug add support, so you can on the fly add new devices
in the guest.

Keep in mind that this needs a patch for qemu to actually leverage the
functionality.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: S390: take a full byte as ext_param indicator
Alexander Graf [Tue, 24 Aug 2010 13:48:50 +0000 (15:48 +0200)]
KVM: S390: take a full byte as ext_param indicator

Currenty the ext_param field only distinguishes between "config change" and
"vring interrupt". We can do a lot more with it though, so let's enable a
full byte of possible values and constants to #defines while at it.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: MMU: combine guest pte read between fetch and pte prefetch
Xiao Guangrong [Sun, 22 Aug 2010 11:13:33 +0000 (19:13 +0800)]
KVM: MMU: combine guest pte read between fetch and pte prefetch

Combine guest pte read between guest pte check in the fetch path and pte prefetch

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: MMU: prefetch ptes when intercepted guest #PF
Xiao Guangrong [Sun, 22 Aug 2010 11:12:48 +0000 (19:12 +0800)]
KVM: MMU: prefetch ptes when intercepted guest #PF

Support prefetch ptes when intercept guest #PF, avoid to #PF by later
access

If we meet any failure in the prefetch path, we will exit it and
not try other ptes to avoid become heavy path

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: MMU: introduce gfn_to_page_many_atomic() function
Xiao Guangrong [Sun, 22 Aug 2010 11:11:43 +0000 (19:11 +0800)]
KVM: MMU: introduce gfn_to_page_many_atomic() function

Introduce this function to get consecutive gfn's pages, it can reduce
gup's overload, used by later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: MMU: introduce hva_to_pfn_atomic function
Xiao Guangrong [Sun, 22 Aug 2010 11:10:28 +0000 (19:10 +0800)]
KVM: MMU: introduce hva_to_pfn_atomic function

Introduce hva_to_pfn_atomic(), it's the fast path and can used in atomic
context, the later patch will use it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoexport __get_user_pages_fast() function
Xiao Guangrong [Sun, 22 Aug 2010 11:08:57 +0000 (19:08 +0800)]
export __get_user_pages_fast() function

This function is used by KVM to pin process's page in the atomic context.

Define the 'weak' function to avoid other architecture not support it

Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86: Add timekeeping documentation
Zachary Amsden [Fri, 20 Aug 2010 08:07:33 +0000 (22:07 -1000)]
KVM: x86: Add timekeeping documentation

Basic informational document about x86 timekeeping and how KVM
is affected.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86: Fix a possible backwards warp of kvmclock
Zachary Amsden [Fri, 20 Aug 2010 08:07:30 +0000 (22:07 -1000)]
KVM: x86: Fix a possible backwards warp of kvmclock

Kernel time, which advances in discrete steps may progress much slower
than TSC.  As a result, when kvmclock is adjusted to a new base, the
apparent time to the guest, which runs at a much higher, nsec scaled
rate based on the current TSC, may have already been observed to have
a larger value (kernel_ns + scaled tsc) than the value to which we are
setting it (kernel_ns + 0).

We must instead compute the clock as potentially observed by the guest
for kernel_ns to make sure it does not go backwards.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agox86: pvclock: Move scale_delta into common header
Zachary Amsden [Fri, 20 Aug 2010 08:07:29 +0000 (22:07 -1000)]
x86: pvclock: Move scale_delta into common header

The scale_delta function for shift / multiply with 31-bit
precision moves to a common header so it can be used by both
kernel and kvm module.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86: Add clock sync request to hardware enable
Zachary Amsden [Fri, 20 Aug 2010 08:07:28 +0000 (22:07 -1000)]
KVM: x86: Add clock sync request to hardware enable

If there are active VCPUs which are marked as belonging to
a particular hardware CPU, request a clock sync for them when
enabling hardware; the TSC could be desynchronized on a newly
arriving CPU, and we need to recompute guests system time
relative to boot after a suspend event.

This covers both cases.

Note that it is acceptable to take the spinlock, as either
no other tasks will be running and no locks held (BSP after
resume), or other tasks will be guaranteed to drop the lock
relatively quickly (AP on CPU_STARTING).

Noting we now get clock synchronization requests for VCPUs
which are starting up (or restarting), it is tempting to
attempt to remove the arch/x86/kvm/x86.c CPU hot-notifiers
at this time, however it is not correct to do so; they are
required for systems with non-constant TSC as the frequency
may not be known immediately after the processor has started
until the cpufreq driver has had a chance to run and query
the chipset.

Updated: implement better locking semantics for hardware_enable

Removed the hack of dropping and retaking the lock by adding the
semantic that we always hold kvm_lock when hardware_enable is
called.  The one place that doesn't need to worry about it is
resume, as resuming a frozen CPU, the spinlock won't be taken.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86: Robust TSC compensation
Zachary Amsden [Fri, 20 Aug 2010 08:07:26 +0000 (22:07 -1000)]
KVM: x86: Robust TSC compensation

Make the match of TSC find TSC writes that are close to each other
instead of perfectly identical; this allows the compensator to also
work in migration / suspend scenarios.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>