openwrt/staging/blogic.git
17 years agoKVM: Use CPU_DYING for disabling virtualization
Avi Kivity [Thu, 24 May 2007 10:11:41 +0000 (13:11 +0300)]
KVM: Use CPU_DYING for disabling virtualization

Only at the CPU_DYING stage can we be sure that no user process will
be scheduled onto the cpu and oops when trying to use virtualization
extensions.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Tune hotplug/suspend IPIs
Avi Kivity [Thu, 24 May 2007 10:09:41 +0000 (13:09 +0300)]
KVM: Tune hotplug/suspend IPIs

The hotplug IPIs can be called from the cpu on which we are currently
running, so use on_cpu().  Similarly, drop on_each_cpu() for the
suspend/resume callbacks, as we're in atomic context here and only one
cpu is up anyway.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Keep track of which cpus have virtualization enabled
Avi Kivity [Thu, 24 May 2007 10:03:52 +0000 (13:03 +0300)]
KVM: Keep track of which cpus have virtualization enabled

By keeping track of which cpus have virtualization enabled, we
prevent double-enable or double-disable during hotplug, either of which
causes a fatal oops.

Signed-off-by: Avi Kivity <avi@qumranet.com>
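The bookkeeping described in the commit above can be illustrated with a small
userspace model.  This is only a sketch under stated assumptions: the names
cpus_hardware_enabled, cpu_enable_virt() and cpu_disable_virt() are
hypothetical, a 64-bit mask stands in for a cpumask, and counters stand in for
the real hardware enable/disable hooks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit N is set while cpu N has virtualization enabled. */
static uint64_t cpus_hardware_enabled;

/* Counters standing in for the real hardware enable/disable hooks. */
static int hw_enable_calls;
static int hw_disable_calls;

/* Returns true if the hardware hook ran, false if the call was a no-op. */
static bool cpu_enable_virt(unsigned int cpu)
{
    uint64_t bit;

    assert(cpu < 64);
    bit = UINT64_C(1) << cpu;
    if (cpus_hardware_enabled & bit)
        return false;    /* already on: skip the second enable */
    cpus_hardware_enabled |= bit;
    hw_enable_calls++;
    return true;
}

static bool cpu_disable_virt(unsigned int cpu)
{
    uint64_t bit;

    assert(cpu < 64);
    bit = UINT64_C(1) << cpu;
    if (!(cpus_hardware_enabled & bit))
        return false;    /* already off: nothing to disable */
    cpus_hardware_enabled &= ~bit;
    hw_disable_calls++;
    return true;
}
```

Calling either function twice in a row for the same cpu reaches the hardware
hook only once, which is the property the commit relies on.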
17 years agoSMP: Allow smp_call_function_single() to current cpu
Avi Kivity [Mon, 9 Jul 2007 14:11:49 +0000 (17:11 +0300)]
SMP: Allow smp_call_function_single() to current cpu

This removes the requirement for callers to get_cpu() to check in simple
cases.  This patch is for !CONFIG_SMP.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoi386: Allow smp_call_function_single() to current cpu
Avi Kivity [Mon, 9 Jul 2007 14:11:49 +0000 (17:11 +0300)]
i386: Allow smp_call_function_single() to current cpu

This removes the requirement for callers to get_cpu() to check in simple
cases.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agox86_64: Allow smp_call_function_single() to current cpu
Avi Kivity [Mon, 9 Jul 2007 14:11:49 +0000 (17:11 +0300)]
x86_64: Allow smp_call_function_single() to current cpu

This removes the requirement for callers to get_cpu() to check in simple
cases.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
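The three smp_call_function_single() commits above share one idea: when the
target is the current cpu, run the function directly instead of requiring the
caller to check.  A minimal userspace sketch of that dispatch follows.  All
names here (current_cpu, send_ipi_and_wait(), call_function_single()) are
hypothetical stand-ins, not the kernel's implementation, and the model ignores
interrupt state and locking.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*call_fn)(void *info);

static int current_cpu;    /* stand-in for the id of the running cpu */
static int ipis_sent;      /* counts simulated cross-cpu calls */

/* Model of a cross-cpu call: count it, then run the function. */
static void send_ipi_and_wait(int cpu, call_fn fn, void *info)
{
    (void)cpu;
    ipis_sent++;
    fn(info);
}

/* Run fn on the given cpu; a call to the current cpu needs no IPI. */
static int call_function_single(int cpu, call_fn fn, void *info)
{
    assert(fn != NULL);
    if (cpu == current_cpu)
        fn(info);
    else
        send_ipi_and_wait(cpu, fn, info);
    return 0;
}

/* Example callback: increment the int that info points to. */
static void bump(void *info)
{
    ++*(int *)info;
}
```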
17 years agoHOTPLUG: Adapt thermal throttle to CPU_DYING
Avi Kivity [Thu, 24 May 2007 09:37:34 +0000 (12:37 +0300)]
HOTPLUG: Adapt thermal throttle to CPU_DYING

CPU_DYING is notified in atomic context, so no taking mutexes here.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoHOTPLUG: Adapt cpuset hotplug callback to CPU_DYING
Avi Kivity [Thu, 24 May 2007 09:33:15 +0000 (12:33 +0300)]
HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING

CPU_DYING is called in atomic context, so don't try to take any locks.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoHOTPLUG: Add CPU_DYING notifier
Avi Kivity [Thu, 24 May 2007 09:23:10 +0000 (12:23 +0300)]
HOTPLUG: Add CPU_DYING notifier

KVM wants a notification when a cpu is about to die, so it can disable
hardware extensions, but at a time when user processes cannot be scheduled
on the cpu, so it doesn't try to use virtualization extensions after they
have been disabled.

This adds a CPU_DYING notification.  The notification is called in atomic
context on the doomed cpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Clean up #includes
Avi Kivity [Thu, 28 Jun 2007 18:15:57 +0000 (14:15 -0400)]
KVM: Clean up #includes

Remove unnecessary ones, and rearange the remaining in the standard order.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Remove kvmfs in favor of the anonymous inodes source
Avi Kivity [Thu, 28 Jun 2007 12:38:16 +0000 (08:38 -0400)]
KVM: Remove kvmfs in favor of the anonymous inodes source

kvm uses a pseudo filesystem, kvmfs, to generate inodes, a job that the
new anonymous inodes source does much better.

Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: SVM: Reliably detect if SVM was disabled by BIOS
Joerg Roedel [Fri, 22 Jun 2007 09:29:50 +0000 (12:29 +0300)]
KVM: SVM: Reliably detect if SVM was disabled by BIOS

This patch adds an implementation to the svm is_disabled function to
detect reliably if the BIOS disabled the SVM feature in the CPU. This
fixes the issues with kernel panics when loading the kvm-amd module on
machines where SVM is available but disabled.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Remove unnecessary code in vmx_tlb_flush()
Avi Kivity [Thu, 21 Jun 2007 08:54:45 +0000 (11:54 +0300)]
KVM: VMX: Remove unnecessary code in vmx_tlb_flush()

A vmexit implicitly flushes the tlb; the code is bogus.

Noted by Shaohua Li.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Fix Wrong tlb flush order
Shaohua Li [Wed, 20 Jun 2007 09:13:26 +0000 (17:13 +0800)]
KVM: MMU: Fix Wrong tlb flush order

Need to flush the tlb after updating a pte, not before.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Reinitialize the real-mode tss when entering real mode
Avi Kivity [Wed, 20 Jun 2007 08:20:04 +0000 (11:20 +0300)]
KVM: VMX: Reinitialize the real-mode tss when entering real mode

Protected mode code may have corrupted the real-mode tss, so re-initialize
it when switching to real mode.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Avoid useless memory write when possible
Luca Tettamanti [Tue, 19 Jun 2007 20:41:38 +0000 (22:41 +0200)]
KVM: Avoid useless memory write when possible

When writing to normal memory and the memory area is unchanged the write
can be safely skipped, avoiding the costly kvm_mmu_pte_write.

Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
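The skip described in the commit above amounts to comparing before writing.
A minimal sketch, assuming a flat byte array for normal guest memory and a
counter in place of the costly kvm_mmu_pte_write path; guest_write() and
pte_write_notifications are hypothetical names.  It applies to normal memory
only, since MMIO writes must always be performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Counts how often the expensive notification path would have run. */
static int pte_write_notifications;

/* Returns true if memory was modified, false if the write was skipped. */
static bool guest_write(unsigned char *guest_mem, size_t offset,
                        const void *val, size_t len)
{
    assert(guest_mem != NULL && val != NULL);
    if (memcmp(guest_mem + offset, val, len) == 0)
        return false;            /* unchanged: skip the costly path */
    pte_write_notifications++;   /* stands in for kvm_mmu_pte_write */
    memcpy(guest_mem + offset, val, len);
    return true;
}
```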
17 years agoKVM: Fix x86 emulator writeback
Luca Tettamanti [Tue, 19 Jun 2007 20:41:20 +0000 (22:41 +0200)]
KVM: Fix x86 emulator writeback

When the old value and the new one are the same, the emulator skips the
write; this is undesirable when the destination is an MMIO area, where the
write must be performed regardless of the previous value. This
optimization breaks e.g. a Linux guest APIC compiled without
X86_GOOD_APIC.

Remove the check and perform the writeback stage in the emulation unless
it's explicitly disabled (currently push and some two-byte instructions
may disable the writeback).

Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Add support for in-kernel pio handlers
Eddie Dong [Tue, 19 Jun 2007 15:05:03 +0000 (18:05 +0300)]
KVM: Add support for in-kernel pio handlers

Useful for the PIC and PIT.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Fix interrupt checking on lightweight exit
Gregory Haskins [Thu, 31 May 2007 18:08:58 +0000 (14:08 -0400)]
KVM: VMX: Fix interrupt checking on lightweight exit

With kernel-injected interrupts, we need to check for interrupts on
lightweight exits too.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Adds support for in-kernel mmio handlers
Gregory Haskins [Thu, 31 May 2007 18:08:53 +0000 (14:08 -0400)]
KVM: Adds support for in-kernel mmio handlers

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Implement emulation of instruction "ret" (opcode 0xc3)
Nitin A Kamble [Tue, 19 Jun 2007 08:21:15 +0000 (11:21 +0300)]
KVM: Implement emulation of instruction "ret" (opcode 0xc3)

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Implement emulation of "pop reg" instruction (opcode 0x58-0x5f)
Nitin A Kamble [Tue, 19 Jun 2007 08:16:04 +0000 (11:16 +0300)]
KVM: Implement emulation of "pop reg" instruction (opcode 0x58-0x5f)

For use in real mode.

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Ensure vcpu time stamp counter is monotonous
Avi Kivity [Wed, 13 Jun 2007 16:55:28 +0000 (19:55 +0300)]
KVM: VMX: Ensure vcpu time stamp counter is monotonous

If the time stamp counter goes backwards, a guest delay loop can become
infinite.  This can happen if a vcpu is migrated to another cpu, where
the counter has a lower value than the first cpu.

Since we're doing an IPI to the first cpu anyway, we can use that to pick
up the old tsc, and use that to calculate the adjustment we need to make
to the tsc offset.

Signed-off-by: Avi Kivity <avi@qumranet.com>
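The adjustment described in the commit above can be worked through with plain
arithmetic.  The sketch below assumes the guest counter is the host counter
plus a per-vcpu offset; struct vcpu_tsc, guest_tsc() and vcpu_migrate() are
hypothetical names, and unsigned wraparound stands in for the hardware's
modular arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vcpu_tsc {
    uint64_t offset;    /* added to the host tsc to form the guest tsc */
};

static uint64_t guest_tsc(const struct vcpu_tsc *v, uint64_t host_tsc)
{
    assert(v != NULL);
    return host_tsc + v->offset;
}

/*
 * old_host_tsc is the value sampled on the cpu the vcpu is leaving,
 * new_host_tsc the value on the cpu it is moving to.
 */
static void vcpu_migrate(struct vcpu_tsc *v, uint64_t old_host_tsc,
                         uint64_t new_host_tsc)
{
    assert(v != NULL);
    /* Only compensate when the new cpu's counter is behind. */
    if (new_host_tsc < old_host_tsc)
        v->offset += old_host_tsc - new_host_tsc;
}
```

With an offset of 100, moving from a cpu at 5000 to a cpu at 3000 raises the
offset by 2000, so the guest still reads 5100 rather than dropping to 3100.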
17 years agoKVM: Initialize the BSP bit in the APIC_BASE msr correctly
Avi Kivity [Wed, 13 Jun 2007 16:43:19 +0000 (19:43 +0300)]
KVM: Initialize the BSP bit in the APIC_BASE msr correctly

Needs to be set on vcpu 0 only.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Replace memset(<addr>, 0, PAGESIZE) with clear_page(<addr>)
Shani Moideen [Mon, 11 Jun 2007 04:01:33 +0000 (09:31 +0530)]
KVM: VMX: Replace memset(<addr>, 0, PAGESIZE) with clear_page(<addr>)

Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: SVM: Replace memset(<addr>, 0, PAGESIZE) with clear_page(<addr>)
Shani Moideen [Mon, 11 Jun 2007 03:58:26 +0000 (09:28 +0530)]
KVM: SVM: Replace memset(<addr>, 0, PAGESIZE) with clear_page(<addr>)

Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Flush remote tlbs when reducing shadow pte permissions
Avi Kivity [Thu, 7 Jun 2007 16:18:30 +0000 (19:18 +0300)]
KVM: Flush remote tlbs when reducing shadow pte permissions

When a vcpu causes a shadow tlb entry to have reduced permissions, it
must also clear the tlb on remote vcpus.  We do that by:

- setting a bit on the vcpu that requests a tlb flush before the next entry
- if the vcpu is currently executing, we send an ipi to make sure it
  exits before we continue

Signed-off-by: Avi Kivity <avi@qumranet.com>
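The two steps listed in the commit above can be modeled in userspace as a
request flag plus a forced exit.  This is a sketch only: struct vcpu and its
fields, flush_remote_tlbs() and vcpu_enter_guest() are hypothetical, and a
counter replaces the real IPI.

```c
#include <assert.h>
#include <stdbool.h>

struct vcpu {
    bool tlb_flush_requested;  /* flush before the next guest entry */
    bool in_guest;             /* currently executing guest code */
    int ipis_received;         /* simulated IPIs that forced an exit */
    int tlb_flushes;           /* flushes actually performed */
};

/* Ask every vcpu except `self` to flush before it next enters the guest. */
static void flush_remote_tlbs(struct vcpu *vcpus, int n, int self)
{
    int i;

    assert(self >= 0 && self < n);
    for (i = 0; i < n; i++) {
        if (i == self)
            continue;
        vcpus[i].tlb_flush_requested = true;
        if (vcpus[i].in_guest) {
            /* The IPI makes a running vcpu exit before we continue. */
            vcpus[i].ipis_received++;
            vcpus[i].in_guest = false;
        }
    }
}

/* Guest entry honors a pending flush request exactly once. */
static void vcpu_enter_guest(struct vcpu *v)
{
    if (v->tlb_flush_requested) {
        v->tlb_flush_requested = false;
        v->tlb_flushes++;
    }
    v->in_guest = true;
}
```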
17 years agoKVM: Keep an upper bound of initialized vcpus
Avi Kivity [Thu, 7 Jun 2007 16:11:53 +0000 (19:11 +0300)]
KVM: Keep an upper bound of initialized vcpus

That way, we don't need to loop for KVM_MAX_VCPUS for a single vcpu
vm.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Emulate hlt on real mode for Intel
Avi Kivity [Tue, 5 Jun 2007 13:15:51 +0000 (16:15 +0300)]
KVM: Emulate hlt on real mode for Intel

This has two use cases: the bios can't boot from disk, and guest smp
bootstrap.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Move duplicate halt handling code into kvm_main.c
Avi Kivity [Tue, 5 Jun 2007 12:53:05 +0000 (15:53 +0300)]
KVM: Move duplicate halt handling code into kvm_main.c

Will soon have a third user.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Enable guest smp
Avi Kivity [Tue, 5 Jun 2007 11:37:09 +0000 (14:37 +0300)]
KVM: Enable guest smp

As we don't support guest tlb shootdown yet, this is only reliable
for real-mode guests.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Fix adding an smp virtual machine to the vm list
Avi Kivity [Tue, 5 Jun 2007 11:36:10 +0000 (14:36 +0300)]
KVM: Fix adding an smp virtual machine to the vm list

If we add the vm once per vcpu, we corrupt the list if the guest has
multiple vcpus.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Fix vcpu freeing for guest smp
Avi Kivity [Tue, 5 Jun 2007 09:17:03 +0000 (12:17 +0300)]
KVM: Fix vcpu freeing for guest smp

A vcpu can pin up to four mmu shadow pages, which means the freeing
loop will never terminate.  Fix by first unpinning shadow pages on
all vcpus, then freeing shadow pages.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Remove unnecessary initialization and checks in mark_page_dirty()
Nguyen Anh Quynh [Tue, 5 Jun 2007 07:35:19 +0000 (10:35 +0300)]
KVM: Remove unnecessary initialization and checks in mark_page_dirty()

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Replace C code with call to ARRAY_SIZE() macro.
Robert P. J. Day [Sun, 3 Jun 2007 17:35:29 +0000 (13:35 -0400)]
KVM: Replace C code with call to ARRAY_SIZE() macro.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Lazy guest cr3 switching
Avi Kivity [Mon, 4 Jun 2007 12:58:30 +0000 (15:58 +0300)]
KVM: Lazy guest cr3 switching

Switching the guest paging context may require us to allocate memory, which
might fail.  Instead of wiring up error paths everywhere, make context
switching lazy and actually do the switch before the next guest entry,
where we can return an error if allocation fails.

Signed-off-by: Avi Kivity <avi@qumranet.com>
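The deferral described in the commit above can be sketched as follows: the
cr3 write only records the new value, and the allocation that might fail
happens on the guest entry path, which already returns an error code.  The
names set_cr3(), enter_guest(), struct vcpu and the fail_alloc test hook are
hypothetical, and malloc() stands in for allocating a shadow root.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct vcpu {
    unsigned long cr3;   /* value the guest asked for */
    void *root;          /* stands in for the shadow paging root */
    bool need_reload;    /* context switch still pending */
};

static bool fail_alloc;  /* test hook: force the allocation to fail */

/* Cannot fail: only records that a switch is pending. */
static void set_cr3(struct vcpu *v, unsigned long cr3)
{
    assert(v != NULL);
    v->cr3 = cr3;
    v->need_reload = true;
}

/* Performs the deferred switch; this path can report -ENOMEM. */
static int enter_guest(struct vcpu *v)
{
    assert(v != NULL);
    if (v->need_reload) {
        void *root = fail_alloc ? NULL : malloc(64);

        if (!root)
            return -ENOMEM;
        free(v->root);
        v->root = root;
        v->need_reload = false;
    }
    return 0;
}
```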
17 years agoKVM: MMU: Remove unused large page marker
Avi Kivity [Thu, 31 May 2007 15:28:51 +0000 (18:28 +0300)]
KVM: MMU: Remove unused large page marker

This has not been used for some time, as the same information is available
in the page header.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Don't cache guest access bits in the shadow page table
Avi Kivity [Thu, 31 May 2007 15:24:09 +0000 (18:24 +0300)]
KVM: MMU: Don't cache guest access bits in the shadow page table

This was once used to avoid accessing the guest pte when upgrading
the shadow pte from read-only to read-write.  But usually we need
to set the guest pte dirty or accessed bits anyway, so this wasn't
really exploited.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Simpify accessed/dirty/present/nx bit handling
Avi Kivity [Thu, 31 May 2007 15:20:14 +0000 (18:20 +0300)]
KVM: MMU: Simpify accessed/dirty/present/nx bit handling

Always set the accessed and dirty bit (since having them cleared causes
a read-modify-write cycle), always set the present bit, and copy the
nx bit from the guest.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Remove cr0.wp tricks
Avi Kivity [Thu, 31 May 2007 14:17:06 +0000 (17:17 +0300)]
KVM: MMU: Remove cr0.wp tricks

No longer needed as we do everything in one place.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Make setting shadow ptes atomic on i386
Avi Kivity [Thu, 31 May 2007 12:46:04 +0000 (15:46 +0300)]
KVM: MMU: Make setting shadow ptes atomic on i386

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Make shadow pte updates atomic
Avi Kivity [Thu, 31 May 2007 12:23:35 +0000 (15:23 +0300)]
KVM: Make shadow pte updates atomic

With guest smp, a second vcpu might see partial updates when the first
vcpu services a page fault.  So delay all updates until we have figured
out what the pte should look like.

Note that on i386, this is still not completely atomic as a 64-bit write
will be split into two on a 32-bit machine.

Signed-off-by: Avi Kivity <avi@qumranet.com>
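The idea in the commit above, building the complete pte first and publishing
it in one step, can be sketched with C11 atomics.  This is an illustration
under assumptions, not the kernel code: set_shadow_pte() is a hypothetical
name, only the present and writable bits are modeled, and a 64-bit atomic
store may still be split by the hardware on a 32-bit machine, as the commit
notes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT   (UINT64_C(1) << 0)
#define PTE_WRITABLE  (UINT64_C(1) << 1)
#define PTE_ADDR_MASK (~UINT64_C(0xfff))

/*
 * Work out the final value in a local variable, then publish it with a
 * single store so another vcpu never observes a half-built entry.
 */
static void set_shadow_pte(_Atomic uint64_t *sptep, uint64_t paddr,
                           int writable)
{
    uint64_t spte;

    assert(sptep != NULL);
    spte = (paddr & PTE_ADDR_MASK) | PTE_PRESENT;
    if (writable)
        spte |= PTE_WRITABLE;
    atomic_store(sptep, spte);
}
```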
17 years agoKVM: Move shadow pte modifications from set_pte/set_pde to set_pde_common()
Avi Kivity [Thu, 31 May 2007 12:14:09 +0000 (15:14 +0300)]
KVM: Move shadow pte modifications from set_pte/set_pde to set_pde_common()

We want all shadow pte modifications in one place.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Fold fix_write_pf() into set_pte_common()
Avi Kivity [Thu, 31 May 2007 12:08:29 +0000 (15:08 +0300)]
KVM: MMU: Fold fix_write_pf() into set_pte_common()

This prevents some work from being performed twice, and, more importantly,
reduces the number of places where we modify shadow ptes.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Fold fix_read_pf() into set_pte_common()
Avi Kivity [Thu, 31 May 2007 08:56:54 +0000 (11:56 +0300)]
KVM: MMU: Fold fix_read_pf() into set_pte_common()

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Pass the guest pde to set_pte_common
Avi Kivity [Thu, 31 May 2007 08:45:18 +0000 (11:45 +0300)]
KVM: MMU: Pass the guest pde to set_pte_common

We will need the accessed bit (in addition to the dirty bit) and
also write access (for setting the dirty bit) in a future patch.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Move set_pte_common() to pte width dependent code
Avi Kivity [Wed, 30 May 2007 16:31:17 +0000 (19:31 +0300)]
KVM: MMU: Move set_pte_common() to pte width dependent code

In preparation of some modifications.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Simplify fetch() a little bit
Avi Kivity [Wed, 30 May 2007 11:21:51 +0000 (14:21 +0300)]
KVM: MMU: Simplify fetch() a little bit

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Use slab caches for shadow pages and their headers
Avi Kivity [Wed, 30 May 2007 09:34:53 +0000 (12:34 +0300)]
KVM: MMU: Use slab caches for shadow pages and their headers

Use slab caches instead of a simple custom list.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Use symbolic constants instead of magic numbers
Eddie Dong [Tue, 29 May 2007 12:07:21 +0000 (15:07 +0300)]
KVM: Use symbolic constants instead of magic numbers

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Fix includes
Markus Rechberger [Sun, 27 May 2007 07:46:52 +0000 (10:46 +0300)]
KVM: Fix includes

KVM compilation fails for some .configs.  This fixes it.

Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: x86 emulator: implement wbinvd
Avi Kivity [Thu, 24 May 2007 08:17:33 +0000 (11:17 +0300)]
KVM: x86 emulator: implement wbinvd

Vista seems to trigger it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoUse menuconfig objects II - KVM/Virt
Jan Engelhardt [Wed, 23 May 2007 21:22:11 +0000 (14:22 -0700)]
Use menuconfig objects II - KVM/Virt

Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
so that the user can disable all the options in that menu at once
instead of having to disable each option separately.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit
Eddie Dong [Mon, 21 May 2007 04:28:09 +0000 (07:28 +0300)]
KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit

MSR_EFER.LME/LMA bits are automatically saved and restored by VMX
hardware, so KVM only needs to save the NX/SCE bits at the time of a
heavyweight VM exit. But clearing the NX bit in the host environment may
cause a system hang if the host page table is using EXB bits,
thus we leave the NX bit as it is. If host NX=1 and guest NX=0, we
can check the guest page table EXB bits before inserting a shadow
pte (though no guest is expecting to see this kind of gp fault).
If host NX=0, we do not present the Execute-Disable feature to the guest,
thus there is no host NX=0, guest NX=1 combination.

This patch reduces raw vmexit time by ~27%.

Me: fix compile warnings on i386.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Cleanup redundant code in MSR set
Eddie Dong [Sun, 20 May 2007 07:50:08 +0000 (10:50 +0300)]
KVM: VMX: Cleanup redundant code in MSR set

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Avoid saving and restoring msrs on lightweight vmexit
Eddie Dong [Thu, 17 May 2007 15:55:15 +0000 (18:55 +0300)]
KVM: VMX: Avoid saving and restoring msrs on lightweight vmexit

In a lightweight exit (where we exit and reenter the guest without
scheduling or exiting to userspace in between), we don't need various
msrs on the host, and avoiding shuffling them around reduces raw exit
time by 8%.

i386 compile fix by Daniel Hecken <dh@bahntechnik.de>.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Handle #SS faults from real mode
Nitin A Kamble [Thu, 17 May 2007 12:50:34 +0000 (15:50 +0300)]
KVM: VMX: Handle #SS faults from real mode

Instructions with the address size override prefix (opcode 0x67)
cause a #SS fault with error code 0 in VM86 mode.  Forward
them to the emulator.

Signed-Off-By: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Use local labels in inline assembly
Avi Kivity [Mon, 14 May 2007 17:41:13 +0000 (20:41 +0300)]
KVM: VMX: Use local labels in inline assembly

This makes oprofile dumps and disassembly easier to read.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Fix vmx I/O bitmap initialization on highmem systems
Avi Kivity [Tue, 8 May 2007 08:34:07 +0000 (11:34 +0300)]
KVM: Fix vmx I/O bitmap initialization on highmem systems

kunmap() expects a struct page, not a virtual address.  Fixes an oops loading
kvm-intel.ko on i386 with CONFIG_HIGHMEM.

Thanks to Michael Ivanov <deruhu@peterstar.ru> for reporting.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Avoid corrupting tr in real mode
Avi Kivity [Mon, 7 May 2007 07:55:37 +0000 (10:55 +0300)]
KVM: Avoid corrupting tr in real mode

The real mode tr needs to be set to a specific tss so that I/O
instructions can function.  Divert the new tr values to the real
mode save area from where they will be restored on transition to
protected mode.

This fixes some crashes on reboot when the bios executes an I/O
instruction.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Only reload guest msrs if they are already loaded
Avi Kivity [Sun, 6 May 2007 13:10:01 +0000 (16:10 +0300)]
KVM: VMX: Only reload guest msrs if they are already loaded

If we set an msr via an ioctl() instead of by handling a guest exit, we
have the host state loaded, so reloading the msrs would clobber host
state instead of guest state.

This fixes a host oops (and loss of a cpu) on a guest reboot.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Store shadow page tables as kernel virtual addresses, not physical
Avi Kivity [Sun, 6 May 2007 12:50:58 +0000 (15:50 +0300)]
KVM: MMU: Store shadow page tables as kernel virtual addresses, not physical

Simplifies things a bit.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: MMU: Simplify kvm_mmu_free_page() a tiny bit
Avi Kivity [Sun, 6 May 2007 12:36:30 +0000 (15:36 +0300)]
KVM: MMU: Simplify kvm_mmu_free_page() a tiny bit

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Implement IA32_EBL_CR_POWERON msr
Matthew Gregan [Sun, 6 May 2007 07:59:46 +0000 (10:59 +0300)]
KVM: Implement IA32_EBL_CR_POWERON msr

Attempting to boot the default 'bsd' kernel of OpenBSD 4.1 i386 in a guest
fails early in the kernel init inside p3_get_bus_clock while trying to read
the IA32_EBL_CR_POWERON MSR.  KVM logs an 'unhandled MSR' message and the
guest kernel faults.

This patch is sufficient to allow OpenBSD to boot, after which it seems to
run fine.  I'm not sure if this is the correct solution for dealing with
this particular MSR, but it works for me.

Signed-off-by: Matthew Gregan <kinetik@flim.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Set cr0.mp for guests
Avi Kivity [Wed, 2 May 2007 20:06:22 +0000 (23:06 +0300)]
KVM: Set cr0.mp for guests

This allows fwait instructions to be trapped when the guest fpu is not
loaded.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Consolidate guest fpu activation and deactivation
Avi Kivity [Wed, 2 May 2007 17:40:00 +0000 (20:40 +0300)]
KVM: Consolidate guest fpu activation and deactivation

Easier to keep track of where the fpu is this way.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Rationalize exception bitmap usage
Avi Kivity [Wed, 2 May 2007 14:57:40 +0000 (17:57 +0300)]
KVM: Rationalize exception bitmap usage

Everyone owns a piece of the exception bitmap, but they happily write to
the entire thing like there's no tomorrow.  Centralize handling in
update_exception_bitmap() and have everyone call that.

Signed-off-by: Avi Kivity <avi@qumranet.com>
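The centralization described in the commit above means one function derives
the whole bitmap from vcpu state, and every caller changes state and then
calls it.  The sketch below is a userspace model: struct vcpu and the exact
conditions are assumptions for illustration, while vectors 14 (#PF) and
7 (#NM) are the architectural x86 exception numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PF_VECTOR 14   /* page fault */
#define NM_VECTOR 7    /* device not available (fpu) */

struct vcpu {
    bool fpu_active;            /* guest fpu state is loaded */
    bool rmode;                 /* guest is in emulated real mode */
    uint32_t exception_bitmap;  /* bit N set: exception N exits to the host */
};

/* Single place that recomputes the whole bitmap from vcpu state. */
static void update_exception_bitmap(struct vcpu *v)
{
    uint32_t eb;

    assert(v != NULL);
    eb = UINT32_C(1) << PF_VECTOR;
    if (!v->fpu_active)
        eb |= UINT32_C(1) << NM_VECTOR;
    if (v->rmode)
        eb = ~UINT32_C(0);      /* assumed: trap everything in real mode */
    v->exception_bitmap = eb;
}
```

Because no caller writes the bitmap directly, one owner's update cannot
clobber the bits that another owner depends on.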
17 years agoKVM: Move some more msr mangling into vmx_save_host_state()
Avi Kivity [Wed, 2 May 2007 14:33:43 +0000 (17:33 +0300)]
KVM: Move some more msr mangling into vmx_save_host_state()

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Fix potential guest state leak into host
Avi Kivity [Wed, 2 May 2007 13:54:03 +0000 (16:54 +0300)]
KVM: Fix potential guest state leak into host

The lightweight vmexit path avoids saving and reloading certain host
state.  However in certain cases lightweight vmexit handling can schedule()
which requires reloading the host state.

So we store the host state in the vcpu structure, and reload it if we
relinquish the vcpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Increase mmu shadow cache to 1024 pages
Avi Kivity [Tue, 1 May 2007 15:24:38 +0000 (18:24 +0300)]
KVM: Increase mmu shadow cache to 1024 pages

This improves kbuild times by about 10%, bringing it within a respectable
25% of native.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Update shadow pte on write to guest pte
Avi Kivity [Tue, 1 May 2007 13:53:31 +0000 (16:53 +0300)]
KVM: Update shadow pte on write to guest pte

A typical demand page/copy on write pattern is:

- page fault on vaddr
- kvm propagates fault to guest
- guest handles fault, updates pte
- kvm traps write, clears shadow pte, resumes guest
- guest returns to userspace, re-faults on same vaddr
- kvm installs shadow pte, resumes guest
- guest continues

So, three vmexits for a single guest page fault.  But if instead of clearing
the shadow page table entry, we update it to correspond to the value that the
guest has just written, we eliminate the third vmexit.

This patch does exactly that, reducing kbuild time by about 10%.

Signed-off-by: Avi Kivity <avi@qumranet.com>
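The exit counting in the commit above can be replayed with a toy model that
follows the listed steps for one guest page.  Everything here is a
hypothetical simplification: struct mmu, guest_access() and the two policies
are illustrative names, and the guest's fault handler is reduced to writing a
fixed present pte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT UINT64_C(1)

enum policy { CLEAR_ON_WRITE, UPDATE_ON_WRITE };

struct mmu {
    uint64_t guest_pte;   /* entry in the guest's own page table */
    uint64_t shadow_pte;  /* entry the hardware actually walks */
    int vmexits;
};

/* One guest access to the page, following the steps listed above. */
static void guest_access(struct mmu *m, enum policy p)
{
    assert(m != NULL);
    if (m->shadow_pte & PTE_PRESENT)
        return;                     /* mapped: guest continues, no exit */

    m->vmexits++;                   /* page fault exits to kvm */
    if (m->guest_pte & PTE_PRESENT) {
        m->shadow_pte = m->guest_pte;   /* kvm installs the shadow pte */
        return;
    }

    /* Fault is propagated; the guest handler writes its pte, which traps. */
    m->guest_pte = UINT64_C(0x1000) | PTE_PRESENT;
    m->vmexits++;
    m->shadow_pte = (p == UPDATE_ON_WRITE) ? m->guest_pte : 0;

    /* Guest returns to userspace and retries the same address. */
    guest_access(m, p);
}
```

Under CLEAR_ON_WRITE the retry faults again, for three exits in total; under
UPDATE_ON_WRITE the retry finds a valid shadow pte and the count stays at two.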
17 years agoKVM: MMU: Respect nonpae pagetable quadrant when zapping ptes
Avi Kivity [Tue, 1 May 2007 13:44:05 +0000 (16:44 +0300)]
KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes

When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.

Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so when we look up the page we find up to three additional
aliases for the page.  Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.

Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header.  All we need is
to ignore shadow pages from the wrong quadrants.

Signed-off-by: Avi Kivity <avi@qumranet.com>
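The quadrant arithmetic implied by the commit above follows from the sizes it
gives: a nonpae guest page covers 4MB or 4GB while a pae shadow page covers
2MB or 1GB, so one guest page is split across two or four shadow pages.  The
sketch below derives the quadrant from the byte offset of the guest write.
The function names and the level numbering (1 for page tables, 2 for the page
directory) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u

/* Shadow pages needed per nonpae guest page: 4MB/2MB or 4GB/1GB. */
static unsigned int quadrants_for_level(int level)
{
    assert(level == 1 || level == 2);
    return level == 1 ? 2u : 4u;
}

/* Which shadow page (quadrant) backs the guest pte at this byte offset. */
static unsigned int quadrant_of_write(int level, unsigned int page_offset)
{
    assert(page_offset < PAGE_SIZE);
    return page_offset / (PAGE_SIZE / quadrants_for_level(level));
}

/* A shadow page is touched only if its recorded quadrant matches. */
static bool shadow_page_matches(int level, unsigned int shadow_quadrant,
                                unsigned int page_offset)
{
    return shadow_quadrant == quadrant_of_write(level, page_offset);
}
```

For a level 1 page, a write at byte offset 2048 hits guest pte 512, which is
the first entry of the second shadow page, so the quadrant is 1.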
17 years agoKVM: Unify kvm_mmu_pre_write() and kvm_mmu_post_write()
Avi Kivity [Tue, 1 May 2007 11:16:52 +0000 (14:16 +0300)]
KVM: Unify kvm_mmu_pre_write() and kvm_mmu_post_write()

Instead of calling two functions and repeating expensive checks, call one
function and provide it with before/after information.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Be more careful restoring fs on lightweight vmexit
Avi Kivity [Tue, 1 May 2007 08:32:28 +0000 (11:32 +0300)]
KVM: Be more careful restoring fs on lightweight vmexit

i386 wants fs for accessing the pda even on a lightweight exit, so ensure
we can always restore it.  This fixes a regression on i386 introduced by
the lightweight vmexit patch.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Reduce misfirings of the fork detector
Avi Kivity [Mon, 30 Apr 2007 14:05:38 +0000 (17:05 +0300)]
KVM: Reduce misfirings of the fork detector

The kvm mmu tries to detect forks by looking for repeated writes to a
page table.  If it sees a fork, it unshadows the page table so the page
table copying can proceed at native speed instead of being emulated.

However, the detector also triggered on simple demand paging access patterns:
a linear walk of memory would of course cause repeated writes to the same
pagetable page, causing it to unshadow prematurely.

Fix by resetting the fork detector if we detect a demand fault.

Signed-off-by: Avi Kivity <avi@qumranet.com>
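The heuristic and its fix from the commit above can be sketched as a counter
of consecutive writes to the same page table that is cleared whenever a
demand fault is seen.  The names struct fork_detector, note_pt_write() and
note_demand_fault(), and the threshold of 3, are hypothetical choices for
this model and are not taken from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FORK_THRESHOLD 3   /* hypothetical: repeats that suggest a fork */

struct fork_detector {
    unsigned long last_pt_gfn;  /* page table frame written last */
    int repeats;                /* consecutive writes to that frame */
};

/* Record a write to a shadowed page table; true means "looks like a fork". */
static bool note_pt_write(struct fork_detector *d, unsigned long gfn)
{
    assert(d != NULL);
    if (gfn == d->last_pt_gfn) {
        d->repeats++;
    } else {
        d->last_pt_gfn = gfn;
        d->repeats = 1;
    }
    return d->repeats >= FORK_THRESHOLD;
}

/* A demand fault explains the writes, so start counting afresh. */
static void note_demand_fault(struct fork_detector *d)
{
    assert(d != NULL);
    d->repeats = 0;
}
```

A linear walk of memory interleaves demand faults with the page table writes,
so the counter never reaches the threshold and the page stays shadowed.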
17 years agoKVM: Unindent some code
Avi Kivity [Mon, 30 Apr 2007 13:15:58 +0000 (16:15 +0300)]
KVM: Unindent some code

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Avoid saving and restoring some host CPU state on lightweight vmexit
Avi Kivity [Mon, 30 Apr 2007 13:07:54 +0000 (16:07 +0300)]
KVM: Avoid saving and restoring some host CPU state on lightweight vmexit

Many msrs and the like will only be used by the host if we schedule() or
return to userspace.  Therefore, we avoid saving them if we handle the
exit within the kernel, and if a reschedule is not requested.

Based on a patch from Eddie Dong <eddie.dong@intel.com> with a couple of
fixes by me.

Signed-off-by: Yaozu(Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: Assume that writes smaller than 4 bytes are to non-pagetable pages
Avi Kivity [Mon, 30 Apr 2007 11:47:02 +0000 (14:47 +0300)]
KVM: Assume that writes smaller than 4 bytes are to non-pagetable pages

This allows us to remove write protection earlier than otherwise.  Should
some mad OS choose to use byte writes to update pagetables, it will suffer
a performance hit, but still work correctly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: SVM: Allow direct guest access to PC debug port
Anthony Liguori [Mon, 30 Apr 2007 06:48:11 +0000 (09:48 +0300)]
KVM: SVM: Allow direct guest access to PC debug port

The PC debug port is used for IO delay and does not require emulation.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
17 years agoKVM: VMX: Enable io bitmaps to avoid IO port 0x80 VMEXITs
He, Qing [Mon, 30 Apr 2007 06:45:24 +0000 (09:45 +0300)]
KVM: VMX: Enable io bitmaps to avoid IO port 0x80 VMEXITs

This patch enables the IO bitmap control on vmx and unmasks port 0x80 to
avoid VMEXITs caused by accesses to it. Port 0x80 is used for delays (see
include/asm/io.h), and handling a VMEXIT on each access is unnecessary and
slows things down. This patch improves a kernel build test by around
3%~5%.
Because every VM uses the same io bitmap, it is shared between
all VMs rather than being a per-VM data structure.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
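The bitmap manipulation described in the commit above comes down to clearing
one bit.  In the sketch below a set bit means that access to the port causes
an exit; the single 8192-byte array is a simplification of the two 4KB
bitmaps VMX uses, and io_bitmap_init() and port_causes_exit() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IO_BITMAP_BYTES 8192   /* 65536 ports, one bit per port */
#define DELAY_PORT      0x80

/* One bitmap shared by all VMs; a set bit means the access exits. */
static uint8_t io_bitmap[IO_BITMAP_BYTES];

static void io_bitmap_init(void)
{
    assert(sizeof(io_bitmap) * 8 == 65536);
    /* Intercept every port by default... */
    memset(io_bitmap, 0xff, sizeof(io_bitmap));
    /* ...then let the delay port through without an exit. */
    io_bitmap[DELAY_PORT / 8] &= (uint8_t)~(1u << (DELAY_PORT % 8));
}

static bool port_causes_exit(uint16_t port)
{
    return (io_bitmap[port / 8] >> (port % 8)) & 1u;
}
```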
17 years agoMerge git://git.infradead.org/battery-2.6
Linus Torvalds [Sun, 15 Jul 2007 23:56:12 +0000 (16:56 -0700)]
Merge git://git.infradead.org/battery-2.6

* git://git.infradead.org/battery-2.6:
  git-battery vs git-acpi
  Power supply class and drivers: remove non obligatory return statements
  pda_power: clean up irq, timer
  MAINTAINERS: Add maintainers for power supply subsystem and drivers

Fixed up trivial conflict in drivers/w1/slaves/w1_ds2760.c manually

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
Linus Torvalds [Sun, 15 Jul 2007 23:51:54 +0000 (16:51 -0700)]
Merge /pub/scm/linux/kernel/git/jejb/scsi-misc-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (166 commits)
  [SCSI] ibmvscsi: convert to use the data buffer accessors
  [SCSI] dc395x: convert to use the data buffer accessors
  [SCSI] ncr53c8xx: convert to use the data buffer accessors
  [SCSI] sym53c8xx: convert to use the data buffer accessors
  [SCSI] ppa: coding police and printk levels
  [SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc
  [SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c
  [SCSI] remove the dead CYBERSTORMIII_SCSI option
  [SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA
  [SCSI] Clean up scsi_add_lun a bit
  [SCSI] 53c700: Remove printk, which triggers because of low scsi clock on SNI RMs
  [SCSI] sni_53c710: Cleanup
  [SCSI] qla4xxx: Fix underrun/overrun conditions
  [SCSI] megaraid_mbox: use mutex instead of semaphore
  [SCSI] aacraid: add 51245, 51645 and 52245 adapters to documentation.
  [SCSI] qla2xxx: update version to 8.02.00-k1.
  [SCSI] qla2xxx: add support for NPIV
  [SCSI] stex: use resid for xfer len information
  [SCSI] Add Brownie 1200U3P to blacklist
  [SCSI] scsi.c: convert to use the data buffer accessors
  ...

17 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Sun, 15 Jul 2007 23:50:46 +0000 (16:50 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
  [TCP]: Verify the presence of RETRANS bit when leaving FRTO
  [IPV6]: Call inet6addr_chain notifiers on link down
  [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
  [NET_SCHED]: act_api: qdisc internal reclassify support
  [NET_SCHED]: sch_dsmark: act_api support
  [NET_SCHED]: sch_atm: act_api support
  [NET_SCHED]: sch_atm: Lindent
  [IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
  [IPV4]: Cleanup call to __neigh_lookup()
  [NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
  [NETFILTER]: nf_conntrack: UDPLITE support
  [NETFILTER]: nf_conntrack: mark protocols __read_mostly
  [NETFILTER]: x_tables: add connlimit match
  [NETFILTER]: Lower *tables printk severity
  [NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
  [NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
  [NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
  [NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
  [NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
  [AF_IUCV]: Add lock when updating accept_q
  ...

17 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh...
Linus Torvalds [Sun, 15 Jul 2007 23:44:53 +0000 (16:44 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix a race condition bug in umount which caused a segfault
  9p: re-enable mount time debug option
  9p: cache meta-data when cache=loose
  net/9p: set error to EREMOTEIO if trans->write returns zero
  net/9p: change net/9p module name to 9pnet
  9p: Reorganization of 9p file system code

17 years agoMerge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
Linus Torvalds [Sun, 15 Jul 2007 23:43:43 +0000 (16:43 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6

* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (37 commits)
  [XFS] Fix lockdep annotations for xfs_lock_inodes
  [LIB]: export radix_tree_preload()
  [XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode
  [XFS] Compat ioctl handler for handle operations
  [XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1.
  [XFS] Clean up function name handling in tracing code
  [XFS] Quota inode has no parent.
  [XFS] Concurrent Multi-File Data Streams
  [XFS] Use uninitialized_var macro to stop warning about rtx
  [XFS] XFS should not be looking at filp reference counts
  [XFS] Use is_power_of_2 instead of open coding checks
  [XFS] Reduce shouting by removing unnecessary macros from dir2 code.
  [XFS] Simplify XFS min/max macros.
  [XFS] Kill off xfs_count_bits
  [XFS] Cancel transactions on xfs_itruncate_start error.
  [XFS] Use do_div() on 64 bit types.
  [XFS] Fix remount,readonly path to flush everything correctly.
  [XFS] Cleanup inode extent size hint extraction
  [XFS] Prevent ENOSPC from aborting transactions that need to succeed
  [XFS] Prevent deadlock when flushing inodes on unmount
  ...

17 years agomake i2c-acorn tristate
Al Viro [Sun, 15 Jul 2007 20:37:16 +0000 (21:37 +0100)]
make i2c-acorn tristate

It depends on tristate I2C and it's trivial to make modular.  The
current Kconfig allows I2C=m, I2C_ACORN=y, which doesn't work at
all; alternatives are dependency on I2C=y and making I2C_ACORN
itself a tristate.  The latter is the right thing to do...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years agoicside: devm_iounmap() needs linux/io.h
Al Viro [Sun, 15 Jul 2007 20:01:32 +0000 (21:01 +0100)]
icside: devm_iounmap() needs linux/io.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
17 years agomissing argument in bin_attribute ->read()/->write()
Al Viro [Sun, 15 Jul 2007 20:01:22 +0000 (21:01 +0100)]
missing argument in bin_attribute ->read()/->write()

Fallout from commit 91a6902958f052358899f58683d44e36228d85c2 ('sysfs:
add parameter "struct bin_attribute *" ...')

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofallout from constified seq_operations
Al Viro [Sun, 15 Jul 2007 20:01:12 +0000 (21:01 +0100)]
fallout from constified seq_operations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofallout from Auke's pci ->revision patch
Al Viro [Sun, 15 Jul 2007 20:01:02 +0000 (21:01 +0100)]
fallout from Auke's pci ->revision patch

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoax88796: dev_dbg() wants device, not platform device
Al Viro [Sun, 15 Jul 2007 20:00:51 +0000 (21:00 +0100)]
ax88796: dev_dbg() wants device, not platform device

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agopass -msize-long to sparse on s390
Al Viro [Sun, 15 Jul 2007 20:00:41 +0000 (21:00 +0100)]
pass -msize-long to sparse on s390

s390 is the only 32-bit architecture with unsigned long for size_t (the
usual type on those is unsigned int).  Tell sparse...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofrv: missing __clear_user()
Al Viro [Sun, 15 Jul 2007 20:00:31 +0000 (21:00 +0100)]
frv: missing __clear_user()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agozd1211rw: too early inclusion of asm/unaligned.h
Al Viro [Sun, 15 Jul 2007 20:00:21 +0000 (21:00 +0100)]
zd1211rw: too early inclusion of asm/unaligned.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofix return type of skb_checksum_complete()
Al Viro [Sun, 15 Jul 2007 20:00:11 +0000 (21:00 +0100)]
fix return type of skb_checksum_complete()

It returns __sum16, not unsigned int

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoPDA_POWER depends on having request_irq()
Al Viro [Sun, 15 Jul 2007 20:00:01 +0000 (21:00 +0100)]
PDA_POWER depends on having request_irq()

... so all proud owners of s390-based PDAs will have to live without that one

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoieee1394: forgotten dereference...
Al Viro [Sun, 15 Jul 2007 19:59:51 +0000 (20:59 +0100)]
ieee1394: forgotten dereference...

Going through the string and waiting for _pointer_ to become '\0'
is not what the authors meant...
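
The class of bug described above can be sketched as follows. This is an
illustration only, not the ieee1394 code; count_chars() is a hypothetical
helper. The loop has to test the character the pointer refers to, not the
pointer itself.

```c
#include <stddef.h>

/*
 * Buggy form, kept as a comment because it walks off the end of the string:
 *
 *	while (s) { s++; n++; }
 *
 * The pointer itself does not become NULL while advancing through the
 * string, so the loop does not stop at the terminator.
 */

/* Fixed form: dereference the pointer and stop at the terminating '\0'. */
static size_t count_chars(const char *s)
{
	size_t n = 0;

	while (*s != '\0') {
		s++;
		n++;
	}
	return n;
}
```

The only difference between the two forms is the missing '*', which is why
this kind of mistake compiles cleanly.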

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Ben Collins <ben.collins@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agothe wrong variable checked after request_irq()
Al Viro [Sun, 15 Jul 2007 19:59:41 +0000 (20:59 +0100)]
the wrong variable checked after request_irq()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agowrong order of arguments of ->readdir()
Al Viro [Sun, 15 Jul 2007 19:59:31 +0000 (20:59 +0100)]
wrong order of arguments of ->readdir()

Shows how many people are testing coda - the bug had been there for 5 years
and results of stepping on it are not subtle.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agominimal fixes for drivers/usb/gadget/m66592-udc.c
Al Viro [Sun, 15 Jul 2007 19:59:22 +0000 (20:59 +0100)]
minimal fixes for drivers/usb/gadget/m66592-udc.c

still looks racy (and definitely leaks)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>