KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
This fixes a bug which results in stale vcore pointers being left in
the per-cpu preempted vcore lists when a VM is destroyed. The result
of the stale vcore pointers is usually either a crash or a lockup
inside collect_piggybacks() when another VM is run. A typical
lockup message looks like:
[ 472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
[ 472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
[ 472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
[ 472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000
[ 472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130
[ 472.162337] REGS: c000001e41bff520 TRAP: 0901 Not tainted (4.2.0-kvm+)
[ 472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24848844 XER: 00000000
[ 472.162588] CFAR: c00000000096b0ac SOFTE: 1
GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001
GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821
GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740
GPR12: c000000000111130 c00000000fdae400
[ 472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130
[ 472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130
[ 472.163179] Call Trace:
[ 472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable)
[ 472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90
[ 472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
[ 472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
[ 472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
[ 472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[ 472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
[ 472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0
[ 472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0
[ 472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0
[ 472.164060] Instruction dump:
[ 472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2
[ 472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070
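For context, a minimal sketch of the per-cpu preempted vcores machinery that ends up holding the stale pointer. This is loosely based on the book3s_hv.c code of this era; the structure layout and helper names below are illustrative and may not match the source exactly:

struct preempted_vcore_list {
	struct list_head	list;
	spinlock_t		lock;
};

static DEFINE_PER_CPU(struct preempted_vcore_list, preempted_vcores);

/* Called when a vcore loses its physical core: put it on this cpu's list
 * so that collect_piggybacks() can later try to run it alongside another
 * vcore. */
static void kvmppc_vcore_preempt(struct kvmppc_vcore *vc)
{
	struct preempted_vcore_list *lp = this_cpu_ptr(&preempted_vcores);

	spin_lock(&lp->lock);
	list_add_tail(&vc->preempt_list, &lp->list);
	spin_unlock(&lp->lock);
}

/* Must be called before the vcore (or its VM) goes away; otherwise the
 * per-cpu list is left pointing at freed memory, which is what
 * collect_piggybacks() then trips over. */
static void kvmppc_vcore_end_preempt(struct kvmppc_vcore *vc)
{
	struct preempted_vcore_list *lp;

	if (!list_empty(&vc->preempt_list)) {
		lp = &per_cpu(preempted_vcores, vc->pcpu);
		spin_lock(&lp->lock);
		list_del_init(&vc->preempt_list);
		spin_unlock(&lp->lock);
	}
}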
The bug is that kvmppc_run_vcpu does not correctly handle the case
where a vcpu task receives a signal while its guest vcpu is executing
in the guest as a result of being piggy-backed onto the execution of
another vcore. In that case we need to wait for the vcpu to finish
executing inside the guest, and then remove its vcore from the
per-cpu preempted vcores list, so that the vcore is not left on that
list when the vcpu task gets interrupted.
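The shape of the fix in kvmppc_run_vcpu is roughly as follows. This is a simplified sketch rather than the literal patch, and it assumes the vcore states and helpers of this era of book3s_hv.c (VCORE_PIGGYBACK, kvmppc_wait_for_exec(), kvmppc_vcore_end_preempt(), kvmppc_remove_runnable()):

	/* A signal was received: if our vcore was piggy-backed onto another
	 * vcore's run, this vcpu may still be executing in the guest, so
	 * wait (uninterruptibly) for it to come back out. */
	while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
	       (vc->vcore_state == VCORE_RUNNING ||
		vc->vcore_state == VCORE_EXITING ||
		vc->vcore_state == VCORE_PIGGYBACK))
		kvmppc_wait_for_exec(vc, vcpu, TASK_UNINTERRUPTIBLE);

	/* This task is about to stop being the runner for this vcore, so
	 * make sure the vcore is not left on the per-cpu preempted vcores
	 * list. */
	if (vc->vcore_state == VCORE_PREEMPT && vc->runner == NULL)
		kvmppc_vcore_end_preempt(vc);

	if (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE) {
		kvmppc_remove_runnable(vc, vcpu);
		vcpu->stat.signal_exits++;
		kvm_run->exit_reason = KVM_EXIT_INTR;
		vcpu->arch.ret = -EINTR;
	}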
Fixes: ec2571650826
Reported-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>