Linus Torvalds [Fri, 21 Jul 2017 18:18:09 +0000 (11:18 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"A timer_irq_init() clocksource API robustness fix"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/timer-of: Handle of_irq_get_byname() result correctly
Linus Torvalds [Fri, 21 Jul 2017 18:16:12 +0000 (11:16 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"A cputime fix and code comments/organization fix to the deadline
scheduler"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix confusing comments about selection of top pi-waiter
sched/cputime: Don't use smp_processor_id() in preemptible context
Linus Torvalds [Fri, 21 Jul 2017 18:12:48 +0000 (11:12 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two hw-enablement patches, two race fixes, three fixes for regressions
of semantics, plus a number of tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add proper condition to run sched_task callbacks
perf/core: Fix locking for children siblings group read
perf/core: Fix scheduling regression of pinned groups
perf/x86/intel: Fix debug_store reset field for freq events
perf/x86/intel: Add Goldmont Plus CPU PMU support
perf/x86/intel: Enable C-state residency events for Apollo Lake
perf symbols: Accept zero as the kernel base address
Revert "perf/core: Drop kernel samples even though :u is specified"
perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
perf evsel: State in the default event name if attr.exclude_kernel is set
perf evsel: Fix attr.exclude_kernel setting for default cycles:p
Linus Torvalds [Fri, 21 Jul 2017 18:11:23 +0000 (11:11 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixlet from Ingo Molnar:
"Remove an unnecessary priority adjustment in the rtmutex code"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Remove unnecessary priority adjustment
Linus Torvalds [Fri, 21 Jul 2017 18:07:41 +0000 (11:07 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"A resume_irq() fix, plus a number of static declaration fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/digicolor: Drop unnecessary static
irqchip/mips-cpu: Drop unnecessary static
irqchip/gic/realview: Drop unnecessary static
irqchip/mips-gic: Remove population of irq domain names
genirq/PM: Properly pretend disabled state when force resuming interrupts
Linus Torvalds [Fri, 21 Jul 2017 17:41:19 +0000 (10:41 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
"A fix to WARN_ON_ONCE() done by modules, plus a MAINTAINERS update"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debug: Fix WARN_ON_ONCE() for modules
MAINTAINERS: Update the PTRACE entry
Jiri Olsa [Wed, 19 Jul 2017 07:52:47 +0000 (09:52 +0200)]
perf/x86/intel: Add proper condition to run sched_task callbacks
We have 2 functions using the same sched_task callback:
- PEBS drain for free running counters
- LBR save/store
Both of them are called from intel_pmu_sched_task() and
either of them can be unwillingly triggered when the
other one is configured to run.
Let's say there's PEBS drain configured in sched_task
callback for the event, but in the callback itself
(intel_pmu_sched_task()) we will also run the code for
LBR save/restore, which we did not ask for, but the
code in intel_pmu_sched_task() does not check for that.
This can lead to extra cycles in some perf monitoring,
like when we monitor PEBS event without LBR data.
# perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l
1000000
(We need PEBS, non freq/non timestamp event to enable
the sched_task callback)
The perf stat of cycles and msr:write_msr for above
command before the change:
...
Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
./perf bench sched pipe -l
1000000' (5 runs):
18,519,557,441 cycles:k
91,195,527 msr:write_msr
29.
334476406 seconds time elapsed
And after the change:
...
Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
./perf bench sched pipe -l
1000000' (5 runs):
18,704,973,540 cycles:k
27,184,720 msr:write_msr
16.
977875900 seconds time elapsed
There's no affect on cycles:k because the sched_task happens
with events switched off, however the msr:write_msr tracepoint
counter together with almost 50% of time speedup show the
improvement.
Monitoring LBR event and having extra PEBS drain processing
in sched_task callback showed just a little speedup, because
the drain function does not do much extra work in case there
is no PEBS data.
Adding conditions to recognize the configured work that needs
to be done in the x86_pmu's sched_task callback.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170719075247.GA27506@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Thu, 20 Jul 2017 14:14:55 +0000 (16:14 +0200)]
perf/core: Fix locking for children siblings group read
We're missing ctx lock when iterating children siblings
within the perf_read path for group reading. Following
race and crash can happen:
User space doing read syscall on event group leader:
T1:
perf_read
lock event->ctx->mutex
perf_read_group
lock leader->child_mutex
__perf_read_group_add(child)
list_for_each_entry(sub, &leader->sibling_list, group_entry)
----> sub might be invalid at this point, because it could
get removed via perf_event_exit_task_context in T2
Child exiting and cleaning up its events:
T2:
perf_event_exit_task_context
lock ctx->mutex
list_for_each_entry_safe(child_event, next, &child_ctx->event_list,...
perf_event_exit_event(child)
lock ctx->lock
perf_group_detach(child)
unlock ctx->lock
----> child is removed from sibling_list without any sync
with T1 path above
...
free_event(child)
Before the child is removed from the leader's child_list,
(and thus is omitted from perf_read_group processing), we
need to ensure that perf_read_group touches child's
siblings under its ctx->lock.
Peter further notes:
| One additional note; this bug got exposed by commit:
|
|
ba5213ae6b88 ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
|
| which made it possible to actually trigger this code-path.
Tested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ba5213ae6b88 ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
Link: http://lkml.kernel.org/r/20170720141455.2106-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arnd Bergmann [Fri, 14 Jul 2017 09:25:13 +0000 (11:25 +0200)]
ide: avoid warning for timings calculation
gcc-7 warns about the result of a constant multiplication used as
a boolean:
drivers/ide/ide-timings.c: In function 'ide_timing_quantize':
drivers/ide/ide-timings.c:112:24: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
q->setup = EZ(t->setup * 1000, T);
This slightly rearranges the macro to simplify the code and avoid
the warning at the same time.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 20 Jul 2017 23:33:39 +0000 (16:33 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) BPF verifier signed/unsigned value tracking fix, from Daniel
Borkmann, Edward Cree, and Josef Bacik.
2) Fix memory allocation length when setting up calls to
->ndo_set_mac_address, from Cong Wang.
3) Add a new cxgb4 device ID, from Ganesh Goudar.
4) Fix FIB refcount handling, we have to set it's initial value before
the configure callback (which can bump it). From David Ahern.
5) Fix double-free in qcom/emac driver, from Timur Tabi.
6) A bunch of gcc-7 string format overflow warning fixes from Arnd
Bergmann.
7) Fix link level headroom tests in ip_do_fragment(), from Vasily
Averin.
8) Fix chunk walking in SCTP when iterating over error and parameter
headers. From Alexander Potapenko.
9) TCP BBR congestion control fixes from Neal Cardwell.
10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger.
11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong
Wang.
12) xmit_recursion in ppp driver needs to be per-device not per-cpu,
from Gao Feng.
13) Cannot release skb->dst in UDP if IP options processing needs it.
From Paolo Abeni.
14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander
Levin and myself.
15) Revert some rtnetlink notification changes that are causing
regressions, from David Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
net: bonding: Fix transmit load balancing in balance-alb mode
rds: Make sure updates to cp_send_gen can be observed
net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
ipv4: initialize fib_trie prior to register_netdev_notifier call.
rtnetlink: allocate more memory for dev_set_mac_address()
net: dsa: b53: Add missing ARL entries for BCM53125
bpf: more tests for mixed signed and unsigned bounds checks
bpf: add test for mixed signed and unsigned bounds checks
bpf: fix up test cases with mixed signed/unsigned bounds
bpf: allow to specify log level and reduce it for test_verifier
bpf: fix mixed signed/unsigned derived min/max value bounds
ipv6: avoid overflow of offset in ip6_find_1stfragopt
net: tehuti: don't process data if it has not been copied from userspace
Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"
net: dsa: mv88e6xxx: Enable CMODE config support for 6390X
dt-binding: ptp: Add SoC compatibility strings for dte ptp clock
NET: dwmac: Make dwmac reset unconditional
net: Zero terminate ifr_name in dev_ifname().
wireless: wext: terminate ifr name coming from userspace
netfilter: fix netfilter_net_init() return
...
Kosuke Tatsukawa [Thu, 20 Jul 2017 05:20:40 +0000 (05:20 +0000)]
net: bonding: Fix transmit load balancing in balance-alb mode
balance-alb mode used to have transmit dynamic load balancing feature
enabled by default. However, transmit dynamic load balancing no longer
works in balance-alb after commit
8b426dc54cf4 ("bonding: remove
hardcoded value").
Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to
send packets. This function uses the parameter tlb_dynamic_lb.
tlb_dynamic_lb used to have the default value of 1 for balance-alb, but
now the value is set to 0 except in balance-tlb.
Re-enable transmit dyanmic load balancing by initializing tlb_dynamic_lb
for balance-alb similar to balance-tlb.
Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value")
Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Håkon Bugge [Thu, 20 Jul 2017 10:28:55 +0000 (12:28 +0200)]
rds: Make sure updates to cp_send_gen can be observed
cp->cp_send_gen is treated as a normal variable, although it may be
used by different threads.
This is fixed by using {READ,WRITE}_ONCE when it is incremented and
READ_ONCE when it is read outside the {acquire,release}_in_xmit
protection.
Normative reference from the Linux-Kernel Memory Model:
Loads from and stores to shared (but non-atomic) variables should
be protected with the READ_ONCE(), WRITE_ONCE(), and
ACCESS_ONCE().
Clause 5.1.2.4/25 in the C standard is also relevant.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keerthy [Thu, 20 Jul 2017 11:29:52 +0000 (16:59 +0530)]
net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
Push the request_irq function to the end of probe so as
to ensure all the required fields are populated in the event
of an ISR getting executed right after requesting the irq.
Currently while loading the crash kernel a crash was seen as
soon as devm_request_threaded_irq was called. This was due to
n->poll being NULL which is called as part of net_rx_action
function.
Suggested-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Wed, 19 Jul 2017 22:41:33 +0000 (15:41 -0700)]
ipv4: initialize fib_trie prior to register_netdev_notifier call.
Net stack initialization currently initializes fib-trie after the
first call to netdevice_notifier() call. In fact fib_trie initialization
needs to happen before first rtnl_register(). It does not cause any problem
since there are no devices UP at this moment, but trying to bring 'lo'
UP at initialization would make this assumption wrong and exposes the issue.
Fixes following crash
Call Trace:
? alternate_node_alloc+0x76/0xa0
fib_table_insert+0x1b7/0x4b0
fib_magic.isra.17+0xea/0x120
fib_add_ifaddr+0x7b/0x190
fib_netdev_event+0xc0/0x130
register_netdevice_notifier+0x1c1/0x1d0
ip_fib_init+0x72/0x85
ip_rt_init+0x187/0x1e9
ip_init+0xe/0x1a
inet_init+0x171/0x26c
? ipv4_offload_init+0x66/0x66
do_one_initcall+0x43/0x160
kernel_init_freeable+0x191/0x219
? rest_init+0x80/0x80
kernel_init+0xe/0x150
ret_from_fork+0x22/0x30
Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
RIP: kmem_cache_alloc+0xcf/0x1c0 RSP:
ffff9b1500017c28
CR2:
0000000000000014
Fixes: 7b1a74fdbb9e ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
Fixes: 7f9b80529b8a ("[IPV4]: fib hash|trie initialization")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Thu, 20 Jul 2017 18:27:57 +0000 (11:27 -0700)]
rtnetlink: allocate more memory for dev_set_mac_address()
virtnet_set_mac_address() interprets mac address as struct
sockaddr, but upper layer only allocates dev->addr_len
which is ETH_ALEN + sizeof(sa_family_t) in this case.
We lack a unified definition for mac address, so just fix
the upper layer, this also allows drivers to interpret it
to struct sockaddr freely.
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 20 Jul 2017 19:25:22 +0000 (12:25 -0700)]
net: dsa: b53: Add missing ARL entries for BCM53125
The BCM53125 entry was missing an arl_entries member which would
basically prevent the ARL search from terminating properly. This switch
has 4 ARL entries, so add that.
Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Jul 2017 22:20:28 +0000 (15:20 -0700)]
Merge branch 'BPF-map-value-adjust-fix'
Daniel Borkmann says:
====================
BPF map value adjust fix
First patch in the series is the actual fix and the remaining
patches are just updates to selftests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Jul 2017 22:00:25 +0000 (00:00 +0200)]
bpf: more tests for mixed signed and unsigned bounds checks
Add a couple of more test cases to BPF selftests that are related
to mixed signed and unsigned checks.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 20 Jul 2017 22:00:24 +0000 (00:00 +0200)]
bpf: add test for mixed signed and unsigned bounds checks
These failed due to a bug in verifier bounds handling.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Jul 2017 22:00:23 +0000 (00:00 +0200)]
bpf: fix up test cases with mixed signed/unsigned bounds
Fix the few existing test cases that used mixed signed/unsigned
bounds and switch them only to one flavor. Reason why we need this
is that proper boundaries cannot be derived from mixed tests.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Jul 2017 22:00:22 +0000 (00:00 +0200)]
bpf: allow to specify log level and reduce it for test_verifier
For the test_verifier case, it's quite hard to parse log level 2 to
figure out what's causing an issue when used to log level 1. We do
want to use bpf_verify_program() in order to simulate some of the
tests with strict alignment. So just add an argument to pass the level
and put it to 1 for test_verifier.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Jul 2017 22:00:21 +0000 (00:00 +0200)]
bpf: fix mixed signed/unsigned derived min/max value bounds
Edward reported that there's an issue in min/max value bounds
tracking when signed and unsigned compares both provide hints
on limits when having unknown variables. E.g. a program such
as the following should have been rejected:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (18) r1 = 0xffff8a94cda93400
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+7
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
7: (7a) *(u64 *)(r10 -16) = -8
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = -1
10: (2d) if r1 > r2 goto pc+3
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
R2=imm-1,max_value=
18446744073709551615,min_align=1 R10=fp
11: (65) if r1 s> 0x1 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
R2=imm-1,max_value=
18446744073709551615,min_align=1 R10=fp
12: (0f) r0 += r1
13: (72) *(u8 *)(r0 +0) = 0
R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
R2=imm-1,max_value=
18446744073709551615,min_align=1 R10=fp
14: (b7) r0 = 0
15: (95) exit
What happens is that in the first part ...
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = -1
10: (2d) if r1 > r2 goto pc+3
... r1 carries an unsigned value, and is compared as unsigned
against a register carrying an immediate. Verifier deduces in
reg_set_min_max() that since the compare is unsigned and operation
is greater than (>), that in the fall-through/false case, r1's
minimum bound must be 0 and maximum bound must be r2. Latter is
larger than the bound and thus max value is reset back to being
'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now
'R1=inv,min_value=0'. The subsequent test ...
11: (65) if r1 s> 0x1 goto pc+2
... is a signed compare of r1 with immediate value 1. Here,
verifier deduces in reg_set_min_max() that since the compare
is signed this time and operation is greater than (>), that
in the fall-through/false case, we can deduce that r1's maximum
bound must be 1, meaning with prior test, we result in r1 having
the following state: R1=inv,min_value=0,max_value=1. Given that
the actual value this holds is -8, the bounds are wrongly deduced.
When this is being added to r0 which holds the map_value(_adj)
type, then subsequent store access in above case will go through
check_mem_access() which invokes check_map_access_adj(), that
will then probe whether the map memory is in bounds based
on the min_value and max_value as well as access size since
the actual unknown value is min_value <= x <= max_value; commit
fce366a9dd0d ("bpf, verifier: fix alu ops against map_value{,
_adj} register types") provides some more explanation on the
semantics.
It's worth to note in this context that in the current code,
min_value and max_value tracking are used for two things, i)
dynamic map value access via check_map_access_adj() and since
commit
06c1c049721a ("bpf: allow helpers access to variable memory")
ii) also enforced at check_helper_mem_access() when passing a
memory address (pointer to packet, map value, stack) and length
pair to a helper and the length in this case is an unknown value
defining an access range through min_value/max_value in that
case. The min_value/max_value tracking is /not/ used in the
direct packet access case to track ranges. However, the issue
also affects case ii), for example, the following crafted program
based on the same principle must be rejected as well:
0: (b7) r2 = 0
1: (bf) r3 = r10
2: (07) r3 += -512
3: (7a) *(u64 *)(r10 -16) = -8
4: (79) r4 = *(u64 *)(r10 -16)
5: (b7) r6 = -1
6: (2d) if r4 > r6 goto pc+5
R1=ctx R2=imm0,min_value=0,max_value=0,min_align=
2147483648 R3=fp-512
R4=inv,min_value=0 R6=imm-1,max_value=
18446744073709551615,min_align=1 R10=fp
7: (65) if r4 s> 0x1 goto pc+4
R1=ctx R2=imm0,min_value=0,max_value=0,min_align=
2147483648 R3=fp-512
R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=
18446744073709551615,min_align=1
R10=fp
8: (07) r4 += 1
9: (b7) r5 = 0
10: (6a) *(u16 *)(r10 -512) = 0
11: (85) call bpf_skb_load_bytes#26
12: (b7) r0 = 0
13: (95) exit
Meaning, while we initialize the max_value stack slot that the
verifier thinks we access in the [1,2] range, in reality we
pass -7 as length which is interpreted as u32 in the helper.
Thus, this issue is relevant also for the case of helper ranges.
Resetting both bounds in check_reg_overflow() in case only one
of them exceeds limits is also not enough as similar test can be
created that uses values which are within range, thus also here
learned min value in r1 is incorrect when mixed with later signed
test to create a range:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (18) r1 = 0xffff880ad081fa00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+7
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
7: (7a) *(u64 *)(r10 -16) = -8
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = 2
10: (3d) if r2 >= r1 goto pc+3
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
11: (65) if r1 s> 0x4 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
12: (0f) r0 += r1
13: (72) *(u8 *)(r0 +0) = 0
R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
14: (b7) r0 = 0
15: (95) exit
This leaves us with two options for fixing this: i) to invalidate
all prior learned information once we switch signed context, ii)
to track min/max signed and unsigned boundaries separately as
done in [0]. (Given latter introduces major changes throughout
the whole verifier, it's rather net-next material, thus this
patch follows option i), meaning we can derive bounds either
from only signed tests or only unsigned tests.) There is still the
case of adjust_reg_min_max_vals(), where we adjust bounds on ALU
operations, meaning programs like the following where boundaries
on the reg get mixed in context later on when bounds are merged
on the dst reg must get rejected, too:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (18) r1 = 0xffff89b2bf87ce00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+6
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
7: (7a) *(u64 *)(r10 -16) = -8
8: (79) r1 = *(u64 *)(r10 -16)
9: (b7) r2 = 2
10: (3d) if r2 >= r1 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
11: (b7) r7 = 1
12: (65) if r7 s> 0x0 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
13: (b7) r0 = 0
14: (95) exit
from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
15: (0f) r7 += r1
16: (65) if r7 s> 0x4 goto pc+2
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
17: (0f) r0 += r7
18: (72) *(u8 *)(r0 +0) = 0
R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
19: (b7) r0 = 0
20: (95) exit
Meaning, in adjust_reg_min_max_vals() we must also reset range
values on the dst when src/dst registers have mixed signed/
unsigned derived min/max value bounds with one unbounded value
as otherwise they can be added together deducing false boundaries.
Once both boundaries are established from either ALU ops or
compare operations w/o mixing signed/unsigned insns, then they
can safely be added to other regs also having both boundaries
established. Adding regs with one unbounded side to a map value
where the bounded side has been learned w/o mixing ops is
possible, but the resulting map value won't recover from that,
meaning such op is considered invalid on the time of actual
access. Invalid bounds are set on the dst reg in case i) src reg,
or ii) in case dst reg already had them. The only way to recover
would be to perform i) ALU ops but only 'add' is allowed on map
value types or ii) comparisons, but these are disallowed on
pointers in case they span a range. This is fine as only BPF_JEQ
and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers
which potentially turn them into PTR_TO_MAP_VALUE type depending
on the branch, so only here min/max value cannot be invalidated
for them.
In terms of state pruning, value_from_signed is considered
as well in states_equal() when dealing with adjusted map values.
With regards to breaking existing programs, there is a small
risk, but use-cases are rather quite narrow where this could
occur and mixing compares probably unlikely.
Joint work with Josef and Edward.
[0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html
Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Reported-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 20 Jul 2017 21:56:46 +0000 (14:56 -0700)]
Merge tag 'pm-4.13-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These are two stable-candidate fixes for the intel_pstate driver and
the generic power domains (genpd) framework.
Specifics:
- Fix the average CPU load computations in the intel_pstate driver on
Knights Landing (Xeon Phi) processors that require an extra factor
to compensate for a rate change differences between the TSC and
MPERF which is missing (Srinivas Pandruvada).
- Fix an initialization ordering issue in the generic power domains
(genpd) framework (Sudeep Holla)"
* tag 'pm-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
cpufreq: intel_pstate: Correct the busy calculation for KNL
Linus Torvalds [Thu, 20 Jul 2017 18:34:47 +0000 (11:34 -0700)]
x86: mark kprobe templates as character arrays, not single characters
They really are, and the "take the address of a single character" makes
the string fortification code unhappy (it believes that you can now only
acccess one byte, rather than a byte range, and then raises errors for
the memory copies going on in there).
We could now remove a few 'addressof' operators (since arrays naturally
degrade to pointers), but this is the minimal patch that just changes
the C prototypes of those template arrays (the templates themselves are
defined in inline asm).
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Acked-and-tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 20 Jul 2017 17:41:12 +0000 (10:41 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs
Pull misc filesystem fixes from Jan Kara:
"Several ACL related fixes for ext2, reiserfs, and hfsplus.
And also one minor isofs cleanup"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
hfsplus: Don't clear SGID when inheriting ACLs
isofs: Fix off-by-one in 'session' mount option parsing
reiserfs: preserve i_mode if __reiserfs_set_acl() fails
ext2: preserve i_mode if ext2_set_acl() fails
ext2: Don't clear SGID when inheriting ACLs
reiserfs: Don't clear SGID when inheriting ACLs
Linus Torvalds [Thu, 20 Jul 2017 17:30:16 +0000 (10:30 -0700)]
Merge tag 'for-f2fs-v4.13-rc2' of git://git./linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"We've filed some bug fixes:
- missing f2fs case in terms of stale SGID bit, introduced by Jan
- build error for seq_file.h
- avoid cpu lockup
- wrong inode_unlock in error case"
* tag 'for-f2fs-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: avoid cpu lockup
f2fs: include seq_file.h for sysfs.c
f2fs: Don't clear SGID when inheriting ACLs
f2fs: remove extra inode_unlock() in error path
Linus Torvalds [Thu, 20 Jul 2017 17:22:26 +0000 (10:22 -0700)]
Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit
Pull audit fix from Paul Moore:
"A small audit fix, just a single line, to plug a memory leak in some
audit error handling code"
* 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit:
audit: fix memleak in auditd_send_unicast_skb.
Linus Torvalds [Thu, 20 Jul 2017 17:17:53 +0000 (10:17 -0700)]
Merge tag 'libnvdimm-fixes-4.13-rc2' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A handful of small fixes for 4.13-rc2. Three of these fixes are tagged
for -stable. They have all appeared in at least one -next release with
no reported issues
- Fix handling of media errors that span a sector
- Fix support of multiple namespaces in a libnvdimm region being in
device-dax mode
- Clean up the machine check notifier properly when the nfit driver
fails to register
- Address a static analysis (smatch) report in device-dax"
* tag 'libnvdimm-fixes-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: fix sysfs duplicate warnings
MAINTAINERS: list drivers/acpi/nfit/ files for libnvdimm sub-system
acpi/nfit: Fix memory corruption/Unregister mce decoder on failure
device-dax: fix 'passing zero to ERR_PTR()' warning
libnvdimm: fix badblock range handling of ARS range
Linus Torvalds [Thu, 20 Jul 2017 17:14:54 +0000 (10:14 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- HID multitouch 4.12 regression fix from Dmitry Torokhov
- error handling fix for HID++ driver from Gustavo A. R. Silva
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value
HID: multitouch: do not blindly set EV_KEY or EV_ABS bits
Rafael J. Wysocki [Thu, 20 Jul 2017 16:57:15 +0000 (18:57 +0200)]
Merge branches 'intel_pstate' and 'pm-domains'
* intel_pstate:
cpufreq: intel_pstate: Correct the busy calculation for KNL
* pm-domains:
PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
Gustavo A. R. Silva [Fri, 7 Jul 2017 05:12:13 +0000 (00:12 -0500)]
HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value
Check return value from call to devm_kmemdup() in order to prevent a NULL
pointer dereference.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Josh Poimboeuf [Sat, 15 Jul 2017 05:10:58 +0000 (00:10 -0500)]
debug: Fix WARN_ON_ONCE() for modules
Mike Galbraith reported a situation where a WARN_ON_ONCE() call in DRM
code turned into an oops. As it turns out, WARN_ON_ONCE() seems to be
completely broken when called from a module.
The bug was introduced with the following commit:
19d436268dde ("debug: Add _ONCE() logic to report_bug()")
That commit changed WARN_ON_ONCE() to move its 'once' logic into the bug
trap handler. It requires a writable bug table so that the BUGFLAG_DONE
bit can be written to the flags to indicate the first warning has
occurred.
The bug table was made writable for vmlinux, which relies on
vmlinux.lds.S and vmlinux.lds.h for laying out the sections. However,
it wasn't made writable for modules, which rely on the ELF section
header flags.
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 19d436268dde ("debug: Add _ONCE() logic to report_bug()")
Link: http://lkml.kernel.org/r/a53b04235a65478dd9afc51f5b329fdc65c84364.1500095401.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Alexander Shishkin [Tue, 18 Jul 2017 11:08:34 +0000 (14:08 +0300)]
perf/core: Fix scheduling regression of pinned groups
Vince Weaver reported:
> I was tracking down some regressions in my perf_event_test testsuite.
> Some of the tests broke in the 4.11-rc1 timeframe.
>
> I've bisected one of them, this report is about
> tests/overflow/simul_oneshot_group_overflow
> This test creates an event group containing two sampling events, set
> to overflow to a signal handler (which disables and then refreshes the
> event).
>
> On a good kernel you get the following:
> Event perf::instructions with period
1000000
> Event perf::instructions with period
2000000
> fd 3 overflows: 946 (perf::instructions/
1000000)
> fd 4 overflows: 473 (perf::instructions/
2000000)
> Ending counts:
> Count 0:
946379875
> Count 1:
946365218
>
> With the broken kernels you get:
> Event perf::instructions with period
1000000
> Event perf::instructions with period
2000000
> fd 3 overflows: 938 (perf::instructions/
1000000)
> fd 4 overflows: 318 (perf::instructions/
2000000)
> Ending counts:
> Count 0:
946373080
> Count 1:
653373058
The root cause of the bug is that the following commit:
487f05e18a ("perf/core: Optimize event rescheduling on active contexts")
erronously assumed that event's 'pinned' setting determines whether the
event belongs to a pinned group or not, but in fact, it's the group
leader's pinned state that matters.
This was discovered by Vince in the test case described above, where two instruction
counters are grouped, the group leader is pinned, but the other event is not;
in the regressed case the counters were off by 33% (the difference between events'
periods), but should be the same within the error margin.
Fix the problem by looking at the group leader's pinning.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 487f05e18a ("perf/core: Optimize event rescheduling on active contexts")
Link: http://lkml.kernel.org/r/87lgnmvw7h.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Sabrina Dubroca [Wed, 19 Jul 2017 20:28:55 +0000 (22:28 +0200)]
ipv6: avoid overflow of offset in ip6_find_1stfragopt
In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.
This problem has been here since before the beginning of git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 19 Jul 2017 17:46:59 +0000 (18:46 +0100)]
net: tehuti: don't process data if it has not been copied from userspace
The array data is only populated with valid information from userspace
if cmd != SIOCDEVPRIVATE, other cases the array contains garbage on
the stack. The subsequent switch statement acts on a subcommand in
data[0] which could be any garbage value if cmd is SIOCDEVPRIVATE which
seems incorrect to me. Instead, just return EOPNOTSUPP for the case
where cmd == SIOCDEVPRIVATE to avoid this issue.
As a side note, I suspect that the original intention of the code
was for this ioctl to work just for cmd == SIOCDEVPRIVATE (and the
current logic is reversed). However, I don't wont to change the current
semantics in case any userspace code relies on this existing behaviour.
Detected by CoverityScan, CID#139647 ("Uninitialized scalar variable")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 19 Jul 2017 17:22:40 +0000 (10:22 -0700)]
Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"
This reverts commit
cd8966e75ed3c6b41a37047a904617bc44fa481f.
The duplicate CHANGEADDR event message is sent regardless of link
status whereas the setlink changes only generate a notification when
the link is up. Not sending a notification when the link is down breaks
dhcpcd which only processes hwaddr changes when the link is down.
Fixes reported regression:
https://bugzilla.kernel.org/show_bug.cgi?id=196355
Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Hundebøll [Wed, 19 Jul 2017 06:17:02 +0000 (08:17 +0200)]
net: dsa: mv88e6xxx: Enable CMODE config support for 6390X
Commit
f39908d3b1c45 ('net: dsa: mv88e6xxx: Set the CMODE for mv88e6390
ports 9 & 10') added support for setting the CMODE for the 6390X family,
but only enabled it for 9290 and 6390 - and left out 6390X.
Fix support for setting the CMODE on 6390X also by assigning
mv88e6390x_port_set_cmode() to the .port_set_cmode function pointer in
mv88e6390x_ops too.
Fixes: f39908d3b1c4 ("net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10")
Signed-off-by: Martin Hundebøll <mnhu@prevas.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Parameswaran [Thu, 6 Jul 2017 17:37:57 +0000 (10:37 -0700)]
dt-binding: ptp: Add SoC compatibility strings for dte ptp clock
Add SoC specific compatibility strings to the Broadcom DTE
based PTP clock binding document.
Fixed the document heading and node name.
Fixes: 80d6076140b2 ("dt-binding: ptp: add bindings document for dte based ptp clock")
Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Potapenko [Wed, 19 Jul 2017 18:27:30 +0000 (20:27 +0200)]
llist: clang: introduce member_address_is_nonnull()
Currently llist_for_each_entry() and llist_for_each_entry_safe() iterate
until &pos->member != NULL. But when building the kernel with Clang,
the compiler assumes &pos->member cannot be NULL if the member's offset
is greater than 0 (which would be equivalent to the object being
non-contiguous in memory). Therefore the loop condition is always true,
and the loops become infinite.
To work around this, introduce the member_address_is_nonnull() macro,
which casts object pointer to uintptr_t, thus letting the member pointer
to be NULL.
Signed-off-by: Alexander Potapenko <glider@google.com>
Tested-by: Sodagudi Prasad <psodagud@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eugeniy Paltsev [Tue, 18 Jul 2017 14:07:15 +0000 (17:07 +0300)]
NET: dwmac: Make dwmac reset unconditional
Unconditional reset dwmac before HW init if reset controller is present.
In existing implementation we reset dwmac only after second module
probing:
(module load -> unload -> load again [reset happens])
Now we reset dwmac at every module load:
(module load [reset happens] -> unload -> load again [reset happens])
Also some reset controllers have only reset callback instead of
assert + deassert callbacks pair, so handle this case.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Jul 2017 20:33:24 +0000 (13:33 -0700)]
net: Zero terminate ifr_name in dev_ifname().
The ifr.ifr_name is passed around and assumed to be NULL terminated.
Signed-off-by: David S. Miller <davem@davemloft.net>
Levin, Alexander [Tue, 18 Jul 2017 04:23:16 +0000 (04:23 +0000)]
wireless: wext: terminate ifr name coming from userspace
ifr name is assumed to be a valid string by the kernel, but nothing
was forcing username to pass a valid string.
In turn, this would cause panics as we tried to access the string
past it's valid memory.
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 19 Jul 2017 15:55:18 +0000 (08:55 -0700)]
Merge tag 'gcc-plugins-v4.13-rc2' of git://git./linux/kernel/git/kees/linux
Pull structure randomization updates from Kees Cook:
"Now that IPC and other changes have landed, enable manual markings for
randstruct plugin, including the task_struct.
This is the rest of what was staged in -next for the gcc-plugins, and
comes in three patches, largest first:
- mark "easy" structs with __randomize_layout
- mark task_struct with an optional anonymous struct to isolate the
__randomize_layout section
- mark structs to opt _out_ of automated marking (which will come
later)
And, FWIW, this continues to pass allmodconfig (normal and patched to
enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and
s390 for me"
* tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
randstruct: opt-out externally exposed function pointer structs
task_struct: Allow randomized layout
randstruct: Mark various structs for randomization
Linus Torvalds [Wed, 19 Jul 2017 15:49:46 +0000 (08:49 -0700)]
Merge tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A number of small fixes for -rc1 Luminous changes plus a readdir race
fix, marked for stable"
* tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-client:
libceph: potential NULL dereference in ceph_msg_data_create()
ceph: fix race in concurrent readdir
libceph: don't call encode_request_finish() on MOSDBackoff messages
libceph: use alloc_pg_mapping() in __decode_pg_upmap_items()
libceph: set -EINVAL in one place in crush_decode()
libceph: NULL deref on osdmap_apply_incremental() error path
libceph: fix old style declaration warnings
Shu Wang [Tue, 18 Jul 2017 06:37:24 +0000 (14:37 +0800)]
audit: fix memleak in auditd_send_unicast_skb.
Found this issue by kmemleak report, auditd_send_unicast_skb
did not free skb if rcu_dereference(auditd_conn) returns null.
unreferenced object 0xffff88082568ce00 (size 256):
comm "auditd", pid 1119, jiffies
4294708499
backtrace:
[<
ffffffff8176166a>] kmemleak_alloc+0x4a/0xa0
[<
ffffffff8121820c>] kmem_cache_alloc_node+0xcc/0x210
[<
ffffffff8161b99d>] __alloc_skb+0x5d/0x290
[<
ffffffff8113c614>] audit_make_reply+0x54/0xd0
[<
ffffffff8113dfa7>] audit_receive_msg+0x967/0xd70
----------------
(gdb) list *audit_receive_msg+0x967
0xffffffff8113dff7 is in audit_receive_msg (kernel/audit.c:1133).
1132 skb = audit_make_reply(0, AUDIT_REPLACE, 0,
0, &pvnr, sizeof(pvnr));
---------------
[<
ffffffff8113e402>] audit_receive+0x52/0xa0
[<
ffffffff8166c561>] netlink_unicast+0x181/0x240
[<
ffffffff8166c8e2>] netlink_sendmsg+0x2c2/0x3b0
[<
ffffffff816112e8>] sock_sendmsg+0x38/0x50
[<
ffffffff816117a2>] SYSC_sendto+0x102/0x190
[<
ffffffff81612f4e>] SyS_sendto+0xe/0x10
[<
ffffffff8176d337>] entry_SYSCALL_64_fastpath+0x1a/0xa5
[<
ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Sudeep Holla [Fri, 14 Jul 2017 10:51:48 +0000 (11:51 +0100)]
PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
If the genpd->attach_dev or genpd->power_on fails, genpd_dev_pm_attach
may return -EPROBE_DEFER initially. However genpd_alloc_dev_data sets
the PM domain for the device unconditionally.
When subsequent attempts are made to call genpd_dev_pm_attach, it may
return -EEXISTS checking dev->pm_domain without re-attempting to call
attach_dev or power_on.
platform_drv_probe then attempts to call drv->probe as the return value
-EEXIST != -EPROBE_DEFER, which may end up in a situation where the
device is accessed without it's power domain switched on.
Fixes: f104e1e5ef57 (PM / Domains: Re-order initialization of generic_pm_domain_data)
Cc: 4.4+ <stable@vger.kernel.org> # v4.4+
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Dan Williams [Wed, 19 Jul 2017 00:49:14 +0000 (17:49 -0700)]
device-dax: fix sysfs duplicate warnings
Fix warnings of the form...
WARNING: CPU: 10 PID: 4983 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
sysfs: cannot create duplicate filename '/class/dax/dax12.0'
Call Trace:
dump_stack+0x63/0x86
__warn+0xcb/0xf0
warn_slowpath_fmt+0x5a/0x80
? kernfs_path_from_node+0x4f/0x60
sysfs_warn_dup+0x62/0x80
sysfs_do_create_link_sd.isra.2+0x97/0xb0
sysfs_create_link+0x25/0x40
device_add+0x266/0x630
devm_create_dax_dev+0x2cf/0x340 [dax]
dax_pmem_probe+0x1f5/0x26e [dax_pmem]
nvdimm_bus_probe+0x71/0x120
...by reusing the namespace id for the device-dax instance name.
Now that we have decided that there will never by more than one
device-dax instance per libnvdimm-namespace parent device [1], we can
directly reuse the namepace ids. There are some possible follow-on
cleanups, but those are saved for a later patch to simplify the -stable
backport.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008266.html
Fixes: 98a29c39dc68 ("libnvdimm, namespace: allow creation of multiple pmem...")
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Carpenter [Tue, 18 Jul 2017 19:38:56 +0000 (22:38 +0300)]
netfilter: fix netfilter_net_init() return
We accidentally return an uninitialized variable.
Fixes: cf56c2f892a8 ("netfilter: remove old pre-netns era hook api")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sat, 15 Jul 2017 20:07:45 +0000 (22:07 +0200)]
irqchip/digicolor: Drop unnecessary static
Drop static on a local variable, when the variable is initialized before
any possible use. Thus, the static has no benefit.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@bad exists@
position p;
identifier x;
type T;
@@
static T x@p;
...
x = <+...x...+>
@@
identifier x;
expression e;
type T;
position p != bad.p;
@@
-static
T x@p;
... when != x
when strict
?x = e;
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Baruch Siach <baruch@tkos.co.il>
Cc: keescook@chromium.org
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1500149266-32357-11-git-send-email-Julia.Lawall@lip6.fr
Julia Lawall [Sat, 15 Jul 2017 20:07:41 +0000 (22:07 +0200)]
irqchip/mips-cpu: Drop unnecessary static
Drop static on a local variable, when the variable is initialized before
any possible use. Thus, the static has no benefit.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@bad exists@
position p;
identifier x;
type T;
@@
static T x@p;
...
x = <+...x...+>
@@
identifier x;
expression e;
type T;
position p != bad.p;
@@
-static
T x@p;
... when != x
when strict
?x = e;
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: keescook@chromium.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1500149266-32357-7-git-send-email-Julia.Lawall@lip6.fr
Julia Lawall [Sat, 15 Jul 2017 20:07:40 +0000 (22:07 +0200)]
irqchip/gic/realview: Drop unnecessary static
Drop static on a local variable, when the variable is initialized before
any possible use. Thus, the static has no benefit.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@bad exists@
position p;
identifier x;
type T;
@@
static T x@p;
...
x = <+...x...+>
@@
identifier x;
expression e;
type T;
position p != bad.p;
@@
-static
T x@p;
... when != x
when strict
?x = e;
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: keescook@chromium.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1500149266-32357-6-git-send-email-Julia.Lawall@lip6.fr
David S. Miller [Tue, 18 Jul 2017 19:01:39 +0000 (12:01 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Missing netlink message sanity check in nfnetlink, patch from
Mateusz Jurczyk.
2) We now have netfilter per-netns hooks, so let's kill global hook
infrastructure, this infrastructure is known to be racy with netns.
We don't care about out of tree modules. Patch from Florian Westphal.
3) find_appropriate_src() is buggy when colissions happens after the
conversion of the nat bysource to rhashtable. Also from Florian.
4) Remove forward chain in nf_tables arp family, it's useless and it is
causing quite a bit of confusion, from Florian Westphal.
5) nf_ct_remove_expect() is called with the wrong parameter, causing
kernel oops, patch from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 18 Jul 2017 09:57:55 +0000 (11:57 +0200)]
udp: preserve skb->dst if required for IP options processing
Eric noticed that in udp_recvmsg() we still need to access
skb->dst while processing the IP options.
Since commit
0a463c78d25b ("udp: avoid a cache miss on dequeue")
skb->dst is no more available at recvmsg() time and bad things
will happen if we enter the relevant code path.
This commit address the issue, avoid clearing skb->dst if
any IP options are present into the relevant skb.
Since the IP CB is contained in the first skb cacheline, we can
test it to decide to leverage the consume_stateless_skb()
optimization, without measurable additional cost in the faster
path.
v1 -> v2: updated commit message tags
Fixes: 0a463c78d25b ("udp: avoid a cache miss on dequeue")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 18 Jul 2017 18:51:08 +0000 (11:51 -0700)]
Merge tag 'md/4.13-rc2' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
- raid5-ppl fix by Artur. This one is introduced in this release cycle.
- raid5 reshape fix by Xiao. This is an old bug and will be added to
stable.
- bitmap fix by Guoqing.
* tag 'md/4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
raid5-ppl: use BIOSET_NEED_BVECS when creating bioset
Raid5 should update rdev->sectors after reshape
md/bitmap: don't read page from device with Bitmap_sync
Christophe Jaillet [Mon, 17 Jul 2017 17:42:41 +0000 (19:42 +0200)]
atm: zatm: Fix an error handling path in 'zatm_init_one()'
If 'dma_set_mask_and_coherent()' fails, we must undo the previous
'pci_request_regions()' call.
Adjust corresponding 'goto' to jump at the right place of the error
handling path.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Potapenko [Mon, 17 Jul 2017 10:35:58 +0000 (12:35 +0200)]
ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()
KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(),
which originated from the TCP request socket created in
cookie_v6_check():
==================================================================
BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters.
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x172/0x1c0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
skb_set_hash_from_sk ./include/net/sock.h:2011
tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307
__run_timers+0xc13/0xf10 kernel/time/timer.c:1601
run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
__do_softirq+0x485/0x942 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364
irq_exit+0x1fa/0x230 kernel/softirq.c:405
exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
RSP: 0018:
ffff880024917cd8 EFLAGS:
00000246 ORIG_RAX:
ffffffffffffff10
RAX:
0000000000000246 RBX:
ffff8800224c0000 RCX:
0000000000000005
RDX:
0000000000000004 RSI:
ffff880000000000 RDI:
ffffea0000b6d770
RBP:
ffff880024917d58 R08:
0000000000000dd8 R09:
0000000000000004
R10:
0000160000000000 R11:
0000000000000000 R12:
ffffffff85abf810
R13:
ffff880024917dd8 R14:
0000000000000010 R15:
ffffffff81cabde4
</IRQ>
poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
SYSC_select+0x4b4/0x4e0 fs/select.c:653
SyS_select+0x76/0xa0 fs/select.c:634
entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
RIP: 0033:0x4597e7
RSP: 002b:
000000c420037ee0 EFLAGS:
00000246 ORIG_RAX:
0000000000000017
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00000000004597e7
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
000000c420037ef0 R08:
000000c420037ee0 R09:
0000000000000059
R10:
0000000000000000 R11:
0000000000000246 R12:
000000000042dc20
R13:
00000000000000f3 R14:
0000000000000030 R15:
0000000000000003
chained origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_save_stack mm/kmsan/kmsan.c:317
kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
__msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
reqsk_alloc ./include/net/request_sock.h:87
inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
==================================================================
Similar error is reported for cookie_v4_check().
Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets")
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Mon, 17 Jul 2017 10:34:42 +0000 (18:34 +0800)]
ppp: Fix false xmit recursion detect with two ppp devices
The global percpu variable ppp_xmit_recursion is used to detect the ppp
xmit recursion to avoid the deadlock, which is caused by one CPU tries to
lock the xmit lock twice. But it would report false recursion when one CPU
wants to send the skb from two different PPP devices, like one L2TP on the
PPPoE. It is a normal case actually.
Now use one percpu member of struct ppp instead of the gloable variable to
detect the xmit recursion of one ppp device.
Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit")
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Liu Jianying <jianying.liu@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 18 Jul 2017 18:16:40 +0000 (11:16 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"First set of -rc fixes for 4.13 cycle:
- misc iSER fixes
- namespace fixups
- fix the fact that IPoIB didn't use the proper API for noio mem allocs
- rxe driver fixes
- hns_roce fixes
- misc core fixes
- misc IPoIB fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits)
IB/core: Allow QP state transition from reset to error
IB/hns: Fix for checkpatch.pl comment style warnings
IB/hns: Fix the bug with modifying the MAC address without removing the driver
IB/hns: Fix the bug with rdma operation
IB/hns: Fix the bug with wild pointer when destroy rc qp
IB/hns: Fix the bug of polling cq failed for loopback Qps
IB/rxe: Set dma_mask and coherent_dma_mask
IB/rxe: Fix kernel panic from skb destructor
IB/ipoib: Let lower driver handle get_stats64 call
IB/core: Add ordered workqueue for RoCE GID management
IB/mlx5: Clean mr_cache debugfs in case of failure
IB/core: Remove NOIO QP create flag
{net, IB}/mlx4: Remove gfp flags argument
IB/{rdmavt, qib, hfi1}: Remove gfp flags argument
IB/IPoIB: Convert IPoIB to memalloc_noio_* calls
IB/IPoIB: Forward MTU change to driver below
IB: Convert msleep below 20ms to usleep_range
IB/uverbs: Make use of ib_modify_qp variant to avoid resolving DMAC
IB/core: Introduce modify QP operation with udata
IB/core: Don't resolve IP address to the loopback device
...
Linus Torvalds [Tue, 18 Jul 2017 18:11:13 +0000 (11:11 -0700)]
Merge tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fix from Bruce Fields:
"One fix for a problem introduced in the most recent merge window and
found by Dave Jones and KASAN"
* tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linux:
nfsd: Fix a memory scribble in the callback channel
Jan Kara [Wed, 21 Jun 2017 13:02:47 +0000 (15:02 +0200)]
hfsplus: Don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.
Fix the problem by creating __hfsplus_set_posix_acl() function that does
not call posix_acl_update_mode() and use it when inheriting ACLs. That
prevents SGID bit clearing and the mode has been properly set by
posix_acl_create() anyway.
Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Jiri Olsa [Fri, 14 Jul 2017 16:35:51 +0000 (18:35 +0200)]
perf/x86/intel: Fix debug_store reset field for freq events
There's a bug in PEBs event enabling code, that prevents PEBS
freq events to work properly after non freq PEBS event was run.
freq events - perf_event_attr::freq set
-F <freq> option of perf record
PEBS events - perf_event_attr::precise_ip > 0
default for perf record
Like in following example with CPU 0 busy, we expect ~10000 samples
for following perf tool run:
# perf record -F 10000 -C 0 sleep 1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.640 MB perf.data (10031 samples) ]
Everything's fine, but once we run non freq PEBS event like:
# perf record -c 10000 -C 0 sleep 1
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 1.053 MB perf.data (20061 samples) ]
the freq events start to fail like this:
# perf record -F 10000 -C 0 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.185 MB perf.data (40 samples) ]
The issue is in non freq PEBs event initialization of debug_store reset
field, which value is used to auto-reload the counter value after PEBS
event drain. This value is not being used for PEBS freq events, but once
we run non freq event it stays in debug_store data and screws the
sample_freq counting for PEBS freq events.
Setting the reset field to 0 for freq events.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170714163551.19459-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kan Liang [Wed, 12 Jul 2017 13:44:23 +0000 (09:44 -0400)]
perf/x86/intel: Add Goldmont Plus CPU PMU support
Add perf core PMU support for Intel Goldmont Plus CPU cores:
- The init code is based on Goldmont.
- There is a new cache event list, based on the Goldmont cache event
list.
- All four general-purpose performance counters support PEBS.
- The first general-purpose performance counter is for reduced skid
PEBS mechanism. Using :ppp to indicate the event which want to do
reduced skid PEBS.
- Goldmont Plus has 4-wide pipeline for Topdown
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Link: http://lkml.kernel.org/r/20170712134423.17766-1-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Harry Pan [Mon, 17 Jul 2017 10:37:49 +0000 (18:37 +0800)]
perf/x86/intel: Enable C-state residency events for Apollo Lake
Goldmont microarchitecture supports C1/C3/C6, PC2/PC3/PC6/PC10 state
residency counters, the patch enables them for Apollo Lake platform.
The MSR information is based on Intel Software Developers' Manual,
Vol. 4, Order No. 335592, Table 2-6 and 2-12.
Signed-off-by: Harry Pan <harry.pan@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@suse.de
Cc: davidcc@google.com
Cc: gs0622@gmail.com
Cc: lukasz.odzioba@intel.com
Cc: piotr.luc@intel.com
Cc: srinivas.pandruvada@linux.intel.com
Link: http://lkml.kernel.org/r/20170717103749.24337-1-harry.pan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jan Kara [Tue, 18 Jul 2017 10:27:56 +0000 (12:27 +0200)]
isofs: Fix off-by-one in 'session' mount option parsing
According to ECMA-130 standard maximum valid track number is 99. Since
'session' mount option starts indexing at 0 (and we add 1 to the passed
number), we should refuse value 99. Also the condition in
isofs_get_last_session() unnecessarily repeats the check - remove it.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Ernesto A. Fernández [Mon, 17 Jul 2017 16:42:41 +0000 (18:42 +0200)]
reiserfs: preserve i_mode if __reiserfs_set_acl() fails
When changing a file's acl mask, reiserfs_set_acl() will first set the
group bits of i_mode to the value of the mask, and only then set the
actual extended attribute representing the new acl.
If the second part fails (due to lack of space, for example) and the
file had no acl attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.
Prevent this by only changing the inode mode after the acl has been set.
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Ernesto A. Fernández [Wed, 12 Jul 2017 09:54:19 +0000 (06:54 -0300)]
ext2: preserve i_mode if ext2_set_acl() fails
When changing a file's acl mask, ext2_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.
If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.
Prevent this by only changing the inode mode after the acl has been set.
[JK: Rebased on top of "ext2: Don't clear SGID when inheriting ACLs"]
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Matt Redfearn [Tue, 18 Jul 2017 07:39:21 +0000 (08:39 +0100)]
irqchip/mips-gic: Remove population of irq domain names
Since commit
d59f6617eef0f ("genirq: Allow fwnode to carry name
information only") the irqdomain core sets the names of irq domains.
When the name is allocated the new IRQ_DOMAIN_NAME_ALLOCATED flag is
set. Replacing the allocated name with a constant one is not a good
idea, since calling the new irq_domain_update_bus_token() API, added to
the MIPS GIC driver by commit
96f0d93a487e1 ("irqchip/MSI: Use
irq_domain_update_bus_token instead of an open coded access") will
attempt to kfree the pointer, and result in a kernel OOPS.
Fix this by removing the names, now that they are set by the irqdomain
core. This effectively reverts commit
21c57fd13589 ("irqchip/mips-gic:
Populate irq_domain names").
Fixes: d59f6617eef0f ("genirq: Allow fwnode to carry name information only")
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1500363561-32213-1-git-send-email-matt.redfearn@imgtec.com
Jaegeuk Kim [Fri, 14 Jul 2017 18:45:21 +0000 (11:45 -0700)]
f2fs: avoid cpu lockup
Before retrying to flush data or dentry pages, we need to release cpu in order
to prevent watchdog.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 14 Jul 2017 00:45:21 +0000 (17:45 -0700)]
f2fs: include seq_file.h for sysfs.c
This patch includes seq_file.h to avoid compile error.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tadeusz Struk [Tue, 30 May 2017 00:20:53 +0000 (17:20 -0700)]
IB/core: Allow QP state transition from reset to error
Playing with IP-O-IB interface can trigger a warning message:
"ib0: Failed to modify QP to ERROR state" to be logged.
This happens when the QP is in IB_QPS_RESET state and the stack
is trying to transition it to IB_QPS_ERR state in ipoib_ib_dev_stop().
According to the IB spec, Table 91 - "QP State Transition Properties"
it looks like the transition from reset to error is valid:
Transition: Any State to Error
Required Attributes: None
Optional Attributes: None allowed
Actions: Queue processing is stopped. Work Requests pending or in
process are completed in error, when possible.
This patch allows the transition and quiets the message.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:25 +0000 (18:49 +0800)]
IB/hns: Fix for checkpatch.pl comment style warnings
This patch correct the comment style warnings caught by
checkpatch.pl script.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:24 +0000 (18:49 +0800)]
IB/hns: Fix the bug with modifying the MAC address without removing the driver
When modified the MAC address used hns_roce_mac function, we release and create
reserved qp again, It is not necessary to use spin_lock_bh and spin_unlock_bh in
handle_en_event, Otherwise, it will occur a error. This patch mainly fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:23 +0000 (18:49 +0800)]
IB/hns: Fix the bug with rdma operation
When opcode of work request is RDMA read and write, it
should use rdma_wr to get remote_addr and rkey. This
patch fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:22 +0000 (18:49 +0800)]
IB/hns: Fix the bug with wild pointer when destroy rc qp
When destroyed rc qp, the hr_qp will be used after freed. This patch
will fix it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
oulijun [Sat, 10 Jun 2017 10:49:21 +0000 (18:49 +0800)]
IB/hns: Fix the bug of polling cq failed for loopback Qps
In hip06 SoC, RoCE driver creates 8 reserved loopback QPs to
ensure zero wqe when free mr. However, if the enabled phy
port number is less than 6, it will fail in polling cqe with
8 reserved loopback QPs.
In order to solve this problem, the number of loopback Qps
will be adjusted based on the number of enabled phy port.
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
yonatanc [Thu, 22 Jun 2017 14:10:00 +0000 (17:10 +0300)]
IB/rxe: Set dma_mask and coherent_dma_mask
The RXE coupled with dummy device causes to the kernel panic attached
below. The panic happens when ib_register_device tries to set dma_mask
by accessing a NULLed parent device.
The RXE does not actually use DMA, so we can set the dma_mask
to architecture value.
[16240.199689] RIP: 0010:ib_register_device+0x468/0x5a0 [ib_core]
[16240.205289] RSP: 0018:
ffffc9000220fc10 EFLAGS:
00010246
[16240.209909] RAX:
0000000000000024 RBX:
ffff880220d1a2a8 RCX:
0000000000000000
[16240.212244] RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000009
[16240.214385] RBP:
ffffc9000220fcb0 R08:
0000000000000000 R09:
000000000000023f
[16240.254465] R10:
0000000000000007 R11:
0000000000000000 R12:
0000000000000000
[16240.259467] R13:
0000000000000000 R14:
0000000000000000 R15:
ffff880220d1a2a8
[16240.263314] FS:
00007fd8ecca0740(0000) GS:
ffff8802364c0000(0000) knlGS:
0000000000000000
[16240.267292] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[16240.273503] CR2:
0000000000000218 CR3:
00000002253ba000 CR4:
00000000000006e0
[16240.277066] Call Trace:
[16240.281836] ? __kmalloc+0x26f/0x280
[16240.286596] rxe_register_device+0x297/0x300 [rdma_rxe]
[16240.291377] rxe_add+0x535/0x5b0 [rdma_rxe]
[16240.297586] rxe_net_add+0x3e/0xc0 [rdma_rxe]
[16240.302375] rxe_param_set_add+0x65/0x144 [rdma_rxe]
[16240.307769] param_attr_store+0x68/0xd0
[16240.311640] module_attr_store+0x1d/0x30
[16240.316421] sysfs_kf_write+0x3a/0x50
[16240.317802] kernfs_fop_write+0xff/0x180
[16240.322989] __vfs_write+0x37/0x140
[16240.328164] ? handle_mm_fault+0xce/0x240
[16240.333340] vfs_write+0xb2/0x1b0
[16240.335013] SyS_write+0x55/0xc0
[16240.340632] entry_SYSCALL_64_fastpath+0x1a/0xa9
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yonatan Cohen [Thu, 22 Jun 2017 14:09:59 +0000 (17:09 +0300)]
IB/rxe: Fix kernel panic from skb destructor
In the time between rxe_send has finished and skb destructor
called, the QP's ref count might be 0, leading to a possible
QP destruction. This will lead to a kernel panic when the destructor
dereferences the QP.
The operation of incrementing QP ref count at rxe_send and decrementing
from skb destructor will prevent this crash.
BUG: unable to handle kernel NULL pointer dereference at
000000000000072c
IP: [<
ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
PGD 0 [16240.211178]
Oops: 0002 [#1] SMP
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G OE 4.9.0-mlnx #1
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
task:
ffff88042d6b1480 task.stack:
ffffc90001904000
RIP: 0010:[<
ffffffffa05df765>] [<
ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
RSP: 0018:
ffff88043fcc3df0 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
ffff880429684700 RCX:
ffff88042d248200
RDX:
00000000ffffffff RSI:
00000000fffffe01 RDI:
ffff880429684700
RBP:
ffff88043fcc3e00 R08:
ffff88043fcda240 R09:
00000000ff2d1de6
R10:
0000000000000000 R11:
00000000f49cf6fe R12:
ffff880429684700
R13:
ffffffff81893f96 R14:
ffffffff817d66f0 R15:
ffff880427f74200
FS:
0000000000000000(0000) GS:
ffff88043fcc0000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000000000000072c CR3:
000000041d3df000 CR4:
00000000000006e0
Stack:
ffffffff817b29cf ffff880429684700 ffff88043fcc3e18 ffffffff817b42c2
ffff880429684700 ffff88043fcc3e40 ffffffff817b4332 ffff880429684700
ffff880427f74238 ffff880427f74228 ffff88043fcc3e58 ffffffff81893f96
Call Trace:
<IRQ> [16240.336345] [<
ffffffff817b29cf>] ? skb_release_head_state+0x4f/0xb0
[<
ffffffff817b42c2>] skb_release_all+0x12/0x30
[<
ffffffff817b4332>] kfree_skb+0x32/0x90
[<
ffffffff81893f96>] ndisc_error_report+0x36/0x40
[<
ffffffff817d4de1>] neigh_invalidate+0x81/0xf0
[<
ffffffff817d68f7>] neigh_timer_handler+0x207/0x2b0
[<
ffffffff81109295>] call_timer_fn+0x35/0x120
[<
ffffffff81109db7>] run_timer_softirq+0x1d7/0x460
[<
ffffffff8106155e>] ? kvm_sched_clock_read+0x1e/0x30
[<
ffffffff810366b9>] ? sched_clock+0x9/0x10
[<
ffffffff810cfed2>] ? sched_clock_cpu+0x72/0xa0
[<
ffffffff818dd537>] __do_softirq+0xd7/0x289
[<
ffffffff810a6c95>] irq_exit+0xb5/0xc0
[<
ffffffff818dd372>] smp_apic_timer_interrupt+0x42/0x50
[<
ffffffff818dc682>] apic_timer_interrupt+0x82/0x90
<EOI> [16240.395776] [<
ffffffff818da156>] ? native_safe_halt+0x6/0x10
[<
ffffffff818d9e6e>] default_idle+0x1e/0xd0
[<
ffffffff8103797f>] arch_cpu_idle+0xf/0x20
[<
ffffffff818da2c5>] default_idle_call+0x35/0x40
[<
ffffffff810e3eb5>] cpu_startup_entry+0x185/0x210
[<
ffffffff81050433>] start_secondary+0x103/0x130
RIP [<
ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Erez Shitrit [Mon, 12 Jun 2017 07:45:21 +0000 (10:45 +0300)]
IB/ipoib: Let lower driver handle get_stats64 call
The driver checks if the lower level driver supports get_stats, and if
so calls it to get the updated statistics, otherwise takes from the
current netdevice stats object.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Majd Dibbiny [Tue, 30 May 2017 06:58:06 +0000 (09:58 +0300)]
IB/core: Add ordered workqueue for RoCE GID management
Currently the RoCE GID management uses the ib_wq to do add and delete new GIDs
according to the netdev events.
The ib_wq isn't an ordered workqueue and thus two work elements can be executed
concurrently which will result in unexpected behavior and inconsistency of the
GIDs cache content.
Example:
ifconfig eth1 11.11.11.11/16 up
This command will invoke the following netdev events in the following order:
1. NETDEV_UP
2. NETDEV_DOWN
3. NETDEV_UP
If (2) and (3) will be executed concurrently or in reverse order, instead of
having a new GID with 11.11.11.11 IP, we will end up without any new GIDs.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 30 May 2017 06:44:48 +0000 (09:44 +0300)]
IB/mlx5: Clean mr_cache debugfs in case of failure
The failure in creation of debugfs entries for mr_cache left entries,
which were already created.
It caused to mismatch and misguiding for the end users. The solution
is to clean mr_cache debugfs root, so no leftovers will be in the
system. In addition, let's document why the error is not needed to be
forwarded to user in case of failure.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 11:38:16 +0000 (14:38 +0300)]
IB/core: Remove NOIO QP create flag
There are no users for IB_QP_CREATE_USE_GFP_NOIO flag,
so let's remove it.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 11:38:15 +0000 (14:38 +0300)]
{net, IB}/mlx4: Remove gfp flags argument
The caller to the driver marks GFP_NOIO allocations with help
of memalloc_noio-* calls now. This makes redundant to pass down
to the driver gfp flags, which can be GFP_KERNEL only.
The patch removes the gfp flags argument and updates all driver paths.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 11:38:14 +0000 (14:38 +0300)]
IB/{rdmavt, qib, hfi1}: Remove gfp flags argument
The caller to the driver marks GFP_NOIO allocations with help
of memalloc_noio-* calls now. This makes redundant to pass down
to the driver gfp flags, which can be GFP_KERNEL only.
The patch removes the gfp flags argument and updates all driver paths.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 11:38:13 +0000 (14:38 +0300)]
IB/IPoIB: Convert IPoIB to memalloc_noio_* calls
Commit
21caf2fc1931 ("mm: teach mm by current context info to not do I/O
during memory allocation") added the memalloc_noio_(save|restore) functions
to enable people to modify the MM behavior by disabling I/O during memory
allocation. This was further extended in Fixes:
934f3072c17c ("mm: clear
__GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent
allocation paths recursing back into the filesystem without explicitly
changing the flags for every allocation site.
However the IPoIB hasn't been keeping up with the changes and missed
completely these memalloc_noio_* calls. This led to update of
allocation site with special QP creation flag, see commit
09b93088d750
("IB: Add a QP creation flag to use GFP_NOIO allocations"), while this
flag is supported by small number of drivers in IB stack.
Let's change it by updating to memalloc_noio_* calls and allow
for every driver underneath enjoy NOIO allocations.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Erez Shitrit [Tue, 23 May 2017 08:42:52 +0000 (11:42 +0300)]
IB/IPoIB: Forward MTU change to driver below
This patch checks if there is a driver below that
needs to be updated on the new MTU and calls it
accordingly.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 23 May 2017 08:29:42 +0000 (11:29 +0300)]
IB: Convert msleep below 20ms to usleep_range
The msleep(1) may do not sleep 1 ms as expected
and will sleep longer. The simple conversion from
msleep to usleep_range between 1ms and 2ms can solve an
issue.
The full and comprehensive explanation can be found at [1] and [2].
[1] https://lkml.org/lkml/2007/8/3/250
[2] Documentation/timers/timers-howto.txt
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Parav Pandit [Tue, 23 May 2017 08:26:09 +0000 (11:26 +0300)]
IB/uverbs: Make use of ib_modify_qp variant to avoid resolving DMAC
This patch makes use of IB core's ib_modify_qp_with_udata function that
also resolves the DMAC and handles udata.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Parav Pandit [Tue, 23 May 2017 08:26:08 +0000 (11:26 +0300)]
IB/core: Introduce modify QP operation with udata
This patch adds new function ib_modify_qp_with_udata so that
uverbs layer can avoid handling L2 mac address at verbs layer
and depend on the core layer to resolve the mac address consistently
for all required QPs.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Linus Torvalds [Mon, 17 Jul 2017 22:08:29 +0000 (15:08 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
- Fix DMA regression in 4.13 merge window, only certain chips can do
64-bit DMA. From Dave Dushar.
- Correct cpu cross-call algorithm to correctly detect stalled or stuck
remote cpus, from Jane Chu.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Measure receiver forward progress to avoid send mondo timeout
SPARC64: Fix sun4v DMA panic
Sergei Shtylyov [Mon, 17 Jul 2017 18:00:44 +0000 (21:00 +0300)]
clocksource/drivers/timer-of: Handle of_irq_get_byname() result correctly
of_irq_get_byname() may return a negative error number as well as 0 on
failure, while timer_irq_init() only checks for 0, blithely continuing with
the call to request_[percpu_]irq() -- those functions expect *unsigned int*,
so would probably fail anyway when a large IRQ number resulting from a
conversion of a negative error number is passed to them... This, however,
is incorrect behavior -- error number is not IRQ number.
Filter out the negative error numbers, complain, and return them to the
timer_irq_init()'s callers...
Fixes: dc11bae78529 ("clocksource/drivers: Add timer-of common init routine")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/20170717180114.678825147@cogentembedded.com
WANG Cong [Mon, 17 Jul 2017 18:42:55 +0000 (11:42 -0700)]
bpf: check NULL for sk_to_full_sk() return value
When req->rsk_listener is NULL, sk_to_full_sk() returns
NULL too, so we have to check its return value against
NULL here.
Fixes: 40304b2a1567 ("bpf: BPF support for sock_ops")
Reported-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Juergen Gross [Mon, 17 Jul 2017 17:47:02 +0000 (19:47 +0200)]
genirq/PM: Properly pretend disabled state when force resuming interrupts
Interrupts with the IRQF_FORCE_RESUME flag set have also the
IRQF_NO_SUSPEND flag set. They are not disabled in the suspend path, but
must be forcefully resumed. That's used by XEN to keep IPIs enabled beyond
the suspension of device irqs. Force resume works by pretending that the
interrupt was disabled and then calling __irq_enable().
Incrementing the disabled depth counter was enough to do that, but with the
recent changes which use state flags to avoid unnecessary hardware access,
this is not longer sufficient. If the state flags are not set, then the
hardware callbacks are not invoked and the interrupt line stays disabled in
"hardware".
Set the disabled and masked state when pretending that an interrupt got
disabled by suspend.
Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: boris.ostrovsky@oracle.com
Link: http://lkml.kernel.org/r/20170717174703.4603-2-jgross@suse.com
Linus Torvalds [Mon, 17 Jul 2017 20:00:36 +0000 (13:00 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"Fix the fallout from reworking the locking and resource management in
request/free_irq()"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Keep chip buslock across irq_request/release_resources()
Linus Torvalds [Mon, 17 Jul 2017 19:54:51 +0000 (12:54 -0700)]
Merge branch 'smp-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull SMP fix from Thomas Gleixner:
"Replace the bogus BUG_ON in the cpu hotplug code"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp/hotplug: Replace BUG_ON and react useful
Linus Torvalds [Mon, 17 Jul 2017 19:38:18 +0000 (12:38 -0700)]
Merge tag 'regmap-fix-w1-merge-window' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"Fix build due to w1 header refactoring
The regmap support for w1 was added shortly before a reorganization of
the w1 headers. While this was noticed before the merge window and
efforts made to get it resolved in what was sent that managed to fall
through the cracks, this cleans up and updates things so we look for
the header in the new location.
It didn't cause build failures as the driver that's going to be the
first user got held up with other review issues"
* tag 'regmap-fix-w1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: regmap-w1: Fix build troubles
Linus Torvalds [Mon, 17 Jul 2017 19:26:12 +0000 (12:26 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is actually just a small set of mainly bug fixes for the original
merge window code plus a few trivial updates and qedi boot from SAN
support feature patch"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libfc: pass an error pointer to fc_disc_error()
scsi: hisi_sas: make several const arrays static
scsi: qla2xxx: Off by one in qlt_ctio_to_cmd()
scsi: sg: fix SG_DXFER_FROM_DEV transfers
scsi: virtio_scsi: always read VPD pages for multiqueue too
scsi: qedf: fix spelling mistake: "offlading" -> "offloading"
scsi: qedi: fix another spelling mistake: "alloction" -> "allocation"
scsi: isci: fix typo in function names
scsi: cxlflash: return -EFAULT if copy_from_user() fails
scsi: qedi: Add support for Boot from SAN over iSCSI offload
Dan Williams [Mon, 17 Jul 2017 16:58:51 +0000 (09:58 -0700)]
MAINTAINERS: list drivers/acpi/nfit/ files for libnvdimm sub-system
Patches that update the drivers/acpi/nfit/ directory need to be copied
to the nvdimm mailing list. The drivers/acpi/nfit* glob has been broken
ever since the nfit driver source was refactored into multiple files
under the drivers/acpi/nfit/ directory.
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Prarit Bhargava [Wed, 31 May 2017 17:32:00 +0000 (13:32 -0400)]
acpi/nfit: Fix memory corruption/Unregister mce decoder on failure
nfit_init() calls nfit_mce_register() on module load. When the module
load fails the nfit mce decoder is not unregistered. The module's
memory is freed leaving the decoder chain referencing junk. This will
cause panics as future registrations will reference the free'd memory.
Unregister the nfit mce decoder on module init failure.
[v2]: register and then unregister mce handler to avoid losing mce events
[v3]: also cleanup nfit workqueue
Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
Cc: <stable@vger.kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: lszubowi@redhat.com
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 12 Jul 2017 20:42:37 +0000 (13:42 -0700)]
device-dax: fix 'passing zero to ERR_PTR()' warning
Dan Carpenter reports:
The patch
7b6be8444e0f: "dax: refactor dax-fs into a generic provider
of 'struct dax_device' instances" from Apr 11, 2017, leads to the
following static checker warning:
drivers/dax/device.c:643 devm_create_dev_dax()
warn: passing zero to 'ERR_PTR'
Fix the case where we inadvertently leak 0 to ERR_PTR() by setting at
every error case, and make it clear that 'count' is never 0.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Toshi Kani [Fri, 7 Jul 2017 23:44:26 +0000 (17:44 -0600)]
libnvdimm: fix badblock range handling of ARS range
__add_badblock_range() does not account sector alignment when
it sets 'num_sectors'. Therefore, an ARS error record range
spanning across two sectors is set to a single sector length,
which leaves the 2nd sector unprotected.
Change __add_badblock_range() to set 'num_sectors' properly.
Cc: <stable@vger.kernel.org>
Fixes: 0caeef63e6d2 ("libnvdimm: Add a poison list and export badblocks")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>