openwrt/staging/blogic.git
7 years agonet: thunderx: Add support for xdp redirect
Sunil Goutham [Fri, 24 Nov 2017 12:03:26 +0000 (15:03 +0300)]
net: thunderx: Add support for xdp redirect

This patch adds support for XDP_REDIRECT. Flush is not
yet supported.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: cjacob <cjacob@caviumnetworks.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Wed, 29 Nov 2017 22:49:26 +0000 (14:49 -0800)]
Merge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "I screwed up my merge window pull request; I only sent half of what I
  meant to.

  There were no new features, just bugfixes of various importance and
  some very minor cleanup, so I think it's all still appropriate for
  -rc2.

  Highlights:

   - Fixes from Trond for some races in the NFSv4 state code.

   - Fix from Naofumi Honda for a typo in the blocked lock notificiation
     code

   - Fixes from Vasily Averin for some problems starting and stopping
     lockd especially in network namespaces"

* tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits)
  lockd: fix "list_add double add" caused by legacy signal interface
  nlm_shutdown_hosts_net() cleanup
  race of nfsd inetaddr notifiers vs nn->nfsd_serv change
  race of lockd inetaddr notifiers vs nlmsvc_rqst change
  SUNRPC: make cache_detail structures const
  NFSD: make cache_detail structures const
  sunrpc: make the function arg as const
  nfsd: check for use of the closed special stateid
  nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
  lockd: lost rollback of set_grace_period() in lockd_down_net()
  lockd: added cleanup checks in exit_net hook
  grace: replace BUG_ON by WARN_ONCE in exit_net hook
  nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
  lockd: remove net pointer from messages
  nfsd: remove net pointer from debug messages
  nfsd: Fix races with check_stateid_generation()
  nfsd: Ensure we check stateid validity in the seqid operation checks
  nfsd: Fix race in lock stateid creation
  nfsd4: move find_lock_stateid
  nfsd: Ensure we don't recognise lock stateids after freeing them
  ...

7 years agoMerge tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Wed, 29 Nov 2017 22:26:50 +0000 (14:26 -0800)]
Merge tag 'for-4.15-rc2-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "We've collected some fixes in since the pre-merge window freeze.

  There's technically only one regression fix for 4.15, but the rest
  seems important and candidates for stable.

   - fix missing flush bio puts in error cases (is serious, but rarely
     happens)

   - fix reporting stat::st_blocks for buffered append writes

   - fix space cache invalidation

   - fix out of bound memory access when setting zlib level

   - fix potential memory corruption when fsync fails in the middle

   - fix crash in integrity checker

   - incremetnal send fix, path mixup for certain unlink/rename
     combination

   - pass flags to writeback so compressed writes can be throttled
     properly

   - error handling fixes"

* tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: incremental send, fix wrong unlink path after renaming file
  btrfs: tree-checker: Fix false panic for sanity test
  Btrfs: fix list_add corruption and soft lockups in fsync
  btrfs: Fix wild memory access in compression level parser
  btrfs: fix deadlock when writing out space cache
  btrfs: clear space cache inode generation always
  Btrfs: fix reported number of inode blocks after buffered append writes
  Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
  Btrfs: bail out gracefully rather than BUG_ON
  btrfs: dev_alloc_list is not protected by RCU, use normal list_del
  btrfs: add missing device::flush_bio puts
  btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
  Btrfs: add write_flags for compression bio

7 years agoMerge tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze
Linus Torvalds [Wed, 29 Nov 2017 22:19:22 +0000 (14:19 -0800)]
Merge tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze

Pull Microblaze fix from Michal Simek:
 "Add missing header to mmu_context_mm.h"

* tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: add missing include to mmu_context_mm.h

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Wed, 29 Nov 2017 22:17:30 +0000 (14:17 -0800)]
Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fix from David Miller:
 "Sparc T4 and later cpu bootup regression fix"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix boot on T4 and later.

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 29 Nov 2017 21:10:25 +0000 (13:10 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) The forcedeth conversion from pci_*() DMA interfaces to dma_*() ones
    missed one spot. From Zhu Yanjun.

 2) Missing CRYPTO_SHA256 Kconfig dep in cfg80211, from Johannes Berg.

 3) Fix checksum offloading in thunderx driver, from Sunil Goutham.

 4) Add SPDX to vm_sockets_diag.h, from Stephen Hemminger.

 5) Fix use after free of packet headers in TIPC, from Jon Maloy.

 6) "sizeof(ptr)" vs "sizeof(*ptr)" bug in i40e, from Gustavo A R Silva.

 7) Tunneling fixes in mlxsw driver, from Petr Machata.

 8) Fix crash in fanout_demux_rollover() of AF_PACKET, from Mike
    Maloney.

 9) Fix race in AF_PACKET bind() vs. NETDEV_UP notifier, from Eric
    Dumazet.

10) Fix regression in sch_sfq.c due to one of the timer_setup()
    conversions. From Paolo Abeni.

11) SCTP does list_for_each_entry() using wrong struct member, fix from
    Xin Long.

12) Don't use big endian netlink attribute read for
    IFLA_BOND_AD_ACTOR_SYSTEM, it is in cpu endianness. Also from Xin
    Long.

13) Fix mis-initialization of q->link.clock in CBQ scheduler, preventing
    adding filters there. From Jiri Pirko.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
  ethernet: dwmac-stm32: Fix copyright
  net: via: via-rhine: use %p to format void * address instead of %x
  net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
  myri10ge: Update MAINTAINERS
  net: sched: cbq: create block for q->link.block
  atm: suni: remove extraneous space to fix indentation
  atm: lanai: use %p to format kernel addresses instead of %x
  VSOCK: Don't set sk_state to TCP_CLOSE before testing it
  atm: fore200e: use %pK to format kernel addresses instead of %x
  ambassador: fix incorrect indentation of assignment statement
  vxlan: use __be32 type for the param vni in __vxlan_fdb_delete
  bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM
  sctp: use right member as the param of list_for_each_entry
  sch_sfq: fix null pointer dereference at timer expiration
  cls_bpf: don't decrement net's refcount when offload fails
  net/packet: fix a race in packet_bind() and packet_notifier()
  packet: fix crash in fanout_demux_rollover()
  sctp: remove extern from stream sched
  sctp: force the params with right types for sctp csum apis
  sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
  ...

7 years agosparc64: Fix boot on T4 and later.
David S. Miller [Wed, 29 Nov 2017 20:09:29 +0000 (15:09 -0500)]
sparc64: Fix boot on T4 and later.

If we don't put the NG4fls.o object into the same part of
the link as the generic sparc64 objects for fls() and __fls()
then the relocation in the branch we use for patching will
not fit.

Move NG4fls.o into lib-y to fix this problem.

Fixes: 46ad8d2d22c1 ("sparc64: Use sparc optimized fls and __fls for T4 and above")
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
7 years agovsprintf: don't use 'restricted_pointer()' when not restricting
Linus Torvalds [Wed, 29 Nov 2017 19:28:09 +0000 (11:28 -0800)]
vsprintf: don't use 'restricted_pointer()' when not restricting

Instead, just fall back on the new '%p' behavior which hashes the
pointer.

Otherwise, '%pK' - that was intended to mark a pointer as restricted -
just ends up leaking pointers that a normal '%p' wouldn't leak.  Which
just make the whole thing pointless.

I suspect we should actually get rid of '%pK' entirely, and make it just
work as '%p' regardless, but this is the minimal obvious fix.  People
who actually use 'kptr_restrict' should weigh in on which behavior they
want.

Cc: Tobin Harding <me@tobin.cc>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agokallsyms: take advantage of the new '%px' format
Linus Torvalds [Wed, 29 Nov 2017 18:30:13 +0000 (10:30 -0800)]
kallsyms: take advantage of the new '%px' format

The conditional kallsym hex printing used a special fixed-width '%lx'
output (KALLSYM_FMT) in preparation for the hashing of %p, but that
series ended up adding a %px specifier to help with the conversions.

Use it, and avoid the "print pointer as an unsigned long" code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linux
Linus Torvalds [Wed, 29 Nov 2017 18:19:29 +0000 (10:19 -0800)]
Merge tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linux

Pull printk pointer hashing update from Tobin Harding:
 "Here is the patch set that implements hashing of printk specifier %p.

  First we have two clean up patches then we do the hashing. Hashing is
  done via the SipHash algorithm. The next patch adds printk specifier
  %px for printing pointers when we _really_ want to see the address i.e
  %px is functionally equivalent to %lx. Final patch in the set fixes
  KASAN since we break it by hashing %p.

  For the record here is the justification for the series:

    Currently there exist approximately 14 000 places in the Kernel
    where addresses are being printed using an unadorned %p. This
    potentially leaks sensitive information about the Kernel layout in
    memory. Many of these calls are stale, instead of fixing every call
    we hash the address by default before printing. We then add %px to
    provide a way to print the actual address. Although this is
    achievable using %lx, using %px will assist us if we ever want to
    change pointer printing behaviour. %px is more uniquely grep'able
    (there are already >50 000 uses of %lx).

    The added advantage of hashing %p is that security is now opt-out,
    if you _really_ want the address you have to work a little harder
    and use %px.

  This will of course break some users, forcing code printing needed
  addresses to be updated"

[ I do expect this to be an annoyance, and a number of %px users to be
  added for debuggability. But nobody is willing to audit existing %p
  users for information leaks, and a number of places really only use
  the pointer as an object identifier rather than really 'I need the
  address'.

  IOW - sorry for the inconvenience, but it's the least inconvenient of
  the options.    - Linus ]

* tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linux:
  kasan: use %px to print addresses instead of %p
  vsprintf: add printk specifier %px
  printk: hash addresses printed with %p
  vsprintf: refactor %pK code out of pointer()
  docs: correct documentation for %pK

7 years agoRevert "mm, thp: Do not make pmd/pud dirty without a reason"
Linus Torvalds [Wed, 29 Nov 2017 17:01:01 +0000 (09:01 -0800)]
Revert "mm, thp: Do not make pmd/pud dirty without a reason"

This reverts commit 152e93af3cfe2d29d8136cc0a02a8612507136ee.

It was a nice cleanup in theory, but as Nicolai Stange points out, we do
need to make the page dirty for the copy-on-write case even when we
didn't end up making it writable, since the dirty bit is what we use to
check that we've gone through a COW cycle.

Reported-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoethernet: dwmac-stm32: Fix copyright
Benjamin Gaignard [Wed, 29 Nov 2017 14:20:00 +0000 (15:20 +0100)]
ethernet: dwmac-stm32: Fix copyright

Uniformize STMicroelectronics copyrights header

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com>
CC: Alexandre Torgue <alexandre.torgue@st.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: via: via-rhine: use %p to format void * address instead of %x
Colin Ian King [Wed, 29 Nov 2017 14:11:49 +0000 (14:11 +0000)]
net: via: via-rhine: use %p to format void * address instead of %x

Don't use %x and casting to print out an address, instead use %p
and remove the casting.  Cleans up smatch warnings:

drivers/net/ethernet/via/via-rhine.c:998 rhine_init_one_common()
warn: argument 4 to %lx specifier is cast from pointer

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
Geert Uytterhoeven [Wed, 29 Nov 2017 10:01:09 +0000 (11:01 +0100)]
net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit

On 64-bit (e.g. powerpc64/allmodconfig):

    drivers/net/ethernet/xilinx/ll_temac_main.c: In function 'temac_start_xmit_done':
    drivers/net/ethernet/xilinx/ll_temac_main.c:633:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
dev_kfree_skb_irq((struct sk_buff *)cur_p->app4);
  ^

cdmac_bd.app4 is u32, so it is too small to hold a kernel pointer.

Note that several other fields in struct cdmac_bd are also too small to
hold physical addresses on 64-bit platforms.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomyri10ge: Update MAINTAINERS
Hyong-Youb Kim [Wed, 29 Nov 2017 05:03:50 +0000 (00:03 -0500)]
myri10ge: Update MAINTAINERS

Change the maintainer to Chris Lee who has access to Myricom hardware
and can test/review. Update the website URL.

Signed-off-by: Hyong-Youb Kim <hykim@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agokasan: use %px to print addresses instead of %p
Tobin C. Harding [Wed, 1 Nov 2017 04:32:22 +0000 (15:32 +1100)]
kasan: use %px to print addresses instead of %p

Pointers printed with %p are now hashed by default. Kasan needs the
actual address. We can use the new printk specifier %px for this
purpose.

Use %px instead of %p to print addresses.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
7 years agovsprintf: add printk specifier %px
Tobin C. Harding [Wed, 22 Nov 2017 23:59:45 +0000 (10:59 +1100)]
vsprintf: add printk specifier %px

printk specifier %p now hashes all addresses before printing. Sometimes
we need to see the actual unmodified address. This can be achieved using
%lx but then we face the risk that if in future we want to change the
way the Kernel handles printing of pointers we will have to grep through
the already existent 50 000 %lx call sites. Let's add specifier %px as a
clear, opt-in, way to print a pointer and maintain some level of
isolation from all the other hex integer output within the Kernel.

Add printk specifier %px to print the actual unmodified address.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
7 years agoprintk: hash addresses printed with %p
Tobin C. Harding [Wed, 1 Nov 2017 04:32:23 +0000 (15:32 +1100)]
printk: hash addresses printed with %p

Currently there exist approximately 14 000 places in the kernel where
addresses are being printed using an unadorned %p. This potentially
leaks sensitive information regarding the Kernel layout in memory. Many
of these calls are stale, instead of fixing every call lets hash the
address by default before printing. This will of course break some
users, forcing code printing needed addresses to be updated.

Code that _really_ needs the address will soon be able to use the new
printk specifier %px to print the address.

For what it's worth, usage of unadorned %p can be broken down as
follows (thanks to Joe Perches).

$ git grep -E '%p[^A-Za-z0-9]' | cut -f1 -d"/" | sort | uniq -c
   1084 arch
     20 block
     10 crypto
     32 Documentation
   8121 drivers
   1221 fs
    143 include
    101 kernel
     69 lib
    100 mm
   1510 net
     40 samples
      7 scripts
     11 security
    166 sound
    152 tools
      2 virt

Add function ptr_to_id() to map an address to a 32 bit unique
identifier. Hash any unadorned usage of specifier %p and any malformed
specifiers.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
7 years agovsprintf: refactor %pK code out of pointer()
Tobin C. Harding [Wed, 22 Nov 2017 23:56:39 +0000 (10:56 +1100)]
vsprintf: refactor %pK code out of pointer()

Currently code to handle %pK is all within the switch statement in
pointer(). This is the wrong level of abstraction. Each of the other switch
clauses call a helper function, pK should do the same.

Refactor code out of pointer() to new function restricted_pointer().

Signed-off-by: Tobin C. Harding <me@tobin.cc>
7 years agodocs: correct documentation for %pK
Tobin C. Harding [Wed, 22 Nov 2017 23:55:24 +0000 (10:55 +1100)]
docs: correct documentation for %pK

Current documentation indicates that %pK prints a leading '0x'. This is
not the case.

Correct documentation for printk specifier %pK.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
7 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Wed, 29 Nov 2017 00:22:10 +0000 (16:22 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - avoid potential bogus alignment for some AEAD operations

 - fix crash in algif_aead

 - avoid sleeping in softirq context with async af_alg

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: skcipher - Fix skcipher_walk_aead_common
  crypto: af_alg - remove locking in async callback
  crypto: algif_aead - skip SGL entries with NULL page

7 years agonet: sched: cbq: create block for q->link.block
Jiri Pirko [Mon, 27 Nov 2017 17:37:21 +0000 (18:37 +0100)]
net: sched: cbq: create block for q->link.block

q->link.block is not initialized, that leads to EINVAL when one tries to
add filter there. So initialize it properly.

This can be reproduced by:
$ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit
$ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 0x00 flowid 1:1

Reported-by: Jaroslav Aster <jaster@redhat.com>
Reported-by: Ivan Vecera <ivecera@redhat.com>
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: suni: remove extraneous space to fix indentation
Colin Ian King [Mon, 27 Nov 2017 13:47:22 +0000 (13:47 +0000)]
atm: suni: remove extraneous space to fix indentation

Remove a leading space, fixes indentation

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: lanai: use %p to format kernel addresses instead of %x
Colin Ian King [Mon, 27 Nov 2017 13:39:32 +0000 (13:39 +0000)]
atm: lanai: use %p to format kernel addresses instead of %x

Don't use %x and casting to print out a kernel address, instead use %p
and remove the casting.  Cleans up smatch warnings:

drivers/atm/lanai.c:1589 service_buffer_allocate() warn: argument 2 to
%08lX specifier is cast from pointer
drivers/atm/lanai.c:2221 lanai_dev_open() warn: argument 4 to %lx
specifier is cast from pointer

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoVSOCK: Don't set sk_state to TCP_CLOSE before testing it
Jorgen Hansen [Mon, 27 Nov 2017 13:29:32 +0000 (05:29 -0800)]
VSOCK: Don't set sk_state to TCP_CLOSE before testing it

A recent commit (3b4477d2dcf2) converted the sk_state to use
TCP constants. In that change, vmci_transport_handle_detach
was changed such that sk->sk_state was set to TCP_CLOSE before
we test whether it is TCP_SYN_SENT. This change moves the
sk_state change back to the original locations in that function.

Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoatm: fore200e: use %pK to format kernel addresses instead of %x
Colin Ian King [Mon, 27 Nov 2017 13:24:15 +0000 (13:24 +0000)]
atm: fore200e: use %pK to format kernel addresses instead of %x

Don't use %x and casting to print out a kernel address, instead use the
%pK and remove the casting.  Cleans up smatch warning:

drivers/atm/fore200e.c:3093 fore200e_proc_read() warn: argument 3 to %08x
specifier is cast from pointer

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoambassador: fix incorrect indentation of assignment statement
Colin Ian King [Mon, 27 Nov 2017 13:06:10 +0000 (13:06 +0000)]
ambassador: fix incorrect indentation of assignment statement

Remove one extraneous level of indentation on assignment statement.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovxlan: use __be32 type for the param vni in __vxlan_fdb_delete
Xin Long [Sun, 26 Nov 2017 13:19:05 +0000 (21:19 +0800)]
vxlan: use __be32 type for the param vni in __vxlan_fdb_delete

All callers of __vxlan_fdb_delete pass vni with __be32 type, and
this param should be declared as __be32 type.

Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM
Xin Long [Sun, 26 Nov 2017 13:12:09 +0000 (21:12 +0800)]
bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM

bond_opt_initval expects a u64 type param, it's better to use
nla_get_u64 to extract the value here, to eliminate a sparse
endianness mismatch warning.

Fixes: 171a42c38c6e ("bonding: add netlink support for sys prio, actor sys mac, and port key")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: use right member as the param of list_for_each_entry
Xin Long [Sun, 26 Nov 2017 12:56:07 +0000 (20:56 +0800)]
sctp: use right member as the param of list_for_each_entry

Commit d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues
when migrating a sock") made a mistake that using 'list' as the param of
list_for_each_entry to traverse the retransmit, sacked and abandoned
queues, while chunks are using 'transmitted_list' to link into these
queues.

It could cause NULL dereference panic if there are chunks in any of these
queues when peeling off one asoc.

So use the chunk member 'transmitted_list' instead in this patch.

Fixes: d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosch_sfq: fix null pointer dereference at timer expiration
Paolo Abeni [Tue, 28 Nov 2017 13:28:39 +0000 (14:28 +0100)]
sch_sfq: fix null pointer dereference at timer expiration

While converting sch_sfq to use timer_setup(), the commit cdeabbb88134
("net: sched: Convert timers to use timer_setup()") forgot to
initialize the 'sch' field. As a result, the timer callback tries to
dereference a NULL pointer, and the kernel does oops.

Fix it initializing such field at qdisc creation time.

Fixes: cdeabbb88134 ("net: sched: Convert timers to use timer_setup()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocls_bpf: don't decrement net's refcount when offload fails
Jakub Kicinski [Mon, 27 Nov 2017 19:11:41 +0000 (11:11 -0800)]
cls_bpf: don't decrement net's refcount when offload fails

When cls_bpf offload was added it seemed like a good idea to
call cls_bpf_delete_prog() instead of extending the error
handling path, since the software state is fully initialized
at that point.  This handling of errors without jumping to
the end of the function is error prone, as proven by later
commit missing that extra call to __cls_bpf_delete_prog().

__cls_bpf_delete_prog() is now expected to be invoked with
a reference on exts->net or the field zeroed out.  The call
on the offload's error patch does not fullfil this requirement,
leading to each error stealing a reference on net namespace.

Create a function undoing what cls_bpf_set_parms() did and
use it from __cls_bpf_delete_prog() and the error path.

Fixes: aae2c35ec892 ("cls_bpf: use tcf_exts_get_net() before call_rcu()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'drm-for-v4.15-part2-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Tue, 28 Nov 2017 18:01:15 +0000 (10:01 -0800)]
Merge tag 'drm-for-v4.15-part2-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:

 - TTM regression fix for some virt gpus (bochs vga)

 - a few i915 stable fixes

 - one vc4 fix

 - one uapi fix

* tag 'drm-for-v4.15-part2-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/ttm: don't attempt to use hugepages if dma32 requested (v2)
  drm/vblank: Pass crtc_id to page_flip_ioctl.
  drm/i915: Fix init_clock_gating for resume
  drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
  drm/i915: Clear breadcrumb node when cancelling signaling
  drm/i915/gvt: ensure -ve return value is handled correctly
  drm/i915: Re-register PMIC bus access notifier on runtime resume
  drm/i915: Fix false-positive assert_rpm_wakelock_held in i915_pmic_bus_access_notifier v2
  drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks
  drm/vc4: Account for interrupts in flight

7 years agoRevert "ALSA: usb-audio: Fix potential zero-division at parsing FU"
Takashi Iwai [Mon, 27 Nov 2017 09:59:40 +0000 (10:59 +0100)]
Revert "ALSA: usb-audio: Fix potential zero-division at parsing FU"

The commit 8428a8ebde2d ("ALSA: usb-audio: Fix potential zero-division
at parsing FU") is utterly bogus and breaks the case with csize=1
instead of fixing anything.  Just take it back again.

Reported-by: Jörg Otte <jrg.otte@gmail.com>
Fixes: 8428a8ebde2d ("ALSA: usb-audio: Fix potential zero-division at parsing FU"
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoBtrfs: incremental send, fix wrong unlink path after renaming file
Filipe Manana [Fri, 17 Nov 2017 01:54:00 +0000 (01:54 +0000)]
Btrfs: incremental send, fix wrong unlink path after renaming file

Under some circumstances, an incremental send operation can issue wrong
paths for unlink commands related to files that have multiple hard links
and some (or all) of those links were renamed between the parent and send
snapshots. Consider the following example:

Parent snapshot

 .                                                      (ino 256)
 |---- a/                                               (ino 257)
 |     |---- b/                                         (ino 259)
 |     |     |---- c/                                   (ino 260)
 |     |     |---- f2                                   (ino 261)
 |     |
 |     |---- f2l1                                       (ino 261)
 |
 |---- d/                                               (ino 262)
       |---- f1l1_2                                     (ino 258)
       |---- f2l2                                       (ino 261)
       |---- f1_2                                       (ino 258)

Send snapshot

 .                                                      (ino 256)
 |---- a/                                               (ino 257)
 |     |---- f2l1/                                      (ino 263)
 |             |---- b2/                                (ino 259)
 |                   |---- c/                           (ino 260)
 |                   |     |---- d3                     (ino 262)
 |                   |           |---- f1l1_2           (ino 258)
 |                   |           |---- f2l2_2           (ino 261)
 |                   |           |---- f1_2             (ino 258)
 |                   |
 |                   |---- f2                           (ino 261)
 |                   |---- f1l2                         (ino 258)
 |
 |---- d                                                (ino 261)

When computing the incremental send stream the following steps happen:

1) When processing inode 261, a rename operation is issued that renames
   inode 262, which currently as a path of "d", to an orphan name of
   "o262-7-0". This is done because in the send snapshot, inode 261 has
   of its hard links with a path of "d" as well.

2) Two link operations are issued that create the new hard links for
   inode 261, whose names are "d" and "f2l2_2", at paths "/" and
   "o262-7-0/" respectively.

3) Still while processing inode 261, unlink operations are issued to
   remove the old hard links of inode 261, with names "f2l1" and "f2l2",
   at paths "a/" and "d/". However path "d/" does not correspond anymore
   to the directory inode 262 but corresponds instead to a hard link of
   inode 261 (link command issued in the previous step). This makes the
   receiver fail with a ENOTDIR error when attempting the unlink
   operation.

The problem happens because before sending the unlink operation, we failed
to detect that inode 262 was one of ancestors for inode 261 in the parent
snapshot, and therefore we didn't recompute the path for inode 262 before
issuing the unlink operation for the link named "f2l2" of inode 262. The
detection failed because the function "is_ancestor()" only follows the
first hard link it finds for an inode instead of all of its hard links
(as it was originally created for being used with directories only, for
which only one hard link exists). So fix this by making "is_ancestor()"
follow all hard links of the input inode.

A test case for fstests follows soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agonet/packet: fix a race in packet_bind() and packet_notifier()
Eric Dumazet [Tue, 28 Nov 2017 16:03:30 +0000 (08:03 -0800)]
net/packet: fix a race in packet_bind() and packet_notifier()

syzbot reported crashes [1] and provided a C repro easing bug hunting.

When/if packet_do_bind() calls __unregister_prot_hook() and releases
po->bind_lock, another thread can run packet_notifier() and process an
NETDEV_UP event.

This calls register_prot_hook() and hooks again the socket right before
first thread is able to grab again po->bind_lock.

Fixes this issue by temporarily setting po->num to 0, as suggested by
David Miller.

[1]
dev_remove_pack: ffff8801bf16fa80 not found
------------[ cut here ]------------
kernel BUG at net/core/dev.c:7945!  ( BUG_ON(!list_empty(&dev->ptype_all)); )
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
device syz0 entered promiscuous mode
CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cc57a500 task.stack: ffff8801cc588000
RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
device syz0 entered promiscuous mode
RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
FS:  0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
 tun_detach drivers/net/tun.c:670 [inline]
 tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
 __fput+0x333/0x7f0 fs/file_table.c:210
 ____fput+0x15/0x20 fs/file_table.c:244
 task_work_run+0x199/0x270 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x9bb/0x1ae0 kernel/exit.c:865
 do_group_exit+0x149/0x400 kernel/exit.c:968
 SYSC_exit_group kernel/exit.c:979 [inline]
 SyS_exit_group+0x1d/0x20 kernel/exit.c:977
 entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x44ad19

Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agopacket: fix crash in fanout_demux_rollover()
Mike Maloney [Tue, 28 Nov 2017 15:44:29 +0000 (10:44 -0500)]
packet: fix crash in fanout_demux_rollover()

syzkaller found a race condition fanout_demux_rollover() while removing
a packet socket from a fanout group.

po->rollover is read and operated on during packet_rcv_fanout(), via
fanout_demux_rollover(), but the pointer is currently cleared before the
synchronization in packet_release().   It is safer to delay the cleanup
until after synchronize_net() has been called, ensuring all calls to
packet_rcv_fanout() for this socket have finished.

To further simplify synchronization around the rollover structure, set
po->rollover in fanout_add() only if there are no errors.  This removes
the need for rcu in the struct and in the call to
packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...).

Crashing stack trace:
 fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392
 packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487
 dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953
 xmit_one net/core/dev.c:2975 [inline]
 dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995
 __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3509
 neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379
 neigh_output include/net/neighbour.h:482 [inline]
 ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
 ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
 NF_HOOK_COND include/linux/netfilter.h:239 [inline]
 ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
 dst_output include/net/dst.h:459 [inline]
 NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250
 mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660
 mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072
 mld_send_initial_cr net/ipv6/mcast.c:2056 [inline]
 ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079
 addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039
 addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971
 process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113
 worker_thread+0x223/0x1990 kernel/workqueue.c:2247
 kthread+0x35e/0x430 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432

Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")
Fixes: 509c7a1ecc860 ("packet: avoid panic in packet_getsockopt()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Mike Maloney <maloney@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'sctp-fix-sparse-errors'
David S. Miller [Tue, 28 Nov 2017 16:00:14 +0000 (11:00 -0500)]
Merge branch 'sctp-fix-sparse-errors'

Xin Long says:

====================
sctp: fix some other sparse errors

After the last fixes for sparse errors, there are still three sparse
errors in sctp codes, two of them are type cast, and the other one
is using extern.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: remove extern from stream sched
Xin Long [Sun, 26 Nov 2017 12:16:08 +0000 (20:16 +0800)]
sctp: remove extern from stream sched

Now each stream sched ops is defined in different .c file and
added into the global ops in another .c file, it uses extern
to make this work.

However extern is not good coding style to get them in and
even make C=2 reports errors for this.

This patch adds sctp_sched_ops_xxx_init for each stream sched
ops in their .c file, then get them into the global ops by
calling them when initializing sctp module.

Fixes: 637784ade221 ("sctp: introduce priority based stream scheduler")
Fixes: ac1ed8b82cd6 ("sctp: introduce round robin stream scheduler")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: force the params with right types for sctp csum apis
Xin Long [Sun, 26 Nov 2017 12:16:07 +0000 (20:16 +0800)]
sctp: force the params with right types for sctp csum apis

Now sctp_csum_xxx doesn't really match the param types of these common
csum apis. As sctp_csum_xxx is defined in sctp/checksum.h, many sparse
errors occur when make C=2 not only with M=net/sctp but also with other
modules that include this header file.

This patch is to force them fit in csum apis with the right types.

Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
Xin Long [Sun, 26 Nov 2017 12:16:06 +0000 (20:16 +0800)]
sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail

This patch is to force SCTP_ERROR_INV_STRM with right type to
fit in sctp_chunk_fail to avoid the sparse error.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agolmc: Use memdup_user() as a cleanup
Vasyl Gomonovych [Wed, 22 Nov 2017 15:29:57 +0000 (16:29 +0100)]
lmc: Use memdup_user() as a cleanup

Fix coccicheck warning which recommends to use memdup_user():
drivers/net/wan/lmc/lmc_main.c:497:27-34: WARNING opportunity for memdup_user
Generated by: scripts/coccinelle/memdup_user/memdup_user.cocci

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnxt_en: Fix an error handling path in 'bnxt_get_module_eeprom()'
Christophe JAILLET [Tue, 21 Nov 2017 19:46:49 +0000 (20:46 +0100)]
bnxt_en: Fix an error handling path in 'bnxt_get_module_eeprom()'

Error code returned by 'bnxt_read_sfp_module_eeprom_info()' is handled a
few lines above when reading the A0 portion of the EEPROM.
The same should be done when reading the A2 portion of the EEPROM.

In order to correctly propagate an error, update 'rc' in this 2nd call as
well, otherwise 0 (success) is returned.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: marvell10g: fix the PHY id mask
Antoine Tenart [Tue, 28 Nov 2017 13:26:30 +0000 (14:26 +0100)]
net: phy: marvell10g: fix the PHY id mask

The Marvell 10G PHY driver supports different hardware revisions, which
have their bits 3..0 differing. To get the correct revision number these
bits should be ignored. This patch fixes this by using the already
defined MARVELL_PHY_ID_MASK (0xfffffff0) instead of the custom
0xffffffff mask.

Fixes: 20b2af32ff3f ("net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support")
Suggested-by: Yan Markman <ymarkman@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mvpp2-fixes'
David S. Miller [Tue, 28 Nov 2017 15:09:52 +0000 (10:09 -0500)]
Merge branch 'mvpp2-fixes'

Antoine Tenart says:

====================
net: mvpp2: set of fixes

This series fixes various issues with the Marvell PPv2 driver. The
patches are sent together to avoid any possible conflict. The series is
based on today's net tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: check ethtool sets the Tx ring size is to a valid min value
Antoine Tenart [Tue, 28 Nov 2017 13:19:51 +0000 (14:19 +0100)]
net: mvpp2: check ethtool sets the Tx ring size is to a valid min value

This patch fixes the Tx ring size checks when using ethtool, by adding
an extra check in the PPv2 check_ringparam_valid helper. The Tx ring
size cannot be set to a value smaller than the minimum number of
descriptors needed for TSO.

Fixes: 1d17db08c056 ("net: mvpp2: limit TSO segments and use stop/wake thresholds")
Suggested-by: Yan Markman <ymarkman@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: do not disable GMAC padding
Yan Markman [Tue, 28 Nov 2017 13:19:50 +0000 (14:19 +0100)]
net: mvpp2: do not disable GMAC padding

Short fragmented packets may never be sent by the hardware when padding
is disabled. This patch stop modifying the GMAC padding bits, to leave
them to their reset value (disabled).

Fixes: 3919357fb0bb ("net: mvpp2: initialize the GMAC when using a port")
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: cleanup probed ports in the probe error path
Antoine Tenart [Tue, 28 Nov 2017 13:19:49 +0000 (14:19 +0100)]
net: mvpp2: cleanup probed ports in the probe error path

This patches fixes the probe error path by cleaning up probed ports, to
avoid leaving registered net devices when the driver failed to probe.

Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: fix the txq_init error path
Antoine Tenart [Tue, 28 Nov 2017 13:19:48 +0000 (14:19 +0100)]
net: mvpp2: fix the txq_init error path

When an allocation in the txq_init path fails, the allocated buffers
end-up being freed twice: in the txq_init error path, and in txq_deinit.
This lead to issues as txq_deinit would work on already freed memory
regions:

    kernel BUG at mm/slub.c:3915!
    Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

This patch fixes this by removing the txq_init own error path, as the
txq_deinit function is always called on errors. This was introduced by
TSO as way more buffers are allocated.

Fixes: 186cd4d4e414 ("net: mvpp2: software tso support")
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-GRE-offloading-fixes'
David S. Miller [Tue, 28 Nov 2017 14:55:48 +0000 (09:55 -0500)]
Merge branch 'mlxsw-GRE-offloading-fixes'

Jiri Pirko says:

====================
mlxsw: GRE offloading fixes

Petr says:

This patchset fixes a couple bugs in offloading GRE tunnels in mlxsw
driver.

Patch #1 fixes a problem that local routes pointing at a GRE tunnel
device are offloaded even if that netdevice is down.

Patch #2 detects that as a result of moving a GRE netdevice to a
different VRF, two tunnels now have a conflict of local addresses,
something that the mlxsw driver can't offload.

Patch #3 fixes a FIB abort caused by forming a route pointing at a
GRE tunnel that is eligible for offloading but already onloaded.

Patch #4 fixes a problem that next hops migrated to a new RIF kept the
old RIF reference, which went dangling shortly afterwards.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Update nexthop RIF on update
Petr Machata [Tue, 28 Nov 2017 12:17:14 +0000 (13:17 +0100)]
mlxsw: spectrum_router: Update nexthop RIF on update

The function mlxsw_sp_nexthop_rif_update() walks the list of nexthops
associated with a RIF, and updates the corresponding entries in the
switch. It is used in particular when a tunnel underlay netdevice moves
to a different VRF, and all the nexthops are migrated over to a new RIF.
The problem is that each nexthop holds a reference to its RIF, and that
is not updated. So after the old RIF is gone, further activity on these
nexthops (such as downing the underlay netdevice) dereferences a
dangling pointer.

Fix the issue by updating rif of impacted nexthops before calling
mlxsw_sp_nexthop_rif_update().

Fixes: 0c5f1cd5ba8c ("mlxsw: spectrum_router: Generalize __mlxsw_sp_ipip_entry_update_tunnel()")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Handle encap to demoted tunnels
Petr Machata [Tue, 28 Nov 2017 12:17:13 +0000 (13:17 +0100)]
mlxsw: spectrum_router: Handle encap to demoted tunnels

Some tunnels that are offloadable on their own can nonetheless be
demoted to slow path if their local address is in conflict with that of
another tunnel. When a route is formed for such a tunnel,
mlxsw_sp_nexthop_ipip_init() fails to find the corresponding IPIP entry,
and that triggers a FIB abort.

Resolve the problem by not assuming that a tunnel for which
mlxsw_sp_ipip_ops.can_offload() holds also automatically has an IPIP
entry.

Fixes: af641713e97d ("mlxsw: spectrum_router: Onload conflicting tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Demote tunnels on VRF migration
Petr Machata [Tue, 28 Nov 2017 12:17:12 +0000 (13:17 +0100)]
mlxsw: spectrum_router: Demote tunnels on VRF migration

The mlxsw driver currently doesn't offload GRE tunnels if they have the
same local address and use the same underlay VRF. When such a situation
arises, the tunnels in conflict are demoted to slow path.

However, the current code only verifies this condition on tunnel
creation and tunnel change, not when a tunnel is moved to a different
VRF. When the tunnel has no bound device, underlay and overlay are the
same. Thus moving a tunnel moves the underlay as well, and that can
cause local address conflict.

So modify mlxsw_sp_netdevice_ipip_ol_vrf_event() to check if there are
any conflicting tunnels, and demote them if yes.

Fixes: af641713e97d ("mlxsw: spectrum_router: Onload conflicting tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_router: Offload decap only for up tunnels
Petr Machata [Tue, 28 Nov 2017 12:17:11 +0000 (13:17 +0100)]
mlxsw: spectrum_router: Offload decap only for up tunnels

When a new local route is added, an IPIP entry is looked up to determine
whether the route should be offloaded as a tunnel decap or as a trap.
That decision should take into account whether the tunnel netdevice in
question is actually IFF_UP, and only install a decap offload if it is.

Fixes: 0063587d3587 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Tue, 28 Nov 2017 14:52:04 +0000 (09:52 -0500)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2017-11-27

This series contains updates to e1000, e1000e and i40e.

Gustavo A. R. Silva fixes a sizeof() issue where we were taking the size of
the pointer (which is always the size of the pointer).

Sasha does a follow up fix to a previous fix for buffer overrun, to resolve
community feedback from David Laight and the use of magic numbers.

Amritha fixes the reporting of error codes for when adding a cloud filter
fails.

Ahmad Fatoum brushes the dust off the e1000 driver to fix a code comment
and debug message which was incorrect about what the code was really doing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobtrfs: tree-checker: Fix false panic for sanity test
Qu Wenruo [Wed, 8 Nov 2017 00:54:24 +0000 (08:54 +0800)]
btrfs: tree-checker: Fix false panic for sanity test

[BUG]
If we run btrfs with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y, it will
instantly cause kernel panic like:

------
...
assertion failed: 0, file: fs/btrfs/disk-io.c, line: 3853
...
Call Trace:
 btrfs_mark_buffer_dirty+0x187/0x1f0 [btrfs]
 setup_items_for_insert+0x385/0x650 [btrfs]
 __btrfs_drop_extents+0x129a/0x1870 [btrfs]
...
-----

[Cause]
Btrfs will call btrfs_check_leaf() in btrfs_mark_buffer_dirty() to check
if the leaf is valid with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y.

However quite some btrfs_mark_buffer_dirty() callers(*) don't really
initialize its item data but only initialize its item pointers, leaving
item data uninitialized.

This makes tree-checker catch uninitialized data as error, causing
such panic.

*: These callers include but not limited to
setup_items_for_insert()
btrfs_split_item()
btrfs_expand_item()

[Fix]
Add a new parameter @check_item_data to btrfs_check_leaf().
With @check_item_data set to false, item data check will be skipped and
fallback to old btrfs_check_leaf() behavior.

So we can still get early warning if we screw up item pointers, and
avoid false panic.

Cc: Filipe Manana <fdmanana@gmail.com>
Reported-by: Lakshmipathi.G <lakshmipathi.g@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoproc: don't report kernel addresses in /proc/<pid>/stack
Linus Torvalds [Tue, 28 Nov 2017 00:45:56 +0000 (16:45 -0800)]
proc: don't report kernel addresses in /proc/<pid>/stack

This just changes the file to report them as zero, although maybe even
that could be removed.  I checked, and at least procps doesn't actually
seem to parse the 'stack' file at all.

And since the file doesn't necessarily even exist (it requires
CONFIG_STACKTRACE), possibly other tools don't really use it either.

That said, in case somebody parses it with tools, just having that zero
there should keep such tools happy.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoe1000: Fix off-by-one in debug message
Ahmad Fatoum [Sat, 18 Nov 2017 20:53:58 +0000 (21:53 +0100)]
e1000: Fix off-by-one in debug message

Signed-off-by: Ahmad Fatoum <ahmad@a3f.at>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: Fix reporting incorrect error codes
Amritha Nambiar [Fri, 17 Nov 2017 23:35:57 +0000 (15:35 -0800)]
i40e: Fix reporting incorrect error codes

Adding cloud filters could fail for a number of reasons,
unsupported filter fields for example, which fails during
validation of fields itself. This will not result in admin
command errors and converting the admin queue status to posix
error code using i40e_aq_rc_to_posix would result in incorrect
error values. If the failure was due to AQ error itself,
reporting that correctly is handled in the inner function.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: fix the use of magic numbers for buffer overrun issue
Sasha Neftin [Mon, 6 Nov 2017 06:31:59 +0000 (08:31 +0200)]
e1000e: fix the use of magic numbers for buffer overrun issue

This is a follow on to commit b10effb92e27 ("fix buffer overrun while the
 I219 is processing DMA transactions") to address David Laights concerns
about the use of "magic" numbers.  So define masks as well as add
additional code comments to give a better understanding of what needs to
be done to avoid a buffer overrun.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Alexander H Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e/virtchnl: fix application of sizeof to pointer
Gustavo A R Silva [Wed, 18 Oct 2017 20:34:25 +0000 (15:34 -0500)]
i40e/virtchnl: fix application of sizeof to pointer

sizeof when applied to a pointer typed expression gives the size of
the pointer.

The proper fix in this particular case is to code sizeof(*vfres)
instead of sizeof(vfres).

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A R Silva <garsilva@embeddedor.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agolockd: fix "list_add double add" caused by legacy signal interface
Vasily Averin [Mon, 13 Nov 2017 04:25:40 +0000 (07:25 +0300)]
lockd: fix "list_add double add" caused by legacy signal interface

restart_grace() uses hardcoded init_net.
It can cause to "list_add double add" in following scenario:

1) nfsd and lockd was started in several net namespaces
2) nfsd in init_net was stopped (lockd was not stopped because
 it have users from another net namespaces)
3) lockd got signal, called restart_grace() -> set_grace_period()
 and enabled lock_manager in hardcoded init_net.
4) nfsd in init_net is started again,
 its lockd_up() calls set_grace_period() and tries to add
 lock_manager into init_net 2nd time.

Jeff Layton suggest:
"Make it safe to call locks_start_grace multiple times on the same
lock_manager. If it's already on the global grace_list, then don't try
to add it again.  (But we don't intentionally add twice, so for now we
WARN about that case.)

With this change, we also need to ensure that the nfsd4 lock manager
initializes the list before we call locks_start_grace. While we're at
it, move the rest of the nfsd_net initialization into
nfs4_state_create_net. I see no reason to have it spread over two
functions like it is today."

Suggested patch was updated to generate warning in described situation.

Suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonlm_shutdown_hosts_net() cleanup
Vasily Averin [Mon, 30 Oct 2017 13:47:58 +0000 (16:47 +0300)]
nlm_shutdown_hosts_net() cleanup

nlm_complain_hosts() walks through nlm_server_hosts hlist, which should
be protected by nlm_host_mutex.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agorace of nfsd inetaddr notifiers vs nn->nfsd_serv change
Vasily Averin [Fri, 10 Nov 2017 07:19:35 +0000 (10:19 +0300)]
race of nfsd inetaddr notifiers vs nn->nfsd_serv change

nfsd_inet[6]addr_event uses nn->nfsd_serv without taking nfsd_mutex,
which can be changed during execution of notifiers and crash the host.

Moreover if notifiers were enabled in one net namespace they are enabled
in all other net namespaces, from creation until destruction.

This patch allows notifiers to access nn->nfsd_serv only after the
pointer is correctly initialized and delays cleanup until notifiers are
no longer in use.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agorace of lockd inetaddr notifiers vs nlmsvc_rqst change
Vasily Averin [Fri, 10 Nov 2017 07:19:26 +0000 (10:19 +0300)]
race of lockd inetaddr notifiers vs nlmsvc_rqst change

lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex,
nlmsvc_rqst can be changed during execution of notifiers and crash the host.

Patch enables access to nlmsvc_rqst only when it was correctly initialized
and delays its cleanup until notifiers are no longer in use.

Note that nlmsvc_rqst can be temporally set to ERR_PTR, so the "if
(nlmsvc_rqst)" check in notifiers is insufficient on its own.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agoSUNRPC: make cache_detail structures const
Bhumika Goyal [Tue, 17 Oct 2017 16:14:26 +0000 (18:14 +0200)]
SUNRPC: make cache_detail structures const

Make these const as they are only getting passed to the function
cache_create_net having the argument as const.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agoNFSD: make cache_detail structures const
Bhumika Goyal [Tue, 17 Oct 2017 16:14:25 +0000 (18:14 +0200)]
NFSD: make cache_detail structures const

Make these const as they are only getting passed to the function
cache_create_net having the argument as const.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agosunrpc: make the function arg as const
Bhumika Goyal [Tue, 17 Oct 2017 16:14:23 +0000 (18:14 +0200)]
sunrpc: make the function arg as const

Make the struct cache_detail *tmpl argument of the function
cache_create_net as const as it is only getting passed to kmemup having
the argument as const void *.
Add const to the prototype too.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: check for use of the closed special stateid
Andrew Elble [Thu, 9 Nov 2017 18:41:10 +0000 (13:41 -0500)]
nfsd: check for use of the closed special stateid

Prevent the use of the closed (invalid) special stateid by clients.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
Naofumi Honda [Thu, 9 Nov 2017 15:57:16 +0000 (10:57 -0500)]
nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat

From kernel 4.9, my two nfsv4 servers sometimes suffer from
    "panic: unable to handle kernel page request"
in posix_unblock_lock() called from nfs4_laundromat().

These panics diseappear if we revert the commit "nfsd: add a LRU list
for blocked locks".

The cause appears to be a typo in nfs4_laundromat(), which is also
present in nfs4_state_shutdown_net().

Cc: stable@vger.kernel.org
Fixes: 7919d0a27f1e "nfsd: add a LRU list for blocked locks"
Cc: jlayton@redhat.com
Reveiwed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agolockd: lost rollback of set_grace_period() in lockd_down_net()
Vasily Averin [Thu, 2 Nov 2017 10:03:42 +0000 (13:03 +0300)]
lockd: lost rollback of set_grace_period() in lockd_down_net()

Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect,
it removes lockd_manager and disarm grace_period_end for init_net only.

If nfsd was started from another net namespace lockd_up_net() calls
set_grace_period() that adds lockd_manager into per-netns list
and queues grace_period_end delayed work.

These action should be reverted in lockd_down_net().
Otherwise it can lead to double list_add on after restart nfsd in netns,
and to use-after-free if non-disarmed delayed work will be executed after netns destroy.

Fixes: efda760fe95e ("lockd: fix lockd shutdown race")
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agolockd: added cleanup checks in exit_net hook
Vasily Averin [Mon, 6 Nov 2017 13:23:24 +0000 (16:23 +0300)]
lockd: added cleanup checks in exit_net hook

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agograce: replace BUG_ON by WARN_ONCE in exit_net hook
Vasily Averin [Mon, 6 Nov 2017 13:22:48 +0000 (16:22 +0300)]
grace: replace BUG_ON by WARN_ONCE in exit_net hook

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
Andrew Elble [Wed, 8 Nov 2017 22:29:51 +0000 (17:29 -0500)]
nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class

The use of the st_mutex has been confusing the validator. Use the
proper nested notation so as to not produce warnings.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agolockd: remove net pointer from messages
Vasily Averin [Wed, 8 Nov 2017 05:55:55 +0000 (08:55 +0300)]
lockd: remove net pointer from messages

Publishing of net pointer is not safe,
use net->ns.inum as net ID in debug messages

[  171.757678] lockd_up_net: per-net data created; net=f00001e7
[  171.767188] NFSD: starting 90-second grace period (net f00001e7)
[  300.653313] lockd: nuking all hosts in net f00001e7...
[  300.653641] lockd: host garbage collection for net f00001e7
[  300.653968] lockd: nlmsvc_mark_resources for net f00001e7
[  300.711483] lockd_down_net: per-net data destroyed; net=f00001e7
[  300.711847] lockd: nuking all hosts in net 0...
[  300.711847] lockd: host garbage collection for net 0
[  300.711848] lockd: nlmsvc_mark_resources for net 0

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: remove net pointer from debug messages
Vasily Averin [Wed, 8 Nov 2017 05:55:22 +0000 (08:55 +0300)]
nfsd: remove net pointer from debug messages

Publishing of net pointer is not safe,
replace it in debug meesages by net->ns.inum

[  119.989161] nfsd: initializing export module (net: f00001e7).
[  171.767188] NFSD: starting 90-second grace period (net f00001e7)
[  322.185240] nfsd: shutting down export module (net: f00001e7).
[  322.186062] nfsd: export shutdown complete (net: f00001e7).

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: Fix races with check_stateid_generation()
Trond Myklebust [Fri, 3 Nov 2017 12:00:16 +0000 (08:00 -0400)]
nfsd: Fix races with check_stateid_generation()

The various functions that call check_stateid_generation() in order
to compare a client-supplied stateid with the nfs4_stid state, usually
need to atomically check for closed state. Those that perform the
check after locking the st_mutex using nfsd4_lock_ol_stateid()
should now be OK, but we do want to fix up the others.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: Ensure we check stateid validity in the seqid operation checks
Trond Myklebust [Fri, 3 Nov 2017 12:00:15 +0000 (08:00 -0400)]
nfsd: Ensure we check stateid validity in the seqid operation checks

After taking the stateid st_mutex, we want to know that the stateid
still represents valid state before performing any non-idempotent
actions.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: Fix race in lock stateid creation
Trond Myklebust [Fri, 3 Nov 2017 12:00:14 +0000 (08:00 -0400)]
nfsd: Fix race in lock stateid creation

If we're looking up a new lock state, and the creation fails, then
we want to unhash it, just like we do for OPEN. However in order
to do so, we need to that no other LOCK requests can grab the
mutex until we have unhashed it (and marked it as closed).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd4: move find_lock_stateid
Trond Myklebust [Fri, 3 Nov 2017 12:00:14 +0000 (08:00 -0400)]
nfsd4: move find_lock_stateid

Trivial cleanup to simplify following patch.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: Ensure we don't recognise lock stateids after freeing them
Trond Myklebust [Fri, 3 Nov 2017 12:00:13 +0000 (08:00 -0400)]
nfsd: Ensure we don't recognise lock stateids after freeing them

In order to deal with lookup races, nfsd4_free_lock_stateid() needs
to be able to signal to other stateful functions that the lock stateid
is no longer valid. Right now, nfsd_lock() will check whether or not an
existing stateid is still hashed, but only in the "new lock" path.

To ensure the stateid invalidation is also recognised by the "existing lock"
path, and also by a second call to nfsd4_free_lock_stateid() itself, we can
change the type to NFS4_CLOSED_STID under the stp->st_mutex.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
Trond Myklebust [Fri, 3 Nov 2017 12:00:12 +0000 (08:00 -0400)]
nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: Fix another OPEN stateid race
Trond Myklebust [Fri, 3 Nov 2017 12:00:11 +0000 (08:00 -0400)]
nfsd: Fix another OPEN stateid race

If nfsd4_process_open2() is initialising a new stateid, and yet the
call to nfs4_get_vfs_file() fails for some reason, then we must
declare the stateid closed, and unhash it before dropping the mutex.

Right now, we unhash the stateid after dropping the mutex, and without
changing the stateid type, meaning that another OPEN could theoretically
look it up and attempt to use it.

Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agonfsd: Fix stateid races between OPEN and CLOSE
Trond Myklebust [Fri, 3 Nov 2017 12:00:10 +0000 (08:00 -0400)]
nfsd: Fix stateid races between OPEN and CLOSE

Open file stateids can linger on the nfs4_file list of stateids even
after they have been closed. In order to avoid reusing such a
stateid, and confusing the client, we need to recheck the
nfs4_stid's type after taking the mutex.
Otherwise, we risk reusing an old stateid that was already closed,
which will confuse clients that expect new stateids to conform to
RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7 years agoRename superblock flags (MS_xyz -> SB_xyz)
Linus Torvalds [Mon, 27 Nov 2017 21:05:09 +0000 (13:05 -0800)]
Rename superblock flags (MS_xyz -> SB_xyz)

This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoauxdisplay: img-ascii-lcd: Only build on archs that have IOMEM
Thomas Meyer [Thu, 10 Aug 2017 08:53:53 +0000 (10:53 +0200)]
auxdisplay: img-ascii-lcd: Only build on archs that have IOMEM

This avoids the MODPOST error:

  ERROR: "devm_ioremap_resource" [drivers/auxdisplay/img-ascii-lcd.ko] undefined!

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm, thp: Do not make pmd/pud dirty without a reason
Kirill A. Shutemov [Mon, 27 Nov 2017 03:21:26 +0000 (06:21 +0300)]
mm, thp: Do not make pmd/pud dirty without a reason

Currently we make page table entries dirty all the time regardless of
access type and don't even consider if the mapping is write-protected.
The reasoning is that we don't really need dirty tracking on THP and
making the entry dirty upfront may save some time on first write to the
page.

Unfortunately, such approach may result in false-positive
can_follow_write_pmd() for huge zero page or read-only shmem file.

Let's only make page dirty only if we about to write to the page anyway
(as we do for small pages).

I've restructured the code to make entry dirty inside
maybe_p[mu]d_mkwrite(). It also takes into account if the vma is
write-protected.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
Kirill A. Shutemov [Mon, 27 Nov 2017 03:21:25 +0000 (06:21 +0300)]
mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().

We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.

The patch also changes touch_pud() in the same way.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agotipc: eliminate access after delete in group_filter_msg()
Jon Maloy [Mon, 27 Nov 2017 19:13:39 +0000 (20:13 +0100)]
tipc: eliminate access after delete in group_filter_msg()

KASAN revealed another access after delete in group.c. This time
it found that we read the header of a received message after the
buffer has been released.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoxen-netfront: remove warning when unloading module
Eduardo Otubo [Thu, 23 Nov 2017 14:18:35 +0000 (15:18 +0100)]
xen-netfront: remove warning when unloading module

v2:
 * Replace busy wait with wait_event()/wake_up_all()
 * Cannot garantee that at the time xennet_remove is called, the
   xen_netback state will not be XenbusStateClosed, so added a
   condition for that
 * There's a small chance for the xen_netback state is
   XenbusStateUnknown by the time the xen_netfront switches to Closed,
   so added a condition for that.

When unloading module xen_netfront from guest, dmesg would output
warning messages like below:

  [  105.236836] xen:grant_table: WARNING: g.e. 0x903 still in use!
  [  105.236839] deferring g.e. 0x903 (pfn 0x35805)

This problem relies on netfront and netback being out of sync. By the time
netfront revokes the g.e.'s netback didn't have enough time to free all of
them, hence displaying the warnings on dmesg.

The trick here is to make netfront to wait until netback frees all the g.e.'s
and only then continue to cleanup for the module removal, and this is done by
manipulating both device states.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoBtrfs: fix list_add corruption and soft lockups in fsync
Liu Bo [Tue, 21 Nov 2017 21:35:40 +0000 (14:35 -0700)]
Btrfs: fix list_add corruption and soft lockups in fsync

Xfstests btrfs/146 revealed this corruption,

[   58.138831] Buffer I/O error on dev dm-0, logical block 2621424, async page read
[   58.151233] BTRFS error (device sdf): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[   58.152403] list_add corruption. prev->next should be next (ffff88005e6775d8), but was ffffc9000189be88. (prev=ffffc9000189be88).
[   58.153518] ------------[ cut here ]------------
[   58.153892] WARNING: CPU: 1 PID: 1287 at lib/list_debug.c:31 __list_add_valid+0x169/0x1f0
...
[   58.157379] RIP: 0010:__list_add_valid+0x169/0x1f0
...
[   58.161956] Call Trace:
[   58.162264]  btrfs_log_inode_parent+0x5bd/0xfb0 [btrfs]
[   58.163583]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
[   58.164003]  btrfs_sync_file+0x4c2/0x6f0 [btrfs]
[   58.164393]  vfs_fsync_range+0x5f/0xd0
[   58.164898]  do_fsync+0x5a/0x90
[   58.165170]  SyS_fsync+0x10/0x20
[   58.165395]  entry_SYSCALL_64_fastpath+0x1f/0xbe
...

It turns out that we could record btrfs_log_ctx:io_err in
log_one_extents when IO fails, but make log_one_extents() return '0'
instead of -EIO, so the IO error is not acknowledged by the callers,
i.e.  btrfs_log_inode_parent(), which would remove btrfs_log_ctx:list
from list head 'root->log_ctxs'.  Since btrfs_log_ctx is allocated
from stack memory, it'd get freed with a object alive on the
list. then a future list_add will throw the above warning.

This returns the correct error in the above case.

Jeff also reported this while testing against his fsync error
patch set[1].

[1]: https://www.spinics.net/lists/linux-btrfs/msg65308.html
"btrfs list corruption and soft lockups while testing writeback error handling"

Fixes: 8407f553268a4611f254 ("Btrfs: fix data corruption after fast fsync and writeback error")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoMerge tag 'mac80211-for-davem-2017-11-27' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Mon, 27 Nov 2017 16:09:42 +0000 (01:09 +0900)]
Merge tag 'mac80211-for-davem-2017-11-27' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Four fixes:
 * CRYPTO_SHA256 is needed for regdb validation
 * mac80211: mesh path metric was wrong in some frames
 * mac80211: use QoS null-data packets on QoS connections
 * mac80211: tear down RX aggregation sessions first to
   drop fewer packets in HW restart scenarios
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobtrfs: Fix wild memory access in compression level parser
Qu Wenruo [Mon, 6 Nov 2017 02:43:18 +0000 (10:43 +0800)]
btrfs: Fix wild memory access in compression level parser

[BUG]
Kernel panic when mounting with "-o compress" mount option.
KASAN will report like:
------
==================================================================
BUG: KASAN: wild-memory-access in strncmp+0x31/0xc0
Read of size 1 at addr d86735fce994f800 by task mount/662
...
Call Trace:
 dump_stack+0xe3/0x175
 kasan_report+0x163/0x370
 __asan_load1+0x47/0x50
 strncmp+0x31/0xc0
 btrfs_compress_str2level+0x20/0x70 [btrfs]
 btrfs_parse_options+0xff4/0x1870 [btrfs]
 open_ctree+0x2679/0x49f0 [btrfs]
 btrfs_mount+0x1b7f/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 vfs_kern_mount+0x13/0x20
 btrfs_mount+0x31e/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 do_mount+0xaad/0x1a00
 SyS_mount+0x98/0xe0
 entry_SYSCALL_64_fastpath+0x1f/0xbe
------

[Cause]
For 'compress' and 'compress_force' options, its token doesn't expect
any parameter so its args[0] contains uninitialized data.
Accessing args[0] will cause above wild memory access.

[Fix]
For Opt_compress and Opt_compress_force, set compression level to
the default.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ set the default in advance ]
Signed-off-by: David Sterba <dsterba@suse.com>
7 years agoMerge branch 'sctp-stream-reconfig-fixes'
David S. Miller [Mon, 27 Nov 2017 15:38:45 +0000 (00:38 +0900)]
Merge branch 'sctp-stream-reconfig-fixes'

Xin Long says:

====================
sctp: a bunch of fixes for stream reconfig

This patchset is to make stream reset and asoc reset work more correctly
for stream reconfig.

Thank to Marcelo making them very clear.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: set sender next_tsn for the old result with ctsn_ack_point plus 1
Xin Long [Sat, 25 Nov 2017 13:05:36 +0000 (21:05 +0800)]
sctp: set sender next_tsn for the old result with ctsn_ack_point plus 1

When doing asoc reset, if the sender of the response has already sent some
chunk and increased asoc->next_tsn before the duplicate request comes, the
response will use the old result with an incorrect sender next_tsn.

Better than asoc->next_tsn, asoc->ctsn_ack_point can't be changed after
the sender of the response has performed the asoc reset and before the
peer has confirmed it, and it's value is still asoc->next_tsn original
value minus 1.

This patch sets sender next_tsn for the old result with ctsn_ack_point
plus 1 when processing the duplicate request, to make sure the sender
next_tsn value peer gets will be always right.

Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: avoid flushing unsent queue when doing asoc reset
Xin Long [Sat, 25 Nov 2017 13:05:35 +0000 (21:05 +0800)]
sctp: avoid flushing unsent queue when doing asoc reset

Now when doing asoc reset, it cleans up sacked and abandoned queues
by calling sctp_outq_free where it also cleans up unsent, retransmit
and transmitted queues.

It's safe for the sender of response, as these 3 queues are empty at
that time. But when the receiver of response is doing the reset, the
users may already enqueue some chunks into unsent during the time
waiting the response, and these chunks should not be flushed.

To void the chunks in it would be removed, it moves the queue into a
temp list, then gets it back after sctp_outq_free is done.

The patch also fixes some incorrect comments in
sctp_process_strreset_tsnreq.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: only allow the asoc reset when the asoc outq is empty
Xin Long [Sat, 25 Nov 2017 13:05:34 +0000 (21:05 +0800)]
sctp: only allow the asoc reset when the asoc outq is empty

As it says in rfc6525#section5.1.4, before sending the request,

   C2:  The sender has either no outstanding TSNs or considers all
        outstanding TSNs abandoned.

Prior to this patch, it tried to consider all outstanding TSNs abandoned
by dropping all chunks in all outqs with sctp_outq_free (even including
sacked, retransmit and transmitted queues) when doing this reset, which
is too aggressive.

To make it work gently, this patch will only allow the asoc reset when
the sender has no outstanding TSNs by checking if unsent, transmitted
and retransmit are all empty with sctp_outq_is_empty before sending
and processing the request.

Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: only allow the out stream reset when the stream outq is empty
Xin Long [Sat, 25 Nov 2017 13:05:33 +0000 (21:05 +0800)]
sctp: only allow the out stream reset when the stream outq is empty

Now the out stream reset in sctp stream reconf could be done even if
the stream outq is not empty. It means that users can not be sure
since which msg the new ssn will be used.

To make this more synchronous, it shouldn't allow to do out stream
reset until these chunks in unsent outq all are sent out.

This patch checks the corresponding stream outqs when sending and
processing the request . If any of them has unsent chunks in outq,
it will return -EAGAIN instead or send SCTP_STRRESET_IN_PROGRESS
back to the sender.

Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter")
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: use sizeof(__u16) for each stream number length instead of magic number
Xin Long [Sat, 25 Nov 2017 13:05:32 +0000 (21:05 +0800)]
sctp: use sizeof(__u16) for each stream number length instead of magic number

Now in stream reconf part there are still some places using magic
number 2 for each stream number length. To make it more readable,
this patch is to replace them with sizeof(__u16).

Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobtrfs: fix deadlock when writing out space cache
Josef Bacik [Wed, 15 Nov 2017 21:20:52 +0000 (16:20 -0500)]
btrfs: fix deadlock when writing out space cache

If we fail to prepare our pages for whatever reason (out of memory in
our case) we need to make sure to drop the block_group->data_rwsem,
otherwise hilarity ensues.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add label and use existing unlocking code ]
Signed-off-by: David Sterba <dsterba@suse.com>