openwrt/staging/blogic.git
5 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 15 Feb 2019 16:11:43 +0000 (08:11 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This fixes a crash on resume in the ccree driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ccree - fix resume race condition on init

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Fri, 15 Feb 2019 16:00:11 +0000 (08:00 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix MAC address setting in mac80211 pmsr code, from Johannes Berg.

 2) Probe SFP modules after being attached, from Russell King.

 3) Byte ordering bug in SMC rx_curs_confirmed code, from Ursula Braun.

 4) Revert some r8169 changes that are causing regressions, from Heiner
    Kallweit.

 5) Fix spurious connection timeouts in netfilter nat code, from Florian
    Westphal.

 6) SKB leak in tipc, from Hoang Le.

 7) Short packet checkum issue in mlx4, similar to a previous mlx5
    change, from Saeed Mahameed. The issue is that whilst padding bytes
    are usually zero, it is not guarateed and the hardware doesn't take
    the padding bytes into consideration when generating the checksum.

 8) Fix various races in cls_tcindex, from Cong Wang.

 9) Need to set stream ext to NULL before freeing in SCTP code, from Xin
    Long.

10) Fix locking in phy_is_started, from Heiner Kallweit.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
  net: ethernet: freescale: set FEC ethtool regs version
  net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
  mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  net: phy: fix potential race in the phylib state machine
  net: phy: don't use locking in phy_is_started
  selftests: fix timestamping Makefile
  net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
  net: fix possible overflow in __sk_mem_raise_allocated()
  dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  net: phy: fix interrupt handling in non-started states
  sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
  sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
  net/mlx5e: XDP, fix redirect resources availability check
  net/mlx5: Fix a compilation warning in events.c
  net/mlx5: No command allowed when command interface is not ready
  net/mlx5e: Fix NULL pointer derefernce in set channels error flow
  netfilter: nft_compat: use-after-free when deleting targets
  team: avoid complex list operations in team_nl_cmd_options_set()
  net_sched: fix two more memory leaks in cls_tcindex
  net_sched: fix a memory leak in cls_tcindex
  ...

5 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm...
Linus Torvalds [Fri, 15 Feb 2019 15:56:24 +0000 (07:56 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace

Pull signal fix from Eric Biederman:
 "Just a single patch that restores PTRACE_EVENT_EXIT functionality that
  was accidentally broken by last weeks fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Restore the stop PTRACE_EVENT_EXIT

5 years agoRevert "exec: load_script: don't blindly truncate shebang string"
Linus Torvalds [Thu, 14 Feb 2019 23:02:18 +0000 (15:02 -0800)]
Revert "exec: load_script: don't blindly truncate shebang string"

This reverts commit 8099b047ecc431518b9bb6bdbba3549bbecdc343.

It turns out that people do actually depend on the shebang string being
truncated, and on the fact that an interpreter (like perl) will often
just re-interpret it entirely to get the full argument list.

Reported-by: Samuel Dionne-Riel <samuel@dionne-riel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoRevert "gfs2: read journal in large chunks to locate the head"
Bob Peterson [Wed, 13 Feb 2019 20:12:17 +0000 (15:12 -0500)]
Revert "gfs2: read journal in large chunks to locate the head"

This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241.

This patch causes xfstests generic/311 to fail. Reverting this for
now until we have a proper fix.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agonet: ethernet: freescale: set FEC ethtool regs version
Vivien Didelot [Thu, 14 Feb 2019 16:15:35 +0000 (11:15 -0500)]
net: ethernet: freescale: set FEC ethtool regs version

Currently the ethtool_regs version is set to 0 for FEC devices.

Use this field to store the register dump version exposed by the
kernel. The choosen version 2 corresponds to the kernel compile test:

        #if defined(CONFIG_M523x) || defined(CONFIG_M527x)
        || defined(CONFIG_M528x) || defined(CONFIG_M520x)
        || defined(CONFIG_M532x) || defined(CONFIG_ARM)
        || defined(CONFIG_ARM64) || defined(CONFIG_COMPILE_TEST)

and version 1 corresponds to the opposite. Binaries of ethtool unaware
of this version will dump the whole set as usual.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns: Fix object reference leaks in hns_dsaf_roce_reset()
Huang Zijiang [Thu, 14 Feb 2019 06:41:45 +0000 (14:41 +0800)]
net: hns: Fix object reference leaks in hns_dsaf_roce_reset()

The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.

Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
Jann Horn [Wed, 13 Feb 2019 21:45:59 +0000 (22:45 +0100)]
mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs

The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
number of references that we might need to create in the fastpath later,
the bump-allocation fastpath only has to modify the non-atomic bias value
that tracks the number of extra references we hold instead of the atomic
refcount. The maximum number of allocations we can serve (under the
assumption that no allocation is made with size 0) is nc->size, so that's
the bias used.

However, even when all memory in the allocation has been given away, a
reference to the page is still held; and in the `offset < 0` slowpath, the
page may be reused if everyone else has dropped their references.
This means that the necessary number of references is actually
`nc->size+1`.

Luckily, from a quick grep, it looks like the only path that can call
page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
requires CAP_NET_ADMIN in the init namespace and is only intended to be
used for kernel testing and fuzzing.

To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
`offset < 0` path, below the virt_to_page() call, and then repeatedly call
writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
with a vector consisting of 15 elements containing 1 byte each.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-phy-fix-locking-issue'
David S. Miller [Thu, 14 Feb 2019 17:04:55 +0000 (12:04 -0500)]
Merge branch 'net-phy-fix-locking-issue'

Heiner Kallweit says:

====================
net: phy: fix locking issue

Russell pointed out that the locking used in phy_is_started() isn't
needed and misleading. This locking also contributes to a race fixed
with patch 2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: fix potential race in the phylib state machine
Heiner Kallweit [Wed, 13 Feb 2019 19:12:54 +0000 (20:12 +0100)]
net: phy: fix potential race in the phylib state machine

Russell reported the following race in the phylib state machine
(quoting from his mail):

if (phy_polling_mode(phydev) && phy_is_started(phydev))
phy_queue_state_machine(phydev, PHY_STATE_TIME);

state = PHY_UP
thread 0 thread 1
phy_disconnect()
+-phy_is_started()
phy_is_started()                |
`-phy_stop()
  +-phydev->state = PHY_HALTED
  `-phy_stop_machine()
    `-cancel_delayed_work_sync()
phy_queue_state_machine()
`-mod_delayed_work()

At this point, the phydev->state_queue() has been added back onto the
system workqueue despite phy_stop_machine() having been called and
cancel_delayed_work_sync() called on it.

Fix this by protecting the complete operation in thread 0.

Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: don't use locking in phy_is_started
Heiner Kallweit [Wed, 13 Feb 2019 19:11:40 +0000 (20:11 +0100)]
net: phy: don't use locking in phy_is_started

Russell suggested to remove the locking from phy_is_started() because
the read is atomic anyway and actually the locking may be more
misleading.

Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Suggested-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: fix timestamping Makefile
Deepa Dinamani [Wed, 13 Feb 2019 17:09:13 +0000 (09:09 -0800)]
selftests: fix timestamping Makefile

The clean target in the makefile conflicts with the generic
kselftests lib.mk, and fails to properly remove the compiled
test programs.

Remove the redundant rule, the TEST_GEN_FILES will be already
removed by the CLEAN macro in lib.mk.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
Dan Carpenter [Wed, 13 Feb 2019 08:23:04 +0000 (11:23 +0300)]
net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()

The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less
than or equal to DSA_MAX_PORTS.  The ds->ports[] array is used inside
the dsa_is_user_port() and dsa_is_cpu_port() functions.  The ds->ports[]
array is allocated in dsa_switch_alloc() and it has ds->num_ports
elements so this leads to a static checker warning about a potential out
of bounds read.

Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fix possible overflow in __sk_mem_raise_allocated()
Eric Dumazet [Tue, 12 Feb 2019 20:26:27 +0000 (12:26 -0800)]
net: fix possible overflow in __sk_mem_raise_allocated()

With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.

They would increase their share of the memory, instead
of decreasing it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
John David Anglin [Mon, 11 Feb 2019 18:40:21 +0000 (13:40 -0500)]
dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit

The GPIO interrupt controller on the espressobin board only supports edge interrupts.
If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
possible to miss an edge.  When this happens, the INTn pin on the Marvell switch is
stuck low and no further interrupts occur.

I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
a race in handling device interrupts (e.g. PHY link interrupts).  Some interrupts are
directly cleared by reading the Global 1 status register.  However, the device interrupt
flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
This is done by reading the relevant SERDES and PHY status register.

The code only services interrupts whose status bit is set at the time of reading its status
register.  If an interrupt event occurs after its status is read and before all interrupts
are serviced, then this event will not be serviced and the INTn output pin will remain low.

This is not a problem with polling or level interrupts since the handler will be called
again to process the event.  However, it's a big problem when using level interrupts.

The fix presented here is to add a loop around the code servicing switch interrupts.  If
any pending interrupts remain after the current set has been handled, we loop and process
the new set.  If there are no pending interrupts after servicing, we are sure that INTn has
gone high and we will get an edge when a new event occurs.

Tested on espressobin board.

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: fix interrupt handling in non-started states
Heiner Kallweit [Tue, 12 Feb 2019 18:56:15 +0000 (19:56 +0100)]
net: phy: fix interrupt handling in non-started states

phylib enables interrupts before phy_start() has been called, and if
we receive an interrupt in a non-started state, the interrupt handler
returns IRQ_NONE. This causes problems with at least one Marvell chip
as reported by Andrew.
Fix this by handling interrupts the same as in phy_mac_interrupt(),
basically always running the phylib state machine. It knows when it
has to do something and when not.
This change allows to handle interrupts gracefully even if they
occur in a non-started state.

Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
Xin Long [Tue, 12 Feb 2019 10:51:01 +0000 (18:51 +0800)]
sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate

In sctp_stream_init(), after sctp_stream_outq_migrate() freed the
surplus streams' ext, but sctp_stream_alloc_out() returns -ENOMEM,
stream->outcnt will not be set to 'outcnt'.

With the bigger value on stream->outcnt, when closing the assoc and
freeing its streams, the ext of those surplus streams will be freed
again since those stream exts were not set to NULL after freeing in
sctp_stream_outq_migrate(). Then the invalid-free issue reported by
syzbot would be triggered.

We fix it by simply setting them to NULL after freeing.

Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: syzbot+58e480e7b28f2d890bfd@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
Xin Long [Tue, 12 Feb 2019 10:47:30 +0000 (18:47 +0800)]
sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment

Jianlin reported a panic when running sctp gso over gre over vlan device:

  [   84.772930] RIP: 0010:do_csum+0x6d/0x170
  [   84.790605] Call Trace:
  [   84.791054]  csum_partial+0xd/0x20
  [   84.791657]  gre_gso_segment+0x2c3/0x390
  [   84.792364]  inet_gso_segment+0x161/0x3e0
  [   84.793071]  skb_mac_gso_segment+0xb8/0x120
  [   84.793846]  __skb_gso_segment+0x7e/0x180
  [   84.794581]  validate_xmit_skb+0x141/0x2e0
  [   84.795297]  __dev_queue_xmit+0x258/0x8f0
  [   84.795949]  ? eth_header+0x26/0xc0
  [   84.796581]  ip_finish_output2+0x196/0x430
  [   84.797295]  ? skb_gso_validate_network_len+0x11/0x80
  [   84.798183]  ? ip_finish_output+0x169/0x270
  [   84.798875]  ip_output+0x6c/0xe0
  [   84.799413]  ? ip_append_data.part.50+0xc0/0xc0
  [   84.800145]  iptunnel_xmit+0x144/0x1c0
  [   84.800814]  ip_tunnel_xmit+0x62d/0x930 [ip_tunnel]
  [   84.801699]  gre_tap_xmit+0xac/0xf0 [ip_gre]
  [   84.802395]  dev_hard_start_xmit+0xa5/0x210
  [   84.803086]  sch_direct_xmit+0x14f/0x340
  [   84.803733]  __dev_queue_xmit+0x799/0x8f0
  [   84.804472]  ip_finish_output2+0x2e0/0x430
  [   84.805255]  ? skb_gso_validate_network_len+0x11/0x80
  [   84.806154]  ip_output+0x6c/0xe0
  [   84.806721]  ? ip_append_data.part.50+0xc0/0xc0
  [   84.807516]  sctp_packet_transmit+0x716/0xa10 [sctp]
  [   84.808337]  sctp_outq_flush+0xd7/0x880 [sctp]

It was caused by SKB_GSO_CB(skb)->csum_start not set in sctp_gso_segment.
sctp_gso_segment() calls skb_segment() with 'feature | NETIF_F_HW_CSUM',
which causes SKB_GSO_CB(skb)->csum_start not to be set in skb_segment().

For TCP/UDP, when feature supports HW_CSUM, CHECKSUM_PARTIAL will be set
and gso_reset_checksum will be called to set SKB_GSO_CB(skb)->csum_start.

So SCTP should do the same as TCP/UDP, to call gso_reset_checksum() when
computing checksum in sctp_gso_segment.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Thu, 14 Feb 2019 00:13:47 +0000 (19:13 -0500)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Missing structure initialization in ebtables causes splat with
   32-bit user level on a 64-bit kernel, from Francesco Ruggeri.

2) Missing dependency on nf_defrag in IPVS IPv6 codebase, from
   Andrea Claudi.

3) Fix possible use-after-free from release path of target extensions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5-fixes-2019-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Thu, 14 Feb 2019 00:07:47 +0000 (19:07 -0500)]
Merge tag 'mlx5-fixes-2019-02-13' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-02-13

This series introduces some fixes to mlx5 driver.
For more information please see tag log below.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: XDP, fix redirect resources availability check
Saeed Mahameed [Tue, 12 Feb 2019 00:27:02 +0000 (16:27 -0800)]
net/mlx5e: XDP, fix redirect resources availability check

Currently mlx5 driver creates xdp redirect hw queues unconditionally on
netdevice open, This is great until someone starts redirecting XDP traffic
via ndo_xdp_xmit on mlx5 device and changes the device configuration at
the same time, this might cause crashes, since the other device's napi
is not aware of the mlx5 state change (resources un-availability).

To fix this we must synchronize with other devices napi's on the system.
Added a new flag under mlx5e_priv to determine XDP TX resources are
available, set/clear it up when necessary and use synchronize_rcu()
when the flag is turned off, so other napi's are in-sync with it, before
we actually cleanup the hw resources.

The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and
it is sufficient to determine if it safe to transmit or not. The other
two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become
unnecessary. Thus, they are removed from data path.

Fixes: 58b99ee3e3eb ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Fix a compilation warning in events.c
Tariq Toukan [Tue, 22 Jan 2019 12:18:04 +0000 (14:18 +0200)]
net/mlx5: Fix a compilation warning in events.c

Eliminate the following compilation warning:

drivers/net/ethernet/mellanox/mlx5/core/events.c: warning: 'error_str'
may be used uninitialized in this function [-Wuninitialized]:  => 238:3

Fixes: c2fb3db22d35 ("net/mlx5: Rework handling of port module events")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Mikhael Goikhman <migo@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: No command allowed when command interface is not ready
Huy Nguyen [Thu, 7 Feb 2019 15:22:56 +0000 (09:22 -0600)]
net/mlx5: No command allowed when command interface is not ready

When EEH is injected and PCI bus stalls, mlx5's pci error detect
function is called to deactivate the command interface and tear down
the device. The issue is that there can be a thread that already
passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
and stuck in the wait_func.

Solution:
Add function mlx5_cmd_flush to disable command interface and clear all
the pending commands. When device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
pending threads waiting for firmware commands completion are terminated.

Fixes: c1d4d2e92ad6 ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Fix NULL pointer derefernce in set channels error flow
Maria Pasechnik [Sun, 3 Feb 2019 15:55:09 +0000 (17:55 +0200)]
net/mlx5e: Fix NULL pointer derefernce in set channels error flow

New channels are applied to the priv channels only after they
are successfully opened. Then, the indirection table should be built
according to the new number of channels.
Currently, such build is preformed independently of whether the
channels opening is successful, and is not reverted on failure.

The bug is caused due to removal of rss params from channels struct
and moving it to priv struct. That change cause to independency between
channels and rss params.
This causes a crash on a later point, when accessing rqn of a non
existing channel.

This patch fixes it by moving the indirection table build right before
switching the priv channels to new channels struct, after the new set of
channels was successfully opened.

Fixes: bbeb53b8b2c9 ("net/mlx5e: Move RSS params to a dedicated struct")
Signed-off-by: Maria Pasechnik <mariap@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoMerge tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 13 Feb 2019 18:28:17 +0000 (10:28 -0800)]
Merge tag 'trace-v5.0-rc4' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "This fixes kprobes/uprobes dynamic processing of strings, where it
  processes the args but does not update the remaining length of the
  buffer that the string arguments will be placed in. It constantly
  passes in the total size of buffer used instead of passing in the
  remaining size of the buffer used.

  This could cause issues if the strings are larger than the max size of
  an event which could cause the strings to be written beyond what was
  reserved on the buffer"

* tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: probeevent: Correctly update remaining space in dynamic area

5 years agonetfilter: nft_compat: use-after-free when deleting targets
Pablo Neira Ayuso [Wed, 13 Feb 2019 12:03:53 +0000 (13:03 +0100)]
netfilter: nft_compat: use-after-free when deleting targets

Fetch pointer to module before target object is released.

Fixes: 29e3880109e3 ("netfilter: nf_tables: fix use-after-free when deleting compat expressions")
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agosignal: Restore the stop PTRACE_EVENT_EXIT
Eric W. Biederman [Tue, 12 Feb 2019 05:27:42 +0000 (23:27 -0600)]
signal: Restore the stop PTRACE_EVENT_EXIT

In the middle of do_exit() there is there is a call
"ptrace_event(PTRACE_EVENT_EXIT, code);" That call places the process
in TACKED_TRACED aka "(TASK_WAKEKILL | __TASK_TRACED)" and waits for
for the debugger to release the task or SIGKILL to be delivered.

Skipping past dequeue_signal when we know a fatal signal has already
been delivered resulted in SIGKILL remaining pending and
TIF_SIGPENDING remaining set.  This in turn caused the
scheduler to not sleep in PTACE_EVENT_EXIT as it figured
a fatal signal was pending.  This also caused ptrace_freeze_traced
in ptrace_check_attach to fail because it left a per thread
SIGKILL pending which is what fatal_signal_pending tests for.

This difference in signal state caused strace to report
strace: Exit of unknown pid NNNNN ignored

Therefore update the signal handling state like dequeue_signal
would when removing a per thread SIGKILL, by removing SIGKILL
from the per thread signal mask and clearing TIF_SIGPENDING.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Ivan Delalande <colona@arista.com>
Cc: stable@vger.kernel.org
Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
5 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 13 Feb 2019 01:15:33 +0000 (17:15 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
 "6 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: proc: smaps_rollup: fix pss_locked calculation
  Rename include/{uapi => }/asm-generic/shmparam.h really
  Revert "mm: use early_pfn_to_nid in page_ext_init"
  mm/gup: fix gup_pmd_range() for dax
  Revert "mm: slowly shrink slabs with a relatively small number of objects"
  Revert "mm: don't reclaim inodes with many attached pages"

5 years agomm: proc: smaps_rollup: fix pss_locked calculation
Sandeep Patil [Tue, 12 Feb 2019 23:36:11 +0000 (15:36 -0800)]
mm: proc: smaps_rollup: fix pss_locked calculation

The 'pss_locked' field of smaps_rollup was being calculated incorrectly.
It accumulated the current pss everytime a locked VMA was found.  Fix
that by adding to 'pss_locked' the same time as that of 'pss' if the vma
being walked is locked.

Link: http://lkml.kernel.org/r/20190203065425.14650-1-sspatil@android.com
Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Sandeep Patil <sspatil@android.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: <stable@vger.kernel.org> [4.14.x, 4.19.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoRename include/{uapi => }/asm-generic/shmparam.h really
Masahiro Yamada [Tue, 12 Feb 2019 23:36:06 +0000 (15:36 -0800)]
Rename include/{uapi => }/asm-generic/shmparam.h really

Commit 36c0f7f0f899 ("arch: unexport asm/shmparam.h for all
architectures") is different from the patch I submitted.

My patch is this:

  https://lore.kernel.org/lkml/1546904307-11124-1-git-send-email-yamada.masahiro@socionext.com/T/#u

The file renaming part:

  rename include/{uapi => }/asm-generic/shmparam.h (100%)

was lost when it was picked up.

I think it was an accident because Andrew did not say anything.

Link: http://lkml.kernel.org/r/1549158277-24558-1-git-send-email-yamada.masahiro@socionext.com
Fixes: 36c0f7f0f899 ("arch: unexport asm/shmparam.h for all architectures")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoRevert "mm: use early_pfn_to_nid in page_ext_init"
Qian Cai [Tue, 12 Feb 2019 23:36:03 +0000 (15:36 -0800)]
Revert "mm: use early_pfn_to_nid in page_ext_init"

This reverts commit fe53ca54270a ("mm: use early_pfn_to_nid in
page_ext_init").

When booting a system with "page_owner=on",

start_kernel
  page_ext_init
    invoke_init_callbacks
      init_section_page_ext
        init_page_owner
          init_early_allocated_pages
            init_zones_in_node
              init_pages_in_zone
                lookup_page_ext
                  page_to_nid

The issue here is that page_to_nid() will not work since some page flags
have no node information until later in page_alloc_init_late() due to
DEFERRED_STRUCT_PAGE_INIT.  Hence, it could trigger an out-of-bounds
access with an invalid nid.

  UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
  index 7 is out of range for type 'zone [5]'

Also, kernel will panic since flags were poisoned earlier with,

CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_NODE_NOT_IN_PAGE_FLAGS=n

start_kernel
  setup_arch
    pagetable_init
      paging_init
        sparse_init
          sparse_init_nid
            memblock_alloc_try_nid_raw

It did not handle it well in init_pages_in_zone() which ends up calling
page_to_nid().

  page:ffffea0004200000 is uninitialized and poisoned
  raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
  raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
  page_owner info is not active (free page?)
  kernel BUG at include/linux/mm.h:990!
  RIP: 0010:init_page_owner+0x486/0x520

This means that assumptions behind commit fe53ca54270a ("mm: use
early_pfn_to_nid in page_ext_init") are incomplete.  Therefore, revert
the commit for now.  A proper way to move the page_owner initialization
to sooner is to hook into memmap initialization.

Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/gup: fix gup_pmd_range() for dax
Yu Zhao [Tue, 12 Feb 2019 23:35:58 +0000 (15:35 -0800)]
mm/gup: fix gup_pmd_range() for dax

For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns true
on x86.  So the function works as long as hugetlb is configured.
However, dax doesn't depend on hugetlb.

Link: http://lkml.kernel.org/r/20190111034033.601-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoRevert "mm: slowly shrink slabs with a relatively small number of objects"
Dave Chinner [Tue, 12 Feb 2019 23:35:55 +0000 (15:35 -0800)]
Revert "mm: slowly shrink slabs with a relatively small number of objects"

This reverts commit 172b06c32b9497 ("mm: slowly shrink slabs with a
relatively small number of objects").

This change changes the agressiveness of shrinker reclaim, causing small
cache and low priority reclaim to greatly increase scanning pressure on
small caches.  As a result, light memory pressure has a disproportionate
affect on small caches, and causes large caches to be reclaimed much
faster than previously.

As a result, it greatly perturbs the delicate balance of the VFS caches
(dentry/inode vs file page cache) such that the inode/dentry caches are
reclaimed much, much faster than the page cache and this drives us into
several other caching imbalance related problems.

As such, this is a bad change and needs to be reverted.

[ Needs some massaging to retain the later seekless shrinker
  modifications.]

Link: http://lkml.kernel.org/r/20190130041707.27750-3-david@fromorbit.com
Fixes: 172b06c32b9497 ("mm: slowly shrink slabs with a relatively small number of objects")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoRevert "mm: don't reclaim inodes with many attached pages"
Dave Chinner [Tue, 12 Feb 2019 23:35:51 +0000 (15:35 -0800)]
Revert "mm: don't reclaim inodes with many attached pages"

This reverts commit a76cf1a474d7d ("mm: don't reclaim inodes with many
attached pages").

This change causes serious changes to page cache and inode cache
behaviour and balance, resulting in major performance regressions when
combining worklaods such as large file copies and kernel compiles.

  https://bugzilla.kernel.org/show_bug.cgi?id=202441

This change is a hack to work around the problems introduced by changing
how agressive shrinkers are on small caches in commit 172b06c32b94 ("mm:
slowly shrink slabs with a relatively small number of objects").  It
creates more problems than it solves, wasn't adequately reviewed or
tested, so it needs to be reverted.

Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com
Fixes: a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoMerge tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groec...
Linus Torvalds [Wed, 13 Feb 2019 00:29:33 +0000 (16:29 -0800)]
Merge tag 'hwmon-for-v5.0-rc7' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
 "Fix fan detection for NCT6793D"

* tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (nct6775) Fix fan6 detection for NCT6793D

5 years agoMerge tag 'riscv-for-linus-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 12 Feb 2019 21:33:48 +0000 (13:33 -0800)]
Merge tag 'riscv-for-linus-5.0-rc7' of git://git./linux/kernel/git/palmer/riscv-linux

Pull RISC-V fixes from Palmer Dabbelt:
 "This contains a pair of bug fixes that I'd like to include in 5.0:

   - A fix to disambiguate swap from invalid PTEs, which fixes an error
     when trying to unmap PROT_NONE pages.

   - A revert to an optimization of the size of flat binaries. This is
     really a workaround to prevent breaking existing boot flows, but
     since the change was introduced as part of the 5.0 merge window I'd
     like to have the fix in before 5.0 so we can avoid a regression for
     any proper releases.

  With these I hope we're out of patches for 5.0 in RISC-V land"

* tag 'riscv-for-linus-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  Revert "RISC-V: Make BSS section as the last section in vmlinux.lds.S"
  riscv: Add pte bit to distinguish swap from invalid

5 years agoteam: avoid complex list operations in team_nl_cmd_options_set()
Cong Wang [Tue, 12 Feb 2019 05:59:51 +0000 (21:59 -0800)]
team: avoid complex list operations in team_nl_cmd_options_set()

The current opt_inst_list operations inside team_nl_cmd_options_set()
is too complex to track:

    LIST_HEAD(opt_inst_list);
    nla_for_each_nested(...) {
        list_for_each_entry(opt_inst, &team->option_inst_list, list) {
            if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst))
                continue;
            list_add(&opt_inst->tmp_list, &opt_inst_list);
        }
    }
    team_nl_send_event_options_get(team, &opt_inst_list);

as while we retrieve 'opt_inst' from team->option_inst_list, it could
be added to the local 'opt_inst_list' for multiple times. The
__team_option_inst_tmp_find() doesn't work, as the setter
team_mode_option_set() still calls team->ops.exit() which uses
->tmp_list too in __team_options_change_check().

Simplify the list operations by moving the 'opt_inst_list' and
team_nl_send_event_options_get() into the nla_for_each_nested() loop so
that it can be guranteed that we won't insert a same list entry for
multiple times. Therefore, __team_option_inst_tmp_find() can be removed
too.

Fixes: 4fb0534fb7bb ("team: avoid adding twice the same option to the event list")
Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message")
Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net_sched-some-fixes-for-cls_tcindex'
David S. Miller [Tue, 12 Feb 2019 19:10:56 +0000 (14:10 -0500)]
Merge branch 'net_sched-some-fixes-for-cls_tcindex'

Cong Wang says:

====================
net_sched: some fixes for cls_tcindex

This patchset contains 3 bug fixes for tcindex filter. Please check
each patch for details.

v2: fix a compile error in patch 2
    drop netns refcnt in patch 1
====================

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
5 years agonet_sched: fix two more memory leaks in cls_tcindex
Cong Wang [Mon, 11 Feb 2019 21:06:16 +0000 (13:06 -0800)]
net_sched: fix two more memory leaks in cls_tcindex

struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.

For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.

For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet_sched: fix a memory leak in cls_tcindex
Cong Wang [Mon, 11 Feb 2019 21:06:15 +0000 (13:06 -0800)]
net_sched: fix a memory leak in cls_tcindex

When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.

This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.

As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet_sched: fix a race condition in tcindex_destroy()
Cong Wang [Mon, 11 Feb 2019 21:06:14 +0000 (13:06 -0800)]
net_sched: fix a race condition in tcindex_destroy()

tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.

Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.

Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.

Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'ena-races'
David S. Miller [Tue, 12 Feb 2019 19:05:07 +0000 (14:05 -0500)]
Merge branch 'ena-races'

Arthur Kiyanovski says:

====================
net: ena: race condition bug fix and version update

This patchset includes a fix to a race condition that can cause
kernel panic, as well as a driver version update because of this
fix.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ena: update driver version from 2.0.2 to 2.0.3
Arthur Kiyanovski [Mon, 11 Feb 2019 17:17:44 +0000 (19:17 +0200)]
net: ena: update driver version from 2.0.2 to 2.0.3

Update driver version due to bug fix.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ena: fix race between link up and device initalization
Arthur Kiyanovski [Mon, 11 Feb 2019 17:17:43 +0000 (19:17 +0200)]
net: ena: fix race between link up and device initalization

Fix race condition between ena_update_on_link_change() and
ena_restore_device().

This race can occur if link notification arrives while the driver
is performing a reset sequence. In this case link can be set up,
enabling the device, before it is fully restored. If packets are
sent at this time, the driver might access uninitialized data
structures, causing kernel crash.

Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
after ena_up() to ensure the device is ready when link is set up.

Fixes: d18e4f683445 ("net: ena: fix race condition between device reset and link up setup")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/packet: fix 4gb buffer limit due to overflow check
Kal Conley [Sun, 10 Feb 2019 08:57:11 +0000 (09:57 +0100)]
net/packet: fix 4gb buffer limit due to overflow check

When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow. Check it for overflow without limiting the total buffer
size to UINT_MAX.

This change fixes support for packet ring buffers >= UINT_MAX.

Fixes: 8f8d28e4d6d8 ("net/packet: fix overflow in check for tp_frame_nr")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet_diag: fix reporting cgroup classid and fallback to priority
Konstantin Khlebnikov [Sat, 9 Feb 2019 10:35:52 +0000 (13:35 +0300)]
inet_diag: fix reporting cgroup classid and fallback to priority

Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
extensions has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them included into response
unconditionally or hook into some of lower 8 bits.

Extension INET_DIAG_CLASS_ID has not way to request from the beginning.

This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.

Also this patch adds fallback to reporting socket priority. This filed
is more widely used for traffic classification because ipv4 sockets
automatically maps TOS to priority and default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY so
INET_DIAG_TOS isn't enough for predicting class.

Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).

So, after this patch INET_DIAG_CLASS_ID will report socket priority
for most common setup when net_cls isn't set and/or cgroup2 in use.

Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobatman-adv: fix uninit-value in batadv_interface_tx()
Eric Dumazet [Mon, 11 Feb 2019 22:41:22 +0000 (14:41 -0800)]
batman-adv: fix uninit-value in batadv_interface_tx()

KMSAN reported batadv_interface_tx() was possibly using a
garbage value [1]

batadv_get_vid() does have a pskb_may_pull() call
but batadv_interface_tx() does not actually make sure
this did not fail.

[1]
BUG: KMSAN: uninit-value in batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
CPU: 0 PID: 10006 Comm: syz-executor469 Not tainted 4.20.0-rc7+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x173/0x1d0 lib/dump_stack.c:113
 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
 __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
 batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
 __netdev_start_xmit include/linux/netdevice.h:4356 [inline]
 netdev_start_xmit include/linux/netdevice.h:4365 [inline]
 xmit_one net/core/dev.c:3257 [inline]
 dev_hard_start_xmit+0x607/0xc40 net/core/dev.c:3273
 __dev_queue_xmit+0x2e42/0x3bc0 net/core/dev.c:3843
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3876
 packet_snd net/packet/af_packet.c:2928 [inline]
 packet_sendmsg+0x8306/0x8f30 net/packet/af_packet.c:2953
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 __sys_sendto+0x8c4/0xac0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:1796
 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441889
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdda6fd468 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000441889
RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00007ffdda6fd4c0
R13: 00007ffdda6fd4b0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
 kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
 kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
 kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
 slab_post_alloc_hook mm/slab.h:446 [inline]
 slab_alloc_node mm/slub.c:2759 [inline]
 __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
 __kmalloc_reserve net/core/skbuff.c:137 [inline]
 __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
 alloc_skb include/linux/skbuff.h:998 [inline]
 alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5220
 sock_alloc_send_pskb+0xafd/0x10e0 net/core/sock.c:2083
 packet_alloc_skb net/packet/af_packet.c:2781 [inline]
 packet_snd net/packet/af_packet.c:2872 [inline]
 packet_sendmsg+0x661a/0x8f30 net/packet/af_packet.c:2953
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 __sys_sendto+0x8c4/0xac0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:1796
 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'sound-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Tue, 12 Feb 2019 18:18:08 +0000 (10:18 -0800)]
Merge tag 'sound-5.0-rc7' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "It's a bit of surprising that we've got more changes than hoped at
  this late stage, but they all don't look too scary but small fixes.

  One change in ALSA core side is again the PCM regression fix that was
  partially addressed for OSS, but now the all relevant change is
  reverted instead. Also, a few ASoC core fixes for UAF and OOB are
  included, while the rest are usual random device-specific fixes"

* tag 'sound-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: pcm: Revert capture stream behavior change in blocking mode
  ALSA: usb-audio: Fix implicit fb endpoint setup by quirk
  ALSA: hda - Add quirk for HP EliteBook 840 G5
  ASoC: samsung: Prevent clk_get_rate() calls in atomic context
  ASoC: rsnd: ssiu: correct shift bit for ssiu9
  ASoC: rsnd: fixup rsnd_ssi_master_clk_start() user count check
  ASoC: dapm: fix out-of-bounds accesses to DAPM lookup tables
  ASoC: topology: fix oops/use-after-free case with dai driver
  ASoC: rsnd: fixup MIX kctrl registration
  ASoC: core: Allow soc_find_component lookups to match parent of_node
  ASoC: rt5682: Correct the setting while select ASRC clk for AD/DA filter
  ASoC: MAINTAINERS: fsl: Change Fabio's email address
  ASoC: hdmi-codec: fix oops on re-probe

5 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 12 Feb 2019 17:57:45 +0000 (09:57 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fix from Ted Ts'o:
 "Revert a commit which landed in v5.0-rc1 since it makes fsync in ext4
  nojournal mode unsafe"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"

5 years agoipv6: propagate genlmsg_reply return code
Li RongQing [Mon, 11 Feb 2019 11:32:20 +0000 (19:32 +0800)]
ipv6: propagate genlmsg_reply return code

genlmsg_reply can fail, so propagate its return code

Fixes: 915d7e5e593 ("ipv6: sr: add code base for control plane support of SR-IPv6")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
Saeed Mahameed [Mon, 11 Feb 2019 16:04:17 +0000 (18:04 +0200)]
net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames

When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.

Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.

Future work: when reporting checksum complete is not an option for
IP non-TCP/UDP packets, we can actually fallback to report checksum
unnecessary, by looking at cqe IPOK bit.

Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phylink: avoid resolving link state too early
Russell King [Mon, 11 Feb 2019 15:04:24 +0000 (15:04 +0000)]
net: phylink: avoid resolving link state too early

During testing on Armada 388 platforms, it was found with a certain
module configuration that it was possible to trigger a kernel oops
during the module load process, caused by the phylink resolver being
triggered for a currently disabled interface.

This problem was introduced by changing the way the SFP registration
works, which now can result in the sfp link down notification being
called during phylink_create().

Fixes: b5bfc21af5cb ("net: sfp: do not probe SFP module before we're attached")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agogeneve: change NET_UDP_TUNNEL dependency to select
Matteo Croce [Mon, 11 Feb 2019 14:32:36 +0000 (15:32 +0100)]
geneve: change NET_UDP_TUNNEL dependency to select

Due to the depends on NET_UDP_TUNNEL, at the moment it is impossible to
compile GENEVE if no other protocol depending on NET_UDP_TUNNEL is
selected.

Fix this changing the depends to a select, and drop NET_IP_TUNNEL from the
select list, as it already depends on NET_UDP_TUNNEL.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosfc: initialise found bitmap in efx_ef10_mtd_probe
Bert Kenward [Tue, 12 Feb 2019 13:10:00 +0000 (13:10 +0000)]
sfc: initialise found bitmap in efx_ef10_mtd_probe

The bitmap of found partitions in efx_ef10_mtd_probe was not
initialised, causing partitions to be suppressed based off whatever
value was in the bitmap at the start.

Fixes: 3366463513f5 ("sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mac80211-for-davem-2019-02-12' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Tue, 12 Feb 2019 16:43:19 +0000 (11:43 -0500)]
Merge tag 'mac80211-for-davem-2019-02-12' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just a few fixes:
 * aggregation session teardown with internal TXQs was
   continuing to send some frames marked as aggregation,
   fix from Ilan
 * IBSS join was missed during firmware restart, should
   such a thing happen
 * speculative execution based on the return value of
   cfg80211_classify8021d() - which is controlled by the
   sender of the packet - could be problematic in some
   code using it, prevent it
 * a few peer measurement fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipvs: fix dependency on nf_defrag_ipv6
Andrea Claudi [Mon, 11 Feb 2019 15:14:39 +0000 (16:14 +0100)]
ipvs: fix dependency on nf_defrag_ipv6

ipvs relies on nf_defrag_ipv6 module to manage IPv6 fragmentation,
but lacks proper Kconfig dependencies and does not explicitly
request defrag features.

As a result, if netfilter hooks are not loaded, when IPv6 fragmented
packet are handled by ipvs only the first fragment makes through.

Fix it properly declaring the dependency on Kconfig and registering
netfilter hooks on ip_vs_add_service() and ip_vs_new_dest().

Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agotipc: fix link session and re-establish issues
Tuong Lien [Mon, 11 Feb 2019 06:29:43 +0000 (13:29 +0700)]
tipc: fix link session and re-establish issues

When a link endpoint is re-created (e.g. after a node reboot or
interface reset), the link session number is varied by random, the peer
endpoint will be synced with this new session number before the link is
re-established.

However, there is a shortcoming in this mechanism that can lead to the
link never re-established or faced with a failure then. It happens when
the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as
well as the 'in_session' flag have been set, but suddenly this link
endpoint leaves. When it comes back with a random session number, there
are two situations possible:

1/ If the random session number is larger than (or equal to) the
previous one, the peer endpoint will be updated with this new session
upon receipt of a RESET_MSG from this endpoint, and the link can be re-
established as normal. Otherwise, all the RESET_MSGs from this endpoint
will be rejected by the peer. In turn, when this link endpoint receives
one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
to send STATE_MSGs, but again these messages will be dropped by the
peer due to wrong session.
The peer link endpoint can still become ESTABLISHED after receiving a
traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
will be forced down sooner or later!

Even in case the random session number is larger than the previous one,
it can be that the ACTIVATE_MSG from the peer arrives first, and this
link endpoint moves quickly to ESTABLISHED without sending out any
RESET_MSG yet. Consequently, the peer link will not be updated with the
new session number, and the same link failure scenario as above will
happen.

2/ Another situation can be that, the peer link endpoint was reset due
to any reasons in the meantime, its link state was set to RESET from
ESTABLISHING but still in session, i.e. the 'in_session' flag is not
reset...
Now, if the random session number from this endpoint is less than the
previous one, all the RESET_MSGs from this endpoint will be rejected by
the peer. In the other direction, when this link endpoint receives a
RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
As a result, the link cannot be re-established but gets stuck with this
link endpoint in state ESTABLISHING and the peer in RESET!

Solution:

===========

This link endpoint should not go directly to ESTABLISHED when getting
ACTIVATE_MSG from the peer which may belong to the old session if the
link was re-created. To ensure the session to be correct before the
link is re-established, the peer endpoint in ESTABLISHING state will
send back the last session number in ACTIVATE_MSG for a verification at
this endpoint. Then, if needed, a new and more appropriate session
number will be regenerated to force a re-synch first.

In addition, when a link in ESTABLISHING state is reset, its state will
move to RESET according to the link FSM, along with resetting the
'in_session' flag (and the other data) as a normal link reset, it will
also be deleted if requested.

The solution is backward compatible.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fix IPv6 prefix route residue
Zhiqiang Liu [Mon, 11 Feb 2019 02:57:46 +0000 (10:57 +0800)]
net: fix IPv6 prefix route residue

Follow those steps:
 # ip addr add 2001:123::1/32 dev eth0
 # ip addr add 2001:123:456::2/64 dev eth0
 # ip addr del 2001:123::1/32 dev eth0
 # ip addr del 2001:123:456::2/64 dev eth0
and then prefix route of 2001:123::1/32 will still exist.

This is because ipv6_prefix_equal in check_cleanup_prefix_route
func does not check whether two IPv6 addresses have the same
prefix length. If the prefix of one address starts with another
shorter address prefix, even though their prefix lengths are
different, the return value of ipv6_prefix_equal is true.

Here I add a check of whether two addresses have the same prefix
to decide whether their prefixes are equal.

Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE")
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: fix skb may be leaky in tipc_link_input
Hoang Le [Mon, 11 Feb 2019 02:18:28 +0000 (09:18 +0700)]
tipc: fix skb may be leaky in tipc_link_input

When we free skb at tipc_data_input, we return a 'false' boolean.
Then, skb passed to subcalling tipc_link_input in tipc_link_rcv,

<snip>
1303 int tipc_link_rcv:
...
1354    if (!tipc_data_input(l, skb, l->inputq))
1355        rc |= tipc_link_input(l, skb, l->inputq);
</snip>

Fix it by simple changing to a 'true' boolean when skb is being free-ed.
Then, tipc_link_rcv will bypassed to subcalling tipc_link_input as above
condition.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <maloy@donjonn.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetfilter: compat: initialize all fields in xt_init
Francesco Ruggeri [Sun, 10 Feb 2019 19:58:29 +0000 (11:58 -0800)]
netfilter: compat: initialize all fields in xt_init

If a non zero value happens to be in xt[NFPROTO_BRIDGE].cur at init
time, the following panic can be caused by running

% ebtables -t broute -F BROUTING

from a 32-bit user level on a 64-bit kernel. This patch replaces
kmalloc_array with kcalloc when allocating xt.

[  474.680846] BUG: unable to handle kernel paging request at 0000000009600920
[  474.687869] PGD 2037006067 P4D 2037006067 PUD 2038938067 PMD 0
[  474.693838] Oops: 0000 [#1] SMP
[  474.697055] CPU: 9 PID: 4662 Comm: ebtables Kdump: loaded Not tainted 4.19.17-11302235.AroraKernelnext.fc18.x86_64 #1
[  474.707721] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.0 06/28/2013
[  474.714313] RIP: 0010:xt_compat_calc_jump+0x2f/0x63 [x_tables]
[  474.720201] Code: 40 0f b6 ff 55 31 c0 48 6b ff 70 48 03 3d dc 45 00 00 48 89 e5 8b 4f 6c 4c 8b 47 60 ff c9 39 c8 7f 2f 8d 14 08 d1 fa 48 63 fa <41> 39 34 f8 4c 8d 0c fd 00 00 00 00 73 05 8d 42 01 eb e1 76 05 8d
[  474.739023] RSP: 0018:ffffc9000943fc58 EFLAGS: 00010207
[  474.744296] RAX: 0000000000000000 RBX: ffffc90006465000 RCX: 0000000002580249
[  474.751485] RDX: 00000000012c0124 RSI: fffffffff7be17e9 RDI: 00000000012c0124
[  474.758670] RBP: ffffc9000943fc58 R08: 0000000000000000 R09: ffffffff8117cf8f
[  474.765855] R10: ffffc90006477000 R11: 0000000000000000 R12: 0000000000000001
[  474.773048] R13: 0000000000000000 R14: ffffc9000943fcb8 R15: ffffc9000943fcb8
[  474.780234] FS:  0000000000000000(0000) GS:ffff88a03f840000(0063) knlGS:00000000f7ac7700
[  474.788612] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[  474.794632] CR2: 0000000009600920 CR3: 0000002037422006 CR4: 00000000000606e0
[  474.802052] Call Trace:
[  474.804789]  compat_do_replace+0x1fb/0x2a3 [ebtables]
[  474.810105]  compat_do_ebt_set_ctl+0x69/0xe6 [ebtables]
[  474.815605]  ? try_module_get+0x37/0x42
[  474.819716]  compat_nf_setsockopt+0x4f/0x6d
[  474.824172]  compat_ip_setsockopt+0x7e/0x8c
[  474.828641]  compat_raw_setsockopt+0x16/0x3a
[  474.833220]  compat_sock_common_setsockopt+0x1d/0x24
[  474.838458]  __compat_sys_setsockopt+0x17e/0x1b1
[  474.843343]  ? __check_object_size+0x76/0x19a
[  474.847960]  __ia32_compat_sys_socketcall+0x1cb/0x25b
[  474.853276]  do_fast_syscall_32+0xaf/0xf6
[  474.857548]  entry_SYSENTER_compat+0x6b/0x7a

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agoRevert "RISC-V: Make BSS section as the last section in vmlinux.lds.S"
Palmer Dabbelt [Fri, 8 Feb 2019 17:11:08 +0000 (09:11 -0800)]
Revert "RISC-V: Make BSS section as the last section in vmlinux.lds.S"

At least BBL relies on the flat binaries containing all the bytes in the
actual image to exist in the file.  Before this revert the flat images
dropped the trailing zeros, which caused BBL to put its copy of the
device tree where Linux thought the BSS was, which wreaks all sorts of
havoc.  Manifesting the bug is a bit subtle because BBL aligns
everything to 2MiB page boundaries, but with large enough kernels you're
almost certain to get bitten by the bug.

While moving the sections around isn't a great long-term fix, it will at
least avoid producing broken images.

This reverts commit 22e6a2e14cb8ebcae059488cf24e778e4058c2bf.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 years agoriscv: Add pte bit to distinguish swap from invalid
Stefan O'Rear [Sun, 16 Dec 2018 18:03:36 +0000 (13:03 -0500)]
riscv: Add pte bit to distinguish swap from invalid

Previously, invalid PTEs and swap PTEs had the same binary
representation, causing errors when attempting to unmap PROT_NONE
mappings, including implicit unmap on exit.

Typical error:

swap_info_get: Bad swap file entry 40000000007a9879
BUG: Bad page map in process a.out  pte:3d4c3cc0 pmd:3e521401

Cc: stable@vger.kernel.org
Signed-off-by: Stefan O'Rear <sorear2@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
5 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux...
Linus Torvalds [Mon, 11 Feb 2019 21:30:50 +0000 (13:30 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal

Pull thermal SoC management fixes from Eduardo Valentin:
 "Minor fixes on of-thermal and cpu cooling"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: cpu_cooling: Clarify error message
  thermal: of-thermal: Print name of device node with error

5 years agonet/x25: do not hold the cpu too long in x25_new_lci()
Eric Dumazet [Fri, 8 Feb 2019 20:41:05 +0000 (12:41 -0800)]
net/x25: do not hold the cpu too long in x25_new_lci()

Due to quadratic behavior of x25_new_lci(), syzbot was able
to trigger an rcu stall.

Fix this by not blocking BH for the whole duration of
the function, and inserting a reschedule point when possible.

If we care enough, using a bitmap could get rid of the quadratic
behavior.

syzbot report :

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    0-...!: (10500 ticks this GP) idle=4fa/1/0x4000000000000002 softirq=283376/283376 fqs=0
rcu:     (t=10501 jiffies g=383105 q=136)
rcu: rcu_preempt kthread starved for 10502 jiffies! g383105 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu: RCU grace-period kthread stack dump:
rcu_preempt     I28928    10      2 0x80000000
Call Trace:
 context_switch kernel/sched/core.c:2844 [inline]
 __schedule+0x817/0x1cc0 kernel/sched/core.c:3485
 schedule+0x92/0x180 kernel/sched/core.c:3529
 schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
 rcu_gp_fqs_loop kernel/rcu/tree.c:1948 [inline]
 rcu_gp_kthread+0x956/0x17a0 kernel/rcu/tree.c:2105
 kthread+0x357/0x430 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
NMI backtrace for cpu 0
CPU: 0 PID: 8759 Comm: syz-executor2 Not tainted 5.0.0-rc4+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1211
 print_cpu_stall kernel/rcu/tree.c:1348 [inline]
 check_cpu_stall kernel/rcu/tree.c:1422 [inline]
 rcu_pending kernel/rcu/tree.c:3018 [inline]
 rcu_check_callbacks.cold+0x500/0xa4a kernel/rcu/tree.c:2521
 update_process_times+0x32/0x80 kernel/time/timer.c:1635
 tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
 tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
 __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
 __hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
 hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
 smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
 </IRQ>
RIP: 0010:__read_once_size include/linux/compiler.h:193 [inline]
RIP: 0010:queued_write_lock_slowpath+0x13e/0x290 kernel/locking/qrwlock.c:86
Code: 00 00 fc ff df 4c 8d 2c 01 41 83 c7 03 41 0f b6 45 00 41 38 c7 7c 08 84 c0 0f 85 0c 01 00 00 8b 03 3d 00 01 00 00 74 1a f3 90 <41> 0f b6 55 00 41 38 d7 7c eb 84 d2 74 e7 48 89 df e8 6c 0f 4f 00
RSP: 0018:ffff88805f117bd8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000300 RBX: ffffffff89413ba0 RCX: 1ffffffff1282774
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff89413ba0
RBP: ffff88805f117c70 R08: 1ffffffff1282774 R09: fffffbfff1282775
R10: fffffbfff1282774 R11: ffffffff89413ba3 R12: 00000000000000ff
R13: fffffbfff1282774 R14: 1ffff1100be22f7d R15: 0000000000000003
 queued_write_lock include/asm-generic/qrwlock.h:104 [inline]
 do_raw_write_lock+0x1d6/0x290 kernel/locking/spinlock_debug.c:203
 __raw_write_lock_bh include/linux/rwlock_api_smp.h:204 [inline]
 _raw_write_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:312
 x25_insert_socket+0x21/0xe0 net/x25/af_x25.c:267
 x25_bind+0x273/0x340 net/x25/af_x25.c:705
 __sys_bind+0x23f/0x290 net/socket.c:1505
 __do_sys_bind net/socket.c:1516 [inline]
 __se_sys_bind net/socket.c:1514 [inline]
 __x64_sys_bind+0x73/0xb0 net/socket.c:1514
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e39
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fafccd0dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e39
RDX: 0000000000000012 RSI: 0000000020000240 RDI: 0000000000000004
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fafccd0e6d4
R13: 00000000004bdf8b R14: 00000000004ce4b8 R15: 00000000ffffffff
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 8752 Comm: syz-executor4 Not tainted 5.0.0-rc4+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__x25_find_socket+0x78/0x120 net/x25/af_x25.c:328
Code: 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 a6 00 00 00 4d 8b 64 24 68 4d 85 e4 74 7f e8 03 97 3d fb 49 83 ec 68 74 74 e8 f8 96 3d fb <49> 8d bc 24 88 04 00 00 48 89 f8 48 c1 e8 03 0f b6 04 18 84 c0 74
RSP: 0018:ffff8880639efc58 EFLAGS: 00000246
RAX: 0000000000040000 RBX: dffffc0000000000 RCX: ffffc9000e677000
RDX: 0000000000040000 RSI: ffffffff863244b8 RDI: ffff88806a764628
RBP: ffff8880639efc80 R08: ffff8880a80d05c0 R09: fffffbfff1282775
R10: fffffbfff1282774 R11: ffffffff89413ba3 R12: ffff88806a7645c0
R13: 0000000000000001 R14: ffff88809f29ac00 R15: 0000000000000000
FS:  00007fe8d0c58700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32823000 CR3: 00000000672eb000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 x25_new_lci net/x25/af_x25.c:357 [inline]
 x25_connect+0x374/0xdf0 net/x25/af_x25.c:786
 __sys_connect+0x266/0x330 net/socket.c:1686
 __do_sys_connect net/socket.c:1697 [inline]
 __se_sys_connect net/socket.c:1694 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1694
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e39
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fe8d0c57c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e39
RDX: 0000000000000012 RSI: 0000000020000200 RDI: 0000000000000004
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe8d0c586d4
R13: 00000000004be378 R14: 00000000004ceb00 R15: 00000000ffffffff

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotracing: probeevent: Correctly update remaining space in dynamic area
Andreas Ziegler [Wed, 6 Feb 2019 19:00:13 +0000 (20:00 +0100)]
tracing: probeevent: Correctly update remaining space in dynamic area

Commit 9178412ddf5a ("tracing: probeevent: Return consumed
bytes of dynamic area") improved the string fetching
mechanism by returning the number of required bytes after
copying the argument to the dynamic area. However, this
return value is now only used to increment the pointer
inside the dynamic area but misses updating the 'maxlen'
variable which indicates the remaining space in the dynamic
area.

This means that fetch_store_string() always reads the *total*
size of the dynamic area from the data_loc pointer instead of
the *remaining* size (and passes it along to
strncpy_from_{user,unsafe}) even if we're already about to
copy data into the middle of the dynamic area.

Link: http://lkml.kernel.org/r/20190206190013.16405-1-andreas.ziegler@fau.de
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 9178412ddf5a ("tracing: probeevent: Return consumed bytes of dynamic area")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
5 years agovxlan: test dev->flags & IFF_UP before calling netif_rx()
Eric Dumazet [Thu, 7 Feb 2019 20:27:38 +0000 (12:27 -0800)]
vxlan: test dev->flags & IFF_UP before calling netif_rx()

netif_rx() must be called under a strict contract.

At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.

Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP

Virtual drivers do not have this guarantee, and must
therefore make the check themselves.

Otherwise we risk use-after-free and/or crashes.

Note this patch also fixes a small issue that came
with commit ce6502a8f957 ("vxlan: fix a use after free
in vxlan_encap_bypass"), since the dev->stats.rx_dropped
change was done on the wrong device.

Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Fixes: ce6502a8f957 ("vxlan: fix a use after free in vxlan_encap_bypass")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Machata <petrm@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoDocumentation: bring operstate documentation up-to-date
Jouke Witteveen [Thu, 7 Feb 2019 16:14:32 +0000 (17:14 +0100)]
Documentation: bring operstate documentation up-to-date

Netlink has moved from bitmasks to group numbers long ago.

Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Mon, 11 Feb 2019 18:42:50 +0000 (10:42 -0800)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Out-of-bound access to packet data from the snmp nat helper,
   from Jann Horn.

2) ICMP(v6) error packets are set as related traffic by conntrack,
   update protocol number before calling nf_nat_ipv4_manip_pkt()
   to use ICMP(v6) rather than the original protocol number,
   from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 's390-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Mon, 11 Feb 2019 18:28:48 +0000 (10:28 -0800)]
Merge tag 's390-5.0-3' of git://git./linux/kernel/git/s390/linux

Pull s390 bug fixes from Martin Schwidefsky:

 - Fix specification exception on z196 during ap probe

 - A fix for suspend-to-disk, the VMAP stack patch broke the
   swsusp_arch_suspend function

 - The EMC CKD ioctl of the dasd driver needs an additional size check
   for user space data

 - Revert an incorrect patch for the PCI base code that removed a bit
   lock that turned out to be required after all

* tag 's390-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  Revert "s390/pci: remove bit_lock usage in interrupt handler"
  s390/zcrypt: fix specification exception on z196 during ap probe
  s390/dasd: fix using offset into zero size array error
  s390/suspend: fix stack setup in swsusp_arch_suspend

5 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88...
Linus Torvalds [Mon, 11 Feb 2019 18:19:33 +0000 (10:19 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mattst88/alpha

Pull alpha fixes from Matt Turner:
 "A few changes for alpha, including a build fix, a fix for the Eiger
  platform, and a fix for a tricky bug uncovered by the strace test
  suite that has existed since at least 1997 (v2.1.32)!"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  alpha: fix page fault handling for r16-r18 targets
  alpha: Fix Eiger NR_IRQS to 128
  tools uapi: fix Alpha support

5 years agoDocumentation: Fix grammatical error in sysctl/fs.txt & clarify negative dentry
Waiman Long [Thu, 7 Feb 2019 20:15:42 +0000 (15:15 -0500)]
Documentation: Fix grammatical error in sysctl/fs.txt & clarify negative dentry

Fix a grammatical error in the dentry-state text and clarify the usage
of negative dentries.

Fixes: af0c9af1b3f66 ("fs/dcache: Track & report number of negative dentries")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agonetfilter: nat: fix spurious connection timeouts
Florian Westphal [Fri, 8 Feb 2019 15:39:52 +0000 (16:39 +0100)]
netfilter: nat: fix spurious connection timeouts

Sander Eikelenboom bisected a NAT related regression down
to the l4proto->manip_pkt indirection removal.

I forgot that ICMP(v6) errors (e.g. PKTTOOBIG) can be set as related
to the existing conntrack entry.

Therefore, when passing the skb to nf_nat_ipv4/6_manip_pkt(), that
ended up calling the wrong l4 manip function, as tuple->dst.protonum
is the original flows l4 protocol (TCP, UDP, etc).

Set the dst protocol field to ICMP(v6), we already have a private copy
of the tuple due to the inversion of src/dst.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Fixes: faec18dbb0405 ("netfilter: nat: remove l4proto->manip_pkt")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agonetfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs
Jann Horn [Wed, 6 Feb 2019 21:56:15 +0000 (22:56 +0100)]
netfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs

The generic ASN.1 decoder infrastructure doesn't guarantee that callbacks
will get as much data as they expect; callbacks have to check the `datalen`
parameter before looking at `data`. Make sure that snmp_version() and
snmp_helper() don't read/write beyond the end of the packet data.

(Also move the assignment to `pdata` down below the check to make it clear
that it isn't necessarily a pointer we can use before the `datalen` check.)

Fixes: cc2d58634e0f ("netfilter: nf_nat_snmp_basic: use asn1 decoder library")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5 years agomac80211: Fix Tx aggregation session tear down with ITXQs
Ilan Peer [Wed, 6 Feb 2019 11:17:21 +0000 (13:17 +0200)]
mac80211: Fix Tx aggregation session tear down with ITXQs

When mac80211 requests the low level driver to stop an ongoing
Tx aggregation, the low level driver is expected to call
ieee80211_stop_tx_ba_cb_irqsafe() to indicate that it is ready
to stop the session. The callback in turn schedules a worker
to complete the session tear down, which in turn also handles
the relevant state for the intermediate Tx queue.

However, as this flow in asynchronous, the intermediate queue
should be stopped and not continue servicing frames, as in
such a case frames that are dequeued would be marked as part
of an aggregation, although the aggregation is already been
stopped.

Fix this by stopping the intermediate Tx queue, before
calling the low level driver to stop the Tx aggregation.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 years agocfg80211: prevent speculation on cfg80211_classify8021d() return
Johannes Berg [Wed, 6 Feb 2019 11:17:14 +0000 (13:17 +0200)]
cfg80211: prevent speculation on cfg80211_classify8021d() return

It's possible that the caller of cfg80211_classify8021d() uses the
value to index an array, like mac80211 in ieee80211_downgrade_queue().
Prevent speculation on the return value.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 years agocfg80211: pmsr: record netlink port ID
Johannes Berg [Wed, 6 Feb 2019 11:17:07 +0000 (13:17 +0200)]
cfg80211: pmsr: record netlink port ID

Without recording the netlink port ID, we cannot return the
results or complete messages to userspace, nor will we be
able to abort if the socket is closed, so clearly we need
to fill the value.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 years agonl80211: Fix FTM per burst maximum value
Aviya Erenfeld [Wed, 6 Feb 2019 11:17:08 +0000 (13:17 +0200)]
nl80211: Fix FTM per burst maximum value

Fix FTM per burst maximum value from 15 to 31
(The maximal bits that represents that number in the frame
is 5 hence a maximal value of 31)

Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 years agomac80211: call drv_ibss_join() on restart
Johannes Berg [Wed, 6 Feb 2019 11:17:12 +0000 (13:17 +0200)]
mac80211: call drv_ibss_join() on restart

If a driver does any significant activity in its ibss_join method,
then it will very well expect that to be called during restart,
before any stations are added. Do that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 years agoalpha: fix page fault handling for r16-r18 targets
Sergei Trofimovich [Mon, 31 Dec 2018 11:53:55 +0000 (11:53 +0000)]
alpha: fix page fault handling for r16-r18 targets

Fix page fault handling code to fixup r16-r18 registers.
Before the patch code had off-by-two registers bug.
This bug caused overwriting of ps,pc,gp registers instead
of fixing intended r16,r17,r18 (see `struct pt_regs`).

More details:

Initially Dmitry noticed a kernel bug as a failure
on strace test suite. Test passes unmapped userspace
pointer to io_submit:

```c
    #include <err.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <asm/unistd.h>
    int main(void)
    {
        unsigned long ctx = 0;
        if (syscall(__NR_io_setup, 1, &ctx))
            err(1, "io_setup");
        const size_t page_size = sysconf(_SC_PAGESIZE);
        const size_t size = page_size * 2;
        void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (MAP_FAILED == ptr)
            err(1, "mmap(%zu)", size);
        if (munmap(ptr, size))
            err(1, "munmap");
        syscall(__NR_io_submit, ctx, 1, ptr + page_size);
        syscall(__NR_io_destroy, ctx);
        return 0;
    }
```

Running this test causes kernel to crash when handling page fault:

```
    Unable to handle kernel paging request at virtual address ffffffffffff9468
    CPU 3
    aio(26027): Oops 0
    pc = [<fffffc00004eddf8>]  ra = [<fffffc00004edd5c>]  ps = 0000    Not tainted
    pc is at sys_io_submit+0x108/0x200
    ra is at sys_io_submit+0x6c/0x200
    v0 = fffffc00c58e6300  t0 = fffffffffffffff2  t1 = 000002000025e000
    t2 = fffffc01f159fef8  t3 = fffffc0001009640  t4 = fffffc0000e0f6e0
    t5 = 0000020001002e9e  t6 = 4c41564e49452031  t7 = fffffc01f159c000
    s0 = 0000000000000002  s1 = 000002000025e000  s2 = 0000000000000000
    s3 = 0000000000000000  s4 = 0000000000000000  s5 = fffffffffffffff2
    s6 = fffffc00c58e6300
    a0 = fffffc00c58e6300  a1 = 0000000000000000  a2 = 000002000025e000
    a3 = 00000200001ac260  a4 = 00000200001ac1e8  a5 = 0000000000000001
    t8 = 0000000000000008  t9 = 000000011f8bce30  t10= 00000200001ac440
    t11= 0000000000000000  pv = fffffc00006fd320  at = 0000000000000000
    gp = 0000000000000000  sp = 00000000265fd174
    Disabling lock debugging due to kernel taint
    Trace:
    [<fffffc0000311404>] entSys+0xa4/0xc0
```

Here `gp` has invalid value. `gp is s overwritten by a fixup for the
following page fault handler in `io_submit` syscall handler:

```
    __se_sys_io_submit
    ...
        ldq     a1,0(t1)
        bne     t0,4280 <__se_sys_io_submit+0x180>
```

After a page fault `t0` should contain -EFALUT and `a1` is 0.
Instead `gp` was overwritten in place of `a1`.

This happens due to a off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time as `gp` is one
of scratch registers. Any kernel function call would re-calculate `gp`.

Dmitry tracked down the bug origin back to 2.1.32 kernel version
where trap_a{0,1,2} fields were inserted into struct pt_regs.
And even before that `dpf_reg()` contained off-by-one error.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reported-and-reviewed-by: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: stable@vger.kernel.org # v2.1.32+
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
5 years agoalpha: Fix Eiger NR_IRQS to 128
Meelis Roos [Fri, 12 Oct 2018 09:27:51 +0000 (12:27 +0300)]
alpha: Fix Eiger NR_IRQS to 128

Eiger machine vector definition has nr_irqs 128, and working 2.6.26
boot shows SCSI getting IRQ-s 64 and 65. Current kernel boot fails
because Symbios SCSI fails to request IRQ-s and does not find the disks.
It has been broken at least since 3.18 - the earliest I could test with
my gcc-5.

The headers have moved around and possibly another order of defines has
worked in the past - but since 128 seems to be correct and used, fix
arch/alpha/include/asm/irq.h to have NR_IRQS=128 for Eiger.

This fixes 4.19-rc7 boot on my Force Flexor A264 (Eiger subarch).

Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Matt Turner <mattst88@gmail.com>
5 years agotools uapi: fix Alpha support
Bob Tracy [Tue, 22 Jan 2019 05:09:14 +0000 (21:09 -0800)]
tools uapi: fix Alpha support

Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Bob Tracy <rct@frus.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
5 years agoLinux 5.0-rc6
Linus Torvalds [Sun, 10 Feb 2019 22:42:20 +0000 (14:42 -0800)]
Linux 5.0-rc6

5 years agoMerge branch 'r8169-revert-two-commits-due-to-a-regression'
David S. Miller [Sun, 10 Feb 2019 20:54:49 +0000 (12:54 -0800)]
Merge branch 'r8169-revert-two-commits-due-to-a-regression'

Heiner Kallweit says:

====================
r8169: revert two commits due to a regression

Sander reported a regression (kernel panic, see[1]), therefore let's
revert these commits. Removal of the barriers doesn't seem to
contribute to the issue, the patch just overlaps with the problematic
one and only reverting both patches was tested.

[1] https://marc.info/?t=154965066400001&r=1&w=2

v2:
- improve commit message
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "r8169: make use of xmit_more and __netdev_sent_queue"
Heiner Kallweit [Sun, 10 Feb 2019 14:28:04 +0000 (15:28 +0100)]
Revert "r8169: make use of xmit_more and __netdev_sent_queue"

This reverts commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356.

Sander reported a regression causing a kernel panic[1],
therefore let's revert this commit.

[1] https://marc.info/?t=154965066400001&r=1&w=2

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "r8169: remove unneeded mmiowb barriers"
Heiner Kallweit [Sun, 10 Feb 2019 14:26:07 +0000 (15:26 +0100)]
Revert "r8169: remove unneeded mmiowb barriers"

This reverts commit bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3.

There doesn't seem to be anything wrong with this patch,
it's just reverted to get a stable baseline again.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'dmaengine-fix-5.0-rc6' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Sun, 10 Feb 2019 18:39:37 +0000 (10:39 -0800)]
Merge tag 'dmaengine-fix-5.0-rc6' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 - Fix in at_xdmac fr wrongful channel state
 - Fix for imx driver for wrong callback invocation
 - Fix to bcm driver for interrupt race & transaction abort.
 - Fix in dmatest to abort in mapping error

* tag 'dmaengine-fix-5.0-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: dmatest: Abort test in case of mapping error
  dmaengine: bcm2835: Fix abort of transactions
  dmaengine: bcm2835: Fix interrupt race on RT
  dmaengine: imx-dma: fix wrong callback invoke
  dmaengine: at_xdmac: Fix wrongfull report of a channel as in use

5 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Feb 2019 17:57:42 +0000 (09:57 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "A handful of fixes:

   - Fix an MCE corner case bug/crash found via MCE injection testing

   - Fix 5-level paging boot crash

   - Fix MCE recovery cache invalidation bug

   - Fix regression on Xen guests caused by a recent PMD level mremap
     speedup optimization"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Make set_pmd_at() paravirt aware
  x86/mm/cpa: Fix set_mce_nospec()
  x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting
  x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()

5 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Feb 2019 17:54:19 +0000 (09:54 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:
 "irqchip driver fixes: most of them are race fixes for ARM GIC (General
  Interrupt Controller) variants, but also a fix for the ARM MMP
  (Marvell PXA168 et al) irqchip affecting OLPC keyboards"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Fix ITT_entry_size accessor
  irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
  irqchip/gic-v3-its: Gracefully fail on LPI exhaustion
  irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID
  irqchip/gic-v4: Fix occasional VLPI drop

5 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Feb 2019 17:48:18 +0000 (09:48 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "A couple of kernel side fixes:

   - Fix the Intel uncore driver on certain hardware configurations

   - Fix a CPU hotplug related memory allocation bug

   - Remove a spurious WARN()

  ... plus also a handful of perf tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf script python: Add Python3 support to tests/attr.py
  perf trace: Support multiple "vfs_getname" probes
  perf symbols: Filter out hidden symbols from labels
  perf symbols: Add fallback definitions for GELF_ST_VISIBILITY()
  tools headers uapi: Sync linux/in.h copy from the kernel sources
  perf clang: Do not use 'return std::move(something)'
  perf mem/c2c: Fix perf_mem_events to support powerpc
  perf tests evsel-tp-sched: Fix bitwise operator
  perf/core: Don't WARN() for impossible ring-buffer sizes
  perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()
  perf/x86/intel/uncore: Add Node ID mask

5 years agoMerge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Feb 2019 17:44:52 +0000 (09:44 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:
 "An rtmutex (PI-futex) deadlock scenario fix, plus a locking
  documentation fix"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Handle early deadlock return correctly
  futex: Fix barrier comment

5 years agox86/mm: Make set_pmd_at() paravirt aware
Juergen Gross [Sun, 10 Feb 2019 07:40:56 +0000 (08:40 +0100)]
x86/mm: Make set_pmd_at() paravirt aware

set_pmd_at() calls native_set_pmd() unconditionally on x86. This was
fine as long as only huge page entries were written via set_pmd_at(),
as Xen pv guests don't support those.

Commit 2c91bd4a4e2e53 ("mm: speed up mremap by 20x on large regions")
introduced a usage of set_pmd_at() possible on pv guests, leading to
failures like:

BUG: unable to handle kernel paging request at ffff888023e26778
#PF error: [PROT] [WRITE]
RIP: e030:move_page_tables+0x7c1/0xae0
move_vma.isra.3+0xd1/0x2d0
__se_sys_mremap+0x3c6/0x5b0
 do_syscall_64+0x49/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Make set_pmd_at() paravirt aware by just letting it use set_pmd().

Fixes: 2c91bd4a4e2e53 ("mm: speed up mremap by 20x on large regions")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: boris.ostrovsky@oracle.com
Cc: sstabellini@kernel.org
Cc: hpa@zytor.com
Cc: bp@alien8.de
Cc: torvalds@linux-foundation.org
Link: https://lkml.kernel.org/r/20190210074056.11842-1-jgross@suse.com
5 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 9 Feb 2019 21:43:12 +0000 (13:43 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "One PM related driver bugfix and a MAINTAINERS update"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: Update the ocores i2c bus driver maintainer, etc
  i2c: omap: Use noirq system sleep pm ops to idle device for suspend

5 years agoMerge tag 'mips_fixes_5.0_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Sat, 9 Feb 2019 20:41:14 +0000 (12:41 -0800)]
Merge tag 'mips_fixes_5.0_3' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Paul Burton:
 "A batch of MIPS fixes for 5.0, nothing too scary.

   - A workaround for a Loongson 3 CPU bug is the biggest change, but
     still fairly straightforward. It adds extra memory barriers (sync
     instructions) around atomics to avoid a CPU bug that can break
     atomicity.

   - Loongson64 also sees a fix for powering off some systems which
     would incorrectly reboot rather than waiting for the power down
     sequence to complete.

   - We have DT fixes for the Ingenic JZ4740 SoC & the JZ4780-based Ci20
     board, and a DT warning fix for the Nexsys4/MIPSfpga board.

   - The Cavium Octeon platform sees a further fix to the behaviour of
     the pcie_disable command line argument that was introduced in v3.3.

   - The VDSO, introduced in v4.4, sees build fixes for configurations
     of GCC that were built using the --with-fp-32= flag to specify a
     default 32-bit floating point ABI.

   - get_frame_info() sees a fix for configurations with
     CONFIG_KALLSYMS=n, for which it previously always returned an
     error.

   - If the MIPS Coherence Manager (CM) reports an error then we'll now
     clear that error correctly so that the GCR_ERROR_CAUSE register
     will be updated with information about any future errors"

* tag 'mips_fixes_5.0_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  mips: cm: reprime error cause
  mips: loongson64: remove unreachable(), fix loongson_poweroff().
  MIPS: Remove function size check in get_frame_info()
  MIPS: Use lower case for addresses in nexys4ddr.dts
  MIPS: Loongson: Introduce and use loongson_llsc_mb()
  MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds
  MIPS: VDSO: Use same -m%-float cflag as the kernel proper
  MIPS: OCTEON: don't set octeon_dma_bar_type if PCI is disabled
  DTS: CI20: Fix bugs in ci20's device tree.
  MIPS: DTS: jz4740: Correct interrupt number of DMA core

5 years agoMerge tag 'for-linus-20190209' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 9 Feb 2019 18:26:09 +0000 (10:26 -0800)]
Merge tag 'for-linus-20190209' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request from Christoph, fixing namespace locking when
   dealing with the effects log, and a rapid add/remove issue (Keith)

 - blktrace tweak, ensuring requests with -1 sectors are shown (Jan)

 - link power management quirk for a Smasung SSD (Hans)

 - m68k nfblock dynamic major number fix (Chengguang)

 - series fixing blk-iolatency inflight counter issue (Liu)

 - ensure that we clear ->private when setting up the aio kiocb (Mike)

 - __find_get_block_slow() rate limit print (Tetsuo)

* tag 'for-linus-20190209' of git://git.kernel.dk/linux-block:
  blk-mq: remove duplicated definition of blk_mq_freeze_queue
  Blk-iolatency: warn on negative inflight IO counter
  blk-iolatency: fix IO hang due to negative inflight counter
  blktrace: Show requests without sector
  fs: ratelimit __find_get_block_slow() failure message.
  m68k: set proper major_num when specifying module param major_num
  libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
  nvme-pci: fix rapid add remove sequence
  nvme: lock NS list changes while handling command effects
  aio: initialize kiocb private in case any filesystems expect it.

5 years agoMerge tag 'mtd/fixes-for-5.0-rc6' of git://git.infradead.org/linux-mtd
Linus Torvalds [Sat, 9 Feb 2019 18:17:01 +0000 (10:17 -0800)]
Merge tag 'mtd/fixes-for-5.0-rc6' of git://git.infradead.org/linux-mtd

Pull mtd fixes from Boris Brezillon:

 - Fix a problem with the imx28 ECC engine

 - Remove a debug trace introduced in 2b6f0090a333 ("mtd: Check
   add_mtd_device() ret code")

 - Make sure partitions of size 0 can be registered

 - Fix kernel-doc warning in the rawnand core

 - Fix the error path of spinand_init() (missing manufacturer cleanup in
   a few places)

 - Address a problem with the SPI NAND PROGRAM LOAD operation which does
   not work as expected on some parts.

* tag 'mtd/fixes-for-5.0-rc6' of git://git.infradead.org/linux-mtd:
  mtd: rawnand: gpmi: fix MX28 bus master lockup problem
  mtd: Make sure mtd->erasesize is valid even if the partition is of size 0
  mtd: Remove a debug trace in mtdpart.c
  mtd: rawnand: fix kernel-doc warnings
  mtd: spinand: Fix the error/cleanup path in spinand_init()
  mtd: spinand: Handle the case where PROGRAM LOAD does not reset the cache

5 years agoMerge tag 'for-linus-5.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 9 Feb 2019 17:44:08 +0000 (09:44 -0800)]
Merge tag 'for-linus-5.0-rc6-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two very minor fixes: one remove of a #include for an unused header
  and a fix of the xen ML address in MAINTAINERS"

* tag 'for-linus-5.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  MAINTAINERS: unify reference to xen-devel list
  arch/arm/xen: Remove duplicate header

5 years agoMerge tag 'perf-urgent-for-mingo-5.0-20190205' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Sat, 9 Feb 2019 12:13:45 +0000 (13:13 +0100)]
Merge tag 'perf-urgent-for-mingo-5.0-20190205' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

perf trace:

  Arnaldo Carvalho de Melo:

    Fix handling of probe:vfs_getname when the probed routine is
    inlined in multiple places, fixing the collection of the 'filename'
    parameter in open syscalls.

perf test:

  Gustavo A. R. Silva:

    Fix bitwise operator usage in evsel-tp-sched test, which made tat
    test always detect fields as signed.

  Jiri Olsa:

    Filter out hidden symbols from labels, added in systems where the
    annobin plugin is used, such as RHEL8, which, if left in place make
    the DWARF unwind 'perf test' to fail on PPC.

  Tony Jones:

    Fix 'perf_event_attr' tests when building with python3.

perf mem/c2c:

  Ravi Bangoria:

    Fix perf_mem_events on PowerPC.

tools headers UAPI:

  Arnaldo Carvalho de Melo:

    Sync linux/in.h copy from the kernel sources, silencing a perf build warning.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 years agonet/smc: fix byte_order for rx_curs_confirmed
Ursula Braun [Thu, 7 Feb 2019 13:52:54 +0000 (14:52 +0100)]
net/smc: fix byte_order for rx_curs_confirmed

The recent change in the rx_curs_confirmed assignment disregards
byte order, which causes problems on little endian architectures.
This patch fixes it.

Fixes: b8649efad879 ("net/smc: fix sender_free computation") (net-tree)
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovsock: cope with memory allocation failure at socket creation time
Paolo Abeni [Thu, 7 Feb 2019 13:13:18 +0000 (14:13 +0100)]
vsock: cope with memory allocation failure at socket creation time

In the unlikely event that the kmalloc call in vmci_transport_socket_init()
fails, we end-up calling vmci_transport_destruct() with a NULL vmci_trans()
and oopsing.

This change addresses the above explicitly checking for zero vmci_trans()
at destruction time.

Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipv4: use a dedicated counter for icmp_v4 redirect packets
Lorenzo Bianconi [Wed, 6 Feb 2019 18:18:04 +0000 (19:18 +0100)]
net: ipv4: use a dedicated counter for icmp_v4 redirect packets

According to the algorithm described in the comment block at the
beginning of ip_rt_send_redirect, the host should try to send
'ip_rt_redirect_number' ICMP redirect packets with an exponential
backoff and then stop sending them at all assuming that the destination
ignores redirects.
If the device has previously sent some ICMP error packets that are
rate-limited (e.g TTL expired) and continues to receive traffic,
the redirect packets will never be transmitted. This happens since
peer->rate_tokens will be typically greater than 'ip_rt_redirect_number'
and so it will never be reset even if the redirect silence timeout
(ip_rt_redirect_silence) has elapsed without receiving any packet
requiring redirects.

Fix it by using a dedicated counter for the number of ICMP redirect
packets that has been sent by the host

I have not been able to identify a given commit that introduced the
issue since ip_rt_send_redirect implements the same rate-limiting
algorithm from commit 1da177e4c3f4 ("Linux-2.6.12-rc2")

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>