Linus Torvalds [Sat, 18 Oct 2014 19:52:08 +0000 (12:52 -0700)]
Merge tag 'nfs-for-3.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- fix an uninitialised pointer Oops in the writeback error path
- fix a bogus warning (and early exit from the loop) in nfs_generic_pgio()
Features:
- Add NFSv4.2 SEEK feature and client support for lseek(SEEK_HOLE/SEEK_DATA)
Other fixes:
- pnfs: replace broken pnfs_put_lseg_async
- Remove dead prototype for nfs4_insert_deviceid_node"
* tag 'nfs-for-3.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a bogus warning in nfs_generic_pgio
NFS: Fix an uninitialised pointer Oops in the writeback error path
NFSv4.1/pnfs: replace broken pnfs_put_lseg_async
NFSv4: Remove dead prototype for nfs4_insert_deviceid_node()
NFS: Implement SEEK
Linus Torvalds [Sat, 18 Oct 2014 19:25:30 +0000 (12:25 -0700)]
Merge tag 'dm-3.18' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device-mapper updates from Mike Snitzer:
"I rebased the DM tree ontop of linux-block.git's 'for-3.18/core' at
the beginning of October because DM core now depends on the newly
introduced bioset_create_nobvec() interface.
Summary:
- fix DM's long-standing excessive use of memory by leveraging the
new bioset_create_nobvec() interface when creating the DM's bioset
- fix a few bugs in dm-bufio and dm-log-userspace
- add DM core support for a DM multipath use-case that requires
loading DM tables that contain devices that have failed (by
allowing active and inactive DM tables to share dm_devs)
- add discard support to the DM raid target; like MD raid456 the user
must opt-in to raid456 discard support be specifying the
devices_handle_discard_safely=Y module param"
* tag 'dm-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm log userspace: fix memory leak in dm_ulog_tfr_init failure path
dm bufio: when done scanning return from __scan immediately
dm bufio: update last_accessed when relinking a buffer
dm raid: add discard support for RAID levels 4, 5 and 6
dm raid: add discard support for RAID levels 1 and 10
dm: allow active and inactive tables to share dm_devs
dm mpath: stop queueing IO when no valid paths exist
dm: use bioset_create_nobvec()
dm: remove nr_iovecs parameter from alloc_tio()
Linus Torvalds [Sat, 18 Oct 2014 19:12:45 +0000 (12:12 -0700)]
Merge branch 'for-3.18/drivers' of git://git.kernel.dk/linux-block
Pull block layer driver update from Jens Axboe:
"This is the block driver pull request for 3.18. Not a lot in there
this round, and nothing earth shattering.
- A round of drbd fixes from the linbit team, and an improvement in
asender performance.
- Removal of deprecated (and unused) IRQF_DISABLED flag in rsxx and
hd from Michael Opdenacker.
- Disable entropy collection from flash devices by default, from Mike
Snitzer.
- A small collection of xen blkfront/back fixes from Roger Pau Monné
and Vitaly Kuznetsov"
* 'for-3.18/drivers' of git://git.kernel.dk/linux-block:
block: disable entropy contributions for nonrot devices
xen, blkfront: factor out flush-related checks from do_blkif_request()
xen-blkback: fix leak on grant map error path
xen/blkback: unmap all persistent grants when frontend gets disconnected
rsxx: Remove deprecated IRQF_DISABLED
block: hd: remove deprecated IRQF_DISABLED
drbd: use RB_DECLARE_CALLBACKS() to define augment callbacks
drbd: compute the end before rb_insert_augmented()
drbd: Add missing newline in resync progress display in /proc/drbd
drbd: reduce lock contention in drbd_worker
drbd: Improve asender performance
drbd: Get rid of the WORK_PENDING macro
drbd: Get rid of the __no_warn and __cond_lock macros
drbd: Avoid inconsistent locking warning
drbd: Remove superfluous newline from "resync_extents" debugfs entry.
drbd: Use consistent names for all the bi_end_io callbacks
drbd: Use better variable names
Linus Torvalds [Sat, 18 Oct 2014 18:53:51 +0000 (11:53 -0700)]
Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block
Pull core block layer changes from Jens Axboe:
"This is the core block IO pull request for 3.18. Apart from the new
and improved flush machinery for blk-mq, this is all mostly bug fixes
and cleanups.
- blk-mq timeout updates and fixes from Christoph.
- Removal of REQ_END, also from Christoph. We pass it through the
->queue_rq() hook for blk-mq instead, freeing up one of the request
bits. The space was overly tight on 32-bit, so Martin also killed
REQ_KERNEL since it's no longer used.
- blk integrity updates and fixes from Martin and Gu Zheng.
- Update to the flush machinery for blk-mq from Ming Lei. Now we
have a per hardware context flush request, which both cleans up the
code should scale better for flush intensive workloads on blk-mq.
- Improve the error printing, from Rob Elliott.
- Backing device improvements and cleanups from Tejun.
- Fixup of a misplaced rq_complete() tracepoint from Hannes.
- Make blk_get_request() return error pointers, fixing up issues
where we NULL deref when a device goes bad or missing. From Joe
Lawrence.
- Prep work for drastically reducing the memory consumption of dm
devices from Junichi Nomura. This allows creating clone bio sets
without preallocating a lot of memory.
- Fix a blk-mq hang on certain combinations of queue depths and
hardware queues from me.
- Limit memory consumption for blk-mq devices for crash dump
scenarios and drivers that use crazy high depths (certain SCSI
shared tag setups). We now just use a single queue and limited
depth for that"
* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
block: Remove REQ_KERNEL
blk-mq: allocate cpumask on the home node
bio-integrity: remove the needless fail handle of bip_slab creating
block: include func name in __get_request prints
block: make blk_update_request print prefix match ratelimited prefix
blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
block: fix alignment_offset math that assumes io_min is a power-of-2
blk-mq: Make bt_clear_tag() easier to read
blk-mq: fix potential hang if rolling wakeup depth is too high
block: add bioset_create_nobvec()
block: use bio_clone_fast() in blk_rq_prep_clone()
block: misplaced rq_complete tracepoint
sd: Honor block layer integrity handling flags
block: Replace strnicmp with strncasecmp
block: Add T10 Protection Information functions
block: Don't merge requests if integrity flags differ
block: Integrity checksum flag
block: Relocate bio integrity flags
block: Add a disk flag to block integrity profile
block: Add prefix to block integrity profile flags
...
Linus Torvalds [Sat, 18 Oct 2014 18:48:03 +0000 (11:48 -0700)]
Merge tag 'for-linus-
20141015' of git://git.infradead.org/linux-mtd
Pull MTD update from Brian Norris:
"Sorry for delaying this a bit later than usual. There's one mild
regression from 3.16 that was noticed during the 3.17 cycle, and I
meant to send a fix for it along with this pull request. I'll
probably try to queue it up for a later pull request once I've had a
better look at it, hopefully by -rc2 at the latest.
Summary for this pull:
NAND
- Cleanup for Denali driver
- Atmel: add support for new page sizes
- Atmel: fix up 'raw' mode support
- Atmel: miscellaneous cleanups
- New timing mode helpers for non-ONFI NAND
- OMAP: allow driver to be (properly) built as a module
- bcm47xx: RESET support and other cleanups
SPI NOR
- Miscellaneous cleanups, to prepare framework for wider use (some
further work still pending)
- Compile-time configuration to select 4K vs. 64K support for flash
that support both (necessary for using UBIFS on some SPI NOR)
A few scattered code quality fixes, detected by Coverity
See the changesets for more"
* tag 'for-linus-
20141015' of git://git.infradead.org/linux-mtd: (59 commits)
mtd: nand: omap: Correct CONFIG_MTD_NAND_OMAP_BCH help message
mtd: nand: Force omap_elm to be built as a module if omap2_nand is a module
mtd: move support for struct flash_platform_data into m25p80
mtd: spi-nor: add Kconfig option to disable 4K sectors
mtd: nand: Move ELM driver and rename as omap_elm
nand: omap2: Replace pr_err with dev_err
nand: omap2: Remove horrible ifdefs to fix module probe
mtd: nand: add Hynix's H27UCG8T2ATR-BC to nand_ids table
mtd: nand: support ONFI timing mode retrieval for non-ONFI NANDs
mtd: physmap_of: Add non-obsolete map_rom probe
mtd: physmap_of: Fix ROM support via OF
MAINTAINERS: add l2-mtd.git, 'next' tree for MTD
mtd: denali: fix indents and other trivial things
mtd: denali: remove unnecessary parentheses
mtd: denali: remove another set-but-unused variable
mtd: denali: fix include guard and license block of denali.h
mtd: nand: don't break long print messages
mtd: bcm47xxnflash: replace some magic numbers
mtd: bcm47xxnflash: NAND_CMD_RESET support
mtd: bcm47xxnflash: add cmd_ctrl handler
...
Linus Torvalds [Sat, 18 Oct 2014 18:39:52 +0000 (11:39 -0700)]
Merge tag 'md/3.18' of git://neil.brown.name/md
Pull md updates from Neil Brown:
- a few minor bug fixes
- quite a lot of code tidy-up and simplification
- remove PRINT_RAID_DEBUG ioctl. I'm fairly sure it is unused, and it
isn't particularly useful.
* tag 'md/3.18' of git://neil.brown.name/md: (21 commits)
lib/raid6: Add log level to printks
md: move EXPORT_SYMBOL to after function in md.c
md: discard PRINT_RAID_DEBUG ioctl
md: remove MD_BUG()
md: clean up 'exit' labels in md_ioctl().
md: remove unnecessary test for MD_MAJOR in md_ioctl()
md: don't allow "-sync" to be set for device in an active array.
md: remove unwanted white space from md.c
md: don't start resync thread directly from md thread.
md: Just use RCU when checking for overlap between arrays.
md: avoid potential long delay under pers_lock
md: simplify export_array()
md: discard find_rdev_nr in favour of find_rdev_nr_rcu
md: use wait_event() to simplify md_super_wait()
md: be more relaxed about stopping an array which isn't started.
md/raid1: process_checks doesn't use its return value.
md/raid5: fix init_stripe() inconsistencies
md/raid10: another memory leak due to reshape.
md: use set_bit/clear_bit instead of shift/mask for bi_flags changes.
md/raid1: minor typos and reformatting.
...
Linus Torvalds [Sat, 18 Oct 2014 17:26:10 +0000 (10:26 -0700)]
Merge branch 'for-linus2' of git://git./linux/kernel/git/jmorris/linux-security
Pull selinux fix from James Morris:
"Fix for a list corruption bug in the SELinux code"
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: fix inode security list corruption
Linus Torvalds [Sat, 18 Oct 2014 17:25:09 +0000 (10:25 -0700)]
Merge tag 'virtio-next-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"One cc: stable commit, the rest are a series of minor cleanups which
have been sitting in MST's tree during my vacation. I changed a
function name and made one trivial change, then they spent two days in
linux-next"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
virtio-rng: refactor probe error handling
virtio_scsi: drop scan callback
virtio_balloon: enable VQs early on restore
virtio_scsi: fix race on device removal
virito_scsi: use freezable WQ for events
virtio_net: enable VQs early on restore
virtio_console: enable VQs early on restore
virtio_scsi: enable VQs early on restore
virtio_blk: enable VQs early on restore
virtio_scsi: move kick event out from virtscsi_init
virtio_net: fix use after free on allocation failure
9p/trans_virtio: enable VQs early
virtio_console: enable VQs early
virtio_blk: enable VQs early
virtio_net: enable VQs early
virtio: add API to enable VQs early
virtio_net: minor cleanup
virtio-net: drop config_mutex
virtio_net: drop config_enable
virtio-blk: drop config_mutex
...
Linus Torvalds [Sat, 18 Oct 2014 17:24:26 +0000 (10:24 -0700)]
Merge tag 'modules-next-for-linus' of git://git./linux/kernel/git/rusty/linux
Pull module fix from Rusty Russell:
"A single panic fix for a rare race, stable CC'd"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
modules, lock around setting of MODULE_STATE_UNFORMED
Jonathan Corbet [Fri, 17 Oct 2014 12:59:26 +0000 (08:59 -0400)]
MAINTAINERS: Become the docs maintainer
It seems it's my turn to be the documentation maintainer for a bit. My
plan is to work to ensure that docs patches don't fall through the cracks;
I assume most changes will continue to flow through subsystem-specific
trees.
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Lutomirski [Wed, 8 Oct 2014 16:02:13 +0000 (09:02 -0700)]
x86,kvm,vmx: Preserve CR4 across VM entry
CR4 isn't constant; at least the TSD and PCE bits can vary.
TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
like it's correct.
This adds a branch and a read from cr4 to each vm entry. Because it is
extremely likely that consecutive entries into the same vcpu will have
the same host cr4 value, this fixes up the vmcs instead of restoring cr4
after the fact. A subsequent patch will add a kernel-wide cr4 shadow,
reducing the overhead in the common case to just two memory reads and a
branch.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 18 Oct 2014 16:31:37 +0000 (09:31 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Include fixes for netrom and dsa (Fabian Frederick and Florian
Fainelli)
2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.
3) Several SKB use after free fixes (vxlan, openvswitch, vxlan,
ip_tunnel, fou), from Li ROngQing.
4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.
5) Use after free in virtio_net, from Michael S Tsirkin.
6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
Shelar.
7) ISDN gigaset and capi bug fixes from Tilman Schmidt.
8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.
9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.
10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
Wang and Eric Dumazet.
11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
chunks, from Daniel Borkmann.
12) Don't call sock_kfree_s() with NULL pointers, this function also has
the side effect of adjusting the socket memory usage. From Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
bna: fix skb->truesize underestimation
net: dsa: add includes for ethtool and phy_fixed definitions
openvswitch: Set flow-key members.
netrom: use linux/uaccess.h
dsa: Fix conversion from host device to mii bus
tipc: fix bug in bundled buffer reception
ipv6: introduce tcp_v6_iif()
sfc: add support for skb->xmit_more
r8152: return -EBUSY for runtime suspend
ipv4: fix a potential use after free in fou.c
ipv4: fix a potential use after free in ip_tunnel_core.c
hyperv: Add handling of IP header with option field in netvsc_set_hash()
openvswitch: Create right mask with disabled megaflows
vxlan: fix a free after use
openvswitch: fix a use after free
ipv4: dst_entry leak in ip_send_unicast_reply()
ipv4: clean up cookie_v4_check()
ipv4: share tcp_v4_save_options() with cookie_v4_check()
ipv4: call __ip_options_echo() in cookie_v4_check()
atm: simplify lanai.c by using module_pci_driver
...
Linus Torvalds [Sat, 18 Oct 2014 16:30:41 +0000 (09:30 -0700)]
Merge git://git./linux/kernel/git/davem/sparc
Pull Sparc bugfix from David Miller:
"Sparc64 AES ctr mode bug fix"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix FPU register corruption with AES crypto offload.
Linus Torvalds [Sat, 18 Oct 2014 16:29:59 +0000 (09:29 -0700)]
Merge git://git./linux/kernel/git/davem/ide
Pull IDE cleanup from David Miller:
"One IDE driver cleanup"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
Drivers: ide: Remove typedef atiixp_ide_timing
Catalin Marinas [Fri, 17 Oct 2014 16:38:49 +0000 (17:38 +0100)]
futex: Ensure get_futex_key_refs() always implies a barrier
Commit
b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616bd07f (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.
However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread to have read "futex_val ==
locked" (in kernel).
Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Matteo Franchin <Matteo.Franchin@arm.com>
Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Fri, 17 Oct 2014 19:45:55 +0000 (12:45 -0700)]
bna: fix skb->truesize underestimation
skb->truesize is not meant to be tracking amount of used bytes
in an skb, but amount of reserved/consumed bytes in memory.
For instance, if we use a single byte in last page fragment,
we have to account the full size of the fragment.
skb->truesize can be very different from skb->len, that has
a very specific safety purpose.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 17 Oct 2014 23:02:13 +0000 (16:02 -0700)]
net: dsa: add includes for ethtool and phy_fixed definitions
net/dsa/slave.c uses functions and structures declared in phy_fixed.h
but does not explicitely include it, while dsa.h needs structure
declarations for 'struct ethtool_wolinfo' and 'struct ethtool_eee', fix
those by including the correct header files.
Fixes: ec9436baedb6 ("net: dsa: allow drivers to do link adjustment")
Fixes: ce31b31c68e7 ("net: dsa: allow updating fixed PHY link information")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Fri, 17 Oct 2014 20:56:31 +0000 (13:56 -0700)]
openvswitch: Set flow-key members.
This patch adds missing memset which are required to initialize
flow key member. For example for IP flow we need to initialize
ip.frag for all cases.
Found by inspection.
This bug is introduced by commit
0714812134d7dcadeb7ecfbfeb18788aa7e1eaac
("openvswitch: Eliminate memset() from flow_extract").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Fri, 17 Oct 2014 20:00:22 +0000 (22:00 +0200)]
netrom: use linux/uaccess.h
replace asm/uaccess.h by linux/uaccess.h
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Fri, 17 Oct 2014 19:30:58 +0000 (12:30 -0700)]
dsa: Fix conversion from host device to mii bus
Commit
b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
replaces mii_bus with a generic host_dev, and introduces
dsa_host_dev_to_mii_bus() to support conversion from host_dev to mii_bus.
However, in some cases it uses to_mii_bus to perform that conversion.
Since host_dev is not the phy bus device but typically a platform device,
this fails and results in a crash with the affected drivers.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<
ffffffff81781d35>] __mutex_lock_slowpath+0x75/0x100
PGD
406783067 PUD
406784067 PMD 0
Oops: 0002 [#1] SMP
...
Call Trace:
[<
ffffffff810a538b>] ? pick_next_task_fair+0x61b/0x880
[<
ffffffff81781de3>] mutex_lock+0x23/0x37
[<
ffffffff81533244>] mdiobus_read+0x34/0x60
[<
ffffffff8153b95a>] __mv88e6xxx_reg_read+0x8a/0xa0
[<
ffffffff8153b9bc>] mv88e6xxx_reg_read+0x4c/0xa0
Fixes: b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 17 Oct 2014 19:25:28 +0000 (15:25 -0400)]
tipc: fix bug in bundled buffer reception
In commit
ec8a2e5621db2da24badb3969eda7fd359e1869f ("tipc: same receive
code path for connection protocol and data messages") we omitted the
the possiblilty that an arriving message extracted from a bundle buffer
may be a multicast message. Such messages need to be to be delivered to
the socket via a separate function, tipc_sk_mcast_rcv(). As a result,
small multicast messages arriving as members of a bundle buffer will be
silently dropped.
This commit corrects the error by considering this case in the function
tipc_link_bundle_rcv().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Oct 2014 16:17:20 +0000 (09:17 -0700)]
ipv6: introduce tcp_v6_iif()
Commit
971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line
misses") added a regression for SO_BINDTODEVICE on IPv6.
This is because we still use inet6_iif() which expects that IP6 control
block is still at the beginning of skb->cb[]
This patch adds tcp_v6_iif() helper and uses it where necessary.
Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
parameter to it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Fri, 17 Oct 2014 14:32:25 +0000 (15:32 +0100)]
sfc: add support for skb->xmit_more
Don't ring the doorbell, and don't do PIO. This will also prevent
TX Push, because there will be more than one buffer waiting when
the doorbell is rung.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 17 Oct 2014 08:55:08 +0000 (16:55 +0800)]
r8152: return -EBUSY for runtime suspend
Remove calling cancel_delayed_work_sync() for runtime suspend,
because it would cause dead lock. Instead, return -EBUSY to
avoid the device enters suspending if the net is running and
the delayed work is pending or running. The delayed work would
try to wake up the device later, so the suspending is not
necessary.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 17 Oct 2014 08:53:47 +0000 (16:53 +0800)]
ipv4: fix a potential use after free in fou.c
pskb_may_pull() maybe change skb->data and make uh pointer oboslete,
so reload uh and guehdr
Fixes: 37dd0247 ("gue: Receive side for Generic UDP Encapsulation")
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 17 Oct 2014 08:53:23 +0000 (16:53 +0800)]
ipv4: fix a potential use after free in ip_tunnel_core.c
pskb_may_pull() maybe change skb->data and make eth pointer oboslete,
so set eth after pskb_may_pull()
Fixes:
3d7b46cd("ip_tunnel: push generic protocol handling to ip_tunnel module")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Thu, 16 Oct 2014 21:47:58 +0000 (14:47 -0700)]
hyperv: Add handling of IP header with option field in netvsc_set_hash()
In case that the IP header has optional field at the end, this patch will
get the port numbers after that field, and compute the hash. The general
parser skb_flow_dissect() is used here.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Fri, 17 Oct 2014 04:55:45 +0000 (21:55 -0700)]
openvswitch: Create right mask with disabled megaflows
If megaflows are disabled, the userspace does not send the netlink attribute
OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.
sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the
bytes that represent padding for struct sw_flow, or the bytes that represent
fields that may not be set during ovs_flow_extract().
This is a problem, because when we extract a flow from a packet,
we do not memset() anymore the struct sw_flow to 0.
This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
which operates on the netlink attributes rather than on the mask key. Using
this approach we are sure that only the bytes that the user provided in the
flow are matched.
Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
now return with an error.
This bug is introduced by commit
0714812134d7dcadeb7ecfbfeb18788aa7e1eaac
("openvswitch: Eliminate memset() from flow_extract").
Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 17 Oct 2014 06:06:16 +0000 (14:06 +0800)]
vxlan: fix a free after use
pskb_may_pull maybe change skb->data and make eth pointer oboslete,
so eth needs to reload
Fixes: 91269e390d062 ("vxlan: using pskb_may_pull as early as possible")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 17 Oct 2014 06:03:08 +0000 (14:03 +0800)]
openvswitch: fix a use after free
pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp
setting after arphdr_ok to avoid the use the freed memory
Fixes: 0714812134d7d ("openvswitch: Eliminate memset() from flow_extract.")
Cc: Jesse Gross <jesse@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Wed, 15 Oct 2014 12:24:02 +0000 (16:24 +0400)]
ipv4: dst_entry leak in ip_send_unicast_reply()
ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork
and in case errors in __ip_append_data() nobody frees stolen dst entry
Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
Signed-off-by: Vasily Averin <vvs@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 15 Oct 2014 21:33:22 +0000 (14:33 -0700)]
ipv4: clean up cookie_v4_check()
We can retrieve opt from skb, no need to pass it as a parameter.
And opt should always be non-NULL, no need to check.
Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 15 Oct 2014 21:33:21 +0000 (14:33 -0700)]
ipv4: share tcp_v4_save_options() with cookie_v4_check()
cookie_v4_check() allocates ip_options_rcu in the same way
with tcp_v4_save_options(), we can just make it a helper function.
Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 15 Oct 2014 21:33:20 +0000 (14:33 -0700)]
ipv4: call __ip_options_echo() in cookie_v4_check()
commit
971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
missed that cookie_v4_check() still calls ip_options_echo() which uses
IPCB(). It should use TCPCB() at TCP layer, so call __ip_options_echo()
instead.
Fixes: commit 971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Opdenacker [Wed, 15 Oct 2014 07:45:50 +0000 (09:45 +0200)]
atm: simplify lanai.c by using module_pci_driver
This simplifies the lanai.c driver by using
the module_pci_driver() macro, at the expense
of losing only debugging messages.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Thu, 16 Oct 2014 13:47:51 +0000 (15:47 +0200)]
netlink: fix description of portid
Avoid confusion between pid and portid.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 Oct 2014 18:42:51 +0000 (14:42 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-10-16
This series contains updates to fm10k and ixgbe.
Matthew provides two fixes for fm10k, first sets the flag to fetch the
host state before kicking off the service task that reads the host
state when bringing the interface up. The second makes sure that we
release the mailbox lock after detecting an error and before we return
the error code.
Andy Zhou provides a compile fix for fm10k, when the driver is compiled
into the kernel and the VXLAN driver is compiled as a module.
Emil provides a fix for ixgbe to prevent against a panic by trying
to dereference a NULL pointer in ixgbe_ndo_set_vf_spoofchk().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
James Morris [Thu, 16 Oct 2014 10:04:18 +0000 (21:04 +1100)]
Merge branch 'stable-3.18' of git://git.infradead.org/users/pcmoore/selinux into for-linus2
Emil Tantilov [Thu, 16 Oct 2014 15:49:02 +0000 (15:49 +0000)]
ixgbe: check for vfs outside of sriov_num_vfs before dereference
The check for vfinfo is not sufficient because it does not protect
against specifying vf that is outside of sriov_num_vfs range.
All of the ndo functions have a check for it except for
ixgbevf_ndo_set_spoofcheck().
The following patch is all we need to protect against this panic:
ip link set p96p1 vf 0 spoofchk off
BUG: unable to handle kernel NULL pointer dereference at
0000000000000052
IP: [<
ffffffffa044a1c1>]
ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]
Reported-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Andy Zhou [Sat, 4 Oct 2014 06:19:11 +0000 (06:19 +0000)]
fm10k: Add CONFIG_FM10K_VXLAN configuration option
Compiling with CONFIG_FM10K=y and VXLAN=m resulting in linking error:
drivers/built-in.o: In function `fm10k_open':
(.text+0x1f9d7a): undefined reference to `vxlan_get_rx_port'
make: *** [vmlinux] Error 1
The fix follows the same strategy as I40E.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Matthew Vick [Fri, 3 Oct 2014 00:43:35 +0000 (00:43 +0000)]
fm10k: Unlock mailbox on VLAN addition failures
After grabbing the mailbox lock and detecting an error, the lock must be
released before the error code can be returned.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Matthew Vick [Thu, 2 Oct 2014 05:10:18 +0000 (05:10 +0000)]
fm10k: Check the host state when bringing the interface up
Set the flag to fetch the host state before kicking off the service task
that reads the host state when bringing the interface back up.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Li RongQing [Thu, 16 Oct 2014 01:17:18 +0000 (09:17 +0800)]
vxlan: using pskb_may_pull as early as possible
pskb_may_pull should be used to check if skb->data has enough space,
skb->len can not ensure that.
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Thu, 16 Oct 2014 00:49:41 +0000 (08:49 +0800)]
vxlan: fix a use after free in vxlan_encap_bypass
when netif_rx() is done, the netif_rx handled skb maybe be freed,
and should not be used.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Wed, 15 Oct 2014 19:03:41 +0000 (21:03 +0200)]
openvswitch: use vport instead of p
All functions used struct vport *vport except
ovs_vport_find_upcall_portid.
This fixes 1 kerneldoc warning
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabian Frederick [Wed, 15 Oct 2014 19:03:18 +0000 (21:03 +0200)]
openvswitch: kerneldoc warning fix
s/sock/gs
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Wed, 15 Oct 2014 16:11:46 +0000 (19:11 +0300)]
gianfar: Add FCS to rx buffer size (fix)
For each Rx frame the eTSEC writes its FCS (Frame Check Sequence)
to the Rx buffer.
The eTSEC h/w manual states in the "Receive Buffer Descriptor Field
Descriptions" table:
"Data length is the number of octets written by the eTSEC into this BD's
data buffer if L is cleared (the value is equal to MRBLR), or, if L is
set, the length of the frame including *CRC*, FCB (if RCTRL[PRSDEP > 00),
preamble (if MACCFG2[PreAmRxEn]=1), time stamp (if RCTRL[TS] = 1) and
any padding (RCTRL[PAL])."
Though the FCS bytes are removed by the driver before passing the skb
to the net stack, the Rx buffer size computation does not currently
take into account the FCS bytes (4 bytes).
Because the Rx buffer size is multiple of 512 bytes, leaving out the
FCS is not a problem for the default MTU of 1500, as the Rx buffer size
is 1536 in this case. However, for custom MTUs, where the difference
between the MTU size and the Rx buffer size is less, this can be a
problem as the computed Rx buffer size won't be enough to accomodate
the FCS for a received frame that is big enough (close to MTU size).
In such case the received frame is considered to be incomplete (L flag
not set in the RxBD status) and silently dropped.
Note that the driver does not currently support S/G on Rx, so it has to
compute its Rx buffer size based on the MTU of the device.
Reported-by: Kristian Otnes <kotnes@cisco.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Wed, 15 Oct 2014 13:23:28 +0000 (16:23 +0300)]
virtio_net: fix use after free
commit
0b725a2ca61bedc33a2a63d0451d528b268cf975
net: Remove ndo_xmit_flush netdev operation, use signalling instead.
added code that looks at skb->xmit_more after the skb has
been put in TX VQ. Since some paths process the ring and free the skb
immediately, this can cause use after free.
Fix by storing xmit_more in a local variable.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nimrod Andy [Wed, 15 Oct 2014 09:30:12 +0000 (17:30 +0800)]
net: fec: ptp: fix convergence issue to support LinuxPTP stack
iMX6SX IEEE 1588 module has one hw issue in capturing the ATVR register.
The current SW flow is:
ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
ts_counter_ns = ENET0->ATVR;
The ATVR value is not expected value that cause LinuxPTP stack cannot be convergent.
ENET Block Guide/ Chapter for the iMX6SX (PELE) address the issue:
After set ENET_ATCR[Capture], there need some time cycles before the counter
value is capture in the register clock domain. The wait-time-cycles is at least
6 clock cycles of the slower clock between the register clock and the 1588 clock.
So need something like:
ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
wait();
ts_counter_ns = ENET0->ATVR;
For iMX6SX, the 1588 ts_clk is fixed to 25Mhz, register clock is 66Mhz, so the
wait-time-cycles must be greater than 240ns (40ns * 6). The patch add 1us delay
before cpu read ATVR register.
Changes V2:
Modify the commit/comments log to describe the issue clearly.
Signed-off-by: Fugang Duan <B38611@freescale.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Himangi Saraogi [Thu, 14 Aug 2014 16:44:30 +0000 (22:14 +0530)]
Drivers: ide: Remove typedef atiixp_ide_timing
The Linux kernel coding style guidelines suggest not using typedefs
for structure types. This patch gets rid of the typedef for
atiixp_ide_timing.
The following Coccinelle semantic patch detects the case:
@tn1@
type td;
@@
typedef struct { ... } td;
@script:python tf@
td << tn1.td;
tdres;
@@
coccinelle.tdres = td;
@@
type tn1.td;
identifier tf.tdres;
@@
-typedef
struct
+ tdres
{ ... }
-td
;
@@
type tn1.td;
identifier tf.tdres;
@@
-td
+ struct tdres
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anish Bhatt [Wed, 15 Oct 2014 07:26:47 +0000 (00:26 -0700)]
cxgb4i : Fix -Wmaybe-uninitialized warning.
Identified by kbuild test robot. csk family is always set to be AF_INET or
AF_INET6, so skb will always be initialized to some value but there is no harm
in silencing the warning anyways.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Fixes :
f42bb57c61fd ('cxgb4i : Fix -Wunused-function warning')
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Tue, 14 Oct 2014 22:19:06 +0000 (15:19 -0700)]
net: Add ndo_gso_check
Add ndo_gso_check which a device can define to indicate whether is
is capable of doing GSO on a packet. This funciton would be called from
the stack to determine whether software GSO is needed to be done. A
driver should populate this function if it advertises GSO types for
which there are combinations that it wouldn't be able to handle. For
instance a device that performs UDP tunneling might only implement
support for transparent Ethernet bridging type of inner packets
or might have limitations on lengths of inner headers.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Wed, 15 Oct 2014 05:30:41 +0000 (07:30 +0200)]
stmmac: fix sti compatibililies
this patch is to fix the stmmac data compatibilities for
all the SoCs inside the platform file.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Smalley [Mon, 6 Oct 2014 20:32:52 +0000 (16:32 -0400)]
selinux: fix inode security list corruption
sb_finish_set_opts() can race with inode_free_security()
when initializing inode security structures for inodes
created prior to initial policy load or by the filesystem
during ->mount(). This appears to have always been
a possible race, but commit
3dc91d4 ("SELinux: Fix possible
NULL pointer dereference in selinux_inode_permission()")
made it more evident by immediately reusing the unioned
list/rcu element of the inode security structure for call_rcu()
upon an inode_free_security(). But the underlying issue
was already present before that commit as a possible use-after-free
of isec.
Shivnandan Kumar reported the list corruption and proposed
a patch to split the list and rcu elements out of the union
as separate fields of the inode_security_struct so that setting
the rcu element would not affect the list element. However,
this would merely hide the issue and not truly fix the code.
This patch instead moves up the deletion of the list entry
prior to dropping the sbsec->isec_lock initially. Then,
if the inode is dropped subsequently, there will be no further
references to the isec.
Reported-by: Shivnandan Kumar <shivnandan.k@samsung.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Linus Torvalds [Wed, 15 Oct 2014 05:48:18 +0000 (07:48 +0200)]
Merge branch 'for-3.18-consistent-ops' of git://git./linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.
This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced with
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().
Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().
This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"
* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
Linus Torvalds [Wed, 15 Oct 2014 05:30:52 +0000 (07:30 +0200)]
Merge tag 'llvmlinux-for-v3.18' of git://git.linuxfoundation.org/llvmlinux/kernel
Pull LLVM updates from Behan Webster:
"These patches remove the use of VLAIS using a new SHASH_DESC_ON_STACK
macro.
Some of the previously accepted VLAIS removal patches haven't used
this macro. I will push new patches to consistently use this macro in
all those older cases for 3.19"
[ More LLVM patches coming in through subsystem trees, and LLVM itself
needs some fixes that are already in many distributions but not in
released versions of LLVM. Some day this will all "just work" - Linus ]
* tag 'llvmlinux-for-v3.18' of git://git.linuxfoundation.org/llvmlinux/kernel:
crypto: LLVMLinux: Remove VLAIS usage from crypto/testmgr.c
security, crypto: LLVMLinux: Remove VLAIS from ima_crypto.c
crypto: LLVMLinux: Remove VLAIS usage from libcrc32c.c
crypto: LLVMLinux: Remove VLAIS usage from crypto/hmac.c
crypto, dm: LLVMLinux: Remove VLAIS usage from dm-crypt
crypto: LLVMLinux: Remove VLAIS from crypto/.../qat_algs.c
crypto: LLVMLinux: Remove VLAIS from crypto/omap_sham.c
crypto: LLVMLinux: Remove VLAIS from crypto/n2_core.c
crypto: LLVMLinux: Remove VLAIS from crypto/mv_cesa.c
crypto: LLVMLinux: Remove VLAIS from crypto/ccp/ccp-crypto-sha.c
btrfs: LLVMLinux: Remove VLAIS
crypto: LLVMLinux: Add macro to remove use of VLAIS in crypto code
Linus Torvalds [Wed, 15 Oct 2014 05:23:49 +0000 (07:23 +0200)]
Merge tag 'iommu-updates-v3.18' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"This pull-request includes:
- change in the IOMMU-API to convert the former iommu_domain_capable
function to just iommu_capable
- various fixes in handling RMRR ranges for the VT-d driver (one fix
requires a device driver core change which was acked by Greg KH)
- the AMD IOMMU driver now assigns and deassigns complete alias
groups to fix issues with devices using the wrong PCI request-id
- MMU-401 support for the ARM SMMU driver
- multi-master IOMMU group support for the ARM SMMU driver
- various other small fixes all over the place"
* tag 'iommu-updates-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
iommu/vt-d: Work around broken RMRR firmware entries
iommu/vt-d: Store bus information in RMRR PCI device path
iommu/vt-d: Only remove domain when device is removed
driver core: Add BUS_NOTIFY_REMOVED_DEVICE event
iommu/amd: Fix devid mapping for ivrs_ioapic override
iommu/irq_remapping: Fix the regression of hpet irq remapping
iommu: Fix bus notifier breakage
iommu/amd: Split init_iommu_group() from iommu_init_device()
iommu: Rework iommu_group_get_for_pci_dev()
iommu: Make of_device_id array const
amd_iommu: do not dereference a NULL pointer address.
iommu/omap: Remove omap_iommu unused owner field
iommu: Remove iommu_domain_has_cap() API function
IB/usnic: Convert to use new iommu_capable() API function
vfio: Convert to use new iommu_capable() API function
kvm: iommu: Convert to use new iommu_capable() API function
iommu/tegra: Convert to iommu_capable() API function
iommu/msm: Convert to iommu_capable() API function
iommu/vt-d: Convert to iommu_capable() API function
iommu/fsl: Convert to iommu_capable() API function
...
Linus Torvalds [Wed, 15 Oct 2014 05:05:03 +0000 (07:05 +0200)]
Merge tag 'clk-for-linus-3.18' of git://git.linaro.org/people/mike.turquette/linux
Pull clock tree updates from Mike Turquette:
"The clk tree changes for 3.18 are dominated by clock drivers. Mostly
fixes and enhancements to existing drivers as well as new drivers.
This tag contains a bit more arch code than I usually take due to some
OMAP2+ changes. Additionally it contains the restart notifier
handlers which are merged as a dependency into several trees.
The PXA changes are the only messy part. Due to having a stable tree
I had to revert one patch and follow up with one more fix near the tip
of this tag. Some dead code is introduced but it will soon become
live code after 3.18-rc1 is released as the rest of the PXA family is
converted over to the common clock framework.
Another trend in this tag is that multiple vendors have started to
push the complexity of changing their CPU frequency into the clock
driver, whereas this used to be done in CPUfreq drivers.
Changes to the clk core include a generic gpio-clock type and a
clk_set_phase() function added to the top-level clk.h api. Due to
some confusion on the fbdev mailing list the kernel boot parameters
documentation was updated to further explain the clk_ignore_unused
parameter, which is often required by users of the simplefb driver.
Finally some fixes to the locking around the clock debugfs stuff was
done to prevent deadlocks when interacting with other subsystems."
* tag 'clk-for-linus-3.18' of git://git.linaro.org/people/mike.turquette/linux: (99 commits)
clk: pxa clocks build system fix
Revert "arm: pxa: Transition pxa27x to clk framework"
clk: samsung: register restart handlers for s3c2412 and s3c2443
clk: rockchip: add restart handler
clk: rockchip: rk3288: i2s_frac adds flag to set parent's rate
doc/kernel-parameters.txt: clarify clk_ignore_unused
arm: pxa: Transition pxa27x to clk framework
dts: add devicetree bindings for pxa27x clocks
clk: add pxa27x clock drivers
arm: pxa: add clock pll selection bits
clk: dts: document pxa clock binding
clk: add pxa clocks infrastructure
clk: gpio-gate: Ensure gpiod_ APIs are prototyped
clk: ti: dra7-atl-clock: Mark the device as pm_runtime_irq_safe
clk: ti: LLVMLinux: Move __init outside of type definition
clk: ti: consider the fact that of_clk_get() might return an error
clk: ti: dra7-atl-clock: fix a memory leak
clk: ti: change clock init to use generic of_clk_init
clk: hix5hd2: add I2C clocks
clk: hix5hd2: add watchdog0 clocks
...
Linus Torvalds [Wed, 15 Oct 2014 04:58:16 +0000 (06:58 +0200)]
Merge tag 'mfd-for-linus-3.18' of git://git./linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"Changes to existing drivers:
- DT clean-ups in da9055-core, max14577, rn5t618, arizona, hi6421, stmpe, twl4030
- Export symbols for use in modules in max14577
- Plenty of static code analysis/Coccinelle fixes throughout the SS
- Regmap clean-ups in arizona, wm5102, wm5110, da9052, tps65217, rk808
- Remove unused/duplicate code in da9052, 88pm860x, ti_ssp, lpc_sch, arizona
- Bug fixes in ti_am335x_tscadc, da9052, ti_am335x_tscadc, rtsx_pcr
- IRQ fixups in arizona, stmpe, max14577
- Regulator related changes in axp20x
- Pass DMA coherency information from parent => child in MFD core
- Rename DT document files for consistency
- Add ACPI support to the MFD core
- Add Andreas Werner to MAINTAINERS for MEN F21BMC
New drivers/supported devices:
- New driver for MEN 14F021P00 Board Management Controller
- New driver for Ricoh RN5T618 PMIC
- New driver for Rockchip RK808
- New driver for HiSilicon Hi6421 PMIC
- New driver for Qualcomm SPMI PMICs
- Add support for Intel Braswell in lpc_ich
- Add support for Intel 9 Series PCH in lpc_ich
- Add support for Intel Quark ILB in lpc_sch"
[ Delayed to after the poweer/reset pull due to Kconfig problems with
recursive Kconfig select/depends-on chains. - Linus ]
* tag 'mfd-for-linus-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (79 commits)
mfd: cros_ec: wait for completion of commands that return IN_PROGRESS
i2c: i2c-cros-ec-tunnel: Set retries to 3
mfd: cros_ec: move locking into cros_ec_cmd_xfer
mfd: cros_ec: stop calling ->cmd_xfer() directly
mfd: cros_ec: Delay for 50ms when we see EC_CMD_REBOOT_EC
MAINTAINERS: Adds Andreas Werner to maintainers list for MEN F21BMC
mfd: arizona: Correct mask to allow setting micbias external cap
mfd: Add ACPI support
Revert "mfd: wm5102: Manually apply register patch"
mfd: ti_am335x_tscadc: Update logic in CTRL register for 5-wire TS
mfd: dt-bindings: atmel-gpbr: Rename doc file to conform to naming convention
mfd: dt-bindings: qcom-pm8xxx: Rename doc file to conform to naming convention
mfd: Inherit coherent_dma_mask from parent device
mfd: Document DT bindings for Qualcomm SPMI PMICs
mfd: Add support for Qualcomm SPMI PMICs
mfd: dt-bindings: pm8xxx: Add new compatible string
mfd: axp209x: Drop the parent supplies field
mfd: twl4030-power: Use 'ti,system-power-controller' as alternative way to support system power off
mfd: dt-bindings: twl4030-power: Use the standard property to mark power control
mfd: syscon: Add Atmel GPBR DT bindings documention
...
Linus Torvalds [Wed, 15 Oct 2014 04:56:23 +0000 (06:56 +0200)]
Merge tag 'for-v3.18' of git://git.infradead.org/battery-2.6
Pull power supply and reset updates from Sebastian Reichel:
- Initial support for the following chips
* max77836 (charger)
* max14577 (charger)
* bq27742 (battery gauge)
* ltc2952 (poweroff)
* stih416 (restart)
* syscon-reboot (restart)
* gpio-restart (restart)
- cleanup of power supply core
- misc fixes in power supply and reset drivers
* tag 'for-v3.18' of git://git.infradead.org/battery-2.6: (48 commits)
power: ab8500_fg: Fix build warning
Documentation: charger: max14577: Update the date of introducing ABI
power: reset: corrections for simple syscon reboot driver
Documentation: power: reset: Add documentation for generic SYSCON reboot driver
power: reset: Add generic SYSCON register mapped reset
bq27x00_battery: Fix flag reading for bq27742
power: reset: use restart_notifier mechanism for msm-poweroff
power: Add simple gpio-restart driver
power: reset: st: Provide DT bindings for ST's Power Reset driver
power: reset: Add restart functionality for STiH41x platforms
power: charger-manager: Fix NULL pointer exception with missing cm-fuel-gauge
power: max14577: Fix circular config SYSFS dependency
power: gpio-charger: do not use gpio value directly
power: max8925: Use of_get_child_by_name
power: max8925: Fix NULL ptr dereference on memory allocation failure
bq27x00_battery: Add support to bq27742
Documentation: charger: max14577: Document exported sysfs entry
devicetree: mfd: max14577: Add device tree bindings document
power: max17040: Add ID for MAX77836 Fuel Gauge block
charger: max14577: Configure battery-dependent settings from DTS and sysfs
...
Conflicts:
drivers/power/reset/Kconfig
drivers/power/reset/Makefile
Linus Torvalds [Wed, 15 Oct 2014 04:46:01 +0000 (06:46 +0200)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
"There is the long-awaited discard support for RBD (Guangliang Zhao,
Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's
(Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and
performance and debugging improvements (Yan, Zheng, John Spray), and a
smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
ceph: fix divide-by-zero in __validate_layout()
rbd: rbd workqueues need a resque worker
libceph: ceph-msgr workqueue needs a resque worker
ceph: fix bool assignments
libceph: separate multiple ops with commas in debugfs output
libceph: sync osd op definitions in rados.h
libceph: remove redundant declaration
ceph: additional debugfs output
ceph: export ceph_session_state_name function
ceph: include the initial ACL in create/mkdir/mknod MDS requests
ceph: use pagelist to present MDS request data
libceph: reference counting pagelist
ceph: fix llistxattr on symlink
ceph: send client metadata to MDS
ceph: remove redundant code for max file size verification
ceph: remove redundant io_iter_advance()
ceph: move ceph_find_inode() outside the s_mutex
ceph: request xattrs if xattr_version is zero
rbd: set the remaining discard properties to enable support
rbd: use helpers to handle discard for layered images correctly
...
Linus Torvalds [Wed, 15 Oct 2014 04:43:27 +0000 (06:43 +0200)]
Merge branch 'CVE-2014-7970' of git://git./linux/kernel/git/luto/linux
Pull pivot_root() fix from Andy Lutomirski.
Prevent a leak of unreachable mounts.
* 'CVE-2014-7970' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux:
mnt: Prevent pivot_root from creating a loop in the mount tree
David S. Miller [Wed, 15 Oct 2014 04:29:08 +0000 (00:29 -0400)]
Merge branch 'cxgb4'
Anish Bhatt says:
====================
ipv6 and related cleanup for cxgb4/cxgb4i
This patch set removes some duplicated/extraneous code from cxgb4i, guards
cxgb4 against compilation failure based on ipv6 tristate, make ipv6 related
code no longer be enabled by default irrespective of ipv6 tristate and fixes
a refcnt issue.
-Anish
v2 : Provide more detailed commit messages, make subject more concise as
recommended by Dave Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Anish Bhatt [Wed, 15 Oct 2014 03:07:24 +0000 (20:07 -0700)]
cxgb4i: Remove duplicate call to dst_neigh_lookup()
There is an extra call to dst_neigh_lookup() leftover in cxgb4i that can cause
an unreleased refcnt issue. Remove extraneous call.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Fixes :
759a0cc5a3e1b ('cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api')
Signed-off-by: David S. Miller <davem@davemloft.net>
Anish Bhatt [Wed, 15 Oct 2014 03:07:23 +0000 (20:07 -0700)]
cxgb4i : Fix -Wunused-function warning
A bunch of ipv6 related code is left on by default. While this causes no
compilation issues, there is no need to have this enabled by default. Guard
with an ipv6 check, which also takes care of a -Wunused-function warning.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anish Bhatt [Wed, 15 Oct 2014 03:07:22 +0000 (20:07 -0700)]
cxgb4 : Fix build failure in cxgb4 when ipv6 is disabled/not in-built
cxgb4 ipv6 does not guard against ipv6 being disabled, or the standard
ipv6 module vs inbuilt tri-state issue. This was fixed for cxgb4i & iw_cxgb4
but missed for cxgb4.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anish Bhatt [Wed, 15 Oct 2014 03:07:21 +0000 (20:07 -0700)]
cxgb4i : Remove duplicated CLIP handling code
cxgb4 already handles CLIP updates from a previous changeset for iw_cxgb4,
there is no need to have this functionality in cxgb4i. Remove duplicated code
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Oct 2014 02:37:58 +0000 (19:37 -0700)]
sparc64: Fix FPU register corruption with AES crypto offload.
The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
key material is preloaded into the FPU registers, and then we loop
over and over doing the crypt operation, reusing those pre-cooked key
registers.
There are intervening blkcipher*() calls between the crypt operation
calls. And those might perform memcpy() and thus also try to use the
FPU.
The sparc64 kernel FPU usage mechanism is designed to allow such
recursive uses, but with a catch.
There has to be a trap between the two FPU using threads of control.
The mechanism works by, when the FPU is already in use by the kernel,
allocating a slot for FPU saving at trap time. Then if, within the
trap handler, we try to use the FPU registers, the pre-trap FPU
register state is saved into the slot. Then at trap return time we
notice this and restore the pre-trap FPU state.
Over the long term there are various more involved ways we can make
this work, but for a quick fix let's take advantage of the fact that
the situation where this happens is very limited.
All sparc64 chips that support the crypto instructiosn also are using
the Niagara4 memcpy routine, and that routine only uses the FPU for
large copies where we can't get the source aligned properly to a
multiple of 8 bytes.
We look to see if the FPU is already in use in this context, and if so
we use the non-large copy path which only uses integer registers.
Furthermore, we also limit this special logic to when we are doing
kernel copy, rather than a user copy.
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:33 +0000 (10:22 +1030)]
virtio-rng: refactor probe error handling
Code like
vi->vq = NULL;
kfree(vi)
does not make sense.
Clean it up, use goto error labels for cleanup.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:33 +0000 (10:22 +1030)]
virtio_scsi: drop scan callback
Enable VQs early like we do for restore.
This makes it possible to drop the scan callback,
moving scanning into the probe function, and making
code simpler.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:33 +0000 (10:22 +1030)]
virtio_balloon: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns, virtio balloon
violated this rule by adding bufs, which causes the VQ to be used
directly within restore.
To fix, call virtio_device_ready before using VQ.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:33 +0000 (10:22 +1030)]
virtio_scsi: fix race on device removal
We cancel event work on device removal, but an interrupt
could trigger immediately after this, and queue it
again.
To fix, set a flag.
Loosely based on patch by Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Paolo Bonzini [Tue, 14 Oct 2014 23:52:33 +0000 (10:22 +1030)]
virito_scsi: use freezable WQ for events
Michael S. Tsirkin noticed a race condition:
we reset device on freeze, but system WQ is still
running so it might try adding bufs to a VQ meanwhile.
To fix, switch to handling events from the freezable WQ.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:32 +0000 (10:22 +1030)]
virtio_net: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio net violated this
rule by using receive VQs within restore.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:32 +0000 (10:22 +1030)]
virtio_console: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns, virtio console violated this
rule by adding inbufs, which causes the VQ to be used directly within
restore.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:32 +0000 (10:22 +1030)]
virtio_scsi: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio scsi violated
this rule on restore by kicking event vq within restore.
To fix, call virtio_device_ready before using event queue.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:32 +0000 (10:22 +1030)]
virtio_blk: enable VQs early on restore
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio block violated
this rule on restore by restarting queues, which might in theory
cause the VQ to be used directly within restore.
To fix, call virtio_device_ready before using starting queues.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:31 +0000 (10:22 +1030)]
virtio_scsi: move kick event out from virtscsi_init
We currently kick event within virtscsi_init,
before host is fully initialized.
This can in theory confuse guest if device
consumes the buffers immediately.
To fix, move virtscsi_kick_event_all out to scan/restore.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:31 +0000 (10:22 +1030)]
virtio_net: fix use after free on allocation failure
In the extremely unlikely event that driver initialization fails after
RX buffers are added, virtio net frees RX buffers while VQs are
still active, potentially causing device to use a freed buffer.
To fix, reset device first - same as we do on device removal.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:31 +0000 (10:22 +1030)]
9p/trans_virtio: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, but virtio 9p device
adds self to channel list within probe, at which point VQ can be
used in violation of the spec.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:31 +0000 (10:22 +1030)]
virtio_console: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio console violated this
rule by adding inbufs, which causes the VQ to be used directly within
probe.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:30 +0000 (10:22 +1030)]
virtio_blk: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio block violated this
rule by calling add_disk, which causes the VQ to be used directly within
probe.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:30 +0000 (10:22 +1030)]
virtio_net: enable VQs early
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio net violated this
rule by using receive VQs within probe.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:30 +0000 (10:22 +1030)]
virtio: add API to enable VQs early
virtio spec 0.9.X requires DRIVER_OK to be set before
VQs are used, but some drivers use VQs before probe
function returns.
Since DRIVER_OK is set after probe, this violates the spec.
Even though under virtio 1.0 transitional devices support this
behaviour, we want to make it possible for those early callers to become
spec compliant and eventually support non-transitional devices.
Add API for drivers to call before using VQs.
Sets DRIVER_OK internally.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:30 +0000 (10:22 +1030)]
virtio_net: minor cleanup
goto done;
done:
return;
is ugly, it was put there to make diff review easier.
replace by open-coded return.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:29 +0000 (10:22 +1030)]
virtio-net: drop config_mutex
config_mutex served two purposes: prevent multiple concurrent config
change handlers, and synchronize access to config_enable flag.
Since commit
dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1
workqueue: make all workqueues non-reentrant
all workqueues are non-reentrant, and config_enable
is now gone.
Get rid of the unnecessary lock.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:29 +0000 (10:22 +1030)]
virtio_net: drop config_enable
Now that virtio core ensures config changes don't arrive during probing,
drop config_enable flag in virtio net.
On removal, flush is now sufficient to guarantee that no change work is
queued.
This help simplify the driver, and will allow setting DRIVER_OK earlier
without losing config change notifications.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:29 +0000 (10:22 +1030)]
virtio-blk: drop config_mutex
config_mutex served two purposes: prevent multiple concurrent config
change handlers, and synchronize access to config_enable flag.
Since commit
dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1
workqueue: make all workqueues non-reentrant
all workqueues are non-reentrant, and config_enable
is now gone.
Get rid of the unnecessary lock.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:52:26 +0000 (10:22 +1030)]
virtio_blk: drop config_enable
Now that virtio core ensures config changes don't
arrive during probing, drop config_enable flag
in virtio blk.
On removal, flush is now sufficient to guarantee that
no change work is queued.
This help simplify the driver, and will allow
setting DRIVER_OK earlier without losing config
change notifications.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 23:51:55 +0000 (10:21 +1030)]
virtio: defer config changed notifications
Defer config changed notifications that arrive during
probe/scan/freeze/restore.
This will allow drivers to set DRIVER_OK earlier, without worrying about
racing with config change interrupts.
This change will also benefit old hypervisors (before 2009)
that send interrupts without checking DRIVER_OK: previously,
the callback could race with driver-specific initialization.
This will also help simplify drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cosmetic changes)
Michael S. Tsirkin [Tue, 14 Oct 2014 00:10:35 +0000 (10:40 +1030)]
virtio-pci: move freeze/restore to virtio core
This is in preparation to extending config changed event handling
in core.
Wrapping these in an API also seems to make for a cleaner code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 00:10:34 +0000 (10:40 +1030)]
virtio: unify config_changed handling
Replace duplicated code in all transports with a single wrapper in
virtio.c.
The only functional change is in virtio_mmio.c: if a buggy device sends
us an interrupt before driver is set, we previously returned IRQ_NONE,
now we return IRQ_HANDLED.
As this must not happen in practice, this does not look like a big deal.
See also commit
3fff0179e33cd7d0a688dab65700c46ad089e934
virtio-pci: do not oops on config change if driver not loaded.
for the original motivation behind the driver check.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Michael S. Tsirkin [Tue, 14 Oct 2014 00:10:29 +0000 (10:40 +1030)]
virtio_pci: fix virtio spec compliance on restore
On restore, virtio pci does the following:
+ set features
+ init vqs etc - device can be used at this point!
+ set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits
This is in violation of the virtio spec, which
requires the following order:
- ACKNOWLEDGE
- DRIVER
- init vqs
- DRIVER_OK
This behaviour will break with hypervisors that assume spec compliant
behaviour. It seems like a good idea to have this patch applied to
stable branches to reduce the support butden for the hypervisors.
Cc: stable@vger.kernel.org
Cc: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Prarit Bhargava [Mon, 13 Oct 2014 16:21:39 +0000 (02:51 +1030)]
modules, lock around setting of MODULE_STATE_UNFORMED
A panic was seen in the following sitation.
There are two threads running on the system. The first thread is a system
monitoring thread that is reading /proc/modules. The second thread is
loading and unloading a module (in this example I'm using my simple
dummy-module.ko). Note, in the "real world" this occurred with the qlogic
driver module.
When doing this, the following panic occurred:
------------[ cut here ]------------
kernel BUG at kernel/module.c:3739!
invalid opcode: 0000 [#1] SMP
Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module]
CPU: 37 PID: 186343 Comm: cat Tainted: GF O-------------- 3.10.0+ #7
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.
1311111329 11/11/2013
task:
ffff8807fd2d8000 ti:
ffff88080fa7c000 task.ti:
ffff88080fa7c000
RIP: 0010:[<
ffffffff810d64c5>] [<
ffffffff810d64c5>] module_flags+0xb5/0xc0
RSP: 0018:
ffff88080fa7fe18 EFLAGS:
00010246
RAX:
0000000000000003 RBX:
ffffffffa03b5200 RCX:
0000000000000000
RDX:
0000000000001000 RSI:
ffff88080fa7fe38 RDI:
ffffffffa03b5000
RBP:
ffff88080fa7fe28 R08:
0000000000000010 R09:
0000000000000000
R10:
0000000000000000 R11:
000000000000000f R12:
ffffffffa03b5000
R13:
ffffffffa03b5008 R14:
ffffffffa03b5200 R15:
ffffffffa03b5000
FS:
00007f6ae57ef740(0000) GS:
ffff88101e7a0000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000404f70 CR3:
0000000ffed48000 CR4:
00000000001407e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Stack:
ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c
ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00
ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0
Call Trace:
[<
ffffffff810d666c>] m_show+0x19c/0x1e0
[<
ffffffff811e4d7e>] seq_read+0x16e/0x3b0
[<
ffffffff812281ed>] proc_reg_read+0x3d/0x80
[<
ffffffff811c0f2c>] vfs_read+0x9c/0x170
[<
ffffffff811c1a58>] SyS_read+0x58/0xb0
[<
ffffffff81605829>] system_call_fastpath+0x16/0x1b
Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
RIP [<
ffffffff810d64c5>] module_flags+0xb5/0xc0
RSP <
ffff88080fa7fe18>
Consider the two processes running on the system.
CPU 0 (/proc/modules reader)
CPU 1 (loading/unloading module)
CPU 0 opens /proc/modules, and starts displaying data for each module by
traversing the modules list via fs/seq_file.c:seq_open() and
fs/seq_file.c:seq_read(). For each module in the modules list, seq_read
does
op->start() <-- this is a pointer to m_start()
op->show() <- this is a pointer to m_show()
op->stop() <-- this is a pointer to m_stop()
The m_start(), m_show(), and m_stop() module functions are defined in
kernel/module.c. The m_start() and m_stop() functions acquire and release
the module_mutex respectively.
ie) When reading /proc/modules, the module_mutex is acquired and released
for each module.
m_show() is called with the module_mutex held. It accesses the module
struct data and attempts to write out module data. It is in this code
path that the above BUG_ON() warning is encountered, specifically m_show()
calls
static char *module_flags(struct module *mod, char *buf)
{
int bx = 0;
BUG_ON(mod->state == MODULE_STATE_UNFORMED);
...
The other thread, CPU 1, in unloading the module calls the syscall
delete_module() defined in kernel/module.c. The module_mutex is acquired
for a short time, and then released. free_module() is called without the
module_mutex. free_module() then sets mod->state = MODULE_STATE_UNFORMED,
also without the module_mutex. Some additional code is called and then the
module_mutex is reacquired to remove the module from the modules list:
/* Now we can delete it from the lists */
mutex_lock(&module_mutex);
stop_machine(__unlink_module, mod, NULL);
mutex_unlock(&module_mutex);
This is the sequence of events that leads to the panic.
CPU 1 is removing dummy_module via delete_module(). It acquires the
module_mutex, and then releases it. CPU 1 has NOT set dummy_module->state to
MODULE_STATE_UNFORMED yet.
CPU 0, which is reading the /proc/modules, acquires the module_mutex and
acquires a pointer to the dummy_module which is still in the modules list.
CPU 0 calls m_show for dummy_module. The check in m_show() for
MODULE_STATE_UNFORMED passed for dummy_module even though it is being
torn down.
Meanwhile CPU 1, which has been continuing to remove dummy_module without
holding the module_mutex, now calls free_module() and sets
dummy_module->state to MODULE_STATE_UNFORMED.
CPU 0 now calls module_flags() with dummy_module and ...
static char *module_flags(struct module *mod, char *buf)
{
int bx = 0;
BUG_ON(mod->state == MODULE_STATE_UNFORMED);
and BOOM.
Acquire and release the module_mutex lock around the setting of
MODULE_STATE_UNFORMED in the teardown path, which should resolve the
problem.
Testing: In the unpatched kernel I can panic the system within 1 minute by
doing
while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done
and
while (true) do cat /proc/modules; done
in separate terminals.
In the patched kernel I was able to run just over one hour without seeing
any issues. I also verified the output of panic via sysrq-c and the output
of /proc/modules looks correct for all three states for the dummy_module.
dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-)
dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE)
dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Eric W. Biederman [Wed, 8 Oct 2014 17:42:27 +0000 (10:42 -0700)]
mnt: Prevent pivot_root from creating a loop in the mount tree
Andy Lutomirski recently demonstrated that when chroot is used to set
the root path below the path for the new ``root'' passed to pivot_root
the pivot_root system call succeeds and leaks mounts.
In examining the code I see that starting with a new root that is
below the current root in the mount tree will result in a loop in the
mount tree after the mounts are detached and then reattached to one
another. Resulting in all kinds of ugliness including a leak of that
mounts involved in the leak of the mount loop.
Prevent this problem by ensuring that the new mount is reachable from
the current root of the mount tree.
[Added stable cc. Fixes CVE-2014-7970. --Andy]
Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Eric Dumazet [Mon, 13 Oct 2014 13:27:47 +0000 (06:27 -0700)]
tcp: TCP Small Queues and strange attractors
TCP Small queues tries to keep number of packets in qdisc
as small as possible, and depends on a tasklet to feed following
packets at TX completion time.
Choice of tasklet was driven by latencies requirements.
Then, TCP stack tries to avoid reorders, by locking flows with
outstanding packets in qdisc in a given TX queue.
What can happen is that many flows get attracted by a low performing
TX queue, and cpu servicing TX completion has to feed packets for all of
them, making this cpu 100% busy in softirq mode.
This became particularly visible with latest skb->xmit_more support
Strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the outstanding queue for this flow being drained
before feeding additional packets, so that skb->ooo_okay can be set
to allow select_queue() to select the optimal queue :
Incoming ACKS are normally handled by different cpus, so this patch
gives more chance for these cpus to take over the burden of feeding
qdisc with future packets.
Tested:
lpaa23:~# ./super_netperf 1400 --google-pacing-rate
3028000 -H lpaa24 -l 3600 &
lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:18 AM eth1 595448.00
1190564.00 38381.09
1760253.12 0.00 0.00 1.00
06:16:19 AM eth1 594858.00
1189686.00 38340.76
1758952.72 0.00 0.00 0.00
06:16:20 AM eth1 597017.00
1194019.00 38480.79
1765370.29 0.00 0.00 1.00
06:16:21 AM eth1 595450.00
1190936.00 38380.19
1760805.05 0.00 0.00 0.00
06:16:22 AM eth1 596385.00
1193096.00 38442.56
1763976.29 0.00 0.00 1.00
06:16:23 AM eth1 598155.00
1195978.00 38552.97
1768264.60 0.00 0.00 0.00
06:16:24 AM eth1 594405.00
1188643.00 38312.57
1757414.89 0.00 0.00 1.00
06:16:25 AM eth1 593366.00
1187154.00 38252.16
1755195.83 0.00 0.00 0.00
06:16:26 AM eth1 593188.00
1186118.00 38232.88
1753682.57 0.00 0.00 1.00
06:16:27 AM eth1 596301.00
1192241.00 38440.94
1762733.09 0.00 0.00 0.00
Average: eth1 595457.30
1190843.50 38381.69
1760664.84 0.00 0.00 0.50
lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
backlog
7606336b 2513p requeues 167982
backlog
224072b 74p requeues 566
backlog
581376b 192p requeues 5598
backlog
181680b 60p requeues 1070
backlog
5305056b 1753p requeues 110166 // Here, this TX queue is attracting flows
backlog
157456b 52p requeues 1758
backlog
672216b 222p requeues 3025
backlog 60560b 20p requeues 24541
backlog
448144b 148p requeues 21258
lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect
Immediate jump to full bandwidth, and traffic is properly
shard on all tx queues.
lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:46 AM eth1
1397632.00
2795397.00 90081.87
4133031.26 0.00 0.00 1.00
06:16:47 AM eth1
1396874.00
2793614.00 90032.99
4130385.46 0.00 0.00 0.00
06:16:48 AM eth1
1395842.00
2791600.00 89966.46
4127409.67 0.00 0.00 1.00
06:16:49 AM eth1
1395528.00
2791017.00 89946.17
4126551.24 0.00 0.00 0.00
06:16:50 AM eth1
1397891.00
2795716.00 90098.74
4133497.39 0.00 0.00 1.00
06:16:51 AM eth1
1394951.00
2789984.00 89908.96
4125022.51 0.00 0.00 0.00
06:16:52 AM eth1
1394608.00
2789190.00 89886.90
4123851.36 0.00 0.00 1.00
06:16:53 AM eth1
1395314.00
2790653.00 89934.33
4125983.09 0.00 0.00 0.00
06:16:54 AM eth1
1396115.00
2792276.00 89984.25
4128411.21 0.00 0.00 1.00
06:16:55 AM eth1
1396829.00
2793523.00 90030.19
4130250.28 0.00 0.00 0.00
Average: eth1
1396158.40
2792297.00 89987.09
4128439.35 0.00 0.00 0.50
lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
backlog
7900052b 2609p requeues 173287
backlog
878120b 290p requeues 589
backlog
1068884b 354p requeues 5621
backlog
996212b 329p requeues 1088
backlog
984100b 325p requeues 115316
backlog
956848b 316p requeues 1781
backlog
1080996b 357p requeues 3047
backlog
975016b 322p requeues 24571
backlog
990156b 327p requeues 21274
(All 8 TX queues get a fair share of the traffic)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Oct 2014 21:05:23 +0000 (17:05 -0400)]
Merge branch 'qlcnic'
Rajesh Borundia says:
====================
qlcnic: Bug fixes
This series fixes following issues.
* We were programming maximum number of arguments supported by
adapter instead of required in a command.
* Destroy tx command requires three arguments instead of two.
Please apply these patches to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Tue, 14 Oct 2014 11:41:46 +0000 (07:41 -0400)]
qlcnic: Fix number of arguments in destroy tx context command
o Number of arguments taken by destroy tx command is three
instead of two.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rajesh Borundia [Tue, 14 Oct 2014 11:41:45 +0000 (07:41 -0400)]
qlcnic: Fix programming number of arguments in a command.
o Initially we were programming maximum number of arguments.
Instead we should program number of arguments required in
a command.
o Maximum number of arguments for 82xx adapter is four. Fix it
for GET_ESWITCH_STATS command.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Tue, 14 Oct 2014 13:28:38 +0000 (06:28 -0700)]
genl_magic: Resolve logical-op warnings
Resolve "logical 'and' applied to non-boolean constant" warnings"
that appear in W=2 builds by adding !! to a bit test.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>