Vaishali Thakkar [Fri, 20 May 2016 00:11:20 +0000 (17:11 -0700)]
x86: mm: use hugetlb_bad_size()
Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported
hugepage size is found.
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vaishali Thakkar [Fri, 20 May 2016 00:11:17 +0000 (17:11 -0700)]
tile: mm: use hugetlb_bad_size()
Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported
hugepage size is found.
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vaishali Thakkar [Fri, 20 May 2016 00:11:14 +0000 (17:11 -0700)]
powerpc: mm: use hugetlb_bad_size()
Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported
hugepage size is found.
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vaishali Thakkar [Fri, 20 May 2016 00:11:11 +0000 (17:11 -0700)]
metag: mm: use hugetlb_bad_size()
Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported
hugepage size is found.
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vaishali Thakkar [Fri, 20 May 2016 00:11:08 +0000 (17:11 -0700)]
arm64: mm: use hugetlb_bad_size()
Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported
hugepage size is found.
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vaishali Thakkar [Fri, 20 May 2016 00:11:04 +0000 (17:11 -0700)]
mm/hugetlb: introduce hugetlb_bad_size()
When any unsupported hugepage size is specified, 'hugepagesz=' and
'hugepages=' should be ignored during command line parsing until any
supported hugepage size is found. But currently incorrect number of
hugepages are allocated when unsupported size is specified as it fails
to ignore the 'hugepages=' command.
Test case:
Note that this is specific to x86 architecture.
Boot the kernel with command line option 'hugepagesz=256M hugepages=X'.
After boot, dmesg output shows that X number of hugepages of the size 2M
is pre-allocated instead of 0.
So, to handle such command line options, introduce new routine
hugetlb_bad_size. The routine hugetlb_bad_size sets the global variable
parsed_valid_hugepagesz. We are using parsed_valid_hugepagesz to save
the state when unsupported hugepagesize is found so that we can ignore
the 'hugepages=' parameters after that and then reset the variable when
supported hugepage size is found.
The routine hugetlb_bad_size can be called while setting 'hugepagesz='
parameter in an architecture specific code.
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Fri, 20 May 2016 00:11:01 +0000 (17:11 -0700)]
mm/hugetlb: optimize minimum size (min_size) accounting
It was observed that minimum size accounting associated with the
hugetlbfs min_size mount option may not perform optimally and as
expected. As huge pages/reservations are released from the filesystem
and given back to the global pools, they are reserved for subsequent
filesystem use as long as the subpool reserved count is less than
subpool minimum size. It does not take into account used pages within
the filesystem. The filesystem size limits are not exceeded and this is
technically not a bug. However, better behavior would be to wait for
the number of used pages/reservations associated with the filesystem to
drop below the minimum size before taking reservations to satisfy
minimum size.
An optimization is also made to the hugepage_subpool_get_pages() routine
which is called when pages/reservations are allocated. This does not
change behavior, but simply avoids the accounting if all reservations
have already been taken (subpool reserved count == 0).
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 20 May 2016 00:10:58 +0000 (17:10 -0700)]
include/linux/nodemask.h: create next_node_in() helper
Lots of code does
node = next_node(node, XXX);
if (node == MAX_NUMNODES)
node = first_node(XXX);
so create next_node_in() to do this and use it in various places.
[mhocko@suse.com: use next_node_in() helper]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Hui Zhu <zhuhui@xiaomi.com>
Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 20 May 2016 00:10:55 +0000 (17:10 -0700)]
include/linux: apply __malloc attribute
Attach the malloc attribute to a few allocation functions. This helps
gcc generate better code by telling it that the return value doesn't
alias any existing pointers (which is even more valuable given the
pessimizations implied by -fno-strict-aliasing).
A simple example of what this allows gcc to do can be seen by looking at
the last part of drm_atomic_helper_plane_reset:
plane->state = kzalloc(sizeof(*plane->state), GFP_KERNEL);
if (plane->state) {
plane->state->plane = plane;
plane->state->rotation = BIT(DRM_ROTATE_0);
}
which compiles to
e8 99 bf d6 ff callq
ffffffff8116d540 <kmem_cache_alloc_trace>
48 85 c0 test %rax,%rax
48 89 83 40 02 00 00 mov %rax,0x240(%rbx)
74 11 je
ffffffff814015c4 <drm_atomic_helper_plane_reset+0x64>
48 89 18 mov %rbx,(%rax)
48 8b 83 40 02 00 00 mov 0x240(%rbx),%rax [*]
c7 40 40 01 00 00 00 movl $0x1,0x40(%rax)
With this patch applied, the instruction at [*] is elided, since the
store to plane->state->plane is known to not alter the value of
plane->state.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 20 May 2016 00:10:52 +0000 (17:10 -0700)]
compiler.h: add support for malloc attribute
gcc as far back as at least 3.04 documents the function attribute
__malloc__. Add a shorthand for attaching that to a function
declaration. This was also suggested by Andi Kleen way back in 2002
[1], but didn't get applied, perhaps because gcc at that time generated
the exact same code with and without this attribute.
This attribute tells the compiler that the return value (if non-NULL)
can be assumed not to alias any other valid pointers at the time of the
call.
Please note that the documentation for a range of gcc versions (starting
from around 4.7) contained a somewhat confusing and self-contradicting
text:
The malloc attribute is used to tell the compiler that a function may
be treated as if any non-NULL pointer it returns cannot alias any other
pointer valid when the function returns and *that the memory has
undefined content*. [...] Standard functions with this property include
malloc and *calloc*.
(emphasis mine). The intended meaning has later been clarified [2]:
This tells the compiler that a function is malloc-like, i.e., that the
pointer P returned by the function cannot alias any other pointer valid
when the function returns, and moreover no pointers to valid objects
occur in any storage addressed by P.
What this means is that we can apply the attribute to kmalloc and
friends, and it is ok for the returned memory to have well-defined
contents (__GFP_ZERO). But it is not ok to apply it to kmemdup(), nor
to other functions which both allocate and possibly initialize the
memory with existing pointers. So unless someone is doing something
pretty perverted kstrdup() should also be a fine candidate.
[1] http://thread.gmane.org/gmane.linux.kernel/57172
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:49 +0000 (17:10 -0700)]
mm: rename _count, field of the struct page, to _refcount
Many developers already know that field for reference count of the
struct page is _count and atomic type. They would try to handle it
directly and this could break the purpose of page reference count
tracepoint. To prevent direct _count modification, this patch rename it
to _refcount and add warning message on the code. After that, developer
who need to handle reference count will find that field should not be
accessed directly.
[akpm@linux-foundation.org: fix comments, per Vlastimil]
[akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
[sfr@canb.auug.org.au: sync ethernet driver changes]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Sunil Goutham <sgoutham@cavium.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Manish Chopra <manish.chopra@qlogic.com>
Cc: Yuval Mintz <yuval.mintz@qlogic.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:46 +0000 (17:10 -0700)]
mm/page_ref: use page_ref helper instead of direct modification of _count
page_reference manipulation functions are introduced to track down
reference count change of the page. Use it instead of direct
modification of _count.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Sunil Goutham <sgoutham@cavium.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Peng [Fri, 20 May 2016 00:10:43 +0000 (17:10 -0700)]
mm/slub.c: fix sysfs filename in comment
/sys/kernel/slab/xx/defrag_ratio should be remote_node_defrag_ratio.
Link: http://lkml.kernel.org/r/1463449242-5366-1-git-send-email-lip@dtdream.com
Signed-off-by: Li Peng <lip@dtdream.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Fri, 20 May 2016 00:10:41 +0000 (17:10 -0700)]
mm: slab: remove ZONE_DMA_FLAG
Now we have IS_ENABLED helper to check if a Kconfig option is enabled or
not, so ZONE_DMA_FLAG sounds no longer useful.
And, the use of ZONE_DMA_FLAG in slab looks pointless according to the
comment [1] from Johannes Weiner, so remove them and ORing passed in
flags with the cache gfp flags has been done in kmem_getpages().
[1] https://lkml.org/lkml/2014/9/25/553
Link: http://lkml.kernel.org/r/1462381297-11009-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Garnier [Fri, 20 May 2016 00:10:37 +0000 (17:10 -0700)]
mm: SLAB freelist randomization
Provides an optional config (CONFIG_SLAB_FREELIST_RANDOM) to randomize
the SLAB freelist. The list is randomized during initialization of a
new set of pages. The order on different freelist sizes is pre-computed
at boot for performance. Each kmem_cache has its own randomized
freelist. Before pre-computed lists are available freelists are
generated dynamically. This security feature reduces the predictability
of the kernel SLAB allocator against heap overflows rendering attacks
much less stable.
For example this attack against SLUB (also applicable against SLAB)
would be affected:
https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
Also, since v4.6 the freelist was moved at the end of the SLAB. It
means a controllable heap is opened to new attacks not yet publicly
discussed. A kernel heap overflow can be transformed to multiple
use-after-free. This feature makes this type of attack harder too.
To generate entropy, we use get_random_bytes_arch because 0 bits of
entropy is available in the boot stage. In the worse case this function
will fallback to the get_random_bytes sub API. We also generate a shift
random number to shift pre-computed freelist for each new set of pages.
The config option name is not specific to the SLAB as this approach will
be extended to other allocators like SLUB.
Performance results highlighted no major changes:
Hackbench (running 90 10 times):
Before average: 0.0698
After average: 0.0663 (-5.01%)
slab_test 1 run on boot. Difference only seen on the 2048 size test
being the worse case scenario covered by freelist randomization. New
slab pages are constantly being created on the 10000 allocations.
Variance should be mainly due to getting new pages every few
allocations.
Before:
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 121 cycles
10000 times kmalloc(16)/kfree -> 121 cycles
10000 times kmalloc(32)/kfree -> 121 cycles
10000 times kmalloc(64)/kfree -> 121 cycles
10000 times kmalloc(128)/kfree -> 121 cycles
10000 times kmalloc(256)/kfree -> 119 cycles
10000 times kmalloc(512)/kfree -> 119 cycles
10000 times kmalloc(1024)/kfree -> 119 cycles
10000 times kmalloc(2048)/kfree -> 119 cycles
10000 times kmalloc(4096)/kfree -> 121 cycles
10000 times kmalloc(8192)/kfree -> 119 cycles
10000 times kmalloc(16384)/kfree -> 119 cycles
After:
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 121 cycles
10000 times kmalloc(16)/kfree -> 121 cycles
10000 times kmalloc(32)/kfree -> 123 cycles
10000 times kmalloc(64)/kfree -> 142 cycles
10000 times kmalloc(128)/kfree -> 121 cycles
10000 times kmalloc(256)/kfree -> 119 cycles
10000 times kmalloc(512)/kfree -> 119 cycles
10000 times kmalloc(1024)/kfree -> 119 cycles
10000 times kmalloc(2048)/kfree -> 119 cycles
10000 times kmalloc(4096)/kfree -> 119 cycles
10000 times kmalloc(8192)/kfree -> 119 cycles
10000 times kmalloc(16384)/kfree -> 119 cycles
[akpm@linux-foundation.org: propagate gfp_t into cache_random_seq_create()]
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vladimir Davydov [Fri, 20 May 2016 00:10:34 +0000 (17:10 -0700)]
mm/slub.c: replace kick_all_cpus_sync() with synchronize_sched() in kmem_cache_shrink()
When we call __kmem_cache_shrink on memory cgroup removal, we need to
synchronize kmem_cache->cpu_partial update with put_cpu_partial that
might be running on other cpus. Currently, we achieve that by using
kick_all_cpus_sync, which works as a system wide memory barrier. Though
fast it is, this method has a flaw - it issues a lot of IPIs, which
might hurt high performance or real-time workloads.
To fix this, let's replace kick_all_cpus_sync with synchronize_sched.
Although the latter one may take much longer to finish, it shouldn't be
a problem in this particular case, because memory cgroups are destroyed
asynchronously from a workqueue so that no user visible effects should
be introduced. OTOH, it will save us from excessive IPIs when someone
removes a cgroup.
Anyway, even if using synchronize_sched turns out to take too long, we
can always introduce a kind of __kmem_cache_shrink batching so that this
method would only be called once per one cgroup destruction (not per
each per memcg kmem cache as it is now).
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:31 +0000 (17:10 -0700)]
mm/slab: lockless decision to grow cache
To check whether free objects exist or not precisely, we need to grab a
lock. But, accuracy isn't that important because race window would be
even small and if there is too much free object, cache reaper would reap
it. So, this patch makes the check for free object exisistence not to
hold a lock. This will reduce lock contention in heavily allocation
case.
Note that until now, n->shared can be freed during the processing by
writing slabinfo, but, with some trick in this patch, we can access it
freely within interrupt disabled period.
Below is the result of concurrent allocation/free in slab allocation
benchmark made by Christoph a long time ago. I make the output simpler.
The number shows cycle count during alloc/free respectively so less is
better.
* Before
Kmalloc N*alloc N*free(32): Average=248/966
Kmalloc N*alloc N*free(64): Average=261/949
Kmalloc N*alloc N*free(128): Average=314/1016
Kmalloc N*alloc N*free(256): Average=741/1061
Kmalloc N*alloc N*free(512): Average=1246/1152
Kmalloc N*alloc N*free(1024): Average=2437/1259
Kmalloc N*alloc N*free(2048): Average=4980/1800
Kmalloc N*alloc N*free(4096): Average=9000/2078
* After
Kmalloc N*alloc N*free(32): Average=344/792
Kmalloc N*alloc N*free(64): Average=347/882
Kmalloc N*alloc N*free(128): Average=390/959
Kmalloc N*alloc N*free(256): Average=393/1067
Kmalloc N*alloc N*free(512): Average=683/1229
Kmalloc N*alloc N*free(1024): Average=1295/1325
Kmalloc N*alloc N*free(2048): Average=2513/1664
Kmalloc N*alloc N*free(4096): Average=4742/2172
It shows that allocation performance decreases for the object size up to
128 and it may be due to extra checks in cache_alloc_refill(). But,
with considering improvement of free performance, net result looks the
same. Result for other size class looks very promising, roughly, 50%
performance improvement.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:29 +0000 (17:10 -0700)]
mm/slab: refill cpu cache through a new slab without holding a node lock
Until now, cache growing makes a free slab on node's slab list and then
we can allocate free objects from it. This necessarily requires to hold
a node lock which is very contended. If we refill cpu cache before
attaching it to node's slab list, we can avoid holding a node lock as
much as possible because this newly allocated slab is only visible to
the current task. This will reduce lock contention.
Below is the result of concurrent allocation/free in slab allocation
benchmark made by Christoph a long time ago. I make the output simpler.
The number shows cycle count during alloc/free respectively so less is
better.
* Before
Kmalloc N*alloc N*free(32): Average=355/750
Kmalloc N*alloc N*free(64): Average=452/812
Kmalloc N*alloc N*free(128): Average=559/1070
Kmalloc N*alloc N*free(256): Average=1176/980
Kmalloc N*alloc N*free(512): Average=1939/1189
Kmalloc N*alloc N*free(1024): Average=3521/1278
Kmalloc N*alloc N*free(2048): Average=7152/1838
Kmalloc N*alloc N*free(4096): Average=13438/2013
* After
Kmalloc N*alloc N*free(32): Average=248/966
Kmalloc N*alloc N*free(64): Average=261/949
Kmalloc N*alloc N*free(128): Average=314/1016
Kmalloc N*alloc N*free(256): Average=741/1061
Kmalloc N*alloc N*free(512): Average=1246/1152
Kmalloc N*alloc N*free(1024): Average=2437/1259
Kmalloc N*alloc N*free(2048): Average=4980/1800
Kmalloc N*alloc N*free(4096): Average=9000/2078
It shows that contention is reduced for all the object sizes and
performance increases by 30 ~ 40%.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:26 +0000 (17:10 -0700)]
mm/slab: separate cache_grow() to two parts
This is a preparation step to implement lockless allocation path when
there is no free objects in kmem_cache.
What we'd like to do here is to refill cpu cache without holding a node
lock. To accomplish this purpose, refill should be done after new slab
allocation but before attaching the slab to the management list. So,
this patch separates cache_grow() to two parts, allocation and attaching
to the list in order to add some code inbetween them in the following
patch.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:23 +0000 (17:10 -0700)]
mm/slab: make cache_grow() handle the page allocated on arbitrary node
Currently, cache_grow() assumes that allocated page's nodeid would be
same with parameter nodeid which is used for allocation request. If we
discard this assumption, we can handle fallback_alloc() case gracefully.
So, this patch makes cache_grow() handle the page allocated on arbitrary
node and clean-up relevant code.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:20 +0000 (17:10 -0700)]
mm/slab: racy access/modify the slab color
Slab color isn't needed to be changed strictly. Because locking for
changing slab color could cause more lock contention so this patch
implements racy access/modify the slab color. This is a preparation
step to implement lockless allocation path when there is no free objects
in the kmem_cache.
Below is the result of concurrent allocation/free in slab allocation
benchmark made by Christoph a long time ago. I make the output simpler.
The number shows cycle count during alloc/free respectively so less is
better.
* Before
Kmalloc N*alloc N*free(32): Average=365/806
Kmalloc N*alloc N*free(64): Average=452/690
Kmalloc N*alloc N*free(128): Average=736/886
Kmalloc N*alloc N*free(256): Average=1167/985
Kmalloc N*alloc N*free(512): Average=2088/1125
Kmalloc N*alloc N*free(1024): Average=4115/1184
Kmalloc N*alloc N*free(2048): Average=8451/1748
Kmalloc N*alloc N*free(4096): Average=16024/2048
* After
Kmalloc N*alloc N*free(32): Average=355/750
Kmalloc N*alloc N*free(64): Average=452/812
Kmalloc N*alloc N*free(128): Average=559/1070
Kmalloc N*alloc N*free(256): Average=1176/980
Kmalloc N*alloc N*free(512): Average=1939/1189
Kmalloc N*alloc N*free(1024): Average=3521/1278
Kmalloc N*alloc N*free(2048): Average=7152/1838
Kmalloc N*alloc N*free(4096): Average=13438/2013
It shows that contention is reduced for object size >= 1024 and
performance increases by roughly 15%.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:17 +0000 (17:10 -0700)]
mm/slab: don't keep free slabs if free_objects exceeds free_limit
Currently, determination to free a slab is done whenever each freed
object is put into the slab. This has a following problem.
Assume free_limit = 10 and nr_free = 9.
Free happens as following sequence and nr_free changes as following.
free(become a free slab) free(not become a free slab) nr_free: 9 -> 10
(at first free) -> 11 (at second free)
If we try to check if we can free current slab or not on each object
free, we can't free any slab in this situation because current slab
isn't a free slab when nr_free exceed free_limit (at second free) even
if there is a free slab.
However, if we check it lastly, we can free 1 free slab.
This problem would cause to keep too much memory in the slab subsystem.
This patch try to fix it by checking number of free object after all
free work is done. If there is free slab at that time, we can free slab
as much as possible so we keep free slab as minimal.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:14 +0000 (17:10 -0700)]
mm/slab: clean-up kmem_cache_node setup
There are mostly same code for setting up kmem_cache_node either in
cpuup_prepare() or alloc_kmem_cache_node(). Factor out and clean-up
them.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Nishanth Menon <nm@ti.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:11 +0000 (17:10 -0700)]
mm/slab: factor out kmem_cache_node initialization code
It can be reused on other place, so factor out it. Following patch will
use it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:08 +0000 (17:10 -0700)]
mm/slab: drain the free slab as much as possible
slabs_tofree() implies freeing all free slab. We can do it with just
providing INT_MAX.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:05 +0000 (17:10 -0700)]
mm/slab: remove BAD_ALIEN_MAGIC again
Initial attemp to remove BAD_ALIEN_MAGIC is once reverted by 'commit
edcad2509550 ("Revert "slab: remove BAD_ALIEN_MAGIC"")' because it
causes a problem on m68k which has many node but !CONFIG_NUMA. In this
case, although alien cache isn't used at all but to cope with some
initialization path, garbage value is used and that is BAD_ALIEN_MAGIC.
Now, this patch set use_alien_caches to 0 when !CONFIG_NUMA, there is no
initialization path problem so we don't need BAD_ALIEN_MAGIC at all. So
remove it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 20 May 2016 00:10:02 +0000 (17:10 -0700)]
mm/slab: fix the theoretical race by holding proper lock
While processing concurrent allocation, SLAB could be contended a lot
because it did a lots of work with holding a lock. This patchset try to
reduce the number of critical section to reduce lock contention. Major
changes are lockless decision to allocate more slab and lockless cpu
cache refill from the newly allocated slab.
Below is the result of concurrent allocation/free in slab allocation
benchmark made by Christoph a long time ago. I make the output simpler.
The number shows cycle count during alloc/free respectively so less is
better.
* Before
Kmalloc N*alloc N*free(32): Average=365/806
Kmalloc N*alloc N*free(64): Average=452/690
Kmalloc N*alloc N*free(128): Average=736/886
Kmalloc N*alloc N*free(256): Average=1167/985
Kmalloc N*alloc N*free(512): Average=2088/1125
Kmalloc N*alloc N*free(1024): Average=4115/1184
Kmalloc N*alloc N*free(2048): Average=8451/1748
Kmalloc N*alloc N*free(4096): Average=16024/2048
* After
Kmalloc N*alloc N*free(32): Average=344/792
Kmalloc N*alloc N*free(64): Average=347/882
Kmalloc N*alloc N*free(128): Average=390/959
Kmalloc N*alloc N*free(256): Average=393/1067
Kmalloc N*alloc N*free(512): Average=683/1229
Kmalloc N*alloc N*free(1024): Average=1295/1325
Kmalloc N*alloc N*free(2048): Average=2513/1664
Kmalloc N*alloc N*free(4096): Average=4742/2172
It shows that performance improves greatly (roughly more than 50%) for
the object class whose size is more than 128 bytes.
This patch (of 11):
If we don't hold neither the slab_mutex nor the node lock, node's shared
array cache could be freed and re-populated. If __kmem_cache_shrink()
is called at the same time, it will call drain_array() with n->shared
without holding node lock so problem can happen. This patch fix the
situation by holding the node lock before trying to drain the shared
array.
In addition, add a debug check to confirm that n->shared access race
doesn't exist.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Fri, 20 May 2016 00:09:59 +0000 (17:09 -0700)]
kernel/padata.c: hide unused functions
A recent cleanup removed some exported functions that were not used
anywhere, which in turn exposed the fact that some other functions in
the same file are only used in some configurations.
We now get a warning about them when CONFIG_HOTPLUG_CPU is disabled:
kernel/padata.c:670:12: error: '__padata_remove_cpu' defined but not used [-Werror=unused-function]
static int __padata_remove_cpu(struct padata_instance *pinst, int cpu)
^~~~~~~~~~~~~~~~~~~
kernel/padata.c:650:12: error: '__padata_add_cpu' defined but not used [-Werror=unused-function]
static int __padata_add_cpu(struct padata_instance *pinst, int cpu)
This rearranges the code so the __padata_remove_cpu/__padata_add_cpu
functions are within the #ifdef that protects the code that calls them.
[akpm@linux-foundation.org: coding-style fixes]
Fixes: 4ba6d78c671e ("kernel/padata.c: removed unused code")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Cochran [Fri, 20 May 2016 00:09:56 +0000 (17:09 -0700)]
kernel/padata.c: removed unused code
By accident I stumbled across code that has never been used. This
driver has EXPORT_SYMBOL functions, and the only user of the code is
pcrypt.c, but this only uses a subset of the exported symbols.
According to 'git log -G', the functions, padata_set_cpumasks,
padata_add_cpu, and padata_remove_cpu have never been used since they
were first introduced. This patch removes the unused code.
On one 64 bit build, with CRYPTO_PCRYPT built in, the text is more than
4k smaller.
kbuild_hp> size $KBUILD_OUTPUT/vmlinux
text data bss dec hex filename
10566658 4678360 1122304 16367322 f9beda vmlinux
10561984 4678360 1122304 16362648 f9ac98 vmlinux
On another config, 32 bit, the saving is about 0.5k bytes.
kbuild_hp-x86> size $KBUILD_OUTPUT/vmlinux
6012005 2409513 2785280 11206798 ab008e vmlinux
6011491 2409513 2785280 11206284 aafe8c vmlinux
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guozhonghua [Fri, 20 May 2016 00:09:53 +0000 (17:09 -0700)]
ocfs2: clean up an unneeded goto in ocfs2_put_slot()
The goto is not useful in ocfs2_put_slot(), so delete it.
Signed-off-by: Guozhonghua <guozhonghua@h3c.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jun Piao [Fri, 20 May 2016 00:09:50 +0000 (17:09 -0700)]
ocfs2: clean up unused parameter 'count' in o2hb_read_block_input()
Clean up unused parameter 'count' in o2hb_read_block_input().
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
piaojun [Fri, 20 May 2016 00:09:47 +0000 (17:09 -0700)]
ocfs2: clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec
Clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec.
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guozhonghua [Fri, 20 May 2016 00:09:44 +0000 (17:09 -0700)]
ocfs2: fix comment in struct ocfs2_extended_slot
The comment in ocfs2_extended_slot has the offset wrong.
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Du, Changbin [Fri, 20 May 2016 00:09:41 +0000 (17:09 -0700)]
debugobjects: insulate non-fixup logic related to static obj from fixup callbacks
When activating a static object we need make sure that the object is
tracked in the object tracker. If it is a non-static object then the
activation is illegal.
In previous implementation, each subsystem need take care of this in
their fixup callbacks. Actually we can put it into debugobjects core.
Thus we can save duplicated code, and have *pure* fixup callbacks.
To achieve this, a new callback "is_static_object" is introduced to let
the type specific code decide whether a object is static or not. If
yes, we take it into object tracker, otherwise give warning and invoke
fixup callback.
This change has paassed debugobjects selftest, and I also do some test
with all debugobjects supports enabled.
At last, I have a concern about the fixups that can it change the object
which is in incorrect state on fixup? Because the 'addr' may not point
to any valid object if a non-static object is not tracked. Then Change
such object can overwrite someone's memory and cause unexpected
behaviour. For example, the timer_fixup_activate bind timer to function
stub_timer.
Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
[changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Du, Changbin [Fri, 20 May 2016 00:09:38 +0000 (17:09 -0700)]
Documentation: update debugobjects doc
Update documentation creangponding to change(debugobjects: make fixup
functions return bool instead of int).
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Du, Changbin [Fri, 20 May 2016 00:09:35 +0000 (17:09 -0700)]
percpu_counter: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to
cheange (debugobjects: make fixup functions return bool instead of int).
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Du, Changbin [Fri, 20 May 2016 00:09:32 +0000 (17:09 -0700)]
rcu: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to
cheange (debugobjects: make fixup functions return bool instead of int).
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Du, Changbin [Fri, 20 May 2016 00:09:29 +0000 (17:09 -0700)]
timer: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to
cheange (debugobjects: make fixup functions return bool instead of int).
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Du, Changbin [Fri, 20 May 2016 00:09:26 +0000 (17:09 -0700)]
workqueue: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to
change (debugobjects: make fixup functions return bool instead of int)
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Du, Changbin [Fri, 20 May 2016 00:09:23 +0000 (17:09 -0700)]
debugobjects: correct the usage of fixup call results
If debug_object_fixup() return non-zero when problem has been fixed.
But the code got it backwards, it taks 0 as fixup successfully. So fix
it.
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Du, Changbin [Fri, 20 May 2016 00:09:20 +0000 (17:09 -0700)]
debugobjects: make fixup functions return bool instead of int
I am going to introduce debugobjects infrastructure to USB subsystem.
But before this, I found the code of debugobjects could be improved.
This patchset will make fixup functions return bool type instead of int.
Because fixup only need report success or no. boolean is the 'real'
type.
This patch (of 7):
The object debugging infrastructure core provides some fixup callbacks
for the subsystem who use it. These callbacks are called from the debug
code whenever a problem in debug_object_init is detected. And
debugobjects core suppose them returns 1 when the fixup was successful,
otherwise 0. So the return type is boolean.
A bad thing is that debug_object_fixup use the return value for
arithmetic operation. It confused me that what is the reall return
type.
Reading over the whole code, I found some place do use the return value
incorrectly(see next patch). So why use bool type instead?
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vineet Gupta [Fri, 20 May 2016 00:09:17 +0000 (17:09 -0700)]
scripts/bloat-o-meter: print percent change
This adds an additional line of output (to reduce the chances of
breaking any existing output parsers) which prints the total size before
and after and the relative difference.
add/remove: 39/0 grow/shrink: 12408/55 up/down: 362227/-1430 (360797)
function old new delta
ext4_fill_super 10556 12590 +2034
_fpadd_parts - 1186 +1186
ntfs_fill_super 5340 6164 +824
...
...
__divdf3 752 386 -366
unlzma 3682 3274 -408
Total: Before=
5023101, After=
5383898, chg 7.000000%
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Link: http://lkml.kernel.org/r/1463124110-30314-1-git-send-email-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Fri, 20 May 2016 00:09:14 +0000 (17:09 -0700)]
scripts/spelling.txt: add "fimware" misspelling
A few instances of "fimware" instead of "firmware" were found. Fix
these and add it to the spelling.txt file.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Fri, 20 May 2016 00:09:11 +0000 (17:09 -0700)]
scripts/decode_stacktrace.sh: handle symbols in modules
scripts/decode_stacktrace.sh presently displays module symbols as
func+0x0ff/0x5153 [module]
Add a third argument: the pathname of a directory where the script
should look for the file module.ko so that the output appears as
func (foo/bar.c:123) module
Without the argument or if the module file isn't found the script prints
such symbols as is without decoding.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Deepa Dinamani [Fri, 20 May 2016 00:09:08 +0000 (17:09 -0700)]
time: remove timespec_add_safe()
All references to timespec_add_safe() now use timespec64_add_safe().
The plan is to replace struct timespec references with struct timespec64
throughout the kernel as timespec is not y2038 safe.
Drop timespec_add_safe() and use timespec64_add_safe() for all
architectures.
Link: http://lkml.kernel.org/r/1461947989-21926-4-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Deepa Dinamani [Fri, 20 May 2016 00:09:05 +0000 (17:09 -0700)]
fs: poll/select/recvmmsg: use timespec64 for timeout events
struct timespec is not y2038 safe. Even though timespec might be
sufficient to represent timeouts, use struct timespec64 here as the plan
is to get rid of all timespec reference in the kernel.
The patch transitions the common functions: poll_select_set_timeout()
and select_estimate_accuracy() to use timespec64. And, all the syscalls
that use these functions are transitioned in the same patch.
The restart block parameters for poll uses monotonic time. Use
timespec64 here as well to assign timeout value. This parameter in the
restart block need not change because this only holds the monotonic
timestamp at which timeout should occur. And, unsigned long data type
should be big enough for this timestamp.
The system call interfaces will be handled in a separate series.
Compat interfaces need not change as timespec64 is an alias to struct
timespec on a 64 bit system.
Link: http://lkml.kernel.org/r/1461947989-21926-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Deepa Dinamani [Fri, 20 May 2016 00:09:02 +0000 (17:09 -0700)]
time: add missing implementation for timespec64_add_safe()
timespec64_add_safe() has been defined in time64.h for 64 bit systems.
But, 32 bit systems only have an extern function prototype defined.
Provide a definition for the above function.
The function will be necessary as part of y2038 changes. struct
timespec is not y2038 safe. All references to timespec will be replaced
by struct timespec64. The function is meant to be a replacement for
timespec_add_safe().
The implementation is similar to timespec_add_safe().
Link: http://lkml.kernel.org/r/1461947989-21926-2-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Fri, 20 May 2016 00:08:59 +0000 (17:08 -0700)]
fsnotify: avoid spurious EMFILE errors from inotify_init()
Inotify instance is destroyed when all references to it are dropped.
That not only means that the corresponding file descriptor needs to be
closed but also that all corresponding instance marks are freed (as each
mark holds a reference to the inotify instance). However marks are
freed only after SRCU period ends which can take some time and thus if
user rapidly creates and frees inotify instances, number of existing
inotify instances can exceed max_user_instances limit although from user
point of view there is always at most one existing instance. Thus
inotify_init() returns EMFILE error which is hard to justify from user
point of view. This problem is exposed by LTP inotify06 testcase on
some machines.
We fix the problem by making sure all group marks are properly freed
while destroying inotify instance. We wait for SRCU period to end in
that path anyway since we have to make sure there is no event being
added to the instance while we are tearing down the instance. So it
takes only some plumbing to allow for marks to be destroyed in that path
as well and not from a dedicated work item.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Tested-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 19 May 2016 01:55:19 +0000 (18:55 -0700)]
Merge tag 'trace-v4.7' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This includes two new updates for the ftrace infrastructure.
- With the changing of the code for filtering events by pid, from a
list of pids to a bitmask, we can now easily implement following
forks. With a new tracing option "event-fork" which, when set,
will have tasks with pids in set_event_pid, when they fork, to have
their child pids added to set_event_pid and the child will be
traced as well.
Note, if "event-fork" is set and a task with its pid in
set_event_pid exits, its pid will be removed from set_event_pid
- The addition of Tom Zanussi's hist triggers. This includes a very
thorough documentatino on how to use the hist triggers with events.
This introduces a quick and easy way to get histogram data from
events and their fields.
Some other cleanups and updates were added as well. Like Masami
Hiramatsu added test cases for the event trigger and hist triggers.
Also I added a speed up of filtering by using a temp buffer when
filters are set"
* tag 'trace-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
tracing: Use temp buffer when filtering events
tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic
tracing: Remove unused function trace_current_buffer_lock_reserve()
tracing: Remove one use of trace_current_buffer_lock_reserve()
tracing: Have trace_buffer_unlock_commit() call the _regs version with NULL
tracing: Remove unused function trace_current_buffer_discard_commit()
tracing: Move trace_buffer_unlock_commit{_regs}() to local header
tracing: Fold filter_check_discard() into its only user
tracing: Make filter_check_discard() local
tracing: Move event_trigger_unlock_commit{_regs}() to local header
tracing: Don't use the address of the buffer array name in copy_from_user
tracing: Handle tracing_map_alloc_elts() error path correctly
tracing: Add check for NULL event field when creating hist field
tracing: checking for NULL instead of IS_ERR()
tracing: Do not inherit event-fork option for instances
tracing: Fix unsigned comparison to zero in hist trigger code
kselftests/ftrace: Add a test for log2 modifier of hist trigger
tracing: Add hist trigger 'log2' modifier
kselftests/ftrace: Add hist trigger testcases
kselftests/ftrace : Add event trigger testcases
...
Linus Torvalds [Thu, 19 May 2016 01:46:55 +0000 (18:46 -0700)]
Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
"Four small audit patches for 4.7.
Two are simple cleanups around the audit thread management code, one
adds a tty field to AUDIT_LOGIN events, and the final patch makes
tty_name() usable regardless of CONFIG_TTY.
Nothing controversial, and it all passes our regression test"
* 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
tty: provide tty_name() even without CONFIG_TTY
audit: add tty field to LOGIN event
audit: we don't need to __set_current_state(TASK_RUNNING)
audit: cleanup prune_tree_thread
Linus Torvalds [Thu, 19 May 2016 00:22:19 +0000 (17:22 -0700)]
Merge tag 'rproc-v4.7' of git://github.com/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
"Introduce a synchronization point between the async firmware loading
and clients requesting the remote processor to boot, as well as
support for remote processors that are not interested in the resource
table information"
* tag 'rproc-v4.7' of git://github.com/andersson/remoteproc:
remoteproc: Add additional crash reasons
remoteproc: core: Make the loaded resource table optional
remoteproc: core: Task sync during rproc_fw_boot()
Linus Torvalds [Thu, 19 May 2016 00:17:20 +0000 (17:17 -0700)]
Merge tag 'rpmsg-v4.7' of git://github.com/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
"Refactor rpmsg module registration to follow other subsystems; by
introduction of module_rpmsg_driver and hiding of THIS_MODULE from
clients"
* tag 'rpmsg-v4.7' of git://github.com/andersson/remoteproc:
rpmsg: use module_rpmsg_driver in existing drivers and examples
rpmsg: add helper macro module_rpmsg_driver
rpmsg: drop owner assignment from rpmsg_drivers
rpmsg: add THIS_MODULE to rpmsg_driver in rpmsg core
Linus Torvalds [Thu, 19 May 2016 00:12:23 +0000 (17:12 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"First round of updates for the input subsystem. No new drivers here,
just some driver fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: rotary-encoder - fix bare use of 'unsigned'
Input: cm109 - spin_lock in complete() cleanup
Input: cm109 - fix handling of volume and mute buttons
Input: byd - don't wipe dynamically allocated memory twice
Input: twl4030 - fix unsafe macro definition
Input: twl6040-vibra - remove mutex
Input: bcm_iproc_tsc - DT spelling s/clock-name/clock-names/
Input: bcm_iproc_tsc - use syscon to access shared registers
Input: ti_am335x_tsc - use SIMPLE_DEV_PM_OPS
Input: omap-keypad - remove set_col_gpio_val() and get_row_gpio_val()
Input: omap-keypad - drop empty PM stubs
Input: omap-keypad - remove adjusting of scan delay
Input: gpio-keys - clean up device tree binding example
Input: kbtab - stop saving struct usb_device
Input: gtco - stop saving struct usb_device
Input: aiptek - stop saving struct usb_device
Input: acecad - stop saving struct usb_device
Linus Torvalds [Thu, 19 May 2016 00:03:51 +0000 (17:03 -0700)]
Merge tag 'media/v4.7-1' of git://git./linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- added support for Intersil/Techwell TW686x-based video capture cards
- v4l PCI skeleton driver moved to samples directory
- Documentation cleanups and improvements
- RC: reduced the memory footprint for IR raw events
- tpg: Export the tpg code from vivid as a module
- adv7180: Add device tree binding documentation
- lots of driver improvements and fixes
* tag 'media/v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (173 commits)
[media] exynos-gsc: avoid build warning without CONFIG_OF
[media] samples: v4l: from Documentation to samples directory
[media] dib0700: add USB ID for another STK8096-PVR ref design based card
[media] tvp5150: propagate I2C write error in .s_register callback
[media] tvp5150: return I2C write operation failure to callers
[media] em28xx: add support for Hauppauge WinTV-dualHD DVB tuner
[media] em28xx: add missing USB IDs
[media] update cx23885 and em28xx cardlists
[media] media: au0828 fix au0828_v4l2_device_register() to not unlock and free
[media] c8sectpfe: Rework firmware loading mechanism
[media] c8sectpfe: Demote print to dev_dbg
[media] c8sectpfe: Fix broken circular buffer wp management
[media] media-device: Simplify compat32 logic
[media] media: i2c: ths7303: remove redundant assignment on bt
[media] dvb-usb: hide unused functions
[media] xilinx-vipp: remove unnecessary of_node_put
[media] drivers/media/media-devnode: clear private_data before put_device()
[media] drivers/media/media-device: move debug log before _devnode_unregister()
[media] drivers/media/rc: postpone kfree(rc_dev)
[media] media/dvb-core: forward media_create_pad_links() return value
...
Linus Torvalds [Wed, 18 May 2016 23:38:59 +0000 (16:38 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"First round of SCSI updates for the 4.6+ merge window.
This batch includes the usual quota of driver updates (bnx2fc, mp3sas,
hpsa, ncr5380, lpfc, hisi_sas, snic, aacraid, megaraid_sas). There's
also a multiqueue update for scsi_debug, assorted bug fixes and a few
other minor updates (refactor of scsi_sg_pools into generic code, alua
and VPD updates, and struct timeval conversions)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (138 commits)
mpt3sas: Used "synchronize_irq()"API to synchronize timed-out IO & TMs
mpt3sas: Set maximum transfer length per IO to 4MB for VDs
mpt3sas: Updating mpt3sas driver version to 13.100.00.00
mpt3sas: Fix initial Reference tag field for 4K PI drives.
mpt3sas: Handle active cable exception event
mpt3sas: Update MPI header to 2.00.42
Revert "lpfc: Delete unnecessary checks before the function call mempool_destroy"
eata_pio: missing break statement
hpsa: Fix type ZBC conditional checks
scsi_lib: Decode T10 vendor IDs
scsi_dh_alua: do not fail for unknown VPD identification
scsi_debug: use locally assigned naa
scsi_debug: uuid for lu name
scsi_debug: vpd and mode page work
scsi_debug: add multiple queue support
bfa: fix bfa_fcb_itnim_alloc() error handling
megaraid_sas: Downgrade two success messages to info
cxlflash: Fix to resolve dead-lock during EEH recovery
scsi_debug: rework resp_report_luns
scsi_debug: use pdt constants
...
Linus Torvalds [Wed, 18 May 2016 22:30:04 +0000 (15:30 -0700)]
Merge branch 'stable/for-linus-4.7' of git://git./linux/kernel/git/konrad/ibft
Pull iscsi_ibft updates from Konrad Rzeszutek Wilk:
"The pull has two features - both of them expand the SysFS entries:
- 'prefix-len' - which is subnet_mask_prefix of the iBFT header.
- 'acpi_header' dir with: 'iBFT', OEM-ID (whatever it extracts from
the iBFT header) and OEM_TABLE_ID (also whatever it extracts from
the iBFT header). This is to help NIC drivers to figure out during
bootup how to deal with BIOS created iBFT tables (like by TianoCore
UEFI implemenation)"
* 'stable/for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
ibft: Expose iBFT acpi header via sysfs
iscsi_ibft: Add prefix-len attr and display netmask
Linus Torvalds [Wed, 18 May 2016 20:14:02 +0000 (13:14 -0700)]
Merge tag 'armsoc-drivers' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Arnd Bergmann:
"Driver updates for ARM SoCs, these contain various things that touch
the drivers/ directory but got merged through arm-soc for practical
reasons.
For the most part, this is now related to power management
controllers, which have not yet been abstracted into a separate
subsystem, and typically require some code in drivers/soc or arch/arm
to control the power domains.
Another large chunk here is a rework of the NVIDIA Tegra USB3.0
support, which was surprisingly tricky and took a long time to get
done.
Finally, reset controller handling as always gets merged through here
as well"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (97 commits)
arm-ccn: Enable building as module
soc/tegra: pmc: Add generic PM domain support
usb: xhci: tegra: Add Tegra210 support
usb: xhci: Add NVIDIA Tegra XUSB controller driver
dt-bindings: usb: xhci-tegra: Add Tegra210 XUSB controller support
dt-bindings: usb: Add NVIDIA Tegra XUSB controller binding
PCI: tegra: Support per-lane PHYs
dt-bindings: pci: tegra: Update for per-lane PHYs
phy: tegra: Add Tegra210 support
phy: Add Tegra XUSB pad controller support
dt-bindings: phy: tegra-xusb-padctl: Add Tegra210 support
dt-bindings: phy: Add NVIDIA Tegra XUSB pad controller binding
phy: core: Allow children node to be overridden
clk: tegra: Add interface to enable hardware control of SATA/XUSB PLLs
drivers: firmware: psci: make two helper functions inline
soc: renesas: rcar-sysc: Add support for R-Car H3 power areas
soc: renesas: rcar-sysc: Add support for R-Car E2 power areas
soc: renesas: rcar-sysc: Add support for R-Car M2-N power areas
soc: renesas: rcar-sysc: Add support for R-Car M2-W power areas
soc: renesas: rcar-sysc: Add support for R-Car H2 power areas
...
Linus Torvalds [Wed, 18 May 2016 20:07:57 +0000 (13:07 -0700)]
Merge tag 'armsoc-defconfig' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC defconfig updates from Arnd Bergmann:
"As usual, a bunch of commits, mostly adding drivers and other options
to defconfigs.
We are adding three new defconfig files for the newly added 32-bit
machines (aspeed and mps2), the rest is mainly housekeeping.
The changes outside of arch/arm/config/ are for a Kconfig symbol that
got renamed"
* tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (63 commits)
ARM: aspeed: adapt defconfigs for new CONFIG_PRINTK_TIME
ARM: u8500_defconfig: update sensor config
ARM: u8500_defconfig: remove staging from defconfig
ARM: multi_v7_defconfig: Remove unused Kconfig option MACH_UX500_DT
ARM: at91/defconfig: sama5: add CONFIG_FHANDLE
arm/configs: Add Aspeed defconfig
arm/configs/multi_v5: Add Aspeed ast2400
ARM: at91: sama5: Update defconfig
ARM: imx_v6_v7_defconfig: add CONFIG_MICREL_PHY
ARM: imx_v6_v7_defconfig: add CONFIG_I2C_GPIO
ARM: multi_v7: Enable Tegra XUSB controller in defconfig
ARM: tegra: Enable XUSB controller in defconfig
ARM: omap2plus_defconfig: Enable PWM and ir-rx51 as loadable modules
ARM: multi_v7_defconfig: add the Atmel sama5d2-compatible ADC driver
ARM: multi_v7_defconfig: add the Atmel Audio microphone interface PDMIC
ARM: multi_v7_defconfig: add Atmel ISI (Image Sensor Interface) driver
ARM: multi_v7_defconfig: add Atmel watchdog timers
ARM: multi_v7_defconfig: add HLCDC drivers as modules
ARM: at91/defconfig: add PDMIC driver to sama5_defconfig
ARM: at91/defconfig: add HLCDC driver to sama5_defconfig
...
Linus Torvalds [Wed, 18 May 2016 19:58:39 +0000 (12:58 -0700)]
Merge tag 'armsoc-dt64' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM 64-bit DT updates from Arnd Bergmann:
"We continue ramping up platform support for 64-bit ARM machines, with
111 individual non-merge changesets touching 21 platforms.
The LG1312 platform is completely new and is the first ARM platform by
LG that we support in the mainline kernel. Two other SoCs got added
that are updated versions of existing SoC families, so the port mainly
consists of new dts files:
- The Hisilicon Hip06/D03 is the latest server platform from
Huawei/Hisilicon, and follows the Hip05/D02 platform.
- Rockchip RK3399 follows the 32-bit RK3288 that is popular in
low-end Chromebooks and the 64-bit RK3368 that is mainly found in
chinese Android TV boxes.
The 96Boards HiKey based on the Hisilicon Hi6220 (Kirin 620) gets a
long-awaited overhaul with a lot of devices enabled in the DT, so it
should be much more usable with a mainline kernel now. See also
https://plus.google.com/
111524780435806926688/posts/PeGb2VsNhJd
A lot of work went into enabling new device drivers on existing
machines, but we also have a couple of new commercially available
machines:
- Google Pixel C laptop based on Tegra210
- Hardkernel Odroid C2 Based on Amlogic Meson GXBB (S905)
- Geekbuying GeekBox based on Rockchip RK3368
And finally, a couple of reference or development platforms that are
not end-user platforms but are used for trying out the respective SoC
platforms:
- Amlogic Meson GXBB P200 and P201 development systems
- NXP Layerscape 1043A QDS development board
- Hisilicon Hip06 D03 server board, as mentioned above
- LG1312 Reference Design
- RK3399 Evaluation Board"
* tag 'armsoc-dt64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (104 commits)
arm64: dts: marvell: add XOR node for Armada 3700 SoC
dt-bindings: document rockchip rk3399-evb board
arm64: dts: rockchip: add dts file for RK3399 evaluation board
arm64: dts: rockchip: add core dtsi file for RK3399 SoCs
dt-bindings: rockchip-dw-mshc: add description for rk3399
arm64: dts: marvell: Use a SoC-specific compatible for xHCI on Armada37xx
arm64: dts: marvell: Rename armada-37xx USB node
arm64: dts: marvell: Clean up armada-3720-db
Documentation: arm64: Add Hisilicon Hip06 D03 dts binding
arm64: dts: Add initial dts for Hisilicon Hip06 D03 board
arm64: dts: hip05: Add nor flash support
arm64: dts: hip05: fix its node without msi-cells
arm64: dts: r8a7795: Don't disable referenced optional clocks
arm64: dts: salvator-x: populate EXTALR
arm64: dts: r8a7795: enable PCIe on Salvator-X
arm64: dts: r8a7795: Add PCIe nodes
arm64: tegra: Add IOMMU node to GM20B on Tegra210
arm64: tegra: Add reference clock to GM20B on Tegra210
dt-bindings: Add documentation for GM20B GPU
dt-bindings: gk20a: Document iommus property
...
Linus Torvalds [Wed, 18 May 2016 19:48:46 +0000 (12:48 -0700)]
Merge tag 'armsoc-dt' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM DT updates from Arnd Bergmann:
"These are all the updates to device tree files for 32-bit platforms,
which as usual makes up the bulk of the ARM SoC changes: 462 non-merge
changesets, 450 files changed, 23340 insertions, 5216 deletions.
The three platforms that are added with the "soc" branch are here as
well, and we add some related machine files:
- For Aspeed AST2400/AST2500, we get the evaluation platform and the
Tyan Palmetto POWER8 mainboard that uses the AST2400 BMC
- For Oxnas 810SE, the Western Digital "My Book World Edition" is
added as the only platform at the moment.
- For ARM MPS2, the AN385 (Cortex-M3) and AN399 (Cortex-M7) are
supported
On the ARM Realview development platform, we now support all machines
with device tree, previously only the board files were supported,
which in turn will likely be removed soon.
Qualcomm IPQ4019 is the second generation ARM based "Internet
Processor", following the IPQ806x that is used in many high-end WiFi
routers. This one integrates two ath10k wifi radios that were
previously on separate chips.
Other boards that got added for existing chips are:
Ti OMAP family:
- Amazon Kindle Fire, first generation, tablet and ebook reader
- OnRISC Baltos iR 2110 and 3220 embedded industrial PCs
- TI AM5728 IDK, TI AM3359 ICE-V2, and TI DRA722 Rev C EVM
development systems
Samsung EXYNOS platform:
- Samsung ARTIK5 evaluation board, see
https://www.artik.io/modules/overview/artik-5/
NXP i.MX platforms:
- Ka-Ro electronics TX6S-8034, TX6S-8035, TX6U-8033, TX6U-81xx,
TX6Q-1036, TX6Q-1110/-1130, TXUL-0010 and TXUL-0011 industrial
SoM modules
- Embest MarS Board i.MX6Dual DIY platform
- Boundary Devices i.MX6 Quad Plus Nitrogen6_MAX and SoloX
Nitrogen6sx embedded boards
- Technexion Pico i.MX6UL compute module
- ZII VF610 Development Board
Marvell embedded (mvebu, orion, kirkwood) platforms:
- Linksys Viper (E4200v2 / EA4500) WiFi router
- Buffalo Kurobox Pro NAS
Qualcomm Snapdragon:
- Arrow DragonBoard 600c (96boards) with APQ8064 Snapdragon 600
Rockchips platform:
- mqmaker MiQi single-board computer
Altera SoCFPGA:
- samtec VIN|ING 1000 vehicle communication interface
Allwinner Sunxi platforms:
- Dserve DSRV9703C tablet
- Difrnce DIT4350 tablet
- Colorfly E708 Q1 tablet
- Polaroid MID2809PXE04 tablet
- Olimex A20 OLinuXino LIME2 single board computer
- Xunlong Orange Pi 2, Orange Pi One, and Orange Pi PC single board
computers
Across many platforms, bug fixes went in to address warnings that dtc
now emits with 'make dtbs W=1'. Further changes for device enablement
went into Ti OMAP, bcm283x (Raspberry Pi), bcm47xx (wifi router), Ti
Davinci, Samsung EXYNOS, Marvell mvebu/kirkwood/orion, NXP i.MX/Vybrid
NXP LPC18xx, NXP LPC32xx, Renesas shmobile/r-mobile/r-car, Rockchips
rk3xxx, ST Ux500, ST STi, Atmel AT91/SAMA5, Altera SoCFPGA, Allwinner
Sunxi, Sigma Designs Tango, NVIDIA Tegra, Socionext Uniphier and ARM
Versatile Express"
* tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (458 commits)
ARM: dts: tango4: Import watchdog node
ARM: dts: tango4: Update cpus node for cpufreq
ARM: dts: tango4: Update DT to match clk driver
ARM: dts: tango4: Initial thermal support
arm/dst: Add Aspeed ast2500 device tree
arm/dts: Add Aspeed ast2400 device tree
ARM: sun7i: dt: Add pll3 and pll7 clocks
ARM: dts: sunxi: Add a olinuxino-lime2-emmc
ARM: dts: at91: sama5d4: add trng node
ARM: dts: at91: sama5d3: add trng node
ARM: dts: at91: sama5d2: add trng node
ARM: dts: at91: at91sam9g45 family: reduce the trng register map size
ARM: sun4i: dt: Add pll3 and pll7 clocks
ARM: sun5i: chip: Enable the TV Encoder
ARM: sun5i: r8: Add display blocks to the DTSI
ARM: sun5i: a13: Add display and TCON clocks
ARM: dts: ux500: configure the accelerometers open drain
ARM: mx5: dts: Enable USB OTG on M53EVK
ARM: dts: imx6ul-14x14-evk: Add audio support
ARM: dts: imx6qdl: Remove unneeded unit-addresses
...
Linus Torvalds [Wed, 18 May 2016 19:43:08 +0000 (12:43 -0700)]
Merge tag 'armsoc-arm64' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC 64-bit changes from Arnd Bergmann:
"One new platform gets added this time: The Cortex-A53 based LG
Electronics LG1K platform used in digital TVs.
The other changes are mostly smaller updates to the defconfig files,
to enable additional platform specific drivers, as they get merged
through the subsystem trees"
* tag 'armsoc-arm64' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: configs: add options useful for Armada 7K/8K support
arm64: defconfig: Add Juno SATA controller
arm64: defconfig: enable freescale/nxp config options
arm64: defconfig: enable 48-bit virtual addresses
arm64: defconfig: cleanup the defconfig
MAINTAINERS: update entry for Marvell ARM platform maintainers
arm64: marvell: enable AP806 and CP110 syscon driver
arm64: Kconfig: select sp804 timer for ARCH_HISI
arm64: defconfig: enable configs for WLAN and TI WL1835 as modules
arm64: defconfig: enable several common USB network adapters
arm64: defconfig: add CONFIG_SPI_SPIDEV as module
arm64: defconfig: Enable the PMIC and regulator for Hi6220 and 96boards HiKey
arm64: defconfig: Add Renesas R-Car USB 3.0 driver support
MAINTAINERS: add Chanho Min as ARM/LG1K maintainer
arm64: defconfig: enable ARCH_LG1K
arm64: add Kconfig entry for LG1K SoC family
arm64: defconfig: Enable PL330 DMA controller
arm64: defconfig: enable basic boot for Amlogic meson
Linus Torvalds [Wed, 18 May 2016 19:35:46 +0000 (12:35 -0700)]
Merge tag 'armsoc-soc' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC platform updates from Arnd Bergmann:
"We get support for three new 32-bit SoC platforms this time.
The amount of changes in arch/arm for any of them is miniscule, as all
the interesting code is in device driver subsystems (irqchip, clk,
pinctrl, ...) these days. I'm listing them here, as the addition of
the Kconfig statement is the main relevant milestone for a new
platform. In each case, some drivers are are shared with existing
platforms, while other drivers are added for v4.7 as well, or come in
a later release.
- The Aspeed platform is probably the most interesting one, this is
what most whitebox servers use as their baseboard management
controller. We get support for the very common ast2400 and ast2500
SoCs. The OpenBMC project focuses on this chip, and the LWN
article about their ELC 2016 presentation at
https://lwn.net/Articles/683320/
triggered the submission, but the code comes from IBM's OpenPOWER
team rather than the team at Facebook. There are still a lot more
drivers that need to get added over time, and I hope both teams can
work together on that.
- OXNAS is an old platform for Network Attached Storage devices from
Oxford Semiconductor. There are models with ARM10 (!) and
ARM11MPCore cores, but for now, we only support the original ARM9
based versions. The product lineup was subsequently part of PLX,
Avago and now the new Broadcom Ltd.
https://wiki.openwrt.org/doc/hardware/soc/soc.oxnas
has some more information.
- V2M-MPS2 is a prototyping platform from ARM for their Cortex-M
cores and is related to the existing Realview / Versatile Express
lineup, but without MMU.
We now support various NOMMU platforms, so adding a new one is
fairly straightforward.
http://infocenter.arm.com/help/topic/com.arm.doc.100112_0100_03_en/
has detailed information about the platform.
Other noteworthy updates:
- Work on LPC32xx has resumed, and Vladimir Zapolskiy and Sylvain
Lemieux are now maintaining the platform.
This is an older ARM9 based platform from NXP (not Freescale), but
it remains in use in embedded markets.
- Kevin Hilman is now co-maintaining the Amlogic Meson platform for
both 32-bit and 64-bit ARM, and started contributing some patches.
- As is often the case, work on the OMAP platforms makes up the bulk
of the actual SoC code changes in arch/arm, but there isn't a lot
of that either"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (42 commits)
MAINTAINERS: ARM/Amlogic: add co-maintainer, misc. updates
MAINTAINERS: add ARM/NXP LPC32XX SoC specific drivers to the section
MAINTAINERS: add new maintainers of NXP LPC32xx SoC
MAINTAINERS: move ARM/NXP LPC32xx record to ARM section
arm: Add Aspeed machine
ARM: lpc32xx: remove duplicate const on lpc32xx_auxdata_lookup
ARM: lpc32xx: remove leftovers of legacy clock source and provider drivers
ARM: lpc32xx: remove reboot header file
ARM: dove: Remove CLK_IS_ROOT
ARM: orion5x: Remove CLK_IS_ROOT
ARM: mv78xx0: Remove CLK_IS_ROOT
ARM: davinci: da850: use clk->set_parent for async3
ARM: davinci: Move clock init after ioremap.
MAINTAINERS: Update ARM Versatile Express platform entry
ARM: vexpress/mps2: introduce MPS2 platform
MAINTAINERS: add maintainer entry for ARM/OXNAS platform
ARM: Add new mach-oxnas
irqchip: versatile-fpga: add new compatible for OX810SE SoC
ARM: uniphier: correct the call order of of_node_put()
MAINTAINERS: fix stale TI DaVinci entries
...
Linus Torvalds [Wed, 18 May 2016 19:28:29 +0000 (12:28 -0700)]
Merge tag 'armsoc-cleanups-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC cleanups and fixes from Arnd Bergmann:
"Traditionally we've had two separate branches for cleanups and
non-critical bug fixes, but both of these got smaller with each
release and the differences are rather unclear now, so it seems more
appropriate to have a combined branch.
The most notable change is for OMAP, which gets a small rework to
simplify handling of the AUXDATA mechanism used on machines that are
not completely DT based yet, along with other work that is used as
preparation for dropping the legacy board files"
* tag 'armsoc-cleanups-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: exynos: Add interrupt line to MAX8997 PMIC on exynos4210-trats
ARM: dts: exynos: Fix regulator name to avoid forbidden character on exynos4210-trats
ARM: dts: exynos: Add MFC memory banks for Peach boards
ARM: OMAP2+: n900 needs MMC slot names for legacy user space
ARM: OMAP2+: Add more functions to pwm pdata for ir-rx51
ARM: debug: remove extraneous DEBUG_HI3716_UART option
ARM: OMAP2+: Simplify auxdata by using the generic match
of/platform: Allow secondary compatible match in of_dev_lookup
ARM: davinci: use IRQCHIP_DECLARE for cp_intc
ARM: davinci: remove unused DA8XX_NUM_UARTS
ARM: davinci: simplify call to of populate
ARM: DaVinci USB: removed deprecated properties from MUSB config
ARM: rockchip: Fix use of plain integer as NULL pointer
ARM: realview: hide unused 'pmu_device' object
soc: versatile: dynamically detect RealView HBI numbers
Linus Torvalds [Wed, 18 May 2016 19:17:16 +0000 (12:17 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
"The s390 patches for the 4.7 merge window have the usual bug fixes and
cleanups, and the following new features:
- An interface for dasd driver to query if a volume is online to
another operating system
- A new ioctl for the dasd driver to verify the format for a range of
tracks
- Following the example of x86 the struct fpu is now allocated with
the task_struct
- The 'report_error' interface for the PCI bus to send an
adapter-error notification from user space to the service element
of the machine"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits)
s390/vmem: remove unused function parameter
s390/vmem: fix identity mapping
s390: add missing include statements
s390: add missing declarations
s390: make couple of variables and functions static
s390/cache: remove superfluous locking
s390/cpuinfo: simplify locking and skip offline cpus early
s390/3270: hangup the 3270 tty after a disconnect
s390/3270: handle reconnect of a tty with a different size
s390/3270: avoid endless I/O loop with disconnected 3270 terminals
s390/3270: fix garbled output on 3270 tty view
s390/3270: fix view reference counting
s390/3270: add missing tty_kref_put
s390/dumpstack: implement and use return_address()
s390/cpum_sf: Remove superfluous SMP function call
s390/cpum_cf: Remove superfluous SMP function call
s390/Kconfig: make z196 the default processor type
s390/sclp: avoid compile warning in sclp_pci_report
s390/fpu: allocate 'struct fpu' with the task_struct
s390/crypto: cleanup and move the header with the cpacf definitions
...
Linus Torvalds [Wed, 18 May 2016 18:51:25 +0000 (11:51 -0700)]
iwlwifi: fix mis-merge that breaks the driver
My laptop that uses the intel 7680 iwlwifi module would no longer
connects to the network. It would fail with a "Microcode SW error
detected." and spew out register state over and over again without ever
connecting to the network.
The cause is mis-merge in commit
909b27f70643, where David seems to have
lost some of the changes to iwl_mvm_set_tx_cmd() from commit
5c08b0f5026f ("iwlwifi: mvm: don't override the rate with the AMSDU
len").
The reason seems to be a conflict with commit
d8fe484470dd ("iwlwifi:
mvm: add support for new TX CMD API"), which touched a line adjacent to
the changes in
909b27f70643.
David missed the fact that "info->driver_data[0]" had become
"skb_info->driver_data[0]". Then he removed the skb_info because it was
unused.
This just re-updates iwl_mvm_set_tx_cmd() with the lost two lines.
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Reinoud Koornstra <reinoudkoornstra@gmail.com>
Cc: Luciano Coelho <luciano.coelho@intel.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 18 May 2016 18:51:59 +0000 (11:51 -0700)]
Merge branch 'work.misc' of git://git./linux/kernel/git/viro/vfs
Pull misc vfs cleanups from Al Viro:
"Assorted cleanups and fixes all over the place"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coredump: only charge written data against RLIMIT_CORE
coredump: get rid of coredump_params->written
ecryptfs_lookup(): try either only encrypted or plaintext name
ecryptfs: avoid multiple aliases for directories
bpf: reject invalid names right in ->lookup()
__d_alloc(): treat NULL name as QSTR("/", 1)
mtd: switch ubi_open_volume_path() to vfs_stat()
mtd: switch open_mtd_by_chdev() to use of vfs_stat()
Linus Torvalds [Wed, 18 May 2016 18:46:23 +0000 (11:46 -0700)]
Merge branch 'work.iov_iter' of git://git./linux/kernel/git/viro/vfs
Pull iov_iter cleanups from Al Viro.
* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fold checks into iterate_and_advance()
rw_verify_area(): saner calling conventions
aio: remove a pointless assignment
Linus Torvalds [Wed, 18 May 2016 17:28:45 +0000 (10:28 -0700)]
Merge branch 'work.lookups' of git://git./linux/kernel/git/viro/vfs
Pull parallel lookup fixups from Al Viro:
"Fix for xfs parallel readdir (turns out the cxfs exposure was not
enough to catch all problems), and a reversion of btrfs back to
->iterate() until the fs/btrfs/delayed-inode.c gets fixed"
* 'work.lookups' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xfs: concurrent readdir hangs on data buffer locks
Revert "btrfs: switch to ->iterate_shared()"
Dave Chinner [Wed, 18 May 2016 14:17:26 +0000 (00:17 +1000)]
xfs: concurrent readdir hangs on data buffer locks
There's a three-process deadlock involving shared/exclusive barriers
and inverted lock orders in the directory readdir implementation.
It's a pre-existing problem with lock ordering, exposed by the
VFS parallelisation code.
process 1 process 2 process 3
--------- --------- ---------
readdir
iolock(shared)
get_leaf_dents
iterate entries
ilock(shared)
map, lock and read buffer
iunlock(shared)
process entries in buffer
.....
readdir
iolock(shared)
get_leaf_dents
iterate entries
ilock(shared)
map, lock buffer
<blocks>
finish ->iterate_shared
file_accessed()
->update_time
start transaction
ilock(excl)
<blocks>
.....
finishes processing buffer
get next buffer
ilock(shared)
<blocks>
And that's the deadlock.
Fix this by dropping the current buffer lock in process 1 before
trying to map the next buffer. This means we keep the lock order of
ilock -> buffer lock intact and hence will allow process 3 to make
progress and drop it's ilock(shared) once it is done.
Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 18 May 2016 17:15:05 +0000 (13:15 -0400)]
Revert "btrfs: switch to ->iterate_shared()"
This reverts commit
972b241f8441dc37a3f89dcd7e71d7f013873d13.
Quoth Chris:
didn't take the delayed inode stuff into account
it got an rbtree of items and it pulls things out
so in shared mode, its hugely racey
sorry, lets revert and fix it for real inside of btrfs
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Wed, 18 May 2016 17:17:56 +0000 (10:17 -0700)]
Merge branch 'sendmsg.cifs' of git://git./linux/kernel/git/viro/vfs
Pull cifs iovec cleanups from Al Viro.
* 'sendmsg.cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cifs: don't bother with kmap on read_pages side
cifs_readv_receive: use cifs_read_from_socket()
cifs: no need to wank with copying and advancing iovec on recvmsg side either
cifs: quit playing games with draining iovecs
cifs: merge the hash calculation helpers
Linus Torvalds [Wed, 18 May 2016 17:08:45 +0000 (10:08 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull remaining vfs xattr work from Al Viro:
"The rest of work.xattr (non-cifs conversions)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
btrfs: Switch to generic xattr handlers
ubifs: Switch to generic xattr handlers
jfs: Switch to generic xattr handlers
jfs: Clean up xattr name mapping
gfs2: Switch to generic xattr handlers
ceph: kill __ceph_removexattr()
ceph: Switch to generic xattr handlers
ceph: Get rid of d_find_alias in ceph_set_acl
Linus Torvalds [Wed, 18 May 2016 17:01:47 +0000 (10:01 -0700)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
"Various small CIFS and SMB3 fixes (including some for stable)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
remove directory incorrectly tries to set delete on close on non-empty directories
Update cifs.ko version to 2.09
fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication
fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication
fs/cifs: correctly to anonymous authentication for the LANMAN authentication
fs/cifs: correctly to anonymous authentication via NTLMSSP
cifs: remove any preceding delimiter from prefix_path
cifs: Use file_dentry()
James Bottomley [Wed, 18 May 2016 01:12:50 +0000 (21:12 -0400)]
Merge branch 'fixes' into misc
Linus Torvalds [Wed, 18 May 2016 01:00:39 +0000 (18:00 -0700)]
Merge branch 'for-4.7' of git://git./linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo:
"Trivial changes except for special case timeout bumping.
I have two more libata branches which depend on SCSI and dmaengine
tree respectively. I'll send pull requests for them once the
prerequisite trees are pulled in"
* 'for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata-scsi: use %*ph to dump small buffers
treewide: Fix typos in libata.xml
libata-core: Allow longer timeout for drive spinup from PUIS
libata: Fixup awkward whitespace in warning by removing line continuation.
Linus Torvalds [Wed, 18 May 2016 00:47:31 +0000 (17:47 -0700)]
Merge tag 'regulator-fix-can-change-voltage' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"Fix build warnings from regulator_can_change_voltage()
Cut down on noise for mainstream users of the API and people
doing build testing by dropping the deprecated flag from
regulator_can_change_voltage() as it triggers even on the
EXPORT_SYMBOL_GPL() which affects all builds rather than just
the remaining drivers with calls to it (for which fixes are
currently pending).
The function remains deprecated and is expected to be removed
entirely in v4.8"
* tag 'regulator-fix-can-change-voltage' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Silence build warnings from regulator_can_change_voltage()
Linus Torvalds [Wed, 18 May 2016 00:39:42 +0000 (17:39 -0700)]
Merge tag 'gpio-v4.7-1' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for kernel cycle v4.7:
Core infrastructural changes:
- Support for natively single-ended GPIO driver stages.
This means that if the hardware has registers to configure open
drain or open source configuration, we use that rather than (as we
did before) try to emulate it by switching the line to an input to
get high impedance.
This is also documented throughly in Documentation/gpio/driver.txt
for those of you who did not understand one word of what I just
wrote.
- Start to do away with the unnecessarily complex and unitelligible
ARCH_REQUIRE_GPIOLIB and ARCH_WANT_OPTIONAL_GPIOLIB, another
evolutional artifact from the time when the GPIO subsystem was
unmaintained.
Archs can now just select GPIOLIB and be done with it, cleanups to
arches will trickle in for the next kernel. Some minor archs ACKed
the changes immediately so these are included in this pull request.
- Advancing the use of the data pointer inside the GPIO device for
storing driver data by switching the PowerPC, Super-H Unicore and
a few other subarches or subsystem drivers in ALSA SoC, Input,
serial, SSB, staging etc to use it.
- The initialization now reads the input/output state of the GPIO
lines, so that each GPIO descriptor knows - if this callback is
implemented - whether the line is input or output. This also
reflects nicely in userspace "lsgpio".
- It is now possible to name GPIO producer names, line names, from
the device tree. (Platform data has been supported for a while).
I bet we will get a similar mechanism for ACPI one of those days.
This makes is possible to get sensible producer names for e.g.
GPIO rails in "lsgpio" in userspace.
New drivers:
- New driver for the Loongson1.
- The XLP driver now supports Broadcom Vulcan ARM64.
- The IT87 driver now supports IT8620 and IT8628.
- The PCA953X driver now supports Galileo Gen2.
Driver improvements:
- MCP23S08 was switched to use the gpiolib irqchip helpers and now
also suppors level-triggered interrupts.
- 74x164 and RCAR now supports the .set_multiple() callback
- AMDPT was converted to use generic GPIO.
- TC3589x, TPS65218, SX150X, F7188X, MENZ127, VX855, WM831X, WM8994
support the new single ended callback for open drain and in some
cases open source.
- Implement the .get_direction() callback for a few more drivers like
PL061, Xgene.
Cleanups:
- Paul Gortmaker combed through the drivers and de-modularized those
who are not really modules.
- Move the GPIO poweroff DT bindings to the power subdir where they
belong.
- Rename gpio-generic.c to gpio-mmio.c, which is much more to the
point. That's what it is handling, nothing more, nothing less"
* tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (126 commits)
MIPS: do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB
gpio: zevio: make it explicitly non-modular
gpio: timberdale: make it explicitly non-modular
gpio: stmpe: make it explicitly non-modular
gpio: sodaville: make it explicitly non-modular
pinctrl: sh-pfc: Let gpio_chip.to_irq() return zero on error
gpio: dwapb: Add ACPI device ID for DWAPB GPIO controller on X-Gene platforms
gpio: dt-bindings: add wd,mbl-gpio bindings
gpio: of: make it possible to name GPIO lines
gpio: make gpiod_to_irq() return negative for NO_IRQ
gpio: xgene: implement .get_direction()
gpio: xgene: Enable ACPI support for X-Gene GFC GPIO driver
gpio: tegra: Implement gpio_get_direction callback
gpio: set up initial state from .get_direction()
gpio: rename gpio-generic.c into gpio-mmio.c
gpio: generic: fix GPIO_GENERIC_PLATFORM is set to module case
gpio: dwapb: add gpio-signaled acpi event support
gpio: dwapb: convert device node to fwnode
gpio: dwapb: remove name from dwapb_port_property
gpio/qoriq: select IRQ_DOMAIN
...
Linus Torvalds [Wed, 18 May 2016 00:34:33 +0000 (17:34 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
"No biggies this time:
- micro-optimization of implement() in HID core parses, from Dmitry
Torokhov
- thingm driver cleanups from Heiner Kallweit
- fine-graining detection of distance and tilt axes in wacom driver
from Jason Gerecke
- New hid-asus driver, currently supporting X205TA and VivoBook
E200HA, from Yusuke Fujimaki"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: wacom: Add fuzz factor to distance and tilt axes
HID: usbhid: quirks for Corsair RGB keyboard & mice (K70R, K95RGB, M65RGB, K70RGB, K65RGB)
HID: thingm: remove not needed error message
HID: thingm: set new flag LED_HW_PLUGGABLE
HID: thingm: factor out duplicated code to thingm_init_led
HID: simplify implement() a bit
HID: asus: add support for VivoBook E200HA
HID: hidraw: silence an uninitialized variable warning
HID: roccat: silence an uninitialized variable warning
HID: Asus X205TA keyboard driver
HID: hidraw: switch to using memdup_user
Linus Torvalds [Wed, 18 May 2016 00:11:27 +0000 (17:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina:
- remove of our own implementation of architecture-specific relocation
code and leveraging existing code in the module loader to perform
arch-dependent work, from Jessica Yu.
The relevant patches have been acked by Rusty (for module.c) and
Heiko (for s390).
- live patching support for ppc64le, which is a joint work of Michael
Ellerman and Torsten Duwe. This is coming from topic branch that is
share between livepatching.git and ppc tree.
- addition of livepatching documentation from Petr Mladek
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: make object/func-walking helpers more robust
livepatch: Add some basic livepatch documentation
powerpc/livepatch: Add live patching support on ppc64le
powerpc/livepatch: Add livepatch stack to struct thread_info
powerpc/livepatch: Add livepatch header
livepatch: Allow architectures to specify an alternate ftrace location
ftrace: Make ftrace_location_range() global
livepatch: robustify klp_register_patch() API error checking
Documentation: livepatch: outline Elf format and requirements for patch modules
livepatch: reuse module loader code to write relocations
module: s390: keep mod_arch_specific for livepatch modules
module: preserve Elf information for livepatch modules
Elf: add livepatch-specific Elf constants
Linus Torvalds [Wed, 18 May 2016 00:05:30 +0000 (17:05 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits)
gitignore: fix wording
mfd: ab8500-debugfs: fix "between" in printk
memstick: trivial fix of spelling mistake on management
cpupowerutils: bench: fix "average"
treewide: Fix typos in printk
IB/mlx4: printk fix
pinctrl: sirf/atlas7: fix printk spelling
serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/
w1: comment spelling s/minmum/minimum/
Blackfin: comment spelling s/divsor/divisor/
metag: Fix misspellings in comments.
ia64: Fix misspellings in comments.
hexagon: Fix misspellings in comments.
tools/perf: Fix misspellings in comments.
cris: Fix misspellings in comments.
c6x: Fix misspellings in comments.
blackfin: Fix misspelling of 'register' in comment.
avr32: Fix misspelling of 'definitions' in comment.
treewide: Fix typos in printk
Doc: treewide : Fix typos in DocBook/filesystem.xml
...
Linus Torvalds [Tue, 17 May 2016 23:26:30 +0000 (16:26 -0700)]
Merge git://git./linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
"Highlights:
1) Support SPI based w5100 devices, from Akinobu Mita.
2) Partial Segmentation Offload, from Alexander Duyck.
3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.
4) Allow cls_flower stats offload, from Amir Vadai.
5) Implement bpf blinding, from Daniel Borkmann.
6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
actually using FASYNC these atomics are superfluous. From Eric
Dumazet.
7) Run TCP more preemptibly, also from Eric Dumazet.
8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
driver, from Gal Pressman.
9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.
10) Improve BPF usage documentation, from Jesper Dangaard Brouer.
11) Support tunneling offloads in qed, from Manish Chopra.
12) aRFS offloading in mlx5e, from Maor Gottlieb.
13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
Leitner.
14) Add MSG_EOR support to TCP, this allows controlling packet
coalescing on application record boundaries for more accurate
socket timestamp sampling. From Martin KaFai Lau.
15) Fix alignment of 64-bit netlink attributes across the board, from
Nicolas Dichtel.
16) Per-vlan stats in bridging, from Nikolay Aleksandrov.
17) Several conversions of drivers to ethtool ksettings, from Philippe
Reynes.
18) Checksum neutral ILA in ipv6, from Tom Herbert.
19) Factorize all of the various marvell dsa drivers into one, from
Vivien Didelot
20) Add VF support to qed driver, from Yuval Mintz"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
Revert "phy dp83867: Make rgmii parameters optional"
r8169: default to 64-bit DMA on recent PCIe chips
phy dp83867: Make rgmii parameters optional
phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
bpf: arm64: remove callee-save registers use for tmp registers
asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
switchdev: pass pointer to fib_info instead of copy
net_sched: close another race condition in tcf_mirred_release()
tipc: fix nametable publication field in nl compat
drivers: net: Don't print unpopulated net_device name
qed: add support for dcbx.
ravb: Add missing free_irq() calls to ravb_close()
qed: Remove a stray tab
net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fec-mpc52xx: use phydev from struct net_device
bpf, doc: fix typo on bpf_asm descriptions
stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fs-enet: use phydev from struct net_device
...
Andreas Gruenbacher [Fri, 22 Apr 2016 20:36:44 +0000 (22:36 +0200)]
btrfs: Switch to generic xattr handlers
The btrfs_{set,remove}xattr inode operations check for a read-only root
(btrfs_root_readonly) before calling into generic_{set,remove}xattr. If
this check is moved into __btrfs_setxattr, we can get rid of
btrfs_{set,remove}xattr.
This patch applies to mainline, I would like to keep it together with
the other xattr cleanups if possible, though. Could you please review?
Thanks,
Andreas
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Andreas Gruenbacher [Fri, 22 Apr 2016 17:14:00 +0000 (19:14 +0200)]
ubifs: Switch to generic xattr handlers
Ubifs internally uses special inodes for storing xattrs. Those inodes
had NULL {get,set,remove}xattr inode operations before this change, so
xattr operations on them would fail. The super block's s_xattr field
would also apply to those special inodes. However, the inodes are not
visible outside of ubifs, and so no xattr operations will ever be
carried out on them anyway.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Tue, 17 May 2016 23:13:00 +0000 (16:13 -0700)]
Merge tag 'dm-4.7-changes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- based on Jens' 'for-4.7/core' to have DM thinp's discard support use
bio_inc_remaining() and the block core's new async __blkdev_issue_discard()
interface
- make DM multipath's fast code-paths lockless, using lockless_deference,
to significantly improve large NUMA performance when using blk-mq.
The m->lock spinlock contention was a serious bottleneck.
- a few other small code cleanups and Documentation fixes
* tag 'dm-4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: unroll issue_discard() to create longer discard bio chains
dm thin: use __blkdev_issue_discard for async discard support
dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining()
dm raid: make sure no feature flags are set in metadata
dm ioctl: drop use of __GFP_REPEAT in copy_params()'s __vmalloc() call
dm stats: fix spelling mistake in Documentation
dm cache: update cache-policies.txt now that mq is an alias for smq
dm mpath: eliminate use of spinlock in IO fast-paths
dm mpath: move trigger_event member to the end of 'struct multipath'
dm mpath: use atomic_t for counting members of 'struct multipath'
dm mpath: switch to using bitops for state flags
dm thin: Remove return statement from void function
dm: remove unused mapped_device argument from free_tio()
Linus Torvalds [Tue, 17 May 2016 23:03:32 +0000 (16:03 -0700)]
Merge branch 'for-4.7/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
"On top of the core pull request, this is the drivers pull request for
this merge window. This contains:
- Switch drivers to the new write back cache API, and kill off the
flush flags. From me.
- Kill the discard support for the STEC pci-e flash driver. It's
trivially broken, and apparently unmaintained, so it's safer to
just remove it. From Jeff Moyer.
- A set of lightnvm updates from the usual suspects (Matias/Javier,
and Simon), and fixes from Arnd, Jeff Mahoney, Sagi, and Wenwei
Tao.
- A set of updates for NVMe:
- Turn the controller state management into a proper state
machine. From Christoph.
- Shuffling of code in preparation for NVMe-over-fabrics, also
from Christoph.
- Cleanup of the command prep part from Ming Lin.
- Rewrite of the discard support from Ming Lin.
- Deadlock fix for namespace removal from Ming Lin.
- Use the now exported blk-mq tag helper for IO termination.
From Sagi.
- Various little fixes from Christoph, Guilherme, Keith, Ming
Lin, Wang Sheng-Hui.
- Convert mtip32xx to use the now exported blk-mq tag iter function,
from Keith"
* 'for-4.7/drivers' of git://git.kernel.dk/linux-block: (74 commits)
lightnvm: reserved space calculation incorrect
lightnvm: rename nr_pages to nr_ppas on nvm_rq
lightnvm: add is_cached entry to struct ppa_addr
lightnvm: expose gennvm_mark_blk to targets
lightnvm: remove mgt targets on mgt removal
lightnvm: pass dma address to hardware rather than pointer
lightnvm: do not assume sequential lun alloc.
nvme/lightnvm: Log using the ctrl named device
lightnvm: rename dma helper functions
lightnvm: enable metadata to be sent to device
lightnvm: do not free unused metadata on rrpc
lightnvm: fix out of bound ppa lun id on bb tbl
lightnvm: refactor set_bb_tbl for accepting ppa list
lightnvm: move responsibility for bad blk mgmt to target
lightnvm: make nvm_set_rqd_ppalist() aware of vblks
lightnvm: remove struct factory_blks
lightnvm: refactor device ops->get_bb_tbl()
lightnvm: introduce nvm_for_each_lun_ppa() macro
lightnvm: refactor dev->online_target to global nvm_targets
lightnvm: rename nvm_targets to nvm_tgt_type
...
Linus Torvalds [Tue, 17 May 2016 22:29:49 +0000 (15:29 -0700)]
Merge branch 'for-4.7/core' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
"This is the core block IO changes for this merge window. Nothing
earth shattering in here, it's mostly just fixes. In detail:
- Fix for a long standing issue where wrong ordering in blk-mq caused
order_to_size() to spew a warning. From Bart.
- Async discard support from Christoph. Basically just splitting our
sync interface into a submit + wait part.
- Add a cleaner interface for flagging whether a device has a write
back cache or not. We've previously overloaded blk_queue_flush()
with this, but let's make it more explicit. Drivers cleaned up and
updated in the drivers pull request. From me.
- Fix for a double check for whether IO accounting is enabled or not.
From Michael Callahan.
- Fix for the async discard from Mike Snitzer, reinstating the early
EOPNOTSUPP return if the device doesn't support discards.
- Also from Mike, export bio_inc_remaining() so dm can drop it's
private copy of it.
- From Ming Lin, add support for passing in an offset for request
payloads.
- Tag function export from Sagi, which will be used in NVMe in the
drivers pull.
- Two blktrace related fixes from Shaohua.
- Propagate NOMERGE flag when making a request from a bio, also from
Shaohua.
- An optimization to not parse cgroup paths in blk-throttle, if we
don't need to. From Shaohua"
* 'for-4.7/core' of git://git.kernel.dk/linux-block:
blk-mq: fix undefined behaviour in order_to_size()
blk-throttle: don't parse cgroup path if trace isn't enabled
blktrace: add missed mask name
blktrace: delete garbage for message trace
block: make bio_inc_remaining() interface accessible again
block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard
block: Minor blk_account_io_start usage cleanup
block: add __blkdev_issue_discard
block: remove struct bio_batch
block: copy NOMERGE flag from bio to request
block: add ability to flag write back caching on a device
blk-mq: Export tagset iter function
block: add offset in blk_add_request_payload()
writeback: Fix performance regression in wb_over_bg_thresh()
Linus Torvalds [Tue, 17 May 2016 22:05:23 +0000 (15:05 -0700)]
Merge branch 'work.preadv2' of git://git./linux/kernel/git/viro/vfs
Pull vfs cleanups from Al Viro:
"More cleanups from Christoph"
* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd: use RWF_SYNC
fs: add RWF_DSYNC aand RWF_SYNC
ceph: use generic_write_sync
fs: simplify the generic_write_sync prototype
fs: add IOCB_SYNC and IOCB_DSYNC
direct-io: remove the offset argument to dio_complete
direct-io: eliminate the offset argument to ->direct_IO
xfs: eliminate the pos variable in xfs_file_dio_aio_write
filemap: remove the pos argument to generic_file_direct_write
filemap: remove pos variables in generic_file_read_iter
Linus Torvalds [Tue, 17 May 2016 21:41:03 +0000 (14:41 -0700)]
Merge branch 'work.const-path' of git://git./linux/kernel/git/viro/vfs
Pull 'struct path' constification update from Al Viro:
"'struct path' is passed by reference to a bunch of Linux security
methods; in theory, there's nothing to stop them from modifying the
damn thing and LSM community being what it is, sooner or later some
enterprising soul is going to decide that it's a good idea.
Let's remove the temptation and constify all of those..."
* 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify ima_d_path()
constify security_sb_pivotroot()
constify security_path_chroot()
constify security_path_{link,rename}
apparmor: remove useless checks for NULL ->mnt
constify security_path_{mkdir,mknod,symlink}
constify security_path_{unlink,rmdir}
apparmor: constify common_perm_...()
apparmor: constify aa_path_link()
apparmor: new helper - common_path_perm()
constify chmod_common/security_path_chmod
constify security_sb_mount()
constify chown_common/security_path_chown
tomoyo: constify assorted struct path *
apparmor_path_truncate(): path->mnt is never NULL
constify vfs_truncate()
constify security_path_truncate()
[apparmor] constify struct path * in a bunch of helpers
Linus Torvalds [Tue, 17 May 2016 21:35:45 +0000 (14:35 -0700)]
Merge branch 'for-cifs' of git://git./linux/kernel/git/viro/vfs
Pull cifs xattr updates from Al Viro:
"This is the remaining parts of the xattr work - the cifs bits"
* 'for-cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cifs: Switch to generic xattr handlers
cifs: Fix removexattr for os2.* xattrs
cifs: Check for equality with ACL_TYPE_ACCESS and ACL_TYPE_DEFAULT
cifs: Fix xattr name checks
Linus Torvalds [Tue, 17 May 2016 21:25:02 +0000 (14:25 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs
Pull UDF fixes from Jan Kara:
"A fix for UDF crash on corrupted media and one UDF header fixup"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Export superblock magic to userspace
udf: Prevent stack overflow on corrupted filesystem mount
Linus Torvalds [Tue, 17 May 2016 21:15:18 +0000 (14:15 -0700)]
Merge tag 'jfs-4.7' of git://github.com/kleikamp/linux-shaggy
Pull jfs updates from Dave Kleikamp:
"Some jfs logging cleanups from Joe Perches"
* tag 'jfs-4.7' of git://github.com/kleikamp/linux-shaggy:
jfs: Coalesce some formats
jfs: Remove unnecessary line continuations and terminating newlines
jfs: Remove terminating newlines from jfs_info, jfs_warn, jfs_err uses
Kees Cook [Tue, 17 May 2016 19:14:39 +0000 (12:14 -0700)]
exec: clarify reasoning for euid/egid reset
This section of code initially looks redundant, but is required. This
improves the comment to explain more clearly why the reset is needed.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steve French [Fri, 13 May 2016 02:20:36 +0000 (21:20 -0500)]
remove directory incorrectly tries to set delete on close on non-empty directories
Wrong return code was being returned on SMB3 rmdir of
non-empty directory.
For SMB3 (unlike for cifs), we attempt to delete a directory by
set of delete on close flag on the open. Windows clients set
this flag via a set info (SET_FILE_DISPOSITION to set this flag)
which properly checks if the directory is empty.
With this patch on smb3 mounts we correctly return
"DIRECTORY NOT EMPTY"
on attempts to remove a non-empty directory.
Signed-off-by: Steve French <steve.french@primarydata.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Steve French [Tue, 10 May 2016 13:24:05 +0000 (08:24 -0500)]
Update cifs.ko version to 2.09
Signed-off-by: Steven French <steve.french@primarydata.com>
Stefan Metzmacher [Tue, 3 May 2016 08:52:30 +0000 (10:52 +0200)]
fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication
Only server which map unknown users to guest will allow
access using a non-null NTLMv2_Response.
For Samba it's the "map to guest = bad user" option.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
Signed-off-by: Stefan Metzmacher <metze@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Stefan Metzmacher [Tue, 3 May 2016 08:52:30 +0000 (10:52 +0200)]
fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication
Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.
For Samba it's the "map to guest = bad user" option.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
Signed-off-by: Stefan Metzmacher <metze@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Stefan Metzmacher [Tue, 3 May 2016 08:52:30 +0000 (10:52 +0200)]
fs/cifs: correctly to anonymous authentication for the LANMAN authentication
Only server which map unknown users to guest will allow
access using a non-null LMChallengeResponse.
For Samba it's the "map to guest = bad user" option.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
Signed-off-by: Stefan Metzmacher <metze@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Stefan Metzmacher [Tue, 3 May 2016 08:52:30 +0000 (10:52 +0200)]
fs/cifs: correctly to anonymous authentication via NTLMSSP
See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client:
...
Set NullSession to FALSE
If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND
AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND
(AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1)
OR
AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0))
-- Special case: client requested anonymous authentication
Set NullSession to TRUE
...
Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.
For Samba it's the "map to guest = bad user" option.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Sachin Prabhu [Mon, 8 Feb 2016 08:14:01 +0000 (13:44 +0530)]
cifs: remove any preceding delimiter from prefix_path
We currently do not check if any delimiter exists before the prefix
path in cifs_compose_mount_options(). Consequently when building the
devname using cifs_build_devname() we can end up with multiple
delimiters separating the UNC and the prefix path.
An issue was reported by the customer mounting a folder within a DFS
share from a Netapp server which uses McAfee antivirus. We have
narrowed down the cause to the use of double backslashes in the file
name used to open the file. This was determined to be caused because of
additional delimiters as a result of the bug.
In addition to changes in cifs_build_devname(), we also fix
cifs_parse_devname() to ignore any preceding delimiter for the prefix
path.
The problem was originally reported on RHEL 6 in RHEL bz
1252721. This
is the upstream version of the fix. The fix was confirmed by looking at
the packet capture of a DFS mount.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Goldwyn Rodrigues [Mon, 18 Apr 2016 11:41:52 +0000 (06:41 -0500)]
cifs: Use file_dentry()
CIFS may be used as lower layer of overlayfs and accessing f_path.dentry can
lead to a crash.
Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>