Jeff Mahoney [Tue, 7 Feb 2017 00:39:09 +0000 (19:39 -0500)]
btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
Commit
4c63c2454ef incorrectly assumed that returning -ENOIOCTLCMD would
cause the native ioctl to be called. The ->compat_ioctl callback is
expected to handle all ioctls, not just compat variants. As a result,
when using 32-bit userspace on 64-bit kernels, everything except those
three ioctls would return -ENOTTY.
Fixes: 4c63c2454ef ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Omar Sandoval [Thu, 26 Jan 2017 01:06:40 +0000 (17:06 -0800)]
Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operations
Subvolume directory inodes can't have ACLs.
Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Omar Sandoval [Thu, 26 Jan 2017 01:06:39 +0000 (17:06 -0800)]
Btrfs: disable xattr operations on subvolume directories
When you snapshot a subvolume containing a subvolume, you get a
placeholder directory where the subvolume would be. These directory
inodes have ->i_ops set to btrfs_dir_ro_inode_operations. Previously,
these i_ops didn't include the xattr operation callbacks. The conversion
to xattr_handlers missed this case, leading to bogus attempts to set
xattrs on these inodes. This manifested itself as failures when running
delayed inodes.
To fix this, clear IOP_XATTR in ->i_opflags on these inodes.
Fixes: 6c6ef9f26e59 ("xattr: Stop calling {get,set,remove}xattr inode operations")
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Reported-by: Chris Murphy <lists@colorremedies.com>
Tested-by: Chris Murphy <lists@colorremedies.com>
Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Omar Sandoval [Thu, 26 Jan 2017 01:06:38 +0000 (17:06 -0800)]
Btrfs: remove old tree_root case in btrfs_read_locked_inode()
As Jeff explained in
c2951f32d36c ("btrfs: remove old tree_root dirent
processing in btrfs_real_readdir()"), supporting this old format is no
longer necessary since the Btrfs magic number has been updated since we
changed to the current format. There are other places where we still
handle this old format, but since this is part of a fix that is going to
stable, I'm only removing this one for now.
Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Liu Bo [Thu, 1 Dec 2016 21:43:31 +0000 (13:43 -0800)]
Btrfs: fix truncate down when no_holes feature is enabled
For such a file mapping,
[0-4k][hole][8k-12k]
In NO_HOLES mode, we don't have the [hole] extent any more.
Commit
c1aa45759e90 ("Btrfs: fix shrinking truncate when the no_holes feature is enabled")
fixed disk isize not being updated in NO_HOLES mode when data is not flushed.
However, even if data has been flushed, we can still have trouble
in updating disk isize since we updated disk isize to 'start' of
the last evicted extent.
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chandan Rajendra [Fri, 23 Dec 2016 09:30:18 +0000 (15:00 +0530)]
Btrfs: Fix deadlock between direct IO and fast fsync
The following deadlock is seen when executing generic/113 test,
---------------------------------------------------------+----------------------------------------------------
Direct I/O task Fast fsync task
---------------------------------------------------------+----------------------------------------------------
btrfs_direct_IO
__blockdev_direct_IO
do_blockdev_direct_IO
do_direct_IO
btrfs_get_blocks_direct
while (blocks needs to written)
get_more_blocks (first iteration)
btrfs_get_blocks_direct
btrfs_create_dio_extent
down_read(&BTRFS_I(inode) >dio_sem)
Create and add extent map and ordered extent
up_read(&BTRFS_I(inode) >dio_sem)
btrfs_sync_file
btrfs_log_dentry_safe
btrfs_log_inode_parent
btrfs_log_inode
btrfs_log_changed_extents
down_write(&BTRFS_I(inode) >dio_sem)
Collect new extent maps and ordered extents
wait for ordered extent completion
get_more_blocks (second iteration)
btrfs_get_blocks_direct
btrfs_create_dio_extent
down_read(&BTRFS_I(inode) >dio_sem)
--------------------------------------------------------------------------------------------------------------
In the above description, Btrfs direct I/O code path has not yet started
submitting bios for file range covered by the initial ordered
extent. Meanwhile, The fast fsync task obtains the write semaphore and
waits for I/O on the ordered extent to get completed. However, the
Direct I/O task is now blocked on obtaining the read semaphore.
To resolve the deadlock, this commit modifies the Direct I/O code path
to obtain the read semaphore before invoking
__blockdev_direct_IO(). The semaphore is then given up after
__blockdev_direct_IO() returns. This allows the Direct I/O code to
complete I/O on all the ordered extents it creates.
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Wang Xiaoguang [Wed, 7 Sep 2016 12:17:38 +0000 (20:17 +0800)]
btrfs: fix false enospc error when truncating heavily reflinked file
Below test script can reveal this bug:
dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100
dev=$(losetup --show -f fs.img)
mkdir -p /mnt/mntpoint
mkfs.btrfs -f $dev
mount $dev /mnt/mntpoint
cd /mnt/mntpoint
echo "workdir is: /mnt/mntpoint"
blocksize=$((128 * 1024))
dd if=/dev/zero of=testfile bs=$blocksize count=1
sync
count=$((17*1024*1024*1024/blocksize))
echo "file size is:" $((count*blocksize))
for ((i = 1; i <= $count; i++)); do
dst_offset=$((blocksize * i))
xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\
testfile > /dev/null
done
sync
truncate --size 0 testfile
The last truncate operation will fail for ENOSPC reason, but indeed
it should not fail.
In btrfs_truncate(), we use a temporary block_rsv to do truncate
operation. With every btrfs_truncate_inode_items() call, we migrate space
to this block_rsv, but forget to cleanup previous reservation, which
will make this block_rsv's reserved bytes keep growing, and this reserved
space will only be released in the end of btrfs_truncate(), this metadata
leak will impact other's metadata reservation. In this case, it's
"btrfs_start_transaction(root, 2);" fails for enospc error, which make
this truncate operation fail.
Call btrfs_block_rsv_release() to fix this bug.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Chris Mason [Wed, 11 Jan 2017 14:26:12 +0000 (06:26 -0800)]
Merge branch 'tracepoint-updates-4.10' of git://git./linux/kernel/git/kdave/linux into for-linus-4.10
David Sterba [Fri, 6 Jan 2017 14:51:36 +0000 (15:51 +0100)]
btrfs: make tracepoint format strings more compact
We've recently added the fsid to trace events, this makes the line quite
long. To reduce the it again, remove extra spaces around = and remove
",".
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Thu, 1 Dec 2016 00:10:10 +0000 (16:10 -0800)]
Btrfs: add truncated_len for ordered extent tracepoints
This can help us monitor truncated ordered extents.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Thu, 17 Nov 2016 23:00:50 +0000 (15:00 -0800)]
Btrfs: add 'inode' for extent map tracepoint
'inode' is an important field for btrfs_get_extent, lets trace it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 6 Jan 2017 13:12:51 +0000 (14:12 +0100)]
btrfs: fix crash when tracepoint arguments are freed by wq callbacks
Enabling btrfs tracepoints leads to instant crash, as reported. The wq
callbacks could free the memory and the tracepoints started to
dereference the members to get to fs_info.
The proposed fix https://marc.info/?l=linux-btrfs&m=
148172436722606&w=2
removed the tracepoints but we could preserve them by passing only the
required data in a safe way.
Fixes: bc074524e123 ("btrfs: prefix fsid to all trace events")
CC: stable@vger.kernel.org # 4.8+
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 4 Jan 2017 10:42:16 +0000 (11:42 +0100)]
Merge branch 'misc-4.10' into for-chris-4.10-
20170104
Liu Bo [Fri, 23 Dec 2016 01:13:54 +0000 (17:13 -0800)]
Btrfs: adjust outstanding_extents counter properly when dio write is split
Currently how btrfs dio deals with split dio write is not good
enough if dio write is split into several segments due to the
lack of contiguous space, a large dio write like 'dd bs=1G count=1'
can end up with incorrect outstanding_extents counter and endio
would complain loudly with an assertion.
This fixes the problem by compensating the outstanding_extents
counter in inode if a large dio write gets split.
Reported-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Thu, 1 Dec 2016 00:20:25 +0000 (16:20 -0800)]
Btrfs: fix lockdep warning about log_mutex
While checking INODE_REF/INODE_EXTREF for a corner case, we may acquire a
different inode's log_mutex with holding the current inode's log_mutex, and
lockdep has complained this with a possilble deadlock warning.
Fix this by using mutex_lock_nested() when processing the other inode's
log_mutex.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Thu, 1 Dec 2016 00:11:04 +0000 (16:11 -0800)]
Btrfs: use down_read_nested to make lockdep silent
If @block_group is not @used_bg, it'll try to get @used_bg's lock without
droping @block_group 's lock and lockdep has throwed a scary deadlock warning
about it.
Fix it by using down_read_nested.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Tue, 20 Dec 2016 18:28:28 +0000 (13:28 -0500)]
btrfs: fix locking when we put back a delayed ref that's too new
In __btrfs_run_delayed_refs, when we put back a delayed ref that's too
new, we have already dropped the lock on locked_ref when we set
->processing = 0.
This patch keeps the lock to cover that assignment.
Fixes: d7df2c796d7 (Btrfs: attach delayed ref updates to delayed ref heads)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Tue, 20 Dec 2016 18:28:27 +0000 (13:28 -0500)]
btrfs: fix error handling when run_delayed_extent_op fails
In __btrfs_run_delayed_refs, the error path when run_delayed_extent_op
fails sets locked_ref->processing = 0 but doesn't re-increment
delayed_refs->num_heads_ready. As a result, we end up triggering
the WARN_ON in btrfs_select_ref_head.
Fixes: d7df2c796d7 (Btrfs: attach delayed ref updates to delayed ref heads)
Reported-by: Jon Nelson <jnelson-suse@jamponi.net>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pan Bian [Sun, 4 Dec 2016 04:51:53 +0000 (12:51 +0800)]
btrfs: return the actual error value from from btrfs_uuid_tree_iterate
In function btrfs_uuid_tree_iterate(), errno is assigned to variable ret
on errors. However, it directly returns 0. It may be better to return
ret. This patch also removes the warning, because the caller already
prints a warning.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188731
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
[ edited subject ]
Signed-off-by: David Sterba <dsterba@suse.com>
Maxim Patlasov [Mon, 12 Dec 2016 22:32:44 +0000 (14:32 -0800)]
btrfs: limit async_work allocation and worker func duration
Problem statement: unprivileged user who has read-write access to more than
one btrfs subvolume may easily consume all kernel memory (eventually
triggering oom-killer).
Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):
[root@kteam1 ~]# cat prep.sh
DEV=/dev/sdb
mkfs.btrfs -f $DEV
mount $DEV /mnt
for i in `seq 1 16`
do
mkdir /mnt/$i
btrfs subvolume create /mnt/SV_$i
ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
chmod a+rwx /mnt/$i
done
[root@kteam1 ~]# sh prep.sh
[maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done
[root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
kmalloc-128 10144 10144 128 32 1 : tunables 0 0 0 : slabdata 317 317 0
kmalloc-128
9992352 9992352 128 32 1 : tunables 0 0 0 : slabdata 312261 312261 0
kmalloc-128
24226752 24226752 128 32 1 : tunables 0 0 0 : slabdata 757086 757086 0
kmalloc-128
42754240 42754240 128 32 1 : tunables 0 0 0 : slabdata
1336070 1336070 0
The huge numbers above come from insane number of async_work-s allocated
and queued by btrfs_wq_run_delayed_node.
The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
worker func (btrfs_async_run_delayed_root) processes at least
BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
works as expected while the list is almost empty. As soon as it is getting
bigger, worker func starts to process more than one item at a time, it takes
longer, and the chances to have async_works queued more than needed is getting
higher.
The problem above is worsened by another flaw of delayed-inode implementation:
if async_work was queued in a throttling branch (number of items >=
BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
the func occupies CPU infinitely (up to 30sec in my experiments): while the
func is trying to drain the list, the user activity may add more and more
items to the list.
The patch fixes both problems in straightforward way: refuse queuing too
many works in btrfs_wq_run_delayed_node and bail out of worker func if
at least BTRFS_DELAYED_WRITEBACK items are processed.
Changed in v2: remove support of thresh == NO_THRESHOLD.
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Chris Mason <clm@fb.com>
Cc: stable@vger.kernel.org # v3.15+
Chris Mason [Tue, 13 Dec 2016 17:14:42 +0000 (09:14 -0800)]
Merge branch 'for-chris-4.10' of git://git./linux/kernel/git/fdmanana/linux into for-linus-4.10
Patches queued up by Filipe:
The most important change is still the fix for the extent tree
corruption that happens due to balance when qgroups are enabled (a
regression introduced in 4.7 by a fix for a regression from the last
qgroups rework). This has been hitting SLE and openSUSE users and QA
very badly, where transactions keep getting aborted when running
delayed references leaving the root filesystem in RO mode and nearly
unusable. There are fixes here that allow us to run xfstests again
with the integrity checker enabled, which has been impossible since 4.8
(apparently I'm the only one running xfstests with the integrity
checker enabled, which is useful to validate dirtied leafs, like
checking if there are keys out of order, etc). The rest are just some
trivial fixes, most of them tagged for stable, and two cleanups.
Signed-off-by: Chris Mason <clm@fb.com>
Chris Mason [Fri, 9 Dec 2016 01:55:03 +0000 (17:55 -0800)]
Revert "Btrfs: adjust len of writes if following a preallocated extent"
This is exposing an existing deadlock between fsync and AIO. Until we
have the deadlock fixed, I'm pulling this one out.
This reverts commit
a23eaa875f0f1d89eb866b8c9860e78273ff5daf.
Signed-off-by: Chris Mason <clm@fb.com>
Chris Mason [Fri, 9 Dec 2016 13:56:33 +0000 (05:56 -0800)]
Btrfs: don't WARN() in btrfs_transaction_abort() for IO errors
btrfs_transaction_abort() has a WARN() to help us nail down whatever
problem lead to the abort. But most of the time, we're aborting for EIO,
and the warning just adds noise.
Signed-off-by: Chris Mason <clm@fb.com>
David Sterba [Tue, 4 Oct 2016 17:34:27 +0000 (19:34 +0200)]
btrfs: opencode chunk locking, remove helpers
The helpers are trivial and we don't use them consistently.
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Sat, 10 Sep 2016 01:39:03 +0000 (21:39 -0400)]
btrfs: remove root parameter from transaction commit/end routines
Now we only use the root parameter to print the root objectid in
a tracepoint. We can use the root parameter from the transaction
handle for that. It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Sat, 10 Sep 2016 00:42:44 +0000 (20:42 -0400)]
btrfs: split btrfs_wait_marked_extents into normal and tree log functions
btrfs_write_and_wait_marked_extents and btrfs_sync_log both call
btrfs_wait_marked_extents, which provides a core loop and then handles
errors differently based on whether it's it's a log root or not.
This means that btrfs_write_and_wait_marked_extents needs to take a root
because btrfs_wait_marked_extents requires one, even though it's only
used to determine whether the root is a log root. The log root code
won't ever call into the transaction commit code using a log root, so we
can factor out the core loop and provide the error handling appropriate
to each waiter in new routines. This allows us to eventually remove
the root argument from btrfs_commit_transaction, and as a result,
btrfs_end_transaction.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 22:54:24 +0000 (18:54 -0400)]
btrfs: take an fs_info directly when the root is not used otherwise
There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer. Let's convert those to
just accept an fs_info pointer directly.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Fri, 9 Sep 2016 16:09:35 +0000 (12:09 -0400)]
btrfs: simplify btrfs_wait_cache_io prototype
With the exception of the one case where btrfs_wait_cache_io is called
without a block group, it's called with the same arguments. The root
argument is only used in the special case, so let's factor out the core
and simplify the call in the normal case to require a trans, block group,
and path.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Tue, 6 Sep 2016 20:00:42 +0000 (16:00 -0400)]
btrfs: convert extent-tree tracepoints to use fs_info
The extent-tree tracepoints all operate on the extent root, regardless of
which root is passed in. Let's just use the extent root objectid instead.
If it turns out that nobody is depending on the format of this tracepoint,
we can drop the root printing entirely.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 22:54:23 +0000 (18:54 -0400)]
btrfs: root->fs_info cleanup, access fs_info->delayed_root directly
This results in btrfs_assert_delayed_root_empty and
btrfs_destroy_delayed_inode taking an fs_info instead of a root.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 22:54:23 +0000 (18:54 -0400)]
btrfs: root->fs_info cleanup, add fs_info convenience variables
In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable. This makes the code considerably
more readable.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 22:54:22 +0000 (18:54 -0400)]
btrfs: root->fs_info cleanup, update_block_group{,flags}
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Thu, 16 Jun 2016 15:30:29 +0000 (11:30 -0400)]
btrfs: root->fs_info cleanup, lock/unlock_chunks
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Thu, 16 Jun 2016 15:07:27 +0000 (11:07 -0400)]
btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 15 Jun 2016 13:22:56 +0000 (09:22 -0400)]
btrfs: pull node/sector/stripe sizes out of root and into fs_info
We track the node sizes per-root, but they never vary from the values
in the superblock. This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 22:56:18 +0000 (18:56 -0400)]
btrfs: root->fs_info cleanup, io_ctl_init
The io_ctl->root member was only being used to access root->fs_info.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 22:54:56 +0000 (18:54 -0400)]
btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 22:56:44 +0000 (18:56 -0400)]
btrfs: struct reada_control.root -> reada_control.fs_info
The root is never used. We substitute extent_root in for the
reada_find_extent call, since it's only ever used to obtain the node
size. This call site will be changed to use fs_info in a later patch.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 22:54:36 +0000 (18:54 -0400)]
btrfs: struct btrfsic_state->root should be an fs_info
The root member is never used except for obtaining an fs_info pointer.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 22:54:27 +0000 (18:54 -0400)]
btrfs: alloc_reserved_file_extent trace point should use extent_root
Even though a separate root is passed in, we're still operating on the
extent root. Let's use that for the trace point.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 00:16:08 +0000 (20:16 -0400)]
btrfs: btrfs_init_new_device should use fs_info->dev_root
btrfs_init_new_device only uses the root passed in via the ioctl to
start the transaction. Nothing else that happens is related to whatever
root the user used to initiate the ioctl. We can drop the root requirement
and just use fs_info->dev_root instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Wed, 22 Jun 2016 01:16:51 +0000 (21:16 -0400)]
btrfs: call functions that always use the same root with fs_info instead
There are many functions that are always called with the same root
argument. Rather than passing the same root every time, we can
pass an fs_info pointer instead and have the function get the root
pointer itself.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Tue, 21 Jun 2016 14:40:19 +0000 (10:40 -0400)]
btrfs: call functions that overwrite their root parameter with fs_info
There are 11 functions that accept a root parameter and immediately
overwrite it. We can pass those an fs_info pointer instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Robbie Ko [Fri, 7 Oct 2016 09:30:47 +0000 (17:30 +0800)]
Btrfs: fix tree search logic when replaying directory entry deletes
If a log tree has a layout like the following:
leaf N:
...
item 240 key (282 DIR_LOG_ITEM 0) itemoff 8189 itemsize 8
dir log end
1275809046
leaf N + 1:
item 0 key (282 DIR_LOG_ITEM
3936149215) itemoff 16275 itemsize 8
dir log end
18446744073709551615
...
When we pass the value
1275809046 + 1 as the parameter start_ret to the
function tree-log.c:find_dir_range() (done by replay_dir_deletes()), we
end up with path->slots[0] having the value 239 (points to the last item
of leaf N, item 240). Because the dir log item in that position has an
offset value smaller than *start_ret (
1275809046 + 1) we need to move on
to the next leaf, however the logic for that is wrong since it compares
the current slot to the number of items in the leaf, which is smaller
and therefore we don't lookup for the next leaf but instead we set the
slot to point to an item that does not exist, at slot 240, and we later
operate on that slot which has unexpected content or in the worst case
can result in an invalid memory access (accessing beyond the last page
of leaf N's extent buffer).
So fix the logic that checks when we need to lookup at the next leaf
by first incrementing the slot and only after to check if that slot
is beyond the last item of the current leaf.
Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Fixes: e02119d5a7b4 (Btrfs: Add a write ahead tree log to optimize synchronous operations)
Cc: stable@vger.kernel.org # 2.6.29+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[Modified changelog for clarity and correctness]
Robbie Ko [Fri, 28 Oct 2016 02:48:26 +0000 (10:48 +0800)]
Btrfs: fix deadlock caused by fsync when logging directory entries
While logging new directory entries, at tree-log.c:log_new_dir_dentries(),
after we call btrfs_search_forward() we get a leaf with a read lock on it,
and without unlocking that leaf we can end up calling btrfs_iget() to get
an inode pointer. The later (btrfs_iget()) can end up doing a read-only
search on the same tree again, if the inode is not in memory already, which
ends up causing a deadlock if some other task in the meanwhile started a
write search on the tree and is attempting to write lock the same leaf
that btrfs_search_forward() locked while holding write locks on upper
levels of the tree blocking the read search from btrfs_iget(). In this
scenario we get a deadlock.
So fix this by releasing the search path before calling btrfs_iget() at
tree-log.c:log_new_dir_dentries().
Example trace of such deadlock:
[ 4077.478852] kworker/u24:10 D
ffff88107fc90640 0 14431 2 0x00000000
[ 4077.486752] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 4077.494346]
ffff880ffa56bad0 0000000000000046 0000000000009000 ffff880ffa56bfd8
[ 4077.502629]
ffff880ffa56bfd8 ffff881016ce21c0 ffffffffa06ecb26 ffff88101a5d6138
[ 4077.510915]
ffff880ebb5173b0 ffff880ffa56baf8 ffff880ebb517410 ffff881016ce21c0
[ 4077.519202] Call Trace:
[ 4077.528752] [<
ffffffffa06ed5ed>] ? btrfs_tree_lock+0xdd/0x2f0 [btrfs]
[ 4077.536049] [<
ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4077.542574] [<
ffffffffa068cc1f>] ? btrfs_search_slot+0x79f/0xb10 [btrfs]
[ 4077.550171] [<
ffffffffa06a5073>] ? btrfs_lookup_file_extent+0x33/0x40 [btrfs]
[ 4077.558252] [<
ffffffffa06c600b>] ? __btrfs_drop_extents+0x13b/0xdf0 [btrfs]
[ 4077.566140] [<
ffffffffa06fc9e2>] ? add_delayed_data_ref+0xe2/0x150 [btrfs]
[ 4077.573928] [<
ffffffffa06fd629>] ? btrfs_add_delayed_data_ref+0x149/0x1d0 [btrfs]
[ 4077.582399] [<
ffffffffa06cf3c0>] ? __set_extent_bit+0x4c0/0x5c0 [btrfs]
[ 4077.589896] [<
ffffffffa06b4a64>] ? insert_reserved_file_extent.constprop.75+0xa4/0x320 [btrfs]
[ 4077.599632] [<
ffffffffa06b206d>] ? start_transaction+0x8d/0x470 [btrfs]
[ 4077.607134] [<
ffffffffa06bab57>] ? btrfs_finish_ordered_io+0x2e7/0x600 [btrfs]
[ 4077.615329] [<
ffffffff8104cbc2>] ? process_one_work+0x142/0x3d0
[ 4077.622043] [<
ffffffff8104d729>] ? worker_thread+0x109/0x3b0
[ 4077.628459] [<
ffffffff8104d620>] ? manage_workers.isra.26+0x270/0x270
[ 4077.635759] [<
ffffffff81052b0f>] ? kthread+0xaf/0xc0
[ 4077.641404] [<
ffffffff81052a60>] ? kthread_create_on_node+0x110/0x110
[ 4077.648696] [<
ffffffff814a9ac8>] ? ret_from_fork+0x58/0x90
[ 4077.654926] [<
ffffffff81052a60>] ? kthread_create_on_node+0x110/0x110
[ 4078.358087] kworker/u24:15 D
ffff88107fcd0640 0 14436 2 0x00000000
[ 4078.365981] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 4078.373574]
ffff880ffa57fad0 0000000000000046 0000000000009000 ffff880ffa57ffd8
[ 4078.381864]
ffff880ffa57ffd8 ffff88103004d0a0 ffffffffa06ecb26 ffff88101a5d6138
[ 4078.390163]
ffff880fbeffc298 ffff880ffa57faf8 ffff880fbeffc2f8 ffff88103004d0a0
[ 4078.398466] Call Trace:
[ 4078.408019] [<
ffffffffa06ed5ed>] ? btrfs_tree_lock+0xdd/0x2f0 [btrfs]
[ 4078.415322] [<
ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4078.421844] [<
ffffffffa068cc1f>] ? btrfs_search_slot+0x79f/0xb10 [btrfs]
[ 4078.429438] [<
ffffffffa06a5073>] ? btrfs_lookup_file_extent+0x33/0x40 [btrfs]
[ 4078.437518] [<
ffffffffa06c600b>] ? __btrfs_drop_extents+0x13b/0xdf0 [btrfs]
[ 4078.445404] [<
ffffffffa06fc9e2>] ? add_delayed_data_ref+0xe2/0x150 [btrfs]
[ 4078.453194] [<
ffffffffa06fd629>] ? btrfs_add_delayed_data_ref+0x149/0x1d0 [btrfs]
[ 4078.461663] [<
ffffffffa06cf3c0>] ? __set_extent_bit+0x4c0/0x5c0 [btrfs]
[ 4078.469161] [<
ffffffffa06b4a64>] ? insert_reserved_file_extent.constprop.75+0xa4/0x320 [btrfs]
[ 4078.478893] [<
ffffffffa06b206d>] ? start_transaction+0x8d/0x470 [btrfs]
[ 4078.486388] [<
ffffffffa06bab57>] ? btrfs_finish_ordered_io+0x2e7/0x600 [btrfs]
[ 4078.494561] [<
ffffffff8104cbc2>] ? process_one_work+0x142/0x3d0
[ 4078.501278] [<
ffffffff8104a507>] ? pwq_activate_delayed_work+0x27/0x40
[ 4078.508673] [<
ffffffff8104d729>] ? worker_thread+0x109/0x3b0
[ 4078.515098] [<
ffffffff8104d620>] ? manage_workers.isra.26+0x270/0x270
[ 4078.522396] [<
ffffffff81052b0f>] ? kthread+0xaf/0xc0
[ 4078.528032] [<
ffffffff81052a60>] ? kthread_create_on_node+0x110/0x110
[ 4078.535325] [<
ffffffff814a9ac8>] ? ret_from_fork+0x58/0x90
[ 4078.541552] [<
ffffffff81052a60>] ? kthread_create_on_node+0x110/0x110
[ 4079.355824] user-space-program D
ffff88107fd30640 0 32020 1 0x00000000
[ 4079.363716]
ffff880eae8eba10 0000000000000086 0000000000009000 ffff880eae8ebfd8
[ 4079.372003]
ffff880eae8ebfd8 ffff881016c162c0 ffffffffa06ecb26 ffff88101a5d6138
[ 4079.380294]
ffff880fbed4b4c8 ffff880eae8eba38 ffff880fbed4b528 ffff881016c162c0
[ 4079.388586] Call Trace:
[ 4079.398134] [<
ffffffffa06ed595>] ? btrfs_tree_lock+0x85/0x2f0 [btrfs]
[ 4079.405431] [<
ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4079.411955] [<
ffffffffa06876fb>] ? btrfs_lock_root_node+0x2b/0x40 [btrfs]
[ 4079.419644] [<
ffffffffa068ce83>] ? btrfs_search_slot+0xa03/0xb10 [btrfs]
[ 4079.427237] [<
ffffffffa06aba52>] ? btrfs_buffer_uptodate+0x52/0x70 [btrfs]
[ 4079.435041] [<
ffffffffa0689b60>] ? generic_bin_search.constprop.38+0x80/0x190 [btrfs]
[ 4079.443897] [<
ffffffffa068ea44>] ? btrfs_insert_empty_items+0x74/0xd0 [btrfs]
[ 4079.451975] [<
ffffffffa072c443>] ? copy_items+0x128/0x850 [btrfs]
[ 4079.458890] [<
ffffffffa072da10>] ? btrfs_log_inode+0x629/0xbf3 [btrfs]
[ 4079.466292] [<
ffffffffa06f34a1>] ? btrfs_log_inode_parent+0xc61/0xf30 [btrfs]
[ 4079.474373] [<
ffffffffa06f45a9>] ? btrfs_log_dentry_safe+0x59/0x80 [btrfs]
[ 4079.482161] [<
ffffffffa06c298d>] ? btrfs_sync_file+0x20d/0x330 [btrfs]
[ 4079.489558] [<
ffffffff8112777c>] ? do_fsync+0x4c/0x80
[ 4079.495300] [<
ffffffff81127a0a>] ? SyS_fdatasync+0xa/0x10
[ 4079.501422] [<
ffffffff814a9b72>] ? system_call_fastpath+0x16/0x1b
[ 4079.508334] user-space-program D
ffff88107fc30640 0 32021 1 0x00000004
[ 4079.516226]
ffff880eae8efbf8 0000000000000086 0000000000009000 ffff880eae8effd8
[ 4079.524513]
ffff880eae8effd8 ffff881030279610 ffffffffa06ecb26 ffff88101a5d6138
[ 4079.532802]
ffff880ebb671d88 ffff880eae8efc20 ffff880ebb671de8 ffff881030279610
[ 4079.541092] Call Trace:
[ 4079.550642] [<
ffffffffa06ed595>] ? btrfs_tree_lock+0x85/0x2f0 [btrfs]
[ 4079.557941] [<
ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4079.564463] [<
ffffffffa068cc1f>] ? btrfs_search_slot+0x79f/0xb10 [btrfs]
[ 4079.572058] [<
ffffffffa06bb7d8>] ? btrfs_truncate_inode_items+0x168/0xb90 [btrfs]
[ 4079.580526] [<
ffffffffa06b04be>] ? join_transaction.isra.15+0x1e/0x3a0 [btrfs]
[ 4079.588701] [<
ffffffffa06b206d>] ? start_transaction+0x8d/0x470 [btrfs]
[ 4079.596196] [<
ffffffffa0690ac6>] ? block_rsv_add_bytes+0x16/0x50 [btrfs]
[ 4079.603789] [<
ffffffffa06bc2e9>] ? btrfs_truncate+0xe9/0x2e0 [btrfs]
[ 4079.610994] [<
ffffffffa06bd00b>] ? btrfs_setattr+0x30b/0x410 [btrfs]
[ 4079.618197] [<
ffffffff81117c1c>] ? notify_change+0x1dc/0x680
[ 4079.624625] [<
ffffffff8123c8a4>] ? aa_path_perm+0xd4/0x160
[ 4079.630854] [<
ffffffff810f4fcb>] ? do_truncate+0x5b/0x90
[ 4079.636889] [<
ffffffff810f59fa>] ? do_sys_ftruncate.constprop.15+0x10a/0x160
[ 4079.644869] [<
ffffffff8110d87b>] ? SyS_fcntl+0x5b/0x570
[ 4079.650805] [<
ffffffff814a9b72>] ? system_call_fastpath+0x16/0x1b
[ 4080.410607] user-space-program D
ffff88107fc70640 0 32028 12639 0x00000004
[ 4080.418489]
ffff880eaeccbbe0 0000000000000086 0000000000009000 ffff880eaeccbfd8
[ 4080.426778]
ffff880eaeccbfd8 ffff880f317ef1e0 ffffffffa06ecb26 ffff88101a5d6138
[ 4080.435067]
ffff880ef7e93928 ffff880f317ef1e0 ffff880eaeccbc08 ffff880f317ef1e0
[ 4080.443353] Call Trace:
[ 4080.452920] [<
ffffffffa06ed15d>] ? btrfs_tree_read_lock+0xdd/0x190 [btrfs]
[ 4080.460703] [<
ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4080.467225] [<
ffffffffa06876bb>] ? btrfs_read_lock_root_node+0x2b/0x40 [btrfs]
[ 4080.475400] [<
ffffffffa068cc81>] ? btrfs_search_slot+0x801/0xb10 [btrfs]
[ 4080.482994] [<
ffffffffa06b2df0>] ? btrfs_clean_one_deleted_snapshot+0xe0/0xe0 [btrfs]
[ 4080.491857] [<
ffffffffa06a70a6>] ? btrfs_lookup_inode+0x26/0x90 [btrfs]
[ 4080.499353] [<
ffffffff810ec42f>] ? kmem_cache_alloc+0xaf/0xc0
[ 4080.505879] [<
ffffffffa06bd905>] ? btrfs_iget+0xd5/0x5d0 [btrfs]
[ 4080.512696] [<
ffffffffa06caf04>] ? btrfs_get_token_64+0x104/0x120 [btrfs]
[ 4080.520387] [<
ffffffffa06f341f>] ? btrfs_log_inode_parent+0xbdf/0xf30 [btrfs]
[ 4080.528469] [<
ffffffffa06f45a9>] ? btrfs_log_dentry_safe+0x59/0x80 [btrfs]
[ 4080.536258] [<
ffffffffa06c298d>] ? btrfs_sync_file+0x20d/0x330 [btrfs]
[ 4080.543657] [<
ffffffff8112777c>] ? do_fsync+0x4c/0x80
[ 4080.549399] [<
ffffffff81127a0a>] ? SyS_fdatasync+0xa/0x10
[ 4080.555534] [<
ffffffff814a9b72>] ? system_call_fastpath+0x16/0x1b
Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Fixes: 2f2ff0ee5e43 (Btrfs: fix metadata inconsistencies after directory fsync)
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[Modified changelog for clarity and correctness]
Robbie Ko [Fri, 28 Oct 2016 02:32:54 +0000 (10:32 +0800)]
Btrfs: fix enospc in hole punching
The hole punching can result in adding new leafs (and as a consequence
new nodes) to the tree because when we find file extent items that span
beyond the hole range we may end up not deleting them (just adjusting
them, reducing their range by reducing their length or increasing their
offset field) and add new file extent items representing holes.
So after splitting a leaf (therefore creating a new one) to insert a new
file extent item representing a hole, a new node might be added to each
level of the tree in the worst case scenario (since there's a new key
and every parent node was full).
For example if a file has an extent item representing the range 0 to 64Mb
and we punch a hole in the range 1Mb to 20Mb, the existing extent item is
duplicated and one of the copies is adjusted to represent the range 0 to
1Mb, the other copy adjusted to represent the range 20Mb to 64Mb, and a
new file extent item representing a hole in the range 1Mb to 20Mb is
inserted.
Fix this by using btrfs_calc_trans_metadata_size() instead of
btrfs_calc_trunc_metadata_size(), so that enough metadata space is
reserved for the worst possible case.
Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[Modified changelog for clarity and correctness]
David Sterba [Wed, 30 Nov 2016 13:02:20 +0000 (14:02 +0100)]
Merge branch 'misc-4.10' into for-chris-4.10-
20161130
Wang Xiaoguang [Wed, 26 Oct 2016 10:07:33 +0000 (18:07 +0800)]
btrfs: improve delayed refs iterations
This issue was found when I tried to delete a heavily reflinked file,
when deleting such files, other transaction operation will not have a
chance to make progress, for example, start_transaction() will blocked
in wait_current_trans(root) for long time, sometimes it even triggers
soft lockups, and the time taken to delete such heavily reflinked file
is also very large, often hundreds of seconds. Using perf top, it reports
that:
PerfTop: 7416 irqs/sec kernel:99.8% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
---------------------------------------------------------------------------------------
84.37% [btrfs] [k] __btrfs_run_delayed_refs.constprop.80
11.02% [kernel] [k] delay_tsc
0.79% [kernel] [k] _raw_spin_unlock_irq
0.78% [kernel] [k] _raw_spin_unlock_irqrestore
0.45% [kernel] [k] do_raw_spin_lock
0.18% [kernel] [k] __slab_alloc
It seems __btrfs_run_delayed_refs() took most cpu time, after some debug
work, I found it's select_delayed_ref() causing this issue, for a delayed
head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but
select_delayed_ref() will firstly try to iterate node list to find
BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and
waste much time.
To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head,
then in select_delayed_ref(), if this list is not empty, we can directly use
nodes in this list. With this patch, it just took about 10~15 seconds to
delte the same file. Now using perf top, it reports that:
PerfTop: 2734 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
----------------------------------------------------------------------------------------
20.74% [kernel] [k] _raw_spin_unlock_irqrestore
16.33% [kernel] [k] __slab_alloc
5.41% [kernel] [k] lock_acquired
4.42% [kernel] [k] lock_acquire
4.05% [kernel] [k] lock_release
3.37% [kernel] [k] _raw_spin_unlock_irq
For normal files, this patch also gives help, at least we do not need to
iterate whole list to found BTRFS_ADD_DELAYED_REF nodes.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 18 Oct 2016 01:31:29 +0000 (09:31 +0800)]
btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
Commit
62b99540a1d91e464 (btrfs: relocation: Fix leaking qgroups numbers
on data extents) only fixes the problem partly.
The previous fix is to trace all new data extents at transaction commit
time when balance finishes.
However balance is not done in a large transaction, every path
replacement can happen in its own transaction.
This makes the fix useless if transaction commits during relocation.
For example:
relocate_block_group()
|-merge_reloc_roots()
| |- merge_reloc_root()
| |- btrfs_start_transaction() <- Trans X
| |- replace_path() <- Cause leak
| |- btrfs_end_transaction_throttle() <- Trans X commits here
| | Leak not fixed
| |
| |- btrfs_start_transaction() <- Trans Y
| |- replace_path() <- Cause leak
| |- btrfs_end_transaction_throttle() <- Trans Y ends
| but not committed
|-btrfs_join_transaction() <- Still trans Y
|-qgroup_fix() <- Only fixes data leak
| in trans Y
|-btrfs_commit_transaction() <- Trans Y commits
In that case, qgroup fixup can only fix data leak in trans Y, data leak
in trans X is out of fix.
So the correct fix should happen in the same transaction of
replace_path().
This patch fixes it by tracing both subtrees of tree block swap, so it
can fix the problem and ensure all leaking and fix are in the same
transaction, so no leak again.
Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 18 Oct 2016 01:31:28 +0000 (09:31 +0800)]
btrfs: Export and move leaf/subtree qgroup helpers to qgroup.c
Move account_shared_subtree() to qgroup.c and rename it to
btrfs_qgroup_trace_subtree().
Do the same thing for account_leaf_items() and rename it to
btrfs_qgroup_trace_leaf_items().
Since all these functions are only for qgroup, move them to qgroup.c and
export them is more appropriate.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 18 Oct 2016 01:31:27 +0000 (09:31 +0800)]
btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps
Rename btrfs_qgroup_insert_dirty_extent(_nolock) to
btrfs_qgroup_trace_extent(_nolock), according to the new
reserve/trace/account naming schema.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 18 Oct 2016 01:31:26 +0000 (09:31 +0800)]
btrfs: qgroup: Add comments explaining how btrfs qgroup works
Add explaination how btrfs qgroups work.
Qgroup is split into 3 main phrases:
1) Reserve
To ensure qgroup doesn't exceed its limit
2) Trace
To info qgroup to trace which extent
3) Account
Calculate qgroup number change for each traced extent.
This should save quite some time for new developers.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Fri, 25 Nov 2016 08:07:53 +0000 (09:07 +0100)]
btrfs: use bio_for_each_segment_all in __btrfsic_submit_bio
And remove the bogus check for a NULL return value from kmap, which
can't happen. While we're at it: I don't think that kmapping up to 256
will work without deadlocks on highmem machines, a better idea would
be to use vm_map_ram to map all of them into a single virtual address
range. Incidentally that would also simplify the code a lot.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Fri, 25 Nov 2016 08:07:52 +0000 (09:07 +0100)]
btrfs: refactor __btrfs_lookup_bio_sums to use bio_for_each_segment_all
Rework the loop a little bit to use the generic bio_for_each_segment_all
helper for iterating over the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Fri, 25 Nov 2016 08:07:51 +0000 (09:07 +0100)]
btrfs: calculate end of bio offset properly
Use the bvec offset and len members to prepare for multipage bvecs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Fri, 25 Nov 2016 08:07:50 +0000 (09:07 +0100)]
btrfs: use bi_size
Instead of using bi_vcnt to calculate it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Fri, 25 Nov 2016 08:07:49 +0000 (09:07 +0100)]
btrfs: don't access the bio directly in btrfs_csum_one_bio
Use bio_for_each_segment_all to iterate over the segments instead.
This requires a bit of reshuffling so that we only lookup up the ordered
item once inside the bio_for_each_segment_all loop.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Fri, 25 Nov 2016 08:07:48 +0000 (09:07 +0100)]
btrfs: don't access the bio directly in the direct I/O code
Just use bio_for_each_segment_all to iterate over all segments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Fri, 25 Nov 2016 08:07:47 +0000 (09:07 +0100)]
btrfs: don't access the bio directly in the raid5/6 code
Just use bio_for_each_segment_all to iterate over all segments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Fri, 25 Nov 2016 08:07:46 +0000 (09:07 +0100)]
btrfs: use bio iterators for the decompression handlers
Pass the full bio to the decompression routines and use bio iterators
to iterate over the data in the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Sat, 19 Nov 2016 02:52:40 +0000 (21:52 -0500)]
btrfs: Ensure proper sector alignment for btrfs_free_reserved_data_space
This fixes the WARN_ON on BTRFS_I(inode)->reserved_extents in
btrfs_destroy_inode and the WARN_ON on nonzero delalloc bytes on umount
with qgroups enabled.
I was able to reproduce this by setting up a small (~500kb) quota limit
and writing a file one byte at a time until I hit the limit. The warnings
would all hit on umount.
The root cause is that we would reserve a block-sized range in both
the reservation and the quota in btrfs_check_data_free_space, but if we
encountered a problem (like e.g. EDQUOT), we would only release the single
byte in the qgroup reservation. That caused an iotree state split, which
increased the number of outstanding extents, in turn disallowing releasing
the metadata reservation.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Mon, 14 Nov 2016 19:06:22 +0000 (14:06 -0500)]
Btrfs: abort transaction if fill_holes() fails
At this point we will have dropped extent entries from the file, so if we fail
to insert the new hole entries then we are leaving the fs in a corrupt state
(albeit an easily fixed one). Abort the transaciton if this happens so we can
avoid corrupting the fs. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 16 Nov 2016 14:13:39 +0000 (09:13 -0500)]
Btrfs: fix file extent corruption
In order to do hole punching we have a block reserve to hold the reservation we
need to drop the extents in our range. Since we could end up dropping a lot of
extents we set rsv->failfast so we can just loop around again and drop the
remaining of the range. Unfortunately we unconditionally fill the hole extents
in and start from the last extent we encountered, which we may or may not have
dropped. So this can result in overlapping file extent entries, which can be
tripped over in a variety of ways, either by hitting BUG_ON(!ret) in
fill_holes() after the search, or in btrfs_set_item_key_safe() in
btrfs_drop_extent() at a later time by an unrelated task. Fix this by only
setting drop_end to the last extent we did actually drop. This way our holes
are filled in properly for the range that we did drop, and the rest of the range
that remains to be dropped is actually dropped. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Sat, 5 Nov 2016 17:26:35 +0000 (13:26 -0400)]
btrfs: increment ctx->pos for every emitted or skipped dirent in readdir
If we process the last item in the leaf and hit an I/O error while
reading the next leaf, we return -EIO without having adjusted the
position. Since we have emitted dirents, getdents() will return
the byte count to the user instead of the error. Subsequent callers
will emit the last successful dirent again, and return -EIO again,
with the same result. Callers loop forever.
Instead, if we always increment ctx->pos after emitting or skipping
the dirent, we'll be sure that we won't hit the same one again. When
we go to process the next leaf, we won't have emitted any dirents
and the -EIO will be returned to the user properly. We also don't
need to track if we've emitted a dirent already or if we've changed
the position yet.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Jeff Mahoney [Mon, 21 Nov 2016 14:59:04 +0000 (15:59 +0100)]
btrfs: remove old tree_root dirent processing in btrfs_real_readdir()
Commit
3de4586c527 (Btrfs: Allow subvolumes and snapshots anywhere
in the directory tree) introduced the current system of placing
snapshots in the directory tree. It also introduced the behavior of
creating the snapshot and then creating the directory entries for it.
We've kept this code around for compatibility reasons, but it turns
out that no file systems with the old tree_root based snapshots can
be mounted on newer (>= 2009) kernels anyway. About a month after the
above commit, commit
2a7108ad89e (Btrfs: rev the disk format for the
inode compat and csum selection changes) landed, changing the superblock
magic number.
As a result, we know that we'll never encounter tree_root-based dirents
or have to deal with skipping our own snapshot dirents. Since that
also means that we're now only iterating over DIR_INDEX items, which only
contain one directory entry per leaf item, we don't need to loop over
the leaf item contents anymore either.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nick Terrell [Wed, 2 Nov 2016 03:25:27 +0000 (20:25 -0700)]
btrfs: Call kunmap if zlib_inflateInit2 fails
If zlib_inflateInit2 fails, the input page is never unmapped.
Add a call to kunmap when it fails.
Signed-off-by: Nick Terrell <nickrterrell@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 1 Nov 2016 13:21:23 +0000 (14:21 +0100)]
btrfs: store and load values of stripes_min/stripes_max in balance status item
The balance status item contains currently known filter values, but the
stripes filter was unintentionally not among them. This would mean, that
interrupted and automatically restarted balance does not apply the
stripe filters.
Fixes: dee32d0ac3719ef8d640efaf0884111df444730f
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: David Sterba <dsterba@suse.com>
Christophe JAILLET [Tue, 1 Nov 2016 10:26:06 +0000 (11:26 +0100)]
btrfs: remove redundant check of btrfs_iget return value
'btrfs_iget()' can not return NULL, so this test can be removed.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Domagoj Tršan [Thu, 27 Oct 2016 07:52:33 +0000 (08:52 +0100)]
btrfs: change btrfs_csum_final result param type to u8
csum member of struct btrfs_super_block has array type of u8. It makes
sense that function btrfs_csum_final should be also declared to accept
u8 *. I changed the declaration of method void btrfs_csum_final(u32 crc,
char *result); to void btrfs_csum_final(u32 crc, u8 *result);
Signed-off-by: Domagoj Tršan <domagoj.trsan@gmail.com>
[ changed cast to u8 at several call sites ]
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Fri, 4 Nov 2016 19:20:54 +0000 (12:20 -0700)]
Btrfs: adjust len of writes if following a preallocated extent
If we have
|0--hole--4095||4096--preallocate--12287|
instead of using preallocated space, a 8K direct write will just
create a new 8K extent and it'll end up with
|0--new extent--8191||8192--preallocate--12287|
It's because we find a hole em and then go to create a new 8K
extent directly without adjusting @len.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Shailendra Verma [Thu, 10 Nov 2016 09:47:41 +0000 (15:17 +0530)]
btrfs: return early from failed memory allocations in ioctl handlers
There is no need to call kfree() if memdup_user() fails, as no memory
was allocated and the error in the error-valued pointer should be returned.
Signed-off-by: Shailendra Verma <shailendra.v@samsung.com>
[ edit subject ]
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 17:30:31 +0000 (18:30 +0100)]
btrfs: add optimized version of eb to eb copy
Using copy_extent_buffer is suitable for copying betwenn buffers from an
arbitrary offset and deals with page boundaries. This is not necessary
when doing a full extent_buffer-to-extent_buffer copy. We can utilize
the copy_page helper as well.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 17:09:03 +0000 (18:09 +0100)]
btrfs: remove constant parameter to memset_extent_buffer and rename it
The only memset we do is to 0, so sink the parameter to the function and
simplify all calls. Rename the function to reflect the behaviour.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 16:56:24 +0000 (17:56 +0100)]
btrfs: use specialized page copying helpers in btrfs_clone_extent_buffer
The copy_page is usually optimized and can be faster than memcpy.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Nov 2016 16:44:25 +0000 (17:44 +0100)]
btrfs: use new helpers to set uuids in eb
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 9 Nov 2016 16:43:38 +0000 (17:43 +0100)]
btrfs: introduce helpers for updating eb uuids
The fsid and chunk tree uuid are always located in the first page,
we don't need the to use write_extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 23:03:12 +0000 (00:03 +0100)]
btrfs: delete unused member from superblock
__bdev' has never been used since
0b86a832a1f38abec695864ec2eaedc9d2383f1b (2008).
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 22:21:05 +0000 (23:21 +0100)]
btrfs: remove trivial helper btrfs_find_tree_block
During the time, the function has been shrunk to the point that it just
calls find_extent_buffer, just passing the parameters.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 16:18:35 +0000 (17:18 +0100)]
btrfs: reada, remove pointless BUG_ON check for fs_info
We dereference fs_info several times, besides that post-mount functions
should never see a NULL fs_info.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 16:11:27 +0000 (17:11 +0100)]
btrfs: reada, remove pointless BUG_ON in reada_find_extent
The lock is held, we make the same lookup that previously failed with
EEXIST and we don't insert NULL pointers.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 12:50:03 +0000 (13:50 +0100)]
btrfs: reada, sink start parameter to btree_readahead_hook
Originally, the eb and start were passed separately in case eb is NULL.
Since the readahead has been refactored in 4.6, this is not true anymore
and we can get rid of the parameter.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 12:39:05 +0000 (13:39 +0100)]
btrfs: reada, remove unused parameter from __readahead_hook
'start' is not used since "btrfs: reada: Pass reada_extent into
__readahead_hook directly" (
6e39dbe8b9e55280c).
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 8 Nov 2016 12:32:43 +0000 (13:32 +0100)]
btrfs: reada, cleanup remove unneeded variable in __readahead_hook
We can't touch the eb directly in case the function is called with a
non-zero error, so we can read the eb level when needed.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 26 Oct 2016 14:23:50 +0000 (16:23 +0200)]
btrfs: rename helper macros for qgroup and aux data casts
The helpers are not meant to be generic, the name is misleading. Convert
them to static inlines for type checking.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 5 Oct 2016 12:23:06 +0000 (14:23 +0200)]
btrfs: remove stale comment from btrfs_statfs
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 5 Oct 2016 12:23:05 +0000 (14:23 +0200)]
btrfs: remove unused headers, statfs.h
Signed-off-by: David Sterba <dsterba@suse.com>
Xiaoguang Wang [Fri, 23 Sep 2016 04:38:50 +0000 (12:38 +0800)]
btrfs: remove useless comments
Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely")
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Adam Borowski [Mon, 14 Nov 2016 17:44:34 +0000 (18:44 +0100)]
btrfs: make block group flags in balance printks human-readable
They're not even documented anywhere, letting users with no recourse but
to RTFS. It's no big burden to output the bitfield as words.
Also, display unknown flags as hex.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Omar Sandoval [Wed, 9 Nov 2016 23:26:50 +0000 (15:26 -0800)]
Btrfs: deal with existing encompassing extent map in btrfs_get_extent()
My QEMU VM was seeing inexplicable I/O errors that I tracked down to
errors coming from the qcow2 virtual drive in the host system. The qcow2
file is a nocow file on my Btrfs drive, which QEMU opens with O_DIRECT.
Every once in awhile, pread() or pwrite() would return EEXIST, which
makes no sense. This turned out to be a bug in btrfs_get_extent().
Commit
8dff9c853410 ("Btrfs: deal with duplciates during extent_map
insertion in btrfs_get_extent") fixed a case in btrfs_get_extent() where
two threads race on adding the same extent map to an inode's extent map
tree. However, if the added em is merged with an adjacent em in the
extent tree, then we'll end up with an existing extent that is not
identical to but instead encompasses the extent we tried to add. When we
call merge_extent_mapping() to find the nonoverlapping part of the new
em, the arithmetic overflows because there is no such thing. We then end
up trying to add a bogus em to the em_tree, which results in a EEXIST
that can bubble all the way up to userspace.
Fix it by extending the identical extent map special case.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Wang Xiaoguang [Mon, 7 Nov 2016 07:59:16 +0000 (15:59 +0800)]
btrfs: add necessary comments about tickets_id
Tickets_id's name may result in some misunderstandings, it just indicates
the next ticket will be handled and is not stored per ticket.
Fixes: ce12965 ("btrfs: introduce tickets_id to determine whether
asynchronous metadata reclaim work makes progress")
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Wang Xiaoguang [Wed, 26 Oct 2016 07:23:01 +0000 (15:23 +0800)]
btrfs: cleanup: use already calculated value in btrfs_should_throttle_delayed_refs()
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Christoph Hellwig [Thu, 27 Oct 2016 07:27:36 +0000 (09:27 +0200)]
btrfs: don't abuse REQ_OP_* flags for btrfs_map_block
btrfs_map_block supports different types of mappings, which to a large
extent resemble block layer operations. But they don't always do, and
currently btrfs dangerously overlays it's own flag over the block layer
flags. This is just asking for a conflict, so introduce a different
map flags enum inside of btrfs instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 27 Nov 2016 21:08:04 +0000 (13:08 -0800)]
Linux 4.9-rc7
Linus Torvalds [Sun, 27 Nov 2016 16:24:46 +0000 (08:24 -0800)]
Merge git://git.infradead.org/intel-iommu
Pull IOMMU fixes from David Woodhouse:
"Two minor fixes.
The first fixes the assignment of SR-IOV virtual functions to the
correct IOMMU unit, and the second fixes the excessively large (and
physically contiguous) PASID tables used with SVM"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: Fix PASID table allocation
iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions
Linus Torvalds [Sun, 27 Nov 2016 16:22:59 +0000 (08:22 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.9:
- Fix unreadable output in __do_page_fault due to the KERN_CONT
patchset
- Correctly handle MIPS R6 fixes to the c0_wired register"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: mm: Fix output of __do_page_fault
MIPS: Mask out limit field when calculating wired entry count
Linus Torvalds [Sun, 27 Nov 2016 01:21:13 +0000 (17:21 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs splice fix from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix default_file_splice_read()
Al Viro [Sun, 27 Nov 2016 01:05:42 +0000 (20:05 -0500)]
fix default_file_splice_read()
Botched calculation of number of pages. As the result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 26 Nov 2016 23:28:34 +0000 (15:28 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Here is a revert and two bugfixes for the I2C designware driver.
Please note that we are still hunting down a regression for the
i2c-octeon driver. While there is a fix pending, we have unclear
feedback from the testers currently. An rc8 would be quite helpful
for this case"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: designware: do not disable adapter after transfer"
i2c: designware: fix rx fifo depth tracking
i2c: designware: report short transfers
Linus Torvalds [Sat, 26 Nov 2016 23:26:20 +0000 (15:26 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
"This resolves the ksyms issues by reverting the commit which
introduced the breakage"
There was what I consider to be a better fix, but it's late in the rc
game, so I'll take the revert.
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
Revert "arm: move exports to definitions"
Linus Torvalds [Sat, 26 Nov 2016 21:05:05 +0000 (13:05 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix leak in fsl/fman driver, from Dan Carpenter.
2) Call flow dissector initcall earlier than any networking driver can
register and start to use it, from Eric Dumazet.
3) Some dup header fixes from Geliang Tang.
4) TIPC link monitoring compat fix from Jon Paul Maloy.
5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
Florian Fainelli.
6) Fix bogus handle ID passed into tfilter_notify_chain(), from Roman
Mashak.
7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
tipc: resolve connection flow control compatibility problem
mvpp2: use correct size for memset
net/mlx5: drop duplicate header delay.h
net: ieee802154: drop duplicate header delay.h
ibmvnic: drop duplicate header seq_file.h
fsl/fman: fix a leak in tgec_free()
net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
tipc: improve sanity check for received domain records
tipc: fix compatibility bug in link monitoring
net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
dwc_eth_qos: drop duplicate headers
net sched filters: fix filter handle ID in tfilter_notify_chain()
net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
bnxt: do not busy-poll when link is down
udplite: call proper backlog handlers
ipv6: bump genid when the IFA_F_TENTATIVE flag is clear
net/mlx4_en: Free netdev resources under state lock
net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"
rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()
bnxt_en: Fix a VXLAN vs GENEVE issue
...