git.openwrt.org Git - openwrt/staging/blogic.git/log

projects / openwrt / staging / blogic.git / log

Ben Hutchings [Fri, 24 Aug 2012 17:04:38 +0000 (18:04 +0100)]

sfc: Fix the initial device operstate

Following commit 8f4cccb ('net: Set device operstate at registration
time') it is now correct and preferable to set the carrier off before
registering a device.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Thu, 2 Aug 2012 00:39:38 +0000 (01:39 +0100)]

sfc: Assign efx and efx->type as early as possible in efx_pci_probe()

We also stop clearing *efx in efx_init_struct(). This is safe because
alloc_etherdev_mq() already clears it for us.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 27 Jul 2012 19:50:57 +0000 (20:50 +0100)]

sfc: Remove bogus comment about MTU change and RX buffer overrun

RX DMA is limited by the length specified in each descriptor and not
by the MAC. Over-length frames may get into the RX FIFO regardless of
the MAC settings, due to a hardware bug, but they will be truncated by
the packet DMA engine and reported as such in the completion event.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 27 Jul 2012 19:50:54 +0000 (20:50 +0100)]

sfc: Remove overly paranoid locking assertions from netdev operations

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 27 Jul 2012 19:50:52 +0000 (20:50 +0100)]

sfc: Fix reset vs probe/remove/PM races involving efx_nic::state

We try to defer resets while the device is not READY, but we're not
doing this quite correctly.  In particular, changes to efx_nic::state
are documented as serialised by the RTNL lock, but they aren't.

1. We check whether a reset was requested during probe (suggesting
broken hardware) before we allow requested resets to be scheduled.
This leaves a window where a requested reset would be deferred
indefinitely.

2. Although we cancel the reset work item during device removal,
there are still later operations that can cause it to be scheduled
again.  We need to check the state before scheduling it.

3. Since the state can change between scheduling and running of
the work item, we still need to check it there, and we need to
do so *after* acquiring the RTNL lock which serialises state
changes.

4. We must cancel the reset work item during device removal, if the
state could ever have been READY.  This wasn't done in some of the
failure paths from efx_pci_probe().  Move the cancellation to
efx_pci_remove_main().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 27 Jul 2012 19:48:36 +0000 (20:48 +0100)]

sfc: Improve log messages in case we abort probe due to a pending reset

The current informational message doesn't properly explain what
happens, and could also appear if we defer a reset during
suspend/resume.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 27 Jul 2012 19:46:41 +0000 (20:46 +0100)]

sfc: Never try to stop and start a NIC that is disabled

efx_change_mtu() and efx_realloc_channels() each stop and start much
of the NIC, even if it has been disabled. Since efx_start_all() is a
no-op when the NIC is disabled, this is probably harmless in the case
of efx_change_mtu(), but efx_realloc_channels() also reenables
interrupts which could be a bad thing to do.

Change efx_start_all() and efx_start_interrupts() to assert that the
NIC is not disabled, but make efx_stop_interrupts() do nothing if the
NIC is disabled (since it is already stopped), consistent with
efx_stop_all().

Update comments for efx_start_all() and efx_stop_all() to describe
their purpose and preconditions more accurately.

Add a common function to check and log if the NIC is disabled, and use
it in efx_net_open(), efx_change_mtu() and efx_realloc_channels().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 27 Jul 2012 18:35:52 +0000 (19:35 +0100)]

sfc: Hold RTNL lock (only) when calling efx_stop_interrupts()

Interrupt state should be consistently guarded by the RTNL lock once
the net device is registered.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 27 Jul 2012 18:35:47 +0000 (19:35 +0100)]

sfc: Keep disabled NICs quiescent during suspend/resume

Currently we ignore and clear the disabled state.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 27 Jul 2012 18:35:39 +0000 (19:35 +0100)]

sfc: Hold the RTNL lock for more of the suspend/resume cycle

I don't think these PM functions can race with userland net device
operations, but it's much easier to reason about locking if state is
consistently guarded by the same lock.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 27 Jul 2012 18:31:16 +0000 (19:31 +0100)]

sfc: Change state names to be clearer, and comment them

STATE_INIT and STATE_FINI are equivalent and represent incompletely
initialised states; combine them as STATE_UNINIT.

Rename STATE_RUNNING to STATE_READY, to avoid confusion with
netif_running() and IFF_RUNNING.

The comments do not quite match current usage, but this will be
corrected in subsequent fixes.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Fri, 22 Jun 2012 01:44:01 +0000 (02:44 +0100)]

sfc: Stash header offsets for TSO in struct tso_state

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Tue, 19 Jun 2012 19:03:41 +0000 (20:03 +0100)]

sfc: Replace tso_state::full_packet_space with ip_base_len

We only use tso_state::full_packet_space to calculate the IPv4 tot_len
or IPv6 payload_len, not to set tso_state::packet_space. Replace it
with an ip_base_len field holding the value of tot_len or payload_len
before including the TCP payload, which is much more useful when
constructing the new headers.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Thu, 17 May 2012 17:40:54 +0000 (18:40 +0100)]

sfc: Simplify TSO header buffer allocation

TSO header buffers contain a control structure immediately followed by
the packet headers, and are kept on a free list when not in use. This
complicates buffer management and tends to result in cache read misses
when we recycle such buffers (particularly if DMA-coherent memory
requires caches to be disabled).

Replace the free list with a simple mapping by descriptor index. We
know that there is always a payload descriptor between any two
descriptors with TSO header buffers, so we can allocate only one
such buffer for each two descriptors.

While we're at it, use a standard error code for allocation failure,
not -1.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Tue, 22 May 2012 00:27:58 +0000 (01:27 +0100)]

sfc: Stop TX queues before they fill up

We now have a definite upper bound on the number of descriptors per
skb; use that to stop the queue when the next packet might not fit.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Thu, 17 May 2012 19:52:20 +0000 (20:52 +0100)]

sfc: Refactor struct efx_tx_buffer to use a flags field

Add a flags field to struct efx_tx_buffer, replacing the
continuation and map_single booleans.

Since a single descriptor cannot be both a TSO header and the last
descriptor for an skb, unionise efx_tx_buffer::{skb,tsoh} and add
flags for validity of these fields.

Clear all flags in free buffers (whereas previously the continuation
flag would be set).

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

commit | commitdiff | tree

Ben Hutchings [Mon, 20 Aug 2012 21:16:51 +0000 (22:16 +0100)]

net: Set device operstate at registration time

The operstate of a device is initially IF_OPER_UNKNOWN and is updated
asynchronously by linkwatch after each change of carrier state
reported by the driver.  The default carrier state of a net device is
on, and this will never be changed on drivers that do not support
carrier detection, thus the operstate remains IF_OPER_UNKNOWN.

For devices that do support carrier detection, the driver must set the
carrier state to off initially, then poll the hardware state when the
device is opened.  However, we must not activate linkwatch for a
unregistered device, and commit b473001 ('net: Do not fire linkwatch
events until the device is registered.') ensured that we don't.  But
this means that the operstate for many devices that support carrier
detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.

The same issue exists with the dormant state.

The proper initialisation sequence, avoiding a race with opening of
the device, is:

        rtnl_lock();
        rc = register_netdevice(dev);
        if (rc)
                goto out_unlock;
        netif_carrier_off(dev); /* or netif_dormant_on(dev) */
        rtnl_unlock();

but it seems silly that this should have to be repeated in so many
drivers.  Further, the operstate seen immediately after opening the
device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
linkwatch.

Commit 22604c8 ('net: Fix for initial link state in 2.6.28') attempted
to fix this by setting the operstate synchronously, but it was
reverted as it could lead to deadlock.

This initialises the operstate synchronously at registration time
only.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Timur Tabi [Mon, 20 Aug 2012 09:26:39 +0000 (09:26 +0000)]

net/fsl: introduce Freescale 10G MDIO driver

Similar to fsl_pq_mdio.c, this driver is for the 10G MDIO controller on
Freescale Frame Manager Ethernet controllers.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Neil Horman [Mon, 20 Aug 2012 07:59:10 +0000 (07:59 +0000)]

cls_cgroup: Allow classifier cgroups to have their classid reset to 0

The network classifier cgroup initalizes each cgroups instance classid value to
0.  However, the sock_update_classid function only updates classid's in sockets
if the tasks cgroup classid is not zero, and if it differs from the current
classid.  The later check is to prevent cache line dirtying, but the former is
detrimental, as it prevents resetting a classid for a cgroup to 0.  While this
is not a common action, it has administrative usefulness (if the admin wants to
disable classification of a certain group temporarily for instance).

Easy fix, just remove the zero check.  Tested successfully by myself

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Fri, 24 Aug 2012 15:30:38 +0000 (11:30 -0400)]

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
Included changes:
- a set of codestyle rearrangements/fixes
- new feature to early detect new joining (mesh-unaware) clients
- a minor fix for the gw-feature
- substitution of shift operations with the BIT() macro
- reorganization of the main batman-adv structure (struct batadv_priv)
- some more (very) minor cleanups and fixes
===================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Rami Rosen [Thu, 23 Aug 2012 02:55:41 +0000 (02:55 +0000)]

packet: fix broken build.

This patch fixes a broken build due to a missing header:
...
CC net/ipv4/proc.o
In file included from include/net/net_namespace.h:15,
from net/ipv4/proc.c:35:
include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
...

The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.

See commit 0fa7fa98dbcc2789409ed24e885485e645803d7f:
packet: Protect packet sk list with mutex (v2) patch,

Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Wed, 22 Aug 2012 21:50:59 +0000 (21:50 +0000)]

net: reinstate rtnl in call_netdevice_notifiers()

Eric Biederman pointed out that not holding RTNL while calling
call_netdevice_notifiers() was racy.

This patch is a direct transcription his feedback
against commit 0115e8e30d6fc (net: remove delay at device dismantle)

Thanks Eric !

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sven Eckelmann [Sun, 19 Aug 2012 19:48:25 +0000 (21:48 +0200)]

batman-adv: Start new development cycle

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Antonio Quartulli [Thu, 5 Jul 2012 21:38:30 +0000 (23:38 +0200)]

batman-adv: change interface_rx to get orig node

In order to understand where a broadcast packet is coming from and use
this information to detect not yet announced clients, this patch modifies the
interface_rx() function by passing a new argument: the orig node
corresponding to the node that originated the received packet (if known).
This new argument if not NULL for broadcast packets only (other packets does not
have source field).

Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Antonio Quartulli [Thu, 5 Jul 2012 21:38:29 +0000 (23:38 +0200)]

batman-adv: detect not yet announced clients

With the current TT mechanism a new client joining the network is not
immediately able to communicate with other hosts because its MAC address has not
been announced yet. This situation holds until the first OGM containing its
joining event will be spread over the mesh network.

This behaviour can be acceptable in networks where the originator interval is a
small value (e.g. 1sec) but if that value is set to an higher time (e.g. 5secs)
the client could suffer from several malfunctions like DHCP client timeouts,
etc.

This patch adds an early detection mechanism that makes nodes in the network
able to recognise "not yet announced clients" by means of the broadcast packets
they emitted on connection (e.g. ARP or DHCP request). The added client will
then be confirmed upon receiving the OGM claiming it or purged if such OGM
is not received within a fixed amount of time.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Sven Eckelmann [Sun, 8 Jul 2012 16:33:51 +0000 (18:33 +0200)]

batman-adv: Reduce accumulated length of simple statements

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Sven Eckelmann [Sun, 8 Jul 2012 15:13:15 +0000 (17:13 +0200)]

batman-adv: Don't break statements after assignment operator

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Sven Eckelmann [Sun, 8 Jul 2012 14:32:09 +0000 (16:32 +0200)]

batman-adv: Use BIT(x) macro to calculate bit positions

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Martin Hundebøll [Thu, 5 Jul 2012 09:34:28 +0000 (11:34 +0200)]

batman-adv: Drop tt queries with foreign dest

When enabling promiscuous mode, tt queries for other hosts might be
received. Before this patch, "foreign" tt queries were processed like
any other query and thus forwarded to its destination again and thereby
causing a loop.

This patch adds a check to drop foreign tt queries.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Martin Hundebøll [Thu, 5 Jul 2012 09:34:27 +0000 (11:34 +0200)]

batman-adv: Move batadv_check_unicast_packet()

batadv_check_unicast_packet() is needed in batadv_recv_tt_query(), so
move the former to before the latter.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Sven Eckelmann [Sun, 15 Jul 2012 20:26:51 +0000 (22:26 +0200)]

batman-adv: Split batadv_priv in sub-structures for features

The structure batadv_priv grows everytime a new feature is introduced. It gets
hard to find the parts of the struct that belongs to a specific feature. This
becomes even harder by the fact that not every feature uses a prefix in the
member name.

The variables for bridge loop avoidence, gateway handling, translation table
and visualization server are moved into separate structs that are included in
the bat_priv main struct.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Simon Wunderlich [Sun, 1 Jul 2012 20:51:55 +0000 (22:51 +0200)]

batman-adv: check batadv_orig_hash_add_if() return code

If this call fails, some of the orig_nodes spaces may have been
resized for the increased number of interface, and some may not.
If we would just continue with the larger number of interfaces,
this would lead to access to not allocated memory later.

We better check the return code, and don't add the interface if
no memory is available. OTOH, keeping some of the orig_nodes
with too much memory allocated should hurt no one (except for
a few too many bytes allocated).

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Antonio Quartulli [Sun, 1 Jul 2012 17:07:31 +0000 (19:07 +0200)]

batman-adv: fix typos in comments

the word millisecond is misspelled in several comments. This patch fixes it.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Antonio Quartulli [Sun, 1 Jul 2012 12:09:12 +0000 (14:09 +0200)]

batman-adv: add reference counting for type batadv_tt_orig_list_entry

The batadv_tt_orig_list_entry structure didn't have any refcounting mechanism so
far. This patch introduces it and makes the structure being usable in much more
complex context.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Jonathan Corbet [Sat, 30 Jun 2012 16:49:13 +0000 (10:49 -0600)]

batman-adv: remove a misleading comment

As much as I'm happy to see LWN links sprinkled through the kernel by the
dozen, this one in particular reflects a very old state of reality; the
associated comment is now incorrect. So just delete it.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Marek Lindner [Sat, 23 Jun 2012 09:47:53 +0000 (11:47 +0200)]

batman-adv: convert remaining packet counters to per_cpu_ptr() infrastructure

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Simon Wunderlich [Sat, 23 Jun 2012 10:34:18 +0000 (12:34 +0200)]

batman-adv: rename bridge loop avoidance claim types

for consistency reasons within the code and with the documentation,
we should always call it "claim" and "unclaim".

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Simon Wunderlich [Sat, 23 Jun 2012 10:34:17 +0000 (12:34 +0200)]

batman-adv: correct comments in bridge loop avoidance

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Simon Wunderlich [Mon, 18 Jun 2012 16:39:26 +0000 (18:39 +0200)]

batman-adv: Add the backbone gateway list to debugfs

This is especially useful if there are no claims yet, but we still want
to know which gateways are using bridge loop avoidance in the network.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Antonio Quartulli [Tue, 21 Aug 2012 22:42:40 +0000 (00:42 +0200)]

batman-adv: move function arguments on one line

Signed-off-by: Antonio Quartulli <ordex@autistici.org>

commit | commitdiff | tree

Pavel Emelyanov [Tue, 21 Aug 2012 01:06:47 +0000 (01:06 +0000)]

packet: Protect packet sk list with mutex (v2)

Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Allan, Bruce W [Mon, 20 Aug 2012 04:55:29 +0000 (04:55 +0000)]

mdio: translation of MMD EEE registers to/from ethtool settings

The helper functions which translate IEEE MDIO Manageable Device (MMD)
Energy-Efficient Ethernet (EEE) registers 3.20, 7.60 and 7.61 to and from
the comparable ethtool supported/advertised settings will be needed by
drivers other than those in PHYLIB (e.g. e1000e in a follow-on patch).

In the same fashion as similar translation functions in linux/mii.h, move
these functions from the PHYLIB core to the linux/mdio.h header file so the
code will not have to be duplicated in each driver needing MMD-to-ethtool
(and vice-versa) translations. The function and some variable names have
been renamed to be more descriptive.

Not tested on the only hardware that currently calls the related functions,
stmmac, because I don't have access to any. Has been compile tested and
the translations have been tested on a locally modified version of e1000e.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

danborkmann@iogearbox.net [Mon, 20 Aug 2012 03:34:03 +0000 (03:34 +0000)]

af_packet: use define instead of constant

Instead of using a hard-coded value for the status variable, it would make
the code more readable to use its destined define from linux/if_packet.h.

Signed-off-by: daniel.borkmann@tik.ee.ethz.ch
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ying Xue [Sun, 19 Aug 2012 21:44:08 +0000 (21:44 +0000)]

rds: Don't disable BH on BH context

Since we have already in BH context when *_write_space(),
*_data_ready() as well as *_state_change() are called, it's
unnecessary to disable BH.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

John Eaglesham [Tue, 21 Aug 2012 20:43:35 +0000 (20:43 +0000)]

bonding: support for IPv6 transmit hashing

Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver. In addition, bounds checking has been added to all
transmit hashing functions.

The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

Tested-by: John Eaglesham <linux@8192.net>
Signed-off-by: John Eaglesham <linux@8192.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Sun, 19 Aug 2012 03:47:30 +0000 (03:47 +0000)]

ipv6: gre: fix ip6gre_err()

ip6gre_err() miscomputes grehlen (sizeof(ipv6h) is 4 or 8,
not 40 as expected), and should take into account 'offset' parameter.

Also uses pskb_may_pull() to cope with some fragged skbs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Sun, 19 Aug 2012 10:31:48 +0000 (12:31 +0200)]

xfrm: fix RCU bugs

This patch reverts commit 56892261ed1a (xfrm: Use rcu_dereference_bh to
deference pointer protected by rcu_read_lock_bh), and fixes bugs
introduced in commit 418a99ac6ad ( Replace rwlock on xfrm_policy_afinfo
with rcu )

1) We properly use RCU variant in this file, not a mix of RCU/RCU_BH

2) We must defer some writes after the synchronize_rcu() call or a reader
can crash dereferencing NULL pointer.

3) Now we use the xfrm_policy_afinfo_lock spinlock only from process
context, we no longer need to block BH in xfrm_policy_register_afinfo()
and xfrm_policy_unregister_afinfo()

4) Can use RCU_INIT_POINTER() instead of rcu_assign_pointer() in
xfrm_policy_unregister_afinfo()

5) Remove a forward inline declaration (xfrm_policy_put_afinfo()),
and also move xfrm_policy_get_afinfo() declaration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@windriver.com>
Cc: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Wed, 22 Aug 2012 17:19:46 +0000 (17:19 +0000)]

net: remove delay at device dismantle

I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.

These call_rcu() were posted by rt_free(struct rtable *rt) calls.

We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.

As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.

To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
after a rcu_barrier(), but outside of RTNL lock.

Use NETDEV_UNREGISTER_FINAL with care !

Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL

Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.

With help from Gao feng

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Thu, 23 Aug 2012 01:48:21 +0000 (18:48 -0700)]

Merge git://1984.lsi.us.es/nf-next

Pablo Neira Ayuso says:

====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:

* Remove unnecessary RTNL locking now that we have support
  for namespace in nf_conntrack, from Patrick McHardy.

* Cleanup to eliminate unnecessary goto in the initialization
  path of several Netfilter tables, from Jean Sacren.

* Another cleanup from Wu Fengguang, this time to PTR_RET instead
  of if IS_ERR then return PTR_ERR.

* Use list_for_each_entry_continue_rcu in nf_iterate, from
  Michael Wang.

* Add pmtu_disc sysctl option to disable PMTU in their tunneling
  transmitter, from Julian Anastasov.

* Generalize application protocol registration in IPVS and modify
  IPVS FTP helper to use it, from Julian Anastasov.

* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
  helper for NAT support, from Julian Anastasov.

* Add logic to update PMTU for IPIP packets in IPVS, again
  from Julian Anastasov.

* A couple of sparse warning fixes for IPVS and Netfilter from
  Claudiu Ghioc and Patrick McHardy respectively.

Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 22 Aug 2012 21:23:43 +0000 (14:23 -0700)]

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to ethtool.h, e1000, e1000e, and igb to
implement MDI/MDIx control.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 22 Aug 2012 21:21:38 +0000 (14:21 -0700)]

Merge git://git./linux/kernel/git/davem/net

commit | commitdiff | tree

Jean Sacren [Sun, 19 Aug 2012 15:11:32 +0000 (15:11 +0000)]

netfilter: remove unnecessary goto statement for error recovery

Usually it's a good practice to use goto statement for error recovery
when initializing the module. This approach could be an overkill if:

1) there is only one fail case;
2) success and failure use the same return statement.

For a cleaner approach, remove the unnecessary goto statement and
directly implement error recovery.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Michael Wang [Thu, 16 Aug 2012 18:33:39 +0000 (18:33 +0000)]

netfilter: replace list_for_each_continue_rcu with new interface

This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to allow removing
list_for_each_continue_rcu().

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Linus Torvalds [Wed, 22 Aug 2012 00:22:22 +0000 (17:22 -0700)]

Merge branch 'akpm' (Andrew's patch-bomb)

Merge fixes from Andrew Morton.

Random drivers and some VM fixes.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (17 commits)
  mm: compaction: Abort async compaction if locks are contended or taking too long
  mm: have order > 0 compaction start near a pageblock with free pages
  rapidio/tsi721: fix unused variable compiler warning
  rapidio/tsi721: fix inbound doorbell interrupt handling
  drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode
  mm: correct page->pfmemalloc to fix deactivate_slab regression
  drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes
  mm/compaction.c: fix deferring compaction mistake
  drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources
  string: do not export memweight() to userspace
  hugetlb: update hugetlbpage.txt
  checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO
  mm: hugetlbfs: correctly populate shared pmd
  cciss: fix incorrect scsi status reporting
  Documentation: update mount option in filesystem/vfat.txt
  mm: change nr_ptes BUG_ON to WARN_ON
  cs5535-clockevt: typo, it's MFGPT, not MFPGT

commit | commitdiff | tree

Linus Torvalds [Tue, 21 Aug 2012 23:54:38 +0000 (16:54 -0700)]

Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
"For bug fixes, at soc_camera, si470x, uvcvideo, iguanaworks IR driver,
  radio_shark Kbuild fixes, and at the V4L2 core (radio fixes)."

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] media: soc_camera: don't clear pix->sizeimage in JPEG mode
  [media] media: mx2_camera: Fix clock handling for i.MX27
  [media] video: mx2_camera: Use clk_prepare_enable/clk_disable_unprepare
  [media] video: mx1_camera: Use clk_prepare_enable/clk_disable_unprepare
  [media] media: mx3_camera: buf_init() add buffer state check
  [media] radio-shark2: Only compile led support when CONFIG_LED_CLASS is set
  [media] radio-shark: Only compile led support when CONFIG_LED_CLASS is set
  [media] radio-shark*: Call cancel_work_sync from disconnect rather then release
  [media] radio-shark*: Remove work-around for dangling pointer in usb intfdata
  [media] Add USB dependency for IguanaWorks USB IR Transceiver
  [media] Add missing logging for rangelow/high of hwseek
  [media] VIDIOC_ENUM_FREQ_BANDS fix
  [media] mem2mem_testdev: fix querycap regression
  [media] si470x: v4l2-compliance fixes
  [media] DocBook: Remove a spurious character
  [media] uvcvideo: Reset the bytesused field when recycling an erroneous buffer

commit | commitdiff | tree

Linus Torvalds [Tue, 21 Aug 2012 23:46:08 +0000 (16:46 -0700)]

Merge git://git./linux/kernel/git/davem/net

Pull networking update from David Miller:
"A couple weeks of bug fixing in there.  The largest chunk is all the
  broken crap Amerigo Wang found in the netpoll layer."

1) netpoll and it's users has several serious bugs:
    a) uses GFP_KERNEL with locks held
    b) interfaces requiring interrupts disabled are called with them
       enabled
    c) and vice versa
    d) VLAN tag demuxing, as per all other RX packet input paths, is not
       applied

    All from Amerigo Wang.

2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for
    good, from Neal Cardwell.

3) Unlike AF_UNIX, AF_PACKET sockets don't set a default credentials
    when the user doesn't specify one explicitly during sendmsg().
    Instead we attach an empty (zero) SCM credential block which is
    definitely not what we want.  Fix from Eric Dumazet.

4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix
    from Ben Hutchings.

5) inet_csk_route_child_sock() checks wrong inet options pointer, fix
    from Christoph Paasch.

6) When AF_PACKET is used for transmit, packet loopback doesn't behave
    properly when a socket fanout is enabled, from Eric Leblond.

7) On bluetooth l2cap channel create failure, we leak the socket, from
    Jaganath Kanakkassery.

8) Fix all the netprio file handling bugs found by Al Viro, from John
    Fastabend.

9) Several error return and NULL deref bug fixes in networking drivers
    from Julia Lawall.

10) A large smattering of struct padding et al.  kernel memory leaks to
    userspace found of Mathias Krause.

11) Conntrack expections in netfilter can access an uninitialized timer,
    fix from Pablo Neira Ayuso.

12) Several netfilter SIP tracker bug fixes from Patrick McHardy.

13) IPSEC ipv6 routes are not initialized correctly all the time,
    resulting in an OOPS in inet_putpeer().  Also from Patrick McHardy.

14) Bridging does rcu_dereference() outside of RCU protected area, from
    Stephen Hemminger.

15) Fix routing cache removal performance regression when looking up
    output routes that have a local destination.  From Zheng Yan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  af_netlink: force credentials passing [CVE-2012-3520]
  ipv4: fix ip header ident selection in __ip_make_skb()
  ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
  tcp: fix possible socket refcount problem
  net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
  net/core/dev.c: fix kernel-doc warning
  netconsole: remove a redundant netconsole_target_put()
  net: ipv6: fix oops in inet_putpeer()
  net/stmmac: fix issue of clk_get for Loongson1B.
  caif: Do not dereference NULL in chnl_recv_cb()
  af_packet: don't emit packet on orig fanout group
  drivers/net/irda: fix error return code
  drivers/net/wan/dscc4.c: fix error return code
  drivers/net/wimax/i2400m/fw.c: fix error return code
  smsc75xx: add missing entry to MAINTAINERS
  net: qmi_wwan: new devices: UML290 and K5006-Z
  net: sh_eth: Add eth support for R8A7779 device
  netdev/phy: skip disabled mdio-mux nodes
  dt: introduce for_each_available_child_of_node, of_get_next_available_child
  net: netprio: fix cgrp create and write priomap race
  ...

commit | commitdiff | tree

Mel Gorman [Tue, 21 Aug 2012 23:16:17 +0000 (16:16 -0700)]

mm: compaction: Abort async compaction if locks are contended or taking too long

Jim Schutt reported a problem that pointed at compaction contending
heavily on locks.  The workload is straight-forward and in his own words;

The systems in question have 24 SAS drives spread across 3 HBAs,
running 24 Ceph OSD instances, one per drive.  FWIW these servers
are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
Ceph Linux clients doing dd simultaneously to a Ceph file system
backed by 12 of these servers.

Early in the test everything looks fine

  procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
   r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
  31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0
  27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
  28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
   6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0
  22 18          0     226984        576   38596404    0    0     3 2445741 223114 179527  12 43  23 22  0

and then it goes to pot

  procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
   r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
  163  8          0     464308        576   36791368    0    0    11 22210  866  536   3 13  79  4  0
  207 14          0     917752        576   36181928    0    0   712 1345376 134598 47367   7 90   1  2  0
  123 12          0     685516        576   36296148    0    0   429 1386615 158494 60077   8 84   5  3  0
  123 12          0     598572        576   36333728    0    0  1107 1233281 147542 62351   7 84   5  4  0
  622  7          0     660768        576   36118264    0    0   557 1345548 151394 59353   7 85   4  3  0
  223 11          0     283960        576   36463868    0    0    46 1107160 121846 33006   6 93   1  1  0

Note that system CPU usage is very high blocks being written out has
dropped by 42%. He analysed this with perf and found

  perf record -g -a sleep 10
  perf report --sort symbol --call-graph fractal,5
    34.63%  [k] _raw_spin_lock_irqsave
            |
            |--97.30%-- isolate_freepages
            |          compaction_alloc
            |          unmap_and_move
            |          migrate_pages
            |          compact_zone
            |          compact_zone_order
            |          try_to_compact_pages
            |          __alloc_pages_direct_compact
            |          __alloc_pages_slowpath
            |          __alloc_pages_nodemask
            |          alloc_pages_vma
            |          do_huge_pmd_anonymous_page
            |          handle_mm_fault
            |          do_page_fault
            |          page_fault
            |          |
            |          |--87.39%-- skb_copy_datagram_iovec
            |          |          tcp_recvmsg
            |          |          inet_recvmsg
            |          |          sock_recvmsg
            |          |          sys_recvfrom
            |          |          system_call
            |          |          __recv
            |          |          |
            |          |           --100.00%-- (nil)
            |          |
            |           --12.61%-- memcpy
             --2.70%-- [...]

There was other data but primarily it is all showing that compaction is
contended heavily on the zone->lock and zone->lru_lock.

commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
while isolating pages for migration] noted that it was possible for
migration to hold the lru_lock for an excessive amount of time. Very
broadly speaking this patch expands the concept.

This patch introduces compact_checklock_irqsave() to check if a lock
is contended or the process needs to be scheduled. If either condition
is true then async compaction is aborted and the caller is informed.
The page allocator will fail a THP allocation if compaction failed due
to contention. This patch also introduces compact_trylock_irqsave()
which will acquire the lock only if it is not contended and the process
does not need to schedule.

Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Mel Gorman [Tue, 21 Aug 2012 23:16:15 +0000 (16:16 -0700)]

mm: have order > 0 compaction start near a pageblock with free pages

Commit 7db8889ab05b ("mm: have order > 0 compaction start off where it
left") introduced a caching mechanism to reduce the amount work the free
page scanner does in compaction.  However, it has a problem.  Consider
two process simultaneously scanning free pages

     C
Process A M     S      F
|---------------------------------------|
Process B M FS

C is zone->compact_cached_free_pfn
S is cc->start_pfree_pfn
M is cc->migrate_pfn
F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.

Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.

Process A moves to "end_of_zone - one_pageblock" and runs this check

                if (cc->order > 0 && (!cc->wrapped ||
                                      zone->compact_cached_free_pfn >
                                      cc->start_free_pfn))
                        pfn = min(pfn, zone->compact_cached_free_pfn);

compact_cached_free_pfn is above where it started so the free scanner
skips almost the entire space it should have scanned.  When there are
multiple processes compacting it can end in a situation where the entire
zone is not being scanned at all.  Further, it is possible for two
processes to ping-pong update to compact_cached_free_pfn which is just
random.

Overall, the end result wrecks allocation success rates.

There is not an obvious way around this problem without introducing new
locking and state so this patch takes a different approach.

First, it gets rid of the skip logic because it's not clear that it
matters if two free scanners happen to be in the same block but with
racing updates it's too easy for it to skip over blocks it should not.

Second, it updates compact_cached_free_pfn in a more limited set of
circumstances.

If a scanner has wrapped, it updates compact_cached_free_pfn to the end
of the zone. When a wrapped scanner isolates a page, it updates
compact_cached_free_pfn to point to the highest pageblock it
can isolate pages from.

If a scanner has not wrapped when it has finished isolated pages it
checks if compact_cached_free_pfn is pointing to the end of the
zone. If so, the value is updated to point to the highest
pageblock that pages were isolated from. This value will not
be updated again until a free page scanner wraps and resets
compact_cached_free_pfn.

This is not optimal and it can still race but the compact_cached_free_pfn
will be pointing to or very near a pageblock with free pages.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Alexandre Bounine [Tue, 21 Aug 2012 23:16:12 +0000 (16:16 -0700)]

rapidio/tsi721: fix unused variable compiler warning

Fix unused variable compiler warning when built with CONFIG_RAPIDIO_DEBUG
option off.

This patch is applicable to kernel versions starting from v3.2

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Alexandre Bounine [Tue, 21 Aug 2012 23:16:11 +0000 (16:16 -0700)]

rapidio/tsi721: fix inbound doorbell interrupt handling

Make sure that there is no doorbell messages left behind due to disabled
interrupts during inbound doorbell processing.

The most common case for this bug is loss of rionet JOIN messages in
systems with three or more rionet participants and MSI or MSI-X enabled.
As result, requests for packet transfers may finish with "destination
unreachable" error message.

This patch is applicable to kernel versions starting from v3.2.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Atsushi Nemoto [Tue, 21 Aug 2012 23:16:10 +0000 (16:16 -0700)]

drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode

Correct the offset by subtracting 20 from tm_hour before taking the
modulo 12.

[ "Why 20?" I hear you ask. Or at least I did.

  Here's the reason why: RS5C348_BIT_PM is 32, and is - stupidly -
  included in the RS5C348_HOURS_MASK define.  So it's really subtracting
  out that bit to get "hour+12".  But then because it does things modulo
  12, it needs to add the 12 in again afterwards anyway.

  This code is confused.  It would be much clearer if RS5C348_HOURS_MASK
  just didn't include the RS5C348_BIT_PM bit at all, then it wouldn't
  need to do the silly subtract either.

  Whatever. It's all just math, the end result is the same.   - Linus ]

Reported-by: James Nute <newten82@gmail.com>
Tested-by: James Nute <newten82@gmail.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Alex Shi [Tue, 21 Aug 2012 23:16:08 +0000 (16:16 -0700)]

mm: correct page->pfmemalloc to fix deactivate_slab regression

Commit cfd19c5a9ecf ("mm: only set page->pfmemalloc when
ALLOC_NO_WATERMARKS was used") tried to narrow down page->pfmemalloc
setting, but it missed some places the pfmemalloc should be set.

So, in __slab_alloc, the unalignment pfmemalloc and ALLOC_NO_WATERMARKS
cause incorrect deactivate_slab() on our core2 server:

    64.73%           fio  [kernel.kallsyms]     [k] _raw_spin_lock
                     |
                     --- _raw_spin_lock
                        |
                        |---0.34%-- deactivate_slab
                        |          __slab_alloc
                        |          kmem_cache_alloc
                        |          |

That causes our fio sync write performance to have a 40% regression.

Move the checking in get_page_from_freelist() which resolves this issue.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Sage Weil <sage@inktank.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Ilya Shchepetkov [Tue, 21 Aug 2012 23:16:06 +0000 (16:16 -0700)]

drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes

Dynamically allocated sysfs attributes must be initialized using
sysfs_attr_init(), otherwise lockdep complains: BUG: key <address> not in
.data!

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Ilya Shchepetkov <shchepetkov@ispras.ru>
Cc: Chris Verges <chrisv@cyberswitching.com>
Cc: Christian Pellegrin <chripell@fsfe.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Minchan Kim [Tue, 21 Aug 2012 23:16:03 +0000 (16:16 -0700)]

mm/compaction.c: fix deferring compaction mistake

Commit aff622495c9a ("vmscan: only defer compaction for failed order and
higher") fixed bad deferring policy but made mistake about checking
compact_order_failed in __compact_pgdat().  So it can't update
compact_order_failed with the new order.  This ends up preventing
correct operation of policy deferral.  This patch fixes it.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Robin Holt [Tue, 21 Aug 2012 23:16:02 +0000 (16:16 -0700)]

drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources

On many of our larger systems, CPU 0 has had all of its IRQ resources
consumed before XPC loads. Worst cases on machines with multiple 10
GigE cards and multiple IB cards have depleted the entire first socket
of IRQs.

This patch makes selecting the node upon which IRQs are allocated (as
well as all the other GRU Message Queue structures) specifiable as a
module load param and has a default behavior of searching all nodes/cpus
for an available resources.

[akpm@linux-foundation.org: fix build: include cpu.h and module.h]
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

WANG Cong [Tue, 21 Aug 2012 23:16:00 +0000 (16:16 -0700)]

string: do not export memweight() to userspace

Fix the following warning:

usr/include/linux/string.h:8: userspace cannot reference function or variable defined in the kernel

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Zhouping Liu [Tue, 21 Aug 2012 23:15:57 +0000 (16:15 -0700)]

hugetlb: update hugetlbpage.txt

Commit f0f57b2b1488 ("mm: move hugepage test examples to
tools/testing/selftests/vm") moved map_hugetlb.c, hugepage-shm.c and
hugepage-mmap.c tests into tools/testing/selftests/vm/ directory, but it
didn't update hugetlbpage.txt

Signed-off-by: Zhouping Liu <sanweidaying@gmail.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Joe Perches [Tue, 21 Aug 2012 23:15:53 +0000 (16:15 -0700)]

checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO

Commit b13edf7ff2dd ("checkpatch: add checks for do {} while (0) macro
misuses") added a test that is overly simplistic for single statement
macros.

Macros that start with control tests should be enclosed in a do {} while
(0) loop.

Add the necessary control tests to the check.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Tested-by: Franz Schrober <franzschrober@yahoo.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Michal Hocko [Tue, 21 Aug 2012 23:15:52 +0000 (16:15 -0700)]

mm: hugetlbfs: correctly populate shared pmd

Each page mapped in a process's address space must be correctly
accounted for in _mapcount.  Normally the rules for this are
straightforward but hugetlbfs page table sharing is different.  The page
table pages at the PMD level are reference counted while the mapcount
remains the same.

If this accounting is wrong, it causes bugs like this one reported by
Larry Woodman:

  kernel BUG at mm/filemap.c:135!
  invalid opcode: 0000 [#1] SMP
  CPU 22
  Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
  Pid: 18001, comm: mpitest Tainted: G        W    3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
  RIP: 0010:[<ffffffff8112cfed>]  [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
  Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
  Call Trace:
    delete_from_page_cache+0x40/0x80
    truncate_hugepages+0x115/0x1f0
    hugetlbfs_evict_inode+0x18/0x30
    evict+0x9f/0x1b0
    iput_final+0xe3/0x1e0
    iput+0x3e/0x50
    d_kill+0xf8/0x110
    dput+0xe2/0x1b0
    __fput+0x162/0x240

During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
shared page tables with the check dst_pte == src_pte.  The logic is if
the PMD page is the same, they must be shared.  This assumes that the
sharing is between the parent and child.  However, if the sharing is
with a different process entirely then this check fails as in this
diagram:

  parent
    |
    ------------>pmd
                 src_pte----------> data page
                                        ^
  other--------->pmd--------------------|
                  ^
  child-----------|
                 dst_pte

For this situation to occur, it must be possible for Parent and Other to
have faulted and failed to share page tables with each other.  This is
possible due to the following style of race.

  PROC A                                          PROC B
  copy_hugetlb_page_range                         copy_hugetlb_page_range
    src_pte == huge_pte_offset                      src_pte == huge_pte_offset
    !src_pte so no sharing                          !src_pte so no sharing

  (time passes)

  hugetlb_fault                                   hugetlb_fault
    huge_pte_alloc                                  huge_pte_alloc
      huge_pmd_share                                 huge_pmd_share
        LOCK(i_mmap_mutex)
        find nothing, no sharing
        UNLOCK(i_mmap_mutex)
                                                      LOCK(i_mmap_mutex)
                                                      find nothing, no sharing
                                                      UNLOCK(i_mmap_mutex)
      pmd_alloc                                       pmd_alloc
      LOCK(instantiation_mutex)
      fault
      UNLOCK(instantiation_mutex)
                                                  LOCK(instantiation_mutex)
                                                  fault
                                                  UNLOCK(instantiation_mutex)

These two processes are not poing to the same data page but are not
sharing page tables because the opportunity was missed.  When either
process later forks, the src_pte == dst pte is potentially insufficient.
As the check falls through, the wrong PTE information is copied in
(harmless but wrong) and the mapcount is bumped for a page mapped by a
shared page table leading to the BUG_ON.

This patch addresses the issue by moving pmd_alloc into huge_pmd_share
which guarantees that the shared pud is populated in the same critical
section as pmd.  This also means that huge_pte_offset test in
huge_pmd_share is serialized correctly now which in turn means that the
success of the sharing will be higher as the racing tasks see the pud
and pmd populated together.

Race identified and changelog written mostly by Mel Gorman.

{akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
Reported-by: Larry Woodman <lwoodman@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Stephen M. Cameron [Tue, 21 Aug 2012 23:15:49 +0000 (16:15 -0700)]

cciss: fix incorrect scsi status reporting

Delete code which sets SCSI status incorrectly as it's already been set
correctly above this incorrect code. The bug was introduced in 2009 by
commit b0e15f6db111 ("cciss: fix typo that causes scsi status to be
lost.")

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Reported-by: Roel van Meer <roel.vanmeer@bokxing.nl>
Tested-by: Roel van Meer <roel.vanmeer@bokxing.nl>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Namjae Jeon [Tue, 21 Aug 2012 23:15:46 +0000 (16:15 -0700)]

Documentation: update mount option in filesystem/vfat.txt

Update two mount options(discard, nfs) in vfat.txt.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Hugh Dickins [Tue, 21 Aug 2012 23:15:45 +0000 (16:15 -0700)]

mm: change nr_ptes BUG_ON to WARN_ON

Occasionally an isolated BUG_ON(mm->nr_ptes) gets reported, indicating
that not all the page tables allocated could be found and freed when
exit_mmap() tore down the user address space.

There's usually nothing we can say about it, beyond that it's probably a
sign of some bad memory or memory corruption; though it might still
indicate a bug in vma or page table management (and did recently reveal a
race in THP, fixed a few months ago).

But one overdue change we can make is from BUG_ON to WARN_ON.

It's fairly likely that the system will crash shortly afterwards in some
other way (for example, the BUG_ON(page_mapped(page)) in
__delete_from_page_cache(), once an inode mapped into the lost page tables
gets evicted); but might tell us more before that.

Change the BUG_ON(page_mapped) to WARN_ON too? Later perhaps: I'm less
eager, since that one has several times led to fixes.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Jens Rottmann [Tue, 21 Aug 2012 23:15:43 +0000 (16:15 -0700)]

cs5535-clockevt: typo, it's MFGPT, not MFPGT

Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Eric Dumazet [Tue, 21 Aug 2012 06:21:17 +0000 (06:21 +0000)]

af_netlink: force credentials passing [CVE-2012-3520]

Pablo Neira Ayuso discovered that avahi and
potentially NetworkManager accept spoofed Netlink messages because of a
kernel bug. The kernel passes all-zero SCM_CREDENTIALS ancillary data
to the receiver if the sender did not provide such data, instead of not
including any such data at all or including the correct data from the
peer (as it is the case with AF_UNIX).

This bug was introduced in commit 16e572626961
(af_unix: dont send SCM_CREDENTIALS by default)

This patch forces passing credentials for netlink, as
before the regression.

Another fix would be to not add SCM_CREDENTIALS in
netlink messages if not provided by the sender, but it
might break some programs.

With help from Florian Weimer & Petr Matousek

This issue is designated as CVE-2012-3520

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Mon, 20 Aug 2012 07:26:45 +0000 (07:26 +0000)]

ipv4: fix ip header ident selection in __ip_make_skb()

Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
memory in __ip_select_ident().

It turns out that __ip_make_skb() called ip_select_ident() before
properly initializing iph->daddr.

This is a bug uncovered by commit 1d861aa4b3fb (inet: Minimize use of
cached route inetpeer.)

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Christoph Paasch [Mon, 20 Aug 2012 02:52:09 +0000 (02:52 +0000)]

ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()

Since 0e734419923bd ("ipv4: Use inet_csk_route_child_sock() in DCCP and
TCP."), inet_csk_route_child_sock() is called instead of
inet_csk_route_req().

However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
ireq->opt is set to NULL, before calling inet_csk_route_child_sock().
Thus, inside inet_csk_route_child_sock() opt is always NULL and the
SRR-options are not respected anymore.
Packets sent by the server won't have the correct destination-IP.

This patch fixes it by accessing newinet->inet_opt instead of ireq->opt
inside inet_csk_route_child_sock().

Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Mon, 20 Aug 2012 00:22:46 +0000 (00:22 +0000)]

tcp: fix possible socket refcount problem

Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
added bug leading to following trace :

[ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.131726]
[ 2866.132188] =========================
[ 2866.132281] [ BUG: held lock freed! ]
[ 2866.132281] 3.6.0-rc1+ #622 Not tainted
[ 2866.132281] -------------------------
[ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
[ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] 4 locks held by kworker/0:1/652:
[ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
[ 2866.132281]  #1:  ((&task->u.tk_work)){+.+.+.}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
[ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
[ 2866.132281]  #3:  (&icsk->icsk_retransmit_timer){+.-...}, at: [<ffffffff81078017>] run_timer_softirq+0x1ad/0x35f
[ 2866.132281]
[ 2866.132281] stack backtrace:
[ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
[ 2866.132281] Call Trace:
[ 2866.132281]  <IRQ>  [<ffffffff810bc527>] debug_check_no_locks_freed+0x112/0x159
[ 2866.132281]  [<ffffffff818a0839>] ? __sk_free+0xfd/0x114
[ 2866.132281]  [<ffffffff811549fa>] kmem_cache_free+0x6b/0x13a
[ 2866.132281]  [<ffffffff818a0839>] __sk_free+0xfd/0x114
[ 2866.132281]  [<ffffffff818a08c0>] sk_free+0x1c/0x1e
[ 2866.132281]  [<ffffffff81911e1c>] tcp_write_timer+0x51/0x56
[ 2866.132281]  [<ffffffff81078082>] run_timer_softirq+0x218/0x35f
[ 2866.132281]  [<ffffffff81078017>] ? run_timer_softirq+0x1ad/0x35f
[ 2866.132281]  [<ffffffff810f5831>] ? rb_commit+0x58/0x85
[ 2866.132281]  [<ffffffff81911dcb>] ? tcp_write_timer_handler+0x148/0x148
[ 2866.132281]  [<ffffffff81070bd6>] __do_softirq+0xcb/0x1f9
[ 2866.132281]  [<ffffffff81a0a00c>] ? _raw_spin_unlock+0x29/0x2e
[ 2866.132281]  [<ffffffff81a1227c>] call_softirq+0x1c/0x30
[ 2866.132281]  [<ffffffff81039f38>] do_softirq+0x4a/0xa6
[ 2866.132281]  [<ffffffff81070f2b>] irq_exit+0x51/0xad
[ 2866.132281]  [<ffffffff81a129cd>] do_IRQ+0x9d/0xb4
[ 2866.132281]  [<ffffffff81a0a3ef>] common_interrupt+0x6f/0x6f
[ 2866.132281]  <EOI>  [<ffffffff8109d006>] ? sched_clock_cpu+0x58/0xd1
[ 2866.132281]  [<ffffffff81a0a172>] ? _raw_spin_unlock_irqrestore+0x4c/0x56
[ 2866.132281]  [<ffffffff81078692>] mod_timer+0x178/0x1a9
[ 2866.132281]  [<ffffffff818a00aa>] sk_reset_timer+0x19/0x26
[ 2866.132281]  [<ffffffff8190b2cc>] tcp_rearm_rto+0x99/0xa4
[ 2866.132281]  [<ffffffff8190dfba>] tcp_event_new_data_sent+0x6e/0x70
[ 2866.132281]  [<ffffffff8190f7ea>] tcp_write_xmit+0x7de/0x8e4
[ 2866.132281]  [<ffffffff818a565d>] ? __alloc_skb+0xa0/0x1a1
[ 2866.132281]  [<ffffffff8190f952>] __tcp_push_pending_frames+0x2e/0x8a
[ 2866.132281]  [<ffffffff81904122>] tcp_sendmsg+0xb32/0xcc6
[ 2866.132281]  [<ffffffff819229c2>] inet_sendmsg+0xaa/0xd5
[ 2866.132281]  [<ffffffff81922918>] ? inet_autobind+0x5f/0x5f
[ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [<ffffffff8189adab>] sock_sendmsg+0xa3/0xc4
[ 2866.132281]  [<ffffffff810f5de6>] ? rb_reserve_next_event+0x26f/0x2d5
[ 2866.132281]  [<ffffffff8103e6a9>] ? native_sched_clock+0x29/0x6f
[ 2866.132281]  [<ffffffff8103e6f8>] ? sched_clock+0x9/0xd
[ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [<ffffffff8189ae03>] kernel_sendmsg+0x37/0x43
[ 2866.132281]  [<ffffffff8199ce49>] xs_send_kvec+0x77/0x80
[ 2866.132281]  [<ffffffff8199cec1>] xs_sendpages+0x6f/0x1a0
[ 2866.132281]  [<ffffffff8107826d>] ? try_to_del_timer_sync+0x55/0x61
[ 2866.132281]  [<ffffffff8199d0d2>] xs_tcp_send_request+0x55/0xf1
[ 2866.132281]  [<ffffffff8199bb90>] xprt_transmit+0x89/0x1db
[ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
[ 2866.132281]  [<ffffffff81999d92>] call_transmit+0x1c5/0x20e
[ 2866.132281]  [<ffffffff819a0d55>] __rpc_execute+0x6f/0x225
[ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
[ 2866.132281]  [<ffffffff819a0f33>] rpc_async_schedule+0x28/0x34
[ 2866.132281]  [<ffffffff810835d6>] process_one_work+0x24d/0x47f
[ 2866.132281]  [<ffffffff81083567>] ? process_one_work+0x1de/0x47f
[ 2866.132281]  [<ffffffff819a0f0b>] ? __rpc_execute+0x225/0x225
[ 2866.132281]  [<ffffffff81083a6d>] worker_thread+0x236/0x317
[ 2866.132281]  [<ffffffff81083837>] ? process_scheduled_works+0x2f/0x2f
[ 2866.132281]  [<ffffffff8108b7b8>] kthread+0x9a/0xa2
[ 2866.132281]  [<ffffffff81a12184>] kernel_thread_helper+0x4/0x10
[ 2866.132281]  [<ffffffff81a0a4b0>] ? retint_restore_args+0x13/0x13
[ 2866.132281]  [<ffffffff8108b71e>] ? __init_kthread_worker+0x5a/0x5a
[ 2866.132281]  [<ffffffff81a12180>] ? gs_change+0x13/0x13
[ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.309689] =============================================================================
[ 2866.310254] BUG TCP (Not tainted): Object already free
[ 2866.310254] -----------------------------------------------------------------------------
[ 2866.310254]

The bug comes from the fact that timer set in sk_reset_timer() can run
before we actually do the sock_hold(). socket refcount reaches zero and
we free the socket too soon.

timer handler is not allowed to reduce socket refcnt if socket is owned
by the user, or we need to change sk_reset_timer() implementation.

We should take a reference on the socket in case TCP_DELACK_TIMER_DEFERRED
or TCP_DELACK_TIMER_DEFERRED bit are set in tsq_flags

Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
was used instead of TCP_DELACK_TIMER_DEFERRED.

For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
even if not fired from a timer.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Tue, 21 Aug 2012 19:25:24 +0000 (12:25 -0700)]

Merge branch 'audit-fixes' of git://git./linux/kernel/git/mszeredi/vfs

Pull audit-tree fixes from Miklos Szeredi:
"The audit subsystem maintainers (Al and Eric) are not responding to
  repeated resends.  Eric did ack them a while ago, but no response
  since then.  So I'm sending these directly to you."

* 'audit-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  audit: clean up refcounting in audit-tree
  audit: fix refcounting in audit-tree
  audit: don't free_chunk() after fsnotify_add_mark()

commit | commitdiff | tree

Linus Torvalds [Tue, 21 Aug 2012 17:08:39 +0000 (10:08 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/gerg/m68knommu

Pull m68knommu arch fixes from Greg Ungerer:
"This contains 2 fixes.  One fixes compilation of ColdFire clk code,
  the other makes sure we use the generic atomic64 support on all m68k
  targets."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: select CONFIG_GENERIC_ATOMIC64 for all m68k CPU types
  m68knommu: select CONFIG_HAVE_CLK for ColdFire CPU types

commit | commitdiff | tree

Linus Torvalds [Tue, 21 Aug 2012 17:07:41 +0000 (10:07 -0700)]

Merge tag 'pinctrl-fixes-v3.6-rc3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
- Fixed Nomadik errorpath
- Fixed documentation spelling errors
- Forward-declare struct device in a header file
- Remove some extraneous code lines when getting pinctrl states
- Correct the i.MX51 configure register number
- Fix the Nomadik keypad function group list

* tag 'pinctrl-fixes-v3.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl/nomadik: add kp_b_2 keyboard function group list
  pinctrl: imx51: fix .conf_reg of MX51_PAD_SD2_CMD__CSPI_MOSI
  trivial: pinctrl core: remove extraneous code lines
  pinctrl: header: trivial: declare struct device
  Documentation/pinctrl.txt: Fix some misspelled macros
  pinctrl/nomadik: fix null in irqdomain errorpath

commit | commitdiff | tree

Linus Torvalds [Tue, 21 Aug 2012 16:17:05 +0000 (09:17 -0700)]

Merge tag 'sound-3.6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This update became slightly bigger than usual for rc3, but most of the
  commits are small and trivial.  A large chunk is found for HD-audio
  ca0132 codec, which is mostly a clean up of the specific code, to make
  SPDIF working properly, and also in the new ASoC Arizona driver.

  One important fix is for usb-audio Oops fix since 3.5.  We still see
  some EHCI related bandwidth problem, but usb-audio should be more
  stabilized now.

  Other than that, a Kconfig fix is spread over files, and various
  HD-audio and ASoC fixes as usual, in addition to Julia's error path
  fixes."

* tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (42 commits)
  ALSA: snd-als100: fix suspend/resume
  ALSA: hda - Fix leftover codec->power_transition
  ALSA: hda - don't create dysfunctional mixer controls for ca0132
  ALSA: sound/ppc/snd_ps3.c: fix error return code
  ALSA: sound/pci/rme9652/hdspm.c: fix error return code
  ALSA: sound/pci/sis7019.c: fix error return code
  ALSA: sound/pci/ctxfi/ctatc.c: fix error return code
  ALSA: sound/atmel/ac97c.c: fix error return code
  ALSA: sound/atmel/abdac.c: fix error return code
  ALSA: fix pcm.h kernel-doc warning and notation
  sound: oss/sb_audio: prevent divide by zero bug
  ASoC: wm9712: Fix inverted capture volume
  ASoC: wm9712: Fix microphone source selection
  ASoC: wm5102: Remove DRC2
  ALSA: hda - Don't send invalid volume knob command on IDT 92hd75bxx
  ALSA: usb-audio: Fix scheduling-while-atomic bug in PCM capture stream
  ALSA: lx6464es: Add a missing error check
  ALSA: hda - Fix 'Beep Playback Switch' with no underlying mute switch
  ASoC: jack: Always notify full jack status
  ASoC: wm5110: Add missing input PGA routes
  ...

commit | commitdiff | tree

Eric Dumazet [Tue, 21 Aug 2012 13:05:14 +0000 (15:05 +0200)]

task_work: add a scheduling point in task_work_run()

It seems commit 4a9d4b024a31 ("switch fput to task_work_add") re-
introduced the problem addressed in 944be0b22472 ("close_files(): add
scheduling point")

If a server process with a lot of files (say 2 million tcp sockets) is
killed, we can spend a lot of time in task_work_run() and trigger a soft
lockup.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Dave Airlie [Tue, 21 Aug 2012 06:40:07 +0000 (16:40 +1000)]

fbcon: fix race condition between console lock and cursor timer

So we've had a fair few reports of fbcon handover breakage between
efi/vesafb and i915 surface recently, so I dedicated a couple of
days to finding the problem.

Essentially the last thing we saw was the conflicting framebuffer
message and that was all.

So after much tracing with direct netconsole writes (printks
under console_lock not so useful), I think I found the race.

  Thread A (driver load)    Thread B (timer thread)
    unbind_con_driver ->              |
    bind_con_driver ->                |
    vc->vc_sw->con_deinit ->          |
    fbcon_deinit ->                   |
    console_lock()                    |
        |                             |
        |                       fbcon_flashcursor timer fires
        |                       console_lock() <- blocked for A
        |
        |
  fbcon_del_cursor_timer ->
    del_timer_sync
    (BOOM)

Of course because all of this is under the console lock,
we never see anything, also since we also just unbound the active
console guess what we never see anything.

Hopefully this fixes the problem for anyone seeing vesafb->kms
driver handoff.

Signed-off-by: David Airlie <airlied@redhat.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: stable@vger.kernel.org
Tested-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Jesse Brandeburg [Thu, 26 Jul 2012 02:31:19 +0000 (02:31 +0000)]

igb: update to allow reading/setting MDI state

This is the implementation for igb to allow forcing MDI state
via ethtool, allowing users to work around some improperly
behaving switches.

Forcing in this driver is for now only allowed when auto-neg is
enabled.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Jesse Brandeburg [Thu, 26 Jul 2012 02:31:14 +0000 (02:31 +0000)]

e1000e: implement MDI/MDI-X control

Some users report issues with link failing when connected to certain
switches. This gives the user the ability to control the MDI state
from the driver, allowing users to work around some improperly
behaving switches.

Forcing in this driver is for now only allowed when auto-neg is
enabled.

This is in regards to the related ethtool app patch and
bugzilla.kernel.org bug 11998

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: bruce.w.allan@intel.com
CC: n.poppelier@xs4all.nl
CC: bastien@durel.org
CC: jsveiga@it.eng.br
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Jesse Brandeburg [Thu, 26 Jul 2012 02:31:09 +0000 (02:31 +0000)]

e1000: configure and read MDI settings

This is the implementation in e1000 to allow ethtool to force
MDI state, allowing users to work around some improperly
behaving switches.

Forcing in this driver is for now only allowed when auto-neg is enabled.

To use must have the matching version of ethtool app that supports
this functionality.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Tushar Dave <tushar.n.dave@intel.com>
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Jesse Brandeburg [Thu, 26 Jul 2012 02:31:04 +0000 (02:31 +0000)]

igb: implement 580 MDI setting support

In order for igb to support MDI setting support via
ethtool this code is needed to allow setting the MDI state
via software.

This is in regards to the related ethtool patch

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Bruce W Allan [Thu, 26 Jul 2012 02:30:59 +0000 (02:30 +0000)]

e1000e: implement 82577/579 MDI setting support

In order for e1000e to support MDI setting support via
ethtool this code is needed to allow setting the MDI state
via software.

This is in regards to the related ethtool patch and
fixes bugzilla.kernel.org bug 11998

Signed-off-by: Bruce W Allan <bruce.w.allan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Jesse Brandeburg [Thu, 26 Jul 2012 02:30:53 +0000 (02:30 +0000)]

ethtool.h: MDI setting support

This change modifies the core ethtool struct to allow a driver to
support setting of MDI/MDI-X state for twisted pair wiring. This
change uses a previously reserved u8 and should not change any
binary compatibility of ethtool.

Also as per Ben Hutchings' suggestion, the capabilities are
stored in a separate byte so the driver can report if it supports
changing settings.

see thread: http://kerneltrap.org/mailarchive/linux-netdev/2010/11/17/6289820/thread

see ethtool patches titled:
ethtool: allow setting MDI-X state

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: Aaron Brown aaron.f.brown@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Ondrej Zary [Mon, 20 Aug 2012 19:50:13 +0000 (21:50 +0200)]

ALSA: snd-als100: fix suspend/resume

snd_card_als100_probe() does not set pcm field in struct snd_sb.
As a result, PCM is not suspended and applications don't know that they need
to resume the playback.

Tested with Labway A381-F20 card (ALS120).

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Linus Torvalds [Mon, 20 Aug 2012 23:42:41 +0000 (16:42 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
"Here are two patches from Rafael Wysocki.

  One fixes an EHCI-related hibernation crash on ASUS boxes.  We fixed a
  similar suspend issue in v3.6-rc1, and this applies the same fix to
  the hibernate path.

  The other fixes D3/D3cold/D4 messages related to the D3cold support we
  merged in v3.6-rc1."

(Removed redundant top non-fast-forward merge commit from pulled branch)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: EHCI: Fix crash during hibernation on ASUS computers
  PCI / PM: Fix D3/D3cold/D4 messages printed by acpi_pci_set_power_state()

commit | commitdiff | tree

Linus Torvalds [Mon, 20 Aug 2012 22:26:28 +0000 (15:26 -0700)]

Merge tag 'please-pull-ia64-fixes' of git://git./linux/kernel/git/aegl/linux

Pull config cleanup for ia64 from Tony Luck:
"Clean out references to dead CONFIG_MISC_DEVICES option"

* tag 'please-pull-ia64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] defconfig: Remove CONFIG_MISC_DEVICES

commit | commitdiff | tree

Linus Torvalds [Mon, 20 Aug 2012 20:14:22 +0000 (13:14 -0700)]

Merge tag 'usb-3.6-rc3' of git://git./linux/kernel/git/gregkh/usb

Pull more USB patches from Greg Kroah-Hartman:
"Here are 10 more USB patches for 3.6-rc3.  They all fix reported
  problems (build problems for one of them, and easily repeatable oopses
  for the others.)

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'usb-3.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  gpu/mfd/usb: Fix USB randconfig problems
  USB: CDC ACM: Fix NULL pointer dereference
  USB: emi62: remove __devinit* from the struct usb_device_id table
  USB: winbond: remove __devinit* from the struct usb_device_id table
  USB: vt6656: remove __devinit* from the struct usb_device_id table
  USB: rtl8187: remove __devinit* from the struct usb_device_id table
  USB: p54usb: remove __devinit* from the struct usb_device_id table
  USB: spca506: remove __devinit* from the struct usb_device_id table
  USB: jl2005bcd: remove __devinit* from the struct usb_device_id table
  USB: smsusb: remove __devinit* from the struct usb_device_id table

commit | commitdiff | tree

Linus Torvalds [Mon, 20 Aug 2012 20:13:47 +0000 (13:13 -0700)]

Merge tag 'driver-core-3.6-rc3' of git://git./linux/kernel/git/gregkh/driver-core

Pull one more driver core fix from Greg Kroah-Hartman:
"Here is one fix for the dmesg line corruption problem that the
previous set of patches caused.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'driver-core-3.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
dyndbg: fix for SOH in logging messages

commit | commitdiff | tree

Linus Torvalds [Mon, 20 Aug 2012 20:12:41 +0000 (13:12 -0700)]

Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86

Pull x86 platform driver update from Matthew Garrett:
"Some small updates for a few drivers, and some hardware enablement for
  new Ideapads and the gmux hardware in the latest Macs.

  This code won't run on older devices and has been well tested on new
  ones, so low risk of regressions."

* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
  ideapad: add Lenovo IdeaPad Z570 support (part 3)
  ideapad: add Lenovo IdeaPad Z570 support (part 2)
  ideapad: add Lenovo IdeaPad Z570 support (part 1)
  classmate-laptop: always call input_sync() after input_report_switch()
  thinkpad-acpi: recognize latest V-Series using DMI_BIOS_VENDOR
  dell-laptop: Fixed typo in touchpad LED quirk
  vga_switcheroo: Don't require handler init callback
  vga_switcheroo: Remove assumptions about registration/unregistration ordering
  apple-gmux: Add display mux support
  apple-gmux: Fix kconfig dependencies
  asus-wmi: record wlan status while controlled by userapp
  apple_gmux: Fix ACPI video unregister
  apple_gmux: Add support for newer hardware
  gmux: Add generic write32 function

commit | commitdiff | tree

Linus Torvalds [Mon, 20 Aug 2012 20:11:00 +0000 (13:11 -0700)]

Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging

Pull a hwmon fix from Guenter Roeck:
"One patch with section conflict fixes."

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
sections: Fix section conflicts in drivers/hwmon

commit | commitdiff | tree

Linus Torvalds [Mon, 20 Aug 2012 20:05:27 +0000 (13:05 -0700)]

Merge tag 'spi-3.6' of git://git./linux/kernel/git/broonie/misc

Pull spi fixes from Mark Brown:
"Grant is still away so another pull request with some fairly minor
  fixes, the most notable of which are several fixes for some common
  error patterns with the reference counting spi_master_get/put do."

* tag 'spi-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc:
  spi/coldfire-qspi: Drop extra calls to spi_master_get in suspend/resume functions
  spi: spi-coldfire-qspi: Drop extra spi_master_put in device remove function
  spi/pl022: fix spi-pl022 pm enable at probe
  spi/bcm63xx: Ensure that memory is freed only after it is no longer used
  spi: omap2-mcspi: Fix the error handling in probe
  spi/s3c64xx: Add missing static storage class specifiers

commit | commitdiff | tree

Fabio Estevam [Sat, 18 Aug 2012 16:23:49 +0000 (13:23 -0300)]

[IA64] defconfig: Remove CONFIG_MISC_DEVICES

commit 7c5763b845 (drivers:misc: Remove MISC_DEVICES config option) removed
CONFIG_MISC_DEVICES option, so remove the occurrences from the config files
as well.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

commit | commitdiff | tree

Linus Torvalds [Mon, 20 Aug 2012 19:59:51 +0000 (12:59 -0700)]

Merge tag 'regulator-3.6' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A bunch of fixes which are a combination of minor fixes that have been
  shaken down due to greater testing exposure, the biggest block of
  which are for the Palmas driver which hadn't had all the changes
  required for mainline properly tested when it was merged."

* tag 'regulator-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: twl-regulator: fix up VINTANA1/VINTANA2
  regulator: core: request only valid gpio pins for regulator enable
  regulator: twl: Remove references to the twl4030 regulator
  regulator: gpio-regulator: Split setting of voltages and currents
  regulator: ab3100: add missing voltage table
  regulator: anatop: Fix wrong mask used in anatop_get_voltage_sel
  regulator: tps6586x: correct vin pin for sm0/sm1/sm2
  regulator: palmas: Fix palmas_probe error handling
  regulator: palmas: Call palmas_ldo_[read|write] in palmas_ldo_init
  regulator: palmas: Fix regmap offsets for PALMAS_REG_SMPS10 vsel_reg
  regulator: palmas: Fix calculating selector in palmas_map_voltage_ldo

commit | commitdiff | tree

Linus Torvalds [Mon, 20 Aug 2012 19:59:08 +0000 (12:59 -0700)]

Merge tag 'iommu-fixes-v3.6-rc2' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
"Two fixes are necessary.  One patch fixes a boot crash on MacBook Air
  with interrupt remapping enabled and the other patch fixes a
  regression (which causes a boot crash on AMD IOMMUv2 systems too) in
  the init code of the AMD IOMMU driver."

* tag 'iommu-fixes-v3.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix wrong check for ARRAY_SIZE()
  irq_remap: disable IRQ remapping if any IOAPIC lacks an IOMMU

John Crispins staging tree

RSS Atom