Linus Luessing [Sat, 10 Mar 2012 22:17:52 +0000 (06:17 +0800)]
batman-adv: Adding hard_iface specific sysfs wrapper macros for UINT
This allows us to easily add a sysfs parameter for an unsigned int
later, which is not for a batman mesh interface (e.g. bat0), but for a
common interface instead. It allows reading and writing an atomic_t in
hard_iface (instead of bat_priv compared to the mesh variant).
Developed by Linus during a 6 months trainee study period in Ascom
(Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Marek Lindner [Sat, 10 Mar 2012 22:17:51 +0000 (06:17 +0800)]
batman-adv: rename sysfs macros to reflect the soft-interface dependency
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Sat, 10 Mar 2012 22:17:50 +0000 (06:17 +0800)]
batman-adv: refactoring API: find generalized name for bat_ogm_update_mac callback
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Sat, 10 Mar 2012 22:17:49 +0000 (06:17 +0800)]
batman-adv: ignore protocol packets if the interface did not enable this protocol
Reported-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Thu, 1 Mar 2012 07:35:21 +0000 (15:35 +0800)]
batman-adv: split neigh_new function into generic and batman iv specific parts
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Thu, 1 Mar 2012 07:35:20 +0000 (15:35 +0800)]
batman-adv: replace HZ calculations with jiffies_to_msecs()
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Thu, 1 Mar 2012 07:35:19 +0000 (15:35 +0800)]
batman-adv: rename last_valid to last_seen
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Sun, 4 Mar 2012 08:56:25 +0000 (16:56 +0800)]
batman-adv: register batman ogm receive function during protocol init
The B.A.T.M.A.N. IV OGM receive function still was hard-coded although
it is a routing protocol specific function. This patch takes advantage
of the dynamic packet handler registration to remove the hard-coded
function calls.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Thu, 1 Mar 2012 07:35:17 +0000 (15:35 +0800)]
batman-adv: introduce packet type handler array for incoming packets
The packet handler array replaces the growing switch statement, thus
dealing with incoming packets in a more efficient way. It also adds
to possibility to register packet handlers on the fly.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Marek Lindner [Thu, 1 Mar 2012 07:35:16 +0000 (15:35 +0800)]
batman-adv: introduce is_single_hop_neigh variable to increase readability
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Mon, 27 Feb 2012 10:29:53 +0000 (11:29 +0100)]
batman-adv: fix wrong dhcp option list browsing
In is_type_dhcprequest(), while parsing a DHCP message, if the entry we found in
the option list is neither a padding nor the dhcp-type, we have to ignore it and
jump as many bytes as its length + 1. The "+ 1" byte is given by the subtype
field itself that has to be jumped too.
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
alex.bluesman.smirnov@gmail.com [Thu, 10 May 2012 03:25:52 +0000 (03:25 +0000)]
6lowpan: IPv6 link local address
According to the RFC4944 (Transmission of IPv6 Packets over
IEEE 802.15.4 Networks), chapter 7:
The IPv6 link-local address [RFC4291] for an IEEE 802.15.4 interface
is formed by appending the Interface Identifier, as defined above, to
the prefix FE80::/64.
10 bits 54 bits 64 bits
+----------+-----------------------+----------------------------+
|
1111111010| (zeros) | Interface Identifier |
+----------+-----------------------+----------------------------+
This patch adds IPv6 address generation support for the 6lowpan
interfaces.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 10 May 2012 07:51:25 +0000 (07:51 +0000)]
codel: Controlled Delay AQM
An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.
http://queue.acm.org/detail.cfm?id=
2209336
This AQM main input is no longer queue size in bytes or packets, but the
delay packets stay in (FIFO) queue.
As we don't have infinite memory, we still can drop packets in enqueue()
in case of massive load, but mean of CoDel is to drop packets in
dequeue(), using a control law based on two simple parameters :
target : target sojourn time (default 5ms)
interval : width of moving time window (default 100ms)
Based on initial work from Dave Taht.
Refactored to help future codel inclusion as a plugin for other linux
qdisc (FQ_CODEL, ...), like RED.
include/net/codel.h contains codel algorithm as close as possible than
Kathleen reference.
net/sched/sch_codel.c contains the linux qdisc specific glue.
Separate structures permit a memory efficient implementation of fq_codel
(to be sent as a separate work) : Each flow has its own struct
codel_vars.
timestamps are taken at enqueue() time with 1024 ns precision, allowing
a range of 2199 seconds in queue, and 100Gb links support. iproute2 uses
usec as base unit.
Selected packets are dropped, unless ECN is enabled and packets can get
ECN mark instead.
Tested from 2Mb to 10Gb speeds with no particular problems, on ixgbe and
tg3 drivers (BQL enabled).
Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
[ interval TIME ] [ ecn ]
qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
Sent
13347099587 bytes
8815805 pkt (dropped 0, overlimits 0 requeues 0)
rate 202365Kbit 16708pps backlog
113550b 75p requeues 0
count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
maxpacket 1514 ecn_mark 84399 drop_overlimit 0
CoDel must be seen as a base module, and should be used keeping in mind
there is still a FIFO queue. So a typical setup will probably need a
hierarchy of several qdiscs and packet classifiers to be able to meet
whatever constraints a user might have.
One possible example would be to use fq_codel, which combines Fair
Queueing and CoDel, in replacement of sfq / sfq_red.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 10 May 2012 05:36:34 +0000 (05:36 +0000)]
net_sched: update bstats in dequeue()
Class bytes/packets stats can be misleading because they are updated in
enqueue() while packet might be dropped later.
We already fixed all qdiscs but sch_atm.
This patch makes the final cleanup.
class rate estimators can now match qdisc ones.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 9 May 2012 17:04:04 +0000 (17:04 +0000)]
net, drivers/net: Convert compare_ether_addr_64bits to ether_addr_equal_64bits
Use the new bool function ether_addr_equal_64bits to add
some clarity and reduce the likelihood for misuse of
compare_ether_addr_64bits for sorting.
Done via cocci script:
$ cat compare_ether_addr_64bits.cocci
@@
expression a,b;
@@
- !compare_ether_addr_64bits(a, b)
+ ether_addr_equal_64bits(a, b)
@@
expression a,b;
@@
- compare_ether_addr_64bits(a, b)
+ !ether_addr_equal_64bits(a, b)
@@
expression a,b;
@@
- !ether_addr_equal_64bits(a, b) == 0
+ ether_addr_equal_64bits(a, b)
@@
expression a,b;
@@
- !ether_addr_equal_64bits(a, b) != 0
+ !ether_addr_equal_64bits(a, b)
@@
expression a,b;
@@
- ether_addr_equal_64bits(a, b) == 0
+ !ether_addr_equal_64bits(a, b)
@@
expression a,b;
@@
- ether_addr_equal_64bits(a, b) != 0
+ ether_addr_equal_64bits(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal_64bits(a, b)
+ ether_addr_equal_64bits(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 9 May 2012 17:04:03 +0000 (17:04 +0000)]
etherdevice.h: Add ether_addr_equal_64bits
Add an optimized boolean function to check if
2 ethernet addresses are the same.
This is to avoid any confusion about compare_ether_addr_64bits
returning an unsigned, and not being able to use the
compare_ether_addr_64bits function for sorting ala memcmp.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 9 May 2012 17:17:46 +0000 (17:17 +0000)]
drivers/net: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 8 May 2012 19:41:24 +0000 (19:41 +0000)]
be2net: avoid disabling sriov while VFs are assigned
Calling pci_disable_sriov() while VFs are assigned to VMs causes
kernel panic. This patch uses PCI_DEV_FLAGS_ASSIGNED bit state of the
VF's pci_dev to avoid this. Also, the unconditional function reset cmd
issued on a PF probe can delete the VF configuration for the
previously enabled VFs. A scratchpad register is now used to issue a
function reset only when needed (i.e., in a crash dump scenario.)
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 9 May 2012 23:43:09 +0000 (23:43 +0000)]
l2tp: fix data packet sequence number handling
If enabled, L2TP data packets have sequence numbers which a receiver
can use to drop out of sequence frames or try to reorder them. The
first frame has sequence number 0, but the L2TP code currently expects
it to be 1. This results in the first data frame being handled as out
of sequence.
This one-line patch fixes the problem.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Wed, 9 May 2012 23:43:08 +0000 (23:43 +0000)]
l2tp: fix reorder timeout recovery
When L2TP data packet reordering is enabled, packets are held in a
queue while waiting for out-of-sequence packets. If a packet gets
lost, packets will be held until the reorder timeout expires, when we
are supposed to then advance to the sequence number of the next packet
but we don't currently do so. As a result, the data channel is stuck
because we are waiting for a packet that will never arrive - all
packets age out and none are passed.
The fix is to add a flag to the session context, which is set when the
reorder timeout expires and tells the receive code to reset the next
expected sequence number to that of the next packet in the queue.
Tested in a production L2TP network with Starent and Nortel L2TP gear.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 10 May 2012 01:50:20 +0000 (01:50 +0000)]
tcp: Out-line tcp_try_rmem_schedule
As proposed by Eric, make the tcp_input.o thinner.
add/remove: 1/1 grow/shrink: 1/4 up/down: 868/-1329 (-461)
function old new delta
tcp_try_rmem_schedule - 864 +864
tcp_ack 4811 4815 +4
tcp_validate_incoming 817 815 -2
tcp_collapse 860 858 -2
tcp_send_rcvq 555 353 -202
tcp_data_queue 3435 3033 -402
tcp_prune_queue 721 - -721
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 10 May 2012 01:50:01 +0000 (01:50 +0000)]
tcp: Schedule rmem for rcvq repair send
As noted by Eric, no checks are performed on the data size we're
putting in the read queue during repair. Thus, validate the given
data size with the common rmem management routine.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 10 May 2012 01:49:41 +0000 (01:49 +0000)]
tcp: Move rcvq sending to tcp_input.c
It actually works on the input queue and will use its read mem
routines, thus it's better to have in in the tcp_input.c file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 11 May 2012 03:16:35 +0000 (23:16 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Don Skidmore [Sat, 28 Apr 2012 03:29:22 +0000 (03:29 +0000)]
ixgbe: update version number
Update version number to better match the version of the out of tree
driver with similar functionality.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 4 May 2012 06:07:08 +0000 (06:07 +0000)]
ixgbe: cleanup the hwmon function calls
When the hwmon code was initially added it was with the assumption that a
sysfs patch would be also coming soon. Since that isn't the case some
clean up needs to be done. This patch does that.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 4 May 2012 01:55:23 +0000 (01:55 +0000)]
ixgbe: support software timestamping
Kernel software timestamping requires that the driver calls skb_tx_timestamp
just before passing the skb to the MAC, in order to provide the best software
timestamps. This patch adds this call for that support.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 4 May 2012 02:56:12 +0000 (02:56 +0000)]
ixgbe: add support for get_ts_info
This patch adds support for the ethtool get_ts_info operation, which enables
access of available timestamp/timesync support for that device. It can query
which ptp clock device is associated with the particular port.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 3 May 2012 01:44:12 +0000 (01:44 +0000)]
ixgbe: correct disable_rx_buff timeout
The current value of the udelay timeout for ixgbe_disable_rx_buff is too
short. This causes the security path to not not be properly disabled during
the section that is meant to have it turned off. The end result causes a race
condition that results in RX issues.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob E Keller [Tue, 1 May 2012 05:24:41 +0000 (05:24 +0000)]
ixgbe: Enable timesync clock-out feature for PPS support on X540
This patch enables the PPS system in the PHC framework, by enabling
the clock-out feature on the X540 device. Causes the SDP0 to be set as
a 1Hz clock. Also configures the timesync interrupt cause in order to
report each pulse to the PPS via the PHC framework, which can be used
for general system clock synchronization. (This allows a stable method
for tuning the general system time via the on-board SYSTIM register
based clock.)
Signed-off-by: Jacob E Keller <jacob.e.keller@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 1 May 2012 05:24:58 +0000 (05:24 +0000)]
ixgbe: Hardware Timestamping + PTP Hardware Clock (PHC)
This patch enables hardware timestamping for use with PTP software by
extracting a ns counter from an arbitrary fixed point cycles counter.
The hardware generates SYSTIME registers using the DMA tick which
changes based on the current link speed. These SYSTIME registers are
converted to ns using the cyclecounter and timecounter structures
provided by the kernel. Using the SO_TIMESTAMPING api, software can
enable and access timestamps for PTP packets.
The SO_TIMESTAMPING API has space for 3 different kinds of timestamps,
SYS, RAW, and SOF. SYS hardware timestamps are hardware ns values that
are then scaled to the software clock. RAW hardware timestamps are the
direct raw value of the ns counter. SOF software timestamps are the
software timestamp calculated as close as possible to the software
transmit, but are not offloaded to the hardware. This patch only
supports the RAW hardware timestamps due to inefficiency of the SYS
design.
This patch also enables the PHC subsystem features for atomically
adjusting the cycle register, and adjusting the clock frequency in
parts per billion. This frequency adjustment works by slightly
adjusting the value added to the cycle registers each DMA tick. This
causes the hardware registers to overflow rapidly (approximately once
every 34 seconds, when at 10gig link). To solve this, the timecounter
structure is used, along with a timer set for every 25 seconds. This
allows for detecting register overflow and converting the cycle
counter registers into ns values needed for providing useful
timestamps to the network stack.
Only the basic required clock functions are supported at this time,
although the hardware supports some ancillary features and these could
easily be enabled in the future.
Note that use of this hardware timestamping requires modifying daemon
software to use the SO_TIMESTAMPING API for timestamps, and the
ptp_clock PHC framework for accessing the clock. The timestamps have
no relation to the system time at all, so software must use the posix
clock generated by the PHC framework instead.
Signed-off-by: Jacob E Keller <jacob.e.keller@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Sat, 21 Apr 2012 00:54:28 +0000 (00:54 +0000)]
ixgbe: Fix bogus error message
If the VF sends a MACVLAN request with index of zero then it is not
actually trying to add a filter. Check the index value and only
indicate that operation is not allowed when the VF is actually trying
to add a filter.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 25 Apr 2012 04:36:38 +0000 (04:36 +0000)]
ixgbe: Set Drop_EN bit when multiple Rx queues are present w/o flow control
The drop enable bit can be used to improve the performance of the adapter
in the case of multiple queues being present. This performance gain is due
to the fact that some slower CPUs can cause the FIFO to backfill preventing
faster CPUs from receiving additional work. By setting the drop enable bit
we prevent this and instead just drop the packets that would have been
bound for the slower CPU.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Thu, 10 May 2012 05:14:44 +0000 (22:14 -0700)]
ixgbe: Clean up priority based flow control
This change cleans up the logic in the priority based flow control
configuration routines. Both the 82599 and 82598 based routines perform
similar functions however they are both arranged completely differently.
This patch goes over both of them to clean up the code.
In addition I am dropping the ixgbe_fc_pfc flow control mode and instead
just replacing it with checks for if priority flow control is enabled.
This allows us to maintain some of the link flow control information which
allows for an easier transition between link and priority flow control.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 28 Mar 2012 08:03:38 +0000 (08:03 +0000)]
ixgbe: Exit on error case in VF message processing
Previously we would get a mailbox error and still process the message.
Instead we should exit on error.
In addition we should also be flushing the ACK of the message so that we
can guarantee that the other end is aware we have received the message
while we are processing it.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Koki Sanagi [Wed, 15 Feb 2012 14:45:39 +0000 (14:45 +0000)]
igb: output register's information related to RX/TX queue[4-15]
Current igb outputs registers related to TX/RX queues(ex. RDT, RDH, TDT, TDH).
But it thinks the number of RX/TX queues is 4. But 82576 has 16 RX/TX queues.
This patch modifies igb to output the rest of the registers if the device is
82576.
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Acked-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Thu, 10 May 2012 04:12:37 +0000 (21:12 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/davem/net-next
Rajesh Borundia [Wed, 9 May 2012 05:55:30 +0000 (05:55 +0000)]
netxen_nic: Fix estimation of recv MSS in case of LRO
o Linux stack estimates MSS from skb->len or skb_shinfo(skb)->gso_size.
In case of LRO skb->len is aggregate of len of number of packets hence MSS
obtained using skb->len would be incorrect. Incorrect estimation of recv MSS
would lead to delayed acks in some traffic patterns (which sends two or three
packets and wait for ack and only then send remaining packets). This leads to
drop in performance. Hence we need to set gso_size to MSS obtained from firmware.
o This is fixed recently in firmware hence the MSS is obtained based on
capability. If fw is capable of sending the MSS then only driver sets the gso_size.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sucheta Chakraborty [Wed, 9 May 2012 05:55:29 +0000 (05:55 +0000)]
netxen: added miniDIMM support in driver.
Driver queries DIMM information from firmware and accordingly
sets "presence" field of the structure.
"presence" field when set to 0xff denotes invalid flag. And when
set to 0x0 denotes DIMM memory is not present.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish chopra [Wed, 9 May 2012 05:55:28 +0000 (05:55 +0000)]
netxen_nic: Allow only useful and recommended firmware dump capture mask values
o 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F and 0xFF are the allowed capture masks.
Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sritej Velaga [Wed, 9 May 2012 05:55:27 +0000 (05:55 +0000)]
netxen_nic: disable minidump by default
disable fw dump by default at start up.
Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 10 May 2012 02:51:17 +0000 (22:51 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/bwh/sfc-next
Ben Hutchings [Sat, 5 May 2012 00:36:50 +0000 (01:36 +0100)]
sfc: Implement module EEPROM access for SFE4002 and SFN4112F
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Stuart Hodgson [Tue, 1 May 2012 17:50:43 +0000 (18:50 +0100)]
sfc: Added support for new ethtool APIs for obtaining module eeprom
Currently allows for SFP+ eeprom to be returned using the ethtool API.
This can be extended in future to handle different eeprom formats
and sizes
Signed-off-by: Stuart Hodgson <smhodgson@solarflare.com>
[bwh: Drop redundant validation, comment, whitespace]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Stuart Hodgson [Thu, 19 Apr 2012 08:44:42 +0000 (09:44 +0100)]
ethtool: Extend the ethtool API to obtain plugin module eeprom data
ETHTOOL_GMODULEINFO returns a new struct ethtool_modinfo that will return the
type and size of plug-in module eeprom (such as SFP+) for parsing
by userland program.
ETHTOOL_GMODULEEEPROM returns the raw eeprom information
using the existing ethtool_eeprom structture to return the data
Signed-off-by: Stuart Hodgson <smhodgson@solarflare.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Thu, 12 Apr 2012 00:42:12 +0000 (00:42 +0000)]
ethtool: Split ethtool_get_eeprom() to allow for additional EEPROM accessors
We want to support reading module (SFP+, XFP, ...) EEPROMs as well as
NIC EEPROMs. They will need a different command number and driver
operation, but the structure and arguments will be the same and so we
can share most of the code here.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
David Riddoch [Wed, 11 Apr 2012 12:12:41 +0000 (13:12 +0100)]
sfc: By default refill RX rings as soon as space for a batch
Previously we refilled with much larger batches, which caused large latency
spikes. We now have many more much much smaller spikes!
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
David Riddoch [Wed, 11 Apr 2012 12:09:24 +0000 (13:09 +0100)]
sfc: Fill RX rings completely full, rather than to 95% full
There was no runtime control of the fast_fill_limit in any case, so purged
that field.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Ben Hutchings [Wed, 4 Apr 2012 23:22:19 +0000 (00:22 +0100)]
sfc: Fix missing cleanup in failure path of efx_pci_probe()
We need to clear the private data pointer in the PCI device.
Also reorder cleanup in efx_pci_remove() for symmetry.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Stuart Hodgson [Fri, 30 Mar 2012 12:04:51 +0000 (13:04 +0100)]
sfc: Do not attempt to flush queues if DMA is disabled
efx_nic_fatal_interrupt() disables DMA before scheduling a reset.
After this, we need not and *cannot* flush queues.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Joe Perches [Tue, 8 May 2012 18:56:57 +0000 (18:56 +0000)]
dsa: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:56 +0000 (18:56 +0000)]
wireless: Convert compare_ether_addr to ether_addr_equal by hand
spatch/coccinelle isn't perfect. It doesn't understand
__aligned(x) and doesn't convert functions it can't parse.
Convert the remaining compare_ether_addr uses.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:55 +0000 (18:56 +0000)]
wireless: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
I removed a conversion from scan.c/cmp_bss_core
that appears to be a sorting function.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:54 +0000 (18:56 +0000)]
netfilter: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:53 +0000 (18:56 +0000)]
mac80211: Convert compare_ether_addr to ether_addr_equal by hand
spatch/coccinelle isn't perfect. It doesn't understand
__aligned(x) and doesn't convert functions it can't parse.
Convert the remaining compare_ether_addr uses.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:52 +0000 (18:56 +0000)]
mac80211: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:51 +0000 (18:56 +0000)]
bluetooth: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:50 +0000 (18:56 +0000)]
atm: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:49 +0000 (18:56 +0000)]
bridge: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:48 +0000 (18:56 +0000)]
bridge: netfilter: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:47 +0000 (18:56 +0000)]
8021q: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:46 +0000 (18:56 +0000)]
802: Convert compare_ether_addr to ether_addr_equal
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 8 May 2012 18:56:45 +0000 (18:56 +0000)]
etherdevice.h: Add ether_addr_equal
Add a boolean function to check if 2 ethernet addresses
are the same.
This is to avoid any confusion about compare_ether_addr
returning an unsigned, and not being able to use the
compare_ether_addr function for sorting ala memcmp.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 May 2012 22:07:44 +0000 (18:07 -0400)]
Merge git://1984.lsi.us.es/net-next
Jeff Kirsher [Wed, 9 May 2012 09:23:46 +0000 (02:23 -0700)]
e1000e: Fix merge conflict (net->net-next)
During merge of net to net-next the changes in patch:
e1000e: Fix default interrupt throttle rate not set in NIC HW
got munged in param.c of the e1000e driver. This rectifies the
merge issues.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 7 May 2012 10:51:45 +0000 (10:51 +0000)]
netfilter: hashlimit: byte-based limit mode
can be used e.g. for ingress traffic policing or
to detect when a host/port consumes more bandwidth than expected.
This is done by optionally making cost to mean
"cost per 16-byte-chunk-of-data" instead of "cost per packet".
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 7 May 2012 10:51:44 +0000 (10:51 +0000)]
netfilter: hashlimit: move rateinfo initialization to helper
followup patch would bloat main match function too much.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Mon, 7 May 2012 10:51:43 +0000 (10:51 +0000)]
netfilter: limit, hashlimit: avoid duplicated inline
credit_cap can be set to credit, which avoids inlining user2credits
twice. Also, remove inline keyword and let compiler decide.
old:
684 192 0 876 36c net/netfilter/xt_limit.o
4927 344 32 5303 14b7 net/netfilter/xt_hashlimit.o
now:
668 192 0 860 35c net/netfilter/xt_limit.o
4793 344 32 5169 1431 net/netfilter/xt_hashlimit.o
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Hans Schillstrom [Wed, 2 May 2012 07:49:47 +0000 (07:49 +0000)]
netfilter: add xt_hmark target for hash-based skb marking
The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.
[ Part of this patch has been refactorized and modified by Pablo Neira Ayuso ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Hans Schillstrom [Mon, 23 Apr 2012 03:35:26 +0000 (03:35 +0000)]
netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()
This patch adds the flags parameter to ipv6_find_hdr. This flags
allows us to:
* know if this is a fragment.
* stop at the AH header, so the information contained in that header
can be used for some specific packet handling.
This patch also adds the offset parameter for inspection of one
inner IPv6 header that is contained in error messages.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeff Kirsher [Wed, 9 May 2012 09:15:14 +0000 (02:15 -0700)]
e1000e: Fix merge conflict (net->net-next)
During merge of net to net-next the changes in patch:
e1000e: Fix default interrupt throttle rate not set in NIC HW
got munged in param.c of the e1000e driver. This rectifies the
merge issues.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Tue, 8 May 2012 18:40:21 +0000 (14:40 -0400)]
Merge branch 'master' of git://1984.lsi.us.es/net-next
Pablo Neira Ayuso [Tue, 8 May 2012 17:45:28 +0000 (19:45 +0200)]
netfilter: remove ip_queue support
This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.
This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.
Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Thu, 3 May 2012 02:17:45 +0000 (02:17 +0000)]
netfilter: nf_conntrack: fix explicit helper attachment and NAT
Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:
9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment
Basically, nf_conntrack_alter_reply asks for looking up the helper
up if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.
Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment, thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).
Interestingly, this bug was hidden by the automatic helper look-up
code. But it can be easily trigger if you attach the helper in
a non-standard port, eg.
iptables -I PREROUTING -t raw -p tcp --dport 8888 \
-j CT --helper ftp
And you disabled the automatic helper assignment.
I added the IPS_HELPER_BIT that allows us to differenciate between
a helper that has been explicitly attached and those that have been
automatically assigned. I didn't come up with a better solution
(having backward compatibility in mind).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Kelvie Wong [Wed, 2 May 2012 14:39:24 +0000 (14:39 +0000)]
netfilter: nf_ct_expect: partially implement ctnetlink_change_expect
This refreshes the "timeout" attribute in existing expectations if one is
given.
The use case for this would be for userspace helpers to extend the lifetime
of the expectation when requested, as this is not possible right now
without deleting/recreating the expectation.
I use this specifically for forwarding DCERPC traffic through:
DCERPC has a port mapper daemon that chooses a (seemingly) random port for
future traffic to go to. We expect this traffic (with a reasonable
timeout), but sometimes the port mapper will tell the client to continue
using the same port. This allows us to extend the expectation accordingly.
Signed-off-by: Kelvie Wong <kelvie@ieee.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Hans Schillstrom [Mon, 30 Apr 2012 06:13:50 +0000 (08:13 +0200)]
net: export sysctl_[r|w]mem_max symbols needed by ip_vs_sync
To build ip_vs as a module sysctl_rmem_max and sysctl_wmem_max
needs to be exported.
The dependency was added by "ipvs: wakeup master thread" patch.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
H Hartley Sweeten [Thu, 26 Apr 2012 18:26:15 +0000 (11:26 -0700)]
ipvs: ip_vs_proto: local functions should not be exposed globally
Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.
This quiets the sparse warnings:
warning: symbol '__ipvs_proto_data_get' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
H Hartley Sweeten [Thu, 26 Apr 2012 18:17:28 +0000 (11:17 -0700)]
ipvs: ip_vs_ftp: local functions should not be exposed globally
Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.
This quiets the sparse warnings:
warning: symbol 'ip_vs_ftp_init' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Pablo Neira Ayuso [Tue, 8 May 2012 07:28:19 +0000 (09:28 +0200)]
ipvs: optimize the use of flags in ip_vs_bind_dest
cp->flags is marked volatile but ip_vs_bind_dest
can safely modify the flags, so save some CPU cycles by
using temp variable.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Pablo Neira Ayuso [Tue, 8 May 2012 17:40:30 +0000 (19:40 +0200)]
ipvs: add support for sync threads
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.
The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.
Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.
Make sure the backup socks do not block on reading.
Special thanks to Aleksey Chudov for helping in all tests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Tue, 24 Apr 2012 20:46:40 +0000 (23:46 +0300)]
ipvs: reduce sync rate with time thresholds
Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.
sync_refresh_period: in seconds, difference in reported connection
timer that triggers new sync message. It can be used to
avoid sync messages for the specified period (or half of
the connection timeout if it is lower) if connection state
is not changed from last sync.
sync_retries: integer, 0..3, defines sync retries with period of
sync_refresh_period/8. Useful to protect against loss of
sync messages.
Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.
Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.
As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.
Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Pablo Neira Ayuso [Tue, 8 May 2012 17:39:49 +0000 (19:39 +0200)]
ipvs: wakeup master thread
High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,
Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.
Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.
As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).
Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.
Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.
Thanks to Pablo Neira Ayuso for his valuable feedback!
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Tue, 24 Apr 2012 20:46:38 +0000 (23:46 +0300)]
ipvs: always update some of the flags bits in backup
As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure the cp->flags are
updated even if cp is still not bound to dest. If cp->flags
are not updated ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks for protocol state rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Tue, 24 Apr 2012 20:46:37 +0000 (23:46 +0300)]
ipvs: fix ip_vs_try_bind_dest to rebind app and transmitter
Initially, when the synced connection is created we
use the forwarding method provided by master but once we
bind to destination it can be changed. As result, we must
update the application and the transmitter.
As ip_vs_try_bind_dest is called always for connections
that require dest binding, there is no need to validate the
cp and dest pointers.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Tue, 24 Apr 2012 20:46:36 +0000 (23:46 +0300)]
ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest
As the IP_VS_CONN_F_INACTIVE bit is properly set
in cp->flags for all kind of connections we do not need to
add special checks for synced connections when updating
the activeconns/inactconns counters for first time. Now
logic will look just like in ip_vs_unbind_dest.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Tue, 24 Apr 2012 20:46:35 +0000 (23:46 +0300)]
ipvs: ignore IP_VS_CONN_F_NOOUTPUT in backup server
As IP_VS_CONN_F_NOOUTPUT is derived from the
forwarding method we should get it from conn_flags just
like we do it for IP_VS_CONN_F_FWD_MASK bits when binding
to real server.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Sasha Levin [Sat, 14 Apr 2012 16:37:47 +0000 (12:37 -0400)]
ipvs: use GFP_KERNEL allocation where possible
Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol.
This is safe since it will always run from a process context.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Julian Anastasov [Fri, 13 Apr 2012 13:49:38 +0000 (16:49 +0300)]
ipvs: SH scheduler does not need GFP_ATOMIC allocation
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Fri, 13 Apr 2012 13:49:41 +0000 (16:49 +0300)]
ipvs: LBLCR scheduler does not need GFP_ATOMIC allocation on init
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Fri, 13 Apr 2012 13:49:42 +0000 (16:49 +0300)]
ipvs: WRR scheduler does not need GFP_ATOMIC allocation
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Fri, 13 Apr 2012 13:49:39 +0000 (16:49 +0300)]
ipvs: DH scheduler does not need GFP_ATOMIC allocation
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Fri, 13 Apr 2012 13:49:40 +0000 (16:49 +0300)]
ipvs: LBLC scheduler does not need GFP_ATOMIC allocation on init
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Julian Anastasov [Fri, 13 Apr 2012 13:49:37 +0000 (16:49 +0300)]
ipvs: timeout tables do not need GFP_ATOMIC allocation
They are called only on initialization.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Pablo Neira Ayuso [Tue, 8 May 2012 17:36:44 +0000 (19:36 +0200)]
netfilter: bridge: optionally set indev to vlan
if net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge
netfilter removes the vlan header temporarily and then feeds the packet
to ip(6)tables.
When the new "bridge-nf-pass-vlan-input-device" sysctl is on
(default off), then bridge netfilter will also set the
in-interface to the vlan interface; if such an interface exists.
This is needed to make iptables REDIRECT target work with
"vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to
match the vlan device name.
Also update Documentation with current brnf default settings.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Dumazet [Wed, 18 Apr 2012 04:36:40 +0000 (06:36 +0200)]
netfilter: nf_conntrack: use this_cpu_inc()
this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Leblond [Wed, 18 Apr 2012 09:20:41 +0000 (11:20 +0200)]
netfilter: nf_ct_helper: allow to disable automatic helper assignment
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.
echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper
[ Note: flows that already got a helper will keep using it even
if automatic helper assignment has been disabled ]
Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.
There are good reasons to stop supporting automatic helper
assignment, for further information, please read:
http://www.netfilter.org/news.html#2012-04-03
This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tony Zelenoff [Thu, 8 Mar 2012 23:35:39 +0000 (23:35 +0000)]
netfilter: nf_ct_ecache: refactor notifier registration
* ret variable initialization removed as useless
* similar code strings concatenated and functions code
flow became more plain
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Joe Perches [Tue, 8 May 2012 06:44:40 +0000 (06:44 +0000)]
etherdev.h: Convert int is_<foo>_ether_addr to bool
Make the return value explicitly true or false.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Glendinning [Fri, 4 May 2012 00:57:13 +0000 (00:57 +0000)]
smsc75xx: let EEPROM determine GPIO/LED settings
This patch allows the GPIO/LED settings to be configured by the
EEPROM if present, and only sets the default values (LED outputs
for link/activity) when an EEPROM is not detected.
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Glendinning [Fri, 4 May 2012 00:57:12 +0000 (00:57 +0000)]
smsc75xx: eliminate unnecessary phy register read
Only a write is necessary to clear the interrupt status, and we
don't use the value from the preceding read operation. This
patch eliminates the unnecessary read.
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>