git.openwrt.org Git - openwrt/staging/blogic.git/log

iscsi-target: Fix reservation conflict -EBUSY response handling bug

This patch addresses a iscsi-target specific bug related to reservation conflict
handling in iscsit_handle_scsi_cmd() that has been causing reservation conflicts
to complete and not fail as expected due to incorrect errno checking.  The problem
occured with the change to return -EBUSY from transport_generic_cmd_sequencer() ->
transport_generic_allocate_tasks() failures, that broke iscsit_handle_scsi_cmd()
checking for -EINVAL in order to invoke a non GOOD status response.

This was manifesting itself as data corruption with legacy SPC-2 reservations,
but also effects iscsi-target LUNs with SPC-3 persistent reservations.

This bug was originally introduced in lio-core commit:

commit 03e98c9eb916f3f0868c1dc344dde2a60287ff72
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date:   Fri Nov 4 02:36:16 2011 -0700

    target: Address legacy PYX_TRANSPORT_* return code breakage

Reported-by: Martin Svec <martin.svec@zoner.cz>
Cc: Martin Svec <martin.svec@zoner.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Fix compatible reservation handling (CRH=1) with legacy RESERVE/RELEASE

This patch addresses a bug with target_check_scsi2_reservation_conflict()
return checking in target_scsi2_reservation_[reserve,release]() that was
preventing CRH=1 operation from silently succeeding in the two special
cases defined by SPC-3, and not failing with reservation conflict status
when dealing with legacy RESERVE/RELEASE + active SPC-3 PR logic.

Also explictly set cmd->scsi_status = SAM_STAT_RESERVATION_CONFLICT during
the early non reservation holder failure from pr_ops->t10_seq_non_holder()
check in transport_generic_cmd_sequencer() for fabrics that already expect
it to be set.

This bug was originally introduced in mainline commit:

commit eacac00ce5bfde8086cd0615fb53c986f7f970fe
Author: Christoph Hellwig <hch@infradead.org>
Date: Thu Nov 3 17:50:40 2011 -0400

target: split core_scsi2_emulate_crh

Reported-by: Martin Svec <martin.svec@zoner.cz>
Cc: Martin Svec <martin.svec@zoner.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Fix unsupported WRITE_SAME sense payload

This patch fixes a bug in target-core where unsupported WRITE_SAME ops
from a target_check_write_same_discard() failure was incorrectly
returning CHECK_CONDITION w/ TCM_INVALID_CDB_FIELD sense data.
This was causing some clients to not properly fall back, so go ahead
and use the correct TCM_UNSUPPORTED_SCSI_OPCODE sense for this case.

Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi: use IP_FREEBIND socket option

Use IP_FREEBIND socket option so that iscsi portal configuration with
explicit IP addresses can happen during boot, before network interfaces
have been assigned IPs.

This is especially important on systemd based Linux boxes where system
boot happens asynchronously and non-trivial configuration must be done
to get targetcli.service to start synchronously after the network is
configured.

Reference:
http://lists.fedoraproject.org/pipermail/devel/2011-October/158025.html

Signed-off-by: Dax Kelson <dkelson@gurulabs.com>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: "Andy Grover" <agrover@redhat.com>
Cc: "Lennart Poettering" <lennart@poettering.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iblock: fix handling of large requests

Requesting to many bvecs upsets bio_alloc_bioset, so limit the number we ask
for to the amount it can handle.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: handle empty string writes in sysfs

These are root only and we're not likely to hit the problem in practise,
but it makes the static checkers happy.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi_target: in_aton needs linux/inet.h

Fixes this error after a recent nfs cleanup:

drivers/target/iscsi/iscsi_target_configfs.c: In function 'lio_target_call_addnptotpg':
drivers/target/iscsi/iscsi_target_configfs.c:214:3: error: implicit declaration of function 'in6_pton' [-Werror=implicit-function-declaration]
drivers/target/iscsi/iscsi_target_configfs.c:239:3: error: implicit declaration of function 'in_aton' [-Werror=implicit-function-declaration]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Fix iblock se_dev_attrib.unmap_granularity

The block layer keeps q->limits.discard_granularity in bytes, but iblock
(and the SCSI Block Limits VPD page) keep unmap_granularity in blocks.
Report the correct value when exporting block devices by dividing to
convert bytes to blocks.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Fix target_submit_cmd() exception handling

This patch fixes a bug in target_submit_cmd() where the failure path
for transport_generic_allocate_tasks() made a direct call to
transport_send_check_condition_and_sense() and not calling the
final target_put_sess_cmd() release callback.

For transport_generic_allocate_tasks() failures, use the proper call to
transport_generic_request_failure() to handle kref_put() along
with potential internal queue full response processing.

It also makes transport_lookup_cmd_lun() failures in
target_submit_cmd() use transport_send_check_condition_and_sense() and
target_put_sess_cmd() directly to avoid se_cmd->se_dev reference in
transport_generic_request_failure() handling.

Finally it drops the out_check_cond: label and use direct reference for
allocate task failures, and per-se_device queue_full handling is
currently not supported for transport_lookup_cmd_lun() failure
descriptors due to se_device dependency.

Reported-by: Roland Dreier <roland@purestorage.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Change target_submit_cmd() to return void

Retval not very useful, and may even be harmful. Once submitted, fabrics
should expect a sense error if anything goes wrong. All fabrics checking
of this retval are useless or broken:

fc checks it just to emit more debug output.
ib_srpt trickles retval up, then it is ignored.
qla2xxx trickles it up, which then causes a bug because the abort goto
in qla_target.c thinks cmd hasn't been sent to target.

Just returning nothing is best.

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: accept REQUEST_SENSE with 18bytes

WindowsXP+BOT issues a MODE_SENSE request with page 0x1c which is not
suppoerted by target. Target rejects that command with
TCM_INVALID_CDB_FIELD, so far so good. On BOT I can't send the SENSE
response back, instead I can only reply that an error occured. The next
thing happens is a REQUEST_SENSE request with 18 bytes length. Since the
check here is more than 18 bytes I have to NACK that request as well.
This is not really required: We check for some additional room, but we
never use it. The additional length is set to 0xa so the total length is
0xa + 8 = 18 which is fine with my 18 bytes.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Fail INQUIRY commands with EVPD==0 but PAGE CODE!=0

My draft of SPC-4 says:

    If the PAGE CODE field is not set to zero when the EVPD bit is set
    to zero, the command shall be terminated with CHECK CONDITION
    status, with the sense key set to ILLEGAL REQUEST, and the
    additional sense code set to INVALID FIELD IN CDB.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Return correct ASC for unimplemented VPD pages

My draft of SPC-4 says:

    If the device server does not implement the requested vital product
    data page, then the command shall be terminated with CHECK CONDITION
    status, with the sense key set to ILLEGAL REQUEST, and the
    additional sense code set to INVALID FIELD IN CDB.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi-target: Fix discovery with INADDR_ANY and IN6ADDR_ANY_INIT

This patch addresses a bug with sendtargets discovery where INADDR_ANY (0.0.0.0)
+ IN6ADDR_ANY_INIT ([0:0:0:0:0:0:0:0]) network portals where incorrectly being
reported back to initiators instead of the address of the connecting interface.
To address this, save local socket ->getname() output during iscsi login setup,
and makes iscsit_build_sendtargets_response() return these TargetAddress keys
when INADDR_ANY or IN6ADDR_ANY_INIT portals are in use.

Reported-by: Dax Kelson <dkelson@gurulabs.com>
Reported-by: Andy Grover <agrover@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Allow control CDBs with data > 1 page

We need to handle >1 page control cdbs, so extend the code to do a vmap
if bigger than 1 page. It seems like kmap() is still preferable if just
a page, fewer TLB shootdowns(?), so keep using that when possible.

Rename function pair for their new scope.

Signed-off-by: Andy Grover <agrover@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi-target: Fix up a few assignments

A statement such as
struct iscsi_node_attrib *na = na = iscsit_tpg_get_node_attrib(sess);
has undefined behaviour since there are two assignments to 'na', strictly
speaking (the order in which side-effects from the assignments take place
is undefined since there's no intervening sequence point), and it looks
unintentional in any case.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi-target: make one-bit bitfields unsigned

Signed bitfields are a problem because instead of being 1 or 0 like
you'd expect they are 0 and -1. It doesn't cause a problem in this case
but sparse complains:

drivers/target/iscsi/iscsi_target_core.h:564:56: error: dubious one-bit
signed bitfield

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi-target: Fix double list_add with iscsit_alloc_buffs reject

This patch fixes a bug where the iscsit_add_reject_from_cmd() call
from a failure to iscsit_alloc_buffs() was incorrectly passing
add_to_conn=1 and causing a double list_add after iscsi_cmd->i_list
had already been added in iscsit_handle_scsi_cmd().

Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi-target: Fix reject release handling in iscsit_free_cmd()

This patch addresses a bug where iscsit_free_cmd() was incorrectly calling
iscsit_release_cmd() for ISCSI_OP_REJECT because iscsi_add_reject*() will
overwrite the original iscsi_cmd->iscsi_opcode assignment.  This bug was
introduced with the following commit:

commit 0be67f2ed8f577d2c72d917928394c5885fa9134
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date:   Sun Oct 9 01:48:14 2011 -0700

    iscsi-target: Remove SCF_SE_LUN_CMD flag abuses

and was manifesting itself as list corruption with the following:

[  131.191092] ------------[ cut here ]------------
[  131.191092] WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
[  131.191092] Hardware name: VMware Virtual Platform
[  131.191092] list_del corruption. prev->next should be ffff880022d3c100, but was 6b6b6b6b6b6b6b6b
[  131.191092] Modules linked in: tcm_vhost ib_srpt ib_cm ib_sa ib_mad ib_core tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc crc32c iscsi_target_mod target_core_stgt scsi_tgt target_core_pscsi target_core_file target_core_iblock target_core_mod configfs ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sr_mod cdrom sd_mod e1000 ata_piix libata mptspi mptscsih mptbase [last unloaded: scsi_wait_scan]
[  131.191092] Pid: 2250, comm: iscsi_ttx Tainted: G        W    3.2.0-rc4+ #42
[  131.191092] Call Trace:
[  131.191092]  [<ffffffff8103b553>] warn_slowpath_common+0x80/0x98
[  131.191092]  [<ffffffff8103b5ff>] warn_slowpath_fmt+0x41/0x43
[  131.191092]  [<ffffffff811d0279>] __list_del_entry+0x8d/0x98
[  131.191092]  [<ffffffffa01395c9>] transport_lun_remove_cmd+0x9b/0xb7 [target_core_mod]
[  131.191092]  [<ffffffffa013a55c>] transport_generic_free_cmd+0x5d/0x71 [target_core_mod]
[  131.191092]  [<ffffffffa01a012b>] iscsit_free_cmd+0x1e/0x27 [iscsi_target_mod]
[  131.191092]  [<ffffffffa01a13be>] iscsit_close_connection+0x14d/0x5b2 [iscsi_target_mod]
[  131.191092]  [<ffffffffa0196a0c>] iscsit_take_action_for_connection_exit+0xdb/0xe0 [iscsi_target_mod]
[  131.191092]  [<ffffffffa01a55d4>] iscsi_target_tx_thread+0x15cb/0x1608 [iscsi_target_mod]
[  131.191092]  [<ffffffff8103609a>] ? check_preempt_wakeup+0x121/0x185
[  131.191092]  [<ffffffff81030801>] ? __dequeue_entity+0x2e/0x33
[  131.191092]  [<ffffffffa01a4009>] ? iscsit_send_text_rsp+0x25f/0x25f [iscsi_target_mod]
[  131.191092]  [<ffffffffa01a4009>] ? iscsit_send_text_rsp+0x25f/0x25f [iscsi_target_mod]
[  131.191092]  [<ffffffff8138f706>] ? schedule+0x55/0x57
[  131.191092]  [<ffffffff81056c7d>] kthread+0x7d/0x85
[  131.191092]  [<ffffffff81399534>] kernel_thread_helper+0x4/0x10
[  131.191092]  [<ffffffff81056c00>] ? kthread_worker_fn+0x16d/0x16d
[  131.191092]  [<ffffffff81399530>] ? gs_change+0x13/0x13

Reported-by: <jrepac@yahoo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: fix return code of core_tpg_.*_lun

- core_tpg_pre_addlun()
  returns always ERR_PTR() or the pointer, never NULL. The additional
  check for NULL in core_dev_add_lun() is not required.

- core_tpg_pre_dellun()
  returns always ERR_PTR() or the pointer, never NULL. The check for NULL
  in core_dev_del_lun() is wrong. The third argument (int *) is never
  used, remove it.

- core_dev_add_lun()
  returns always NULL or the pointer, never ERR_PTR. The check for
  IS_ERR() is not required.

(nab: Convert core_dev_add_lun() use err.h macros for failure
handling to be consistent with the rest of target_core_fabric_configfs.c
callers)

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: use save/restore lock primitive in core_dec_lacl_count()

It may happen that uasp will free the request in irq conntext, the
callchain:

uasp_cmd_release() -> transport_generic_free_cmd() -> core_dec_lacl_count()

where the last function enables the IRQ. Those irqs are re-disabled
later (due to the spin.*irq_restore) but in between we could get hurt.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: avoid multiple outputs in scsi_dump_inquiry()

The multiple calls to pr_debug() each with one letter results in a new
line. This patch merges the multiple requests into one call per line
so we don't have the multiple line cuts.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Add workaround for zero-length control CDB handling

This patch adds a work-around for handling zero allocation length
control CDBs (type SCF_SCSI_CONTROL_SG_IO_CDB) that was causing an
OOPs with the following raw calls:

   # sg_raw -v /dev/sdd 3 0 0 0 0 0
   # sg_raw -v /dev/sdd 0x1a 0 1 0 0 0

This patch will follow existing zero-length handling for data I/O
and silently return with GOOD status.  This addresses the zero length
issue, but the proper long-term resolution for handling arbitary
allocation lengths will be to refactor out data-phase handling in
individual CDB emulation logic within target_core_cdb.c

Reported-by: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Correct sense key for INVALID FIELD IN {PARAMETER LIST,CDB}

According to SPC-4, the sense key for commands that are failed with
INVALID FIELD IN PARAMETER LIST and INVALID FIELD IN CDB should be
ILLEGAL REQUEST (5h) rather than ABORTED COMMAND (Bh).  Without this
patch, a tcm_loop LUN incorrectly gives:

    # sg_raw -r 1 -v /dev/sda 3 1 0 0 ff 0
    Sense Information:
     Fixed format, current;  Sense key: Aborted Command
     Additional sense: Invalid field in cdb
     Raw sense data (in hex):
            70 00 0b 00 00 00 00 0a  00 00 00 00 24 00 00 00
            00 00

While a real SCSI disk gives:

    Sense Information:
     Fixed format, current;  Sense key: Illegal Request
     Additional sense: Invalid field in cdb
     Raw sense data (in hex):
            70 00 05 00 00 00 00 18  00 00 00 00 24 00 00 00
            00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00

with the main point being that the real disk gives a sense key of
ILLEGAL REQUEST (5h).

Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Don't zero pages used for data buffers

Doing alloc_page(GFP_KERNEL | __GFP_ZERO) to get pages used for data
buffers wastes a lot of CPU clearing pages that will be quickly be
overwritten by the actual data. However, for emulated control
commands such as INQUIRY and so on, the code does assume that the
buffer is zeroed.

To avoid this CPU overhead, skip the __GFP_ZERO for commands that are
actually moving data, ie cmds that have SCF_SCSI_DATA_SG_IO_CDB set.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Allow PERSISTENT RESERVE IN for non-reservation holder

Initiators that aren't the active reservation holder should be able to
do a PERSISTENT RESERVE IN command in all cases, so add it to the list
of allowed CDBs in core_scsi3_pr_seq_non_holder().

Signed-off-by: Marco Sanvido <marco@purestorage.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Use correct preempted registration sense code

The comments quote the right parts of the spec:

   * d) Establish a unit attention condition for the
   *    initiator port associated with every I_T nexus
   *    that lost its registration other than the I_T
   *    nexus on which the PERSISTENT RESERVE OUT command
   *    was received, with the additional sense code set
   *    to REGISTRATIONS PREEMPTED.

and

   * e) Establish a unit attention condition for the initiator
   *    port associated with every I_T nexus that lost its
   *    persistent reservation and/or registration, with the
   *    additional sense code set to REGISTRATIONS PREEMPTED;

but the actual code accidentally uses ASCQ_2AH_RESERVATIONS_PREEMPTED
instead of ASCQ_2AH_REGISTRATIONS_PREEMPTED.  Fix this.

Signed-off-by: Marco Sanvido <marco@purestorage.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: don't allocate bio headroom in iblock

We never embedd the bio into a structure, so there is no need to allocate
64 bytes of headroom per bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Set additional sense length field in sense data

The target code was not setting the additional sense length field in the
sense data it returned, which meant that at least the Linux stack
ignored the ASC/ASCQ fields.  For example, without this patch, on a
tcm_loop device:

    # sg_raw -v /dev/sda 2 0 0 0 0 0

gives

        cdb to send: 02 00 00 00 00 00
    SCSI Status: Check Condition

    Sense Information:
     Fixed format, current;  Sense key: Illegal Request
      Raw sense data (in hex):
            70 00 05 00 00 00 00 00

while after the patch we correctly get the following (which matches what
a regular disk returns):

        cdb to send: 02 00 00 00 00 00
    SCSI Status: Check Condition

    Sense Information:
     Fixed format, current;  Sense key: Illegal Request
     Additional sense: Invalid command operation code
     Raw sense data (in hex):
            70 00 05 00 00 00 00 0a  00 00 00 00 20 00 00 00
            00 00

Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Remove legacy device status check from transport_execute_tasks

This patch removes a legacy se_dev_check_online() check from within
transport_execute_tasks() that should no longer be necessary as
transport_lookup_cmd_lun() is already making this call.

Using transport_cmd_check_stop() from transport_execute_tasks() should
already be checking per se_cmd context for each descriptor upon active
I/O shutdown, so no need to acquire dev->dev_status_lock again while
executing se_task submission.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Remove __transport_execute_tasks() for each processing context

This patch removes the original usage of __transport_execute_tasks() ahead
of every transport_get_cmd_from_queue() call in transport_processing_thread().
This helps reduce se_device->execute_task_lock contention between qla2xxx wq
with target_submit_cmd() for READs and transport_processing_thread()
context servicing WRITEs with full payloads for I/O submission.

It also adds a __transport_execute_tasks() to kick the task queue again
without a *se_cmd descriptor with existing queue full logic, but this may
end up not being necessary.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Remove extra se_device->execute_task_lock access in fast path

This patch makes __transport_execute_tasks() perform the addition of
tasks to dev->execute_task_list via __transport_add_tasks_from_cmd()
while holding dev->execute_task_lock during normal I/O fast path
submission.

It effectively removes the unnecessary re-acquire of dev->execute_task_lock
during transport_execute_tasks() -> transport_add_tasks_from_cmd() ahead
of calling __transport_execute_tasks() to queue tasks for the passed
*se_cmd descriptor.

(v2: Re-add goto check_depth usage for multi-task submission for now..)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Drop se_device TCQ queue_depth usage from I/O path

Historically, pSCSI devices have been the ones that required target-core
to enforce a per se_device->depth_left. This patch changes target-core
to no longer (by default) enforce a per se_device->depth_left or sleep in
transport_tcq_window_closed() when we out of queue slots for all backend
export cases.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Fix possible NULL pointer with __transport_execute_tasks

This patch makes __transport_execute_tasks() use a local *se_dev
reference to prevent direct se_cmd->se_dev access after
transport_cmd_check_stop() -> transport_add_tasks_from_cmd()
has been called, as in the current implementation we can expect
__transport_execute_tasks() may be called from another context
that may have already completed the I/O.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Remove TFO->check_release_cmd() fabric API caller

Remove the now unused target_core_fabric_ops->check_release_cmd() as
target_core handles this directly for se_cmd->cmd_kref objects now.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcm_fc: Convert ft_send_work to use target_submit_cmd

This patch converts the main ft_send_work() I/O path to use
target_submit_cmd() with a single se_cmd->cmd_kref reference
that is released via the existing ft_check_stop_free() response
path callback.

It also makes ft_send_tm() use transport_init_se_cmd() and
target_get_sess_cmd() to also use single se_cmd->cmd_kref
reference.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Add target_submit_cmd() for process context fabric submission

This patch adds a target_submit_cmd() caller that can be used by fabrics
to submit an uninitialized se_cmd descriptor to an struct se_session +
unpacked_lun from workqueue process context.  This call will invoke the
following steps:

- transport_init_se_cmd() to setup se_cmd specific pointers
- Obtain se_cmd->cmd_kref references with target_get_sess_cmd()
- set se_cmd->t_tasks_bidi
- transport_lookup_cmd_lun() to setup struct se_cmd->se_lun from
  the passed unpacked_lun
- transport_generic_allocate_tasks() to setup the passed *cdb, and
- transport_handle_cdb_direct() handle READ dispatch or WRITE
  ready-to-transfer callback to fabric

v2 changes from hch feedback:

*) Add target_sc_flags_table for target_submit_cmd flags
*) Rename bidi parameter to flags, add TARGET_SCF_BIDI_OP
*) Convert checks to BUG_ON
*) Add out_check_cond for transport_send_check_condition_and_sense
   usage

v3 changes:

*) Add TARGET_SCF_ACK_KREF for target_submit_cmd into
   target_get_sess_cmd to determine when the fabric caller is expecting
   a second kref_put() from fabric packet acknowledgement.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Make target_put_sess_cmd use target_release_cmd_kref

This patch moves target_put_sess_cmd() to use a se_cmd->cmd_kref
callback target_release_cmd_kref when performing driver release of
fabric->se_cmd descriptor memory. It sets the default cmd_kref
count value to '2' within target_get_sess_cmd() setup, and
currently assumes TFO->check_stop_free() usage.

It drops se_tfo->check_release_cmd() usage in the main
transport_release_cmd codepath.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Set response format in INQUIRY response

Current SCSI specs say that the "response format" field in the standard
INQUIRY response should be set to 2, and all the real SCSI devices I
have do put 2 here. So let's do that too.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: tcm_mod_builder: small fixups

This includes:
- remove on _ in "__NAMELEN" in $fabric _make_tport
- target_fabric_configfs_init() returns an error pointer and not NULL
anymore. Consider that.
- replace (!(var_name)) with (!var_name). The extra () are not required
- remove #ifdef MODULE. If the code is builtin it needs an init function
or the code is useless
- put exit/clean functions into __exit

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Documentation/target: Fix tcm_mod_builder.py build breakage

This patch fixes TFO->release_cmd() and removes legacy pack_lun() usage
and new_cmd_failure when generating new TCM fabric skeleton from the
tcm_mod_builder.py script.

Reported-by: Stefan Bergstrand <stefan.bergstrand@sdsab.se>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: remove overagressive ____cacheline_aligned annoations

If we want dynamically allocated objects to be cacheline aligned we need
to tell that to the slab allocator by using the proper flags and not
by liberally sprinkling annotations onto all structures.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcm_loop: bump max_sectors

There is not reason to artifically limit max_sectors in tcm_loop, set
it to UINT_MAX to allow stressing the large I/O handling in the target
core using the loopback driver. Also remove various superflous defines
hiding the values set in the host template.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target/configs: remove trailing newline from udev_path and alias

This patch strips the trailing newline from backend device udev_path and
alias attributes.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

iscsi-target: fix chap identifier simple_strtoul usage

This patch makes chap_server_compute_md5() use proper unsigned long
usage for the CHAP_I (identifier) and check for values beyond 255 as
per RFC-1994.

Reported-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: remove useless casts

A reader should spend an extra moment whenever noticing a cast,
because either something special is going on that deserves extra
attention or, as is all too often the case, the code is wrong.

These casts, afaics, have all been useless. They cast a foo* to a
foo*, cast a void* to the assigned type, cast a foo* to void*, before
assigning it to a void* variable, etc.

In a few cases I also removed an additional &...[0], which is equally
useless.

Lastly I added three FIXMEs where, to the best of my judgement, the
code appears to have a bug. It would be good if someone could check
these.

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: simplify target_check_cdb_and_preempt

- rename to target_check_cdb_and_preempt
- use non-safe list_for_each_entry
- move common check into callee (simplifying callers)

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Move core_scsi3_check_cdb_abort_and_preempt

And make it static afterwards.

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: use \n as a separator for configuration

The command
| echo rd_pages=32768 > ramdisk/control

Does not work because it writes "rd_pages=32768\n" and the parser which
matches for "rd_pages=%d" does not recognize it due to the \n. One way
of fixing this would be using "echo -n" instead.
This patch adds \n to the list of separators so we don't have to use the
-n argument which I find is more convinient.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: make the se_task task_state_active a normal bool

There is no need to make task_state_active an atomic_t given that it is
always set under the execute_task_lock so we can make it a simple bool.
Also rename it to t_state_active to be closer to the list it guards,
and make sure all checks before the list addion/removal actually happen
under execute_task_lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: remove the se_task task_error_status field

We only reach transport_complete_task once per task, so the test and set on
task_error_status is never going to have an effect.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: fold se_task.task_sense into task_flags

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: header reshuffle, part2

This reorganized the headers under include/target into:

- target_core_base.h stays as is with all target-wide data stuctures and defines
- target_core_backend.h contains the whole interface to I/O backends
- target_core_fabric.h contains the whole interface to fabric modules

Except for those only the various configfs macro headers stay around.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: reshuffle headers

Create a new headers, drivers/target/target_core_internal.h that is supposed
to hold all target_core-internal prototypes. Move all non-exported includes
from include/target to it, and merge the smaller prototype-only includes
inside drivers/target into it as well. Mark functions that were found to
not be called outside their implementation file static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Linux 3.2-rc5

Merge git://git.samba.org/sfrench/cifs-2.6

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: check for NULL last_entry before calling cifs_save_resume_key
  cifs: attempt to freeze while looping on a receive attempt
  cifs: Fix sparse warning when calling cifs_strtoUCS
  CIFS: Add descriptions to the brlock cache functions

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Calling __pa() with an ioremap()ed address is invalid
  x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
  x86/intel_mid: Kconfig select fix
  x86/intel_mid: Fix the Kconfig for MID selection

Merge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6

* 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6:
  spi/gpio: fix section mismatch warning
  spi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build
  spi/nuc900: Include linux/module.h
  spi/ath79: fix compile error due to missing include

Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
  md: raid5 crash during degradation
  md/raid5: never wait for bad-block acks on failed device.
  md: ensure new badblocks are handled promptly.
  md: bad blocks shouldn't cause a Blocked status on a Faulty device.
  md: take a reference to mddev during sysfs access.
  md: refine interpretation of "hold_active == UNTIL_IOCTL".
  md/lock: ensure updates to page_attrs are properly locked.

Merge git://git./linux/kernel/git/cmetcalf/linux-tile

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: use new generic {enable,disable}_percpu_irq() routines
  drivers/net/ethernet/tile: use skb_frag_page() API
  asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
  arch/tile: fix double-free bug in homecache_free_pages()
  arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.

Merge branch 'iommu/fixes' of git://git./linux/kernel/git/joro/iommu

* 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  MAINTAINERS: Update amd-iommu F: patterns
  iommu/amd: Fix typo in kernel-parameters.txt
  iommu/msm: Fix compile error in mach-msm/devices-iommu.c
  Fix comparison using wrong pointer variable in dma debug code

Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Fix lost speaker volume controls
  ALSA: hda/realtek - Create "Bass Speaker" for two speaker pins
  ALSA: hda/realtek - Don't create extra controls with channel suffix
  ALSA: hda - Fix remaining VREF mute-LED NID check in post-3.1 changes
  ALSA: hda - Fix GPIO LED setup for IDT 92HD75 codecs
  ASoC: Provide a more complete DMA driver stub
  ASoC: Remove references to corgi and spitz from machine driver document
  ASoC: Make SND_SOC_MX27VIS_AIC32X4 depend on I2C
  ASoC: Fix dependency for SND_SOC_RAUMFELD and SND_PXA2XX_SOC_HX4700
  ASoC: uda1380: Return proper error in uda1380_modinit failure path
  ASoC: kirkwood: Make SND_KIRKWOOD_SOC_OPENRD and SND_KIRKWOOD_SOC_T5325 depend on I2C
  ASoC: Mark WM8994 ADC muxes as virtual
  ALSA: hda/realtek - Fix Oops in alc_mux_select()
  ALSA: sis7019 - give slow codecs more time to reset

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Do no try to schedule task events if there are none
  lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()
  perf header: Use event_name() to get an event name
  perf stat: Failure with "Operation not supported"

sys_getppid: add missing rcu_dereference

In order to safely dereference current->real_parent inside an
rcu_read_lock, we need an rcu_dereference.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio/tsi721: modify PCIe capability settings

Modify initialization of PCIe capability registers in Tsi721 mport driver:
- change Completion Timeout value to avoid unexpected data transfer
aborts during intensive traffic.
- replace hardcoded offset of PCIe capability block by making it use the
common function.

This patch is applicable to kernel versions starting from 3.2-rc1.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio/tsi721: fix mailbox resource reporting

Bug fix for Tsi721 RapidIO mport driver: Tsi721 supports four RapidIO
mailboxes (MBOX0 - MBOX3) as defined by RapidIO specification. Mailbox
resources has to be properly reported to allow use of all available
mailboxes (initial version reports only MBOX0).

This patch is applicable to kernel versions staring from 3.2-rc1.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio/tsi721: switch to dma_zalloc_coherent

Replace the pair dma_alloc_coherent()+memset() with the new
dma_zalloc_coherent() added by Andrew Morton for kernel version 3.2

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

procfs: do not overflow get_{idle,iowait}_time for nohz

Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless.  We rely on get_{idle,iowait}_time functions to retrieve
proper data.

These functions, however, use usecs_to_cputime to translate micro
seconds time to cputime64_t.  This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.

When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
>3000s! until we overflow unsigned int.  Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.

This results in a bug when people saw [h]top going mad reporting 100%
CPU usage even though there was basically no CPU load.  The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.

Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
to 32b type and it is much more appropriate for cumulative time values
(unlike usecs_to_jiffies which intended for timeout calculations).

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: vmalloc: check for page allocation failure before vmlist insertion

Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
/proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
it is fully initialised.  Unfortunately, it did not check that
__vmalloc_area_node() successfully populated the area.  In the event of
allocation failure, the vmalloc area is freed but the pointer to freed
memory is inserted into the vmlist leading to a a crash later in
get_vmalloc_info().

This patch adds a check for ____vmalloc_area_node() failure within
__vmalloc_node_range.  It does not use "goto fail" as in the previous
error path as a warning was already displayed by __vmalloc_area_node()
before it called vfree in its failure path.

Credit goes to Luciano Chavez for doing all the real work of identifying
exactly where the problem was.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Tested-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.1.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks

setup_zone_migrate_reserve() expects that zone->start_pfn starts at
pageblock_nr_pages aligned pfn otherwise we could access beyond an
existing memblock resulting in the following panic if
CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:

  IP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180
  *pdpt = 0000000000000000 *pde = f000ff53f000ff53
  Oops: 0000 [#1] SMP
  Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
  EIP: 0060:[<c02d331d>] EFLAGS: 00010006 CPU: 0
  EIP is at setup_zone_migrate_reserve+0xcd/0x180
  EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000
  ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000)
  Call Trace:
  [<c02d389c>] __setup_per_zone_wmarks+0xec/0x160
  [<c02d3a1f>] setup_per_zone_wmarks+0xf/0x20
  [<c08a771c>] init_per_zone_wmark_min+0x27/0x86
  [<c020111b>] do_one_initcall+0x2b/0x160
  [<c086639d>] kernel_init+0xbe/0x157
  [<c05cae26>] kernel_thread_helper+0x6/0xd
  Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 <2b> 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f
  EIP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58
  CR2: 00000000000012b4

We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
highstart_pfn = 0x36ffe.

The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page
in a pageblock is reserved before marking it MIGRATE_RESERVE").

Make sure that start_pfn is always aligned to pageblock_nr_pages to
ensure that pfn_valid s always called at the start of each pageblock.
Architectures with holes in pageblocks will be correctly handled by
pfn_valid_within in pageblock_is_reserved.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Dang Bo <bdang@vmware.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Arve Hjnnevg <arve@android.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/migrate.c: pair unlock_page() and lock_page() when migrating huge pages

Avoid unlocking and unlocked page if we failed to lock it.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: set compound tail page _count to zero

Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
page_tail->_count zero at all times.  But the current kernel does not
set page_tail->_count to zero if a 1GB page is utilized.  So when an
IOMMU 1GB page is used by KVM, it wil result in a kernel oops because a
tail page's _count does not equal zero.

  kernel BUG at include/linux/mm.h:386!
  invalid opcode: 0000 [#1] SMP
  Call Trace:
    gup_pud_range+0xb8/0x19d
    get_user_pages_fast+0xcb/0x192
    ? trace_hardirqs_off+0xd/0xf
    hva_to_pfn+0x119/0x2f2
    gfn_to_pfn_memslot+0x2c/0x2e
    kvm_iommu_map_pages+0xfd/0x1c1
    kvm_iommu_map_memslots+0x7c/0xbd
    kvm_iommu_map_guest+0xaa/0xbf
    kvm_vm_ioctl_assigned_device+0x2ef/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
  RIP  gup_huge_pud+0xf2/0x159

Signed-off-by: Youquan Song <youquan.song@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: add compound tail page _mapcount when mapped

With the 3.2-rc kernel, IOMMU 2M pages in KVM works.  But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.

The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
page calls gup_huge_pmd.  If compound pages are used and the page is a
tail page, gup_huge_pmd() increases _mapcount to record tail page are
mapped while gup_huge_pud does not do that.

So when the mapped page is relesed, it will result in kernel oops
because the page is not marked mapped.

This patch add tail process for compound page in 1GB huge page which
keeps the same process as 2M page.

Reproduce like:
1. Add grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
-net none -device pci-assign,host=07:00.1

  kernel BUG at mm/swap.c:114!
  invalid opcode: 0000 [#1] SMP
  Call Trace:
    put_page+0x15/0x37
    kvm_release_pfn_clean+0x31/0x36
    kvm_iommu_put_pages+0x94/0xb1
    kvm_iommu_unmap_memslots+0x80/0xb6
    kvm_assign_device+0xba/0x117
    kvm_vm_ioctl_assigned_device+0x301/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
  RIP  put_compound_page+0xd4/0x168

Signed-off-by: Youquan Song <youquan.song@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

printk: avoid double lock acquire

Commit 4f2a8d3cf5e ("printk: Fix console_sem vs logbuf_lock unlock race")
introduced another silly bug where we would want to acquire an already
held lock. Avoid this.

Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: update maintainers

More players joined to memory cgroup developments and Johannes' great work
changed internal design of memory cgroup dramatically.  And he will do
more works.  Michal Hokko did many bug fixes and know memory cgroup very
well.  Daisuke Nishimura helped us very much but he seems busy now.
Thanks to his works.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-s3c.c: fix driver clock enable/disable balance issues

If an error occurs after the clock is enabled, the enable/disable state
can become unbalanced.

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

CREDITS: update Kees's expired fingerprint and fix details

Small clean-up for my CREDITS entry; the GPG fingerprint was not up to
date, so I fixed other details at the same time too.

Signed-off-by: Kees Cook <kees@outflux.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thp: reduce khugepaged freezing latency

khugepaged can sometimes cause suspend to fail, requiring that the user
retry the suspend operation.

Use wait_event_freezable_timeout() instead of
schedule_timeout_interruptible() to avoid missing freezer wakeups. A
try_to_freeze() would have been needed in the khugepaged_alloc_hugepage
tight loop too in case of the allocation failing repeatedly, and
wait_event_freezable_timeout will provide it too.

khugepaged would still freeze just fine by trying again the next minute
but it's better if it freezes immediately.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Jiri Slaby <jslaby@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: "Rafael J. Wysocki" <rjw@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/proc/meminfo.c: fix compilation error

Fix the error message "directives may not be used inside a macro argument"
which appears when the kernel is compiled for the cris architecture.

Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vmscan: use atomic-long for shrinker batching

Use atomic-long operations instead of looping around cmpxchg().

[akpm@linux-foundation.org: massage atomic.h inclusions]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vmscan: fix initial shrinker size handling

A shrinker function can return -1, means that it cannot do anything
without a risk of deadlock.  For example prune_super() does this if it
cannot grab a superblock refrence, even if nr_to_scan=0.  Currently we
interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan'
according to this.  So the next time around this shrinker can cause
really big pressure.  Let's skip such shrinkers instead.

Also make total_scan signed, otherwise the check (total_scan < 0) below
never works.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: Update amd-iommu F: patterns

Commit 29b68415e335 ("x86: amd_iommu: move to drivers/iommu/")
moved the files, update the patterns.

CC: Ohad Ben-Cohen <ohad@wizery.com>
CC: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

x86, efi: Calling __pa() with an ioremap()ed address is invalid

If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
in ->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.

On CONFIG_X86_32 this is invalid, resulting in the following
oops on some machines:

  BUG: unable to handle kernel paging request at f7f22280
  IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
  [...]

  Call Trace:
   [<c104f8ca>] ? page_is_ram+0x1a/0x40
   [<c1025aff>] reserve_memtype+0xdf/0x2f0
   [<c1024dc9>] set_memory_uc+0x49/0xa0
   [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa
   [<c19216d4>] start_kernel+0x291/0x2f2
   [<c19211c7>] ? loglevel+0x1b/0x1b
   [<c19210bf>] i386_start_kernel+0xbf/0xc8

A better approach to this problem is to map the memory region
with the correct attributes from the start, instead of modifying
it after the fact. The uncached case can be handled by
ioremap_nocache() and the cached by ioremap_cache().

Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
don't like being mapped into the vmalloc space, as detailed in
the following bug report,

https://bugzilla.redhat.com/show_bug.cgi?id=748516

Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
regions are covered by the direct kernel mapping table on
CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
regions via the direct kernel mapping with the initial call to
init_memory_mapping() in setup_arch(), whereas previously these
regions wouldn't be mapped if they were after the last E820_RAM
region until efi_ioremap() was called. Doing it this way allows
us to delete efi_ioremap() completely.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cifs: check for NULL last_entry before calling cifs_save_resume_key

Prior to commit eaf35b1, cifs_save_resume_key had some NULL pointer
checks at the top. It turns out that at least one of those NULL
pointer checks is needed after all.

When the LastNameOffset in a FIND reply appears to be beyond the end of
the buffer, CIFSFindFirst and CIFSFindNext will set srch_inf.last_entry
to NULL. Since eaf35b1, the code will now oops in this situation.

Fix this by having the callers check for a NULL last entry pointer
before calling cifs_save_resume_key. No change is needed for the
call site in cifs_readdir as it's not reachable with a NULL
current_entry pointer.

This should fix:

https://bugzilla.redhat.com/show_bug.cgi?id=750247

Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Reported-by: Adam G. Metzler <adamgmetzler@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>

cifs: attempt to freeze while looping on a receive attempt

In the recent overhaul of the demultiplex thread receive path, I
neglected to ensure that we attempt to freeze on each pass through the
receive loop.

Reported-and-Tested-by: Woody Suwalski <terraluna977@gmail.com>
Reported-and-Tested-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>

cifs: Fix sparse warning when calling cifs_strtoUCS

Fix sparse endian check warning while calling cifs_strtoUCS

CHECK   fs/cifs/smbencrypt.c
fs/cifs/smbencrypt.c:216:37: warning: incorrect type in argument 1
(different base types)
fs/cifs/smbencrypt.c:216:37:    expected restricted __le16 [usertype] *<noident>
fs/cifs/smbencrypt.c:216:37:    got unsigned short *<noident>

Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com

CIFS: Add descriptions to the brlock cache functions

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>

md: raid5 crash during degradation

NULL pointer access causes crash in raid5 module.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimers: Fix time comparison
ptp: Fix clock_getres() implementation

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: drop spin lock when memory alloc fails
  Btrfs: check if the to-be-added device is writable
  Btrfs: try cluster but don't advance in search list
  Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE

Merge branch 'fixes' of git://git./linux/kernel/git/arm/arm-soc

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
  ARM: sa1100: fix build error
  ARM: OMAP1: recalculate loops per jiffy after dpll1 reprogram
  ARM: davinci: dm365 evm: align nand partition table to u-boot
  ARM: davinci: da850 evm: change audio edma event queue to EVENTQ_0
  ARM: davinci: dm646x evm: wrong register used in setup_vpif_input_channel_mode
  ARM: davinci: dm646x does not have a DSP domain
  ARM: davinci: psc: fix incorrect offsets
  ARM: davinci: psc: fix incorrect mask
  ARM: mx28: LRADC macro rename
  arm: mx23: recognise stmp378x as mx23
  ARM: mxs: fix machines' initializers order
  ARM: mxs/tx28: add __initconst for fec pdata
  ARM: S3C64XX: Staticise s3c6400_sysclass
  ARM: S3C64XX: Add linux/export.h to dev-spi.c
  ARM: S3C64XX: Remove extern from definition of framebuffer setup call
  MAINTAINERS: Extend Samsung patterns to cover SPI and ASoC drivers
  MAINTAINERS: Add linux-samsung-soc mailing list for Samsung
  MAINTAINERS: Consolidate Samsung MAINTAINERS
  ARM: CSR: PM: fix build error due to undeclared 'THIS_MODULE'
  ARM: CSR: fix build error due to new mdesc->dma_zone_size
  ...

TOMOYO: Fix pathname handling of disconnected paths.

Current tomoyo_realpath_from_path() implementation returns strange pathname
when calculating pathname of a file which belongs to lazy unmounted tree.
Use local pathname rather than strange absolute pathname in that case.

Also, this patch fixes a regression by commit 02125a82 "fix apparmor
dereferencing potentially freed dentry, sanitize __d_path() API".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked

When HPET is operating in RTC mode, the TN_ENABLE bit on timer1
controls whether the HPET or the RTC delivers interrupts to irq8. When
the system goes into suspend, the RTC driver sends a signal to the
HPET driver so that the HPET releases control of irq8, allowing the
RTC to wake the system from suspend. The switchover is accomplished by
a write to the HPET configuration registers which currently only
occurs while servicing the HPET interrupt.

On some systems, I have seen the system suspend before an HPET
interrupt occurs, preventing the write to the HPET configuration
register and leaving the HPET in control of the irq8. As the HPET is
not active during suspend, it does not generate a wake signal and RTC
alarms do not work.

This patch forces the HPET driver to immediately transfer control of
the irq8 channel to the RTC instead of waiting until the next
interrupt event.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.com
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org

Merge branch 'fixes' of git://github.com/hzhuang1/linux into fixes

Btrfs: drop spin lock when memory alloc fails

Drop spin lock in convert_extent_bit() when memory alloc fails,
otherwise, it will be a deadlock.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: check if the to-be-added device is writable

If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding
a readonly device to a btrfs filesystem, and btrfs will write to
that device, emitting kernel errors:

[ 3109.833692] lost page write due to I/O error on loop2
[ 3109.833720] lost page write due to I/O error on loop2
...

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: try cluster but don't advance in search list

When we find an existing cluster, we switch to its block group as the
current block group, possibly skipping multiple blocks in the process.
Furthermore, under heavy contention, multiple threads may fail to
allocate from a cluster and then release just-created clusters just to
proceed to create new ones in a different block group.

This patch tries to allocate from an existing cluster regardless of its
block group, and doesn't switch to that group, instead proceeding to
try to allocate a cluster from the group it was iterating before the
attempt.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ARM: sa1100: fix build error

arm-eabi-4.4.3-ld:--defsym zreladdr=: syntax error
make[2]: *** [arch/arm/boot/compressed/vmlinux] Error 1
make[1]: *** [arch/arm/boot/compressed/vmlinux] Error 2
make: *** [uImage] Error 2

Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Jett.Zhou <jtzhou@marvell.com>

md/raid5: never wait for bad-block acks on failed device.

Once a device is failed we really want to completely ignore it.
It should go away soon anyway.

In particular the presence of bad blocks on it should not cause us to
block as we won't be trying to write there anyway.

So as soon as we can check if a device is Faulty, do so and pretend
that it is already gone if it is Faulty.

Signed-off-by: NeilBrown <neilb@suse.de>

md: ensure new badblocks are handled promptly.

When we mark blocks as bad we need them to be acknowledged by the
metadata handler promptly.

For an in-kernel metadata handler that was already being done. But
for an external metadata handler we need to alert it of the change by
sending a notification through the sysfs file. This adds that
notification.

Signed-off-by: NeilBrown <neilb@suse.de>