If the second allocation failed, the first structure allocated in this
routine was not freed.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The sender requests an ACK every 1/2 MB to avoid retransmit timeouts that
were causing MVAPICH mod_bw to fail after a predictable number of sends.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix the description of iSER in Kconfig. It is not accurate.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
iSER uses the DMA mapping api to map the page holding the
SCSI command data to the HCA DMA address space. When the
command data is not aligned for RDMA, the data is copied
to/from an allocated buffer which in turn is used for
executing this command. The pages associated with the
command must be unmapped before being touched.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
iSER uses a data transaction object (struct iser_dto) as part
of its IB data descriptors (struct iser_desc) management.
It also uses a hierarchy of connection structures pointing to
each other. A DTO may exist even after the iscsi_iser connection
pointed by it is destroyed (eg one that is bound to a post
receive buffer which was flushed by the IB HW). Hence DTOs need
point to the lowest connection, which is struct iser_conn.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the allocation of mr fails, then c2_reg_phys_mr() leaks the
page_list array it allocated earlier.
This was Coverity CID #1413.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Another NULL dereference spotted by the Coverity checker (cid #1395):
In case we can't alloc the vq_req, we goto bail1, where we call
vq_req_free(c2dev, vq_req); which then dereferences vq_req.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make sure all 64-bit quantities are cast to unsigned long long
when printed with "%ll" printk formats.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).
This patch:
The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.
[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:
(void)kmem_cache_destroy(cache);
* Those who check it printk something, however, slab_error already printed
the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
low-level filesystem driver should make. Converted to ignore.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
0x08 is the HT capability, while PCI_CAP_ID_HT_IRQCONF would be
the subtype 0x80 that mpic_scan_ht_pic() uses.
Rename PCI_CAP_ID_HT_IRQCONF into PCI_CAP_ID_HT.
And by the way, use it in the ipath driver instead of defining its
own HT_CAPABILITY_ID.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
indirect chains of includes are arch-specific and can't
be relied upon... (hell, even attempt to build it for
itanic would trigger vmalloc.h ones; err.h triggers
on e.g. alpha).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IPoIB doesn't use anything from <linux/vmalloc.h>, so don't include it.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When ipoib_ib_dev_flush() is called because of a port event, the
driver needs to rejoin all multicast groups, since the flush will call
ipoib_mcast_dev_flush() (via ipoib_ib_dev_down()). Otherwise no
(non-broadcast) multicast groups will be rejoined until the networking
core calls ->set_multicast_list again, and so multicast reception will
be broken for potentially a long time.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says:
If the IB multicast group does not already exist, one must be
created first with the IPoIB link MTU. The MGID MUST use the same
P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
broadcast-GID. The rest of attributes SHOULD follow the values used
in the broadcast-GID as well.
However, the current IPoIB driver is only setting the attributes
required by the InfiniBand spec to create a multicast group, so in
particular the MTU and HopLimit are not being set. Add these
attributes when creating MCGs, and also set the Rate attribute, since
IPoIB pays attention to that attribute as well.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a QP has separate send and receive CQs, then the send CQ will never
have receive completions from that QP in it. So when cleaning the
send CQ, there's no need to pass in an SRQ pointer, even if the QP is
attached to an SRQ.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Trigger device remove and then add when a catastrophic error is
detected in hardware. This, in turn, will cause a device reset, which
we hope will recover from the catastrophic condition.
Since this might interefere with debugging the root cause, add a
module option to suppress this behaviour.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Do not track remote QPN in TimeWait state, since QP is not connected.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Require users to register with SA module, to prevent the sa_query
module text from going away while an SA query callback is still
running. Update all in-tree users for the new interface.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Split up ipoib_ib_handle_wc() into ipoib_ib_handle_rx_wc() and
ipoib_ib_handle_tx_wc() to make the code easier to read. This will
also help implement NAPI in the future.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fast Memory Registration (fmr) is used to register for rdma an sg whose
elements are not linearly sequential after dma mapping.
The IB verbs layer provides an "all dma memory MR (memory region)" which
can be used for RDMA-ing a dma linearly sequential buffer.
Change the code to use the dma mr instead of doing fmr when dma mapping
produces a single dma entry sg.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
fix and add some debug prints related to iser
handling of memory for rdma.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As iser is able to use at most one rdma operation for the
execution of a scsi command, and registration of the sg
associated with scsi command has its restrictions, the code
checks if an sg is "aligned for rdma".
Alignment for rdma is measured in "fmr page" units whose
possible resolutions are different between HCAs and can be
smaller, equal or bigger to the system page size.
When the system page size is bigger than 4KB (eg the default
with ia64 kernels) there a bigger chance that an sg would be
aligned for rdma if the fmr page size is 4KB.
Change the code to create FMR whose pages are of size 4KB
and to take that into account when processing the sg.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently, the data length of a command coming down from scsi-ml
is limited only by the size of its sg list (sg_tablesize). The
max data length may be different for different page size values.
By setting max_sectors, we limit the data length to
max_sectors*512 bytes.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
dma mapping may include a "compaction" of the sg associated with scsi command.
Hence, the size of the maximal prefix of the SG which is aligned for rdma must be
compared against the length of the dma mapped sg (mem->dma_nents) and not against
the size of it before it was mapped (mem->size).
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Closes a window where address resolution can attach an rdma_cm_id to a
device during destruction of the rdma_cm_id. This can result in the
rdma_cm_id remaining in the device list after its memory has been
freed.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a driver for the Ammasso 1100 gigabit ethernet RNIC.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
- Hook iWARP CM into the build system and use it in rdma_cm.
- Convert enum ib_node_type to enum rdma_node_type, which includes
the possibility of RDMA_NODE_RNIC, and update everything for this.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an iWARP Connection Manager (CM), which abstracts connection
management for iWARP devices (RNICs). It is a logical instance of the
xx_cm where xx is the transport type (ib or iw). The symbols exported
are used by the transport independent rdma_cm module, and are
available also for transport dependent ULPs.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all. Also fix a few other whitespace
bogosities.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Randomize the starting local comm ID to avoid getting a rejected
connection due to a stale connection after a system reboot or
reloading of the ib_cm.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The ib_mad module does not use a kthread function, but mad_priv.h
includes <linux/kthread.h>. mad_rmpp.c does not do any DMA-related
stuff, but includes <linux/dma-mapping.h>. Remove the unused includes.
Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The implementation assumes that any RMPP request that requires a
response uses DS RMPP. Based on the RMPP start-up scenarios defined
by the spec, this should be a valid assumption. That is, there is no
start-up scenario defined where an RMPP request is followed by a
non-RMPP response. By having this assumption we avoid any API
changes.
In order for a node that supports DS RMPP to communicate with one that
does not, RMPP responses assume a new window size of 1 if a DS ACK has
not been received. (By DS ACK, I'm referring to the turn-around ACK
after the final ACK of the request.) This is a slight spec deviation,
but is necessary to allow communication with nodes that do not
generate the DS ACK. It also handles the case when a response is sent
after the request state has been discarded.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set the reject code properly when rejecting a request that contains an
invalid GID. A suitable GID is returned by the IB CM in the
additional reject information (ARI). This is a spec compliancy issue.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Enable atomic operations along with RDMA reads if a local RDMA
read/atomic depth is provided by the user.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Incorrect number of bits was taken for static_rate field.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
port_num was not being returned for unconnected QPs.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When default static rate is returned for Tavor, need to translate it
to an ib rate value.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This stops the generic poll code from waiting for a timeout.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>