Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use only statically allocated data for PHY config packet transmission.
With the previous incarnation, some data wouldn't be freed if the packet
transmit callback was never called.
A theoretical drawback now is that, in PCs with more than one card,
card A may complete() for a waiter on card B. But this is highly
unlikely and its impact not serious. Bus manager B may reset bus B
before the PHY config went out, but the next phy config on B should be
fine. However, with a timeout of 100ms, this situation is close to
impossible.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Isochronous reception in dualbuffer mode is reportedly broken with
TI TSB43AB22A on x86-64. Descriptor addresses above 2G have been
determined as the trigger:
https://bugzilla.redhat.com/show_bug.cgi?id=435550
Two fixes are possible:
- pci_set_consistent_dma_mask(pdev, DMA_31BIT_MASK);
at least when IR descriptors are allocated, or
- simply don't use dualbuffer.
This fix implements the latter workaround.
But we keep using dualbuffer on x86-32 which won't give us highmen (and
thus physical addresses outside the 31bit range) in coherent DMA memory
allocations. Right now we could for example also whitelist PPC32, but
DMA mapping implementation details are expected to change there.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
There will be 4 padding bytes in struct fw_cdev_event_response on some platforms
The member:__u32 data will point to these padding bytes. While queue the
response and data in complete_transaction in fw-cdev.c, it will queue like this:
|response(excluding padding bytes)|4 padding bytes|4 padding bytes|data.
It queue 4 extra bytes. That is to say it use "&response + sizeof(response)"
while other place of kernel and userspace library use "&response + offsetof
(typeof(response), data)". So it will lost the last 4 bytes of data. This patch
can fix it while not changing the struct definition.
Signed-off-by: JiSheng Zhang <jszhang3@mail.ustc.edu.cn>
This fixes responses to outbound block read requests on 64bit architectures.
Tested on i686, x86-64, and x86-64 with i686 userland, using firecontrol and
gscanbus.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
* 'sbp2-spindown' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
ieee1394: sbp2: spin disks down on suspend and shutdown
firewire: fw-sbp2: spin disks down on suspend and shutdown
ieee1394: sbp2: fix spindown for PL-3507 and TSB42AA9 firmwares
firewire: fw-sbp2: fix spindown for PL-3507 and TSB42AA9 firmwares
scsi: sd: optionally set power condition in START STOP UNIT
After card->done and card->work are completed, any remaining pending
request would be a bug. We cannot safely complete a transaction at
that point anymore.
IOW card users must not drop their last fw_card reference (usually
indirect references through fw_device references) before their last
outbound transaction through that card was finished.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
- better name for a function argument
- removal of a local variable which became unnecessary after
"fully initialize fw_transaction before marking it pending"
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
In theory, card->flush_timer could already access a transaction between
fw_send_request()'s spin_unlock_irqrestore and the rest of what happens
in fw_send_request(). This would happen if the process which sends the
request is preempted and put to sleep right after spin_unlock_irqrestore
for longer than 100ms.
Therefore we fill in everything in struct fw_transaction at which the
flush_timer might look at before we lift the lock.
To do: Ensure that the timer does not pick up the transaction before
the time of the AT request event plus split transaction timeout.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Reported by Jay Fenlason: A bus reset tasklet may call
fw_flush_transactions and touch transactions (call their callback which
will free them) while the context which submitted the transaction is
still inserting it into the transmission queue.
A simple solution to this problem is to _not_ "flush" the transactions
because of a bus reset (complete the transcations as 'cancelled'). They
will now simply time out (completed as 'cancelled' by the split-timeout
timer).
Jay Fenlason thought of this fix too but I was quicker to type it out.
:-)
Background:
Contexts which access an instance of struct fw_transaction are:
1. the submitter, until it inserted the packet which is embedded in the
transaction into the AT req DMA,
2. the AsReqTrContext tasklet when the request packet was acked by the
responder node or transmission to the responder failed,
3. the AsRspRcvContext tasklet when it found a request which matched
an incoming response,
4. the card->flush_timer when it picks up timed-out transactions to
cancel them,
5. the bus reset tasklet when it cancels transactions (this access is
eliminated by this patch),
6. a process which shuts down an fw_card (unregisters it from fw-core
when the controller is unbound from fw-ohci) --- although in this
case there shouldn't really be any transactions anymore because we
wait until all card users finished their business with the card.
All of these contexts run concurrently (except for the 6th, presumably).
The 1st is safe against the 2nd and 3rd because of the way how a request
packet is carefully submitted to the hardware. A race between 2nd and
3rd has been fixed a while ago (bug 9617). The 4th is almost safe
against 1st, 2nd, 3rd; there are issues with it if huge scheduling
latencies occur, to be fixed separately. The 5th looks safe against
2nd, 3rd, and 4th but is unsafe against 1st. Maybe this could be fixed
with an explicit state variable in struct fw_transaction. But this
would require fw_transaction to be rewritten as only dynamically
allocatable object with reference counting --- not a good solution if we
also can simply kill this 5th accessing context (replace it by the 4th).
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Contrary to a comment in the source, request->ack of a broadcast write
request can be ACK_PENDING. Hence the existing check is insufficient.
Debug dmesg before:
AR spd 0 tl 00, ffc0 -> ffff, ack_pending , QW req, fffff0000234 = ffffffff
AT spd 0 tl 00, ffff -> ffc0, ack_complete, W resp
And the requesting node (linux1394) reports an unsolicited response.
Debug dmesg after:
AR spd 0 tl 00, ffc0 -> ffff, ack_pending , QW req, fffff0000234 = ffffffff
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This is a functionally equivalent replacement of the current reference
counting of struct fw_card instances. It only converts it to common
idioms as suggested by Kristian Høgsberg:
- struct kref replaces atomic_t as the counter.
- wait_for_completion is used to wait for all card users to complete.
BTW, it may make sense to count card->flush_timer and card->work as
card users too.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This instructs sd_mod to send START STOP UNIT on suspend and resume,
and on driver unbinding or unloading (including when the system is shut
down).
We don't do this though if multiple initiators may log in to the target.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Tested-by: Tino Keitel <tino.keitel@gmx.de>
Reported by Tino Keitel: PL-3507 with firmware from Prolific does not
spin down the disk on START STOP UNIT with power condition = 0 and start
= 0. It does however work with power condition = 2 or 3.
Also found while investigating this: DViCO Momobay CX-1 and FX-3A (TI
TSB42AA9/A based) become unresponsive after START STOP UNIT with power
condition = 0 and start = 0. They stay responsive if power condition is
set when stopping the motor.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Tested-by: Tino Keitel <tino.keitel@gmx.de>
There is a small off-by-one bug in firewire-sbp2. This causes problems
when a device exports multiple LUN Directories. I found it when trying
to talk to a SONY DVD Jukebox.
Signed-off-by: Richard Sharpe <realrichardsharpe@gmail.com>
Acked-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (op. order, changelog)
Emphasize the recommendation to build only one stack.
Trim the prompts to better fit into short attention spans.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
If the low-level driver failed to initialize a card properly without
noticing it, fw-core was blocked indefinitely when trying to send a
PHY config packet. This hung up the events kernel thread, e.g. locked
up keyboard input.
https://bugzilla.redhat.com/show_bug.cgi?id=444694https://bugzilla.redhat.com/show_bug.cgi?id=446763
This problem was introduced between 2.6.25 and 2.6.26-rc1 by commit
2a0a259049 "firewire: wait until PHY
configuration packet was transmitted (fix bus reset loop)".
The solution is to wait with timeout. I tested it with 7 different
working controllers and 1 non-working controller. On the working ones,
the packet callback complete()s usually --- but not always --- before a
timeout of 10ms. Hence I chose a safer timeout of 100ms.
On the few tests with the non-working controller ALi M5271, PHY config
packet transmission always timed out so far. (Fw-ohci needs to be fixed
for this controller independently of this deadline fix. Often the core
doesn't even attempt to send a phy config because not even self ID
reception works.)
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
The messages which can be enabled by fw-ohci's debug module parameter
are changed from KERN_DEBUG to KERN_NOTICE level and uniformly prefixed
with "firewire_ohci: ". This further simplifies communication with
users when we ask them to capture debug messages.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Callers of fill_bus_reset_event() have to take card->lock. Otherwise
access to node data may oops if node removal is in progress.
A lockless alternative would be
- event->local_node_id = card->local_node->node_id;
+ tmp = fw_node_get(card->local_node);
+ event->local_node_id = tmp->node_id;
+ fw_node_put(tmp);
and ditto with the other node pointers which fill_bus_reset_event()
accesses. But I went the locked route because one of the two callers
already holds the lock. As a bonus, we don't need the memory barrier
anymore because device->generation and device->node_id are written in
a card->lock protected section.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
OHCI 1.1 clause 5.10 requires that selfIDBufferPtr is valid when a 1 is
written into LinkControl.rcvSelfID.
This driver bug has so far not been known to cause harm because most
chips obviously accept a later selfIDBufferPtr write, at least before
HCControl.linkEnable is written.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
We want the rcvPhyPkt bit in LinkControl off before we start using the
chip. However, the spec says that the reset value of it is undefined.
Hence switch it explicitly off.
https://bugzilla.redhat.com/show_bug.cgi?id=244576#c48 shows that for
example the nForce2 integrated FireWire controller seems to have it on
by default.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
header_length and payload_length are filled with random data if an
unknown tcode was read from the AR buffer (i.e. if the AR buffer
contained invalid data).
We still need a better strategy to recover from this, but at least
handle_ar_packet now doesn't return out of bound buffer addresses
anymore.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
BUG() at this place is wrong. (Unless if the low level driver would
already do higher-level input validation of incoming request headers.)
Invalid incoming requests or bugs in the controller which corrupt the
AR-req buffer needlessly crashed the box because this is run in tasklet
context.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
If userspace ignores the POLLERR bit from poll(), and only attempts to
read() the device when POLLIN is set, it can still make ioctl() calls on
a device that has been removed from the system. The node_id and
generation returned by GET_INFO will be outdated, but INITIATE_BUS_RESET
would still cause a bus reset, and GET_CYCLE_TIMER will return data.
And if you guess the correct generation to use, you can send requests to
a different device on the bus, and get responses back.
This patch prevents open, ioctl, compat_ioctl, and mmap against shutdown
devices.
Signed-off-by: Jay Fenlason <fenlason@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6:
[SCSI] aic94xx: fix section mismatch
[SCSI] u14-34f: Fix 32bit only problem
[SCSI] dpt_i2o: sysfs code
[SCSI] dpt_i2o: 64 bit support
[SCSI] dpt_i2o: move from virt_to_bus/bus_to_virt to dma_alloc_coherent
[SCSI] dpt_i2o: use standard __init / __exit code
[SCSI] megaraid_sas: fix suspend/resume sections
[SCSI] aacraid: Add Power Management support
[SCSI] aacraid: Fix jbod operations scan issues
[SCSI] aacraid: Fix warning about macro side-effects
[SCSI] add support for variable length extended commands
[SCSI] Let scsi_cmnd->cmnd use request->cmd buffer
[SCSI] bsg: add large command support
[SCSI] aacraid: Fix down_interruptible() to check the return value correctly
[SCSI] megaraid_sas; Update the Version and Changelog
[SCSI] ibmvscsi: Handle non SCSI error status
[SCSI] bug fix for free list handling
[SCSI] ipr: Rename ipr's state scsi host attribute to prevent collisions
[SCSI] megaraid_mbox: fix Dell CERC firmware problem
- struct scsi_cmnd had a 16 bytes command buffer of its own.
This is an unnecessary duplication and copy of request's
cmd. It is probably left overs from the time that scsi_cmnd
could function without a request attached. So clean that up.
- Once above is done, few places, apart from scsi-ml, needed
adjustments due to changing the data type of scsi_cmnd->cmnd.
- Lots of drivers still use MAX_COMMAND_SIZE. So I have left
that #define but equate it to BLK_MAX_CDB. The way I see it
and is reflected in the patch below is.
MAX_COMMAND_SIZE - means: The longest fixed-length (*) SCSI CDB
as per the SCSI standard and is not related
to the implementation.
BLK_MAX_CDB. - The allocated space at the request level
- I have audit all ISA drivers and made sure none use ->cmnd in a DMA
Operation. Same audit was done by Andi Kleen.
(*)fixed-length here means commands that their size can be determined
by their opcode and the CDB does not carry a length specifier, (unlike
the VARIABLE_LENGTH_CMD(0x7f) command). This is actually not exactly
true and the SCSI standard also defines extended commands and
vendor specific commands that can be bigger than 16 bytes. The kernel
will support these using the same infrastructure used for VARLEN CDB's.
So in effect MAX_COMMAND_SIZE means the maximum size command
scsi-ml supports without specifying a cmd_len by ULD's
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: fw-sbp2: log scsi_target ID at release
ieee1394: fix NULL pointer dereference in sysfs access
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Fix: The fact that nodes had different gap counts would be overlooked
if the bus manager code would pick gap count 63 because of beta
repeaters or because of very large hop counts. In this case, the bus
manager code would miss that it actually has to send the PHY config
packet with gap count 63.
Related trivial changes: Use bool for an int used as bool, touch up
some comments.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
We now exit fw_send_phy_config /after/ the PHY config packet has been
transmitted, instead of before. A subsequent fw_core_initiate_bus_reset
will therefore not overlap with the transmission. This is meant to make
the send PHY config packet + reset bus routine more deterministic.
Fixes bus reset loop and eventual panic with
- VIA VT6307 + IOGEAR hub + Unibrain Fire-i camera
http://bugzilla.kernel.org/show_bug.cgi?id=10128
- JMicron card
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Trivial change to replace more meaningless (to the untrained eye) hex
values with defined CSR constants.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
When a device changes its configuration ROM, it announces this with a
bus reset. firewire-core has to check which node initiated a bus reset
and whether any unit directories went away or were added on this node.
Tested with an IOI FWB-IDE01AB which has its link-on bit set if bus
power is available but does not respond to ROM read requests if self
power is off. This implements
- recognition of the units if self power is switched on after fw-core
gave up the initial attempt to read the config ROM,
- shutdown of the units when self power is switched off.
Also tested with a second PC running Linux/ieee1394. When the eth1394
driver is inserted and removed on that node, fw-core now notices the
addition and removal of the IPv4 unit on the ieee1394 node.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
read_bus_info_block() is repeatedly called by workqueue jobs.
These will step on each others toes eventually if there are multiple
workqueue threads, and we end up with corrupt config ROM images.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Unlike the ohci1394 driver, fw-ohci uses the selfIDGeneration field of
bus reset packets to determine the generation of incoming requests as
per OHCI 1.1 clause 8.4.2.3. This is more precise --- provided that the
controller inserts the correct generation. Texas Instruments chips
often don't.
This prevented the transmission of response packets, which for example
broke AV/C transactions as used when communicating with miniDV cameras
and any other AV/C devices.
There is apparently no way to detect and adjust incorrect generations.
Therefore we ignore the generation of bus reset packets from TI chips
and use the generation of the self ID buffer instead. Alas this is
received at a slightly wrong time. In rare cases, this could cause us
to not respond to legitimate requests or to respond to expired requests.
(The latter is less likely because the bus reset packet AR event is
typically handled before the self ID complete event.)
Bug reported by Mladen Kuntner, who was extraordinarily patient while
dealing with the driver maintainers. Fix confirmed to be required and
effective for TSB82AA2 and a TSB43AB22 or TSB43AB22A.
https://bugzilla.redhat.com/show_bug.cgi?id=243081
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Extend the logging of "AR evt_bus_reset, link internal" to "AR
evt_bus_reset, generation ${selfIDGeneration}". That way we can check
whether this generation matches the one seen in self ID complete event
logging. See OHCI 1.1 clause 8.4.2.3.
Also extend logging of "firewire_ohci: * selfIDs, generation *" by
"local node ID ffc*" in self ID logging to make the local node in AT/AR
event logs more obvious.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Add a debug option to watch bus reset interrupt events. Half of this
patch is taken from Jarod Wilson's first version of the JMicron fix.
BusReset interrupts are only generated if the respective module
parameter flag was set before the controller is being initialized.
Else we keep this event masked to reduce IRQ load in normal operation
and to avoid potential problems with buggy chips.
Note, this is unlike the other IRQ events whose logging can be enabled
any time after chip initialization. This and the influence on what
interrupts the chip generates is why I added an extra flag for it.
Also, reorder the debug parameter flags according to their perceived
usefulness.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
I finally tracked down the issues with this JMicron PCI-e card in my
possession to a failure to comply with section 7.2.3.2 of the OHCI 1.1
specification (thanks to Kristian for the pointer to illustrate that it
is indeed a flaw in this card, not the driver). The controller should
simply flush the packets we've appended to its AT queue if a bus reset
occurs before they've been transmitted and we'll try again, but
something goes wrong and the controller winds up hung.
However, we can avoid the problem by simply checking if the
IntEvent.busReset register had been set before we try appending to the
AT context. When busReset is set, the AT context is completely halted
until busReset is cleared, so there's no point in appending AT packets
until the register is cleared. So at_context_queue_packet() now checks
for busReset being set, and bails with an RCODE_GENERATION packet ack,
which results in us trying to append the packet again after recognizing
the fact there has been a bus reset, and clearing busReset.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
While trying to debug this piece of crap JMicron PCI-e controller in my
possession, one thought was that perhaps I was encountering register access
failures. I'm not, but logging them would be good, so we can see if they
are a real problem we should be taking into account anywhere in the code.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (added list contact)
I've now witnessed multiple occasions where one of my controllers (a very
poorly working JMicron PCIe card) fails to get its registers properly set
up in ohci_enable(), apparently due to an occasionally very slow to
initiate SClk. The easy fix for this problem is to add a tiny while loop
to try again a time or three after initially enabling LPS before we
move on (or give up).
Of course, the card still isn't fully functional yet, but this gets it at
least one tiny step closer...
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This adds debug printks for asynchronous transmission and reception and
for self ID reception. They can be enabled at module load time, and at
runtime via /sys/module/firewire_ohci/parameters/debug.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Also added: Logging of interrupt event codes and of cancelled AT
packets.
The code now depends on a Kconfig variable. This makes it easier to
build firewire-ohci without the feature or to make it an option in the
future. The variable is currently hidden and always on.
This feature inflates firewire-ohci.ko by 7 kB = 27% on x86-64 and by
4 kB = 23% on i686.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
fw_core_handle_bus_reset() incorrectly relied on the assumption that
self_id_count > 0.
We check early in fw-ohci and discard the self ID complete event if
self_id_count == 0 because a valid event always has at least one self ID
packet in it (the one of the local node). Hence treat self_id_count ==
0 like any other kind of invalid self ID buffer.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Discard self ID buffer contents if
- the selfIDError flag is set,
- any of the self ID packets has bit errors.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
The platform feature calls in the suspend method switched off cable
power, but the calls in the resume method did not switch it back on.
Add the necessary feature call to .resume. Also add the corresponding
call to .suspend to make .suspend's behavior explicitly the same on all
PMacs.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This way firewire-ohci can be used for remote debugging like ohci1394.
Version with amendment from Fri, 11 Apr 2008 00:08:08 +0200.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Bernhard Kaindl <bk@suse.de>
Try to write dual-phase retry protocol limits to BUSY_TIMEOUT register.
- The dual-phase retry protocol is optional to implement, and if not
supported, writes to the dual-phase portion of the register will be
ignored. We try to write the original 1394-1995 default here.
- In the case of devices that are also SBP-3-compliant, all writes are
ignored, as the register is read-only, but contains single-phase retry of
15, which is what we're trying to set for all SBP-2 device anyway, so this
write attempt is safe and yields more consistent behavior for all devices.
See section 8.3.2.3.5 of the 1394-1995 spec, section 6.2 of the SBP-2 spec,
and section 6.4 of the SBP-3 spec for further details.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Write directly in big endian instead of byte-swapping after the fact.
This saves a few conversions, lets gcc use constant endianess
conversions where possible, and enables deeper endianess annotation.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Add wrappers for getting and putting a unit.
Remove some line breaks.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
The reference count of the unit dropped too low in an error path in
sbp2_probe. Fixed by moving the _get further up.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
The card->kref became obsolete since patch "firewire: fix crash in
automatic module unloading" added another counter of card users.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
There's an ugly little memory leak in firewire-ohci's
ar_context_tasklet(), where we're not freeing up some of the memory we
use for each ar_buffer, due to a moving pointer. The problem has been
there for a while, but didn't get noticed until after converting the AR
routines over to use coherent DMA and I started running into I/O stall-
outs with the following message output repeatedly to the console:
PCI-DMA: Out of IOMMU space for 53248 bytes at device 0000:04:09.0
Plugging this leak is definitely necessary, but unfortunately, isn't the
entire answer to my problem, it only increases the amount of I/O that I
can do before hitting the problem. Still working on tracking down the
root cause..
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This fixes a use-after-free bug in the handling of split transactions.
The AT DMA handler of the request was occasionally executed after the
AR DMA handler of the response. The AT DMA handler then accessed an
already freed packet.
Reported by Johannes Berg.
http://bugzilla.kernel.org/show_bug.cgi?id=9617
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Shut up "may be used uninitialised in this function" warnings due to
PPC32's implementation of dma_alloc_coherent().
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Currently, we do nothing to guarantee we have a consistent DMA buffer for
asynchronous receive packets. Rather than doing several sync's following a
dma_map_single() to get consistent buffers, just switch to using
dma_alloc_coherent().
Resolves constant buffer failures on my own x86_64 laptop w/4GB of RAM and
likely to fix a number of other failures witnessed on x86_64 systems with
4GB of RAM or more.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Fix I/O errors due to SYM13FW500's inability to handle larger request
sizes. Reported by Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> in
https://bugzilla.redhat.com/show_bug.cgi?id=436879
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Remove some less necessary information, point out that video1394 and
dv1394 should be blacklisted along with ohci1394.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Per the SBP-2 specification, all SBP-2 target devices must have a BUSY_TIMEOUT
register. Per the 1394-1995 specification, the retry_limt portion of the
register should be set to 0x0 initially, and set on the target by a logged in
initiator (i.e., a Linux host w/firewire controller(s)).
Well, as it turns out, lots of devices these days have actually moved on to
starting to implement SBP-3 compliance, which says that retry_limit should
default to 0xf instead (yes, SBP-3 stomps directly on 1394-1995, oops).
Prior to this change, the firewire driver stack didn't touch retry_limit, and
any SBP-3 compliant device worked fine, while SBP-2 compliant ones were unable
to retransmit when the host returned an ack_busy_X, which resulted in stalled
out I/O, eventually causing the SCSI layer to give up and offline the device.
The simple fix is for us to set retry_limit to 0xf in the register for all
devices (which actually matches what the old ieee1394 stack did).
Prior to this change, a hard disk behind an SBP-2 Prolific PL-3507 bridge chip
would routinely encounter buffer I/O errors and wind up offlined by the SCSI
layer. With this change, I've encountered zero I/O failures moving tens of GB
of data around.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Mostly copied from ohci1394.c. Necessary for some older Macs, e.g.
PowerBook G3 Pismo and early PowerBook G4 Titanium.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Copied from ohci1394.c. This code is necessary to prevent machine check
exceptions when reloading or resuming the driver.
Tested on a 1st generation PowerBook G4 Titanium, which also needs the
pci_probe() hunk.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
I was able to reproduce the system exception on resume with a 3rd-gen
Titanium PowerBook G4 667, and this patch does let the system resume
successfully now.
Not quite clear if there was possibly an updated version coming using
pci_enable_device() instead of the pair of pmac_call_feature() calls,
but either way, this is a definite must-have, at least for older ppc
macs -- my Aluminum PowerBook G4/1.67 suspends and resumes without this
patch just fine.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kills warnings from 'make C=1 CHECKFLAGS="-D__CHECK_ENDIAN__" modules':
drivers/firewire/fw-transaction.c:771:10: warning: incorrect type in assignment (different base types)
drivers/firewire/fw-transaction.c:771:10: expected unsigned int [unsigned] [usertype] <noident>
drivers/firewire/fw-transaction.c:771:10: got restricted unsigned int [usertype] <noident>
drivers/firewire/fw-transaction.h:93:10: warning: incorrect type in assignment (different base types)
drivers/firewire/fw-transaction.h:93:10: expected unsigned int [unsigned] [usertype] <noident>
drivers/firewire/fw-transaction.h:93:10: got restricted unsigned int [usertype] <noident>
drivers/firewire/fw-ohci.c:1490:8: warning: restricted degrades to integer
drivers/firewire/fw-ohci.c:1490:35: warning: restricted degrades to integer
drivers/firewire/fw-ohci.c:1516:5: warning: cast to restricted type
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
The generation of incoming requests was filled in in wrong byte order on
machines with big endian CPU.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
The bus management workqueue job was in danger to dereference NULL
pointers. Also, after having temporarily lifted card->lock, a few node
pointers and a device pointer may have become invalid.
Add NULL pointer checks and get the necessary references. Also, move
card->local_node out of fw_card_bm_work's sight during shutdown of the
card.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Patch "firewire: fw-sbp2: fix NULL pointer deref. in scsi_remove_device"
had the unintended effect that firewire-sbp2 could not be unloaded
anymore until all SBP-2 devices were unplugged.
We now fix the NULL pointer bug by reacquiring a reference to the sdev
instead of holding a reference to the sdev (and to the module) all the
time.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Tested-by: Jarod Wilson <jwilson@redhat.com>
By supplying ioctl()s in the wrong order, a userspace client was able to
trigger NULL pointer dereferences. Furthermore, by calling
ioctl_create_iso_context more than once, new contexts could be created
without ever freeing the previously created contexts.
Thanks to Anders Blomdell for the report.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Fix a kernel bug when unplugging an SBP-2 device after having its
scsi_device already removed via the "delete" sysfs attribute.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
While fw-sbp2 takes the necessary time to reconnect to a logical unit
after bus reset, the SCSI core keeps sending new commands. They are all
immediately completed with host busy status, and application clients or
filesystems will break quickly. The SCSI device might even be taken
offline: http://bugzilla.kernel.org/show_bug.cgi?id=9734
The only remedy seems to be to block the SCSI device until reconnect.
Alas the SCSI core has no useful API to block only one logical unit i.e.
the scsi_device, therefore we block the entire Scsi_Host. This
currently corresponds to an SBP-2 target. In case of targets with
multiple logical units, we need to satisfy the dependencies between
logical units by carefully tracking the blocking state of the target and
its units. We block all logical units of a target as soon as one of
them needs to be blocked, and keep them blocked until all of them are
ready to be unblocked.
Furthermore, as the history of the old sbp2 driver has shown, the
scsi_block_requests() API is a minefield with high potential of
deadlocks. We therefore take extra measures to keep logical units
unblocked during __scsi_add_device() and during shutdown.
This avoids I/O errors during reconnect in many but alas not in all
cases. There may still be errors after a re-login had to be performed.
Also, some bridges have been seen to cease fetching management ORBs if
I/O went on up until a bus reset. In these cases, all management ORBs
time out after mgt_orb_timeout. The old sbp2 driver is less vulnerable
or maybe not vulnerable to this, for as yet unknown reasons.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
fw-sbp2 is unable to reconnect while performing __scsi_add_device
because there is only a single workqueue thread context available for
both at the moment. This should be fixed eventually.
An actual failure of __scsi_add_device is easy to handle, but an
incomplete execution of __scsi_add_device with an sdev returned would
remain undetected and leave the SBP-2 target unusable.
Therefore we use a workaround: If there was a bus reset during
__scsi_add_device (i.e. during the SCSI probe), we remove the new sdev
immediately, log out, and attempt login and SCSI probe again.
Tested-by: Jarod Wilson <jwilson@redhat.com> (earlier version)
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
If fw-sbp2 was too late with requesting the reconnect, the target would
reject this. In this case, log out before attempting the reconnect.
Else several firmwares will deny the re-login because they somehow
didn't invalidate the old login.
Also, don't retry reconnects in this situation. The retries won't
succeed either.
These changes improve chances for successful re-login and shorten the
period during which the logical unit is inaccessible.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
When a reconnect failed but re-login succeeded, __scsi_add_device was
called again.
In those cases, __scsi_add_device succeeded and returned the pointer to
the existing scsi_device. fw-sbp2 then continued orderly, except that
it missed to call sbp2_cancel_orbs. SCSI core would call fw-sbp2's
eh_abort_handler eventually if there had been an outstanding command.
This patch avoids the needless lookups and temporary allocations in SCSI
core and I/O stall and timeout until eh_abort_handler hits.
Also, __scsi_add_device tolerating calls for devices which already exist
is undocumented behavior on which we shouldn't rely.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
for easier readable logs if more than one SBP-2 device is present.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Like the old sbp2 driver, wait for the write transaction to the
AGENT_RESET to complete before proceeding (after login, after reconnect,
or in SCSI error handling).
There is one occasion where AGENT_RESET is written to from atomic
context when getting DEAD status for a command ORB. There we still
continue without waiting for the transaction to complete because this
is more difficult to fix...
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Several different SBP-2 bridges accept a login early while the IDE
device is still powering up. They are therefore unable to respond to
SCSI INQUIRY immediately, and the SCSI core has to retry the INQUIRY.
One of these retries is typically successful, and all is well.
But in case of Momobay FX-3A, the INQUIRY retries tend to fail entirely.
This can usually be avoided by waiting a little while after login before
letting the SCSI core send the INQUIRY. The old sbp2 driver handles
this more gracefully for as yet unknown reasons (perhaps because it
waits for fetch agent resets to complete, unlike fw-sbp2 which quickly
proceeds after requesting the agent reset). Therefore the workaround is
not as much necessary for sbp2.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
This should help to interpret user reports. E.g. one can look up the
vendor OUI (first three bytes of the GUID) and thus tell what is what.
Also simplifies the math in the GUID sysfs attribute.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
If a device is being unplugged while fw-sbp2 had a login or reconnect on
schedule, it would take about half a minute to shut the fw_unit down:
Jan 27 18:34:54 stein firewire_sbp2: logged in to fw2.0 LUN 0000 (0 retries)
<unplug>
Jan 27 18:34:59 stein firewire_sbp2: sbp2_scsi_abort
Jan 27 18:34:59 stein scsi 25:0:0:0: Device offlined - not ready after error recovery
Jan 27 18:35:01 stein firewire_sbp2: orb reply timed out, rcode=0x11
Jan 27 18:35:06 stein firewire_sbp2: orb reply timed out, rcode=0x11
Jan 27 18:35:12 stein firewire_sbp2: orb reply timed out, rcode=0x11
Jan 27 18:35:17 stein firewire_sbp2: orb reply timed out, rcode=0x11
Jan 27 18:35:22 stein firewire_sbp2: orb reply timed out, rcode=0x11
Jan 27 18:35:27 stein firewire_sbp2: orb reply timed out, rcode=0x11
Jan 27 18:35:32 stein firewire_sbp2: orb reply timed out, rcode=0x11
Jan 27 18:35:32 stein firewire_sbp2: failed to login to fw2.0 LUN 0000
Jan 27 18:35:32 stein firewire_sbp2: released fw2.0
After this patch, typically only a few seconds spent in __scsi_add_device
remain:
Jan 27 19:05:50 stein firewire_sbp2: logged in to fw2.0 LUN 0000 (0 retries)
<unplug>
Jan 27 19:05:56 stein firewire_sbp2: sbp2_scsi_abort
Jan 27 19:05:56 stein scsi 33:0:0:0: Device offlined - not ready after error recovery
Jan 27 19:05:56 stein firewire_sbp2: released fw2.0
The benefit of this is less noise in the syslog. It furthermore avoids
a few wasted CPU cycles and needlessly prolonged lifetime of a few
driver objects.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
There is a race between shutdown and creation of devices: fw-core may
attempt to add a device with the same name of an already existing
device. http://bugzilla.kernel.org/show_bug.cgi?id=9828
Impact of the bug: Happens rarely (when shutdown of a device coincides
with creation of another), forces the user to unplug and replug the new
device to get it working.
The fix is obvious: Free the minor number *after* instead of *before*
device_unregister(). This requires to take an additional reference of
the fw_device as long as the IDR tree points to it.
And while we are at it, we fix an additional race condition:
fw_device_op_open() took its reference of the fw_device a little bit too
late, hence was in danger to access an already invalid fw_device.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This fixes a "can't recognize device" kind of bug.
If the SCSI INQUIRY failed and hence __scsi_add_device failed due to a
bus reset, we tried a logout and then waited for the already scheduled
login work to happen. So far so good, but the generation used for the
logout was outdated, hence the logout never reached the target. The
target might therefore deny the subsequent relogin attempt, which would
also leave the target inaccessible.
Therefore fetch a fresh device->generation for the logout. Use memory
barriers to prevent our plan being foiled by compiler or hardware
optimizations.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
To be more compliant with section 7.4.8 of the SBP-2 specification,
use the mgt_ORB_timeout specified in the SBP-2 device's config rom
for login ORB attempts (though with some sanity checks). A happy
side-effect is that certain device and controller combinations that
sometimes take more than 20 seconds to get synced up (like my laptop
with just about any SBP-2 device) now function more reliably.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (silenced sparse)
Increase (and rename) the login orb reply timeout value to 20s
to match that of the old firewire stack. 2s simply didn't give
many devices enough time to spin up and reply.
Fixes inability to recognize some devices.
Failure mode was "orb reply timed out"/"failed to login".
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (style, comments, changelog)
Replace an unnecessary subtraction with a bitwise AND when determining the
value of ext_tcode in fw_fill_transaction() to save a cpu cycle or two in a
somewhat critical path.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
read_rom() obtained a fresh new fw_device.generation for each read
transaction. Hence it was able to continue reading in the middle of the
ROM even if a bus reset happened. However the device may have modified
the ROM during the reset. We would end up with a corrupt fetched ROM
image then.
Although all of this is quite unlikely, it is not impossible.
Therefore we now restart reading the ROM if the bus generation changed.
Note, the memory barrier in read_rom() is still necessary according to
tests by Jarod Wilson, despite of the ->generation access being moved up
in the call chain.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This is essentially what I've been beating on locally, and I've yet to hit
another config rom read failure with it.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
fw_device.node_id and fw_device.generation are accessed without mutexes.
We have to ensure that all readers will get to see node_id updates
before generation updates.
Fixes an inability to recognize devices after "giving up on config rom",
https://bugzilla.redhat.com/show_bug.cgi?id=429950
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Reviewed by Nick Piggin <nickpiggin@yahoo.com.au>.
Verified to fix 'giving up on config rom' issues on multiple system and
drive combinations that were previously affected.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>