Commit Graph

83 Commits

Author SHA1 Message Date
NeilBrown
2604b703b6 [PATCH] md: remove personality numbering from md
md supports multiple different RAID level, each being implemented by a
'personality' (which is often in a separate module).

These personalities have fairly artificial 'numbers'.  The numbers
are use to:
 1- provide an index into an array where the various personalities
    are recorded
 2- identify the module (via an alias) which implements are particular
    personality.

Neither of these uses really justify the existence of personality numbers.
The array can be replaced by a linked list which is searched (array lookup
only happens very rarely).  Module identification can be done using an alias
based on level rather than 'personality' number.

The current 'raid5' modules support two level (4 and 5) but only one
personality.  This slight awkwardness (which was handled in the mapping from
level to personality) can be better handled by allowing raid5 to register 2
personalities.

With this change in place, the core md module does not need to have an
exhaustive list of all possible personalities, so other personalities can be
added independently.

This patch also moves the check for chunksize being non-zero into the ->run
routines for the personalities that need it, rather than having it in core-md.
 This has a side effect of allowing 'faulty' and 'linear' not to have a
chunk-size set.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:06 -08:00
NeilBrown
a8745db232 [PATCH] md: convert recently exported symbol to GPL
...because that seems to be the preferred practice these days.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:06 -08:00
NeilBrown
9ffae0cf3e [PATCH] md: convert md to use kzalloc throughout
Replace multiple kmalloc/memset pairs with kzalloc calls.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:05 -08:00
NeilBrown
2d1f3b5d1b [PATCH] md: clean up 'page' related names in md
Substitute:

  page_cache_get -> get_page
  page_cache_release -> put_page
  PAGE_CACHE_SHIFT -> PAGE_SHIFT
  PAGE_CACHE_SIZE -> PAGE_SIZE
  PAGE_CACHE_MASK -> PAGE_MASK
  __free_page -> put_page

because we aren't using the page cache, we are just using pages.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:05 -08:00
NeilBrown
d7603b7e3a [PATCH] md: make /proc/mdstat pollable
With this patch it is possible to poll /proc/mdstat to detect arrays appearing
or disappearing, to detect failures, recovery starting, recovery completing,
and devices being added and removed.

It is similar to the poll-ability of /proc/mounts, though different in that:

We always report that the file is readable (because face it, it is, even if
only for EOF).

We report POLLPRI when there is a change so that select() can detect
it as an exceptional event.  Not only are these exceptional events, but
that is the mechanism that the current 'mdadm' uses to watch for events
(It also polls after a timeout).
(We also report POLLERR like /proc/mounts).

Finally, we only reset the per-file event counter when the start of the file
is read, rather than when poll() returns an event.  This is more robust as it
means that an fd will continue to report activity to poll/select until the
program clearly responds to that activity.

md_new_event takes an 'mddev' which isn't currently used, but it will be soon.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:05 -08:00
NeilBrown
ddaf22abaa [PATCH] md: attempt to auto-correct read errors in raid1
On a read-error we suspend the array, then synchronously read the block from
other arrays until we find one where we can read it.  Then we try writing the
good data back everywhere and make sure it works.  If any write or subsequent
read fails, only then do we fail the device out of the array.

To be able to suspend the array, we need to also keep track of how many
requests are queued for handling by raid1d.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:03 -08:00
NeilBrown
6cce3b23f6 [PATCH] md: write intent bitmap support for raid10
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:03 -08:00
NeilBrown
b15c2e57f0 [PATCH] md: move bitmap_create to after md array has been initialised
This is important because bitmap_create uses
  mddev->resync_max_sectors
and that doesn't have a valid value until after the array
has been initialised (with pers->run()).
[It doesn't make a difference for current personalities that
 support bitmaps, but will make a difference for raid10]

This has the added advantage of meaning with can move the thread->timeout
manipulation inside the bitmap.c code instead of sprinkling identical code
throughout all personalities.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:03 -08:00
NeilBrown
6ff8d8ec06 [PATCH] md: allow dirty raid[456] arrays to be started at boot
See patch to md.txt for more details

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:02 -08:00
Neil Brown
bcb97940f3 [PATCH] md: Change case of raid level reported in sys/mdX/md/level
I had thought that keeping the reported tail level clearly different
from the module name was a good idea, but I've changed my mind.

'raid5' is better and probably less confusing than 'RAID-5'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-19 16:47:50 -08:00
NeilBrown
b2a2703c28 [PATCH] md: set default_bitmap_offset properly in set_array_info
If an array is created using set_array_info, default_bitmap_offset isn't set
properly meaning that an internal bitmap cannot be hot-added until the array
is stopped and re-assembled.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00
NeilBrown
c0e485216d [PATCH] md: fix is_mddev_idle calculation now that disk/sector accounting happens when request completes
md needs to monitor the rate of requests to its devices when doing
resync/recovery so that it can back-off when there is non-resync IO.  It
does this by comparing resync IO, which it counts, with total IO which is
taken from disk_stats.

disk_stats were recently changed to account sectors when a request
completes instead of when it is queued.  This upsets md's calculations.

We could do the sync_io accounting at the end of requests too, but that has
problems.  If an underlying device is an md array, the accounting will
still be done when the request is submitted.  This could be changed for
some raid levels, but it cannot be changed for raid0 or linear without
substantial code changes.

So instead, we increase the error that is_mddev_idle allows, up to the
maximum amount of resync IO that can be in flight at any time.  The
calculation is current fragile as each personality as different limits for
in-flight resync.  This should be fixed up.

For now, this simple patch fixes the problem.

Increasing the error margin decreases the sensitivity to non-resync IO.  To
partially compensate for this, the time to wait when non-resync IO is
detected is increased so that less steady IO is required to keep the resync
at bay.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-18 07:49:46 -08:00
NeilBrown
93588e2284 [PATCH] md: make md threads interruptible again
Despite the fact that md threads don't need to be signalled, and won't
respond to signals anyway, we need to have an 'interruptible' wait, else
they stay in 'D' state and add to the load average.

(akpm: the signal_pending() test is unneeded - we'll fix that up in the next
round.  For now, leave it there because that's how the code used to be).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 08:59:19 -08:00
NeilBrown
e8a0033451 [PATCH] md: mark START_ARRAY deprecated with a date
This was marked deprecated "after 2.6" back in the 2.5 days.  But now it
seems there isn't going to be any "after 2.6", and we deprecate by date
now.  So set a date.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 08:59:19 -08:00
NeilBrown
bb636547b0 [PATCH] md: document sysfs usage of md, and make a couple of small refinements
Document in Documentation/md.txt the files that now appear in sysfs, and make
a couple of small refinements to exactly when 'level' and 'raid_disks' are
empty, to make it match the documentation.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:40 -08:00
NeilBrown
7eec314d75 [PATCH] md: improve 'scan_mode' and rename it to 'sync_action'
The current sync_action for an array can be one of

   idle  - nothing happening
   resync - reduncancy being recalcualted
   recover - missing device being recoverred to spare
   check   - user initiated check of redundancy
   repair  - like resync but user-initiated and ignores
             bitmap optimisation.

Each of these strings can also be written to the 'sync_action' file to cause
that action to happen (if appropriate).

While 'sync' is not technically correct, as a recovery is *not* a 'sync', I
think it is the most servicable word here.  Also 'action' is a strong word
than 'mode'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:40 -08:00
NeilBrown
787453c239 [PATCH] md: complete conversion of md to use kthreads
There are a few loose ends following the conversion of md to use kthreads:

- Some fields in mdk_thread_t that aren't needed (kthreads does it's own
  completion and manages it's own name).

- thread->run is now never NULL, so no need to check

- Some tests for signal_pending that aren't needed (As we don't use signals
  to stop threads any more)

- Some flush_signals are not needed

- Some waits are interruptible and don't need to be.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:40 -08:00
NeilBrown
fd9d49cac4 [PATCH] md: ignore auto-readonly flag for arrays where it isn't meaningful
The 'auto-readonly' flag (which suppresses resync and superblock updates until
the first write) is not meaningful for personalities that don't support resync
or superblock writes (raid0, linear, etc).

So clear the setting early to avoid it confusing anything - e.g.  appearing in
/proc/mdstat

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
8e1b39d623 [PATCH] md: only try to print recovery/resync status for personalities that support recovery
The introduction of 'resync=PENDING' (for read-only devices) caused that
message to appear for non-syncable arrays like raid0 and linear.  Simplest
thing is to not try to print any resync info unless the personality clearly
supports it.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
411036fa19 [PATCH] md: split off some md attributes in sysfs to a separate group
Some, but not all, md array support data redundancy and hence support checking
and restoring that redundancy (resync, rebuild).

Some attributes apply specifically to functions involving this redundancy, and
so should only appear for md arrays for which they are meaningful.  i.e.  they
should not appear for raid0, linear, multpath, faulty.

This patch separates these into a distinct group and creates the group only if
the personality supports sync_request.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
96de1e663c [PATCH] md: fix some locking and module refcounting issues with md's use of sysfs
1/ I really should be using the __ATTR macros for defining attributes, so
   that the .owner field get set properly, otherwise modules can be removed
   while sysfs files are open.  This also involves some name changes of _show
   routines.

2/ Always lock the mddev (against reconfiguration) for all sysfs attribute
   access.  This easily avoid certain races and is completely consistant with
   other interfaces (ioctl and /proc/mdstat both always lock against
   reconfiguration).

3/ raid5 attributes must check that the 'conf' structure actually exists
   (the array could have been stopped while an attribute file was open).

4/ A missing 'kfree' from when the raid5_conf_t was converted to have a
   kobject embedded, and then converted back again.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
f637b9f9fc [PATCH] md: make sure /block link in /sys/.../md/ goes to correct devices
If a block_device is a partition, then it's kobject is
  bdev->bd_part->kobj
otherwise (if it is a full device), the kobject is
  bdev->bd_disk->kobj

As md wants back-links to the correct object (whether partition or not), we
need to respect this difference...  (Thus current code shows a link to the
whole device, whether we are using a partition or not, which is wrong).

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
f91de92ed6 [PATCH] md: allow md arrays to be started read-only (module parameter).
When an md array is started, the superblock will be written, and resync may
commense.  This is not good if you want to be completely read-only as, for
example, when preparing to resume from a suspend-to-disk image.

So introduce a module parameter "start_ro" which can be set
to '1' at boot, at module load, or via
  /sys/module/md_mod/parameters/start_ro

When this is set, new arrays get an 'auto-ro' mode, which disables all
internal io (superblock updates, resync, recovery) and is automatically
switched to 'rw' when the first write request arrives.

The array can be set to true 'ro' mode using 'mdadm -r' before the first
write request, or resync can be started without a write using 'mdadm -w'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
19133a4298 [PATCH] md: Remove attempt to use dynamic names in sysfs for component devices on an MD array.
With version-0.90 superblock, component devices on an md device to not have
any stable name related to the array -(version-1 assigns a fixed index when
a device is added to an array, and this remains despit any hot-swap).

The intial code for making these devices appear in sysfs used dynamic
names, which would change whenever a hot-spare was swapped for a failed or
missing device.  This turns out not to be practical in sysfs for a number
of reasons.

This patch changes then naming of component devices to be based on the
result of 'bdevname'.  This is stable and should be unique.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
a9701a3047 [PATCH] md: support BIO_RW_BARRIER for md/raid1
We can only accept BARRIER requests if all slaves handle
barriers, and that can, of course, change with time....

So we keep track of whether the whole array seems safe for barriers,
and also whether each individual rdev handles barriers.

We initially assumes barriers are OK.

When writing the superblock we try a barrier, and if that fails, we flag
things for no-barriers.  This will usually clear the flags fairly quickly.

If writing the superblock finds that BIO_RW_BARRIER is -ENOTSUPP, we need to
resubmit, so introduce function "md_super_wait" which waits for requests to
finish, and retries ENOTSUPP requests without the barrier flag.

When writing the real raid1, write requests which were BIO_RW_BARRIER but
which aresn't supported need to be retried.  So raid1d is enhanced to do this,
and when any bio write completes (i.e.  no retry needed) we remove it from the
r1bio, so that devices needing retry are easy to find.

We should hardly ever get -ENOTSUPP errors when writing data to the raid.
It should only happen if:
  1/ the device used to support BARRIER, but now doesn't.  Few devices
     change like this, though raid1 can!
or
  2/ the array has no persistent superblock, so there was no opportunity to
     pre-test for barriers when writing the superblock.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
bd926c63b7 [PATCH] md: make md on-disk bitmaps not host-endian
Current bitmaps use set_bit et.al and so are host-endian, which means
not-portable.  Oops.

Define a new version number (4) for which bitmaps are little-endian.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
b2d444d7ad [PATCH] md: convert 'faulty' and 'in_sync' fields to bits in 'flags' field
This has the advantage of removing the confusion caused by 'rdev_t' and
'mddev_t' both having 'in_sync' fields.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
ba22dcbf10 [PATCH] md: improvements to raid5 handling of read errors
Two refinements to the 'attempt-overwrite-on-read-error' mechanism.
1/ If the array is read-only, don't attempt an over-write.
2/ If there are more than max_nr_stripes read errors on a device with
   no success, fail the drive.  This will make sure a dead
   drive will be eventually kicked even when we aren't trying
   to rewrite (which would normally kick a dead drive more quickly.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
007583c925 [PATCH] md: change raid5 sysfs attribute to not create a new directory
There isn't really a need for raid5 attributes to be an a subdirectory,
so this patch moves them from
  /sys/block/mdX/md/raid5/attribute
to
  /sys/block/mdX/md/attribute

This suggests that all md personalities should co-operate about
namespace usage, but that shouldn't be a problem.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
31399d9e56 [PATCH] md: minor MD fixes
1/ Use reduce stack usage, because 'gcc' apparently doesn't overlay
   different variables  that are in separate scopes...

2/ Use test_bit instead of ( .. & 1<< ..) which in this case is buggy.

Thanks to Andrew Morton

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
9c79197761 [PATCH] md: fix ref-counting problems with kobjects in md
Thanks Greg.

Cc:  Greg KH <greg@kroah.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
9d88883e68 [PATCH] md: teach raid5 the difference between 'check' and 'repair'.
With this, raid5 can be asked to check parity without repairing it.  It also
keeps a count of the number of incorrect parity blocks found (mismatches) and
reports them through sysfs.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
24dd469d72 [PATCH] md: allow a manual resync with md
You can trigger a 'check' with
  echo check > /sys/block/mdX/md/scan_mode
or a check-and-repair errors with
  echo repair > /sys/block/mdX/md/scan_mode

and read the current state from the same file.

Note: personalities need to know the different between 'check' and 'repair',
but don't yet.  Until they do, 'check' will be the same as 'repair' and will
just do a normal resync pass.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
86e6ffdd24 [PATCH] md: extend md sysfs support to component devices.
Each device in an md array how has a corresponding
  /sys/block/mdX/md/devNN/
directory which can contain attributes.  Currently there is only 'state' which
summarises the state, nd 'super' which has a copy of the superblock, and
'block' which is a symlink to the block device.

Also, /sys/block/mdX/md/rdNN represents slot 'NN' in the array, and is a
symlink to the relevant 'devNN'.  Obviously spare devices do not have a slot
in the array, and so don't have such a symlink.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
eae1701fbd [PATCH] md: initial sysfs support for md
Start using kobjects in mddevs, and provide a couple of simple attributes
(level and disks).  Attributes live in
  /sys/block/mdX/md/attr-name

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:36 -08:00
Jens Axboe
a362357b6c [BLOCK] Unify the seperate read/write io stat fields into arrays
Instead of having ->read_sectors and ->write_sectors, combine the two
into ->sectors[2] and similar for the other fields. This saves a branch
several places in the io path, since we don't have to care for what the
actual io direction is. On my x86-64 box, that's 200 bytes less text in
just the core (not counting the various drivers).

Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-01 09:26:16 +01:00
NeilBrown
8712e55356 [PATCH] md: make sure mdthreads will always respond to kthread_stop
There are still a couple of cases where md threads (the resync/recovery
thread) is not interruptible since the change to use kthreads.  All places
there it tests "signal_pending", it should also test kthread_should_stop,
as with this patch.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26 10:39:42 -07:00
NeilBrown
6985c43f39 [PATCH] Three one-liners in md.c
The main problem fixes is that in certain situations stopping md arrays may
take longer than you expect, or may require multiple attempts.  This would
only happen when resync/recovery is happening.

This patch fixes three vaguely related bugs.

1/ The recent change to use kthreads got the setting of the
   process name wrong.  This fixes it.
2/ The recent change to use kthreads lost the ability for
   md threads to be signalled with SIG_KILL.  This restores that.
3/ There is a long standing bug in that if:
    - An array needs recovery (onto a hot-spare) and
    - The recovery is being blocked because some other array being
       recovered shares a physical device and
    - The recovery thread is killed with SIG_KILL
   Then the recovery will appear to have completed with no IO being
   done, which can cause data corruption.
   This patch makes sure that incomplete recovery will be treated as
   incomplete.

Note that any kernel affected by bug 2 will not suffer the problem of bug
3, as the signal can never be delivered.  Thus the current 2.6.14-rc
kernels are not susceptible to data corruption.  Note also that if arrays
are shutdown (with "mdadm -S" or "raidstop") then the problem doesn't
occur.  It only happens if a SIGKILL is independently delivered as done by
'init' when shutting down.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19 23:04:30 -07:00
Adrian Bunk
338cec3253 [PATCH] merge some from Rusty's trivial patches
This patch contains the most trivial from Rusty's trivial patches:
- spelling fixes
- remove duplicate includes

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:30 -07:00
NeilBrown
611815651b [PATCH] md: really get sb_size setting right in all cases
There was another case where sb_size wasn't being set, so instead do the
sensible thing and set if when filling in the content of a superblock.  That
ensures that whenever we write a superblock, the sb_size MUST be set.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:14 -07:00
NeilBrown
188c18fd79 [PATCH] md: make sure the new 'sb_size' is set properly device added without pre-existing superblock.
There are two ways to add devices to an md/raid array.

  It can have superblock written to it, and then given to the md driver,
  which will read the superblock (the new way)

or

  md can be told (through SET_ARRAY_INFO) the shape of the array, and
  the told about individual drives, and md will create the required
  superblock (the old way).

The newly introduced sb_size was only set for drives being added the
new way, not the old ways.  Oops :-(

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:14 -07:00
NeilBrown
b325a32e57 [PATCH] md: report spare drives in /proc/mdstat
Just like failed drives have (F), so spare drives now have (S).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:14 -07:00
NeilBrown
1cd6bf19bb [PATCH] md: add information about superblock version to /proc/mdstat
Leave it unchanged if the original (0.90) is used, incase it might be a
compatability problem.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:14 -07:00
NeilBrown
720a3dc39b [PATCH] md: use queue_hardsect_size instead of block_size for md superblock size calc.
Doh.  I want the physical hard-sector-size, not the current block size...

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:13 -07:00
NeilBrown
53e87fbb5d [PATCH] md: choose better default offset for bitmap.
On reflection, a better default location for hot-adding bitmaps with version-1
superblocks is immediately after the superblock.  There might not be much room
there, but there is usually atleast 3k, and that is a good start.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:13 -07:00
NeilBrown
a6fb0934f9 [PATCH] md: use kthread infrastructure in md
Switch MD to use the kthread infrastructure, to simplify the code and get rid
of tasklist_lock abuse in md_unregister_thread.

Also don't flush signals in md_thread, as the called thread will always do
that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:13 -07:00
NeilBrown
934ce7c840 [PATCH] md: write-intent bitmap support for raid6
This is a direct port of the raid5 patch.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:12 -07:00
NeilBrown
72626685dc [PATCH] md: add write-intent-bitmap support to raid5
Most awkward part of this is delaying write requests until bitmap updates have
been flushed.

To achieve this, we have a sequence number (seq_flush) which is incremented
each time the raid5 is unplugged.

If the raid thread notices that this has changed, it flushes bitmap changes,
and assigned the value of seq_flush to seq_write.

When a write request arrives, it is given the number from seq_write, and that
write request may not complete until seq_flush is larger than the saved seq
number.

We have a new queue for storing stripes which are waiting for a bitmap flush
and an extra flag for stripes to record if the write was 'degraded' and so
should not clear the a bit in the bitmap.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:12 -07:00
NeilBrown
0002b2718d [PATCH] md: limit size of sb read/written to appropriate amount
version-1 superblocks are not (normally) 4K long, and can be of variable size.
 Writing the full 4K can cause corruption (but only in non-default
configurations).

With this patch the super-block-flavour can choose a size to read, and set a
size to write based on what it finds.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:12 -07:00
NeilBrown
71c0805cb4 [PATCH] md: allow md to load a superblock with feature-bit '1' set
As this is used to flag an internal bitmap.

Also, introduce symbolic names for feature bits.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:11 -07:00