We must ensure that the readdir data is always invalidated whether or not
the weak cache consistency data update succeeds.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We don't need to force a dentry lookup just because we're making changes to
the directory.
Don't update nfsi->cache_change_attribute in nfs_end_data_update: that
overrides the NFSv3/v4 weak consistency checking that tells us our update
was the only one, and that tells us the dcache is still valid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We always want to check that the verifier and directory
cache_change_attribute match. This also allows us to remove the 'wraparound
hack' for the cache_change_attribute. If we're only checking for equality,
then we don't care about wraparound issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It doesn't really make sense to cache an access call without also
revalidating the attributes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, _nfs4_do_access() is just a copy of nfs_do_access() with added
conversion of the open flags into an access mask. This patch merges the
duplicate functionality.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_set_verifier() and nfs_refresh_verifier() do exactly the same thing, so
replace one with the other.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfs_renew_times() function plants the current time in jiffies in
dentry->d_time. But a call to nfs_renew_times() is always followed by
another call that overwrites dentry->d_time. Get rid of the
nfs_renew_times() calls.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Negative dentries need to be reverified after an asynchronous unlink.
Quoth Trond:
"Unfortunately I don't think that we can avoid revalidating the
resulting negative dentry since the UNLINK call is asynchronous,
and so the new verifier on the directory will only be known a
posteriori."
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The default "intr" setting is different for NFS and NFSv4. To avoid
confusion on this issue, don't hide the "nointr" option in /proc/mounts.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Re-order mount option sanity checking slightly to ensure we have a valid
server address *before* trying to do the mountd RPC call.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a dependency on RDMA before enabling SUNRPC_XPRT_RDMA
Yes, "INFINIBAND" also turns on iWARP and other RDMA support.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Adds hooks to the string-based NFS mount to support an "rdma" protocol option.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This file implements the configuration target, protocol template and
constants for the rpcrdma transport framing, for use by the xprtrdma
rpc transport implementation.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use the per-transport strings to display the transport protocol accurately.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead of an { address family, raw IP protocol number }-tuple, use the
newly-defined RPC identifier when creating clients in the upper layers.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Adds a flag word to the xdrbuf struct which indicates any bulk
disposition of the data. This enables RPC transport providers to
marshal it efficiently/appropriately, and may enable other
optimizations.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The previous patch introduced a bug when copying the server address.
Also clarify a copy into the auth_flavours array: currently the two
size calculations are equivalent, but we may decide to change the size
of auth_flavors[] at some point.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The user-visible nfs4_mount_data does not contain sufficient data to
describe new mount options, and also is now a legacy structure. Replace
it with the internal nfs_parsed_mount_data for nfsv4 in-kernel use.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The user-visible nfs_mount_data does not contain sufficient data to
describe new mount options, and also is now a legacy structure. Replace
it with the internal nfs_parsed_mount_data for nfsv[23] in-kernel use.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In preparation for rearranging the nfs mount argument passing, make the
nfs_parsed_mount_data struct visible across nfs kernel files.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Due to recent edict to remove or replace printk's that can flood the system
log.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Due to recent edict to remove or replace printk's that might flood the
system log.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Due to recent edict to replace or remove printk's that can be triggered en
masse by remote misbehavior. Left a few that only occur just before a BUG.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I got the 'mounthost=' option wrong - it shouldn't look for an address
value, but rather a hostname value. However, the in-kernel mount client
and NFS client cannot resolve a hostname by themselves; they rely on
user-land to pass in the resolved address.
Create a new mount option that does take an address so that the mount
program's address can be passed in. The mount hostname is now ignored
by the kernel.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If no mount server port number is specified, the previous change to the
kernel mount client inadvertently allows the NFS server's port number to be
the used as the mount server's port number. If the user specifies an NFS
server port (-o port=x), the mount will fail.
The fix below sets the mount server's port to 0 if no mount server
port is specified by the user.
Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Simplify the in-kernel mount client by using autobind instead of an
explicit call to rpc_getport_sync.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A minor thing, but useful when working with a server with multiple
addrs. This looks like it might also be necessary if Miklos' effort
to eliminate /etc/mtab ever comes to fruition.
When displaying mount options in /proc/mounts, the kernel prints
"addr=hostname". This info is redundant since we already have the
hostname displayed as part of the "device" section of the mount. This
patch changes it to display the IP address to which the socket is
connected.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I would like to discuss the idea that the current checks for attribute
timeout using time_after are inadequate for 32bit architectures, since
time_after works correctly only when the two timestamps being compared
are within 2^31 jiffies of each other. The signed overflow caused by
comparing values more than 2^31 jiffies apart will flip the result,
causing incorrect assumptions of validity.
2^31 jiffies is a fairly large period of time (~25 days) when compared
to the lifetime of most kernel data structures, but for long lived NFS
mounts that can sit idle for months (think that for some reason autofs
cannot be used), it is easy to compare inode attribute timestamps with
very disparate or even bogus values (as in when jiffies have wrapped
many times, where the comparison doesn't even make sense).
Currently the code tests for attribute timeout by simply adding the
desired amount of jiffies to the stored timestamp and comparing that
with the current timestamp of obtained attribute data with time_after.
This is incorrect, as it returns true for the desired timeout period
and another full 2^31 range of jiffies.
In testing with artificial jumps (several small jumps, not one big
crank) of the jiffies I was able to reproduce a problem found in a
server with very long lived NFS mounts, where attributes would not be
refreshed even after touching files and directories in the server:
Initial uptime:
03:42:01 up 6 min, 0 users, load average: 0.01, 0.12, 0.07
NFS volume is mounted and time is advanced:
03:38:09 up 25 days, 2 min, 0 users, load average: 1.22, 1.05, 1.08
# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r-- 1 root root 0 Dec 17 03:38 /local/A/foo/bar
-rw-r--r-- 1 root root 0 Nov 22 00:36 /nfs/A/foo/bar
# touch /local/A/foo/bar
# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r-- 1 root root 0 Dec 17 03:47 /local/A/foo/bar
-rw-r--r-- 1 root root 0 Nov 22 00:36 /nfs/A/foo/bar
We can see the local mtime is updated, but the NFS mount still shows
the old value. The patch below makes it work:
Initial setup...
07:11:02 up 25 days, 1 min, 0 users, load average: 0.15, 0.03, 0.04
# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r-- 1 root root 0 Jan 11 07:11 /local/A/foo/bar
-rw-r--r-- 1 root root 0 Jan 11 07:11 /nfs/A/foo/bar
# touch /local/A/foo/bar
# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r-- 1 root root 0 Jan 11 07:14 /local/A/foo/bar
-rw-r--r-- 1 root root 0 Jan 11 07:14 /nfs/A/foo/bar
Signed-off-by: Fabio Olive Leite <fleite@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hi.
Attached is a patch to modify the NFS client code to support
64 bit ino's, as appropriate for the system and the NFS
protocol version.
The code basically just expand the NFS interfaces for routines
which handle ino's from using ino_t to u64 and then uses the
fileid in the nfs_inode instead of i_ino in the inode. The
code paths that were updated are in the getattr method and
the readdir methods.
This should be no real change on 64 bit platforms. Since
the ino_t is an unsigned long, it would already be 64 bits
wide.
Thanx...
ps
Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This helps prevent huge queues of background writes from building up
whenever the server runs out of disk or quota space, or if someone changes
the file access modes behind our backs.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Schedule writes using WB_SYNC_NONE first, then come back for a second pass
using WB_SYNC_ALL.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The only user of nfs_sync_mapping_range() is nfs_getattr(), which uses it
to flush out the entire inode without sending a commit. We therefore
replace nfs_sync_mapping_range with a more appropriate helper.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Just call write_cache_pages directly instead of hacking the writeback
control structure in order to find out if we were called from writepages()
or directly from the VM.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The addition of nfs_page_mkwrite means that We should no longer need to
create requests inside nfs_writepage()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The recent fix for a circular lock dependency unfortunately introduced a
potential memory leak in the event where the call to nlmsvc_lookup_host
fails for some reason.
Thanks to Roel Kluin for spotting this.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When IOCB_FLAG_RESFD flag is set and iocb->aio_resfd is incorrect,
statement 'goto out_put_req' is executed. At label 'out_put_req',
aio_put_req(..) is called, which requires 'req->ki_filp' set.
Signed-off-by: Yan Zheng<yanzheng@21cn.com>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: fix PORT_J BUG for BF537/6 EMAC driver reported by Kalle Pokki <kalle.pokki@iki.fi>
Blackfin arch: gpio pinmux and resource allocation API required by BF537 on chip ethernet mac driver
Blackfin arch: add some missing syscall
binfmt_flat: checkpatch fixing minimum support for the blackfin relocations
Binfmt_flat: Add minimum support for the Blackfin relocations
The fs was not unlocking the local alloc inode mutex in the code path in
which it failed to find a window of free bits in the global bitmap.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Nick Piggin points out that splice isn't being good about the mmap
semaphore: while two readers can nest inside each others, it does leave
a possible deadlock if a writer (ie a new mmap()) comes in during that
nesting.
Original "just move the locking" patch by Nick, replaced by one by me
based on an optimistic pagefault_disable(). And then Jens tested and
updated that patch.
Reported-by: Nick Piggin <npiggin@suse.de>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit b394e43e99.
Lachlan McIlroy says:
It tried to fix an issue where log replay is replaying an inode cluster
initialisation transaction that should not be replayed because the inode
cluster on disk is more up to date. Since we don't log file sizes (we
rely on inode flushing to get them to disk) then we can't just replay
all the transations in the log and expect the inode to be completely
restored. We lose file size updates. Unfortunately this fix is causing
more (serious) problems than it is fixing.
SGI-PV: 969656
SGI-Modid: xfs-linux-melb:xfs-kern:29804a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
It doesn't look as if the NFS file name limit is being initialised correctly
in the struct nfs_server. Make sure that we limit whatever is being set in
nfs_probe_fsinfo() and nfs_init_server().
Also ensure that readdirplus and nfs4_path_walk respect our file name
limits.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The problem is that the garbage collector for the 'host' structures
nlm_gc_hosts(), holds nlm_host_mutex while calling down to
nlmsvc_mark_resources, which, eventually takes the file->f_mutex.
We cannot therefore call nlmsvc_lookup_host() from within
nlmsvc_create_block, since the caller will already hold file->f_mutex, so
the attempt to grab nlm_host_mutex may deadlock.
Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
[PATCH] WE : Add missing auth compat-ioctl
[PATCH] softmac: Fix inability to associate with WEP networks
Different types of ufs hold state in different places, to hide complexity
of this, there is ufs_get_fs_state, it returns state according to
"UFS_SB(sb)->s_flags", but during mount ufs_get_fs_state is called, before
setting s_flags, this cause message for ufs types like sun ufs: "fs need
fsck", and remount in readonly state.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>