kernel_optimize_test

Author	SHA1	Message	Date
Miklos Szeredi	6787341a0f	ovl: check snprintf return Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-27 21:54:05 +02:00
Amir Goldstein	0e082555ce	ovl: check for bad and whiteout index on lookup Index should always be of the same file type as origin, except for the case of a whiteout index. A whiteout index should only exist if all lower aliases have been unlinked, which means that finding a lower origin on lookup whose index is a whiteout should be treated as a lookup error. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-20 11:08:21 +02:00
Amir Goldstein	61b674710c	ovl: do not cleanup directory and whiteout index entries Directory index entries are going to be used for looking up redirected upper dirs by lower dir fh when decoding an overlay file handle of a merge dir. Whiteout index entries are going to be used as an indication that an exported overlay file handle should be treated as stale (i.e. after unlink of the overlay inode). We don't know the verification rules for directory and whiteout index entries, because they have not been implemented yet, so fail to mount overlay rw if those entries are found to avoid corrupting an index that was created by a newer kernel. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-20 11:08:21 +02:00
Miklos Szeredi	1d88f18373	ovl: fix xattr get and set with selinux inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr to get security label, and ovl_xattr_get() calls ovl_dentry_real(), which depends on dentry->d_inode, but d_inode is null and not initialized yet at this point resulting in an Oops. Fix by getting the upperdentry info from the inode directly in this case. Reported-by: Eryu Guan <eguan@redhat.com> Fixes: `09d8b58673` ("ovl: move __upperdentry to ovl_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-20 11:08:21 +02:00
David Howells	bc98a42c1f	VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: David Howells <dhowells@redhat.com>	2017-07-17 08:45:34 +01:00
Amir Goldstein	a59f97ff66	ovl: remove unneeded check for IS_ERR() ovl_workdir_create() returns a valid index dentry or NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-13 22:06:46 +02:00
Amir Goldstein	961af647fc	ovl: fix origin verification of index dir Commit `54fb347e83` ("ovl: verify index dir matches upper dir") introduced a new ovl_fh flag OVL_FH_FLAG_PATH_UPPER to indicate an upper file handle, but forgot to add the flag to the mask of valid flags, so index dir origin verification always discards existing origin and stores a new one. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-13 22:06:46 +02:00
Amir Goldstein	ea3dad18dc	ovl: mark parent impure on ovl_link() When linking a file with copy up origin into a new parent, mark the new parent dir "impure". Fixes: `ee1d6d37b6` ("ovl: mark upper dir with type origin entries "impure"") Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-13 22:06:45 +02:00
Amir Goldstein	8fc646b443	ovl: fix random return value on mount On failure to prepare_creds(), mount fails with a random return value, as err was last set to an integer cast of a valid lower mnt pointer or set to 0 if inodes index feature is enabled. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `3fe6e52f06` ("ovl: override creds with the ones from ...") Cc: <stable@vger.kernel.org> # v4.7 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-13 22:06:45 +02:00
Amir Goldstein	f4439de118	ovl: mark parent impure and restore timestamp on ovl_link_up() Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2017-07-04 22:08:15 +02:00
Amir Goldstein	caf70cb2ba	ovl: cleanup orphan index entries An index entry should live only as long as there are upper or lower hardlinks. Clean up orphan index entries on mount and when dropping the last overlay inode nlink. When about to clean up or link up to an orphan index and the index inode nlink > 1, admit that something went wrong and adjust overlay nlink to index inode nlink - 1 to prevent it from dropping below zero. This could happen when adding lower hardlinks underneath a mounted overlay and then trying to unlink them. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:19 +02:00
Amir Goldstein	5f8415d6b8	ovl: persistent overlay inode nlink for indexed inodes With inodes index enabled, an overlay inode nlink counts the union of upper and non-covered lower hardlinks. During the lifetime of a non-pure upper inode, the following nlink modifying operations can happen: 1. Lower hardlink copy up 2. Upper hardlink created, unlinked or renamed over 3. Lower hardlink whiteout or renamed over For the first, copy up case, the union nlink does not change, whether the operation succeeds or fails, but the upper inode nlink may change. Therefore, before copy up, we store the union nlink value relative to the lower inode nlink in the index inode xattr trusted.overlay.nlink. For the second, upper hardlink case, the union nlink should be incremented or decremented IFF the operation succeeds, aligned with nlink change of the upper inode. Therefore, before link/unlink/rename, we store the union nlink value relative to the upper inode nlink in the index inode. For the last, lower cover up case, we simplify things by preceding the whiteout or cover up with copy up. This makes sure that there is an index upper inode where the nlink xattr can be stored before the copied up upper entry is unlinked. Return the overlay inode nlinks for indexed upper inodes on stat(2). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:19 +02:00
Amir Goldstein	59be09712a	ovl: implement index dir copy up Implement a copy up method for non-dir objects using index dir to prevent breaking lower hardlinks on copy up. This method requires that the inodes index dir feature was enabled and that all underlying fs support file handle encoding/decoding. On the first lower hardlink copy up, upper file is created in index dir, named after the hex representation of the lower origin inode file handle. On the second lower hardlink copy up, upper file is found in index dir, by the same lower handle key. In either case, the upper indexed inode is then linked to the copy up upper path. The index entry remains linked for future lower hardlink copy up and for lower to upper inode map, that is needed for exporting overlayfs to NFS. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:19 +02:00
Miklos Szeredi	fd210b7d67	ovl: move copy up lock out Move ovl_copy_up_start()/ovl_copy_up_end() out so that it's used for both tempfile and workdir copy ups. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:18 +02:00
Miklos Szeredi	a6fb235a44	ovl: rearrange copy up Split up and rearrange copy up functions to make them better readable. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:18 +02:00
Miklos Szeredi	55acc66182	ovl: add flag for upper in ovl_entry For rename, we need to ensure that an upper alias exists for hard links before attempting the operation. Introduce a flag in ovl_entry to track the state of the upper alias. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:18 +02:00
Miklos Szeredi	23f0ab13ea	ovl: use struct copy_up_ctx as function argument This cleans up functions with too many arguments. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:18 +02:00
Miklos Szeredi	7ab8b1763f	ovl: base tmpfile in workdir too Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:18 +02:00
Amir Goldstein	02209d1070	ovl: factor out ovl_copy_up_inode() helper Factor out helper for copying lower inode data and metadata to temp upper inode, that is common to copy up using O_TMPFILE and workdir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:18 +02:00
Miklos Szeredi	7d90b853f9	ovl: extract helper to get temp file in copy up Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:18 +02:00
Amir Goldstein	15932c415b	ovl: defer upper dir lock to tempfile link On copy up of regular file using an O_TMPFILE, lock upper dir only before linking the tempfile in place. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:18 +02:00
Miklos Szeredi	b9ac5c274b	ovl: hash overlay non-dir inodes by copy up origin Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:17 +02:00
Amir Goldstein	415543d5c6	ovl: cleanup bad and stale index entries on mount Bad index entries are entries whose name does not match the origin file handle stored in trusted.overlay.origin xattr. Bad index entries could be a result of a system power off in the middle of copy up. Stale index entries are entries whose origin file handle is stale. Stale index entries could be a result of copying layers or removing lower entries while the overlay is not mounted. The case of copying layers should be detected earlier by the verification of upper root dir origin and index dir origin. Both bad and stale index entries are detected and removed on mount. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:17 +02:00
Amir Goldstein	359f392ca5	ovl: lookup index entry for copy up origin When inodes index feature is enabled, look up in indexdir the index entry of lower real inode or copy up origin inode. The index entry name is the hex representation of the lower inode file handle. If the index dentry is negative, then either no lower aliases have been copied up yet, or aliases have been copied up in older kernels and are not indexed. If the index dentry for a copy up origin inode is positive, but points to an inode different than the upper inode, then either the upper inode has been copied up and not indexed or it was indexed, but since then index dir was cleared. Either way, that index cannot be used to identify the overlay inode. If a positive dentry that matches the upper inode was found, then it is safe to use the copy up origin st_ino for upper hardlinks, because all indexed upper hardlinks are represented by the same overlay inode as the copy up origin. Set the INDEX type flag on an indexed upper dentry. A non-upper dentry may also have a positive index from copy up of another lower hardlink. This situation will be handled by following patches. Index lookup is going to be used to prevent breaking hardlinks on copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:17 +02:00
Amir Goldstein	54fb347e83	ovl: verify index dir matches upper dir An index dir contains persistent hardlinks to files in upper dir. Therefore, we must never mount an existing index dir with a different upper dir. Store the upper root dir file handle in index dir inode when index dir is created and verify the file handle before using an existing index dir on mount. Add an 'is_upper' flag to the overlay file handle encoding and set it when encoding the upper root file handle. This is not critical for index dir verification, but it is good practice towards a standard overlayfs file handle format for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:17 +02:00
Amir Goldstein	8b88a2e640	ovl: verify upper root dir matches lower root dir When inodes index feature is enabled, verify that the file handle stored in upper root dir matches the lower root dir or fail to mount. If upper root dir has no stored file handle, encode and store the lower root dir file handle in overlay.origin xattr. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:17 +02:00
Amir Goldstein	02bcd15774	ovl: introduce the inodes index dir feature Create the index dir on mount. The index dir will contain hardlinks to upper inodes, named after the hex representation of their origin lower inodes. The index dir is going to be used to prevent breaking lower hardlinks on copy up and to implement overlayfs NFS export. Because the feature is not fully backward compat, enabling the feature is opt-in by config/module/mount option. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:17 +02:00
Amir Goldstein	6b8aa129dc	ovl: generalize ovl_create_workdir() Pass in the subdir name to create and specify if subdir is persistent or if it should be cleaned up on every mount. Move fallback to readonly mount on failure to create dir and print of error message into the helper. This function is going to be used for creating the persistent 'index' dir under workbasedir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:17 +02:00
Amir Goldstein	f7d3daca7c	ovl: relax same fs constrain for ovl_check_origin() For the case of all layers not on the same fs, try to decode the copy up origin file handle on any of the lower layers. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:17 +02:00
Amir Goldstein	2cac0c00a6	ovl: get exclusive ownership on upper/work dirs Bad things can happen if several concurrent overlay mounts try to use the same upperdir/workdir path. Try to get the 'inuse' advisory lock on upperdir and workdir. Fail mount if another overlay mount instance or another user holds the 'inuse' lock on these directories. Note that this provides no protection for concurrent overlay mounts that use overlapping (i.e. descendant) upper/work dirs. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:17 +02:00
Amir Goldstein	ad0af7104d	vfs: introduce inode 'inuse' lock Added an i_state flag I_INUSE and helpers to set/clear/test the bit. The 'inuse' lock is an 'advisory' inode lock, that can be used to extend exclusive create protection beyond parent->i_mutex lock among cooperating users. This is going to be used by overlayfs to get exclusive ownership on upper and work dirs among overlayfs mounts. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:16 +02:00
Miklos Szeredi	04a01ac7ed	ovl: move cache and version to ovl_inode Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:16 +02:00
Amir Goldstein	a015dafcaf	ovl: use ovl_inode mutex to synchronize concurrent copy up Use the new ovl_inode mutex to synchronize concurrent copy up instead of the super block copy up workqueue. Moving the synchronization object from the overlay dentry to the overlay inode is needed for synchronizing concurrent copy up of lower hardlinks to the same upper inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:16 +02:00
Miklos Szeredi	13c72075ac	ovl: move impure to ovl_inode Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:16 +02:00
Miklos Szeredi	cf31c46347	ovl: move redirect to ovl_inode Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:16 +02:00
Miklos Szeredi	09d8b58673	ovl: move __upperdentry to ovl_inode Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:16 +02:00
Miklos Szeredi	9020df3720	ovl: compare inodes When checking for consistency in directory operations (unlink, rename, etc.) match inodes not dentries. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:16 +02:00
Miklos Szeredi	25b7713afe	ovl: use i_private only as a key Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:16 +02:00
Miklos Szeredi	e6d2ebddbc	ovl: simplify getting inode Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:16 +02:00
Amir Goldstein	13cf199d00	ovl: allocate an ovl_inode struct We need some more space to store overlay inode data in memory, so allocate overlay inodes from a slab of struct ovl_inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:15 +02:00
Amir Goldstein	f681eb1d5c	ovl: fix nlink leak in ovl_rename() This patch fixes an overlay inode nlink leak in the case where ovl_rename() renames over a non-dir. This is not so critical, because overlay inode doesn't rely on nlink dropping to zero for inode deletion. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-07-04 22:03:15 +02:00
Miklos Szeredi	7f53b7d047	Merge tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid into overlayfs-next UUID/GUID updates: - introduce the new uuid_t/guid_t types that are going to replace the somewhat confusing uuid_be/uuid_le types and make the terminology fit the various specs, as well as the userspace libuuid library. (me, based on a previous version from Amir) - consolidated generic uuid/guid helper functions lifted from XFS and libnvdimm (Amir and me) - conversions to the new types and helpers (Amir, Andy and me)	2017-07-04 04:05:05 +02:00
Miklos Szeredi	fbaf94ee3c	ovl: don't set origin on broken lower hardlink When copying up a file that has multiple hard links we need to break any association with the origin file. This makes copy-up be essentially an atomic replace. The new file has nothing to do with the old one (except having the same data and metadata initially), so don't set the overlay.origin attribute. We can relax this in the future when we are able to index upper object by origin. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `3a1e819b4e` ("ovl: store file handle of lower inode on copy up")	2017-06-28 13:41:22 +02:00
Miklos Szeredi	e85f82ff9b	ovl: copy-up: don't unlock between lookup and link Nothing prevents mischief on upper layer while we are busy copying up the data. Move the lookup right before the looked up dentry is actually used. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `01ad3eb8a0` ("ovl: concurrent copy up of regular files") Cc: <stable@vger.kernel.org> # v4.11	2017-06-28 13:41:22 +02:00
Christoph Hellwig	01633fd254	overlayfs: use uuid_t instead of uuid_be Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>	2017-06-05 16:59:13 +02:00
Christoph Hellwig	85787090a2	fs: switch ->s_uuid to uuid_t For some file systems we still memcpy into it, but in various places this already allows us to use the proper uuid helpers. More to come.. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM) Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>	2017-06-05 16:59:12 +02:00
Miklos Szeredi	a082c6f680	ovl: filter trusted xattr for non-admin Filesystems filter out extended attributes in the "trusted." domain for unprivileged callers. Overlay calls the underlying filesystem's method with elevated privs, so it needs to do the filtering in overlayfs too. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-29 15:15:27 +02:00
Amir Goldstein	f3a1568582	ovl: mark upper merge dir with type origin entries "impure" An upper dir is marked "impure" to let ovl_iterate() know that this directory may contain non pure upper entries whose d_ino may need to be read from the origin inode. We already mark a non-merge dir "impure" when moving a non-pure child entry inside it, to let ovl_iterate() know not to iterate the non-merge dir directly. Mark also a merge dir "impure" when moving a non-pure child entry inside it and when copying up a child entry inside it. This can be used to optimize ovl_iterate() to perform a "pure merge" of upper and lower directories, merging the content of the directories, without having to read d_ino from origin inodes. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-29 11:48:00 +02:00
Amir Goldstein	ee1d6d37b6	ovl: mark upper dir with type origin entries "impure" When moving a merge dir or non-dir with copy up origin into a non-merge upper dir (a.k.a pure upper dir), we are marking the target parent dir "impure". ovl_iterate() iterates pure upper dirs directly, because there is no need to filter out whiteouts and merge dir content with lower dir. But for the case of an "impure" upper dir, ovl_iterate() will not be able to iterate the real upper dir directly, because it will need to lookup the origin inode and use it to fill d_ino. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-19 09:33:49 +02:00
Miklos Szeredi	3d27573ce3	ovl: remove unused arg from ovl_lookup_temp() Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-19 09:33:49 +02:00
Amir Goldstein	21a2287811	ovl: handle rename when upper doesn't support xattr On failure to set opaque/redirect xattr on rename, skip setting xattr and return -EXDEV. On failure to set opaque xattr when creating a new directory, -EIO is returned instead of -EOPNOTSUPP. Any failure to set those xattr will be recorded in super block and then setting any xattr on upper won't be attempted again. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-19 09:33:49 +02:00
Miklos Szeredi	6266d465bd	ovl: don't fail copy-up if upper doesn't support xattr Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-18 16:11:24 +02:00
Amir Goldstein	82b749b2c6	ovl: check on mount time if upper fs supports setting xattr xattr are needed by overlayfs for setting opaque dir, redirect dir and copy up origin. Check at mount time by trying to set the overlay.opaque xattr on the workdir and if that fails issue a warning message. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-18 16:11:24 +02:00
Amir Goldstein	8137ae26d2	ovl: fix creds leak in copy up error path Fixes: `42f269b925` ("ovl: rearrange code in ovl_copy_up_locked()") Cc: <stable@vger.kernel.org> # v4.11 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-18 16:11:24 +02:00
Arnd Bergmann	72d42504bd	ovl: select EXPORTFS We get a link error when EXPORTFS is not enabled: ERROR: "exportfs_encode_fh" [fs/overlayfs/overlay.ko] undefined! ERROR: "exportfs_decode_fh" [fs/overlayfs/overlay.ko] undefined! This adds a Kconfig 'select' statement for overlayfs, the same way that it is done for the other users of exportfs. Fixes: `3a1e819b4e` ("ovl: store file handle of lower inode on copy up") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-15 10:53:07 +02:00
Linus Torvalds	b948abf53a	Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: "The biggest part of this is making st_dev/st_ino on the overlay behave like a normal filesystem (i.e. st_ino doesn't change on copy up, st_dev is the same for all files and directories). Currently this only works if all layers are on the same filesystem, but future work will move the general case towards more sane behavior. There are also miscellaneous fixes, including fixes to handling append-only files. There's a small change in the VFS, but that only has an effect on overlayfs, since otherwise file->f_path.dentry->inode and file_inode(file) are always the same" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: update documentation w.r.t. constant inode numbers ovl: persistent inode numbers for upper hardlinks ovl: merge getattr for dir and nondir ovl: constant st_ino/st_dev across copy up ovl: persistent inode number for directories ovl: set the ORIGIN type flag ovl: lookup non-dir copy-up-origin by file handle ovl: use an auxiliary var for overlay root entry ovl: store file handle of lower inode on copy up ovl: check if all layers are on the same fs ovl: do not set overlay.opaque on non-dir create ovl: check IS_APPEND() on real upper inode vfs: ftruncate check IS_APPEND() on real upper inode ovl: Use designated initializers ovl: lockdep annotate of nested stacked overlayfs inode lock	2017-05-10 09:03:48 -07:00
Amir Goldstein	5b6c9053fb	ovl: persistent inode numbers for upper hardlinks An upper type non directory dentry that is a copy up target should have a reference to its lower copy up origin. There are three ways for an upper type dentry to be instantiated: 1. A lower type dentry that is being copied up 2. An entry that is found in upper dir by ovl_lookup() 3. A negative dentry is hardlinked to an upper type dentry In the first case, the lower reference is set before copy up. In the second case, the lower reference is found by ovl_lookup(). In the last case of hardlinked upper dentry, it is not easy to update the lower reference of the negative dentry. Instead, drop the newly hardlinked negative dentry from dcache and let the next access call ovl_lookup() to find its lower reference. This makes sure that the inode number reported by stat(2) after the hardlink is created is the same inode number that will be reported by stat(2) after mount cycle, which is the inode number of the lower copy up origin of the hardlink source. NOTE that this does not fix breaking of lower hardlinks on copy up, but only fixes the case of lower nlink == 1, whose upper copy up inode is hardlinked in upper dir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-05 11:38:58 +02:00
Miklos Szeredi	5b712091a3	ovl: merge getattr for dir and nondir Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-05 11:38:58 +02:00
Amir Goldstein	72b608f085	ovl: constant st_ino/st_dev across copy up When all layers are on the same underlying filesystem, let stat(2) return st_dev/st_ino values of the copy up origin inode if it is known. This results in constant st_ino/st_dev representation of files in an overlay mount before and after copy up. When the underlying filesystem supports NFS exportfs, the result is also persistent st_ino/st_dev representation before and after mount cycle. Lower hardlinks are broken on copy up to different upper files, so we cannot use the lower origin st_ino for those different files, even for the same fs case. When all overlay layers are on the same fs, use overlay st_dev for non-dirs to get the correct result from du -x. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-05 11:38:58 +02:00
Amir Goldstein	b7a807dc20	ovl: persistent inode number for directories stat(2) on overlay directories reports the overlay temp inode number, which is constant across copy up, but is not persistent. When all layers are on the same fs, report the copy up origin inode number for directories. This inode number is persistent, unique across the overlay mount and constant across copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-05 11:38:58 +02:00
Amir Goldstein	595485033d	ovl: set the ORIGIN type flag For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE. Define a new type flag OVL_TYPE_ORIGIN to indicate that an entry holds a reference to its lower copy up origin. For directory entries ORIGIN := MERGE && UPPER. For non-dir entries ORIGIN means that a lower type dentry has been recently copied up or that we were able to find the copy up origin from overlay.origin xattr. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-05 11:38:58 +02:00
Amir Goldstein	a9d019573e	ovl: lookup non-dir copy-up-origin by file handle If overlay.origin xattr is found on a non-dir upper inode try to get lower dentry by calling exportfs_decode_fh(). On failure to lookup by file handle to lower layer, do not lookup the copy up origin by name, because the lower found by name could be another file in case the upper file was renamed. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-05 11:38:58 +02:00
Amir Goldstein	c22205d058	ovl: use an auxiliary var for overlay root entry Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-05 11:38:58 +02:00
Amir Goldstein	3a1e819b4e	ovl: store file handle of lower inode on copy up Sometimes it is interesting to know if an upper file is pure upper or a copy up target, and if it is a copy up target, it may be interesting to find the copy up origin. This will be used to preserve lower inode numbers across copy up. Store the lower inode file handle in upper inode extended attribute overlay.origin on copy up to use it later for these cases. Store the lower filesystem uuid alongside the file handle, so we can validate that we are looking for the origin file in the original fs. If lower fs does not support NFS export ops, store a zero sized xattr so we can always use the overlay.origin xattr to distinguish between a copy up and a pure upper inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-05 11:38:58 +02:00
Amir Goldstein	7bcd74b98d	ovl: check if all layers are on the same fs Some features can only work when all layers are on the same fs. Test this condition during mount time, so features can check them later. Add helper ovl_same_sb() to return the common super block in case all layers are on the same fs. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-05-05 11:38:57 +02:00
Amir Goldstein	4a99f3c83d	ovl: do not set overlay.opaque on non-dir create The optimization for opaque dir create was wrongly being applied also to non-dir create. Fixes: `97c684cc91` ("ovl: create directories inside merged parent opaque") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.10	2017-04-26 14:33:44 +02:00
Amir Goldstein	b0990fbbbd	ovl: check IS_APPEND() on real upper inode For overlay file open, check IS_APPEND() on the real upper inode inside d_real(), because the overlay inode does not have the S_APPEND flag and IS_APPEND() can only be checked at open time. Note that because overlayfs does not copy up the chattr inode flags (i.e. S_APPEND, S_IMMUTABLE), the IS_APPEND() check is only relevant for upper inodes that were set with chattr +a and not to lower inodes that had chattr +a before copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-04-20 16:37:26 +02:00
Kees Cook	33006cdf9c	ovl: Use designated initializers Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity. For these cases, use { }, which will be zero-filled, instead of undesignated NULLs. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-04-20 16:37:26 +02:00
Linus Torvalds	04bb94b13c	overlayfs: remove now unnecessary header file include This removes the extra include header file that was added in commit `e58bc92783` "Pull overlayfs updates from Miklos Szeredi" now that it is no longer needed. There are probably other such includes that got added during the scheduler header splitup series, but this is the one that annoyed me personally and I know about. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-03-08 10:42:13 -08:00
Amir Goldstein	b1eaa950f7	ovl: lockdep annotate of nested stacked overlayfs inode lock An overlayfs instance can be the lower layer of another overlayfs instance. This setup triggers a lockdep splat of possible recursive locking of sb->s_type->i_mutex_key in iterate_dir(). Trimmed snip: [ INFO: possible recursive locking detected ] bash/2468 is trying to acquire lock: &sb->s_type->i_mutex_key#14, at: iterate_dir+0x7d/0x15c but task is already holding lock: &sb->s_type->i_mutex_key#14, at: iterate_dir+0x7d/0x15c One problem observed with this splat is that ovl_new_inode() does not call lockdep_annotate_inode_mutex_key() to annotate the dir inode lock as &sb->s_type->i_mutex_dir_key like other fs do. The other problem is that the 2 nested levels of overlayfs inode lock are annotated using the same key, which is the cause of the false positive lockdep warning. Fix this by annotating overlayfs inode lock in ovl_fill_inode() according to stack level of the super block instance and use different key for dir vs. non-dir like other fs do. Here is an edited snip from /proc/lockdep_chains after iterate_dir() of nested overlayfs: [...] &ovl_i_mutex_dir_key[depth] (stack_depth=2) [...] &ovl_i_mutex_dir_key[depth]#2 (stack_depth=1) [...] &type->i_mutex_dir_key (stack_depth=0) Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-03-08 15:05:23 +01:00
Linus Torvalds	e58bc92783	Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "Because copy up can take a long time, serialized copy ups could be a big performance bottleneck. This update allows concurrent copy up of regular files eliminating this potential problem. There are also minor fixes" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials ovl: properly implement sync_filesystem() ovl: concurrent copy up of regular files ovl: introduce copy up waitqueue ovl: copy up regular file using O_TMPFILE ovl: rearrange code in ovl_copy_up_locked() ovl: check if upperdir fs supports O_TMPFILE	2017-03-03 12:02:42 -08:00
Linus Torvalds	590dce2d49	Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs 'statx()' update from Al Viro. This adds the new extended stat() interface that internally subsumes our previous stat interfaces, and allows user mode to specify in more detail what kind of information it wants. It also allows for some explicit synchronization information to be passed to the filesystem, which can be relevant for network filesystems: is the cached value ok, or do you need open/close consistency, or what? From David Howells. Andreas Dilger points out that the first version of the extended statx interface was posted June 29, 2010: https://www.spinics.net/lists/linux-fsdevel/msg33831.html * 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: statx: Add a system call to make enhanced file info available	2017-03-03 11:38:56 -08:00
David Howells	a528d35e8b	statx: Add a system call to make enhanced file info available Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. 
Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). 
(Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were too ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabel might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char *filename, unsigned int flags, unsigned int mask, struct statx *buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. 
buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. 
The following attributes map to FS_*_FL flags and have the same numerical values: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_*, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlink, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise it will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. 
The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-03-02 20:51:15 -05:00
Ingo Molnar	174cd4b1e5	sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:32 +01:00
Ingo Molnar	5b825c3af1	sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> Add #include <linux/cred.h> dependencies to all .c files that rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:31 +01:00
Konstantin Khlebnikov	51f8f3c4e2	ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials If overlay was mounted by root, then quota set for the upper layer does not work, because overlay now always uses the mounter's credentials for operations. Also, overlay might deplete reserved space and inodes in ext4. This patch drops capability SYS_RESOURCE from saved credentials. This affects creation of new files, whiteouts, and copy-up operations. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: `1175b6b8d9` ("ovl: do operations on underlying file system in mounter's context") Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-02-07 15:47:14 +01:00
Amir Goldstein	e593b2bf51	ovl: properly implement sync_filesystem() overlayfs syncs all inode pages on sync_filesystem(), but it also needs to call s_op->sync_fs() of upper fs for metadata sync. This fixes correctness of syncfs(2) as demonstrated by following xfs specific test: xfs_sync_stats() { echo $1 echo -n "xfs_log_force = " grep log /proc/fs/xfs/stat \| awk '{ print $5 }' } xfs_sync_stats "before touch" touch x xfs_sync_stats "after touch" xfs_io -c syncfs . xfs_sync_stats "after syncfs" xfs_io -c fsync x xfs_sync_stats "after fsync" xfs_io -c fsync x xfs_sync_stats "after fsync #2" When this test is run in overlay mount over xfs, log force count does not increase with syncfs command. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-02-07 15:47:14 +01:00
Amir Goldstein	01ad3eb8a0	ovl: concurrent copy up of regular files Now that copy up of regular file is done using O_TMPFILE, we don't need to hold rename_lock throughout copy up. Use the copy up waitqueue to synchronize concurrent copy up of the same file. Different regular files can be copied up concurrently. The upper dir inode_lock is taken instead of rename_lock, because it is needed for lookup and later for linking the temp file, but it is released while copying up data. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-02-07 15:47:14 +01:00
Amir Goldstein	39d3d60a54	ovl: introduce copy up waitqueue The overlay sb 'copyup_wq' and overlay inode 'copying' condition variable are about to replace the upper sb rename_lock, as finer grained synchronization objects for concurrent copy up. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-02-07 15:47:14 +01:00
Amir Goldstein	d8514d8edb	ovl: copy up regular file using O_TMPFILE In preparation for concurrent copy up, implement copy up of regular file as O_TMPFILE that is linked to upperdir instead of a file in workdir that is moved to upperdir. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-02-07 15:47:14 +01:00
Amir Goldstein	42f269b925	ovl: rearrange code in ovl_copy_up_locked() As preparation to implementing copy up with O_TMPFILE, name the variable for dentry before final rename 'temp' and assign it to 'newdentry' only after rename. Also lookup upper dentry before looking up temp dentry and move ovl_set_timestamps() into ovl_copy_up_locked(), because that is going to be more convenient for upcoming change. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-02-07 15:47:14 +01:00
Amir Goldstein	e7f52429b4	ovl: check if upperdir fs supports O_TMPFILE This is needed for choosing between concurrent copyup using O_TMPFILE and legacy copyup using workdir+rename. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-02-07 15:47:14 +01:00
Amir Goldstein	4c7d0c9cb7	ovl: fix possible use after free on redirect dir lookup ovl_lookup_layer() iterates on path elements of d->name.name but also frees and allocates a new pointer for d->name.name. For the case of lookup in upper layer, the initial d->name.name pointer is stable (dentry->d_name), but for lower layers, the initial d->name.name can be d->redirect, which can be freed during iteration. [SzM] Keep the count of remaining characters in the redirect path and calculate the current position from that. This works because only the prefix is modified; the ending always stays the same. Fixes: `02b69b284c` ("ovl: lookup redirects") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-01-18 15:19:54 +01:00
Linus Torvalds	231753ef78	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull partial readlink cleanups from Miklos Szeredi. This is the uncontroversial part of the readlink cleanup patch-set that simplifies the default readlink handling. Miklos and Al are still discussing the rest of the series. * git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: vfs: make generic_readlink() static vfs: remove ".readlink = generic_readlink" assignments vfs: default to generic_readlink() vfs: replace calling i_op->readlink with vfs_readlink() proc/self: use generic_readlink ecryptfs: use vfs_get_link() bad_inode: add missing i_op initializers	2016-12-17 19:16:12 -08:00
Linus Torvalds	ff0f962ca3	Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This update contains: - try to clone on copy-up - allow renaming a directory - split source into manageable chunks - misc cleanups and fixes It does not contain the read-only fd data inconsistency fix, which Al didn't like. I'll leave that to the next year..." * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (36 commits) ovl: fix reStructuredText syntax errors in documentation ovl: fix return value of ovl_fill_super ovl: clean up kstat usage ovl: fold ovl_copy_up_truncate() into ovl_copy_up() ovl: create directories inside merged parent opaque ovl: opaque cleanup ovl: show redirect_dir mount option ovl: allow setting max size of redirect ovl: allow redirect_dir to default to "on" ovl: check for emptiness of redirect dir ovl: redirect on rename-dir ovl: lookup redirects ovl: consolidate lookup for underlying layers ovl: fix nested overlayfs mount ovl: check namelen ovl: split super.c ovl: use d_is_dir() ovl: simplify lookup ovl: check lower existence of rename target ovl: rename: simplify handling of lower/merged directory ...	2016-12-16 10:58:12 -08:00
Linus Torvalds	9a19a6db37	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: - more ->d_init() stuff (work.dcache) - pathname resolution cleanups (work.namei) - a few missing iov_iter primitives - copy_from_iter_full() and friends. Either copy the full requested amount, advance the iterator and return true, or fail, return false and do _not_ advance the iterator. Quite a few open-coded callers converted (and became more readable and harder to fuck up that way) (work.iov_iter) - several assorted patches, the big one being logfs removal * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: logfs: remove from tree vfs: fix put_compat_statfs64() does not handle errors namei: fold should_follow_link() with the step into not-followed link namei: pass both WALK_GET and WALK_MORE to should_follow_link() namei: invert WALK_PUT logics namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link() namei: saner calling conventions for mountpoint_last() namei.c: get rid of user_path_parent() switch getfrag callbacks to ..._full() primitives make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success [iov_iter] new primitives - copy_from_iter_full() and friends don't open-code file_inode() ceph: switch to use of ->d_init() ceph: unify dentry_operations instances lustre: switch to use of ->d_init()	2016-12-16 10:24:44 -08:00
Geliang Tang	313684c48c	ovl: fix return value of ovl_fill_super If kcalloc() failed, the return value of ovl_fill_super() is -EINVAL, not -ENOMEM. So this patch sets this value to -ENOMEM before calling kcalloc(), and sets it back to -EINVAL after calling kcalloc(). Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:57 +01:00
Al Viro	32a3d848eb	ovl: clean up kstat usage FWIW, there's a bit of abuse of struct kstat in overlayfs object creation paths - for one thing, it ends up with a very small subset of struct kstat (mode + rdev), for another it also needs link in case of symlinks and ends up passing it separately. IMO it would be better to introduce a separate object for that. In principle, we might even lift that thing into general API and switch ->mkdir()/->mknod()/->symlink() to identical calling conventions. Hell knows, perhaps ->create() as well... Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:57 +01:00
Amir Goldstein	9aba652190	ovl: fold ovl_copy_up_truncate() into ovl_copy_up() This removes code duplication. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:57 +01:00
Amir Goldstein	97c684cc91	ovl: create directories inside merged parent opaque The benefit of making directories opaque on creation is that lookups can stop short when they reach the originally created directory, instead of continuing the lookup through the entire depth of the parent directory stack. The best case is an overlay with N layers, performing a lookup for a first-level directory that exists only in upper. In that case, there will be only one lookup instead of N. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:57 +01:00
Miklos Szeredi	5cf5b477f0	ovl: opaque cleanup oe->opaque is set for a) whiteouts b) directories having the "trusted.overlay.opaque" xattr Case b can be simplified, since setting the xattr always implies setting oe->opaque. Also once set, the opaque flag is never cleared. Don't need to set opaque flag for non-directories. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:57 +01:00
Amir Goldstein	c5bef3a72b	ovl: show redirect_dir mount option Show the value of redirect_dir in /proc/mounts. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:57 +01:00
Miklos Szeredi	3ea22a71b6	ovl: allow setting max size of redirect Add a module option to allow tuning the max size of absolute redirects. Default is 256. Size of relative redirects is naturally limited by the underlying filesystem's max filename length (usually 255). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:57 +01:00
Miklos Szeredi	688ea0e5a0	ovl: allow redirect_dir to default to "on" This patch introduces a kernel config option and a module param. Both can be used independently to turn the default value of redirect_dir on or off. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:57 +01:00
Amir Goldstein	d15951198e	ovl: check for emptiness of redirect dir Before the redirect_dir feature was introduced, the condition !ovl_lower_positive(dentry) for a directory implied that it is a pure upper directory, which may be removed if empty. Now that a directory can be redirected, it is possible that upper does not cover any lower (i.e. !ovl_lower_positive(dentry)), but the directory is a merge (with redirected path) and may be non-empty. Check for this case in ovl_remove_upper(). This change fixes the following test case from rename-pop-dir.py of unionmount-testsuite: """Remove dir and rename old name""" d = ctx.non_empty_dir() d2 = ctx.no_dir() ctx.rmdir(d, err=ENOTEMPTY) ctx.rename(d, d2) ctx.rmdir(d, err=ENOENT) ctx.rmdir(d2, err=ENOTEMPTY) ./run --ov rename-pop-dir /mnt/a/no_dir103: Expected error (Directory not empty) was not produced Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:57 +01:00
Miklos Szeredi	a6c6065511	ovl: redirect on rename-dir Current code returns EXDEV when a directory would need to be copied up to move. We could copy up the directory tree in this case, but there's another, simpler solution: point to old lower directory from moved upper directory. This is achieved with a "trusted.overlay.redirect" xattr storing the path relative to the root of the overlay. After such attribute has been set, the directory can be moved without further actions required. This is a backward incompatible feature, old kernels won't be able to correctly mount an overlay containing redirected directories. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:56 +01:00
Miklos Szeredi	02b69b284c	ovl: lookup redirects If a directory has the "trusted.overlay.redirect" xattr, it means that the value of the xattr should be used to find the underlying directory on the next lower layer. The redirect may be relative or absolute. Absolute redirects begin with a slash. A relative redirect means: instead of the current dentry's name use the value of the redirect to find the directory in the next lower layer. Relative redirects must not contain a slash. An absolute redirect means: look up the directory relative to the root of the overlay using the value of the redirect in the next lower layer. Redirects work on lower layers as well. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:56 +01:00
Miklos Szeredi	e28edc46b8	ovl: consolidate lookup for underlying layers Use a common helper for lookup of upper and lower layers. This paves the way for looking up directory redirects. No functional change. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:56 +01:00
Amir Goldstein	48fab5d7c7	ovl: fix nested overlayfs mount When the upper overlayfs checks "trusted.overlay.*" xattr on the underlying overlayfs mount, it gets -EPERM, which confuses the upper overlayfs. Fix this by returning -EOPNOTSUPP instead of -EPERM from ovl_own_xattr_get() and ovl_own_xattr_set(). This behavior is consistent with the behavior of ovl_listxattr(), which filters out the private overlayfs xattrs. Note: nested overlays are deprecated. But this change makes sense regardless: these xattrs are private to the overlay and should always be hidden. Hence getting and setting them should indicate this. [SzMi: Use EOPNOTSUPP instead of ENODATA and use it for both getting and setting "trusted.overlay." xattrs. This is a perfectly valid error code for "we don't support this prefix", which is the case here.] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:56 +01:00
Miklos Szeredi	6b2d5fe46f	ovl: check namelen We already calculate f_namelen in statfs as the maximum of the name lengths provided by the filesystems taking part in the overlay. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-16 11:02:56 +01:00
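The statx commit (a528d35e8b) says that stx_mask reports which of the requested fields actually hold valid values, and its sample output shows `results=7ff`, that is, STATX_BASIC_STATS without STATX_BTIME. A minimal C sketch of that check, assuming the mask values from the kernel UAPI header <linux/stat.h>; the two helper names are hypothetical and not part of any library:

```c
#include <stdbool.h>

/* Mask bits as defined in the kernel UAPI header <linux/stat.h>;
 * they are redefined here only when that header is not included. */
#ifndef STATX_BASIC_STATS
#define STATX_BASIC_STATS 0x000007ffU /* the fields the classic stat() returns */
#endif
#ifndef STATX_BTIME
#define STATX_BTIME 0x00000800U /* stx_btime, the file creation time */
#endif

/* Hypothetical helper: the bits the caller asked for in request_mask
 * that the filesystem did not set in the returned stx_mask. Fields whose
 * bit is missing may be approximated or fabricated and should be ignored. */
static unsigned int statx_missing_fields(unsigned int request_mask,
                                         unsigned int stx_mask)
{
    return request_mask & ~stx_mask;
}

/* Hypothetical helper: true if every field of the classic stat()
 * structure was reported as valid. */
static bool statx_has_basic_stats(unsigned int stx_mask)
{
    return (stx_mask & STATX_BASIC_STATS) == STATX_BASIC_STATS;
}
```

With the `/warthog/data` output in that commit, a caller that requested `STATX_BASIC_STATS | STATX_BTIME` would get `STATX_BTIME` back from `statx_missing_fields()` and should not trust `stx_btime`.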

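Commits 39d3d60a54 and 01ad3eb8a0 replace the upper sb rename_lock with an overlay sb 'copyup_wq' waitqueue and a per-inode 'copying' flag, so that different files can be copied up concurrently while racing copy-ups of the same file wait for the first one. The following is a userspace analogy with POSIX threads, not the overlayfs code: a mutex plus condition variable stand in for the waitqueue, and all struct and function names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stands in for the overlay super block: one lock and one wait queue
 * shared by all inodes, like the 'copyup_wq' in the commit. */
struct ovl_sb_sketch {
    pthread_mutex_t lock;
    pthread_cond_t copyup_wq;
};

/* Stands in for an overlay inode. */
struct ovl_inode_sketch {
    bool copying;    /* a copy up of this inode is in progress */
    bool has_upper;  /* copy up has already completed */
    int data_copies; /* how many times the (slow) data copy ran */
};

/* Copy up 'ino' at most once. Callers racing on the same inode wait for
 * the first one to finish; the lock is not held during the data copy. */
static void copy_up_sketch(struct ovl_sb_sketch *sb, struct ovl_inode_sketch *ino)
{
    pthread_mutex_lock(&sb->lock);
    while (ino->copying)
        pthread_cond_wait(&sb->copyup_wq, &sb->lock);
    if (ino->has_upper) {
        pthread_mutex_unlock(&sb->lock);
        return; /* another caller already copied it up */
    }
    ino->copying = true;
    pthread_mutex_unlock(&sb->lock);

    ino->data_copies++; /* placeholder for the long-running data copy */

    pthread_mutex_lock(&sb->lock);
    ino->has_upper = true;
    ino->copying = false;
    pthread_cond_broadcast(&sb->copyup_wq); /* wake all waiters */
    pthread_mutex_unlock(&sb->lock);
}
```

Dropping the lock during the data copy is what lets copy-ups of different inodes proceed in parallel. A single broadcast per super block keeps the sketch short at the cost of waking waiters on unrelated inodes; each waiter simply rechecks its own `copying` flag.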

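Commit 4c7d0c9cb7 stops keeping a pointer into d->name.name across iterations, because that buffer can be freed and replaced; instead it keeps the count of remaining characters and recomputes the position from the end, which is safe because only the prefix is ever rewritten. A minimal C sketch of that idea; `redirect_pos()` and `replace_prefix()` are hypothetical helpers, not overlayfs functions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given the current (possibly reallocated) name
 * buffer and the number of characters still to be processed, return a
 * pointer to the next unprocessed character. This is valid only because
 * every rewrite keeps the last 'rem' characters unchanged. */
static const char *redirect_pos(const char *name, size_t rem)
{
    size_t len = strlen(name);

    assert(rem <= len); /* the unprocessed tail is part of the string */
    return name + len - rem;
}

/* Hypothetical helper: replace everything before the unprocessed tail
 * with new_prefix, as a redirect rewrites only the start of the path.
 * The old buffer is freed, so any pointer saved into it now dangles. */
static char *replace_prefix(char *name, size_t rem, const char *new_prefix)
{
    size_t plen = strlen(new_prefix);
    char *res = malloc(plen + rem + 1);

    if (res == NULL)
        return NULL;
    memcpy(res, new_prefix, plen);
    memcpy(res + plen, redirect_pos(name, rem), rem + 1); /* tail and NUL */
    free(name);
    return res;
}
```

For example, if `/a/b/c` has been processed up to `/b/c` (4 characters remaining) and the prefix is then rewritten to `/x/y`, `redirect_pos(name, 4)` on the new buffer `/x/y/b/c` still points at `/b/c`, while a pointer saved from the old buffer would be a use after free.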
348 Commits
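Commit 02b69b284c defines two forms of the "trusted.overlay.redirect" value: absolute redirects begin with a slash and are looked up from the root of the overlay, while relative redirects replace the current dentry's name and must not contain a slash. A minimal C sketch of that classification; the enum and function names are hypothetical, and rejecting an empty value is an assumption not stated in the commit:

```c
#include <string.h>

/* Hypothetical classification of a "trusted.overlay.redirect" value. */
enum redirect_kind {
    REDIRECT_INVALID,
    REDIRECT_RELATIVE, /* used instead of the dentry's name in the next lower layer */
    REDIRECT_ABSOLUTE, /* path relative to the overlay root, in the next lower layer */
};

static enum redirect_kind classify_redirect(const char *value)
{
    if (value == NULL || value[0] == '\0')
        return REDIRECT_INVALID; /* assumption: empty values are rejected */
    if (value[0] == '/')
        return REDIRECT_ABSOLUTE; /* absolute redirects begin with a slash */
    if (strchr(value, '/') != NULL)
        return REDIRECT_INVALID; /* relative redirects must not contain '/' */
    return REDIRECT_RELATIVE;
}
```

The sketch does not enforce the length limits from commit 3ea22a71b6 (256 by default for absolute redirects, the filesystem's name length for relative ones).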