With eBPF getting more extended and exposure to user space is on it's way,
hardening the memory range the interpreter uses to steer its command flow
seems appropriate. This patch moves the to be interpreted bytecode to
read-only pages.
In case we execute a corrupted BPF interpreter image for some reason e.g.
caused by an attacker which got past a verifier stage, it would not only
provide arbitrary read/write memory access but arbitrary function calls
as well. After setting up the BPF interpreter image, its contents do not
change until destruction time, thus we can setup the image on immutable
made pages in order to mitigate modifications to that code. The idea
is derived from commit 314beb9bca ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks").
This is possible because bpf_prog is not part of sk_filter anymore.
After setup bpf_prog cannot be altered during its life-time. This prevents
any modifications to the entire bpf_prog structure (incl. function/JIT
image pointer).
Every eBPF program (including classic BPF that are migrated) have to call
bpf_prog_select_runtime() to select either interpreter or a JIT image
as a last setup step, and they all are being freed via bpf_prog_free(),
including non-JIT. Therefore, we can easily integrate this into the
eBPF life-time, plus since we directly allocate a bpf_prog, we have no
performance penalty.
Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
inspection of kernel_page_tables. Brad Spengler proposed the same idea
via Twitter during development of this patch.
Joint work with Hannes Frederic Sowa.
Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although rhashtable library allows user to specify a quiet big size
for user's created hash table, the table may be shrunk to a
very small size - HASH_MIN_SIZE(4) after object is removed from
the table at the first time. Subsequently, even if the total amount
of objects saved in the table is quite lower than user's initial
setting in a long time, the hash table size is still dynamically
adjusted by rhashtable_shrink() or rhashtable_expand() each time
object is inserted or removed from the table. However, as
synchronize_rcu() has to be called when table is shrunk or
expanded by the two functions, we should permit user to set the
minimum table size through configuring the minimum number of shifts
according to user specific requirement, avoiding these expensive
actions of shrinking or expanding because of calling synchronize_rcu().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
'shift by register' operations are supported by eBPF interpreter, but were
accidently left out of x64 JIT compiler. Fix it and add a testcase.
Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Fixes: 622582786c ("net: filter: x86: internal BPF JIT")
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses a couple of minor items, mostly addesssing
prandom_bytes(): 1) prandom_bytes{,_state}() should use size_t
for length arguments, 2) We can use put_unaligned() when filling
the array instead of open coding it [ perhaps some archs will
further benefit from their own arch specific implementation when
GCC cannot make up for it ], 3) Fix a typo, 4) Better use unsigned
int as type for getting the arch seed, 5) Make use of
prandom_u32_max() for timer slack.
Regarding the change to put_unaligned(), callers of prandom_bytes()
which internally invoke prandom_bytes_state(), don't bother as
they expect the array to be filled randomly and don't have any
control of the internal state what-so-ever (that's also why we
have periodic reseeding there, etc), so they really don't care.
Now for the direct callers of prandom_bytes_state(), which
are solely located in test cases for MTD devices, that is,
drivers/mtd/tests/{oobtest.c,pagetest.c,subpagetest.c}:
These tests basically fill a test write-vector through
prandom_bytes_state() with an a-priori defined seed each time
and write that to a MTD device. Later on, they set up a read-vector
and read back that blocks from the device. So in the verification
phase, the write-vector is being re-setup [ so same seed and
prandom_bytes_state() called ], and then memcmp()'ed against the
read-vector to check if the data is the same.
Akinobu, Lothar and I also tested this patch and it runs through
the 3 relevant MTD test cases w/o any errors on the nandsim device
(simulator for MTD devs) for x86_64, ppc64, ARM (i.MX28, i.MX53
and i.MX6):
# modprobe nandsim first_id_byte=0x20 second_id_byte=0xac \
third_id_byte=0x00 fourth_id_byte=0x15
# modprobe mtd_oobtest dev=0
# modprobe mtd_pagetest dev=0
# modprobe mtd_subpagetest dev=0
We also don't have any users depending directly on a particular
result of the PRNG (except the PRNG self-test itself), and that's
just fine as it e.g. allowed us easily to do things like upgrading
from taus88 to taus113.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Lothar Waßmann <LW@KARO-electronics.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"I'm sending this out, in particular, to get the iwlwifi fix
propagated:
1) Fix build due to missing include in i40e driver, from Lucas
Tanure.
2) Memory leak in openvswitch port allocation, from Chirstoph Jaeger.
3) Check DMA mapping errors in myri10ge, from Stanislaw Gruszka.
4) Fix various deadlock scenerios in sunvnet driver, from Sowmini
Varadhan.
5) Fix cxgb4i build failures with incompatible Kconfig settings of
the driver vs ipv6, from Anish Bhatt.
6) Fix generation of ACK packet timestamps in the presence of TSO
which will be split up, from Willem de Bruijn.
7) Don't enable sched scan in iwlwifi driver, it causes firmware
crashes in some revisions. From Emmanuel Grumbach.
8) Revert a macvlan simplification that causes crashes.
9) Handle RTT calculations properly in the presence of repair'd SKBs,
from Andrey Vagin.
10) SIT tunnel lookup uses wrong device index in compares, from
Shmulik Ladkani.
11) Handle MTU reductions in TCP properly for ipv4 mapped ipv6
sockets, from Neal Cardwell.
12) Add missing annotations in rhashtable code, from Thomas Graf.
13) Fix false interpretation of two RTOs as being from the same TCP
loss event in the FRTO code, from Neal Cardwell"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (42 commits)
netlink: Annotate RCU locking for seq_file walker
rhashtable: fix annotations for rht_for_each_entry_rcu()
rhashtable: unexport and make rht_obj() static
rhashtable: RCU annotations for next pointers
tcp: fix ssthresh and undo for consecutive short FRTO episodes
tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic
tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()
sit: Fix ipip6_tunnel_lookup device matching criteria
net: ethernet: ibm: ehea: Remove duplicate object from Makefile
net: xgene: Check negative return value of xgene_enet_get_ring_size()
tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)
net: xilinx: Remove .owner field for driver
Revert "macvlan: simplify the structure port"
iwlwifi: mvm: disable scheduled scan to prevent firmware crash
xen-netback: remove loop waiting function
xen-netback: don't stop dealloc kthread too early
xen-netback: move NAPI add/remove calls
xen-netback: fix debugfs entry creation
xen-netback: fix debugfs write length check
net-timestamp: fix missing tcp fragmentation cases
...
No need to export rht_obj(), all inner to outer object translations
occur internally. It was intended to be used with rht_for_each() which
now primarily serves as the iterator for rhashtable_remove_pprev() to
effectively flush and free the full table.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Properly annotate next pointers as access is RCU protected in
the lookup path.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull kbuild updates from Michal Marek:
- make clean also considers $(extra-m) and $(extra-) to be consistent
- cleanup and fixes in scripts/Makefile.host
- allow to override the name of the Python 2 executable with make
PYTHON=... (only needed for ia64 in practice)
- option to split debugingo into *.dwo files to save disk space if the
compiler supports it (CONFIG_DEBUG_INFO_SPLIT)
- option to use dwarf4 debuginfo if the compiler supports it
(CONFIG_DEBUG_INFO_DWARF4)
- fix for disabling certain warnings with clang
- fix for unneeded rebuild with dash when a command contains
backslashes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: Fix handling of backslashes in *.cmd files
kbuild, LLVMLinux: Supress warnings unless W=1-3
Kbuild: Add a option to enable dwarf4 v2
kbuild: Support split debug info v4
kbuild: allow to override Python command name
kbuild: clean-up and bug fix of scripts/Makefile.host
kbuild: clean up scripts/Makefile.host
kbuild: drop shared library support from Makefile.host
kbuild: fix a bug of C++ host program handling
kbuild: fix a typo in scripts/Makefile.host
scripts/Makefile.clean: clean also $(extra-m) and $(extra-)
Pull block driver changes from Jens Axboe:
"Nothing out of the ordinary here, this pull request contains:
- A big round of fixes for bcache from Kent Overstreet, Slava Pestov,
and Surbhi Palande. No new features, just a lot of fixes.
- The usual round of drbd updates from Andreas Gruenbacher, Lars
Ellenberg, and Philipp Reisner.
- virtio_blk was converted to blk-mq back in 3.13, but now Ming Lei
has taken it one step further and added support for actually using
more than one queue.
- Addition of an explicit SG_FLAG_Q_AT_HEAD for block/bsg, to
compliment the the default behavior of adding to the tail of the
queue. From Douglas Gilbert"
* 'for-3.17/drivers' of git://git.kernel.dk/linux-block: (86 commits)
bcache: Drop unneeded blk_sync_queue() calls
bcache: add mutex lock for bch_is_open
bcache: Correct printing of btree_gc_max_duration_ms
bcache: try to set b->parent properly
bcache: fix memory corruption in init error path
bcache: fix crash with incomplete cache set
bcache: Fix more early shutdown bugs
bcache: fix use-after-free in btree_gc_coalesce()
bcache: Fix an infinite loop in journal replay
bcache: fix crash in bcache_btree_node_alloc_fail tracepoint
bcache: bcache_write tracepoint was crashing
bcache: fix typo in bch_bkey_equal_header
bcache: Allocate bounce buffers with GFP_NOWAIT
bcache: Make sure to pass GFP_WAIT to mempool_alloc()
bcache: fix uninterruptible sleep in writeback thread
bcache: wait for buckets when allocating new btree root
bcache: fix crash on shutdown in passthrough mode
bcache: fix lockdep warnings on shutdown
bcache allocator: send discards with correct size
bcache: Fix to remove the rcu_sched stalls.
...
Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
architecture specific scatterlist.h, make it a proper Kconfig option and
use that instead. At same time, remove the header files are are now
mostly useless and just include asm-generic/scatterlist.h.
[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now with 64bit bzImage and kexec tools, we support ramdisk that size is
bigger than 2g, as we could put it above 4G.
Found compressed initramfs image could not be decompressed properly. It
turns out that image length is int during decompress detection, and it
will become < 0 when length is more than 2G. Furthermore, during
decompressing len as int is used for inbuf count, that has problem too.
Change len to long, that should be ok as on 32 bit platform long is
32bits.
Tested with following compressed initramfs image as root with kexec.
gzip, bzip2, xz, lzma, lzop, lz4.
run time for populate_rootfs():
size name Nehalem-EX Westmere-EX Ivybridge-EX
9034400256 root_img : 26s 24s 30s
3561095057 root_img.lz4 : 28s 27s 27s
3459554629 root_img.lzo : 29s 29s 28s
3219399480 root_img.gz : 64s 62s 49s
2251594592 root_img.xz : 262s 260s 183s
2226366598 root_img.lzma: 386s 376s 277s
2901482513 root_img.bz2 : 635s 599s
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rashika Kheria <rashika.kheria@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: P J P <ppandit@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: "Daniel M. Weeks" <dan@danweeks.net>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During testing initrd (>2G) support, find decompress/lz4 does not work
with initrd at all.
decompress_* should support:
1. inbuf[]/outbuf[] for kernel preboot.
2. inbuf[]/flush() for initramfs
3. fill()/flush() for initrd.
in the unlz4 does not handle case 3, as input len is passed as 0, and it
failed in first try.
Fix that add one extra if (fill) checking, and get out if EOF from the
fill().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use BUG_ON(x) rather than if(x) BUG();
The semantic patch that fixes this problem is as follows:
// <smpl>
@@ identifier x; @@
-if (!x) BUG();
+BUG_ON(!x);
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case 1, it passes down the BLACK color from G to p and u, and maintains
the color of n. By doing so, it maintains the black height of the
sub-tree.
While in the comment, it marks the color of n to BLACK. This is a typo
and not consistents with the code.
This patch fixs this typo in comment.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm working on address sanitizer project for kernel. Recently we
started experiments with stack instrumentation, to detect out-of-bounds
read/write bugs on stack.
Just after booting I've hit out-of-bounds read on stack in idr_for_each
(and in __idr_remove_all as well):
struct idr_layer **paa = &pa[0];
while (id >= 0 && id <= max) {
...
while (n < fls(id)) {
n += IDR_BITS;
p = *--paa; <--- here we are reading pa[-1] value.
}
}
Despite the fact that after this dereference we are exiting out of loop
and never use p, such behaviour is undefined and should be avoided.
Fix this by moving pointer derference to the beggining of the loop,
right before we will use it.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alexey Preobrazhensky <preobr@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge incoming from Andrew Morton:
- Various misc things.
- arch/sh updates.
- Part of ocfs2. Review is slow.
- Slab updates.
- Most of -mm.
- printk updates.
- lib/ updates.
- checkpatch updates.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (226 commits)
checkpatch: update $declaration_macros, add uninitialized_var
checkpatch: warn on missing spaces in broken up quoted
checkpatch: fix false positives for --strict "space after cast" test
checkpatch: fix false positive MISSING_BREAK warnings with --file
checkpatch: add test for native c90 types in unusual order
checkpatch: add signed generic types
checkpatch: add short int to c variable types
checkpatch: add for_each tests to indentation and brace tests
checkpatch: fix brace style misuses of else and while
checkpatch: add --fix option for a couple OPEN_BRACE misuses
checkpatch: use the correct indentation for which()
checkpatch: add fix_insert_line and fix_delete_line helpers
checkpatch: add ability to insert and delete lines to patch/file
checkpatch: add an index variable for fixed lines
checkpatch: warn on break after goto or return with same tab indentation
checkpatch: emit a warning on file add/move/delete
checkpatch: add test for commit id formatting style in commit log
checkpatch: emit fewer kmalloc_array/kcalloc conversion warnings
checkpatch: improve "no space after cast" test
checkpatch: allow multiple const * types
...
This patch set consists of the usual driver updates (ufs, storvsc, pm8001
hpsa). It also has removal of the user space target driver code (everyone is
using LIO now), a partial PCI MSI-X update, more multi-queue updates,
conversion to 64 bit LUNs (so we could theoretically cope with any LUN
returned by a device) and placeholder support for the ZBC device type (Shingle
drives), plus an assortment of minor updates and bug fixes.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJT4mS9AAoJEDeqqVYsXL0Mq34H/2AeXiM8GEVO3PIsBtF3TFZ9
poJvAyb8t//+VwAIVLHU9wrssIrIcyvNQmNHH/InGt5rOaXwGQRsnEc73bBtot4b
aC1t+hAnp2Ddvu6phmyUg7iY2GmQhAoZmeaj7krGIu2XgtLGiPg26eSsgk4Yv/U9
cuULEuOc/UnTj3w5VK8SvpyXMybVF6oQhSrS1slOglfFwPTlTI/NHU9xo7Wc3qHT
VifHXNphIvye5EH8zwtKX5p8qCrFW0pevJwyfPz7Hp2CTA9XYKx3SoeOh+n9F9ez
udBBggg7Vb1tb4mPKUoZ78UrtCVdFSCmesBU/RJe7cIh8daKaO5MVr3WPSx2JhM=
=yGai
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This patch set consists of the usual driver updates (ufs, storvsc,
pm8001 hpsa). It also has removal of the user space target driver
code (everyone is using LIO now), a partial PCI MSI-X update, more
multi-queue updates, conversion to 64 bit LUNs (so we could
theoretically cope with any LUN returned by a device) and placeholder
support for the ZBC device type (Shingle drives), plus an assortment
of minor updates and bug fixes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (143 commits)
scsi: do not issue SCSI RSOC command to Promise Vtrak E610f
vmw_pvscsi: Use pci_enable_msix_exact() instead of pci_enable_msix()
pm8001: Fix invalid return when request_irq() failed
lpfc: Remove superfluous call to pci_disable_msix()
isci: Use pci_enable_msix_exact() instead of pci_enable_msix()
bfa: Use pci_enable_msix_exact() instead of pci_enable_msix()
bfa: Cleanup bfad_setup_intr() function
bfa: Do not call pci_enable_msix() after it failed once
fnic: Use pci_enable_msix_exact() instead of pci_enable_msix()
scsi: use short driver name for per-driver cmd slab caches
scsi_debug: support scsi-mq, queues and locks
Drivers: add blist flags
scsi: ufs: fix endianness sparse warnings
scsi: ufs: make undeclared functions static
bnx2i: Update driver version to 2.7.10.1
pm8001: fix a memory leak in nvmd_resp
pm8001: fix update_flash
pm8001: fix a memory leak in flash_update
pm8001: Cleaning up uninitialized variables
pm8001: Fix to remove null pointer checks that could never happen
...
Apparently, bitmap_andnot is supposed to return whether the new bitmap
is empty. But it didn't take potential garbage bits in the last word
into account.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Apparently, bitmap_and is supposed to return whether the new bitmap is
empty. But it didn't take potential garbage bits in the last word into
account.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__reg_op(..., REG_OP_ALLOC) always returns 0, so we might as well use that
and save an instruction.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changing the pos parameter of __reg_op to unsigned allows the compiler
to generate slightly smaller and simpler code. Also update its callers
bitmap_*_region to receive and pass unsigned int. The return types of
bitmap_find_free_region and bitmap_allocate_region are still int to
allow a negative error code to be returned. An int is certainly capable
of representing any realistic return value.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few lines above, it was stated that positions for non-set bits are
mapped to -1, which is obviously also what the code does.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want len to be the index of the first '\n', or the length of the
string if there is no newline. This is a good example of the usefulness
of strchrnul(). Use that instead, thus eliminating a branch and a call
to strlen().
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "start" is non-negative.
Also, use the names "start" and "len" for the two parameters for
consistency with bitmap_set.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "start" is non-negative.
Also, use the names "start" and "len" for the two parameters in both
header file and implementation, instead of the previous mix.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
I didn't change the return type, since that might change the semantics
of some expression containing a call to bitmap_weight(). Certainly an
int is capable of holding the result.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change is only for consistency with the changes to the other
bitmap_* functions; it doesn't change the size of the generated code:
inside BITS_TO_LONGS there is a sizeof(long), which causes bits to be
interpreted as unsigned anyway.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the extra bits are "don't care", there is no reason to mask the
last word to the used bits when complementing. This shaves off yet a
few bytes.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many functions in lib/bitmap.c start with an expression such as lim =
bits/BITS_PER_LONG. Since bits has type (signed) int, and since gcc
cannot know that it is in fact non-negative, it generates worse code
than it could. These patches, mostly consisting of changing various
parameters to unsigned, gives a slight overall code reduction:
add/remove: 1/1 grow/shrink: 8/16 up/down: 251/-414 (-163)
function old new delta
tick_device_uses_broadcast 335 425 +90
__irq_alloc_descs 498 554 +56
__bitmap_andnot 73 115 +42
__bitmap_and 70 101 +31
bitmap_weight - 11 +11
copy_hugetlb_page_range 752 762 +10
follow_hugetlb_page 846 854 +8
hugetlb_init 1415 1417 +2
hugetlb_nrpages_setup 130 131 +1
hugetlb_add_hstate 377 376 -1
bitmap_allocate_region 82 80 -2
select_task_rq_fair 2202 2191 -11
hweight_long 66 55 -11
__reg_op 230 219 -11
dm_stats_message 2849 2833 -16
bitmap_parselist 92 74 -18
__bitmap_weight 115 97 -18
__bitmap_subset 153 129 -24
__bitmap_full 128 104 -24
__bitmap_empty 120 96 -24
bitmap_set 179 149 -30
bitmap_clear 185 155 -30
__bitmap_equal 136 105 -31
__bitmap_intersects 148 108 -40
__bitmap_complement 109 67 -42
tick_device_setup_broadcast_func.isra 81 - -81
[The increases in __bitmap_and{,not} are due to bug fixes 17/18,18/18.
No idea why bitmap_weight suddenly appears.] While 163 bytes treewide is
insignificant, I believe the bitmap functions are often called with
locks held, so saving even a few cycles might be worth it.
While making these changes, I found a few other things that might be
worth including. 16,17,18 are actual bug fixes. The rest shouldn't
change the behaviour of any of the functions, provided no-one passed
negative nbits values. If something should come up, it should be fairly
bisectable.
A few issues I thought about, but didn't know what to do with:
* Many of the functions misbehave if nbits is compile-time 0; the
out-of-line functions generally handle 0 correctly. bitmap_fill() is
particularly bad, whether the 0 is known at compile time or not. It
would probably be nice to add detection of at least compile-time 0 and
handle that appropriately.
* I didn't change __bitmap_shift_{left,right} to use unsigned because I
want to fully understand why the algorithm works before making that
change. However, AFAICT, they behave correctly for all (positive) shift
amounts. This is not the case for the small_const_nbits versions. If
for example nbits = n = BITS_PER_LONG, the shift operators turn into
no-ops (at least on x86), so one get *dst = *src, whereas one would
expect to get *dst=0. That difference in behaviour is somewhat
annoying.
This patch (of 18):
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The helper merge_and_restore_back_links() makes sure to call the
caller's cmp function during the final ->prev pointer fixup, so that the
cmp function may call cond_resched(). However, if the cmp function does
not call cond_resched() at all, this is entirely redundant. If it does,
doing at least two function calls for every two pointer assignments is a
bit excessive. This patch limits the calls to once for every 256
iterations.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no reason to maintain the list structure while freeing the
debug elements. Aside from the redundant pointer manipulations, it is
also inefficient from a locality-of-reference viewpoint, since they are
visited in a random order (wrt. the order they were allocated).
Furthermore, if we jumped to exit: after detecting list corruption, it
is actually dangerous.
So just free the elements in the order they were allocated, using the
backing array elts. Allocate that using kcalloc(), so that if
allocation of one of the debug element fails, we just end up calling
kfree(NULL) for the trailing elements.
Minor details: Use sizeof(*elts) instead of sizeof(void *), and return
err immediately when allocation of elts fails, to avoid introducing
another label just before the final return statement.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a check to make sure that the prev pointer of the list head points
to the last element on the list.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Complement commit 68aecfb979 ("lib/string_helpers.c: make arrays
static") by making the arrays const -- not only pointing to const
strings. This moves them out of the data section to the r/o data
section:
text data bss dec hex filename
1150 176 0 1326 52e lib/string_helpers.old.o
1326 0 0 1326 52e lib/string_helpers.new.o
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For modern filesystems such as btrfs, t/p/e size level operations are
common. add size unit t/p/e parsing to memparse
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was useful during development, and is retained for future
regression testing.
GCC appears to have no way to place string literals in a particular
section; adding __initconst to a char pointer leaves the string itself
in the default string section, where it will not be thrown away after
module load.
Thus all string constants are kept in explicitly declared and named
arrays. Sorry this makes printk a bit harder to read. At least the
tests are more compact.
Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a helper function from drivers/ata/libata_core.c, where it is
used to blacklist particular device models. It's being moved to lib/ so
other drivers may use it for the same purpose.
This implementation in non-recursive, so is safe for the kernel stack.
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cleanup unused `if 0'-ed functions, which have been dead since 2006
(commits 87c2ce3b93 ("lib/zlib*: cleanups") by Adrian Bunk and
4f3865fb57 ("zlib_inflate: Upgrade library code to a recent version")
by Richard Purdie):
- zlib_deflateSetDictionary
- zlib_deflateParams
- zlib_deflateCopy
- zlib_inflateSync
- zlib_syncsearch
- zlib_inflateSetDictionary
- zlib_inflatePrime
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The name was modified from hlist_add_after() to hlist_add_behind() when
adjusting the order of arguments to match the one with
klist_add_after(). This is necessary to break old code when it would
use it the wrong way.
Make klist follow this naming scheme for consistency.
Signed-off-by: Ken Helias <kenhelias@firemail.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit a8fe19ebfb ("kernel/printk: use symbolic defines for console
loglevels") makes consistent use of symbolic values for printk() log
levels.
The naming scheme used is different from the one used for
DEFAULT_MESSAGE_LOGLEVEL though. Change that symbol name to be
MESSAGE_LOGLEVEL_DEFAULT for consistency. And because the value of that
symbol comes from a similarly-named config option, rename
CONFIG_DEFAULT_MESSAGE_LOGLEVEL as well.
Signed-off-by: Alex Elder <elder@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jan Kara <jack@suse.cz>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
"Highlights:
1) Steady transitioning of the BPF instructure to a generic spot so
all kernel subsystems can make use of it, from Alexei Starovoitov.
2) SFC driver supports busy polling, from Alexandre Rames.
3) Take advantage of hash table in UDP multicast delivery, from David
Held.
4) Lighten locking, in particular by getting rid of the LRU lists, in
inet frag handling. From Florian Westphal.
5) Add support for various RFC6458 control messages in SCTP, from
Geir Ola Vaagland.
6) Allow to filter bridge forwarding database dumps by device, from
Jamal Hadi Salim.
7) virtio-net also now supports busy polling, from Jason Wang.
8) Some low level optimization tweaks in pktgen from Jesper Dangaard
Brouer.
9) Add support for ipv6 address generation modes, so that userland
can have some input into the process. From Jiri Pirko.
10) Consolidate common TCP connection request code in ipv4 and ipv6,
from Octavian Purdila.
11) New ARP packet logger in netfilter, from Pablo Neira Ayuso.
12) Generic resizable RCU hash table, with intial users in netlink and
nftables. From Thomas Graf.
13) Maintain a name assignment type so that userspace can see where a
network device name came from (enumerated by kernel, assigned
explicitly by userspace, etc.) From Tom Gundersen.
14) Automatic flow label generation on transmit in ipv6, from Tom
Herbert.
15) New packet timestamping facilities from Willem de Bruijn, meant to
assist in measuring latencies going into/out-of the packet
scheduler, latency from TCP data transmission to ACK, etc"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits)
cxgb4 : Disable recursive mailbox commands when enabling vi
net: reduce USB network driver config options.
tg3: Modify tg3_tso_bug() to handle multiple TX rings
amd-xgbe: Perform phy connect/disconnect at dev open/stop
amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask
net: sun4i-emac: fix memory leak on bad packet
sctp: fix possible seqlock seadlock in sctp_packet_transmit()
Revert "net: phy: Set the driver when registering an MDIO bus device"
cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine
team: Simplify return path of team_newlink
bridge: Update outdated comment on promiscuous mode
net-timestamp: ACK timestamp for bytestreams
net-timestamp: TCP timestamping
net-timestamp: SCHED timestamp on entering packet scheduler
net-timestamp: add key to disambiguate concurrent datagrams
net-timestamp: move timestamp flags out of sk_flags
net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
cxgb4i : Move stray CPL definitions to cxgb4 driver
tcp: reduce spurious retransmits due to transient SACK reneging
qlcnic: Initialize dcbnl_ops before register_netdev
...
Pull security subsystem updates from James Morris:
"In this release:
- PKCS#7 parser for the key management subsystem from David Howells
- appoint Kees Cook as seccomp maintainer
- bugfixes and general maintenance across the subsystem"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits)
X.509: Need to export x509_request_asymmetric_key()
netlabel: shorter names for the NetLabel catmap funcs/structs
netlabel: fix the catmap walking functions
netlabel: fix the horribly broken catmap functions
netlabel: fix a problem when setting bits below the previously lowest bit
PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1
tpm: simplify code by using %*phN specifier
tpm: Provide a generic means to override the chip returned timeouts
tpm: missing tpm_chip_put in tpm_get_random()
tpm: Properly clean sysfs entries in error path
tpm: Add missing tpm_do_selftest to ST33 I2C driver
PKCS#7: Use x509_request_asymmetric_key()
Revert "selinux: fix the default socket labeling in sock_graft()"
X.509: x509_request_asymmetric_keys() doesn't need string length arguments
PKCS#7: fix sparse non static symbol warning
KEYS: revert encrypted key change
ima: add support for measuring and appraising firmware
firmware_class: perform new LSM checks
security: introduce kernel_fw_from_file hook
PKCS#7: Missing inclusion of linux/err.h
...
Conflicts:
drivers/net/Makefile
net/ipv6/sysctl_net_ipv6.c
Two ipv6_table_template[] additions overlap, so the index
of the ipv6_table[x] assignments needed to be adjusted.
In the drivers/net/Makefile case, we've gotten rid of the
garbage whereby we had to list every single USB networking
driver in the top-level Makefile, there is just one
"USB_NETWORKING" that guards everything.
Signed-off-by: David S. Miller <davem@davemloft.net>