kernel_optimize_test/net
Daniel Borkmann 4a8f87e60f bpf: Allow for map-in-map with dynamic inner array map entries
Recent work in f4d0525921 ("bpf: Add map_meta_equal map ops") and 134fede4ee
("bpf: Relax max_entries check for most of the inner map types") added support
for dynamic inner max elements for most map-in-map types. Exceptions were maps
like array or prog array where the map_gen_lookup() callback uses the maps'
max_entries field as a constant when emitting instructions.

We recently implemented Maglev consistent hashing into Cilium's load balancer
which uses map-in-map with an outer map being hash and inner being array holding
the Maglev backend table for each service. This has been designed this way in
order to reduce overall memory consumption given the outer hash map allows to
avoid preallocating a large, flat memory area for all services. Also, the
number of service mappings is not always known a-priori.

The use case for dynamic inner array map entries is to further reduce memory
overhead, for example, some services might just have a small number of back
ends while others could have a large number. Right now the Maglev backend table
for small and large number of backends would need to have the same inner array
map entries which adds a lot of unneeded overhead.

Dynamic inner array map entries can be realized by avoiding the inlined code
generation for their lookup. The lookup will still be efficient since it will
be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips
inline code generation and relaxes array_map_meta_equal() check to ignore both
maps' max_entries. This also still allows to have faster lookups for map-in-map
when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed.

Example code generation where inner map is dynamic sized array:

  # bpftool p d x i 125
  int handle__sys_enter(void * ctx):
  ; int handle__sys_enter(void *ctx)
     0: (b4) w1 = 0
  ; int key = 0;
     1: (63) *(u32 *)(r10 -4) = r1
     2: (bf) r2 = r10
  ;
     3: (07) r2 += -4
  ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key);
     4: (18) r1 = map[id:468]
     6: (07) r1 += 272
     7: (61) r0 = *(u32 *)(r2 +0)
     8: (35) if r0 >= 0x3 goto pc+5
     9: (67) r0 <<= 3
    10: (0f) r0 += r1
    11: (79) r0 = *(u64 *)(r0 +0)
    12: (15) if r0 == 0x0 goto pc+1
    13: (05) goto pc+1
    14: (b7) r0 = 0
    15: (b4) w6 = -1
  ; if (!inner_map)
    16: (15) if r0 == 0x0 goto pc+6
    17: (bf) r2 = r10
  ;
    18: (07) r2 += -4
  ; val = bpf_map_lookup_elem(inner_map, &key);
    19: (bf) r1 = r0                               | No inlining but instead
    20: (85) call array_map_lookup_elem#149280     | call to array_map_lookup_elem()
  ; return val ? *val : -1;                        | for inner array lookup.
    21: (15) if r0 == 0x0 goto pc+1
  ; return val ? *val : -1;
    22: (61) r6 = *(u32 *)(r0 +0)
  ; }
    23: (bc) w0 = w6
    24: (95) exit

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
2020-10-11 10:21:04 -07:00
..
6lowpan
9p treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
802
8021q net: vlan: Fixed signedness in vlan_group_prealloc_vid() 2020-09-28 00:51:39 -07:00
appletalk
atm net: atm: delete duplicated words 2020-09-18 14:12:43 -07:00
ax25
batman-adv net: bridge: mcast: rename br_ip's u member to dst 2020-09-23 13:24:34 -07:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2020-09-29 13:22:53 -07:00
bpf bpf: fix raw_tp test run in preempt kernel 2020-09-30 08:34:08 -07:00
bpfilter bpf: Add kernel module with user mode driver that populates bpffs. 2020-08-20 16:02:36 +02:00
bridge net: bridge: mcast: remove only S,G port groups from sg_port hash 2020-09-25 16:50:19 -07:00
caif caif: Remove duplicate macro SRVL_CTRL_PKT_SIZE 2020-09-05 15:57:05 -07:00
can can: remove "WITH Linux-syscall-note" from SPDX tag of C files 2020-09-21 10:13:16 +02:00
ceph treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
core bpf: Add redirect_peer helper 2020-10-11 10:21:04 -07:00
dcb net: DCB: Validate DCB_ATTR_DCB_BUFFER argument 2020-09-10 15:09:08 -07:00
dccp inet: remove icsk_ack.blocked 2020-09-30 14:21:30 -07:00
decnet treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
dns_resolver
dsa net: dsa: tag_rtl4_a: use the generic flow dissector procedure 2020-09-26 14:17:59 -07:00
ethernet
ethtool Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
hsr Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
ieee802154 treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
ife
ipv4 bpf: tcp: Do not limit cb_flags when creating child sk from listen sk 2020-10-02 11:34:48 -07:00
ipv6 ip6gre: avoid tx_error when sending MLD/DAD on external tunnels 2020-09-28 16:01:37 -07:00
iucv treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
kcm
key
l2tp l2tp: report rx cookie discards in netlink get 2020-09-29 13:26:36 -07:00
l3mdev net: Fix some comments 2020-08-27 07:55:59 -07:00
lapb
llc
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
mac802154 Merge tag 'ieee802154-for-davem-2020-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan 2020-09-08 20:12:58 -07:00
mpls treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mptcp net: tcp: drop unused function argument from mptcp_incoming_options 2020-09-24 20:17:01 -07:00
ncsi treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
netlabel netlabel: Fix some kernel-doc warnings 2020-09-08 20:04:27 -07:00
netlink netlink: add spaces around '&' in netlink_recv/sendmsg() 2020-09-17 16:53:47 -07:00
netrom treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
nfc NFC: digital: Remove two unused macroes 2020-09-05 16:01:52 -07:00
nsh
openvswitch net: openswitch: reuse the helper variable to improve the code readablity 2020-09-18 14:24:08 -07:00
packet net/packet: Fix a comment about network_header 2020-09-19 16:40:48 -07:00
phonet treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
psample
qrtr net: qrtr: check skb_put_padto() return value 2020-09-09 11:04:39 -07:00
rds RDS: drop double zeroing 2020-09-20 19:09:11 -07:00
rfkill
rose treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
rxrpc rxrpc: Fix an overget of the conn bundle when setting up a client conn 2020-09-14 16:18:59 +01:00
sched net/sched: cls_u32: Replace one-element array with flexible-array member 2020-09-28 18:48:42 -07:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
smc net/smc: CLC decline - V2 enhancements 2020-09-28 15:19:03 -07:00
strparser
sunrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
switchdev
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
tls net/tls: Implement getsockopt SOL_TLS TLS_RX 2020-09-01 11:47:12 -07:00
unix net: unix: remove redundant assignment to variable 'err' 2020-09-21 14:51:37 -07:00
vmw_vsock
wimax
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
x25 treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
xdp bpf: Allow for map-in-map with dynamic inner array map entries 2020-10-11 10:21:04 -07:00
xfrm treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
compat.c
devres.c
Kconfig drop_monitor: Convert to using devlink tracepoint 2020-09-30 18:01:26 -07:00
Makefile
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-04 21:28:59 -07:00
sysctl_net.c