kernel_optimize_test/kernel/time
Thomas Gleixner 9c1645727b timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:

    s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;

Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:
 
 - Time jumps backwards
 
 - __iter_div_u64_rem() which is used in one of the calling code pathes
   will take forever to piecewise calculate the seconds/nanoseconds part.

This has been reported by several people with different scenarios:

David observed that when stopping a VM with a debugger:

 "It was essentially the stopped by debugger case.  I forget exactly why,
  but the guest was being explicitly stopped from outside, it wasn't just
  scheduling lag.  I think it was something in the vicinity of 10 minutes
  stopped."

 When lifting the stop the machine went dead.

The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.

But this was also observed on a live system by Liav:

 "When the OS is too overloaded, delta will get a high enough value for the
  msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
  after the shift the nsec variable will gain a value similar to
  0xffffffffff000000."

Unfortunately this has been reintroduced recently with commit 6bd58f09e1
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a89 ("time: Avoid signed overflow in
timekeeping_get_ns()").

Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:

   s64 nsec;

-  nsec = (d * m + n) >> s:
+  nsec = d * m + n;
+  nsec >>= s;

d being type of cycle_t adds another level of obfuscation.

This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.

There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.

Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.

Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.

Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.

This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.

It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.

I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.

Fixes: 6bd58f09e1 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 12:06:41 +01:00
..
alarmtimer.c alarmtimer: Add tracepoints for alarm timers 2016-12-01 14:45:08 +01:00
clockevents.c clockevents: Make clockevents_subsys static 2016-07-19 10:48:06 +02:00
clocksource.c clocksource: Defer override invalidation unless clock is unstable 2016-08-31 14:43:33 -07:00
hrtimer.c timers: Fix documentation for schedule_timeout() and similar 2016-10-26 13:14:47 +02:00
itimer.c timer: Move sys_alarm from timer.c to itimer.c 2016-11-16 09:26:34 +01:00
jiffies.c jiffies: Use CLOCKSOURCE_MASK instead of constant 2016-02-27 08:55:31 +01:00
Kconfig rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL 2015-07-06 13:52:18 -07:00
Makefile posix-timers: Make them configurable 2016-11-16 09:26:35 +01:00
ntp_internal.h ntp: Fix second_overflow's input parameter type to be 64bits 2015-12-16 16:50:56 -08:00
ntp.c ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO 2016-01-22 12:01:42 +01:00
posix-clock.c posix-clock: Fix return code on the poll method's error path 2015-12-29 11:33:06 +01:00
posix-cpu-timers.c posix_cpu_timers: Move the add_device_randomness() call to a proper place 2016-11-16 09:26:34 +01:00
posix-stubs.c posix-timers: Make them configurable 2016-11-16 09:26:35 +01:00
posix-timers.c posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper 2016-01-17 11:13:55 +01:00
sched_clock.c timers, sched/clock: Clean up the code a bit 2015-03-27 08:34:01 +01:00
test_udelay.c time: Avoid timespec in udelay_test 2016-06-20 12:47:26 -07:00
tick-broadcast-hrtimer.c tick/broadcast-hrtimer: Set name of the ce_broadcast_hrtimer 2016-07-05 17:02:19 +02:00
tick-broadcast.c tick: Move the export of tick_broadcast_oneshot_control to the proper place 2015-07-14 12:01:04 +02:00
tick-common.c clockevents: Remove unused set_mode() callback 2015-09-14 11:00:55 +02:00
tick-internal.h timers: Forward the wheel clock whenever possible 2016-07-07 10:35:11 +02:00
tick-oneshot.c clockevents: Provide functions to set and get the state 2015-06-02 14:40:47 +02:00
tick-sched.c tick/nohz: Prevent stopping the tick on an offline CPU 2016-09-13 17:53:52 +02:00
tick-sched.h timers/nohz: Convert tick dependency mask to atomic_t 2016-03-29 11:52:11 +02:00
time.c time: Avoid undefined behaviour in timespec64_add_safe() 2016-08-31 14:43:35 -07:00
timeconst.bc timeconst: Update path in comment 2015-10-26 10:06:06 +09:00
timeconv.c time: Add time64_to_tm() 2016-06-20 12:47:15 -07:00
timecounter.c
timekeeping_debug.c timekeeping: Prints the amounts of time spent during suspend 2016-08-31 14:43:34 -07:00
timekeeping_internal.h clocksource: Make clocksource validation work for all clocksources 2015-12-19 15:59:57 +01:00
timekeeping.c timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion 2016-12-09 12:06:41 +01:00
timekeeping.h hrtimer: Make offset update smarter 2015-04-22 17:06:49 +02:00
timer_list.c hrtimer: Handle remaining time proper for TIME_LOW_RES 2016-01-17 11:13:55 +01:00
timer_stats.c timer: Avoid using timespec 2016-06-20 12:47:33 -07:00
timer.c posix-timers: Make them configurable 2016-11-16 09:26:35 +01:00