[DirectoryWatcher] Increase timeout to make test less flaky

We've observed this test being significantly flaky on our Mac CI machines when we're running the full check-clang suite. It fails because the wait_for condition isn't met within 3 seconds. We believe it's because our CI machines are somewhat underpowered and pretty heavily loaded when we're running the full check-clang suite. I ran some experiments on increasing the timeout. I ran the full check-clang suite 100 times with each timeout value and recorded how many flaky failures we encountered in these tests. The results are: 3 second timeout (baseline): 20 failures 10 second timeout: 14 failures 20 second timeout: 4 failures 30 second timeout: 2 failures 40 second timeout: 1 failure 50 second timeout: 0 failures 60 second timeout: 0 failures I ran another set of 100 tests for the 50 second timeout and observed one flaky failure. By contrast, I ended up running check-clang 500 times for the 60 second timeout and didn't observe a single flaky failure. That's how the 60 second timeout value used in this patch was derived. While a 60 second timeout might seem high, keep in mind that: - This is a timeout, not a sleep; the test should require much less time the vast majority of instances, especially on more powerful machines. - The long timeout is most likely to occur when other tests are also running at the same time, so the latency of the timeout will also be masked by the latency of the other tests. See https://reviews.llvm.org/D58418?id=200123#inline-554211 for where this timeout was originally introduced and the possibility of raising it if it wasn't enough was discussed. Reviewed By: plotfi Differential Revision: https://reviews.llvm.org/D97878
2021-03-03 11:32:06 -08:00 · 2021-03-03 11:32:06 -08:00 · 9a2a167b6c
commit 9a2a167b6c
parent 1c5f083128
1 changed files with 12 additions and 1 deletions
--- a/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp
+++ b/clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp
@ -34,6 +34,17 @@ namespace {

 typedef DirectoryWatcher::Event::EventKind EventKind;

+// We've observed this test being significantly flaky when running on a heavily
+// loaded machine (e.g. when it's being run as part of the full check-clang
+// suite). Set a high timeout value to avoid this flakiness. The 60s timeout
+// value was determined empirically. It's a timeout value, not a sleep value,
+// and the test should require much less time in practice the vast majority of
+// instances. The cases where we do come close to (or still end up hitting) the
+// longer timeout are most likely to occur when other tests are also running at
+// the same time (e.g. as part of the full check-clang suite), in which case the
+// latency of the timeout will be masked by the latency of the other tests.
+constexpr std::chrono::seconds EventualResultTimeout(60);
+
 struct DirectoryWatcherTestFixture {
  std::string TestRootDir;
  std::string TestWatchedDir;
@ -243,7 +254,7 @@ void checkEventualResultWithTimeout(VerifyingConsumer &TestConsumer) {
  std::thread worker(std::move(task));
  worker.detach();

-  EXPECT_TRUE(WaitForExpectedStateResult.wait_for(std::chrono::seconds(3)) ==
+  EXPECT_TRUE(WaitForExpectedStateResult.wait_for(EventualResultTimeout) ==
              std::future_status::ready)
      << "The expected result state wasn't reached before the time-out.";
  std::unique_lock<std::mutex> L(TestConsumer.Mtx);