The existing logic was unsound, in two ways.
First, due to wrapping in the trip count computation, it could compute a value which converted a loop that exited on iteration 256 into one that exited on iteration 255. (With i8 trip counts.)
Second, it allowed rewriting when the trip count implies wrapping around the alternate IV. As a trivial example, it allowed rewriting an i128 exit test in terms of an i64 IV. This is obviously wrong.
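As a rough illustration of both failure modes, the Python sketch below models fixed-width unsigned arithmetic. It does not reproduce the pass's actual computation; the helper `wrap` and the concrete numbers are assumptions chosen for illustration.

```python
def wrap(value, bits):
    """Reduce value modulo 2**bits, as fixed-width unsigned arithmetic does."""
    return value & ((1 << bits) - 1)

# A loop that exits on iteration 256 has a backedge-taken count of 255,
# which fits in i8. Adding one to form the trip count does not fit.
backedge_taken = 255
trip_count_i8 = wrap(backedge_taken + 1, 8)

# An exit count that needs more than 64 bits cannot be tested against an
# i64 IV: the narrow IV takes the same value much earlier in the loop.
wide_exit_count = (1 << 64) + 5
narrow_iv_at_exit = wrap(wide_exit_count, 64)
narrow_iv_early = wrap(5, 64)
```

Here `trip_count_i8` wraps to 0, and the i64 IV has the same value on iteration 5 as on the intended exit iteration, so a test rewritten in terms of it cannot tell the two apart.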
Note that the test change is fairly minimal - i.e. only the targeted test - but that's only because I precommitted a change which switched the test from 32 to 64 bit pointers. For 32 bit pointer architectures with 32 bit primary inductions, this transform is almost always unsound to perform.
Differential Revision: https://reviews.llvm.org/D146429
The function order in some tests had to be changed because they relied on ordering of functions returned in an SCC which is consistent but unspecified.
Introduce the possibility to load/store scalars via amdgpu.raw_buffer_{load,store}
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D146413
Without this patch we were asserting with a generic message `Failed to
create Target`. A detailed error message is already stored in the
variable `error` after calling `lookupTarget()`, but it was not being
used or printed.
With this patch we will emit a message with more details instead of a
stack dump with a generic message.
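The idea can be sketched in Python as follows. This is not the patched code: `lookup_target`, `create_target`, `KNOWN_TARGETS`, and the error text are hypothetical stand-ins that only mirror the shape of a lookup which returns a detailed error alongside a null result.

```python
class TargetLookupError(RuntimeError):
    """Raised when no target can be created for a triple."""

# Hypothetical registry standing in for the real target registry.
KNOWN_TARGETS = {"x86_64": "x86_64 target", "aarch64": "aarch64 target"}

def lookup_target(triple):
    """Return (target, error); exactly one of the two is None."""
    target = KNOWN_TARGETS.get(triple)
    if target is None:
        return None, f"no available target is compatible with triple '{triple}'"
    return target, None

def create_target(triple):
    target, error = lookup_target(triple)
    if target is None:
        # Surface the detailed error instead of only a fixed generic message.
        raise TargetLookupError(f"Failed to create Target: {error}")
    return target
```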
Differential Revision: https://reviews.llvm.org/D146333
Change-Id: I7ddee917cf921a2133ca3e6b35791b2142f770a2
If the buildvector node matches the vector node, it reuses the vector
value from this vector node, but its VectorizedValue field is not
updated. This field needs to be updated to avoid misses during the
analysis of the reused gather/buildvector nodes.
getFileLoc() is guaranteed to return a file loc, and getSpellingLoc()
on a file loc is a no-op.
Differential Revision: https://reviews.llvm.org/D146377
This patch adds OpenMP IRBuilder support for the Target Data directives to allow lowering to LLVM IR.
The mlir::Translation is responsible for generating supporting code for processing the map_operands through the processMapOperand function, and also generate code for the r>
The OMPIRBuilder is responsible for generating the begin and end mapper function calls.
Limitations:
- use_device_ptr and use_device_addr clauses are NOT supported for Target Data operation.
- nowait clauses are NOT supported for Target Enter and Exit Data operations.
- Only LLVMPointerType is supported for map_operands.
Differential Revision: https://reviews.llvm.org/D142914
Also, add a comment to highlight that the "good" result on this test is accidental, and not based on a principled decision. I matched the original behavior to make this NFC, but selecting the last legal IV is not well motivated here.
I think it's good practice to avoid having default ctors unless they're really
valid/useful. For OutlinedFunction the default ctor was used to represent a
bail-out value for getOutliningCandidateInfo(), so I changed the API to return
an optional<OutlinedFunction> instead, which seems a tad cleaner.
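A minimal Python sketch of the same API shape, assuming a simplified `OutlinedFunction` with made-up fields and a made-up bail-out condition (neither is taken from the real outliner):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OutlinedFunction:
    # Hypothetical fields; the real class carries different data.
    candidates: List[str]
    sequence_size: int

def get_outlining_candidate_info(candidates: List[str]) -> Optional[OutlinedFunction]:
    # Bail out with None rather than a default-constructed OutlinedFunction.
    if len(candidates) < 2:
        return None
    return OutlinedFunction(candidates=candidates,
                            sequence_size=len(candidates[0]))
```

Callers then test the result for None explicitly, and no "empty" OutlinedFunction state needs to exist.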
Differential Revision: https://reviews.llvm.org/D146375
This patch performs the same operation to copy over the `argv` array to
the `envp` array. This allows the GPU tests to use environment
variables.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146322
In the scanner's VFS, we cache all files by default and only avoid caching stat failures for certain files. This tanks the performance of scanning with a pre-populated module cache. When there is a stale PCM file, it gets cached by the scanner at the start and the rebuilt version never makes it through the VFS again. The TU invocation that rebuilds the PCM only sees the copy in its InMemoryModuleCache, which is invisible to other invocations. This means the PCM gets rebuilt for every TU given to the scanner.
This patch fixes the situation by flipping the default, only caching files that are known to be important, and letting everything else fall through to the underlying VFS.
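The flipped default can be sketched in Python like this. The class `CachingFS`, the predicate `is_source_like`, and its list of extensions are assumptions for illustration; the actual criteria for "important" files are not spelled out here.

```python
class CachingFS:
    """Caches only files accepted by should_cache; others fall through."""

    def __init__(self, underlying, should_cache):
        self.underlying = underlying      # dict of path -> contents, stands in for the real VFS
        self.should_cache = should_cache  # predicate selecting files worth caching
        self.cache = {}

    def read(self, path):
        if not self.should_cache(path):
            # Not known to be important: always ask the underlying FS.
            return self.underlying[path]
        if path not in self.cache:
            self.cache[path] = self.underlying[path]
        return self.cache[path]

def is_source_like(path):
    # Hypothetical policy: cache sources and headers, never PCM files.
    return path.endswith((".h", ".c", ".cpp", ".modulemap"))
```

With this policy a PCM rebuilt on disk is visible on the next read, while cached headers keep their first-seen contents.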
rdar://106376153
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D146328
On AIX, libraries are still being linked when `-r` is passed to the driver. This patch corrects this error.
Differential Revision: https://reviews.llvm.org/D145899
When lowering LinalgToStandard for named UnaryFn/BinaryFn ops, ensure
the fun name appears in the generated library name. Further, for
linalg.copy to/from different address spaces, ensure the to/from
address spaces are appended onto the library name for uniqueness.
This fixes the lowering error with the linalg.copy testcase shown in
this patch.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D145467
This is to test D143210 patch to have the same vector
compatibility logic for error and warning diagnostics.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D144611
The goal of this patch is to add the ability for the CMake configure to
fail when some optional test dependencies are not met. LLDB tries to be
flexible when test dependencies are not present but there are cases
where it would be useful to know that these dependencies are missing
before we run the test suite.
The intent here is to apply this setting on CI machines and make sure
that they have useful optional dependencies installed. We recently hit a
case where some CI machines were timing out while running the test suite
because a few tests were hanging. With this option, we'll be able to
know if the machine does not have psutil installed so we can install it
and avoid the timeout scenario altogether.
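The kind of check involved can be sketched in Python rather than CMake. The function names and the `enforce` parameter are hypothetical and do not correspond to the actual CMake option.

```python
import importlib.util

def missing_dependencies(modules):
    """Return the top-level module names that cannot be found."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]

def check_test_dependencies(modules, enforce):
    missing = missing_dependencies(modules)
    if missing and enforce:
        # Strict mode: fail up front instead of discovering this mid-run.
        raise RuntimeError("missing test dependencies: " + ", ".join(missing))
    # Lenient mode: report what is missing and carry on.
    return missing
```

A CI machine would run the strict form, for example `check_test_dependencies(["psutil"], enforce=True)`, while a developer machine could keep the lenient behavior.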
rdar://103194447
Differential Revision: https://reviews.llvm.org/D146335
Main benefit here is making the logic easier to follow, slightly more efficient, and more in line with LFTR. This is not NFC. There are three semantic changes here.
First, we drop handling for constants on the LHS of the comparison. These are non-canonical, and we're very late in the optimization pipeline here, so there's no point in supporting this. I removed a test which covered this case.
Second, we don't need the almost dead IV to be an addrec. We just need SCEV to be able to compute a trip count for it.
Third, we require a simple IV for the almost dead IV. In theory, this removes cases we could have previously handled, but given a) zero testing and b) multiple known correctness issues, I'm adopting an attitude of narrowing this down to something which works correctly, and *then* expanding.
Fix several problems related to serialization causing command line
defines to be reported as being built-in defines:
* When serializing the <built-in> and <command line> files don't
convert them into absolute paths.
* When deserializing SM_SLOC_BUFFER_ENTRY we need to call
setHasLineDirectives in the same way as we do for
SM_SLOC_FILE_ENTRY.
* When creating suggested predefines based on the current command line
options we need to add line markers in the same way that
InitializePreprocessor does.
* Adjust a place in clangd where it was implicitly relying on command
line defines being treated as builtin.
Differential Revision: https://reviews.llvm.org/D144651
Some test code was doing loose conversions caught by compiler
warnings in the Fuchsia build. This included duplicated code
in a few tests that was reconsolidated with the existing header
file copy of the same functions.
The MemoryMatcher abstraction presumes gtest-style matcher support,
which is not available in Fuchsia's zxtest library. It's avoided
in favor of simpler memory-comparing assertions.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D146343
This fixes a compile time issue due to guarding loop unswitching based
on whether the enclosing function is cold. That approach is very
inefficient in the case of large cold functions that contain numerous
loops, since the loop pass calls isFunctionColdInCallGraph once per
loop, and that function walks all BBs in the function (twice for Sample
PGO) looking for any non-cold blocks.
Originally, this code only checked if the current Loop's header was cold
(D129599). However, that apparently caused a slowdown on a SPEC
benchmark, and the example given was that of a cold inner loop nested in
a non-cold outer loop (see comments in D129599). The fix was to check if
the whole function is cold, done in D133275.
This is overkill, and we can simply check if the header of any loop in
the current loop's loop nest is non-cold (looking at both outer and
inner loops). This patch drops the compile time for a large module by
40% with this approach.
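A Python sketch of the loop nest check, assuming a toy `Loop` class with a precomputed `header_is_cold` flag. It visits the current loop, its enclosing loops, and its nested loops; whether sibling loops are also considered is not modeled here.

```python
class Loop:
    """Toy loop tree node; header_is_cold stands in for a profile query."""

    def __init__(self, header_is_cold, parent=None):
        self.header_is_cold = header_is_cold
        self.parent = parent
        self.subloops = []
        if parent is not None:
            parent.subloops.append(self)

def self_and_outer_loops(loop):
    while loop is not None:
        yield loop
        loop = loop.parent

def inner_loops(loop):
    stack = list(loop.subloops)
    while stack:
        inner = stack.pop()
        yield inner
        stack.extend(inner.subloops)

def is_loop_nest_cold(loop):
    # Cold only if no header among the outer and inner loops is non-cold.
    return (all(l.header_is_cold for l in self_and_outer_loops(loop))
            and all(l.header_is_cold for l in inner_loops(loop)))
```

The cost is proportional to the size of the loop nest rather than to the number of basic blocks in the enclosing function.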
I also updated PGO-nontrivial-unswitch2.ll since it only had one cold
loop in a non-cold function, so that it instead had IR based off the
example given in the comments relating to the SPEC degradation in
D129599. I confirmed that the new version of the test fails with the
original check done in D129599 of only the current loop's header
coldness.
Similarly updated test PGO-nontrivial-unswitch.ll to contain a cold loop
in a cold loop nest, and created PGO-nontrivial-unswitch3.ll to contain
a non-cold loop in a non-cold loop nest.
Differential Revision: https://reviews.llvm.org/D146383
These are currently in a `Predicates = [HasStdExtZfhOrZfhmin]` block,
but Zfhmin has no fcmp instructions so the definition makes no sense for
Zfhmin.
Differential Revision: https://reviews.llvm.org/D146435
Summary:
Some older compilers, which we still support, have problems handling the
copy elision that allows us to directly move an `Error` to an
`Expected`. This patch adds explicit moves to remove the error. Same as
last patch but I forgot this one.
Summary:
Some older compilers, which we still support, have problems handling the
copy elision that allows us to directly move an `Error` to an
`Expected`. This patch adds explicit moves to remove the error.
D142084 moved an enumeration inside a header from the llvm namespace
into an anon namespace. Some of the bots started failing as a result.
Differential Revision: https://reviews.llvm.org/D146419
This review implements the following PowerPC math operations that we care about:
- fnabs
- fre
- fres
- frsqrte
- frsqrtes
None of these intrinsics require additional error checks in semantics. The interfaces handle checking types and kinds.
Reviewed By: kkwli0
Differential Revision: https://reviews.llvm.org/D146139
Previous:
When we do not make decisions about commutative operands, we can end up in a situation where two values have two potential canonical numbers between two regions. This ensures that an ordering is decided after the initial structure between two regions is determined.
Current:
Previously the outliner only checked that the assignment to a value matched what was already known. This patch makes sure that it matches what has already been found, and creates a mapping between the two values where it is a one-to-one mapping.
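The one-to-one requirement can be sketched in Python with two dictionaries, one per direction. The function `try_map` and the string value names are hypothetical, not the outliner's data structures.

```python
def try_map(a, b, a_to_b, b_to_a):
    """Record a <-> b if consistent with a one-to-one mapping.

    Returns True when the pair is new or matches the existing mapping,
    and False when either value is already paired with something else.
    """
    if a not in a_to_b and b not in b_to_a:
        a_to_b[a] = b
        b_to_a[b] = a
        return True
    # At least one side is already mapped: both directions must agree.
    return a_to_b.get(a) == b and b_to_a.get(b) == a
```

Checking only the forward direction would accept two different values mapping to the same value in the other region; the reverse dictionary rules that out.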
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D139336