This fixes a compile time issue due to guarding loop unswitching based
on whether the enclosing function is cold. That approach is very
inefficient in the case of large cold functions that contain numerous
loops, since the loop pass calls isFunctionColdInCallGraph once per
loop, and that function walks all BBs in the function (twice for Sample
PGO) looking for any non-cold blocks.
Originally, this code only checked if the current Loop's header was cold
(D129599). However, that apparently caused a slowdown on a SPEC
benchmark, and the example given was that of a cold inner loop nested in
a non-cold outer loop (see comments in D129599). The fix was to check if
the whole function is cold, done in D133275.
This is overkill, and we can simply check if the header of any loop in
the current loop's loop nest is non-cold (looking at both outer and
inner loops). This patch drops the compile time for a large module by
40% with this approach.
I also updated PGO-nontrivial-unswitch2.ll since it only had one cold
loop in a non-cold function, so that it instead had IR based off the
example given in the comments relating to the SPEC degradation in
D129599. I confirmed that the new version of the test fails with the
original check done in D129599 of only the current loop's header
coldness.
Similarly updated test PGO-nontrivial-unswitch.ll to contain a cold loop
in a cold loop nest, and created PGO-nontrivial-unswitch3.ll to contain
a non-cold loop in a non-cold loop nest.
Differential Revision: https://reviews.llvm.org/D146383
These are currently in a `Predicates = [HasStdExtZfhOrZfhmin]` block,
but Zfhmin has no fcmp instructions so the definition makes no sense for
Zfhmin.
Differential Revision: https://reviews.llvm.org/D146435
Summary:
Some older compilers, which we still support, have problems handling the
copy elision that allows us to directly move an `Error` to an
`Expected`. This patch adds explicit moves to remove the error. Same as
last patch but I forgot this one.
Summary:
Some older compilers, which we still support, have problems handling the
copy elision that allows us to directly move an `Error` to an
`Expected`. This patch adds explicit moves to remove the error.
D142084 moved an enumeration inside a header from the llvm namespace
into an anon namespace. Some of the bots started failing as a result.
Differential Revision: https://reviews.llvm.org/D146419
This review implements the following PowerPC math operations that we care about:
- fnabs
- fre
- fres
- frsqrte
- frsqrtes
None of these intrinsics require additional error checks in semantics. The interfaces handle checking types and kinds
Reviewed By: kkwli0
Differential Revision: https://reviews.llvm.org/D146139
Previous:
When we do not make decisions about commutative operands, we can end up in a situation where two values have two potential canonical numbers between two regions. This ensures that an ordering is decided after the initial structure between two regions is determined.
Current:
Previously the outliner only checked that assignment to a value matched what was already known, this patch makes sure that it matches what has already been found, and creates a mapping between the two values where it is a one-to-one mapping.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D139336
IRUnit is defined as:
using IRUnit = PointerUnion<Operation *, Region *, Block *, Value>;
The tracing::Action is extended to take an ArrayRef<IRUnit> as context to
describe an Action. It is demonstrated in the "ActionLogging" observer.
Reviewed By: rriddle, Mogball
Differential Revision: https://reviews.llvm.org/D144814
This is a convenient flag for context where we intend to summarize a top-level
operation without the full-blown regions it may hold.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D145889
Integrate the `tracing::ExecutionContext()` into mlir-opt with a new
--log-action-to=<file> option to demonstrate the feature.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D144813
c59465e120 introduced mapping to warps and
linear GPU ids.
In the implementation, the delinearization basis is reversed from [x, y, z]
to [z, y x] order to properly compute the strides and allow delinearization.
Prior to this commit, we forgot to reverse it back to [x, y, z] order
before materializing the indices.
Fix this oversight.
By adding signal codes to UnixSignals and adding a new function
where you can get a string with optional address and bounds.
Added signal codes to the Linux, FreeBSD and NetBSD signal sets.
I've checked the numbers against the relevant sources.
Each signal code has a code number, description and printing options.
By default you just get the descripton, you can opt into adding either
a fault address or bounds information.
Bounds signals we'll use the description, unless we have the bounds
values in which case we say whether it is an upper or lower bound
issue.
GetCrashReasonString remains in CrashReason because we need it to
be compiled only for platforms with siginfo_t. Ideally it would
move into NativeProcessProtocol, but that is also used
by NativeRegisterContextWindows, where there would be no siginfo_t.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D146044
applyLoopGuards doesn't always preserve information when there are multiple assumes.
This patch tries to deal with multiple assumes regarding a SCEV's divisibility and min/max values, and rewrite it into a SCEV that still preserves all of the information.
For example, let the trip count of the loop be TC. Consider the 3 following assumes:
1. __builtin_assume(TC % 8 == 0);
2. __builtin_assume(TC > 0);
3. __builtin_assume(TC < 100);
Before this patch, depending on the assume processing order applyLoopGuards could create the following SCEV:
max(min((8 * (TC / 8)) , 99), 1)
Looking at this SCEV, it doesn't preserve the divisibility by 8 information.
After this patch, depending on the assume processing order applyLoopGuards could create the following SCEV:
max(min((8 * (TC / 8)) , 96), 8)
By aligning up 1 to 8, and aligning down 99 to 96, the new SCEV still preserves all of the original assumes.
Differential Revision: https://reviews.llvm.org/D144947
The token annotator doesn't annotate the template opener and closer
as such if they enclose an overloaded operator. This causes the
space between the operator and the closer to be removed, resulting
in invalid C++ code.
Fixes#58602.
Differential Revision: https://reviews.llvm.org/D143755
Add a clang part of OpenHarmony target
Related LLVM part: D138202
~~~
Huawei RRI, OS Lab
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D145227
Update lowering of allocate statement to use the new
functions defined in D146290.
Depends on D146290
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D146291
`AllocatableInitIntrinsic`, `AllocatableInitCharacter` and
`AllocatableInitDerived` are meant to be used to initialize a
descriptor when it is instantiated and not to be used multiple
times in a scope.
Add `AllocatableInitDerivedForAllocate`, `AllocatableInitCharacterForAllocate`
and `AllocatableInitDerivedForAllocate` to be used for the allocation
in allocate statement.
These new functions are meant to be used on an initialized descriptor
and will return directly if the descriptor is allocated so the
error handling is done by the call to `AllocatableAllocate`.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D146290
This revisions refactors the implementation of mapping to threads to additionally allow warps and linear ids to be specified.
`warp_dims` is currently specified along with `block_dims` as a transform attribute.
Linear ids on th other hand use the flattened block_dims to predicate on the first (linearized) k threads.
An additional GPULinearIdMappingAttr is added to the GPU dialect to allow specifying loops mapped to this new scheme.
Various implementation and transform op semantics cleanups are also applied.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D146130