If an inlined kernel is called in a loop, the launch point alloca would
lead to increasing stack usage every time the kernel is invoked. This
could make the application run out of stack space and crash. This problem
is fixed by using the alloca insertion point while creating the alloca instruction.
Fixes https://github.com/llvm/llvm-project/issues/60602
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D145820
Since P0857, part of C++20, a *lambda-expression* can contain a
*requires-clause* after its *template-parameter-list*.
While support for this was added as part of
eccc734a69, one specific case isn't
handled properly, where the *requires-clause* consists of an
instantiation of a boolean variable template. This is due to a
diagnostic check which was written with the assumption that a
*requires-clause* can never be followed by a left parenthesis. This
assumption no longer holds for lambdas.
This diagnostic check would then attempt to perform a "recovery", but it
does so in a valid parse state, resulting in an invalid parse state
instead!
This patch adds a special case when parsing requires clauses of lambda
templates, to skip this diagnostic check.
Fixes https://github.com/llvm/llvm-project/issues/61278
Fixes https://github.com/llvm/llvm-project/issues/61387
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D146140
The original MemDeallocPolicy had two options:
* Standard: allocated memory lives until deallocated or abandoned.
* Finalize: allocated memory lives until all finalize actions have been run,
then is destroyed.
This patch introduces a new 'NoAlloc' option. NoAlloc indicates that the
section should be ignored by the JITLinkMemoryManager -- the memory manager
should allocate neither working memory nor executor address space to blocks in
NoAlloc sections. The NoAlloc option is intended to support metadata sections
(e.g. debug info) that we want to keep in the graph and have fixed up if
necessary, but don't want allocated or transmitted to the executor (or we want
that allocation and transmission to be managed manually by plugins).
Since NoAlloc blocks are ignored by the JITLinkMemoryManager they will not have
working memory allocated to them by default post-allocation. Clients wishing to
modify the content of a block in a NoAlloc section should call
`Block::getMutableMemory(LinkGraph&)` to get writable memory allocated on the
LinkGraph's allocator (this memory will exist for the lifetime of the graph).
If no client requests mutable memory prior to the fixup phase then the generic
link algorithm will do so when it encounters the first edge in any given block.
Addresses of blocks in NoAlloc sections are initialized by the LinkGraph
creator (a LinkGraphBuilder, if the graph is generated from an object file),
and should not be modified by the JITLinkMemoryManager. Plugins are responsible
for updating addresses if they add/remove content from these sections. The
meaning of addresses in NoAlloc-sections is backend/plugin defined, but for
fixup purposes they will be treated the same as addresses in Standard/Finalize
sections. References from Standard/Finalize sections to NoAlloc sections are
expected to be common (these represent metadata tracking executor addresses).
References from NoAlloc sections to Standard/Finalize sections are expected to
be rare/non-existent (they would represent JIT'd code / data tracking metadata
in the controller, which would be surprising). LinkGraphBuilders and specific
backends may impose additional constraints on edges between Standard/Finalize
and NoAlloc sections where required for correctness.
Differential Revision: https://reviews.llvm.org/D146183
This can prevent unnecessarily hoisting out of loops.
Test case cribbed from AArch64.
I also intend to make them rematerializable.
Differential Revision: https://reviews.llvm.org/D146314
Currently compiler does not support mixing of shuffled nodes
+ gather/buildvector of the remaining scalar values. It may reduce total
number of instructions and improve performance of the
gather/buildvector sequences.
Part of D110978
Differential Revision: https://reviews.llvm.org/D146167
Summary:
Fixes the lack of a dependency after changing the order of some
includes. Also we weren't running any tests as the GPU was always
disabling them. Fix the logic.
Only override InOperandList when the Real instruction needs a different
type for $offset than the Pseudo.
Differential Revision: https://reviews.llvm.org/D146313
This patch enables integration tests running on the GPU. This uses the
RPC interface implemented in D145913 to compile the necessary
dependencies for the integration test object. We can then use this to
compile the objects for the GPU directly and execute them using the AMD
HSA loader combined with its RPC server. For example, the compiler is
performing the following actions to execute the integration tests.
```
$ clang++ --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nostdlib -flto -ffreestanding \
crt1.o io.o quick_exit.o test.o rpc_client.o args_test.o -o image
$ ./amdhsa_loader image 1 2 5
args_test.cpp:24: Expected 'my_streq(argv[3], "3")' to be true, but is false
```
This currently only works with a single threaded client implementation
running on AMDGPU. Further work will implement multiple clients for AMD
and the ability to run on NVPTX as well.
Depends on D145913
Reviewed By: sivachandra, JonChesterfield
Differential Revision: https://reviews.llvm.org/D146256
This patch adds initial support for an RPC client / server architecture.
The GPU is unable to perform several system utilities on its own, so in
order to implement features like printing or memory allocation we need
to be able to communicate with the executing process. This is done via a
buffer of "sharable" memory. That is, a buffer with a unified pointer
that both the client and server can use to communicate.
The implementation here is based off of Jon Chesterfields minimal RPC
example in his work. We use an `inbox` and `outbox` to communicate
between if there is an RPC request and to signify when work is done.
We use a fixed-size buffer for the communication channel. This is fixed
size so that we can ensure that there is enough space for all
compute-units on the GPU to issue work to any of the ports. Right now
the implementation is single threaded so there is only a single buffer
that is not shared.
This implementation still has several features missing to be complete.
Such as multi-threaded support and asynchrnonous calls.
Depends on D145912
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D145913
Currently the conversion of elementwise ops only checks for scalar index
types when checking for bitwidth emulation.
Differential Revision: https://reviews.llvm.org/D146307
This is the funcref counterpart to 890146b. We introduce a new attribute
that marks a function pointer as a funcref. It also implements builtin
__builtin_wasm_ref_null_func(), that returns a null funcref value.
Differential Revision: https://reviews.llvm.org/D128440
In Shell tests, replace use of the `p` alias with the `expression` command.
To avoid conflating tests of the alias with tests of the expression command,
this patch canonicalizes to the use `expression`.
See also D141539 which made the same change to API tests.
Differential Revision: https://reviews.llvm.org/D146230
LLDB uses `_up`, `_sp` and `_wp` suffixes for unique, shared and weak
pointers respectively. This can become confusing in combination with
watchpoints which are commonly abbreviated to `wp`. Update
CommandObjectWatchpoint to use `watch_sp` for all `WatchpointSP`
variables.
When interfacing with C code, assumed type should be passed as
basic pointer.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D146300
When setting a variable watchpoint, the watchpoint stores the variable
name in the watchpoint spec. For expression variables we should store
the expression in the watchpoint spec. This patch adds that
functionality.
rdar://106096860
Differential revision: https://reviews.llvm.org/D146262
When offloading is mandatory we can emit a more helpful message if we
did not find any compatible images with the user's system.
Fixes#60221
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D142369
Update isVectorizedMemAccessUse to also check if the pointer is stored.
This prevents LV to incorrectly consider a pointer as uniform if it is
used as both pointer and stored by the same StoreInst.
Fixes#61396.
This reverts commit 2c78a9e65c.
Fails to compile on mlir-s390x-linux buildbot using GCC 9.4 with:
llvm/lib/Analysis/AliasSetTracker.cpp: In member function 'void llvm::AliasSet::mergeSetIn(llvm::AliasSet&, llvm::AliasSetTracker&, llvm::BatchAAResults&)':
llvm/lib/Analysis/AliasSetTracker.cpp:50:19: error: invalid operands of types 'unsigned char:2' and 'unsigned char:2' to binary 'operator|'
The 'mma.sp.sync.aligned' family of instructions expects
the sparsity selector as a direct literal (0x0 or 0x1).
The current MLIR inline asm passed this as a value in
register, which broke the downstream assemblers
This is a small step towards supporting 2:4 sparsity on
NVidia GPUs in the sparse compiler of MLIR.
Reviewed By: ThomasRaoux, guraypp
Differential Revision: https://reviews.llvm.org/D146110
InnerLoopVectorizer::createBitOrPointerCast only supported fixed
length vectors since it hadn't been updated. Supporting scalable
vectors is just a matter of changing types and using elementcount
instead of numelements, since there's nothing which actually relies
on knowing the exact length of the vector.
Original written by mgabka.
Split out from D145163.
This allows-fembed-offload-object's and -fopenmp-is-device
compiler invocation arguments to be passed to the Flang frontend
during split compilation when offloading in OpenMP.
An example use case is when passing an offload-arch alongside
-fopenmp to embed device objects compiled for the offload-arch
within the host architecture.
This borrows from existing clangDriver+Clang.h/.cpp work and the intent
is currently to reuse as much of the existing infrastructure and design as
we can to achieve offloading for Flang+OpenMP. An overview of
Clang's offloading design can be found
here: https://clang.llvm.org/docs/OffloadingDesign.html
Reviewers:
awarzynski
jhuber6
Differential Revision: https://reviews.llvm.org/D145815
This patch adds llvm::codeview::SourceLanguage entries, DWARF translations, and PDB source file extensions in LLVM and allow LLDB's PDB parsers to recognize them correctly.
The CV_CFL_LANG enum in the Visual Studio 2022 documentation https://learn.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/cv-cfl-lang defines:
```
CV_CFL_OBJC = 0x11,
CV_CFL_OBJCXX = 0x12,
```
Since the initial commit in D24317, ObjC was emitted as C language and ObjC++ as Masm.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D146221
CopyFileToErr() uses Printf("%s", ...) which fails with a negative size on
files >2Gb (Its path is through var-args wrappers to an unnecessary "%s"
expansion and subject to int overflows) Using puts() in place of printf()
bypasses this path and writes the string directly to stderr. This avoids the
present loss of data when a crashed worker has generated >2Gb of output.
rdar://99384640
Reviewed By: yln
Differential Revision: https://reviews.llvm.org/D146189
This reverts commit 54225c457a.
The lack of RVO causes compile errors in our code.
Reverting to unblock our integrate.
See D145639 for full discussion.
In some circumstances llvm-readelf symlink to llvm-readobj appears
to not be available. For want of a proper fix in CMake, use llvm-readobj
in the test instead.
Autogen for naming change, and remove comments about C code inspiration. Multiple of these are out of sync with the actual IR, and these are IR tests anyways.