[flang] RFC: -fstack-arrays
An RFC describing how -fstack-arrays will be implemented. Differential Revision: https://reviews.llvm.org/D139617
This commit is contained in:
parent
2feafd059e
commit
5922b88635
192
flang/docs/fstack-arrays.md
Normal file
192
flang/docs/fstack-arrays.md
Normal file
|
@ -0,0 +1,192 @@
|
|||
# -fstack-arrays
|
||||
## Problem Description
|
||||
In gfortran, `-fstack-arrays` will cause all local arrays, including those of
|
||||
unknown size, to be allocated from stack memory. Gfortran enables this flag by
|
||||
default at `-Ofast`.
|
||||
|
||||
Flang already allocates all local arrays on the stack (the
|
||||
`memory-allocation-opt` pass can move large allocations to the heap, but by
|
||||
default it will not). But there are some cases where temporary arrays are
|
||||
created on the heap by Flang. Moving these temporary arrays is the goal here.
|
||||
|
||||
## Proposed Solution
|
||||
One approach would be to write a pass which attempts to move heap allocations to
|
||||
the stack (like the `memory-allocation-opt` pass in reverse). This approach has
|
||||
the advantage of a self-contained implementation and ensuring that newly added
|
||||
cases are covered.
|
||||
|
||||
Another approach would be to modify each place in which arrays are allocated on
|
||||
the heap to instead generate a stack allocation. The advantage of the second
|
||||
approach is that it would be difficult to match up all `fir.allocmem` operations
|
||||
with their associated `fir.freemem`. In general, the lifetimes of heap allocated
|
||||
memory are not constrained by the current function's stack frame and so cannot
|
||||
be always converted to stack allocations. It is much easier to swap heap
|
||||
allocations for stack allocations when they are first generated because the
|
||||
lifetime information is conveniently available.
|
||||
|
||||
For example, to rewrite the heap allocation in the `array-value-copy` pass with
|
||||
a stack allocation using the first approach would require analysis to ensure
|
||||
that the heap allocation is always freed before the function returns. This is
|
||||
much more complex than never generating a heap allocation (and free) in the
|
||||
first place (the second approach).
|
||||
|
||||
The plan is to take the more complex first approach so that newly added changes
|
||||
to lowering code do not need to be made to support the stack arrays option. The
|
||||
general problem of determining heap allocation lifetimes can be simplified in
|
||||
this case because only those allocations which are always freed before exit from
|
||||
the same function are possible to be moved to heap allocations in that
|
||||
function's stack frame. Furthermore, the aim of this pass would be to only move
|
||||
those allocations generated by Flang, and so some of the more difficult cases
|
||||
can be avoided. Where the transformation is not possible, the existing heap
|
||||
allocation will be kept.
|
||||
|
||||
## Implementation details overview
|
||||
In order to limit the complexity of the proposed pass, it is useful to
|
||||
understand the situations in which Flang will generate heap allocations.
|
||||
|
||||
### Known Heap Array Allocations
|
||||
Flang allocates most arrays on the stack by default, but there are a few cases
|
||||
where temporary arrays are allocated on the heap:
|
||||
- `flang/lib/Optimizer/Transforms/ArrayValueCopy.cpp`
|
||||
- `flang/lib/Optimizer/Transforms/MemoryAllocation.cpp`
|
||||
- `flang/lib/Lower/ConvertExpr.cpp`
|
||||
- `flang/lib/Lower/IntrinsicCall.cpp`
|
||||
- `flang/lib/Lower/ConvertVariable.cpp`
|
||||
|
||||
Lowering code is being updated and in the future, temporaries for expressions
|
||||
will be created in the HLFIR bufferization pass in
|
||||
`flang/lib/Optimizer/HLFIR/Trnasforms/BufferizeHLFIR.cpp`.
|
||||
|
||||
#### `ArrayValueCopy.cpp`
|
||||
Memory is allocated for a temporary array in `allocateArrayTemp()`. This
|
||||
temporary array is used to ensure that assignments of one array to itself
|
||||
produce the required value. E.g.
|
||||
|
||||
```
|
||||
integer, dimension(5), intent(inout) :: x
|
||||
x(3,4) = x(1,2)
|
||||
```
|
||||
|
||||
#### `MemoryAllocation.cpp`
|
||||
The default options for the Memory Allocation transformation ensure that no
|
||||
array allocations, no matter how large, are moved from the stack to the heap.
|
||||
|
||||
#### `ConvertExpr.cpp`
|
||||
`ConvertExpr.cpp` allocates many array temporaries on the heap:
|
||||
- Passing array arguments by value or when they need re-shaping
|
||||
- Lowering elemental array expressions
|
||||
- Lowering mask expressions
|
||||
- Array constructors
|
||||
|
||||
The last two of these cases are **not** covered by the current stack arrays pass
|
||||
design.
|
||||
|
||||
The FIR code generated for mask expressions (the WHERE construct) sets a
|
||||
boolean variable to indicate whether a heap allocation was necessary. The
|
||||
allocation is only freed if the variable indicates that the allocation was
|
||||
performed to begin with. The proposed dataflow analysis is not intelligent
|
||||
enough to statically determine that the boolean variable will always be true
|
||||
when the allocation is performed. Beyond this, the control flow in the generated
|
||||
FIR code passes the allocated memory through `fir.result`, resulting in a
|
||||
different SSA value to be allocated and freed, causing the analysis not to
|
||||
realise that the allocated memory is freed. The most convenient solution here
|
||||
would be to generate less complicated FIR code, as the existing codegen has
|
||||
known bugs: https://github.com/llvm/llvm-project/issues/56921,
|
||||
https://github.com/llvm/llvm-project/issues/59803.
|
||||
|
||||
Code generated for array constructors uses `realloc()` to grow the allocated
|
||||
buffer because the size of the resulting array cannot always be determined
|
||||
ahead of running the constructor. This makes this temporary unsuitable
|
||||
for allocation on the stack.
|
||||
|
||||
#### `IntrinsicCall.cpp`
|
||||
The existing design is for the runtime to do the allocation and the lowering
|
||||
code to insert `fir.freemem` to remove the allocation. It is not clear whether
|
||||
this can be replaced by adapting lowering code to do stack allocations and to
|
||||
pass these to the runtime. This would be a significant change and so is out of
|
||||
scope of `-fstack-arrays`.
|
||||
|
||||
#### `ConvertVariable.cpp`
|
||||
See `Fortran::lower::MapSymbolAttributes`:
|
||||
|
||||
Dummy arguments that are not declared in the current entry point require a
|
||||
skeleton definition. Most such "unused" dummies will not survive into final
|
||||
generated code, but some will. It is illegal to reference one at run time if it
|
||||
does. There are three ways these dummies can be mapped to a value:
|
||||
- a `fir::UndefOp` value: preferable but can't be used for dummies with dynamic
|
||||
bounds or used to define another dummy.
|
||||
- A stack slot. This has intermediate-weight cost but may not be usable for
|
||||
objects with dynamic bounds.
|
||||
- A heap box/descriptor is the heaviest weight option, only used for dynamic
|
||||
objects.
|
||||
|
||||
These heap allocations will be out of scope for `-fstack-arrays` because the
|
||||
existing implementation already uses stack allocations (or better) where
|
||||
possible, and most such dummy arguments will be removed from generated code.
|
||||
|
||||
#### `BufferizeHLFIR.cpp`
|
||||
TODO
|
||||
|
||||
### Detecting Allocations to Move
|
||||
Allocations which could be moved to the stack will be detected by performing a
|
||||
forward dense data flow analysis using `mlir::dataflow::DenseDataFlowAnalysis`.
|
||||
This analysis will search for SSA values created by a `fir.allocmem` which are
|
||||
always freed using `fir.freemem` within the same function.
|
||||
|
||||
Tracking allocations by SSA values will miss some cases where address
|
||||
calculations are duplicated in different blocks: resulting in different SSA
|
||||
values as arguments for the allocation and the free. It is believed that simple
|
||||
tracking of SSA values is sufficient for detecting the allocations for array
|
||||
temporaries added by Flang, because these temporaries should be simple arrays:
|
||||
not nested inside of derived types or other arrays.
|
||||
|
||||
Arrays allocated by an `allocate` statement in source code should not be moved
|
||||
to the stack. These will be identified by adding an attribute to these
|
||||
`fir.allocmem` operations when they are lowered. Intrinsics such as `allocated`
|
||||
and `move_alloc` do not need to be supported because the array temporaries moved
|
||||
to the stack will never be visible in user code.
|
||||
|
||||
Only allocations for arrays will be considered for moving to the stack.
|
||||
|
||||
When conducting the dense data flow analysis, the pass will assume that existing
|
||||
allocations are not freed inside called functions. This is true for the existing
|
||||
heap array temporary allocations generated by Flang. If an allocation were to be
|
||||
freed inside of a called function, the allocation would presumably not also be
|
||||
freed later in the caller function (as this would be a double-free). Therefore
|
||||
the stack arrays pass would not find where the allocation was freed and so not
|
||||
transform the allocation. It is necessary to make this assumption so that the
|
||||
stack arrays pass can transform array allocations created for pass-by-value
|
||||
function arguments.
|
||||
|
||||
### Moving Allocations
|
||||
The `fir.allocmem` will be replaced by a `fir.alloca` with the same arguments.
|
||||
The `fir.freemem` will be removed entirely.
|
||||
|
||||
Where possible, the `fir.alloca` should be placed in the function entry block
|
||||
(so we can be sure it only happens once). This may not be possible if the array
|
||||
has non-constant extents (causing the `fir.alloca` to have SSA values as
|
||||
operands). In this case, the `fir.alloca` will be placed immediately after the
|
||||
last operand becomes available.
|
||||
|
||||
If this location is inside a loop (either an `mlir::LoopLikeOpInterface` or a
|
||||
cyclic CFG), the transformation should attempt to use the `llvm.stacksave`/
|
||||
`llvm.stackrestore` intrinsics to ensure that the stack does not grow on every
|
||||
loop iteration. Use of these intrinsics is considered valid when the allocation
|
||||
and deallocation occur within the same block and there are no other stack
|
||||
allocations between them. If this is not the case, the existing heap allocation
|
||||
will be preserved.
|
||||
|
||||
### FIR attribute
|
||||
A FIR attribute will be added to distinguish `fir.allocmem` arising from
|
||||
`allocate` statements in source from `fir.allocmem` operations added by Flang.
|
||||
The attribute will be called `"fir.must_be_heap"` and will have a boolean value:
|
||||
`true` meaning that the stack arrays pass must not move the allocation, `false`
|
||||
meaning that stack arrays may move the allocation. Not specifying the attribute
|
||||
will be equivalent to setting it to `false`.
|
||||
|
||||
## Testing Plan
|
||||
FileCheck tests will be written to check each of the above identified sources of
|
||||
heap allocated array temporaries are detected and converted by the new pass.
|
||||
|
||||
Another test will check that `allocate` statements in source code will not be
|
||||
moved to the stack.
|
Loading…
Reference in New Issue
Block a user