[flang] RFC: -fstack-arrays

An RFC describing how -fstack-arrays will be implemented. Differential Revision: https://reviews.llvm.org/D139617
2022-12-07 17:52:31 +00:00 · 2022-12-07 17:52:31 +00:00 · 5922b88635
commit 5922b88635
parent 2feafd059e
1 changed files with 192 additions and 0 deletions
--- a/flang/docs/fstack-arrays.md
+++ b/flang/docs/fstack-arrays.md
@ -0,0 +1,192 @@
+# -fstack-arrays
+## Problem Description
+In gfortran, `-fstack-arrays` will cause all local arrays, including those of
+unknown size, to be allocated from stack memory. Gfortran enables this flag by
+default at `-Ofast`.
+
+Flang already allocates all local arrays on the stack (the
+`memory-allocation-opt` pass can move large allocations to the heap, but by
+default it will not). But there are some cases where temporary arrays are
+created on the heap by Flang. Moving these temporary arrays is the goal here.
+
+## Proposed Solution
+One approach would be to write a pass which attempts to move heap allocations to
+the stack (like the `memory-allocation-opt` pass in reverse). This approach has
+the advantage of a self-contained implementation and ensuring that newly added
+cases are covered.
+
+Another approach would be to modify each place in which arrays are allocated on
+the heap to instead generate a stack allocation. The advantage of the second
+approach is that it would be difficult to match up all `fir.allocmem` operations
+with their associated `fir.freemem`. In general, the lifetimes of heap allocated
+memory are not constrained by the current function's stack frame and so cannot
+be always converted to stack allocations. It is much easier to swap heap
+allocations for stack allocations when they are first generated because the
+lifetime information is conveniently available.
+
+For example, to rewrite the heap allocation in the `array-value-copy` pass with
+a stack allocation using the first approach would require analysis to ensure
+that the heap allocation is always freed before the function returns. This is
+much more complex than never generating a heap allocation (and free) in the
+first place (the second approach).
+
+The plan is to take the more complex first approach so that newly added changes
+to lowering code do not need to be made to support the stack arrays option. The
+general problem of determining heap allocation lifetimes can be simplified in
+this case because only those allocations which are always freed before exit from
+the same function are possible to be moved to heap allocations in that
+function's stack frame. Furthermore, the aim of this pass would be to only move
+those allocations generated by Flang, and so some of the more difficult cases
+can be avoided. Where the transformation is not possible, the existing heap
+allocation will be kept.
+
+## Implementation details overview
+In order to limit the complexity of the proposed pass, it is useful to
+understand the situations in which Flang will generate heap allocations.
+
+### Known Heap Array Allocations
+Flang allocates most arrays on the stack by default, but there are a few cases
+where temporary arrays are allocated on the heap:
+- `flang/lib/Optimizer/Transforms/ArrayValueCopy.cpp`
+- `flang/lib/Optimizer/Transforms/MemoryAllocation.cpp`
+- `flang/lib/Lower/ConvertExpr.cpp`
+- `flang/lib/Lower/IntrinsicCall.cpp`
+- `flang/lib/Lower/ConvertVariable.cpp`
+
+Lowering code is being updated and in the future, temporaries for expressions
+will be created in the HLFIR bufferization pass in
+`flang/lib/Optimizer/HLFIR/Trnasforms/BufferizeHLFIR.cpp`.
+
+#### `ArrayValueCopy.cpp`
+Memory is allocated for a temporary array in `allocateArrayTemp()`. This
+temporary array is used to ensure that assignments of one array to itself
+produce the required value. E.g.
+
+```
+integer, dimension(5), intent(inout) :: x
+x(3,4) = x(1,2)
+```
+
+#### `MemoryAllocation.cpp`
+The default options for the Memory Allocation transformation ensure that no
+array allocations, no matter how large, are moved from the stack to the heap.
+
+#### `ConvertExpr.cpp`
+`ConvertExpr.cpp` allocates many array temporaries on the heap:
+  - Passing array arguments by value or when they need re-shaping
+  - Lowering elemental array expressions
+  - Lowering mask expressions
+  - Array constructors
+
+The last two of these cases are **not** covered by the current stack arrays pass
+design.
+
+The FIR code generated for mask expressions (the WHERE construct) sets a
+boolean variable to indicate whether a heap allocation was necessary. The
+allocation is only freed if the variable indicates that the allocation was
+performed to begin with. The proposed dataflow analysis is not intelligent
+enough to statically determine that the boolean variable will always be true
+when the allocation is performed. Beyond this, the control flow in the generated
+FIR code passes the allocated memory through `fir.result`, resulting in a
+different SSA value to be allocated and freed, causing the analysis not to
+realise that the allocated memory is freed. The most convenient solution here
+would be to generate less complicated FIR code, as the existing codegen has
+known bugs: https://github.com/llvm/llvm-project/issues/56921,
+https://github.com/llvm/llvm-project/issues/59803.
+
+Code generated for array constructors uses `realloc()` to grow the allocated
+buffer because the size of the resulting array cannot always be determined
+ahead of running the constructor. This makes this temporary unsuitable
+for allocation on the stack.
+
+#### `IntrinsicCall.cpp`
+The existing design is for the runtime to do the allocation and the lowering
+code to insert `fir.freemem` to remove the allocation. It is not clear whether
+this can be replaced by adapting lowering code to do stack allocations and to
+pass these to the runtime. This would be a significant change and so is out of
+scope of `-fstack-arrays`.
+
+#### `ConvertVariable.cpp`
+See `Fortran::lower::MapSymbolAttributes`:
+
+Dummy arguments that are not declared in the current entry point require a
+skeleton definition. Most such "unused" dummies will not survive into final
+generated code, but some will. It is illegal to reference one at run time if it
+does. There are three ways these dummies can be mapped to a value:
+- a `fir::UndefOp` value: preferable but can't be used for dummies with dynamic
+  bounds or used to define another dummy.
+- A stack slot. This has intermediate-weight cost but may not be usable for
+  objects with dynamic bounds.
+- A heap box/descriptor is the heaviest weight option, only used for dynamic
+  objects.
+
+These heap allocations will be out of scope for `-fstack-arrays` because the
+existing implementation already uses stack allocations (or better) where
+possible, and most such dummy arguments will be removed from generated code.
+
+#### `BufferizeHLFIR.cpp`
+TODO
+
+### Detecting Allocations to Move
+Allocations which could be moved to the stack will be detected by performing a
+forward dense data flow analysis using `mlir::dataflow::DenseDataFlowAnalysis`.
+This analysis will search for SSA values created by a `fir.allocmem` which are
+always freed using `fir.freemem` within the same function.
+
+Tracking allocations by SSA values will miss some cases where address
+calculations are duplicated in different blocks: resulting in different SSA
+values as arguments for the allocation and the free. It is believed that simple
+tracking of SSA values is sufficient for detecting the allocations for array
+temporaries added by Flang, because these temporaries should be simple arrays:
+not nested inside of derived types or other arrays.
+
+Arrays allocated by an `allocate` statement in source code should not be moved
+to the stack. These will be identified by adding an attribute to these
+`fir.allocmem` operations when they are lowered. Intrinsics such as `allocated`
+and `move_alloc` do not need to be supported because the array temporaries moved
+to the stack will never be visible in user code.
+
+Only allocations for arrays will be considered for moving to the stack.
+
+When conducting the dense data flow analysis, the pass will assume that existing
+allocations are not freed inside called functions. This is true for the existing
+heap array temporary allocations generated by Flang. If an allocation were to be
+freed inside of a called function, the allocation would presumably not also be
+freed later in the caller function (as this would be a double-free). Therefore
+the stack arrays pass would not find where the allocation was freed and so not
+transform the allocation. It is necessary to make this assumption so that the
+stack arrays pass can transform array allocations created for pass-by-value
+function arguments.
+
+### Moving Allocations
+The `fir.allocmem` will be replaced by a `fir.alloca` with the same arguments.
+The `fir.freemem` will be removed entirely.
+
+Where possible, the `fir.alloca` should be placed in the function entry block
+(so we can be sure it only happens once). This may not be possible if the array
+has non-constant extents (causing the `fir.alloca` to have SSA values as
+operands). In this case, the `fir.alloca` will be placed immediately after the
+last operand becomes available.
+
+If this location is inside a loop (either an `mlir::LoopLikeOpInterface` or a
+cyclic CFG), the transformation should attempt to use the `llvm.stacksave`/
+`llvm.stackrestore` intrinsics to ensure that the stack does not grow on every
+loop iteration. Use of these intrinsics is considered valid when the allocation
+and deallocation occur within the same block and there are no other stack
+allocations between them. If this is not the case, the existing heap allocation
+will be preserved.
+
+### FIR attribute
+A FIR attribute will be added to distinguish `fir.allocmem` arising from
+`allocate` statements in source from `fir.allocmem` operations  added by Flang.
+The attribute will be called `"fir.must_be_heap"` and will have a boolean value:
+`true` meaning that the stack arrays pass must not move the allocation, `false`
+meaning that stack arrays may move the allocation. Not specifying the attribute
+will be equivalent to setting it to `false`.
+
+## Testing Plan
+FileCheck tests will be written to check each of the above identified sources of
+heap allocated array temporaries are detected and converted by the new pass.
+
+Another test will check that `allocate` statements in source code will not be
+moved to the stack.