forked from luck/tmp_suning_uos_patched
209 lines
8.3 KiB
Plaintext
209 lines
8.3 KiB
Plaintext
|
MEMORY ATTRIBUTE ALIASING ON IA-64
|
||
|
|
||
|
Bjorn Helgaas
|
||
|
<bjorn.helgaas@hp.com>
|
||
|
May 4, 2006
|
||
|
|
||
|
|
||
|
MEMORY ATTRIBUTES
|
||
|
|
||
|
Itanium supports several attributes for virtual memory references.
|
||
|
The attribute is part of the virtual translation, i.e., it is
|
||
|
contained in the TLB entry. The ones of most interest to the Linux
|
||
|
kernel are:
|
||
|
|
||
|
WB Write-back (cacheable)
|
||
|
UC Uncacheable
|
||
|
WC Write-coalescing
|
||
|
|
||
|
System memory typically uses the WB attribute. The UC attribute is
|
||
|
used for memory-mapped I/O devices. The WC attribute is uncacheable
|
||
|
like UC is, but writes may be delayed and combined to increase
|
||
|
performance for things like frame buffers.
|
||
|
|
||
|
The Itanium architecture requires that we avoid accessing the same
|
||
|
page with both a cacheable mapping and an uncacheable mapping[1].
|
||
|
|
||
|
The design of the chipset determines which attributes are supported
|
||
|
on which regions of the address space. For example, some chipsets
|
||
|
support either WB or UC access to main memory, while others support
|
||
|
only WB access.
|
||
|
|
||
|
MEMORY MAP
|
||
|
|
||
|
Platform firmware describes the physical memory map and the
|
||
|
supported attributes for each region. At boot-time, the kernel uses
|
||
|
the EFI GetMemoryMap() interface. ACPI can also describe memory
|
||
|
devices and the attributes they support, but Linux/ia64 currently
|
||
|
doesn't use this information.
|
||
|
|
||
|
The kernel uses the efi_memmap table returned from GetMemoryMap() to
|
||
|
learn the attributes supported by each region of physical address
|
||
|
space. Unfortunately, this table does not completely describe the
|
||
|
address space because some machines omit some or all of the MMIO
|
||
|
regions from the map.
|
||
|
|
||
|
The kernel maintains another table, kern_memmap, which describes the
|
||
|
memory Linux is actually using and the attribute for each region.
|
||
|
This contains only system memory; it does not contain MMIO space.
|
||
|
|
||
|
The kern_memmap table typically contains only a subset of the system
|
||
|
memory described by the efi_memmap. Linux/ia64 can't use all memory
|
||
|
in the system because of constraints imposed by the identity mapping
|
||
|
scheme.
|
||
|
|
||
|
The efi_memmap table is preserved unmodified because the original
|
||
|
boot-time information is required for kexec.
|
||
|
|
||
|
KERNEL IDENTITY MAPPINGS
|
||
|
|
||
|
Linux/ia64 identity mappings are done with large pages, currently
|
||
|
either 16MB or 64MB, referred to as "granules." Cacheable mappings
|
||
|
are speculative[2], so the processor can read any location in the
|
||
|
page at any time, independent of the programmer's intentions. This
|
||
|
means that to avoid attribute aliasing, Linux can create a cacheable
|
||
|
identity mapping only when the entire granule supports cacheable
|
||
|
access.
|
||
|
|
||
|
Therefore, kern_memmap contains only full granule-sized regions that
|
||
|
can referenced safely by an identity mapping.
|
||
|
|
||
|
Uncacheable mappings are not speculative, so the processor will
|
||
|
generate UC accesses only to locations explicitly referenced by
|
||
|
software. This allows UC identity mappings to cover granules that
|
||
|
are only partially populated, or populated with a combination of UC
|
||
|
and WB regions.
|
||
|
|
||
|
USER MAPPINGS
|
||
|
|
||
|
User mappings are typically done with 16K or 64K pages. The smaller
|
||
|
page size allows more flexibility because only 16K or 64K has to be
|
||
|
homogeneous with respect to memory attributes.
|
||
|
|
||
|
POTENTIAL ATTRIBUTE ALIASING CASES
|
||
|
|
||
|
There are several ways the kernel creates new mappings:
|
||
|
|
||
|
mmap of /dev/mem
|
||
|
|
||
|
This uses remap_pfn_range(), which creates user mappings. These
|
||
|
mappings may be either WB or UC. If the region being mapped
|
||
|
happens to be in kern_memmap, meaning that it may also be mapped
|
||
|
by a kernel identity mapping, the user mapping must use the same
|
||
|
attribute as the kernel mapping.
|
||
|
|
||
|
If the region is not in kern_memmap, the user mapping should use
|
||
|
an attribute reported as being supported in the EFI memory map.
|
||
|
|
||
|
Since the EFI memory map does not describe MMIO on some
|
||
|
machines, this should use an uncacheable mapping as a fallback.
|
||
|
|
||
|
mmap of /sys/class/pci_bus/.../legacy_mem
|
||
|
|
||
|
This is very similar to mmap of /dev/mem, except that legacy_mem
|
||
|
only allows mmap of the one megabyte "legacy MMIO" area for a
|
||
|
specific PCI bus. Typically this is the first megabyte of
|
||
|
physical address space, but it may be different on machines with
|
||
|
several VGA devices.
|
||
|
|
||
|
"X" uses this to access VGA frame buffers. Using legacy_mem
|
||
|
rather than /dev/mem allows multiple instances of X to talk to
|
||
|
different VGA cards.
|
||
|
|
||
|
The /dev/mem mmap constraints apply.
|
||
|
|
||
|
However, since this is for mapping legacy MMIO space, WB access
|
||
|
does not make sense. This matters on machines without legacy
|
||
|
VGA support: these machines may have WB memory for the entire
|
||
|
first megabyte (or even the entire first granule).
|
||
|
|
||
|
On these machines, we could mmap legacy_mem as WB, which would
|
||
|
be safe in terms of attribute aliasing, but X has no way of
|
||
|
knowing that it is accessing regular memory, not a frame buffer,
|
||
|
so the kernel should fail the mmap rather than doing it with WB.
|
||
|
|
||
|
read/write of /dev/mem
|
||
|
|
||
|
This uses copy_from_user(), which implicitly uses a kernel
|
||
|
identity mapping. This is obviously safe for things in
|
||
|
kern_memmap.
|
||
|
|
||
|
There may be corner cases of things that are not in kern_memmap,
|
||
|
but could be accessed this way. For example, registers in MMIO
|
||
|
space are not in kern_memmap, but could be accessed with a UC
|
||
|
mapping. This would not cause attribute aliasing. But
|
||
|
registers typically can be accessed only with four-byte or
|
||
|
eight-byte accesses, and the copy_from_user() path doesn't allow
|
||
|
any control over the access size, so this would be dangerous.
|
||
|
|
||
|
ioremap()
|
||
|
|
||
|
This returns a kernel identity mapping for use inside the
|
||
|
kernel.
|
||
|
|
||
|
If the region is in kern_memmap, we should use the attribute
|
||
|
specified there. Otherwise, if the EFI memory map reports that
|
||
|
the entire granule supports WB, we should use that (granules
|
||
|
that are partially reserved or occupied by firmware do not appear
|
||
|
in kern_memmap). Otherwise, we should use a UC mapping.
|
||
|
|
||
|
PAST PROBLEM CASES
|
||
|
|
||
|
mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
|
||
|
|
||
|
The EFI memory map may not report these MMIO regions.
|
||
|
|
||
|
These must be allowed so that X will work. This means that
|
||
|
when the EFI memory map is incomplete, every /dev/mem mmap must
|
||
|
succeed. It may create either WB or UC user mappings, depending
|
||
|
on whether the region is in kern_memmap or the EFI memory map.
|
||
|
|
||
|
mmap of 0x0-0xA0000 /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
|
||
|
|
||
|
See https://bugzilla.novell.com/show_bug.cgi?id=140858.
|
||
|
|
||
|
The EFI memory map reports the following attributes:
|
||
|
0x00000-0x9FFFF WB only
|
||
|
0xA0000-0xBFFFF UC only (VGA frame buffer)
|
||
|
0xC0000-0xFFFFF WB only
|
||
|
|
||
|
This mmap is done with user pages, not kernel identity mappings,
|
||
|
so it is safe to use WB mappings.
|
||
|
|
||
|
The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
|
||
|
which will use a granule-sized UC mapping covering 0-0xFFFFF. This
|
||
|
granule covers some WB-only memory, but since UC is non-speculative,
|
||
|
the processor will never generate an uncacheable reference to the
|
||
|
WB-only areas unless the driver explicitly touches them.
|
||
|
|
||
|
mmap of 0x0-0xFFFFF legacy_mem by "X"
|
||
|
|
||
|
If the EFI memory map reports this entire range as WB, there
|
||
|
is no VGA MMIO hole, and the mmap should fail or be done with
|
||
|
a WB mapping.
|
||
|
|
||
|
There's no easy way for X to determine whether the 0xA0000-0xBFFFF
|
||
|
region is a frame buffer or just memory, so I think it's best to
|
||
|
just fail this mmap request rather than using a WB mapping. As
|
||
|
far as I know, there's no need to map legacy_mem with WB
|
||
|
mappings.
|
||
|
|
||
|
Otherwise, a UC mapping of the entire region is probably safe.
|
||
|
The VGA hole means the region will not be in kern_memmap. The
|
||
|
HP sx1000 chipset doesn't support UC access to the memory surrounding
|
||
|
the VGA hole, but X doesn't need that area anyway and should not
|
||
|
reference it.
|
||
|
|
||
|
mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
|
||
|
|
||
|
The EFI memory map reports the following attributes:
|
||
|
0x00000-0xFFFFF WB only (no VGA MMIO hole)
|
||
|
|
||
|
This is a special case of the previous case, and the mmap should
|
||
|
fail for the same reason as above.
|
||
|
|
||
|
NOTES
|
||
|
|
||
|
[1] SDM rev 2.2, vol 2, sec 4.4.1.
|
||
|
[2] SDM rev 2.2, vol 2, sec 4.4.6.
|