\documentclass{article}
\usepackage{palatino}
\usepackage{graphicx}
\author{Kristian Høgsberg\\
\texttt{krh@bitplanet.net}
}
\title{The Wayland Compositing System}
\begin{document}
\maketitle
\section{Wayland Overview}
\begin{itemize}
\item Wayland is a protocol for a new display server.
\item Weston is the open source project implementing a Wayland-based compositor.
\end{itemize}
\subsection{Replacing X11}
In Linux and other Unix-like systems, the X stack has grown to
encompass functionality arguably belonging in client libraries,
helper libraries, or the host operating system kernel. Support for
things like PCI resource management, display configuration management,
direct rendering, and memory management has been integrated into the X
stack, imposing costs such as limited support for standalone
applications, duplication in other projects (e.g. the Linux fb layer
or the DirectFB project), and high levels of complexity for systems
combining multiple elements (for example radeon memory map handling
between the fb driver and X driver, or VT switching).
Moreover, X has grown to incorporate modern features like offscreen
rendering and scene composition, but subject to the limitations of the
X architecture. For example, the X implementation of composition adds
additional context switches and makes things like input redirection
difficult.
\begin{figure}
\begin{center}
\includegraphics[width=70mm]{x-architecture.png}
\caption{\small \sl X with a compositing manager.\label{fig:X architecture}}
\end{center}
\end{figure}
The diagram above illustrates the central role of the X server and
compositor in operations, and the steps required to get contents on to
the screen.
Over time, X developers came to understand the shortcomings of this
approach and worked to split things up. Over the past several years,
a lot of functionality has moved out of the X server and into
client-side libraries or kernel drivers. One of the first components
to move out was font rendering, with freetype and fontconfig providing
an alternative to the core X fonts. Direct rendering OpenGL as a
graphics driver in a client side library went through some iterations,
ending up as DRI2, which abstracted most of the direct rendering
buffer management from client code. Then cairo came along and provided
a modern 2D rendering library independent of X, and compositing
managers took over control of the rendering of the desktop as toolkits
like GTK+ and Qt moved away from using X APIs for rendering. Recently,
memory and display management have moved to the Linux kernel, further
reducing the scope of X and its driver stack. The end result is a
highly modular graphics stack.
\subsection{Make the compositing manager the display server}
Wayland is a new display server and compositing protocol, and Weston
is the implementation of this protocol which builds on top of all the
components above. We are trying to distill out the functionality in
the X server that is still used by the modern Linux desktop. This
turns out to be not a whole lot. Applications can allocate their own
off-screen buffers and render their window contents directly, using
hardware accelerated libraries like libGL, or high quality software
implementations like those found in Cairo. In the end, what's needed
is a way to present the resulting window surface for display, and a
way to receive and arbitrate input among multiple clients. This is
what Wayland provides, by piecing together the components already in
the ecosystem in a slightly different way.
X will always be relevant, in the same way Fortran compilers and VRML
browsers are, but it's time that we think about moving it out of the
critical path and provide it as an optional component for legacy
applications.
Overall, the philosophy of Wayland is to provide clients with a way to
manage windows and how their contents are displayed. Rendering is left
to clients, and system wide memory management interfaces are used to
pass buffer handles between clients and the compositing manager.
\begin{figure}
\begin{center}
\includegraphics[width=50mm]{wayland-architecture.png}
\caption{\small \sl The Wayland system\label{fig:Wayland architecture}}
\end{center}
\end{figure}
The figure above illustrates how Wayland clients interact with a
Wayland server. Note that window management and composition are
handled entirely in the server, significantly reducing complexity
while marginally improving performance through reduced context
switching. The resulting system is easier to build and extend than a
similar X system, because changes often need to be made in only one
place, or two in the case of protocol extensions (rather than three or
four in the X case, where window management and/or composition handling
may also need to be updated).
\section{Wayland protocol}
\subsection{Basic Principles}
The Wayland protocol is an asynchronous, object-oriented protocol. All
requests are method invocations on some object. Each request includes
an object id that uniquely identifies an object on the server. Each
object implements an interface, and each request includes an opcode that
identifies which method in the interface to invoke.
The server sends events back to the client, and each event is emitted
from an object. Events can be error conditions. Each event includes the
object id and the event opcode, from which the client can determine
the type of event. Events are generated either in response to requests
(in which case the request and the event constitute a round trip) or
spontaneously when the server state changes.
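One way to picture this dispatch is as a table lookup: the object id
selects an object, and the opcode selects a method in that object's
interface. The Python sketch below is purely illustrative. The
\texttt{Surface} class, the \texttt{SURFACE\_INTERFACE} table and the
\texttt{dispatch} function are hypothetical names, and the opcode
numbering is an assumption rather than something taken from the
protocol XML.
\begin{verbatim}
```python
class Surface:
    """Toy server-side object that records the requests it receives."""

    def __init__(self):
        self.log = []

    def destroy(self):
        self.log.append("destroy")

    def damage(self, x, y, width, height):
        self.log.append(("damage", x, y, width, height))


# Hypothetical interface table: the list index is the opcode.
SURFACE_INTERFACE = ["destroy", "damage"]


def dispatch(objects, object_id, opcode, args):
    """Find the target by object id, the method by opcode, and invoke it."""
    obj, interface = objects[object_id]
    getattr(obj, interface[opcode])(*args)
```
\end{verbatim}
A client could decode events with the same kind of lookup, using its
own table of event opcodes per interface.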
\begin{itemize}
\item State is broadcast on connect, and events are sent out when state
changes. Clients must listen for these changes and cache the state.
There is no need (or mechanism) to query server state.
\item The server will broadcast the presence of a number of global objects,
which in turn will broadcast their current state.
\end{itemize}
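Because state only ever arrives as events, a client can answer its own
questions from a local cache instead of asking the server. The
following Python sketch shows the idea under simple assumptions: the
\texttt{StateCache} class and its method names are hypothetical, and
state is modelled as named values per object id.
\begin{verbatim}
```python
class StateCache:
    """Hypothetical client-side cache of server state, keyed by object id."""

    def __init__(self):
        self.state = {}

    def handle_event(self, object_id, name, value):
        # Events are the only source of state: keep the latest value.
        self.state.setdefault(object_id, {})[name] = value

    def lookup(self, object_id, name):
        # Answered locally; no request is sent to the server.
        return self.state[object_id][name]
```
\end{verbatim}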
\subsection{Code generation}
The interfaces, requests and events are defined in
\texttt{protocol/wayland.xml}. This XML is used to generate the function
prototypes that can be used by clients and compositors.
The protocol entry points are generated as inline functions which just
wrap the \verb:wl_proxy_*: functions. The inline functions aren't
part of the library ABI, and language bindings should generate their
own stubs for the protocol entry points from the XML.
\subsection{Wire format}
The protocol is sent over a UNIX domain stream socket. Currently, the
endpoint is named \texttt{\textbackslash0wayland}, but it is subject
to change. The protocol is message-based. A message sent by a client
to the server is called a \texttt{request}. A message from the server
to a client is called an \texttt{event}. Every message is structured as
32-bit words, and values are represented in the host's byte order.
The message header consists of two words:
\begin{itemize}
\item The first word is the sender's object id (32 bits).
\item The second word has two 16-bit parts. The upper 16 bits are the
message size in bytes, starting at the header (i.e. it has a minimum
value of 8). The lower 16 bits are the request/event opcode.
\end{itemize}
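As a rough illustration of the header layout just described, the
following Python sketch packs and unpacks the two header words. The
function names \texttt{pack\_header} and \texttt{unpack\_header} are
hypothetical and not part of any Wayland library. Host byte order is
selected with the \texttt{=} prefix of Python's \texttt{struct} module.
\begin{verbatim}
```python
import struct

HEADER_SIZE = 8  # two 32-bit words


def pack_header(object_id, opcode, payload_size):
    """Build the header: object id, then (size << 16) | opcode."""
    size = HEADER_SIZE + payload_size
    if size > 0xFFFF or not 0 <= opcode <= 0xFFFF:
        raise ValueError("size and opcode must each fit in 16 bits")
    if payload_size % 4:
        raise ValueError("payload must be a whole number of 32-bit words")
    # "=II": host byte order, two unsigned 32-bit words.
    return struct.pack("=II", object_id, (size << 16) | opcode)


def unpack_header(data):
    """Return (object_id, opcode, size) from the first 8 bytes."""
    object_id, word = struct.unpack_from("=II", data)
    return object_id, word & 0xFFFF, word >> 16
```
\end{verbatim}
The size returned by \texttt{unpack\_header} counts the header itself,
so a receiver would read that many bytes minus 8 to obtain the payload.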
The payload describes the request/event arguments. Every argument is
aligned to 32 bits. There is no prefix that describes the type;
it is inferred from the XML specification.
Argument types are represented as follows:
\begin{itemize}
\item \texttt{int} or \texttt{uint}: The value is the 32-bit value of the
signed/unsigned int.
\item \texttt{string}: Starts with an unsigned 32-bit length, followed by the
string contents, including the terminating NUL byte, then padding to a
32-bit boundary.
\item \texttt{object}: A 32-bit object ID.
\item \texttt{new\_id}: The 32-bit object ID. On requests, the client
decides the ID. The only events with \texttt{new\_id} are advertisements of
globals, and the server will use IDs below 0x10000.
\item \texttt{array}: Starts with a 32-bit array size in bytes, followed by
the array contents verbatim, and finally padding to a 32-bit boundary.
\item \texttt{fd}: The file descriptor is not stored in the message buffer, but in
the ancillary data of the UNIX domain socket message (\texttt{msg\_control}).
\end{itemize}
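The padding rules for the in-band argument types can be sketched in a
few lines of Python. This is an illustration only: the helper names are
hypothetical, the string is assumed to be UTF-8, and the length word of
a string is assumed to count the terminating NUL byte, which the text
above does not state explicitly.
\begin{verbatim}
```python
import struct


def pad4(data):
    """Pad a byte string with zero bytes to the next 32-bit boundary."""
    return data + b"\0" * (-len(data) % 4)


def encode_uint(value):
    return struct.pack("=I", value)


def encode_int(value):
    return struct.pack("=i", value)


def encode_string(text):
    """Length word (assumed to count the NUL), contents, NUL, padding."""
    raw = text.encode("utf-8") + b"\0"
    return struct.pack("=I", len(raw)) + pad4(raw)


def encode_array(raw):
    """Size in bytes, contents verbatim, then padding."""
    return struct.pack("=I", len(raw)) + pad4(raw)
```
\end{verbatim}
File descriptors are left out here because they travel in the
socket's ancillary data rather than in the message buffer.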
\subsection{Interfaces}
The protocol includes several interfaces which are used for
interacting with the server. Each interface provides requests,
events, and errors (which are really just special events) as described
above. Specific compositor implementations may have their own
interfaces provided as extensions, but there are several which are
always expected to be present.
Core interfaces:
\begin{itemize}
\item \texttt{wl\_display}: provides global functionality like object binding and fatal error events
\item \texttt{wl\_callback}: callback interface for done events
\item \texttt{wl\_compositor}: core compositor interface, allows surface creation
\item \texttt{wl\_shm}: buffer management interface with buffer creation and format handling
\item \texttt{wl\_buffer}: buffer handling interface for indicating damage and object destruction, also provides buffer release events from the server
\item \texttt{wl\_data\_offer}: for accepting and receiving specific mime types
\item \texttt{wl\_data\_source}: for offering specific mime types
\item \texttt{wl\_data\_device}: lets clients manage drag \& drop, provides pointer enter/leave events and motion
\item \texttt{wl\_data\_device\_manager}: for managing data sources and devices
\item \texttt{wl\_shell}: shell surface handling
\item \texttt{wl\_shell\_surface}: shell surface handling and desktop-like events (e.g. set a surface to fullscreen, display a popup, etc.)
\item \texttt{wl\_surface}: surface management (destruction, damage, buffer attach, frame handling)
\item \texttt{wl\_input\_device}: cursor setting, motion, button, and key events, etc.
\item \texttt{wl\_output}: events describing an attached output (subpixel orientation, current mode \& geometry, etc.)
\end{itemize}
\subsection{Connect Time}
\begin{itemize}
\item no fixed format connect block, the server emits a bunch of
events at connect time
\item presence events for global objects: output, compositor, input
devices
\end{itemize}
\subsection{Security and Authentication}
\begin{itemize}
\item mostly about access to underlying buffers, need new drm auth
mechanism (the grant-to ioctl idea), need to check the cmd stream?
\item getting the server socket depends on the compositor type: it could
be a system wide name, it could be obtained through fd passing on the
session dbus, or the client is forked by the compositor and the fd is
already opened.
\end{itemize}
\subsection{Creating Objects}
\begin{itemize}
\item client allocates object ID, uses range protocol
\item server tracks how many IDs are left in current range, sends new
range when client is about to run out.
\end{itemize}
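The range scheme in the list above can be sketched from the client's
point of view as follows. The \texttt{IdAllocator} class is
hypothetical, and the form in which the server grants a range (a base
id and a count) is an assumption, since the notes do not define it.
\begin{verbatim}
```python
class IdAllocator:
    """Client-side allocator drawing ids from server-granted ranges."""

    def __init__(self):
        self.ranges = []  # [next_id, end) pairs, oldest first

    def add_range(self, base, count):
        # Called when the server grants a new range (assumed shape).
        self.ranges.append([base, base + count])

    def remaining(self):
        return sum(end - nxt for nxt, end in self.ranges)

    def allocate(self):
        # Drop ranges that have been used up.
        while self.ranges and self.ranges[0][0] == self.ranges[0][1]:
            self.ranges.pop(0)
        if not self.ranges:
            raise RuntimeError("no ids left; waiting for a new range")
        new_id = self.ranges[0][0]
        self.ranges[0][0] += 1
        return new_id
```
\end{verbatim}
A server tracking the same counts could grant the next range once
\texttt{remaining()} would drop below some threshold of its choosing.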
\subsection{Compositor}
The compositor is a global object, advertised at connect time.
\begin{tabular}{l}
\hline
Interface \texttt{compositor} \\ \hline
Requests \\ \hline
\texttt{create\_surface(id)} \\
\texttt{commit()} \\ \hline
Events \\ \hline
\texttt{device(device)} \\
\texttt{acknowledge(key, frame)} \\
\texttt{frame(frame, time)} \\ \hline
\end{tabular}
\begin{itemize}
\item a global object
\item broadcasts drm file name, or at least a string like drm:/dev/dri/card0
\item commit/ack/frame protocol
\end{itemize}
\subsection{Surface}
Created by the client.
\begin{tabular}{l}
\hline
Interface \texttt{surface} \\ \hline
Requests \\ \hline
\texttt{destroy()} \\
\texttt{attach()} \\
\texttt{map()} \\
\texttt{damage()} \\ \hline
Events \\ \hline
no events \\ \hline
\end{tabular}
Needs a way to set input region, opaque region.
\subsection{Input}
Represents a group of input devices, including mice, keyboards. Has a
keyboard and pointer focus. Global object. Pointer events are
delivered in both screen coordinates and surface local coordinates.
\begin{tabular}{l}
\hline
Interface \texttt{input\_device} \\ \hline
Requests \\ \hline
\texttt{attach(buffer, x, y)} \\ \hline
Events \\ \hline
\texttt{motion(x, y, sx, sy)} \\
\texttt{button(button, state, x, y, sx, sy)} \\
\texttt{key(key, state)} \\
\texttt{pointer\_focus(surface)} \\
\texttt{keyboard\_focus(surface, keys)} \\ \hline
\end{tabular}
Talk about:
\begin{itemize}
\item keyboard map, change events
\item xkb on wayland
\item multi pointer wayland
\end{itemize}
A surface can change the pointer image when the surface is the pointer
focus of the input device. Wayland doesn't automatically change the
pointer image when a pointer enters a surface, but expects the
application to set the cursor it wants in response to the pointer
focus and motion events. The rationale is that a client has to manage
changing pointer images for UI elements within the surface in response
to motion events anyway, so we'll make that the only mechanism for
changing the pointer image. If the server receives a request
to set the pointer image after the surface loses pointer focus, the
request is ignored. To the client this will look like it successfully
set the pointer image.
The compositor will revert the pointer image back to a default image
when no surface has the pointer focus for that device. Clients can
revert the pointer image back to the default image by setting a NULL
image.
What if the pointer moves from one surface which has set a special
pointer image to a surface that doesn't set an image in response to
the motion event? The new surface will be stuck with the special
pointer image. We can't just revert the pointer image on leaving a
surface, since if we immediately enter a surface that sets a different
image, the image will flicker. Broken app, I suppose.
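The pointer image rules above, including the ``stuck image'' case, can
be modelled with a small amount of server-side state. The sketch below
is only a model of the behaviour described in the text: the
\texttt{PointerState} class is hypothetical, surfaces and images are
represented by plain strings, and \texttt{None} stands in for the NULL
image.
\begin{verbatim}
```python
DEFAULT_IMAGE = "default"


class PointerState:
    """Toy model of the pointer image for one input device."""

    def __init__(self):
        self.focus = None
        self.image = DEFAULT_IMAGE

    def set_focus(self, surface):
        self.focus = surface
        if surface is None:
            # No surface has pointer focus: revert to the default.
            self.image = DEFAULT_IMAGE
        # Otherwise the image is kept, to avoid flicker between surfaces.

    def set_image(self, surface, image):
        if surface is None or surface != self.focus:
            return  # silently ignored; the client sees no error
        self.image = DEFAULT_IMAGE if image is None else image
```
\end{verbatim}
In this model, moving focus from a surface that set an image to one
that never calls \texttt{set\_image} leaves the old image in place,
which is the situation the preceding paragraph describes.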
\subsection{Output}
An output is a global object, advertised at connect time or as outputs
come and go.
\begin{tabular}{l}
\hline
Interface \texttt{output} \\ \hline
Requests \\ \hline
no requests \\ \hline
Events \\ \hline
\texttt{geometry(width, height)} \\ \hline
\end{tabular}
\begin{itemize}
\item laid out in a big (compositor) coordinate system
\item basically xrandr over wayland
\item geometry needs position in compositor coordinate system
\item events to advertise available modes, requests to move and change
modes
\end{itemize}
\subsection{Shared object cache}
Cache for sharing glyphs, icons, cursors across clients. Lets clients
share identical objects. The cache is a global object, advertised at
connect time.
\begin{tabular}{l}
\hline
Interface \texttt{cache} \\ \hline
Requests \\ \hline
\texttt{upload(key, visual, bo, stride, width, height)} \\ \hline
Events \\ \hline
\texttt{item(key, bo, x, y, stride)} \\
\texttt{retire(bo)} \\ \hline
\end{tabular}
\begin{itemize}
\item Upload by passing a visual, bo, stride, width, height to the
cache.
\item Upload returns a bo name, stride, and x, y location of object in
the buffer. Clients take a reference on the atlas bo.
\item Shared objects are refcounted, freed by client (when purging
glyphs from the local cache) or when a client exits.
\item Server can't delete individual items from an atlas, but it can
throw out an entire atlas bo if it becomes too sparse. The server
sends out a \texttt{retire} event when this happens, and clients
must throw away any objects from that bo and reupload. Between the
server dropping the atlas and the client receiving the retire event,
clients can still legally use the old atlas since they have a ref on
the bo.
\item cairo needs to hook into the glyph cache, and maybe also a way
to create a read-only surface based on an object from the cache
(icons).
\texttt{cairo\_wayland\_create\_cached\_surface(surface-data)}.
\end{itemize}
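On the client side, handling the \texttt{item} and \texttt{retire}
events amounts to keeping a map from keys to atlas locations and
dropping every entry that lives in a retired bo. The Python sketch
below illustrates this. The \texttt{GlyphCacheClient} class and its
method names are hypothetical, and bo handles are represented by plain
integers.
\begin{verbatim}
```python
class GlyphCacheClient:
    """Toy client-side view of the shared object cache."""

    def __init__(self):
        self.items = {}  # key -> (bo, x, y, stride)

    def on_item(self, key, bo, x, y, stride):
        # The server reported where the uploaded object lives.
        self.items[key] = (bo, x, y, stride)

    def on_retire(self, bo):
        """Drop all items in the retired bo; return keys to reupload."""
        stale = [k for k, item in self.items.items() if item[0] == bo]
        for key in stale:
            del self.items[key]
        return stale
```
\end{verbatim}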
\subsection{Drag and Drop}
Multi-device aware. Orthogonal to rest of wayland, as it is its own
toplevel object. Since the compositor determines the drag target, it
works with transformed surfaces (dragging to a scaled down window in
expose mode, for example).
Issues:
\begin{itemize}
\item we can set the cursor image to the current cursor + dragged
object, which will last as long as the drag, but maybe a request to
attach an image to the cursor will be more convenient?
\item Should drag.send() destroy the object? There's nothing to do
after the data has been transferred.
\item How do we marshal several mime-types? We could make the drag
setup a multi-step operation: dnd.create, drag.offer(mime-type1),
drag.offer(mime-type2), drag.activate(). The drag object could send
multiple offer events on each motion event. Or we could just
implement an array type, but that's a pain to work with.
\item Middle-click drag to pop up menu? Ctrl/Shift/Alt drag?
\item Send a file descriptor over the protocol to let initiator and
source exchange data out of band?
\item Action? Specify action when creating the drag object? Ask
action?
\end{itemize}
New objects, requests and events:
\begin{itemize}
\item New toplevel dnd global. One method, creates a drag object:
\texttt{dnd.start(new object id, surface, input device, mime
types)}. Starts drag for the device, if it's grabbed by the
surface. The drag ends when the button is released. The caller is
responsible for destroying the drag object.
\item Drag object methods:
\texttt{drag.destroy(id)}, destroy drag object.
\texttt{drag.send(id, data)}, send drag data.
\texttt{drag.accept(id, mime type)}, accept drag offer, called by
target surface.
\item Drag object events:
\texttt{drag.offer(id, mime-types)}, sent to potential destination
surfaces to offer drag data. If the device leaves the window or the
originator cancels the drag, this event is sent with mime-types =
NULL.
\texttt{drag.target(id, mime-type)}, sent to drag originator when a
target surface has accepted the offer. If a previous target goes
away, this event is sent with mime-type = NULL.
\texttt{drag.data(id, data)}, sent to target, contains dragged data.
Ends the transaction on the target side.
\end{itemize}
Sequence of events:
\begin{itemize}
\item The initiator surface receives a click (which grabs the input
device to that surface) and then enough motion to decide that a drag
is starting. Wayland has no subwindows, so it's entirely up to the
application to decide whether or not a draggable object within the
surface was clicked.
\item The initiator creates a drag object by calling the
\texttt{create\_drag} method on the dnd global object. As for any
client created object, the client allocates the id. The
\texttt{create\_drag} method also takes the originating surface, the
device that's dragging and the mime-types supported. If the surface
has indeed grabbed the device passed in, the server will create an
active drag object for the device. If the grab was released in the
meantime, the drag object will be inactive, that is, the same state
as when the grab is released. In that case, the client will receive
a button up event, which will let it know that the drag finished.
To the client it will look like the drag was immediately cancelled
by the grab ending.
The special mime-type application/x-root-target indicates that the
initiator is looking for drag events to the root window as well.
\item To indicate the object being dragged, the initiator can replace
the pointer image with a larger image representing the data being
dragged with the cursor image overlaid. The pointer image will
remain in place as long as the grab is in effect, since the
initiating surface keeps pointer focus, and no other surface
receives enter events.
\item As long as the grab is active (or until the initiator cancels
the drag by destroying the drag object), the drag object will send
\texttt{offer} events to surfaces it moves across. As for motion
events, these events contain the surface local coordinates of the
device as well as the list of mime-types offered. When a device
leaves a surface, it will send an \texttt{offer} event with an empty
list of mime-types to indicate that the device left the surface.
\item If a surface receives an offer event and decides that it's in an
area that can accept a drag event, it should call the
\texttt{accept} method on the drag object in the event. The surface
passes a mime-type in the request, picked from the list in the offer
event, to indicate which of the types it wants. At this point, the
surface can update the appearance of the drop target to give
feedback to the user that the drag has a valid target. If the
\texttt{offer} event moves to a different drop target (the surface
decides the offer coordinates are outside the drop target) or leaves
the surface (the offer event has an empty list of mime-types) it
should revert the appearance of the drop target to the inactive
state. A surface can also decide to retract its drop target (if the
drop target disappears or moves, for example), by calling the accept
method with a NULL mime-type.
\item When a target surface sends an \texttt{accept} request, the drag
object will send a \texttt{target} event to the initiator surface.
This tells the initiator that the drag currently has a potential
target and which of the offered mime-types the target wants. The
initiator can change the pointer image or drag source appearance to
reflect this new state. If the target surface retracts its drop
target or if the surface disappears, a \texttt{target} event with a
NULL mime-type will be sent.
If the initiator listed application/x-root-target as a valid
mime-type, dragging into the root window will make the drag object
send a \texttt{target} event with the application/x-root-target
mime-type.
\item When the grab is released (indicated by the button release
event), if the drag has an active target, the initiator calls the
\texttt{send} method on the drag object to send the data to be
transferred by the drag operation, in the format requested by the
target. The initiator can then destroy the drag object by calling
the \texttt{destroy} method.
\item The drop target receives a \texttt{data} event from the drag
object with the requested data.
\end{itemize}
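The sequence above can be condensed into a toy model of the drag
object that records which events it would emit. This Python sketch is
an illustration, not a proposed implementation: the \texttt{Drag}
class is hypothetical, surfaces are plain strings, and clearing the
target when the device leaves the accepting surface is an assumption
about what ``goes away'' covers.
\begin{verbatim}
```python
class Drag:
    """Toy drag object; events are logged as (recipient, name, payload)."""

    def __init__(self, mime_types):
        self.mime_types = list(mime_types)
        self.target = None  # (surface, mime_type) or None
        self.events = []

    def enter(self, surface):
        self.events.append((surface, "offer", self.mime_types))

    def leave(self, surface):
        # An empty type list tells the surface that the device left.
        self.events.append((surface, "offer", []))
        if self.target and self.target[0] == surface:
            # Assumption: leaving counts as the target going away.
            self.target = None
            self.events.append(("initiator", "target", None))

    def accept(self, surface, mime_type):
        if mime_type is not None and mime_type not in self.mime_types:
            raise ValueError("type was not offered")
        self.target = (surface, mime_type) if mime_type else None
        self.events.append(("initiator", "target", mime_type))

    def send(self, data):
        # Only deliver data if a target has accepted the offer.
        if self.target:
            self.events.append((self.target[0], "data", data))
```
\end{verbatim}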
MIME is defined in RFCs 2045--2049. A registry of MIME types is
maintained by the Internet Assigned Numbers Authority (IANA).
\texttt{ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/}
\section{Types of compositors}
\subsection{System Compositor}
\begin{itemize}
\item ties in with graphical boot
\item hosts different types of session compositors
\item lets us switch between multiple sessions (fast user switching,
secure/personal desktop switching)
\item multiseat
\item linux implementation using libudev, egl, kms, evdev, cairo
\item for fullscreen clients, the system compositor can reprogram the
video scanout address to source from the client provided buffer.
\end{itemize}
\subsection{Session Compositor}
\begin{itemize}
\item nested under the system compositor; nesting is feasible because
the protocol is async, whereas roundtrips would break nesting
\item gnome-shell
\item moblin
\item compiz?
\item kde compositor?
\item text mode using vte
\item rdp session
\item fullscreen X session under wayland
\item can run without system compositor, on the hw where it makes
sense
\item root window less X server, bridging X windows into a wayland
session compositor
\end{itemize}
\subsection{Embedding Compositor}
X11 lets clients embed windows from other clients, or lets clients copy
pixmap contents rendered by another client into their window. This is
often used for applets in a panel, browser plugins and similar.
Wayland doesn't directly allow this, but clients can communicate GEM
buffer names out-of-band, for example, using d-bus or as command line
arguments when the panel launches the applet. Another option is to
use a nested wayland instance. For this, the wayland server will have
to be a library that the host application links to. The host
application will then pass the wayland server socket name to the
embedded application, and will need to implement the wayland
compositor interface. The host application composites the client
surfaces as part of its window, that is, in the web page or in the
panel. The benefit of nesting the wayland server is that it provides
the requests the embedded client needs to inform the host about buffer
updates and a mechanism for forwarding input events from the host
application.
\begin{itemize}
\item firefox embedding flash by being a special purpose compositor to
the plugin
\end{itemize}
\section{Implementation}
what's currently implemented
\subsection{Wayland Server Library}
\texttt{libwayland-server.so}
\begin{itemize}
\item implements protocol side of a compositor
\item minimal, doesn't include any rendering or input device handling
\item helpers for running on egl and evdev, and for nested wayland
\end{itemize}
\subsection{Wayland Client Library}
\texttt{libwayland.so}
\begin{itemize}
\item minimal, designed to support integration with real toolkits such as
Qt, GTK+ or Clutter.
\item doesn't cache state, but lets the toolkits cache server state in
native objects (GObject or QObject or whatever).
\end{itemize}
\subsection{Wayland System Compositor}
\begin{itemize}
\item implementation of the system compositor
\item uses libudev, eagle (egl), evdev and drm
\item integrates with ConsoleKit, can create new sessions
\item allows multi seat setups
\item configurable through udev rules and maybe /etc/wayland.d type thing
\end{itemize}
\subsection{X Server Session}
\begin{itemize}
\item xserver module and driver support
\item uses wayland client library
\item same X.org server as we normally run, the front buffer is a wayland
surface but all accel code, 3d and extensions are there
\item when full screen the session compositor will scan out from the X
server wayland surface, at which point X is running pretty much as it
does natively.
\end{itemize}
\end{document}