Understanding Hardware Acceleration in Mobile Platform Browsers

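The article "Understanding Hardware Acceleration on Mobile Browsers" describes the current state of 2D drawing acceleration in mobile browsers and some related concepts, most of which also apply to GUI frameworks themselves.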

  • Accelerating Primitive Drawing

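On iOS, the 2D drawing engine CoreGraphics has always drawn basic primitives on the GPU, whereas Skia, the drawing engine used by Android, drew them purely on the CPU before Android 3.0. The Android team reportedly made this choice for two reasons: (1) Skia is very efficient, so CPU drawing performed roughly on par with GPU drawing (mobile GPUs of the time were probably fairly weak); (2) unlike iOS, Android is not tied to specific hardware, so it cannot depend directly on a proprietary hardware implementation. Since Android 3.0, however, Skia can use OpenGL ES as a backend and performs most primitive drawing on the GPU. After all, (1) GPU hardware has advanced rapidly; (2) drawing on the GPU uses less power than drawing on the CPU; and (3) the GPU and the CPU can draw in parallel (via RenderScript).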

  • (Tiled) Backing Store

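This can be seen as a special case of the Layer described below, aimed mainly at drawing large views such as web pages. It combines tiled off-screen bitmaps with background drawing on a separate thread: while the user scrolls or zooms the page, the existing off-screen tiles are used to respond to the gesture immediately, and the newly required tiles are drawn in the background. If the user moves too fast for the background updates to keep up, blank regions that have not yet been updated can become visible. In Safari on iOS, for example, quickly scrolling a very large page reveals blank areas further down.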

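CoreGraphics already provides an API that supports a tiled backing store, and Safari uses this technique. Android does not appear to offer such an API, and Chrome Lite does not use the technique either.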

  • Layer and Compositing

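The backing store concept above can be extended to every node of a render tree (a web page) or a UI component tree (a window system). A Layer can be seen as the off-screen buffer of such a node. With a Layer, the actual drawing of that node, or of the whole branch below it, can be skipped in most cases, because the existing Layer only needs to be composited. This helps most when a geometric transform animation (translation, scaling, rotation, and so on) is started on the node and its branch. Because Layers consume memory, enabling one for every node is not feasible. In practice a node's Layer is enabled only when needed (when an animation starts) and disabled when it is no longer needed (when the animation ends).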

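The iOS GUI framework fully supports Layers, and its animation module, CoreAnimation, makes full use of them to reach high animation frame rates. Android only gained comparable support in 3.0.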

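A few conclusions follow from the above:

  1. 2D drawing on Android still lags behind iOS, and the CPU-only implementation before 3.0 also caused performance and power consumption problems.
  2. Android 3.0 is a huge leap forward in 2D drawing; it is gradually catching up with iOS and even overtaking it in some areas.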

 

Understanding Hardware Acceleration on Mobile Browsers

There have been many mentions of GPU (graphics processing unit) hardware acceleration in smartphone and tablet web browsers. So far, the coverage has been pretty general and hasn't provided much technical direction apart from simple advice such as "use CSS translate3d". This blog article tries to shed some more light on browser interactions with the GPU and explain what happens behind the scenes.

Accelerating Primitive Drawing

A web rendering engine, such as WebKit, takes a web page that is described structurally using HTML and a DOM and visually using CSS, transforms it into a series of painting commands, and then passes these commands to the graphics stack. WebKit specifically talks to an abstract interface called GraphicsContext. There are different implementations of GraphicsContext depending on the underlying platform. For example, on iOS the GraphicsContext is bound to CoreGraphics. On Android, GraphicsContext uses the Skia graphics engine.
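The split between the engine and the graphics stack can be illustrated with a minimal sketch. This is not WebKit's actual interface: the method names `fill_rect` and `draw_text`, the `RecordingContext` class, and the `paint_page` function are all hypothetical, chosen only to show one set of painting commands flowing into interchangeable backends.

```python
from abc import ABC, abstractmethod


class GraphicsContext(ABC):
    """Abstract drawing interface the engine paints into (method names are illustrative)."""

    @abstractmethod
    def fill_rect(self, x, y, width, height, color):
        ...

    @abstractmethod
    def draw_text(self, x, y, text):
        ...


class RecordingContext(GraphicsContext):
    """Stand-in for a platform backend; it only records the commands it receives."""

    def __init__(self, backend):
        self.backend = backend
        self.commands = []

    def fill_rect(self, x, y, width, height, color):
        self.commands.append(("fill_rect", x, y, width, height, color))

    def draw_text(self, x, y, text):
        self.commands.append(("draw_text", x, y, text))


def paint_page(context):
    """The engine issues the same painting commands whatever the backend is."""
    context.fill_rect(0, 0, 320, 40, "#336699")
    context.draw_text(10, 25, "Hello")


# Two hypothetical backends receive an identical command stream.
ios_like = RecordingContext("CoreGraphics-like")
android_like = RecordingContext("Skia-like")
for ctx in (ios_like, android_like):
    paint_page(ctx)
```

The point of the design is that the painting code never needs to know which graphics engine sits underneath; only the binding behind the interface changes per platform.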

A major responsibility of the graphics stack is rasterization: converting vector painting commands into color pixels on a screen. Rasterization also applies to text display. A single letter can consist of a chain of hundreds of curves. Rasterization produces a matrix of pixels of varying colors that gives users the impression of smoothly drawn text. The following picture shows the enlarged portion of a letter displayed on the screen:
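To make the idea of rasterization concrete, here is a small sketch that converts a vector shape (a disc, standing in for a glyph outline) into per-pixel coverage values by supersampling. The function name, grid size, and sample count are assumptions for illustration; real rasterizers use far more sophisticated coverage algorithms.

```python
def rasterize_disc(size, cx, cy, radius, samples=4):
    """Return a size x size grid of coverage values in [0.0, 1.0]."""
    grid = []
    for py in range(size):
        row = []
        for px in range(size):
            inside = 0
            for sy in range(samples):
                for sx in range(samples):
                    # Sample point at the centre of each sub-pixel cell.
                    x = px + (sx + 0.5) / samples
                    y = py + (sy + 0.5) / samples
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                        inside += 1
            # Fraction of sub-samples covered becomes the pixel intensity.
            row.append(inside / (samples * samples))
        grid.append(row)
    return grid


coverage = rasterize_disc(size=8, cx=4.0, cy=4.0, radius=3.0)
```

Pixels fully inside the shape get coverage 1.0, pixels fully outside get 0.0, and pixels on the edge get intermediate values. Those intermediate shades are what make the enlarged letter edge look smooth rather than jagged.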

Graphic showing letter 'A' and the zoomed in pixels thereof

The most common mobile graphics API is OpenGL for Embedded Systems, shortened to OpenGL ES, which operates quite similarly to its desktop OpenGL counterpart. A modern GPU has the power to carry out a lot of primitive drawing, from textured triangles to anti-aliased polygons, with massively parallel implementations of various graphics algorithms. This is evidenced by the many graphics-intensive games that run smoothly, often achieving the ultimate goal of 60 fps even on highly complex scenes.

If you're building a browser, it makes sense to reduce the burden on the CPU and to delegate most of the primitive drawing (such as images, curves, gradients, and so on) to the GPU. This is one way that the GPU accelerates performance, and it is very often taken care of automatically by the graphics stack. On iOS, CoreGraphics uses many different GPU features for difficult drawing operations, building on its Mac OS X experience. On Android (since Honeycomb), Skia also has a full-featured OpenGL ES back-end which fits nicely with NVIDIA Tegra2 GPUs.

Backing Store

It is important to note here that GPUs were originally designed to tackle heavy-duty operations needed in engineering applications (CAD/CAM) and graphics-intensive games. But optimizing the primitive drawing typically found in a web page is very different than making game graphics fast. For a start, most web pages consist of a lot of text and an occasional image. Most web page user-interface elements have solid colors, with some gradients and rounded corners here and there. In contrast, top-selling games like Angry Birds, Need for Speed, and Quake hardly contain any text and almost everything in the game world is an object with a texture. In addition, 3-D models with photo-realistic appearances are pretty common in such games.

Since the GPU is optimized for these complex use-cases, it should not come as a surprise that simply asking a GPU to draw images, curves, text glyphs, and other content does not magically translate into a fluid 60 frames/second for web page rendering. In addition, unlike games, web content can't be predicted by the browser. A web page can be as simple as a Bing search page or as complicated as the New York Times front page. To achieve a really smooth browsing experience, user interactions with the browser should not be limited by the complexity of the page. In other words, even if the browser is busy loading images and rendering the page, the user should still be able to scroll around and zoom in/out as she wants.

Modern mobile browsers adopt an off-screen buffer approach to decouple the complexity of displaying a web page from the user interaction. Usually the web rendering engine (WebKit, for example) draws into the buffer instead of straight to the display. This buffer, often called the backing store, is shown on screen based on user activity. When the web page is quite complicated and the user scrolls and zooms quickly, the backing store is often not filled fast enough. This is why the checkerboard pattern is visible on an iPhone or iPad; it serves as a placeholder for the region of the buffer that is not fully rendered yet. This way, the web page can be scrolled around or zoomed in/out as fast as the user wants. The rendering process (which fills the backing store) may lag behind user interactions, but since it runs in a separate thread, it does not block any user interactions occurring in the main UI thread.
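The decoupling described here can be sketched as a small deterministic simulation. It assumes a tiled buffer and models the render thread as a per-frame budget instead of a real thread; the tile size, the `BackingStore` class, and its method names are hypothetical and not taken from any browser.

```python
TILE = 256  # hypothetical tile edge length in pixels


def visible_tiles(view_x, view_y, view_w, view_h, tile=TILE):
    """Return the (column, row) indices of every tile the viewport touches."""
    first_col, first_row = view_x // tile, view_y // tile
    last_col = (view_x + view_w - 1) // tile
    last_row = (view_y + view_h - 1) // tile
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


class BackingStore:
    def __init__(self):
        self.rendered = set()  # tiles whose pixels are up to date

    def render_some(self, wanted, budget):
        """Simulate the render thread: fill at most `budget` missing tiles."""
        missing = [t for t in wanted if t not in self.rendered]
        for t in missing[:budget]:
            self.rendered.add(t)

    def present(self, wanted):
        """Simulate the UI thread: never waits, shows a placeholder if needed."""
        return {t: ("content" if t in self.rendered else "checkerboard")
                for t in wanted}


store = BackingStore()

# Page at rest: the renderer has time to fill all four visible tiles.
at_rest = visible_tiles(0, 0, 320, 480)
store.render_some(at_rest, budget=len(at_rest))

# Fast scroll: six new tiles are needed but only two get rendered in time.
after_scroll = visible_tiles(0, 600, 320, 480)
store.render_some(after_scroll, budget=2)
frame = store.present(after_scroll)
placeholders = sum(1 for state in frame.values() if state == "checkerboard")
```

In this run the frame after the fast scroll is presented immediately with two rendered tiles and four placeholders, which mirrors the checkerboard effect: the UI side never blocks on rendering.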

Another side effect of using a backing store is progressive rendering when the user zooms in and out. A backing store is nothing but a rectangle with a texture. For efficiency, the backing store is usually tiled, i.e. it comprises several small textured rectangles instead of a giant one. During pinching, all the browser does is scale the backing store up and down, thus giving an enlarged but blurry version of the web page. Since pinching typically happens within a few hundred milliseconds, there is little point in a faithful high-resolution rendering during the gesture. Once the user is done with pinching, or when there is an idle moment, the backing store is updated with a rendering of the web page at the correct resolution.
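A minimal sketch of this zoom behaviour follows, assuming the store tracks two scales: the one its tiles were rasterized at and the one currently applied on screen. The `ZoomableStore` class and its members are hypothetical names used only for illustration.

```python
class ZoomableStore:
    def __init__(self):
        self.rendered_scale = 1.0  # scale the tiles were rasterized at
        self.display_scale = 1.0   # scale applied when drawing the tiles
        self.repaints = 0          # how many full re-rasterizations happened

    def pinch(self, scale):
        """During the gesture only the transform changes, which is cheap."""
        self.display_scale = scale

    def pinch_end(self):
        """After the gesture, re-rasterize once at the final scale."""
        if self.rendered_scale != self.display_scale:
            self.rendered_scale = self.display_scale
            self.repaints += 1

    def blur_factor(self):
        """How far the old texture is stretched; 1.0 means pixel-accurate."""
        return self.display_scale / self.rendered_scale


store = ZoomableStore()
for scale in (1.2, 1.5, 2.0):      # intermediate steps of one pinch gesture
    store.pinch(scale)
blur_during = store.blur_factor()  # texture stretched 2x, so it looks blurry
repaints_during = store.repaints   # still 0: nothing was re-rendered
store.pinch_end()
```

However many intermediate scale steps the gesture produces, the page is rasterized again only once, when the gesture ends.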

Graphic showing blurry SVG vs sharp SVG, side by side

One of the disadvantages of using a backing store per page (regardless of whether it is tiled or not) is that it causes difficulty in implementing support for overflow:scroll and position:fixed. The main reason is that the panning and zooming actions from the user modify only the transformation matrix of the backing store, but do not update the backing store. For these two CSS features to work, the handling of the backing store has to be improved to account for content movement within the display.

Layer and Compositing

For web applications which have more dynamic content, including for example CSS animations, having a static off-screen buffer does not really help. However, the same backing store concept can be extended further. Instead of one giant backing store for the entire page, we can have multiple smaller backing stores, each associated with an animated element.

Take for example the famous falling leaves demo from WebKit. This demo really shows how creating backing stores at a more granular level can improve the frame rate. Rather than drawing the leaves (with different rotation and position) for each animation step, WebKit creates a small layer for each leaf, sends those layers to the GPU once, and performs the animation by varying the transformation matrix and opacity of every layer (and thus of the corresponding leaf). Effectively, this creates a really smooth animation because (1) the CPU does not need to do anything besides the initial animation setup and (2) the GPU is only responsible for compositing different layers during the entire animation process. As evidenced by the 60 fps performance of many graphics-intensive mobile games, compositing such a rather simple collection of layers is a piece of cake for a modern GPU.
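The following sketch illustrates the principle on the CPU, under simplifying assumptions: a layer is painted once into a small texture, and each frame changes only its affine matrix and opacity before blending. The `Layer` class and helper names are hypothetical; the matrix uses the same six-value (a, b, c, d, tx, ty) layout as the CSS `matrix()` function, and blending uses the standard source-over formula.

```python
import math


class Layer:
    def __init__(self, name, paint):
        self.name = name
        self.texture = paint()  # expensive rasterization, done exactly once
        self.matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # a, b, c, d, tx, ty
        self.opacity = 1.0


def rotation_translation(angle_deg, tx, ty):
    """Affine matrix (a, b, c, d, tx, ty) for a rotation followed by a translation."""
    r = math.radians(angle_deg)
    return (math.cos(r), math.sin(r), -math.sin(r), math.cos(r), tx, ty)


def transform_point(m, x, y):
    """Map a layer-space point into page space."""
    a, b, c, d, tx, ty = m
    return (a * x + c * y + tx, b * x + d * y + ty)


def source_over(dst, src, alpha):
    """Blend one colour channel of a layer over the destination."""
    return src * alpha + dst * (1.0 - alpha)


paint_log = []


def paint_leaf():
    paint_log.append("painted")
    return [[0.8] * 4 for _ in range(4)]  # stand-in for a rasterized leaf


leaf = Layer("leaf", paint_leaf)
for frame in range(60):
    # Per frame, only the transform and the opacity change.
    leaf.matrix = rotation_translation(frame * 3, tx=10.0, ty=frame * 2.0)
    leaf.opacity = 1.0 - frame / 60
    corner = transform_point(leaf.matrix, 0.0, 0.0)
    pixel = source_over(dst=1.0, src=leaf.texture[0][0], alpha=leaf.opacity)
```

Over 60 frames the paint function runs a single time; everything else is matrix arithmetic and blending, which is the kind of work a GPU compositor handles well.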

Graphic outlining the layers in the falling leaves demo

The best practice of setting the CSS transformation to translate3d or scale3d (even though there is no 3-D involved) comes from the fact that these kinds of transform give the animated element its own layer, which is then composited together with the rest of the web page and other layers. But note that creating and compositing layers comes at a price, namely memory allocation. It is not wise to blindly promote every little element in the web page to its own layer for the sake of hardware acceleration, because doing so eats memory.
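The memory cost can be estimated with simple arithmetic. The sketch below assumes each layer is stored as an uncompressed texture with 4 bytes per pixel (8-bit RGBA); the layer sizes are made-up examples, and real implementations may pad, tile, or compress textures differently.

```python
BYTES_PER_PIXEL = 4  # assumption: uncompressed 8-bit RGBA textures


def layer_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Memory needed for one layer's texture."""
    return width * height * bytes_per_pixel


def total_layer_megabytes(layers):
    """Total texture memory in MiB for a list of (width, height) layers."""
    return sum(layer_bytes(w, h) for w, h in layers) / (1024 * 1024)


# One hypothetical full-screen layer on a 1024x768 tablet.
full_screen_mb = total_layer_megabytes([(1024, 768)])

# Fifty hypothetical 256x256 elements, each promoted to its own layer.
many_small_mb = total_layer_megabytes([(256, 256)] * 50)
```

Under these assumptions the single full-screen layer takes 3 MiB, while the fifty small layers together take 12.5 MiB, which shows how quickly indiscriminate layer promotion adds up on a memory-constrained device.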

Conclusion

In short, making a web browser take advantage of GPU hardware acceleration is far from trivial. It involves making lots of changes at multiple levels, from primitive drawing acceleration to a textured backing store and layer compositing. The best possible performance is achieved only when all of these work in harmony.