Monday, February 28, 2011

Drawing Sprites: Canvas 2D vs. WebGL

Lately I've seen a lot of graphics benchmarks that basically just test image blitting/sprite performance. These include Flying Images, FishIE, Speed Reading and JSGameBench. (Update: I just saw the blog post for the WebGL JSGameBench. This further confirms my claim that WebGL is a better way to do sprites.) They all try to draw a bunch of images in a short amount of time. They mostly use two techniques: positioned images or canvas's drawImage. Neither of these methods is particularly well suited to this task. Positioned images have typically been used for document layout, and the Canvas 2D API was designed as a JavaScript binding to CoreGraphics, which owes most of its design to PostScript. Neither was designed for high-performance interactive graphics. However, OpenGL and its web counterpart WebGL were designed for exactly this.

To show off some of the potential performance difference available, I ported the FishIE benchmark to WebGL. Along the way I discovered some different problems and ways to solve them.

The problem, once the overhead of Canvas 2D is removed, is that FishIE very quickly becomes texture-read bound. I noticed that the FishIE sprites have a lot of horizontal padding. This padding was included in the drawImage calls, which causes a bunch of texture reads for transparent pixels. Trimming this down a little gave a noticeable framerate boost.
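As a rough sketch of the trimming idea, the source rectangle passed to drawImage can be shrunk by the known transparent padding and the destination shifted to match. The cell layout, the padding amounts, and the helper names trimCell and drawTrimmed are all hypothetical here; they are not taken from the FishIE source.

```javascript
// Sketch only: cell layout and padding values are assumptions, not FishIE's.
// cell is {x, y, w, h}: one animation frame inside the sprite sheet.
function trimCell(cell, padLeft, padRight) {
  // Skip the transparent columns on both sides of the frame.
  return {
    sx: cell.x + padLeft,
    sy: cell.y,
    sw: cell.w - padLeft - padRight,
    sh: cell.h
  };
}

// Draw the trimmed frame so the visible fish lands where it did before.
// ctx is a CanvasRenderingContext2D, sheet is the loaded sprite image.
function drawTrimmed(ctx, sheet, cell, padLeft, padRight, dx, dy, scale) {
  var src = trimCell(cell, padLeft, padRight);
  ctx.drawImage(
    sheet,
    src.sx, src.sy, src.sw, src.sh,  // source rectangle without padding
    dx + padLeft * scale, dy,        // shift right by the scaled left padding
    src.sw * scale, src.sh * scale   // scaled destination size
  );
}
```

The same trimmed rectangle can be turned into texture coordinates for a WebGL quad, so fewer fully transparent texels get sampled.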

An even bigger cause of texture bandwidth waste is that the demo uses a large sprite to draw a small fish. Fortunately, OpenGL has a great solution to this problem: mipmaps. Mipmaps let the GPU use smaller textures when drawing smaller fish, which can dramatically reduce the texture bandwidth required. They also improve the quality of small fish by eliminating the aliasing that occurs when downscaling by large amounts.
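For reference, turning on mipmaps in WebGL takes only a couple of calls. The following is a minimal sketch, not the code from the port: the function names createMipmappedTexture and mipLevelCount are made up for illustration, and it assumes a WebGL 1 context, where mipmapped textures need power-of-two dimensions.

```javascript
// Sketch only: helper names are hypothetical, not from the FishIE port.
function isPowerOfTwo(n) {
  return n > 0 && (n & (n - 1)) === 0;
}

// Number of levels in a full mip chain, down to 1x1.
function mipLevelCount(width, height) {
  var levels = 1;
  var size = Math.max(width, height);
  while (size > 1) {
    size = Math.floor(size / 2);
    levels++;
  }
  return levels;
}

// gl is a WebGLRenderingContext, image is a fully loaded image.
function createMipmappedTexture(gl, image) {
  if (!isPowerOfTwo(image.width) || !isPowerOfTwo(image.height)) {
    throw new Error("WebGL 1 mipmaps need power-of-two texture dimensions");
  }
  var texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // Upload the full-size image as level 0.
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);
  // Let the implementation build the smaller levels.
  gl.generateMipmap(gl.TEXTURE_2D);
  // Blend between mip levels when the sprite is drawn small.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
  return texture;
}
```

Each mip level has a quarter of the texels of the one above it, so a fish drawn at a fraction of its source size reads from a much smaller image.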

Mipmapping is a good example of the flexibility that WebGL allows. Canvas 2D aims to be an easy-to-use API for drawing pictures, but this ease of use comes at some cost. First, the Canvas 2D implementation has to guess the intent of the author. For example, drawImage on OS X does a high-quality Lanczos downscaling of the image, while Direct2D just does a quick bilinear downscale. This makes it difficult for authors to know how fast drawImage will be. Further, because the design of Canvas 2D is inspired by an API for describing print jobs, it's not well suited to reusing data between paints.

Try out the difference with these two modified versions of FishIE:
  1. The original FishIE modified only to allow more fish.
  2. FishIE ported to WebGL.
The method I used to port FishIE to WebGL is pretty straightforward, so I expect that any of the other benchmarks listed above could also be easily ported to WebGL.

Pushing the limits

Once the number of fish becomes high enough we run into JavaScript performance problems. FishIE has some JavaScript problems that make things worse than they need to be. First, it loops over the fish with "for (var fishie in fish) {". This can end up using 10% of the total CPU time. The problem with this code is that it converts all of the array indices to strings and then uses those strings to index into the array. It also has the problem that any additional properties added to the array will show up as index values, which is likely not the intent of the author.
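Both problems with for-in are easy to see in isolation. This is a small illustration with a made-up array, not the FishIE code; the helper names keysFromForIn and indicesFromLoop are hypothetical.

```javascript
// Illustration only: the array contents and helper names are made up.
var fish = ["nemo", "dory", "bruce"];
fish.label = "school"; // an extra, non-index property on the array

// for-in enumerates property names, so every key is a string,
// and the extra property is visited too.
function keysFromForIn(arr) {
  var keys = [];
  for (var key in arr) {
    keys.push(key);
  }
  return keys;
}

// A counted loop only touches numeric indices below arr.length.
function indicesFromLoop(arr) {
  var indices = [];
  for (var i = 0; i < arr.length; i++) {
    indices.push(i);
  }
  return indices;
}

var forInKeys = keysFromForIn(fish);     // ["0", "1", "2", "label"]
var loopIndices = indicesFromLoop(fish); // [0, 1, 2]
```

The counted loop avoids the string conversion on every iteration and never sees "label".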

Second, each fish object includes a swim() method. Unfortunately, in the FishIE source swim() is a closure inside the Fish() constructor. This means that the swim() method is a different function object for each Fish, which makes things worse for JavaScript engines.
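The difference between the two styles can be sketched like this. ClosureFish and ProtoFish are hypothetical stand-ins, and the swim() body is a placeholder rather than the real FishIE logic.

```javascript
// Illustration only: constructor names and swim() bodies are placeholders.

// Closure style: a new function object is created for every fish.
function ClosureFish() {
  var x = 0;
  this.swim = function () {
    x += 1;
    return x;
  };
}

// Prototype style: one swim() function shared by every fish,
// with per-fish state stored on the object itself.
function ProtoFish() {
  this.x = 0;
}
ProtoFish.prototype.swim = function () {
  this.x += 1;
  return this.x;
};

var a = new ClosureFish(), b = new ClosureFish();
var c = new ProtoFish(), d = new ProtoFish();

var closuresShared = (a.swim === b.swim);   // false: two distinct functions
var prototypeShared = (c.swim === d.swim);  // true: the same function
```

With the prototype version the engine sees a single swim() function no matter how many fish exist.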

Fixing both of these problems and making the fish really small lets us get an idea of how many sprites we can actually push around. Here's a final version. If I disable the method JIT (bug 637878) and run at an even window size (bug 637894) I can do 60000 fish at 30fps, which I think is pretty impressive compared to the 1000 that the original Microsoft demo does.

18 comments:

Edward Lee said...

Are there bugs filed for getting the same/similar performance without needing to transform "for each" -> "for .. length" and closure -> object properties?

Did you look into the closure transformation to see if the slowdown was because swim appears as multiple functions or because it was referencing a variable in the containing closure (vs. on an object)?

Jeff Muizelaar said...

Edward Lee: The bug for "for .. length" is 505818.

I haven't filed a bug about the closure -> object properties and didn't look into what was actually causing the slowdown.

Edward Lee said...

This seems to result in similar FPS for me, but it's already ~20fps at 10k fishies for me:

- }
-
- Fish.prototype = { swim: function swim() {
+ this.swim = function() {

Marco said...

Hm, I see black squares instead of the fishes, I guess my old nVidia 7900Gs cannot do that?

Boris said...

Edward, there's no really good way to optimize the closure thing, because those are all different swim methods per spec. They could behave totally differently, so you either have to do some very complicated analysis to figure out whether they do or compile them separately and take a large performance hit.

Jeff Muizelaar said...

Marco, the demo uses a very large texture. It is probably larger than your maximum texture size.

Neil Rashbrook said...

Of course, the canvas version has the advantage of working on more graphics adapters, including RDP and VNC.

Marco said...

Jeff, how large is the texture? I can play most of the recent games on this GPU, which use pretty large textures, so I'd be curious to check the size against the GPU datasheet.

Corban Brook said...

This might be a naive question, but would it be possible to add automatic mipmapping? Perhaps drawImage already does this type of caching? For instance, when drawImage is called with the optional scaling params, it could cache a scaled version for fast recall.

Jeff Muizelaar said...

Neil: WebGL should work over VNC and can work over RDP.

Marco: The texture is 8192x1024. It's possible to pack it in tighter but it was easiest to just make it large.

Corban: Mipmapping is more a property of a texture than it is of a draw call. A better idea might be to have an attribute on images that specifies whether they have mipmaps or not.

Marco said...

Yep, looks like the maximum texture size of my card is 4096x4096.

Peter Strohm said...

The demo is a great idea.
Unfortunately, the framerates depend on each other if you open both versions at the same time (in 2 tabs). Have you considered using requestAnimationFrame instead of setInterval?
Please see here for details: http://khronos.org/webgl/wiki/FAQ#What_is_the_recommended_way_to_implement_a_rendering_loop.3F

Neil Rashbrook said...

When I try RDP I just get "Could not initialise WebGL, sorry :-(". I'll try and remember to try it over VNC.

vpi79 said...

For WebGL, the bitmap size should not determine whether the textures draw or not. If you see all-black sprites because of that, it can only be caused by a lack of support within the WebGL implementation for subdividing large bitmaps internally into multiple ones, which would effectively multiply the number of sprites and reduce the maximum frame rate, but at least provide a consistent display.
So I think this is an initial limit bug of the first WebGL implementations. WebGL implementations should apply a best effort strategy (also notably because such subdivision is not necessary at low scaling factors in MIPMAPs).

Also I find it strange that Microsoft still handles the sprite animations fully in JavaScript, in Canvas2D, and did not even try to provide a JavaScript binding to an animator object (an integral part now of the Windows API in Windows Seven) to reduce the work load (of for loop indexes, and of closures): the Microsoft approach is clearly incomplete, when WebGL has specified all aspects of 3D animations: scene layout, optimized MIPMAP texturing, animation of geometries, time synchronisations, collision detections, levels of transparencies, lighting, and visual effects on textures or "transparent" volumes (such as fogs and clouds simulation), independent camera positioning and animation, perspectives, and various interpolation modes for textures... all that within a scene description language that maximizes parallelism and avoids almost all closures and specializations.

Microsoft pretends that WebGL is not a standard, but it has inherited from decades of research in OpenGL, of which it is a simple binding focused on increasing interoperability and better independence from the hardware. Microsoft will probably react by trying to build its own API using DirectX, but in fact I see no interest in doing that. Microsoft can very well port the WebGL API using a DirectX implementation in IE. Let's not support another incompatible standard when in fact WebGL already has the strong support of the three major graphics accelerator manufacturers (AMD/ATI, nVIDIA, and Intel, plus makers of smartphones) that all have made a lot of effort to also support OpenGL very well on their GPUs and even in their CPUs for software emulation... OpenGL also works very well through the Java engine of Droid devices, because Java strongly supports OpenGL as well. OpenGL is in fact the only industry standard, DirectX is not and is, most of the time now, a binding layer on top of OpenGL-optimized hardware (with additional features now for parallel GPU-computing for things not directly related to graphics rendering, but to more general signal processing).

Neil Rashbrook said...

WebGL doesn't initialise over VNC to my Linux VM either.

Anonymous said...

I have the same effect of black boxes instead of fish on my notebook graphics (Intel Core i5 integrated) both on your final optimized version and on the WebGL version (last one even says the texture size is too large).
Seems like your optimization largely affects compatibility. Maybe that's one of the reasons that led Microsoft to do it another way.

Jeff Muizelaar said...

Anonymous, the black boxes are caused by the large texture size. Because the texture is 8192x1024, it would be possible to rearrange the sprites so that the texture was 4096x2048. I didn't do this because I wanted to keep the demo code as similar to Microsoft's as possible.

Anonymous said...

On my box the WebGL demo performs much, much worse than the original IE version.