To show off some of the potential performance difference available, I ported the FishIE benchmark to WebGL. Along the way I discovered some different problems and ways to solve them.
The problem, once the overhead of Canvas 2D is removed, is that FishIE very quickly becomes texture read bound. I noticed that the FishIE sprites have a lot of horizontal padding. This padding was included in the drawImage calls which causes us to do a bunch of texture reads for transparent pixels. Trimming this down a little gave a noticeable framerate boost.
An even bigger cause of texture bandwidth waste is that the demo uses a large sprite to draw a small fish. Fortunately, OpenGL has a great solution to this problem: mipmaps. Mipmaps let the GPU use smaller textures when drawing smaller fish, which can dramatically reduce the texture bandwidth required. They also improve the quality of small fish by eliminating the aliasing that occurs when downscaling by large amounts.
Mipmapping is a good example of the flexibility that WebGL allows. Canvas 2D aims to be an easy to use API for drawing pictures, but this ease of use comes at some cost. First, the Canvas 2D implementation has to guess the intents of the author. For example drawImage on OS X does a high quality lanczos down scaling of the image. Direct2D just does a quick bilinear down scale. This makes it difficult for authors to know how fast drawImage will be. Further, because the design of Canvas 2D is inspired by an API for describing print jobs, it's not well suited to reusing data between paints.
Try out the difference with these two modified versions of FishIE: The method I used to port FishIE to WebGL is pretty straight forward so I expect that any of the other benchmarks listed above could also be easily ported to WebGL.
Pushing the limits
Once the number of fish becomes high enough we run into Javascript performance problems. FishIE has some Javascript problems that make things worse than they need to be. First, it loops over the fish with "for (var fishie in fish) {". This can end up using 10% of the total CPU time. The problem with this code is that converts all of the array indices to strings and then uses those strings to index into the array. It also has the problem that any additional properties added to the array will also show up as index values, which is likely not the intent of the author.Second, each fish object includes a swim() method. Unfortunately, in the FishIE source swim() is a closure inside the Fish() object. This means that the swim() method is different for each Fish which makes things worse for Javascript engines.
Fixing both of these problems, and making the fish really small lets us get an idea of how many sprites we can actually push around. Here's a final version. If I disable the method jit (bug 637878) and run at an even window size (bug 637894) I can do 60000 fish at 30fps, which I think is pretty impressive compared to the 1000 that the original Microsoft demo does.