INTERACTIVE WATER STREAMS

ACM SIGGRAPH Symposium on Interactive Graphics and Games (i3D), 2009. Boston, Massachusetts.

Hoetzlein, R. and Höllerer, T. “Interactive Water Streams with Sphere Scan Conversion”. ACM SIGGRAPH Symposium on Interactive Graphics and Games (i3D), Boston, Massachusetts, February 2009.

The potential of real-time fluid simulation opens a range of novel opportunities in interactive game design, film production, media arts, and live performance. While GPU simulation of fluids as particles is now possible at interactive rates (see Fluids v.3), rendering of a water surface requires additional computational resources. Polygonization, the transformation of a particle simulation into a set of renderable polygons (triangles), has typically been performed with the Marching Cubes algorithm.

Sphere scan conversion is a novel polygonization technique that generates a surface by conforming a cylinder around the fluid. While limited to stream-like fluids, it performs significantly faster than traditional Marching Cubes, enabling direct interaction with the rendered fluid.

GRAPHICS PERFORMANCE IN RICH INTERNET APPLICATIONS

The following page contains public source code and demos for the paper “Graphics Performance in Rich Internet Applications”, published in IEEE Computer Graphics & Applications (CG&A), September 2012.

A series of sprite tests was developed to measure raw graphics performance in Rich Internet Application frameworks. By writing the same test in four different frameworks and languages, 1) Flex/Flash, 2) HTML5/Javascript, 3) WebGL/Javascript, and 4) OpenGL/C++, it is possible to get a realistic picture of online graphics performance across browsers and languages.

The test consists of N transparent 2D sprites, 32×32 pixels each, placed at random initial locations on a 1280×900 pixel canvas. The sprites are animated by a simple physics system with a point gravity at the center of the canvas (a sketch of the update step appears below). The test suite displays frame times in milliseconds and allows simulation and rendering to be measured separately. User input permits scaling the number of sprites (N) up to one million.
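For illustration, here is a minimal sketch of the per-sprite update step under a central point gravity. The struct, constant names, and gravity strength are my own assumptions for this sketch, not the exact code used in the published tests:

```cpp
#include <cmath>

// Hypothetical sprite state; each test variant stores equivalent fields.
struct Sprite { float x, y, vx, vy; };

const float CANVAS_W = 1280.0f, CANVAS_H = 900.0f;
const float GRAVITY  = 50.0f;   // assumed strength of the central attractor

// Accelerate the sprite toward the canvas center, then integrate position.
void updateSprite(Sprite& s, float dt) {
    float dx = CANVAS_W * 0.5f - s.x;
    float dy = CANVAS_H * 0.5f - s.y;
    float d  = std::sqrt(dx * dx + dy * dy) + 0.0001f; // avoid divide-by-zero
    s.vx += GRAVITY * (dx / d) * dt;
    s.vy += GRAVITY * (dy / d) * dt;
    s.x  += s.vx * dt;
    s.y  += s.vy * dt;
}
```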

Raw Data can be downloaded here:

Raw Data (MS Excel spreadsheet)

All tests were conducted on a Sager NP8690 laptop with a Core i7 640M CPU, 4 GB RAM, and a GeForce GTX 460M graphics card.
Original source code and demos are provided here for reference and validation.
To run these tests in different browsers, simply click on the links below in your browser of choice.

A future goal of this work, which readers are invited to participate in, is to perform these tests on other operating systems and on mobile devices.

Comments are welcome below. If you have any problems posting comments, please email me at rama@rchoetzlein.com.

Keyboard control is identical for all demos below:

Z, X – Decrease/increase number of sprites by 100
Q, W – Decrease/increase number of sprites by 10,000
P – Pause simulation. Continue rendering. (Measures rendering only)
D – Pause rendering. Continue simulation. (Measures simulation only)
S – Switch to BitmapData sprites (Flex/Flash version only)

Flex/Flash/ActionScript

Click for online demo. http://www.rchoetzlein.com/sprite_tests/flex/

Download source here: sprites_flex.zip

Use the ‘S’ key to switch between Sprite and BitmapData rendering.

HTML5/Javascript

Click for online demo. http://www.rchoetzlein.com/sprite_tests/jscript/

Download source here: sprites_jscript.zip

WebGL/Javascript

Click for online demo. http://www.rchoetzlein.com/sprite_tests/webgl

Download source here: sprites_webgl.zip

OpenGL/C++

Download source here (Windows): sprites_opengl.zip

CHOOSING A SMARTPHONE

Ok, time to finally buy a smartphone. In my search for the perfect phone, I wanted to understand why the Apple iPhone maintains such a large share of the market despite the many competitors out there now. How is it that one phone, by one manufacturer, can hold 44.9% of the whole market, over all other phones!?

First, it's not raw processing power. The HTC Rezound has a dual-core 1.5 GHz Snapdragon CPU, the Google Galaxy Nexus has a dual-core 1.2 GHz ARM CPU, and the LG Optimus 2X has a dual-core 1.0 GHz NVIDIA Tegra CPU (the first dual-core phone). The iPhone has a dual-core 800 MHz A5.

Second, it's not screen resolution. While the iPhone 4S has 960×640 pixels, other phones also have this display quality, such as the HTC Evo 3D (960×640). The HTC Rezound has a 4.3-inch 1280×720 display.

I’ve come to a simple theory to explain it: the physical size of the device. Here is a comparison of phone designs based solely on size.

The larger phones, such as the Motorola Droid Razr, are almost tablets in their grandiosity, with dimensions of 69 × 131 mm. Yet they are still also used as phones. The Apple iPhone 4S and HTC Incredible are compact, around 58 × 116 mm, making them easy to hold as actual phones. As mentioned, it is now possible to get high-performance (CPU) phones in all sizes, so the deciding factor for me was phone size.

Now, one could argue that the Google Galaxy Nexus and Droid Razr serve a slightly different user: the programming power-user who enjoys a phone with the heft of a tablet. However, take a look again at all the Android phones here. If you are a girl, which are you most likely to choose? I’d suggest the driving factor behind the Apple iPhone 4S’s success, from a purely physical perspective, is that its size, sleekness, and interface are elegant and compact.

Just look at the marketing campaigns for the “Droid”, “Galaxy”, and “Incredible”. The names themselves connote an excess of masculinity, and the sharp, large-digit numeric calendars on the phones reflect a hyper-attention to information. The Droid advertisement is a pure exercise in robotic utopia: d-r-o-i-d. The iPhone is called an “Apple”.

Please don’t get the idea that I am an Apple iPhone 4S supporter. Because the iPhone is the elephant in the room, so to speak, Apple has used its power to make a phone that is fairly antagonistic toward developers. Its proprietary OS and closed app store make it very difficult to develop for. In addition, the 2010 decision to drop Flash support on the iPhone created major headaches for programmers (more recently overcome by Adobe AIR). Finally, Apple charges high fees to carriers to support its phone (iPhone Kills Carrier Profits), which ultimately results in a loss to all consumers, such as the end of unlimited data.

What I’m suggesting, instead, is that Motorola, HTC, LG, and Sony are all missing out on a great opportunity here to design small, nicely designed phones that aren’t aimed at the ubernerd. A recent article (Women want Apple’s iPhone) supports the notion that girls prefer the iPhone, but not to the degree I would have expected: 31% of women prefer the iPhone, while 28% of men do. The fact that the iPhone maintains a 48% market share overall suggests that smartphone users in general like its compactness. Of the smartphones I reviewed for this article, the smallest was the Apple iPhone 4S; it is smaller than all other modern phones I could find. When you hold it, it feels sleek, compact, and phone-like. I considered the HTC Incredible, which is very close in size, but it’s now two years old and only single-core, not high-performance. Ultimately I think I will get an HTC Rezound, because of the 4G and dual-core CPU. But I have to say, I wish I could get an Android phone the size of the Apple iPhone. It just doesn’t exist yet.

SPH SURFACE RECONSTRUCTION

I have received several emails from people asking about surface reconstruction for Smoothed Particle Hydrodynamics (SPH). This is, of course, the current challenge :).

Currently, I know of several techniques which have been successfully applied to render surfaces from SPH particles:

1) Point Set surfaces
This was applied to an SPH fluid in GPU Gems 3. You can find the article here:
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch07.html
The technique uses a ‘repulsion’ method to avoid rendering interior particles, and computes surface normals for the surface particles. They achieve 16,000 particles at 40 fps.

2) Screen Space Curvature
http://www.cs.rug.nl/~roe/publications/fluidcurvature.pdf
This technique was developed by NVIDIA. It projects all the particles onto the screen with a Gaussian filter, then blends them in screen space. They achieve 60,000 particles at 20 fps. You can find a demo video of this on YouTube.

3) Raycasting of Metaballs
http://www.cs.tsukuba.ac.jp/~kanamori/projects/metaball/eg08_metaballs.pdf
This method casts rays out on the GPU to intersect the metaball surface. They achieve 10,000 particles at 5 fps.

4) Image-Space 3D Metaballs
http://portal.acm.org/citation.cfm?id=1280731
This method creates a 3D texture which represents a metaball isosurface, then uses a screen-space shader to raycast into this surface. They achieve 60,000 particles at 10 fps.

There is a group on Gamedev which is exploring this technique, and provides some code for download:
http://www.gamedev.net/community/forums/topic.asp?topic_id=564607

5) Sphere Scan Conversion
http://www.rchoetzlein.com/eng/graphics/water.htm
This is my own technique. It achieves 16,000 particles at 80 fps, and is the only method listed here that can render shadows, but it works only for vertical streams and is not a general method.

6) Perspective Grid Raycasting (*NEW*)
http://wwwcg.in.tum.de/Research/Publications/SPHSplatting
This recent (2010) technique uses perspective grid textures on the GPU to achieve efficient raycasting. Results are 2,575,500 particles at 6.57 fps.

To give a better overall picture of current progress in SPH surface methods, I’ve normalized the performance measures here and listed them in order of performance. The units are milliseconds per 10,000 particles, computed as 10^7 / (#particles × fps); a short snippet that reproduces these numbers follows the list.

  • Perspective Grids = 0.59 ms/10k
  • Sphere Scan Conversion = 7.81 ms/10k (with shadows; only for vertical streams)
  • Screen Space Curvature = 8.33 ms/10k
  • Point Set surfaces = 15.63 ms/10k
  • Image-Space 3D Metaballs = 16.67 ms/10k
  • Raycasting Metaballs = 200.00 ms/10k
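For reference, this short snippet reproduces the normalization above (the helper name is mine; the particle counts and frame rates are those reported by each paper):

```cpp
#include <cstdio>

// ms per 10,000 particles = 10^7 / (#particles * fps)
double msPer10k(double particles, double fps) {
    return 1.0e7 / (particles * fps);
}

int main() {
    printf("Perspective Grids:       %6.2f ms/10k\n", msPer10k(2575500, 6.57)); // 0.59
    printf("Sphere Scan Conversion:  %6.2f ms/10k\n", msPer10k(16000, 80));     // 7.81
    printf("Screen Space Curvature:  %6.2f ms/10k\n", msPer10k(60000, 20));     // 8.33
    printf("Point Set Surfaces:      %6.2f ms/10k\n", msPer10k(16000, 40));     // 15.63
    printf("Image-Space Metaballs:   %6.2f ms/10k\n", msPer10k(60000, 10));     // 16.67
    printf("Raycasting Metaballs:    %6.2f ms/10k\n", msPer10k(10000, 5));      // 200.00
    return 0;
}
```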
As far as I know, this list represents the current state of the art in particle surface reconstruction as of May 2010. I will update it regularly as new methods appear.

Currently, an SPH physics simulation can run on the GPU in real time with 60,000 particles at 57 fps (Zhang 2007), which is about 3 ms per 10k particles. This means that research in surface reconstruction is lagging behind simulation performance.

Although there are several successful methods above, none of them has achieved the frame rates that would be needed for an interactive game. In my view, both simulation and rendering need to total around 6 ms per 10k particles (30,000 particles at 60 fps) to be feasible for a commercial game, leaving enough time for rendering the rest of the game world and its logic. Not to mention, to be useful in a game the fluid should interact with other objects and the world. The current bottleneck is the surface rendering (not simulation), and even the best generic methods are presently only at 8 ms/10k. In addition, none of the generic techniques above render shadows, and I’ve found shadows to be an important visual aspect of fluids.

The most obvious technique, in a classical graphics sense, is 3D Marching Cubes, which must first generate an implicit metaball function and then construct a marching cubes surface from it, performing each step in real time per frame. Current best methods for real-time Marching Cubes achieve around 10 fps at a 64^3 grid (Real-Time Isosurface Extraction, Tatarchuk). The ideal methods for SPH fluids are those that avoid processing the interior particles of the fluid and only render or polygonize the surface particles. This would allow surface reconstruction to scale in the future as O(n^2) rather than O(n^3). Although the Point Set Surfaces method above achieves this during rendering, it must process all particles in order to determine the surface set.

Although this is currently an open research problem, based on the interest and effort going into it, I imagine it will be solved within the next three years. It's just a guess, though.

Feel free to comment or ask questions below.

MULTI-MONITOR RENDERING IN OPENGL

This article is about programming multiple graphics cards to render OpenGL scenes on up to 6 monitors with a single computer. Recently, I’ve been doing research in this area for the AlloSphere, an immersive, 30 ft. display at UC Santa Barbara. Rather than have 16 computers, each with a projector, connected by gigabit ethernet (the classic way to build cluster display walls for over 20 years), it may be more cost effective and lower latency to use only 2 to 4 high-performance workstations with 3 NVIDIA graphics cards in each. We recently built such a test system for a project called Presence (a collaboration with Dennis Adderton and Jeff Elings), with multiple-monitor rendering in OpenGL.

How do you use OpenGL to render to multiple displays?

Once upon a time, it was possible to use the “horizontal span” feature of some graphics cards. This instructed the OS to present OpenGL with a single continuous framebuffer you could write to. However, this has been discontinued due to changes in the Windows OS. I don’t know if such a feature ever existed for Linux.

The only way I know of now is to detect and render to each monitor individually per frame. This is also the only way to achieve a 3×2 display wall using 3 graphics cards, because “horizontal span” only let you place displays side-by-side. By rendering to each monitor, you can create arbitrary monitor layouts, and also arbitrary methods of projection. This sounds inefficient, but there are many things that can be done to speed it up. It’s also possible to run Cg shaders on each monitor for a single frame. In the Presence project, we found that we could render deferred shading on 6 screens, with shadows and depth-of-field on each.

How does this work?

The key is an undocumented feature of the OpenGL API called wglShareLists (although there is a man page for it, I say undocumented because it says very, very little about how to invoke it, the conditions required for it to work, or how to use it with multiple GPUs).

The common way to start OpenGL is to create a device context (in Windows this is an HDC; in Linux, an X window), and then create an OpenGL render context, called an HGLRC. An OpenGL render context basically contains graphics data: textures, display lists, vertex buffer objects, frame buffers, etc. It does not record individual render commands invoked at render time, but essentially all pre-frame data.

With multiple displays, you need to detect each monitor and create an HDC on each (this can be done with EnumDisplaySettingsEx). If you have two monitors but _one_ card (a dual-head card, which is common), then you only need one HGLRC (render context), because there is only one card to store data. During rendering, you switch which HDC is active but keep the same HGLRC (see wglMakeCurrent).

If you want to use multiple cards, then you need to create a window, an HDC, and an HGLRC for each screen. Since each card has its own memory space, they somehow need to share all textures, vertex buffers, and data. This is what wglShareLists does. It instructs the OpenGL API to copy all server-side objects to every OpenGL render context that is shared. The undocumented bit is that this happens even if the HGLRCs exist on different cards on the PCI bus. Take, for example, glTexImage2D, which transfers texture data to the GPU for later rendering. In this case, the OpenGL driver will replicate the glTexImage2D command to every GPU on the bus. In addition, if you have 3 cards, you don’t need to explicitly create 3 textures; wglShareLists lets you access all of them through the primary context, although there is in fact a copy of your texture in each GPU’s memory.
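To make this concrete, here is a compressed sketch of that setup, assuming the windows have already been created and positioned (one per monitor); the array names are mine, and pixel-format selection and error handling are omitted:

```cpp
#include <windows.h>
#include <GL/gl.h>

const int NUM_MONITORS = 6;
HDC   hdc[NUM_MONITORS];     // one GDI device context per monitor's window
HGLRC hglrc[NUM_MONITORS];   // one OpenGL render context per monitor

void createSharedContexts(HWND hwnd[NUM_MONITORS], int pixelFormat,
                          PIXELFORMATDESCRIPTOR* pfd)
{
    for (int n = 0; n < NUM_MONITORS; n++) {
        hdc[n] = GetDC(hwnd[n]);                  // window already placed on monitor n
        SetPixelFormat(hdc[n], pixelFormat, pfd); // same pixel format on every context
        hglrc[n] = wglCreateContext(hdc[n]);
    }
    // Share server-side objects (textures, VBOs, display lists) from the
    // primary context into every other context -- even across separate GPUs.
    for (int n = 1; n < NUM_MONITORS; n++)
        wglShareLists(hglrc[0], hglrc[n]);

    // Create all textures/VBOs once, with the primary context current;
    // the driver replicates the data to each GPU on the bus.
    wglMakeCurrent(hdc[0], hglrc[0]);
}
```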

This may sound slow. It is, but at present there’s no other way to share a texture across three GPUs. (Perhaps in the future SLI may provide this, but it currently has other limits that don’t permit multi-monitor rendering.) Remember, however, this is not a rendering cost. It is a buffer setup cost, which for static scenes will usually occur only once at the beginning of your app. Thus, once the data is on the GPUs using wglShareLists, you can ask each card to render it relatively quickly.

If you are trying to render dynamic geometry that changes every frame, then you’ve got much bigger problems. Note that I’m not talking about moving static objects, such as character limbs or terrain. These should still be fast on multiple monitors, because the vertex buffers don’t change, or can be generated using vertex shaders. I’m talking about geometry such as a dynamic tentacle mesh where all vertices move each frame. This requires a PCI bus transfer on every frame and should be avoided: when you render to multiple GPUs, the bus transfer overhead is multiplied by however many graphics cards you have. Thus, avoid dynamic geometry rendering on multiple cards.

Sticking with static geometry buffers (as in most games), how does the rendering work?

Now the HDCs and HGLRCs are set up for each monitor, and assuming you’ve called wglShareLists properly, the only thing left to do is render. Rendering to multiple displays is fairly simple.

You attach the OpenGL driver to the context you want to render to using wglMakeCurrent. This tells the driver to render to that particular device context (OS window) using a particular OpenGL render context (graphics state). You then invoke OpenGL graphics commands as usual.

First, you set up the perspective, model, and view matrices to create a window into your scene for that particular monitor. Depending on the layout of your monitors, there are several ways to do this. The simplest is to use glFrustum (not gluPerspective) to select the sub-portion of a camera frustum that you wish to render on a particular monitor; a sketch of this follows. Then you call OpenGL draw commands. If you bind to a texture or use a vertex object, it will use the shared graphics state that now exists on every card; you basically don’t have to worry about which card the texture comes from.
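Here is a minimal sketch of that sub-frustum selection for a cols × rows monitor wall; the helper name and parameters are illustrative, not code from the original project:

```cpp
#include <GL/gl.h>
#include <math.h>

// Select the slice of the full camera frustum belonging to monitor
// (col,row) in a cols x rows wall, using glFrustum (not gluPerspective).
void setMonitorFrustum(int col, int row, int cols, int rows,
                       double fovY, double aspect, double zNear, double zFar)
{
    double top   = zNear * tan(fovY * 0.5 * 3.14159265 / 180.0);
    double right = top * aspect;          // half-extents of the *full* frustum
    double dx    = 2.0 * right / cols;    // width of one monitor's slice
    double dy    = 2.0 * top   / rows;    // height of one monitor's slice

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glFrustum(-right + col * dx, -right + (col + 1) * dx,
              -top   + row * dy, -top   + (row + 1) * dy,
              zNear, zFar);
    glMatrixMode(GL_MODELVIEW);
}
```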

Another note about performance. I said that wglShareLists is only slow at the beginning of your app, as textures are transferred to each graphics card. This is only partly true. Your main render loop now also consists of perspective matrix setup and draw commands for each monitor. Ideally, since the graphics data is shared, it should be possible to instruct each GPU on the bus to do its rendering in parallel (at the same time the other GPUs are rendering their monitors). However, as far as I know, modern GPUs can’t do this yet (NVIDIA?). Basically, your render loop has to send draw commands separately to each GPU, then wait for that GPU to finish so you can swap its buffer, thus updating each monitor. Fortunately, since the vertex/texture data is already on the card, and since you’ve written your render code to bundle OpenGL calls together as much as possible (I hope!), this doesn’t take too much longer.

So, the overall pseudo-code is:

1. Detect all hardware displays
2. Setup for each one:
2a. … Create OS window (CreateWindow)
2b. … Get the HDC device context from the window (GetDC)
2c. … Create the HGLRC OpenGL context (wglCreateContext)
3. Call wglShareLists
4. Make the HDC and HGLRC for context 0 current (wglMakeCurrent)
5. Create textures, VBOs, display lists, frame buffers, etc.
6. Start main rendering (for each monitor):
6a. … Call wglMakeCurrent for the HDC/HGLRC of the specific monitor
6b. … Create projection and view matrices for the specific monitor
6c. … Clear the frame and depth buffers
6d. … Draw the scene
6e. … Call SwapBuffers to refresh that monitor
6f. End render loop
7. Delete all textures and VBOs, then close the contexts.
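And a minimal sketch of the render loop (step 6), reusing the hdc[], hglrc[], and setMonitorFrustum names assumed in the earlier sketches; drawScene stands in for your own draw calls:

```cpp
// Per-frame render loop over all monitors (step 6 above).
const int COLS = 3, ROWS = 2;   // assumed 3x2 monitor wall

void renderFrame()
{
    for (int n = 0; n < NUM_MONITORS; n++) {
        wglMakeCurrent(hdc[n], hglrc[n]);                   // 6a: bind this monitor
        setMonitorFrustum(n % COLS, n / COLS, COLS, ROWS,
                          60.0, (5.0 / 4.0) * COLS / ROWS,  // assumed FOV, wall aspect
                          0.1, 1000.0);                     // 6b
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // 6c
        drawScene();   // 6d: shared textures/VBOs work on every card
        SwapBuffers(hdc[n]);                                // 6e
    }
}
```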

Using the methods above, I was able to render the Happy Buddha (a standard test object in the graphics community) at over 60 fps with deferred shading, soft shadows, and depth of field on 6 monitors using three NVIDIA GeForce 8800 GTX cards.

A final point: I’ve found there are two types of multi-monitor research out there: 1) what most commercial games and graphics students do, which is to figure out, at most, how to do a dual-monitor setup using a single dual-head card (one GPU), and 2) large research institutions that build giant display walls using dozens or hundreds of computers the old-fashioned way. There is very little work so far using multiple GPUs in a single computer, probably because the graphics cards to do this are so new (NVIDIA spends lots of time meeting the huge needs of parallel GPGPU scientific computing).

However, I encourage those interested to explore single-computer multi-GPU rendering, for these reasons: a) The hardware is relatively cheap now (an LCD can be had for $150 each). b) This area of research is relatively unexplored so far. c) Although a projector gives a larger physical area, unlike with a projector you actually increase your renderable resolution with every monitor added; that’s an anti-aliased pixel resolution of 3840×2048 for six screens (6 × 1280×1024). If you render to 6 projectors, we’re talking a huge space. d) It looks really cool having a desktop running a game at ultra-high resolution on 6 screens!

For some screenshots of the results (with Dennis Adderton and Jeff Elings), see:
http://www.rchoetzlein.com/art/recent/presence.htm