Differentiable Visual Computing: Bridging 2D and 3D in Machine Learning Applications

An illustration of the forward and backward processes of a differentiable renderer.

Introduction 

The world around us is 3-dimensional, yet vision, one of the most important ways that we observe the world, relies heavily on 2-dimensional representations. This is true of both biological vision and computer vision. For this reason, many of the seminal moments in machine learning and deep learning history involve 2D image data. The use of deep convolutional neural networks to achieve record breaking performance in the 2012 ImageNet challenge brought widespread attention to the field of deep learning. More recently, generative image and video models, such as DALL-E and Sora by OpenAI, have had a significant cultural impact. 

In the coming years, thanks to developments in the field of differentiable visual computing (DVC), many groundbreaking machine learning technologies will make the jump from 2D to 3D. These innovations will impact sectors involving 3D object detection and scene understanding, computational art and design, manufacturing, autonomous vehicles, robotics, optimal control, and physical inference. These are all sectors in which physically accurate and interpretable models are important, and in which black-box deep learning approaches may not be suitable. DVC allows humans to insert scientific knowledge into a deep learning pipeline, making models more data efficient and more easily interpretable and controllable. Already, these techniques have been applied to design products with special optical properties, to control soft body robots, to capture basketball stars in 3D, to infer object properties like mass and elasticity from video, and much more. 

The Differentiable Graphics Pipeline

Visual computing (VC) methods can create realistic 2D images of virtual 3D worlds using algorithms to simulate physical processes such as light transport, and the dynamics of fluids, rigid-bodies and other media. The process of turning 3D virtual worlds into 2D images or videos can be considered the “forward” direction of a physical simulation. In some applications, we are also interested in learning information about the 3D world from 2D data. This complementary process is called an “inverse” problem. Solving inverse problems allows us to learn and design virtual 3D worlds in order to produce a particular outcome or effect in a physical simulation or visual computing pipeline. 

To solve an inverse problem, we must efficiently search the high-dimensional space of parameters describing a 3D scene in order to find an optimal set of parameters. These parameters could represent the geometry of an object, the optical properties of an object (e.g. color, reflectivity, or translucency), the physical properties of an object (e.g. mass, viscosity, or elasticity), or the properties of an imaging system (e.g. the camera position, orientation, or focal depth). In deep learning, such parameter searches are typically conducted using gradient-based optimization algorithms. This approach, however, is predicated on the property of differentiability with respect to a set of parameters. Thus enters the field of differentiable visual computing (DVC). By designing visual computing and graphics pipelines that support differentiation, we can utilize the same optimization approaches that are used to train deep learning models. Furthermore, DVC components can seamlessly integrate with deep learning models to create hybrid systems consisting of flexible deep learning models and physically accurate simulations. 

One of the core components of a differentiable graphics pipeline is a differentiable renderer. Just like regular renderers, differentiable renderers can run a forward rendering process, in which a 3D scene is turned into a realistic 2D image. Additionally, differentiable renderers can backpropagate gradients from a rendered image to the 3D scene parameters. This allows for applications where 3D scene parameters are learned from 2D image supervision. Video 1 shows an example where the lights illuminating a bunny mesh are learned using gradient descent optimization and a ground truth image of the bunny. The 3D bunny model was obtained from The Stanford 3D Scanning Repository.

Video 2 shows an example where the material of a bunny mesh is learned using gradient descent optimization and a ground truth image of the bunny. 

An illustration of the forward and backward processes of a differentiable renderer.
Figure 1: An illustration of the forward and backward processes of a differentiable renderer. The forward process runs a light transport simulation in which the scene parameters (light properties, object transparency and material properties, and camera properties) are used to render an output image. The backward process propagates gradients from the output image back to the scene parameters allowing them to be updated as part of a learning process. 

 

Picture of a 2D bunny, that shows an example where the lights illuminating a bunny mesh are learned using gradient descent optimization and a ground truth image of the bunny
Video 1: An example of lighting optimization using differentiable rendering. The true lighting configuration uses a teal and a purple light aimed at the left and the right sides of the bunny mesh. The initial lighting configuration uses white lights aimed at the top and the bottom of the meshes. The optimization process recovers the true color and position of the lights.

 

A picture of a 2D bunny, that shows an example where the material of a bunny mesh is learned using gradient descent optimization and a ground truth image of the bunny
Video 2: An example of material optimization using differentiable rendering. The ground truth material is purple, and the initial material is green. Material properties are defined for each vertex in the mesh. The optimization process recovers the true material values for vertices that are visible to the camera.

Frequently, VC pipelines are used to create images of dynamic 3D scenes. In such scenes, objects may be animated (e.g. character body movements), or their dynamics may be generated using physical simulations (e.g. the movement of clothing or hair in response to character body movement). These simulations can also be made differentiable and integrated into a DVC pipeline. This allows for applications where physical parameters (e.g. object mass, elasticity, or external forces) or animations are learned from 2D or 3D data.

An illustration of the forward and backward processes of a differentiable physics simulation
Figure 2: An illustration of the forward and backward processes of a differentiable physics simulation. The forward process runs a dynamical simulation in which the scene parameters (object geometry and initial states, mass, elasticity, friction and external forces) are used to evolve the scene over a period of time. The backward process propagates gradients from the final state of the simulation back to the scene parameters allowing them to be updated as part of a learning process.

In different stages of a VC pipeline, different geometric representations of the same object may be used to facilitate different simulation or rendering processes. For example, a volumetric representation may be used for a physical simulation, after which, the object is converted into a triangle mesh representation in preparation for rendering. Representation conversions, along with transformations of and computations on representations, are other core components of DVC pipelines. These components connect different stages of the pipeline while allowing gradients to flow freely between the different stages during learning tasks.

An illustration of the forward and backward processes associated with differentiable geometry processing
Figure 3: An illustration of the forward and backward processes associated with differentiable geometry processing. The forward process converts an object to a different representational format, transforms the shape of the object, and then runs a computation on the new representation. The backward process propagates gradients from the final processed geometry back to the parameters (including the initial geometry, conversion, transformation and computation parameters) allowing them to be updated as part of a learning process.

Further Information and Future Work

In a recent review article [1] Derek Nowrouzezahrai (core Mila academic member and Ubisoft–Mila research Chair) and colleagues propose a holistic and unified vision for a differentiable visual computing pipeline. Jordan J. Bannister (Mila 3D machine learning scientist), in collaboration with Dr. Nowrouzezahrai, has recently released an open course and codebase to explore and explain state-of-the-art approaches to differentiable triangle rasterization [2], as well a differentiable rendering approach to fractal art creation [3]. Their ongoing work in this space will seek to develop and implement differentiable visual computing systems, in order to apply machine learning to 3D, visual and interactive media.

 

References

[1] Differentiable visual computing for inverse problems and machine learning. Spielberg, A.; Zhong, F.; Rematas, K.; Jatavallabhula, K. M.; Öztireli, C.; Li, T.; and Nowrouzezahrai, D. Nat. Mac. Intell., 5(11): 1189–1199. 2023.

[2] TinyDiffRast: A tiny course on differentiable triangle rasterization. Bannister J.J. and  Nowrouzezahrai, D. https://jjbannister.github.io/tinydiffrast/. 2024

[3] Learnable Fractal Flames. Bannister, J.J. and Nowrouzezahrai, D., CoRR, abs/2406.09328. 2024.