AI in Motion: D-BOX Case Study

Background

In recent years, movies have gotten louder, longer, and more stimulating than ever before. With enormous IMAX screens and surround sound, it can almost feel like being in the film itself; all that’s missing is feeling the rumble of a car chase or the boom of an explosion. That’s exactly what a team of applied researchers at Mila helped deliver through a collaboration with D-BOX, an industry partner dedicated to creating hyper-realistic, immersive experiences in the movie theater by adding haptic effects to theater seats.

A Collaborative Industry Project

Mila’s applied researchers Mirko Bronzi, Bruce Wen and Gaétan Marceau Caron started an applied research project with experts at D-BOX in 2022, with the goal of facilitating the use of AI in D-BOX’s processes. In other words, the aim was to create an automatic system that analyzes movies, understands the content of each scene, and decides which signals to send to those immersive, moving seats. Up until then, D-BOX’s team of haptic experts watched movies and noted the various visual and audio events in the film. During an explosion, for example, a jolt and rumble of the seat had to be perfectly timed to feel realistic, a meticulous and sometimes arduous design task.

Mila researchers working on the D-BOX project.

It was clear that AI could use the visual and audio data from movies to help identify events and generate the associated haptic effects, enriching the viewing experience.

Through years of haptic design, D-BOX had amassed a large amount of movie data and had studied the timing and nature of every event (gunshots, explosions, car chases), a seemingly endless list of cues that could be used in machine learning. Using state-of-the-art deep learning models for video comprehension, the Mila researchers trained different models to identify events in movies. Those detections could later be used to propose the corresponding haptic effect, like a first draft of some of the tasks D-BOX’s haptic designers typically do.

“The most important thing to figure out is when the chair should vibrate, and that was actually one of the challenges of this project, because it needs to be very precise,” Mirko explained. “If you see an explosion and the chair vibrates even 100 milliseconds too early, you can feel the difference and it doesn't work anymore — you lose that immersive feeling.”

Mirko cites Mission: Impossible, which they watched again and again, in snippets, to get the models aligned correctly. When asked if they were sick of the film now, Bruce and Mirko both laughed. “Oh yes,” joked Mirko, without skipping a beat. “But it’s nothing compared to John Wick.”

A Unique, Multipronged Approach

What set their approach apart was the integration of different foundation models and the combination of insights from both audio and visual cues. Mirko emphasized the utility of their three-part system, which allows detailed event recognition through both local and contextualized analysis. A gunshot, for example, can be difficult to identify using only visual cues. Combining the visual cue with a wide audio element (i.e., “there is a sound within these 10 seconds and it is indeed a gunshot”) and a narrow, precise audio element (i.e., “the sound happens precisely at this moment”) made for a considerably more reliable model than using any of these elements alone.
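To make the three-part idea concrete, here is a minimal sketch of how such signals could be combined. It is an illustration only, not the team’s actual system: the `Window` fields, the weighted average, the `threshold` and `audio_weight` values, and the choice of the earliest onset are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Window:
    """One analysis window with three hypothetical model outputs."""
    start_s: float            # window start, in seconds
    end_s: float              # window end, in seconds
    visual_score: float       # assumed visual-model confidence that the event occurs
    wide_audio_score: float   # assumed audio confidence over the whole window
    onsets_s: List[float]     # assumed candidate onset times from a fine-grained audio detector


def fuse(window: Window, threshold: float = 0.5,
         audio_weight: float = 0.6) -> Optional[float]:
    """Return an event timestamp, or None if the window is rejected.

    The wide audio and visual scores decide WHETHER the event happened;
    the narrow audio onsets decide WHEN it happened.
    """
    combined = (audio_weight * window.wide_audio_score
                + (1.0 - audio_weight) * window.visual_score)
    if combined < threshold:
        return None
    # Keep only onsets that fall inside the window being classified.
    inside = [t for t in window.onsets_s if window.start_s <= t <= window.end_s]
    if not inside:
        return None
    return min(inside)  # illustrative choice: take the earliest onset


# A gunshot that is weak visually (0.3) but clear in the audio (0.9).
gunshot = Window(120.0, 130.0, visual_score=0.3, wide_audio_score=0.9,
                 onsets_s=[118.2, 124.35, 127.0])
print(fuse(gunshot))
```

In this toy example the visual score alone would fall below the threshold, but the combined score passes, and the narrow audio element supplies the precise moment.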

Once the model had been trained, the team used software provided by D-BOX to directly visualize how well the output was aligned with the actual events in the movie. 

“That made it easier to see, for example, that a result was wrong because it was off by 50 milliseconds,” Mirko explained. “And we could also see that the visuals didn’t look right — maybe there was the sound of someone shooting, but the visual was showing another character, so that visual data wasn’t actually helping. Those things are good to know, and it’s important to invest time into understanding the data.”

Takeaways and Lessons Learned

When asked which lessons and takeaways they would carry forward to future projects, Mirko and Bruce were in agreement: prioritization and iterative feedback. 

Some cues, for example, were difficult to automate because they appeared too infrequently to provide much data. Events like fireworks or the scrape of a blade proved a challenge, but the team sought to mitigate this issue by using models pre-trained on a larger amount of external data, rather than being confined to the data contained in the movies they used. They also attempted data augmentation, modifying their existing data to create examples that look new to the models during training.
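As a rough illustration of what data augmentation can look like for a short audio clip, the sketch below shifts the clip in time, rescales its volume, and adds a little noise. The function name, parameter values, and the specific transformations are assumptions chosen for this example; the article does not say which augmentations the team used.

```python
import random
from typing import List, Tuple


def augment(samples: List[float], rng: random.Random,
            max_shift: int = 100,
            noise_std: float = 0.005) -> Tuple[List[float], int]:
    """Return a perturbed copy of a clip and the time shift applied (in samples).

    The shift is returned so that any event timestamp attached to the clip
    can be moved by the same amount, keeping the label aligned with the audio.
    """
    n = len(samples)
    shift = rng.randint(-max_shift, max_shift)
    shift = max(-n, min(n, shift))  # never shift by more than the clip length
    if shift >= 0:
        # Delay the clip: pad silence at the start, drop the tail.
        shifted = [0.0] * shift + samples[:n - shift]
    else:
        # Advance the clip: drop the head, pad silence at the end.
        shifted = samples[-shift:] + [0.0] * (-shift)
    gain = rng.uniform(0.8, 1.2)  # hypothetical volume range
    noisy = [gain * s + rng.gauss(0.0, noise_std) for s in shifted]
    return noisy, shift


# A toy clip with a single click at sample 50.
clip = [0.0] * 50 + [1.0] + [0.0] * 49
augmented, shift = augment(clip, random.Random(0), max_shift=10)
print(len(augmented), shift)
```

Because timing is so important for haptic effects, the sketch returns the shift alongside the audio: an augmented clip is only useful if its event label moves with it.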

But beyond that, the team explained, it became a question of deciding with D-BOX’s specialists which types of events to prioritize in order to provide the greatest benefit — a synergistic collaboration which made the team feel their work was more impactful and tailored to the client.

The team also highlighted the importance of integrating feedback regularly, which they received often thanks to D-BOX’s enthusiastic involvement in the project. 

“We developed automatic algorithms for evaluating [our models], but we knew from the beginning that probably wouldn’t be enough, because it won’t be as accurate as a human evaluation,” Bruce explained. “So we had several checkpoints with D-BOX and had someone on their side manually look at the performance of our model. We used a human evaluation to calibrate the automatic one.”
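One common way to score event timing automatically is to match predicted timestamps against reference timestamps within a tolerance and report precision and recall. The sketch below is a hypothetical example of that idea, not the team’s evaluation code; `match_events` and its `tolerance_s` parameter are names invented here, and the tolerance is exactly the kind of setting that human feedback could help calibrate.

```python
from typing import List, Tuple


def match_events(predicted_s: List[float], reference_s: List[float],
                 tolerance_s: float = 0.05) -> Tuple[float, float]:
    """Greedy one-to-one matching of timestamps; returns (precision, recall).

    A prediction counts as a hit if it lies within tolerance_s seconds of a
    reference event that has not already been matched.
    """
    pred = sorted(predicted_s)
    ref = sorted(reference_s)
    i = j = hits = 0
    while i < len(pred) and j < len(ref):
        delta = pred[i] - ref[j]
        if abs(delta) <= tolerance_s:
            hits += 1
            i += 1
            j += 1
        elif delta < 0:
            i += 1  # prediction is too early for every remaining reference
        else:
            j += 1  # reference event was missed by the model
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Made-up timestamps in seconds, for illustration only.
predicted = [1.00, 2.20, 5.03]
reference = [1.02, 2.00, 5.00, 7.50]
print(match_events(predicted, reference, tolerance_s=0.05))
```

Here the prediction at 2.20 s is 200 milliseconds late and counts as a miss, while those at 1.00 s and 5.03 s fall within the 50 millisecond tolerance.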

A Soon-to-Be-Implemented Success 

In February 2024, the fully AI-generated haptic experience using D-BOX seats was demonstrated in Mila’s auditorium. Curious employees and researchers could test out the seat by watching scenes from Kingsman, Rambo, or RRR, all while being shaken and jolted with each action-packed event. Mirko and Bruce themselves were impressed to see how well the haptic seats worked on new movies: no more Mission: Impossible or John Wick, but instead films that hadn’t been used to train the models. For casual onlookers, it was amusing to watch a viewer being tossed around while focused intently on the movie. Yet once in the chair, watching the scene oneself, it made sense. The timing and direction of the seat’s movements were incorporated seamlessly into the viewing experience, and made the already exciting scenes feel even more intense.


Overall, the team at Mila sees the project as a resounding success. Far from making D-BOX’s experts obsolete, it helps them make the most of their time. Rather than having to go through and annotate every single gunshot or every single explosion, the haptic designers can focus on the parts of the work where their expertise adds the most value. It allows the work to be scaled up thanks to an efficient and reliable process.

And the team could feel the impact of their work through D-BOX’s enthusiastic feedback. Mirko jokes that during a visit to D-BOX’s offices, they showed him the company’s organizational chart, onto which Mirko and Bruce’s photos had been added, to highlight how important their help had been. According to the team, D-BOX was very pleased with the outcome of the project and plans to implement the newly created model in its upcoming projects.

“This was a great project, especially because of the amount of data, the data quality and the involvement of D-BOX,” Bruce explained. “This project has been a really good experience, and we are definitely happy with the results.”

Meet the Team

Mila Members
Mirko Bronzi
Senior Applied Research Scientist, Applied Machine Learning Research
Bruce (Zhi) Wen
Senior Applied Research Scientist, Applied Machine Learning Research
Gaétan Marceau Caron
Senior Director, Applied Machine Learning Research
