How Twelve Labs Teaches A.I. to ‘See’ and Transform Video Understanding: Interview

[Image: Soyoung Lee, co-founder and head of GTM at Twelve Labs, pictured at Web Summit Vancouver 2025. Photo by Vaughn Ridley/Web Summit via Sportsfile via Getty Images]

Sure, the score of a football game is important. But sporting events can also foster cultural moments that slip under the radar—such as Travis Kelce flashing a heart sign to Taylor Swift in the stands. While such footage could be social-media gold, it’s easily missed by traditional content tagging systems. That’s where Twelve Labs comes in.

“Every sports team or sports league has decades of footage that they’ve captured in-game, around the stadium, about players,” Soyoung Lee, co-founder and head of GTM at Twelve Labs, told Observer. However, these archives are often underutilized due to inconsistent and outdated content management. “To date, most of the processes for tagging content have been manual.”

Twelve Labs, a San Francisco-based startup specializing in video-understanding A.I., wants to unlock the value of video content by offering models that can search vast archives, generate text summaries and create short-form clips from long-form footage. Its work extends far beyond sports, touching industries from entertainment and advertising to security.

“Large language models can read and write really well,” said Lee. “But we want to move on to create a world in which A.I. can also see.”

Is Twelve Labs related to ElevenLabs?

Founded in 2021, Twelve Labs isn’t to be confused with ElevenLabs, an A.I. startup that specializes in audio. “We started a year earlier,” Lee joked, adding that Twelve Labs—which named itself after the initial size of its founding team—often partners with ElevenLabs for hackathons, including one dubbed “23Labs.”

The startup’s ambitious vision has drawn interest from deep-pocketed backers. It has raised more than $100 million from investors such as Nvidia, Intel and Firstman Studio, the studio of Squid Game creator Hwang Dong-hyuk. Its advisory bench is equally star-studded, featuring Fei-Fei Li, Jeffrey Katzenberg and Alexandr Wang.

Twelve Labs counts thousands of developers and hundreds of enterprise customers among its users. Demand is highest in entertainment and media, spanning Hollywood studios, sports leagues, social media influencers and advertising firms that rely on the startup’s tools to automate clip generation, assist with scene selection or enable contextual ad placements.

Government agencies also use the startup’s technology for video search and event retrieval. Beyond its work with the U.S. and other nations, Lee said that Twelve Labs has a deployment in South Korea’s Sejong City to help CCTV operators monitor thousands of camera feeds and locate specific incidents. To reduce security risks, the company has removed capabilities for facial and biometric recognition, she added.

Will video-native A.I. come for human jobs?

Many of the industries Twelve Labs serves are already debating whether A.I. threatens human jobs—a concern Lee argues is only partly warranted. “I don’t know if jobs will be lost, per se, but jobs will have to transition,” she said, comparing the shift to how tools like Photoshop reshaped creative roles.

If anything, Lee believes systems like Twelve Labs’ will democratize creative work traditionally limited to companies with big budgets. “You are now able to do things with less, which means you have more stories that can be created from independent creatives who do not have that same capital,” she said. “It actually allows for the scaling of content creation and personalizing distribution.”

Twelve Labs is not the only A.I. player eyeing video, but the company insists it serves a different need than its much larger competitors. “We’re excited that video is now starting to get more attention, but the way we’re seeing it is a lot of innovation in large language models, a lot of innovation in video generation models and image generation models like Sora—but not in video understanding,” said Lee, referencing OpenAI’s text-to-video A.I. model and app.

For now, Twelve Labs offers video search, video analysis and video-to-text capabilities. The company plans to expand into agentic platforms that can not only understand video but also build narratives from it. Such models could be useful beyond creative fields, Lee said, pointing to examples like retailers identifying peak foot-traffic hours or security clients mapping the sequence of events surrounding an accident.

While A.I. might help a Hollywood director assemble a movie, Lee believes it won’t ever be the director. Even if the technology can provide narrative options, humans still decide which story is most compelling, identify gaps and supply the footage. “At the end of the day, I think there’s nothing that can replace human creative intent.”
