I’ve always been fascinated by the notion of enriching our lives with intelligent machines augmented with all of the world’s knowledge. As an extension of this fascination, my research has largely focused on neural interfaces for accessing the text-based knowledge we’ve amassed. I spend most of my time exploring how language models can be wired to a vast pool of information, not only to retrieve existing information but also to discover new knowledge through reasoning.
The next step is video, which I believe is the most powerful representation of the world we live in and of the knowledge within it. As the world quickly runs out of usable text data, the next generation of foundation models will most likely be multimodal, leveraging the untapped information that resides in videos. The pioneers that build these models will have to overcome immense engineering and research challenges, as videos are painfully hard to handle. This is an incredibly daring vision, one that will impact the future of AI and transform countless industries.
That’s why I am thrilled to join the Twelve Labs team as Chief Scientist. I often encounter bright individuals, and sometimes teams full of rockstars; however, it is truly rare to come across a team that is deeply technical, humble, hyper-aligned, and, most importantly, market-aware. We are a small and young team, and yet Twelve Labs’ closed-beta product is already strongly preferred by customers over other video intelligence APIs. With the recent multi-year compute partnership with OCI and the massive proprietary dataset the team has amassed, I expect us to make major advances in the video understanding category.
If history is any indication, the next wave of video-based applications will have to be intelligent from inception, and Twelve Labs will be strategically positioned to be their reliable infrastructure. I look forward to working with our team to show what amazing things can be done with video foundation models when science, engineering, and product are truly aligned.
Minjoon is an Assistant Professor and the Director of the Language & Knowledge Lab at KAIST AI. He is the inventor of the Bi-Directional Attention Flow (BiDAF) network and the recipient of the 2019 Facebook Fellowship and the 2020 AI2 Lasting Impact Paper Award. Minjoon received his PhD in Computer Science from the University of Washington.