Logo detection in videos refers to the automated process of identifying and recognizing logos or trademarks embedded within video content. This involves dissecting video frames or segments to detect and locate specific logo patterns or visual elements associated with a brand. This technology empowers us to quickly navigate through video content, accurately identifying the specific instances when certain logo patterns manifest on screen. Logo detection in video data can identify a wide range of elements, and thus, it has a broad array of applications across various industry verticals:
In this tutorial, we're going to delve into the world of logo detection from two distinct perspectives and levels. The first one is at the video level, where we tackle the entire video content as a single entity, seeking to unearth every bit of logo information it contains. The second approach, the index level, narrows our lens to concentrate on a specific logo or a group of logos. We'll employ natural language queries to conduct an exhaustive search across a rich library of videos indexed on the Twelve Labs platform.
Here's the best part – with Twelve Labs API at your fingertips, you can achieve all of this without getting bogged down in the intricacies of model training, deployment, inference, or load scaling that make up the detection process. We've got you covered from development to infrastructure, and even provide continuous support. So, buckle up and join us on this fascinating journey into the domain of logo detection.
There may be instances where a logo is simply the name of a brand or company, leading to questions about whether it could be treated as text on screen, and if text-in-video indexing and search options might suffice for logo detection. It's true that such text will be detected as it appears on screen during video playback. However, in cases where text might have different meanings in different contexts, specifically configuring the indexing and search options to detect logos is key.
For instance, 'Amazon' could refer to the multinational technology company or the South American river. If you employ logo detection, the system will differentiate between searches for Amazon's brand logo and the text 'Amazon'. Hence, even though text-in-video and logo detection may seemingly overlap in cases where the brand logo is text-based, choosing to use the logo detection feature should be intentional to ensure accurate results.
The Twelve Labs platform is presently in open beta, and we are offering free video indexing credits for up to 10 hours upon sign-up. I'd recommend familiarizing yourself with the core aspects of the Twelve Labs platform prior to embarking on this tutorial. A solid understanding of concepts such as video indexing, indexing options, the Task API, and search options is essential to seamlessly navigate through this tutorial. I've covered these topics in depth in my first tutorial. Nonetheless, if you encounter any obstacles or find yourself stuck at any point, please feel free to reach out. I'm here to assist you on this exciting journey into the realm of logo detection. Additionally, our response times on our Discord server are lightning fast 🚅🏎️⚡️ if Discord is your preferred platform.
In continuation with our previous discussion, we'll delve into logo detection from two distinct perspectives and levels. I've segmented this tutorial into two crucial sections, culminating in a final demonstration where we amalgamate all elements into a fully functional web-app:
The extraction of recognized logos from a specific video involves the following three steps:
Applying logo detection on the entire video enabled us to scrutinize and distill it for all instances of logo pattern(s). Now, the logo search feature empowers us to zero in on precise moments or video snippets where the input or searched logo or brand name materializes. This greatly diminishes the time spent perusing a sizable catalogue of videos, yielding accurate search results predicated on alignment of search terms with the logo that becomes visible on screen during video playbacks.
In our earlier tutorials, we delved into content search within indexed videos, utilizing natural language queries and a variety of search options like visual (audio-visual search), conversation (dialogue search), and text-in-video (OCR). In this tutorial, we'll pivot our approach, harnessing exclusively the logo detection pipeline to search for logo(s) within indexed videos. To maximize processing efficiency and minimize costs, we'll establish an index using solely the 'logo' indexing option. Subsequently, we'll trigger our search query with the 'logo' search option, thereby enabling us to uncover relevant logo matches within the indexed videos.
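To make the "logo only" idea concrete, here is a minimal sketch of the two request bodies involved: one for creating the index and one for searching it. The field names (`index_name`, `engine_id`, `index_options`, `index_id`, `query`, `search_options`), the helper functions, and the placeholder values are assumptions for illustration, so please check them against the current API reference before use. Nothing is sent over the network here.

```python
import json

# The only option we enable, both when indexing and when searching.
LOGO_ONLY = ["logo"]


def build_index_payload(index_name, engine_id):
    """Body for an index-creation request (field names are assumed)."""
    return {
        "index_name": index_name,
        "engine_id": engine_id,
        "index_options": list(LOGO_ONLY),
    }


def build_search_payload(index_id, query):
    """Body for a search request restricted to logos (field names are assumed)."""
    return {
        "index_id": index_id,
        "query": query,
        "search_options": list(LOGO_ONLY),
    }


# Placeholder values: substitute your own index name, engine ID, and index ID.
index_payload = build_index_payload("my-logo-index", "<engine-id>")
search_payload = build_search_payload("<index-id>", "<brand name>")

print(json.dumps(index_payload, indent=2))
print(json.dumps(search_payload, indent=2))
```

Keeping the option list in a single constant makes it harder to index with one set of options and accidentally search with another.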
To tie everything together, we will utilize the data generated by the API endpoints and present them on a webpage, spinning up a demo app based on Flask that hosts a minimalistic HTML page. The outcome of the logo detection process will be systematically tabulated, exhibiting timestamps and their corresponding logo names. The logo search section, on the other hand, will display the query we used and the relevant video segments we discovered in response.
For the sake of simplicity, I've uploaded just five videos to an index using a pre-existing account. Feel free to sign up; given we're currently in open beta, you'll receive complimentary credits allowing you to index up to 10 hours of video content. If your needs extend beyond that, check out our pricing page for upgrading to the Developer plan.
Here, we’re going to delve into the essential elements that we'll need to include in our Jupyter notebook. This includes the necessary imports, defining API URLs, creating the index, and uploading videos from our local file system to kick off the indexing process:
Next, we upload five Formula One race videos to the index we've just created. I downloaded these videos from several Formula One-related YouTube channels and saved them in a folder named 'static' on my local hard drive. We'll use these local files to index the videos onto the Twelve Labs platform:
Now let's list all the videos in our index. This lets us grab the video ID of a specific video so that we can extract all the logos detected within it. As in prior tutorials, I'm also assembling a list of video IDs and their respective titles to feed into our Flask application later.
Time to put our plan into action! We'll now proceed to extract all logo content from the chosen video:
Clearly, the API skillfully extracted all the on-screen logos line by line. This information can be stored as metadata for subsequent workflows, including content filtering, classification, and search purposes. Please note that the output displayed here is abbreviated for conciseness - the actual output was considerably more extensive.
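If you want to keep the detections as metadata, one option is to group them by logo name. The sketch below assumes each detection has already been reduced to a `(start_seconds, end_seconds, logo_name)` tuple; that shape, the helper name, and the sample values are hypothetical and do not reflect the API's actual response format.

```python
from collections import defaultdict

# Hypothetical, hand-written sample data: (start_seconds, end_seconds, logo_name).
detections = [
    (0.0, 4.5, "Brand A"),
    (4.5, 9.0, "Brand B"),
    (12.0, 15.5, "Brand A"),
]


def group_by_logo(records):
    """Map each logo name to the list of (start, end) spans where it appears."""
    grouped = defaultdict(list)
    for start, end, name in records:
        grouped[name].append((start, end))
    return dict(grouped)


logo_metadata = group_by_logo(detections)
for name, spans in logo_metadata.items():
    print(name, spans)
```

A structure like this makes it easy to answer "when does this logo appear?" without rescanning the raw output.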
Launching our search query utilizing the logo search option to uncover pertinent logo pattern matches within our collection of indexed videos:
Once more, the API proficiently scoured and retrieved all on-screen logos corresponding to the input logo name, but this time across the entire index of videos we've uploaded.
Preparing the data for the Flask application to ensure our results will be presented neatly:
Further data preparation for the logo detection results, followed by our standard procedure of pickling everything:
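For readers who haven't pickled data before, here is a minimal sketch of the idea using Python's standard `pickle` module. The variable names, the file name, and the sample rows are placeholders rather than the ones used in the notebook, and the file is written to a temporary directory so that the example has no side effects on your project folder.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical sample data standing in for the notebook's real results.
video_titles = {"<video-id>": "race_1.mp4"}
logo_rows = [{"start": 0.0, "end": 4.5, "logo": "Brand A"}]

# Write to a temporary directory; in a real project you would choose a path
# that the Flask app can also read.
pickle_path = Path(tempfile.mkdtemp()) / "logo_data.pkl"

with open(pickle_path, "wb") as f:
    pickle.dump({"videos": video_titles, "logos": logo_rows}, f)

# Later (for example, inside the Flask app), load the data back.
with open(pickle_path, "rb") as f:
    restored = pickle.load(f)

print(restored["logos"])
```

Bundling everything into one dictionary means the web app only has to load a single file. Only unpickle files you created yourself, since loading untrusted pickle data can execute arbitrary code.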
Now we've reached the last stretch of our current journey – integrating everything to make our outputs come to life. Besides the standard configuration we implement for fetching videos from the local folder and loading the pickled data dispatched from the Jupyter notebook, this time we have some additional requirements - a conversion of timestamps from a seconds-only format to a minutes-and-seconds format. This makes the data visualization on the webpage more intuitive. Here's the code for the app.py file:
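The timestamp conversion on its own can be sketched as a small helper. This is not the app.py code itself, just an illustration of the seconds to minutes-and-seconds step; the function name is hypothetical, and fractional seconds are truncated rather than rounded.

```python
def seconds_to_mmss(seconds):
    """Convert a duration in seconds to a zero-padded 'MM:SS' string."""
    total = int(seconds)  # drop any fractional part
    minutes, secs = divmod(total, 60)
    return f"{minutes:02d}:{secs:02d}"


print(seconds_to_mmss(59))
print(seconds_to_mmss(125.7))
```

In a Flask app, a helper like this could be applied while preparing the rows for the template, or registered as a template filter so the conversion happens at render time.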
Awesome! Let's just run the last cell of our Jupyter notebook to launch our Flask app:
You should see an output similar to the one below, confirming that everything went as anticipated 😊:
After clicking on the URL link http://127.0.0.1:5000, you should be greeted with the following web page:
Here's the Jupyter Notebook containing the complete code that we've put together throughout this tutorial - https://drive.google.com/drive/folders/1D97_UU2Z0lvp3y52BHV5GKkSNOQKv3Xi?usp=share_link
Expect more exciting content coming your way! If you haven't already, I cordially invite you to join our vibrant Discord community, populated with individuals who share a passion for multimodal AI.
See you next time,
Crafting stellar Developer Experiences @Twelve Labs