Video classification refers to the process of automatically assigning one or more predefined categories or labels to a video based on its content. This task involves analyzing the video's visual and sometimes audio information to recognize and understand the events, actions, objects, or other characteristics present in the video. Video classification is an important research area in computer vision and has numerous practical applications, such as video indexing, content-based video retrieval, video recommendation, video surveillance, and human activity recognition.
In the past, video classification was limited to predefined categories or labels, focusing on identifying events, actions, objects, and other features. Customizing classification criteria, or updating them without retraining a model, seemed like a distant dream. But here's where the Twelve Labs classification API enters the scene and saves the day, effortlessly and powerfully letting us classify videos based on our own custom criteria, all in near real-time and without the fuss of training any models. Talk about a game-changer, yeah!
The Twelve Labs classification API is designed to label indexed videos based on the duration a class label occupies within each video: if that duration is less than 50% of the video's length, the class label won't apply. Therefore, it's important to design classes and their prompts carefully, especially when uploading long videos. The API can accommodate any number of classes, and you can add as many prompts within a class as you'd like.
For example, let's say you have a collection of hilarious videos featuring your dog, Bruno, and your cat, Karla, engaged in various antics. You can upload these videos to Twelve Labs' platform and create custom classification criteria tailored to the amusing escapades of your furry friends:
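Here's what such criteria might look like as a Python structure — the class names and prompts below are purely illustrative (Bruno and Karla deserve better writers), and the exact field names follow the `name`/`prompts` shape used later in this tutorial, so double-check them against the API reference for your version:

```python
# Hypothetical custom classes for Bruno and Karla's videos.
# "name" is the class label; "prompts" describe moments that count toward it.
pet_classes = [
    {
        "name": "BrunoAntics",
        "prompts": [
            "dog chasing its tail",
            "dog stealing food from the table",
            "dog playing fetch",
        ],
    },
    {
        "name": "KarlaAntics",
        "prompts": [
            "cat knocking objects off a shelf",
            "cat chasing a laser pointer",
            "cat squeezing into a small box",
        ],
    },
]
```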
With just one API call, you can classify your uploaded videos using the criteria you've created. If you happen to forget a few prompts or wish to introduce new classes, you can easily do so by adding more classes and prompts to your JSON. There's no need to train a new model or retrain an existing one, making the whole process hassle-free.
Prerequisites: To smoothly navigate this tutorial, sign up for the Twelve Labs API suite and install the required packages. It's recommended to read the first and second tutorials to familiarize yourself with the basics 🤓.
Video Upload: Send your videos to the Twelve Labs platform, which effortlessly indexes them, enabling you to add custom classification criteria and manage your content on-the-fly! And guess what? You don't even need to train an ML model 😆😁😊.
Video Classification: Get ready for the real fun! We'll create our own custom classes and a range of prompts within each class. Once we've defined our criteria, we can use them right away to fetch the results. No delays, straight to the goodies! 🍿✌️🥳
Crafting a Demo App: We will create a Flask-based app to harness the results from the classification API, access videos stored in a local folder on our computer, and then render a custom-designed, sleek HTML page to stylishly showcase the classification results 🔍💻🎨.
In the first tutorial, I covered the basics of using simple natural language queries to find specific moments within your videos. To keep things simple, I uploaded only one video to the platform and covered essential concepts such as creating an index, configuring the index, defining task API, basic monitoring of video indexing tasks, and step-by-step explanations of creating a Flask-based demo app.
The second tutorial went a step further, exploring the combination of multiple search queries to create more precise and targeted searches. I uploaded multiple videos asynchronously, created multiple indexes, implemented additional code for monitoring video indexing tasks and fetching details like estimated time for task completion. I also configured the Flask app to accommodate multiple videos and display them using an HTML template.
Continuing on this streak, the current tutorial will cover concurrent video uploads using Python's built-in concurrent.futures library. We will monitor the indexing statuses of the videos and record them to a CSV file. Additionally, we will surface the input classification criteria and relevant classification API response fields in the HTML template, making it easier to interpret the results.
If you encounter any difficulties while reading this or any of the previous tutorials, don't hesitate to reach out for help! We pride ourselves on providing quick support through our Discord server with response times faster than a speeding train 🚅🏎️⚡️. Alternatively, you can also reach me via email. Twelve Labs is currently in Open Beta, so you can create a Twelve Labs account and access the API Dashboard to obtain your API key. With your free credits, you'll be able to classify up to 10 hours of your video content.
Creating an index and configuring it for video upload:
This time I've cooked up code that automatically scoops up all videos from a designated folder, assigns each the same name as its video file, and uploads them to the platform, all while strutting its stuff concurrently using a built-in Python library. Just pop all the videos you want to index into a single folder, and you're good to go! The whole indexing process will take about 40% of the longest video's duration. Need to add more videos to the same index later? Easy peasy! No need for a new folder, just toss them into the existing one. The code's got your back: it checks for any indexed videos with the same name or pending indexing tasks before starting the process. This way, you'll dodge any pesky duplicates – pretty convenient, huh? 😄
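A minimal sketch of that upload logic looks like this. Assumptions worth flagging: the base URL, the `/tasks` endpoint, and the `index_id`/`language`/`video_file` field names are taken from my reading of the Twelve Labs docs and may differ in your API version, and the duplicate-name check described above is omitted here for brevity:

```python
import os
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

API_URL = "https://api.twelvelabs.io/v1.2"  # assumed base URL; check your dashboard
API_KEY = os.environ.get("TL_API_KEY", "")
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}

def find_video_files(folder):
    """Collect paths of all video files in `folder`, sorted for stable ordering."""
    return sorted(
        os.path.join(folder, f)
        for f in os.listdir(folder)
        if os.path.splitext(f)[1].lower() in VIDEO_EXTENSIONS
    )

def upload_video(index_id, path):
    """Create one indexing task; the file name becomes the video's name."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{API_URL}/tasks",
            headers={"x-api-key": API_KEY},
            data={"index_id": index_id, "language": "en"},
            files={"video_file": (os.path.basename(path), f)},
        )
    resp.raise_for_status()
    return resp.json()["_id"]

def upload_folder(index_id, folder, max_workers=4):
    """Upload every video in `folder` concurrently; return the task IDs."""
    paths = find_video_files(folder)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_video, index_id, p): p for p in paths}
        return [f.result() for f in as_completed(futures)]
```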
Monitoring the indexing process
Similar to the upload function, I've designed a monitoring function that keeps track of all tasks happening concurrently. It diligently records the estimated time remaining and the upload percentage for each video being indexed simultaneously in a tidy CSV file. This attentive function continues to execute until every video in your folder has been indexed. To cap it off, it displays the total time taken for the concurrent indexing process, conveniently measured in seconds. Pretty efficient, right?
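Here's one way such a monitor could be sketched. The `status`/`process.remain_seconds` response fields and the `"ready"` terminal status are assumptions based on my reading of the task API; verify them against the actual task response for your API version:

```python
import csv
import time
import requests

API_URL = "https://api.twelvelabs.io/v1.2"  # assumed base URL
HEADERS = {"x-api-key": "<YOUR_API_KEY>"}

def write_status_rows(csv_path, rows):
    """Append one (task_id, status, remaining_seconds) row per task."""
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerows(rows)

def monitor_tasks(task_ids, csv_path="indexing_status.csv", poll_seconds=10):
    """Poll every task until all report 'ready', logging progress to a CSV."""
    start = time.time()
    pending = set(task_ids)
    while pending:
        rows = []
        for task_id in list(pending):
            resp = requests.get(f"{API_URL}/tasks/{task_id}", headers=HEADERS)
            resp.raise_for_status()
            data = resp.json()
            status = data.get("status")
            remaining = data.get("process", {}).get("remain_seconds")
            rows.append((task_id, status, remaining))
            if status == "ready":
                pending.discard(task_id)
        write_status_rows(csv_path, rows)
        if pending:
            time.sleep(poll_seconds)
    print(f"Total indexing time: {time.time() - start:.1f} seconds")
```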
List all videos in the index:
To make sure we've got all the necessary videos indexed, let's do a thorough double-check by listing all the videos within the index. On top of that, we'll create a handy list containing all video IDs and their corresponding names. This list will come in useful later when we need to fetch the appropriate video names for video clips (those segments that match the classification criteria) returned by the classification API.
Just a heads-up, I've tweaked the page limit to 20, since we're dealing with 11 indexed videos. By default, the API returns 10 results per page, so if we don't update the limit, one sneaky result will slip onto page 2 and won't be included in the response_json we're using to create our video_id_name_list. So, let's keep it all on one page!
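The listing step could be sketched as below. The `GET /indexes/{id}/videos` endpoint, the `page_limit` parameter, and the `_id`/`metadata.filename` response fields are assumptions drawn from my reading of the docs — check them against your API version:

```python
import requests

API_URL = "https://api.twelvelabs.io/v1.2"  # assumed base URL
HEADERS = {"x-api-key": "<YOUR_API_KEY>"}

def build_video_id_name_list(response_json):
    """Pair each video's ID with its file name from a list-videos response."""
    return [
        (video["_id"], video["metadata"]["filename"])
        for video in response_json.get("data", [])
    ]

def list_index_videos(index_id, page_limit=20):
    """List indexed videos; page_limit=20 keeps all 11 videos on one page."""
    resp = requests.get(
        f"{API_URL}/indexes/{index_id}/videos",
        headers=HEADERS,
        params={"page_limit": page_limit},
    )
    resp.raise_for_status()
    return build_video_id_name_list(resp.json())
```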
Before diving into the code, let's breeze through the theory behind it. Feel free to skim over and jump to the code if that's more your cup of tea. When it comes to classification, you can control how it works with a few request parameters, most notably the confidence threshold and the include_clips flag, both of which come up below.
Let's set our classification criteria and use the Twelve Labs classify API to make a classification request; we'll stick with the default threshold setting for this demo:
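A minimal sketch of that request, assuming a bulk classify endpoint at `/classify/bulk` and the payload field names shown (`index_id`, `options`, `classes`, `include_clips`) — treat these as my best reading of the docs rather than gospel:

```python
import requests

API_URL = "https://api.twelvelabs.io/v1.2"  # assumed base URL
HEADERS = {"x-api-key": "<YOUR_API_KEY>"}

def build_classify_payload(index_id, classes, include_clips=False):
    """Assemble the request body for a bulk classification call."""
    return {
        "index_id": index_id,
        "options": ["visual", "conversation"],  # match on visuals and dialogue
        "classes": classes,
        "include_clips": include_clips,
    }

def classify_index(index_id, classes):
    """Classify every video in the index against our custom classes."""
    payload = build_classify_payload(index_id, classes)
    resp = requests.post(f"{API_URL}/classify/bulk", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()
```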
Class labels are assigned to the overall video according to the prompts present within the class. To pinpoint the appropriate video segments (clips relating to the prompts) and achieve precise video labeling, it's vital to supply numerous relevant prompts. Keep in mind, a class label is assigned only if the video duration matching the class label surpasses 50% of the video's total length, and this duration is established by combining video clips that align with the prompts.
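To make the 50% rule concrete, here's a toy back-of-the-envelope model of it (the platform's actual scoring is more sophisticated; this only illustrates how merged clip durations could be compared against the video length):

```python
def merge_intervals(clips):
    """Merge overlapping (start, end) clip intervals into disjoint spans."""
    merged = []
    for start, end in sorted(clips):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def duration_ratio(clips, video_length):
    """Fraction of the video covered by clips matching a class's prompts."""
    covered = sum(end - start for start, end in merge_intervals(clips))
    return covered / video_length

# A 60-second video whose matching clips cover 36 seconds once overlaps are
# merged -> ratio 0.6, so the class label applies (0.6 > 0.5).
ratio = duration_ratio([(0, 20), (15, 26), (50, 60)], 60)
```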
Here's the outcome of the classification API call we executed. The "duration ratio" represents the proportion of video segments to the entire video length, "score" indicates the model's confidence, "name" refers to the class label, and all matched videos are showcased in descending order based on their confidence scores:
Now let's rewrite the same code and call the classification API, but with a small twist: we'll set include_clips to True. By doing this, we'll fetch all the relevant video clips along with their metadata that match the prompts provided within our classes:
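The twist is a one-field change to the payload. As before, the `/classify/bulk` endpoint and field names are my assumptions about the API, not verbatim from it:

```python
import requests

API_URL = "https://api.twelvelabs.io/v1.2"  # assumed base URL
HEADERS = {"x-api-key": "<YOUR_API_KEY>"}

def enable_clips(payload):
    """Copy a classify payload, flipping on clip-level results."""
    return {**payload, "include_clips": True}

def classify_with_clips(payload):
    """Re-run the classification call, now returning matching clips too."""
    resp = requests.post(
        f"{API_URL}/classify/bulk", headers=HEADERS, json=enable_clips(payload)
    )
    resp.raise_for_status()
    return resp.json()
```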
To maintain succinctness, I've trimmed the output. Note how the output now displays the clip data for each video, including start and end timestamps, as well as the confidence score for the specific clip and its related prompt. We're diligently revamping the API endpoint to integrate the corresponding classification option tied to each prompt (e.g., visual and conversation, where visual represents an audio-visual match and conversation refers to a dialogue match).
Now it's time to store both the JSON results and serialize (pickle) them, along with the video_id_name_list we created earlier, into a file:
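One straightforward way to do the pickling — bundling the response and the ID-to-name list into a single file so the Flask app can load both in one go:

```python
import pickle

def save_results(path, classify_json, video_id_name_list):
    """Serialize the classification response and the id->name list together."""
    with open(path, "wb") as f:
        pickle.dump(
            {"results": classify_json, "video_id_name_list": video_id_name_list}, f
        )

def load_results(path):
    """Load the bundle back for use in the Flask app."""
    with open(path, "rb") as f:
        return pickle.load(f)
```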
As with our previous tutorials, we'll be crafting a Flask-based demo app that hosts a web page and makes use of the serialized data. By applying this data to the videos retrieved from our local drive, we'll create a visually appealing classification results web page. This way, we can experience firsthand how our video classification API can supercharge our applications and deliver impressive results.
The directory structure will look like this:
Flask app code
In this tutorial, a slight variation is introduced on how to serve video files from a local directory and play specific segments using the HTML5 video player. The serve_video function employed in the Flask application serves video files from the classify_try directory, which is in the same directory as your Flask application script. The url_for('serve_video', filename=video_mapping[video.video_id]) expression in the HTML template generates the URL for the served video file.
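The serving route boils down to a few lines; this is a stripped-down sketch of the idea rather than the full app.py (which follows below):

```python
from flask import Flask, send_from_directory

app = Flask(__name__)

@app.route("/videos/<path:filename>")
def serve_video(filename):
    # Serve video files from the classify_try folder next to this script;
    # url_for('serve_video', filename=...) in the template resolves to this route.
    return send_from_directory("classify_try", filename)
```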
As you may have noticed from the output of the classification API when we set "include_clips" to True, the API returned numerous clips along with their metadata. For simplicity's sake and to demonstrate the results that include these clips, I included a get_top_clips function. This function finds three unique prompts and returns all the clip metadata associated with them, giving a more comprehensive view of the classification results.
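In spirit, get_top_clips could look like the sketch below. The `"prompt"` key on each clip dict is an assumption about the response shape, and "first three unique prompts" is one reasonable reading of "three unique prompts":

```python
from collections import defaultdict

def get_top_clips(clips, n_prompts=3):
    """Group clips by prompt, then return all clips for the first
    `n_prompts` unique prompts encountered."""
    by_prompt = defaultdict(list)
    for clip in clips:
        by_prompt[clip["prompt"]].append(clip)
    top = list(by_prompt)[:n_prompts]  # dicts preserve insertion order
    return {prompt: by_prompt[prompt] for prompt in top}
```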
Below is the code for the "app.py" file:
Here's a sample Jinja2-based HTML template that integrates code within the HTML file to iterate using fields from the serialized data we prepared and passed earlier. This template fetches the required videos from the local drive and displays the results in response to our classification criteria:
Running the Flask app
Awesome! Let's just run the last cell of our Jupyter notebook to launch our Flask app:
You should see an output similar to the one below, confirming that everything went as anticipated 😊:
After clicking on the URL link http://127.0.0.1:5000, you should be greeted with the following output:
Here's the Jupyter Notebook containing the complete code that we've put together throughout this tutorial - https://tinyurl.com/classifyNotebook
Stay tuned for the forthcoming excitement! If you haven't joined already, I invite you to join our vibrant Discord community, where you can connect with other like-minded individuals who are passionate about multimodal AI.
Catch you later,
Crafting stellar Developer Experiences @Twelve Labs