
OpenSceneSense

OpenSceneSense is a cutting-edge Python package that revolutionizes video analysis by seamlessly integrating OpenAI and OpenRouter vision models. Unlock the full potential of your videos with advanced frame analysis, audio transcription, dynamic frame selection, and comprehensive summaries, all powered by state-of-the-art AI.

Table of Contents

  1. 🚀 Why OpenSceneSense?
  2. 🌟 Features
  3. 📦 Installation
  4. 🔑 Setting Up API Keys
  5. 🛠️ Usage
  6. 🎯 The Power of Prompts in OpenSceneSense
  7. 📈 Applications
  8. 🚀 Future Upgrades: What's Next for OpenSceneSense?
  9. 🌐 OpenSceneSense and the Future of Content Moderation
  10. 🛠️ Contributing
  11. 📄 License
  12. 📬 Contact
  13. 📄 Additional Resources

🚀 Why OpenSceneSense?

OpenSceneSense isn't just another video analysis library; it's a gateway to a new era of video-based applications and innovations. By enabling large language models (LLMs) to process and understand video inputs, OpenSceneSense empowers developers, researchers, and creators to build intelligent video-centric solutions like never before.

Imagine the Possibilities:

  • Interactive Video Applications: Create applications that can understand and respond to video content in real-time, enhancing user engagement and interactivity.
  • Automated Video Content Generation: Generate detailed narratives, summaries, or scripts based on video inputs, streamlining content creation workflows.
  • Advanced Video-Based Datasets: Build rich, annotated video datasets for training and benchmarking machine learning models, accelerating AI research.
  • Enhanced Accessibility Tools: Develop tools that provide detailed descriptions and summaries of video content, making media more accessible to all.
  • Smart Surveillance Systems: Implement intelligent surveillance solutions that can analyze and interpret video feeds, detecting anomalies and providing actionable insights.
  • Educational Platforms: Create interactive educational tools that can analyze instructional videos, generate quizzes, and provide detailed explanations.

With OpenSceneSense, the boundaries of what's possible with video analysis are limitless. Transform your ideas into reality and lead the charge in the next wave of AI-driven video applications.

🌟 Features

  • 📸 Frame Analysis: Utilize advanced vision models to dissect visual elements, actions, and their interplay with audio.
  • 🎙️ Audio Transcription: Seamlessly transcribe audio using Whisper models, enabling comprehensive multimedia understanding.
  • 🔄 Dynamic Frame Selection: Automatically select the most relevant frames to ensure meaningful and efficient analysis.
  • 🔍 Scene Change Detection: Identify scene transitions to enhance context awareness and narrative flow.
  • 📝 Comprehensive Summaries: Generate cohesive and detailed summaries that integrate both visual and audio elements.
  • 🛠️ Customizable Prompts and Models: Tailor the analysis process with custom prompts and model configurations to suit your specific needs.
  • 📊 Metadata Extraction: Extract valuable metadata for deeper insights and data-driven applications.

📦 Installation

Prerequisites

  • Python 3.10+
  • FFmpeg installed on your system

Installing FFmpeg

On Ubuntu/Debian

sudo apt update
sudo apt install ffmpeg

On macOS (using Homebrew)

brew install ffmpeg

On Windows

  1. Download FFmpeg from ffmpeg.org/download.html.
  2. Extract the archive.
  3. Add the bin folder to your system PATH.

To verify the installation, run:

ffmpeg -version

Install OpenSceneSense

pip install openscenesense

🔑 Setting Up API Keys

OpenSceneSense requires API keys for OpenAI and/or OpenRouter to access the AI models. You can set them as environment variables:

export OPENAI_API_KEY="your-openai-api-key"
export OPENROUTER_API_KEY="your-openrouter-api-key"

Alternatively, you can pass them directly when initializing the analyzer in your code.
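
If you prefer keeping keys out of your source files, a minimal sketch of reading them from the environment in Python (the placeholder fallback strings here are illustrative, not real keys):

```python
import os

# Read the keys from the environment; fall back to a placeholder if unset.
openai_key = os.getenv("OPENAI_API_KEY", "your-openai-api-key")
openrouter_key = os.getenv("OPENROUTER_API_KEY", "your-openrouter-api-key")

# Either key can then be passed directly when constructing an analyzer,
# e.g. VideoAnalyzer(api_key=openai_key, ...) as shown in the Usage section.
```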

🛠️ Usage

Quick Start

Get up and running with OpenSceneSense in just a few lines of code. Analyze your first video and unlock rich insights effortlessly.

import logging
from openscenesense import ModelConfig, AnalysisPrompts, VideoAnalyzer, DynamicFrameSelector

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# The defaults are gpt-4o (vision), gpt-4o-mini (text), and whisper-1 (audio
# transcription); set up custom models and prompts below to override them.
custom_models = ModelConfig(
    vision_model="gpt-4o",           # Vision-capable model
    text_model="gpt-4o-mini",        # Chat completion model
    audio_model="whisper-1"          # Whisper model for audio transcription
)

custom_prompts = AnalysisPrompts(
    frame_analysis="Analyze this frame focusing on visible elements, actions, and their relationship with any audio.",
    detailed_summary="""Create a cohesive narrative that integrates both visual and audio elements into a single summary. 
                        Duration: {duration:.1f} seconds\nTimeline:\n{timeline}\nAudio Transcript:\n{transcript}""",
    brief_summary="""Provide a concise, easy-to-read summary combining the main visual and audio elements.
                     Duration: {duration:.1f} seconds\nTimeline:\n{timeline}\nTranscript:\n{transcript}"""
)

# Initialize the video analyzer
analyzer = VideoAnalyzer(
    api_key="your-openai-api-key",
    model_config=custom_models,
    min_frames=8,
    max_frames=32,
    frame_selector=DynamicFrameSelector(),
    frames_per_minute=8.0,
    prompts=custom_prompts,
    log_level=logging.INFO
)

# Analyze the video
video_path = "path/to/your/video.mp4"
results = analyzer.analyze_video(video_path)

# Print the results
print("\nBrief Summary:")
print(results['brief_summary'])

print("\nDetailed Summary:")
print(results['summary'])

print("\nVideo Timeline:")
print(results['timeline'])

print("\nMetadata:")
for key, value in results['metadata'].items():
    print(f"{key}: {value}")
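
Since `analyze_video` returns a plain dictionary, you may want to persist the results for downstream use instead of re-running the models. A minimal sketch, assuming the values are JSON-serializable; the `results` dict below is illustrative stand-in data, not real analyzer output:

```python
import json

# Illustrative stand-in for the dictionary returned by analyze_video().
results = {
    "brief_summary": "A short clip of a person cooking.",
    "summary": "A detailed narrative of the cooking process.",
    "timeline": "0.0s: kitchen scene\n5.2s: chopping vegetables",
    "metadata": {"duration": 12.4, "frames_analyzed": 8},
}

# Write the analysis to disk so it can be reloaded without re-running the models.
with open("analysis.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2, ensure_ascii=False)

# Round-trip check: load it back and confirm nothing was lost.
with open("analysis.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
```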

Advanced Usage with OpenRouter Models

Leverage the power of OpenRouter models for enhanced performance and customization.

import logging
from os import getenv

from openscenesense import ModelConfig, AnalysisPrompts, OpenRouterAnalyzer, DynamicFrameSelector

custom_models = ModelConfig(
    vision_model="qwen/qwen2.5-vl-32b-instruct:free",
    text_model="meta-llama/llama-3.2-3b-instruct:free",
    audio_model="whisper-1"         
)

custom_prompts = AnalysisPrompts(
    frame_analysis="Analyze this frame focusing on visible elements, actions, and their relationship with any audio.",
    detailed_summary="""Create a cohesive narrative that integrates both visual and audio elements into a single summary. 
                        Duration: {duration:.1f} seconds\nTimeline:\n{timeline}\nAudio Transcript:\n{transcript}""",
    brief_summary="""Provide a concise, easy-to-read summary combining the main visual and audio elements.
                     Duration: {duration:.1f} seconds\nTimeline:\n{timeline}\nTranscript:\n{transcript}"""
)

analyzer = OpenRouterAnalyzer(
    openrouter_key=getenv("OPENROUTER_API_KEY"),
    openai_key=getenv("OPENAI_API_KEY"),
    model_config=custom_models,
    min_frames=8,
    max_frames=32,
    frame_selector=DynamicFrameSelector(),
    frames_per_minute=8.0,
    prompts=custom_prompts,
    log_level=logging.INFO
)

# Analyze the video
video_path = "path/to/your/video.mp4"
results = analyzer.analyze_video(video_path)

# Print the results
print("\nBrief Summary:")
print(results['brief_summary'])

print("\nDetailed Summary:")
print(results['summary'])

print("\nVideo Timeline:")
print(results['timeline'])

print("\nMetadata:")
for key, value in results['metadata'].items():
    print(f"{key}: {value}")

🎯 The Power of Prompts in OpenSceneSense

The quality and specificity of prompts play a crucial role in determining the effectiveness of the analysis OpenSceneSense provides. Thoughtfully crafted prompts can help guide the models to focus on the most important aspects of each frame, audio element, and overall video context, resulting in more accurate, relevant, and insightful outputs. OpenSceneSense allows you to define custom prompts for different types of analyses, giving you unparalleled control over the results.

Why Prompts Matter

  • Directing Focus: Prompts help guide the model’s attention to specific elements, such as actions, emotions, or interactions within the video.
  • Creating Coherent Summaries: Well-defined prompts ensure that summaries are cohesive and natural, integrating both visual and audio information seamlessly.
  • Contextualizing with Metadata: By including tags like {timeline}, {duration}, and {transcript}, prompts can encourage the model to generate outputs that are contextually aware, helping users understand the full scope of the video’s content.
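
The `{timeline}`, `{duration}`, and `{transcript}` tags use standard Python `str.format` placeholder syntax, so you can preview how a prompt template expands before handing it to the analyzer. The values below are illustrative; at analysis time the library supplies the real ones:

```python
# A summary-style template with the placeholder tags the prompts use.
template = ("Provide a concise summary.\n"
            "Duration: {duration:.1f} seconds\n"
            "Timeline:\n{timeline}\n"
            "Transcript:\n{transcript}")

# Fill the tags with illustrative values to see the final prompt text.
rendered = template.format(
    duration=12.37,
    timeline="0.0s: opening shot\n5.2s: close-up",
    transcript="Hello and welcome...",
)
print(rendered.splitlines()[1])  # → Duration: 12.4 seconds
```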

Example Prompts for Enhanced Analysis

Here are some example prompts to inspire you and help you maximize the capabilities of OpenSceneSense:

  1. Frame-Level Analysis Prompt

    "Analyze this frame with a focus on visible objects, their movements, and any emotions they convey. Consider the context of prior and subsequent frames for continuity."
    

    Use this prompt to capture detailed visual elements and their implications.

  2. Detailed Summary Prompt

    "*DO NOT SEPARATE AUDIO AND VIDEO SPECIFIC
    
