OpenSceneSense
OpenSceneSense is a cutting-edge Python package that revolutionizes video analysis by seamlessly integrating OpenAI and OpenRouter vision models. Unlock the full potential of your videos with advanced frame analysis, audio transcription, dynamic frame selection, and comprehensive summaries, all powered by state-of-the-art AI.
Table of Contents
- 🚀 Why OpenSceneSense?
- 🌟 Features
- 📦 Installation
- 🔑 Setting Up API Keys
- 🛠️ Usage
- 🎯 The Power of Prompts in OpenSceneSense
- 📈 Applications
- 🚀 Future Upgrades: What's Next for OpenSceneSense?
- 🌐 OpenSceneSense and the Future of Content Moderation
- 🛠️ Contributing
- 📄 License
- 📬 Contact
- 📄 Additional Resources
🚀 Why OpenSceneSense?
OpenSceneSense isn't just another video analysis library; it's a gateway to a new era of video-based applications and innovations. By enabling large language models (LLMs) to process and understand video inputs, OpenSceneSense empowers developers, researchers, and creators to build intelligent video-centric solutions like never before.
Imagine the Possibilities:
- Interactive Video Applications: Create applications that can understand and respond to video content in real-time, enhancing user engagement and interactivity.
- Automated Video Content Generation: Generate detailed narratives, summaries, or scripts based on video inputs, streamlining content creation workflows.
- Advanced Video-Based Datasets: Build rich, annotated video datasets for training and benchmarking machine learning models, accelerating AI research.
- Enhanced Accessibility Tools: Develop tools that provide detailed descriptions and summaries of video content, making media more accessible to all.
- Smart Surveillance Systems: Implement intelligent surveillance solutions that can analyze and interpret video feeds, detecting anomalies and providing actionable insights.
- Educational Platforms: Create interactive educational tools that can analyze instructional videos, generate quizzes, and provide detailed explanations.
With OpenSceneSense, the boundaries of what's possible with video analysis are limitless. Transform your ideas into reality and lead the charge in the next wave of AI-driven video applications.
🌟 Features
- 📸 Frame Analysis: Utilize advanced vision models to dissect visual elements, actions, and their interplay with audio.
- 🎙️ Audio Transcription: Seamlessly transcribe audio using Whisper models, enabling comprehensive multimedia understanding.
- 🔄 Dynamic Frame Selection: Automatically select the most relevant frames to ensure meaningful and efficient analysis.
- 🔍 Scene Change Detection: Identify scene transitions to enhance context awareness and narrative flow.
- 📝 Comprehensive Summaries: Generate cohesive and detailed summaries that integrate both visual and audio elements.
- 🛠️ Customizable Prompts and Models: Tailor the analysis process with custom prompts and model configurations to suit your specific needs.
- 📊 Metadata Extraction: Extract valuable metadata for deeper insights and data-driven applications.
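To give a feel for the kind of logic behind scene change detection, here is a generic histogram-difference sketch. This is not OpenSceneSense's internal implementation, just an illustration of the general technique: compare a coarse brightness histogram of each frame against the previous one and flag frames whose distance exceeds a threshold.

```python
# Illustrative sketch of histogram-based scene change detection.
# NOT OpenSceneSense's actual implementation -- just the general idea.

def histogram(frame, bins=8):
    """Coarse brightness histogram for a flat list of 0-255 pixel values."""
    counts = [0] * bins
    for p in frame:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]

def scene_changes(frames, threshold=0.5):
    """Indices where the histogram distance to the previous frame exceeds threshold."""
    changes = []
    prev = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = histogram(frame)
        distance = sum(abs(a - b) for a, b in zip(prev, cur))  # L1 distance
        if distance > threshold:
            changes.append(i)
        prev = cur
    return changes

# Two dark frames followed by two bright frames -> one cut at index 2
dark = [10] * 100
bright = [240] * 100
print(scene_changes([dark, dark, bright, bright]))  # [2]
```

Real pipelines typically operate on decoded video frames (e.g. via FFmpeg) and use more robust distance measures, but the thresholded-difference idea is the same.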
📦 Installation
Prerequisites
- Python 3.10+
- FFmpeg installed on your system
Installing FFmpeg
On Ubuntu/Debian
sudo apt update
sudo apt install ffmpeg
On macOS (using Homebrew)
brew install ffmpeg
On Windows
- Download FFmpeg from ffmpeg.org/download.html.
- Extract the archive.
- Add the bin folder to your system PATH.
To verify the installation, run:
ffmpeg -version
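If you want your own scripts to fail fast when FFmpeg is missing, a quick check using only the standard library (independent of OpenSceneSense) looks like this:

```python
import shutil
import subprocess

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is on PATH and runs."""
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    try:
        subprocess.run([path, "-version"], capture_output=True, check=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

print("FFmpeg available:", ffmpeg_available())
```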
Install OpenSceneSense
pip install openscenesense
🔑 Setting Up API Keys
OpenSceneSense requires API keys for OpenAI and/or OpenRouter to access the AI models. You can set them as environment variables:
export OPENAI_API_KEY="your-openai-api-key"
export OPENROUTER_API_KEY="your-openrouter-api-key"
Alternatively, you can pass them directly when initializing the analyzer in your code.
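A common pattern (not specific to this library) is to read the keys from the environment and fail early with a clear message if one is unset, rather than letting an API call fail later. The `setdefault` line below only exists to make the demonstration self-contained:

```python
import os

def require_key(name: str) -> str:
    """Fetch an API key from the environment, failing loudly if unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value

# For demonstration only -- in real use, export the key in your shell instead.
os.environ.setdefault("OPENAI_API_KEY", "sk-example")
print(require_key("OPENAI_API_KEY"))
```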
🛠️ Usage
Quick Start
Get up and running with OpenSceneSense in just a few lines of code. Analyze your first video and unlock rich insights effortlessly.
import logging
from openscenesense import ModelConfig, AnalysisPrompts, VideoAnalyzer, DynamicFrameSelector
# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
# Defaults use `gpt-4o` (vision), `gpt-4o-mini` (text), and `whisper-1` (audio for segment-rich transcripts).
# Override below if you want different models.
# Set up custom models and prompts
custom_models = ModelConfig(
    vision_model="gpt-4o",     # Vision-capable model
    text_model="gpt-4o-mini",  # Chat completion model
    audio_model="whisper-1"    # Whisper model for audio transcription
)
custom_prompts = AnalysisPrompts(
    frame_analysis="Analyze this frame focusing on visible elements, actions, and their relationship with any audio.",
    detailed_summary="""Create a cohesive narrative that integrates both visual and audio elements into a single summary.
Duration: {duration:.1f} seconds\nTimeline:\n{timeline}\nAudio Transcript:\n{transcript}""",
    brief_summary="""Provide a concise, easy-to-read summary combining the main visual and audio elements.
Duration: {duration:.1f} seconds\nTimeline:\n{timeline}\nTranscript:\n{transcript}"""
)
# Initialize the video analyzer
analyzer = VideoAnalyzer(
    api_key="your-openai-api-key",
    model_config=custom_models,
    min_frames=8,
    max_frames=32,
    frame_selector=DynamicFrameSelector(),
    frames_per_minute=8.0,
    prompts=custom_prompts,
    log_level=logging.INFO
)
# Analyze the video
video_path = "path/to/your/video.mp4"
results = analyzer.analyze_video(video_path)
# Print the results
print("\nBrief Summary:")
print(results['brief_summary'])
print("\nDetailed Summary:")
print(results['summary'])
print("\nVideo Timeline:")
print(results['timeline'])
print("\nMetadata:")
for key, value in results['metadata'].items():
    print(f"{key}: {value}")
Advanced Usage with OpenRouter Models
Leverage the power of OpenRouter models for enhanced performance and customization.
import logging
from os import getenv
from openscenesense import ModelConfig, AnalysisPrompts, OpenRouterAnalyzer, DynamicFrameSelector
custom_models = ModelConfig(
    vision_model="qwen/qwen2.5-vl-32b-instruct:free",
    text_model="meta-llama/llama-3.2-3b-instruct:free",
    audio_model="whisper-1"
)
custom_prompts = AnalysisPrompts(
    frame_analysis="Analyze this frame focusing on visible elements, actions, and their relationship with any audio.",
    detailed_summary="""Create a cohesive narrative that integrates both visual and audio elements into a single summary.
Duration: {duration:.1f} seconds\nTimeline:\n{timeline}\nAudio Transcript:\n{transcript}""",
    brief_summary="""Provide a concise, easy-to-read summary combining the main visual and audio elements.
Duration: {duration:.1f} seconds\nTimeline:\n{timeline}\nTranscript:\n{transcript}"""
)
analyzer = OpenRouterAnalyzer(
    openrouter_key=getenv("OPENROUTER_API_KEY"),
    openai_key=getenv("OPENAI_API_KEY"),
    model_config=custom_models,
    min_frames=8,
    max_frames=32,
    frame_selector=DynamicFrameSelector(),
    frames_per_minute=8.0,
    prompts=custom_prompts,
    log_level=logging.INFO
)
# Analyze the video
video_path = "path/to/your/video.mp4"
results = analyzer.analyze_video(video_path)
# Print the results
print("\nBrief Summary:")
print(results['brief_summary'])
print("\nDetailed Summary:")
print(results['summary'])
print("\nVideo Timeline:")
print(results['timeline'])
print("\nMetadata:")
for key, value in results['metadata'].items():
    print(f"{key}: {value}")
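Since the analyzer returns plain Python data, persisting a run for later inspection can be as simple as dumping it to JSON. The dictionary below is a made-up stand-in for real `analyze_video` output, using the same keys as the examples above:

```python
import json

# Dummy results standing in for analyzer.analyze_video(...) output
results = {
    "brief_summary": "A short clip of a speaker at a podium.",
    "summary": "Detailed narrative of the clip...",
    "timeline": "0.0s: opening shot",
    "metadata": {"duration_seconds": 12.3, "frames_analyzed": 8},
}

with open("analysis.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)

with open("analysis.json", encoding="utf-8") as f:
    print(json.load(f)["metadata"]["frames_analyzed"])  # 8
```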
🎯 The Power of Prompts in OpenSceneSense
The quality and specificity of prompts play a crucial role in determining the effectiveness of the analysis OpenSceneSense provides. Thoughtfully crafted prompts can help guide the models to focus on the most important aspects of each frame, audio element, and overall video context, resulting in more accurate, relevant, and insightful outputs. OpenSceneSense allows you to define custom prompts for different types of analyses, giving you unparalleled control over the results.
Why Prompts Matter
- Directing Focus: Prompts help guide the model’s attention to specific elements, such as actions, emotions, or interactions within the video.
- Creating Coherent Summaries: Well-defined prompts ensure that summaries are cohesive and natural, integrating both visual and audio information seamlessly.
- Contextualizing with Metadata: By including placeholders like {timeline}, {duration}, and {transcript}, prompts can encourage the model to generate outputs that are contextually aware, helping users understand the full scope of the video’s content.
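These placeholders are ordinary str.format fields, so you can preview how a custom prompt template will render before handing it to the analyzer. The values below are made up purely for illustration:

```python
detailed_summary = (
    "Create a cohesive narrative that integrates both visual and audio elements.\n"
    "Duration: {duration:.1f} seconds\nTimeline:\n{timeline}\nAudio Transcript:\n{transcript}"
)

rendered = detailed_summary.format(
    duration=12.34,  # {duration:.1f} rounds this to one decimal place
    timeline="0.0s: opening shot\n6.0s: speaker at podium",
    transcript="Welcome, everyone.",
)
print(rendered)
```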
Example Prompts for Enhanced Analysis
Here are some example prompts to inspire you and help you maximize the capabilities of OpenSceneSense:
- Frame-Level Analysis Prompt
  "Analyze this frame with a focus on visible objects, their movements, and any emotions they convey. Consider the context of prior and subsequent frames for continuity."
  Use this prompt to capture detailed visual elements and their implications.
- Detailed Summary Prompt
  "*DO NOT SEPARATE AUDIO AND VIDEO SPECIFIC