Results for "scene-recognition"

259 skills found · Page 1 of 9

CSAILVision / Semantic Segmentation Pytorch

5.1k

Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset

universal

ade20k · pytorch · scene-recognition · +1

Updated 2d ago

Breakthrough / PySceneDetect

4.6k

Python and OpenCV-based scene cut/transition detection program & library.

universal

analysis · image-processing · opencv · +6

Updated 10h ago

chongyangtao / Awesome Scene Text Recognition

1.7k

A curated list of resources dedicated to scene text localization and recognition

universal

natural-images · scene-texts · text-detection · +1

Updated 2mo ago

MaybeShewill-CV / CRNN Tensorflow

1.0k

Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition

universal

chinese-ocr · crnn-tensorflow · ctc-loss · +3

Updated 11d ago

bear63 / SceneReco

944

ctpn+crnn Scene character recognition

universal

Updated 4mo ago

zhang0jhon / AttentionOCR

839

Scene text recognition

universal

Updated 1mo ago

johnolafenwa / DeepStack

809

The World's Leading Cross Platform AI Engine for Edge Devices

universal

ai-engine · computer-vision · deepstack · +4

Updated 3h ago

Jyouhou / SceneTextPapers

789

Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized

zed

Updated 13d ago

baudm / Parseq

698

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

universal

computer-vision · eccv · eccv2022 · +5

Updated 5d ago

Canjie-Luo / MORAN V2

649

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

universal

attention-mechanism · image-deformation · image-rectification · +2

Updated 3d ago

HCIILAB / Scene Text Recognition

620

No description available

universal

Updated 2mo ago

Bartzi / See

576

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

universal

chainer · cnn · computer-vision · +3

Updated 4mo ago

Media-Smart / Vedastr

535

A scene text recognition toolbox based on PyTorch

universal

ocr · ocr-recognition · pytorch · +3

Updated 25d ago

FangShancheng / ABINet

460

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

universal

Updated 1mo ago

FudanVI / FudanOCR

431

A toolbox of scene text super-resolution and recognition

universal

Updated 4d ago

tangzhenyu / Scene Text Understanding

377

OCR, Scene-Text-Understanding, Text Recognition

universal

Updated 5d ago

HCIILAB / Scene Text Recognition Recommendations

353

Papers, Datasets, Algorithms, SOTA for STR. Long-time Maintaining

universal

aster-pytorch · crnn-pytorch · datasets · +4

Updated 1mo ago

roatienza / Deep Text Recognition Benchmark

313

PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)

universal

ocr · str · vision-transformer · +1

Updated 3mo ago

tzutalin / Android Object Detection

307

Fast-RCNN and Scene Recognition using Caffe

universal

android · caffe · detection · +1

Updated 2mo ago

dhvanikotak / Emotion Detection In Videos

299

This work aims to recognize six emotions (happiness, sadness, disgust, surprise, fear and anger) from human facial expressions extracted from videos. To do so, we considered people of different ethnicities, ages and genders, each of whom reacts very differently when expressing emotions. We collected a data set of 149 short videos of both women and men expressing each of the emotions listed above. The data set was built by students, each of whom recorded a video expressing all the emotions with no directions or instructions at all. Some videos include more body parts than others; others have objects in the background or different lighting setups. We wanted the data to be as general as possible, with no restrictions, so that it would be a good indicator of our main goal. The script detect_faces.py detects faces in each video and saves the result at a resolution of 240x320. This step produces shaky videos, so we then stabilized all of them, which can be done with code or with free online stabilizers. We then ran the stabilized videos through emotion_classification_videos_faces.py. In that code we developed a method to extract features based on histograms of dense optical flow (HOF), and we used a support vector machine (SVM) classifier for the recognition problem. For each video, we extracted the optical flow at each frame. Optical flow measures the motion, relative to an observer, between two frames at each point. Each point in the image therefore has two values describing the vector of motion between the two frames: the magnitude and the angle. Since our videos have a resolution of 240x320, each frame has a feature descriptor of dimensions 240x320x2, and the full video descriptor has dimensions #frames x 240 x 320 x 2.
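The magnitude/angle representation and the descriptor size described here can be sketched as follows. This is a minimal illustration, not the repository's code: the function names `to_polar` and `descriptor_shape` are hypothetical, the angle convention (degrees in [0, 360)) is an assumption, and `n_frames` is taken to mean the number of flow fields computed for a video.

```python
import math

def to_polar(dx, dy):
    """Convert one flow vector (dx, dy) to (magnitude, angle in degrees).

    The [0, 360) degree convention is an assumption made for this sketch.
    """
    magnitude = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, angle

def descriptor_shape(n_frames, height=240, width=320):
    """Shape of the raw video descriptor: two values per pixel per frame."""
    return (n_frames, height, width, 2)

# Example: a flow vector pointing right and down-image by (3, 4) pixels.
mag, ang = to_polar(3.0, 4.0)
# Number of raw values for a hypothetical 100-frame clip.
n_values = math.prod(descriptor_shape(100))
```

The raw descriptor grows linearly with the number of frames, which is why videos of different lengths cannot be compared directly at this stage.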
To make videos comparable with one another (inputs of different lengths are not directly comparable), we need to summarize each video as a single descriptor. We do this by computing a histogram of the optical flows: we separate the extracted flows into categories and count the flows in each category. In more detail, we split the scene into a grid of s by s cells (s = 10 in this case) to record the location of each feature, and categorize the direction of each flow as one of the 8 motion directions considered in this problem. We then count, for each grid cell, the number of flows falling in each direction bin. This yields an s by s by 8 descriptor for each frame. To summarize a video, we can either average the histograms in each grid cell across frames (average pooling) or take the maximum value of the histograms in each grid cell across all frames of the video (max pooling). For classification, we used a support vector machine (SVM) with a non-linear kernel, as discussed in class, to recognize new facial expressions. We also considered a Naïve Bayes classifier, but SVMs are generally reported to outperform it on computer vision tasks of this kind. A confusion matrix can be used to present the results more clearly.
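The histogram and pooling steps can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the code in emotion_classification_videos_faces.py: the names `frame_histogram` and `pool` are hypothetical, a flow field is assumed to be a nested list of (dx, dy) pairs, direction bins are assumed to be equal 45-degree sectors starting at 0 degrees, and zero-motion pixels are assumed to be skipped.

```python
import math

def frame_histogram(flow, grid=10, n_dirs=8):
    """Count flow directions per grid cell for one frame.

    flow: H x W nested list of (dx, dy) pairs (assumed layout).
    Returns a grid x grid x n_dirs nested list of counts.
    """
    height, width = len(flow), len(flow[0])
    hist = [[[0] * n_dirs for _ in range(grid)] for _ in range(grid)]
    bin_width = 2 * math.pi / n_dirs
    for y, row in enumerate(flow):
        cell_y = y * grid // height
        for x, (dx, dy) in enumerate(row):
            if dx == 0 and dy == 0:
                continue  # assumption: no motion means no direction to count
            cell_x = x * grid // width
            angle = math.atan2(dy, dx) % (2 * math.pi)
            direction = int(angle / bin_width) % n_dirs
            hist[cell_y][cell_x][direction] += 1
    return hist

def pool(histograms, mode="average"):
    """Summarize per-frame histograms as one fixed-length video descriptor."""
    flat = [[c for row in h for cell in row for c in cell] for h in histograms]
    if mode == "max":
        return [max(column) for column in zip(*flat)]
    return [sum(column) / len(flat) for column in zip(*flat)]

# Two tiny 4x4 "frames" on a 2x2 grid, each with uniform motion.
frame_a = [[(1.0, 0.5)] * 4 for _ in range(4)]   # roughly 27 degrees: bin 0
frame_b = [[(0.5, 1.0)] * 4 for _ in range(4)]   # roughly 63 degrees: bin 1
hists = [frame_histogram(f, grid=2) for f in (frame_a, frame_b)]
video_avg = pool(hists, mode="average")
video_max = pool(hists, mode="max")
```

Either pooled vector has length s x s x 8 regardless of how many frames the video contains, so it can be passed to a classifier such as an SVM as a fixed-size input.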