26 skills found
peteanderson80 / Bottom Up Attention: Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome.
mosessoh / CNN LSTM Caption Generator: A TensorFlow implementation of the CNN-LSTM image caption generator architecture that achieves close to state-of-the-art results on the MSCOCO dataset.
potterhsu / Easy Faster Rcnn.pytorch: An easy implementation of Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf) in PyTorch.
yalesong / Pvse: Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019).
potterhsu / Easy Fpn.pytorch: An easy implementation of FPN (https://arxiv.org/pdf/1612.03144.pdf) in PyTorch.
sercant / Mobile Segmentation: Real-time semantic image segmentation on mobile devices.
RoyalSkye / Image Caption: Image captioning in PyTorch using LSTM or Transformer models.
zarzouram / Image Captioning With Transformers: PyTorch implementation of image captioning using a transformer-based model.
Cheng-Lin-Li / SegCaps: A clone of the original SegCaps source code, with enhancements for the MS COCO dataset.
peteanderson80 / Coco Caption: Adds the SPICE metric to the coco-caption evaluation server code.
brunobelloni / Binary To Coco Json Converter: Convert segmentation binary mask images to COCO JSON format.
Wentong-DST / Self Critical: PyTorch implementation of the paper "Self-critical Sequence Training for Image Captioning".
WuJie1010 / Fine Grained Image Captioning: PyTorch implementation of "Fine-Grained Image Captioning with Global-Local Discriminative Objective".
ayansengupta17 / GAN: We aim to generate realistic images from text descriptions using a GAN architecture. The network we designed is used for image generation on two datasets: MSCOCO and CUBS.
Delphboy / Karpathy Splits: Karpathy split JSON files for image captioning.
mrlooi / Convert To Coco: Scripts for converting various datasets to MSCOCO annotation (JSON) files.
howardyclo / ImageNet2COCO: A demo for mapping class labels from ImageNet to COCO.
CLT29 / Semantic Neighborhoods: Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020].
ramanakshay / Clip: CLIP and SigLIP model training from scratch.
ellenzhuwang / ImplicitOOD: An end-to-end vision and language model incorporating explicit knowledge graphs and OOD detection.