DeepEmbeding
图像检索和向量搜索,similarity learning,compare deep metric and deep-hashing applying in image retrieval
Install / Use
/learn @hudengjunai/DeepEmbedingREADME
Deep Embedding Learning for Image Retrieval
Deep Embedding Introduction
DeepEmbedding 是使用深度学习的方法把多种媒体映射嵌入到相同向量空间,在统一空间中进行搜索的技术。
本项目通过视觉级别搜索,细粒度类别(实例检索)和图像-文本互搜的方式来测试通用多媒体检索。
图像检索问题的一般解法
DeepEmbedding旨在使用深度度量学习(DeepMetric)或者深度哈希(DeepHash)的方法学习关系保持的空间映射函数,能将视觉空间映射到低维嵌入空间,使用向量搜索引擎进行搜索.第一个问题为特征提取问题,即为本实验研究的问题,第二个问题为特征搜索问题.第二个问题解决方案可参见ANNS(近似邻近搜索)NNSearchService 。
Note
- 本项目实现了基于Multi-npair-loss的度量学习应用于检索和基于Sampling margin loss的方法进行检索
- 具体复现参见论文Triplet lossFaceNet:
N-pair loss
Margin lossSampling Matters in Deep Emebedding Learning
BatchHard
实验结果
(可点击下载百度网盘链接,查看图片)
- 在StanfordOnlineProduct训练,计算NMI聚类指标 nmi=0.866,对验证集的向量嵌入进行T-SNE降维后可以看出,降维图像约 43M,百度网盘链接在下:
- margin_based loss :DeepFashionhttps://pan.baidu.com/s/1zLZX24qBb_Op1vsry4LX6w
计算指标:nmi=0.866 - Mc-n-pair loss:StanfordOnlineProducthttps://pan.baidu.com/s/12eNTVsRFu--SYMW8P8HPfQ
计算指标:nmi=0.830
关于本项目的使用
1.下载相应的数据集 2.采用不同的loss类型对模型进行训练 run train cub200 model
nohup python train_mx_ebay_margin.py --gpus=1 --batch-k=5 --use_viz --epochs=30 --use_pretrained --steps=12,16,20,24 --name=CUB_200_2011 --save-model-prefix=cub200 > mycub200.out 2>&1 &
run train stanford_online_product
nohup python train_mx_ebay_margin.py --batch-k=2 --batch-size=80 --use_pretrained --use_viz --gpus=0 --name=Inclass_ebay --data=EbayInClass --save-model-prefix=ebayinclass > mytraininclass_ebay.log 2>&1
3.后续工作:
- 测试R-MAC NetVLAD等网络在视觉检索中的效果和评测
- 测试使用GAN的方法增强检索效果
__Deep Adversarial Metric Learning__
Deep Metric learning cannot get full used the easy negative examples,to in [Deep Adversarial Metirc Learning](http://openaccess.thecvf.com/content_cvpr_2018/papers/Duan_Deep_Adversarial_Metric_CVPR_2018_paper.pdf) proposal a new framework called DAML
__DeepMetric and Deep Hashing Apply__
apply method to Fashion,vehicle and Person-ReID domain
__Construct a datasets__ crawl application-domain data
Dataset
CUB200_2011: A small part of ImageNet
LFW:face dataset
StanfordOnlineProducts: a lot of types of products(furniture,bicycle,cups)
Street2Shop:products data set from ebay
DeepFashion:all are colthes
图像检索的应用
- Face indentification: deep metric learning for face cluster,from FaceNet to SphereFace
- Person ReIdentification:deep metric learning for Pedestrian Re-ID,from MARS to NPSM&SVDNet
- Vehicle Search:deep metric learning for fake-licensed car or Vehicle retrieval.
- Street2Products:search fashion clothe from street photos or in-shop photos,namely visual-search.from DeepRanking to DAML
Deep Metric Learning mile-stone paper:
1.DrLIM:Dimensionality Reduction by Learning an Invariant Mapping
2.DeepRanking:Learning fine-graied Image Similarity with DeepRanking
3.DeepID2:Deep Learning Face Representation by Joint Identification-Verification
4.FaceNet:FaceNet: A Unified Embedding for Face Recognition and Clustering
5.Defense:In Defense of the Triplet Loss for Person Re-Identification
6.N-pair:Improved Deep Metric Learning with Multi-class N-pair Loss Objective
7.Sampling:Sampling Matters in Deep Embedding Learning
8.DAML:Deep Adversarial Metric Learning
9.SphereFace:Deep Hypersphere Embedding for Face Recognition
部分DeepHash的工作
DeepHash能将图片直接哈希到汉明码,使用faiss的IVF Binary 系列搜索加速,通过存储量大大减少。
ReImplementation of HashNet
python train_hash.py --params
Deep Hash Learning mile-stone paper:
1.CNNH:Supervised Hashing for Image Retrieval via Image Representation Learning
2.DNNH:Simultaneous feature learning and hash coding with deep neural networks
3.DLBHC:Deep Learning of Binary Hash Codes for Fast Image Retrieval
4.DSH:Deep Supervised Hashing for Fast Image Retrieval
5.SUBIC:SuBiC: A Supervised, Structured Binary Code for Image Search
6.HashNet:HashNet:Deep Learning to Hash by Continuous
7.DCH:Deep Cauchy Hashing for Hamming Space Retrieval
视觉和文本共同嵌入 Visual-semantic-align embedding(cross modal retrieval)
1.VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
2.Dual-Path Convolutional Image-Text Embedding with Instance Loss
python train_vse.py --params
其他类型的搜索方式
1.Sketch based Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval
2.Text cross modal based Deep Cross-Modal Hashing
近似近邻搜索加速
ANNS (Approximation Nearest Neighbor Search) to search a query vector in gallery database. 测试数据集
- SIFT1M typical 128-dim sift vector
- DEEP1B,proposed by yandex.inc,this is a deep descriptor
- GIST1M typical 512-dim gist vector
papers
-- PQ based
1.将传统的标量量化,转成分段乘积量化Product Quantization for Nearest Neighbor Search
2.类似于Cartisian Quantization,将向量整体进行旋转,使得聚类的分段坐标轴和向量对齐,聚类中心点和数据之间的重建误差小,压缩损失就小Optimized Product Quantization
3.Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors提出使用anchor-point也即line-quantizition点来切分区域group,在搜索是pruning部分区域加速。和RobustiQ这篇论文撞衫。该论文中提出在一层中建立HNSW加速candidates center-point搜索选取。
4.粗量化使用双倒排,可以降低聚类维度,增加聚类中心点,使用Multi-Sequence算法提高粗略命中速度The Inverted MultiIndex
5.多义码,将汉明码距离和量化中心点的距离建立映射关系,对entry-point有过滤作用 Polysemous codes
6.Additive Quantization 两篇论文:Additive Quantization for Extreme Vector Compression.使用稀疏加量化,量化误差更小,但需要额外存储与计算。
7.Composite Quantization.和上述Additive Quantizition 撞衫,额外提出NOCQ作为APQ替代,加速计算。
8.RobustiQ:A Robust ANN Search Method for Billion-scale Similarity Search on GPUs. 和第3篇论文很像,在传统IVF上添加了Line Quantization,也叫分组剪枝点,提升PQ搜索指标。
-- 索引分层:
1.Zoom:Multi-View Vector Search for Optimizing Accuracy, Latency and Memory 使用全量数据k-means聚类,大量百万级中心点建立第一层导航图HNSW来代替以前的IVF,第二层使用量化加速计算排序,不限于PQ或者APQ,第三层全精度计算排序。这种典型的三层索引方式可以使得每层能采用组件替换的方式来优化索引。
2.Pyramid:A General Framwork for Distributed Similarity Search.香港中文大学郑尚策实验室的,对比了naive-HNSW分图和metaHNSW-subHNSW两种分图方式对搜索效果影响。使用两层HNSW进行搜索。使用meta-HNSW的上层图解决了跨机器搜索的问题。该方法相对于简易切分naive-HNSW建图增加了meta-HNSW建图,全量数据点查询,分块构建三部分时间,建图比较慢。但搜索过程中meta-HNSW起到了哈希的方法,比naive-HNSW全集群请求再合并的吞吐量大不少。
3.索引分层和图优化,量化优化结合在一起,可以进行索引改进设计。
-- 子集索引
1.Reconfigurable Inverted Index:旨在解决搜索过程中既有向量近似需求,也有标签过滤需求的搜索场景。文中提出了subset search 问题,并给出了两种解决方案。虽然已经相当巧妙,但未看出来如何解决十亿级别数据如何解决。
-- Graph Based
1.可参见NSW 浏览小世界 论文,结合跳表结构构造的索引Approximate nearest neighbor algorithm based on navigable small world graphs
2.Efficient and robust approximate nearest neighbor search using hierarchical Navigable Small World graphs
7.EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph
8.A Revisit on Deep Hashings for Large-scale Content Based Image Retrieval
9.RobustiQ A Robust ANN Search Method for Billion-scale Similarity Search on GPUs
10.GGNN: Graph-based GPU Nearest Neighbor Search
11.Zoom: Multi-View Vector Search for Optimizing Accuracy, Latency and Memory
12. [Vector and Line Quantization for Billion-scale Similarity Search on GPUs](https://arxiv.o
Related Skills
YC-Killer
2.7kA library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.
groundhog
399Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).
last30days-skill
18.8kAI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary
sec-edgar-agentkit
10AI agent toolkit for accessing and analyzing SEC EDGAR filing data. Build intelligent agents with LangChain, MCP-use, Gradio, Dify, and smolagents to analyze financial statements, insider trading, and company filings.
