RecAlgorithm
Implementations of mainstream ranking algorithms for recommender systems
Project Overview
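- Implements the ranking algorithms most widely used in recommender systems and evaluates them on a public dataset. Every algorithm has been run end to end through full training and produces a saved_model and checkpoints for deployment with tf-serving;
- Uses the dataset from the WeChat Channels recommendation algorithm competition; see ./dataset/README.md for details;
- To match common industrial practice, the models are built on the TensorFlow Estimator framework and the data format is TFRecord;
- The implementations live under ./algrithm, one folder per algorithm, named with the commonly accepted uppercase algorithm name. The training entry point is the .py file in that folder named with the lowercase algorithm name; for example, din.py in the DIN folder is the entry point for training the DIN model. See the Example section at the end for details;
- Each algorithm implements its own model_fn. The Keras high-level API is not used; static graphs are built with TensorFlow's mid- and low-level APIs only;
- Hyperparameters are passed to the training entry script as --parameter_name=parameter_value; their definitions are in the tf.app.flags section of each entry script;
- Single-task models are evaluated on the read_comment label of the dataset; multi-task models are evaluated on three tasks: read_comment, like, and click_avatar.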
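The folder and flag conventions in the list above can be sketched as a small helper. This is only an illustration: the names `ALGORITHM_ROOT` and `train_command` are hypothetical, `./algrithm` is spelled as in this README, and the helper assumes every model folder follows the DIN/din.py pattern, which may not hold for names such as Wide & Deep.

```python
from pathlib import PurePosixPath

# Directory name as spelled in this README (assumption: paths are relative to the repo root).
ALGORITHM_ROOT = PurePosixPath("./algrithm")


def train_command(model_name, **hyperparams):
    """Return (working_dir, argv) for training one model.

    Assumes the folder is the uppercase model name and the entry script
    is the lowercase model name with a .py suffix, e.g. DIN/din.py.
    """
    working_dir = ALGORITHM_ROOT / model_name
    script = f"{model_name.lower()}.py"
    # Each hyperparameter becomes one --parameter_name=parameter_value argument.
    flags = [f"--{name}={value}" for name, value in hyperparams.items()]
    return str(working_dir), ["python", script, *flags]


working_dir, argv = train_command("DIN", use_softmax=True)
print(working_dir)  # algrithm/DIN
print(argv)         # ['python', 'din.py', '--use_softmax=True']
```

The returned pair could be handed to `subprocess.run(argv, cwd=working_dir)`; whether the scripts must be started from inside their own folder is not stated here, so the working directory is kept separate from the command.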
Single-Task Models
| Model | Paper | *Best_read_comment_Auc |
|:------------:|:------------:|:----------------------:|
| FFM | [2016] Field-aware Factorization Machines for CTR Prediction | 0.8911285 |
| DeepCrossing | [2016] Deep Crossing - Web-Scale Modeling without Manually Crafted Combinatorial Features | 0.9185908 |
| PNN | [2016] Product-based neural networks for user response prediction | 0.9065931 |
| Wide & Deep | [2016] Wide & Deep Learning for Recommender Systems | 0.9133482 |
| DeepFM | [2017] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | 0.8529998 |
| DCN | [2017] Deep & Cross Network for Ad Click Predictions | 0.9183242 |
| AFM | [2017] Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | 0.9117872 |
| xDeepFM | [2018] xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems | 0.9152467 |
| FwFM | [2018] Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising | 0.9118794 |
| DIN | [2018] Deep Interest Network for Click-Through Rate Prediction | 0.9116896 |
| DIEN | [2018] Deep Interest Evolution Network for Click-Through Rate Prediction | - |
| FiBiNet | [2019] FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction | 0.9149044 |
| BST | [2019] Behavior sequence transformer for e-commerce recommendation in Alibaba | 0.9165866 |
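*Best_read_comment_Auc is the highest test-set AUC each model reached after its own hyperparameter tuning; the detailed evaluation of each model is in result.md under that model's folder.<br>
*DIEN is not applicable to the WeChat Channels dataset, so only its static graph is implemented and it was not evaluated.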
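For readers unfamiliar with the metric in the table, the sketch below computes AUC in its standard pairwise form: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores are toy values, and the repository's own evaluation code may compute AUC differently (for example with a TensorFlow metric), so treat this as a definition rather than the project's implementation.

```python
def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5.

    This O(P * N) form is meant for illustration on small inputs only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = the user commented, 0 = the user did not (values are made up).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
print(auc(toy_labels, toy_scores))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly here, which gives 0.75; a model that ranks every positive above every negative would score 1.0.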
Multi-Task Models
| Model | Paper | *Best_read_comment_AUC | *Best_like_AUC | *Best_click_avatar_AUC |
|:-----:|:-----:|:---------------------:|:--------------:|:----------------------:|
| ESMM | [2018] Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate | - | - | - |
| MMOE | [2018] Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts | 0.91860557 | 0.8126400 | 0.8139362 |
| PLE | [2020] Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations | 0.91965175 | 0.8136461 | 0.8154559 |
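*Best_xx_AUC is the highest value across all hyperparameter combinations; the three AUCs in a row may not come from the same set of hyperparameters.<br>
*Because of its particular structure, ESMM is not applicable to the WeChat Channels dataset, so only its static graph is implemented and it was not evaluated.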
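The first note above means each column is maximized independently. A minimal sketch of that selection follows; the setting names and AUC values are hypothetical placeholders chosen for readability, not results from this repository.

```python
# Hypothetical runs: hyperparameter setting -> test AUC per task (placeholder values).
runs = {
    "setting_a": {"read_comment": 0.70, "like": 0.62, "click_avatar": 0.60},
    "setting_b": {"read_comment": 0.68, "like": 0.65, "click_avatar": 0.61},
    "setting_c": {"read_comment": 0.69, "like": 0.63, "click_avatar": 0.64},
}


def best_per_task(runs):
    """Map each task to (best_auc, setting_name), maximizing each task separately."""
    best = {}
    for setting, task_aucs in runs.items():
        for task, value in task_aucs.items():
            if task not in best or value > best[task][0]:
                best[task] = (value, setting)
    return best


for task, (value, setting) in best_per_task(runs).items():
    print(f"{task}: {value} ({setting})")
# read_comment: 0.7 (setting_a)
# like: 0.65 (setting_b)
# click_avatar: 0.64 (setting_c)
```

In this made-up case every task's best value comes from a different setting, so no single trained model reproduces the whole row; that is the situation the note warns about.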
Example
# First run the following commands to make sure the tfrecord files have been generated
# cd ./dataset/wechat_algo_data1
# python DataGenerator.py && cd ..
cd ./DIN
# Parameters can be customized at training time
python din.py --use_softmax=True
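On the receiving side, the entry script has to turn `--use_softmax=True` into a typed value. The project defines its flags with tf.app.flags; the sketch below uses the standard-library `argparse` as a stand-in so it runs without TensorFlow. The default for `use_softmax`, the `learning_rate` flag, and the helper names `str2bool` and `parse_flags` are all assumptions made for illustration.

```python
import argparse


def str2bool(text):
    """Convert 'True'/'False' style strings into a real bool."""
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def parse_flags(argv=None):
    """Parse --parameter_name=parameter_value style hyperparameters."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry script")
    # Flag taken from the example command; the default value is an assumption.
    parser.add_argument("--use_softmax", type=str2bool, default=False)
    # Hypothetical extra hyperparameter, included only to show a numeric flag.
    parser.add_argument("--learning_rate", type=float, default=0.001)
    return parser.parse_args(argv)


flags = parse_flags(["--use_softmax=True"])
print(flags.use_softmax)    # True
print(flags.learning_rate)  # 0.001
```

The explicit `str2bool` converter matters because passing `type=bool` to argparse would treat any non-empty string, including "False", as true.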
To Do List
- Add multi-task learning tricks: Uncertainty, GradNorm, PCGrad, etc.
- Add AutoInt, FLEN, etc.
- Refactor the feature engineering part, including config-driven inputs; see https://github.com/Shicoder/Deep_Rec for reference
