ProductTitleSummarizationCorpus
Dataset for CIKM 2018 paper "Multi-Source Pointer Network for Product Title Summarization"
Product Title Summarization (PTS) Corpus
Description
Each line in corpus.txt holds a pair of titles (original title, short title) together with the product's brand and commodity name. The four fields are separated by two consecutive tab characters, in the following format:
<original title>\t\t<short title>\t\t<brand>\t\t<commodity name>
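The format above can be read with a few lines of Python. This is a minimal sketch, not code from the paper: the names `TitlePair`, `parse_line`, and `read_corpus` are hypothetical, and UTF-8 encoding is an assumption about the file.

```python
from typing import Iterator, NamedTuple


class TitlePair(NamedTuple):
    """One corpus record (hypothetical container, field order as in the format line)."""
    original_title: str
    short_title: str
    brand: str
    commodity_name: str


def parse_line(line: str) -> TitlePair:
    """Split one corpus line on the double-tab delimiter."""
    # Strip only the line terminator so empty trailing fields are preserved.
    fields = line.rstrip("\r\n").split("\t\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return TitlePair(*fields)


def read_corpus(path: str, encoding: str = "utf-8") -> Iterator[TitlePair]:
    """Yield records from a corpus file, skipping blank lines.

    The UTF-8 default is an assumption; adjust it if decoding fails.
    """
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)
```

For example, `next(read_corpus("corpus.txt")).short_title` would give the short title of the first record. Raising on a wrong field count is a deliberate choice here, so that malformed lines surface early instead of silently shifting columns.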
File
- corpus: the dataset used in the CIKM 2018 paper; the length of each short title is < 11.
- big_corpus: a much larger dataset; the length of each short title is < 13.
We split the big_corpus archive into 5 files with the prefix big_corpus.tar.gz_ because github.com limits each file to less than 100 MB. To reconstruct the big_corpus file:
cd big_corpus
cat big_corpus.tar.gz_* > big_corpus.tar.gz
tar zxvf big_corpus.tar.gz
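On systems without `cat` and `tar`, the same reconstruction can be sketched in Python using only the standard library. The function name `reconstruct_big_corpus` and its parameters are hypothetical. Like the shell glob, it assumes that sorting the part file names lexicographically gives the correct order.

```python
import glob
import os
import shutil
import tarfile


def reconstruct_big_corpus(directory: str = "big_corpus",
                           prefix: str = "big_corpus.tar.gz_",
                           output: str = "big_corpus.tar.gz") -> str:
    """Concatenate the split parts into one archive and extract it in place."""
    # Sorted order mirrors the shell's expansion of big_corpus.tar.gz_*.
    pattern = os.path.join(glob.escape(directory), prefix + "*")
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern}")

    archive_path = os.path.join(directory, output)
    with open(archive_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so large files are never held fully in memory.
                shutil.copyfileobj(src, out)

    # Equivalent of `tar zxvf`: unpack the gzip-compressed tar archive.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(directory)
    return archive_path
```

Calling `reconstruct_big_corpus()` from the repository root would write big_corpus.tar.gz into the big_corpus directory and unpack it there.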
Note:
For some products, the brand field contains the brand name in multiple languages, separated by "/", e.g., Nintendo/任天堂.
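When the language variants of a brand are needed separately, the field can be split on the "/" separator. The helper below is a hypothetical sketch; it assumes that "/" never occurs inside a single brand name, which the corpus description does not guarantee.

```python
from typing import List


def split_brand(brand: str) -> List[str]:
    """Return the language variants of a brand field, in their original order."""
    # Assumption: "/" is used only as the separator between variants.
    return [part.strip() for part in brand.split("/") if part.strip()]
```

With the example from the note, `split_brand("Nintendo/任天堂")` returns `["Nintendo", "任天堂"]`, and a brand without a separator comes back as a one-element list.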
Citation
@inproceedings{Sun:CIKM2018,
author = {Fei Sun and Peng Jiang and Hanxiao Sun and Changhua Pei and Wenwu Ou and Xiaobo Wang},
title = {{Multi-Source Pointer Network for Product Title Summarization}},
booktitle = {CIKM},
year = 2018
}
