Lexrankr
LexRank for Korean.
lexrankr
Clustering-based multi-document selective text summarization using the LexRank algorithm.
This repository contains the source code for the paper: 설진석, 이상구. "lexrankr: LexRank 기반 한국어 다중 문서 요약." 한국정보과학회 학술발표논문집 (2016): 458-460.
- Designed mainly for Korean, but not limited to it.
- See the KoNLPy installation guide for how to install KoNLPy properly.
- Check out textrankr, which is a simpler summarizer using TextRank.
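For intuition about the LexRank idea the library is built on, here is a minimal, self-contained sketch. It is not lexrankr's implementation: it uses plain term-frequency cosine similarity (the original LexRank formulation uses an IDF-weighted variant), skips the clustering step, and the function names, `threshold`, `damping`, and `iterations` are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexrank_scores(
    sentences: List[str],
    threshold: float = 0.1,   # assumed similarity cut-off for drawing an edge
    damping: float = 0.85,    # assumed PageRank-style damping factor
    iterations: int = 50,     # fixed number of power-iteration steps
) -> List[float]:
    """Score sentences by centrality on a sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Whitespace tokenization stands in for a real tokenizer here.
    bags = [Counter(s.lower().split()) for s in sentences]
    # Connect two distinct sentences when their similarity exceeds the threshold.
    adjacency = [
        [
            1.0 if i != j and cosine_similarity(bags[i], bags[j]) > threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    degrees = [sum(row) for row in adjacency]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on an equal share of its current score.
            rank = sum(
                adjacency[j][i] * scores[j] / degrees[j]
                for j in range(n)
                if degrees[j] > 0
            )
            new_scores.append((1.0 - damping) / n + damping * rank)
        scores = new_scores
    return scores


sentences = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stocks fell sharply today",
]
scores = lexrank_scores(sentences)
# The two similar sentences reinforce each other; the unrelated one scores lowest.
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])
```

Sentences that resemble many other sentences end up with higher scores, which is why the top-ranked ones tend to work as a summary.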
Installation
pip install lexrankr
Tokenizers
Tokenizers are not included; you have to implement one yourself.
Example:
from typing import List


class MyTokenizer:

    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = text.split()
        return tokens
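As a slightly more robust variant of the same callable shape (a `str` in, a `List[str]` out), the sketch below lowercases the text and drops punctuation with a regular expression. `RegexTokenizer` is a hypothetical name, and the sketch assumes that any callable with this signature is acceptable, as the example above suggests.

```python
import re
from typing import List


class RegexTokenizer:
    """Hypothetical tokenizer: lowercases text and keeps runs of word characters."""

    # \w is Unicode-aware for str patterns in Python 3, so Hangul is matched too.
    pattern = re.compile(r"\w+")

    def __call__(self, text: str) -> List[str]:
        return self.pattern.findall(text.lower())


tokenizer = RegexTokenizer()
print(tokenizer("Hello, world! LexRank is neat."))
# ['hello', 'world', 'lexrank', 'is', 'neat']
```

This only splits on non-word characters; it does no morphological analysis, so for Korean a part-of-speech tagger is likely to give better tokens.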
For Korean, one option is to use KoNLPy.
from typing import List

from konlpy.tag import Okt


class OktTokenizer:

    okt: Okt = Okt()

    def __call__(self, text: str) -> List[str]:
        # join=True returns each token as a single 'word/POS' string
        tokens: List[str] = self.okt.pos(text, norm=True, stem=True, join=True)
        return tokens
Usage
from typing import List

from lexrankr import LexRank

# 1. init
mytokenizer: MyTokenizer = MyTokenizer()
lexrank: LexRank = LexRank(mytokenizer)

# 2. summarize (like, pre-computation)
lexrank.summarize(your_text_here)

# 3. probe (like, query-time)
summaries: List[str] = lexrank.probe()
for summary in summaries:
    print(summary)
Test
Run the tests with Docker.
docker build -t lexrankr -f Dockerfile .
docker run --rm -it lexrankr
