NewsArticleClustering
Proof of concept work for clustering of news articles from RSS feeds
The scripts in this repository represent a proof of concept in clustering news articles from RSS feeds.
Usage
The main script, tm_analysis.R, drives the analysis: it calls process_feeds.py to fetch the article feeds and then performs the clustering. Utility functions for manipulating the resulting JSON and attaching the correct metadata to the tm VCorpus object are in processing_utils.R.
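To illustrate the fetch step, here is a minimal Python sketch of turning RSS entries into JSON records that an R script could read with jsonlite. It is not the contents of process_feeds.py: the function names (entries_to_records, fetch_feed, records_to_json) and the record fields are assumptions made for illustration, and only the feedparser.parse call is taken from that library's API.

```python
import json


def entries_to_records(entries, source):
    """Convert dict-like feed entries into plain records tagged with their source.

    The field names here are hypothetical, chosen for illustration only.
    """
    records = []
    for entry in entries:
        title = (entry.get("title") or "").strip()
        if not title:
            continue  # skip entries without a usable title
        records.append({
            "source": source,
            "title": title,
            "link": entry.get("link", ""),
            "summary": (entry.get("summary") or "").strip(),
            "published": entry.get("published", ""),
        })
    return records


def fetch_feed(url, source):
    """Fetch one RSS feed and return its records (requires network access)."""
    # Third-party dependency, imported lazily so the helpers above work without it.
    import feedparser

    parsed = feedparser.parse(url)
    return entries_to_records(parsed.entries, source)


def records_to_json(records):
    """Serialise records to a JSON string, keeping non-ASCII text readable."""
    return json.dumps(records, indent=2, ensure_ascii=False)
```

Tagging each record with its source at fetch time keeps that metadata available later, for example when colouring leaf nodes by publisher. On the R side, a string or file produced this way could be loaded with jsonlite::fromJSON.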
Dependencies:
Python:
- requests
- BeautifulSoup
- feedparser
R:
- jsonlite
- tm
- SnowballC
- proxy
- dendextend
Example Visualisations:
An example of the clusters formed from 475 articles published over a 4-day period is shown below. Leaf nodes are coloured by source: blue for BBC News, green for The Guardian, and indigo for The Independent. The figures were made with the utility function plot_dend in processing_utils.R.

A zoomed-in view of a cluster of articles about Storm Desmond and the flooding in Cumbria in December 2015.
