Wayback2csv

Transform a datapoint from a website into a CSV time-series dataset using the Wayback Machine.

Install / Use

/learn @jeremybmerrill/Wayback2csv

README

wayback2csv

A module that fetches a set of URLs from the Internet Archive and parses a specific datapoint out of each one into a CSV, alongside the date.

For example, you can use it to get historical follower counts from the Internet Archive:

from wayback2csv.wayback2csv import Wayback2Csv
from sys import argv

username = argv[1]

w2c = Wayback2Csv("wayback2csv wayback2csv@example.com", f"twitter.com/{username}", from_date="2022")
w2c.download()
w2c.parse_html(".ProfileNav-item--followers .ProfileNav-value", lambda x: x.get("data-count") if x.get("data-count") else "None")
w2c.to_csv(f"data/{username}_twitter_followers_over_time.csv", [username])
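For a sense of what the date column involves: Wayback Machine captures are identified by a 14-digit timestamp of the form YYYYMMDDhhmmss. The sketch below shows one way such timestamps and parsed values could be turned into date,value rows. It uses only the standard library and is not wayback2csv's own code; the helper names, the sample values, and the column layout are assumptions for illustration, and the file written by to_csv may look different.

```python
import csv
import io
from datetime import datetime


def wayback_timestamp_to_date(timestamp: str) -> str:
    """Convert a 14-digit Wayback timestamp (YYYYMMDDhhmmss) to an ISO date string."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").date().isoformat()


def rows_to_csv(rows, header):
    """Render (timestamp, value) pairs as CSV text with a date column (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for timestamp, value in rows:
        writer.writerow([wayback_timestamp_to_date(timestamp), value])
    return buffer.getvalue()


# Made-up capture timestamps and follower counts, for illustration only.
sample = [("20220115083000", "1200"), ("20220301120000", "1350")]
csv_text = rows_to_csv(sample, ["date", "followers"])
print(csv_text)
```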

If you need to parse a page in more than one way (as with Twitter), just call parse_html multiple times. A parser that fails on a given page is skipped for that page, so this works as long as each parser fails whenever it doesn't apply.

import json  # needed by the second parse_html call below
from sys import argv

from wayback2csv.wayback2csv import Wayback2Csv

username = argv[1]

w2c = Wayback2Csv("wayback2csv wayback2csv@example.com", f"twitter.com/{username}", from_date="2022")
w2c.download()
w2c.parse_html(".ProfileNav-item--followers .ProfileNav-value", lambda x: x.get("data-count") if x.get("data-count") else "None")
w2c.parse_html("script[data-rh=true]", lambda x:[i for i in json.loads(x.text)["author"]["interactionStatistic"] if i["name"] == "Follows"][0]['userInteractionCount']) # just call it twice!
w2c.to_csv(f"data/{username}_twitter_followers_over_time.csv", [username])
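The skip-on-failure idea behind calling parse_html more than once can be pictured with a small standalone sketch. This is not the library's implementation: first_successful, the two parser functions, and the sample inputs are all hypothetical, and the library may combine results differently (for instance, it may not stop at the first parser that succeeds).

```python
import json


def first_successful(parsers, raw):
    """Return the result of the first parser that does not raise, or None if all fail."""
    for parser in parsers:
        try:
            return parser(raw)
        except (KeyError, IndexError, TypeError, ValueError):
            # This parser does not apply to this input; try the next one.
            continue
    return None


def parse_plain_count(raw):
    """Hypothetical older format: the count is a bare integer string."""
    return int(raw)


def parse_json_count(raw):
    """Hypothetical newer format: the count sits inside a JSON document."""
    stats = json.loads(raw)["author"]["interactionStatistic"]
    return [s for s in stats if s["name"] == "Follows"][0]["userInteractionCount"]


parsers = [parse_plain_count, parse_json_count]
samples = [
    "1200",
    '{"author": {"interactionStatistic": [{"name": "Follows", "userInteractionCount": 1350}]}}',
    "<html></html>",  # neither parser applies, so the result is None
]
results = [first_successful(parsers, raw) for raw in samples]
print(results)
```

The point of the design is that each parser only needs to raise when its format is absent; no parser has to know about the others.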

Install instructions

git clone the repo and cd into the dir
pip install -e .
go to town!
