Wayback2csv
Transform a datapoint from a website into a CSV time-series dataset using the Wayback Machine.
wayback2csv
A module that gets a set of archived snapshots of a URL from the Internet Archive and parses a specific datapoint out of each one into a CSV, alongside the snapshot date.
It is useful, for instance, for recovering an account's follower counts over time from the Internet Archive.
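The Wayback Machine publishes its list of captures through the public CDX server API. The sketch below shows one way to list the capture timestamps for a URL and build a replay link for each. It assumes captures are listed via CDX and is not necessarily how wayback2csv fetches them; the helper names `cdx_query_url` and `snapshot_urls` are hypothetical.

```python
import json
from urllib.parse import urlencode


def cdx_query_url(target: str, from_date: str) -> str:
    """Build a Wayback CDX API query listing captures of `target` since `from_date`.

    `from_date` is a Wayback-style timestamp prefix such as "2022" or "202203".
    (Hypothetical helper, not part of wayback2csv.)
    """
    params = {
        "url": target,
        "from": from_date,
        "output": "json",
        "filter": "statuscode:200",  # keep only successful captures
        "collapse": "digest",        # drop adjacent captures with identical content
    }
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)


def snapshot_urls(cdx_json: str) -> list[tuple[str, str]]:
    """Turn a CDX JSON response into (timestamp, replay URL) pairs.

    With output=json the first row holds the field names, so columns are looked up by name.
    """
    rows = json.loads(cdx_json)
    if not rows:
        return []  # no captures in the requested range
    header, *captures = rows
    ts_col, url_col = header.index("timestamp"), header.index("original")
    return [
        # the "id_" flag requests the archived page without the Wayback replay rewriting
        (row[ts_col], f"https://web.archive.org/web/{row[ts_col]}id_/{row[url_col]}")
        for row in captures
    ]
```

To try it against the live service, pass `urllib.request.urlopen(cdx_query_url("twitter.com/jack", "2022")).read().decode()` to `snapshot_urls`, then fetch each replay URL.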
For example, to track a Twitter user's follower count:
from wayback2csv.wayback2csv import Wayback2Csv
from sys import argv
username = argv[1]
w2c = Wayback2Csv("wayback2csv wayback2csv@example.com", f"twitter.com/{username}", from_date="2022")
w2c.download()
w2c.parse_html(".ProfileNav-item--followers .ProfileNav-value", lambda x: x.get("data-count") if x.get("data-count") else "None")
w2c.to_csv(f"data/{username}_twitter_followers_over_time.csv", [username])
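The CSV written by `to_csv` pairs each parsed value with its capture date. Its exact layout isn't documented here, so the following is only a sketch of the idea using the standard `csv` module: it converts 14-digit Wayback timestamps (`YYYYMMDDhhmmss`) into ISO dates and writes one row per capture. The function name `write_timeseries`, the `date` header, and the sample values are hypothetical.

```python
import csv
import io
from datetime import datetime


def write_timeseries(rows: list[tuple[str, str]], column: str, out) -> None:
    """Write (wayback_timestamp, value) pairs as a two-column date/value CSV.

    `out` is any text file-like object; the header names are illustrative only.
    """
    writer = csv.writer(out)
    writer.writerow(["date", column])
    # 14-digit Wayback timestamps sort chronologically as plain strings
    for timestamp, value in sorted(rows):
        captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
        writer.writerow([captured.date().isoformat(), value])


# Usage with made-up sample values, written to an in-memory buffer
buf = io.StringIO()
write_timeseries([("20220301120000", "1500"), ("20220101000000", "1200")], "jack", buf)
print(buf.getvalue())
```

Sorting by the raw timestamp keeps the series in chronological order even if captures were parsed out of order.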
If you need to parse a page more than one way (as with Twitter, whose captures don't all share the same markup), just call parse_html once per method. As long as each method fails on pages where it doesn't apply, it will be skipped for those pages.
import json  # used by the second parse_html lambda below
from wayback2csv.wayback2csv import Wayback2Csv
from sys import argv
username = argv[1]
w2c = Wayback2Csv("wayback2csv wayback2csv@example.com", f"twitter.com/{username}", from_date="2022")
w2c.download()
w2c.parse_html(".ProfileNav-item--followers .ProfileNav-value", lambda x: x.get("data-count") if x.get("data-count") else "None")
w2c.parse_html("script[data-rh=true]", lambda x:[i for i in json.loads(x.text)["author"]["interactionStatistic"] if i["name"] == "Follows"][0]['userInteractionCount']) # just call it twice!
w2c.to_csv(f"data/{username}_twitter_followers_over_time.csv", [username])
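The "fail and get skipped" behaviour described before the second example amounts to trying several extractors in order and keeping the first one that succeeds. Below is a stdlib-only sketch of that pattern. It uses regular expressions instead of CSS selectors so it runs without extra dependencies, and the function names and the first-success rule are assumptions, not wayback2csv's actual implementation.

```python
import json
import re
from typing import Callable, Optional

Extractor = Callable[[str], str]


def profile_nav_followers(html: str) -> str:
    """Markup targeted by the first parse_html call: a data-count attribute."""
    match = re.search(r'ProfileNav-item--followers.*?data-count="(\d+)"', html, re.DOTALL)
    if match is None:
        raise ValueError("no ProfileNav follower count on this page")
    return match.group(1)


def script_followers(html: str) -> str:
    """Markup targeted by the second parse_html call: JSON in <script data-rh="true">."""
    match = re.search(r'<script[^>]*data-rh="true"[^>]*>(.*?)</script>', html, re.DOTALL)
    if match is None:
        raise ValueError("no data-rh script on this page")
    stats = json.loads(match.group(1))["author"]["interactionStatistic"]
    follows = [s["userInteractionCount"] for s in stats if s["name"] == "Follows"]
    return str(follows[0])  # IndexError if no "Follows" entry


def first_successful(html: str, extractors: list[Extractor]) -> Optional[str]:
    """Return the result of the first extractor that doesn't raise; None if all fail."""
    for extract in extractors:
        try:
            return extract(html)
        except (ValueError, KeyError, IndexError, TypeError):
            continue  # this extractor doesn't fit this capture's markup; try the next
    return None
```

Catching only the errors a mismatched page is expected to raise (rather than every exception) keeps genuine bugs in an extractor from being silently skipped.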
Install instructions
- git clone the repo and cd into the dir
- pip install -e .
- go to town!
