Wikifier
Pytorch implementation of a BiLSTM model for the Wikification project.
Install / Use
/learn @LeonardoEmili/WikifierREADME
Wikifier
Wikification is the process of labeling input sentences into concepts from Wikipedia. The repository contains a major script for scraping text from Wikipedia dumps and parsing it into a dataset, the model for annotating sentences and an asynchronous web scraper for generating the dataset dynamically starting from a Wikipedia page used as seed.
Prerequisites
You can install the required dependencies using the Python package manager (pip):
pip3 install aiohttp
pip3 install cchardet
pip3 install aiodns
pip3 install wikipedia
pip3 install requests
Getting Started
First, we need to get the data. Wikiparser is a web scraper that loads dumps from XML files and stores the dataset as a collection of compressed files. You can run the script using the following syntax:
python3 WikiParser.py [OPTION]... URL... [-n NUM]
python3 WikiParser.py [OPTION]... [-n NUM]
python3 WikiParser.py [OPTION]... URL...
Built With
- AIOHTTP - Asynchronous HTTP Client used
- Beautiful Soup - Library for parsing HTML
- mwparserfromhell - A parser for MediaWiki wikicode
- wikipedia - A wrapper for the MediaWiki API
Authors
- Leonardo Emili - LeonardoEmili
Related Skills
YC-Killer
2.7kA library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.
groundhog
399Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).
last30days-skill
18.8kAI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary
sec-edgar-agentkit
10AI agent toolkit for accessing and analyzing SEC EDGAR filing data. Build intelligent agents with LangChain, MCP-use, Gradio, Dify, and smolagents to analyze financial statements, insider trading, and company filings.
