Ruia
Async Python 3.6+ web scraping micro-framework based on asyncio

Overview
Ruia is an async web scraping micro-framework, written with asyncio and aiohttp,
that aims to make crawling URLs as convenient as possible.
Write less, run faster:
- Documentation: Chinese documentation | English documentation
- Organization: python-ruia
- Plugin: awesome-ruia (any contributions you make are greatly appreciated!)
Features
- Easy: Declarative programming
- Fast: Powered by asyncio
- Extensible: Middlewares and plugins
- Powerful: JavaScript support
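The "Fast: Powered by asyncio" point can be illustrated without Ruia itself. The following is a minimal standard-library sketch, not Ruia's implementation: `fake_fetch`, `crawl`, and the `concurrency` parameter are hypothetical names, and the network request is simulated with `asyncio.sleep`. It shows how several requests can wait at the same time instead of one after another, with a semaphore capping how many are in flight.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.1) -> str:
    """Stand-in for a network request: sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return f"response from {url}"


async def crawl(urls, concurrency: int = 3):
    """Fetch every URL concurrently, with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url):
        # The semaphore limits how many simulated requests run at once.
        async with semaphore:
            return await fake_fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    # asyncio.run() requires Python 3.7 or newer.
    urls = [f"https://example.com/page/{i}" for i in range(6)]
    start = time.perf_counter()
    results = asyncio.run(crawl(urls))
    elapsed = time.perf_counter() - start
    print(len(results), "pages in", round(elapsed, 2), "seconds")
```

With six simulated requests of 0.1 s each and a limit of three at a time, the waits overlap in two batches rather than running as six sequential waits.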
Installation
# For Linux & Mac
pip install -U "ruia[uvloop]"
# For Windows
pip install -U ruia
# Latest development version (new features)
pip install git+https://github.com/howie6879/ruia
Tutorials
- Overview
- Installation
- Define Data Items
- Spider Control
- Request & Response
- Customize Middleware
- Write a Plugin
TODO
- [x] Cache for debugging, to reduce the number of requests sent: ruia-cache
- [x] Provide an easy way to debug scripts: ruia-shell
- [ ] Distributed crawling/scraping
Contribution
Ruia is still under development. Feel free to open issues and pull requests:
- Report or fix bugs
- Request or publish plugins
- Write or fix documentation
- Add test cases
<a href="https://github.com/howie6879"><img src="https://avatars.githubusercontent.com/u/17047388?s=60&v=4" title="howie6879" width="50" height="50"></a> <a href="https://github.com/panhaoyu"><img src="https://avatars.githubusercontent.com/u/23495987?s=60&v=4" title="panhaoyut" width="50" height="50"></a> <a href="https://github.com/mirzazulfan"><img src="https://avatars.githubusercontent.com/u/36124339?s=64&v=4" title="mirzazulfan" width="50" height="50"></a> <a href="https://github.com/abmyii"><img src="https://avatars.githubusercontent.com/u/52673001?s=60&v=4" title="abmyii" width="50" height="50"></a> <a href="https://github.com/maxzheng"><img src="https://avatars.githubusercontent.com/u/9684260?s=60&v=4" title="maxzheng" width="50" height="50"></a> <a href="https://github.com/ruter"><img src="https://avatars.githubusercontent.com/u/8568876?s=60&v=4" title="ruter" width="50" height="50"></a> <a href="https://github.com/duolaAOA"><img src="https://avatars.githubusercontent.com/u/26339233?s=60&v=4" title="duolaAOA" width="50" height="50"></a> <a href="https://github.com/fengdongfa1995"><img src="https://avatars.githubusercontent.com/u/20141092?s=60&v=4" title="fengdongfa1995" width="50" height="50"></a> <a href="https://github.com/daijiangtian"><img src="https://avatars.githubusercontent.com/u/18069191?s=60&v=4" title="daijiangtian" width="50" height="50"></a> <a href="https://github.com/scott-stoltzman-consulting"><img src="https://avatars.githubusercontent.com/u/66376167?s=60&v=4" title="consulting" width="50" height="50"></a> <a href="https://github.com/Leezj9671"><img src="https://avatars.githubusercontent.com/u/11917826?s=60&v=4" title="Leezj9671" width="50" height="50"></a>
Notice: We use black to format the code.
