XzCrawler
xzCrawler
Saves posts from xianzhi as local files. **ArchiveBox is highly recommended.**
How to Use?
.
├── LICENSE
├── README.md
├── crawler.py            # uses sqlite for incremental updates
├── crawler_set.py        # uses a Python set for incremental updates
├── db.py                 # sqlite module
├── main.py               # main script to download all the posts
├── requirements.txt
└── xzCrawler-Flask       # Flask server (independent part)
    ├── app.py
    ├── crawler_flask.py  # crawler modified to work with the Flask server
    ├── db.py
    ├── index.py          # index module
    ├── indexDoc.py       # indexes all the posts using the index module
    ├── main.py
    └── templates
        ├── form.html
        └── search_result.html
Just Use Crawler
⚠️: Change the range in main.py first.
- Install the requirements with pip.
    pip install -r requirements.txt
- Run the script; all the HTML files are saved in doc.
    python main.py
Flask Extend
- Change into the Flask directory.
    cd xzCrawler-Flask
- Run main.py, which uses the other crawler (crawler_sel.py).
- Index all the HTML files; the result is stored in SearchIndex or the path you chose.
    python indexDoc.py
- Run the Flask server.
    python app.py
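The indexing step can be pictured as adding each saved HTML file to an index one after another and then querying it. The sketch below is only an illustration of that idea using the Python standard library; it is not the project's index.py, which lists Whoosh and Jieba as its libraries. The names `SimpleIndex`, `html_to_text`, and `tokenize` are hypothetical, and the regex tokenizer is an assumption that does not segment Chinese text the way Jieba would.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the plain text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Split text into lowercased runs of word characters (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


class SimpleIndex:
    """A tiny in-memory inverted index: token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, html):
        # Documents are added one at a time, in whatever order they arrive.
        for token in tokenize(html_to_text(html)):
            self.postings[token].add(doc_id)

    def search(self, query):
        # Return the ids of documents that contain every query token.
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result
```

A caller would loop over the saved HTML files, pass each file name and its content to `add_document`, and hand the text from the search form to `search`.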
Features
- Incremental update
  - Python sqlite: database.db stores the post URLs that have already been requested
- Index (Flask Extend)
  A simple index script that adds content sequentially; a stronger script is coming soon 😝.
  Whoosh, Jieba
- Main site
  A simple main site that just provides a search bar.
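As a rough illustration of the sqlite-based incremental update, the sketch below records each requested URL in a table and skips URLs that are already present. It is a minimal sketch under stated assumptions, not the project's db.py or crawler.py: the table name `requested`, the function names, and the `fetch` callback are all hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database that holds already-requested URLs."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per URL, the URL itself is the key.
    conn.execute("CREATE TABLE IF NOT EXISTS requested (url TEXT PRIMARY KEY)")
    return conn


def is_requested(conn, url):
    """Return True if the URL was recorded by an earlier run."""
    row = conn.execute(
        "SELECT 1 FROM requested WHERE url = ?", (url,)
    ).fetchone()
    return row is not None


def mark_requested(conn, url):
    """Record the URL; inserting a URL twice is a no-op."""
    conn.execute("INSERT OR IGNORE INTO requested (url) VALUES (?)", (url,))
    conn.commit()


def crawl(conn, urls, fetch):
    """Call fetch(url) only for URLs not seen before; return those URLs."""
    fetched = []
    for url in urls:
        if is_requested(conn, url):
            continue  # already downloaded in an earlier run
        fetch(url)
        mark_requested(conn, url)
        fetched.append(url)
    return fetched
```

Passing a file path such as database.db to `open_db` instead of the in-memory default keeps the record between runs, so a second run only fetches posts that were not downloaded before.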
