xzCrawler

Save a file for xianzhi. ** Highly recommended ArchiveBox **

How to Use?

.
├── LICENSE
├── README.md
├── crawler.py          # use sqlite to incremental update
├── crawler_set.py      # use python set to incremental update
├── db.py               # sqlite module
├── main.py             # Main script to download all the post
├── requirements.txt
└── xzCrawler-Flask         # Flask Server (Independent part)
    ├── app.py
    ├── crawler_flask.py    # the magically modified crawler to match flask server
    ├── db.py
    ├── index.py            # index module
    ├── indexDoc.py         # index all the post use index module
    ├── main.py
    └── templates
        ├── form.html
        └── search_result.html

Just Use Crawler

⚠️: Change the range in main.py first.

pip install the requirments.
```
pip install -r requirements.txt
```
Run the script and all the htmls in doc
```
python main.py
```

Flask Extend

cd xzCrawler-Flask
Run main.py use the another crawler(crawler_sel.py)
Index all html, result will be stored in SearchIndex or the path you chose.
```
python indexDoc.py
```
Run Flask Server
```
python app.py
```

Features

Incremental update
- Python sqlite : database.db stores the requested past-url
Index(Flask Extend)

A simple index script, Implemented by adding content sequentially, a strong script will coming soon😝.
- Whoosh
- Jieba
Main site

A simple main site just provide a search bur.
- search

XzCrawler

Install / Use

README

xzCrawler

How to Use?

Just Use Crawler

Flask Extend

Features