💽 Telegram Scraper Dump
Index messages from your Telegram account and find juicy content in them.
My use case
I'm in several leak channels and needed a way to quickly search for interesting files and download them.
❓ How to use
🔧 Setup
- Clone this repo
- Install dependencies with `poetry install`
- Run `poetry run python indexer.py` to initialize the config.yaml file, then follow the instructions
🗃️ Indexing
- Run `poetry run python indexer.py` again to index your dialogs
- In MongoDB, open the `channels` collection and enable the channels you want to index
- Run `poetry run python indexer.py` again to index the selected channels (the first run takes a while, depending on the number of messages and dialogs you have enabled)
Note
You can rerun the indexer at any time to update the index with new messages
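The "enable the channels" step can also be scripted instead of done by hand in a GUI. The sketch below is a rough illustration only: the `title` and `enabled` field names, the database name `telegram`, and both function names are assumptions that do not come from this project, so inspect a real document in `channels` first and adjust them.

```python
def build_enable_request(channel_titles):
    """Return the (filter, update) pair that marks the given channels as enabled.

    ASSUMPTION: each document in `channels` has a `title` field and an
    `enabled` boolean. Both names are hypothetical; check a real document.
    """
    filter_doc = {"title": {"$in": list(channel_titles)}}
    update_doc = {"$set": {"enabled": True}}
    return filter_doc, update_doc


def enable_channels(channels_collection, channel_titles):
    """Apply the update to a pymongo collection; return how many documents changed."""
    filter_doc, update_doc = build_enable_request(channel_titles)
    result = channels_collection.update_many(filter_doc, update_doc)
    return result.modified_count


# Usage against a local instance (requires the pymongo package; the
# database name "telegram" is a placeholder, not taken from the project,
# and credentials may be needed depending on your MongoDB setup):
#
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017")["telegram"]
#   print(enable_channels(db["channels"], ["Some channel"]))
```

Keeping the filter and update as plain dictionaries makes it easy to print and review them before anything is written to the database.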
🔍 Searching
- In MongoDB, open the `messages` collection and find the type of content you want to download
- Write a query that matches this content and add it to the `mongodb_download_filter` field of the config.yaml file as YAML (see the example in the config.yaml file)
- Run `poetry run python downloader.py` to download the content
Note
You can rerun the downloader at any time to download only the new content.
The Telegram API is rate limited, so your download speed is limited by Telegram.
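To illustrate the "add it as YAML" step, here is a small sketch that turns a MongoDB filter written as a Python dict into YAML text. It assumes the filter sits as a plain mapping directly under `mongodb_download_filter` and that field names are simple unquoted keys; the example in your generated config.yaml remains the authoritative layout. The helper name `filter_to_yaml` is made up for this example.

```python
import json


def filter_to_yaml(filter_doc, key="mongodb_download_filter", indent=2):
    """Render a MongoDB filter dict as a YAML mapping under `key`.

    ASSUMPTION: the config expects a mapping nested under `key`.
    """
    pad = " " * indent
    lines = [f"{key}:"]
    for field, value in filter_doc.items():
        # json.dumps output (quoted strings, numbers, true/false/null,
        # [...] and {...}) is also valid YAML flow syntax.
        lines.append(f"{pad}{field}: {json.dumps(value)}")
    return "\n".join(lines)


print(filter_to_yaml({"type": "messagemediadocument", "mime_type": "text/plain"}))
# prints:
# mongodb_download_filter:
#   type: "messagemediadocument"
#   mime_type: "text/plain"
```

Nested operators such as `{"$in": [...]}` come out in flow style on a single line, which YAML parsers generally accept.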
Results
<img src="img/sample.png" alt="result sample">
Query
db.messages.find({type: 'messagemediadocument', mime_type: 'text/plain'}, {_id: 0, filename: 1})
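The same query can be run from Python if you prefer a script over the Mongo shell. This is a minimal sketch, assuming the `pymongo` package and a collection handle you create yourself; the function name `list_text_filenames` and the database name in the usage comment are placeholders, while the filter and projection are copied from the query above.

```python
# Filter and projection taken from the shell query above.
TEXT_FILE_FILTER = {"type": "messagemediadocument", "mime_type": "text/plain"}
FILENAME_ONLY = {"_id": 0, "filename": 1}


def list_text_filenames(messages_collection):
    """Return the filename of every indexed plain-text document message."""
    cursor = messages_collection.find(TEXT_FILE_FILTER, FILENAME_ONLY)
    # .get() tolerates documents that have no filename field.
    return [doc.get("filename") for doc in cursor]


# Usage (requires the pymongo package; "telegram" is a placeholder database name):
#
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017")["telegram"]
#   for name in list_text_filenames(db["messages"]):
#       print(name)
```

Previewing matches this way is a cheap check before putting the same filter into `mongodb_download_filter`.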
🐳 MongoDB With docker
You can use the docker-compose file to run a MongoDB instance with a web interface:
docker-compose up
- MongoDB port: `27017`
- MongoDB data directory: `./data`
- MongoExpress web port: `8081`
🔩 Tools
- Poetry - Python dependency management
- MongodbClient - Mongodb GUI
