NetExtract
NetExtract: Efficiently extract core content from any webpage and convert it to clean, LLM-optimized Markdown with a simple API.

Features
- Core Content Extraction: Seamlessly extracts essential content from any URL.
- Markdown Conversion: Converts webpage content into clean, well-formatted Markdown.
- Social Media Scraping: Efficiently scrapes and formats X (Twitter) posts.
- Simple API Integration: Easily integrates with existing systems.
- LLM-Powered Conversion: Utilizes open-source large language models to enhance the extraction and conversion process, ensuring high-quality output.
📖 Usage
To use NetExtract, pass the URL of the page you want to extract as the url query parameter of the API endpoint:
http://{your_address}/api?url={url}
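A minimal sketch of calling this endpoint from TypeScript. The helper names `buildExtractUrl` and `fetchMarkdown` are hypothetical and not part of NetExtract, and the sketch assumes that the endpoint returns the Markdown in the response body and that Node.js 18+ (global `fetch`) is available. Percent-encoding the target URL is a precaution taken here so that the target's own query string is not read as parameters of the NetExtract request.

```typescript
// Hypothetical helper (not part of NetExtract): builds the request URL.
// `baseAddress` stands in for {your_address}, e.g. a host:port pair.
function buildExtractUrl(baseAddress: string, targetUrl: string): string {
  // Encode the target so its own "?" and "&" stay inside the url parameter.
  return `http://${baseAddress}/api?url=${encodeURIComponent(targetUrl)}`;
}

// Hypothetical helper: requests the page and returns the response body as text.
// Assumes the endpoint responds with the extracted Markdown in the body.
async function fetchMarkdown(baseAddress: string, targetUrl: string): Promise<string> {
  const response = await fetch(buildExtractUrl(baseAddress, targetUrl));
  if (!response.ok) {
    throw new Error(`NetExtract request failed with status ${response.status}`);
  }
  return response.text();
}
```

For example, `fetchMarkdown("localhost:3000", "https://example.com/post?id=1")` would request the page through a locally running instance. The address `localhost:3000` is a placeholder; use the address and port your deployment actually exposes.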
🗂️ Getting started with Docker
Clone the repository and change into its directory:
git clone https://github.com/sabber-slt/NetExtract
cd NetExtract
Then run the application with Docker:
docker compose up -d
⚡️ Acknowledgments
- Inspired by jina.ai
- Built with Node.js, Express.js, TypeScript, and Puppeteer
🧩 Structure
.
├── cookie
│ └── twitter.json # Twitter cookie for X (Twitter) post scraping
├── docs # Documentation files
├── search # SearXNG search engine
├── src # Source code
│ ├── interfaces # TypeScript interfaces
│ ├── lib # Utility libraries
│ ├── routes # Express route handlers
│ ├── services # Core service layer for business logic
│ ├── utils # Helper functions and utilities
│ └── app.ts # Main application entry point
├── .env # Environment variables
├── .gitignore # Git ignored files
├── .prettierignore # Prettier ignored files
├── .prettierrc.js # Prettier configuration
├── app.log # Log file
├── Dockerfile # Docker image build instructions
├── docker-compose.yaml # Docker Compose configuration
├── package.json # Node.js project metadata
├── README.md # Project README
├── tsconfig.json # TypeScript configuration
└── yarn.lock # Yarn lockfile for dependency management
🤝 Contributing
I welcome and appreciate contributions! Feel free to open issues, fork the repository, and send pull requests.
