PyPlexitas
PyPlexitas is an open-source Python CLI alternative to Perplexity AI, designed to perform web searches, scrape content, generate embeddings, and answer questions using state-of-the-art language models.
🌟 PyPlexitas
PyPlexitas is a Python script designed as an open-source alternative to Perplexity AI, a tool that answers user queries in detail by searching the web, extracting relevant content, and using advanced language models to generate responses.
The script first takes a user’s query and uses a search engine (Bing or Google) or Gmail to find relevant web pages or emails. It then scrapes the content from these pages or emails, splits the text into manageable chunks, and generates vector embeddings for these chunks. Vector embeddings are numerical representations of text that allow for efficient searching and comparison of content. These embeddings are stored in a vector database (Qdrant), enabling quick retrieval of relevant information based on the user’s query.
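The chunk, embed, and retrieve flow described above can be illustrated with a small, dependency-free sketch. This is not PyPlexitas's actual code: the names `chunk_text`, `toy_embed`, `cosine`, and `top_k` are hypothetical, the word-count vector is only a stand-in for real embeddings from OpenAI or Ollama, and a plain Python list stands in for the vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query as (score, chunk) pairs."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, toy_embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval step has the same shape: embed the query, compare it against the stored chunk vectors, and keep the closest matches.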
Once the content is processed and stored, the script uses a language model to generate a detailed answer to the user’s query, using the information extracted from the web pages. This response is designed to be accurate and informative, drawing directly from the content found during the search process.
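One way to get answers that cite their sources as `[1]`, `[2]` is to number each retrieved passage in the prompt sent to the language model. The sketch below assumes that approach; the README does not show the prompt PyPlexitas actually uses, and `build_prompt` and the `url`/`text` dictionary keys are hypothetical.

```python
def build_prompt(question, sources):
    """Assemble an LLM prompt that numbers each source so the answer can cite [n]."""
    lines = [
        "Answer the question using only the numbered sources below.",
        "Cite sources inline as [n].",
        "",
    ]
    for n, src in enumerate(sources, start=1):
        lines.append(f"[{n}] {src['url']}")  # source header the model can cite
        lines.append(src["text"])            # retrieved chunk text
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

The numbering in the prompt is what lets the sources list at the end of the answer line up with the inline citations.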
Example:
python PyPlexitas.py -q "When will the next model GPT-5 be released" -s 10 --engine google
Expected Output:
Searching for 🔎: When will the next model GPT-5 be released using google
Starting Google search ⏳
Google search returned 🔗: 10 results
From domains 🌐: mashable.com www.reddit.com www.tomsguide.com www.datacamp.com medium.com www.standard.co.uk www.theverge.com arstechnica.com
Scraping content from search results...
Embedding content ✨
Total embeddings 📊: 10
Total chunks processed 🧩: 7
Answering your query: When will the next model GPT-5 be released 🙋
The release date for GPT-5 is currently expected to be sometime in mid-2024, likely during the summer, according to a report from Business Insider [1][2]. OpenAI representatives have not provided a specific release date, and the timeline may be subject to change depending on the duration of safety testing and other factors [1][2]. OpenAI CEO Sam Altman has indicated that a major AI model will be released this year, but it is unclear whether it will be called GPT-5 or something else [1].
### Sources
1. Benj Edwards - https://arstechnica.com/information-technology/2024/03/gpt-5-might-arrive-this-summer-as-a-materially-better-update-to-chatgpt/
2. Saqib Shah - https://www.standard.co.uk/tech/openai-chatgpt-5-release-date-b1076129.html
Features
- Web Search: Perform web searches using Bing, Google, or Gmail APIs.
- Content Scraping: Scrape content from search results.
- Embedding Generation: Generate embeddings for content using OpenAI or Ollama models.
- Question Answering: Answer questions based on the scraped content.
- Vector Database: Use Qdrant for storing and querying embeddings.
Installation
- Clone the repository:
  git clone https://github.com/dkruyt/PyPlexitas.git
  cd PyPlexitas
- Install the required Python packages:
  pip install -r requirements.txt
- Set up the Qdrant service using Docker:
  docker-compose up -d
Configuration
Configure your environment variables by creating a .env file in the project root. Use the provided example.env as a template:
cp example.env .env
Fill in your API keys and other necessary details in the .env file:
- OPENAI_API_KEY
- GOOGLE_API_KEY
- GOOGLE_CX
- BING_SUBSCRIPTION_KEY
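Before running the script, it can help to check that the keys you need are actually set. The following is a minimal sketch using only the standard library; `parse_env` and `missing_keys` are hypothetical helpers, not part of PyPlexitas, and the parser handles only simple `KEY=value` lines. Which keys you need depends on the search engine and models you use.

```python
def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        env = parse_env(handle.read())
    # Example: a Google-based run, assuming OpenAI is used for the models.
    print(missing_keys(env, ["OPENAI_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CX"]))
```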
Getting API Keys
OpenAI API Key
- Sign up or log in to your OpenAI account.
- Go to the API section and generate a new API key.
- Copy the API key and add it to the OPENAI_API_KEY field in your .env file.
Google Custom Search API Key and CX
- Go to the Google Cloud Console.
- Create a new project or select an existing project.
- Enable the Custom Search API in the APIs & Services library.
- Go to the Credentials page and create an API key.
- Copy the API key and add it to the GOOGLE_API_KEY field in your .env file.
- To get the Custom Search Engine (CX) ID, go to the Custom Search Engine page.
- Create a new search engine or select an existing one.
- Copy the Search Engine ID (CX) and add it to the GOOGLE_CX field in your .env file.
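To show how the API key and CX fit together, here is a sketch that builds a request URL for Google's Custom Search JSON API, where `key`, `cx`, `q`, and `num` are query parameters. The function name `google_search_url` is hypothetical, and this is not necessarily how PyPlexitas issues the request. The function only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

GOOGLE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def google_search_url(query, api_key, cx, num=10):
    """Build a Custom Search JSON API URL; the API accepts num from 1 to 10."""
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    params = urlencode({"key": api_key, "cx": cx, "q": query, "num": num})
    return f"{GOOGLE_ENDPOINT}?{params}"
```

Opening the resulting URL in a browser or with an HTTP client is a quick way to confirm that both values in your .env file are valid.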
Bing Search API Key
- Sign up or log in to your Microsoft Azure account.
- Create a new Azure resource for Bing Search v7.
- Go to the Keys and Endpoint section to find your API key.
- Copy the API key and add it to the BING_SUBSCRIPTION_KEY field in your .env file.
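Unlike Google's API, Bing Web Search v7 takes the key as an HTTP header rather than a query parameter. The sketch below assumes the documented v7 endpoint and the `Ocp-Apim-Subscription-Key` header; `bing_search_request` is a hypothetical helper, not PyPlexitas's own code, and you should check Microsoft's current documentation for endpoint availability. It builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def bing_search_request(query, subscription_key, count=10):
    """Build (but do not send) a Bing Web Search v7 GET request."""
    params = urlencode({"q": query, "count": count})
    return Request(
        f"{BING_ENDPOINT}?{params}",
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual search.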
Gmail API Key
- Go to the Google Cloud Console.
- Create a new project or select an existing project.
- Enable the Gmail API in the APIs & Services library.
- Go to the Credentials page and create OAuth 2.0 credentials.
- Download the credentials file and save it as credentials.json in the project root.
Usage
Run the PyPlexitas script with your query:
python PyPlexitas.py -q "Your search query" -s 10 --engine bing
Options:
- -q, --query: Search query (required)
- --llm-query: Optional LLM query for answering (defaults to the search query)
- -s, --search: Number of search results to parse (default: 10)
- --engine: Search engine to use (bing, google, or gmail, default: bing)
- -l, --log-level: Set the logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL, default: ERROR)
- -t, --max-tokens: Maximum token limit for model input (default: 16000)
- --quiet: Suppress print messages
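For readers who want to see how these options map onto a command-line parser, the following `argparse` sketch mirrors the flags and defaults listed above. It is an illustration under that assumption, not a copy of the parser in PyPlexitas.py; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented PyPlexitas options."""
    parser = argparse.ArgumentParser(description="Search, embed, and answer.")
    parser.add_argument("-q", "--query", required=True, help="Search query")
    parser.add_argument("--llm-query", default=None,
                        help="Optional LLM query (falls back to the search query)")
    parser.add_argument("-s", "--search", type=int, default=10,
                        help="Number of search results to parse")
    parser.add_argument("--engine", choices=["bing", "google", "gmail"],
                        default="bing", help="Search engine to use")
    parser.add_argument("-l", "--log-level", default="ERROR",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="Logging level")
    parser.add_argument("-t", "--max-tokens", type=int, default=16000,
                        help="Maximum token limit for model input")
    parser.add_argument("--quiet", action="store_true",
                        help="Suppress print messages")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Apply the documented fallback: answer the search query itself by default.
    llm_query = args.llm_query or args.query
    print(args.engine, args.search, llm_query)
```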
Project Structure
- PyPlexitas.py: Main script for running the application.
- example.env: Example configuration file for environment variables.
- docker-compose.yml: Docker Compose configuration for Qdrant.
- requirements.txt: List of required Python packages.
- README.md: Project documentation.
Contributing
Contributions are welcome! Please fork the repository and submit a pull request for any improvements or bug fixes.
License
This project is licensed under the GPL 3.0 License. See the LICENSE file for details.
