Synthesizer
A multi-purpose LLM framework for RAG and data creation.
Install / Use
/learn @SciPhi-AI/SynthesizerREADME
Synthesizer[ΨΦ]: A multi-purpose LLM framework 💡
<p align="center"> <img width="716" alt="SciPhi Logo" src="https://github.com/emrgnt-cmplxty/sciphi/assets/68796651/195367d8-54fd-4281-ace0-87ea8523f982"> </p>With Synthesizer, users can:
- Custom Data Creation: Generate datasets via LLMs that are tailored to your needs.
- Anthropic, OpenAI, vLLM, and HuggingFace.
- Retrieval-Augmented Generation (RAG) on Demand: Built-in RAG Provider Interface to anchor generated data to real-world sources.
- Turnkey integration with Agent Search API.
- Custom Data Creation: Generate datasets via LLMs that are tailored to your needs, for LLM training, RAG, and more.
Fast Setup
pip install sciphi-synthesizer
Using Synthesizer
-
Generate synthetic question-answer pairs
export SCIPHI_API_KEY=MY_SCIPHI_API_KEY python -m synthesizer.scripts.data_augmenter run --dataset="wiki_qa"tail augmented_output/config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl { "formatted_prompt": "... ### Question:\nwhat country did wine originate in\n\n### Input:\n1. URL: https://en.wikipedia.org/wiki/History%20of%20wine (Score: 0.85)\nTitle:History of wine....", { "completion": "Wine originated in the South Caucasus, which is now part of modern-day Armenia ..." -
Evaluate RAG pipeline performance
export SCIPHI_API_KEY=MY_SCIPHI_API_KEY python -m synthesizer.scripts.rag_harness --rag_provider="agent-search" --llm_provider_name="sciphi" --n_samples=25
Documentation
For more detailed information, tutorials, and API references, please visit the official Synthesizer Documentation.
Community & Support
Developing with Synthesizer
Quickly set up RAG augmented generation with your choice of provider, from OpenAI, Anhtropic, vLLM, and SciPhi:
# Requires SCIPHI_API_KEY in env
from synthesizer.core import LLMProviderName, RAGProviderName
from synthesizer.interface import LLMInterfaceManager, RAGInterfaceManager
from synthesizer.llm import GenerationConfig
# RAG Provider Settings
rag_interface = RAGInterfaceManager.get_interface_from_args(
RAGProviderName("agent-search"),
limit_hierarchical_url_results=rag_limit_hierarchical_url_results,
limit_final_pagerank_results=rag_limit_final_pagerank_results,
)
rag_context = rag_interface.get_rag_context(query)
# LLM Provider Settings
llm_interface = LLMInterfaceManager.get_interface_from_args(
LLMProviderName("openai"),
)
generation_config = GenerationConfig(
model_name=llm_model_name,
max_tokens_to_sample=llm_max_tokens_to_sample,
temperature=llm_temperature,
top_p=llm_top_p,
# other generation params here ...
)
formatted_prompt = raw_prompt.format(rag_context=rag_context)
completion = llm_interface.get_completion(formatted_prompt, generation_config)
Related Skills
YC-Killer
2.7kA library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.
flutter-tutor
Flutter Learning Tutor Guide You are a friendly computer science tutor specializing in Flutter development. Your role is to guide the student through learning Flutter step by step, not to provide d
groundhog
398Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).
last30days-skill
16.9kAI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary
