Cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry


> [!NOTE]
> This project is no longer actively maintained. We thank all contributors for their contributions.


Why use Cognita?

Langchain/LlamaIndex provide easy-to-use abstractions for quick experimentation and prototyping in Jupyter notebooks. But when things move to production, there are constraints: components should be modular, easily scalable, and extensible. This is where Cognita comes into play. Cognita uses Langchain/LlamaIndex under the hood and brings organisation to your codebase, where each RAG component is modular, API-driven, and easily extensible. Cognita can be used easily in a local setup, while at the same time offering a production-ready environment along with no-code UI support. Cognita also supports incremental indexing by default.

You can try out Cognita at: https://cognita.truefoundry.com


🎉 What's new in Cognita

  • [September, 2024] Cognita now has an AudioParser (https://github.com/fedirz/faster-whisper-server) and a VideoParser (AudioParser + MultimodalParser).
  • [August, 2024] Cognita has now moved to using Pydantic v2.
  • [July, 2024] Introducing the model gateway: a single file to manage all the models and their configurations.
  • [June, 2024] Cognita now supports its own metadata store, powered by Prisma and Postgres. You can now use Cognita entirely via the UI, without needing a local.metadata.yaml file. You can create collections and data sources and index them via the UI. This makes it easier to use Cognita without any code changes.
  • [June, 2024] Added one-click local deployment of Cognita. You can now run the entire Cognita system using docker-compose. This makes it easier to test and develop locally.
  • [May, 2024] Added support for embedding and reranking using the Infinity server. You can now use hosted services for the variety of embedding and reranking models available on Hugging Face. This reduces the burden on the main Cognita system and makes it more scalable.
  • [May, 2024] Cleaned up requirements for optional package installations for vector DBs, parsers, embedders, and rerankers.
  • [May, 2024] Conditional Docker builds with arguments for optional package installations.
  • [April, 2024] Support for a multi-modal vision parser using GPT-4.

Contents

Introduction

Cognita is an open-source framework to organize your RAG codebase, along with a frontend to play around with different RAG customizations. It provides a simple way to organize your codebase so that it becomes easy to test locally while also being deployable in a production-ready environment. The key issues that arise while productionizing a RAG system from a Jupyter notebook are:

  1. Chunking and Embedding Job: The chunking and embedding code usually needs to be abstracted out and deployed as a job. Sometimes the job will need to run on a schedule or be triggered via an event to keep the data updated.
  2. Query Service: The code that generates the answer from the query needs to be wrapped up in an API server like FastAPI and deployed as a service. This service should be able to handle multiple queries at the same time and autoscale with higher traffic.
  3. LLM / Embedding Model Deployment: Oftentimes, if we are using open-source models, we load the model in the Jupyter notebook. In production, this will need to be hosted as a separate service, and the model will need to be called via an API.
  4. Vector DB deployment: Most testing happens on vector DBs in memory or on disk. However, in production, the DBs need to be deployed in a more scalable and reliable way.
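To make point 2 concrete, here is a minimal sketch of wrapping answer generation in an HTTP service, using only the Python standard library. The `answer_query` function is a hypothetical stand-in for a real retrieval chain; Cognita itself uses FastAPI rather than `http.server`, so treat this as an illustration of the pattern, not Cognita's implementation.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def answer_query(query: str) -> str:
    # Placeholder for the real RAG chain: retrieve chunks, rerank, call an LLM.
    return f"echo: {query}"

class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body {"query": "..."} and return {"answer": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"answer": answer_query(payload.get("query", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

def serve(port: int = 0) -> ThreadingHTTPServer:
    # ThreadingHTTPServer handles concurrent queries; port 0 picks a free port.
    server = ThreadingHTTPServer(("127.0.0.1", port), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In production, this is exactly the piece that gets deployed as an autoscaling service behind a load balancer.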

Cognita makes it easy to customize and experiment with every part of a RAG system while still being able to deploy it cleanly. It also ships with a UI that makes it easier to try out different RAG configurations and see the results in real time. You can use it locally, with or without any TrueFoundry components. However, using TrueFoundry components makes it easier to test different models and deploy the system in a scalable way. Cognita also allows you to host multiple RAG systems using one app.

Advantages of using Cognita are:

  1. A central reusable repository of parsers, loaders, embedders and retrievers.
  2. Ability for non-technical users to play with UI - Upload documents and perform QnA using modules built by the development team.
  3. Fully API driven - which allows integration with other systems.

    If you use Cognita with the TrueFoundry AI Gateway, you can get logging, metrics, and a feedback mechanism for your user queries.

Features:

  1. Support for multiple document retrievers that use Similarity Search, Query Decomposition, Document Reranking, etc.
  2. Support for SOTA open-source embedding and reranking models from mixedbread-ai.
  3. Support for LLMs via Ollama.
  4. Support for incremental indexing that ingests documents in batches (reducing compute burden), keeps track of already-indexed documents, and prevents re-indexing of those docs.
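The incremental-indexing idea in point 4 can be sketched roughly as follows: fingerprint each document, skip ones whose content has not changed since the last run, and embed the rest in bounded batches. Names here (`incremental_index`, `embed_batch`) are illustrative, not Cognita's actual classes or functions.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

def incremental_index(
    docs: Iterable[Tuple[str, str]],           # (doc_id, content) pairs
    seen: Dict[str, str],                      # doc_id -> hash from the last run
    embed_batch: Callable[[List[str]], None],  # stand-in for chunk+embed+upsert
    batch_size: int = 2,
) -> List[str]:
    """Index only new or changed docs, in batches; returns ids actually indexed."""
    pending, indexed = [], []
    for doc_id, content in docs:
        digest = hashlib.sha256(content.encode()).hexdigest()
        if seen.get(doc_id) == digest:
            continue  # unchanged since last run: skip re-embedding entirely
        pending.append((doc_id, content))
        seen[doc_id] = digest
    for i in range(0, len(pending), batch_size):
        batch = pending[i : i + batch_size]
        embed_batch([content for _, content in batch])  # bounded peak compute
        indexed.extend(doc_id for doc_id, _ in batch)
    return indexed
```

On a second run over the same corpus, only documents whose hash changed are re-embedded, which is what keeps re-indexing cheap.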

:rocket: Quickstart: Running Cognita Locally

:whale: Using Docker compose (recommended - version 25+)

Cognita and all of its services can be run using docker-compose. This is the recommended way to run Cognita locally. Install Docker and docker-compose for your system from: Docker Compose

Configuring Model Providers

Before starting the services, we need to configure the model providers used for embedding and answer generation.

To start, copy models_config.sample.yaml to models_config.yaml

cp models_config.sample.yaml models_config.yaml

By default, the config has local providers enabled, which need the infinity and ollama servers to run embeddings and LLMs locally. However, if you have an OpenAI API key, you can uncomment the openai provider in models_config.yaml and update OPENAI_API_KEY in compose.env
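For illustration, enabling a hosted provider might look roughly like the fragment below. The exact schema and key names come from models_config.sample.yaml, so treat everything here as a placeholder shape rather than the authoritative format:

```yaml
model_providers:
  # Local providers (enabled by default) omitted here for brevity.
  - provider_name: openai            # illustrative; check models_config.sample.yaml
    api_key_env_var: OPENAI_API_KEY  # the key itself goes in compose.env
    llm_model_ids:
      - "gpt-4o-mini"
    embedding_model_ids:
      - "text-embedding-3-small"
```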

Now, you can run the following command to start the services:

docker-compose --env-file compose.env up
  • The compose file uses compose.env file for environment variables. You can modify it as per your needs.
  • The compose file will start the following services:
    • cognita-db - Postgres instance used to store metadata for collections and data sources.
    • qdrant-server - Used to start local vector db server.
    • cognita-backend - Used to start the FastAPI backend server for Cognita.
    • cognita-frontend - Used to start the frontend for Cognita.
  • Once the services are up, you can access the qdrant server at http://localhost:6333, the backend at http://localhost:8000 and frontend at http://localhost:5001.

To start additional services such as ollama and infinity-server you can run the following command:

docker-compose --env-file compose.env --profile ollama --profile infinity up
  • This will start additional servers: ollama for LLMs and embeddings, and infinity-server for embeddings and reranking. You can access the infinity-server at http://localhost:7997.

  • If you want to build the backend/frontend image locally, e.g., when you add new requirements/packages or take a new pull from GitHub, you can add the --build flag to the command.

docker-compose --env-file compose.env up --build

OR

docker-compose --env-file compose.env --profile ollama --profile infinity up --build

Developing in Cognita

Docker compose is a great way to run the entire Cognita system locally. Any changes that you make in the backend folder will be automatically reflected in the running backend server. You can test out different APIs and endpoints by making changes in the backend code.

:hammer_and_pick: Project Architecture

Overall, the architecture of Cognita is composed of several entities:

Cognita Components:

  1. Data Sources - These are the places that contain your documents to be indexed. Usually, these are S3 buckets, databases, TrueFoundry Artifacts, or even local disk.

  2. Metadata Store - This stores metadata about collections and data sources, powered by Prisma and Postgres.
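The modular, pluggable component design described above (parsers, loaders, data sources) can be sketched as a simple registry: new component types plug in via registration rather than changes to core code. The class and function names below are illustrative, not Cognita's actual API.

```python
from typing import Callable, Dict, List

# Registry mapping a data-source type (e.g. "localdir", "s3") to a loader.
_LOADERS: Dict[str, Callable[[str], List[str]]] = {}

def register_loader(source_type: str):
    """Decorator registering a loader so new sources plug in without core changes."""
    def wrap(fn: Callable[[str], List[str]]):
        _LOADERS[source_type] = fn
        return fn
    return wrap

@register_loader("localdir")
def load_local(uri: str) -> List[str]:
    # Stand-in: a real loader would walk the directory and read the files.
    return [f"{uri}/doc1.txt", f"{uri}/doc2.txt"]

def load_documents(source_type: str, uri: str) -> List[str]:
    # Dispatch to whichever loader was registered for this source type.
    if source_type not in _LOADERS:
        raise ValueError(f"no loader registered for {source_type!r}")
    return _LOADERS[source_type](uri)
```

The same pattern extends to parsers, embedders, and rerankers: each is looked up by name, which is what makes the system API-driven and extensible.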
