15 skills found
city96 / ComfyUI GGUF: GGUF quantization support for native ComfyUI models.
brontoguana / Krasis: Krasis is a hybrid LLM runtime focused on efficiently running larger models on consumer-grade, VRAM-limited hardware.
1038lab / ComfyUI JoyCaption: Joy Caption is a ComfyUI node using the LLaVA model to generate stylized image captions, supporting batch processing and GGUF models.
1038lab / ComfyUI MiniCPM: A custom ComfyUI node for MiniCPM vision-language models, supporting the v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.
ai-joe-git / ComfyUI Intel Arc Clean Install Windows Venv XPU: Fully automated installation scripts for ComfyUI optimized for Intel Arc GPUs (A-Series) and Intel Core Ultra iGPUs with XPU backend, Triton acceleration, and GGUF quantized model support.
airesearch-official / Z Image Turbo Windows: One-click Windows installer for Z-Image Turbo AI image generation. Optimized for low-VRAM GPUs (4GB+). Features Gradio web UI, automatic setup, and GGUF model support.
meganoob1337 / Llama Swap Vllm Boilerplate: Dynamic LLM model swapping system with Docker, vLLM integration, and GPU acceleration. Supports GGUF & Hugging Face models with automatic swapping and Traefik routing.
kantan-kanto / ComfyUI LLM Session: Local LLM session nodes for ComfyUI using GGUF and llama.cpp, supporting Llama, Mistral, Qwen, DeepSeek, GLM, Gemma, Phi, LLaVA, and gpt-oss, enabling both user–model chat and model-to-model dialogue without external runtimes like Ollama.
DevMaan707 / Llm Toolkit: A comprehensive Flutter SDK for running Large Language Models (LLMs) locally on mobile and desktop devices. Supports multiple inference engines, including Gemma (TFLite) and Llama (GGUF), with integrated model discovery, download, and chat capabilities.
nexusjuan12 / FLUX.1 Kontext Multi Image: Multi-image implementation of Flux.1-Kontext with quantized model support in GGUF format. Also includes an app that produces a series of portraits using the same model.
nareshis21 / Truelarge RT: Android inference engine running 20B+ parameter LLMs on devices with 4-8 GB of RAM. Features proprietary Layer-by-Layer (LBL) streaming, zero-copy mmap loading, and a native C++/Kotlin architecture.
kantan-kanto / ComfyUI MultiModal Prompt Nodes: Multimodal prompt generator nodes for ComfyUI, designed to generate prompts for QwenImageEdit and Wan2.2. Supports local LLM / local GGUF models (Qwen3.5, Qwen3-VL, and Qwen2.5-VL) and the Qwen API for image and video prompt generation and enhancement.
ml-rust / Blazr: Production-grade inference server for LLMs. Supports standard HuggingFace models (Llama, Mistral, Qwen, Phi, Gemma, DeepSeek) and custom hybrid architectures (Mamba2, MLA, MoE). Loads SafeTensors, AWQ, GPTQ, and GGUF formats.
Divith123 / LoRA The Second Brain: An open-source AI chatbot app that runs models locally using Ollama, supporting a wide variety of Small Language Models (SLMs) from Meta, Google, Alibaba, and others in GGUF and H2O-Danube formats.
duoyuncloud / ModelConverterTool: A CLI and API tool for converting, validating, and managing machine learning models across multiple formats. Supports ONNX, FP16, HuggingFace, TorchScript, GGUF, MLX, GPTQ, AWQ, and more.
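Nearly every entry above revolves around the GGUF container format. As a minimal illustration of what these loaders parse first, here is a hedged sketch that reads the fixed GGUF header fields (magic, version, tensor count, metadata key/value count) from a byte buffer. The field layout follows the public GGUF specification; the code and the synthetic demo buffer are illustrative and not taken from any of the listed projects.

```python
import struct

def parse_gguf_header(buf: bytes) -> dict:
    # Per the public GGUF spec, a file begins with the 4-byte magic "GGUF",
    # followed by a uint32 format version, a uint64 tensor count, and a
    # uint64 metadata key/value count, all little-endian (24 bytes total).
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Synthetic header for demonstration: version 3, 2 tensors, 5 metadata pairs.
demo = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(parse_gguf_header(demo))  # {'version': 3, 'tensors': 2, 'metadata_kv': 5}
```

The metadata key/value section that follows the header is where tools like ComfyUI GGUF or ModelConverterTool read quantization type, architecture, and tokenizer details before touching any tensor data.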