# Llm
Fine-tuning, DPO, RLHF, and RLAIF on LLMs: Qwen3, Zephyr 7B GPTQ with 4-Bit Quantization, Mistral-7B-GPTQ
## Install / Use
`/learn @bayjarvis/Llm`
## LLM Training and Fine-tuning Projects
This repository collects LLM training implementations that span several optimization approaches, from self-supervised alignment to supervised fine-tuning on quantized models.
## Projects by Training Approach
### Self-Supervised Alignment
- MLX-GRPO: Group Relative Policy Optimization - Complete GRPO implementation for Apple Silicon using MLX framework with Qwen3-0.6B model support. Uses group comparisons for alignment without requiring human feedback data.
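The core idea of GRPO is that advantages come from comparing each sampled completion against the mean and spread of its own group, so no learned value function or human labels are needed. A minimal sketch of that group-relative normalization in plain Python (the function name and toy rewards are illustrative, not taken from the MLX-GRPO code):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each reward against its group's
    mean and standard deviation. `rewards` holds scalar scores for G
    completions sampled from the same prompt; the group statistics act
    as the baseline instead of a learned value function."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against identical rewards
    return [(r - mean) / std for r in rewards]

# Four completions of one prompt, scored by some reward function (toy values).
advs = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Completions scoring above the group mean get positive advantages (their tokens are reinforced); below-mean completions get negative ones, and the advantages of a group always sum to zero.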
### Human Feedback-Based Training
- Harnessing Zephyr's Breeze: DPO Training on Mistral-7B-GPTQ - Direct Preference Optimization for language model alignment using human preference datasets on quantized models.
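DPO sidesteps an explicit reward model by treating the policy-vs-reference log-probability ratio as an implicit reward and pushing the chosen response's ratio above the rejected one's. A minimal single-pair sketch of the standard DPO loss, assuming summed response log-probs are already computed (the real training loop operates on batched tensors):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    pi_*  : log-prob of the chosen/rejected response under the trained policy
    ref_* : log-prob of the same responses under the frozen reference model
    beta  : strength of the KL-style pull back toward the reference
    """
    # Implicit reward margin: beta * difference of policy-vs-reference log-ratios.
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# At initialization the policy equals the reference, the margin is 0,
# and the loss starts at log 2.
start = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

As the policy raises the chosen response's log-prob relative to the reference (and lowers the rejected one's), the margin grows and the loss falls below log 2.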
### Supervised Fine-tuning
- Fine-tuning Zephyr 7B GPTQ with 4-Bit Quantization - Custom data fine-tuning with 4-bit quantization for efficient inference and deployment.
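4-bit quantization stores each weight as one of 16 integer levels plus a shared scale, cutting memory roughly 4x versus fp16. GPTQ itself uses a more sophisticated error-compensating scheme, but the basic absmax round-trip can be sketched in a few lines (toy weights, not the repo's code):

```python
def quantize_4bit(weights):
    """Absmax 4-bit quantization: map floats onto integers in [-8, 7]
    using a single per-group scale derived from the largest magnitude."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [v * scale for v in q]

q, s = quantize_4bit([0.7, -0.35, 0.07, 0.0])
w_hat = dequantize(q, s)
```

Each recovered weight lands within half a quantization step (scale / 2) of the original; real GPTQ reduces this error further by quantizing columns sequentially and folding the residual error into not-yet-quantized weights.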
### Architectural Implementations
- Mixture of Experts (MoE) in PyTorch - A from-scratch implementation of a sparse Mixture of Experts layer in PyTorch, demonstrating a key technique for building large, efficient language models.
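The defining trait of a sparse MoE layer is that a gating function routes each input to only the top-k experts, so compute per token stays roughly constant while total parameters grow with the expert count. A deliberately tiny plain-Python sketch of top-k routing with renormalized gate weights (scalar inputs and lambda "experts" stand in for the PyTorch tensors and FFN sub-networks of the actual implementation):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_scores, experts, top_k=1):
    """Sparse MoE forward pass: pick the top-k experts by gate probability,
    run only those, and combine their outputs weighted by the renormalized
    gate probabilities. Non-selected experts do no work at all."""
    probs = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Two toy experts; with top_k=1 only the higher-scoring expert runs.
y = moe_forward(2.0,
                gate_scores=[0.1, 2.0],
                experts=[lambda x: x * 10, lambda x: x + 1],
                top_k=1)
```

With top_k=1 the second expert wins the gate and the output is simply its result (2.0 + 1 = 3.0); a trained MoE learns the gate scores per token so different inputs specialize to different experts.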
