# lmdocs: Generative AI for code documentation :brain: :arrow_right: :computer: :snake:
lmdocs automatically generates documentation for your Python code using LLMs.
( Features | Examples | Quickstart :rocket: | How it works | Additional options :gear: | Caveats and limitations )
:bulb: All documentation for this repository has been generated by lmdocs itself!
## Features

- **Codebase Preservation**: Guarantees no changes to your code
- **Context-aware Documentation**: References relevant documentation from imported libraries and your codebase
- **LLM-Generated Comments**: Understands your code and adds relevant docstrings and comments
- **No dependencies**: Written in pure Python, with no dependencies on any external packages*

*It is recommended that you install the libraries specific to your project before running lmdocs.
## lmdocs in Action :hammer:
<details>
<summary>Example 1 - Fibonacci numbers</summary>

```python
# Original function
def fibonacci(n):
    a, b = 0, 1
    fib_seq = []
    for i in range(n):
        fib_seq.append(a)
        a, b = b, a + b
    return fib_seq
```

```python
# Commented using lmdocs
def fibonacci(n):
    """
    Generates the Fibonacci sequence up to n terms.

    Input:
        n (int): The number of terms in the Fibonacci sequence to generate.

    Returns:
        list: A list containing the first n terms of the Fibonacci sequence.

    Raises:
        ValueError: If n is less than 1.
    """
    a, b = 0, 1   # Initialize two variables to store the last and current term in the sequence
    fib_seq = []  # Initialize an empty list to store the generated Fibonacci sequence

    for i in range(n):     # Generate n terms of the Fibonacci sequence
        fib_seq.append(a)  # Append the current term to the sequence
        # Update the last two terms for the next iteration
        a, b = b, a + b

    return fib_seq  # Return the generated Fibonacci sequence
```
</details>
<details>
<summary>Example 2 - K means</summary>
```python
# Original function
def k_means(X, k, max_iter=300, tol=1e-4, random_state=None):
    np.random.seed(random_state)
    centroids = X[np.random.choice(X.shape[0], k, replace=False), :]
    for _ in range(max_iter):
        distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))
        cluster_assignments = np.argmin(distances, axis=0)
        new_centroids = np.array([X[cluster_assignments == i].mean(axis=0) for i in range(k)])
        if np.abs(centroids - new_centroids).sum() < tol:
            break
        centroids = new_centroids
    return cluster_assignments, centroids
```

```python
# Commented using lmdocs
def k_means(X, k, max_iter=300, tol=1e-4, random_state=None):
    '''
    Perform K-Means clustering.

    Input:
        X : array-like of shape (n_samples, n_features)
            The input data.
        k : int
            The number of clusters to form.
        max_iter : int, default=300
            Maximum number of iterations of the k-means algorithm for a single run.
        tol : float, default=1e-4
            Relative tolerance with regards to the Frobenius norm of the difference in the
            cluster centers of two consecutive iterations to declare convergence.
        random_state : int, default=None
            Determines random number generation for centroid initialization. Use an integer to
            get reproducible results.

    Returns:
        tuple : (cluster_assignments, centroids)
            cluster_assignments : array-like of shape (n_samples,)
                Cluster assignments for each sample in the input data.
            centroids : array-like of shape (k, n_features)
                Coordinates of cluster centers.

    Raises:
        ValueError : If k is greater than the number of samples or less than one.
    '''
    np.random.seed(random_state)
    centroids = X[np.random.choice(X.shape[0], k, replace=False), :]

    for _ in range(max_iter):
        distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))  # Calculate Euclidean distance to each centroid
        cluster_assignments = np.argmin(distances, axis=0)  # Assign sample to nearest centroid
        # Recalculate centroids as mean of samples in the same cluster
        new_centroids = np.array([X[cluster_assignments == i].mean(axis=0) for i in range(k)])
        if np.abs(centroids - new_centroids).sum() < tol:  # Check if centroids have converged
            break
        centroids = new_centroids  # Update centroids for next iteration

    return cluster_assignments, centroids
```
</details>
The examples above were generated locally using lmdocs with the DeepSeek coder 6.7B model.
## Quickstart :rocket:
### Using an OpenAI model

```shell
python lmdocs.py <project path> --openai_key <key>
```

Tested with `gpt-3.5-turbo`, `gpt-4-turbo`, and `gpt-4o`.
### Using a local model

```shell
python lmdocs.py <project path> --port <local LLM server port>
```
### Setup

To use local LLMs, you need to set up an OpenAI-compatible server.
You can use local desktop apps like LM Studio, Ollama, GPT4All, llama.cpp, or any other method to set up your LLM server.
Although lmdocs is compatible with any local LLM, I have tested it with the following models:
`deepseek-coder-6.7b-instruct`, `WizardCoder-Python-7B-V1`, `Meta-Llama-3-8B-Instruct`, `Mistral-7B-Instruct-v0.2`, `Phi-3-mini-4k-instruct`
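Once the server is running, you can sanity-check it with a plain OpenAI-style chat-completions request before pointing lmdocs at it. This is a generic sketch, not lmdocs' own code: the `/v1/chat/completions` path is the standard OpenAI-compatible route served by apps like LM Studio and llama.cpp, and the port and model name below are placeholders.

```python
import json
from urllib import request

def build_chat_request(port, prompt, model="local-model"):
    """Build an OpenAI-compatible chat-completions request (URL + JSON body)."""
    url = f"http://localhost:{port}/v1/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    })
    return url, body

url, body = build_chat_request(1234, "Write a one-line docstring for: def add(a, b): return a + b")
# To actually send it (requires a running server):
# req = request.Request(url, data=body.encode(), headers={"Content-Type": "application/json"})
# print(json.loads(request.urlopen(req).read())["choices"][0]["message"]["content"])
```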
## How it works
### Step 1: Collect and Analyze Code

Gather all Python files from the project directory and identify all function, class, and method calls.
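The call-site scan can be reproduced with the standard-library `ast` module. This is an illustrative sketch of the idea, not lmdocs' actual implementation, and it only handles simple `name(...)` calls:

```python
import ast

source = """
def helper():
    return 1

def main(xs):
    return helper() + len(xs)
"""

tree = ast.parse(source)
# Walk every node and collect the names of plain function calls
calls = sorted(
    node.func.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
)
# calls == ['helper', 'len']
```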
### Step 2: Create Dependency Graph

Map out the dependencies between the identified calls to create a dependency graph of the entire codebase.
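Continuing the sketch, a minimal dependency graph maps each top-level function to the names it calls. Again, this illustrates the idea rather than lmdocs' code:

```python
import ast
from collections import defaultdict

source = """
def helper():
    return 1

def main(xs):
    return helper() + len(xs)
"""

tree = ast.parse(source)
deps = defaultdict(set)
for fn in tree.body:
    if isinstance(fn, ast.FunctionDef):
        # Record every simple call made inside this function's body
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                deps[fn.name].add(node.func.id)
# deps["main"] == {"helper", "len"}; helper makes no calls, so it is a leaf
```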
### Step 3: Retrieve and Generate Documentation

- For calls with no dependencies, retrieve existing documentation using their `__doc__` attribute
- For calls with dependencies, prompt the LLM to generate documented code, providing the original code and reference documentation for all its dependencies in the prompt
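For leaf calls, this step amounts to reading the `__doc__` attribute that Python attaches to every documented object. The truncation below mirrors the `--ref_doc truncate` strategy (first paragraph only); it is a sketch, not lmdocs' exact code:

```python
# Leaf dependency: reuse the existing docstring instead of asking the LLM
doc = len.__doc__
# The "truncate" strategy keeps only the first paragraph of the docstring
first_paragraph = doc.strip().split("\n\n")[0]
print(first_paragraph)
```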
### Step 4: Verify and Replace Code

- Compare the Abstract Syntax Tree (AST) of the original and generated code
- If they match, replace the original code with the documented code
- If they don't match, retry the generation and verification process (up to three times)
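The equivalence check can also be sketched with `ast`. Comments vanish at parse time, but docstrings do not, so a fair comparison strips leading docstring expressions before comparing the dumps. This is illustrative only; lmdocs may implement the check differently:

```python
import ast

def strip_docstrings(tree):
    """Remove leading docstring expressions so only executable logic is compared."""
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if (node.body and isinstance(node.body[0], ast.Expr)
                    and isinstance(node.body[0].value, ast.Constant)
                    and isinstance(node.body[0].value.value, str)):
                node.body = node.body[1:]
    return tree

def same_logic(src_a, src_b):
    """True if two sources have identical ASTs modulo comments and docstrings."""
    dump = lambda s: ast.dump(strip_docstrings(ast.parse(s)))
    return dump(src_a) == dump(src_b)

original   = "def f(x):\n    return x + 1\n"
documented = 'def f(x):\n    """Return x plus one."""\n    return x + 1  # add one\n'
# same_logic(original, documented) -> True (docs/comments ignored)
# same_logic(original, "def f(x):\n    return x + 2\n") -> False (logic changed)
```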
## Additional options :gear:
```
usage: lmdocs.py [-h] [-v] [--openai_key OPENAI_KEY] [--openai_key_env OPENAI_KEY_ENV] [--openai_model {gpt-3.5-turbo,gpt-4-turbo,gpt-4o}] [-p PORT]
                 [--ref_doc {truncate,summarize,full}] [--max_retries MAX_RETRIES] [--temperature TEMPERATURE] [--max_tokens MAX_TOKENS]
                 path

positional arguments:
  path                  Path to the file/folder of the project

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Give out verbose logs
  --openai_key OPENAI_KEY
                        Your OpenAI key
  --openai_key_env OPENAI_KEY_ENV
                        Environment variable where the OpenAI key is stored
  --openai_model {gpt-3.5-turbo,gpt-4-turbo,gpt-4o}
                        Which OpenAI model to use. Supported models are ['gpt-3.5-turbo', 'gpt-4-turbo', 'gpt-4o'].
                        gpt-3.5-turbo is used by default
  -p PORT, --port PORT  Port where the local LLM server is hosted
  --ref_doc {truncate,summarize,full}
                        Strategy to process reference documentation. Supported choices are:
                        truncate  - Truncate documentation to the first paragraph
                        summarize - Generate a single summary of the documentation using the given LLM
                        full      - Use the complete documentation (can lead to a very long context length)
                        "truncate" is used as the default strategy
  --max_retries MAX_RETRIES
                        Number of attempts that the LLM gets to generate the documentation for each function/method/class
  --temperature TEMPERATURE
                        Temperature parameter used to sample output from the LLM
  --max_tokens MAX_TOKENS
                        Maximum number of tokens that the LLM is allowed to generate
```
## Caveats and limitations
### Language Support

Only Python 3.0+ is supported.
### Dependency extraction

The `ast` module is used to analyze the Abstract Syntax Tree of every Python file in the codebase.
Only functional and class dependencies are tracked, i.e. only code written within a class, method, or function is tracked and documented.
### Package Dependencies

lmdocs is written in pure Python and does not depend on any other packages.
It is strongly recommended that you install the libraries/packages of the project being documented, so that reference documentation can be extracted.
### Reference documentation extraction

Documentation for functions that have no dependencies is extracted using Python's `__doc__` attribute.
For external libraries (e.g. numpy), the library is imported exactly as it appears in the original code.
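A hypothetical helper shows the idea: the module name comes from the import statement in your code, and the attribute's docstring is read after a dynamic import. This is a sketch only; `external_doc` is not part of lmdocs' API:

```python
import importlib

def external_doc(module_name, attr_name):
    """Import a library by name and return the docstring of one of its attributes."""
    module = importlib.import_module(module_name)
    return getattr(module, attr_name).__doc__

doc = external_doc("math", "sqrt")
# doc -> "Return the square root of x."
```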
Note that, since Python does no