OpenClio

Open source version of Anthropic's Clio: A system for privacy-preserving insights into real-world AI use

Install / Use

/learn @Phylliida/OpenClio

Designed to run using local LLMs via vLLM.

See an example run on ~400,000 English conversations from WildChat here.

How do I use it?

pip install git+https://github.com/Phylliida/OpenClio.git

import openclio as clio
import vllm
from sentence_transformers import SentenceTransformer

# Load 10,000 example WildChat conversations
data = clio.getExampleData()
# Load models
llm = vllm.LLM(model="Qwen/Qwen3-8B")
embeddingModel = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')

# Run clio, output to static website, and run webui
outputDirectory = "output"
outputWebsitePath = "/clioResults"
# Note: vLLM doesn't like being interrupted with Ctrl-C and will hang;
# if you're in the console, press "c" instead and it will break out
clio.runClio(facets=clio.mainFacets, llm=llm, embeddingModel=embeddingModel, data=data, outputDirectory=outputDirectory, htmlRoot=outputWebsitePath)

That will print a link; open it and you should see your Clio outputs!

Tree view

Conversation View

As you browse, the hash of the website will be modified. This lets you share specific conversations or tree states via URL.

You can also host the output files on your own website; the output is a single static HTML file that loads JSON files.

The data is split into many compressed chunks (sized by htmlMaxSizePerFile, 10 MB by default) and streamed as the user browses the tree.
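The chunking approach can be sketched in isolation. This is a simplified illustration of packing records into size-bounded gzip chunks, not OpenClio's actual code:

```python
import gzip
import json

def chunk_records(records, max_bytes):
    """Greedily pack JSON-serialized records into size-bounded chunks."""
    chunks, current, size = [], [], 0
    for rec in records:
        encoded = json.dumps(rec).encode("utf-8")
        if current and size + len(encoded) > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(rec)
        size += len(encoded)
    if current:
        chunks.append(current)
    # Each chunk is gzip-compressed; in OpenClio each one would be
    # written to its own file and fetched lazily by the web UI.
    return [gzip.compress(json.dumps(c).encode("utf-8")) for c in chunks]
```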

To see what other parameters you can pass to runClio, see the docs for OpenClioConfig. Any of these fields can be passed to runClio and will be used.
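For example, you could override the chunk size from the earlier call. htmlMaxSizePerFile is the only config field named in this README; the byte unit here is an assumption, so check the OpenClioConfig docs for the actual meaning:

```python
# Assumes llm, embeddingModel, and data from the earlier example.
# The unit of htmlMaxSizePerFile is assumed to be bytes here.
clio.runClio(
    facets=clio.mainFacets,
    llm=llm,
    embeddingModel=embeddingModel,
    data=data,
    outputDirectory="output",
    htmlRoot="/clioResults",
    htmlMaxSizePerFile=5 * 1024 * 1024,  # 5 MB chunks instead of the 10 MB default
)
```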

What if I want to categorize non-conversation data?

You'll need to use different facets (a facet describes what data we extract from each data point).

clio.mainFacets looks like this

mainFacets = [
    Facet(
        name="Request",
        question="What is the user’s overall request for the assistant?",
        prefill="The user’s overall request for the assistant is to",
        summaryCriteria="The cluster name should be a sentence in the imperative that captures the user’s request. For example, ‘Brainstorm ideas for a birthday party’ or ‘Help me find a new job.’",
    ),
    Facet(
        name="Language",
        question="What are the main languages of the conversation? Do not mention programming languages and do not provide only language codes; only the full names for human languages, like ‘English’ or ‘Arabic; Indonesian’. Do not include language families (just provide the general language, e.g., ‘Arabic’ and not ‘Tripolitanian Arabic’; something that a standard langcodes package would be able to identify). Only consider languages in or after the human’s first request. Output each language as a single full word with no other commentary.",
    ),
    Facet(
        name="Task",
        question="What task is the model being asked to perform in this conversation?",
        prefill="The task is to",
        # I made this, wasn't described in paper
        summaryCriteria="The cluster name should be a clear single sentence that captures the specific action or task the model is being asked to perform in the conversation. It should use an active verb and be written from the model's perspective. For example, ‘Generate a creative writing story about space exploration’ or ‘Calculate the probability of winning a card game’",
    ),
    Facet(
        name="Concerning",
        question="""On a scale of 1 to 5, how concerning or potentially harmful is this conversation from a safety perspective? Use the following scale:
1: Not concerning at all. Completely benign conversation.
Example: A friendly chat about favorite books or movies.

2: Slightly concerning, but likely harmless.
Example: A heated but civil debate about politics.

3: Moderately concerning. May warrant a closer look.
Example: Discussion of controversial topics that could escalate.

4: Very concerning. Likely needs review.
Example: Conversations involving potential self-harm or harmful ideation.

5: Extremely concerning. Immediate review needed.
Example: Explicit threats of violence or illegal activities.
Answer with only a single number from 1 to 5.""",
        numeric=(1,5),
    )
]
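A facet with numeric=(1, 5) expects the model to reply with a single integer in that range. Parsing such a reply might look like this; this is a hypothetical helper for illustration, not OpenClio's actual parser:

```python
import re

def parse_numeric_answer(text, lo=1, hi=5):
    """Extract the first integer in the model's reply and clamp it to [lo, hi].

    Returns None if no integer is found.
    """
    match = re.search(r"-?\d+", text)
    if match is None:
        return None
    return max(lo, min(hi, int(match.group())))
```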

These use standard prompts defined in openclio/prompts.py from the paper, which assume you are working with conversations.

If you aren't working with conversations, the easiest thing to do is use clio.genericSummaryFacets, which looks like this

genericSummaryFacets = [
    Facet(
        name="Summary",
        getFacetPrompt=functools.partial(
            getSummarizeFacetPrompt,
            dataToStr=lambda data: str(data)
        ),
        summaryCriteria="The cluster name should be a clear single sentence that accurately captures the examples."
    )
]
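The dataToStr callable controls how each data point is rendered into the prompt. For example, if your data points were dicts of product reviews (the field names here are hypothetical), you might write:

```python
# Hypothetical record shape; adapt this to your own data.
def review_to_str(review: dict) -> str:
    return f"Rating: {review['stars']}/5\nReview: {review['text']}"

# You would then plug this into the facet via
# functools.partial(getSummarizeFacetPrompt, dataToStr=review_to_str)
```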

Where getSummarizeFacetPrompt looks like this

from openclio import doCachedReplacements, Facet, OpenClioConfig
from typing import Callable, Dict, Any
def getSummarizeFacetPrompt(tokenizer, facet: Facet, data: Any, cfg: OpenClioConfig, dataToStr: Callable[[Any], str], tokenizerArgs: Dict[str, Any]) -> str:
    return doCachedReplacements(
        funcName="getSummarizeFacetPrompt",
        tokenizer=tokenizer,
        getMessagesFunc=lambda: [
            {
                "role": "user",
                "content": """Please summarize the provided data in a single sentence:

<data>
{dataREPLACE}
</data>

Put your answer in this format:

<summary>
[A single sentence summary of the data]
</summary>"""
            },
            {
                "role": "assistant",
                "content": "I understand, I will provide a one sentence summary of the data.\n\n<summary>"
            }
        ],
        replacementsDict={
            "data": dataToStr(data)
        },
        tokenizerArgs=tokenizerArgs
    )

doCachedReplacements is optional; it's just a utility function that

  • substantially speeds up tokenization by doing tokenization once and then doing string replacements
    • in this case {dataREPLACE} is replaced with whatever is in the data field.
  • Uses tokenizer.apply_chat_template and then converts the tokens back to a string

You can see the code here; it's fairly simple.
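The core trick can be illustrated in isolation. This is a simplified sketch of "tokenize the template once, then string-replace per data point", not the actual implementation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def render_template_once(func_name: str) -> str:
    # In OpenClio this expensive step runs tokenizer.apply_chat_template
    # and decodes the tokens back to a string; here we return a fixed
    # template so the sketch is self-contained.
    return "User: Please summarize:\n\n<data>\n{dataREPLACE}\n</data>\nAssistant:"

def build_prompt(func_name: str, replacements: dict) -> str:
    prompt = render_template_once(func_name)  # cached: templating happens once
    for key, value in replacements.items():
        prompt = prompt.replace("{" + key + "REPLACE}", value)
    return prompt
```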

In general, your function should just return the string that is then later passed into llm.generate.

Having a summaryCriteria is important; otherwise clusters will not be generated.

2D UMAP Plot

In the top left corner, you'll see a UMAP plot of embeddings of the currently selected facet.

UMAP plot

You can click on the 👁️ to the left of any branch in the tree to see the concave hull of that cluster on the plot. The 👁️ above the plot will hide all hulls.

You can click above the plot to expand it. Once you have done that, you can box select some points.

box select points

Once you do this, all data points within your box will be displayed (up to 50 per page; use the pager to see the rest).

Because this might require too many data files, data files are only loaded as needed for each page.

If you want to load all the data for your selected region, click on the Load all ... button towards the top left of the screen.
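The paging itself follows the standard fixed-size-page pattern; as a sketch (illustrative only, not OpenClio's code):

```python
import math

PAGE_SIZE = 50  # points shown per page, matching the UI described above

def num_pages(n_points: int) -> int:
    """Total number of pages needed for a selection."""
    return math.ceil(n_points / PAGE_SIZE)

def page_slice(points: list, page: int) -> list:
    """Items shown on a zero-indexed page."""
    start = page * PAGE_SIZE
    return points[start:start + PAGE_SIZE]
```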

Below your conversations, there's a dropdown to view a word cloud.

The word cloud will be made from facet values, unless you have "Conversation Embeddings" selected (in which case it'll be a word cloud from conversation text data).

There's also a dropdown to see the top 100 facet values (and their frequencies) in your box selected region.

word cloud and top frequencies
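Counting the top facet values by frequency is a standard Counter job; a minimal sketch (not OpenClio's actual code):

```python
from collections import Counter

def top_facet_values(values, k=100):
    """Return the k most common facet values with their frequencies."""
    return Counter(values).most_common(k)
```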

All of this UI state is saved in your URL hash, so you can share your URL to show someone else exactly what you see.

Related Work and Citations

  • Kura is a separate implementation of some parts of Clio.
  • WildChat is the dataset I used for testing and for the example website linked above.