
Phrame

AI-powered digital picture frame. Generate captivating and unique art from spoken conversations.

Install / Use

/learn @jakowenko/Phrame

README


Phrame

Phrame generates captivating and unique art by listening to conversations around it, transforming spoken words and emotions into visually stunning masterpieces. Unleash your creativity and transform the soundscape around you.

(Example screenshots of artwork generated and displayed by Phrame.)

How

Phrame relies on the SpeechRecognition interface of the Web Speech API to transform audio into text. This text is processed by OpenAI to produce a condensed summary. The summary is then sent to the configured generative AI image services, and the final images are saved.
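
For context, here is a minimal, illustrative sketch of the browser-side transcription step using the SpeechRecognition interface. It is not Phrame's actual code; the storage and summarization steps are only hinted at in comments.

```ts
// Minimal sketch of continuous transcription with the Web Speech API.
// Chrome exposes the interface as webkitSpeechRecognition; types are kept loose here.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.continuous = true;      // keep listening across pauses
recognition.interimResults = false; // only act on finalized results

recognition.onresult = (event: any) => {
  const latest = event.results[event.results.length - 1];
  const transcript: string = latest[0].transcript.trim();
  // Phrame stores transcripts locally so they can later be summarized
  // by OpenAI and turned into image prompts.
  console.log('transcript:', transcript);
};

recognition.onend = () => recognition.start(); // restart if the session ends
recognition.start();
```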

Donations

If you would like to make a donation to support development, please use GitHub Sponsors.

Features

  • Create unique AI-generated artwork from spoken conversations
  • Automatic, manual or voice-activated summary generation for on-demand art
  • User-friendly UI, optimized for both desktop and mobile
  • Real-time updates and remote control via WebSockets
  • Integrated config editor for customization
  • Support for multiple generative AI image services
  • Voice commands for image generation and navigation
  • Manage your gallery effortlessly: browse, favorite, delete images, and navigate using keyboard shortcuts
  • Access and manage logs for efficient troubleshooting

Supported Architecture

  • amd64
  • arm64

Supported AIs

* Midjourney currently uses an unofficial third-party package. Use this integration at your own risk.

Voice Commands

Activate the microphone to interact with Phrame using the following voice commands.

| Command        | Action                                 |
| -------------- | -------------------------------------- |
| Hey Phrame     | Wake word to generate images on demand |
| Next Image     | Advance to the next image              |
| Previous Image | Go back to the previous image          |
| Last Image     | Go back to the previous image          |
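
As a rough illustration of how recognized speech might be mapped to these commands (this matching logic is an assumption for illustration, not Phrame's actual implementation):

```ts
// Illustrative mapping from a recognized transcript to a voice command.
type Command = 'generate' | 'next' | 'previous';

function matchCommand(transcript: string): Command | null {
  const text = transcript.toLowerCase();
  if (text.includes('hey phrame')) return 'generate';
  if (text.includes('next image')) return 'next';
  if (text.includes('previous image') || text.includes('last image')) return 'previous';
  return null;
}

// Example: matchCommand('Hey Phrame, paint something calm') returns 'generate'.
```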

UI

Phrame has a responsive UI available at localhost:3000.

| Path        | Name                              |
| ----------- | --------------------------------- |
| /           | Controller                        |
| /phrame?mic | Phrame with microphone support    |
| /phrame     | Phrame without microphone support |
| /gallery    | Gallery                           |
| /config     | Config                            |
| /logs       | Logs                              |

Privacy

Speech recognition in Phrame is managed by the browser. The handling of audio data for speech recognition depends on the specific browser used. For instance, Chrome sends the audio to Google's servers to perform the transcription. You are encouraged to review the privacy policy of your chosen browser to fully understand how speech data is handled.

Once transcribed, Phrame saves these transcriptions into a local database. They are then processed by OpenAI to generate a summary, and immediately after, the original transcriptions are deleted. This summary is used in conjunction with the configured generative AI image services and the final pieces of art are saved locally.

It's important to clarify that Phrame does not retain or transmit your transcripts beyond the local device, except for the brief period required for generating the summary through OpenAI. Apart from these specific instances, no personal data is used, stored, or transmitted for any other purposes.

Usage

Phrame operates as a single Docker container and is easily accessible using any modern browser, even without a microphone.

To take advantage of the speech recognition feature, a compatible browser and microphone are required. At this time, Chrome and Safari are the only browsers that support speech recognition.

Artwork within Phrame is displayed according to the image.order value. The latest summary and any favorite images are seamlessly merged, providing an evolving canvas of unique AI-generated art. As new images are created, they are instantly displayed by Phrame.
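
A hedged sketch of what that merge-and-order step could look like; the DisplayImage shape and buildDisplayQueue helper are assumptions used purely for illustration, not Phrame's actual code:

```ts
// Illustrative display-queue logic: merge images from the latest summary
// with favorites, then order them per image.order.
interface DisplayImage {
  id: string;
  createdAt: number;   // unix timestamp
  favorite: boolean;
  summaryId: string;
}

function buildDisplayQueue(
  images: DisplayImage[],
  order: 'random' | 'recent',
  latestSummaryId: string,
): DisplayImage[] {
  const pool = images.filter((img) => img.summaryId === latestSummaryId || img.favorite);
  if (order === 'recent') {
    return [...pool].sort((a, b) => b.createdAt - a.createdAt);
  }
  return [...pool].sort(() => Math.random() - 0.5); // naive shuffle, fine for illustration
}
```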

Quick Start

  1. Start Phrame
  2. Go to localhost:3000/config
    1. Add your OpenAI API key and save
    2. Verify OpenAI shows as configured with a green circle
  3. In a new window, go to localhost:3000/phrame?mic and follow the on-screen instructions
  4. Go to localhost:3000 and verify the microphone and speech recognition are working

Docker Run

docker run -d --restart=unless-stopped --name=phrame -v phrame:/.storage -p 3000:3000 jakowenko/phrame

Docker Compose

version: '3.9'

volumes:
  phrame:

services:
  phrame:
    container_name: phrame
    image: jakowenko/phrame
    restart: unless-stopped
    volumes:
      - phrame:/.storage
    ports:
      - 3000:3000

Launch on Boot

Modern browsers require a user click to access the microphone. To automatically start Phrame on boot, you can use the following script. This requires ydotool or xdotool (depending on your display server) to be installed, which allows you to simulate keyboard and mouse input.

The script will wait 15 seconds for the Docker Engine and Phrame to start before launching Chrome. You can adjust the delay by changing the sleep value. After launching the browser, the script will wait 5 seconds before sending a click to get microphone access and start speech recognition.

Depending on your system, you may need to adjust the path to Chrome.

ydotool

#!/bin/bash

export YDOTOOL_SOCKET=/tmp/.ydotool_socket

# wait for the desktop and docker to be fully loaded
sleep 15s

# launch chrome in kiosk mode for microphone access
/usr/bin/google-chrome-stable --kiosk --no-first-run --hide-crash-restore-bubble --password-store=basic "http://localhost:3000/phrame?mic" &

# wait for chrome and phrame to load
sleep 5s

# move the mouse to the coordinates and click the left mouse button
ydotool mousemove --absolute 0 0
ydotool click 0xC0

xdotool

#!/bin/bash

# wait for the desktop and docker to be fully loaded
sleep 15s

# launch chrome in kiosk mode for microphone access
/usr/bin/google-chrome-stable --kiosk --no-first-run --hide-crash-restore-bubble --password-store=basic "http://localhost:3000/phrame?mic" &

# wait for chrome and phrame to load
sleep 5s

# move the mouse to the coordinates and click the left mouse button
xdotool mousemove --sync 0 0 click 1

Configuration

Configurable options are saved to /.storage/config/config.yml and are editable via the UI at localhost:3000/config.

Note: Default values do not need to be specified in configuration unless they need to be overwritten.

image

# image settings (default: shown below)

image:
  # time in seconds between image transitions
  interval: 60
  # order of images to display: random, recent
  order: recent

autogen

Images can be automatically generated by creating random summaries. This can be scheduled with a cron expression. Keywords can be passed to help guide the summary.

# autogen settings (default: shown below)

autogen:
  # schedule as a cron expression for processing transcripts (at every 15th and 45th minute)
  cron: '15,45 * * * *'
  prompt: Provide a random short description to describe a picture. It should be no more than one or two sentences. If keywords are provided select a couple at random to help guide the description.
  # keywords to guide the summary
  keywords: []

transcript

Images are generated by processing transcripts. This can be scheduled with a cron expression. All of the transcripts from the last transcript.minutes minutes are then processed by OpenAI, which uses openai.summary.prompt to summarize them (a simplified sketch of this flow follows the settings below).

# transcript settings (default: shown below)

transcript:
  # schedule as a cron expression for processing transcripts (at every 30th minute)
  cron: '*/30 * * * *'
  # how many minutes of files to look back for (process the last 30 minutes of transcripts)
  minutes: 30
  # minimum number of transcripts required to process
  minimum: 5
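
To make the transcript-to-summary step concrete, here is a simplified TypeScript sketch assuming the official openai npm package and node-cron. The model name, prompt, and helper functions are placeholders, not Phrame's actual implementation:

```ts
import OpenAI from 'openai';
import { schedule } from 'node-cron';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Placeholder helpers for illustration only.
async function loadTranscripts(minutes: number): Promise<string[]> {
  return []; // read recent transcripts from the local database
}
async function generateImages(summary: string): Promise<void> {
  // hand the summary off to the configured image services
}

// Every 30 minutes, summarize recent transcripts into an image prompt.
schedule('*/30 * * * *', async () => {
  const transcripts = await loadTranscripts(30);
  if (transcripts.length < 5) return; // mirrors transcript.minimum

  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // placeholder model name
    messages: [
      { role: 'system', content: 'Summarize the conversation into a short, vivid description of a picture.' },
      { role: 'user', content: transcripts.join('\n') },
    ],
  });

  const summary = completion.choices[0].message.content ?? '';
  await generateImages(summary);
});
```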

openai

To configure OpenAI, obtain an API key and add it to your config. All other default settings only need to be specified if they are being overwritten.
