Laundry
data sanitation services
laundry
laundry converts user-supplied possibly dangerous files to more static and safer versions. Use it to reduce the risks of malware spreading via files supplied by external users or systems. The conversions are done with an up-to-date toolchain in a hardened stateless sandbox.
Antivirus products can mitigate the risks of malware, but they are imperfect. They mostly work against mass malware and have their own attack surfaces. laundry provides optional antivirus scans with the open-source ClamAV antivirus engine as an additional layer of security.
Features
laundry provides an HTTP API for the conversions below.
| Input | Output | Uses | Purpose |
|--------|--------|-------------|---------|
| doc(x) | pdf | LibreOffice | Removes any embedded macros etc and turns .doc(x) to portable PDF which can be e.g. embedded in HTML. |
| jpeg | jpeg | ImageMagick | Strip away all metadata and extraneous bytes, keep only pixel-by-pixel color data. Conversion performed with intermediate PPM format. |
| pdf | pdf/a | Ghostscript | Clean up a PDF with conversion to PDF/A for archival purposes. Beware the potentially large file sizes. |
| pdf | jpeg | Ghostscript | Converts the first page to jpeg for thumbnails or previews. |
| pdf | text | Ghostscript | Extract plain text from a PDF. Does not perform OCR. |
| png | png | ImageMagick | Strip away all metadata and extraneous bytes, keep only pixel-by-pixel color data. Conversion performed with intermediate PPM format. |
| xls(x) | pdf | LibreOffice | Removes any embedded macros etc and turns .xls(x) to portable PDF which can be e.g. embedded in HTML. |
The laundry HTTP server provides a REST API and an online tool for trying out the conversions and antivirus scans directly from the browser. Optional API-key-based authorization is available.
Conversions are performed in single-use disposable Docker containers. The containers are secured and run on the gVisor runsc runtime, which provides an additional layer of isolation.
The antivirus scan is exposed as an HTTP API. It takes in one file, and the response tells whether any viruses were found in it. The scans are performed with ClamAV clamdscan from the official ClamAV Docker image. This container is not single-use; it is kept alive for extended periods so that the antivirus signature database stays up to date.
HTTP API documentation
The examples here use the service address http://192.168.123.123:8080 of the local development environment. See CONTRIBUTING.md for instructions on how to set it up.
Use the HTTP API in an asynchronous manner: the provided endpoints can be slow, and processing a large file might take tens of seconds.
Each operation potentially requires hundreds of mebibytes of memory. Limit the number of concurrent requests according to your server constraints.
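One way to respect the concurrency advice above from a client is to cap the number of requests in flight. The following is a minimal sketch, not part of laundry itself: `run_bounded`, `MAX_CONCURRENT` and the `worker` callable are hypothetical names, and the limit of 2 is an arbitrary assumption that should be tuned to the memory available on your server.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: an arbitrary starting point, tune it to your laundry server's memory.
MAX_CONCURRENT = 2


def run_bounded(jobs, worker, max_concurrent=MAX_CONCURRENT):
    """Call worker(job) for every job with at most max_concurrent calls in flight.

    worker is any callable that performs one laundry request (hypothetical).
    Results come back in the same order as jobs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(worker, jobs))
```

Because the calls run on worker threads, a slow conversion does not block the rest of the application, and the pool size keeps the number of simultaneous conversions bounded.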
GET /alive
Endpoint for healthchecks. Invoke it to check whether the service is up and running.
Authorization: No authorization required.
Example request:
curl http://192.168.123.123:8080/alive
Responses:
- HTTP status 200 with response body
yes.
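A healthcheck client for this endpoint could look roughly like the sketch below. It uses only the Python standard library; the function name `is_alive` and the 5 second default timeout are assumptions made for illustration. The check relies on the status code alone, since the exact bytes of the response body are not spelled out here.

```python
import urllib.error
import urllib.request


def is_alive(base_url, timeout=5.0):
    """Return True if GET <base_url>/alive answers with HTTP status 200."""
    url = base_url.rstrip("/") + "/alive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

For the development environment used in these examples, the call would be `is_alive("http://192.168.123.123:8080")`.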
GET /auth-test
Endpoint for testing your API key authorization without performing any actual operation.
Authorization: Optional HTTP Basic authentication with user name laundry-api and your api-key as password. Authorization is required when the server is launched with -k or --api-key-file option.
Example request:
curl -u "laundry-api:abcd1234" http://192.168.123.123:8080/auth-test
Responses:
- HTTP status 200 when authorization is successful or when the server is running without authorization.
- HTTP status 401 for failed authorization with response body
access denied.
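The HTTP Basic authentication described above amounts to sending an Authorization header built from the fixed user name laundry-api and your API key. A minimal sketch with the standard library follows; `basic_auth_header` is a hypothetical helper name, and abcd1234 is only the placeholder key from the curl example.

```python
import base64

API_USER = "laundry-api"  # fixed user name from the documentation above


def basic_auth_header(api_key):
    """Build the HTTP Basic Authorization header for the given API key."""
    credentials = f"{API_USER}:{api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dictionary can be merged into the headers of any request to the endpoints that accept authorization.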
POST /antivirus/scan
Scans the attached file with ClamAV and indicates whether there were any viruses detected. The request must be multipart/form-data and the file in a part named file.
Authorization: Optional HTTP Basic authentication as documented in GET /auth-test.
Example request:
curl -F file=@input.xxx http://192.168.123.123:8080/antivirus/scan
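For clients that cannot shell out to curl, the multipart/form-data body with a single part named file can be assembled by hand. The sketch below uses only the standard library; `encode_multipart_file` is a hypothetical helper, the application/octet-stream part type is an assumption, and the file name is assumed to contain no quotes or line breaks.

```python
import uuid


def encode_multipart_file(filename, data, field="file"):
    """Return (content_type, body) for a multipart/form-data request with one file part.

    Assumes filename contains no double quotes, CR or LF characters.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, head + data + tail
```

The returned content type goes into the Content-Type request header and the body is sent as the POST payload, for example with `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`.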
Responses:
- HTTP status 200 when the file was clean and no viruses were detected.
- HTTP status 400 when viruses were detected! See response body for detailed response from clamdscan. It includes the virus name.
- HTTP status 401 for failed authorization. See GET /auth-test for details.
- HTTP status 500 when the scan cannot be performed. See response body for detailed error message.
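The status codes listed above can be mapped to an explicit result type on the client side. This is a sketch under assumptions: `ScanResult` and `classify_scan` are hypothetical names, and treating any undocumented status as an error is a design choice made here so that an unexpected answer is never mistaken for a clean file.

```python
from enum import Enum


class ScanResult(Enum):
    CLEAN = "clean"
    INFECTED = "infected"
    UNAUTHORIZED = "unauthorized"
    ERROR = "error"


# Status codes as documented for POST /antivirus/scan.
_STATUS_TO_RESULT = {
    200: ScanResult.CLEAN,
    400: ScanResult.INFECTED,
    401: ScanResult.UNAUTHORIZED,
    500: ScanResult.ERROR,
}


def classify_scan(status):
    """Map an HTTP status from the scan endpoint to a ScanResult.

    Undocumented status codes are reported as ERROR (fail closed).
    """
    return _STATUS_TO_RESULT.get(status, ScanResult.ERROR)
```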
Example response when a virus is detected:
HTTP/1.1 400 Bad Request
Content-Type: text/plain;charset=utf-8
Viruses found! stream: Win.Test.EICAR_HDB-1 FOUND
----------- SCAN SUMMARY -----------
Infected files: 1
Time: 0.006 sec (0 m 0 s)
Start Date: 2022:10:19 07:22:16
End Date: 2022:10:19 07:22:16
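The virus name can be pulled out of a response body like the one above with a small regular expression. This sketch is based only on the sample response shown here, so the pattern is an assumption about the clamdscan output format rather than a guaranteed contract; `virus_names` is a hypothetical helper name.

```python
import re

# Assumption: detections appear as "<source>: <virus name> FOUND", as in the sample above.
_FOUND_PATTERN = re.compile(r":\s+(\S+)\s+FOUND\b")


def virus_names(response_body):
    """Return the virus names reported in a scan response body."""
    return _FOUND_PATTERN.findall(response_body)
```

Called with the sample body above, it returns a list containing Win.Test.EICAR_HDB-1. An empty list on a 400 response means the pattern did not match, not that the file is clean.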
POST /docx/docx2pdf
Converts the provided .doc or .docx to a PDF. The request must be multipart/form-data and the file in a part named file.
Authorization: Optional HTTP Basic authentication as documented in GET /auth-test.
Example request:
curl -F file=@input.docx --output result.pdf http://192.168.123.123:8080/docx/docx2pdf
Responses:
- HTTP status 200 when the conversion succeeded. The content-type is application/pdf and the PDF is transferred in response body.
- HTTP status 401 for failed authorization. See GET /auth-test for details.
- HTTP status 500 when conversion failed. See server logs for details.
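Before saving a converted document, a client may want to confirm that the response really is a PDF. The sketch below checks the content-type reported by the server and the "%PDF-" marker that PDF files begin with; `media_type` and `looks_like_pdf` are hypothetical helper names, and this is a sanity check only, not a validation of the PDF contents.

```python
def media_type(content_type_header):
    """Return the bare media type, dropping parameters such as ';charset=utf-8'."""
    return content_type_header.split(";", 1)[0].strip().lower()


def looks_like_pdf(content_type_header, body):
    """Return True if the header says application/pdf and the body starts like a PDF."""
    return media_type(content_type_header) == "application/pdf" and body.startswith(b"%PDF-")
```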
POST /xlsx/xlsx2pdf
Converts the provided .xls or .xlsx to a PDF. The request must be multipart/form-data and the file in a part named file.
Authorization: Optional HTTP Basic authentication as documented in GET /auth-test.
Example request:
curl -F file=@input.xlsx --output result.pdf http://192.168.123.123:8080/xlsx/xlsx2pdf
Responses:
- HTTP status 200 when the conversion succeeded. The content-type is application/pdf and the PDF is transferred in response body.
- HTTP status 401 for failed authorization. See GET /auth-test for details.
- HTTP status 500 when conversion failed. See server logs for details.
POST /image/png2png
Cleans up the provided .png keeping only pixel-by-pixel color data. The request must be multipart/form-data and the file in a part named file.
Authorization: Optional HTTP Basic authentication as documented in GET /auth-test.
Example request:
curl -F file=@input.png --output result.png http://192.168.123.123:8080/image/png2png
Responses:
- HTTP status 200 when the conversion succeeded. The content-type is image/png and the image is transferred in response body.
- HTTP status 401 for failed authorization. See GET /auth-test for details.
- HTTP status 500 when cleanup failed. See server logs for details.
POST /image/jpeg2jpeg
Cleans up the provided .jpg or .jpeg keeping only pixel-by-pixel color data. The request must be multipart/form-data and the file in a part named file.
Authorization: Optional HTTP Basic authentication as documented in GET /auth-test.
Example request:
curl -F file=@input.jpeg --output result.jpeg http://192.168.123.123:8080/image/jpeg2jpeg
Responses:
- HTTP status 200 when the conversion succeeded. The content-type is image/jpeg and the image is transferred in response body.
- HTTP status 401 for failed authorization. See GET /auth-test for details.
- HTTP status 500 when cleanup failed. See server logs for details.
POST /pdf/pdf-preview
Converts the first page of the PDF to jpeg. The request must be multipart/form-data and the file in a part named file.
Authorization: Optional HTTP Basic authentication as documented in GET /auth-test.
Example request:
curl -F file=@input.pdf --output result.jpeg http://192.168.123.123:8080/pdf/pdf-preview
Responses:
- HTTP status 200 when the conversion succeeded. The content-type is image/jpeg and the image is transferred in response body.
- HTTP status 401 for failed authorization. See GET /auth-test for details.
- HTTP status 500 when conversion failed. See server logs or response body for details.
POST /pdf/pdf2txt
Extracts the contents of PDF to plain text. The request must be multipart/form-data and the file in a part named file.
Authorization: Optional HTTP Basic authentication as documented in GET /auth-test.
Example request:
curl -F file=@input.pdf --output result.txt http://192.168.123.123:8080/pdf/pdf2txt
Responses:
- HTTP status 200 when the extraction succeeded. The content-type is text/plain and the text is transferred in response body.
- HTTP status 401 for failed authorization. See GET /auth-test for details.
- HTTP status 500 when extraction failed. See server logs or response body for details.
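The endpoints documented so far can be collected into a lookup table keyed by input extension and desired output. This is an illustrative sketch: `ENDPOINTS` and `endpoint_for` are hypothetical names, the output labels ("pdf", "png", "jpeg", "txt") are chosen here for convenience, and only endpoints whose response content-type is stated above are included.

```python
import os

# (input extension, desired output) -> (endpoint path, documented response content-type)
ENDPOINTS = {
    ("doc", "pdf"): ("/docx/docx2pdf", "application/pdf"),
    ("docx", "pdf"): ("/docx/docx2pdf", "application/pdf"),
    ("xls", "pdf"): ("/xlsx/xlsx2pdf", "application/pdf"),
    ("xlsx", "pdf"): ("/xlsx/xlsx2pdf", "application/pdf"),
    ("png", "png"): ("/image/png2png", "image/png"),
    ("jpg", "jpeg"): ("/image/jpeg2jpeg", "image/jpeg"),
    ("jpeg", "jpeg"): ("/image/jpeg2jpeg", "image/jpeg"),
    ("pdf", "jpeg"): ("/pdf/pdf-preview", "image/jpeg"),
    ("pdf", "txt"): ("/pdf/pdf2txt", "text/plain"),
}


def endpoint_for(filename, output):
    """Return (path, expected content-type) for converting filename to output.

    Raises ValueError when the combination is not in the table.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    try:
        return ENDPOINTS[(ext, output)]
    except KeyError:
        raise ValueError(f"no documented conversion from {ext!r} to {output!r}") from None
```

A client can use the returned content-type to verify the response before storing the result.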
POST /pdf/pdf2pdfa
Converts the PDF to safer PDF/A, which is often used for archival purposes. This removes embedded scripts etc, but might also convert custom fonts to images. Thus the result might contain text as images, have large file sizes and be slow to open. The reques
