DocSharp
Pure C# library to convert between document formats (Office 97-2003, Open XML, RTF, Markdown)
DocSharp is a pure C# library to convert between document formats without Office interop or native dependencies (except for some special packages, see requirements).
The following packages are currently available:
- DocSharp.Binary: convert Office 97-2003 binary documents (doc, xls, ppt) to Open XML documents (docx, xlsx, pptx). This is a fork of the abandoned b2xtranslator project that adds critical fixes.
  Note: pre-97 formats (not documented) and XLSB (very different) are not supported.
- DocSharp.Docx: convert DOCX to RTF, HTML, Markdown and plain text (.txt), and RTF to DOCX. Possible applications include generating Open XML documents in C# and exporting them for other editors/services, or loading and saving Microsoft Word documents into/from a RichTextBox/RichEditBox control.
- DocSharp.Markdown: convert Markdown to DOCX or RTF using custom Markdig renderers.
- DocSharp.Renderer: provides basic DOCX to PDF/images/SVG/XPS conversion using QuestPDF.
Packages can be installed via NuGet.
The optional extra packages DocSharp.ImageSharp, DocSharp.SystemDrawing, DocSharp.SkiaSharp and DocSharp.MagickNET make it possible to convert images that the target format does not support (e.g. GIF / TIFF for DOCX -> RTF, or WMF / EMF / TIFF for DOCX -> Markdown/HTML). Each of these has pros and cons, so the choice depends on your requirements. More information can be found in the Wiki.
There is no common DOM to manipulate or generate documents; this library is mainly for conversion. Some helper methods on top of the Open XML SDK and format-specific writers are available, but they are mostly intended for internal use, although they could be extended or improved in the future.
For document creation and manipulation, consider using the Open XML SDK itself or other <a href="#recommended_libraries">recommended libraries</a>. Some of these are used in the sample app to test two-step conversions, compare results, or generate documents in multiple formats with the same code.
DocSharp provides methods to accept/return a WordprocessingDocument directly (in addition to file path / Stream / byte array), and a SaveTo extension method for WordprocessingDocument.
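As a minimal sketch of the Open XML SDK side of this workflow, the snippet below builds a small DOCX in memory and obtains both a WordprocessingDocument and a byte array, which are two of the input forms mentioned above. It uses only Open XML SDK calls (it assumes the DocumentFormat.OpenXml NuGet package is referenced). The DocSharp converter call and the SaveTo extension are deliberately not shown, because their exact signatures are not given here; see the Wiki for those.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Build a minimal DOCX in memory with the Open XML SDK.
using var stream = new MemoryStream();
using (var document = WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document))
{
    // Every Word document needs a main document part with a body.
    MainDocumentPart mainPart = document.AddMainDocumentPart();
    mainPart.Document = new Document(
        new Body(
            new Paragraph(
                new Run(
                    new Text("Hello from the Open XML SDK")))));

    // Here `document` is a WordprocessingDocument instance that could be
    // handed to a converter accepting that type (call not shown: assumption).
}

// Disposing the document flushes the package to the stream,
// so the bytes below form a complete DOCX file.
byte[] docxBytes = stream.ToArray();
Console.WriteLine($"DOCX size: {docxBytes.Length} bytes");
```

Working with the WordprocessingDocument directly avoids a round trip through the file system when the document is generated and converted in the same process. The byte array or stream form is more convenient when the document comes from a database or an HTTP request.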
Supported features
Supported elements vary depending on the input and output formats; see Supported features for an overview.
<a id="Requirements"></a>
Requirements
- Supported targets are .NET 8, 9, 10 and .NET Framework 4.6.2 (the oldest .NET Framework version still supported).
- DocSharp.SystemDrawing is for Windows only (.NET Framework or net*-windows), as System.Drawing.Common is based on GDI+ and only supported on Windows since .NET 6.
- DocSharp.ImageSharp is cross-platform for .NET 8+, as ImageSharp is fully managed C# code but does not support .NET Framework.
- DocSharp.MagickNET is cross-platform for both .NET and .NET Framework, but Magick.NET bundles many native libraries that might not work on non-desktop platforms (Android / iOS / WASM).
- DocSharp.Renderer depends on QuestPDF, which currently supports Windows x64 / x86, macOS x64 / ARM64 and Linux x64 / ARM64. Windows ARM64, Android and iOS are not supported yet, due to a custom Skia build. In addition, XPS generation is supported only on Windows.
Usage
You can refer to the project Wiki or sample apps.
Roadmap
- Finish and publish experimental converters
- Support more elements and attributes, and fix issues on edge cases
- Reduce code duplication, cleanup
- Async functions/progress callback (some tasks such as downloading images referenced in Markdown may take some time)
- Improve support for right-to-left and complex script languages
- Evaluate feasibility of making converters thread-safe and totally NativeAOT-compatible
Credits
Dependencies:
- Open XML SDK
- Markdig - for DocSharp.Markdown
- ImageSharp and VectSharp - for DocSharp.ImageSharp
- System.Drawing.Common and SVG.NET - for DocSharp.SystemDrawing
- CoreJ2K - for JPEG2000 support in both DocSharp.ImageSharp and DocSharp.SystemDrawing
- Magick.NET-Q8-AnyCPU - for DocSharp.MagickNET
- QuestPDF - for DocSharp.Renderer
Forked:
- b2xtranslator - for DocSharp.Binary
Others (credits for parts of the logic, not direct dependencies):
- wmf-to-svg for the WMF parser
- Html2OpenXml for images header decoding and unit conversions.
- dwml_cs for Office Math (OMML) to LaTeX conversion
- addFormula2docx for Office Math (OMML) to MathML conversion
- RtfPipe (and forks: 1, 2), RtfConverter, OpenRTFDoc, ReasonableRTF.Standard for RTF parsing logic.
- ExcelNumberFormat for Excel format strings parsing logic.
<a id="recommended_libraries"></a> Other recommended libraries (some of these are used in the sample app, not dependencies when installing packages):
- Read, write, manipulate documents:
  - Open XML SDK - DOCX, XLSX, PPTX
  - OfficeIMO - DOCX, XLSX, PPTX, Markdown, CSV; can also merge, compare and convert some formats
  - Clippit - DOCX, XLSX, PPTX; can also merge, compare and convert some formats
  - ClosedXML - XLSX
  - Sylvan.Data.Excel - XLSX, XLS, XLSB
  - ShapeCrawler - PPTX; can also render slides to images
  - NPOI - DOCX, XLSX, XLS; partial port of Apache POI
  - FluentNPOI - XLSX, XLS; HTML/PDF export
- Read only / Extract data:
  - GustavoHennig/b2xtranslator - DOC prior to Office 97
  - ExcelDataReader - XLS (pre-97 too), XLSB, XLSX, CSV
  - PdfPig, Tabula.Csv - PDF
  - OpenMcdf - Microsoft Compound format
- Generate documents:
  - PDF, XPS, SVG, images: QuestPDF, FossPDF.NET
  - PDF and RTF: PdfSharp / MigraDoc
  - DOCX: SharpDocx, DocxTemplater, MiniWord
  - XLSX: MiniExcel, ClosedXML.Report
  - XLSX, ODS, CSV: FreeDataExports
- Convert or render documents:
  - XLSX: XlsxToHtmlConverter
  - HTML rendering: HTML-Renderer, PeachPdf, Puppeteer Sharp, Westwind.WebView
  - HTML to Open XML: Html2OpenXml (DOCX), HtmlToExcel
  - HTML to Markdown: ReverseMarkdown
  - Markdown rendering: Markdig, QuestPDF.Markdown, VectSharp.Markdown + VectSharp.PDF, [MarkdownToPdf](https://github.com/geertjanthomas/MarkdownToPd
