<!-- TOC --><a name="utf8-cpp-utf-8-with-c-in-a-portable-way"></a>

UTF8-CPP: UTF-8 with C++ in a Portable Way

<!-- TOC --><a name="introduction"></a>

Introduction

C++ developers still lack an easy and portable way of handling Unicode-encoded strings. The original C++ standard (known as C++98 or C++03) is Unicode-agnostic. Some progress has been made in later editions of the standard, but it is still hard to work with Unicode using only the standard facilities.

I came up with a small, C++98-compatible generic library to handle UTF-8 encoded strings. For anybody used to working with STL algorithms and iterators, it should be easy and natural to use. The code is freely available for any purpose; check out the license. The library has been used widely since its first release in 2006, in both commercial and open-source projects, and has proved to be stable and useful.

<!-- TOC end --> <!-- TOC --><a name="installation"></a>

Installation

This is a header-only library and the supported way of deploying it is:

  • Download a release from https://github.com/nemtrif/utfcpp/releases into a temporary directory
  • Unzip the release
  • Copy the contents of the utfcpp/source directory into the directory where you keep include files for your project

The CMakeLists.txt file was originally made for testing purposes only, but unfortunately over time I accepted contributions that added an install target. This is not a supported way of installing the utfcpp library, and I am considering removing CMakeLists.txt in a future release.

<!-- TOC --><a name="examples-of-use"></a>

Examples of use

<!-- TOC --><a name="introductory-sample"></a>

Introductory Sample

To illustrate the use of the library, let's start with a small but complete program that opens a file containing UTF-8 encoded text, reads it line by line, checks each line for invalid UTF-8 byte sequences, and converts it to UTF-16 encoding and back to UTF-8:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include "utf8.h"
using namespace std;
int main(int argc, char** argv)
{
    if (argc != 2) {
        cout << "\nUsage: docsample filename\n";
        return 0;
    }
    const char* te
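
To make the "checks each line for invalid UTF-8 byte sequences" step more concrete, here is a minimal sketch that uses only the standard library. It is not the library's implementation: `first_invalid_utf8` is a hypothetical helper written for illustration, and it returns a byte offset where the library's `utf8::find_invalid` returns an iterator. It treats overlong encodings, surrogate code points, and values above U+10FFFF as invalid.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of utfcpp: returns the byte offset of the
// first invalid UTF-8 sequence in s, or s.size() if the whole string is valid.
std::size_t first_invalid_utf8(const std::string& s)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const unsigned long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};

    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;   // total number of bytes in this sequence
        unsigned long cp = 0;  // code point decoded so far

        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return i;  // stray continuation byte or invalid lead byte

        if (len > n - i) return i;  // sequence is cut off by the end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return i;  // expected a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp[len]) return i;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return i;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return i;                 // beyond Unicode range

        i += len;
    }
    return n;
}
```

A caller could compare the result with `line.size()` to decide whether a line is clean, and use the offset to print the valid prefix of a damaged line. In real code the library's own `utf8::find_invalid` or `utf8::is_valid` would normally be the better choice.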