UTF8-CPP: UTF-8 with C++ in a Portable Way
<!-- TOC --><a name="introduction"></a>Introduction
C++ developers still miss an easy and portable way of handling Unicode encoded strings. The original C++ standard (known as C++98 or C++03) is Unicode agnostic. Some progress has been made in the later editions of the standard, but it is still hard to work with Unicode using only the standard facilities.
I came up with a small, C++98-compatible generic library for handling UTF-8 encoded strings. For anybody used to working with STL algorithms and iterators, it should be easy and natural to use. The code is freely available for any purpose; check out the license. The library has been widely used since its first release in 2006, in both commercial and open-source projects, and has proved to be stable and useful.
Table of Contents
- UTF8-CPP: UTF-8 with C++ in a Portable Way
- Introduction
- Installation
- Examples of use
- Points of interest
- Design goals and decisions
- Alternatives
- Reference
- Functions From utf8 Namespace
- utf8::append
- utf8::append16
- utf8::next
- utf8::next16
- utf8::peek_next
- utf8::prior
- utf8::advance
- utf8::distance
- utf8::utf16to8
- utf8::utf16tou8
- utf8::utf8to16
- utf8::utf32to8
- octet_iterator utf32to8 (u32bit_iterator start, u32bit_iterator end, octet_iterator result)
- std::string utf32to8(const std::u32string& s)
- std::u8string utf32to8(const std::u32string& s)
- std::u8string utf32to8(const std::u32string_view& s)
- std::string utf32to8(std::u32string_view s)
- utf8::utf8to32
- utf8::find_invalid
- utf8::is_valid
- utf8::replace_invalid
- utf8::starts_with_bom
- Types From utf8 Namespace
- Functions From utf8::unchecked Namespace
- utf8::unchecked::append
- utf8::unchecked::append16
- utf8::unchecked::next
- utf8::unchecked::next16
- utf8::unchecked::peek_next
- utf8::unchecked::prior
- utf8::unchecked::advance
- utf8::unchecked::distance
- utf8::unchecked::utf16to8
- utf8::unchecked::utf8to16
- utf8::unchecked::utf32to8
- utf8::unchecked::utf8to32
- utf8::unchecked::replace_invalid
- Types From utf8::unchecked Namespace
Installation
This is a header-only library and the supported way of deploying it is:
- Download a release from https://github.com/nemtrif/utfcpp/releases into a temporary directory
- Unzip the release
- Copy the contents of the utfcpp/source directory into the directory where you keep include files for your project
The CMakeLists.txt file was originally made for testing purposes only, but unfortunately over time I accepted contributions that added an install target. This is not a supported way of installing the utfcpp library, and I am considering removing CMakeLists.txt in a future release.
<!-- TOC --><a name="examples-of-use"></a>Examples of use
<!-- TOC --><a name="introductory-sample"></a>Introductory Sample
To illustrate the use of the library, let's start with a small but complete program that opens a file containing UTF-8 encoded text, reads it line by line, checks each line for invalid UTF-8 byte sequences, and converts it to UTF-16 encoding and back to UTF-8:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include "utf8.h"
using namespace std;
int main(int argc, char** argv)
{
    if (argc != 2) {
        cout << "\nUsage: docsample filename\n";
        return 0;
    }
    const char* te
