Pcre2

This is a clone of the SVN repository at svn://vcs.exim.org/pcre2/code/trunk. It was cloned by http://svn2github.com/, but that service has since closed. Please read the closing note in my blog post: http://piotr.gabryjeluk.pl/blog:closing-svn2github. If you want to continue synchronizing this repo, look at https://github.com/gabrys/svn2github

README file for PCRE2 (Perl-compatible regular expression library)

PCRE2 is a re-working of the original PCRE library to provide an entirely new API. The latest release of PCRE2 is always available in three alternative formats from:

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip

There is a mailing list for discussion about the development of PCRE (both the original and new APIs) at pcre-dev@exim.org. You can access the archives and subscribe or manage your subscription here:

https://lists.exim.org/mailman/listinfo/pcre-dev

Please read the NEWS file if you are upgrading from a previous release. The contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Contributions by users of PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest

The PCRE2 APIs

PCRE2 is written in C, and it has its own API. There are three sets of functions, one for the 8-bit library, which processes strings of bytes, one for the 16-bit library, which processes strings of 16-bit values, and one for the 32-bit library, which processes strings of 32-bit values. There are no C++ wrappers.
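An application normally links against only the library whose code unit width it uses. The following is a minimal sketch of choosing the linker flag, assuming the conventional library names libpcre2-8, libpcre2-16, and libpcre2-32 (these names are not stated in this section); pcre2_link_flag is a hypothetical helper, not part of the distribution.

```shell
# Hypothetical helper: print the linker flag for a given code unit width.
# Assumption: the libraries are named libpcre2-<width>.
pcre2_link_flag() {
  case "$1" in
    8|16|32) echo "-lpcre2-$1" ;;
    *) echo "unsupported code unit width: $1" >&2; return 1 ;;
  esac
}

# Example: the flag for the 16-bit library.
pcre2_link_flag 16
```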

In addition, the distribution contains a set of C wrapper functions for the 8-bit library that are based on the POSIX regular expression API (see the pcre2posix man page). These are built into a library called libpcre2-posix. Note that this just provides a POSIX calling interface to PCRE2; the regular expressions themselves still follow Perl syntax and semantics. The POSIX API is restricted, and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The official POSIX name is regex.h, but I did not want to risk possible problems with existing files of that name by distributing it that way. To use PCRE2 with an existing program that uses the POSIX API, pcre2posix.h will have to be renamed or pointed at by a link (or the program modified, of course).

If you are using the POSIX interface to PCRE2 and there is already a POSIX regex library installed on your system, as well as worrying about the regex.h header file (as mentioned above), you must also take care when linking programs to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they may pick up the POSIX functions of the same name from the other library.

To help with this issue, the libpcre2-posix library provides alternative names for the POSIX functions. These are the POSIX names, prefixed with "pcre2_", for example, pcre2_regcomp(). If an application can be compiled to use the alternative names (for example by the use of -Dregcomp=pcre2_regcomp etc.) it can be sure of linking with the PCRE2 functions.
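As an illustration of the -D approach, the sketch below builds one definition for each of the four POSIX regex functions (regcomp, regexec, regerror, regfree) and prints a compile command rather than running it. The file name app.c is hypothetical, and linking with -lpcre2-8 after -lpcre2-posix is an assumption about the usual library names, not something stated above.

```shell
# Map each POSIX regex function name to its pcre2_-prefixed alternative.
POSIX_DEFS=""
for fn in regcomp regexec regerror regfree; do
  POSIX_DEFS="$POSIX_DEFS -D$fn=pcre2_$fn"
done
POSIX_DEFS="${POSIX_DEFS# }"   # drop the leading space

# app.c is a hypothetical program; the command is only printed, not executed.
echo "cc $POSIX_DEFS app.c -lpcre2-posix -lpcre2-8"
```

With these definitions in effect, every call in the program resolves to a pcre2_ name, so it cannot be satisfied by another regex library that happens to be on the link line.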

Documentation for PCRE2

If you install PCRE2 in the normal way on a Unix-like system, you will end up with a set of man pages whose names all start with "pcre2". The one that is just called "pcre2" lists all the others. In addition to these man pages, the PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and doc/pcre2test.txt in the source distribution. The first of these is a concatenation of the text forms of all the section 3 man pages except the listing of pcre2demo.c and those that summarize individual functions. The other two are the text forms of the section 1 man pages for the pcre2grep and pcre2test commands. These text forms are provided for ease of scanning with text editors or similar tools. They are installed in <prefix>/share/doc/pcre2, where <prefix> is the installation prefix (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked in various ways, and rooted in a file called index.html, is distributed in doc/html and installed in <prefix>/share/doc/pcre2/html.
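To check which of these documentation files are present after installation, something like the following may help. It is only a sketch: it assumes the default prefix /usr/local unless PREFIX is set in the environment, and it reports missing files rather than failing.

```shell
# Installation prefix; /usr/local is the default mentioned above.
PREFIX="${PREFIX:-/usr/local}"
DOC_DIR="$PREFIX/share/doc/pcre2"

# Report which of the text and HTML documentation files exist.
for f in pcre2.txt pcre2grep.txt pcre2test.txt html/index.html; do
  if [ -e "$DOC_DIR/$f" ]; then
    echo "found:   $DOC_DIR/$f"
  else
    echo "missing: $DOC_DIR/$f"
  fi
done
```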

Building PCRE2 on non-Unix-like systems

For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and "make" you may be able to build PCRE2 using autotools in the same way as for many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways (command line, GUI, etc). This creates Makefiles, solution files, etc. The file NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be straightforward to build PCRE2 on any system that has a Standard C compiler and library, because it uses only Standard C functions.

Building PCRE2 without using autotools

The use of autotools (in particular, libtool) is problematic in some environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD file for ways of building PCRE2 without using autotools.

Building PCRE2 using autotools

The following instructions assume the use of the widely used "configure; make; make install" (autotools) process.

To build PCRE2 on a system that supports autotools, first run the "configure" command from the PCRE2 distribution directory, with your current directory set to the directory where you want the files to be created. This command is a standard GNU "autoconf" configuration script, for which generic instructions are supplied in the file INSTALL.

Most commonly, people build PCRE2 within its own distribution directory, and in this case, on many systems, just running "./configure" is sufficient. However, the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2 -Wall' instead of the default, and that "make install" should install PCRE2 under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that directory as current. For example, suppose you have unpacked the PCRE2 source into /source/pcre2/pcre2-xxx, but you want to build it in /build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

PCRE2 is written in C and is normally compiled as a C library. However, it is possible to build it as a C++ library, though the provided building apparatus does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2 library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this by adding one of these options to the "configure" command:

--disable-shared
--disable-static

(See also "Shared libraries on Unix-like systems" below.)

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to the "configure" command, the 16-bit library is also built. If you add --enable-pcre2-32 to the "configure" command, the 32-bit library is also built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8 to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can give large performance improvements on certain platforms, add --enable-jit to the "configure" command. This support is available only for certain hardware architectures. If you try to enable it on an unsupported architecture, there will be a compile time error. If in doubt, use --enable-jit=auto, which enables JIT only if the current hardware is supported.

. If you are enabling JIT under SELinux you may also want to add --enable-jit-sealloc, which enables the use of an execmem allocator in JIT that is compatible with SELinux. This has no effect if JIT is not enabled.

. If you do not want to make use of the default support for UTF-8 Unicode character strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit library, or UTF-32 Unicode character strings in the 32-bit library, you can add --disable-unicode to the "configure" command. This reduces the size of the libraries. It is not possible to configure one library with Unicode support, and another without, in the same configuration. It is also not possible to use --enable-ebcdic (see below) with Unicode support, so if this option is set, you must also use --disable-unicode.

When Unicode support is available, the use of a UTF encoding still has to be enabled by setting the PCRE2_UTF option at run time or starting a pattern with (*UTF). When PCRE2 is compiled with Unicode support, its input can only either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

As well as supporting UTF strings, Unicode support includes support for the \P, \p, and \X sequences that recognize Unicode character properties. However, only the basic two-letter properties such as Lu are supported. Escape sequences such as \d and \w in patterns do not by default make use of Unicode properties, but can be made to do so by setting the PCRE2_UCP option or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any of the preceding, or any of the Unicode newline sequences, or the NUL (zero) character as indicating the end of a line. Whatever you specify at build time is the default; the caller of PCRE2 can change the selection at run time. The default newline indicator is a single LF character (the Unix standard). You
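Several of the options described above can be combined in a single "configure" run. The sketch below assembles one possible combination (all three code unit widths, JIT where the hardware supports it, static libraries only) and prints the resulting command instead of executing it. The particular selection is only an example, not a recommended configuration.

```shell
# The 8-bit library is built by default; add the 16-bit and 32-bit libraries.
CONFIGURE_ARGS="--enable-pcre2-16 --enable-pcre2-32"

# Enable JIT only if the current hardware is supported.
CONFIGURE_ARGS="$CONFIGURE_ARGS --enable-jit=auto"

# Build static libraries only.
CONFIGURE_ARGS="$CONFIGURE_ARGS --disable-shared"

# Print the command; run it from the distribution directory when ready.
echo "./configure $CONFIGURE_ARGS"
```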
