CText

Advanced text processing library for C++ & Python

About

CText is a Modern C++ library that offers a wide range of text processing routines. It addresses many complex tasks that can be time-consuming in both C++ and Python. While features like line and word manipulation are readily available in higher-level languages such as C#, Java, and Python, they are often lacking in C++. CText fills this gap by providing those missing capabilities while preserving the low-level control that C++ offers. In addition to essential functions, it includes numerous optimized routines for efficient text handling. The library is highly flexible and scalable, making it easy to extend with custom processing routines. It’s well-suited for tackling preprocessing challenges in NLP and machine learning tasks, or simply for honing your Modern C++ skills.

Main Features

Modern C++ Template library: Simple to use, just include a single header file.
Unicode Support: Seamlessly handle both UNICODE and ANSI in the same project.
Extensive Text Processing Features: Includes hundreds of optimized methods for both standard and advanced operations, with many more planned.
Clean and Readable Codebase: Designed to help you build complex text-processing applications quickly, abstracting away low-level details and optimizations.
Cross-Platform Compatibility: Tested with Visual Studio and GCC 7.4, easily portable to other environments.
No External Dependencies: CText does not depend on any other libraries; the only requirements are C++11 and the STL.
Easily Extensible: Text routines are designed to be scalable and adaptable across character types and platforms.
Python Integration: Compatible with all versions of Python

Have questions or suggestions? Feel free to reach out: email.

🔹 Overview

ctextlib brings the powerful C++ CText library to Python.
It offers hundreds of optimized, extensible text manipulation routines for NLP, machine learning, and general-purpose use.
Install via pip:
```
pip install ctextlib
```

Basic usage:

from ctextlib import Text as text
a = text("Hello World")
print(a)

🔹 Core Features in Python

🛠️ String Construction & Conversion

append, appendRange: Concatenate strings or character ranges.
fromArray, fromMatrix, fromBinary, fromDouble, fromHex, fromInteger: Convert arrays/numbers into text.
fromArrayAsHex, fromMatrixAsHex: Hexadecimal formatting for array/matrix data.

📁 File & Path Operations

readFile, writeFile: Load and save files.
getDir, getFileName, getExtension, pathCombine: Path manipulation.
removeFileName, removeExtension: Modify file paths.

🔍 Search & Analysis

contain, containAny, containOnly, count, countWordFrequencies: Check for presence and frequency of substrings/words.
indexOf, indexOfAny, lastIndexOf, find: Locate substrings.
regexMatch, regexSearch, regexReplace, regexLines, regexWords: Regex utilities.

✂️ Cutting & Slicing

cutAfterFirst/Last, cutBeforeFirst, cutEnds, cutLeft, cutRight: Trim text from specific positions.
keep, keepLeft, keepRight, limit, mid: Keep specific parts of the text.

✏️ Modification & Mutation

insert, insertAtBegin, insertAtEnd: Insert text.
replace, replaceAny, remove, removeAny, erase: Modify content.
enclose, quote: Wrap content with characters.
trim, trimLeft, trimRight: Remove leading/trailing characters.

🔁 Transformations

lower, upper, reverse, rotateLeft, rotateRight, makeUnique, sort, shuffle: Alter casing or order.
random, randomAlpha, randomNumber: Generate random strings/numbers.

📏 Checks & Tests

isAlpha, isBinary, isNumber, isHexNumber, isPalindrome, isEmpty, isLower, isUpper: Type and pattern validation.

🧠 Advanced Utilities

split, words, lines, linesRemoveEmpty, linesSort: Line and word processing.
nextLine, nextWord: Iterator-like access to lines or words.
toBinary, toHex: Format conversion.

🔹 Ideal For

🧠 Natural Language Processing (NLP) preprocessing
🤖 Machine Learning text cleaning
🗃️ File & Data transformation tasks
⚡ High-performance and extensible Python applications requiring fine-grained text control

Python

To install CText:

pip install ctextlib

To test if CText is installed:

import ctextlib
a = ctextlib.Text("Hello World")
print(a)

Or:

from ctextlib import Text as text
a = text("Hello World")
print(a)

Python methods reference:

addToFileName

a = text("C:\\Temp\\Temp2\\File.bmp")
a.addToFileName("_mask")
print(a)

C:\Temp\Temp2\File_mask.bmp
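
For comparison, the same result can be sketched with the standard library alone. This is only an illustration of the behavior shown above, not how ctextlib implements it; the helper name add_to_file_name is hypothetical, and PureWindowsPath is used so that the backslash path parses the same way on any platform.

```python
from pathlib import PureWindowsPath


def add_to_file_name(path: str, suffix: str) -> str:
    """Insert `suffix` between the file name and its extension."""
    p = PureWindowsPath(path)
    # stem is the name without the extension, suffix is the extension itself
    return str(p.with_name(p.stem + suffix + p.suffix))


print(add_to_file_name("C:\\Temp\\Temp2\\File.bmp", "_mask"))
# C:\Temp\Temp2\File_mask.bmp
```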

append

a = text("Hello ")
a.append("World")

Hello World

a = text("123")
a.append('4',4)

a = text("")
a.append(['Hello', ' ', 'World'])

Hello World

appendRange

a = text()
a.appendRange('a','z').appendRange('0','9')

abcdefghijklmnopqrstuvwxyz0123456789
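
A rough plain-Python equivalent of the character range shown above, assuming the range is inclusive on both ends (which the output suggests). The helper char_range is hypothetical and is not part of ctextlib.

```python
def char_range(first: str, last: str) -> str:
    """Return every character from `first` to `last`, both inclusive."""
    return "".join(chr(code) for code in range(ord(first), ord(last) + 1))


result = char_range("a", "z") + char_range("0", "9")
print(result)
# abcdefghijklmnopqrstuvwxyz0123456789
```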

between

a = text('The quick brown fox jumps over the lazy dog')
a.between('q','d')
print(a)

uick brown fox jumps over the lazy

a = text('The quick brown fox jumps over the lazy dog')
a.between('quick','lazy')
print(a)

 brown fox jumps over the
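
The two outputs above keep the whitespace that sits next to the delimiters. A minimal plain-Python sketch of that behavior follows. It assumes the first occurrence of the left delimiter and the first occurrence of the right delimiter after it are used; the examples do not distinguish this from other choices. The function below is a hypothetical stand-in, not the ctextlib implementation.

```python
def between(s: str, left: str, right: str) -> str:
    """Return the text strictly between `left` and the next `right`."""
    start = s.find(left)
    if start == -1:
        return ""
    start += len(left)          # skip past the left delimiter
    end = s.find(right, start)  # search only after the left delimiter
    if end == -1:
        return ""
    return s[start:end]


sentence = "The quick brown fox jumps over the lazy dog"
print(repr(between(sentence, "q", "d")))
# 'uick brown fox jumps over the lazy '
print(repr(between(sentence, "quick", "lazy")))
# ' brown fox jumps over the '
```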

contain

a = text('The quick brown fox jumps over the lazy dog')
if a.contain('quick') :
    print("contain 'quick'")

contain 'quick'

Case-insensitive:

a = text('The quick brown fox jumps over the lazy dog')
if a.contain('Quick', False) :
    print("contain 'quick'")

contain 'quick'

a = text('The quick brown fox jumps over the lazy dog')
if a.contain(['slow','fast','quick']):
    print("contain 'quick'")

contain 'quick'
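
The three forms above (single substring, case-insensitive flag, list of candidates) can be approximated in plain Python as below. This is a sketch under the assumption that a list argument matches when any one of its items is present, which is what the last example suggests. The name contains is hypothetical and not the library's API.

```python
def contains(s, needles, case_sensitive=True):
    """Return True if `s` contains the substring, or any substring in a list."""
    if isinstance(needles, str):
        needles = [needles]
    if not case_sensitive:
        # casefold() is the standard way to compare text ignoring case
        s = s.casefold()
        needles = [n.casefold() for n in needles]
    return any(n in s for n in needles)


sentence = "The quick brown fox jumps over the lazy dog"
print(contains(sentence, "quick"))                    # True
print(contains(sentence, "Quick"))                    # False
print(contains(sentence, "Quick", False))             # True
print(contains(sentence, ["slow", "fast", "quick"]))  # True
```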

containAny

a = text('Hello World')
a.containAny('abcd')
True

containOnly

a = text('4365767')
a.containOnly('0123456789')
True
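
Both checks treat the argument as a set of characters rather than as a substring. A plain-Python sketch of that idea, with hypothetical helper names that are not part of ctextlib:

```python
def contain_any(s: str, chars: str) -> bool:
    """True if at least one character of `chars` occurs in `s`."""
    return any(c in s for c in chars)


def contain_only(s: str, chars: str) -> bool:
    """True if every character of `s` is one of `chars`."""
    return all(c in chars for c in s)


print(contain_any("Hello World", "abcd"))       # True ('d' occurs in "World")
print(contain_only("4365767", "0123456789"))    # True
print(contain_only("4365x67", "0123456789"))    # False
```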

count

a = text('The quick brown fox jumps over the lazy dog')
a.count('the', False)

countWordFrequencies

from ctextlib import Text as text
a = text("The quick brown fox jumps over the lazy dog")
a.countWordFrequencies(False)

[(2, 'the'), (1, 'brown'), (1, 'dog'), (1, 'fox'), (1, 'jumps'), (1, 'lazy'), (1, 'over'), (1, 'quick')]
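
The result above is ordered by descending count, with ties in alphabetical order. A rough equivalent using collections.Counter is sketched below. It assumes words are separated by whitespace and that the boolean argument switches case sensitivity off, as in the other methods. The function name is hypothetical.

```python
from collections import Counter


def count_word_frequencies(s: str, case_sensitive: bool = True):
    """Return (count, word) pairs, most frequent first, ties alphabetical."""
    words = s.split()
    if not case_sensitive:
        words = [w.lower() for w in words]
    counts = Counter(words)
    pairs = [(n, w) for w, n in counts.items()]
    return sorted(pairs, key=lambda pair: (-pair[0], pair[1]))


freq = count_word_frequencies("The quick brown fox jumps over the lazy dog", False)
print(freq[:3])
# [(2, 'the'), (1, 'brown'), (1, 'dog')]
```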

cutAfterFirst

s = text('The quick brown fox jumps over the lazy dog')
s.cutAfterFirst('o')

The quick br

cutAfterLast

s = text('The quick brown fox jumps over the lazy dog')
s.cutAfterLast('o')

The quick brown fox jumps over the lazy d

cutBeforeFirst

s = text('The quick brown fox jumps over the lazy dog')
s.cutBeforeFirst('o')

own fox jumps over the lazy dog
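
Judging by the three outputs above, cutAfterFirst and cutAfterLast drop the matched character together with everything after it, while cutBeforeFirst keeps the matched character. The plain-Python sketch below reproduces those outputs with slicing. The helper names are hypothetical, and returning the text unchanged when there is no match is an assumption.

```python
def cut_after_first(s: str, sub: str) -> str:
    """Keep the text before the first occurrence of `sub`."""
    i = s.find(sub)
    return s if i == -1 else s[:i]


def cut_after_last(s: str, sub: str) -> str:
    """Keep the text before the last occurrence of `sub`."""
    i = s.rfind(sub)
    return s if i == -1 else s[:i]


def cut_before_first(s: str, sub: str) -> str:
    """Keep the text from the first occurrence of `sub` onward."""
    i = s.find(sub)
    return s if i == -1 else s[i:]


sentence = "The quick brown fox jumps over the lazy dog"
print(cut_after_first(sentence, "o"))   # The quick br
print(cut_after_last(sentence, "o"))    # The quick brown fox jumps over the lazy d
print(cut_before_first(sentence, "o"))  # own fox jumps over the lazy dog
```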

cutEnds

s = text('The quick brown fox jumps over the lazy dog')
s.cutEnds(4)

quick brown fox jumps over the lazy

cutLeft

s = text("Hello World")
s.cutLeft(6)

World

cutRight

s = text("Hello World")
s.cutRight(6)

Hello

enclose

a = text("Hello World")
a.enclose('<','>')
a.enclose('"')

<Hello World>
"Hello World"

endsWith

a = text("Hello World")
if a.endsWith('World'):
    print("ends with 'World'")

ends with 'World'

With case-insensitive search:

a = text("Hello World")
if a.endsWith('world', False):
    print("ends with 'world'")

ends with 'world'

endsWithAny

a = text('The quick brown fox jumps over the lazy dog')
if a.endsWithAny(['cat','dog']):
    print('ends with an animal...')

ends with an animal...

erase

a = text('The quick brown fox jumps over the lazy dog')
a.erase(8, 10)
print(a)

The quicx jumps over the lazy dog
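
The output above is not a typo: erase(8, 10) appears to remove 10 characters starting at zero-based index 8, which deletes "k brown fo" and leaves the "x" of "fox" directly after "quic". A slicing sketch of that reading follows; the (start, count) interpretation is inferred from the output, and the helper is hypothetical.

```python
def erase(s: str, start: int, count: int) -> str:
    """Remove `count` characters beginning at zero-based index `start`."""
    return s[:start] + s[start + count:]


sentence = "The quick brown fox jumps over the lazy dog"
print(repr(sentence[8:18]))     # 'k brown fo' (the removed span)
print(erase(sentence, 8, 10))   # The quicx jumps over the lazy dog
```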

equal

a = text()
a.equal('A',10)

AAAAAAAAAA

find

a = text('The quick brown fox jumps over the lazy dog')
a.find('brown')

'brown fox jumps over the lazy dog'

With case-insensitive search:

a = text('The quick brown fox jumps over the lazy dog')
a.find('Brown', False)

'brown fox jumps over the lazy dog'

fromArray

a = text()
a.fromArray([1,2,3,4])
print(a)

1 2 3 4

a = text()
a.fromArray([1,2,3,4], '|')
print(a)

1|2|3|4

a = text()
a.fromArray([1,2,3,4], '')
print(a)

Array of floats

a = text()
a.fromArray([1.1,2.2,3.3,4.4])
print(a)

1.1 2.2 3.3 4.4

Array of strings

a = text()
a.fromArray(['hello','world'])
print(a)

hello world

import numpy as np
a = text()
a.fromArray(np.array(["hello","world"]))
print(a)

hello world
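
In plain Python, the list-based examples above amount to converting each element to a string and joining with a separator that defaults to a single space. A minimal sketch with a hypothetical helper name:

```python
def from_array(values, sep: str = " ") -> str:
    """Join the string form of each element with `sep`."""
    return sep.join(str(v) for v in values)


print(from_array([1, 2, 3, 4]))          # 1 2 3 4
print(from_array([1, 2, 3, 4], "|"))     # 1|2|3|4
print(from_array([1.1, 2.2, 3.3, 4.4]))  # 1.1 2.2 3.3 4.4
print(from_array(["hello", "world"]))    # hello world
```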

fromArrayAsHex

a = text()
a.fromArrayAsHex([10,20,30,40])
print(a)

0A 14 1E 28

Use without separator

a.fromArrayAsHex([10,20,30,40],2,'')
print(a)

0A141E28

a = text()
a.fromArrayAsHex([1000,2000,3000,4000])
print(a)

3E8 7D0 BB8 FA0

a = text()
a.fromArrayAsHex([1000,2000,3000,4000], 4, ',')
print(a)

03E8,07D0,0BB8,0FA0
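
The second argument in the examples above behaves like a minimum field width: values are zero-padded up to that many hex digits but never truncated, which is why 1000 prints as 3E8 with the default and as 03E8 with a width of 4. The sketch below reproduces the four outputs with format specifiers. The default width of 2 is inferred from the outputs, and the helper name is hypothetical.

```python
def array_as_hex(values, width: int = 2, sep: str = " ") -> str:
    """Format integers as uppercase hex, zero-padded to at least `width` digits."""
    return sep.join(f"{v:0{width}X}" for v in values)


print(array_as_hex([10, 20, 30, 40]))                   # 0A 14 1E 28
print(array_as_hex([10, 20, 30, 40], 2, ""))            # 0A141E28
print(array_as_hex([1000, 2000, 3000, 4000]))           # 3E8 7D0 BB8 FA0
print(array_as_hex([1000, 2000, 3000, 4000], 4, ","))   # 03E8,07D0,0BB8,0FA0
```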

fromBinary

a = text()
a.fromBinary(12345)
print(a)

00000000000000000011000000111001
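
The output above is the 32-bit, zero-padded binary form of 12345. The same string can be produced with Python's built-in format, shown here only as a cross-check; the 32-bit width is taken from the length of the output.

```python
bits = format(12345, "032b")  # binary, zero-padded to 32 digits
print(bits)
# 00000000000000000011000000111001

# Parsing the string back as base 2 recovers the original value
print(int(bits, 2))
# 12345
```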

fromDouble

a = text()
a.fromDouble(3.333338478)
print(a)
a.fromDouble(3.33989, 4)
print(a)
a.fromDouble(3.333338478, 10)
print(a)

3.333338
3.3399
3.3333384780
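
The second argument above looks like the number of digits after the decimal point, with 6 as the default. Fixed-point formatting in plain Python gives the same three strings. This is a sketch under that assumption, with a hypothetical helper name.

```python
def from_double(value: float, precision: int = 6) -> str:
    """Format `value` with exactly `precision` digits after the decimal point."""
    return f"{value:.{precision}f}"


print(from_double(3.333338478))      # 3.333338
print(from_double(3.33989, 4))       # 3.3399
print(from_double(3.333338478, 10))  # 3.3333384780
```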

fromHex

a = text()
a.fromHex(1234567)
a.fromHex('a')
a.fromHex("48 65 6C 6C 6F 20 57 6F 72 6C 64")

0012D687
61
Hello World
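
The three calls above do different things depending on the argument: an integer is written as 8 hex digits, a single character is written as its character code in hex, and a string of hex bytes is decoded back to text. The plain-Python sketch below mirrors each case separately. The helper names are hypothetical, and ASCII decoding is an assumption.

```python
def hex_from_int(value: int, width: int = 8) -> str:
    """Uppercase hex, zero-padded to `width` digits."""
    return f"{value:0{width}X}"


def hex_from_char(ch: str) -> str:
    """Hex code of a single character."""
    return f"{ord(ch):02X}"


def text_from_hex_bytes(hex_string: str) -> str:
    """Decode space-separated hex bytes into text."""
    # bytes.fromhex skips the spaces between byte pairs
    return bytes.fromhex(hex_string).decode("ascii")


print(hex_from_int(1234567))   # 0012D687
print(hex_from_char("a"))      # 61
print(text_from_hex_bytes("48 65 6C 6C 6F 20 57 6F 72 6C 64"))  # Hello World
```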
