CText
C++ advanced text processing library
Advanced text processing library for C++ & Python
About
CText is a Modern C++ library that offers a wide range of text processing routines. It addresses many complex tasks that can be time-consuming in both C++ and Python. While features like line and word manipulation are readily available in higher-level languages such as C#, Java, and Python, they are often lacking in C++. CText fills this gap by providing those missing capabilities while preserving the low-level control that C++ offers. In addition to essential functions, it includes numerous optimized routines for efficient text handling. The library is highly flexible and scalable, making it easy to extend with custom processing routines. It’s well-suited for tackling preprocessing challenges in NLP and machine learning tasks, or simply for honing your Modern C++ skills.
Main Features
- Modern C++ Template library: Simple to use, just include a single header file.
- Unicode Support: Seamlessly handle both UNICODE and ANSI in the same project.
- Extensive Text Processing Features: Includes hundreds of optimized methods for both standard and advanced operations, with many more planned.
- Clean and Readable Codebase: Designed to help you build complex text-processing applications quickly, abstracting away low-level details and optimizations.
- Cross-Platform Compatibility: Tested with Visual Studio and GCC 7.4, easily portable to other environments.
- No External Dependencies: CText does not depend on any other libraries; the only requirements are C++11 and the STL.
- Easily Extensible: Text routines are designed to be scalable and adaptable across character types and platforms.
- Python Integration: Compatible with all versions of Python
Have questions or suggestions? Feel free to reach out: email.
🔹 Overview
- ctextlib brings the powerful C++ CText library to Python.
- It offers hundreds of optimized, extensible text manipulation routines for NLP, machine learning, and general-purpose use.
- Install via pip:
  pip install ctextlib
- Basic usage:
  from ctextlib import Text as text
  a = text("Hello World")
  print(a)
🔹 Core Features in Python
🛠️ String Construction & Conversion
- append, appendRange: Concatenate strings or character ranges.
- fromArray, fromMatrix, fromBinary, fromDouble, fromHex, fromInteger: Convert arrays/numbers into text.
- fromArrayAsHex, fromMatrixAsHex: Hexadecimal formatting for array/matrix data.
📁 File & Path Operations
- readFile, writeFile: Load and save files.
- getDir, getFileName, getExtension, pathCombine: Path manipulation.
- removeFileName, removeExtension: Modify file paths.
🔍 Search & Analysis
- contain, containAny, containOnly, count, countWordFrequencies: Check for presence and frequency of substrings/words.
- indexOf, indexOfAny, lastIndexOf, find: Locate substrings.
- regexMatch, regexSearch, regexReplace, regexLines, regexWords: Regex utilities.
✂️ Cutting & Slicing
- cutAfterFirst, cutAfterLast, cutBeforeFirst, cutEnds, cutLeft, cutRight: Trim text from specific positions.
- keep, keepLeft, keepRight, limit, mid: Keep specific parts of the text.
✏️ Modification & Mutation
- insert, insertAtBegin, insertAtEnd: Insert text.
- replace, replaceAny, remove, removeAny, erase: Modify content.
- enclose, quote: Wrap content with characters.
- trim, trimLeft, trimRight: Remove leading/trailing characters.
🔁 Transformations
- lower, upper, reverse, rotateLeft, rotateRight, makeUnique, sort, shuffle: Alter casing or order.
- random, randomAlpha, randomNumber: Generate random strings/numbers.
📏 Checks & Tests
- isAlpha, isBinary, isNumber, isHexNumber, isPalindrome, isEmpty, isLower, isUpper: Type and pattern validation.
🧠 Advanced Utilities
- split, words, lines, linesRemoveEmpty, linesSort: Line and word processing.
- nextLine, nextWord: Iterator-like access to lines or words.
- toBinary, toHex: Format conversion.
🔹 Ideal For
- 🧠 Natural Language Processing (NLP) preprocessing
- 🤖 Machine Learning text cleaning
- 🗃️ File & Data transformation tasks
- ⚡ High-performance and extensible Python applications requiring fine-grained text control
Python
To install CText:
pip install ctextlib
To test if CText is installed:
import ctextlib
a = ctextlib.Text("Hello World")
print(a)
Or:
from ctextlib import Text as text
a = text("Hello World")
print(a)
Python methods reference:
<b>addToFileName</b>
a = text("C:\\Temp\\Temp2\\File.bmp")
a.addToFileName("_mask")
print(a)
C:\Temp\Temp2\File_mask.bmp
<b>append</b>
a = text("Hello ")
a.append("World")
Hello World
a = text("123")
a.append('4',4)
1234444
a = text("")
a.append(['Hello', ' ', 'World'])
Hello World
<b>appendRange</b>
a = text()
a.appendRange('a','z').appendRange('0','9')
abcdefghijklmnopqrstuvwxyz0123456789
<b>between</b>
a = text('The quick brown fox jumps over the lazy dog')
a.between('q','d')
print(a)
uick brown fox jumps over the lazy
a = text('The quick brown fox jumps over the lazy dog')
a.between('quick','lazy')
print(a)
brown fox jumps over the
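The two examples above suggest that between keeps only the text lying between a start and an end delimiter. As a rough plain-Python sketch of that behaviour (the helper name between_sketch is hypothetical, and picking the first occurrence of each delimiter is an assumption inferred from the examples, not taken from the CText source):

```python
def between_sketch(s: str, start: str, end: str) -> str:
    """Return the text after the first `start` and before the next `end`.

    Hypothetical helper, not part of ctextlib; it only mirrors the
    behaviour shown in the examples above.
    """
    i = s.find(start)
    if i == -1:
        return ""  # assumption: nothing is kept when `start` is missing
    i += len(start)
    j = s.find(end, i)
    if j == -1:
        return ""  # assumption: nothing is kept when `end` is missing
    return s[i:j]


sentence = "The quick brown fox jumps over the lazy dog"
print(repr(between_sketch(sentence, "q", "d")))
print(repr(between_sketch(sentence, "quick", "lazy")))
```

Printing with repr makes the leading and trailing spaces that remain around the result visible.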
<b>contain</b>
a = text('The quick brown fox jumps over the lazy dog')
if a.contain('quick'):
    print("contain 'quick'")
contain 'quick'
Case-insensitive
a = text('The quick brown fox jumps over the lazy dog')
if a.contain('Quick', False):
    print("contain 'quick'")
contain 'quick'
a = text('The quick brown fox jumps over the lazy dog')
if a.contain(['slow','fast','quick']):
    print("contain 'quick'")
contain 'quick'
<b>containAny</b>
a = text('Hello World')
a.containAny('abcd')
True
<b>containOnly</b>
a = text('4365767')
a.containOnly('0123456789')
True
<b>count</b>
a = text('The quick brown fox jumps over the lazy dog')
a.count('the', False)
2
<b>countWordFrequencies</b>
from ctextlib import Text as text
a = text("The quick brown fox jumps over the lazy dog")
a.countWordFrequencies(False)
[(2, 'the'), (1, 'brown'), (1, 'dog'), (1, 'fox'), (1, 'jumps'), (1, 'lazy'), (1, 'over'), (1, 'quick')]
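For comparison, a similar result can be produced in plain Python with collections.Counter. This is only a sketch of the output format shown above: the function name word_frequencies_sketch is hypothetical, splitting on whitespace is an assumption, and the False argument is assumed to mean case-insensitive counting, as it does for contain and count.

```python
from collections import Counter


def word_frequencies_sketch(s: str, case_sensitive: bool = True):
    """Return (count, word) pairs, most frequent first, ties in alphabetical order.

    Hypothetical helper, not part of ctextlib.
    """
    words = s.split()  # assumption: words are separated by whitespace
    if not case_sensitive:
        words = [w.lower() for w in words]
    counts = Counter(words)
    pairs = [(n, w) for w, n in counts.items()]
    # Sort by descending count, then by word.
    return sorted(pairs, key=lambda p: (-p[0], p[1]))


freqs = word_frequencies_sketch("The quick brown fox jumps over the lazy dog", False)
print(freqs)
```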
<b>cutAfterFirst</b>
a = text('The quick brown fox jumps over the lazy dog')
a.cutAfterFirst('o')
The quick br
<b>cutAfterLast</b>
a = text('The quick brown fox jumps over the lazy dog')
a.cutAfterLast('o')
The quick brown fox jumps over the lazy d
<b>cutBeforeFirst</b>
a = text('The quick brown fox jumps over the lazy dog')
a.cutBeforeFirst('o')
own fox jumps over the lazy dog
<b>cutEnds</b>
a = text('The quick brown fox jumps over the lazy dog')
a.cutEnds(4)
quick brown fox jumps over the lazy
<b>cutLeft</b>
s = text("Hello World")
s.cutLeft(6)
World
<b>cutRight</b>
s = text("Hello World")
s.cutRight(6)
Hello
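The cut methods above can be read as string slices. The following plain-Python sketch reproduces the documented outputs; all function names are hypothetical, and returning the text unchanged when the delimiter is absent is an assumption rather than documented CText behaviour.

```python
def cut_left(s: str, n: int) -> str:
    """Drop the first n characters."""
    return s[n:]


def cut_right(s: str, n: int) -> str:
    """Drop the last n characters."""
    return s[:max(len(s) - n, 0)]


def cut_ends(s: str, n: int) -> str:
    """Drop n characters from both ends."""
    return cut_right(cut_left(s, n), n)


def cut_after_first(s: str, sub: str) -> str:
    """Drop everything from the first occurrence of sub onwards."""
    i = s.find(sub)
    return s if i == -1 else s[:i]  # assumption: unchanged if sub is absent


def cut_after_last(s: str, sub: str) -> str:
    """Drop everything from the last occurrence of sub onwards."""
    i = s.rfind(sub)
    return s if i == -1 else s[:i]


def cut_before_first(s: str, sub: str) -> str:
    """Drop everything before the first occurrence of sub."""
    i = s.find(sub)
    return s if i == -1 else s[i:]


sentence = "The quick brown fox jumps over the lazy dog"
print(cut_after_first(sentence, "o"))
print(cut_after_last(sentence, "o"))
print(cut_before_first(sentence, "o"))
print(cut_ends(sentence, 4))
print(cut_left("Hello World", 6))
print(cut_right("Hello World", 6))
```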
<b>enclose</b>
a = text("Hello World")
a.enclose('<','>')
<Hello World>
a = text("Hello World")
a.enclose('"')
"Hello World"
<b>endsWith</b>
a = text("Hello World")
if a.endsWith('World'):
    print("ends with 'World'")
ends with 'World'
With case-insensitive search:
a = text("Hello World")
if a.endsWith('world', False):
    print("ends with 'world'")
ends with 'world'
<b>endsWithAny</b>
a = text('The quick brown fox jumps over the lazy dog')
if a.endsWithAny(['cat','dog']):
    print('ends with an animal...')
ends with an animal...
<b>erase</b>
a = text('The quick brown fox jumps over the lazy dog')
a.erase(8, 10)
print(a)
The quicx jumps over the lazy dog
<b>equal</b>
a = text()
a.equal('A',10)
AAAAAAAAAA
<b>find</b>
a = text('The quick brown fox jumps over the lazy dog')
a.find('brown')
'brown fox jumps over the lazy dog'
With case-insensitive search:
a = text('The quick brown fox jumps over the lazy dog')
a.find('Brown', False)
'brown fox jumps over the lazy dog'
<b>fromArray</b>
a = text()
a.fromArray([1,2,3,4])
print(a)
1 2 3 4
a = text()
a.fromArray([1,2,3,4], '|')
print(a)
1|2|3|4
a = text()
a.fromArray([1,2,3,4], '')
print(a)
1234
Array of floats
a = text()
a.fromArray([1.1,2.2,3.3,4.4])
print(a)
1.1 2.2 3.3 4.4
Array of strings
a = text()
a.fromArray(['hello','world'])
print(a)
hello world
import numpy as np
a = text()
a.fromArray(np.array(["hello","world"]))
print(a)
hello world
<b>fromArrayAsHex</b>
a = text()
a.fromArrayAsHex([10,20,30,40])
print(a)
0A 14 1E 28
Use without separator
a.fromArrayAsHex([10,20,30,40],2,'')
print(a)
0A141E28
a = text()
a.fromArrayAsHex([1000,2000,3000,4000])
print(a)
3E8 7D0 BB8 FA0
a = text()
a.fromArrayAsHex([1000,2000,3000,4000], 4, ',')
print(a)
03E8,07D0,0BB8,0FA0
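The width and separator arguments behave like a minimum field width and a join string. A plain-Python sketch that reproduces the outputs above (the name array_as_hex_sketch is hypothetical, and the defaults of width 2 and a single space are inferred from the examples, not confirmed from the CText source):

```python
def array_as_hex_sketch(values, width: int = 2, sep: str = " ") -> str:
    """Format integers as upper-case hex, zero-padded to at least `width` digits.

    Hypothetical helper, not part of ctextlib.
    """
    return sep.join(format(v, "0{}X".format(width)) for v in values)


print(array_as_hex_sketch([10, 20, 30, 40]))
print(array_as_hex_sketch([10, 20, 30, 40], 2, ""))
print(array_as_hex_sketch([1000, 2000, 3000, 4000]))
print(array_as_hex_sketch([1000, 2000, 3000, 4000], 4, ","))
```

Values that need more digits than the width, such as 1000 (3E8) with a width of 2, are not truncated.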
<b>fromBinary</b>
a = text()
a.fromBinary(12345)
print(a)
00000000000000000011000000111001
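The output above is the 32-bit, zero-padded binary form of the number. In plain Python the same string can be obtained with format; the fixed width of 32 is an assumption read off the example output.

```python
# Zero-padded 32-bit binary representation of 12345 (width assumed from the example).
bits = format(12345, "032b")
print(bits)
```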
<b>fromDouble</b>
a = text()
a.fromDouble(3.333338478)
print(a)
a.fromDouble(3.33989, 4)
print(a)
a.fromDouble(3.333338478, 10)
print(a)
3.333338
3.3399
3.3333384780
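The second argument acts as the number of digits after the decimal point. A plain-Python sketch of the same formatting (the name from_double_sketch is hypothetical, and the default precision of 6 is inferred from the first output above):

```python
def from_double_sketch(x: float, precision: int = 6) -> str:
    """Format x with a fixed number of decimal places.

    Hypothetical helper, not part of ctextlib.
    """
    return format(x, ".{}f".format(precision))


print(from_double_sketch(3.333338478))
print(from_double_sketch(3.33989, 4))
print(from_double_sketch(3.333338478, 10))
```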
<b>fromHex</b>
a = text()
a.fromHex(1234567)
a.fromHex('a')
a.fromHex("48 65 6C 6C 6F 20 57 6F 72 6C 64")
0012D687
61
Hello World
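The three calls above cover three different conversions: an integer to hex digits, a character to its hex code, and a hex byte string back to text. A plain-Python sketch of each, using separate hypothetical helpers (how fromHex chooses between these cases internally is not shown in this document, and the 8-digit padding is read off the first output):

```python
def int_to_hex(n: int, width: int = 8) -> str:
    """Upper-case hex, zero-padded to `width` digits (width assumed from the example)."""
    return format(n, "0{}X".format(width))


def char_to_hex(ch: str) -> str:
    """Hex code of a single character."""
    return format(ord(ch), "X")


def hex_to_text(hex_string: str) -> str:
    """Decode space-separated hex bytes into ASCII text."""
    return bytes.fromhex(hex_string).decode("ascii")


print(int_to_hex(1234567))
print(char_to_hex("a"))
print(hex_to_text("48 65 6C 6C 6F 20 57 6F 72 6C 64"))
```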
