SkillAgentSearch skills...

Bsdconv

A simple but powerful DSL for charset/encoding conversion and transformation, pure C implementation with no extra dependencies

Install / Use

/learn @buganini/Bsdconv
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Documentation & Support

http://www.slideshare.net/buganini/bsdconv

http://www.slideshare.net/Buganini/journey-of-bsdconv

API Reference: http://buganini.github.io/bsdconv/

Use bsdconv-man to show manual page for each module

IRC: irc://irc.freenode.net#bsdconv

Compilation & Installation

make PREFIX=${prefix} # default to /usr/local
sudo make install PREFIX=${prefix} # default to /usr/local
sudo ldconfig ${prefix}/lib # Linux
sudo ldconfig -m ${prefix}/lib # FreeBSD

Add codec alias

Update modules/{from,inter,to}/alias
make alias

Example

Convert traditional chinese big5 to simplified chinese utf-8

bsdconv big5:zhcn:utf-8 in.txt > out.txt
bsdconv big5:zhcn:utf-8 -i in.txt #inplace

Convert traditional chinese utf-8 to simplified chinese GB2312 with transliteration

bsdconv utf-8:zhcn:cp936,cp936-trans in.txt > out.txt

Convert simplified chinese to traditional chinese

bsdconv utf-8:zhtw:zhtw-words:utf-8

And ignoring whitespaces mixed in words

bsdconv utf-8:whitespace-derail:zhtw:zhtw-words:whitespace-rerail:utf-8

Convert big5 data, traditional chinese to simplified chinese, CRLF/CR/LF to CRLF, to big5 data, translate simplified chinese words, which are not in big5, to HTML entities, and uppercase the ascii characters.

bsdconv big5:zhcn:win:upper:big5,htmlentity in.txt > out.txt

Counting character width

echo -n "aa" | bsdconv utf-8:width:null
FULL: 1
HALF: 1

echo -n "aaˇ" | bsdconv utf-8:width:null
FULL: 1
HALF: 1
AMBI: 1

Very useful for migrating MySQL DB from Big5 to UTF-8

bsdconv htmlentity,big5-5c,big5:utf-8 in.sql > out.sql

Recover from mis-decoding/encoding (mistreated big5 as iso-8859-1 and converted to utf-8)

bsdconv 'utf-8:iso-8859-1|big5:utf-8'

Decode escaped data (byte/unicode mixed) like %u9644%20

bsdconv 'escape,byte:unicode,byte|skip,ascii:utf-8'

Generate string for fuzzy comparison

echo ¼ℌăDžⓐ⁹灣湾ド鬒鬒æß | bsdconv UTF-8:ZH-FUZZY-TW:KANA-PHONETIC:NFKD-CASEFOLD:UTF-8
    1⁄4hădža9灣灣do鬒鬒æss

Translate text to HTML <IMG />

bsdconv big5:nl2br:ascii,html-img in.txt > out.htm

Use glyph image from http://www.cns11643.gov.tw

bsdconv utf-8:ascii,ascii-html-cns11643-img in.txt out.htm

Maintain inter map:

bsdconv bsdconv-keyword,bsdconv:bsdconv-keyword,utf-8 inter/FOO.txt > edit.tmp
vi edit.tmp
bsdconv bsdconv-keyword,utf-8:bsdconv-keyword,bsdconv edit.tmp > inter/FOO.txt

Windows

Use mingw with Makefile.win to build it, then copy everythings in build/ to c:\bsdconv
the path of the executable will be c:\bsdconv\bsdconv.exe

If you want to install to directory other than default path set BSDCONV_PATH environment variable to your path.

Run setEnvVar.bat as administrator could help you set proper environment variables.

Bindings

Python

Perl

PHP

Ruby

Go

Java

Haskell

Elasticsearch

PostgreSQL

MySQL

View on GitHub
GitHub Stars53
CategoryDevelopment
Updated1y ago
Forks6

Languages

C

Security Score

80/100

Audited on Sep 30, 2024

No findings