UNICODECS
This library provides easy handling of several codepages, Unicode and
UTF-8, and conversion among them.
iconv does this too, but it is not
well suited to MSDOS programs, for example: it takes thousands of
bytes! You may ask: why do I need Greek or Hangul codepage
translation tables that make my program four times bigger and a
little slower?
The goal of this
library is to have some user-defined codepages (or none) hardcoded,
and the others (or all of them) kept in the popular Unicode ASCII
files.
If the above
application needs to handle Hangul tomorrow, you simply take the Hangul
file from Unicode and copy it into the application's filesystem, with
no need to relink.
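As a rough illustration of how such a file could be read at run time,
here is a minimal sketch. It assumes the layout used by the mapping
tables published by Unicode (a codepage code in hex, a Unicode value in
hex, then a '#' comment) and a single-byte codepage. The names
parse_mapping_line(), load_codepage() and NO_UNICODE are hypothetical
and are not functions of this library.

```c
#include <assert.h>
#include <stdio.h>

#define NO_UNICODE 0xFFFF   /* assumed marker for "code not mapped" */

/* Hypothetical helper: parse one line of a mapping file.
   Returns 1 and fills *code and *ucs when the line holds a mapping,
   0 for comments, blank lines and entries without a Unicode value. */
int parse_mapping_line(const char *line, unsigned *code, unsigned *ucs)
{
    unsigned c, u;

    assert(line != NULL && code != NULL && ucs != NULL);
    if (sscanf(line, " 0x%x 0x%x", &c, &u) != 2)
        return 0;
    *code = c;
    *ucs = u;
    return 1;
}

/* Hypothetical loader for a single-byte codepage.
   Returns the number of mappings read, or -1 if the file cannot be opened. */
int load_codepage(const char *path, unsigned short table[256])
{
    char line[256];
    unsigned code, ucs;
    int i, count = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    for (i = 0; i < 256; i++)
        table[i] = NO_UNICODE;          /* everything starts unmapped */
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_mapping_line(line, &code, &ucs)
                && code < 256 && ucs < NO_UNICODE) {
            table[code] = (unsigned short)ucs;
            count++;
        }
    }
    fclose(f);
    return count;
}
```

A multi-byte codepage such as Hangul would need a larger table than the
256 entries used here; the parsing step stays the same.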
Coding details
Tables are stored in memory as a tree. This takes advantage
of "freeing the holes": Unicode usually keeps the characters of a
codepage grouped together within the 65536 possible Unicode values, so
the unused ranges need no storage. This structure makes it possible to
find the Unicode value for a codepage code, or the codepage code for a
Unicode value, in a few CPU clocks and without branching code (a branch
flushes the CPU's prefetch queue and slows it down).
Additionally, characters that cannot be converted (for
example, asking for a European codepage code for a Chinese
character) make the algorithm
go through the root leaf of the tree. This means no branches
and much more speed.
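One way to read the description above is a two-level table indexed by
the high and low byte of the Unicode value, where every "hole" points
to a single shared leaf filled with a replacement code. The sketch
below shows that idea only. It is not the library's actual layout, and
ucs_tree, tree_init(), tree_set(), tree_lookup() and NO_MAPPING are
hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NO_MAPPING 0x3F   /* '?', assumed code for unconvertible characters */

typedef struct {
    unsigned char *leaf[256];   /* one leaf per high byte of the Unicode value */
} ucs_tree;

/* Shared by every hole, so unused Unicode ranges cost no extra memory. */
static unsigned char empty_leaf[256];

void tree_init(ucs_tree *t)
{
    int i;

    assert(t != NULL);
    memset(empty_leaf, NO_MAPPING, sizeof empty_leaf);
    for (i = 0; i < 256; i++)
        t->leaf[i] = empty_leaf;
}

/* Add one mapping, allocating a private leaf the first time a range is used.
   Returns 0 on success, -1 if memory runs out. */
int tree_set(ucs_tree *t, unsigned short ucs, unsigned char code)
{
    unsigned hi = (ucs >> 8) & 0xFF;

    if (t->leaf[hi] == empty_leaf) {
        unsigned char *leaf = (unsigned char *)malloc(256);
        if (leaf == NULL)
            return -1;
        memset(leaf, NO_MAPPING, 256);
        t->leaf[hi] = leaf;
    }
    t->leaf[hi][ucs & 0xFF] = code;
    return 0;
}

/* Two indexed loads, with no comparison and no jump in the source:
   an unmapped value simply lands in empty_leaf. */
unsigned char tree_lookup(const ucs_tree *t, unsigned short ucs)
{
    return t->leaf[(ucs >> 8) & 0xFF][ucs & 0xFF];
}
```

The branches are confined to tree_set(), which runs only while the
table is being built; the lookup path itself never tests whether a
character is convertible.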
Additional features:
- Accent-, caron-, etc. insensitive strcmp()
  - Example: áçÇ would be equal to àCc
- Normalizing to ASCII
  - Example: àèâñ would become AEAN
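The two features above can be sketched with a small folding table.
This is only an illustration under stated assumptions: it works on
Unicode values stored as 0-terminated unsigned short arrays, it covers
only ASCII and the Latin-1 range (U+00C0 to U+00FF), and it maps
everything else to '?'. The names fold_to_ascii(), ucs_strcmp_folded()
and normalize_to_ascii() are hypothetical, not the library's API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Fold one Unicode value to an unaccented, uppercase ASCII character. */
unsigned char fold_to_ascii(unsigned short ucs)
{
    /* Base letters for U+00C0..U+00DF, then for U+00E0..U+00FF. */
    static const char latin1[] =
        "AAAAAAACEEEEIIIIDNOOOOO?OUUUUY?S"
        "AAAAAAACEEEEIIIIDNOOOOO?OUUUUY?Y";

    if (ucs < 0x80)
        return (unsigned char)toupper(ucs);
    if (ucs >= 0xC0 && ucs <= 0xFF)
        return (unsigned char)latin1[ucs - 0xC0];
    return '?';                 /* outside the range this sketch covers */
}

/* strcmp()-like comparison that ignores accents and case. */
int ucs_strcmp_folded(const unsigned short *a, const unsigned short *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int ca = fold_to_ascii(*a);
        int cb = fold_to_ascii(*b);
        if (ca != cb || *a == 0)
            return ca - cb;
    }
}

/* Write the folded form of src into dst (at most dstsize - 1 characters
   plus the terminator). Returns the number of characters written. */
size_t normalize_to_ascii(const unsigned short *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return 0;
    while (*src != 0 && n + 1 < dstsize)
        dst[n++] = (char)fold_to_ascii(*src++);
    dst[n] = '\0';
    return n;
}
```

With this table, the values E1 E7 C7 (áçÇ) and E0 'C' 'c' (àCc) both
fold to "ACC" and so compare equal, and E0 E8 E2 F1 (àèâñ) normalizes
to "AEAN".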
Supported platforms:
- MSDOS (32-bit, flat memory model)
- UNIX