Adding a New Transformation Language File

Stéphane Cloutier

Saturday 26 March 2005 10:18:21 am

I created a new transformation file for the Inuktitut language. It transliterates from Unicode to ASCII to produce clean URLs.

I created an override transform.ini.append.php and added the following:

[Transformation]
# The inuktitut group
Charsets[]=utf-8;inuktitut

[inuktitut]
Files[]=inuktitut.tr

I then placed inuktitut.tr in the share/transformations folder.

I cleared the ini, transformation and URL alias caches.

Here's my transformation file:

# Rules related to Inuktitut
#
# The following charsets use characters from Inuktitut:
# utf-8
#
# See basic.tr for formatting options

# Transliteration of Inuktitut, used in URLs and identifiers

inuktitut_transliterate_ascii:

U+1449 = "p"
U+1466 = "t"
U+1483 = "k"
U+14A1 = "g"
U+14BB = "m"
U+14D0 = "n"
U+1505 = "s"
U+14EA = "l"
U+153E = "j"
U+155D = "v"
U+1550 = "r"
U+1585 = "q"
U+1595 = "ng"
U+1596 = "nng"
U+15A6 = "lh"

U+1403 = "i"
U+1404 = "ii"
U+1405 = "u"
U+1406 = "uu"
U+140A = "a"
U+140B = "aa"

U+1431 = "pi"
U+1432 = "pii"
U+1433 = "pu"
U+1434 = "puu"
U+1438 = "pa"
U+1439 = "paa"

U+144E = "ti"
U+144F = "tii"
U+1450 = "tu"
U+1451 = "tuu"
U+1455 = "ta"
U+1456 = "taa"

U+146D = "ki"
U+146E = "kii"
U+146F = "ku"
U+1470 = "kuu"
U+1472 = "ka"
U+1473 = "kaa"

U+148B = "gi"
U+148C = "gii"
U+148D = "gu"
U+148E = "guu"
U+1490 = "ga"
U+1491 = "gaa"

U+14A5 = "mi"
U+14A6 = "mii"
U+14A7 = "mu"
U+14A8 = "muu"
U+14AA = "ma"
U+14AB = "maa"

U+14C2 = "ni"
U+14C3 = "nii"
U+14C4 = "nu"
U+14C5 = "nuu"
U+14C7 = "na"
U+14C8 = "naa"

U+14EF = "si"
U+14F0 = "sii"
U+14F1 = "su"
U+14F2 = "suu"
U+14F4 = "sa"
U+14F5 = "saa"

U+14D5 = "li"
U+14D6 = "lii"
U+14D7 = "lu"
U+14D8 = "luu"
U+14DA = "la"
U+14DB = "laa"

U+1528 = "ji"
U+1529 = "jii"
U+152A = "ju"
U+152B = "juu"
U+152D = "ja"
U+152E = "jaa"

U+1555 = "vi"
U+1556 = "vii"
U+1557 = "vu"
U+1558 = "vuu"
U+1559 = "va"
U+155A = "vaa"

U+1546 = "ri"
U+1547 = "rii"
U+1548 = "ru"
U+1549 = "ruu"
U+154B = "ra"
U+154C = "raa"

U+157F = "qi"
U+1580 = "qii"
U+1581 = "qu"
U+1582 = "quu"
U+1583 = "qa"
U+1584 = "qaa"

U+158F = "ngi"
U+1590 = "ngii"
U+1591 = "ngu"
U+1592 = "nguu"
U+1593 = "nga"
U+1594 = "ngaa"

U+1671 = "nngi"
U+1672 = "nngii"
U+1673 = "nngu"
U+1674 = "nnguu"
U+1675 = "nnga"
U+1676 = "nngaa"

U+15A0 = "lhi"
U+15A1 = "lhii"
U+15A2 = "lhu"
U+15A3 = "lhuu"
U+15A4 = "lha"
U+15A5 = "lhaa"

U+157C = "H"
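
A file like this can be sanity-checked outside eZ publish before the caches are cleared. The following is only a minimal Python sketch, assuming the file contains nothing but comments, a group label ending in ":" and simple rules of the form U+XXXX = "text". It is not the eZ publish parser, and the names parse_rules and transliterate are hypothetical helpers made up for this illustration. The sample text includes one deliberately malformed rule to show how such a line gets reported.

```python
import re

# Matches one simple mapping rule such as: U+1431 = "pi"
RULE = re.compile(r'^U\+([0-9A-Fa-f]{4,6})\s*=\s*"([^"]*)"$')


def parse_rules(text):
    """Return (mapping, bad_lines) for simple 'U+XXXX = "text"' rules."""
    mapping, bad_lines = {}, []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines, comments, and the group label ending in ':'.
        if not line or line.startswith("#") or line.endswith(":"):
            continue
        match = RULE.match(line)
        if match:
            mapping[chr(int(match.group(1), 16))] = match.group(2)
        else:
            bad_lines.append((number, line))
    return mapping, bad_lines


def transliterate(text, mapping):
    """Replace each mapped character; leave unmapped characters as they are."""
    return "".join(mapping.get(ch, ch) for ch in text)


# A tiny sample in the same shape as the file; the last rule is
# deliberately malformed ('+' where '=' is expected).
SAMPLE = '''
inuktitut_transliterate_ascii:
U+1403 = "i"
U+14C4 = "nu"
U+1483 = "k"
U+157C + "H"
'''

mapping, bad = parse_rules(SAMPLE)
print(transliterate("\u1403\u14C4\u1483", mapping))  # i + nu + k
print(bad)  # lines that did not match the simple rule form
```

Running it against the full file instead of SAMPLE lists any line that does not follow the simple rule form, together with its line number.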

Qupanuaq - Snow bunting

Stéphane Cloutier

Sunday 27 March 2005 4:28:22 am

Is there a way to have content transliterated depending on the user's preferences?

Stéphane Cloutier

Sunday 27 March 2005 7:33:19 am

I need some assistance. How do you associate two Unicode characters so that the combination is checked first (avoiding a clash with the single-character rules), and only if no combination matches are the single characters matched on their own?

I have to associate
U+1585 = "q"

With the following:
U+1585 - U+146D = "qqi"
U+1585 - U+146E = "qqii"
U+1585 - U+146F = "qqu"
U+1585 - U+1470 = "qquu"
U+1585 - U+1472 = "qqa"
U+1585 - U+1473 = "qqaa"

When typing the Unicode sequence "ta - q - ki - q", I should get "taqqiq" and not "taqkiq". Any "q" followed by a "k" syllable (ki, ku, ka and their long forms) should produce a doubled "qq" (qqi, qqu, qqa).

How do you set such rules in the transformation system?
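
To make the expected behaviour concrete, here is a minimal Python sketch of the matching order described above: try the two-character combination first, then fall back to the single-character rule. It only illustrates the logic and says nothing about how the transformation system itself would express it. The tables SINGLE and PAIRS and the function transliterate are hypothetical names, and SINGLE holds only a small subset of the full table.

```python
# Single-character rules (a small subset of the full table).
SINGLE = {
    "\u1455": "ta",
    "\u1585": "q",
    "\u146D": "ki",
    "\u146E": "kii",
    "\u146F": "ku",
    "\u1470": "kuu",
    "\u1472": "ka",
    "\u1473": "kaa",
}

# Two-character rules: final q followed by a k syllable.
PAIRS = {
    "\u1585\u146D": "qqi",
    "\u1585\u146E": "qqii",
    "\u1585\u146F": "qqu",
    "\u1585\u1470": "qquu",
    "\u1585\u1472": "qqa",
    "\u1585\u1473": "qqaa",
}


def transliterate(text):
    """Longest match first: pairs win over single characters."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in PAIRS:
            # The combination matched, so consume both characters.
            out.append(PAIRS[pair])
            i += 2
        else:
            # No combination: use the single rule, or keep the character.
            ch = text[i]
            out.append(SINGLE.get(ch, ch))
            i += 1
    return "".join(out)


# ta - q - ki - q
print(transliterate("\u1455\u1585\u146D\u1585"))  # taqqiq
```

With the PAIRS table emptied, the same input falls back to the single rules and gives "taqkiq", which is the result I want to avoid.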