People interested in technical explanations and legacy tricks may read the rest of this text. People looking for an up-to-date practical solution should read about the HANDLE_CHARSETS function added in WeiDU 237 instead. It implements an easy, well-integrated (i.e. easy to uninstall), cross-platform way of converting the texts as described below.
BGEE uses a new encoding for the special characters of international languages such as French (à, é, ...). It is based on UTF-8, which stores such accented characters in 2 bytes instead of the single byte used before. BG and BG II were based on ISO-8859 encodings (one byte per special character). It seems ISO-8859-1 was used for Western European languages, at least one other for Polish, and probably another for Russian.
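To make the one-byte versus two-byte difference concrete, here is a small Python check. Python is used only for illustration and is not part of the mod. It assumes Windows-1252 (`cp1252`) as the legacy encoding for French, as in the batch file later in this post.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return the byte length of text in Windows-1252 and in UTF-8."""
    return len(text.encode("cp1252")), len(text.encode("utf-8"))

for ch in "aàé":
    # hex() shows the raw bytes: ASCII is identical in both encodings,
    # accented letters take one byte in CP1252 but two in UTF-8
    print(ch, ch.encode("cp1252").hex(), ch.encode("utf-8").hex())
# a 61 61
# à e0 c3a0
# é e9 c3a9
```

For example, `encoded_sizes("épée")` gives `(4, 6)`: each of the two accented letters costs one extra byte in UTF-8.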
To support each language's character set, as far as I know, the games used specific BAM font files whose character shapes matched the encoding.
With BGEE, mods such as BG2 Tweak Pack don't install properly in languages that use special characters. In the game, a string is displayed only up to the first special character, so it is cut short, often so much that it becomes unreadable. I assume this is because, in UTF-8, a byte with the most significant bit set (as is the case for special characters in ISO-8859 encodings) is only valid as part of a multi-byte sequence, so an isolated ISO-8859 accented character is invalid.
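The assumption above can be illustrated with a Python sketch. It models a hypothetical text renderer that keeps only the leading part of a string that decodes as valid UTF-8, which matches the truncation observed in BGEE. It is not the game engine's actual code.

```python
def displayed_prefix(raw: bytes) -> str:
    """Return the longest leading part of raw that decodes as valid UTF-8.

    Hypothetical model of the truncation seen in BGEE, not the engine's code.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that breaks UTF-8 decoding;
        # everything before it decoded cleanly
        return raw[:err.start].decode("utf-8")

legacy = "Une épée longue".encode("cp1252")   # bytes of an unconverted .tra string
converted = "Une épée longue".encode("utf-8") # bytes after conversion
print(repr(displayed_prefix(legacy)))     # 'Une '
print(repr(displayed_prefix(converted)))  # 'Une épée longue'
```

The CP1252 byte `0xE9` (é) looks like the start of a multi-byte UTF-8 sequence, but the following `p` is not a valid continuation byte. Decoding therefore fails right there, and everything after it is lost.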
In the case of BPSeries, the mod installs properly and the descriptions are correctly added to scrpdesc.2da, with StrRefs matching the tlk file, as can be checked with Near Infinity. Yet the BP script descriptions don't display properly: it seems each line of a multiline text ends as soon as a special character is found.
The solution I found is to convert the tra files to UTF-8 before installing the mod. Since the original game doesn't handle UTF-8, this means you have to choose between BG2 and BGEE if you still want to support international players.
Ideally, WeiDU would convert the tra files during installation. However, this is not possible right now, and it would require some additions in any case: the encoding of BG2 texts is not the same for all languages, so the proper encoding for each language would have to be either hardcoded into WeiDU or passed to it through an extended LANGUAGE instruction.
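The per-language information such a converter needs could be as simple as a lookup table. The Python sketch below is hypothetical: the names are made up for illustration. Only the French entry (CP1252) comes from this post. The other Western European entries are the assumption made in the notes further down, and the Polish (CP1250) and Russian (CP1251) entries are assumptions about the BG2 releases that are not confirmed here.

```python
from typing import Optional

# Hypothetical mapping from a mod's language folder name to the legacy
# encoding of the original BG2 release in that language.
BG2_ENCODINGS = {
    "french": "cp1252",
    "german": "cp1252",   # assumed, like other Western European languages
    "italian": "cp1252",  # assumed
    "spanish": "cp1252",  # assumed
    "polish": "cp1250",   # assumption, not confirmed in this post
    "russian": "cp1251",  # assumption, not confirmed in this post
}

def source_encoding(language: str) -> Optional[str]:
    """Return the encoding to convert from, or None if no conversion is known.

    English is deliberately absent: its text is assumed to be plain ASCII,
    which is already valid UTF-8.
    """
    return BG2_ENCODINGS.get(language.lower())
```

An installer would call `source_encoding("French")` (giving `"cp1252"`) and skip the conversion entirely when the result is `None`. That would also avoid offering a useless step to English players.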
In the meantime, the workaround is to convert the tra files during installation, depending on the language chosen by the user. I made a preliminary attempt based on BPSeries v1010, since it is very easy to check in game whether the encoding works. Here are my results so far.
First of all, here is how it looks with the proper encoding for BGEE.
I used a version of iconv compiled for Windows to perform the conversion. In principle, Linux and Mac OS X already include iconv, so the same approach can be used there as well.
Here is the batch file I used for the conversion:
:: The files to convert are the .tra files
:: Adding .tpa files is also necessary for BG2 Tweak Pack
:: ~nx is used to keep only the filename (n) with extension (x), without the full path
:: See the iconv manual
:: -f gives the original file encoding (here CP1252 / WINDOWS-1252 / ISO-8859-1 for French)
:: -t gives the target encoding (UTF-8)
for %%i in (BPSeries\LANGUAGE\FRENCH\*.tra) do BPSeries\winutils\iconv -f CP1252 -t UTF-8 "BPSeries\LANGUAGE\FRENCH\%%~nxi" > "BPSeries\LANGUAGE\FRENCH\%%~nxi_utf8"
:: Note: copying the converted files back over the original .tra files is done in the tp2 file,
:: in order to benefit from WeiDU's restore capabilities during uninstall
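Since the batch file above is Windows-only, here is a rough cross-platform equivalent in Python. It is offered only as a sketch. It assumes Python 3.9+ is available on the player's machine, which a mod installer cannot take for granted. Like the batch file, it writes each converted file next to the original with a `_utf8` suffix and leaves the original untouched, so the tp2 can still do the replacement. The function name and folder path are illustrative.

```python
from pathlib import Path

def convert_tra_dir(folder: str, source_encoding: str = "cp1252") -> list[Path]:
    """Write a UTF-8 copy (name.tra_utf8) of every .tra file in folder.

    Mirrors the iconv batch loop: the original .tra files are not modified.
    Returns the paths of the files written.
    """
    written = []
    for tra in sorted(Path(folder).glob("*.tra")):
        # Work on raw bytes so line endings are kept as-is, like iconv
        text = tra.read_bytes().decode(source_encoding)  # like iconv -f CP1252
        target = tra.with_name(tra.name + "_utf8")       # setup.tra -> setup.tra_utf8
        target.write_bytes(text.encode("utf-8"))         # like iconv -t UTF-8
        written.append(target)
    return written

if __name__ == "__main__":
    # Hypothetical usage with the folder layout used in the tp2 below
    for path in convert_tra_dir("bpseries/language/french"):
        print("converted", path)
```

The `*.tra` pattern does not match the generated `.tra_utf8` files, so running the script twice does not pick up its own output.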
In order to integrate that conversion into the installation process, the only idea I came up with was to add a component at the beginning of the BPSeries tp2 file. The following block was inserted just after the LANGUAGE instructions, before the BEGIN @5001 of the first component:
// Isaya : special component, mandatory, to install first in case of BGEE, to convert tra files to UTF8
// BGEE test borrowed from BG2 Tweak Pack
BEGIN ~tra conversion for BGEE (French and Windows only, for now)~ // NO_LOG_RECORD
REQUIRE_PREDICATE FILE_EXISTS_IN_GAME ~oh9350.are~ ~Only for BGEE~
// Only applies for specific languages, here french
// This could easily be done with Linux and Mac, since they must have built-in iconv
// But I don't know how to write a .sh script
ACTION_IF ("%WEIDU_OS%" STRING_COMPARE_CASE ~WIN32~ = 0) AND
("%LANGUAGE%" STRING_COMPARE_CASE ~french~ = 0) THEN BEGIN //Windows
// run the Windows conversion batch file above here (e.g. with AT_NOW)
END // ELSE BEGIN
// AT_NOW ~bpseries/convertbpseriestra.sh~
// END
// Replace the original tra files (WeiDU should restore the originals at uninstall)
// Note: unfortunately, MOVE does not seem to remove the .tra_utf8 file after overwriting the tra file
// After conversion, we need to reload the tra file
// For French only, as an example
ACTION_IF ("%WEIDU_OS%" STRING_COMPARE_CASE ~WIN32~ = 0) AND
("%LANGUAGE%" STRING_COMPARE_CASE ~french~ = 0) THEN BEGIN
MOVE ~bpseries/language/%LANGUAGE%/setup.tra_utf8~ ~bpseries/language/%LANGUAGE%/setup.tra~
END
A few notes:
- I couldn't come up with a conversion script for Linux or Mac OS X, so the code only checks for Windows
- I only dealt with French, since it's my language and I could check the result properly. However, I assume the exact same script would work for other Western European languages such as Italian, Spanish and German, provided they all use the Windows-1252 code page (or ISO-8859-1).
- additional batch/script files would be required for languages using different encodings, unless parameters are passed to the scripts to tell them the encoding of the original files (I assume UTF-8 applies to all languages in BGEE; why wouldn't it?)
- I have an issue with the MOVE instruction, which doesn't remove the .tra_utf8 file after overwriting the .tra file; maybe it's a bug in WeiDU 231
A huge drawback of this solution is that it only works if the user cooperates and installs the conversion component, which is useless for English but will be offered nonetheless.
Maybe WeiDU gurus can find a much better solution to this issue; I do hope so. Otherwise, I'm afraid international players will be left out of the BGEE community as far as mods are concerned. I'm posting these findings in the hope that more knowledgeable people will find a better way.