Localisation of mods for BGEE
Isaya
Member, Translator (NDA) Posts: 752
People interested in technical explanations and legacy tricks may read the rest of this text. People looking for an up-to-date, practical solution should instead read about the HANDLE_CHARSETS function added in WeiDU 237. It implements an easy, well-integrated (i.e. easy to uninstall), multi-platform way of converting the texts as described below.
BGEE uses a new encoding for the special characters found in languages such as French (à, é, ...). It is based on UTF-8, which stores these characters on two bytes instead of one. BG and BG II were based on ISO-8859 encodings (each special character on a single byte). It seems ISO-8859-1 was used for Western Europe, with at least one other encoding for Polish and probably another for Russian.
To adapt to each language's character set, as far as I know, the games used specific BAM files containing character shapes matching the encoding.
With BGEE, mods such as BG2 Tweak Pack don't install properly in languages that use special characters. In the game, a string is displayed only up to the first special character, so it is cut short, possibly so much that it becomes unreadable. I assume this is because a byte with the most significant bit set (as is the case for special characters in ISO-8859 encodings) is not valid on its own in UTF-8.
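The assumption about the most significant bit can be checked outside the game. Here is a minimal Python sketch (illustration only; the sample string is invented and not taken from any mod). It shows that the single bytes CP1252 uses for accented letters do not form valid UTF-8, while the UTF-8 form uses two bytes per accented letter:

```python
# Illustration only: the sample text is invented, not taken from a mod.
text = "Épée +1"

legacy = text.encode("cp1252")  # one byte per character
utf8 = text.encode("utf-8")     # each accented letter takes two bytes

print(legacy)  # b'\xc9p\xe9e +1'
print(utf8)    # b'\xc3\x89p\xc3\xa9e +1'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(legacy))  # False: 0xC9 is not followed by a continuation byte
print(is_valid_utf8(utf8))    # True
```

How the game reacts to such invalid bytes (truncating the string) is only what was observed in game; the sketch just shows that the bytes are not well-formed UTF-8.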
In the case of BPSeries, the mod installs properly and the descriptions are correctly added to scrpdesc.2da, with StrRefs matching the tlk file, as can be checked with Near Infinity. However, the BP script descriptions don't display properly. It seems each line of a multiline text ends as soon as a special character is found:
The solution I found is to convert the tra files to UTF-8 before installing the mod. Since the original game doesn't handle UTF-8, this means you have to choose either BG2 or BGEE if you still target international players.
Ideally, WeiDU would convert the tra files during installation. However this is not possible right now, and it would require some additions in any case: the encoding of texts for BG2 is not the same for all languages, so the proper encoding for each language would have to be either hardcoded into WeiDU or given to it through an extended LANGUAGE instruction.
In the meantime, the solution is to convert the tra files during installation, depending on the language chosen by the user. I made a preliminary attempt based on BPSeries v1010, as it is very easy to check in game whether the encoding is working. Here are my results so far.
First of all, here is how it looks with the proper encoding for BGEE:
I used a version of iconv compiled for Windows to perform the conversion. In principle Linux and Mac OS X already have iconv, so this solution can be used there as well.
Here is the batch I used to convert:
convertbpseriestra.bat
:: The files to convert are the .tra files
:: Adding .tpa files is also necessary for BG2 Tweak Pack
:: ~nx is used to keep only the filename (n) with extension (x), without the full path
:: http://technet.microsoft.com/en-us/library/cc755694(WS.10).aspx
:: See iconv manual
:: -f to give the original file encoding (here CP1252 / WINDOWS-1252 / ISO8859-1 for French)
:: -t to give the final encoding (UTF-8)
for %%i in (BPSeries\LANGUAGE\FRENCH\*.tra) do BPSeries\winutils\iconv -f CP1252 -t UTF-8 "BPSeries\LANGUAGE\FRENCH\%%~nxi" > "BPSeries\LANGUAGE\FRENCH\%%~nxi_utf8"
:: Note: copying converted files back upon the original .tra files is performed in the tp2 file
:: in order to take benefit of the restore capabilities from WeiDU during uninstall
In order to integrate that conversion in the installation process, the only idea that came to me was to add a component at the beginning of the tp2 file for BPSeries. The following block was inserted just after the LANGUAGE instructions and the BEGIN @5001 for the first component:
// Isaya : special component, mandatory, to install first in case of BGEE, to convert tra files to UTF8
// BGEE test borrowed from BG2 Tweak Pack
BEGIN ~tra conversion for BGEE (French and Windows only, for now)~ // NO_LOG_RECORD
REQUIRE_PREDICATE FILE_EXISTS_IN_GAME ~oh9350.are~ ~Only for BGEE~
// Only applies for specific languages, here french
// Convert
// This could easily be done with Linux and Mac, since they must have built-in iconv
// But I don't know how to write a .sh script
ACTION_IF ("%WEIDU_OS%" STRING_COMPARE_CASE ~WIN32~ = 0) AND
          ("%LANGUAGE%" STRING_COMPARE_CASE ~french~ = 0) THEN BEGIN //Windows
  AT_NOW ~bpseries/convertbpseriestra.bat~
END // ELSE BEGIN
//  AT_NOW ~bpseries/convertbpseriestra.sh~
//END
// Replace the original tra files (Weidu should restore the original at uninstall)
// Note: unfortunately, MOVE does not remove the .tra_utf8 file after overwriting the tra file, it seems
// After conversion, we need to reload the tra file
// For french only, as example
ACTION_IF ("%WEIDU_OS%" STRING_COMPARE_CASE ~WIN32~ = 0) AND
          ("%LANGUAGE%" STRING_COMPARE_CASE ~french~ = 0) THEN BEGIN
  MOVE ~bpseries/language/%LANGUAGE%/setup.tra_utf8~ ~bpseries/language/%LANGUAGE%/setup.tra~
  LOAD_TRA ~bpseries/language/%LANGUAGE%/setup.tra~
END
A few notes:
- I couldn't come up with a conversion script for Linux or Mac OS X, so the code only checks for Windows.
- I only dealt with French, since it's my language and I could properly check the result. However I assume that the exact same script would work as well for other Western European languages such as Italian, Spanish and German, provided they are all based on the Windows-1252 code page (or ISO8859-1).
- Additional batch/script files would be required for languages using different encodings, unless parameters are passed to the scripts to tell them the encoding of the original file (I assume that UTF-8 applies to all languages in BGEE; why wouldn't it?).
- I have an issue with the MOVE instruction, which doesn't remove the .tra_utf8 file after overwriting the .tra file; maybe it's a bug in WeiDU 231.
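The first and third notes (no Linux/Mac script, one script per encoding) both come down to the same small loop. As a rough, OS-independent sketch of what the batch file does, here it is in Python, with the source encoding passed as a parameter. The function name `convert_tra_dir` is hypothetical, and this assumes Python is available on the machine, which is not guaranteed for players; it is meant to show the logic, not to replace iconv in a mod.

```python
from pathlib import Path


def convert_tra_dir(directory, source_encoding):
    """Write a UTF-8 copy (<name>.tra_utf8) next to every .tra file in directory.

    The originals are left untouched, as in the batch file, so that the
    overwrite can still be done from the tp2 file.
    """
    written = []
    for tra in sorted(Path(directory).glob("*.tra")):
        # Work on raw bytes so that line endings are preserved exactly.
        text = tra.read_bytes().decode(source_encoding)
        target = tra.with_name(tra.name + "_utf8")
        target.write_bytes(text.encode("utf-8"))  # plain UTF-8, no BOM
        written.append(target)
    return written
```

For the French files above, the call would be `convert_tra_dir("BPSeries/LANGUAGE/FRENCH", "cp1252")`; a Polish translation would pass `"cp1250"` instead.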
Maybe WeiDU gurus can find out a much better solution for this issue. I do hope so. Otherwise I'm afraid international players will be left out of the BGEE community as far as mods are concerned. I'm posting these findings in the hope more knowledgeable people find a much better way.
Comments
If there's any way I can help out international players to fully enjoy my work, I'm more than willing to offer it. If you don't mind, I'll include this as an optional component in the next edition of the BP series. Naturally, I won't be able to verify much of the installed text linguistically, but at least I can set up all the batch files and monkey-see-monkey-do the supporting WeiDU code together from what you posted. If this is solved properly in either a future update of BGEE or WeiDU itself, I can just as easily remove the code.
I'll keep an eye on how things are moving.
Great to see you there Isaya... I should go back to "La Couronne de Cuivre" sometime ^^
So feel free to work on translation. There will always be a way to integrate them, don't worry. ;-)
And you're welcome back at "La Couronne", especially if you feel like translating!
@horredtheplague, feel free to use that trick. However I believe you don't need to, at least for BPSeries, since the new release seems specific to BGEE, if I understand correctly. Correct me if I'm wrong.
I checked the new release, and the changes to the descriptions compared to the original French translation show that you removed HLA and such stuff, so that release of BPSeries wouldn't fit BG II/BGT very well. That's why I assume this is only for BGEE.
Currently I'm updating the translation to add description of the new scripts.
I propose to send you the file in both encodings. You could use the UTF8 version directly if you indeed target only BGEE with this release.
As for the other languages, I believe that Spanish and German should use the same original encoding as French (Windows-1252), if this Wikipedia page is to be trusted. The other code pages of interest would be 1250 for Polish and 1251 for Russian, among the languages officially supported by BGEE.
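To make the role of the code page concrete, here is a small Python sketch. The mapping only restates the code pages named above and is an assumption rather than a verified list; the names `LEGACY_CODE_PAGE` and `to_utf8` are made up for the example. It also shows what a Western European editor displays when it opens a CP1251 file with the wrong code page.

```python
# Assumed mapping, restating the code pages named in this thread.
LEGACY_CODE_PAGE = {
    "french": "cp1252",
    "german": "cp1252",
    "spanish": "cp1252",
    "polish": "cp1250",
    "russian": "cp1251",
}


def to_utf8(raw: bytes, language: str) -> bytes:
    """Re-encode legacy .tra bytes as UTF-8 using the language's code page."""
    return raw.decode(LEGACY_CODE_PAGE[language]).encode("utf-8")


# Bytes as they would be stored in a legacy Russian file.
raw = "Привет".encode("cp1251")

print(raw.decode("cp1252"))                     # wrong code page: Ïðèâåò
print(to_utf8(raw, "russian").decode("utf-8"))  # Привет
```

The same bytes give different text depending on the code page used to read them, which is why the conversion has to know the original encoding of each language.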
So I decided to make an experiment with the Russian translation that Prowler posted on SHS. The original file used the standard 8-bit character set, so Notepad++ only displayed a series of special characters from my Western European character set, not Russian ones.
After conversion to UTF8 (Notepad++ recognizes it as UTF8 without BOM), it looked far better:
I reinstalled BPSeries and selected Russian. Unfortunately, it doesn't look right in the game:
I uninstalled BPSeries, changed my game to English, linked to en_US/dialog.tlk and installed again in Russian. I can't say it's any better, unfortunately:
I don't know what to think. Since the cyrillic characters display properly (or seem to) in notepad++, I assume that Windows can display them well enough. So either BGEE uses a system font that doesn't include the cyrillic characters, or it uses an internal file for the font that doesn't handle them (BG II was using a BAM file for font) or whatever. But since the game is not available with variations as BG II was (English and International versions, officially, plus specific ones for Eastern Europe, I think) I'm afraid the cyrillic characters may not work either for Russian players.
In any case, I would suggest you include the Russian tra file converted to UTF8 in BPSeries so that you can get feedback. We already know the previous encoding will not work anyway.
One last word on conversion. To convert files to UTF8 without having to learn the command line parameters of iconv, I recommend Cp Converter, available on SourceForge, if you're using Windows. It requires .Net, but should work out of the box on Windows 7 at least. It's very easy to use. One just needs to select the original encoding and the final encoding (UTF8 for our purpose).
The names are localised by .Net, so they appeared in French for me. I suggest relying on the code page number at the end of the name to select the right ones among the huge list.
I think the typical ones are:
- 1252 for French, Spanish, German, Italian and also English (in case of special characters that word processors may use, as mentioned on the WeiDU forum)
- 1251 for Russian (cyrillic characters)
- 1250 for Polish and Czech
Its output is the same as iconv when I convert the tra file from French.
@horredtheplague, I'll send you the tra file for BP Series through SHS forum.
Edited: a wrong assumption about Rogue Rebalancing made me draw a false conclusion.
WeiDU does not do any charset conversions.
RR loads UTF8-encoded text ("rr/tra/bgee/*") when installed on BGEE.
Stupid me. How did I miss that?! I checked the tp2 file but didn't notice the check and the call to LAM bgee_language, and the presence of the bgee directory.
I'll have to write a feature request for WeiDU in the future then. ;-)
@Isaya, thank you very much for pointing out a solution. (I'll have to look at it again closely to understand it, though.)
But do I see it right that it won't be possible to just install tra lines containing special characters without preparing them in some way?
With the solution I proposed, you didn't have to prepare the files as long as they were consistent in terms of encoding (all cp something and not UTF8). However I wouldn't recommend this way nowadays.
Horred and Wisp have devised another way of doing it, which is much simpler, especially since you don't have to worry about the operating system for the script to convert texts. Instead of converting at installation time, they provide in BP Series and Rogue Rebalancing the two sets of files, for old games and for new ones. This also saves time testing the conversion script for each language.
I think cpXXXX is the default encoding and they use the same kind of conditional code to load the tra file with the different encoding if required for BGEE. I suggest you take these mods as reference instead.
You can still use iconv rather than the GUI tool to make the conversions if you have many files but that becomes a packaging task instead.
Warning! In BGEE v1.2, it seems that not converting cp1252 characters to UTF8 before or during installation may make the game crash when the corresponding text is used (reports about SCS, and my own experiment with RR when I prevented it from using the right text). So including properly encoded texts in BGEE mods may be even more of a necessity.
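Given that risk, it may be worth checking the files before packaging a mod. Here is a minimal Python sketch of such a check; the function name `check_bgee_tra` is hypothetical, and the BOM test reflects the "UTF8 without BOM" form mentioned earlier in this thread, not a documented game requirement.

```python
import codecs


def check_bgee_tra(data: bytes) -> list:
    """Return a list of problems found in the raw bytes of a .tra file meant for BGEE."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("starts with a UTF-8 BOM")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Typically a leftover CP125x byte that was never converted.
        problems.append(f"invalid UTF-8 at byte offset {err.start}")
    return problems
```

An empty list means the bytes are well-formed UTF-8 without a BOM. It does not prove the text was converted from the right code page, only that the game will not meet an invalid byte sequence.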
@Isaya, would you be so kind and explain to me how the Cp Converter is started? I have Windows 7 and installed .NET v4.5, but I have no clue as to how to start the program.
Unfortunately you can't simply drag and drop files and have to use the File menu. You can select several files for conversion at the same time (that's probably why it's written "Sources files", why didn't I realise it before :facepalm:).
Select the source file encoding, typically "Western Europe (Windows) - 1252" in many cases (I expect the actual name to be in german if that's the language you use in Windows). For Destination, I used "Unicode (UTF8) - 65001" for BGEE.
I didn't use any of the two options, actually I don't know what they mean.
But still, if I start the exe, all I get is a menu with an empty source file box (no possibility to navigate to any files), and the source and destination drop-down menus for choosing the encodings. How do I actually get to the files I want to encode?
Otherwise it means there is something wrong with the interaction with .Net framework. All the computers I tried the tool with were having a development tool installed (hence development version of .Net too). I hope the problem you face is not specific to runtime environment of .Net or with a specific version of .Net.
as of November 16th, 2013,
your recommendation for mods that are written exclusively for BG:EE or BGII:EE is to use UTF-8 (no BOM).
your recommendation for mods that are written to install on BG/BGII/TUTU/BGT/BG:EE/BGII:EE is to do the following:
1. provide two pre-converted copies of the .tra, one in ASCII (ISO-8859) for the original games, and one in UTF-8 (no BOM) for the new games.
2. in .tp2, detect the game and use the appropriate .tra
And we hope eventually to come up with a way that does not require two complete copies of the .tra for each language.
(I am most of all trying to confirm that you do not recommend the "convert at install time using batch file and iconv" in your original post - your BG1NPC fix is elegant, but if it doesn't work then I need to do some serious file editing.)
Do I have this correct?
You want CP1252 (aka Windows-1252) or others, not ISO-8859-1 (aka Latin-1) or others. They are subtly different. (Also, none of these are ASCII, they are all (mutually incompatible) extensions of ASCII.) Edit: languages east of ~Germany use other encodings, CP1251 for Russian, CP1250 for Polish, for example.
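To show where the two actually differ, here is a small Python sketch (illustration only; the helper name `differing_bytes` is made up). It compares how every byte value decodes in each encoding:

```python
def differing_bytes():
    """Byte values that CP1252 and ISO-8859-1 decode to different characters."""
    result = []
    for value in range(256):
        byte = bytes([value])
        latin1 = byte.decode("latin-1")
        # A few CP1252 byte values are unassigned; they decode to U+FFFD here.
        cp1252 = byte.decode("cp1252", errors="replace")
        if latin1 != cp1252:
            result.append(value)
    return result


diff = differing_bytes()
print(hex(diff[0]), hex(diff[-1]), len(diff))  # 0x80 0x9f 32

# CP1252 stores "œ" in that range; ISO-8859-1 cannot encode it at all.
print("œ".encode("cp1252"))  # b'\x9c'
try:
    "œ".encode("latin-1")
except UnicodeEncodeError:
    print("no ISO-8859-1 encoding for œ")
```

The accented letters from 0xA0 upwards are identical in both, which is why the difference often goes unnoticed until a character such as œ, € or a curly quote shows up in a translation.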
I guess providing two sets of tra files is the current way to go. @DavidW's idea of defining e.g. ~German (BGII:EE)~ and ~German (original BGII)~ as two different languages is really great (because it is so simple), but an automated solution that picks the right files depending on the game, without the risk of confusing players, would be even better.
I'm not saying that the conversion at install time is not a viable solution. The solution I provided and that is currently included in the BG1 NPC beta is only for Windows so far. I'm convinced it wouldn't be hard to make a sh script compatible with Mac and Linux given that they most probably have iconv as a built-in command. I will look into this.
The current script only works for Western European languages. I remember a Polish version of BG1 NPC was mentioned, and it would need a modification of the conversion script. The fact that it requires a different encoding could easily be handled by adding a second parameter to the script to give the encoding to apply (CP1252 for French and Spanish, CP1250 for Polish).
:: Use a parameter to the script (%1) to specify the directory that must be converted
:: Use a second parameter (%2) to specify the initial encoding
@echo off
for %%i in (bg1npc\tra\%1\*.tra) do bg1npc\iconv -f %2 -t UTF-8 "bg1npc\tra\%1\%%~ni.tra" > "bg1npc\tra\%1\%%~ni.tra_utf8"
with
AT_NOW ~bg1npc/conv_tra.bat %LANGUAGE% CP1252~
in the TP2 file (or better, with a variable for encoding set according to language).
To avoid dependency on the operating system, on-the-fly conversion would benefit from a command included in WeiDU to convert a file. I'm wondering whether the iconv library is already in the build process, maybe not explicitly but through another dependency. Maybe @Wisp could tell.
That wouldn't change the fact that the mod would still have to detect BGEE/BG2EE, convert the files and load them explicitly. With such a command (CONVERT_ENCODING below is only a proposal, it does not exist in WeiDU), it could look like this, instead of AT_NOW and all the LOAD_TRA as of now:
ACTION_DEFINE_ASSOCIATIVE_ARRAY trafiles BEGIN
  ~p#brlt.tra~ => ~p#brlt.tra_utf8~
  ... (to list all the files to handle)
END
... if BGEE is detected
ACTION_PHP_EACH trafiles AS original => bgee BEGIN
  // CONVERT_ENCODING input_encoding output_encoding input_file output_file
  CONVERT_ENCODING ~CP1252~ ~UTF-8~ ~bg1npc\tra\%LANGUAGE%\%original%~ ~bg1npc\tra\%LANGUAGE%\%bgee%~
  MOVE ~bg1npc\tra\%LANGUAGE%\%bgee%~ ~bg1npc\tra\%LANGUAGE%\%original%~
END
I think that including two sets of files, like Rogue Rebalancing and BP Series, or converting on the fly, are both working solutions (we need a framework for Linux and Mac in the second case though).
Using preconverted files moves the complexity of conversion out of the TP2 and is probably easier.
Moreover conversion on the fly requires a specific test to avoid doing it several times in case you put it in an ALWAYS block when you have multiple components and none that is mandatory. In BG1 NPC, I used the fact that the bg1npc_tmp.tra file is processed to create bg1npc.tra to determine that a component has already been installed and that conversion is therefore done.
Other mods don't necessarily have such a thing to track an initial preparation and a specific check is then required, such as creating an additional empty file with weird extension as some mods do.
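The marker-file idea can be sketched as follows, in Python rather than tp2, purely to show the guard logic. The function name `convert_once` and the marker name `tra_converted.mark` are made up for the example. The sketch also converts in place for brevity; a real mod would do the overwrite through WeiDU so that uninstalling restores the originals.

```python
from pathlib import Path


def convert_once(directory, source_encoding, marker_name="tra_converted.mark"):
    """Convert the .tra files of a directory to UTF-8, unless a marker file exists.

    Returns True if the conversion ran, False if it was skipped.
    """
    folder = Path(directory)
    marker = folder / marker_name
    if marker.exists():
        return False  # an earlier component already did the conversion
    for tra in sorted(folder.glob("*.tra")):
        text = tra.read_bytes().decode(source_encoding)
        tra.write_bytes(text.encode("utf-8"))
    # The marker records that the conversion happened and from which encoding.
    marker.write_text("converted from " + source_encoding + "\n", encoding="utf-8")
    return True
```

The guard matters because a second pass would read the UTF-8 bytes as CP1252 and encode them again, turning "é" into "Ã©".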
[1] Modders creating mods that are written exclusively for BG:EE or BGII:EE should use code editors like NotePad++ or JEdit and save in encoding UTF-8 (no BOM).
[2] Modders writing for multiple game variants on BG/BG2 content should (at least for now) provide two separate .tra files, one encoded in CP1252 for the older game variants, one encoded in UTF-8 (no BOM) for the :EE series. Several models for this exist, including using things in your .tp2 like
Wisp's
ALWAYS
  ACTION_IF GAME_IS ~bgee bg2ee~ BEGIN
    OUTER_SPRINT tra_path ~mymod/tra/bgee~
  END ELSE BEGIN
    OUTER_SPRINT tra_path ~mymod/tra~
  END
END
// elsewhere
COMPILE ~mymod/foo.d~ USING ~%tra_path%/%LANGUAGE%/foo.tra~
LOAD_TRA ~%tra_path%/%LANGUAGE%/bar.tra~
Or variants.
[3] Modders porting existing mods that have large numbers of .tra files and/or HUGE .tra files for multiple game variants on BG/BG2 content should continue to explore how to use and extend @Isaya's "convert using shell or bat scripting at install". The caveats seem to be that currently using AUTO_TRA and language declarations seem to bork, and if a user changes the language in BG[II]:EE they can bork the .tra loads.
[4] We need to take up a collection for Wisp so that he can dedicate 40+ hours per day to recreating the Rosetta Stone in WeiDU so that all this encoding stuff will magically be fixed by pre- and post-processing .tra files to and from various encodings, languages, and platforms.
LANGUAGE is evaluated before ALWAYS. So one way to deal with setting up different character encodings is to play a shell game...
1. Declare LANGUAGE with a "hardcoded" default (so the .tp2 can fall back on it) for that particular language
2. Override the default language in the ALWAYS block.
With two .tras, one in windows-1252 and the other in UTF-8 (no BOM):
ALWAYS
  ACTION_IF GAME_IS ~bg2ee~ BEGIN
    OUTER_SPRINT tra_version ~c-aranw_utf8nb~
    LOAD_TRA ~aranw\tra\english\c-aranw_utf8nb.tra~
    LOAD_TRA ~aranw\tra\%LANGUAGE%\c-aranw_utf8nb.tra~
  END ELSE BEGIN
    OUTER_SPRINT tra_version ~c-aranw_cp1252~
    LOAD_TRA ~aranw\tra\english\c-aranw_cp1252.tra~
    LOAD_TRA ~aranw\tra\%LANGUAGE%\c-aranw_cp1252.tra~
  END
END
LANGUAGE ~English~ ~English~ ~aranw\tra\english\c-aranw_cp1252.tra~
with entries that use USING within components, the ALWAYS block will load the wanted .tra, and we can make sure USING is looking correctly by referencing the variable set in the ALWAYS block above:
/* SoA dialog file */
COMPILE ~aranw/dialog/c-arandialog.d~ USING ~aranw/tra/%LANGUAGE%/%tra_version%.tra~
ACTION_IF GAME_IS ~bg2ee~ BEGIN
  COMPILE ~aranw/dialog/c-aranbg2ee_content.d~ USING ~aranw/tra/%LANGUAGE%/%tra_version%.tra~
END
Tested using WeiDU v235 and current BGII:EE.
I agree that a solution able to maintain the use of AUTO_TRA would be better.
I made some small changes to the TP2 for BG1 NPC in my copy of the repository, in order to handle Linux and Mac conversions. They will need to be tested with the game on these operating systems, as all I could do was test the conversion script itself on Linux. I ensured the command line arguments for iconv were compatible with the Mac OS documentation found online. However, the script on Linux and the code that checks whether the installation is made on Mac remain to be done.
I also introduced a change to allow the script to use any encoding as input, so that a Polish translation would be easy to add, for instance.
I checked installation on BGEE V1.2 on Windows. I checked only with Imoen interactions (including Xzar and Montaron) on the starting area.
I'll upload the changes to github so that you can review them.
Converting the texts to UTF-8 without BOM and installing all components on the EN version still messes up the dialogs.
When I get back home I'll post the list of installed components.
EDIT:
More interactions: v14
Max exp -> Remove Exp limits: v14
Wearing many items with AC bonus -> No limits: v14
Max HP on levelup -> Maximum: v14
Unlimited stacking ammo: v14
Unlimited stacking gems and jewelry: v14
Unlimited stacking potions: v14
Unlimited stacking scrolls: v14
Rest anywhere (Japheth): v14
Now, when I install the mod while the language is set to Polish, item descriptions show "Invalid xxxx" (where xxxx seems to be a random string number). When I switch the language to English, all item descriptions show correctly in Polish (but obviously the rest of the dialogues are in English). When I switch the language to Polish again, it is once again a mess.
On both BG:EE and BG2:EE installation of the mod using Polish translation went fine, the mess is shown only when the language in game is set to Polish.
@Isaya, @Wisp you're the main experts here. Any clues what might be the culprit? Probably something obvious...
Also, @Isaya, wouldn't it be easier to make this an Action Function and include it with WeiDU (like HANDLE_TILESETS and the patches ADD_SPELL_EFFECT, etc.)?
It's not that obvious there, that one should do it first. Shouldn't WeiDU automatically detect in which language the game is installed and modify correct dialog.tlk accordingly? Or maybe I should set it up somehow permanently, not only by changing line in weidu.conf, but somewhere else?
As the question asked by WeiDU and the weidu.conf file didn't exist at the time this topic was started, I couldn't mention it. In principle, you never have to edit the file. It is created the first time you install a mod, when WeiDU asks you which language you intend to play the game with. If you remove it, WeiDU will ask you again next time you install a mod.
Unless you reinstall all your mods, you should never change the game language in weidu.conf. WeiDU only adds texts in the dialog.tlk file of the game language specified and saved in weidu.conf. If you change language at some point during the installation of mods, the texts will be added partially in the first language and in the second language. In that case, you will get Invalid xxx whatever language you select in game.
@CrevsDaak Wisp created a new function called HANDLE_CHARSETS that integrates install-time conversion in a way compatible with Windows, Mac OS X and Linux. It requires that a Windows version of iconv is included in the mod (Mac OS X and Linux have it as part of the OS). This functionality couldn't be added directly in WeiDU because of open source license conflicts.
This capability is still being worked on, as you can see in the topic. However the initial version of the function is used in Edwin Romance V2.06 and is integrated in a beta version of WeiDU (236.01).