Joined: 09 Mar 2008 Location: Osaka (JP)/Hong Kong/Germany
Posted: Tue Dec 22, 2015 6:33 pm Post subject:
Unicode for INIs
It appears INI files in RA2 are not using Unicode character encoding, but ANSI. Will the engine encounter any difficulties if I save an INI in Unicode (and with characters that are not part of the ANSI range)?
Perhaps the answer to this question is obvious to someone who has even a little understanding of how software works. If so, please excuse me; I'm totally ignorant about it for the most part.
Thank you! _________________
Mao Zedong wrote:
Our mission, unfinished, may take a thousand years.
Technically unicode is supposed to be 2-byte per letter, but being a standard driven by Amrika, they made silly exceptions to allow a class called UTF-8, which for all intents is identical to the 1-byte ANSI/ASCII.
Joined: 22 Nov 2010 Location: Iszkaszentgyorgy, Hungary
Posted: Wed Dec 23, 2015 10:21 am Post subject:
G-E wrote:
Technically unicode is supposed to be 2-byte per letter
That 2-byte Unicode fell out of use not because of American standards but because it couldn't solve the issues it was meant to solve. The 65k characters weren't enough. UTF-8 still has unused ranges even today, even while it keeps being extended with unused or dead languages and glyphs. _________________ "If you didn't get angry and mad and frustrated, that means you don't care about the end result, and are doing something wrong." - Greg Kroah-Hartman
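To make the "65k characters weren't enough" point concrete, here is a minimal Python sketch (not related to the game engine) that counts how many 16-bit units and how many UTF-8 bytes a single character needs. The helper names `utf16_units` and `utf8_bytes` and the three sample characters are chosen purely for illustration.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to store ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def utf8_bytes(ch: str) -> int:
    """Number of bytes needed to store ch in UTF-8."""
    return len(ch.encode("utf-8"))


# A Latin letter, a common kanji, and a rare ideograph outside the first 65,536 code points.
samples = ["A", "漢", "\U00020000"]

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf16_units(ch)} UTF-16 unit(s), {utf8_bytes(ch)} UTF-8 byte(s)")
```

The last sample does not fit in a single 2-byte unit, so UTF-16 has to store it as a pair of units. That is the sense in which a fixed 2 bytes per character turned out to be too small.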
Since UTF-8 is backwards compatible with ASCII it might actually work already, but it probably wouldn't be very useful, given that the maximum number of bytes for an identifier is 24 or so (?) and the characters you are probably interested in cost 3 bytes each in UTF-8 (up to 4 for rarer ones).
Maybe instead you should think about a systematic naming scheme for your identifiers to minimize the number of characters. For instance:
Joined: 09 Mar 2008 Location: Osaka (JP)/Hong Kong/Germany
Posted: Wed Dec 23, 2015 12:59 pm Post subject:
Askeladd wrote:
Since UTF-8 is backwards compatible with ASCII it might actually work already, but it probably wouldn't be very useful, given that the maximum number of bytes for an identifier is 24 or so (?) and the characters you are probably interested in cost 3 bytes each in UTF-8 (up to 4 for rarer ones).
Maybe instead you should think about a systematic naming scheme for your identifiers to minimize the number of characters. For instance:
[ATANK]
Primary=ATANKW1
Secondary=ATANKW2
[ATANKW1]
Projectile=ATANKPR
...and so on.
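The byte-budget argument in the quoted post can be checked with a rough Python sketch. The limit of 24 bytes is only the "24 or so (?)" guess from the post, not a confirmed engine value, and the kanji names are made-up examples. The sketch only measures encoded length; it says nothing about whether the engine would actually accept such identifiers.

```python
# Assumption: taken from the "24 or so (?)" guess in the post, not a verified engine limit.
ASSUMED_ID_LIMIT_BYTES = 24


def utf8_len(name: str) -> int:
    """Length of the identifier in bytes once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits(name: str, limit: int = ASSUMED_ID_LIMIT_BYTES) -> bool:
    """True if the UTF-8 encoded identifier stays within the assumed byte limit."""
    return utf8_len(name) <= limit


# One ASCII identifier from the example above and two hypothetical kanji identifiers.
for name in ["ATANKW1", "重戦車主砲", "重戦車主砲用徹甲弾"]:
    print(name, utf8_len(name), fits(name))
```

Under these assumptions a nine-kanji name already needs 27 bytes, while the seven-letter ASCII name needs only 7, which is why a compact ASCII scheme stretches the budget much further.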
Using kanji in strings has been one idea that I wanted to explore (if for nothing else, then for NOSTR'ing UINames) in my search for a naming scheme.
I had no idea how the string storage space works or how character encoding relates to it, but this thread, for all its vitriol, has been informative.
Given the way the limitation works, I will probably explore other schemes.
To be honest, it would've taken less time to just open Rules.ini, change the name of any unit or structure in the game to something with kanji characters and confirm for yourself that it doesn't work than it took you to make a topic asking about it... _________________ Last edited by Bittah Commander on Wed Dec 23, 2015 2:41 pm; edited 1 time in total
Unicode needs up to 4 bytes per character in a fixed-width encoding (it was once thought that 2 bytes would be enough, but that was short-sighted), and there are various ways of encoding the value into variable-width encodings. By far the most widely used now for document encoding is UTF-8, as it maps the 7-bit ASCII characters to the same values, making any valid ASCII automatically valid UTF-8, and it also has no endianness issues. Most WinAPI functions actually use UTF-16 internally for their Unicode versions, and there are a bunch of Windows-only macros for UTF-16 string literals and such. However, I doubt that the internal INI parser uses the wide-string functions, since all the INI files are currently ASCII, which isn't valid UTF-16 AFAIK.
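As a small illustration of the ASCII/UTF-8/UTF-16 relationship described above, the following Python sketch encodes one INI line from earlier in the thread in each encoding. It only shows the raw bytes; how the game's INI parser reacts to them is not tested here.

```python
line = "Primary=ATANKW1"

ascii_bytes = line.encode("ascii")
utf8_bytes = line.encode("utf-8")
utf16_le = line.encode("utf-16-le")  # little-endian, the byte order Windows uses
utf16_be = line.encode("utf-16-be")  # big-endian

# Pure ASCII text is byte-for-byte identical when saved as UTF-8.
print(ascii_bytes == utf8_bytes)  # True

# UTF-16 puts a zero byte next to every ASCII character, and the byte order matters.
print(utf16_le[:4])  # b'P\x00r\x00'
print(utf16_be[:4])  # b'\x00P\x00r'

# The same line takes twice as many bytes in UTF-16.
print(len(utf8_bytes), len(utf16_le))  # 15 30
```

A parser that reads single bytes would see those zero bytes in a UTF-16 file, which is consistent with the doubt above that a plain ASCII parser could read one.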