Unicode for INIs

Millennium · Posted: Tue Dec 22, 2015 6:33 pm Post subject: Unicode for INIs

It appears INI files in RA2 are not using Unicode character encoding, but ANSI. Will the engine encounter any difficulties if I save an INI in Unicode (and with characters that are not part of the ANSI range)?

Perhaps the answer to this question is obvious to someone who has even a little understanding of how software works. If so, please excuse me; I'm totally ignorant about it for the most part.

Thank you!

_________________

G-E · Defense Minister **Joined**: 09 Feb 2015

Technically unicode is supposed to be 2-byte per letter, but being a standard driven by Amrika, they made silly exceptions to allow a class called UTF-8, which for all intents is identical to the 1-byte ANSI/ASCII.

If you open Charmap in Winderos, it will give you the list of possible characters...

_________________

Graion Dilach · Posted: Wed Dec 23, 2015 10:21 am Post subject:

G-E wrote:

Technically unicode is supposed to be 2-byte per letter

That Unicode fell out of use not because of American standards but because it couldn't solve the issues it was meant to solve. The 65k characters weren't enough. UTF-8 still has unused ranges even today, while nowadays it even gets extended with unused/dead languages/glyphs.
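
To illustrate the 65k point, here is a minimal Python sketch (the helper name `utf16_units` and the two sample characters are picked purely for illustration). It counts how many 16-bit units a character needs in UTF-16: anything above U+FFFF no longer fits in a single 2-byte unit.

```python
# Illustrative sketch: characters above U+FFFF do not fit in one 16-bit unit.
bmp_char = "\u6226"          # a common kanji, inside the 16-bit range
astral_char = "\U00020BB7"   # a rarer CJK character, outside the 16-bit range


def utf16_units(text: str) -> int:
    """Return how many 16-bit code units UTF-16 needs to store `text`."""
    return len(text.encode("utf-16-le")) // 2


print(utf16_units(bmp_char))       # 1
print(utf16_units(astral_char))    # 2 (a surrogate pair)
print(ord(astral_char) > 0xFFFF)   # True
```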

_________________

Askeladd · Light Infantry **Joined**: 29 Dec 2013

Since UTF-8 is backwards compatible with ASCII it might actually work already, but it probably wouldn't be very useful given that the maximum number of bytes for an identifier is 24 or so (?) and the UTF characters you are probably interested in can cost up to 6 bytes.

Maybe instead you should think about a systematic naming scheme for your identifiers to minimize the number of characters. For instance:

[ATANK]
Primary=ATANKW1
Secondary=ATANKW2

[ATANKW1]
Projectile=ATANKPR

...and so on.
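
For a rough feel of the byte cost, here is a minimal Python sketch. The 24-byte limit is only the guess from this post, not a confirmed engine constant, and the names `ASSUMED_ID_LIMIT`, `utf8_cost` and `fits` are made up for the example.

```python
# Assumption: 24 bytes is the guessed identifier limit, not a confirmed value.
ASSUMED_ID_LIMIT = 24


def utf8_cost(identifier: str) -> int:
    """Return the number of bytes `identifier` occupies when saved as UTF-8."""
    return len(identifier.encode("utf-8"))


def fits(identifier: str, limit: int = ASSUMED_ID_LIMIT) -> bool:
    """Check whether the UTF-8 form of `identifier` stays within `limit` bytes."""
    return utf8_cost(identifier) <= limit


short_ascii = "ATANKW1"              # 7 ASCII characters
short_kanji = "重戦車主砲"            # 5 kanji
long_kanji = short_kanji * 2         # 10 kanji

for name in (short_ascii, short_kanji, long_kanji):
    print(name, utf8_cost(name), fits(name))
```

Under current UTF-8 a single character takes at most 4 bytes, and common kanji take 3, so a 10-kanji name already costs 30 bytes while a 7-letter ASCII name costs 7.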

Millennium · Posted: Wed Dec 23, 2015 12:59 pm Post subject:

Askeladd wrote:

Since UTF-8 is backwards compatible with ASCII it might actually work already, but it probably wouldn't be very useful given that the maximum number of bytes for an identifier is 24 or so (?) and the UTF characters you are probably interested in can cost up to 6 bytes.

Maybe instead you should think about a systematic naming scheme for your identifiers to minimize the number of characters. For instance:

[ATANK]
Primary=ATANKW1
Secondary=ATANKW2

[ATANKW1]
Projectile=ATANKPR

...and so on.

Using kanji in strings has been one idea that I wanted to explore (if for nothing else, then for NOSTR'ing UINames) in my search for a naming scheme.
I had no idea how the string storage space works or how character encoding relates to it, but this thread, for all its vitriol, has been informative.

Given the way the limitation works, I will probably explore other schemes.

_________________

Bittah Commander · Posted: Wed Dec 23, 2015 1:45 pm Post subject:

To be honest, it would've taken less time to just open Rules.ini, change the name of any unit or structure in the game to something with kanji characters, and confirm for yourself that it doesn't work than it took you to make a topic to ask about it instead...

_________________

Blade · Cyborg Commando **Joined**: 23 Dec 2003

Unicode is supposed to be 4 bytes per character (it was once thought that 2 bytes would be enough, but that was short-sighted), and there are various ways of encoding the value into variable-length byte sequences. By far the most widely used for document encoding now is UTF-8, as it maps the 7-bit ASCII chars to the same values, making any valid ASCII automatically valid UTF-8, and it also has no endian issues. Most WINAPI functions actually use UTF-16 internally for their Unicode versions, and there are a bunch of Windows-only macros for doing UTF-16 string literals and such. However, I doubt that the internal INI parser uses the wide string functions, since all the INI files are currently ASCII, which isn't valid UTF-16 AFAIK.
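
A minimal Python sketch of that point, using a sample line borrowed from the naming example earlier in the thread. It only compares the byte sequences the different encodings produce; it says nothing about what the game's parser actually does.

```python
# Sample INI line taken from the naming example earlier in the thread.
line = "Primary=ATANKW1"

ascii_bytes = line.encode("ascii")
utf8_bytes = line.encode("utf-8")
utf16_le_bytes = line.encode("utf-16-le")
utf16_be_bytes = line.encode("utf-16-be")

# Pure ASCII text produces exactly the same bytes in UTF-8.
print(ascii_bytes == utf8_bytes)              # True

# UTF-16 stores every one of these characters in 2 bytes, so the bytes differ.
print(ascii_bytes == utf16_le_bytes)          # False
print(len(ascii_bytes), len(utf16_le_bytes))  # 15 30

# UTF-16 also depends on byte order, which UTF-8 does not.
print(utf16_le_bytes == utf16_be_bytes)       # False
```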

RP · Posted: Thu Dec 24, 2015 5:32 pm Post subject:

NOSTR UIName does not support non-ASCII text as the tag is read as ASCII.
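
If you want to check a file before the game reads it, a small Python sketch like the one below can flag lines that contain non-ASCII characters. The function name `non_ascii_lines` and the sample text are hypothetical, and it only inspects the text itself, not how the engine parses it.

```python
def non_ascii_lines(ini_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for lines containing non-ASCII characters."""
    flagged = []
    for number, line in enumerate(ini_text.splitlines(), start=1):
        if not line.isascii():
            flagged.append((number, line))
    return flagged


# Hypothetical sample: the second line contains kanji.
sample = "[ATANK]\nUIName=戦車\nPrimary=ATANKW1\n"

for number, line in non_ascii_lines(sample):
    print(f"line {number}: {line}")
```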

_________________
