Pindiro

Many thanks again to Vehek, who has been giving us some very important hints! Before going on, I’d like to point out that I haven’t been able to figure out much myself… I’m mostly trying to illustrate and understand what Vehek says 🙂

So, based on his comments…

Dialogs

So, at 0x160D4D, we have the following (with the character usage value 05 removed):

0B 10 31 13 33 34 35 36 37 38 2F 2C 20 39 0B 13 3A 3B 10 35 3C 35 3D 3E 3F 40 24 41 3F 42 07 43 32

The 1st and 15th bytes match (0B), indicating they correspond to イ; the same goes for the 4th and 16th bytes (13 = ー, the ~-like symbol). 0B means the 12th character of the set loaded for this dialog.

Likewise, 32 = 。 and 11 = 、.
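A minimal Python sketch of the table approach these matches suggest: a partial, hypothetical table holding only the codes identified so far, with unknown bytes shown as <XX>. It assumes, as above, that a byte is a 0-based index into the dialog's loaded character set (so 0B is the 12th entry) and that the codes keep their meaning for dialogs sharing that set. `KNOWN`, `decode` and `set_position` are illustrative names.

```python
# Hypothetical partial table: only the codes identified from the dialogs so far.
# Assumes these codes keep their meaning for dialogs that share a loaded character set.
KNOWN = {
    0x0B: "イ",
    0x13: "ー",
    0x31: "ダ",
    0x11: "、",
    0x32: "。",
}


def decode(data, table=KNOWN):
    """Render dialog bytes using known glyphs, showing unknown codes as <XX>."""
    return "".join(table.get(b, f"<{b:02X}>") for b in data)


def set_position(code):
    """1-based position in the loaded set, assuming codes are 0-based indices."""
    return code + 1


# First dialog at 0x160D4D, with the 05 values removed (copied from above).
first_dialog = bytes.fromhex(
    "0B 10 31 13 33 34 35 36 37 38 2F 2C 20 39 0B 13 3A 3B"
    " 10 35 3C 35 3D 3E 3F 40 24 41 3F 42 07 43 32"
)

print(decode(first_dialog))
print(set_position(0x0B))  # 12
```

Each newly identified code is just one more entry in `KNOWN`, so the decoded output fills in as the table grows.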

Then comes another probable command at 0x160D8F (probably an end-of-dialog indicator, which makes sense: 0x160D8F - 0x160D4D = 0x42 = 66 bytes, which fits this first dialog's 33 characters if each one takes two bytes, counting its 05 value).

Note: This turned out to be a line break command.

01 01
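If 01 really is a line-break command, splitting a dialog stream into lines is just a split on that byte. A minimal Python sketch; `split_lines` is a hypothetical helper, and the stream below is simply the first dialog (05 values removed) followed by 01 01 and the start of the next dialog, not a raw ROM dump.

```python
from __future__ import annotations

# Hypothetical: 0x01 is the line-break command noted above.
LINE_BREAK = 0x01


def split_lines(stream: bytes, line_break: int = LINE_BREAK) -> list[bytes]:
    """Split a dialog byte stream at every line-break byte.

    Two breaks in a row (01 01) leave an empty line between them.
    """
    return stream.split(bytes([line_break]))


first_dialog = bytes.fromhex(
    "0B 10 31 13 33 34 35 36 37 38 2F 2C 20 39 0B 13 3A 3B"
    " 10 35 3C 35 3D 3E 3F 40 24 41 3F 42 07 43 32"
)
next_start = bytes.fromhex("0A 30 31 13 1B 28 32")

stream = first_dialog + bytes([0x01, 0x01]) + next_start
lines = split_lines(stream)
for number, line in enumerate(lines):
    print(number, len(line), line.hex(" ").upper())
```

This only holds if 01 never shows up as a character code, which hasn't been ruled out (a single 01 also follows the second dialog).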

And then the next one:

0A 30 31 13 1B 28 32 2C 20 05 44 06 45 2F 08 39 46 27 47 48 06 43 35 49 32 49 38 2F 4A 29 2F 24 4B 2D 2E 28 4C 2A 4B 43 32

In this one, the 24th and 26th bytes are both 49 = だ (hiragana), and the ~-like one is 13 (same as above… hmm..). 、 and 。 are also the same, and so is ダ (31). So maybe both dialogs share the same loaded characters.
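Spotting these matches by hand gets tedious; a small Python helper can list the positions of every repeated byte in a dialog and the codes two dialogs have in common. If the loaded set really is shared, the shared codes are the ones worth checking first. `positions`, `repeated` and `shared` are illustrative names, and the bytes are copied from the dumps above as given.

```python
from __future__ import annotations

from collections import defaultdict


def positions(data: bytes) -> dict[int, list[int]]:
    """Map each byte value to the 1-based positions where it occurs."""
    where = defaultdict(list)
    for pos, value in enumerate(data, start=1):
        where[value].append(pos)
    return dict(where)


def repeated(data: bytes) -> dict[int, list[int]]:
    """Keep only the byte values that occur more than once."""
    return {value: pos for value, pos in positions(data).items() if len(pos) > 1}


first_dialog = bytes.fromhex(
    "0B 10 31 13 33 34 35 36 37 38 2F 2C 20 39 0B 13 3A 3B"
    " 10 35 3C 35 3D 3E 3F 40 24 41 3F 42 07 43 32"
)
second_dialog = bytes.fromhex(
    "0A 30 31 13 1B 28 32 2C 20 05 44 06 45 2F 08 39 46 27 47 48"
    " 06 43 35 49 32 49 38 2F 4A 29 2F 24 4B 2D 2E 28 4C 2A 4B 43 32"
)

for value, where in sorted(repeated(second_dialog).items()):
    print(f"{value:02X}: {where}")

# Codes present in both dialogs, candidates for a shared loaded set.
shared = sorted(set(first_dialog) & set(second_dialog))
print("shared codes:", " ".join(f"{b:02X}" for b in shared))
```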

01

2C 20 39 2F 4D 35 19 13 1D 1E 3F 4E 4F 29 32 4D 35 3C 39 06 50 […]

So, several dialogues in the same conversation might share loaded characters…

Back to the tables Vehek pointed out at 0x110100 (font, 24-bit pointer table) and 0x1095FE (character selection, maybe for the Intanya dialog?):

Font table

The font table basically starts with the value A48000 and adds 64 (0x40) for each entry, pointing to 63 characters before hitting a blank entry (0x000000) at 0x1101BD. Then comes a short section (4 entries) counting up from A28000, then blank space up to 0x1102F2, where it counts up from A4A000 to A4A4C0. It goes on like that, and its last entry (AAFFC0) seems to be at 0x1147B6.
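A quick sanity check on that layout: 63 entries of 3 bytes each take 0xBD bytes, and 0x110100 + 0xBD = 0x1101BD, exactly where the first blank entry sits. Below is a minimal Python sketch for walking such a table and summarizing its runs. It assumes the 24-bit entries are stored little-endian (the SNES's native order, so A48000 would appear in the file as 00 80 A4) and that runs step by 0x40; `read_pointers` and `runs` are hypothetical helpers, and the demo table is synthetic.

```python
from __future__ import annotations


def read_pointers(rom: bytes, start: int, count: int) -> list[int]:
    """Read `count` 3-byte pointers at file offset `start` (assumed little-endian)."""
    return [
        int.from_bytes(rom[start + 3 * i:start + 3 * i + 3], "little")
        for i in range(count)
    ]


def runs(pointers: list[int], step: int = 0x40) -> list[tuple[int, int, int]]:
    """Group pointers that grow by `step` into (first, last, count) runs.

    0x000000 entries are treated as separators, like the blank entry at 0x1101BD.
    """
    result = []
    current = []
    for p in pointers:
        # Close the current run on a blank entry or a jump in the sequence.
        if current and (p == 0 or p != current[-1] + step):
            result.append((current[0], current[-1], len(current)))
            current = []
        if p != 0:
            current.append(p)
    if current:
        result.append((current[0], current[-1], len(current)))
    return result


# Made-up table shaped like the start of the real one: 63 entries from A48000,
# one blank entry, then 4 entries from A28000. Not a ROM dump.
entries = ([0xA48000 + 0x40 * i for i in range(63)] + [0]
           + [0xA28000 + 0x40 * i for i in range(4)])
table = b"".join(p.to_bytes(3, "little") for p in entries)

for first, last, count in runs(read_pointers(table, 0, len(entries))):
    print(f"{first:06X}..{last:06X} ({count} entries)")
```

On the actual ROM the call would be `read_pointers(rom, 0x110100, n)` with `n` large enough to reach the entry at 0x1147B6.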

Question: How do we map these addresses to the actual offsets of the tiles in the ROM, which start at 0x120000?
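One thing worth checking: if the cartridge uses the standard LoROM mapping (an assumption, not something confirmed here), a file offset is (bank & 0x7F) × 0x8000 + (address - 0x8000). Under that mapping A48000 lands at exactly 0x120000, where the tiles start, and A4A480 (inside the A4A000 to A4A4C0 run) lands at 0x122480, where the roman characters and kana begin. A minimal Python sketch; `lorom_to_file_offset` is a hypothetical helper.

```python
def lorom_to_file_offset(addr: int, copier_header: bool = False) -> int:
    """Convert a 24-bit LoROM address such as 0xA48000 to a ROM file offset.

    Assumes LoROM mapping: each bank exposes 32 KiB of ROM at $8000-$FFFF,
    and banks $80-$FF mirror banks $00-$7F. Set copier_header=True if the
    file starts with a 512-byte copier header.
    """
    bank = (addr >> 16) & 0x7F
    low = addr & 0xFFFF
    if low < 0x8000:
        raise ValueError(f"{addr:06X} is not in the ROM half of a LoROM bank")
    offset = bank * 0x8000 + (low - 0x8000)
    return offset + (0x200 if copier_header else 0)


for addr in (0xA48000, 0xA28000, 0xA4A480, 0xAAFFC0):
    print(f"{addr:06X} -> 0x{lorom_to_file_offset(addr):06X}")
```

If the dump has a copier header (file size 512 bytes more than a multiple of 32 KiB), every offset shifts by 0x200, which is what the `copier_header` flag accounts for.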

Character Set selection table

It determines which characters to load into VRAM.

This table seems to consist of 2-byte offsets. Judging from that VRAM screenshot, I can see 4 close values (0183, 0186, 0189 and 018C), which are likely the up, down, left and right “characters”. That would mean the next one, 04A7, represents the “2U” tile I mentioned in my previous post. Since it’s followed by two other kana, kana seem to be in the 04 range; then come 2 kanji (3927 and 4056), followed by more kana (this time in the 06 range). These character-loading sections seem to be separated by an FF FF marker.
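A minimal Python sketch for splitting that table into its FF FF-separated sections, assuming 2-byte entries in the order they are written above. `split_sections` is a hypothetical helper, and the demo bytes only reuse values quoted in this post; they are not a dump of 0x1095FE.

```python
from __future__ import annotations


def split_sections(table: bytes, byteorder: str = "big") -> list[list[int]]:
    """Split a run of 2-byte entries into sections ended by FF FF.

    byteorder="big" matches how the values are written in this post (0183,
    04A7, ...); switch to "little" if the raw bytes turn out to be swapped.
    """
    sections = [[]]
    usable = len(table) - len(table) % 2  # ignore a trailing odd byte
    for i in range(0, usable, 2):
        value = int.from_bytes(table[i:i + 2], byteorder)
        if value == 0xFFFF:
            sections.append([])
        else:
            sections[-1].append(value)
    return [section for section in sections if section]


# Made-up bytes using only values quoted above; not a real dump.
demo = bytes.fromhex("0183 0186 0189 018C 04A7 3927 4056 FFFF 04A7")

for section in split_sections(demo):
    print(" ".join(f"{v:04X}" for v in section))
```

Once sections can be pulled out like this, the high byte of each entry (01, 04, 06, 39, 40…) makes it easy to test the kana/kanji range guess against more of the table.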

Question: How do we map conversations to the set of preloaded characters they use?

 

Yolaru

Back when I was trying relative search, I marked the kana characters with roman letters to find sequences. Since each character is 4 sprites big, I began the first sequence by changing the first tile of each character, the next sequence by changing the second tile, and so on.

When the first dialog is shown, the following tiles are loaded into VRAM (at offset 21504):

2U 1D 1H (Kanji) (Kanji) 4G 4I 4M 2z 2f 4R

(2U is the kana with a “U” on its second sprite)

Following Vehek’s input, I’m hoping to find a relative sequence for this somewhere in the ROM. What I’m thinking now is that there should be some sort of table indicating which sprites to load for each dialog, and another one saying which of the already loaded sprites to actually use.
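If that two-table idea holds, extracting text becomes a two-step lookup: read the dialog's load list, then treat each text byte as an index into it. A minimal Python sketch of that hypothesis; `resolve` and `VRAM_ORDER` are made-up names, the load list is the VRAM order listed above (with placeholders for the two kanji), and the byte values in the example are arbitrary rather than taken from the ROM.

```python
from __future__ import annotations


def resolve(dialog: bytes, loaded: list) -> list:
    """Map each dialog byte to an entry of the dialog's load list.

    Hypothetical model: one table says which characters go into VRAM for a
    dialog (`loaded`), and each text byte is a 0-based index into that list.
    """
    result = []
    for b in dialog:
        if b >= len(loaded):
            raise IndexError(f"byte {b:02X} is past the {len(loaded)} loaded characters")
        result.append(loaded[b])
    return result


# Load order seen in VRAM for the first dialog; "kanji-1"/"kanji-2" are placeholders.
VRAM_ORDER = ["2U", "1D", "1H", "kanji-1", "kanji-2", "4G", "4I", "4M", "2z", "2f", "4R"]

print(resolve(bytes([0, 5, 10]), VRAM_ORDER))
```

Keeping the load list as plain data means the same function works whether entries end up being tile labels, font IDs from the selection table, or decoded glyphs.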

If this is true and I manage to find both, I’ll be able to make a program to extract the script.

Kurak

Tried some relative searching: I assigned numbers to some of the kana sprites, fired up the game, tried to spot some of them amid the kanji, and then ran the relative search (not much luck).

Examined the ROM a bit using a “simple” hex editor, without table support and using standard Shift-JIS encoding (haven’t found a hex editor with table support for Mac so far). Found English text for the credits (83326) and the title (81568). There’s also an “ORIGIN” at 1044403, probably with Japanese mixed in around it (might help).
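Finding English text like this without table support amounts to scanning for runs of printable ASCII, which is what the Unix `strings` tool does. A minimal Python sketch (`ascii_strings` is a hypothetical helper; the sample bytes are made up). One caveat: Shift-JIS second bytes can fall in the printable ASCII range (0x40 to 0x7E), so runs next to Japanese text may pick up a stray character or two.

```python
from __future__ import annotations

import re


def ascii_strings(data: bytes, min_len: int = 5) -> list[tuple[int, str]]:
    """Find runs of at least `min_len` printable ASCII bytes, like Unix `strings`."""
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


# Made-up sample: Shift-JIS あ (82 A0) around "ORIGIN", then "ULTIMA".
sample = b"\x00\x82\xa0ORIGIN\x82\xa0\x00ULTIMA"

for offset, text in ascii_strings(sample):
    # Print both bases, since the offsets quoted above don't say which one they use.
    print(f"{offset} (0x{offset:X}): {text}")
```

On a real dump, read the file with `open(path, "rb").read()` and pass the bytes in; raising `min_len` cuts down on noise from graphics data.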

Played around a lot with relsearch but found nothing more of interest, probably because the relsearch tools I’m using suck at wildcards (and the text is likely encoded in 2 bytes).
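Relative search is simple enough to write by hand, which makes it easy to add exactly what these tools lack: wildcards (for kanji in the middle of a kana sequence) and 2-byte codes. A brute-force Python sketch of standard relative search; `relative_search` is a hypothetical helper, the byte order and the 2-byte guess are assumptions, and a pure-Python scan over a multi-megabyte ROM will take a while.

```python
from __future__ import annotations

from typing import Optional, Sequence


def _unit(data: bytes, pos: int, width: int, byteorder: str) -> int:
    """Read one character code of `width` bytes at `pos`."""
    return int.from_bytes(data[pos:pos + width], byteorder)


def relative_search(data: bytes, pattern: Sequence[Optional[int]],
                    width: int = 1, byteorder: str = "little") -> list[int]:
    """Return every offset where `data` matches `pattern` up to a constant shift.

    pattern: guessed relative values (e.g. kana numbered in font order);
             None is a wildcard, e.g. for a kanji whose value is unknown.
    width:   bytes per character code (1 or 2).
    Offsets are tried one byte apart, since 2-byte codes may not be aligned.
    """
    known = [(k, v) for k, v in enumerate(pattern) if v is not None]
    if not known:
        raise ValueError("pattern needs at least one non-wildcard value")
    k0, v0 = known[0]
    span = len(pattern) * width
    hits = []
    for start in range(len(data) - span + 1):
        base = _unit(data, start + k0 * width, width, byteorder) - v0
        if all(_unit(data, start + k * width, width, byteorder) - v == base
               for k, v in known[1:]):
            hits.append(start)
    return hits


# Made-up data: codes 10 12 ?? 11 match the pattern [0, 2, wildcard, 1] at offset 1.
one_byte = bytes([0x00, 0x10, 0x12, 0x55, 0x11])
print(relative_search(one_byte, [0, 2, None, 1]))

# Same pattern as 2-byte little-endian codes 0104 0106 ???? 0105 at offset 2.
two_byte = bytes.fromhex("FFFF 0401 0601 9999 0501")
print(relative_search(two_byte, [0, 2, None, 1], width=2))
```

In practice the pattern would come from numbering the kana of an on-screen line in font order and putting `None` wherever a kanji sits.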

The Story So Far…

So, based on what we know so far… (thanks to Vehek from romhacking.net)

The font is stored in GB format (2bpp); roman characters and Japanese kana begin at 0x00122480.
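For reference, the GB 2bpp format packs each 8x8 tile into 16 bytes: two bytes per row, the first holding bit 0 of each pixel and the second bit 1, with the leftmost pixel in the top bit. Four such tiles make 64 bytes, which matches the 64-byte step between font table entries and the 4-tiles-per-character observation. A minimal Python decoder (`decode_2bpp_tile` and `show` are illustrative names; the demo tile is made up):

```python
from __future__ import annotations


def decode_2bpp_tile(tile: bytes) -> list[list[int]]:
    """Decode one 8x8 GB-format 2bpp tile (16 bytes) into color indices 0-3.

    Each row is two bytes: the low bit plane, then the high bit plane,
    with the leftmost pixel in bit 7.
    """
    if len(tile) != 16:
        raise ValueError("a 2bpp 8x8 tile is 16 bytes")
    rows = []
    for r in range(8):
        lo, hi = tile[2 * r], tile[2 * r + 1]
        rows.append([(((hi >> (7 - x)) & 1) << 1) | ((lo >> (7 - x)) & 1)
                     for x in range(8)])
    return rows


def show(rows: list[list[int]]) -> str:
    """ASCII preview: 0 is blank, 3 is darkest."""
    return "\n".join("".join(" .+#"[v] for v in row) for row in rows)


# Made-up tile: rows of color 1, color 2, color 3, then one mixed row.
tile = bytes([0xFF, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0x80, 0x01] + [0x00] * 8)
print(show(decode_2bpp_tile(tile)))
```

On a headerless dump, `decode_2bpp_tile(rom[0x122480:0x122490])` would give the first tile of the roman/kana block.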

[Image: roman characters and kana]

And then come the kanji (in-game dialog is in kanji; kana is only used for some texts).

[Image: kanji]

Vehek says the text doesn’t seem to be stored as simple indexes (perhaps due to the fact that characters are composed of 4 sprites, or that there are so many kanji, or both). Nonetheless, I think knowing the order of the font will help figure out where sprites are loaded in memory.
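Whatever the text encoding turns out to be, viewing characters means stitching their 4 tiles together, and the order of those tiles in ROM isn't known yet (side by side first, or top to bottom first). A minimal Python sketch that takes four already-decoded 8x8 tiles (grids of color indices 0 to 3) and makes the layout a parameter so both guesses are easy to try; `compose_16x16` is a hypothetical helper.

```python
from __future__ import annotations


def compose_16x16(tiles: list[list[list[int]]],
                  order: tuple[int, int, int, int] = (0, 1, 2, 3)) -> list[list[int]]:
    """Stitch four decoded 8x8 tiles into one 16x16 character.

    Assumed layout: tiles[order[0]] top-left, tiles[order[1]] top-right,
    tiles[order[2]] bottom-left, tiles[order[3]] bottom-right.
    """
    if len(tiles) != 4 or any(len(t) != 8 or any(len(row) != 8 for row in t) for t in tiles):
        raise ValueError("expected four 8x8 tiles")
    tl, tr, bl, br = (tiles[i] for i in order)
    top = [tl[r] + tr[r] for r in range(8)]
    bottom = [bl[r] + br[r] for r in range(8)]
    return top + bottom


# Made-up tiles, each filled with its own index, to show where each one ends up.
tiles = [[[n] * 8 for _ in range(8)] for n in range(4)]

for order in [(0, 1, 2, 3), (0, 2, 1, 3)]:  # row-major vs column-major guess
    glyph = compose_16x16(tiles, order)
    print(order, glyph[0][0], glyph[0][15], glyph[15][0], glyph[15][15])
```

If a decoded kana looks scrambled with one order, trying the other is a cheap first check.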

My (first) plan is to eventually replace kanji tiles with two-letter roman combinations, which may make things easier IF we manage to build equivalence table(s) somehow.
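Putting roman letters into kanji slots means writing tiles back in the same GB 2bpp format, i.e. the inverse of decoding. A minimal Python sketch (`from_art` and `encode_2bpp_tile` are hypothetical helpers; the letter shape is made up). Which of a character's 4 tiles each letter should go into depends on the still-unknown tile layout.

```python
from __future__ import annotations


def from_art(art: list[str], ink: int = 3) -> list[list[int]]:
    """Turn an 8x8 drawing ('#' = ink, anything else = 0) into color indices."""
    return [[ink if ch == "#" else 0 for ch in row] for row in art]


def encode_2bpp_tile(pixels: list[list[int]]) -> bytes:
    """Encode an 8x8 grid of color indices 0-3 into 16 bytes of GB-style 2bpp."""
    if len(pixels) != 8 or any(len(row) != 8 for row in pixels):
        raise ValueError("expected an 8x8 grid")
    out = bytearray()
    for row in pixels:
        lo = hi = 0
        for x, v in enumerate(row):
            if not 0 <= v <= 3:
                raise ValueError(f"color {v} out of range 0-3")
            lo |= (v & 1) << (7 - x)         # bit 0 of each pixel
            hi |= ((v >> 1) & 1) << (7 - x)  # bit 1 of each pixel
        out += bytes((lo, hi))
    return bytes(out)


LETTER_A = [
    "..###...",
    ".##.##..",
    "##...##.",
    "##...##.",
    "#######.",
    "##...##.",
    "##...##.",
    "........",
]

encoded = encode_2bpp_tile(from_art(LETTER_A))
print(encoded.hex(" ").upper())
```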

Thoughts? corrections? comment!

Start New Game!

So, I’m teaming up with Dungy Dragon (and his wife :P) to translate the “Ultima: The Savage Empire” SNES ROM into Engrish… wish us luck, we are gonna need heaps of it!

This will be a great challenge… the technical side is far from straightforward, but that’s the fun! Traditional romhacking methods won’t work, so this will require lots of patience and probably building some custom tools.

So far I’m mostly fleshing out my ROM hacking skills, refreshing my memory of Japanese scripts and bugging Dungy so we can start somewhere…

Some links:

A mockup

A dream to be made true someday