3.3 Markup, control codes and placeholders

Even if you are not much of a web developer or coder you will probably have a rough understanding of variables and markup (you have probably posted on a forum before, if nothing else). Text engines are rarely at the level of even a modest scripting language, and almost never Turing complete, but they can and do have markup options and placeholders. As noted earlier, if from looking at the text in the game you suspect some form of markup or placeholder, it is best not to use that text as the basis of a relative search.

Back on topic, the markup and placeholders can take many forms: simple square bracketed plain text; hexadecimal flags in the text (numbered sections do this often enough, and plain hex values used to signify a new line or end of section turn up all the time); XML style markup; right through to things contained in with the pointers (think back to the various file packing formats that might have a flag to indicate compression for an example of a similar idea).

Control codes are a similar concept, even though they are usually treated as part of the encoding, and do things like signify a new line, a tab or some such. At what point in the reverse engineering of a text engine you try figuring them out is up to you.

3.3.1 Worked example

Continuing with the Megaman ZX game on the DS. The file has been changed to talk_m01_en1.bin purely because it appears at the start of the game. It does however appear that the FE example might not hold entirely true (there are sections that line up with FE, but there are now others where FD is a potential candidate), but that sort of thing is what makes hacking non trivial.

Looking at the text there is an FC value on occasion. Running the game, these would seem to correspond to line breaks; sometimes pointers handle line breaks, sometimes it is automatic and sometimes it is in the text.
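If the FC reading is right, splitting a section at that value gives the lines as they appear on screen. A minimal sketch, assuming single byte values and that FC never turns up as an argument inside the markup; the sample bytes are made up for illustration and are not taken from the game.

```python
LINE_BREAK = 0xFC  # seen in the text; "line break" is inferred from running the game


def split_lines(section: bytes) -> list[bytes]:
    """Split one text section into lines at every LINE_BREAK byte."""
    return section.split(bytes([LINE_BREAK]))


# Hypothetical section body: a line of three characters, then one of two.
sample = bytes([0x21, 0x22, 0x23, 0xFC, 0x24, 0x25])
for line in split_lines(sample):
    print(line.hex(" ").upper())
```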

More interesting than that though is the F202 F9E9 03F3 0DF8 03 that the text starts with.

At the next section

FD F202 F9EA 03F3 0DF8 03

FD is one thing and can be ignored for the time being (that it does not appear in the first value would appear to mean that it is not strictly part of it) leaving

F202 F9EA 03F3 0DF8 03

The original

F202 F9E9 03F3 0DF8 03

E9 = 11101001

EA = 11101010

Probably not a bit level flag, which is nice. Equally it is probably not a length value, as the first section is 38 hex long and the second is 20 hex (see the footnote at the end of this section).

What is the same about those first two sections is that they are being spoken by the same person (Giro). It is however unlikely that there needs to be 72 bits just to represent a character name, so there is probably more to it than that.
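The two checks above are quick to reproduce. This sketch prints both values in binary, XORs them to show which bits differ, and compares them against the two section lengths quoted above; the helper name is made up for the example.

```python
def differing_bits(a: int, b: int) -> str:
    """Return an 8 bit string with a 1 wherever a and b differ."""
    return format(a ^ b, "08b")


first, second = 0xE9, 0xEA
print(format(first, "08b"))           # 11101001
print(format(second, "08b"))          # 11101010
print(differing_bits(first, second))  # 00000011, two bits change at once

# Section lengths from the text: 38 hex and 20 hex.
lengths = (0x38, 0x20)
print([hex(n) for n in lengths], (first, second) == lengths)
```

Two bits changing together is what a plain increment does, which fits the reading that this is not a single flag being toggled.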

F202 F9EB 03F3 02F8 00 is the next one and that is spoken by someone different (???? and no picture/“sound only” at this point)

This goes back and forth for a while with the next character (Vent) having a picture appear on the right hand side of the screen

F203 F9F4 03F3 05F8 01

The next screen has the Giro character at the bottom of the screen. Worryingly there appear to be two extra bytes.

F201 F202 F9F5 03F3 0DF8 03

Vent at the bottom of the screen

F201 F203 F9F7 03F3 07F8 01

Before dealing with that, lining up the first bunch

F202 F9E9 03F3 0DF8 03

F202 F9EA 03F3 0DF8 03

F202 F9EB 03F3 02F8 00

F202 F9EC 03F3 0DF8 03

F202 F9ED 03F3 02F8 00

The fourth byte (E9, EA, EB and so on) appears to be counting upwards, which is quite common in text systems (it is effectively numbered paragraphs). You will probably want to keep it intact as the game might trigger animations from a counter using it (if nothing else it is good form to change as little as is necessary), although you could test it if you wanted.

Speaking of testing: static analysis might get somewhere, and is proving quite useful thus far, but why analyse something statically when you have a machine capable of running the example and giving you results?
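Lining the values up by eye works for five entries, but the same comparison can be scripted. A sketch using the five values listed above; the function names are made up for the example. It reports which byte offsets (counting from zero) change between entries and whether any of them rises by one each time.

```python
HEADERS = [
    "F202 F9E9 03F3 0DF8 03",
    "F202 F9EA 03F3 0DF8 03",
    "F202 F9EB 03F3 02F8 00",
    "F202 F9EC 03F3 0DF8 03",
    "F202 F9ED 03F3 02F8 00",
]


def varying_offsets(headers: list[bytes]) -> dict[int, list[int]]:
    """Map each zero-based offset whose value changes to its list of values."""
    columns = zip(*headers)  # one tuple of ints per byte position
    return {i: list(col) for i, col in enumerate(columns) if len(set(col)) > 1}


def is_counter(values: list[int]) -> bool:
    """True when every value is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(values, values[1:]))


parsed = [bytes.fromhex(h) for h in HEADERS]  # fromhex skips the spaces
for offset, values in varying_offsets(parsed).items():
    shown = " ".join(f"{v:02X}" for v in values)
    print(offset, shown, "counter" if is_counter(values) else "")
```

Only the E9 to ED column is flagged as a counter; the other two changing columns follow the speaker instead.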

There are three schools of thought at this juncture

  1. Copy and paste another string
  2. Minor edit to the value
  3. Assembly

Assembly is always an option regardless of what you are doing, seeing as it is the lowest level that gets manipulated, and it can be combined with the other two methods. The value could be a simple one that loads directly (or via a simple instruction like a multiplication) to the OAM, it could be the input value to a nightmare function, or indeed something in between. However, although assembly is a highly respectable method, most of ROM hacking and computing in general is about getting away from assembly if you can, so the other two are employed.

There are presumably some working values, so one school of thought would be to replace a working one with another working one and see what happens. The other is a minor, hopefully educated, guess and then seeing what happens. Either could lead to a crash but it takes but a few seconds to check.

For the first go around the entire ???? pre text section was used to replace the one from the opening text section. There were actually two copies of the file in the ROM so both were replaced for the sake of this example, leading to the following (hacked game and original game)
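Both of the non assembly approaches come down to overwriting a few bytes in a copy of the file. A minimal sketch of a cautious patch helper follows; the function name, the offsets and the sample data are all hypothetical, and the same-length rule is a precaution chosen here so that nothing after the edit moves, not a known requirement of the game.

```python
def patch(data: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Return a copy of data with expected swapped for replacement at offset."""
    end = offset + len(expected)
    if data[offset:end] != expected:
        raise ValueError("bytes at offset are not what was expected")
    if len(replacement) != len(expected):
        raise ValueError("replacement must be the same length")
    return data[:offset] + replacement + data[end:]


# Stand-in for file contents: a value from the text plus made-up text bytes.
original = bytes.fromhex("F202 F9E9 03F3 0DF8 03") + b"\x21\x22\x23"

# 1. Copy and paste another working value over the top.
pasted = patch(original, 0,
               bytes.fromhex("F202 F9E9 03F3 0DF8 03"),
               bytes.fromhex("F202 F9EB 03F3 02F8 00"))

# 2. Minor edit: change only the last byte before the text.
edited = patch(original, 8, b"\x03", b"\x04")

print(pasted.hex(" ").upper())
print(edited.hex(" ").upper())
```

Checking the expected bytes first means a wrong offset fails loudly instead of quietly corrupting the copy.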

[Pictures: hacked game and original game]

Only the first section was done and it reverted to the original character right afterwards but perhaps more interestingly the little sliding animation that it does between characters was done between the hacked ???? and the Giro character.

Next up was editing a single value. The original replacement was left in place for this and it continued to display the picture.

The value chosen was the very last one (the final 8 bits before the actual text, values in hex)

04 appeared to put Prairie

09 appeared to put Model L

10 appeared to put Hivalt

FF caused screen corruption that stuck around for quite a while (the broken background eventually got replaced after a swap out for a scene but the text pictures stayed broken and there was additional text corruption for several screens after).

Were it not in the markup this would probably count as multiple tile encoding and it certainly appears the same way when it happens in regular text (a single byte/character or a couple being used and the game generating a whole name).

This does however leave 64 other bits (save for the counting section) doing something.

F202 F9E9 03F3 0DF8 03 = origin

F202 F9EA 03F3 0DF8 03 = second

F202 F9EB 03F3 02F8 00 = ???? and picture to match

Replacing the 0DF8 with 0AF8 gave

[Picture: result of replacing 0DF8 with 0AF8]

Replacing the F8 with F9 cut off the first letter. Replacing it with FA jumped the text ahead for a line before coming back, stuck things on odd lines and changed the name to “????”, whereas F7 appeared to do nothing at all other than change the name. Relegated to magic number/constant/find out later.

Vent has a box with the name on the right hand side although still at the top

F203 F9F4 03F3 05F8 01

“F203” as opposed to F202

Sticking 03 in there did indeed put the portrait on the right at the top and it also mirrored it.

[Picture: portrait mirrored at the top right]

Not long into the conversation there is a shot with the portrait on the left but at the bottom of the top screen. Pulling the command from it

F201 F202 F9F5 03F3 0DF8 03 = bottom left

xxxx F202 F9E9 03F3 0DF8 03 = first command of the game.

xxxx was added to line things up but could it be a variable length command system?

Before debating that though, 01 was tried in the same place and it put the box at the bottom with the portrait on the left. 04 and FF appeared to do nothing though.
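One way to probe the variable length question is to guess a command table and see whether every value seen so far splits cleanly under it. The sketch below does that. The argument counts are pure guesses made for this example (the text does not establish them), and the function and table names are made up as well.

```python
# Guessed layout: a command byte followed by a fixed number of argument bytes.
# These counts are assumptions for the sketch, not confirmed by the game.
ARG_COUNTS = {0xF2: 1, 0xF9: 2, 0xF3: 1, 0xF8: 1}

OBSERVED = [
    "F202 F9E9 03F3 0DF8 03",       # first command of the game
    "F202 F9EB 03F3 02F8 00",       # ????
    "F203 F9F4 03F3 05F8 01",       # Vent, top right
    "F201 F202 F9F5 03F3 0DF8 03",  # Giro, bottom left
    "F201 F203 F9F7 03F3 07F8 01",  # Vent, bottom
]


def split_commands(header: bytes) -> list[tuple[int, bytes]]:
    """Split a header into (command, arguments) pairs using ARG_COUNTS."""
    commands = []
    i = 0
    while i < len(header):
        op = header[i]
        if op not in ARG_COUNTS:
            raise ValueError(f"unknown command byte {op:02X} at offset {i}")
        count = ARG_COUNTS[op]
        args = header[i + 1:i + 1 + count]
        if len(args) != count:
            raise ValueError("header ends in the middle of a command")
        commands.append((op, args))
        i += 1 + count
    return commands


for text in OBSERVED:
    parts = split_commands(bytes.fromhex(text))
    print(" | ".join(f"{op:02X} {args.hex(' ').upper()}" for op, args in parts))
```

All five values split without error under these guesses, which shows only that the guess is consistent with the data, not that it is right. It would also fit the earlier observation that swapping F8 for F9 cut off the first letter, if F9 takes one more argument byte than F8, but that remains speculation until tested.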

Some more experimentation could be done but the rest is just filling in the blanks and most of the interesting stuff appears to have happened already.


  1. Remember that just because something can be done one way a game does not have to do it that way (calculating things at run time is less than ideal), and equally redundancy exists, so if something is there it might be ignored in the final product.