
I 10x’d a TI-84 emulator’s speed by replacing a switch-case

There's a JavaScript emulator for the TI-83+, TI-84+, and TI-84+CSE calculators called jsTIfied, written by Christopher Mitchell, founder of the calculator fan site Cemetech. There aren't a ton of reasons to use it over something else if you've got a native option available, but if you don't, it's pretty great. I got interested because it was the first emulator to support the TI-84+CSE when that calculator launched in the early 2010s. The CSE was exciting because it retrofitted a 320×240 color display onto the hardware platform of the 84+SE, so the rest of the hardware and OS access was exactly the same aside from graphics. I wanted to be among the first game developers for the CSE, but developing for the calculator without an emulator and debugger is pretty painful, so I used jsTIfied with my older calculator ROMs to get a feel for it.

jsTIfied had a problem though: it was too damn slow. These calculators use a Z80 processor, which is pretty easy to emulate. But jsTIfied couldn't even emulate the 6 MHz calculator models at full speed, and the CSE's processor was clocked at 15 MHz, so there it was a whole lot worse. jsTIfied is closed source, but I decided I'd try to do something about it anyway.

Of course, the first thing you do when you're debugging a web app is open the profiler. There was just one hotspot that dwarfed all the others, and that was the instruction decode and execution switch-block. That's the kind of thing you'd expect, since these calculators don't have any complicated hardware to emulate like pixel processing units or audio chips, but it still seemed a little fishy. Yes, JavaScript is slow, but computers made in the early 2000s could have emulated this calculator at full speed with native code. JavaScript overhead wasn't enough to explain it.

So I started digging into the actual code. I had to unminify it, but I was used to dealing with obfuscated code from Minecraft. The instruction decoder had one giant switch block, with additional nested switch blocks for multi-byte instructions. In most languages that's just fine, since your compiler will turn it into jump tables, so why wasn't I seeing jump-table performance here? I had a bit of an obsession with JavaScript performance at the time thanks to my WebGL experiments, and I had already learned that back then JS engines wouldn't optimize functions above a certain size. Knowing this, I split all the nested switch statements out into their own functions and had the parent switch call them, to see if that would take care of things.
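In shape, that first refactor looked roughly like this (a hypothetical sketch with made-up opcodes and return values, not the real decoder's names):

```javascript
// Hypothetical sketch of the first attempt: hoist each nested
// prefix-switch into its own function, so no single function
// exceeds the engine's optimization size limit.
function executeDD(opcode) {
  switch (opcode) {
    case 0x00: return 1; // placeholder for a DD-prefixed instruction
    case 0x01: return 2; // placeholder for another
    default:   return 0;
  }
}

function execute(opcode, next) {
  switch (opcode) {
    case 0x00: return 0;               // nop
    case 0xDD: return executeDD(next); // delegate the whole prefix table
    default:   return -1;
  }
}
```

The hope was that each small function would now fit under the optimizer's size threshold on its own.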

Now I needed a way to actually load my code. I quickly spun up a web server on my machine that passed requests through to the upstream website, but intercepted the request for the emulator engine and returned my modified code instead. I edited /etc/hosts to point the upstream domain at my local server, and with that my code was loaded.

Unfortunately, I saw basically no improvement. I was pretty sure at this point that I was within the size limits for functions, so something else had to be missing. I went digging for low-level details on how switch statements are implemented in JavaScript engines. Eventually I found a Stack Overflow post from someone trying to do the same thing I was: optimize a (different) Z80 emulator. That's when I saw a deeply disturbing comment, with sources cited straight to Chrome's V8 source code:

@LGB actually in V8 (JS engine used by Google Chrome) you need to jump through a lot of hoops to get switch case optimized: All the cases must be of same type. All the cases must either be string literals or 31-bit signed integer literals. And there must be less than 128 cases. And even after all those hoops, all you get is what you would have gotten with if-elses anyway (I.E. no jump tables or sth like this). True story.

Read the post for yourself here

This is not what you want to hear when you're looking at an emulator with a heckload of switch blocks, especially switch blocks with more than 128 cases. I had let the optimizer run on my split-out functions, but when it got to the switch blocks it said "thanks but no thanks, I'm good." I had only one option left: wrap every case of every switch in a function, dump them all into an array, and do the lookups myself. So I wrote a script to do exactly that.

The original code looked something like this:

```javascript
switch (read_byte(z8.r2[Regs2_PC]++)) {
  case 0x00: // nop
    break;
  case 0x01: // do something?
    break;
  // ...
  case 0xDD: // index register prefix
    switch (read_byte(z8.r2[Regs2_PC]++)) {
      case 0x00: // do something
        break;
      case 0x01: // do something
        break;
      // ...
      case 0xFF:
        break;
    }
    break;
  // ...
  case 0xFF:
    break;
}
```

Which I then translated into something like this:

```javascript
let instr_table = new Array(256);
let instr_subtable_DD = new Array(256);

instr_table[0x00] = function() { /* nop */ };
instr_table[0x01] = function() { /* do something, probably */ };
// ...
instr_table[0xDD] = function() {
  return instr_subtable_DD[read_byte(z8.r2[Regs2_PC]++)]();
};
```
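As a self-contained toy, the whole dispatch scheme looks like this (everything here is invented for illustration: a fake two-opcode machine with an accumulator, a program counter, and a plain array for memory):

```javascript
// Toy model of table dispatch: fetch a byte, index the function
// table with it, call the handler. No switch anywhere.
const state = { pc: 0, a: 0, halted: false };
const mem = [0x01, 0x05, 0x01, 0x03, 0x00]; // ADD 5, ADD 3, HALT

const table = new Array(256).fill(() => { state.halted = true; });
table[0x00] = () => { state.halted = true; };        // HALT
table[0x01] = () => { state.a += mem[state.pc++]; }; // ADD imm8

function run() {
  while (!state.halted) {
    table[mem[state.pc++]](); // fetch, then dispatch through the table
  }
  return state.a;
}
```

Running this executes ADD 5, ADD 3, then HALT, leaving 8 in the accumulator. The per-instruction cost is one array index and one indirect call, regardless of how many opcodes the table holds, which is exactly the jump-table behavior the engine was refusing to generate.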

After all that was done, I had success! The emulator went from slow as molasses to being too fast. I told the original dev about this and he was eager to merge the changes in, but a calculator that's too fast is bad in its own way, since you can hardly control the thing. So I got him to give me access to the source, wrote a speed governor, and he got it all squared away and pushed up to the site. You can see this for yourself on the site; just search for z8oT. You can also watch a demo I recorded at the time below, first with the old slow version and then with the code that was much too fast.

I had one more trick up my sleeve too: notice that those program-counter increments don't involve a & 0xFFFF to keep the value within the proper 16 bits. That's because I switched to storing the registers in a Uint16Array, which has that wrapping behavior built in (since it's backed by real honest-to-goodness u16s). I think that overflow behavior is defined in the spec, but I don't actually remember; in any event it works fine everywhere I've tried it, but do me a favor and check for yourself. Ultimately this had a marginal performance benefit at best, but it removed plenty of bitwise ops from the code and made it much harder to screw up the value range of register operations in general.
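The wrapping behavior is easy to check for yourself (and it is in fact specified: stores to a Uint16Array element go through a ToUint16 conversion, i.e. modulo 2^16):

```javascript
// Uint16Array stores wrap modulo 2^16, so 16-bit register
// arithmetic needs no explicit masking.
const r = new Uint16Array(1);

r[0] = 0xFFFF;
r[0]++;           // wraps to 0x0000, no "& 0xFFFF" required

r[0] = 0xFFF0;
r[0] += 0x20;     // 0x10010 truncates to 0x0010 (decimal 16)
```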

Before closing, I should warn you: I wouldn't recommend taking this as modern performance advice for any of your JavaScript without checking the V8 and SpiderMonkey sources first. 2013 was a different time, and JS engines have come a long way since then. I hope they've made this better than it was.
