Pirating the Pirates

August 2nd, 2009

I’m not sure how many people are aware of this, but there’s an interesting little non-working game in MAME that goes by the unassuming nickname “39in1”.  It’s one of a line of numerous “Xin1” games that also includes 4in1, 9in1 and 48in1, though only 4in1 and 48in1 have been dumped besides 39in1.

These games are interesting because they’re bootlegs of MAME hacked to run on a 200MHz Intel XScale system-on-a-chip, so there’s a very good chance they will be the first instance of MAME self-virtualizing, albeit running an older version of itself.

Between Andreas Naive breaking the encryption on the main program ROM, me wiring up a good number of the PXA255 peripherals, and R. Belmont figuring out a lot of the CPLD communication, it finally runs far enough to display something, albeit an error message:

Stay tuned; there’s probably more to come.

Midas

July 15th, 2009

This is a really lofty goal, but I think in my spare time I’m going to try to re-create the Goldeneye engine in C with 100% accuracy, based on disassemblies and traces of the actual game.

Well, that’s the end goal, anyway.  For now I’d be satisfied with tracing the game’s boot process to figure out why, exactly, it currently fails to boot in MESS.

Either way, my current work is here.  That’s after around 4-5 hours’ worth of work.

Oh Baby

June 1st, 2009

If anyone wants to try it out, I just committed my CPU core and driver for the Manchester Small-Scale Experimental Machine (SSEM), or “Baby”, to the MESS SVN depot.  It currently runs all known SSEM programs bundled with David Sharp’s SSEM simulator, available here.

I’m not entirely happy that it is compatible with all of the programs, though.  Some programs, e.g. “nightmare.snp”, would not have run on the SSEM had it ever been extended to the full 8192 words of storage it was theoretically capable of addressing (per some SSEM history sites), because they fill the 8 address bits left unused by the 32-word store with pretty patterns.
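To make the “unused address bits” point concrete, here is a small C sketch of how an SSEM instruction word splits up: bits 0-12 form the line (address) field and bits 13-15 the function number, but a 32-word store only needs bits 0-4.  It assumes, hypothetically, that a store line is held in a uint32_t with SSEM bit 0 as the integer’s least-significant bit; the function names are made up for illustration and are not from the MESS driver.

```c
#include <stdint.h>

/* Assumption: SSEM bit 0 (least significant, drawn leftmost on the CRT)
 * is bit 0 of the uint32_t. */

/* Full 13-bit line field: bits 0-12, enough to address 8192 words. */
uint32_t ssem_line_field(uint32_t word) { return word & 0x1FFFu; }

/* Function number: bits 13-15. */
uint32_t ssem_function(uint32_t word) { return (word >> 13) & 0x7u; }

/* What a 32-word store actually decodes: bits 0-4 only. */
uint32_t ssem_line_baby(uint32_t word) { return word & 0x1Fu; }

/* Nonzero if the word sets any of bits 5-12, which a 32-word store ignores
 * but an 8192-word store would treat as part of the address. */
int ssem_uses_high_address_bits(uint32_t word) { return (word & 0x1FE0u) != 0; }
```

A word decorated with a pattern in bits 5-12 decodes to the same line as an undecorated one on the Baby, but would point somewhere else entirely on a hypothetical 8192-word machine.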

And now, a pretty picture:

The controls are as follows:
  • Up / Down: Move the selected store line up or down
  • Button 1: Halt / un-halt the SSEM
  • 1-8, Q-I, A-K, Z-,: Toggle bits 0-31 of the currently selected store line
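The key layout above maps four rows of eight keys onto the 32 bits of a store line.  A minimal C sketch of that idea, assuming the keys are numbered row by row (1-8 for bits 0-7, Q-I for bits 8-15, A-K for bits 16-23, Z-, for bits 24-31); ssem_bit_keys, ssem_key_to_bit and ssem_toggle are hypothetical names, not the actual MESS input code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical table: position in the string is the bit index. */
static const char ssem_bit_keys[] = "12345678QWERTYUIASDFGHJKZXCVBNM,";

/* Returns the bit index (0-31) for a key, or -1 if the key toggles nothing. */
int ssem_key_to_bit(char key)
{
    const char *p = strchr(ssem_bit_keys, key);
    /* strchr() also matches the terminating '\0', so reject that explicitly. */
    if (key == '\0' || p == NULL)
        return -1;
    return (int)(p - ssem_bit_keys);
}

/* Toggle one bit of the selected store line with XOR. */
uint32_t ssem_toggle(uint32_t line, char key)
{
    int bit = ssem_key_to_bit(key);
    return bit < 0 ? line : line ^ (1u << bit);
}
```

Using XOR means pressing the same key twice restores the original line.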

Yeah, Baby, Yeah

April 15th, 2009

I was rather intrigued by David Link’s project to resurrect the Manchester Mark I, and started digging up more information about the Mark I and its predecessor, the Manchester Small Scale Experimental Machine, or “Baby”.

As it turns out, there’s already a Java-based emulator out there.  Inspiration struck, and I decided to see how easy it would be to emulate the SSEM in MESS.  It was, historically, the first electronic stored-program computer (if I read my sources right), so it seems like a prime candidate for support in MESS.

Without further ado, here is the Manchester Small Scale Experimental Machine displaying the results of Tom Kilburn’s “Highest Common Factor for 989” program, including the correct answer (43).
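The SSEM had no divide instruction (its arithmetic was limited to subtraction and negation), so factor-finding programs work by repeated subtraction.  Here is a C sketch of that approach applied to 989; it illustrates the method and is not a transcription of Kilburn’s program, and the function names are hypothetical.  Note that the answer, 43, is the largest proper factor of 989 = 23 × 43.

```c
#include <stdint.h>

/* Remainder computed by repeated subtraction, as a machine without
 * a divide instruction would have to do it. */
static uint32_t remainder_by_subtraction(uint32_t n, uint32_t d)
{
    while (n >= d)
        n -= d;
    return n;
}

/* Largest proper factor of n: try candidates downward from n - 1 and
 * stop at the first one that divides evenly. Returns 1 for a prime. */
uint32_t highest_proper_factor(uint32_t n)
{
    if (n < 2)
        return 0; /* no proper factors to speak of */
    for (uint32_t d = n - 1; d > 1; d--)
        if (remainder_by_subtraction(n, d) == 0)
            return d;
    return 1;
}
```

Calling highest_proper_factor(989) walks down from 988 and stops at 43, the same answer the SSEM displays.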

Chances are good I’ll be able to get clearance to add this to MESS.  Fingers crossed!

USF

March 9th, 2009

On a weekend with nothing better to do thanks to a deathly head cold, I decided to bolt USF support onto MESS.

So far things seem semi-promising. It’s too slow to listen to in real time, but that can probably be fixed with an RSP recompiler. Some games work, some don’t, and some work but in strange ways.

Here’s a quick rundown on the USF sets I’ve tried so far, and their relative working or unworking status:

  • Banjo-Kazooie: Works fine.
  • Beetle Adventure Racing: Works fine.
  • Blast Corps: Works fine.
  • Bomberman 64: Plays nothing.
  • Buck Bumble: Plays a very short click, then nothing.
  • Donkey Kong 64: Exits MESS almost immediately with an unknown RSP opcode, presumably due to the game running off into the weeds.
  • Dr. Mario 64: Plays garbage for around 2 seconds, then causes MESS to fatalerror.
  • Conker’s Bad Fur Day: Works fine.
  • Diddy Kong Racing: Works fine.
  • Goldeneye: Plays the music at around 10 to 20 times the correct tempo.
  • Jet Force Gemini: Works fine.
  • The Legend of Zelda: Ocarina of Time: Plays nothing.
  • Mario Kart 64: Works fine.
  • The New Tetris: Works fine.
  • Perfect Dark: Works fine.
  • Pokemon Stadium: Plays nothing.
  • Sim City 2000: Works fine.
  • Space Station: Silicon Valley: Plays nothing.
  • Super Mario 64: Plays garbage for around 2 seconds, then causes MESS to fatalerror.
  • Super Smash Brothers: Works fine.
  • Tetrisphere: Works fine.
  • Yoshi’s Story: Plays nothing.

I suppose the next step is to figure out either why some games play nothing, or why others run off into the weeds and make MESS fatalerror!

Performance Anxiety

February 18th, 2009

I’ve decided to take a short break from working on renderer issues, since pretty much every game that doesn’t run into a bug lurking in machine/n64.c or in the MIPS CPU core has largely correct graphics, and the few games that do hit one of those bugs also have largely correct graphics.  Barring a few exceptional cases, these games would be playable if not for those bugs and/or performance.

Since I am not quite familiar enough with the N64’s non-graphical functions to be comfortable bug-hunting in those realms, for now I’m going to concentrate on performance.

Using MAME’s built-in profiler to determine CPU load distributions across the main CPU, RSP, and everything else (mainly the RDP), I can break the games down into four categories:

  1. Untestably broken: These are games that don’t show a single thing in MESS before running off into the weeds.  They include Indiana Jones, Battle for Naboo, Conker’s Bad Fur Day, Banjo-Kazooie, Banjo-Tooie, Donkey Kong 64, Mario Party 3, Paper Mario, Perfect Dark, Goldeneye, Yoshi’s Story, Gauntlet Legends, Turok: Rage Wars, and I’m sure plenty of others.
  2. 2D Games: These games use the RSP largely for audio processing only, and limit their use of the RDP to things like Textured Rectangle commands.  As a result, performance data shows the RDP as their main bottleneck.  These games include Bust-A-Move 2: Arcade Edition and Bust-A-Move ’99.
  3. 3D Games: These games use the RSP to do a whole bunch of vector calculations, and use the RDP as much as they want.  These are the majority of games, and include Super Mario 64, Mario Kart 64, Army Men: Sarge’s Heroes, Tetrisphere, The Legend of Zelda: Ocarina of Time, Kirby 64: The Crystal Shards, Madden 64, and Aidyn Chronicles: The First Mage.
  4. Namco Museum 64: This game is Namco Museum 64.  It does not use the RSP at all and does not use the RDP at all.  It shoves PCM data out the stereo DAC by way of the main CPU, and it uses the N64’s entire video system as nothing more than a framebuffer.  As a result, it runs at around 160% when unthrottled, compared with around 10% unthrottled for most 3D games and 25% for most 2D games.  It is the only game of its kind that I know of.

In order to more accurately nail down the performance of 3D games, I’ve run a profile on three games: Castlevania, Tom & Jerry: Fists of Furry, and Super Mario 64.  Unsurprisingly, given how few distinct microcodes were ever used on the N64, the code profiles look largely the same.  Each percentage listed is the share of execution time spent in that function itself, not including its children.

  • Castlevania: RDP = 41.14%, RSP = 53.23%, Other = 5.63%
    • 12.04%: fill_span_buffer_2x2
    • 11.04%: FETCH_TEXEL
    • 8.05%: render_spans_16
    • 5.13%: read_dword_generic
    • 4.99%: handle_vmadn
    • 4.59%: cpu_execute_rsp
    • 3.60%: COLOR_COMBINER
    • 3.36%: write_dword_generic
    • 3.32%: BLENDER2_16
    • 3.11%: SATURATE_ACCUM
    • 3.08%: handle_vmadh
    • 2.01%: handle_vmadm
    • 1.91%: handle_vmulf
    • 1.56%: __divdi3
    • 1.56%: memory_decrypted_read_dword
    • 1.52%: handle_ldv
    • 1.39%: handle_vmudn
    • 1.25%: handle_vmudl
    • 1.23%: handle_vadd
    • 1.18%: handle_lqv
    • 1.05%: handle_vmacu
    • 1.02%: memory_read_byte_32be
    • 0.99%: handle_vector_ops
    • 0.96%: READ8
    • 0.93%: taddr_clamp
    • 0.91%: memory_write_byte_32be
    • 0.87%: handle_vge
    • 0.82%: handle_vmrg
    • 0.80%: WRITE8
    • 0.70%: handle_vmacf
    • 0.66%: handle_vsub
    • 0.62%: handle_sqv
    • 0.62%: debugger_instruction_hook
    • 0.62%: handle_lpv
    • 0.60%: handle_vmudm
    • 0.57%: handle_vmadl
    • 0.53%: calculate_coverage
    • 0.52%: handle_sdv
    • 0.50%: handle_vmudh
    • 0.46%: decompress_z
    • 0.45%: fill_rectangle_16bit
    • 0.43%: handle_luv
    • 0.41%: handle_vcl
    • 0.39%: handle_vmulu
    • 0.38%: handle_lwc2
    • 0.38%: handle_vrcph
    • 0.37%: video_update_n64
    • 0.35%: handle_vand
    • 0.34%: handle_vxnor
    • 0.33%: sp_dma
    • 0.32%: handle_vch
    • 0.31%: handle_vrcpl
    • 0.28%: handle_swc2
    • 0.26%: handle_vlt
    • 0.26%: handle_llv
    • 0.22%: handle_vsaw
    • 0.19%: handle_vor
    • 0.19%: fill_rectangle_32bit
    • 0.16%: rdp_load_block
  • Tom & Jerry: Fists of Furry: RDP = 29.15%, RSP = 64.42%, Other = 6.43%
    • 7.41%: read_dword_generic
    • 7.22%: cpu_execute_rsp
    • 5.52%: handle_vmadn
    • 4.79%: texture_rectangle_16bit
    • 4.38%: write_dword_generic
    • 4.29%: fill_span_buffer_2x2
    • 3.54%: BLENDER1_16
    • 3.38%: FETCH_TEXEL
    • 3.27%: SATURATE_ACCUM
    • 3.04%: handle_vmadh
    • 2.75%: handle_vmadm
    • 2.66%: handle_lqv
    • 2.57%: COLOR_COMBINER
    • 2.40%: memory_decrypted_read_dword
    • 2.30%: handle_vmulf
    • 2.25%: handle_ldv
    • 1.87%: fill_rectangle_16bit
    • 1.79%: handle_vmudl
    • 1.79%: render_spans_16
    • 1.48%: READ8
    • 1.45%: handle_vadd
    • 1.40%: handle_vmudn
    • 1.38%: video_update_n64
    • 1.24%: memory_read_byte_32be
    • 1.19%: memory_write_byte_32be
    • 1.10%: debugger_instruction_hook
    • 0.96%: handle_vector_ops
    • 0.94%: WRITE8
    • 0.92%: handle_vsub
    • 0.88%: handle_vmacf
    • 0.75%: handle_sqv
    • 0.72%: handle_vmudm
    • 0.71%: handle_vsubc
    • 0.70%: handle_sdv
    • 0.68%: calculate_coverage
    • 0.66%: handle_vge
    • 0.65%: sp_dma
    • 0.60%: rdp_load_tile
    • 0.53%: __divdi3
    • 0.52%: mame_rand
    • 0.52%: copyline_rgb32
    • 0.52%: handle_vmudh
    • 0.51%: rand_memory
    • 0.49%: handle_vcl
    • 0.49%: driver_get_name
    • 0.48%: compress_z
    • 0.47%: handle_lwc2
    • 0.47%: handle_vmrg
    • 0.44%: handle_vrcpl
    • 0.37%: handle_vrcph
    • 0.35%: handle_vlt
    • 0.33%: taddr_clamp
    • 0.33%: handle_luv
    • 0.30%: handle_llv
    • 0.30%: region_post_process
    • 0.28%: handle_swc2
    • 0.28%: fill_random
    • 0.27%: handle_vsaw
    • 0.26%: handle_lsv
    • 0.24%: handle_vch
    • 0.23%: handle_vabs
    • 0.22%: handle_ssv
    • 0.19%: handle_vxor
  • Super Mario 64: RDP = 27.33%, RSP = 61.21%, Other = 11.46%
    • 10.73%: fill_span_buffer_2x2
    • 6.56%: handle_vmadn
    • 6.16%: cpu_execute_rsp
    • 5.56%: read_dword_generic
    • 4.63%: render_spans_16
    • 3.61%: SATURATE_ACCUM
    • 3.38%: write_dword_generic
    • 3.20%: handle_vmadm
    • 3.19%: FETCH_TEXEL
    • 2.99%: handle_vmadh
    • 2.74%: BLENDER1_16
    • 2.27%: COLOR_COMBINER
    • 2.10%: memory_decrypted_read_dword
    • 1.97%: handle_vmudl
    • 1.88%: handle_ldv
    • 1.72%: handle_vadd
    • 1.66%: handle_vmudn
    • 1.51%: handle_vmulf
    • 1.26%: handle_vector_ops
    • 1.23%: handle_lqv
    • 1.13%: handle_vsub
    • 1.11%: handle_vge
    • 1.07%: debugger_instruction_hook
    • 1.04%: __divdi3
    • 1.00%: memory_write_byte_32be
    • 0.98%: handle_vsubc
    • 0.98%: READ8
    • 0.92%: memory_read_byte_32be
    • 0.82%: calculate_coverage
    • 0.78%: handle_sdv
    • 0.76%: WRITE8
    • 0.75%: mame_rand
    • 0.74%: handle_vmudm
    • 0.74%: driver_get_name
    • 0.72%: compress_z
    • 0.70%: video_update_n64
    • 0.67%: fill_rectangle_16bit
    • 0.65%: handle_vrcph
    • 0.65%: rand_memory
    • 0.60%: sp_dma
    • 0.59%: handle_vlt
    • 0.54%: decompress_z
    • 0.54%: handle_vmudh
    • 0.49%: handle_vrcpl
    • 0.45%: handle_lwc2
    • 0.43%: region_post_process
    • 0.42%: handle_vmacf
    • 0.40%: handle_vcl
    • 0.39%: handle_sqv
    • 0.38%: handle_vch
    • 0.38%: copyline_rgb32
    • 0.37%: handle_llv
    • 0.37%: handle_vxor
    • 0.36%: handle_vsaw
    • 0.36%: handle_vmrg
    • 0.35%: quark_tables_create
    • 0.35%: fill_random
    • 0.33%: handle_luv
    • 0.32%: taddr_clamp
    • 0.30%: handle_swc2
    • 0.28%: handle_ssv
    • 0.27%: handle_vmadl
    • 0.27%: handle_lsv
    • 0.27%: handle_lpv
    • 0.27%: handle_vaddc
    • 0.26%: handle_vor

As I see it, the first priority is to convert the RSP core over to MAME’s DRC system.  Unfortunately, I’m not quite sure what sort of performance increase DRC-ifying the RSP will yield.  The VMAC* and VMUD* opcodes have a rather large amount of code associated with them, and on top of that, each one loops over all 8 elements of a vector register.  The real RSP presumably handled all 8 elements in parallel.
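To illustrate why these opcodes are expensive in an interpreter, here is a simplified C sketch of an eight-lane multiply-accumulate.  It is not the exact behaviour of VMADN, VMUDN or any other real RSP opcode (the real ones differ in signedness and in which part of the per-lane 48-bit accumulator they read and write); vacc_sketch and vmac_sketch are hypothetical names.  The point is the shape: one multiply, one accumulate and one clamp per element, done eight times in sequence.

```c
#include <stdint.h>

#define RSP_LANES 8 /* eight 16-bit elements per vector register */

/* Hypothetical accumulator: one wide value per lane. int64_t is used for
 * simplicity; the real RSP accumulator is 48 bits per lane. */
typedef struct {
    int64_t acc[RSP_LANES];
} vacc_sketch;

/* Signed saturation to 16 bits. */
static int16_t clamp_s16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Simplified multiply-accumulate: acc += vs * vt per lane, then write the
 * saturated accumulator to vd. The interpreter runs this loop serially. */
void vmac_sketch(vacc_sketch *a, const int16_t vs[RSP_LANES],
                 const int16_t vt[RSP_LANES], int16_t vd[RSP_LANES])
{
    for (int i = 0; i < RSP_LANES; i++) {
        a->acc[i] += (int32_t)vs[i] * (int32_t)vt[i];
        vd[i] = clamp_s16(a->acc[i]);
    }
}
```

A recompiler can at best remove the dispatch overhead around this loop; the eight per-element steps remain unless they are mapped onto host SIMD instructions.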

Another piece of low-hanging fruit is that around 10% of the execution time is taken up by memory accessors, thanks to the RSP’s less-than-optimal IMEM and DMEM implementation: the RSP has to go through the memory system for every single read and write it does.  In reality, though, the main CPU accesses IMEM and DMEM far, far less often than the RSP itself does.  It therefore makes more sense, performance-wise, to have two 4 KB arrays inside the RSP core itself, which it accesses directly rather than through MAME’s core memory accessors.  The main CPU will be able to access these memory spaces by querying the RSP core, and any RSP DMA accesses can be done by simply grabbing a pointer into the RSP’s IMEM or DMEM arrays, just like it works now.
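A minimal C sketch of that layout, under stated assumptions: IMEM and DMEM are plain 4 KB byte arrays inside a hypothetical rsp_core struct, the RSP reads big-endian words with addresses wrapping at 4 KB, and the main CPU sees DMEM at offsets 0x0000-0x0FFF and IMEM at 0x1000-0x1FFF of the SP memory window.  None of these names come from the MAME source.

```c
#include <stdint.h>

#define RSP_MEM_SIZE 0x1000u            /* IMEM and DMEM are 4 KB each */
#define RSP_MEM_MASK (RSP_MEM_SIZE - 1u)

/* Hypothetical core state: both memories are owned by the RSP core. */
typedef struct {
    uint8_t dmem[RSP_MEM_SIZE];
    uint8_t imem[RSP_MEM_SIZE];
} rsp_core;

/* Big-endian 32-bit read with 4 KB wraparound. */
static uint32_t mem_read32(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = 0;
    for (uint32_t i = 0; i < 4; i++)
        v = (v << 8) | mem[(addr + i) & RSP_MEM_MASK];
    return v;
}

/* Big-endian 32-bit write with 4 KB wraparound. */
static void mem_write32(uint8_t *mem, uint32_t addr, uint32_t data)
{
    for (uint32_t i = 0; i < 4; i++)
        mem[(addr + i) & RSP_MEM_MASK] = (uint8_t)(data >> (24 - 8 * i));
}

/* Fast path: the RSP's own loads and stores index the array directly. */
uint32_t rsp_dmem_read32(const rsp_core *rsp, uint32_t addr)
{
    return mem_read32(rsp->dmem, addr);
}

void rsp_dmem_write32(rsp_core *rsp, uint32_t addr, uint32_t data)
{
    mem_write32(rsp->dmem, addr, data);
}

/* Slow path: the main CPU asks the core; bit 12 of the offset selects IMEM. */
uint32_t rsp_main_cpu_read32(const rsp_core *rsp, uint32_t offset)
{
    const uint8_t *mem = (offset & RSP_MEM_SIZE) ? rsp->imem : rsp->dmem;
    return mem_read32(mem, offset);
}

/* DMA: hand out a pointer into IMEM or DMEM so the transfer can copy directly. */
uint8_t *rsp_mem_pointer(rsp_core *rsp, int want_imem)
{
    return want_imem ? rsp->imem : rsp->dmem;
}
```

The RSP’s hot path never leaves the struct, while the rarer main CPU and DMA accesses pay for a branch or a pointer lookup instead.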

Lastly, the plan is to wire the RDP emulation up to MAME’s “work unit” system, which will allow it to distribute drawing commands across multiple CPU cores when available.  Unfortunately, slow as the RDP is, this will likely not have much of a performance impact on my laptop, though it might help on a quad-core CPU.
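One simple way to carve a triangle into independent pieces of work is to split its scanline range into contiguous bands, since each band writes disjoint framebuffer rows.  This is a hypothetical C sketch (span_band and split_spans are made-up names) and not necessarily how MAME’s work unit system partitions anything; it only computes the bands and leaves the actual queueing to worker threads out.

```c
/* Hypothetical half-open scanline band [y_start, y_end). */
typedef struct {
    int y_start;
    int y_end;
} span_band;

/* Split scanlines [y_top, y_bottom) into at most num_units contiguous bands
 * of nearly equal height. Returns the number of bands written to out, which
 * must have room for num_units entries. */
int split_spans(int y_top, int y_bottom, int num_units, span_band *out)
{
    int total = y_bottom - y_top;
    if (total <= 0 || num_units <= 0)
        return 0;
    if (num_units > total)
        num_units = total;          /* never hand out an empty band */

    int base = total / num_units;   /* rows every band gets */
    int extra = total % num_units;  /* the first 'extra' bands get one more */
    int y = y_top;
    for (int i = 0; i < num_units; i++) {
        int height = base + (i < extra ? 1 : 0);
        out[i].y_start = y;
        out[i].y_end = y + height;
        y += height;
    }
    return num_units;
}
```

Each band could then go to a separate worker; because the bands never overlap, no two workers write the same framebuffer row for a given triangle.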

Anyway, that’s the main plan.  Here’s hoping I can stick to it.

Coordination

January 31st, 2009

Continuing this weekend’s N64 extravaganza, some more poking around has fixed a long-standing issue with my new coverage implementation: the texture coordinates and Gouraud steps were being mangled by up to +/- 1 pixel’s worth of delta.  This may not seem like a lot, but keep in mind that the S (aka U) texture coordinate can change by anywhere from 8 to 32 texels when traversing a triangle by only one pixel vertically.
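The arithmetic behind that: if S is stepped once per scanline, starting the walk one row off shifts every sample by exactly one dS/dy.  A small C sketch, assuming (hypothetically) that S is kept in 16.16 fixed point; the function names are made up for illustration:

```c
#include <stdint.h>

/* S at scanline y, linearly stepped from the top of the edge.
 * Assumption: s_top and ds_dy are 16.16 fixed point. */
int32_t s_at_scanline(int32_t s_top, int32_t ds_dy, int y_top, int y)
{
    return s_top + ds_dy * (y - y_top);
}

/* Error, in whole texels, from starting the edge walk one scanline late. */
int32_t off_by_one_error_texels(int32_t s_top, int32_t ds_dy, int y_top, int y)
{
    int32_t correct = s_at_scanline(s_top, ds_dy, y_top, y);
    int32_t shifted = s_at_scanline(s_top, ds_dy, y_top + 1, y);
    return (correct - shifted) / 65536; /* drop the 16 fraction bits */
}
```

With ds_dy = 16 << 16 (16 texels per scanline), off_by_one_error_texels() returns 16, squarely within the 8 to 32 texel range mentioned above.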

And now, the pretty pictures:

Before:

After:

There’s still some work to be done on rounding the last pixel in a horizontal span, which is causing the remaining issues that are visible in the Zelda screenshots, but still, things are looking considerably better.

Roundabout

January 31st, 2009

I finally decided to figure out why the scene geometry in The Legend of Zelda: Ocarina of Time is so screwed up.  As it turns out, RSP DMA transfers should have their length rounded up to the next multiple of 8 bytes, not 4.
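The difference between the two rounding rules, as a minimal C sketch.  It assumes the length is a plain byte count; the function names are hypothetical, and the real DMA length register may encode the value differently.

```c
#include <stdint.h>

/* Round a byte count up to the next multiple of 8 (the corrected rule). */
uint32_t dma_round_up_8(uint32_t len) { return (len + 7u) & ~7u; }

/* The old, incorrect rule: round up to the next multiple of 4. */
uint32_t dma_round_up_4(uint32_t len) { return (len + 3u) & ~3u; }
```

A 12-byte request stays at 12 bytes under the old rule but becomes 16 under the new one, so the tail of any transfer whose length was a multiple of 4 but not of 8 was being dropped.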

Since it’s kind of hard to get the dramatic change across with still shots, I broke out the -aviwrite parameter and uploaded a couple of videos to YouTube.

Before: http://www.youtube.com/watch?v=OUwwBc3G1h0

After: http://www.youtube.com/watch?v=7_9L0G7IsRY

Brains

January 21st, 2009

I just had a really weird thing happen to me.

My central media PC in my apartment, as of March of last year, had four 1TB drives in it.  Three of the drives were arranged in a RAID5 configuration; the fourth was standalone, because the on-board RAID controller didn’t support a single volume large enough to span all four drives.

There was a brief power outage in August of last year.  When I powered the computer back up, I was greeted with an unfriendly POST message from my RAID controller informing me that it had found three 1TB hard drives and one unknown hard drive.  Oops.  It was at this point that I was grateful that I had selected a RAID5 configuration, and all the more grateful that I had never gotten around to putting any data on the loose drive.  I was able to put the computer to work rebuilding the parity data over the next couple of days, and everything was fine save for that one drive.

After reading about the various issues with 1.5TB and 1TB Seagate drives, just for kicks I decided to look up the model numbers of my drives, so I fired up the “Intel(R) Matrix Storage Manager”.  It informed me that it had detected a new non-RAID drive, and asked whether I would like to reinitialize the drive so that its parameters could be read.  Sure enough, the drive seemed to have leapt back to life, and it is still there after a reboot of the media server.  I really have no idea what would have caused it to come back like that, but I am sure as hell not going to put any important data on it anytime soon.

Incidentally, all four drives appear to be ST31000340AS, firmware revision SD15, which is the firmware and model number affected by the bricking problems that have been raging across the Internet lately.  Hmm.

Handiwork

January 5th, 2009

As I mentioned in my previous blog post, I was going to ask my employer permission to submit the Palm driver to MESS.  This was a necessary step since the company I work for technically owns anything I write inside and outside of work during the period of my employment.

Since the Palm series of handhelds has nothing to do with Nintendo or anything my employer is doing, permission was quickly granted.  Thanks to that, once I’ve cleaned up the driver you can expect it to hit MESS soon after.