Update on GCW-Zero / RG-350 OpenDingux ports

It's been a while since I last did a public update on the status of OpenDingux for the GCW-Zero. Believe it or not, a lot has changed since the last release in 2014 🙂. Truth be told, I never really stopped working on it, and the folks on the Freenode IRC channel #opendingux can testify to just how much work I've been doing.

Linux From Scratch

All the GCW-Zero specific code is gone. Everything has been moved to Device Tree now. Device Tree is a textual representation of all the hardware present in a device. This means that with the right Device Tree files, in theory it should be possible to boot a generic MIPS kernel (Debian anyone?) on the GCW-Zero, the RG-350, the old Dingoo A320 and the RetroMini alike. How cool is that?

The Device Tree source is divided into two parts, one SoC-specific (JZ4770) and one device-specific (GCW-Zero). At compilation time, these two source files are assembled into a single binary file, which is then either compiled into the kernel or loaded by the bootloader.

If you are curious about what it looks like, this is the current Device Tree file for the GCW-Zero:

Without going as far as having one kernel to support all devices, this work will allow a single kernel to support both the GCW-Zero and the RG-350, as well as the other JZ4770-based handhelds, provided somebody writes the Device Tree for these (and the board-specific hardware, like the LCD screen, is supported by Linux). It will also make it possible to extend the flasher software that's been used to flash the RetroMini through USB so that it can flash other handhelds as well, since the flasher is basically a Linux kernel with a small payload. We do not want users to have no choice but to crack open a device to flash the internal SD when there is the option to flash from USB. Besides all of that, switching to Device Tree files was the only way to support the OpenDingux devices in the upstream Linux kernel; crappy platform-specific code is not welcome there anymore.

For the same reason, most of the drivers that were in use in the old 3.12 kernel are gone, too. The clock code, timer code, and drivers for the MMC/SD, DMA, LCD and framebuffer, IPU, VPU, GPU, I2C, ADC, joystick, battery, USB, PWM and backlight, power regulator chip, I2S/AC97 and audio codec are brand new. The WiFi drivers are new too but weren't written by us. Of the old 3.12 drivers, only the watchdog, RTC and interrupt controller drivers are still around. Most of these new drivers now also support multiple Ingenic SoCs, and will work on the Dingoo A320 (JZ4740), the RetroMini (JZ4725B), and even the MIPS CI20 (JZ4780). They will most likely also work on the RS-97 (JZ4760) with very little work needed, if any.

This adds up to a total of 342 patches that have been accepted upstream, and about two dozen still in the process of being upstreamed, for a diff of about 15k lines added and 7.5k lines removed. This may not sound like much, but the process of upstreaming code to the Linux kernel is a long and complicated one. All these patches underwent peer review and saw many revisions before being merged. To give an order of magnitude of the work involved, if we count all the intermediary states of these patches, it adds up to more than six thousand commits authored by me. In the meantime, I became the de facto Linux kernel maintainer for Ingenic SoCs, which means that all patches that touch Ingenic-specific code now have to go through me, and need my ack to go forward.

Current status

From a usability standpoint, the work on the kernel is pretty much done: almost every single feature of the 3.12 kernel is present in the current development kernel based on 5.7. The only exception is the "linkdev" feature, which mapped the face and shoulder buttons to joystick events; it was simply impossible to upstream, being nothing more than a quick and dirty hack. With that said, the feature could very well be re-introduced at a different level, for instance as a feature within SDL itself; but it was never really used, so dropping it shouldn't be a problem.

Since 3.12, new features were added, too. The IPU now supports a variety of RGB color formats (15/16/32-bit, RGB/BGR), packed YUV 4:2:2 and planar YUV 4:1:1, 4:2:0, 4:2:2, and 4:4:4. It now also scales with bicubic filtering with a configurable sharpness factor. Experimental support for HDMI was added as well, with support for resolutions up to 720p, terrible image quality and no audio (for now). Support for hardware overlays has been added too, so it should be possible to have an overlay shown whenever something like the volume or brightness is modified. All the improvements that were made on the RS-90 port of OpenDingux are also present here, for instance a proper mass-storage mode using MTP, just like on Android, to replace the need for an FTP client.

As the kernel is pretty much complete, most of the work left concerns the userspace. Speaking of SDL, while IPU support is integrated into the kernel, the framebuffer emulation of the DRM/KMS core does not allow it to be used through the old fbdev API. As a result, SDL applications cannot yet use the IPU, but it's a problem that needs to be fixed in SDL, not in the kernel, either by adding a DRM/KMS backend to SDL, or by using an SDL-to-SDL2 compat library and using the DRM/KMS backend of SDL2 (which has its own share of challenges).

The second point that needs work is WiFi support. The old root filesystem of OpenDingux used ifconfig and iwconfig to configure the network, which have been deprecated for years now. The current root filesystem uses the newer iw tool, but the GCWConnect WiFi configuration tool hasn't been updated yet to support it, and the overall performance and stability of wireless links is unknown.

The missing piece is USB OTG. This feature allows plugging USB peripherals like gamepads into the USB port. I think everyone who ever tried USB OTG on their GCW-Zero has had a hardware failure at some point, myself included. I understand the interest in having the handheld plugged into HDMI with a couple of gamepads connected, but that just won't happen on the GCW-Zero. I cannot take the risk of causing more hardware failures. This is however not a problem on the RG-350, which does not use USB OTG, but has a USB host port directly available as USB1. On the GCW-Zero, this USB host port is used internally to connect the WiFi chip.

On the RG-350, the only thing missing right now is support for the right joystick, but that's just a matter of cleaning up the existing code and integrating it properly into the new ADC driver.

Tasty numbers

For people worried about performance, the overhead of the OS is slightly lower, which means that games and emulators gain a few FPS with the 5.7 kernel vs. the old 3.12 kernel. When recompiled with the new toolchain, based on GCC 9, there is a small extra gain on top of that. This is with our super-secret optimization turned OFF, which, once enabled, will give a good 10+% performance increase on top of that. I/O performance drastically improved too: the read and write speeds to the SD cards, internal or external, more than doubled between the 3.12 and 5.7 kernels.

Some areas regressed, unfortunately. The upstream GPU kernel driver (etnaviv) along with the OpenGL ES layer (Mesa) that are used in the current development builds of OpenDingux have had a lot of fixes for crashes or graphical glitches, and are in many ways faster, but are also much slower than the old driver in some specific areas. With that said, the numbers of the old firmware should be taken with a grain of salt, since the old driver does not render many of the tests properly.

The new kernel comes with advanced debugging tools, like performance counters and debug overlays, so debugging the issue shouldn't be a problem, but that still has to be done. On the bright side, working close to upstream means that when we find the bottleneck, they will probably fix it for us.

ETA?

As usual, it is pretty difficult for me to give an estimate about when the next version of the firmware for all the supported OpenDingux devices will be ready. Working on OpenDingux always feels like taking two steps forward then one step back. The IPU issue with SDL is hopefully the last big rock to smash in what has been a very long and lonesome road. Then will come the testing phase, to be extra-careful that everything that used to work still works.

OpenDingux release 2020.01.06

Since the last update (2019.06.01), there have been reports of a lot of hangs and crashes that I could never manage to reproduce. The difference was that unlike most users, I don't use a micro SD and only have a handful of GBA games on the internal NAND. The bug turned out to be in the DMA driver, which caused data packets to be lost between the SD card controller and the card itself.

Changelog

  • Fixed DMA driver; using external micro SD cards won't cause crashes anymore.
  • Based on Linux v5.5-rc5 kernel and Buildroot 2019.11.
  • Small fixes to GMenu2X, nothing particularly noteworthy to report.
  • GMenu2X should now properly respawn when an app crashes.

Download links

The update OPK can be downloaded here: OpenDingux update OPK.
Make sure you have at least 25 MiB of internal storage available before running the update.

For those who did not flash already, an updated flasher can be downloaded here: Flasher tool download

Special thanks

A big thanks to all those who donated. That was many more people than I thought. While I don't do this for the money, tips are always appreciated! Thank you!

Introducing Lightrec, a MIPS-to-everything dynarec

Emulation is what got me into computer science to begin with, as I always thought that emulators are impressive pieces of software. The fact that we simulate a real-world electronic device is just amazing. What astounds me even more is that some emulators break the boundaries of what we thought was possible. Who remembers UltraHLE? Who ever tried Bleem!cast? As my knowledge of computer science increased over the last 12 years of learning C, working on Linux and doing low-level programming on embedded systems, emulators slowly ceased to be a mystery to me; but that made me even more respectful, now that I can grasp the genius that went into these pieces of craftsmanship.

The biggest praise I have for emulator creators is that they don't follow the common premise that the solution is always better hardware. In a world where consumption is the key to our doom, I like to believe that we can always do more with less. Under constraints, people get creative. Writing software on infinitely powerful machines would be boring.

Introducing Lightrec

Since 2014, I've been working on and off on a project called Lightrec. Started as an experiment to test my skills and improve my knowledge, it later became a fully working dynamic recompiler (a.k.a. dynarec) for the PCSX PlayStation emulator, targeting a wide range of host CPUs thanks to the use of GNU Lightning as the code emitter.

Succeeding where others failed

The big disadvantage of traditional dynamic recompilers is that they only target one architecture. PCSX has one dynamic recompiler for x86 PCs, another one for ARM-based smartphones, and yet another one for MIPS. Each new dynarec means a different code base, different performance, different compatibility.

Ever since projects like LLVM or libjit came out, several unrelated attempts have been made by different people to create a dynamic recompiler that would use these technologies to support a lot of different CPUs. Unfortunately, they all failed, as they soon discovered that these technologies were really not well-suited to dynamic recompilers. The reason is that while they can generate well-optimized code at runtime, they were not designed to do so on a tight schedule. A game's frame time is generally about 16 ms, and the recompiler sometimes needs to recompile thousands of pieces of code in that time frame, something that LLVM or libjit just cannot do.
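
To put a rough number on that time budget, here is a back-of-the-envelope calculation in Python. The 60 frames per second and the 2000 blocks per frame are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope budget; both inputs are assumptions for illustration.
FRAMES_PER_SECOND = 60      # assumed refresh rate, giving a ~16.7 ms frame
BLOCKS_PER_FRAME = 2000     # assumed stand-in for "thousands" of blocks

# Time available for one frame, in microseconds.
frame_time_us = 1_000_000 / FRAMES_PER_SECOND

# Time available per block if the whole frame were spent compiling.
budget_per_block_us = frame_time_us / BLOCKS_PER_FRAME

print(f"frame time: {frame_time_us / 1000:.1f} ms")
print(f"budget per block: {budget_per_block_us:.1f} us")
```

Under these assumptions the code generator gets only a handful of microseconds per block, and that is before leaving any time to actually run the game.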

GNU Lightning is different from the two aforementioned projects as it has a different scope. LLVM and libjit were designed for creating programming language compilers or fast interpreters, and as such have the concept of variables, which is a construct that all programming languages share, but not something that machine code has. Machine code manipulates registers.

GNU Lightning is better described as a code emitter. It offers you a finite number of virtual registers (the actual number depends on the architecture), and a programming API that closely resembles the instruction set of MIPS processors. All it does is translate each virtual instruction and its virtual registers to the corresponding CPU instruction (or instructions) with the corresponding hardware registers. It doesn't perform any optimization (except very obvious and easy ones), and does not provide register allocation facilities either. Thanks to being that simple, it is extremely fast at generating code, and is well suited for a portable dynamic recompiler project, as it supports almost every CPU on which you'd ever want to run a PlayStation emulator.
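
As a toy illustration of what "code emitter" means here, the Python sketch below maps virtual registers and a three-operand virtual instruction onto two made-up host targets. It does not use GNU Lightning's real API; every name in it (HOST_REGS, emit, the register choices, the textual output) is hypothetical and only there to show the idea of a direct, optimization-free translation.

```python
# Toy code emitter. Nothing here is GNU Lightning's API; all names are
# hypothetical and the "machine code" is just assembly-like text.

# Fixed mapping from virtual register index to a host register name.
HOST_REGS = {
    "arm": ["r4", "r5", "r6"],
    "x86_64": ["rax", "rcx", "rdx"],
}


def emit(host, program):
    """Translate (op, dst, src1, src2) tuples into host instructions.

    Only the commutative "add" operation is supported in this sketch.
    """
    regs = HOST_REGS[host]
    out = []
    for op, dst, src1, src2 in program:
        if op != "add":
            raise ValueError(f"unsupported virtual instruction: {op}")
        d, a, b = regs[dst], regs[src1], regs[src2]
        if host == "arm":
            # Three-operand host: one virtual instruction, one instruction.
            out.append(f"add {d}, {a}, {b}")
        elif d == a:
            # Two-operand host, destination already holds the first source.
            out.append(f"add {d}, {b}")
        elif d == b:
            # Addition is commutative, so add the other source in place.
            out.append(f"add {d}, {a}")
        else:
            # Two-operand host: one virtual instruction, two instructions.
            out.append(f"mov {d}, {a}")
            out.append(f"add {d}, {b}")
    return out


if __name__ == "__main__":
    virtual_program = [("add", 0, 1, 2)]
    for host in HOST_REGS:
        print(host, emit(host, virtual_program))
```

The point of the sketch is that there is no analysis step at all: each virtual instruction is looked at once and turned into one or two host instructions, which is why this style of code generation is so fast.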

Implementation details

As you may have guessed by now, the Lightrec name is a fusion of GNU Lightning and recompiler, since that is what it really is. It could also be read as Light Recompiler and that wouldn't be wrong either.

From a compatibility standpoint, Lightrec is very compatible, with only a handful of games showing glitches or bugs. Regarding performance, it was truly abysmal a couple of years ago, being slower than PCSX's interpreter. It is now a few times faster, thanks to a few tricks:

  • High-level optimizations.
    The MIPS code is first pre-compiled into a form of Intermediate Representation (IR). Basically, just a singly linked list of structures representing the instructions. On that list, several optimization steps are performed: instructions are modified, reordered, tagged; new meta-instructions can be added, for instance to tell the code generator that a certain register won't be used anymore.

  • Run-time profiling with a built-in interpreter.
    The first time the MIPS code jumps to a new address, Lightrec will emulate it with its built-in interpreter. The interpreter will then gather run-time information, for instance whether a load/store will hit the BIOS area, the RAM, or a hardware register. The code generator will then use this information to generate direct reads/writes to the emulated memories, instead of jumping to C for every access.

  • Lazy compilation.
    If the interpreter detects a block of code that would be very hard to compile properly (e.g. a branch with a branch in its delay slot), the block is marked as not compilable, and will always be emulated with the interpreter. This keeps the code emitter simple and easy to understand.

  • Threaded compilation.
    The code generator can optionally run in a different thread of execution. Instead of compiling a block of code right when we jump to it, Lightrec can add it to the working queue of the threaded compiler, and emulate the block of code using the interpreter in the meantime. This greatly reduces stutter in the games when a lot of code is being recompiled, as the main execution thread doesn't wait anymore for the compilation process to finish.

  • Fast code LUT.
    Coming from psx4all's mipsrec dynarec, the function block Look-Up Table (LUT) is now a huge array of the size of the PlayStation's RAM, 2 MiB. It makes it extremely fast to obtain a pointer to generated code from its MIPS address, and extremely easy to mark a block of code as outdated: the generated code just writes NULL to the corresponding offset.
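
The code LUT idea from the last bullet can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not Lightrec's actual data structure: the CodeLUT class, its method names, the one-slot-per-instruction-word layout and the address masking (which only makes sense for addresses that fall in RAM or its mirrors) are all hypothetical.

```python
# Minimal model of a code look-up table indexed by MIPS address.
# All names and the slot layout are assumptions made for this sketch.

RAM_SIZE = 2 * 1024 * 1024  # 2 MiB of emulated RAM


class CodeLUT:
    def __init__(self):
        # Assumed layout: one slot per 32-bit instruction word in RAM.
        # None plays the role of the NULL pointer.
        self.slots = [None] * (RAM_SIZE // 4)

    @staticmethod
    def index(address):
        # Mask off the upper bits so that mirrors of RAM (for example
        # 0x00000000, 0x80000000 and 0xA0000000) share the same slot.
        return (address & (RAM_SIZE - 1)) >> 2

    def register(self, address, block):
        """Remember the compiled block that starts at this address."""
        self.slots[self.index(address)] = block

    def lookup(self, address):
        """Return the compiled block for this address, or None."""
        return self.slots[self.index(address)]

    def invalidate(self, address):
        """Mark the block at this address as outdated."""
        self.slots[self.index(address)] = None


def run_block(lut, pc, interpret):
    """Run compiled code if the LUT has it, else fall back to `interpret`."""
    block = lut.lookup(pc)
    if block is not None:
        return block()
    return interpret(pc)


if __name__ == "__main__":
    lut = CodeLUT()
    lut.register(0x80010000, lambda: "native")
    print(run_block(lut, 0x80010000, lambda pc: "interpreted"))
    lut.invalidate(0x80010000)  # e.g. the game overwrote that code
    print(run_block(lut, 0x80010000, lambda pc: "interpreted"))
```

Both the lookup and the invalidation are a single array access, which is the whole appeal: there is no hashing and no search, at the cost of a large, mostly empty table.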

Big-Ass Debugger

The tool I developed that helped build this dynarec from the ground up is called the Big-Ass Debugger. The name comes from the fact that it doesn't try to do anything smart: it runs the interpreter and the dynarec in parallel, in two instances of the emulator, and every time a block of code is executed (thousands of times per frame), it calculates a hash of all the registers and the whole RAM in each instance and compares the results. It is a slow process, but if a difference is found, emulation stops and the debugger reports exactly what went wrong, and where. Starting from a state where the code emitted for every MIPS instruction was just a call to PCSX's interpreter, this tool is what allowed me to write the dynarec progressively, instruction after instruction, while still making sure that my code was fully working and compliant with the behaviour shown by the interpreter. To this day, I still use it to verify each optimization and improvement made to the dynarec.
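
The lockstep comparison described above can be modelled roughly as follows. This is a sketch under assumptions, not the real tool: ToyCpu is a made-up stand-in for an emulator instance, the function names are hypothetical, and SHA-1 is just a convenient hash for the example (the hash actually used is not stated here).

```python
import hashlib
import struct

# Sketch of lockstep verification. ToyCpu, state_hash and run_lockstep
# are hypothetical names; SHA-1 is an arbitrary choice for this example.


def state_hash(regs, ram):
    """Hash all registers plus the whole RAM into one digest."""
    h = hashlib.sha1()
    h.update(struct.pack(f"<{len(regs)}I", *regs))
    h.update(ram)
    return h.digest()


def run_lockstep(reference, candidate, max_blocks):
    """Run both cores block by block and compare their state each time.

    Returns (block_number, pc) for the first divergence, or None if the
    two cores agreed for max_blocks blocks.
    """
    for n in range(max_blocks):
        pc = reference.pc
        reference.step_block()
        candidate.step_block()
        same = (state_hash(reference.regs, reference.ram)
                == state_hash(candidate.regs, candidate.ram))
        if not same:
            return n, pc
    return None


class ToyCpu:
    """Stand-in for an emulator core; `buggy_at` injects a wrong result."""

    def __init__(self, buggy_at=None):
        self.regs = [0] * 32
        self.ram = bytearray(64)
        self.pc = 0x80000000
        self.buggy_at = buggy_at
        self.executed = 0

    def step_block(self):
        # Each toy "block" increments r1 and stores its low byte to RAM.
        self.regs[1] = (self.regs[1] + 1) & 0xFFFFFFFF
        if self.executed == self.buggy_at:
            self.regs[1] ^= 0x80  # simulated miscompiled instruction
        self.ram[self.executed % len(self.ram)] = self.regs[1] & 0xFF
        self.pc += 4
        self.executed += 1


if __name__ == "__main__":
    print(run_lockstep(ToyCpu(), ToyCpu(), 100))
    print(run_lockstep(ToyCpu(), ToyCpu(buggy_at=5), 100))
```

Hashing the full state after every block is expensive, but it means the first mismatch points directly at the block that introduced it, rather than at some later symptom.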

Projects using Lightrec

So far Lightrec has been plugged into a few different emulators:

  • PCSX-ReArmed, which is the emulator I've been using for developing Lightrec. Not the fastest, since the dynarec exits after each piece of recompiled code; but it supports the Big-Ass Debugger.
  • pcsx4all, which is the fastest for various reasons: the dynarec doesn't return as often to the main loop, and the BIOS/scratchpad/RAM and RAM mirror memories are memory-mapped to locations that are a much better fit for the generated code.
  • Beetle, which is a libretro core based on Mednafen. The Lightrec integration is much more recent and still incomplete, but it already is a strong contender to replace the slow interpreter that Beetle has been using since the beginning.

Future

As it is now, the dynarec is already working really well and ready for prime time. Of course, it still has a way to go; I already have ideas about advanced optimizations (or should I say optimizations senquack suggested) but all the "easy" optimizations have already been done, and the benefit-over-work-needed ratio is getting smaller and smaller. Also, the fact that it's been plugged into Beetle means that we may start seeing it running on all libretro-supported platforms, which is something I definitely look forward to.

Overall, it's been a challenging project and I'm glad that I could take it to a state where it is usable.

Till next time!

OpenDingux release 2019.06.01

Another month, another update.

Changelog

  • Added USB mass storage mode (MTP). Finally, you can transfer your apps and other files without any specific software! Use the 'USB Mode' app in the settings tab to revert to the Ethernet-over-USB mode that was the default in the previous versions of the firmware.
  • Added 20 MiB of in-RAM compressed swap (zram). This will permit some RAM-hungry apps to start, although with a performance hit vs. those that don't require swap.
  • Switched from mdev to udev, which fixes some issues, like the automounting of SD cards.
  • The brightness setting is now preserved across reboots.
  • And most importantly, the cow is back. Those who used to develop for OpenDingux on other devices will understand.

Download links

The update OPK can be downloaded here: OpenDingux update OPK.
Make sure you have at least 25 MiB of internal storage available before running the update.

Enjoy!