
Core: Implement Device Mapping & GPU SMMU #12579

Merged
23 commits merged into yuzu-emu:master on Jan 22, 2024

Conversation

@FernandoS27 (Contributor) commented Jan 4, 2024

Memory in the Tegra X1 works very differently from how it's currently emulated. Normally there is physical memory (4 GB), a virtual memory address space used by processes (applications such as games), and device virtual memory spaces used by peripherals (GPU, DSP, Bluetooth, etc.). The unit that manages this device memory space is normally called the System Memory Management Unit (SMMU) or IOMMU. Through this address space, each device can map physical memory belonging to many different applications into a single space that only it can see. The advantage is that memory can be shared between each device and multiple processes in the OS.

This PR implements a general, simplified version of the SMMU and applies it to the GPU, with some differences: first, we use the full 34 bits of the address space instead of just 32 bits for pinning memory; second, the GPU uses a common SMMU page table for every process instead of switching it per channel.
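
As a rough illustration of that shared device space, here is a minimal C++ sketch of an SMMU-style page table: a 34-bit device address space, 4 KiB pages, and mappings from device pages to physical pages. The class name, the page size, and the flat hash-map layout are illustrative assumptions only, not yuzu's actual KDevicePageTable.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical sketch only; names and layout do not match yuzu's implementation.
class DevicePageTable {
public:
    static constexpr std::uint64_t PageBits = 12;          // assume 4 KiB pages
    static constexpr std::uint64_t PageSize = 1ULL << PageBits;
    static constexpr std::uint64_t AddressSpaceBits = 34;  // full 34-bit device address space
    static constexpr std::uint64_t AddressSpaceSize = 1ULL << AddressSpaceBits;

    // Pin a run of physical pages into the device address space.
    void Map(std::uint64_t device_addr, std::uint64_t phys_addr, std::uint64_t size) {
        for (std::uint64_t offset = 0; offset < size; offset += PageSize) {
            entries[(device_addr + offset) >> PageBits] = (phys_addr + offset) >> PageBits;
        }
    }

    // Translate a device virtual address to a physical address, if mapped.
    std::optional<std::uint64_t> Translate(std::uint64_t device_addr) const {
        const auto it = entries.find(device_addr >> PageBits);
        if (it == entries.end()) {
            return std::nullopt;
        }
        return (it->second << PageBits) | (device_addr & (PageSize - 1));
    }

private:
    // device page index -> physical page index; shared by every process using the device
    std::unordered_map<std::uint64_t, std::uint64_t> entries;
};
```

A production page table would likely use multi-level tables and reference counting for pinned ranges; the flat map here only keeps the idea visible.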

Currently in yuzu, the GPU handles memory with a model of GMMU (the GPU's MMU) -> PMMU (the main application's virtual address space) -> physical memory. This is great when only one application uses the GPU at a time and it is never shared with other processes, which is what happens on the Switch 95% of the time. However, it is possible to have multiple processes running and rendering concurrently, as is the case with overlays and applets like the inline software keyboard. It's also possible to have one application suspended while another is running (an applet running while the game is suspended). With the SMMU implemented, GPU resources can now be shared without any issues whatsoever.
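
To make the change in translation concrete, here is a hedged sketch (again with made-up names) of the new chain: each channel's GMMU resolves a GPU virtual address to a device virtual address, and the single shared SMMU resolves that to a physical address, which is why several processes can pin memory into the same device space at once.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

constexpr std::uint64_t kPageBits = 12;  // assume 4 KiB pages at both levels

// Hypothetical composition of the two translation levels; not yuzu's classes.
struct GpuAddressTranslator {
    std::unordered_map<std::uint64_t, std::uint64_t> gmmu;  // GPU VA page -> device VA page (per channel)
    std::unordered_map<std::uint64_t, std::uint64_t> smmu;  // device VA page -> physical page (shared)

    std::optional<std::uint64_t> Resolve(std::uint64_t gpu_va) const {
        const auto gmmu_it = gmmu.find(gpu_va >> kPageBits);
        if (gmmu_it == gmmu.end()) {
            return std::nullopt;  // not mapped in the GPU's own MMU
        }
        const auto smmu_it = smmu.find(gmmu_it->second);
        if (smmu_it == smmu.end()) {
            return std::nullopt;  // not pinned into the shared device space
        }
        const std::uint64_t offset = gpu_va & ((1ULL << kPageBits) - 1);
        return (smmu_it->second << kPageBits) | offset;
    }
};
```

Under the old model the middle level was instead the owning application's virtual address space, so the GPU's memory state was tied to a single process; with the shared device space that restriction goes away.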

Advantages of new SMMU:

  • Not perfect but more accurate than before.
  • Uses less memory overall for tracking GPU memory.
  • More optimizations possible in the future.
  • Multiprocess use of the emulated GPU.
  • Other devices can use the device mapper if needed, or it can be extended through a correct implementation of KDevicePageTable.
  • Total memory usage of the SMMU is about 80 MB including tracking; the tracking alone in the non-SMMU version cost 256 MB.

Disadvantages of the new SMMU:

  • Harder tracking of resources.
  • Complicated.
  • More accurate than before, but not perfectly accurate, in order to avoid sacrificing performance.

Current issues:

  • Pikmin 4 seems to get some vertex explosions after transitioning in several worlds. (Fixed)
  • There's some memory leaking after closing an application. (Unrelated, the leak is on master.)
  • It's not working with 6 GB/8 GB memory layouts. (Fixed)
  • Needs more cleanup.

@FernandoS27 marked this pull request as draft on January 4, 2024 at 19:16
Review threads on src/core/hle/service/nvdrv/core/nvmap.cpp and src/core/hle/service/nvdrv/devices/nvmap.cpp (outdated, resolved).

@FernandoS27 (Contributor, Author) replied:

> I hope to test this feature

This alone means almost nothing to users, but it will allow a LOT of cool stuff soon.

@FernandoS27 added the core-new, gpu-new, and gpu labels on Jan 7, 2024
@FernandoS27 marked this pull request as ready for review on January 7, 2024 at 04:05

@Slexer commented Jan 7, 2024

How much work is left for a complete UMA implementation? Is SMMU a big chunk of it or just a little bit?

@FernandoS27 (Contributor, Author) replied:

> How much work is left for a complete UMA implementation? Is SMMU a big chunk of it or just a little bit?

For a really good and accurate UMA implementation: a lot of work. The SMMU makes it easier to create the mirrors needed for DMI.

After that, we'll do DMI only for downloads, which will improve things a bit.

Doing full DMI will be a challenge if we want to keep or even increase the compatibility we have. It will take a lot of time. I can't give an ETA.

@brujo5 commented Jan 8, 2024

By DMI, do you mean direct memory import? If so, I remember that the Skyline developers said it only worked perfectly on a7xx; on a6xx you had to restart the phone or games would start to crash, and it was a hardware bug. Is that true?

@FernandoS27 (Contributor, Author) replied:

> By DMI, do you mean direct memory import? If so, I remember that the Skyline developers said it only worked perfectly on a7xx; on a6xx you had to restart the phone or games would start to crash, and it was a hardware bug. Is that true?

Yes. It's true.

@liamwhite added the mainline-merge label and removed the early-access-merge label on Jan 19, 2024
@liamwhite merged commit 8bd1047 into yuzu-emu:master on Jan 22, 2024
10 checks passed