How To Speed Up Citra Emulator Mac

Honestly, it depends on the game, and the hardware requirements will drop as development continues. For reference, I have an i5-6600K clocked at 3.8 GHz and a GTX 960 4GB with a slight overclock, coupled with 8 GB of RAM. The only way to speed up Citra is to find the parts of Citra that are slow and make them fast. This is not easy at all and needs very technical people to work many long, hard hours. Additionally, Citra isn’t even feature complete yet, meaning that if you were to optimize Citra’s code right now, you would probably have to throw some of it out later.

Cut to the chase, how fast is this?

Realtime performance comparison with framelimit off

Very fast. Test results across various computers show that it averages out to a 2x speed boost. With the new update, Citra will use much more of your GPU, removing some of the dependence on a CPU with high single-core performance. As always, the actual difference will vary by game and by your specific hardware configuration! In celebration of this massive improvement, we wanted to share some of the successes and struggles we’ve had over the years with the hardware renderer.

Brief History of Citra’s Rendering Backends

Back in early 2015, Citra was still a young emulator, and had just barely started displaying graphics for the first time. In a momentous occasion, Citra displayed 3D graphics from a commercial game, The Legend of Zelda: Ocarina of Time 3D.

This engineering feat was thanks to the hard work of many contributors in both the emulator scene and the 3DS hacking scene, who worked tirelessly to reverse engineer the 3DS GPU, a chip called the PICA200. But not even a few months later, Citra was able to play the game at full speed!

Why is there such a major difference in speed between the first and second videos? The speed difference boils down to how the 3DS GPU is being emulated. The first video is showing off the software renderer, which emulates the PICA200 by using your computer’s CPU. On the other hand, the second video is using the OpenGL hardware renderer, which emulates the PICA200 by using your computer’s GPU. From those videos, using your GPU to emulate the 3DS GPU is the clear winner when it comes to speed! However, it’s not all sunshine and daisies; there are always tradeoffs in software development.

Challenges in the Hardware Renderer

Earlier it was stated that the OpenGL hardware renderer was emulating the PICA200 by using the GPU instead of the CPU, and … that’s only partially true. As it stands, only a portion of the PICA200 emulation is running on the GPU; most of it is running on the CPU! To understand why, we need to dive a little deeper into the difference between CPUs and modern GPUs.

As a general rule of thumb, CPUs are fast at computing general tasks, while GPUs are blazing fast at computing very specific tasks. Whenever the tasks the PICA200 can perform match up with tasks you can do on a GPU using OpenGL, everything is fast and everyone is happy! That said, we tend to run into edge cases that the PICA200 supports but that, frankly, OpenGL is not well suited to support. This leads to cases where sometimes we just have to live with minor inaccuracies as a tradeoff for speed.

OpenGL is also great for emulator developers because it’s a cross-platform standard for graphics, with support for all major desktop platforms. But because OpenGL is just a specification, every vendor is left on their own to make their drivers support the specification for every individual platform. This means performance and features can vary widely between operating systems, graphics drivers, and physical graphics cards. As you might have guessed, this leads to some OS-specific bugs that are very hard to track down. In the linked issue, Citra would leak memory from the hardware renderer, but only on Mac OS X. We traced it back to how textures were juggled between the 3DS memory and the host GPU, but we don’t have many developers that use Mac, so we never did find the root cause. For a little bit of good news, this is fixed in the latest nightly, but only because the entire texture handling code was rewritten!

Moving Forward with the Hardware Renderer: Cleaning up Texture Forwarding

Despite the issues mentioned above, OpenGL has been a fair choice for a hardware renderer, and phantom has been hard at work improving the renderer. Their first major contribution was a massive, complete rewrite of the texture forwarding support that was added back in 2016. The new texture forwarding code increases the speed of many games, and fixes upscaled rendering in some other games as well.

Whenever a texture is used in the hardware renderer, the hardware renderer will try to use a copy of the texture already in the GPU memory, but if that fails, it has to reload the texture from the emulated 3DS memory. This is called a texture upload, and it’s slow for a good reason. The communication between CPU and GPU is optimized for transferring large amounts of data, but as a tradeoff, it’s not very fast. This works great for PC games, where you know all the textures you want to upload ahead of time and can send them in one large batch, but ends up hurting performance for Citra since we can’t know in advance when the emulated game will do something that requires a texture upload.
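The lookup described above can be sketched as a small cache. This is only a toy model under stated assumptions, not Citra’s actual texture cache: the class name `TextureCache`, its methods, and the use of plain dictionaries as stand-ins for emulated 3DS memory and host GPU memory are all hypothetical.

```python
class TextureCache:
    """Toy model of a host-side texture cache keyed by emulated 3DS address."""

    def __init__(self, guest_memory):
        self.guest_memory = guest_memory  # address -> bytes, stand-in for 3DS memory
        self.host_textures = {}           # address -> bytes, stand-in for GPU memory
        self.dirty = set()                # addresses written since their last upload
        self.uploads = 0                  # number of slow uploads performed

    def write_guest(self, address, data):
        """The emulated game writes texture data, so any cached copy is stale."""
        self.guest_memory[address] = data
        self.dirty.add(address)

    def get_texture(self, address):
        """Return the host copy, uploading only when it is missing or stale."""
        if address not in self.host_textures or address in self.dirty:
            # Slow path: copy from emulated memory to the host GPU.
            self.host_textures[address] = self.guest_memory[address]
            self.dirty.discard(address)
            self.uploads += 1
        return self.host_textures[address]


cache = TextureCache({0x1000: b"grass"})
cache.get_texture(0x1000)            # first use: upload
cache.get_texture(0x1000)            # cached copy is valid: no upload
cache.write_guest(0x1000, b"water")  # game changes the texture
cache.get_texture(0x1000)            # stale copy: upload again
print(cache.uploads)                 # prints 2
```

The point of the sketch is that every check that lets `get_texture` skip the slow path saves a CPU-to-GPU transfer, and the emulator cannot predict when the game will force one.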

The texture forwarding rewrite increases the speed of many games by adding in new checks to avoid this costly synchronization of textures between emulated 3DS memory and the host GPU memory. Additionally, the new texture forwarding can avoid even more texture uploads by copying the data from any compatible locations. As an extension of this feature, phantom went the extra mile and fixed Pokémon outlines as well! Pokémon games would draw the outline by reinterpreting the depth and stencil buffer as an RGBA texture, using the value for the red color to draw the outline. Sadly, OpenGL doesn’t let you just reinterpret like that, meaning we needed to be more creative. phantom worked around this limitation by copying the data into a Pixel Buffer Object, and running a shader to extract the data into a Buffer Texture which they could use to draw into a new texture with the correct format.

The texture forwarding rewrite has been battle tested in Citra Canary for the last 2 months, during which time we fixed over 20 reported issues. We are happy to announce that it’s now merged into the master branch, so please enjoy the new feature in the latest nightly build!

The Big News You’ve Been Waiting For

A few paragraphs ago, we mentioned that Citra’s hardware renderer did most of the emulation on the CPU, and only some of it on the GPU. The big news today is that Citra now does the entire GPU emulation on the host GPU.

With an unbelievable amount of effort, phantom has done it again. Moving the rest of the PICA200 emulation to the GPU was always a sort of “white whale” for Citra. We knew it would make things fast, but the sheer amount of effort required to make this happen scared off all those who dared attempt it. But before we get into why this was so challenging, let’s see some real performance numbers!

All testing was done with the following settings: 4x Internal Resolution, Accurate Hardware Shaders On, Framelimit Off

Obstacles to Emulating the PICA200 on a GPU

Making Functions Out of GOTOs

It’s likely that the game developers for the 3DS didn’t have to write PICA200 GPU assembly, but when emulating the PICA200, all Citra can work with is a command list and a stream of PICA200 opcodes. While the developers probably wrote in a high-level shader language that supports functions, when the shaders are compiled, most of that goes away. The PICA200 supports barebones CALL, IF, and LOOP operations, but also supports an arbitrary JMP that can go to any address. Translating PICA200 shaders into GLSL (OpenGL Shading Language) means that you have to be prepared to rewrite every arbitrary JMP without using a goto, as GLSL doesn’t support them.

phantom assumed the worst when they originally translated PICA200 shaders into GLSL and wrote a monstrous switch statement that would have a case for every jump target and act as a PICA200 shader interpreter. This worked, but proved to be slower than the software renderer! Now that phantom knew it was possible, and they had some data about how the average PICA200 shader looked, they set out to rewrite it with the goal of making it fast. While the shaders could theoretically be very unruly and hard to convert, almost all the shaders were well behaved, presumably because they are compiled from a higher-level language. This time around, phantom generated native GLSL functions wherever possible by analyzing the control flow of the instructions, and the results are much prettier and faster. Armed with the new knowledge, phantom rewrote the conversion a third time, and optimized the generated shaders even further. What started off slower than the software renderer ended up being the massive performance boost we have today!
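To make the two approaches concrete, here is a minimal sketch in Python rather than GLSL. The four-instruction program and its opcode names (`ADD`, `MUL`, `JMP_IF_LT`, `END`) are invented for illustration and are not real PICA200 opcodes. `run_dispatch` mimics the first attempt, where a program counter drives a loop and every jump is just an assignment to that counter. `run_structured` shows what the same program might look like once control-flow analysis recognizes the backward jump as an ordinary loop.

```python
# Hypothetical four-instruction program: (opcode, argument) pairs, indexed by address.
PROGRAM = [
    ("ADD", 1),             # 0: acc += 1
    ("JMP_IF_LT", (5, 0)),  # 1: if acc < 5, jump back to address 0
    ("MUL", 2),             # 2: acc *= 2
    ("END", None),          # 3: stop
]


def run_dispatch(program):
    """Emulate arbitrary jumps with a loop and one branch per opcode (no goto)."""
    pc, acc = 0, 0
    while True:
        op, arg = program[pc]
        if op == "ADD":
            acc += arg
            pc += 1
        elif op == "MUL":
            acc *= arg
            pc += 1
        elif op == "JMP_IF_LT":
            limit, target = arg
            pc = target if acc < limit else pc + 1
        elif op == "END":
            return acc


def run_structured():
    """The same program after control-flow analysis: the backward jump is a loop."""
    acc = 0
    while True:
        acc += 1
        if not acc < 5:
            break
    acc *= 2
    return acc


print(run_dispatch(PROGRAM), run_structured())  # prints 10 10
```

Both functions compute the same result, but the dispatch version pays for a lookup and a branch on every instruction, which hints at why an interpreter-style shader can end up slow.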

Multiplication Shouldn’t Be This Slow

When converting from PICA200 shaders into GLSL, there are a few PICA200 opcodes that should just match up without any issues. Addition, subtraction, and multiplication should … wait. Where did this issue come from?

It turns out that the PICA200 multiplication opcode has a few edge cases that don’t impact a large majority of games, but lead to some hilarious results in others. On the PICA200, infinity * 0 = 0, but in OpenGL, infinity * 0 = NaN, and this can’t be configured. In the generated GLSL shaders, phantom emulates this behavior by making a function call instead of a simple multiplication.
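A minimal sketch of such a multiplication helper, written in Python rather than GLSL, could look like the following. The function name `pica_mul` is hypothetical, and letting a NaN input stay NaN is an assumption made here; the article only specifies the infinity * 0 case.

```python
import math


def pica_mul(a, b):
    """Multiply so that infinity * 0 gives 0, as the article says the PICA200 does."""
    product = a * b
    # With two non-NaN inputs, a NaN product can only come from infinity * 0.
    if math.isnan(product) and not math.isnan(a) and not math.isnan(b):
        return 0.0
    # Assumption: every other case, including NaN inputs, follows IEEE 754.
    return product


print(math.isnan(math.inf * 0.0))  # prints True: plain IEEE 754 multiplication
print(pica_mul(math.inf, 0.0))     # prints 0.0
print(pica_mul(2.0, 3.0))          # prints 6.0
```

Even in this small form, the helper adds a comparison to every multiplication, which gives a feel for why doing this on every multiply in a shader costs performance.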

Alas, it’s a performance penalty to use a function everywhere instead of regular multiplication. On weaker GPUs, we noticed the penalty is so severe, we actually made this configurable. The whole point of a hardware renderer is to be fast, so eating a penalty when only a small handful of games need this level of accuracy would be regrettable. You can turn off this feature in the settings by deselecting “Accurate Hardware Shader” and get a noticeable performance boost, but be aware that a few games will break in strange ways!

Finding Bugs and Working Overtime

We were very excited to launch this feature when phantom declared that it was ready; results from user testing were entirely positive, and the performance improvements were unbelievable, but one thing stood in the way. No one had yet tested to see if it worked on AMD GPUs. We called for our good friend JMC47 to break out the AMD card he uses for testing Dolphin, and Citra crashed the driver! Oh no!

From JMC47’s time in Dolphin, he’s made a few friends here and there, and he found someone willing to investigate. After a few gruelling weeks, JonnyH was able to narrow down what the problem was, and luckily it’s not a bug in the AMD drivers. It turns out that it’s a bug in the GL specification, or more precisely, the exact issue is ambiguous wording. glDrawRangeElementsBaseVertex states that the indices should be a pointer, but doesn’t say whether the pointer should be to CPU memory or GPU memory. Citra passed a pointer to CPU memory without a second thought, as both Nvidia and Intel drivers seemed fine with it, but AMD drivers are strict. As a workaround, phantom added support for streaming storage buffers, which allows Citra to work with the data on the CPU and sync it with the GPU when it’s time to draw.

It’s a challenge to support all of the many GPUs out there, and we’ve put in so much work to ensure that this new feature will run on as many hardware configurations as possible. But it’s very likely that there will be some GPUs that do not fully support the new hardware renderer, and so we added another option in the Configuration to allow users to turn this feature off entirely. Simply changing the “Shader Emulation” from “GPU” to “CPU” will revert to using the same old CPU shaders that Citra was using before.

What’s next

While today marks a victory for fast emulation, we always have room for improvement. As explained earlier in the article, getting OpenGL to work consistently across all platforms and GPUs is surprisingly challenging, so be ready for bugs. This isn’t the end for the hardware renderer, but a wonderful boost to one of Citra’s more complicated features. There is always something more that can be done to make the hardware renderer even faster and more accurate (contributions welcome!), but in the meantime, we hope you enjoy playing even more games at full speed!

The below game article is based on user submitted content.
Rating: Okay
Game functions with major graphical or audio glitches, but game is playable from start to finish with workarounds.
Title IDs: 00040000000A0500
System Files Required? N/A
Shared Font Required? N/A

Summary

Fire Emblem Awakening works very well in Citra. Graphically, the game looks great when not in areas afflicted by missing geometry shaders; however, in those areas (usually in-battle), major graphical issues occur, such as Chrom having blonde hair, Olivia wearing a black dancer’s outfit, or dark knights being dressed as tricksters. This game can achieve full speed on decent hardware.

Player experience: Really great performance; I was able to play from start to finish with no issues. The only thing that might be annoying is that it drops to 70% speed in cutscenes, but it’s not a big deal at all. Cutscenes now run at full speed on sufficient hardware in recent Citra versions.

Compatibility

ID | Build Date | Tested By | Hardware | Citra Version | Rating
64a952fd-2646-4e9f-92fb-b50ac08f35a9 | 10/14/2020 | Ceruvian on 10/26/2020 | Intel(R) Core(TM) i7-6700T CPU @ 2.80GHz, Intel(R) HD Graphics 530, Windows | Nightly Build 1bb2057 | Great
c6fda0ae-5f57-4fc4-b9ba-bc287e52e8ae | 10/11/2020 | IloveRPGs on 10/12/2020 | Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz, GeForce RTX 2060/PCIe/SSE2, Windows | Canary Build a34b8e8 | Okay
30340226-14b2-410a-baa0-9c0df4723a13 | 10/9/2020 | IloveRPGs on 10/11/2020 | Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz, GeForce RTX 2060/PCIe/SSE2, Windows | Canary Build f9e7514 | Bad
29956df1-8c17-47d7-af26-050fc290558c | 10/1/2020 | Araxias on 10/03/2020 | Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz, GeForce GT 640M LE/PCIe/SSE2, Windows | Canary Build 79b5bcc | Okay
94f59cca-4df8-48b7-ba63-f7819a63acf9 | 09/19/2020 | ImamHasan1991 on 09/26/2020 | Intel(R) Core(TM) i3-6006U CPU @ 2.00GHz, GeForce 920M/PCIe/SSE2, Windows | Nightly Build a576eb6 | Great
128b9eb6-8df9-48f3-b9e9-0b90ed16b996 | 09/10/2020 | SapphicFae on 09/11/2020 | AMD Ryzen 5 2600 Six-Core Processor, GeForce GTX 1060 6GB/PCIe/SSE2, Windows | Nightly Build df9e230 | Perfect
9c849b3e-cbd0-494d-879f-c8bfad946c0e | 09/10/2020 | ReadyRed on 09/18/2020 | AMD Ryzen 5 2600 Six-Core Processor, Radeon RX 580 Series, Windows | Canary Build 063cae9 | Perfect
1a845b15-5b2f-43f4-bdaf-c4c7151f67fa | 09/6/2020 | Linkario on 10/06/2020 | Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, GeForce GTX 1070/PCIe/SSE2, Windows | Nightly Build e97ecdc | Perfect
49cdd650-fef2-40bb-b61d-0376eee1fdfc | 09/5/2020 | majinbone on 09/06/2020 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz, Intel(R) HD Graphics 620, Windows | Nightly Build 316a649 | Great
2672d312-d78b-49d8-a1d3-471f083e2b10 | 09/5/2020 | Seonei on 09/06/2020 | AMD Ryzen 5 3600 6-Core Processor, AMD Radeon RX 5700 XT, Windows | Nightly Build 316a649 | Great

Known Issues

Speed

No issues have been reported for this game.

Savefiles

No savefiles have been uploaded for this game.

Screenshots

No screenshots have been uploaded for this game.