My GPU Benchmark for Stable Diffusion

In this video, I will be sharing the findings from my exploration of AI art using the tool Stable Diffusion. I went through the journey of trying to understand what sort of hardware Stable Diffusion requires, and what the experience is like with different cards. There are some additional data points here that I haven’t seen from other channels or other sources. Hope you enjoy it and find something suitable for yourself.

So, if we go to the official website of Stable Diffusion, there isn’t a lot of mention of the exact hardware requirements, only this line: “Four gigabytes of video card support.” It also mentions that two gigabytes can work. That means the critical part is that we need a video card, and the video card should have at least two to four gigabytes of VRAM to make it work.
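If you want to check what your own card has before installing anything, here is a quick sketch of my own, assuming PyTorch with CUDA support is installed (this is not an official requirement check, just a convenience):

```python
import torch

# Report the name and total VRAM of the first CUDA device, if any
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA-capable GPU detected")
```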

Other than that, there are a bunch of software requirements, including Python, of course, the Stable Diffusion web UI, and lots of other things. But in this video, we are not going to touch on the installation of Stable Diffusion. Certainly, it is critical to get all the requirements in place. As I mentioned, the critical part is the video card, and the official website doesn’t say much about it. On the internet, there are quite a few articles benchmarking cards, but those mostly cover the high-end cards. So for this one, I’m really focusing more on beginners and those who are just getting in touch with Stable Diffusion and trying to explore this AI art a little. So certainly, we want to have more benchmarks on the beginner- and intermediate-grade video cards.

The data points that I collected come from the Stable Diffusion web UI extension System Info. You can see the benchmark data on its page. This extension runs standardized benchmarks using a 512 × 512 image size and a standard sampler, so all these benchmarks are consistent; the variables are the system, the platform, the video card, the CPU, and the software. One thing to highlight is that users may be running something else in the background when generating these benchmarks, so the results may not always be the best performance you can get from a graphics card. It also depends on the number of samples we can get for each type of graphics card.
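To give a sense of what such a timed generation looks like, here is a minimal sketch using the Hugging Face diffusers library. This is not what the System Info extension itself runs; the model ID, prompt, and 20-step count are my own assumptions for illustration.

```python
import time
import torch
from diffusers import StableDiffusionPipeline

# Assumed setup: SD 1.5 at half precision on the local GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

steps = 20  # assumed step count; the extension fixes its own settings
start = time.perf_counter()
pipe("a photo of an astronaut riding a horse",
     height=512, width=512, num_inference_steps=steps)
elapsed = time.perf_counter() - start

print(f"{steps / elapsed:.2f} it/s, {elapsed:.2f} s per 512x512 image")
```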

Let’s go into the results. I’ve been extracting data from this System Info benchmark for about a month. This chart plots the iterations per second for a standard generation based on the SD web UI benchmark. No surprise that the top-notch cards, the RTX 4090 and the other top models, be it the 3000 series or the 4000 series, all come out at the top of the chart. And it aligns quite well with the cards’ normal gaming performance.

One thing to highlight is the GTX 1000 series: you can see that its performance drops quite a bit compared to the 2000 or 3000 series. I have done some research on this, which I will share with you a bit later. So this is about iterations per second. You can take a look at the benchmark, and I will also explain a bit about the sample sizes.

Since the data all comes from System Info, these are user-generated and user-submitted numbers. You can see that users running Stable Diffusion are mostly using the higher-end cards, particularly the latest series and, in particular, the RTX 4090. But we can still get some samples from the 2000 series, the 3000 series, and a few from the 1000 series. One thing to bear in mind is that because the sample sizes are really small, there could be some variance. But I think, in general, the trend is still there, and it still serves as a very good reference.

Okay, as I mentioned earlier about the 1000 series cards, their performance doesn’t quite align with their normal gaming standing. The reason is that starting with the RTX 2000 series, NVIDIA introduced an additional feature that allows lower precision for deep learning performance enhancement. Normally the math runs at 32-bit floating point, but with that enhancement it can run at 16-bit, for example. That requires less memory and less memory bandwidth, which speeds up data transfer, and the mathematical operations themselves run much faster at reduced precision. That explains why there is such a huge difference between the 1000 series and the 2000 series cards, basically a factor of two. So, it will affect the recommendations, of course.
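To see the precision effect in isolation, here is a minimal PyTorch sketch of my own, not from the benchmark, that times the same matrix multiplication at 32-bit and 16-bit. On a 2000-series or newer card, the fp16 run should be clearly faster; the matrix size and iteration count are arbitrary illustration values.

```python
import time
import torch

def bench(dtype, size=4096, iters=50):
    """Time a batch of matrix multiplications at the given precision."""
    a = torch.randn(size, size, device="cuda", dtype=dtype)
    b = torch.randn(size, size, device="cuda", dtype=dtype)
    torch.cuda.synchronize()          # make sure setup is done
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()          # wait for the GPU to finish
    return (time.perf_counter() - start) / iters

print(f"fp32: {bench(torch.float32) * 1000:.1f} ms per matmul")
print(f"fp16: {bench(torch.float16) * 1000:.1f} ms per matmul")
```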

So, I think that iterations per second is a good reference for understanding the performance of a graphics card. But more importantly, as a user exploring AI and trying to generate images, what matters most is the time taken per generation. Again, this particular chart is based on the standard image dimensions and the standard sampler. If you use a more complicated sampler, load a few additional models, or upscale the image to a higher resolution, the time taken will be much longer. But again, this is a good indication for most GPUs, giving you an idea of how long you have to wait when using each graphics card. So, this is an important benchmark.
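The conversion between the two charts is simple: time per image is just the sampler’s step count divided by the iteration rate. A tiny sketch of my own (the 20-step default is an assumption, not the extension’s setting):

```python
def seconds_per_image(iterations_per_second: float, steps: int = 20) -> float:
    """Estimate generation time from the benchmark's it/s figure."""
    return steps / iterations_per_second

# Example: a card benchmarked at 10 it/s with a 20-step sampler
print(seconds_per_image(10.0))   # -> 2.0 seconds
print(seconds_per_image(1.3))    # a slower card: ~15.4 seconds
```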

To begin with, I think the 1000 series cards are honestly not that bad. We’re talking about 10 to 15 seconds per image, which I think is quite reasonable to start with. Of course, when you move to the later generations, even the lower-end cards can give you something in the four-to-five-second range. The higher-end models within a series, the RTX 3070 or 4060 for example, can improve that to around two seconds; that’s kind of the range. And the top-notch cards can give you a really flashy experience, generating an image in about a second. So hopefully this gives you a good sense of what sort of graphics card you would like. In the next slide, I’ll summarize some recommendations based on intention or purpose.

So, what graphics card do you need as a beginner? Say you’re trying to have some fun, doing a bit of exploration, and want to work on a local machine rather than using cloud alternatives like Azure, AWS, etc. To begin with, I think the GTX 1060, 1660, and 1070, ideally with six gigabytes of VRAM, can give you roughly a 15-second waiting time per image. That is not ideal, but for beginners who want to try something without investing a lot of money, it is quite reasonable from my perspective.

For intermediate users, when you start to enjoy it and explore beyond the basic features, doing a bit more work for either your job or your study, you will certainly want to reduce the waiting time by enhancing the video card. The RTX 2060, 2070, 3060, and 4060 with at least six gigabytes of VRAM, ideally six to eight, would be the pick. That can give you three to five seconds per image, which is quite good. And of course, the price range is quite reasonable nowadays for these older-generation cards.

Of course, advanced or heavy users don’t want to wait even those few seconds; it’s never just a few seconds, because across many generations those seconds add up to a lot of time. You may also need advanced features, like running model training or doing creative work at very high resolutions. So certainly, you want a better card. The recommendation here is the RTX 2080, 3070, 3080, and 4070; if you can, get the 12-gigabyte VRAM versions. With those cards you can work at much higher resolutions, and the waiting time is quite good: less than two seconds, around 1.5 to 2. So, reasonable.

Finally, if you are really into this AI art and want the best performance and user experience, certainly go for the 3090, 4080, or 4090, and get the highest-VRAM model if possible. The performance is really good: around one second or less per image. So, that’s all the data I wanted to share with you, which I’ve spent time massaging and pulling into a nice graphical format. Hope you enjoyed it, and I will see you in the next one. Please help me out by liking, sharing, or subscribing to my channel. Thanks for watching.