In June, AMD announced updates to its 4th Gen AMD EPYC family of processors designed for specialized workloads needed to address businesses’ requirements. The announcements were made during the opening of AMD’s Data Center and AI Technology Premiere with the unveiling of the 4th Gen AMD EPYC 97X4 processors, previously codenamed AMD Bergamo. The AMD EPYC 97X4 processors provide greater vCPU density and increased performance targeting AI applications and applications running in the cloud.
AMD Bergamo
The AMD EPYC Zen 4 processors, equipped with 3D V-Cache, codenamed Genoa-X, were identified as the leading x86 server CPU for technical computing in a recent SPEC.org report. These processors bring 3D V-Cache to the 96-core Zen 4 chips and are ideally suited for demanding technical computing workloads.
According to AMD, aligning its product roadmap to customers’ environments can deliver the performance needed for general-purpose, cloud-native, and technical computing workloads. AMD has taken the position that one size does not fit all. These new AMD EPYC processors were designed around that concept to deliver increased performance for specific workloads.
Applications are increasingly designed for cloud-native workloads, allowing rapid development, deployment, and updates. The AMD EPYC 97X4 processors, with up to 128 cores, can deliver up to 3.7x better throughput performance for key cloud-native workloads compared to Ampere.
Model | Cores | Max Threads | Default TDP (W) | Base Freq (GHz) | Boost Freq (GHz) | L3 Cache (MB) |
---|---|---|---|---|---|---|
9754 | 128 | 256 | 360 | 2.25 | 3.10 | 256 |
9754S | 128 | 128 | 360 | 2.25 | 3.10 | 256 |
9734 | 112 | 224 | 320 | 2.20 | 3.00 | 256 |
Addressing the need for faster design iterations and comprehensive simulations, the 4th Gen AMD EPYC processors with 3D V-Cache deliver a best-in-class x86 CPU for technical computing workloads such as computational fluid dynamics (CFD), finite element analysis (FEA), electronic design automation (EDA), and structural analysis. These processors have up to 96 “Zen 4” cores and 1GB+ of L3 cache and can significantly speed up product development.
Model | Cores | Max Threads | Default TDP (W) | Base Freq (GHz) | Boost Freq (GHz) | L3 Cache (MB) |
---|---|---|---|---|---|---|
9684X | 96 | 192 | 400 | 2.55 | 3.70 | 1,152 |
9384X | 32 | 64 | 320 | 3.10 | 3.90 | 768 |
9184X | 16 | 32 | 320 | 3.55 | 4.20 | 768 |
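One way to see how differently the two families are balanced is to normalize the spec tables above by core count. The short Python sketch below is our own illustration, not an AMD metric: the `specs` dictionary and the helper names are hypothetical, and the values are simply copied from the two tables.

```python
# Values copied from the two spec tables above:
# model -> (cores, max threads, L3 cache in MB).
specs = {
    "9754": (128, 256, 256),
    "9754S": (128, 128, 256),
    "9734": (112, 224, 256),
    "9684X": (96, 192, 1152),
    "9384X": (32, 64, 768),
    "9184X": (16, 32, 768),
}


def l3_per_core(model: str) -> float:
    """L3 cache (MB) divided by core count; a rough density figure only."""
    cores, _threads, l3_mb = specs[model]
    return l3_mb / cores


def threads_per_core(model: str) -> int:
    """2 when SMT is available on the part, 1 when it is not."""
    cores, threads, _l3_mb = specs[model]
    return threads // cores


for model in specs:
    print(
        f"{model:>6}: {l3_per_core(model):5.1f} MB L3 per core, "
        f"{threads_per_core(model)} thread(s) per core"
    )
```

Read this way, the Bergamo parts trade cache per core for core count, while the Genoa-X parts do the opposite, which fits the cloud-native versus technical-computing split AMD describes.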
AMD Bergamo and Genoa-X Benchmarks
We tested two new CPUs and simulated a third by disabling SMT. In the lab, we had the 9754, a 128-core, 256-thread Bergamo chip, and the 9684X, a 96-core, 192-thread Genoa-X chip with a massive 1.1GB of 3D L3 cache and a higher clock than Genoa. To simulate the third, we disabled SMT on our 9754, since AMD has also released the 9754S, a Bergamo chip that comes without multithreading and is just pure cores. Our testing of the SMT-disabled chip will be published separately from this review.
We conducted an extensive set of benchmarks to evaluate the performance of the newly released AMD Bergamo and Genoa-X CPUs. We started with Cinebench R23 tests on Multi- and Single-core configurations, which provided valuable insights about these processors’ rendering capabilities.
Cinebench R23 itself appears to be limited in how many threads it can handle. Scores plateaued at 128 cores, with the dual-socket Bergamo system scoring no higher than the single-socket one, while the 96-core Genoa-X with 3D V-Cache really shines. Both themes will be common across all of the tests.
Next, we ran y-cruncher at 1 billion and 10 billion digit levels to assess their computational prowess, particularly for tasks involving a high degree of number crunching.
Lower is better here. Our dual-processor 96-core Genoa results come after extensive tuning and put up some good numbers. The Genoa-X and Bergamo chips ran in stock configuration, and both show promising potential for tuning and tweaking to put up even more impressive, record-setting numbers.
We then used Blender benchmarks, specifically the Monster, Junkshop, and Classroom tests, to measure how well these CPUs perform in graphically intensive rendering scenarios.
In the Blender benchmark, the raw power of the dual-socket Bergamo system's 512 threads really showed through, topping the charts with just a stock configuration.
Lastly, we ran Geekbench 6 CPU tests, known for their broad examination of processor performance in single-core and multi-core operations. This suite of tests provided us with a comprehensive view of the overall capabilities, strengths, and incremental improvement of the AMD Bergamo and Genoa-X processors.
Performance Overview
Here are the raw scores for each of the benchmarks. Keep in mind, we had months to do tuning and configuration on the 96-core Genoa system, and only ran a stock configuration of the new AMD Bergamo.
Benchmark | 2p/96c Genoa | 1p/96c Genoa-X | 1p/128c Bergamo | 2p/128c Bergamo |
---|---|---|---|---|
Cinebench R23 Multi | 116744 | 93720 | 103876 | 102125 |
Cinebench R23 Single | 1294 | 1301 | 1098 | 1089 |
Cinebench MP Ratio | 90.22 | 72.04 | 94.65 | 93.75 |
y-cruncher 1b (s) | 8.882 | 10.296 | 9.568 | 9.184 |
y-cruncher 10b (s) | 51.071 | 72.377 | 80.171 | 55.683 |
Blender Monster (samples/min) | 1700.647985 | 879.580323 | 1031.49474 | 2038.714424 |
Blender Junkshop (samples/min) | 1101.839271 | 605.445705 | 704.167826 | 1382.575225 |
Blender Classroom (samples/min) | 869.476693 | 421.318478 | 506.665693 | 1045.959162 |
Geekbench 6 CPU Single | 2048 | 2093 | 1738 | 1723 |
Geekbench 6 CPU Multi | 20217 | 21329 | 18683 | 17916 |
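The Cinebench MP Ratio row is the multi-core score divided by the single-core score, so it can be rechecked from the two rows above it. The Python sketch below is a minimal illustration with hypothetical helper names and scores copied from the table. The recomputed Bergamo ratios land a few hundredths away from the reported ones, which we assume comes from Cinebench working with unrounded scores.

```python
# Cinebench R23 scores copied from the table above:
# system -> (multi-core score, single-core score).
cinebench = {
    "2p/96c Genoa": (116744, 1294),
    "1p/96c Genoa-X": (93720, 1301),
    "1p/128c Bergamo": (103876, 1098),
    "2p/128c Bergamo": (102125, 1089),
}


def mp_ratio(system: str) -> float:
    """Multi-core score divided by single-core score."""
    multi, single = cinebench[system]
    return multi / single


def relative_multi(system: str, baseline: str = "2p/96c Genoa") -> float:
    """Multi-core score as a fraction of the chosen baseline system."""
    return cinebench[system][0] / cinebench[baseline][0]


for system in cinebench:
    print(
        f"{system:<16} MP ratio {mp_ratio(system):6.2f}  "
        f"multi vs. tuned Genoa {relative_multi(system):.2f}x"
    )
```

The same pattern works for any other row in the table. For the y-cruncher rows, remember that lower is better, so the ratio should be inverted before reading it as a speedup.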
AMD Bergamo for AI
Featuring an array of AI inference engines from top-tier vendors, the UL Procyon AI Inference Benchmark caters to a broad spectrum of hardware setups and requirements. The benchmark score provides a convenient and standardized summary of on-device inferencing performance. This enables us to compare and contrast different hardware setups in real-world situations without requiring in-house solutions.
Processor | Model | Average Inference Time | Median Inference Time | Total Inferences Count |
---|---|---|---|---|
2p/96c Genoa | MobileNet V3 | 3.61 ms | 3.63 ms | 45,800 |
1p/96c Genoa-X | MobileNet V3 | 2.71 ms | 2.72 ms | 58,631 |
1p/128c Bergamo | MobileNet V3 | 3.90 ms | 3.91 ms | 41,538 |
2p/128c Bergamo | MobileNet V3 | 4.10 ms | 4.16 ms | 40,008 |
2p/96c Genoa | ResNet 50 | 6.36 ms | 6.34 ms | 26,525 |
1p/96c Genoa-X | ResNet 50 | 6.66 ms | 6.64 ms | 25,049 |
1p/128c Bergamo | ResNet 50 | 10.14 ms | 10.08 ms | 16,919 |
2p/128c Bergamo | ResNet 50 | 8.21 ms | 8.22 ms | 20,842 |
2p/96c Genoa | Inception V4 | 25.98 ms | 25.99 ms | 6,555 |
1p/96c Genoa-X | Inception V4 | 29.19 ms | 29.18 ms | 5,879 |
1p/128c Bergamo | Inception V4 | 33.17 ms | 33.04 ms | 5,158 |
2p/128c Bergamo | Inception V4 | 30.63 ms | 30.68 ms | 5,573 |
2p/96c Genoa | DeepLab V3 | 25.51 ms | 25.33 ms | 5,660 |
1p/96c Genoa-X | DeepLab V3 | 28.26 ms | 27.86 ms | 5,394 |
1p/128c Bergamo | DeepLab V3 | 32.16 ms | 32.09 ms | 4,708 |
2p/128c Bergamo | DeepLab V3 | 31.16 ms | 30.57 ms | 4,807 |
2p/96c Genoa | YOLO V3 | 34.10 ms | 34.13 ms | 4,818 |
1p/96c Genoa-X | YOLO V3 | 43.59 ms | 43.58 ms | 3,831 |
1p/128c Bergamo | YOLO V3 | 44.50 ms | 44.39 ms | 3,739 |
2p/128c Bergamo | YOLO V3 | 41.35 ms | 41.38 ms | 4,001 |
2p/96c Genoa | Real-ESRGAN | 2540.04 ms | 2524.03 ms | 71 |
1p/96c Genoa-X | Real-ESRGAN | 3725.07 ms | 3720.35 ms | 49 |
1p/128c Bergamo | Real-ESRGAN | 2734.77 ms | 2717.41 ms | 66 |
2p/128c Bergamo | Real-ESRGAN | 2291.66 ms | 2301.35 ms | 79 |
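To compare systems on a given model, the average inference times above can be turned into inferences per second and into speedups relative to a baseline. The Python sketch below covers three of the six models as a minimal example. The `avg_ms` dictionary and function names are hypothetical, the values are copied from the table, and the throughput figure assumes inferences run back to back, which is our simplification and not something the benchmark documents here.

```python
# Average inference times in milliseconds, copied from the table above.
avg_ms = {
    "MobileNet V3": {
        "2p/96c Genoa": 3.61,
        "1p/96c Genoa-X": 2.71,
        "1p/128c Bergamo": 3.90,
        "2p/128c Bergamo": 4.10,
    },
    "ResNet 50": {
        "2p/96c Genoa": 6.36,
        "1p/96c Genoa-X": 6.66,
        "1p/128c Bergamo": 10.14,
        "2p/128c Bergamo": 8.21,
    },
    "Real-ESRGAN": {
        "2p/96c Genoa": 2540.04,
        "1p/96c Genoa-X": 3725.07,
        "1p/128c Bergamo": 2734.77,
        "2p/128c Bergamo": 2291.66,
    },
}


def inferences_per_second(model: str, system: str) -> float:
    """Throughput implied by the average latency (assumes serial inferences)."""
    return 1000.0 / avg_ms[model][system]


def speedup(model: str, system: str, baseline: str) -> float:
    """How many times faster `system` is than `baseline` (above 1 is faster)."""
    return avg_ms[model][baseline] / avg_ms[model][system]


def fastest(model: str) -> str:
    """System with the lowest average inference time for a model."""
    return min(avg_ms[model], key=avg_ms[model].get)


for model in avg_ms:
    best = fastest(model)
    print(f"{model}: fastest is {best} at "
          f"{inferences_per_second(model, best):.1f} inferences/s")
```

On these numbers, the single-socket Genoa-X leads on the small MobileNet V3 model, while the dual-socket Bergamo system leads on the much heavier Real-ESRGAN model.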
Final Thoughts
Our tests with the new 128-core AMD Bergamo CPU reflect the expected gains of the uptick in core count. Regarding raw performance, the new CPU handled data and compute-intensive tasks with an ease that seemed almost effortless. Our trials with 3D rendering and computation applications, in particular, showcased the true prowess of these extra cores.
We noted a significant boost in processing speeds over the 96-core Genoa, both with and without SMT enabled, highlighting the efficiency of AMD’s chiplet design. As we delve deeper into the era of advanced ultra-high core count computing, this 128-core, 256-thread monster sets a new benchmark in rack density.