AMD Bergamo/Genoa-X Review

by Jordan Ranous

In June, AMD announced updates to its 4th Gen AMD EPYC family of processors, designed for the specialized workloads that businesses increasingly need to address. The announcements came during the opening of AMD’s Data Center and AI Technology Premiere, which unveiled the 4th Gen AMD EPYC 97X4 processors, previously codenamed AMD Bergamo. The AMD EPYC 97X4 processors provide greater vCPU density and increased performance for AI applications and applications running in the cloud.

AMD Bergamo

The AMD EPYC Zen 4 processors, equipped with 3D V-Cache, codenamed Genoa-X, were identified as the leading x86 server CPU for technical computing in a recent SPEC.org report. These processors bring 3D V-Cache to the 96-core Zen 4 chips and are ideally suited for demanding technical computing workloads.

According to AMD, aligning its product roadmap to customers’ environments can deliver the performance needed for general-purpose, cloud-native, and technical computing workloads. AMD has taken the position that one size does not fit all. These new AMD EPYC processors were designed around that concept to deliver increased performance for specific workloads.

Applications are increasingly designed to be cloud-native, allowing rapid development, deployment, and updates. With 128 cores, the AMD EPYC 97X4 processors can deliver up to 3.7x better throughput on key cloud-native workloads compared to Ampere.

| Model | Cores | Max Threads | Default TDP (W) | Base Freq (GHz) | Boost Freq (GHz) | L3 Cache (MB) |
|---|---|---|---|---|---|---|
| 9754 | 128 | 256 | 360 | 2.25 | 3.10 | 256 |
| 9754S | 128 | 128 | 360 | 2.25 | 3.10 | 256 |
| 9734 | 112 | 224 | 320 | 2.20 | 3.00 | 256 |

Addressing the need for faster design iterations and comprehensive simulations, the 4th Gen AMD EPYC processors with 3D V-Cache are positioned as a best-in-class x86 CPU for technical computing workloads such as computational fluid dynamics (CFD), finite element analysis (FEA), electronic design automation (EDA), and structural analysis. These processors have up to 96 “Zen 4” cores and more than 1GB of L3 cache, and can significantly speed up product development.

| Model | Cores | Max Threads | Default TDP (W) | Base Freq (GHz) | Boost Freq (GHz) | L3 Cache (MB) |
|---|---|---|---|---|---|---|
| 9684X | 96 | 192 | 400 | 2.55 | 3.70 | 1,152 |
| 9384X | 32 | 64 | 320 | 3.10 | 3.90 | 768 |
| 9184X | 16 | 32 | 320 | 3.55 | 4.20 | 768 |
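To put the cache figures in the two spec tables in perspective, the short sketch below divides each model's L3 cache by its core count. The numbers are copied from the tables above; the dictionary and function names are illustrative only and do not come from AMD.

```python
# (cores, L3 cache in MB) copied from the two spec tables above.
SPECS = {
    "9754": (128, 256),
    "9754S": (128, 256),
    "9734": (112, 256),
    "9684X": (96, 1152),
    "9384X": (32, 768),
    "9184X": (16, 768),
}


def l3_per_core(cores: int, l3_mb: int) -> float:
    """Return the amount of L3 cache, in MB, available per core."""
    return l3_mb / cores


for model, (cores, l3_mb) in SPECS.items():
    print(f"{model}: {l3_per_core(cores, l3_mb):.2f} MB of L3 per core")
```

By this measure, the 128-core 9754 works out to 2 MB of L3 per core, while the 96-core 9684X works out to 12 MB per core, which is the trade-off between core density and cache that separates the two product lines.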

AMD Bergamo and Genoa-X Benchmarks

We tested two new CPUs and simulated a third by disabling SMT. In the lab, we had the 9754, a 128-core, 256-thread Bergamo chip, and the 9684X, a 96-core, 192-thread Genoa-X chip with a massive 1.1GB of L3 cache thanks to 3D V-Cache and a higher clock than the Genoa. To simulate the third, we disabled SMT on our 9754, since AMD has also released the 9754S, a Bergamo chip that ships without multithreading and offers one thread per core. Our testing of the SMT-disabled chip will be covered separately from this review.
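The thread counts quoted for these chips follow from simple arithmetic: cores per socket, times sockets, times two threads per core when SMT is on. The sketch below is only an illustration of that arithmetic; the function name and parameters are hypothetical and not part of any benchmark tool we used.

```python
def logical_threads(cores_per_socket: int, sockets: int = 1, smt: bool = True) -> int:
    """Return the logical thread count for a system.

    Assumes two threads per core with SMT enabled and one with SMT disabled.
    """
    threads_per_core = 2 if smt else 1
    return cores_per_socket * sockets * threads_per_core


# Configurations discussed in this review.
print("9754, SMT on:", logical_threads(128))
print("9684X, SMT on:", logical_threads(96))
print("9754, SMT off (9754S-like):", logical_threads(128, smt=False))
print("Two 9754s, SMT on:", logical_threads(128, sockets=2))
```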

We conducted an extensive set of benchmarks to evaluate the performance of the newly released AMD Bergamo and Genoa-X CPUs. We started with Cinebench R23 tests on Multi- and Single-core configurations, which provided valuable insights about these processors’ rendering capabilities.

Cinebench R23 itself appears to be limited in how many threads it can handle, and we noted a cap at 128 cores. Even so, the 96-core Genoa-X with 3D V-Cache really shines here, a theme that will be common across all of the tests.

Next, we ran y-cruncher at 1 billion and 10 billion digit levels to assess their computational prowess, particularly for tasks involving a high degree of number crunching.

Lower is better here. Our dual-processor 96-core Genoa results came after extensive tuning and put up some good numbers. The stock configurations of both the Genoa-X and Bergamo chips show promising potential for tuning and tweaking to reach even more impressive, record-setting numbers.

We then used Blender benchmarks, specifically the Monster, Junkshop, and Classroom tests, to measure how well these CPUs perform in graphically intensive rendering scenarios.

In the Blender benchmark, the raw power of the dual-socket Bergamo system’s 512 threads really showed through, topping the charts with just a stock configuration.

Lastly, we ran Geekbench 6 CPU tests, known for their broad examination of processor performance in single-core and multi-core operations. This suite of tests provided us with a comprehensive view of the overall capabilities, strengths, and incremental improvement of the AMD Bergamo and Genoa-X processors.

Performance Overview

Here are the raw scores for each of the benchmarks. Keep in mind, we had months to do tuning and configuration on the 96-core Genoa system, and only ran a stock configuration of the new AMD Bergamo.

| Benchmark | 2p/96c Genoa | 1p/96c Genoa-X | 1p/128c Bergamo | 2p/128c Bergamo |
|---|---|---|---|---|
| Cinebench R23 Multi | 116744 | 93720 | 103876 | 102125 |
| Cinebench R23 Single | 1294 | 1301 | 1098 | 1089 |
| Cinebench MP Ratio | 90.22 | 72.04 | 94.65 | 93.75 |
| y-cruncher 1b | 8.882 | 10.296 | 9.568 | 9.184 |
| y-cruncher 10b | 51.071 | 72.377 | 80.171 | 55.683 |
| Blender Monster | 1700.647985 | 879.580323 | 1031.49474 | 2038.714424 |
| Blender Junkshop | 1101.839271 | 605.445705 | 704.167826 | 1382.575225 |
| Blender Classroom | 869.476693 | 421.318478 | 506.665693 | 1045.959162 |
| Geekbench 6 CPU Single | 2048 | 2093 | 1738 | 1723 |
| Geekbench 6 CPU Multi | 20217 | 21329 | 18683 | 17916 |
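For readers who want to work with these numbers, the sketch below shows how the Cinebench MP Ratio relates to the multi-core and single-core scores, and how a one-socket to two-socket scaling factor can be derived from the Blender results. The scores are copied from the table above; the function names are illustrative. Recomputing the ratio from the rounded scores may differ slightly from the reported figure, possibly because the benchmark works from unrounded values.

```python
# (multi-core score, single-core score) from the Cinebench R23 rows above.
CINEBENCH_R23 = {
    "2p/96c Genoa": (116744, 1294),
    "1p/96c Genoa-X": (93720, 1301),
    "1p/128c Bergamo": (103876, 1098),
    "2p/128c Bergamo": (102125, 1089),
}

# Blender Monster scores for the single- and dual-socket Bergamo systems.
BLENDER_MONSTER_1P_BERGAMO = 1031.49474
BLENDER_MONSTER_2P_BERGAMO = 2038.714424


def mp_ratio(multi: float, single: float) -> float:
    """Multi-core score divided by single-core score."""
    return multi / single


def scaling(one_socket: float, two_socket: float) -> float:
    """How many times higher the two-socket score is than the one-socket score."""
    return two_socket / one_socket


for system, (multi, single) in CINEBENCH_R23.items():
    print(f"{system}: MP ratio {mp_ratio(multi, single):.2f}")

print(
    "Bergamo 1p -> 2p Blender Monster scaling: "
    f"{scaling(BLENDER_MONSTER_1P_BERGAMO, BLENDER_MONSTER_2P_BERGAMO):.2f}x"
)
```

The recomputed ratios for the Genoa and Genoa-X systems match the table (90.22 and 72.04), and the Blender Monster score for the dual-socket Bergamo system is roughly 1.98x that of the single-socket system.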

AMD Bergamo for AI

Featuring an array of AI inference engines from top-tier vendors, the UL Procyon AI Inference Benchmark caters to a broad spectrum of hardware setups and requirements. The benchmark score provides a convenient and standardized summary of on-device inferencing performance. This enables us to compare and contrast different hardware setups in real-world situations without requiring in-house solutions.

| Processor | Model | Average Inference Time | Median Inference Time | Total Inferences Count |
|---|---|---|---|---|
| 2p/96c Genoa | MobileNet V3 | 3.61 ms | 3.63 ms | 45,800 |
| 1p/96c Genoa-X | MobileNet V3 | 2.71 ms | 2.72 ms | 58,631 |
| 1p/128c Bergamo | MobileNet V3 | 3.90 ms | 3.91 ms | 41,538 |
| 2p/128c Bergamo | MobileNet V3 | 4.10 ms | 4.16 ms | 40,008 |
| 2p/96c Genoa | ResNet 50 | 6.36 ms | 6.34 ms | 26,525 |
| 1p/96c Genoa-X | ResNet 50 | 6.66 ms | 6.64 ms | 25,049 |
| 1p/128c Bergamo | ResNet 50 | 10.14 ms | 10.08 ms | 16,919 |
| 2p/128c Bergamo | ResNet 50 | 8.21 ms | 8.22 ms | 20,842 |
| 2p/96c Genoa | Inception V4 | 25.98 ms | 25.99 ms | 6,555 |
| 1p/96c Genoa-X | Inception V4 | 29.19 ms | 29.18 ms | 5,879 |
| 1p/128c Bergamo | Inception V4 | 33.17 ms | 33.04 ms | 5,158 |
| 2p/128c Bergamo | Inception V4 | 30.63 ms | 30.68 ms | 5,573 |
| 2p/96c Genoa | DeepLab V3 | 25.51 ms | 25.33 ms | 5,660 |
| 1p/96c Genoa-X | DeepLab V3 | 28.26 ms | 27.86 ms | 5,394 |
| 1p/128c Bergamo | DeepLab V3 | 32.16 ms | 32.09 ms | 4,708 |
| 2p/128c Bergamo | DeepLab V3 | 31.16 ms | 30.57 ms | 4,807 |
| 2p/96c Genoa | YOLO V3 | 34.10 ms | 34.13 ms | 4,818 |
| 1p/96c Genoa-X | YOLO V3 | 43.59 ms | 43.58 ms | 3,831 |
| 1p/128c Bergamo | YOLO V3 | 44.50 ms | 44.39 ms | 3,739 |
| 2p/128c Bergamo | YOLO V3 | 41.35 ms | 41.38 ms | 4,001 |
| 2p/96c Genoa | Real-ESRGAN | 2540.04 ms | 2524.03 ms | 71 |
| 1p/96c Genoa-X | Real-ESRGAN | 3725.07 ms | 3720.35 ms | 49 |
| 1p/128c Bergamo | Real-ESRGAN | 2734.77 ms | 2717.41 ms | 66 |
| 2p/128c Bergamo | Real-ESRGAN | 2291.66 ms | 2301.35 ms | 79 |
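Average latencies are easier to compare once they are expressed relative to a baseline. The sketch below does that for the MobileNet V3 rows, using the tuned 2p/96c Genoa system as the reference. The times are copied from the table above; the function names are illustrative, and the inferences-per-second figure assumes inferences run one at a time, back to back, which may not match how the benchmark schedules its work.

```python
# Average inference times (ms) for MobileNet V3, copied from the table above.
MOBILENET_V3_AVG_MS = {
    "2p/96c Genoa": 3.61,
    "1p/96c Genoa-X": 2.71,
    "1p/128c Bergamo": 3.90,
    "2p/128c Bergamo": 4.10,
}


def inferences_per_second(avg_ms: float) -> float:
    """Rough serial throughput implied by an average latency in milliseconds."""
    return 1000.0 / avg_ms


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Values above 1.0 mean the candidate is faster than the baseline."""
    return baseline_ms / candidate_ms


baseline_ms = MOBILENET_V3_AVG_MS["2p/96c Genoa"]
for system, avg_ms in MOBILENET_V3_AVG_MS.items():
    print(
        f"{system}: {inferences_per_second(avg_ms):.0f} inferences/s, "
        f"{speedup(baseline_ms, avg_ms):.2f}x vs 2p/96c Genoa"
    )
```

On MobileNet V3, the single-socket Genoa-X comes out at roughly 1.33x the speed of the dual-socket Genoa baseline, while both Bergamo configurations land slightly below it.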

Final Thoughts

Our tests with the new 128-core AMD Bergamo CPU reflect the expected gains of the uptick in core count. Regarding raw performance, the new CPU handled data and compute-intensive tasks with an ease that seemed almost effortless. Our trials with 3D rendering and computation applications, in particular, showcased the true prowess of these extra cores.

We noted a significant boost in processing speeds over the 96-core Genoa, both with and without SMT enabled, highlighting the efficiency of AMD’s chiplet design. As we delve deeper into the era of advanced ultra-high core count computing, this 128-core, 256-thread monster sets a new benchmark in rack density.
