
The Math of Performance: Stress Testing Hardware for Math-Changing Results

by Guest Author

We doubled the decimal places record for log(2) to 3 trillion digits in just 42.7 hours, an efficiency improvement of about 50x in 3 years.

The following is from Jorge Zuniga, an independent civil engineer focused on finding mathematical recipes. Jorge has been working with Jordan on leveraging servers and storage in our lab to make fundamental advancements in mathematics. We’ve been thrilled to participate in this research with Jorge, and our little teaser social media video from a couple of weeks ago garnered tremendous support.

 


Since that video, we’ve continued to work with Jorge to prove out models on workstations, then run them in full on the Supermicro E1.S server, filled with KIOXIA XD7P E1.S SSDs. The density of the platform and the ability to keep so many drives close to the CPU make this a great platform for this kind of mathematics research.

We’re proud to present Jorge’s findings here. – Brian Beeler

Stress testing is a sound quality-control practice for determining the actual capacity of a given hardware installation, and number crunching is a widespread way to carry it out. Multiple professional-grade tools are available to evaluate a system’s real performance accurately.

One, developed in Austria and a favorite of mine, is Matthias Zronek’s BenchMate, which contains several demanding stress tests for SSDs, memory, GPUs, and CPUs. Other high-performance computing packages, such as Zronek’s GPUPI, calculate the constant π using the GPU. Calculating decimal places of π is a standard and common test, and it has often appeared in the specialized scientific press and on mathematics-focused social networks whenever the record of known decimals has been broken. Today, 100 trillion decimal places (10^14) of π are known.

BenchMate also includes Alex Yee’s y-cruncher, a platform for multicore CPU computation of many mathematical constants, including π, with extremely high precision, delivering a huge number of decimal places limited only by the system’s capacity. Several strategies can be followed here, such as choosing a constant, stressing the system to calculate a sufficiently large number of decimal places, and recording the elapsed time. This provides an excellent measure to rank and compare the performance of different isolated setups.
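The timing strategy described above can be sketched in a few lines of Python. This is only an illustration and not how y-cruncher works internally: it relies on the standard library's decimal module to compute log(2), and the function name time_log2 and the digit counts are assumptions chosen for the example.

```python
import time
from decimal import Decimal, localcontext


def time_log2(digits):
    """Compute log(2) to `digits` significant digits and time the call.

    Illustrative helper (hypothetical name); it uses Python's built-in
    arbitrary-precision decimal arithmetic, not y-cruncher.
    """
    with localcontext() as ctx:
        ctx.prec = digits                      # working precision in digits
        start = time.perf_counter()
        value = Decimal(2).ln()                # natural logarithm of 2
        elapsed = time.perf_counter() - start
    return value, elapsed


# Time the same constant at increasing sizes to see how the cost grows.
for digits in (1000, 2000, 4000):
    _, seconds = time_log2(digits)
    print(f"{digits} digits: {seconds:.4f} s")
```

Running the same loop on two machines gives a crude ranking of the kind the text describes, although a real benchmark would use far larger digit counts and exercise RAM and storage as well.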

Also, if there is enough installed capacity, a system can be set up to break the record for the known number of decimal places of a given constant, a challenge taken on at StorageReview with great success. As reported in the December 2023 note, I worked with the StorageReview team to extend the known number of decimal places for several mathematical constants. Not only did we beat these records, but we did it in a fraction of the usual time. Historically, such computations have taken weeks or months. Working with StorageReview, we achieved this for all constants in just hours and, in some cases, doubled the number of known decimals. The technology available at StorageReview has enabled time reductions by orders of magnitude. A summary of these results and their details can be found here.

Record results set by y-cruncher. For a complete list of records set by y-cruncher, visit numberworld.

This note refers precisely to these results. To achieve a decimal places record, you must act in three layers.

Starting from the bottom, the third, or final layer, is the hardware setup: a robust SSD installation, high-performance RAM capacity, and a state-of-the-art multicore CPU. Many of these details and the setup applied can be seen online here.

The second, or intermediate layer, is the software, i.e., y-cruncher, which links the initial and final layers. To increase performance, y-cruncher ships separate executables, each tuned for a different type of multicore CPU, and the right one is chosen automatically at run time. The user selects a constant and the algorithm to apply to it. This algorithm is either built into y-cruncher or, if not, supplied through custom configuration files. Details of y-cruncher implementation and use can be viewed on numberworld. y-cruncher is constantly in development, keeping pace with new hardware technologies.

The first, or initial layer, is the algorithm, or rather the mathematical formula implemented to calculate the constant, which serves as input for y-cruncher. Each constant can be represented by an infinite number of formulas, almost all of which perform poorly, in the sense that they need many mathematical operations to deliver only a few correct digits.

On the other side are the efficient formulas. Among those, hypergeometric series stand out. Within that family, a handful are suitable for breaking records because they provide many decimals in a relatively short time. In fact, the primary formula for calculating π, known as the Chudnovsky algorithm, is one of the hypergeometric formulas.
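To make the gap between slow and fast formulas concrete, here is a small Python sketch that computes log(2) to 50 decimals in two ways and counts the series terms each one needs. Neither is the series found in this work: the slow one is the textbook sum of 1/(k·2^k), and the fast one is a classical Machin-like combination of atanh values, used here only as a stand-in for an efficient formula. The function names and the 10 guard digits are assumptions made for the example.

```python
def log2_slow(digits, guard=10):
    """log(2) = sum over k >= 1 of 1 / (k * 2**k), in fixed-point integers."""
    one = 10 ** (digits + guard)       # scaled representation of 1.0
    total, k, power = 0, 1, 2
    while True:
        term = one // (k * power)
        if term == 0:                  # term dropped below working precision
            break
        total += term
        k += 1
        power *= 2
    return total // 10 ** guard, k - 1  # (decimals as an integer, terms used)


def atanh_inv(n, one):
    """Fixed-point atanh(1/n) = sum over k >= 0 of 1 / ((2k+1) * n**(2k+1))."""
    power = one // n
    total, terms = 0, 0
    while power:
        total += power // (2 * terms + 1)
        power //= n * n
        terms += 1
    return total, terms


def log2_fast(digits, guard=10):
    """log(2) = 18*atanh(1/26) - 2*atanh(1/4801) + 8*atanh(1/8749)."""
    one = 10 ** (digits + guard)
    a, terms_a = atanh_inv(26, one)
    b, terms_b = atanh_inv(4801, one)
    c, terms_c = atanh_inv(8749, one)
    total = 18 * a - 2 * b + 8 * c
    return total // 10 ** guard, terms_a + terms_b + terms_c


slow_value, slow_terms = log2_slow(50)
fast_value, fast_terms = log2_fast(50)
print("same digits:", slow_value == fast_value)
print("terms needed:", slow_terms, "vs", fast_terms)
```

Both routines return the same 50 decimals, but the Machin-like combination gets there with far fewer terms because each term contributes several digits instead of a fraction of one. The record-grade series discussed in this note widen that gap further.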

Working closely with the StorageReview team, we have also undertaken the challenge of searching for these formulas, that is, acting on the first layer to find a hypergeometric series efficient enough to be the fastest formula known. How is this accomplished? A multicore setup is needed, ideally with as many physical cores as possible, to apply distributed computing, since the process is very CPU demanding but can be easily parallelized with a great reduction in computation time. The new formula, if successful, will be used to break the record of known decimal places for that constant.

We used a 64-core AMD Threadripper PRO 5995WX and wrote scripts in PARI/GP, a number theory platform from the University of Bordeaux (France), to implement a search algorithm. The objective is to identify a set of 64-bit integers that, when inserted as parameters into a particular hypergeometric series with a known fixed structure, yield the constant we are seeking. To do this, we use the LLL algorithm, which is built into PARI/GP and looks for integer linear relationships between multiple-precision floating-point values. More mathematical details can be found on mathoverflow.net.
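The idea of an integer linear relationship can be shown with a toy Python example. It is not the LLL algorithm and not the PARI/GP scripts used in this work: it simply brute-forces small integer coefficients in double precision, which only works for tiny search spaces. The function name find_relation, the coefficient bound, and the tolerance are assumptions for illustration.

```python
import math
from itertools import product


def find_relation(target, basis, max_coeff=10, tol=1e-12):
    """Brute-force search for integers c_i with sum(c_i * basis_i) ~= target.

    A toy stand-in for LLL: exhaustive, double precision only, and
    practical only for very few basis values and small coefficients.
    """
    coeffs = range(-max_coeff, max_coeff + 1)
    for combo in product(coeffs, repeat=len(basis)):
        value = sum(c * b for c, b in zip(combo, basis))
        if abs(value - target) < tol:
            return combo
    return None


# Recover the known identity log(2) = 4*atanh(1/7) + 2*atanh(1/17).
basis = [math.atanh(1 / 7), math.atanh(1 / 17)]
relation = find_relation(math.log(2), basis)
print(relation)
```

The search recovers the coefficients of a known identity for log(2). A real formula search works at hundreds or thousands of digits with many more unknowns, which is why a lattice method such as LLL and many CPU cores are needed instead of exhaustive enumeration.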

We started with the constant ζ(5) = 1.036927755143… since it doesn’t have a very efficient known formula, but it turned out to be very elusive. Our search found nothing beyond the one hypergeometric series already known, which, by the way, isn’t fast enough. After a couple of unsuccessful weeks, we switched to hypergeometric series for logarithms.

In this case, we had success: we were able to find the fastest known algorithms for the fundamental constants log(2), log(3), and log(5), as described in this blog.

Log(2) formula search. All 64 physical cores are busy.


Screen Capture. Fastest Log(2) series found.


Log(2) StorageReview G2 Formula Found

After uncovering the formulas, we prepared the scripts for y-cruncher. StorageReview’s Jordan Ranous designed the installation to beat the number of known decimal places. In this case, the setup was based on 2x Intel Xeon Platinum 8460H and 512 GB of SK Hynix RAM.

This setup doubled the decimal places record for log(2) to 3 trillion digits in just 42.7 hours. A second new algorithm (the G2 formula, also found in the StorageReview Lab) was applied to verify the decimals, which took 58.3 hours of wall time. For comparison, the previous record of 2021 took 98.9 and 61.7 days for the calculation and verification of 1.5 trillion decimal digits, respectively. This means an efficiency improvement of about 50x in 3 years.
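The speedup can be checked with simple arithmetic on the figures quoted above. The snippet below is one plausible way to read the comparison, using plain wall-time ratios; the variable names are assumptions, and it does not claim to reproduce exactly how the "about 50x" figure was derived.

```python
HOURS_PER_DAY = 24

# Figures quoted in the text above.
new_compute_h, new_verify_h = 42.7, 58.3         # this run, 3 trillion digits
old_compute_days, old_verify_days = 98.9, 61.7   # 2021 record, 1.5 trillion digits

# Wall-time ratios, old run versus new run.
compute_speedup = old_compute_days * HOURS_PER_DAY / new_compute_h
verify_speedup = old_verify_days * HOURS_PER_DAY / new_verify_h

print(f"compute: {compute_speedup:.1f}x, verify: {verify_speedup:.1f}x")
```

With these inputs the computation ratio lands in the mid-50s and the verification ratio in the mid-20s, and that is before accounting for the new run producing twice as many digits.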

We have been able to perform all the required steps to produce useful results. A new formula for a mathematical constant, log(2), was discovered using a specially tailored installation. This formula was then applied to a custom-made setup to break the record for the number of decimal places known for this particular constant.

StorageReview facilities have made it possible to cover the complete process of preparing bench tests based on numerical calculations that put a system under great stress. These experiments successfully tested SSD, RAM, and CPU to the extreme.

– Jorge Zuniga
