DataFusion Virtual Database Memory Performance Benchmark

DataFusion is an enterprise in-memory virtual database, or data warehouse, with the flexibility and scalability to handle the most challenging data requirements. Its extended SQL-92 instruction set enables direct aggregation of disparate source data from existing locations. This allows organisations to leverage existing assets and deliver new requirements without the need for large and expensive data warehouse projects. With high levels of data compression and advanced execution strategies, DataFusion offers a hardware-light route to exceptional query performance. The seamless integration of industry-standard and custom analytics gives organisations the ability to rapidly assess the most demanding and dynamic situations. Fully featured for high performance, rapid deployment and high availability, DataFusion is a valuable asset for any “agile organisation”.

Large, data-intensive organisations in all industries, including financial services, the public sector and volume distributors, can benefit from DataFusion’s unique architecture, which is designed to assist any company faced with a “Big Data” challenge. Example solutions include real-time decision support, risk management, regulatory reporting and supply chain optimisation, particularly where data must be aggregated across multiple jurisdictions or legal entities.

DataFusion Benchmark

  • The DataFusion benchmark has been developed by Stream Financial to evaluate hardware configurations and is used as part of a suite of performance tests employed during DataFusion releases.
  • Whilst DataFusion can query across multiple architectures, this benchmark is a straightforward “where and group-by” query on a table of 1 trillion rows by 13 columns, performed “brute-force” with no indexing.
  • The trillion row data table is pre-stored on disk(s) in DataFusion’s native format.
  • To achieve the maximum performance from multi-channel or even mixed storage, DataFusion can stripe large tables across multiple mounts and round-robin across these mounts with different IO multipliers, making the best use of all available storage performance.
  • DataFusion can spread the execution of these large queries across multiple CPUs and machines, whilst optimising usage of both IO and memory.
  • To achieve the maximum benchmark performance, storage performance and CPU capacity can each be added independently of the other. This ensures that both servers and storage are being used to their full capacity.
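
The striping described above, round-robin across mounts with a different IO multiplier per mount, can be illustrated with a short Python sketch. This is not DataFusion's implementation, which is not described here. It is a minimal weighted round-robin under stated assumptions: the mount paths, the multiplier values and the function names build_schedule and assign_chunks are all hypothetical, and each multiplier is assumed to be a whole number proportional to the throughput of its mount.

```python
from collections import Counter
from itertools import cycle, islice


def build_schedule(multipliers):
    """Build one interleaved weighted round-robin cycle.

    `multipliers` maps a mount path to a positive integer IO multiplier
    (hypothetical values). Each pass visits every mount that still has
    credit, so faster mounts appear more often without being bunched.
    """
    if not multipliers or any(m < 1 for m in multipliers.values()):
        raise ValueError("each mount needs an integer multiplier >= 1")
    remaining = dict(multipliers)
    schedule = []
    while any(remaining.values()):
        for mount in remaining:
            if remaining[mount] > 0:
                schedule.append(mount)
                remaining[mount] -= 1
    return schedule


def assign_chunks(num_chunks, multipliers):
    """Assign table chunks 0..num_chunks-1 to mounts by repeating the cycle."""
    return list(islice(cycle(build_schedule(multipliers)), num_chunks))


# Hypothetical mixed storage: one fast, one medium and one slow mount.
mounts = {"/mnt/nvme0": 3, "/mnt/ssd0": 2, "/mnt/hdd0": 1}
placement = assign_chunks(12, mounts)
counts = Counter(placement)

for chunk_id, mount in enumerate(placement):
    print(f"chunk {chunk_id:2d} -> {mount}")
print(dict(counts))
```

With these assumed multipliers, the 12 chunks are split 6, 4 and 2 across the three mounts, so a scan that reads all mounts in parallel finishes each mount at roughly the same time, provided the multipliers reflect their relative throughput.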