New technology boosts data compression for analytics and AI 

- September 24, 2024

Ninety percent of the world’s data was created in the last two years. And every two years, the volume of data across the world doubles in size.

Information technology is a cornerstone of our society today. What many don’t always recognize is that the efficiency of these solutions relies on the speed of processing and transmitting data.

As an example, retailers leaning on analytics to improve in-store performance must gather, analyze and streamline incredibly large volumes of data, from transaction records to image transmission, in order to ultimately deliver results.

At the same time, because data transfer can consume so much time, there is a core need for effective data compression.
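
To see why compression has such an outsized effect on transfer time, consider a minimal, hypothetical sketch in Python. The payload, compression level and bandwidth figure below are assumptions for illustration only; they are not SQream’s technology or benchmark data.

```python
# Illustrative only: a back-of-the-envelope look at why compression matters
# for data transfer. All figures here are hypothetical.
import zlib

# A repetitive, analytics-style payload (CSV-like rows). Real data compresses
# less dramatically than identical repeated rows, but the principle holds.
payload = b"store_id,sku,qty,price\n" + b"1042,SKU-9001,3,19.99\n" * 1_000_000

compressed = zlib.compress(payload, level=6)
ratio = len(payload) / len(compressed)

bandwidth_mb_s = 100  # assumed network throughput in MB/s
raw_seconds = len(payload) / (bandwidth_mb_s * 1024 * 1024)
compressed_seconds = len(compressed) / (bandwidth_mb_s * 1024 * 1024)

print(f"raw size:        {len(payload) / 1024 / 1024:.1f} MB")
print(f"compressed size: {len(compressed) / 1024 / 1024:.2f} MB  ({ratio:.0f}x smaller)")
print(f"transfer time:   {raw_seconds:.2f}s raw vs {compressed_seconds:.4f}s compressed")
```

The transfer time shrinks by roughly the same factor as the compression ratio, which is why compression sits at the heart of large-scale analytics pipelines.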

SQream, a data acceleration platform, recently ran the TPC Express Big Bench (TPCx-BB) benchmark from the nonprofit Transaction Processing Performance Council to test the performance of its data lakehouse solution, SQream Blue.

During the test, SQream handled 30 TB of data 3X faster than Databricks’ Spark-based Photon SQL engine, at roughly a third of the price. SQream Blue’s total runtime was 2,462.6 seconds, with an end-to-end processing cost of $26.94; Databricks’ total runtime was 8,332.4 seconds, at a cost of $76.94.
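
For readers who want to verify the headline ratios, the arithmetic on the published figures is straightforward. A quick check in Python, using only the numbers quoted above:

```python
# Sanity-check of the published TPCx-BB figures (runtimes in seconds, costs in USD).
sqream_runtime, sqream_cost = 2462.6, 26.94
databricks_runtime, databricks_cost = 8332.4, 76.94

print(f"speedup:    {databricks_runtime / sqream_runtime:.2f}x")  # ~3.4x faster
print(f"cost ratio: {sqream_cost / databricks_cost:.2f}")         # ~0.35, roughly a third
```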

The performance of SQream Blue, according to the company, is equivalent to reading every cataloged book in the US Library of Congress in under an hour, and then buying them all for less than $25. 

Matan Libis, VP of Product at SQream, said: “In cloud analytics, cost performance is the only factor that matters. SQream Blue’s proprietary complex engineering algorithms offer unparalleled capabilities, making it the top choice for heavy workloads when analyzing structured data.”

SQream’s Matan Libis (Photo Credit: SQream)

The results of the test broke existing benchmark records on several fronts, providing a glimpse into a new era of high-performance big data tools.

The challenges of data compression will not be going away anytime soon, as global data volumes continue to double every two years.

To test the capabilities of the solutions, the benchmark analysis was run on Amazon Web Services (AWS) with a 30 TB dataset. The generated data was stored as Apache Parquet files on Amazon Simple Storage Service (Amazon S3), and the queries were processed without pre-loading the data into a database.
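
The query-in-place pattern described here, reading open formats straight from object storage rather than ingesting them first, can be illustrated with a small sketch. DuckDB stands in for the engine purely for illustration; this is not SQream Blue or the official TPCx-BB harness, and the bucket, paths and column names are invented for the example.

```python
# Illustrative sketch: running SQL directly over Parquet files in S3,
# with no pre-loading into a database. DuckDB is a stand-in engine;
# the bucket and schema below are hypothetical.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")               # extension enabling s3:// paths
con.execute("LOAD httpfs")
con.execute("SET s3_region = 'us-east-1'")  # assumed region; S3 credentials are
                                            # expected to be configured separately

# Aggregate in place over the Parquet files sitting in object storage.
rows = con.execute("""
    SELECT store_id, SUM(sales_amount) AS total_sales
    FROM read_parquet('s3://example-bucket/benchmark/store_sales/*.parquet')
    GROUP BY store_id
    ORDER BY total_sales DESC
    LIMIT 10
""").fetchall()
print(rows)
```

Because the engine scans the files where they already live, there is no duplicate copy of the data to load, store or keep in sync.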

According to a statement from the company, the solution’s architecture contributes to its streamlined efficiency.

The technology directly accesses data in open-standard formats in the customer’s low-cost cloud storage to maintain privacy and ownership, preserving a single source of data and eliminating the need for duplication.