Google Releases Cloud Dataproc In Beta

Google has announced another new technology in its Cloud Platform line: Google Cloud Dataproc. The service is aimed at making Hadoop and Spark easier to deploy and manage within Google Cloud Platform. Much like the recent announcement from Dell and Cloudera, it allows the use of Hadoop without the high training costs usually involved.


As datasets continue to grow in size and complexity, more powerful tools will be needed to analyze them. The tools exist, but they often add another layer of complexity, and it can be costly to train administrators on new technologies or bring in consultants. Google is introducing Dataproc, an automated, managed service for Hadoop and Spark. With Dataproc, users can take advantage of open source data tools for batch processing, querying, streaming, and machine learning while relying on its automation to create and manage clusters quickly. Dataproc also allows clusters to be turned off when not in use, which helps save costs because billing is minute-by-minute.

Benefits include:

  • Cloud Dataproc is priced at only 1 cent per virtual CPU in a customer’s cluster per hour, on top of the other Cloud Platform resources used. Cloud Dataproc clusters can include preemptible instances that have lower compute prices, reducing costs further. Instead of rounding usage up to the nearest hour, Cloud Dataproc charges customers only for what is used, with minute-by-minute billing and a ten-minute minimum billing period.
  • Without using Dataproc, it can take anywhere from 5 to 30 minutes to create Spark and Hadoop clusters on-premises or through IaaS providers. By comparison, Cloud Dataproc clusters are quick to start, scale, and shut down, with each of these operations taking 90 seconds or less, on average. This means users can spend less time waiting for clusters and more hands-on time working with their data.
  • Cloud Dataproc has built-in integration with other Google Cloud Platform services, such as BigQuery, Cloud Storage, Cloud Bigtable, Cloud Logging, and Cloud Monitoring, so customers have more than just a Spark or Hadoop cluster: they have a complete data platform. For example, they can use Cloud Dataproc to ETL terabytes of raw log data directly into BigQuery for business reporting.
  • Customers can easily interact with clusters and Spark or Hadoop jobs through the Google Developers Console, the Google Cloud SDK, or the Cloud Dataproc REST API. When they're done with a cluster, they can simply turn it off so money isn’t wasted on an idle cluster. There is no worry about losing data, because Cloud Dataproc is integrated with Cloud Storage, BigQuery, and Cloud Bigtable.
  • There is no need to learn new tools or APIs to use Cloud Dataproc, making it easy to move existing projects into Cloud Dataproc without redevelopment. Spark, Hadoop, Pig, and Hive are frequently updated, so users can be productive faster.
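To make the pricing bullet concrete, here is a minimal sketch of how the Dataproc fee might be estimated from the figures quoted above ($0.01 per virtual CPU per hour, minute-by-minute billing, ten-minute minimum). The function names, the 24-vCPU cluster, and the choice to round partial minutes up are illustrative assumptions, not Google's published billing logic, and the result excludes the underlying Compute Engine and storage charges.

```python
import math

# Figures quoted in the article.
DATAPROC_RATE_PER_VCPU_HOUR = 0.01  # USD per virtual CPU per hour
MINIMUM_BILLED_MINUTES = 10         # ten-minute minimum billing period


def billed_minutes(runtime_minutes: float) -> int:
    """Minutes charged for one cluster run.

    Assumption: partial minutes are rounded up to the next whole minute.
    """
    return max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_minutes))


def dataproc_fee(vcpus: int, runtime_minutes: float) -> float:
    """Dataproc surcharge only; other Cloud Platform resources cost extra."""
    hours = billed_minutes(runtime_minutes) / 60
    return vcpus * DATAPROC_RATE_PER_VCPU_HOUR * hours


def hourly_rounded_fee(vcpus: int, runtime_minutes: float) -> float:
    """Same rate, but rounded up to whole hours, shown for comparison."""
    hours = max(1, math.ceil(runtime_minutes / 60))
    return vcpus * DATAPROC_RATE_PER_VCPU_HOUR * hours


if __name__ == "__main__":
    # Hypothetical cluster: 24 vCPUs running a 25-minute job.
    print(f"per-minute billing: ${dataproc_fee(24, 25):.2f}")
    print(f"hourly rounding:    ${hourly_rounded_fee(24, 25):.2f}")
```

Under these assumptions, a 25-minute job on the hypothetical 24-vCPU cluster incurs a Dataproc fee of about $0.10, versus $0.24 if usage were rounded up to a full hour. Shorter jobs would still be billed for the ten-minute minimum.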

Availability and pricing

Google Cloud Dataproc is available now as a beta service and starts at $0.01 per virtual CPU per hour.


Adam Armstrong

Adam is the chief news editor for StorageReview.com, managing our internal and freelance content teams.
