Deploy a broad range of analytics in the public cloud quickly and easily.
CDP Data Hub is a powerful analytics service on Cloudera Data Platform (CDP) Public Cloud that makes it easier and faster to achieve high-value analytics from the Edge to AI in a familiar cluster model in the cloud. Featuring the widest range of analytical workloads—including streaming, ETL, data marts, databases, and machine learning—CDP Data Hub lets you easily move existing workloads from on premises to the cloud or build directly in the cloud.
The comprehensive, cloud-based solution is powered by Cloudera Runtime, a suite of integrated open source technologies, and built on SDX. It offers extensive choices in cluster shapes, workload types, pre-built templates, and configuration options, delivering an intuitive, customizable experience for users who are comfortable with traditional architectures.
Data Hub use cases
Simplify your journey to cloud
Easily lift and shift on-premises Cloudera workloads to the public cloud thanks to a platform that spans both public and private clouds and provides:
- The improved performance, robust governance, and availability of public cloud
- The flexibility to optimize your workloads in both deployment models
- The benefits of a familiar form factor with a traditional cluster model facilitating your move to the cloud
- A seamless migration path to CDP’s containerized experiences
Deploy complex multi-analytic workloads quickly
Speed up the deployment of complex workloads in the public cloud across the data lifecycle with:
- A cloud-based architecture that lets you deploy a wide variety of flexible, custom analytics workloads
- An intuitive experience built on familiar node-based clusters, whether you choose a templated approach or build your own workloads
- A high degree of customization, allowing you to deploy workloads tailor-made for your specific business requirements
Real-time data mart
Enable analytics on high volumes of fast-arriving data.
The Real-time Data Mart template in Data Hub lets you ingest millions of records per second, with in-place updates as needed. The data is immediately available in an optimal format for querying. This pattern is ideal for time-series applications, event analytics, CDC reconciliation, and real-time data processing pipelines. The template features the Apache Kudu analytic storage engine, Apache Impala for fast SQL execution, Hue for SQL development and analysis, and Apache Spark Streaming for stream processing and analytics.
Data Engineering for complex pipelines
Enrich, transform, and load data.
Data Hub enables you to enrich, transform, and cleanse data in order to create, execute, and manage end-to-end data pipelines with high degrees of flexibility and customization. The Data Engineering template enables you to execute a wide range of data processing workloads including batch and real-time stream processing using Apache Spark and Hive.
Streaming on hybrid cloud
Collect, process, and build real-time analytics.
DataFlow for CDP Data Hub is a comprehensive edge-to-cloud streaming data platform that addresses streaming data challenges across hybrid environments with Apache NiFi and Kafka. It lets users extend the on-premises streaming experience of Cloudera DataFlow to the cloud without the heavy cost of developing, configuring, and maintaining it.
Build highly reliable enterprise-class applications.
Data Hub allows you to run high-performance NoSQL databases with support for ANSI SQL. Built on Apache HBase, this provides unparalleled scale and performance for business-critical operational applications. Operational Database provides evolutionary schema support that enables developers to leverage the power of data while preserving flexibility in application design. It also auto-scales based on the cluster's workload utilization to optimize infrastructure use and cost.
Data Hub is for users who want flexibility, scalability, and ease of use. It allows you to rearrange worker roles, configure GPU support, adjust resource management settings, and tune clusters to implement complex, multi-function analytics use cases at scale.
Data Hub clusters can be provisioned and disposed of quickly with pre-built or custom configuration options for infrastructure. Pre-configured cluster definitions with cloud provider-specific settings and cluster templates with Cloudera Runtime service configurations allow you to quickly provision workload clusters for prescriptive use cases. You can also save your own cluster definitions and templates for future reuse.
Data Hub enables you to easily move your legacy workloads in a familiar form factor to a cloud model. The cloud-based architecture decouples data from the compute infrastructure, and the data delivery layer is abstracted from raw data. This decoupled architecture significantly improves flexibility, agility, data protection, and scale.
It’s easy to provision multiple clusters on shared data, so customers can launch new applications that can be fully isolated with the right security and governance and without interrupting existing production applications.
Data Hub is underpinned by Cloudera SDX, which lets you secure and govern platform data and metadata and control capabilities through dedicated, integrated management interfaces. Data security, governance, and control policies are set once and consistently enforced everywhere, reducing operational costs and business risks while also enabling complete infrastructure choice and flexibility.
Data Hub is built on Cloudera Runtime, the core open source software distribution within CDP, which includes approximately 50 open source projects. Runtime lets you choose the right set of open source tools to build your workloads and applications.
See how CDP lets companies build end-to-end data pipelines for hybrid cloud, with integrated security and governance.
Discover CDP video tour
Look under the hood of Cloudera Data Platform with a video tour showcasing how it manages and secures the data lifecycle.
Get started with a step-by-step tutorial teaching you how to create, resize, and terminate Data Hubs on Cloudera Data Platform.
Take Cloudera Essentials for CDP and learn how it enables both business teams and IT staff to be more productive by turning data into actionable insight.
Evaluate pricing, billing terms, licensing details, and hourly rates as well as estimate costs with handy calculators.
Get started on the right foot with resource planning, product configuration, and product management best practices.