Lentiq is a lightweight, end-to-end data science, ML, and analytics environment offered as a service. It is cloud-native and based on Kubernetes. As such, it provides many of the elements that were traditionally offered by a Hadoop distribution:
- Application stacks management: Notebooks, Apache Spark cluster management, application provisioning etc.
- Data management: data browsing, data documentation etc.
- Model lifetime management: workflows, automatic docker image build etc.
- Team and enterprise-level collaboration facilities: common table metadata, cross-cloud data sharing, code sharing etc.
- A portability layer enabling code and data to move between clouds without any adaptations.
- Large-scale SQL access through our JDBC connector and SparkSQL engine, giving external BI tools interactive access to data.
Lentiq's architecture departs completely from Hadoop: it uses Docker containers on a Kubernetes cluster instead of YARN, and object storage instead of HDFS.
Comparison with Hadoop
Many elements from the Hadoop stack have an equivalent in Lentiq. The Migrating from Hadoop guide will help you get your bearings faster.
Other elements have no equivalent service in Hadoop per se, but similar functionality is available as serverless services from some cloud providers, such as Amazon SageMaker.
The data pool
A fundamental concept in Lentiq is the data pool: a common pool of compute resources (a Kubernetes cluster) and data stored in object storage, plus the associated security context. A data pool is designed to serve a single, multi-disciplinary team working on one or more projects in parallel.
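As a rough mental model only, a data pool can be pictured as a record that ties together the three parts named above: a Kubernetes cluster, an object storage location, and a security context. The sketch below is not Lentiq's API. Every class name, field name, and value in it (`DataPool`, `SecurityContext`, the cluster identifier, the bucket URI) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SecurityContext:
    """Hypothetical: the set of users allowed to work in a data pool."""
    members: frozenset = frozenset()

    def allows(self, user: str) -> bool:
        # A user may use the pool only if they belong to its team.
        return user in self.members


@dataclass
class DataPool:
    """Hypothetical model of a data pool: compute + storage + security."""
    name: str
    kubernetes_cluster: str   # identifier of the compute cluster (assumed)
    object_storage_uri: str   # bucket or container location (assumed)
    security: SecurityContext
    # One team may run several projects in parallel in the same pool.
    projects: list = field(default_factory=list)


pool = DataPool(
    name="marketing-analytics",
    kubernetes_cluster="k8s-eu-1",
    object_storage_uri="s3://example-bucket/marketing",
    security=SecurityContext(frozenset({"alice", "bob"})),
    projects=["churn-model", "campaign-dashboard"],
)

print(pool.security.allows("alice"))  # a team member
print(pool.security.allows("carol"))  # not a team member
```

The point of the sketch is that compute, data, and access rules travel together as one unit, which is what lets a single team own its pool independently of other teams.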
The data pool graph
Multiple data pools form a graph that serves more or less the same function as an enterprise-wide data lake. There is no "master" data pool: each pool owns both its data and its code.
The data pools share a common "data store" which allows users to discover data, notebooks, or reusable code blocks that are present in other data pools and managed by other teams. Users can then either access the data directly or create a local copy of the data. This design pattern is sometimes called a "logical data lake".
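The difference between accessing shared data directly and creating a local copy can be illustrated with a small in-memory sketch. This is a conceptual toy under stated assumptions, not how Lentiq implements its data store: the `DataStore` class, its methods, and the pool and dataset names are all hypothetical.

```python
import copy


class DataStore:
    """Hypothetical shared catalog: dataset name -> (owning pool, data)."""

    def __init__(self):
        self._entries = {}

    def publish(self, pool_name, dataset_name, data):
        # The publishing pool remains the owner of the dataset.
        self._entries[dataset_name] = (pool_name, data)

    def discover(self, keyword):
        # Return the names of all published datasets matching a keyword.
        return sorted(name for name in self._entries if keyword in name)

    def owner(self, dataset_name):
        return self._entries[dataset_name][0]

    def read(self, dataset_name):
        # Direct access: the caller sees the owner's data, no copy is made.
        return self._entries[dataset_name][1]

    def copy_to(self, dataset_name, local_datasets):
        # Local copy: later changes do not affect the owner's data.
        local_datasets[dataset_name] = copy.deepcopy(self.read(dataset_name))


store = DataStore()
store.publish("sales-pool", "sales_2023", [{"region": "EU", "revenue": 120}])
store.publish("sales-pool", "sales_forecast", [{"region": "EU", "revenue": 150}])
store.publish("hr-pool", "headcount", [{"team": "data", "people": 7}])

found = store.discover("sales")

local = {}
store.copy_to("sales_2023", local)
local["sales_2023"][0]["revenue"] = 0  # edit only the local copy

print(found)
print(store.owner("sales_2023"), store.read("sales_2023"))
```

Because no pool is the "master", the catalog only records who owns what. Direct reads keep a single source of truth with the owning team, while a copy trades freshness for independence.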
Lentiq is not a serverless service in the strict sense: servers are still provisioned behind the scenes, either manually or automatically. However, these servers, along with object storage, load balancers, firewalls etc., are abstracted away from the user and offered as a managed service.
This mechanism offers cloud portability because Kubernetes is the common denominator across clouds and on-premises environments. It also offers more control over resource allocation and, ultimately, over costs.