- Application stack management: notebooks, Apache Spark cluster management, application provisioning, etc.
- Data management: data browsing, data documentation, etc.
- Model lifecycle management: workflows, automatic Docker image builds, etc.
- Team and enterprise-level collaboration facilities: common table metadata, cross-cloud data sharing, code sharing, etc.
- A portability layer enabling code and data to move between clouds without adaptation.
The data pool
A fundamental concept in the Lentiq product is the data pool: a shared pool of compute resources (a Kubernetes cluster), data stored in object storage, and the associated security context. A data pool is designed to serve a single, multi-disciplinary team working on one or more projects in parallel.
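The three ingredients of a data pool can be sketched as a simple data structure. This is a minimal illustration, not Lentiq's actual API; all class, field, and resource names below are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SecurityContext:
    # Hypothetical stand-in for the pool's security context.
    allowed_users: set

@dataclass
class DataPool:
    # A data pool: compute (a Kubernetes cluster), data (an object-storage
    # bucket), and a security context, serving one multi-disciplinary team.
    name: str
    kubernetes_cluster: str      # compute resources
    storage_bucket: str          # object storage holding the pool's data
    security: SecurityContext
    projects: list = field(default_factory=list)  # one or more parallel projects

# Example: one team's pool running two projects in parallel (illustrative values).
team_pool = DataPool(
    name="marketing-analytics",
    kubernetes_cluster="k8s-cluster-eu-1",
    storage_bucket="s3://marketing-analytics-data",
    security=SecurityContext(allowed_users={"ana", "bob"}),
    projects=["churn-model", "campaign-dashboard"],
)
```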
The data pool graph
Multiple data pools form a graph that serves much the same function as an enterprise-wide data lake. There is no "master" data pool; each pool owns both its data and its code.
The data pools share a common "data store", which allows users to discover data, notebooks, or reusable code blocks present in other data pools and managed by other teams. Users can then either access the data directly or create a local copy. This design pattern is sometimes called a "logical data lake".
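The logical-data-lake pattern described above can be sketched as a shared catalog that pools publish to and discover from, with the consumer choosing between direct access and a local copy. This is an illustrative model only; the class names and methods are assumptions, not Lentiq's actual interfaces.

```python
class SharedDataStore:
    """Shared catalog of assets (data, notebooks, code blocks) across pools."""

    def __init__(self):
        self._catalog = {}  # asset name -> name of the owning pool

    def publish(self, pool_name, asset):
        self._catalog[asset] = pool_name

    def discover(self, asset):
        # Returns the owning pool's name, or None if no pool published it.
        return self._catalog.get(asset)


class Pool:
    """A data pool that publishes its own assets and consumes others'."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.local_assets = set()

    def publish(self, asset):
        self.local_assets.add(asset)
        self.store.publish(self.name, asset)

    def access(self, asset, copy=False):
        owner = self.store.discover(asset)
        if owner is None:
            raise KeyError(f"{asset!r} not found in any pool")
        if copy and owner != self.name:
            # Materialize a local copy instead of reading remotely.
            self.local_assets.add(asset)
        return owner  # identifies whose data is accessed directly otherwise


# Example: the data-science team discovers an asset owned by the sales pool
# and creates a local copy of it.
store = SharedDataStore()
sales = Pool("sales", store)
science = Pool("data-science", store)
sales.publish("orders.parquet")
owner = science.access("orders.parquet", copy=True)
```

Note there is no central pool here: every `Pool` both publishes and consumes, mirroring the claim that ownership of data and code stays with each team.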