Lentiq introduction

Lentiq is a lightweight, end-to-end data science, ML and analytics environment offered as a service. It is a cloud-native service based on Kubernetes. As such, it provides many of the elements that were traditionally offered by a Hadoop distribution:

  1. Application stack management: notebooks, Apache Spark cluster management, application provisioning, etc.
  2. Data management: data browsing, data documentation, etc.
  3. Model lifecycle management: workflows, automatic Docker image builds, etc.
  4. Team and enterprise-level collaboration facilities: common table metadata, cross-cloud data sharing, code sharing, etc.
  5. A portability layer enabling code and data to move between clouds without any adaptations.
  6. Large scale SQL access through our JDBC connector and SparkSQL engine, enabling interactive data access to external BI tools.

Lentiq's architecture departs completely from Hadoop: it uses Docker containers on a Kubernetes cluster instead of YARN, and object storage instead of HDFS.
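One practical consequence of replacing HDFS with object storage is that data paths change scheme. The sketch below is only an illustration, not Lentiq's own tooling: the helper name `to_object_storage_uri`, the bucket name and the default `s3a` scheme are assumptions chosen for the example.

```python
from urllib.parse import urlparse


def to_object_storage_uri(hdfs_uri: str, bucket: str, scheme: str = "s3a") -> str:
    """Rewrite an hdfs:// URI as an object storage URI, keeping the path.

    Hypothetical helper: the bucket and scheme are assumptions, not
    values defined by Lentiq.
    """
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got: {hdfs_uri}")
    # Drop the namenode host and port; only the path carries over.
    return f"{scheme}://{bucket}{parsed.path}"


example = to_object_storage_uri(
    "hdfs://namenode:8020/warehouse/events/2019", "team-bucket"
)
print(example)
```

The namenode address disappears because object storage has no equivalent of a namenode; the bucket takes its place in the URI.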

Comparison with Hadoop

Many elements from the Hadoop stack have an equivalent in Lentiq. The Migrating from Hadoop guide will help you get your bearings faster.

Other elements have no equivalent in Hadoop itself, but comparable functionality can be found in serverless services from some cloud providers, for example Amazon SageMaker.

The data pool

A fundamental concept in Lentiq is the data pool: a common pool of compute resources (a Kubernetes cluster) and data stored in object storage, plus the associated security context. A data pool is designed to serve a single, multi-disciplinary team working on one or more projects in parallel.
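To make the definition concrete, a data pool can be pictured as a small record that ties together its three parts. This is a conceptual sketch only; the class names, fields and example values are hypothetical and do not reflect Lentiq's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SecurityContext:
    """Hypothetical stand-in for the pool's security context."""
    members: List[str] = field(default_factory=list)


@dataclass
class DataPool:
    """Conceptual model: compute + object storage + security context."""
    name: str
    kubernetes_cluster: str   # the pool's compute resources
    object_storage_uri: str   # where the pool's data lives
    security: SecurityContext = field(default_factory=SecurityContext)
    projects: List[str] = field(default_factory=list)

    def can_access(self, user: str) -> bool:
        # In this sketch, access is simply team membership.
        return user in self.security.members


pool = DataPool(
    name="marketing-analytics",
    kubernetes_cluster="k8s-marketing",
    object_storage_uri="s3a://marketing-pool",
    security=SecurityContext(members=["ana", "bogdan"]),
    projects=["churn-model", "campaign-attribution"],
)
print(pool.can_access("ana"))
```

One team, one pool, several projects: the `projects` list shares the same cluster, storage and security context.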

[Figure: Lentiq high-level architecture]

The data pool graph

Multiple data pools form a graph which serves more or less the same function as an enterprise-wide data lake. There is no "master" data pool; each pool owns both its data and its code.

[Figure: Lentiq high-level architecture]

The data pools share a common "data store" which allows users to discover data, notebooks or reusable code blocks that live in other data pools and are managed by other teams. Users can then either access the data directly or create a local copy of it. This design pattern is sometimes called a "logical data lake".
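The discover-then-access-or-copy flow described above can be sketched as a shared catalog with no central owner. Everything here is an assumption made for illustration: `CatalogEntry`, `DataStore`, `access_directly` and `copy_locally` are hypothetical names, and no bytes are actually moved.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CatalogEntry:
    owner_pool: str   # the pool that owns the asset
    name: str
    kind: str         # "data", "notebook" or "code block"
    location: str     # URI inside the owner's object storage


class DataStore:
    """Hypothetical shared catalog; it indexes assets but owns none."""

    def __init__(self) -> None:
        self._entries: List[CatalogEntry] = []

    def publish(self, entry: CatalogEntry) -> None:
        self._entries.append(entry)

    def discover(self, requesting_pool: str) -> List[CatalogEntry]:
        # Return assets published by *other* pools.
        return [e for e in self._entries if e.owner_pool != requesting_pool]


def access_directly(entry: CatalogEntry) -> str:
    # Option 1: read the asset in place, from the owner's storage.
    return entry.location


def copy_locally(entry: CatalogEntry, target_pool_uri: str) -> str:
    # Option 2: compute the destination of a local copy (no I/O here).
    return f"{target_pool_uri.rstrip('/')}/{entry.name}"


store = DataStore()
store.publish(CatalogEntry("sales", "orders", "data", "s3a://sales-pool/orders"))
store.publish(CatalogEntry("research", "features", "data", "s3a://research-pool/features"))

found = store.discover(requesting_pool="research")
print(access_directly(found[0]))
print(copy_locally(found[0], "s3a://research-pool/"))
```

Ownership stays with the publishing pool in both cases; the catalog only records where each asset lives.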

Serverless

Lentiq is not a serverless service in the strict sense. You still provision servers behind the scenes (manually or automatically). However, the servers, along with object storage, load balancers, firewalls and so on, are abstracted away from the user and offered as a managed service.

This mechanism offers cloud portability, because a Kubernetes service is the common denominator across all clouds and on-premises environments. It also offers more control over resource allocation and, ultimately, over costs.

Copyright © 2019 Lentiq