
Working with models on Lentiq

Working with prediction models in Lentiq follows the familiar develop-train-serve design pattern. Moving through the stages requires different Lentiq tools, but in general a good understanding of working with models in a notebook is enough.

Lentiq provides tools and mechanisms for handling each stage of the model lifecycle:

  1. Jupyter and Spark, together with our Data Store, for feature engineering and data discovery.
  2. Workflows and the LambdaBook feature, which allow a data scientist or data engineer to convert notebooks into Docker containers and run them as part of scheduled workflows.
  3. The Lentiq Model Server, a scalable, resilient, low-latency model server based on MLeap.

Model serving is the mechanism for exposing a prediction (a.k.a. inference) API to applications inside or outside of the data lake.

Model development

Model development typically happens in a notebook, on offline data that is usually a sample of the production data. During this phase a number of algorithms are tested and various features are "engineered" to improve the model's prediction accuracy.
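
For example, feature engineering in this phase usually means loading a sample of the data from the data store into Spark and deriving model inputs from it. A minimal sketch, assuming hypothetical storage paths and column names (they are illustrative, not Lentiq-specific):

    # Feature-engineering sketch in a Jupyter notebook attached to Spark.
    # The storage path and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("feature-engineering").getOrCreate()

    # Load a sample of the raw data (path is a placeholder)
    raw = spark.read.parquet("s3a://my-data-pool/events/").sample(fraction=0.1, seed=42)

    # Derive per-user features and a numeric label for training
    features = (raw.groupBy("user_id")
                   .agg(F.count("*").alias("event_count"),
                        F.avg("session_length").alias("avg_session_length"),
                        F.max(F.col("churned").cast("double")).alias("label")))
    features.show(5)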

In this phase the data scientist would try Logistic Regression, Random Forest, Neural Networks and so on. After selecting an algorithm, hyper-parameter optimization takes place to further tune it.
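
Continuing the sketch above, algorithm selection and hyper-parameter tuning can be expressed directly in Spark ML; the parameter grid and column names here are illustrative only:

    # Sketch of algorithm selection and hyper-parameter tuning with Spark ML.
    # Uses the "features" DataFrame from the previous sketch; values are illustrative.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

    assembler = VectorAssembler(inputCols=["event_count", "avg_session_length"],
                                outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    pipeline = Pipeline(stages=[assembler, lr])

    # Try a few regularization settings and keep the model with the best AUC
    grid = (ParamGridBuilder()
            .addGrid(lr.regParam, [0.01, 0.1, 1.0])
            .addGrid(lr.elasticNetParam, [0.0, 0.5])
            .build())
    cv = CrossValidator(estimator=pipeline,
                        estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(labelCol="label"),
                        numFolds=3)

    train, test = features.randomSplit([0.8, 0.2], seed=42)
    best_model = cv.fit(train).bestModel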

A notebook must first be "published" before it can be converted into a Reusable Code Block.

  1. Publishing notebooks

Model training

Training typically happens on production data, which is refreshed as new data flows into the data pool. This is where the LambdaBook feature (the ability to convert notebooks into Reusable Code Blocks) comes in handy.

  1. Working with Workflows
  2. Creating a Reusable code block based on a Notebook (LambdaBook)
  3. Creating a Reusable code block based on a Docker Image
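
Because the Model Server is based on MLeap, the last step of a training notebook is usually to refit the pipeline on the refreshed production data and serialize the fitted model as an MLeap bundle. A hedged sketch, assuming the mleap-pyspark package is installed and reusing the pipeline and feature code from the development sketches (paths are placeholders):

    # Final cell of a training notebook: refit on fresh data, then serialize
    # the fitted pipeline as an MLeap bundle. Paths are placeholders.
    import mleap.pyspark                                            # noqa: F401
    from mleap.pyspark.spark_support import SimpleSparkSerializer   # noqa: F401

    # Re-run the feature engineering on the full, refreshed production data
    fresh = spark.read.parquet("s3a://my-data-pool/events/")
    fresh_features = (fresh.groupBy("user_id")
                           .agg(F.count("*").alias("event_count"),
                                F.avg("session_length").alias("avg_session_length"),
                                F.max(F.col("churned").cast("double")).alias("label")))

    fitted = pipeline.fit(fresh_features)

    # MLeap needs a DataFrame that has been transformed by the fitted pipeline
    fitted.serializeToBundle("jar:file:/tmp/churn-model.zip",
                             fitted.transform(fresh_features))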

Model serving

The Lentiq Model Server "serves" models, which means it exposes an inference API that applications can call.

  1. Training and serving models
  2. Managing model servers
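
Once a model server is running, applications call it over HTTP. The exact endpoint and payload format depend on the deployment; the sketch below assumes an MLeap-style scoring route that accepts a LeapFrame-shaped JSON document (the URL, route, and field names are hypothetical):

    # Hedged sketch of requesting a prediction from a deployed model server.
    # The URL, route, and field names are hypothetical placeholders.
    import requests

    leap_frame = {
        "schema": {"fields": [
            {"name": "event_count", "type": "double"},
            {"name": "avg_session_length", "type": "double"},
        ]},
        "rows": [[42.0, 13.7]],
    }

    resp = requests.post("http://model-server.example.com/transform",
                         json=leap_frame, timeout=10)
    resp.raise_for_status()
    print(resp.json())   # transformed frame containing the prediction column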