Documentation

Documentation

  • Home
  • Blog
  • API
  • Contact

›User Guide

Overview

  • Lentiq introduction
  • Lentiq architecture
  • What is a data pool?
  • What is a project?
  • Migrating from Hadoop

Getting started

  • Deploying applications and processing clusters
  • Connecting to Spark from a notebook
  • Uploading data to Lentiq
  • Creating a data pool
  • Deploying on GCP
  • Deploying on AWS

User Guide

    Managing applications

    • Working with applications
    • Managing compute resources

    Managing data

    • Working with data and metadata
    • Sharing data between data pools
    • Querying data with SQL (DataGrip)
    • Connecting Tableau to Lentiq

    Managing models

    • Working with models
    • Publishing notebooks
    • Training and serializing a model
    • Managing model servers

    Managing workflows

    • Working with workflows
    • Creating a reusable code block from a notebook
    • Creating a docker image based reusable code block
  • Glossary
  • API

Tutorials

  • End-to-end Machine Learning Tutorial

Working with workflows

The Lentiq Workflow engine is designed to perform periodic tasks such model training, data processing, data cleaning, ETL etc. Workflows are graphs of instances of Reusable Code Blocks that can be scheduled to be executed at a certain date and repeat with a certain periodicity.

a lentiq workflow

Reusable Code Blocks

You can think about a Reusable Code Block as a template for creating Tasks. Behind a Reusable Code Block there is a Docker container that can be either automatically generated by Lentiq by packaging a Notebook and it's dependencies or a user's custom Docker image.

Users typically package frequent tasks such as cleaning data, anonymizing data, training models etc.

Reusable code blocks are shared with the entire data lake and are stored in Lentiq's global registry. This means that any user of any data pool within the same data lake can leverage this Reusable Code Block once it has been created to perform similar tasks.

  1. Creating a reusable code block from a notebook

  2. Creating a custom docker image based Code Block

Tasks

Workflows are built with Tasks. Each task is an instance of a Reusable Code Block that is, the input variables of the Reusable Code Block are configured to match the environment and requirements of the user. Several more environment variables are passed to the Task at run time that include for instance the API key for accessing the data pool which is interpreted by Lentiq's file access layer.

Tasks can also have as dependencies other tasks. A task will wait for the other task to be executed before starting.

For instance task test24-2 has as input dependency task test24-1:

task with dependency

Tasks can run for any amount of time, there is no timeout.

Resource management

Tasks have associated an amount CPU and RAM that they will ask the service to deliver when they need to execute. If the current project does not have enough resources available the Task will stay in a "PENDING" state until enough resources are freed up. Make sure that the project has enough resources configured to cover the execution of the workflow or create a separate project for the workflows.

Scheduler

Every 30 seconds the Scheduler checks to see if there are available workflows and will start the execution immediately if the date is set in the past. The scheduler will not execute a new run of a workflow if there is another one pending or running.

← Managing model serversCreating a reusable code block from a notebook →
  • Reusable Code Blocks
  • Tasks
  • Resource management
  • Scheduler
Copyright © 2019 Lentiq