Documentation

Creating a data pool

Lentiq acts as a management layer on top of the cloud providers you choose to use for your data lake. If you haven't set up your cloud services provider account yet, see the Deploying Lentiq on GCP or Deploying Lentiq on AWS guides for the prerequisite steps to complete before starting your first Lentiq data pool.

Creating a data pool

A data pool is part of a data lake and consists of a Kubernetes cluster that can host one or more projects.

The first data pool and project are created using the wizard:

  1. Type in a name for your data pool.
  2. Select the data lake that the data pool will be created on from the dropdown menu.
  3. Click Next step.

Configuring your project

  1. Type in a relevant name for the project.
  2. Modify the project owner email address if you are creating the project on behalf of someone else.
  3. Click Next step.

Configuring the project applications

Multiple application clusters can be set up from the start. The default application running on each data pool is a Spark cluster.

  1. Type in a relevant name for the application.
  2. Configure the number of workers that will be created for the application and the number of cores and amount of RAM used by each worker.

Some clusters use multiple applications that can be configured independently.

  3. Click Next step.
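To get a feel for what an application configuration requests in total, the worker settings above can be modelled as a small record. This is only an illustrative sketch: the `ApplicationSpec` class, its field names, and the sample values are hypothetical and are not part of the Lentiq wizard or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationSpec:
    """Hypothetical model of one application as configured in the wizard."""
    name: str
    workers: int            # number of workers created for the application
    cores_per_worker: int   # cores used by each worker
    ram_gb_per_worker: int  # RAM (in GB) used by each worker

    def total_cores(self) -> int:
        # Total cores requested by all workers of this application.
        return self.workers * self.cores_per_worker

    def total_ram_gb(self) -> int:
        # Total RAM (in GB) requested by all workers of this application.
        return self.workers * self.ram_gb_per_worker


# Example values chosen for illustration only.
spark = ApplicationSpec("spark-default", workers=3, cores_per_worker=2, ram_gb_per_worker=8)
print(spark.total_cores(), spark.total_ram_gb())
```

These totals are a useful input when sizing the hardware in a later step of the wizard.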

Configuring your cloud provider credentials

Lentiq currently supports Amazon Web Services and Google Cloud Platform.

  1. Select your cloud provider from the dropdown menu.
  2. Select the zone where the cloud resources will be provisioned.
  3. Select the cloud provider's credentials that will be used by projects created on the data pool.

If you don't already have credentials set up, click Create credential.

  1. Type in a name for the credentials to serve as an identifier for further use.
  2. Paste in the access key ID from your cloud account.
  3. Paste in the entire secret access key from your cloud account.

If you don't have a secret access key set up, you can create a new one from the cloud services provider dashboard.

  4. Click Next step.

Configuring the hardware resources

The amount of resources required by the project varies depending on the application configuration.

  1. Select the type of instance to be used by the underlying infrastructure.

If your desired instance type is not available and you do not want to select another type, wait for a few minutes and try again.

  2. Increase the number of instances until the required amount of hardware resources is met.
  3. Click Next step.
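As a rough guide for the instance count, you can divide the resources your applications request by what one instance provides and round up. The sketch below is an assumption-laden estimate, not how the wizard computes its figures: the function name and the sample numbers are hypothetical, and it ignores any overhead reserved by Kubernetes or the operating system, so the real requirement may be higher.

```python
import math


def instances_needed(required_cores: int, required_ram_gb: int,
                     cores_per_instance: int, ram_gb_per_instance: int) -> int:
    """Estimate how many instances cover both the core and the RAM requirement."""
    if cores_per_instance <= 0 or ram_gb_per_instance <= 0:
        raise ValueError("instance capacity must be positive")
    # Round up separately for each resource, then take the larger count
    # so that neither cores nor RAM falls short.
    by_cores = math.ceil(required_cores / cores_per_instance)
    by_ram = math.ceil(required_ram_gb / ram_gb_per_instance)
    return max(by_cores, by_ram, 1)


# Illustrative numbers: 6 cores and 24 GB requested,
# on an instance type offering 4 cores and 16 GB.
print(instances_needed(6, 24, 4, 16))
```

Taking the maximum of the two counts matters when an application is memory-heavy: the RAM requirement, not the core count, then decides how many instances are needed.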

Configuring project connectivity

  1. Add specific firewall rules for all the IP addresses (or ranges) that will need to connect to the data pool.

  2. Click Next step.
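Before entering the rules, it can help to check that the addresses and ranges you plan to allow actually cover the clients that need access. The following sketch uses Python's standard `ipaddress` module; the helper names and the sample addresses (taken from the reserved documentation ranges) are assumptions for illustration, and the exact rule format the wizard accepts is not specified here.

```python
import ipaddress


def parse_rules(entries):
    """Turn strings such as '203.0.113.0/24' or '198.51.100.7' into network objects."""
    # strict=False accepts a host address with a prefix; a bare IP becomes a /32 network.
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(client_ip, rules):
    """Return True if client_ip falls inside at least one allowed range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in rules)


# Sample rules: one office range and one single workstation (illustrative values).
rules = parse_rules(["203.0.113.0/24", "198.51.100.7"])
print(is_allowed("203.0.113.42", rules))
print(is_allowed("198.51.100.8", rules))
```

Keeping the ranges as narrow as possible limits who can reach the data pool.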

Provisioning the data pool

Once you are done with all the configuration steps, click Provision now to have the hardware deployed and all the applications installed and configured.

The provisioning step may take a few minutes to finish, depending on the complexity of the configuration.

Once the provisioning step is completed, you will be redirected to the project dashboard.

Copyright © 2019 Lentiq