Let’s talk data

Read about our company, how we work and the latest developments in the industry.

← Back to articles

How to use Lentiq EdgeLake with AWS

Lentiq EdgeLake is a new data science platform that was specifically designed to work in multiple cloud environments. The deploy sequence is fast and easy regardless of the user's cloud provider. In order to better illustrate this, we made a quick 5-minute tutorial using Amazon Web Services account and set up a data pool on it. See below the detailed how-to article.

In order to use Lentiq EdgeLake, you need an active Amazon Web Services account.
You can quickly create an account if you don’t already have one. A valid bank card is required during the registration process, but you will only be billed for the resources that you actually use.

1.     Account Prerequisites

After you have logged in into your AWS account, start by creating a new user with programmatic access that will allow Lentiq to provision all services needed for EdgeLake into your AWS account.

1.    Go to the User Administration menu and click on ‘My Security Credentials’

2.    Navigate to ‘Users’ from the left-hand side menu and click ‘Add user’.

3.    Complete the AWS add user wizard and make sure you enable Programmatic access:

4.    Set permissions for the newly created user and attach existing policies directly that allow AdministratorAccess.

These permissions are needed because Lentiq provisions a Kubernetes Cluster for each of your data pools and an object storage bucket for each of your projects behind the scenes. These are fully managed and configured by the application. We recommend users accessing these resources from the Lentiq Management interface only.

 

 

5.    Create user and make sure you save the Access Key ID and Secret access key for this particular user. Download the csv for future use.

2.    Link Lentiq EdgeLake to your AWS account

In order to create a Lentiq EdgeLake data pool, select Amazon Web Services as the Cloud Provider, choose a region for the server, enter the Access Key ID and the Secret access key.

In the next steps, you will configure the initial Spark cluster and choose the underlying hardware infrastructure.

While a c4.2xlarge machine provides enough resources to deploy Lentiq EdgeLake, the initial Spark cluster and internal processes use 6 CPU cores, so you should provision a larger server if you plan to use additional applications.
However, keep in mind that the underlying server remains active and you are billed for it even if no Lentiq EdgeLake applications are running until you delete the data pool, so don’t deploy unnecessary resources that will generate higher costs.

3.    Deploy your first Lentiq EdgeLake Data Pool

Setup firewall rules to allow connectivity from your IP addresses, then click on Provision now. After a few minutes, the deploy process is complete.


Welcome to your Lentiq EdgeLake Data Pool!

Twitter Facebook LinkedIn

Readers also enjoyed:

Data Lakes and Their Current State

In a world where oil has been replaced by data as the most valuable resource, terms like big data, data warehouse and data lake are in the spotlight.…

How to Build the Next Generation Data Lake: Interconnected Data Pools in a Multi-Cloud Environment

On March 5th, Lentiq will host the first webinar in a long series that focuses on the evolution of data lakes, how to build the next generation data lake…

How to use Lentiq EdgeLake with Google Cloud

Lentiq EdgeLake is a new data science platform that was specifically designed to work in multiple cloud environments. The deploy sequence is…

Lentiq EdgeLake – The Freedom to Innovate

Lentiq EdgeLake reimagines the vision of the data lake concept by moving away from a centralized, unified data repository to a fully distributed architecture.…