    Lentiq applications

    Lentiq provides a set of curated applications and application clusters that can be deployed automatically in a project. Applications are supervised and are restarted automatically on failure. All curated applications are deployed in containers and draw on the pool of resources allocated to a project through the project quota property.

    Application clusters can typically be:

    • scaled horizontally and vertically, independently of one another, depending on each application's architecture
    • provisioned and scaled to zero at any point in time, to minimize costs when applications are not in use
    • run as multiple instances of the same application within a data pool project, so that each member of your team has complete freedom
    • started instantly, with a very low boot time

    By default, applications are easily interconnected. A user can connect a Jupyter notebook to a specific Spark cluster, perform data analysis at scale, and shut the Spark cluster down when it is no longer needed. In addition, users can interact with our data and metadata management layer through Spark by creating tables: SQL-based representations of files stored in the data lake. Once a table is created in a project, it is automatically exposed in our Table Browser, where users can add documentation around it.
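
    As a minimal sketch of that workflow from a notebook (the storage path and table name below are illustrative, not Lentiq defaults):

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("table-example").getOrCreate()

    # Read raw files from project storage; the bucket path is a placeholder.
    df = spark.read.csv("s3a://my-data-pool/raw/events.csv",
                        header=True, inferSchema=True)

    # Persist the DataFrame as a SQL table; once created, it appears
    # in the Table Browser.
    df.write.saveAsTable("events")
    ```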

    On-premises or cloud-based BI and visualization tools can easily be connected to data stored in the data lake through the Spark Thrift Server-based JDBC connector.

    Jupyter

    Jupyter is the de facto notebook technology for data scientists. Lentiq's Jupyter is integrated with the Spark engine as well as standard tools such as NumPy, Ray, Dash, Seaborn, scikit-learn, SciPy, and Matplotlib, and supports the Python and Scala programming languages.
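
    The bundled Python stack is available directly in a notebook cell; this sketch assumes only the default libraries listed above:

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    # Plot a simple curve using the preinstalled NumPy and Matplotlib.
    x = np.linspace(0, 10, 200)
    plt.plot(x, np.sin(x))
    plt.title("sin(x)")
    plt.show()
    ```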

    Apache Spark

    Apache Spark is offered as a pay-per-use, fully managed, large-scale, in-memory data processing service capable of machine learning and graph processing. In each project, one can provision multiple independent, smaller clusters for separate jobs, which simplifies management in multi-tenant environments.
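
    A notebook attaches to a specific cluster through an ordinary SparkSession; the master URL below is a placeholder for the endpoint Lentiq shows for your cluster:

    ```python
    from pyspark.sql import SparkSession

    # Hypothetical master URL; substitute your own cluster's endpoint.
    spark = (SparkSession.builder
             .master("spark://spark-cluster-1:7077")
             .appName("my-analysis")
             .getOrCreate())

    # Quick sanity check that the cluster is reachable.
    print(spark.range(1000).count())
    ```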

    SparkSQL

    SparkSQL is a service that provides industry-standard JDBC and ODBC connectivity, giving business intelligence tools access to data coming from a variety of sources and Spark programs. It allows users to seamlessly connect tools such as Tableau, Looker, QlikView, Zoomdata, or Power BI.
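
    The same endpoint can also be queried programmatically. This sketch uses PyHive, which speaks the HiveServer2/Thrift protocol exposed by the Spark Thrift Server; the host, port, and table name are placeholders:

    ```python
    from pyhive import hive

    # Placeholder endpoint; use the SparkSQL service address from your project.
    conn = hive.connect(host="sparksql.example.com", port=10000)
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM events")
    print(cur.fetchone())
    ```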

    Kafka

    Kafka is a fast, scalable queuing system designed as an intermediate layer between producers and consumers of data. It can be used to bring data into Lentiq by connecting it to Spark Streaming jobs.
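
    A minimal sketch of such an ingestion job using Spark Structured Streaming; the broker address, topic, and storage paths are placeholders, and the job assumes the spark-sql-kafka package is available:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    # Subscribe to a Kafka topic; broker and topic names are placeholders.
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "kafka-0:9092")
              .option("subscribe", "events")
              .load())

    # Land the raw messages in the data lake as Parquet files.
    (stream.selectExpr("CAST(value AS STRING) AS value")
     .writeStream
     .format("parquet")
     .option("path", "s3a://my-data-pool/raw/events")
     .option("checkpointLocation", "s3a://my-data-pool/checkpoints/events")
     .start())
    ```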

    SFTPProxy

    SFTPProxy is a service that allows you to easily import high-volume data into a project within the data lake.
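
    From a client machine, the import is an ordinary SFTP transfer. This sketch uses the paramiko library, with a placeholder hostname, credentials, and paths standing in for the ones issued for your project:

    ```python
    import paramiko

    # Placeholder host and credentials; use the ones provided by SFTPProxy.
    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="project-user", password="secret")
    sftp = paramiko.SFTPClient.from_transport(transport)

    # Upload a local file into the project's storage area.
    sftp.put("data/events.csv", "/uploads/events.csv")

    sftp.close()
    transport.close()
    ```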

    PostgreSQL

    PostgreSQL is an open source relational database designed to store structured data.
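
    Connecting from Python follows the usual pattern; this sketch uses psycopg2 with placeholder connection details for an instance deployed in a project:

    ```python
    import psycopg2

    # Placeholder connection details; use your project's PostgreSQL endpoint.
    conn = psycopg2.connect(host="postgresql.example.com", dbname="analytics",
                            user="project-user", password="secret")
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone())
    conn.close()
    ```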

    Streamsets

    StreamSets is an open source data ingestion and transformation engine that streamlines data integration tasks.
