Spark as a Service On Lentiq EdgeLake
Whether you are a data engineer or a data scientist, Lentiq EdgeLake lets you run, optimize, and scale any batch, real-time, data processing, or machine learning workload. Access finely tuned Spark clusters without specialized technical knowledge, and scale with a click to get results from your analysis faster.
Spark as a Service on Lentiq EdgeLake is quick and easy for anyone on the data team to set up, and it offers a frictionless, fully optimized environment.
Connect notebooks to an existing Spark cluster with a one-liner and accelerate your data science, machine learning, and data processing tasks.
Connecting to a Spark cluster pre-created in the data pool.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("spark://22.214.171.124:7077") \
    .getOrCreate()
An existing Spark cluster is required. Open one by clicking the Spark icon in the Lentiq interface, or provision a new one. If Spark will not be used for processing, a minimal resource configuration is sufficient. Copy the Spark master URL from the widget and pass it as the argument to the master function.
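As a rough sketch of the minimal-resources case, the session builder can be given a small resource cap together with the master URL. The helper names and the specific values below are illustrative assumptions, not Lentiq defaults; the configuration keys themselves are standard Spark properties.

```python
# Hypothetical helper names; the resource values are illustrative assumptions.
def minimal_spark_conf(app_name="notebook-exploration"):
    """Return a small resource footprint for a session that does little processing."""
    return {
        "spark.app.name": app_name,
        "spark.executor.memory": "1g",  # memory per executor
        "spark.executor.cores": "1",    # cores per executor
        "spark.cores.max": "2",         # total cores the app may claim (standalone mode)
    }


def connect(master_url, conf=None):
    """Create (or reuse) a SparkSession against the given master URL."""
    # Imported lazily so the configuration helper works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master(master_url)
    for key, value in (conf or minimal_spark_conf()).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a notebook this would be called as connect(master_url), where master_url is the value copied from the Spark widget.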
Because Jupyter Notebooks use Spark under the hood, you can keep the IDE you love, benefit from large-scale processing when needed, and scale easily as the dataset grows.
Package your notebooks as executable code ready to be put in production, through our unique "Reusable Code Blocks" technology. This can help you automate your data science workflow.
Lentiq's Spark as a Service is seamlessly integrated with the rest of the platform. Tables you create become available for data documentation on the Table Browser page and to standard BI tools through the JDBC connector.
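A minimal sketch of the table-creation step, assuming that tables written with Spark's standard DataFrameWriter.saveAsTable call are the ones the platform picks up. That assumption, the helper name, and the example table name are illustrative and are not confirmed by the text above.

```python
def publish_table(df, table_name, mode="overwrite"):
    """Persist a Spark DataFrame as a named table in the session catalog."""
    # df.write returns a DataFrameWriter; mode() controls what happens when
    # the table already exists ("overwrite", "append", "ignore", "error").
    df.write.mode(mode).saveAsTable(table_name)
    return table_name


# Example use inside a notebook with an active session named `spark`:
#   df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
#   publish_table(df, "demo_labels")  # hypothetical table name
```

Keeping the write behind a small function makes the save mode explicit, which matters when a notebook is re-run against a table that already exists.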