Working with models on Lentiq
Working with prediction models in Lentiq follows the familiar develop-train-serve design pattern. Each stage relies on different Lentiq tools, but in general a good understanding of how to work with models in a notebook is enough.
Lentiq provides tools and mechanisms for every stage of the model lifecycle:
- Jupyter, Spark, and our Data Store for feature engineering and data discovery
- Workflows and the LambdaBook feature, which allow a data scientist or data engineer to convert notebooks into Docker containers and run them as part of scheduled workflows
- The Lentiq Model Server, a scalable, resilient, low-latency model server based on MLeap
Model serving is the mechanism for exposing a prediction (a.k.a. inference) API to applications inside or outside of the data lake.
Model development
Model development typically happens in a notebook on offline data, usually a sample of production data. During this phase a number of algorithms are tested and various features are "engineered" to improve the model's prediction accuracy.
In this phase the data scientist might try Logistic Regression, Random Forests, Neural Networks, and so on. After selecting an algorithm, hyper-parameter optimization takes place to tune it further.
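As a minimal sketch, this phase in a Spark notebook might look like the following. The S3 path and the f1, f2, and label column names are placeholders for whatever your notebook actually loads from the Data Store:

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("develop").getOrCreate()

# Hypothetical sample of production data; "f1", "f2" and "label"
# are placeholder column names.
features_df = spark.read.parquet("s3a://my-data-pool/sample/")

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(labelCol="label", featuresCol="features")
pipeline = Pipeline(stages=[assembler, lr])

# Grid-search two hyper-parameters with 3-fold cross-validation.
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 1.0])
        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
        .build())
cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol="label"),
                    numFolds=3)

best_model = cv.fit(features_df).bestModel  # best pipeline after the search
```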
A notebook must first be "published" before it can be converted into a Reusable Code Block.
Model training
Training typically happens on production data, which is renewed as new data flows into the data pool. This is where the LambdaBook feature (the ability to convert notebooks into Reusable Code Blocks) comes in handy; a minimal retraining sketch follows the list below.
- Working with Workflows
- Creating a Reusable Code Block based on a Notebook (LambdaBook)
- Creating a Reusable Code Block based on a Docker Image
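A retraining notebook destined for conversion might contain something like the sketch below. The Data Store path, column names, and bundle location are hypothetical, and the MLeap bundle export assumes the `mleap` Python package is installed:

```python
import mleap.pyspark  # noqa: F401 -- registers serializeToBundle() on Spark ML models
from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("retrain").getOrCreate()

# Hypothetical Data Store location; substitute the path of your own data pool.
fresh_df = spark.read.parquet("s3a://my-data-pool/features/latest/")

# Rebuild the pipeline selected during development (placeholder column names).
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(labelCol="label", regParam=0.1)
model = Pipeline(stages=[assembler, lr]).fit(fresh_df)

# Export the fitted pipeline as an MLeap bundle so the MLeap-based
# Lentiq Model Server can load and serve it.
model.serializeToBundle("jar:file:/tmp/model.zip", model.transform(fresh_df))
```

Once published, such a notebook can be converted into a Reusable Code Block and scheduled as part of a workflow, so the model is retrained whenever fresh data arrives in the data pool.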
Model serving
The Lentiq Model Server "serves" models, meaning it exposes an inference API that applications can call.
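A client application might call a served model as in the sketch below. The endpoint URL and the request/response JSON shapes are assumptions for illustration; the actual values for a deployed model come from your Lentiq deployment:

```python
import requests

# Hypothetical endpoint; the real URL is available once the model is served.
MODEL_URL = "http://model-server.example.com/models/churn/predict"

payload = {"f1": 0.42, "f2": 1.7}  # placeholder feature names
response = requests.post(MODEL_URL, json=payload, timeout=5)
response.raise_for_status()
print(response.json())  # e.g. {"prediction": 1}
```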