
Data Lakes and Their Current State

In a world where oil has been replaced by data as the most valuable resource, terms like big data, data warehouse and data lake are in the spotlight. Most people don’t know that data lakes provide not only storage capabilities but also compute power, offering an environment for analytics, big data and machine learning projects.

This article defines data lakes and the need for them, explains the shortcomings of traditional data lakes, and introduces an innovative solution: the data lake powered by interconnected data pools.

A data lake is a storage repository containing massive amounts of data in their raw format. Data lakes appeared out of the necessity to gather the vast and varied data scattered across different departments. They seemed like a cure-all solution: having no pre-defined structure made data lakes flexible and versatile.

As more and more companies adopted data lakes, however, issues emerged that hindered digital transformation instead of advancing it. Data swamps became commonplace: repositories that merely gather data with no goal in mind, or that stall real-time analytics due to governance limitations. Companies are beginning to realize that the time and effort spent building vast data lakes is counterproductive when poor data governance and management turns them into data swamps.

Today’s enterprise data lakes are overly rigid, expensive and brittle, and they end up hindering innovation instead of fostering it. Although more and more companies are using data lakes as advanced analytics platforms to power digital transformation initiatives, only 8 out of 100 pilots achieve their intended goals* because data lakes are:

Over-centralized - All data projects must use the same technologies and schema model, regardless of their organizational impact.

Over-generalized - Current data lakes are built for the entire enterprise. This rigidity makes it harder to choose the right tools for a specific problem.

Complex - To cover every possible use case, you end up with Hadoop, key-value stores, and advanced data management and data lineage systems.

Expensive - A data lake implementation takes months, the project TCO is high, and the team required is complex (DevOps engineers, big data engineers, analysts).

Current data lake implementations slow down your data teams and make them inefficient. Ideally, your data teams should be able to build use case-specific projects in which they are free to choose their technology stack, cloud provider, region, and the data they need. As a result, your data teams become more agile in choosing the right tools for the job and can adapt to individual needs.
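To make the idea concrete, here is a minimal sketch of what use case-specific projects could look like as configuration: each team declares its own stack, cloud provider and region, instead of inheriting one enterprise-wide setup. The pool names, fields and helper function are purely illustrative assumptions, not Lentiq’s actual API.

```python
# Hypothetical per-use-case "data pool" definitions. Every field here is an
# illustrative assumption: each team independently picks its stack, cloud
# provider and region for its own project.
from collections import defaultdict

data_pools = [
    {"name": "churn-model",     "stack": ["Spark", "Jupyter"], "cloud": "aws", "region": "us-east-1"},
    {"name": "clickstream-etl", "stack": ["Kafka", "Spark"],   "cloud": "gcp", "region": "europe-west1"},
    {"name": "bi-reporting",    "stack": ["Presto"],           "cloud": "aws", "region": "eu-west-1"},
]

def pools_by_cloud(pools):
    """Group pool names by cloud provider, e.g. for a cost overview."""
    grouped = defaultdict(list)
    for pool in pools:
        grouped[pool["cloud"]].append(pool["name"])
    return dict(grouped)

print(pools_by_cloud(data_pools))
# {'aws': ['churn-model', 'bi-reporting'], 'gcp': ['clickstream-etl']}
```

The point of the sketch is the heterogeneity: a churn model on AWS and a clickstream pipeline on GCP can coexist as separate pools, rather than being forced into one centralized stack.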

Today, there are many barriers for companies that aim to become data-driven and implement a digital transformation strategy. It’s not easy to properly build and use a data lake, is it?

This is how Lentiq’s EdgeLake came into being. Our goal is to allow as many teams as possible to access data and to provide a friendly environment for analytics and machine learning projects. We strongly believe transformative innovation can only be achieved through a human-centric machine learning approach for all data projects being developed in an organization.

If you are curious about our thorough product vision, please read Lentiq EdgeLake – The Freedom to Innovate.

* McKinsey AI Index, May 2018
