Data and metadata management

Unify data pools through a universal data catalogue

Balance flexibility and control: data teams work in independent projects using only the datasets relevant to each project, while stricter governance rules apply when curated data is shared with the rest of the organization.

The data catalogue helps build a data-driven culture by encouraging collaboration and reusing existing work to increase efficiency.

Easily understand the available data and drive usage

Annotate files and tables, enrich data with context and metadata documentation, and improve dataset explainability

Document project-level files and tables

Add metadata, attach files that describe data lineage and transformations, and attach notebooks with discovered insights.

Curate datasets in the global data catalogue

Add tags and categories to datasets to make them easily discoverable and publish them in the data catalogue. Other users can subscribe to them instantly.

Subscribe to datasets and stay updated

Once you subscribe to a dataset, you can access it from any cloud and use it for data exploration. Get notified each time the original dataset changes so that your analysis stays in sync with the latest data.
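As a rough illustration of the publish, subscribe and notify flow described above, here is a minimal in-memory sketch in Python. It is not the product's API: the `Catalogue` and `Dataset` classes, their methods, and the `sales_2023` dataset name are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical names throughout: this models the idea, not a real product API.
Callback = Callable[[str, int], None]


@dataclass
class Dataset:
    name: str
    tags: Set[str] = field(default_factory=set)
    version: int = 1
    subscribers: List[Callback] = field(default_factory=list)


class Catalogue:
    """Toy global catalogue: publish, search by tag, subscribe, notify."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}

    def publish(self, name: str, tags: Set[str]) -> Dataset:
        # Publishing makes a tagged dataset discoverable in the catalogue.
        dataset = Dataset(name=name, tags=set(tags))
        self._datasets[name] = dataset
        return dataset

    def search(self, tag: str) -> List[str]:
        # Return the names of all published datasets carrying the tag.
        return sorted(n for n, d in self._datasets.items() if tag in d.tags)

    def subscribe(self, name: str, on_change: Callback) -> None:
        # Register a callback that runs whenever the dataset changes.
        self._datasets[name].subscribers.append(on_change)

    def update(self, name: str) -> int:
        # Bump the version and notify every subscriber of the change.
        dataset = self._datasets[name]
        dataset.version += 1
        for notify in dataset.subscribers:
            notify(dataset.name, dataset.version)
        return dataset.version


catalogue = Catalogue()
catalogue.publish("sales_2023", {"sales", "curated"})

received = []
catalogue.subscribe("sales_2023", lambda name, version: received.append((name, version)))
catalogue.update("sales_2023")
print(received)
```

In this sketch the subscriber receives the dataset name and its new version number on every change, which is one simple way to let downstream analysis know it should refresh.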

Centralized data governance

Embrace agility and self-service in a controlled and fully governed data lake environment

Restrict access to data

Highly sensitive data is secured in data pool projects, and access to each project is managed by its owner. Data is visible only to users with access to the project.

Encourage data exploration

Data relevant to the entire organization is available in the global data catalogue, the Data Store. It can be used instantly and can drive collaboration between different teams.

Encourage creativity

With documentation in place, multiple projects and teams can use the original source data without impacting the source or other teams when performing analysis.

Simplify cross-cloud data management with a unified interface

Interact with data and tables in a standardized manner regardless of the underlying cloud storage

Encourage collaboration and build an internal knowledge repository

Quickly bootstrap new projects by leveraging best practices, curated code and documented datasets

Through our unique publish-subscribe mechanism, users can share data and code between data pools and build an internal knowledge repository.

Teams starting a new project for a particular use case or machine learning technique can build on battle-tested best practices and algorithms and become productive faster.

In the end, they can publish their insights, code and data to the rest of the organization.

