Data and metadata management
Balance flexibility and control: let data teams work in independent projects using only the data sets most relevant to that project, while enforcing stricter governance rules when curated data is shared with the rest of the organization.
The data catalogue helps build a data-driven culture, encouraging collaboration and the reuse of existing work to increase efficiency.
Annotate files and tables, enrich data with context and metadata documentation, and improve dataset explainability
Document project-level files and tables
Add metadata, attach files describing data lineage and transformations, and include notebooks with discovered insights.
Curate datasets in the global data catalogue
Add tags and categories to datasets to make them easily discoverable and publish them in the data catalogue. Other users can subscribe to them instantly.
Subscribe to datasets and stay updated
Once users subscribe to a dataset, they can easily access it from any cloud and use it for data exploration. They are notified each time the original dataset changes, ensuring their analysis stays synced with the freshest data.
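The publish, tag, and subscribe-with-notifications flow described above can be sketched as a simple publish/subscribe pattern. Note that every class and method name below (Dataset, Catalogue, subscribe, publish) is an illustrative assumption, not the platform's actual API:

```python
# Minimal sketch of a dataset catalogue with change notifications.
# All names here are hypothetical, for illustration only.

class Dataset:
    def __init__(self, name, version=1):
        self.name = name
        self.version = version
        self._subscribers = []

    def subscribe(self, callback):
        # Register a callback invoked whenever the dataset changes.
        self._subscribers.append(callback)

    def update(self):
        # Publishing a new version notifies every subscriber so their
        # analyses can re-sync with the freshest data.
        self.version += 1
        for notify in self._subscribers:
            notify(self.name, self.version)


class Catalogue:
    def __init__(self):
        self._entries = {}

    def publish(self, dataset, tags=()):
        # Tags make the dataset discoverable in the global catalogue.
        self._entries[dataset.name] = {"dataset": dataset, "tags": set(tags)}

    def find_by_tag(self, tag):
        return [e["dataset"] for e in self._entries.values() if tag in e["tags"]]


received = []
sales = Dataset("sales_2024")
catalogue = Catalogue()
catalogue.publish(sales, tags=("finance", "curated"))

# Another user discovers the dataset by tag and subscribes to changes.
found = catalogue.find_by_tag("finance")[0]
found.subscribe(lambda name, version: received.append((name, version)))

sales.update()  # the subscriber is notified of version 2
```

The key design point is that subscribers hold a reference to the published dataset rather than a copy, so a single update reaches every consuming team at once.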
Embrace agility and self-service in a controlled and fully governed data lake environment
Restrict access to data
Highly sensitive data is secured in data pool projects and access to projects is managed by the project owner. Data is visible only to users with access to the project.
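The project-scoped access model could look like the sketch below, where only the project owner grants membership and reads are checked against the member list. The class, user, and dataset names are hypothetical assumptions, not the product's implementation:

```python
# Illustrative sketch of project-scoped access control: data in a
# pool project is visible only to members granted access by the
# project owner. All names are assumptions, not a real API.

class DataPoolProject:
    def __init__(self, owner):
        self.owner = owner
        self._members = {owner}
        self._datasets = {}

    def grant_access(self, granting_user, new_member):
        # Access to the project is managed by the project owner alone.
        if granting_user != self.owner:
            raise PermissionError("only the project owner can grant access")
        self._members.add(new_member)

    def add_dataset(self, name, data):
        self._datasets[name] = data

    def read(self, user, name):
        # Data is visible only to users with access to the project.
        if user not in self._members:
            raise PermissionError(f"{user} has no access to this project")
        return self._datasets[name]


project = DataPoolProject(owner="alice")
project.add_dataset("patients", ["row1", "row2"])
project.grant_access("alice", "bob")

project.read("bob", "patients")      # allowed: bob is a member
# project.read("carol", "patients")  # would raise PermissionError
```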
Encourage data exploration
Data relevant to the entire organization is available in the global data catalogue, the Data Store. It can be used instantly and can drive collaboration between different teams.
With documentation in place, multiple projects and teams can use the original source data without impacting the source or other teams when performing analysis.
Interact with data and tables in a standardized manner regardless of the underlying cloud storage
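One common way to provide such cloud-agnostic access is a thin abstraction over storage backends: callers program against a single table interface while each backend handles its own cloud. The interface and backend classes below are a hypothetical sketch, with stubbed reads, not the product's actual implementation:

```python
# Sketch of a uniform table-access interface over different cloud
# storage backends. Class and method names are illustrative
# assumptions; real backends would read actual files.

from abc import ABC, abstractmethod

class TableStore(ABC):
    @abstractmethod
    def read_table(self, name):
        """Return the rows of a table, wherever it physically lives."""

class S3TableStore(TableStore):
    def __init__(self, bucket):
        self.bucket = bucket
    def read_table(self, name):
        # Stub: real code would read e.g. Parquet files from S3.
        return [f"s3://{self.bucket}/{name}/row-0"]

class AzureTableStore(TableStore):
    def __init__(self, container):
        self.container = container
    def read_table(self, name):
        # Stub: real code would read from Azure blob storage.
        return [f"abfs://{self.container}/{name}/row-0"]

def explore(store: TableStore, table: str):
    # Callers interact with tables identically on every cloud.
    return store.read_table(table)

rows_s3 = explore(S3TableStore("lake"), "sales")
rows_az = explore(AzureTableStore("lake"), "sales")
```

Because `explore` depends only on the abstract interface, analysis code written against one cloud runs unchanged against another.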
Quickly bootstrap new projects by leveraging best practices, curated code and documented datasets
Through our unique publish-subscribe mechanism, users can share data and code between data pools and build an internal knowledge repository.
Teams starting a new project for a particular use case, using a specific machine learning technique, can draw on battle-tested best practices and algorithms to become productive faster.
In the end, they can publish their insights, code, and data to the rest of the organization.