
Data Cleanup Best Practices: Bronze, Silver, Gold Standards

Manan Mehta
3 min read · Mar 7, 2024


Medallion Lakehouse Architecture: The medallion architecture describes a series of data layers that denote the quality of data stored in the lakehouse. Databricks recommends taking a multi-layered approach to building a single source of truth for enterprise data products.


Data quality is the foundation of sound decision-making and successful analytics. Enterprises of every size and industry increasingly depend on large volumes of raw data from many sources. However, much of this information remains underused because of inconsistent structures, duplicates, missing details, and inaccurate documentation. Enter data cleansing, the unsung hero that rescues subpar datasets and gets them ready for the big stage. Borrowing the tiers of medals, your datasets should be refined in three stages: bronze, silver, and gold.

Bronze Stage: Basic Preparation 🥉

Unfiltered data coming from various sources

Start data cleanup at the bronze level by going after the easiest wins first. Standardize date/time formats, remove unnecessary columns, and standardize…
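As a rough illustration of the first two bronze-level steps named above (standardizing date/time values and dropping unneeded columns), here is a minimal sketch in plain Python. It is not taken from the article or from any Databricks API. The column names (`order_id`, `created`, `debug_flag`), the list of accepted input formats, and the choice to treat every timestamp as UTC are all assumptions made for the example.

```python
from datetime import datetime, timezone

# Hypothetical input formats; a real pipeline would list whatever its sources emit.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%Y-%m-%dT%H:%M:%S")


def to_iso_utc(raw):
    """Parse a timestamp in one of KNOWN_FORMATS and return it as ISO 8601.

    Assumption: source timestamps carry no zone and are treated as UTC.
    Returns None when no format matches, so bad values can be reviewed later.
    """
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc).isoformat()
    return None


def bronze_clean(rows, keep_columns, date_column):
    """Keep only the wanted columns and normalize the date column."""
    cleaned = []
    for row in rows:
        # Drop every column that is not explicitly kept.
        slim = {name: row.get(name) for name in keep_columns}
        # Rewrite the date column in one standard format.
        slim[date_column] = to_iso_utc(row.get(date_column) or "")
        cleaned.append(slim)
    return cleaned


# Hypothetical raw records with mixed date formats and a column nobody needs.
raw_rows = [
    {"order_id": "A1", "created": "2024-03-07 14:30:00", "debug_flag": "x"},
    {"order_id": "A2", "created": "07/03/2024 09:15", "debug_flag": "y"},
    {"order_id": "A3", "created": "not a date", "debug_flag": "z"},
]

bronze_rows = bronze_clean(raw_rows, keep_columns=("order_id", "created"), date_column="created")
```

Unparseable dates come back as `None` rather than raising an error. This keeps the pass cheap and leaves the judgment call about bad values to a later stage.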

