Data Cleanup Best Practices: Bronze, Silver, Gold Standards
Medallion Lakehouse Architecture: The medallion architecture describes a series of data layers that denote the quality of data stored in the lakehouse. Databricks recommends taking a multi-layered approach to building a single source of truth for enterprise data products.
Data quality is the foundation of sound decision-making and successful analytics. Enterprises of every size and industry depend more and more on enormous amounts of raw data from many sources. However, much of this wealth of information remains underutilized because of inconsistent structures, duplicates, missing details, and inaccurate documentation. Enter data cleansing, the unsung hero that rescues subpar datasets and gets them ready for the big stage. Borrowing from the grading of precious metals, you should refine your datasets in three stages: bronze, silver, and gold.
Bronze Stage: Basic Preparation 🥉
Start data cleanup at the bronze level by focusing on the easiest-to-achieve goals. Standardize date/time formats, remove unnecessary columns, and standardize…
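As a rough illustration of the first two bronze-stage steps named above (standardizing date/time values and removing unnecessary columns), here is a minimal Python sketch using only the standard library. The column names (`order_date`, `internal_note`), the list of accepted date formats, and the helper names `to_iso_date` and `bronze_clean` are assumptions made for this example, not part of any particular platform's API.

```python
from datetime import datetime

# Hypothetical settings, chosen only for illustration.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")
UNNEEDED_COLUMNS = {"internal_note"}


def to_iso_date(raw):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    text = raw.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


def bronze_clean(rows, date_column="order_date"):
    """Drop unneeded columns and normalize one date column in every row."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in UNNEEDED_COLUMNS}
        kept[date_column] = to_iso_date(kept[date_column])
        cleaned.append(kept)
    return cleaned


# Made-up sample records with the same date written three different ways.
raw_rows = [
    {"order_id": "1", "order_date": "2024-03-05", "internal_note": "x"},
    {"order_id": "2", "order_date": "05/03/2024", "internal_note": "y"},
    {"order_id": "3", "order_date": "20240305", "internal_note": "z"},
]
bronze_rows = bronze_clean(raw_rows)
```

Values that match none of the listed formats become `None` instead of being guessed, which keeps them visible for a later stage to handle.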