

#Data lake architecture how to
I have had a lot of conversations with customers to help them understand how to design a data lake.
#Data lake architecture update
I touched on this in my blog Data lake details, but that was written a long time ago, so I wanted to update it.

Raw data layer – Think of the raw layer as a reservoir that stores data in its natural and original state. Advantages are auditability, discovery, and recovery. A typical example: if you need to rerun an ETL job because of a bug, you can get the data from the raw layer instead of going back to the source. Also called bronze layer, staging layer, or landing area. Sometimes there is a separate conformed layer (also called base layer or standardized layer) that is used after the raw layer to make all the file types the same, usually Parquet.

Cleansed data layer – Raw events are transformed (cleaned and mastered) into directly consumable data sets. Think of the cleansed layer as a filtration layer. It removes impurities and can also involve enrichment. The aim is to make uniform the way files are stored in terms of encoding, format, data types, and content. Also called silver, transformed, integrated, or enriched layer.

Presentation data layer – Business logic is applied to the cleansed data to produce data ready to be consumed by applications (e.g. a data warehouse application or an advanced analysis process). The data is joined and/or aggregated, and can be stored in de-normalized data marts or star schemas. Also called application, workspace, trusted, gold, secure, production ready, governed, curated, or consumption layer.

Sandbox data layer – Optional layer to be used to "play" in, usually for data scientists. Also called exploration layer, development layer, or data science workspace.

Within each layer there will be a folder structure, which is designed based upon reasons such as subject matter, security, or performance.
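
To make the raw, cleansed, and presentation layers a bit more concrete, here is a minimal Python sketch of how data might move through them and how a per-layer folder path could be built. It is an illustration only, not a reference design: the `layer/subject/dataset/date` folder layout, the function names (`layer_path`, `cleanse`, `present`), and the sample sales records are all assumptions I made up for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Layer names as used in this post; the folder layout itself is an assumption.
LAYERS = ("raw", "cleansed", "presentation", "sandbox")


def layer_path(layer: str, subject: str, dataset: str, load_date: str) -> str:
    """Build a hypothetical folder path: <layer>/<subject>/<dataset>/<load_date>."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return str(PurePosixPath(layer, subject, dataset, load_date))


def cleanse(raw_rows):
    """Raw -> cleansed: unify text and data types, drop rows that cannot be parsed."""
    cleansed = []
    for row in raw_rows:
        try:
            region = row["region"].strip().lower()
            amount = float(row["amount"])
        except (KeyError, AttributeError, TypeError, ValueError):
            # The raw layer still holds the original row, so nothing is lost.
            continue
        cleansed.append({"region": region, "amount": amount})
    return cleansed


def present(cleansed_rows):
    """Cleansed -> presentation: apply business logic, here a total per region."""
    totals = defaultdict(float)
    for row in cleansed_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Made-up records as they might land in the raw layer, untouched.
raw = [
    {"region": " East ", "amount": "10.5"},
    {"region": "east", "amount": "4.5"},
    {"region": "West", "amount": "n/a"},  # bad value, filtered out when cleansing
    {"region": "West", "amount": "7"},
]

cleansed = cleanse(raw)
summary = present(cleansed)

print(layer_path("raw", "sales", "orders", "2024-01-15"))
print(summary)
```

Because `cleanse` never modifies `raw`, a bug in the cleansing logic can be fixed and the job rerun from the raw copy without going back to the source, which is the recovery advantage described for the raw layer.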
