Chapter 2

8 Steps Towards a Modern Data Estate

Steps 5 - 8

#5 Define Your Architecture

Data needs to be extracted, processed, and refined to be useful. Just as oil can be refined into different types of fuel, data can be prepared for different uses, from reporting to analytics and artificial intelligence. In this step, you describe how your organization prepares data for each of these uses. Most data estates are split into three distinct layers: the data lake, the data warehouse, and the data marts.

The final result is an integrated architecture that significantly reduces costs, accelerates time-to-value, and supports your data compliance needs.

Data lake

This layer is primarily for power users such as data scientists, who analyze raw data to look for anomalies and patterns and eventually apply machine learning. It enables quick ingestion of raw data from all data sources into Azure Data Lake or a SQL database.

Data warehouse

Raw data isn’t the best choice for business users, such as business analysts. These users need data that has been cleansed, enriched, and rationalized in a modern data warehouse. In a layered data architecture, the data warehouse is sourced from the data lake but stored in a SQL-based database, with semi-structured data transformed into structured data for analysis.

Data mart

The data mart supports business users by delivering relevant datasets from the data warehouse. It enables self-service analytics across multiple analytics tools, with views specific to a line of business or function, so that users can explore data safely and efficiently.
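The flow between the three layers can be illustrated with a minimal sketch. It is not tied to any particular platform: the sample records, the field names, and the functions build_warehouse and build_sales_mart are hypothetical and chosen only to show raw data being cleansed into structured rows and then aggregated into a business-facing dataset.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in the data lake: untyped and messy.
RAW_LAKE = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "South", "amount": "80"},
    {"order_id": "3", "region": "north", "amount": None},  # incomplete record
    {"order_id": "4", "region": "SOUTH", "amount": "19.50"},
]


def build_warehouse(raw_rows):
    """Cleanse and type raw lake records into structured warehouse rows."""
    rows = []
    for row in raw_rows:
        if row["amount"] is None:
            continue  # skip records that cannot be used for analysis
        rows.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return rows


def build_sales_mart(warehouse_rows):
    """Aggregate warehouse rows into a small dataset for one business function."""
    totals = defaultdict(float)
    for row in warehouse_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


warehouse = build_warehouse(RAW_LAKE)
sales_mart = build_sales_mart(warehouse)
print(sales_mart)
```

In this sketch the raw list stays untouched for power users, while business users only ever see the cleansed rows or the aggregated totals per region.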

#6 Cloud, On-Premise, or Hybrid

Just as a building depends on its foundation, your data estate depends on the foundation it is built on. You don’t want your data estate to end up like the Leaning Tower of Pisa: a costly affair to maintain. Consider the pros and cons of cloud, on-premise, and hybrid. There are great cloud solutions available on the market, such as those from Microsoft, but “Cloud should be thought of as a means to an end. The end must be specified first”, says David Smith, Distinguished VP Analyst and Gartner Fellow Emeritus in the article ‘The Top 10 Cloud Myths’.

#7 Selecting Your Construction Partners

In this step, you select your construction partners. Which software will you use for your data estate? Who will build the estate? And how will you maintain it?

Data management and automation software

Select a software platform that is right for today and for the future. Make sure your data estate is built with an integrated data management platform that is independent of developers, data sources, data platforms (SQL Server, Azure SQL, Data Lake, Synapse), front-end tooling (Power BI, Qlik), and deployment model (on-premise, cloud, hybrid). The platform should expedite development with automated code generation, freeing data engineers to focus on data quality and business requirements. Using a single tool to build your data lake, data warehouse, and data marts also limits the number and types of highly skilled resources you need. Last but not least, make sure your data estate is ‘future-proof’, meaning it is fully scalable and ready to adopt future releases without rebuilding.
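To make the idea of automated code generation concrete, here is a minimal sketch that produces a table definition from a metadata description instead of hand-written SQL. The function generate_create_table, the table name dw_orders, and the column list are hypothetical; a real data management platform would generate far more than this (loading logic, lineage, documentation) and would not necessarily work this way.

```python
def generate_create_table(table_name, columns):
    """Build a CREATE TABLE statement from (column name, SQL type) pairs."""
    column_lines = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in columns
    )
    return f"CREATE TABLE {table_name} (\n{column_lines}\n);"


# Hypothetical metadata for one warehouse table.
ddl = generate_create_table(
    "dw_orders",
    [
        ("order_id", "INT"),
        ("region", "VARCHAR(50)"),
        ("amount", "DECIMAL(10, 2)"),
    ],
)
print(ddl)
```

Because the statement is derived from metadata, adding a column means changing one entry in the list rather than editing SQL by hand in several places.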

Deployment and maintenance partner

Will you deploy and maintain the data estate yourself? Or will you engage a deployment partner and then take on the maintenance yourself? Many organizations opt for the latter, as they ultimately want to take control of their data into their own hands and not depend on a business partner. Whatever you decide, choose a partner with experience and one you can trust, as they will be deploying a future-proof foundation for your most valuable asset: data.

#8 Think Big, Start Small, and Act Agile

Have you followed all seven previous steps? Then it is very likely that your data estate will help drive innovation and that you will be deploying a scalable, future-proof data environment. It is key to start small, for example by developing an estate in the cloud (which you can scale to production) with only a couple of data sources and a few tools, and then testing and experimenting. Step by step, you’ll help your organization transform into a data-driven organization.