Building a Lakehouse part 2: tackle your data migration

Jerrold Stolk
20 Nov, 2023

For many years, Data Warehouses have provided organizations with the insights that support important decisions. Although Massively Parallel Processing (MPP) architecture lets Data Warehouses process large volumes of data efficiently, they are designed primarily for structured data.

As medium-sized and large organizations increasingly deal with unstructured and streaming data, they run into the limitations of their Data Warehouses. The first blog in this two-part series discusses what a Lakehouse architecture is and why it is the next step for data-driven processes in these organizations. In this second blog, we explain how to go about migrating to a Lakehouse.

Before you start, proper preparation is required

Once an organization opts for a Lakehouse architecture to build a future-proof data platform, it is time to start shaping the migration. A successful migration requires proper preparation, which consists of at least the following steps:

  • Use case: which internal questions will this migration help answer, and how does it align with our (IT) strategy?
  • Assessment: is the current environment suitable for lift & shift migration?
  • Migration design: what approach will we apply per component?
  • Evaluation: is the migration plan ready to be implemented?
  • Implementation: actually performing the migration.

The implementation itself covers the following topics, which we discuss in turn:

  1. Platform and Security
  2. Data Migration
  3. Data Transformation
  4. Data Products

1. Platform and security

Once the first four preparation steps (use case, assessment, migration design and evaluation) are complete, the first step in actually performing the migration is setting up the platform. This involves setting up the Landing Zones on which the data platform will be built. For this purpose, Microsoft recommends Data Landing Zones and a Data Management Landing Zone, which are part of its Cloud Adoption Framework. The platform rollout also includes the security setup: before the migration starts, it should be clear who has the rights to do what. This access model is then enforced with, among other things, firewalls, Network Security Groups and Private Endpoints.


2. Data migration

Once the platform is in place, it is time for the data migration. This consists of two sub-steps: migrating the history and setting up the loading processes. For both, we recommend a side-by-side migration over an in-place migration: the new data platform is built alongside the existing one, so you can compare the two 1-to-1, which makes testing and validation straightforward.
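To make the 1-to-1 comparison concrete, here is a minimal Python sketch of a reconciliation check that could run against both platforms during a side-by-side migration. It assumes the rows of each table have already been fetched from the Data Warehouse and from the Lakehouse into memory as tuples with the same column order and types; the function names `table_fingerprint` and `compare_side_by_side` are hypothetical and not part of any product.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Sequence[object]


def table_fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Return (row count, order-independent checksum) for a table's rows."""
    # Hash each row separately and sort the digests, so the checksum does not
    # depend on the order in which either platform returns the rows.
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest() for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
    return len(digests), combined


def compare_side_by_side(
    warehouse: Dict[str, List[Row]], lakehouse: Dict[str, List[Row]]
) -> Dict[str, str]:
    """Compare every table that exists on either platform and report the outcome."""
    results: Dict[str, str] = {}
    for table in sorted(set(warehouse) | set(lakehouse)):
        if table not in lakehouse:
            results[table] = "missing in lakehouse"
            continue
        if table not in warehouse:
            results[table] = "missing in warehouse"
            continue
        old_count, old_sum = table_fingerprint(warehouse[table])
        new_count, new_sum = table_fingerprint(lakehouse[table])
        if old_count != new_count:
            results[table] = f"row count differs ({old_count} vs {new_count})"
        elif old_sum != new_sum:
            results[table] = "content differs"
        else:
            results[table] = "match"
    return results


if __name__ == "__main__":
    warehouse = {"orders": [(1, "ACME", 10.5), (2, "GLOBEX", 7.0)], "customers": [(1, "ACME")]}
    lakehouse = {"orders": [(2, "GLOBEX", 7.0), (1, "ACME", 10.5)], "customers": []}
    print(compare_side_by_side(warehouse, lakehouse))
    # {'customers': 'row count differs (1 vs 0)', 'orders': 'match'}
```

Because the checksum is built from `repr`, values must be normalised to the same Python types on both sides first; a DECIMAL column read as `Decimal` from one platform and as `float` from the other would be reported as a false mismatch.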

Migrating history

When migrating history, all relevant data is copied from the original Data Warehouse into the Data Lake. Because the source of such a copy is usually a database environment, we recommend metadata-driven extraction: an ETL tool, driven by a metadata list of tables, creates one copy per table in the Data Lake, following a predefined structure.
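In practice the metadata usually lives in a control table, and a parameterised copy pipeline in the ETL tool iterates over it. The sketch below shows only the metadata-driven part in plain Python: it turns a list of table definitions into one extraction task per table, each with a source query and a target path. The `raw/...` folder layout, the Parquet file name and all class and function names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class TableMetadata:
    """One row of the (hypothetical) extraction metadata: which table to copy."""
    schema: str
    table: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class ExtractionTask:
    """A single copy activity: what to read and where to land it in the Data Lake."""
    source_query: str
    target_path: str


def quote(identifier: str) -> str:
    # ANSI SQL double-quoted identifier; embedded quotes are escaped by doubling.
    return '"' + identifier.replace('"', '""') + '"'


def build_extraction_plan(
    source_system: str, tables: List[TableMetadata], run_date: date
) -> List[ExtractionTask]:
    """Generate one extraction task per table from metadata, using a fixed folder layout."""
    tasks = []
    for meta in tables:
        column_list = ", ".join(quote(c) for c in meta.columns)
        query = f"SELECT {column_list} FROM {quote(meta.schema)}.{quote(meta.table)}"
        # Predefined structure (assumed): raw/<system>/<schema>/<table>/<yyyy>/<mm>/<dd>/<table>.parquet
        path = (
            f"raw/{source_system}/{meta.schema}/{meta.table}/"
            f"{run_date:%Y/%m/%d}/{meta.table}.parquet"
        )
        tasks.append(ExtractionTask(source_query=query, target_path=path))
    return tasks


if __name__ == "__main__":
    metadata = [
        TableMetadata("sales", "orders", ("order_id", "customer_id", "amount")),
        TableMetadata("sales", "customers", ("customer_id", "name")),
    ]
    for task in build_extraction_plan("dwh", metadata, date(2023, 11, 20)):
        print(task.source_query, "->", task.target_path)
```

With this approach, adding a table to the migration means adding a metadata row rather than building a new pipeline, which is the main appeal when a warehouse holds hundreds of tables.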

Setting up the loading processes

While both platforms run side by side, the data in both environments must be kept identical by applying ongoing updates to each. To do this, the loading processes of the Data Warehouse must be recreated for the Data Lake. Here too we recommend a metadata-driven solution, which in this case connects directly to the source: the Data Warehouse. When migrating from an on-premises solution to the cloud, a Gateway is required in many cases.
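One common way to keep the Data Lake in sync is a watermark-based incremental load: each run copies only the rows changed since the last recorded timestamp. The sketch below assumes each table has a reliable `modified_at`-style column and uses an in-memory SQLite database as a stand-in for the Data Warehouse and a dict as a stand-in for the lake; `LoadConfig` and `incremental_load` are hypothetical names, and the watermark technique is one option among several, not necessarily the approach described above.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoadConfig:
    """Hypothetical metadata row for one incrementally loaded table."""
    table: str
    watermark_column: str
    last_watermark: str  # ISO-8601 timestamp of the newest change already loaded


def incremental_load(
    conn: sqlite3.Connection, config: LoadConfig, lake: Dict[str, List[dict]]
) -> int:
    """Copy rows changed since the last watermark into the lake and advance the watermark."""
    # Table and column names come from trusted metadata; the watermark value is a bound parameter.
    query = (
        f'SELECT * FROM "{config.table}" '
        f'WHERE "{config.watermark_column}" > ? '
        f'ORDER BY "{config.watermark_column}"'
    )
    cursor = conn.execute(query, (config.last_watermark,))
    columns = [d[0] for d in cursor.description]
    new_rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    # Append the changed rows to the table's lake "folder" (here: an in-memory list).
    lake.setdefault(config.table, []).extend(new_rows)
    if new_rows:
        # Rows are sorted on the watermark column, so the last one holds the maximum.
        config.last_watermark = new_rows[-1][config.watermark_column]
    return len(new_rows)


if __name__ == "__main__":
    # An in-memory SQLite database stands in for the Data Warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, modified_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.5, "2023-11-01T08:00:00"), (2, 7.0, "2023-11-02T09:30:00")],
    )
    lake: Dict[str, List[dict]] = {}
    config = LoadConfig("orders", "modified_at", "2023-11-01T12:00:00")
    print(incremental_load(conn, config, lake))  # 1: only order 2 changed after the watermark
    print(config.last_watermark)  # 2023-11-02T09:30:00
    print(incremental_load(conn, config, lake))  # 0: nothing new since the previous run
```

A timestamp watermark does not capture deleted rows, so tables where deletes matter may need change data capture instead; in a real pipeline the updated watermark would also be persisted back to the metadata store after each successful run.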

3. Data transformations

The next step in migrating from a Data Warehouse to a Lakehouse is converting the data transformations. Here there are two possible routes: migrating the existing transformation processes or rebuilding them. This choice should already have been made during the migration design stage, and depends on the answers to the following questions:

  • What language or tooling were the transformation processes developed in?
    • Transformations written in SQL are easier to migrate than those built in a graphical tool such as SQL Server Integration Services (SSIS).
  • How mature are these transformation processes?
    • If the current transformation processes no longer fully align with the objectives, it is advisable to rebuild them.
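During migration design, the two questions above can be applied to every process in a transformation inventory. The sketch below encodes them as a simple rule; the tooling categories, the labels it returns and the choice to let the maturity question override the tooling question are assumptions, not a fixed methodology.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TransformationProcess:
    """One entry in a (hypothetical) inventory of existing transformation processes."""
    name: str
    tooling: str  # e.g. "sql", "stored_procedure", "ssis"
    aligned_with_goals: bool


# Assumption: tooling whose logic is plain SQL text and can largely be ported as-is.
SQL_BASED_TOOLING = {"sql", "stored_procedure"}


def recommend_approach(process: TransformationProcess) -> str:
    """Apply the two questions from the text to a single transformation process."""
    # Maturity (assumed to take precedence): misaligned logic is rebuilt regardless of tooling.
    if not process.aligned_with_goals:
        return "rebuild"
    # Language/tooling: SQL ports more easily than graphical packages such as SSIS.
    if process.tooling.lower() in SQL_BASED_TOOLING:
        return "migrate"
    return "migrate (high effort) or rebuild"


def plan_inventory(processes: List[TransformationProcess]) -> Dict[str, str]:
    """Map every process name to its recommended approach."""
    return {p.name: recommend_approach(p) for p in processes}


if __name__ == "__main__":
    inventory = [
        TransformationProcess("load_dim_customer", "sql", True),
        TransformationProcess("load_fact_sales", "ssis", True),
        TransformationProcess("legacy_kpi_report", "sql", False),
    ]
    for name, approach in plan_inventory(inventory).items():
        print(f"{name}: {approach}")
```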

If you choose to rebuild, this is also the time to review the data layering. Many Lakehouses use a medallion architecture with Bronze (raw), Silver (cleansed) and Gold (business-level) layers.

Whether you migrate or rebuild the transformations, the storage location of the transformed data must change either way. In a Data Warehouse, transformed data is stored in the warehouse's own database; in a Lakehouse, the transformed data in the Silver and Gold layers is written back to the Data Lake.
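The following minimal sketch illustrates the medallion flow in plain Python: raw Bronze records are cleansed into a Silver table, which is aggregated into a Gold table, and each result is written back to the same lake. The lake is modelled as a dict from folder path to rows; in a real Lakehouse these would typically be Delta or Parquet tables in cloud storage. The paths, columns, the "keep first duplicate" rule and the function names are all assumptions for illustration.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List

# Folder path -> rows; a stand-in for tables stored in the Data Lake (assumption).
Lake = Dict[str, List[dict]]


def to_silver(bronze_rows: List[dict]) -> List[dict]:
    """Cleanse raw records: cast types, standardise names, drop duplicate order IDs."""
    seen = set()
    silver = []
    for row in bronze_rows:
        order_id = int(row["order_id"])
        if order_id in seen:  # keep the first occurrence of each business key
            continue
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "customer": row["customer"].strip().upper(),
            "amount": Decimal(row["amount"]),
        })
    return silver


def to_gold(silver_rows: List[dict]) -> List[dict]:
    """Aggregate cleansed orders into a business-level table: revenue per customer."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return [{"customer": c, "revenue": t} for c, t in sorted(totals.items())]


def run_medallion(lake: Lake) -> None:
    # Each layer reads from the previous one and is written back to the same lake.
    lake["silver/sales/orders"] = to_silver(lake["bronze/sales/orders"])
    lake["gold/sales/revenue_per_customer"] = to_gold(lake["silver/sales/orders"])


if __name__ == "__main__":
    lake: Lake = {
        "bronze/sales/orders": [
            {"order_id": "1", "customer": " acme ", "amount": "10.50"},
            {"order_id": "1", "customer": " acme ", "amount": "10.50"},  # duplicate delivery
            {"order_id": "2", "customer": "Acme", "amount": "4.50"},
            {"order_id": "3", "customer": "globex", "amount": "7.00"},
        ]
    }
    run_medallion(lake)
    print(lake["gold/sales/revenue_per_customer"])
```

Keeping Bronze untouched means Silver and Gold can always be recomputed from the raw copy, for example after a cleansing rule changes.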

4. Data products

Once the data and transformation processes have been migrated successfully, the same datasets are available in the new environment, and it is time to move the data products over as well. Here we distinguish between managed reporting and self-service analytics.

In managed reporting, a central reporting team manages the data products and can convert them itself by repointing them to the new environment. For self-service use, users should be informed of the change in three stages:

  • Advance notice that the data platform is going to change, and why;
  • After the migration, a notice that the new environment is available and that data products must be converted before the migration deadline;
  • After the deadline, a notice that reports that have not been converted will no longer work.

Connect with us

Want to learn more about the key benefits and challenges of Lakehouse architecture, or get started right away? Connect with one of our Data & Analytics experts.
