BIP xTech carried out an initial assessment phase to define the scope of the modernization and to choose the appropriate technologies. We decided to build the new DWH on Google Cloud Platform (GCP) using BigQuery and Dataform.
BigQuery gives us better performance than the old on-premises hardware, integrates easily with Business Intelligence tools, complies with industry security standards, and allows granular access control. It also offers a pay-as-you-go pricing model, which is especially useful during the migration phase, when the amount of data to be processed is not yet known precisely.
Dataform orchestrates the transformations, running SQL queries that read from and write to BigQuery. We chose it because most of the transformations in the legacy system were already implemented in SQL. This allowed BIP xTech to thoroughly clean up and refactor the logic without rewriting every pipeline, while leveraging the existing know-how of the company's internal DWH team. Adopting the tool only required training on the GCP console and on version control with Git. Besides maintaining continuity with the past, Dataform let us apply established software engineering practices, since it treats data transformations as code, with full version history. It also natively supports assertions (data quality checks that verify the data meets expectations) and unit tests, so that the repository acts as a real single source of truth. Equally important is the ability to document the data schema and the transformations directly in the Dataform code. The documentation is automatically added to the Data Catalog within Dataform and can later be exported to other tools. Finally, Dataform automatically detects the dependencies between entities and visualizes them as a dependency tree, which makes it easy to understand how the entities of the project relate to each other and in which order they are executed.
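
To make the idea of automatically detected dependencies more concrete, here is a minimal Python sketch of how an execution order can be derived from `ref(...)` calls inside SQL definitions. It is not Dataform's actual implementation: the table names, the `DEFINITIONS` dictionary, and both helper functions are hypothetical, and the regular expression only handles the simple `${ref("name")}` form with double quotes.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Simplified pattern for ${ref("table_name")}; an assumption for this sketch,
# real Dataform code accepts more forms than this.
REF_PATTERN = re.compile(r'\$\{\s*ref\(\s*"([^"]+)"\s*\)\s*\}')

# Hypothetical project: each entity name maps to its SQL definition.
DEFINITIONS = {
    "stg_orders": "SELECT * FROM raw.orders",
    "stg_customers": "SELECT * FROM raw.customers",
    "fct_sales": (
        'SELECT * FROM ${ref("stg_orders")} o '
        'JOIN ${ref("stg_customers")} c USING (customer_id)'
    ),
    "rpt_sales_by_region": (
        'SELECT region, SUM(amount) AS total '
        'FROM ${ref("fct_sales")} GROUP BY region'
    ),
}


def detect_dependencies(definitions: dict[str, str]) -> dict[str, set[str]]:
    """Map each entity to the set of entities it references via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in definitions.items()}


def execution_order(definitions: dict[str, str]) -> list[str]:
    """Return one valid order in which every entity runs after its dependencies."""
    graph = detect_dependencies(definitions)
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(DEFINITIONS)))
```

Entities that do not depend on each other (here the two staging tables) may appear in either order. A circular reference would raise `graphlib.CycleError`, which mirrors the reason a dependency tree must be acyclic to be executable.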
Bringing the infrastructure to the cloud also lets us take advantage of the native features offered by GCP, which would have required many ad hoc developments in the old on-premises environment. BIP xTech used table snapshots to save a lightweight, point-in-time copy of a table, allowing instant data rollbacks and easy calculation of the delta between two points in time. We used Cloud Workflows to manage the integration between Dataform and other GCP services, and we monitored the whole infrastructure with Cloud Monitoring alerts, which notify the responsible personnel in real time if any errors occur.
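
As an illustration of the snapshot-based rollback and delta calculation described above, the following Python sketch builds the corresponding BigQuery SQL statements as strings. The project, dataset, and table names are hypothetical, as are the three helper functions. The sketch only generates the statements and does not execute them, and the rollback statement in particular should be verified against the current BigQuery documentation before use.

```python
def snapshot_ddl(source: str, snapshot: str, expiration_days: int = 7) -> str:
    """Build DDL that snapshots `source` into `snapshot` with an expiration."""
    return (
        f"CREATE SNAPSHOT TABLE `{snapshot}`\n"
        f"CLONE `{source}`\n"
        "OPTIONS (expiration_timestamp = "
        f"TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {expiration_days} DAY))"
    )


def rollback_ddl(target: str, snapshot: str) -> str:
    """Build DDL that overwrites `target` with the contents of `snapshot`."""
    return f"CREATE OR REPLACE TABLE `{target}`\nCLONE `{snapshot}`"


def delta_query(current: str, snapshot: str) -> str:
    """Build a query returning rows present in `current` but not in `snapshot`.

    Assumes both tables share the same schema and that every column type
    supports set operations.
    """
    return (
        f"SELECT * FROM `{current}`\n"
        "EXCEPT DISTINCT\n"
        f"SELECT * FROM `{snapshot}`"
    )


if __name__ == "__main__":
    # Hypothetical identifiers, used only for illustration.
    table = "my_project.dwh.sales"
    snap = "my_project.dwh_snapshots.sales_before_load"
    print(snapshot_ddl(table, snap, expiration_days=3))
    print(rollback_ddl(table, snap))
    print(delta_query(table, snap))
```

Swapping the two arguments of `delta_query` would instead return the rows that existed at snapshot time but have since been removed or changed, so the two calls together describe the full delta.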