Empowering data-driven decisions with Azure data management

By Nicola Duncan

09 January 2024

Large organisations, with their huge volumes of data, often experience data silos, inconsistent data quality and a lack of insights. That’s why data management solutions, such as those provided by Azure, are essential. Find out more about the Azure data management ecosystem and how we help clients harness its capabilities.

Benefits of good data management

When mishandled, data can quickly become a liability rather than an asset. The efficiency of data services relies on a good foundation of data management. Effective data management offers a streamlined approach to data collection, integration, analysis and visualisation.

Data platform solutions can help organisations establish a comprehensive framework for making more informed decisions from their data. Those decisions can drive efficiency, innovation and competitive advantage.


Good data management can help organisations:

  • improve the quality of their data – effective data management removes errors, inconsistencies and duplication of records
  • make better decisions – basing decisions on high-quality data leads to better-informed business choices
  • boost productivity – good data management streamlines data access, analysis and sharing
  • ensure compliance and security – safeguarding sensitive information and complying with governance and data protection regulations is crucial for avoiding the repercussions of data breaches or non-compliance
  • scale – data management supports an organisation’s ability to scale its data infrastructure seamlessly as the business grows, preventing disruptions
  • innovate – new insights and product ideas can be uncovered, as well as opportunities for optimisation
  • be more cost-effective – streamlining data processes can reduce data collection, storage and maintenance costs

These benefits can only be achieved with good data practices, from establishing effective data governance, management and stewardship to ensuring data accuracy, security and compliance.

The Microsoft Azure data management ecosystem gives organisations a fundamental tool for seamlessly gathering and processing data.

The Azure data management ecosystem

Microsoft Azure’s comprehensive set of services and tools allows organisations to use the power of cloud computing for efficient data gathering and processing. The ecosystem encompasses components for data storage, such as databases and data warehouses, and components for data processing, such as virtual machines. Together they make it possible to build well-designed, robust and extensible data management platforms.

Let's dive deeper into the capabilities that Azure solutions can deliver.

Flow diagram of the Azure data management ecosystem

Migrate – transferring legacy data to a new platform

Migration is handled by Azure Data Factory. Where data volumes are too large to migrate smoothly over the network from a non-Azure source, we can use the Azure Data Box service to transfer the data physically into an Azure Storage account.
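Whether a dataset is "too large" for online migration is largely a matter of arithmetic: transfer time is the data size in bits divided by the usable bandwidth. The sketch below estimates that time and applies a decision rule. The 80% link utilisation and the seven-day threshold are illustrative assumptions, not Azure guidance, and the function names are hypothetical.

```python
def transfer_days(data_terabytes: float, bandwidth_mbps: float, utilisation: float = 0.8) -> float:
    """Estimate the days needed to copy data over a network link.

    data_terabytes: size in decimal terabytes (1 TB = 10**12 bytes)
    bandwidth_mbps: link speed in megabits per second
    utilisation: fraction of the link realistically available (assumed 80%)
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    bits = data_terabytes * 10**12 * 8
    effective_bps = bandwidth_mbps * 10**6 * utilisation
    return bits / effective_bps / 86_400  # 86,400 seconds per day


def suggest_method(data_terabytes: float, bandwidth_mbps: float, max_days: float = 7) -> str:
    """Hypothetical rule: prefer an offline device when an online copy would exceed max_days."""
    days = transfer_days(data_terabytes, bandwidth_mbps)
    return "Data Box (offline)" if days > max_days else "Data Factory (online)"


print(f"{transfer_days(100, 1000):.1f} days")  # 100 TB over a 1 Gbps link
print(suggest_method(100, 1000))
print(suggest_method(1, 1000))
```

With these assumptions, 100 TB over a 1 Gbps link takes roughly 11.6 days, so the rule points to Data Box, while 1 TB finishes in under three hours online.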

Ingest – collecting data from external sources and making it processable

There are two primary modes of data ingestion: streaming and batch.

Streaming involves a continuous flow of data. We ingest data streams from applications through Azure Event Hubs, and from IoT devices through Azure IoT Hub.

Batch involves periodic ingestion of large volumes of data. It supports:

  • unstructured data, such as images, videos and emails
  • semi-structured data, such as CSVs, XMLs and JSON documents
  • structured data, such as relational data and weblog stats

On-premises data stores can be securely accessed through the Azure VPN Gateway.

Store – depositing unprocessed data into the storage platform

There are two primary data storage facilities: Azure Data Lake Storage Gen2 and Azure SQL.

Azure Data Lake Storage Gen2 is an optimised storage layer, ideal for storing streams, unstructured data and semi-structured data. It offers cost-effective storage of large volumes of data and can host Delta Lake tables for transactional access.

Azure SQL is tailored specifically for relational and structured data, ensuring fast access to well-defined schemas.
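The guidance above amounts to a routing rule: structured, relational data goes to Azure SQL, and everything else lands in Data Lake Storage Gen2. The sketch below encodes that rule using the data categories listed under Ingest (which treat CSV as semi-structured). The `table:` and `stream:` prefixes and the extension mapping are hypothetical conventions for illustration, not part of any Azure API.

```python
from pathlib import PurePath

# Illustrative extension mapping, following the categories listed under Ingest.
STRUCTURE_BY_EXTENSION = {
    ".jpg": "unstructured", ".png": "unstructured",
    ".mp4": "unstructured", ".eml": "unstructured",
    ".csv": "semi-structured", ".xml": "semi-structured", ".json": "semi-structured",
}


def classify(source: str) -> str:
    """Return the structure category of a source (hypothetical naming convention)."""
    if source.startswith("table:"):   # relational table, e.g. "table:sales.orders"
        return "structured"
    if source.startswith("stream:"):  # continuous feed, e.g. "stream:telemetry"
        return "stream"
    suffix = PurePath(source).suffix.lower()
    return STRUCTURE_BY_EXTENSION.get(suffix, "unstructured")


def route(source: str) -> tuple[str, str]:
    """Pick a storage facility: Azure SQL for structured data, Data Lake otherwise."""
    kind = classify(source)
    store = "Azure SQL" if kind == "structured" else "Azure Data Lake Storage Gen2"
    return kind, store


for src in ["table:sales.orders", "exports/orders.csv", "photos/site.JPG", "stream:telemetry"]:
    print(src, "->", route(src))
```

Defaulting unknown formats to "unstructured" is a deliberate, conservative choice: the data lake can hold anything, whereas Azure SQL needs a well-defined schema.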

Both storage facilities support processing by offering three distinct storage tiers:

  1. Raw – replicates the source data without applying any processing
  2. Cleansed – implements initial data quality rules to filter data from each source
  3. Transformed – data from various sources is merged, correlated and restructured, and calculations are applied to generate valuable business insights
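The three tiers can be illustrated in plain Python. This is a minimal sketch with hypothetical sample records: the raw tier holds two sources exactly as received, the cleansed tier applies two simple quality rules (drop rows without a key, remove exact duplicates), and the transformed tier joins the sources and calculates revenue per region. A real platform would run equivalent logic in Synapse or Databricks over files in the lake.

```python
from collections import defaultdict

# Raw tier: hypothetical source data copied without any processing.
raw_crm = [
    {"customer_id": "C1", "region": "Scotland"},
    {"customer_id": "C1", "region": "Scotland"},   # exact duplicate
    {"customer_id": None, "region": "Wales"},      # fails the "key present" rule
    {"customer_id": "C2", "region": "England"},
]
raw_orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 80.0},
    {"order_id": 3, "customer_id": "C1", "amount": 30.0},
    {"order_id": 3, "customer_id": "C1", "amount": 30.0},  # exact duplicate
]


def cleanse(rows, key):
    """Cleansed tier: drop rows missing `key` and remove exact duplicate rows."""
    seen, out = set(), []
    for row in rows:
        if row.get(key) is None:
            continue
        fingerprint = tuple(sorted(row.items()))  # hashable copy of the whole row
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(row)
    return out


def transform(customers, orders):
    """Transformed tier: join orders to customers and total revenue per region."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    totals = defaultdict(float)
    for order in orders:
        totals[region_of.get(order["customer_id"], "Unknown")] += order["amount"]
    return dict(totals)


cleansed_crm = cleanse(raw_crm, "customer_id")
cleansed_orders = cleanse(raw_orders, "order_id")
revenue_by_region = transform(cleansed_crm, cleansed_orders)
print(revenue_by_region)
```

Keeping each tier as a separate, persisted output (rather than transforming in one step) is what makes it possible to reprocess from raw data when a quality rule changes.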

Process – cleansing and transforming data for analysis

Two distinct options are available, each built around one of Azure's key analytics services.

Azure Synapse Analytics is best suited for data warehousing, SQL data analysis and interactive reports. It uses serverless SQL pools, closely integrated with the Data Lake Storage capability. Processing functions include Azure’s:

  • Stream Analytics for stream-based data analysis
  • Machine Learning for model development and training
  • Cognitive Services for accessing pre-built machine learning models

Azure Databricks excels in handling streaming, machine learning, artificial intelligence and data science workloads. It provides a tailored Platform as a Service (PaaS) with additional capabilities on top of Apache Spark clusters. Processing functions include Azure’s:

  • Data Factory for designing processing pipelines
  • Stream Analytics for stream-based analysis
  • Machine Learning for training machine learning models
  • Cognitive Services for harnessing machine learning models

Serve – distributing data to end users through various channels

Channels include:

  • Power BI – delivering reports and visualisation, providing valuable business insights to consumers
  • APIs – delivering mastered datasets to clients' business applications or external partners, with Azure API Management providing the necessary wrapping
  • Kubernetes Clusters – deploying and hosting containerised machine learning models for efficient use
  • Azure Synapse SQL Pools – supporting traditional data warehouses and SQL-based views onto Data Lake Storage master datasets
  • Azure SQL databases – functioning as data warehouses
  • Azure Data Share – sharing master datasets with a client's own customers and external partners, enabling snapshot-based or in-place sharing

Governance and master data management (MDM) – establishing and enforcing data policies

Azure Purview plays a critical role in automating the discovery of datasets within the Store capability. It helps capture the end-to-end lineage of these datasets as they are created, transformed and exposed via the Serve capability.
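End-to-end lineage is essentially a directed graph in which each dataset points to the datasets it was derived from. The sketch below illustrates the idea with a hypothetical, hand-written lineage map and a breadth-first walk that lists everything feeding a served dataset. It does not use Purview's API; Purview captures this graph automatically, and the dataset names here are invented for illustration.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it was derived from.
LINEAGE = {
    "serve/sales_report": ["transformed/sales_by_region"],
    "transformed/sales_by_region": ["cleansed/orders", "cleansed/customers"],
    "cleansed/orders": ["raw/orders"],
    "cleansed/customers": ["raw/crm"],
}


def upstream(dataset, lineage=LINEAGE):
    """Return every dataset that feeds into `dataset`, nearest first (breadth-first)."""
    seen, order = set(), []
    queue = deque(lineage.get(dataset, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:  # a dataset can feed several others; visit it once
            continue
        seen.add(parent)
        order.append(parent)
        queue.extend(lineage.get(parent, []))
    return order


print(upstream("serve/sales_report"))
```

Walking the graph in the opposite direction answers the complementary question: which reports are affected if a raw source changes.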

Governance involves the development of a business glossary for appropriate data classification, along with tagging datasets with subject matter experts and owners to establish clear responsibilities.
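A simple way to make these responsibilities enforceable is to check every catalogued dataset against the glossary and ownership rules. The sketch below is a hypothetical stand-in for such a check: the `DatasetEntry` structure, the glossary terms and the rules themselves are assumptions for illustration, not Purview's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical business glossary terms used for classification.
GLOSSARY = {"Customer", "Order", "Personal Data"}


@dataclass
class DatasetEntry:
    name: str
    owner: Optional[str] = None
    experts: List[str] = field(default_factory=list)  # subject matter experts
    terms: Set[str] = field(default_factory=set)      # glossary classifications


def governance_issues(entry: DatasetEntry) -> List[str]:
    """List the governance gaps for one dataset (empty list means compliant)."""
    issues = []
    if not entry.owner:
        issues.append("missing owner")
    if not entry.experts:
        issues.append("missing subject matter expert")
    unknown = entry.terms - GLOSSARY
    if unknown:
        issues.append(f"terms not in glossary: {sorted(unknown)}")
    if not entry.terms:
        issues.append("unclassified")
    return issues


print(governance_issues(DatasetEntry("raw/crm", terms={"Client"})))
```

Running a check like this on a schedule turns the glossary from documentation into a control: datasets without an owner or a valid classification are flagged rather than silently published.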

At present, Azure lacks an integrated MDM capability. The platform addresses this gap through Profisee, a solution available in the Azure Marketplace.

Profisee uses the glossary and classification data provided by Purview to enhance its workflows. These workflows include tasks such as matching, merging, verification and enrichment of datasets. Azure Data Factory drives Profisee activities, allowing triggers to originate from either Azure Synapse Analytics or Azure Databricks. Importantly, data lineage is maintained consistently throughout the process.

The result is the generation of Golden Records, which serve as master datasets for either the Process or Serve capabilities. These Golden Records are highly valuable for a wide array of data-related operations within the Azure ecosystem.
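The matching and merging that produce a Golden Record can be sketched in a few lines. This is not Profisee's algorithm: the match rule (same normalised email address) and the survivorship rule (keep the most recently updated non-empty value for each field) are simple assumptions chosen to show the technique, and the sample records are hypothetical.

```python
from collections import defaultdict


def match_key(record):
    """Match rule (assumption): the same normalised email means the same entity."""
    return record["email"].strip().lower()


def merge(records):
    """Survivorship rule (assumption): latest non-empty value wins for each field."""
    golden = {}
    for record in sorted(records, key=lambda r: r["updated"]):  # oldest first
        for field_name, value in record.items():
            if value not in (None, ""):
                golden[field_name] = value
    return golden


def golden_records(records):
    """Group matching records and merge each group into one Golden Record."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    result = {}
    for key, group in groups.items():
        merged = merge(group)
        merged["email"] = key  # store the normalised match key
        result[key] = merged
    return result


# Hypothetical duplicates arriving from two source systems.
records = [
    {"email": "jo@example.com", "name": "Jo Smith", "phone": "", "updated": "2023-01-10"},
    {"email": " JO@example.com", "name": "Joanna Smith", "phone": "01234 567890", "updated": "2023-06-02"},
    {"email": "sam@example.com", "name": "Sam Lee", "phone": None, "updated": "2023-03-15"},
]
for key, golden in golden_records(records).items():
    print(key, golden)
```

Production MDM tools usually add fuzzy matching (for names and addresses) and a human review step for uncertain matches; an exact-key rule like this one is only the simplest starting point.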

Secure – ensuring critical aspects of security

This capability focuses on authentication, access control, classification, integrity, audit and incident management.

Key security measures woven into the solution include:

  • Azure Active Directory (AAD) identity for controlling access to all services within the Ingest, Store, Process and Serve capabilities
  • Encryption of all data in the Store capability, using either Microsoft or client keys
  • Encryption in transit, using Transport Layer Security (TLS), of data accessed by the Ingest, Process and Serve capabilities
  • Secure storage of inter-service communication secrets in Azure Key Vault
  • Azure Virtual Network integration for secure inbound and outbound traffic using either service or private endpoints
  • Azure DevOps integration for services featuring internal development and release cycles, providing quality assurance and auditability
  • Security-related policies defined in Azure Policy and proactive monitoring through Azure Monitor
  • Monitoring and protection of resources using Azure Security Center and Azure Defender to safeguard assets within the Store capability
  • Secure access and transit of data ingested from external sources using Azure VPN Gateway
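On the client side, encryption in transit comes down to connecting over TLS with certificate verification on and outdated protocol versions refused. The sketch below uses Python's standard `ssl` module to build such a context, assuming a policy of TLS 1.2 or later; the function name is hypothetical and the minimum version is a policy choice rather than a value taken from this article.

```python
import ssl


def make_client_context(minimum=ssl.TLSVersion.TLSv1_2):
    """Build a client TLS context that verifies certificates and refuses old versions."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed policy: reject anything older than TLS 1.2.
    context.minimum_version = minimum
    return context


ctx = make_client_context()
print(ctx.minimum_version, ctx.verify_mode, ctx.check_hostname)
```

The resulting context can be passed to standard clients, for example `urllib.request.urlopen(url, context=ctx)`, so every outbound connection inherits the same policy instead of each script configuring TLS on its own.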

How we can help

Azure data management allows organisations to fully unlock the value of their data resources. With the wide range of capabilities detailed above, it helps establish a robust foundation for developing data-driven services.

At Storm ID, we have substantial experience in harnessing the power of Azure data management platforms to deliver exceptional results for clients in large organisations and the public sector. We can provide effective solutions for clients working with:

  • diverse data sources
  • data ingestion
  • systems integration
  • data storage
  • analytics technologies

We’re committed to ensuring that our clients’ data is managed effectively and securely throughout its entire lifecycle. Find out more about our data management expertise.