Microsoft FabricAzure SQL DatabasePySparkPower BIDAX

OEM Material Insights: ESG Analytics Platform Built on Microsoft Fabric

Real-time Automating material compliance reporting for complex manufacturing supply chains

Business Context

Modern OEM manufacturers in automotive, aerospace, and electronics face mounting pressure to make procurement decisions that balance cost, quality, and sustainability. EU regulations (CSRD, Supply Chain Due Diligence) demand transparency on supplier environmental and governance performance, while investors and customers increasingly scrutinize supply chain responsibility. Traditional ERP systems track cost and quality but lack integration with sustainability metrics—creating blind spots where purchasing decisions may inadvertently concentrate risk in countries with weak environmental regulations or unstable governance.

This project addresses that gap by building an end-to-end ESG analytics platform on Microsoft Fabric that integrates procurement data with public sustainability datasets. The platform enables manufacturers to assess supplier sustainability (Yale Environmental Performance Index across 180+ countries), evaluate governance risks (World Bank indicators covering 200+ jurisdictions), analyze supply concentration for 80+ critical materials, and track procurement patterns correlated with ESG performance.

Target Use Cases: Battery material sourcing decisions for EV manufacturers (lithium, cobalt, nickel), critical material supply security for aerospace/defense (rare earth elements, titanium), multi-tier supplier ESG monitoring for electronics, and regulatory compliance reporting for any materials-intensive industry navigating ESG requirements.


In Short: Built an ESG analytics platform that helps manufacturers comply with EU regulations and reduce supply chain risks by integrating procurement data with environmental and governance metrics. The system combines ERP data with public sustainability datasets to provide real-time insights for sourcing decisions.

Business Value

This platform enables OEM manufacturers to:

  • Navigate ESG Compliance: Meet EU regulatory reporting obligations (CSRD, Supply Chain Due Diligence) with auditable sustainability metrics
  • Build Supply Chain Resilience: Identify geographic concentration risks for critical materials and diversify supplier base
  • Make Informed Sourcing Decisions: Quantify environmental and governance trade-offs across supplier options
  • Proactive Risk Management: Monitor governance indicators (political stability, corruption) to anticipate supply disruption risks

The integration of dimensional modeling with comprehensive data quality framework enables scalable, maintainable analytics rather than fragile one-off reports—critical for production enterprise environments.


For technical readers: The following sections detail the data engineering architecture and implementation specifics.

Technical Approach

Data Integration & Architecture

Integrated four disparate data sources into Microsoft Fabric using medallion architecture (bronze → silver → gold) ensuring data quality, auditability, and performance optimization:

  • Azure SQL Database: ERP procurement transactions and supplier master data (Data Factory pipeline with incremental load parameters)
  • Yale Environmental Performance Index: 180+ countries, 40+ environmental indicators across climate, biodiversity, and environmental health
  • World Bank Governance Indicators: 200+ countries across six governance dimensions (accountability, stability, effectiveness, regulatory quality, rule of law, corruption)
  • EU Critical Raw Materials Supply Data: 80+ materials with supply concentration by country (automated HTTP ingestion from GitHub)

Bronze Layer: Parallel ingestion of raw data into Delta tables preserving audit trail Silver Layer: PySpark-based standardization (column naming, type casting, unpivoting wide-format ESG data) Gold Layer: Star schema dimensional model optimized for DirectLake semantic model

Dimensional Modeling

Designed star schema optimized for DirectLake semantic model:

3 Fact Tables:

  • fact_procurement - Procurement transactions with spend metrics
  • fact_supply_share - Material supply concentration by country/stage
  • fact_epi_score - Environmental performance scores by country/indicator

5 Dimension Tables:

  • gold_dim_country - 180-200+ countries with ISO codes, ESG metadata (role-playing dimension for HQ vs. production country)
  • gold_dim_material - 80+ materials with commodity group hierarchy
  • gold_dim_date - Calendar dimension with temporal attributes
  • gold_dim_indicator - EPI and WGI indicator metadata
  • gold_dim_stage - Extraction vs. Processing for supply chain analysis

Microsoft Fabric Warehouse mirrors dimensional model using Microsoft.Build.Sql, providing optimized storage for DirectLake semantic model with near-real-time refresh and minimal latency.

Data Quality Framework

Implemented comprehensive data quality framework addressing the universal challenge of inconsistent entity naming across disparate data sources:

Deterministic Surrogate Keys: Used xxhash64 algorithm for stable, reproducible surrogate keys ensuring dimensional integrity across pipeline reruns without external key management systems.

Alias Resolution with Confidence Scoring: Built mapping tables with 100+ country aliases and material name variations (addressing “USA” vs. “United States”, “Türkiye” vs “Turkey”), each with confidence scores (0-1 scale). High confidence ≥0.90 for direct matches, medium ≥0.70 for alternate spellings, low ≥0.50 for fuzzy matches, unmapped <0.50 flagged for review.

Audit Trail & Quality Categorization:

  • Quality tier assignment based on mapping confidence
  • Placeholder dimensions preserve unmapped values (no data loss)
  • Audit tables (gold_unmapped_procurement_audit, gold_unmapped_supply_audit) track unresolved items
  • Coverage matrix documents which entities appear in which data sources

This framework enables transparent data quality monitoring and ensures business users understand confidence levels when interpreting analytics—critical for production analytics systems.

Pipeline Orchestration & Semantic Model

Orchestration Pipeline coordinates end-to-end data flow:

  1. Bronze ingestion (parallel loading of 4 data sources)
  2. Silver cleaning (PySpark transformations with error handling)
  3. Gold modeling (dimensional table generation with quality framework)
  4. Warehouse sync (Gold layer → SQL warehouse for DirectLake)

DirectLake Semantic Model: Star schema (5 dimensions, 3 facts, 8 active relationships) with role-playing dimension for gold_dim_country. DirectLake mode queries Delta tables directly without import, enabling near-real-time refresh with minimal latency.

Testing & Code Quality

Implemented unit testing framework (rare for BI projects) demonstrating software engineering rigor:

  • Test Suite: /tests/test_key_generation.py validates surrogate key consistency and uniqueness; /tests/test_data_quality.py performs schema validation, null checks, duplicate detection
  • Modular Code: Reusable transformation functions in /src/transformations/ (key_generation.py, data_quality.py) enabling testability and maintainability
  • Configuration: pytest.ini with test markers (unit, integration, slow, smoke) and requirements-test.txt for dependencies

Tech Stack: Python 3.12+, Java 11+, PySpark 3.4+, pytest

Implementation Status

Completed ✅

Core Data Platform:

  • Medallion architecture (bronze → silver → gold) with Delta tables
  • Data quality framework with confidence scoring and audit trails
  • Dimensional model (3 facts, 5 dimensions) with DirectLake semantic model
  • Unit testing infrastructure with pytest

Data Engineering:

  • Orchestration pipeline with parallel ingestion (4 data sources)
  • Deterministic surrogate key generation (xxhash64)
  • Alias resolution with 100+ country/material mappings
  • Role-playing dimension for supplier geography

In Development 🚧

Power BI Dashboard - Coming Soon

Interactive ESG-integrated procurement intelligence dashboard providing:

  • Supplier Sustainability Assessment: Environmental performance evaluation using Yale EPI metrics across climate, biodiversity, and environmental health categories
  • Governance Risk Analysis: Political and regulatory risk assessment using World Bank indicators (rule of law, corruption, political stability)
  • Supply Concentration Risk: Geographic concentration analysis for 80+ critical materials identifying single-source dependencies
  • Procurement Pattern Analysis: Spending correlation with ESG metrics revealing sustainable sourcing opportunities
  • Data Quality Transparency: Mapping confidence scores and unmapped value visibility

Platform Enhancements:

  • Incremental load implementation (parameters configured, logic in development)
  • Row-level security for multi-tenant access control
  • DAX measures documentation
  • Automated external data refresh (EPI/WGI updates)

Skills Demonstrated

  • Modern Data Platform Architecture — End-to-end Microsoft Fabric implementation (Lakehouse, Warehouse, Data Factory, PySpark, DirectLake) demonstrating cloud-native analytics stack expertise across Fortune 500-adopted integrated platform

  • Production-Grade Data Engineering — Medallion architecture (bronze-silver-gold) with deterministic surrogate keys (xxhash64), comprehensive data quality framework with confidence scoring, and reusable transformation functions. Demonstrates understanding of maintainable, scalable systems beyond proof-of-concept prototypes

  • Analytics Engineering — Star schema dimensional modeling with role-playing dimensions, DirectLake optimization for near-real-time analytics, and appropriate fact/dimension granularity for enterprise BI architecture

  • Data Quality Governance — Alias resolution with confidence scoring and audit trail tables addressing the universal challenge of inconsistent entity naming across disparate data sources. Framework directly transferable to any multi-source integration project

  • Software Engineering Rigor — Unit testing framework with pytest (rare for BI projects), modular code in /src/transformations/, and professional development practices demonstrating commitment to code quality and maintainability

  • Domain Expertise — ESG analytics integration (Yale Environmental Performance Index, World Bank Governance Indicators, EU supply chain data) translated into procurement decision support. Demonstrates ability to bridge technical implementation with business context for cross-functional collaboration

Relevant For: Data Engineering, Analytics Engineering, and BI Platform roles at manufacturing companies (automotive/EV, aerospace, electronics) and ESG-focused organizations requiring supply chain due diligence capabilities. Directly applicable to organizations like Volvo, Northvolt, Saab, and SSAB navigating ESG compliance and supply chain resilience.

Project Resources

Technologies Used

Microsoft FabricAzure SQL DatabasePySparkPower BIDAX

Interested in similar work?

I write about data architecture and sustainability analytics.

Read Insights