Microsoft Fabric: Accessing CDM data (from Dataverse) from Spark

Christian Henrik Reich
3 min read · Feb 7, 2024


Motivation

We needed to read and interpret CDM data with Spark. In our case, the CDM data was exported from Dataverse by Power Apps to an Azure Data Lake Storage Gen2 account.

Microsoft currently provides a Spark CDM connector (https://github.com/Azure/spark-cdm-connector), and Fabric Shortcuts make it easy to connect to and access the files in Azure Data Lake Storage Gen2.

Still, we can’t read the CDM data.

We need a Microsoft Entra ID identity to access OneLake and its shortcuts, whereas the Spark CDM connector uses Managed Identity by default, which only works on Databricks or Synapse. To add to the challenges, at the time of writing there seems to be an issue with using custom jars in Microsoft Fabric, and the connector is not part of Microsoft Fabric Runtime 1.2.

Luckily, although it is undocumented (see "Default level packages for Java/Scala libraries"), the Spark CDM connector is part of Microsoft Fabric Runtime 1.1.

Dump of the Spark environment from running Microsoft Fabric Runtime 1.1
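Instead of reading the dump by eye, the driver's classpath can be filtered for the connector. This is only a minimal sketch: find_connector_jars is a hypothetical helper, and the assumption that the jar's file name contains "spark-cdm-connector" is mine and may not match every runtime.

```python
def find_connector_jars(classpath, needle="spark-cdm-connector", sep=":"):
    """Return the classpath entries whose file name contains `needle`.

    `needle` is an assumed fragment of the connector's jar name.
    `sep` is the classpath separator (":" on Linux).
    """
    entries = [entry for entry in classpath.split(sep) if entry]
    # Compare against the file name only, not the directories leading to it.
    return [e for e in entries if needle in e.rsplit("/", 1)[-1].lower()]


# In a Fabric notebook, where `spark` is predefined, it could be used like this:
# classpath = spark.sparkContext._jvm.java.lang.System.getProperty("java.class.path")
# print(find_connector_jars(classpath))
```

An empty result does not prove the connector is missing, since jars can also be loaded in ways that do not show up in this property.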

The solution

To sum up, we are going to do the following:

  1. We do not create a shortcut. Instead, we create a SAS token for the Azure Data Lake where the CDM data is stored.
  2. We create a Fabric Environment with Fabric Runtime 1.1.
  3. We create a Fabric Notebook and attach it to the environment from step 2.
  4. We read the data.

Creating the SAS

Navigate to the storage account in the Azure Portal and find Shared Access Signature.

Creating the Fabric Environment

Make sure it says Runtime 1.1, and remember to Save and Publish.

Create a Notebook in the Workspace and attach it to the Fabric environment

NB! The SAS token must not start with & when it is inserted into the code.
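To avoid tripping over this by hand, the token can be cleaned before it is passed to the connector. A minimal sketch, assuming the only problem is surrounding whitespace and a leading &; normalize_sas_token is a hypothetical helper and not part of the connector.

```python
def normalize_sas_token(raw_token):
    """Strip surrounding whitespace and any leading '&' from a copied SAS token."""
    token = raw_token.strip().lstrip("&")
    if not token:
        raise ValueError("SAS token is empty after cleaning")
    return token


# Only the leading '&' is removed; the ones between parameters are kept.
# The token value here is a made-up placeholder.
cleaned = normalize_sas_token("&sv=<version>&sig=<signature>")
print(cleaned)
```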

Read the Data

Use the following code. This example is in PySpark.

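# Storage endpoint and container that hold the exported CDM data
storageAccountName = "<azure_storage_name_where_cdm_data_is_stored>.dfs.core.windows.net"
container = "<container_name_where_cdm_data_is_stored>"

# Read one entity through the Spark CDM connector, authenticating with the SAS token
df = (
    spark.read.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("sasToken", "<insert_your_SAS_token_without_leading_&>")
    .option("manifestPath", container + "/model.json")  # container plus path to the manifest
    .option("entity", "account")  # name of the entity to load
    .load()
)

display(df)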
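When more than one entity is needed, the same options can be built once per entity and passed to the reader. The following is a sketch under assumptions: cdm_read_options is a hypothetical helper, it takes the bare storage account name and appends the .dfs.core.windows.net suffix itself, and "contact" is only an illustrative second entity that may not exist in your export.

```python
def cdm_read_options(storage_account, container, sas_token, entity,
                     manifest="model.json"):
    """Build the option dict used above for reading one CDM entity."""
    return {
        "storage": f"{storage_account}.dfs.core.windows.net",
        "sasToken": sas_token,
        "manifestPath": f"{container}/{manifest}",
        "entity": entity,
    }


# In a Fabric notebook, where `spark` is predefined:
# for entity in ["account", "contact"]:  # "contact" is a hypothetical entity name
#     opts = cdm_read_options("<storage_name>", "<container>", "<sas_token>", entity)
#     df = spark.read.format("com.microsoft.cdm").options(**opts).load()
#     display(df)
```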

Thanks for reading

Written by Christian Henrik Reich

Renaissance man @ twoday Kapacity, Renaissance man @ Mugato.com. Focusing on data architecture, ML/AI and backend dev, cloud and on-premise.
