Tag Archives: DBFS

Mount/Unmount SASURL with Databricks File System

When we develop data analytics solution, data preparation and data load are the steps that we cannot skip. Azure Databricks supports both native file system Databricks File System (DBFS) and external storage. For external storage, we can access directly or mount it into Databricks File System. This article explains how to mount and unmount blog storage into DBFS.

The code from Azure Databricks official document.

#  Mount an Azure Blob storage container
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
# Unmount a mount point
dbutils.fs.unmount("/mnt/<mount-name>")

Normally in our data pipeline, we have the logic like this: 1) Check if the path is mounted or not. 2) If it is not mounted yet, mount the path. 3) If it is already mounted, either ignore the mount logic use the existing mounting point, or unmount it and mounting it again.

def mount_blob_storage_from_sas(dbutils, storage_account_name, container_name, mount_path, sas_token, unmount_if_exists = True):
  if([item.mountPoint for item in dbutils.fs.mounts()].count(mount_path) > 0):
    if unmount_if_exists:
        print('Mount point already taken - unmounting: '+mount_path)
        dbutils.fs.unmount(mount_path)
    else:
        print('Mount point already taken - ignoring: '+mount_path)
        return
  print('Mounting external storage in: '+mount_path)
  dbutils.fs.mount(
    source = "wasbs://{0}@{1}.blob.core.windows.net".format(container_name, storage_account_name),
    mount_point = mount_path,
    extra_configs = {"fs.azure.sas.{0}.{1}.blob.core.windows.net".format(container_name, storage_account_name): sas_token }) 

When blob storage is shared using SASURL instead of blob details information, we can parse the blob information from SASURL as below:

def get_detail_info_from_url(str):
  array_1=str.split('//', 1)
  array_2=array_1[1].split('.', 2)
  storageaccoutname = array_2[0]
  type=array_2[1]
  array_3=array_2[2].split('/', 1)
  array_4=array_3[1].split('?', 1)
  sas='?' + array_4[1]
  array_5=array_4[0].split('/', 1)
  contianer=array_5[0]
  return (storageaccoutname, contianer, sas)
sas_url = dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")
storage_account_name, container_name, sas_token = get_detail_info_from_url(sas_url)
mount_path = "/mnt/path1"
mount_blob_storage_form_sas(dbutils, storage_account_name, container_name, mount_path, sas_token, True)

We can integrate our Databricks tasks into Azure Data Factory with other activities to build one end to end data pipeline. Suggest that this mount/unmounting activity is designed as one prerequisite step for other notebooks tasks, see one example diagram in Azure Data Factory: