Azure Blob File System Driver
Perhaps the biggest benefit of using ADLS is that it lets users manage and access data much as they would in a Hadoop Distributed File System (HDFS). The Azure Blob File System (ABFS) driver is native to ADLS and is used by Apache Hadoop environments to create and interact with data in ADLS. Typical Azure environments that access ADLS with the ABFS driver include Azure HDInsight, Azure Databricks, and Azure Synapse Analytics.
The ABFS driver enables access to ADLS resources using a URI-based connection. Applications constructing the URI can use the following format to navigate to specific directories or files in ADLS:
abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>/<file-name>
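To make the format concrete, here is a minimal Python sketch that assembles an ABFS URI from its parts. The function name build_abfs_uri and the sample values are purely illustrative and are not part of any Azure SDK:

def build_abfs_uri(container, account, path="", file_name=""):
    # Assemble abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>/<file-name>
    uri = f"abfs://{container}@{account}.dfs.core.windows.net"
    if path:
        uri += "/" + path.strip("/")
    if file_name:
        uri += "/" + file_name
    return uri

# Illustrative values only; substitute your own container and storage account names.
print(build_abfs_uri("products", "mystorageaccount", "2021/November/11", "bicycles.csv"))
# abfs://products@mystorageaccount.dfs.core.windows.net/2021/November/11/bicycles.csv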
As you can see, the parent directory is a container object. This object can contain several levels of subdirectories depending on how the data is organized. For example, the directory hierarchy for storing product data by date could look like <product>/<year>/<month>/<day>, where product is the parent directory and year, month, and day represent different subdirectories. Using this directory structure, let’s look at how you would format the ABFS URI to access product data from November 11, 2021:
abfs://<product>@<storage-account-name>.dfs.core.windows.net/2021/November/11/
This URI will access all the data in the 11 subdirectory. Adding a specific filename to the end of the URI, such as bicycles.csv, narrows access to that one file. There is also support for accessing all files of a specific format by adding the * wildcard and the file extension to the end of the URI. The following extends the previous example by accessing all the CSV data produced on November 11, 2021:
abfs://<product>@<storage-account-name>.dfs.core.windows.net/2021/November/11/*.csv
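In a Spark-based environment such as Azure Databricks or Azure Synapse Analytics, these URIs can be passed directly to the standard reader APIs. The following PySpark sketch assumes a container named products, a storage account named mystorageaccount, and a session that is already authenticated to the ADLS account; all of these names are illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read one specific file from the November 11, 2021 directory.
bicycles = spark.read.csv(
    "abfs://products@mystorageaccount.dfs.core.windows.net/2021/November/11/bicycles.csv",
    header=True,
)

# Read every CSV file in the same directory by using the * wildcard.
daily_csv = spark.read.csv(
    "abfs://products@mystorageaccount.dfs.core.windows.net/2021/November/11/*.csv",
    header=True,
)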