Azure Blob Storage Client Library – File, Object, and Data Lake Storage
Azure Blob Storage Client Library
The Azure Blob Storage client library is a part of the Azure SDK that users can use to build custom application logic for managing Azure Blob Storage. Users can take advantage of this SDK when building applications using different languages, including .NET, Java, Python, JavaScript, and C++. The library contains several classes that can be used to create, replace, list, and delete containers or blobs.
FIGURE 4.15 Upload blob
The following are the most useful classes for interacting with Azure Blob Storage content:
- BlobServiceClient allows users to manipulate storage account resources such as blob containers.
- BlobContainerClient allows users to manipulate containers and their blobs. This class goes by ContainerClient in the Python and JavaScript versions of the Azure Blob Storage client library.
- BlobClient allows users to manipulate blobs.
Check out the following tutorial if you would like to learn more about how to get started building custom application logic that manipulates Azure Blob storage: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-dotnet. While this specific tutorial uses .NET, there are several other tutorials available for the other languages that support the Azure Blob storage client library.
Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2, or ADLS for short, is an object storage solution that is built on top of Azure Blob Storage. It can be enabled when creating a storage account in the Azure Portal by selecting the Enable hierarchical namespace setting in the Advanced tab. The hierarchical namespace allows users to easily organize data objects into a hierarchy of directories and subdirectories for efficient data access.
The addition of the hierarchical namespace to Azure Blob storage’s existing capabilities makes ADLS an ideal storage solution for big data and distributed analytics solutions. Users can easily organize data into different directories that are specific to where it is in the data processing life cycle. For example, there could be a raw directory that acts as a landing zone for new datasets, a cleansed directory that stores the data once it has been scrubbed of any errors or inconsistencies, and a report-ready directory that stores the data once aggregations and business logic have been applied to it. Furthermore, each of these directories can host several subdirectories that partition data by certain features such as year, month, and date.
ADLS is easily scalable and very cost-effective because it is built on top of Azure Blob Storage. This allows organizations to store data in multiple stages without having to worry about high costs or running out of storage. This allows users to leverage data at different life cycles for several different use cases. For example, data scientists can use raw and processed data to build their models, and analysts can use the report-ready directory to build reports or share aggregated datasets with other business units.
In addition to its performance enhancements, ADLS provides more granular access security to what is available with Azure Blob Storage. Administrators can use POSIX-like access control lists (ACLs) to set user permissions at the directory and file level. Using ACLs to manage access for data stored in ADLS is described further later in this chapter in the section “Access Management.”