Microsoft defines its two competing cloud storage products as follows:
- Azure Blob Storage is a general purpose, scalable object store that is designed for a wide variety of storage scenarios.
- Azure Data Lake Store is a hyper-scale repository that is optimised for big data analytics workloads.
Let’s go deeper and list the major differences between them:
|Feature||Azure Data Lake Store||Azure Blob Storage|
|Purpose||Optimized storage for big data analytics workloads||General purpose object store for a wide variety of storage scenarios|
|Use Cases||Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets||Any type of text or binary data, such as application back end, backup data, media storage for streaming and general purpose data|
|Structure||Hierarchical file system. A Data Lake Store account contains folders, which in turn contain data stored as files.||Object store with a flat namespace. There is only a single layer of containers. You can simulate a "file-system"-like layered structure, but in reality every blob sits in one layer: the container it belongs to.|
|API||REST API over HTTPS||REST API over HTTP/HTTPS|
|Server-side API||WebHDFS-compatible REST API||Azure Blob Storage REST API|
|Hadoop File System Client||Yes||Yes|
|Data Operations – Authentication||Based on Azure Active Directory Identities||Based on shared secrets – Account Access Keys and Shared Access Signature Keys.|
|Data Operations – Authentication Protocol||OAuth 2.0. Calls must contain a valid JWT (JSON Web Token) issued by Azure Active Directory.||Hash-based Message Authentication Code (HMAC). Calls must contain a Base64-encoded SHA-256 hash over a part of the HTTP request.|
|Data Operations – Authorization||POSIX Access Control Lists (ACLs). ACLs based on Azure Active Directory identities can be set at file and folder level.||For account-level authorization, use Account Access Keys. For account, container, or blob authorization, use Shared Access Signature Keys.|
|Data Operations – Auditing||Available||Available|
|Encryption of data at rest||Transparent, server side. With service-managed keys. With customer-managed keys in Azure Key Vault.||Transparent, server side. With service-managed keys. With customer-managed keys in Azure Key Vault (coming soon).|
|Developer SDKs||.NET, Java, Python, Node.js||.NET, Java, Python, Node.js, C++, Ruby|
|Analytics Workload Performance||Optimized performance for parallel analytics workloads. High Throughput and IOPS.||Not optimized for analytics workloads|
|Geo-redundancy||Locally redundant (multiple copies of data in one Azure region)||Locally redundant (LRS), geo-redundant (GRS), read-access geo-redundant (RA-GRS)|
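To make the Structure row more concrete, here is a minimal Python sketch of how a "folder" view can be simulated on top of a flat namespace, where every blob is just a name inside one container. It does not call any Azure API. The function name `list_virtual_folder` and the sample blob names are assumptions made for illustration.

```python
def list_virtual_folder(blob_names, prefix="", delimiter="/"):
    """Return (files, subfolders) that sit directly under `prefix`.

    `blob_names` is a flat list of names, as in a single container.
    Folders are not real objects; they are inferred from the delimiter.
    """
    files, folders = [], set()
    for name in blob_names:
        if not name.startswith(prefix):
            continue  # blob is outside the requested virtual folder
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains, so the blob lives in a deeper virtual folder.
            folders.add(prefix + head + delimiter)
        else:
            files.append(name)
    return sorted(files), sorted(folders)


# Hypothetical blob names stored in one flat container.
blobs = [
    "logs/2017/01/app.log",
    "logs/2017/02/app.log",
    "logs/readme.txt",
    "media/intro.mp4",
]

files, folders = list_virtual_folder(blobs, prefix="logs/")
print(files)    # ['logs/readme.txt']
print(folders)  # ['logs/2017/']
```

In a hierarchical file system such as Data Lake Store, `logs/2017/` would be an actual folder that can carry its own ACL. In the flat model it exists only as a shared name prefix.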
What is not mentioned here is that the U-SQL engine generates different query plans for Data Lake Store and Blob Storage. For some types of solutions, it is therefore more reasonable to base the choice on how well the store is optimised for reads than on how well it is optimised for loading data.
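Returning to the authentication rows of the table, the shared-secret scheme used by Blob Storage can be sketched as follows: the client computes an HMAC-SHA256 over a string derived from the request and sends the Base64-encoded result. This is a simplified illustration, not the exact Azure signing procedure. The `string_to_sign` below is a placeholder rather than the real canonicalized request, and the key and the function name `sign_request` are hypothetical.

```python
import base64
import hashlib
import hmac


def sign_request(account_key_b64: str, string_to_sign: str) -> str:
    """Return a Base64-encoded HMAC-SHA256 signature of `string_to_sign`."""
    # The account key is handled as Base64 text and decoded to raw bytes first.
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical key and a simplified string-to-sign (assumptions for illustration).
account_key = base64.b64encode(b"not-a-real-key").decode("ascii")
string_to_sign = "GET\n/myaccount/mycontainer/myblob.txt"

signature = sign_request(account_key, string_to_sign)
print(signature)
```

Because the signature depends on a shared key, anyone holding that key can sign any request. That is the practical difference from the Data Lake Store model, where each call carries a token issued to a specific Azure Active Directory identity.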