PolyBase: A Superior Alternative To Process And Query Data

Recently I was literally stunned when loading bulky data into PDW finished far faster than expected. Uploading a 20 GB file from a local machine into Azure Blob Storage takes 15 minutes, and copying it from Blob Storage into Data Lake (i.e. from one place in the cloud to another) takes 5 minutes, but when copying that unstructured data into a relational database took 4 minutes 20 seconds, that was really shocking. It was supposed to take much longer because of the transformation into relational form. Expecting about 20 minutes overall, I had even planned “to put the billy on” (“to make tea” in Aussie slang 😉), so for several seconds I just couldn’t believe my eyes and thought that the transfer had simply been interrupted and the success message was a mistake. No, it was one of those rare moments when software works more than brilliantly.

So how does the magic work?

This is what the SQL Server Customer Advisory Team says about PolyBase’s loading functionality:

PolyBase data loading (ie load through EXTERNAL TABLES) is not limited by the Control node, and so as you scale out your DWU, your data transfer throughput also increases. By mapping the external files as external tables the data files can be accessed using standard T-SQL commands.
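To make the “external files as external tables” idea concrete, here is a minimal T-SQL sketch of such a load, assuming Azure SQL Data Warehouse and a pipe-delimited text file in Blob Storage. Every object name, the column list, the delimiter and the angle-bracket placeholders are my own assumptions for illustration, not taken from the team’s article, and the credential setup on an on-premises PDW appliance differs from what is shown here.

```sql
-- One-time setup: a master key protects the credential (Azure SQL DW syntax).
CREATE MASTER KEY;

-- Hypothetical credential; for Blob Storage the IDENTITY value is not used for authentication.
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user',
     SECRET   = '<storage-account-key>';

-- Points at the container that holds the source files (placeholders are assumptions).
CREATE EXTERNAL DATA SOURCE AzureBlobSource
WITH (
    TYPE       = HADOOP,
    LOCATION   = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

-- Describes how the files are laid out; a pipe-delimited layout is assumed here.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = FALSE)
);

-- Maps the files to a table definition; no data is moved at this point.
CREATE EXTERNAL TABLE dbo.SalesExternal (
    SaleId   INT            NOT NULL,
    SaleDate DATE           NOT NULL,
    Amount   DECIMAL(18, 2) NULL
)
WITH (
    LOCATION    = '/sales/',
    DATA_SOURCE = AzureBlobSource,
    FILE_FORMAT = PipeDelimitedText
);

-- The actual load: CTAS reads the external table and writes a distributed table.
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT SaleId, SaleDate, Amount
FROM dbo.SalesExternal;
```

Once the external table exists, it can also be queried directly with an ordinary SELECT; the CTAS statement at the end is the step that actually lands the data in the warehouse.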

As the following architecture diagram shows, each Compute node connects to an external resource independently:

PolyBase_in_PDW1

The “old school” methods, by contrast, still go through the Control node. The reason PolyBase provides a superior load rate is that its data transfer is not limited by the Control node. But if using PolyBase is not currently an option, the following technologies and methods can be used for loading into PDW:

  • BCP
  • Bulk Insert
  • SSIS
  • SQLBulkCopy
  • Azure Data Factory (ADF) (uses SQLBulkCopy)

PolyBase_in_PDW2

Finally, the Microsoft team gives a general recommendation that I completely agree with:

As a general rule, we recommend making PolyBase your first choice for loading data into SQL Data Warehouse unless you can’t accommodate PolyBase-supported file formats.

 


About fdtki

Sr. BI Developer | An accomplished, quality-driven IT professional with over 16 years of experience in design, development and implementation of business requirements as a Microsoft SQL Server 6.5-2014 | Tabular/DAX | SSAS/MDX | Certified Tableau designer
