Smart Data – Extracting Quality From Quantity

Amit Sheth defined “Smart Data” as “realising productivity, efficiency, and effectiveness gains by using semantics to transform raw data into Smart Data.”

To clarify: the “Smart Data” concept can be understood as automating the process represented by the arrow linking the “Information” and “Knowledge” entities in the classic diagram:

[Diagram: Semantics and Knowledge]

Smart Data derives value by harnessing the challenges posed by the volume, velocity, variety, and veracity of big data, in turn providing actionable information and improving decision making.  It is about extracting value by improving human involvement in data creation, processing, and consumption, resulting in an enhanced human experience.

Continue reading

Posted in Big Data, Data to Knowledge, Smart Data

Querying Tabular model XML/A with XPath

There are various reasons to extract metadata from a Tabular model, for example to compose documentation or a logical model diagram. The following collections are useful from an architectural point of view:

  • list of dimensions,
  • list of measure groups and measures,
  • list of calculations with their formulas,
  • list of hierarchies.

There are many utilities for browsing XML scripts and querying them efficiently. However, a Tabular developer most likely has SQL Server Management Studio at hand, so using XQuery looks like a convenient way to go.
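As a minimal sketch of the idea, the same XPath expressions can be tried outside SSMS with Python's standard library. The XML below is a hypothetical, heavily simplified stand-in for a Tabular model script (a real XML/A export uses namespaces and different element names, so the paths must be adjusted):

```python
# Sketch: extracting architectural collections from a (simplified, made-up)
# model XML with XPath-style queries via xml.etree.ElementTree.
import xml.etree.ElementTree as ET

xmla = """
<Database>
  <Dimensions>
    <Dimension><Name>Date</Name></Dimension>
    <Dimension><Name>Product</Name></Dimension>
  </Dimensions>
  <Cubes>
    <Cube>
      <MeasureGroups>
        <MeasureGroup>
          <Name>Sales</Name>
          <Measures>
            <Measure><Name>Sales Amount</Name></Measure>
          </Measures>
        </MeasureGroup>
      </MeasureGroups>
    </Cube>
  </Cubes>
</Database>
"""

root = ET.fromstring(xmla)

# List of dimensions
dimensions = [d.text for d in root.findall("./Dimensions/Dimension/Name")]

# List of measure groups and their measures
measures = [(mg.findtext("Name"), m.text)
            for mg in root.findall(".//MeasureGroup")
            for m in mg.findall("./Measures/Measure/Name")]

print(dimensions)  # ['Date', 'Product']
print(measures)    # [('Sales', 'Sales Amount')]
```

In SSMS the equivalent queries would run through the `xml` data type's `query()`/`nodes()` methods, which accept the same path expressions.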


Continue reading

Posted in programming, Uncategorized

From SQL to NoSQL to NewSQL

History of RDBMS

Every time a new technology emerged, its evolution ended up being realised as a relational system (RDBMS). In other words, before adopting the new stuff, the business always demanded atomicity, consistency, isolation, and durability (ACID).
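The "A" in ACID is easy to demonstrate with the standard-library sqlite3 module: a transfer that violates a business rule mid-transaction rolls back completely, leaving no half-applied update (a minimal sketch with made-up account data):

```python
# Sketch of atomicity: a failed transfer is rolled back as a whole.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on exception
        conn.execute(
            "UPDATE account SET balance = balance - 150 WHERE name = 'alice'")
        # Business rule checked mid-transaction: no negative balances.
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE name = 'alice'").fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass

# The partial debit was undone: both balances are back to their originals.
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'alice': 100, 'bob': 0}
```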


Continue reading

Posted in Big Data, Business Capability, Business Delivery, R&D, Uncategorized

SQL or NoSQL: that is the question.

Relational database management systems (RDBMS) have been the primary data storage mechanism for decades. NoSQL databases have existed since the 1960s, but have recently been gaining traction, and businesses now face the challenge of adopting them efficiently.


There are many tutorials explaining how to use a particular flavor of SQL or NoSQL, but few discuss why you should choose one in preference to the other (“SQL or NoSQL – that is the question”). I hope to answer that tough question here by covering the fundamental differences in business capabilities. It should be noted that, since the author is a Microsoft fan, everything that follows is written with the Azure environment in mind, with products such as Azure SQL Database, Azure Parallel Data Warehouse, Azure Data Lake Analytics and Azure Spark on HDInsight.

Continue reading

Posted in Big Data, Business Capability, Uncategorized

Azure PDW is generally available

Microsoft released Azure SQL Data Warehouse, accompanied by a bouquet of impressive distinct capabilities that can be found in the press release.

One item that attracted my attention was the following:

Continue reading

Posted in Big Data, Business Capability, Business Delivery

PolyBase: A Superior Alternative To Process And Query Data

Recently I was literally stunned when loading bulky data into PDW produced a result a magnitude faster than expected. Loading a 20Gb file into Azure Blob Storage from a local machine takes 15 minutes, and copying it from Blob Storage into Data Lake (i.e. from one hump in the cloud to another) takes 5 minutes; but when copying the unstructured data into the relational DB took 4 minutes 20 seconds, that was really shocking. It was supposed to take much longer because of the transformation of the data into relational form. Expecting 20 minutes overall, I had even planned to “put the billy on” (“make tea” in Aussie slang 😉 ), so for several seconds I simply couldn’t believe my eyes and thought the transfer had been interrupted and the success message returned by mistake. No, it was one of those rare moments when software works more than brilliantly.
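The timings above can be sanity-checked with some back-of-the-envelope arithmetic (assuming the 20Gb figure means gigabytes). The surprise is visible in the numbers: the PolyBase hop into relational form is the fastest of the three:

```python
# Effective throughput of each hop, from the timings quoted above.
gb = 20
steps = {
    "local -> Blob Storage": 15 * 60,          # 15 min, in seconds
    "Blob Storage -> Data Lake": 5 * 60,       # 5 min
    "Data Lake -> PDW (PolyBase)": 4 * 60 + 20,  # 4 min 20 s
}
for name, seconds in steps.items():
    print(f"{name}: {gb * 1024 / seconds:.0f} MB/s")
```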

So how does the magic work?

Continue reading

Posted in Big Data, Business Capability, R&D

Hadoop Data Processing: Battle For Speed

The idea behind Hadoop is brilliant and revolutionary: an algorithm was invented – MapReduce – onto which all major data processing tasks (grouping, statistical, graph, etc.) can be decomposed. However, its use of input files and lack of schema support prevented the performance improvements enabled by common database system features such as B-trees and hash partitioning. Business demanded immediate improvement, and Hadoop vendors had to move on. They faced the choice between two ways forward: speed up MapReduce, or get rid of it (losing its all-important scalability and fault-tolerance).
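The decomposition MapReduce enables can be sketched in a few lines of Python: a grouping/aggregation task split into map, shuffle, and reduce phases (a toy single-machine sketch with made-up sales data; Hadoop runs the same phases distributed across nodes):

```python
# Sketch of the MapReduce decomposition of a group-by-and-sum task.
from collections import defaultdict

def map_phase(records):
    # map: emit (key, value) pairs from each input record
    for city, amount in records:
        yield city, amount

def shuffle(pairs):
    # shuffle: group all values by key (Hadoop does this across the cluster)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: aggregate each group independently -- hence the parallelism
    return {key: sum(values) for key, values in groups.items()}

sales = [("Sydney", 10), ("Perth", 5), ("Sydney", 7)]
totals = reduce_phase(shuffle(map_phase(sales)))
print(totals)  # {'Sydney': 17, 'Perth': 5}
```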


And here is what they chose:

Continue reading

Posted in Big Data, Business Capability, R&D, Uncategorized