Amit Sheth defined “Smart Data” as “realising productivity, efficiency, and effectiveness gains by using semantics to transform raw data into Smart Data.”
To clarify, the “Smart Data” concept can be understood as automating the process represented by the arrow linking the “Information” and “Knowledge” entities on the classic diagram:
Smart Data provides value from harnessing the challenges posed by volume, velocity, variety, and veracity of big data, and in turn providing actionable information and improving decision making. It is about extracting value by improving human involvement in data creation, processing, and consumption, resulting in enhanced human experience.
There can be various reasons to extract metadata from a Tabular model, for example to compose documentation or a logical model diagram. The following collections are useful from an architectural point of view:
- list of dimensions,
- list of measure groups and measures,
- list of calculations with their formulas,
- list of hierarchies.
There are many utilities for browsing XML scripts and querying them efficiently. However, a Tabular developer most likely has SQL Server Management Studio at hand, so XQuery looks like a convenient way to go.
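As a rough illustration of the idea of pulling such lists out of an XML script, here is a minimal Python sketch. It is not the XQuery approach itself, only the same kind of path-based lookup done with the standard library. The sample script is a hypothetical, heavily simplified stand-in: the element layout, the model name and the helper names `list_dimensions` and `list_measures` are assumptions made for this example, and a real model script is far larger and may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a model script.
# A real script is much larger and its element layout may differ.
SAMPLE_SCRIPT = """\
<Database xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Name>SalesModel</Name>
  <Dimensions>
    <Dimension><Name>Date</Name></Dimension>
    <Dimension><Name>Product</Name></Dimension>
  </Dimensions>
  <Cubes>
    <Cube>
      <Name>Model</Name>
      <MeasureGroups>
        <MeasureGroup>
          <Name>Sales</Name>
          <Measures>
            <Measure><Name>Amount</Name></Measure>
          </Measures>
        </MeasureGroup>
      </MeasureGroups>
    </Cube>
  </Cubes>
</Database>
"""

# Namespace prefix used in the path expressions below (assumed for the sample).
NS = {"as": "http://schemas.microsoft.com/analysisservices/2003/engine"}


def list_dimensions(root):
    """Return the names of all dimensions directly under the database."""
    return [
        dim.findtext("as:Name", namespaces=NS)
        for dim in root.findall("as:Dimensions/as:Dimension", NS)
    ]


def list_measures(root):
    """Return a mapping of measure group name to its measure names."""
    result = {}
    for group in root.findall(".//as:MeasureGroup", NS):
        group_name = group.findtext("as:Name", namespaces=NS)
        result[group_name] = [
            measure.findtext("as:Name", namespaces=NS)
            for measure in group.findall("as:Measures/as:Measure", NS)
        ]
    return result


root = ET.fromstring(SAMPLE_SCRIPT)
print(list_dimensions(root))
print(list_measures(root))
```

The same path expressions carry over almost unchanged to XQuery, which is what makes the SSMS route attractive: the script can be queried in place, without extra tooling.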
History of RDBMS
Every time a new technology emerged, its evolution ended in an implementation as a relational system (RDBMS). In other words, before adopting a technology, the business always demanded atomicity, consistency, isolation, and durability (ACID).
Posted in Big Data, Business Capability, Business Delivery, R&D, Uncategorized
Tagged Architecture, Big Data, Business Capability Increment, Massively Parallel Processing, MPP, NewSQL, NoSQL, TechNews, Trends
Relational database management systems (RDBMS) have been the primary data storage mechanism for decades. NoSQL databases have existed since the 1960s, but they have only recently been gaining traction, and the business now faces the challenge of adopting them efficiently.
There are many tutorials explaining how to use a particular flavor of SQL or NoSQL, but few discuss why you should choose one in preference to the other (“SQL or NoSQL: that is the question”). I hope to answer this tough question here by covering the fundamental differences in business capabilities. Note that, since the author is a Microsoft fan, everything that follows is written with the Azure environment in mind, with products such as Azure SQL Database, Azure Parallel Data Warehouse, Azure Data Lake Analytics and Azure Spark on HDInsight.
Posted in Big Data, Business Capability, Uncategorized
Tagged Architecture, Big Data, Business Capability Increment, Data Lake, Hive, Massively Parallel Processing, Parallel Data Warehouse, PDW, SQL Server, SQL Server 2016, Trends
Microsoft has released Azure SQL Data Warehouse, accompanied by a bouquet of impressive, distinctive capabilities that are listed in the press release.
One item that caught my attention was the following:
Recently I was literally stunned when loading bulky data into PDW finished several times faster than expected. Loading a 20 GB file into Azure Blob Storage from a local machine takes 15 minutes, and copying it from Blob Storage into Data Lake (i.e. from one hump in the cloud into another) takes 5 minutes, but copying the unstructured data into a relational database took just 4 minutes 20 seconds, which was really shocking. It was supposed to take much longer because of the transformation of the data into relational form. Expecting about 20 minutes overall, I had even planned “to put Billy on” (“to make tea” in Aussie slang 😉), so for several seconds I just couldn’t believe my eyes and thought that the transfer had simply been interrupted and the success message was a mistake. No, it was one of those rare moments when software works more than brilliantly.
So how does the magic work?
The idea behind Hadoop is brilliant and revolutionary: an algorithm, MapReduce, was invented onto which all major data processing tasks (grouping, statistical, graph, etc.) can be decomposed. However, its use of input files and lack of schema support prevented the performance improvements enabled by common database system features such as B-trees and hash partitioning. Business demanded immediate improvement, and Hadoop vendors had to move on. They faced a choice between two ways forward: either speed up MapReduce or get rid of it (losing its all-important scalability and fault tolerance).
And here is what they chose:
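To make the decomposition concrete, here is a minimal single-process Python sketch of how a grouping task maps onto the MapReduce pattern. It is only an illustration of the map, shuffle and reduce steps, not Hadoop's implementation: there is no distribution, no input files and no fault tolerance, and the function names and sample data are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    """Run the three steps in sequence on an in-memory collection."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Grouping task: total sales per region (hypothetical sample data).
sales = [("north", 10), ("south", 5), ("north", 7)]
totals = map_reduce(
    sales,
    mapper=lambda rec: [(rec[0], rec[1])],    # emit (region, amount)
    reducer=lambda key, values: sum(values),  # sum the amounts per region
)
print(totals)
```

Note that the shuffle step here scans every emitted pair; nothing like an index or a pre-partitioned layout helps it find the rows for a key, which is the kind of cost that B-trees and hash partitioning avoid in a database system.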