Unlock petabyte-scale datasets in Azure with aggregations in Power BI
Description
Christian Wade joins Scott Hanselman to show you how to unlock petabyte-scale datasets in Azure in a way that was not previously possible. Learn how to use the aggregations feature in Power BI to enable interactive analysis over big data.
For more information:
- Power BI Desktop September 2018 Feature Summary (Analytics)
- Aggregations in Power BI Desktop (Preview) docs
- Microsoft Power BI - Interactive Data Visualization BI Tools
- Create a free account (Azure)
Follow @SHanselman Follow @AzureFriday Follow @_christianWade
The Discussion
-
Hi Guys,
Thanks for the great demo and a great feature. Queries that are not cached are processed by Spark, as mentioned, but can you share more details about how a 23-node Spark cluster fits into this ecosystem?
Thanks -
@kbaig: Thanks for the feedback. The Spark cluster is optional. From the Power BI side, it works the same way whether it's HDI Spark, Azure SQL Data Warehouse, Databricks, or various other sources in Azure (that support DirectQuery). The setup and optimization of these systems depends on the system itself and follows standard query performance tuning for that system; there is nothing special about setting up or query-optimizing these systems that is different when using aggregations.
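Conceptually, the aggregations feature answers a query from an in-memory aggregation table when the query's grain and measures are covered, and falls back to DirectQuery against the big-data source otherwise. Here is a minimal Python sketch of that routing decision; the column names, grain, and helper function are illustrative assumptions, not Power BI's actual implementation:

```python
# Illustrative sketch of aggregation hit/miss routing (not Power BI internals).
# An aggregation table can answer a query when the query groups only by
# columns at or above the aggregation grain and requests only measures
# that were pre-aggregated.

AGG_GRAIN = {"date", "product"}           # columns the agg table is grouped by
AGG_MEASURES = {"sales_amount", "units"}  # pre-aggregated measures it stores

def route_query(group_by, measures):
    """Return which storage answers the query: in-memory agg or DirectQuery."""
    if set(group_by) <= AGG_GRAIN and set(measures) <= AGG_MEASURES:
        return "in-memory aggregation table"     # fast, cached in memory
    return "DirectQuery to the big-data source"  # e.g. Spark / SQL DW

print(route_query(["date"], ["sales_amount"]))
print(route_query(["date", "customer"], ["sales_amount"]))
```

The first query groups only by `date`, which the aggregation grain covers, so it is answered from memory; the second groups by `customer`, which is finer than the grain, so it falls through to DirectQuery.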
-
This is super awesome, Christian!
Just curious if there is a plan for aggregations in Analysis Services Tabular?
-
Christian! You are brilliant; we just need to figure out how to travel to Mars and back, combining all of NASA's data. I can set up that appointment if need be, as I know a few smart people there. All the best! I will be using this for a few of our companies.
Ezra Gabay -
@Danaraj: we are currently focusing on going in the other direction: bringing the Analysis Services scalability, manageability, ALM, debugging, etc. to Power BI
-
@Ezra Gabay: Thank you so much Ezra! Glad to be of service! We're all about pushing the boundaries :)
-
@wadecb: That's interesting, looking forward to what's coming next. Thanks Christian, amazing work.
-
re: Spark query. In order for this query to complete in a reasonable time over big data, the data has to be partitioned. But there are limited ways you can partition data in Spark (not more than 100 partitions).
So can you explain a bit how the data is partitioned/bucketed?
-
@aljj: It is stored in Parquet and coalesced into 200 random chunks of rows.
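As a rough illustration of what "coalesced into 200 chunks" means, here is a minimal Python sketch that splits a row set into at most 200 roughly equal, contiguous chunks. In Spark itself this would be along the lines of `df.coalesce(200).write.parquet(path)`; the helper below is only a conceptual stand-in, not the Spark API:

```python
def coalesce_rows(rows, n_chunks=200):
    """Split rows into at most n_chunks roughly equal, contiguous chunks,
    mimicking how coalesce() reduces a dataset to a fixed partition count."""
    n_chunks = min(n_chunks, len(rows)) or 1  # never more chunks than rows
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

chunks = coalesce_rows(list(range(1000)))
print(len(chunks))                  # 200 chunks
print(sum(len(c) for c in chunks))  # all 1000 rows preserved
```

Keeping the chunk count fixed (rather than tied to data volume) bounds the number of Parquet files written, which keeps file sizes reasonable for downstream scans.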