Two use cases that benefit from changing the color criteria are locating “hot spots”, i.e. nodes that process too many bytes or run too many jobs, and “cold spots”, i.e. nodes that have not been used in a long time and could therefore be removed from the data lake. In the following screenshot, the color indicates the volume of data each node has processed: red indicates too much, green indicates a reasonable volume, and grey indicates that we have no information about the node. Nodes that sit at a deep level and are very red are especially interesting: this usually means they could be optimized by creating aggregated/intermediate tables.
If we change the color criteria to “inherited recency”, we can search for nodes that have not received any query/job in a long time. “Inherited” means that we take the smallest recency among a node and its children, on the grounds that when a child node receives a job, the parent node in some sense receives that job too. Here, red indicates a very high inherited recency and green a very small one. Grey indicates that we have no information about the node. We can analyze the plot to find unused nodes (red and grey) and clean up the data lake.
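To make the “inherited recency” idea concrete, here is a minimal sketch in Python. It is not the tool's actual implementation. It assumes that recency is measured in days since a node last received a query/job, that “children” is applied recursively (a node inherits from everything below it), and that the lineage has no cycles. The node names, the `children` and `own_recency` dictionaries, and the `inherited_recency` function are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical lineage: each node maps to the nodes derived from it (its children).
children: Dict[str, List[str]] = {
    "raw_events": ["sessions", "orders"],
    "sessions": ["daily_report"],
    "orders": [],
    "daily_report": [],
    "legacy_dump": [],
}

# Hypothetical own recency: days since the node last received a query/job.
# None means no information is available for that node (grey in the plot).
own_recency: Dict[str, Optional[int]] = {
    "raw_events": 200,
    "sessions": 90,
    "orders": 45,
    "daily_report": 3,
    "legacy_dump": None,
}


def inherited_recency(node: str) -> Optional[int]:
    """Smallest recency among a node and everything below it, or None if unknown."""
    candidates = [own_recency.get(node)]
    for child in children.get(node, []):
        # Assumes an acyclic lineage; a cycle would recurse forever.
        candidates.append(inherited_recency(child))
    known = [c for c in candidates if c is not None]
    return min(known) if known else None


result = {node: inherited_recency(node) for node in children}
```

In this toy example `raw_events` has not been queried directly for 200 days, but it inherits a recency of 3 from `daily_report`, so it would not be flagged as a cold spot. `legacy_dump` has no information and no children, so it stays unknown.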
The team included a mouse-over capability that displays relevant metrics for a node, as you can see in the following screenshot:
Making Science teams, acting as Data Custodians for some of our customers, have found that conversations with Data Business Owners/Stewards about optimizations and housekeeping are greatly facilitated by the BigQuery Intelligence tool. Sitting down with the Data Steward with data and visualizations in hand enables a transparent, data-driven conversation. The value to the business of a tool such as BigQuery Intelligence comes in the form of improved performance for systems accessing/consuming the data and reduced GCP operating costs. The Making Science team members already have feature additions in the works, and we will share some of the use cases and the corresponding benefits as they become available.