Azure Databricks – The Storage of Data


When you provision an Azure Databricks workspace, an Azure storage account is provisioned along with it. That storage account contains the Azure blob container that stores your data. Figure 4.15 shows the details of the Azure storage account and the blob container service.

FIGURE 4.15 Data redundancy storage account for Azure Databricks

Notice the Primary/Secondary Location values, which reference paired regions (see Table 4.2). Also notice the Disk State and the Replication setting, which is Geo‐redundant Storage (GRS). If, for example, there is an outage in West Europe, the engineers working on the incident make the data on the GRS replicated drive available for access. You must then provision a new Azure Databricks workspace in North Europe (the paired region), using the replicated Azure storage account and Azure blob container.

Implement Distributions

A table distribution determines how the rows of a table are spread across the distributions of a dedicated SQL pool. The table distribution types are round‐robin, hash, and replicated. Perform Exercise 4.4 to implement each of these table distribution types.

EXERCISE 4.4
Implement Distributions

  1. Log in to the Azure portal at https://portal.azure.com ➢ navigate to the Azure Synapse Analytics workspace you created in Exercise 3.3 ➢ on the Overview blade, click the Open link in the Open Synapse Studio tile ➢ select the Data hub ➢ expand the SQL database menu ➢ expand the dedicated SQL pool you created for Exercise 3.7 ➢ hover over the Tables folder ➢ click the ellipsis (…) ➢ click New SQL Script ➢ click New table ➢ and then execute the SQL found on GitHub at https://github.com/benperk/ADE, in the Chapter04/Ch04Ex03 directory; the file is named distributionSQL.txt.
  2. Execute the SQL in the file named checkDistribution.txt, located in the Chapter04/Ch04Ex03 directory on GitHub at https://github.com/benperk/ADE ➢ view the tables and their associated distribution type. An example is illustrated in Figure 4.16.

FIGURE 4.16 Implementing table distributions

For more information about distribution types, refer to the section “Design a Distribution Strategy” in Chapter 3. See also the section “Implement Different Table Geometries with Azure Synapse Analytics Pools” earlier in this chapter.
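
To build some intuition for how the three distribution types differ, the following Python sketch simulates how rows might be placed under each one. It is a conceptual illustration only and not the SQL from the exercise. The function names, the sample rows, and the use of SHA-256 as the hash are assumptions made for this example; a dedicated SQL pool uses its own internal hash function, and its round‐robin placement is not guaranteed to follow this simple sequential order. The one detail taken from the platform is that a dedicated SQL pool spreads a table over 60 distributions.

```python
from collections import Counter
import hashlib

# A dedicated SQL pool spreads each table over 60 distributions.
NUM_DISTRIBUTIONS = 60


def round_robin(rows, n=NUM_DISTRIBUTIONS):
    """Assign rows to distributions in turn, ignoring row content (simplified)."""
    return [i % n for i, _ in enumerate(rows)]


def hash_distributed(rows, key, n=NUM_DISTRIBUTIONS):
    """Assign each row by a deterministic hash of its distribution column.

    SHA-256 is a stand-in here; it is not the hash the service uses.
    """
    def bucket(value):
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % n

    return [bucket(row[key]) for row in rows]


def replicated(rows, compute_nodes):
    """Give every compute node a full copy of the table."""
    return {node: list(rows) for node in range(compute_nodes)}


# Hypothetical sample data: 600 rows spread over only 5 distinct customers.
rows = [{"id": i, "customer": f"C{i % 5}"} for i in range(600)]

rr_counts = Counter(round_robin(rows))
hash_counts = Counter(hash_distributed(rows, "customer"))
copies = replicated(rows, compute_nodes=2)

print("round-robin rows per distribution:", set(rr_counts.values()))
print("distributions holding data after hashing on customer:", len(hash_counts))
print("rows per node when replicated:", [len(c) for c in copies.values()])
```

In this sketch, round‐robin spreads the rows evenly, while hashing on a column with only five distinct values leaves most distributions empty. That is the kind of skew to watch for when choosing a hash distribution column. Replication trades storage for locality, because every node holds the whole table.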
