Implement a Serving Layer with a Star Schema – The Storage of Data-1
Categories :
- Log in to the Azure portal at https://portal.azure.com ➢ navigate to the Azure Synapse Analytics workspace you created in Exercise 3.3 ➢ on the Overview blade, click the Open link in the Open Synapse Studio tile ➢ select the Develop hub ➢ click the + to the right Develop ➢ click SQL Script ➢ select your dedicated SQL pool from the Connect To drop‐down list box ➢ and then execute the SQL syntax in the file createAlterStarSchemaTables.sql, in the Chapter04/Ch04Ex12 directory on GitHub.
After you execute the syntax, the tables in Figure 4.35 will be created. Notice the different icons next to the table names to identify their different distribution types.
FIGURE 4.35 Serving layer using star schema distribution types
- Using the Azure SQL database you created in Exercise 2.1 (brainjammer) and the Linked service you created in Exercise 3.4 (BrainjammerAzureSQL), create an integration dataset that represents the [dbo].[READING] table. Additional data for the table [dbo].[READING] is available, in CSV format, in the file READING.zip in the BrainwaveData/Tables directory, on GitHub at https://github.com/benperk/ADE. The integration dataset should resemble Figure 4.36.
FIGURE 4.36 Serving layer using star schema integration dataset
- Load the star schema with data using the following bulk load COPY INTO command. You can find all the CSV table data in the BrainwaveData/Tables directory on GitHub. See the file copyIntoDimStarTables.sql, in the directory Chapter04/Ch04Ex12, for all the COPY INTO statements.
COPY INTO [brainwaves].[DimMODE]
FROM ‘https://.blob.core.windows.net/brainjammer/Tables/MODE.csv’ WITH ( FILE_TYPE=’CSV’,FIRSTROW = 2)GO COPY INTO [brainwaves].[DimSCENARIO] FROM ‘https://.blob.core.windows.net/brainjammer/Tables/SCENARIO.csv’
WITH (FILE_TYPE=’CSV’,FIRSTROW = 2)…
- Select the Develop hub ➢ click the + symbol to the right of the Develop title ➢ select Data Flow ➢ click the Add Source activity in the editor canvas ➢ provide the output stream name (I used BrainwavesReading) ➢ select the Integration Dataset as the Source type ➢ use the integration dataset you created in step 2 ➢ click the Open item next to the selected dataset ➢ enable Interactive Authoring ➢ navigate back to the data flow ➢ enable Data Flow Debug by selecting the toggle switch ➢ wait until enabling completes ➢ and then select the Enable radio button for Sampling.
- Select the + on the lower right of the Source activity ➢ select Sink ➢ enter an output stream name (I used brainjammerTmpReading) ➢ set the incoming stream to the source you configured in step 4 ➢ select Integration Dataset as the Sink type ➢ create a new dataset by clicking the + New button next to the Dataset drop‐down list box ➢ select Azure Synapse Analytics ➢ press Continue ➢ enter a name (I used brainjammerSynapseTmpReading) ➢ select the *WorkspaceDefaultSqlServer from the Linked service drop‐down list box ➢ click the Refresh icon to the right of the Table Name drop‐down list box ➢ enter the name of your dedicated SQL pool (I used SQLPool) ➢ click the OK button ➢ and then select the [brainwaves].[tmpREADING] table you created in step 1. The configuration should resemble Figure 4.37.
FIGURE 4.37 Serving layer using star schema tmp integration dataset
- Click the OK button ➢ look through the remaining tabs on the data flow configuration panel ➢ consider renaming the data flow ➢ and then click the Commit button to save the data flow to your GitHub repository. Figure 4.38 illustrates the configuration.
FIGURE 4.38 Serving layer using star schema data flow
- Navigate to the Integrate hub ➢ click the + to the right of the Integrate text ➢ select Pipeline in the Properties window ➢ enter a name (I used IngestTmpReading) ➢ expand the Move & Transform group from within the Activity pane ➢ drag and drop a data flow into the graph ➢ on the General tab, enter a name (I used MoveToTmpReading) ➢ on the Settings tab, select the data flow completed in step 5 from the Data Flow drop‐down list box ➢ select your WorkspaceDefaultStorage from the Staging Linked Service drop‐down list box ➢ click the Browse button next to the Staging Storage Folder text boxes ➢ and then choose a stage location for PolyBase. Figure 4.39 resembles how the configuration appears.
FIGURE 4.39 Serving layer using star schema pipeline
- Click the Validate button ➢ click the Debug button ➢ and then execute the following SQL query. Notice the count of rows copied into the [brainwaves].[tmpREADING] table equals the Rows Limit Sampling number configured in step 4.
SELECT COUNT(*) FROM [brainwaves].[tmpREADING]
SELECT TOP (10) * FROM [brainwaves].[tmpREADING]
DELETE FROM [brainwaves].[tmpREADING]