Category: Azure Synapse Analytics and ADLS

Implement Efficient File and Folder Structures – The Storage of Data

df = spark.read.load('abfss://*@*.dfs.core.windows.net/in-path/file.csv',
                     format='csv', header=True)

# convert the CSV source to Parquet
df.write.mode("overwrite") \
    .parquet('abfss://*@*.dfs.core.windows.net/out-path/file.parquet')

# read the converted file back and confirm the row count
df = spark.read.load('abfss://*@*.dfs.core.windows.net/out-path/file.parquet',
                     format='parquet', header=True)
print(df.count())

from pyspark.sql.functions import year, month, col

df = spark.read \
    .load('abfss://*@*.dfs.core.windows.net/out-path/file.parquet',
          format='parquet', header=True)

# add partition columns derived from the session timestamp
df_year_month_day = df \
    .withColumn("year", year(col("SESSION_DATETIME"))) \
    .withColumn("month", month(col("SESSION_DATETIME")))
from [...]
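The excerpt cuts off before the day column and the partitioned write. A minimal sketch of how those remaining steps typically look, assuming the day is derived with dayofmonth and using an illustrative output path:

from pyspark.sql.functions import dayofmonth

# assumed continuation: derive the day column, then write one folder per
# year/month/day combination (output path is illustrative)
df_year_month_day = df_year_month_day \
    .withColumn("day", dayofmonth(col("SESSION_DATETIME")))
df_year_month_day.write \
    .mode("overwrite") \
    .partitionBy("year", "month", "day") \
    .parquet('abfss://*@*.dfs.core.windows.net/out-path/partitioned/')

partitionBy produces a year=2022/month=4/day=1 folder hierarchy, so queries that filter on those columns read only the matching folders instead of scanning the whole data set.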

Azure Synapse Analytics Data Hub Data Flow – The Storage of Data

DROP TABLE brainwaves.DimELECTRODE

2. Create an SCD table, and then execute the following SQL script, which is located in the folder Chapter04/Ch04Ex09 on GitHub at https://github.com/benperk/ADE and named createSlowlyChangingDimensionTable.sql:

CREATE [...]
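The CREATE statement is truncated above; the full script is in the repository. To orient yourself, here is a minimal PySpark sketch of the Type 2 pattern such a dimension commonly supports. Every name in it (the staging table, ELECTRODE_ID, LOCATION, START_DATE, END_DATE, IS_CURRENT) is an assumption for illustration, not the script's actual schema:

from pyspark.sql import functions as F

# hypothetical inputs: the dimension as it stands and the latest source extract
dim = spark.table("brainwaves.DimELECTRODE")
src = spark.table("brainwaves.StagedElectrode")   # assumed staging table with the dimension's business columns

# IDs whose tracked attribute changed relative to the active dimension row
changed_ids = (dim.filter("IS_CURRENT = 1")
                  .join(src, "ELECTRODE_ID")
                  .filter(dim["LOCATION"] != src["LOCATION"])
                  .select("ELECTRODE_ID"))

today = F.current_date()

# Type 2: close out the superseded versions rather than overwriting them
current_changed = (dim.filter("IS_CURRENT = 1")
                      .join(changed_ids, "ELECTRODE_ID", "left_semi"))
expired = (current_changed
           .withColumn("END_DATE", today)
           .withColumn("IS_CURRENT", F.lit(0)))

# append the new attribute values as the current version
fresh = (src.join(changed_ids, "ELECTRODE_ID", "left_semi")
            .withColumn("START_DATE", today)
            .withColumn("END_DATE", F.lit(None).cast("date"))
            .withColumn("IS_CURRENT", F.lit(1)))

# everything else is carried over untouched, history included
kept = dim.exceptAll(current_changed)
dim_updated = kept.unionByName(expired).unionByName(fresh)

The key idea is that changed rows are never updated in place: the old version keeps its START_DATE/END_DATE window, and a new row is appended as the current one.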

Implement a Partition Strategy for Streaming Workloads – The Storage of Data

Chapter 7, “Design and Implement a Data Stream Processing Solution,” discusses partitioning data within one partition and across partitions. Exercise 7.5 features the hands‐on implementation of partitioning streaming workloads. Partitioning [...]
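Exercise 7.5 walks through the real implementation; as a minimal, self-contained illustration of partitioning a streaming workload, the following Structured Streaming sketch uses the built-in rate source in place of a live event stream and illustrative ADLS paths (both are assumptions):

from pyspark.sql.functions import col, year, month, dayofmonth

# built-in rate source stands in for a real event stream (Event Hubs, Kafka, ...)
stream = (spark.readStream
               .format("rate")
               .option("rowsPerSecond", 10)
               .load())

# derive partition columns from the event timestamp
by_date = (stream
           .withColumn("year", year(col("timestamp")))
           .withColumn("month", month(col("timestamp")))
           .withColumn("day", dayofmonth(col("timestamp"))))

# each micro-batch lands in a year=/month=/day= folder hierarchy
query = (by_date.writeStream
                .format("parquet")
                .option("path", "abfss://*@*.dfs.core.windows.net/stream-out/")          # placeholder path
                .option("checkpointLocation", "abfss://*@*.dfs.core.windows.net/chk/")   # placeholder path
                .partitionBy("year", "month", "day")
                .start())

The checkpoint location is what lets the query restart without reprocessing or duplicating events; the partition columns keep each micro-batch's files grouped by ingestion date.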

Deliver Data in Parquet Files – The Storage of Data

In Exercise 4.7 you converted brain wave data stored in multiple CSV files using the following PySpark code snippet:

%%pyspark
df = spark.read.option("header", "true") \
    .csv('abfss://*@*.dfs.core.windows.net/EMEA/brainjammer/in/2022/04/01/18/*')
display(df.limit(10))

Then you wrote that [...]
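The passage breaks off at the write step. Assuming the DataFrame was delivered as Parquet to a parallel out path (the path below is illustrative, not the exercise's actual target), the write typically looks like this:

# deliver the combined CSV input as Parquet (illustrative output path)
df.write \
  .mode("overwrite") \
  .parquet('abfss://*@*.dfs.core.windows.net/EMEA/brainjammer/out/2022/04/01/18/')

Because Parquet is columnar and stores its schema in the file footer, downstream reads no longer need the header option and scan only the columns they project.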