Category: Azure Databricks

Implement Efficient File and Folder Structures – The Storage of Data

```python
from pyspark.sql.functions import year, month, col

# Read the source CSV file
df = spark.read.load('abfss://@.dfs.core.windows.net/in-path/file.csv',
                     format='csv', header=True)

# Write it back out in Parquet format
df.write.mode('overwrite') \
    .save('abfss://@.dfs.core.windows.net/out-path/file.parquet', format='parquet')

# Read the Parquet output and confirm the row count
df = spark.read \
    .load('abfss://@.dfs.core.windows.net/out-path/file.parquet',
          format='parquet', header=True)
print(df.count())

# Derive year and month columns from the session timestamp
df_year_month_day = (df.withColumn('year', year(col('SESSION_DATETIME')))
                       .withColumn('month', month(col('SESSION_DATETIME'))))

from [...]
```
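The point of adding `year` and `month` columns is that a partitioned write lays the files out in a year/month folder hierarchy, so queries that filter on those columns read only the matching folders. As a rough sketch of the Hive-style directory layout that `df.write.partitionBy('year', 'month')` would produce (the helper name and base path here are illustrative, not from the source):

```python
from datetime import datetime

def partition_path(base: str, ts: datetime) -> str:
    """Build the Hive-style partition folder that partitionBy('year', 'month')
    would create for a row with this session timestamp."""
    return f"{base}/year={ts.year}/month={ts.month}"

# A session recorded on 3 April 2022 lands in the year=2022/month=4 folder
print(partition_path("out-path", datetime(2022, 4, 3, 17, 0)))
```

Because the partition values are encoded in the folder names, an engine reading the data can prune whole directories before opening a single Parquet file.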

Build External Tables on a Serverless SQL Pool – The Storage of Data

```sql
COLLATE Latin1_General_100_BIN2_UTF8

WITH (LOCATION = 'abfss://@.dfs.core.windows.net')

WITH (FORMAT_TYPE = PARQUET)

(
    [Timestamp] NVARCHAR(50),
    [AF3theta] NVARCHAR(50),
    [AF3alpha] NVARCHAR(50),
    [AF3betaL] NVARCHAR(50),
    …
)
WITH (
    LOCATION = 'EMEA/brainjammer/out/2022/04/03//.parquet/*',
    DATA_SOURCE = SampleBrainwavesSource,
    FILE_FORMAT = SampleBrainwavesParquet
)
```

FIGURE 4.33 Building an external table

You might [...]
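The `WITH` clauses above belong to three separate statements: one defines the external data source, one the external file format, and one the external table that ties them to a column list and a location. A small hypothetical helper that assembles such a CREATE EXTERNAL TABLE statement can make the pieces easier to see (the table name, column list, and wildcard path below are illustrative, not taken from the source):

```python
def external_table_ddl(name, columns, location, data_source, file_format):
    """Assemble a serverless SQL pool CREATE EXTERNAL TABLE statement
    from a column list and previously created data source / file format."""
    cols = ",\n    ".join(f"[{c}] NVARCHAR(50)" for c in columns)
    return (
        f"CREATE EXTERNAL TABLE {name} (\n    {cols}\n) WITH (\n"
        f"    LOCATION = '{location}',\n"
        f"    DATA_SOURCE = {data_source},\n"
        f"    FILE_FORMAT = {file_format}\n)"
    )

ddl = external_table_ddl(
    "SampleBrainwaves",
    ["Timestamp", "AF3theta", "AF3alpha", "AF3betaL"],
    "EMEA/brainjammer/out/2022/04/03/*/*.parquet",
    "SampleBrainwavesSource",
    "SampleBrainwavesParquet",
)
print(ddl)
```

Keeping the data source and file format as named, reusable objects means each new external table only has to supply its own columns and location.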

Azure Synapse Analytics Develop Hub Data Flow – The Storage of Data

EMEA/brainjammer/out/2022/04/03/17/sampleBrainwaves.parquet

3. Name the new integration dataset (I used sampleBrainwavesParquet) ➢ select the From Connection/Store radio button ➢ click OK ➢ return to the data flow ➢ click the Data [...]