Source – The Storage of Data
When you select the Source item, six tabs are rendered in the configuration panel: Source Settings, Source Options, Projection, Optimize, Inspect, and Data Preview. Figure 4.25 shows the Source Settings configuration tab.
FIGURE 4.25 Azure Synapse Analytics Develop hub, data flow Source Settings tab
You can choose from three source types. The first is an integration dataset, which is a preconfigured object that identifies a specific set of data to be retrieved from a designated source. An inline source type is useful when the schema of the data needs to be flexible or when the data flow retrieval activity is a one‐off, i.e., it happens only once. For example, an additional column might occasionally be appended to a row, or you may receive a one‐time dump of data. The Workspace DB source type gives you the option to select data from an Azure Data Lake database, which is currently in preview. The data in this Azure Data Lake database can be accessed without linked services or an integration dataset.
The Allow Schema Drift option should be enabled if the schema is expected to change often. The Infer Drifted Column Types option causes the platform to attempt to identify and apply the data type of the incoming column values. The Validate Schema option imposes a restriction on the incoming data based on the configured dataset: if the incoming data does not match the schema, the data flow fails. Finally, when you are testing and debugging, it can be prudent to retrieve only a subset of the data from the source. Enabling the Sampling option reduces the amount of retrieved data, which improves performance and reduces costs. The cost savings come from being able to use a smaller integration runtime machine to complete the testing, and with less data to analyze you may also be able to debug more quickly.
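To make the interplay between these four options more concrete, the following is a minimal Python sketch that imitates them on a list of rows. It is not how the data flow engine is implemented. EXPECTED_SCHEMA, infer_type, read_source, and all parameter names are hypothetical names chosen for illustration, and the exact matching rule used for validation here (every expected column present with the expected type) is an assumption.

```python
from typing import Any

# Hypothetical schema of the configured dataset: column name -> Python type.
EXPECTED_SCHEMA = {"id": int, "reading": float}


def infer_type(value: str) -> Any:
    """Try int, then float, and fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def read_source(rows, allow_schema_drift=False, infer_drifted_types=False,
                validate_schema=False, sample_rows=None):
    """Imitate the source options on a list of dict rows (illustrative only)."""
    if sample_rows is not None:
        # Sampling: keep only the first N rows to shrink the test data set.
        rows = rows[:sample_rows]

    result = []
    for row in rows:
        if validate_schema:
            # Assumed rule: every expected column must exist with the right type.
            for name, expected in EXPECTED_SCHEMA.items():
                if name not in row or not isinstance(row[name], expected):
                    raise ValueError(f"row does not match schema at column {name!r}")

        # Columns defined in the schema are always carried forward.
        out = {name: row[name] for name in EXPECTED_SCHEMA if name in row}

        if allow_schema_drift:
            # Drifted columns (not in the schema) pass through instead of being dropped.
            for name, value in row.items():
                if name not in EXPECTED_SCHEMA:
                    if infer_drifted_types and isinstance(value, str):
                        value = infer_type(value)
                    out[name] = value
        result.append(out)
    return result
```

In this sketch, a row that carries an extra column loses it when drift is not allowed, keeps it as a string when drift is allowed, and keeps it as a typed value when type inference is also turned on.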
The next tab of interest is Optimize, as shown in Figure 4.26. The first setting you might notice is Partition Type. This is where you visually configure the distribution method in the Azure Synapse Analytics workspace. Because you configured and provisioned the integration dataset, you know the data being loaded and can judge which distribution is best for this data flow.
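As a rough illustration of why knowledge of the data matters when choosing a distribution, the sketch below contrasts two common strategies, round robin and hash, in plain Python. It does not reproduce the engine's partitioning logic. The function names, the use of zlib.crc32 as the hash, and the sample column are all assumptions made for the example.

```python
import zlib


def round_robin_partition(rows, partition_count):
    """Spread rows evenly across partitions, ignoring their content."""
    partitions = [[] for _ in range(partition_count)]
    for index, row in enumerate(rows):
        partitions[index % partition_count].append(row)
    return partitions


def hash_partition(rows, key, partition_count):
    """Place rows with the same key value in the same partition."""
    partitions = [[] for _ in range(partition_count)]
    for row in rows:
        # crc32 is used only because it is deterministic across runs;
        # it stands in for whatever hash a real engine applies.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % partition_count].append(row)
    return partitions
```

Round robin yields evenly sized partitions regardless of content, whereas hashing keeps rows that share a key together at the risk of uneven partition sizes when a few key values dominate. That trade-off is the kind of judgment the Partition Type setting asks you to make.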
FIGURE 4.26 Azure Synapse Analytics Develop hub, data flow Optimize tab
The Data Preview tab renders the data from the data source configured on the Source Settings tab. Beyond the Source item, there are numerous other options to discuss, for example, the Schema Modifier.