Flowlets – The Storage of Data


A flowlet is a container that holds reusable transformation activities. Consider the activity you created in Exercise 4.8, which retrieves brain wave data and converts the epoch date into a more human-readable format. You can place those transformations into a flowlet. Then, whenever you need that same set of transformations, you can use the flowlet instead of duplicating the configuration and the code within the Derived Column. In many cases the code and transformations will be much more complicated than this one. Grouping a set of transformation activities into a container and reusing it has a clear benefit: if something in the pipeline changes and requires a modification, you make the change in a single place instead of in every data flow that uses the logic. To create a new data flow flowlet containing the transformations, right-click the transformation at which you want the flowlet to end, and then select Create a New Flowlet. This flowlet can then be used in other data flows.
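The reuse idea can be sketched outside Synapse as well. The following is a loose analogy in plain Python, not the Synapse or Data Factory API: a single function plays the role of the flowlet, and two unrelated row sets play the role of two data flows that share it. The function name, the field names `epoch` and `reading_time`, and the assumption that the epoch value is in seconds are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone


def readable_epoch(rows, epoch_field="epoch", out_field="reading_time"):
    """Hypothetical 'flowlet': add a human-readable UTC timestamp to each row.

    Assumes the epoch value is expressed in seconds.
    """
    result = []
    for row in rows:
        converted = dict(row)  # copy so the incoming row is left untouched
        ts = datetime.fromtimestamp(row[epoch_field], tz=timezone.utc)
        converted[out_field] = ts.strftime("%Y-%m-%d %H:%M:%S")
        result.append(converted)
    return result


# Two separate "data flows" reuse the same transformation (made-up sample rows).
session_rows = [{"epoch": 0, "value": 1.5}]
archive_rows = [{"epoch": 86400, "value": 2.5}]

session_out = readable_epoch(session_rows)
archive_out = readable_epoch(archive_rows)

print(session_out[0]["reading_time"])  # 1970-01-01 00:00:00
print(archive_out[0]["reading_time"])  # 1970-01-02 00:00:00
```

If the output format ever needs to change, only `readable_epoch` is edited, which mirrors the single point of modification that a flowlet gives you.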

FIGURE 4.28 Azure Synapse Analytics Develop hub, Visual Expression Builder

Destination

The sink is the location where you intend to store the data once the transformation has been performed. Where and how you store the transformed data depends largely on the kind of data and the DLZ stage of the transformation. Should the data be stored back in an ADLS container or in an Azure SQL relational database? Perhaps the output is a JSON document that needs to be immediately available globally, making Azure Cosmos DB a valid option. Table 4.5 provides some additional information about data flow transformation features.

TABLE 4.5 Data flow transformation features

| Category | Name | Description |
|---|---|---|
| Formatters | Flatten | Converts hierarchical files like JSON and unrolls them into individual rows |
| | Parse | Parses data from the incoming stream |
| | Stringify | Converts complex data types to a string |
| Multiple inputs/outputs | Conditional split | Routes data rows to different streams based on a matching data pattern or condition |
| | Exists | Checks whether data exists in a second stream or data flow |
| | Join | Combines data from multiple sources |
| | Lookup | References data that exists in a different second stream or data flow |
| | New branch | Performs multiple operations on the same data stream |
| | Union | Combines data from multiple sources vertically |
| Row modifier | Alter row | Sets delete, insert, update, and upsert row policy |
| | Assert | Sets an assert rule per row that specifies allowed values |
| | Filter | Filters data based on a configured condition |
| | Sort | Sorts incoming data rows |
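To make a few of the rows in Table 4.5 more concrete, here is a minimal sketch of what Flatten, Conditional split, and Union do to a stream of rows. It is plain Python over in-memory lists, not the Synapse data flow engine or its expression language, and the sample records and field names (`session`, `readings`, `freq`, `value`) are made up for illustration.

```python
def flatten(records, array_field):
    """Flatten: unroll a nested list into one output row per element."""
    rows = []
    for record in records:
        parent = {k: v for k, v in record.items() if k != array_field}
        for item in record[array_field]:
            rows.append({**parent, **item})
    return rows


def conditional_split(rows, predicate):
    """Conditional split: route each row to one of two streams."""
    matched, unmatched = [], []
    for row in rows:
        (matched if predicate(row) else unmatched).append(row)
    return matched, unmatched


def union(*streams):
    """Union: stack several streams vertically into one."""
    return [row for stream in streams for row in stream]


# Hypothetical hierarchical input, similar in shape to a JSON document.
sessions = [
    {"session": "s1", "readings": [{"freq": "ALPHA", "value": 4.2},
                                   {"freq": "BETA", "value": 1.1}]},
    {"session": "s2", "readings": [{"freq": "ALPHA", "value": 3.9}]},
]

rows = flatten(sessions, "readings")
alpha, other = conditional_split(rows, lambda r: r["freq"] == "ALPHA")
recombined = union(alpha, other)

print(len(rows), len(alpha), len(other), len(recombined))  # 3 2 1 3
```

In an actual data flow each of these steps is a configured transformation on the canvas, and the streams are processed by the service and not held in Python lists.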

In summary, a data flow consists of a source, a sink, and one or more transformations, as described in Table 4.4 and Table 4.5. You can construct a flowlet from a subset of transformations within a data flow for reuse in other data flows. As you will soon learn, one or more data flows are considered activities that are added to a pipeline. The pipeline is responsible for initiating, executing, monitoring, and completing all the activities within it.
