ADF debug options when the pipeline has dataflow activity

July 3, 2023 By Bhargava

The following two debug options appear only when the pipeline contains a data flow activity.

1- use dataflow debug session

2- use activity runtime

The difference between these two options is which integration runtime (IR) compute runs the data flow during the debug run: the cluster of the debug session, or the cluster configured on the data flow activity.

Use dataflow debug session:

This option lets you debug a data flow in a separate debug session. The debug cluster is separate from the original cluster, that is, the one configured to run the data flow in the pipeline. The debug cluster is typically smaller and less powerful than the original cluster, which makes it more cost-effective.

For my workload, the processing time was 7 sec when using the data flow debug session. It took less time to spin up the small cluster.

[Image: debug run using the data flow debug session, processing time 7 sec]

Use activity runtime:

When you use the “Use Activity Runtime” option, ADF runs the data flow as part of the pipeline activity runtime, using the original cluster configured in the integration runtime of the data flow activity. This original cluster is typically more powerful and more expensive than the debug cluster.

With this option, we can tune the integration runtime of the actual pipeline based on how long the data flow takes to run. If the data flow takes too long, we may need to configure the integration runtime with a more powerful cluster or optimize the data flow itself.
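To make "a more powerful cluster" concrete, here is a minimal Python sketch that builds two Azure integration runtime definitions with different data flow compute settings. The field names (computeType, coreCount, timeToLive under dataFlowProperties) follow the JSON that ADF shows for an Azure IR as far as I know, so compare them with the JSON of your own factory before using them. The helper build_dataflow_ir and the IR names are hypothetical and only for illustration.

```python
import json


def build_dataflow_ir(name, compute_type="General", core_count=8, ttl_minutes=10):
    """Build an Azure IR definition (as a dict) with data flow compute settings.

    Assumption: the layout mirrors the integration runtime JSON shown in ADF;
    verify it against your own factory before relying on it.
    """
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": "AutoResolve",
                    "dataFlowProperties": {
                        "computeType": compute_type,  # e.g. General or MemoryOptimized
                        "coreCount": core_count,      # size of the data flow cluster
                        "timeToLive": ttl_minutes,    # minutes the cluster stays warm
                    },
                }
            },
        },
    }


# Hypothetical names: a small IR and a larger one to try when the data flow is slow.
small_ir = build_dataflow_ir("DataFlowIRSmall")
large_ir = build_dataflow_ir(
    "DataFlowIRLarge", compute_type="MemoryOptimized", core_count=16, ttl_minutes=15
)

print(json.dumps(large_ir, indent=2))
```

Running the pipeline with "Use activity runtime" against each definition in turn gives a rough idea of whether the extra cores are worth the cost for a given workload.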

The processing time was 24 sec when using the activity runtime. It took longer because, in my case, spinning up the actual compute takes more time.

[Image: debug run using the activity runtime, processing time 24 sec]
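As a quick comparison of the two runs, the small Python sketch below computes the extra time and the ratio between them. The helper name compare_runs is hypothetical, and the numbers are only the two timings from my workload (7 sec and 24 sec), so treat the result as an illustration rather than a benchmark.

```python
def compare_runs(debug_seconds, activity_seconds):
    """Return (extra seconds, ratio) of the activity runtime run vs. the debug session run."""
    extra = activity_seconds - debug_seconds
    ratio = activity_seconds / debug_seconds
    return extra, ratio


# Timings observed for my workload in this post.
extra, ratio = compare_runs(7, 24)
print(f"extra: {extra} sec, ratio: {ratio:.1f}x")  # prints "extra: 17 sec, ratio: 3.4x"
```

In my case, the extra 17 sec is mostly the time to spin up the actual compute, not the time to process the data.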