Dynamic Procedural SQL

Created by Ben Deverman, Modified on Thu, 10 Oct, 2024 at 2:10 PM by Ben Deverman

*If you’re unfamiliar with procedural SQL, we recommend reading our introductory article here. This article assumes a basic understanding of procedural SQL concepts.

Procedural SQL can be a helpful tool for a number of use cases. One of the most common is for instances where you want to utilize dynamic SQL, e.g., writing SQL where the table, dataset or project name changes depending on other variables or conditions.

For example, dynamic procedural SQL can help make creating a set of tables or datasets much more efficient. See below for a sample code statement of how we’d loop through dynamically creating a set of datasets within the same project:

DECLARE datasetida ARRAY<STRING> DEFAULT ['<yourdatasetname>','<yourdatasetname>','<yourdatasetname>'];
DECLARE projectid STRING DEFAULT '`<yourprojectid>';
for x in (select * from unnest(datasetida) as datasetid)
do
EXECUTE IMMEDIATE CONCAT("CREATE SCHEMA ",projectid,".",x.datasetid,".`");
END FOR;

This code:

Declares our loop, which contains all the different datasets we want to create.
Declares our project id, which will be the same value in each loop (this step just saves us time in not having to write out the project id in the query)
Sets up our loop
Defines the SQL query that we want to run, with our variable names included; this code will be looped through for each variable in our loop (e.g., all of our dataset names). Procedural SQL requires that we set up our SQL query as a concatenated statement.
Ends our for loop, which ends our code.

Sometimes, you may have a use case where you want multiple parts of the code to change dynamically, not just a set of table names or a set of dataset names. You can achieve that by layering the STRUCT data type to your procedural SQL. The STRUCT data type is similar to a dictionary in programming languages; it allows you to pair together two values to move through a loop simultaneously. See below for how we could alter our sample code above to both dynamically create a list of datasets and also place each dataset in a specific project via a STRUCT data type:

DECLARE projectida ARRAY<STRUCT<project_id STRING, dataset_id STRING^<* DEFAULT [
struct('<yourprojectidhere>' as project_id, '<yourdatasetnamehere>' as dataset_id),
struct('<yourprojectidhere>' as project_id, '<yourdatasetnamehere>' as dataset_id),
struct('<yourprojectidhere>' as project_id, '<yourdatasetnamehere>' as dataset_id)
];

for x in (select * from unnest(projectida)) DO
EXECUTE IMMEDIATE CONCAT('CREATE SCHEMA `', x.project_id,'.',x.dataset_id,'`');
END FOR;

This code:

Declares our loop and identifies within the STRUCT loop list which values should be paired together (the values that are included in each individual STRUCT statement are the ones that will be paired together in the loop).
Sets up our loop
Defines the SQL query that we want to run, with our variable names included; this code will be looped through for each paired set of variables in our loop (e.g., all of our project IDs and dataset names). Procedural SQL requires that we set up our SQL query as a concatenated statement.
Ends our for loop, which ends our code.