Command and control now easier in BigQuery with scripting and stored procedures
Jagan R. Athreya
Product Manager, Google Cloud
Software Engineer, Google Cloud
We hear from our customers that you’re big fans of our BigQuery data warehouse and its features: the ability to handle massive datasets, the petabyte-scale performance, and the ability to create and execute machine learning models in SQL. We also know that you have complex tasks that you execute in your BigQuery environment, like data migration jobs and data quality checkers. Until now, these jobs had to be executed outside BigQuery. We are pleased to announce the beta availability of scripting and stored procedures in all regions where BigQuery is available. These new features can improve your productivity by making complex tasks a lot simpler to perform. They can also offer a way to port migration scripts or run complex ETL logic in BigQuery's UI or API.
Understanding scripting and stored procedures
Scripting allows data engineers and data analysts to execute a wide range of tasks, from simple ones like running queries in a sequence to complex, multi-step tasks with control flow including IF statements and WHILE loops. Scripting can also help with tasks that make use of variables. If you used BigQuery before this new release, you had to run each step manually and had no control over the execution flow without using other tools.
Stored procedures allow you to save these scripts and run them within BigQuery in the future. Similar to views, you can also share a stored procedure with others in your organization, all while maintaining one canonical version of the procedure.
Here’s an example that demonstrates how you can combine queries and control logic to easily get query results. The result identifies the reporting hierarchy of an employee.
1. Select your project and create a dataset named “dataset” in the BigQuery UI. If you have a different dataset name you want to use, update the dataset name in the script below.
2. Run the following command in the Query Editor:
3. Now create a stored procedure that returns the hierarchy for a given employee ID by running the following query:
4. Next, after creating the table and the procedure, call the procedure to see the hierarchy for a particular employee ID (Employee ID #9):
Using temporary tables
Temporary tables let BigQuery users save intermediate results to tables as part of scripts and stored procedures. These temporary tables exist at the session level, eliminating the need to save or maintain these tables within datasets.
The following example attempts to find a correlation between precipitation and number of births or birth weight in 1988 with the natality public data using temporary tables. (Spoiler alert: It initially looks like there is no correlation!)
Here’s what that looks like in the query editor in BigQuery:
And here are the results of that query:
Learn more about scripting in the BigQuery documentation as well as the DDL commands to CREATE and DROP STORED PROCEDUREs and to CREATE TEMPORARY TABLEs. And check out another example and what else is new in BigQuery.
There is no additional cost for the use of scripting and stored procedures; it's included with the base BigQuery pricing. However, any chargeable operations—such as INSERT, UPDATE, or other operations that result in bytes scanned within scripts or stored procedures—will incur their corresponding costs. Keep this in mind when using those commands with control flow, such as WHILE. And check out the documentation for examples of chargeable operations in scripts.