# Scheduling - Advanced

## How to run Dataform actions after GA4 export to BigQuery

You could run Dataform actions as soon as GA4 exports data to BigQuery. The main idea is:
- subscribe to an event that fires when GA4 exports data to BigQuery (using a Cloud Logging router sink; see the filter sketch after this list)
- on this event, build a release configuration and pass the new table name as a variable
- execute the release configuration
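As a minimal sketch, the log sink could match the audit log entry that the GA4 export job writes on completion. The exact filter below is an assumption (the `jobservice.jobcompleted` method and the `firebase-measurement@system.gserviceaccount.com` service account are the ones commonly reported for the GA4 export) and may need adjusting for your project. Matching entries would then be routed to a Pub/Sub topic that triggers a Cloud Function or Workflow:

```
resource.type="bigquery_resource"
protoPayload.methodName="jobservice.jobcompleted"
protoPayload.authenticationInfo.principalEmail="firebase-measurement@system.gserviceaccount.com"
```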
In this case you should define configuration variables in your dataform.json
file like this:
{ "defaultSchema": "dataform", "assertionSchema": "dataform_assertions", "warehouse": "bigquery", "defaultDatabase": "<GCP-PROJECT-ID>", "defaultLocation": "<REGION>", "vars": { "GA4_DATASET": "analytics_XXXXX" "GA4_TABLE": "events_<date>", }}
and use them in your definitions/sources/ga4.js
file like this:
```js
const ga4 = require("dataform-ga4-sessions");

const config = {
  dataset: dataform.projectConfig.vars.GA4_DATASET,
  incrementalTableName: dataform.projectConfig.vars.GA4_TABLE,
};

ga4.declareSources(config);
```
And when creating the release configuration, you could override the variables from dataform.json like this (Python example):
```python
compilation_result["code_compilation_config"] = {
    "vars": {
        "GA4_TABLE": config.last_event_table,
    }
}
```
Here we set the value of `dataform.projectConfig.vars.GA4_TABLE`.
This way, you will always query only the latest daily table, which reduces costs and simplifies your workflows.
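For context, here is a minimal end-to-end sketch (not the full Cloud Function) of compiling and invoking the project with the `google-cloud-dataform` client. The repository path, branch name, and `events_20240101` table name are placeholder assumptions; in practice the table name would be parsed from the export log entry, as `config.last_event_table` is in the snippet above:

```python
from google.cloud import dataform_v1beta1

# Placeholder repository path (assumption): adjust project, region and repository name.
REPO = "projects/<GCP-PROJECT-ID>/locations/<REGION>/repositories/<REPOSITORY>"

client = dataform_v1beta1.DataformClient()

# Compile the project, overriding the GA4_TABLE variable from dataform.json
# with the table name taken from the export log entry (hardcoded here).
compilation_result = client.create_compilation_result(
    parent=REPO,
    compilation_result=dataform_v1beta1.CompilationResult(
        git_commitish="main",  # assumed branch name
        code_compilation_config=dataform_v1beta1.CodeCompilationConfig(
            vars={"GA4_TABLE": "events_20240101"}
        ),
    ),
)

# Execute all actions from the fresh compilation result.
client.create_workflow_invocation(
    parent=REPO,
    workflow_invocation=dataform_v1beta1.WorkflowInvocation(
        compilation_result=compilation_result.name
    ),
)
```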
**Note:** Sometimes GA4 updates the daily data a few times, even a few days later, so be ready to handle such cases, especially if you decide to create a custom column like `sessions_count` per user.
## More details

You could read more about how to set up this scheduling:
- Dataform: schedule daily updates using Cloud Functions
- Cloud Workflow - Run Dataform queries immediately after the GA4 BigQuery export happens, by Taneli Salonen
## Terraform

You could automate enabling all the needed GCP services using Terraform. A great starting point is the GitHub repository Dataform Pipeline for Google Analytics 4, created by Moritz Bauer.
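For illustration, a minimal sketch of enabling the required APIs with the Terraform google provider is shown below. This is an assumption, not taken from the repository above, and the service list is a guess that should be checked against that setup:

```hcl
# Enable the GCP APIs the pipeline depends on.
# The service list below is an assumption; check the repository above for the full setup.
resource "google_project_service" "services" {
  for_each = toset([
    "dataform.googleapis.com",
    "bigquery.googleapis.com",
    "logging.googleapis.com",
    "pubsub.googleapis.com",
  ])

  project = "<GCP-PROJECT-ID>"
  service = each.value
}
```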