
Scheduling - Advanced

How to run Dataform actions after GA4 export to BigQuery#

You can run Dataform actions as soon as GA4 exports data to BigQuery. The main idea is:

  • subscribe to the event that fires when GA4 exports data to BigQuery, using the Cloud Logging Router (see the filter sketch after this list)
  • on this event, create a release configuration and pass the new table name as a variable
  • execute the release configuration
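
For the first step, one common option is a Log Router sink that matches the BigQuery load job GA4 runs for the daily export and routes it to a Pub/Sub topic. A minimal filter sketch, assuming the standard daily export and with analytics_XXXXX as a placeholder for your dataset ID:

```
resource.type="bigquery_resource"
protoPayload.methodName="jobservice.jobcompleted"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.datasetId="analytics_XXXXX"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId:"events_"
```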

In this case, you should define the configuration variables in your dataform.json file like this:

{  "defaultSchema": "dataform",  "assertionSchema": "dataform_assertions",  "warehouse": "bigquery",  "defaultDatabase": "<GCP-PROJECT-ID>",  "defaultLocation": "<REGION>",  "vars": {    "GA4_DATASET": "analytics_XXXXX"    "GA4_TABLE": "events_<date>",  }}

and use them in your definitions/sources/ga4.js file like this:

```js
const ga4 = require("dataform-ga4-sessions");

const config = {
  dataset: dataform.projectConfig.vars.GA4_DATASET,
  incrementalTableName: dataform.projectConfig.vars.GA4_TABLE,
};

ga4.declareSources(config);
```

And while creating the release configuration, you can override these variables from dataform.json like this (Python example):

compilation_result["code_compilation_config"] = {"vars": {        f"GA4_TABLE": config.last_event_table,    }}

Here we set the value of dataform.projectConfig.vars.GA4_TABLE.
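
For a fuller picture, here is a minimal sketch of this compile-and-run step with the google-cloud-dataform client library; the repository path, the main branch as git_commitish, and the last_event_table value extracted from the log event are assumptions you should adapt:

```python
from google.cloud import dataform_v1beta1


def compile_and_run(project: str, location: str, repo: str, last_event_table: str) -> None:
    client = dataform_v1beta1.DataformClient()
    parent = f"projects/{project}/locations/{location}/repositories/{repo}"

    # Compile the repository, overriding the GA4_TABLE variable from
    # dataform.json with the table name taken from the log event.
    compilation_result = client.create_compilation_result(
        parent=parent,
        compilation_result=dataform_v1beta1.CompilationResult(
            git_commitish="main",  # assumption: compile the main branch
            code_compilation_config=dataform_v1beta1.CodeCompilationConfig(
                vars={"GA4_TABLE": last_event_table},
            ),
        ),
    )

    # Execute all actions from the fresh compilation result.
    client.create_workflow_invocation(
        parent=parent,
        workflow_invocation=dataform_v1beta1.WorkflowInvocation(
            compilation_result=compilation_result.name,
        ),
    )
```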

This way, you always query only the latest daily table, which reduces costs and simplifies your workflows.

note

GA4 sometimes updates a daily table several times, even a few days later, so be ready to handle such cases, especially if you create a custom column like sessions_count per user.

More details#

You can read more about how to set up this scheduling:

Terraform#

You can automate enabling all the needed GCP services with Terraform. A great starting point is the GitHub repository Dataform Pipeline for Google Analytics 4, created by Moritz Bauer.