Orchestra
Advertising Data Lakes and Workflow Automation
Orchestra is not an official Google Product
- Overview
- Setting up your Orchestra environment in GCP
- Service Accounts
- Configuring Orchestra
- Additional info
Overview
Composer is a Google Cloud managed version of Apache Airflow, an open source project for managing ETL workflows. We use it for this solution because you can deploy your code to production simply by moving files to Google Cloud Storage. Composer also provides monitoring and logging, and software installation, updates and bug fixes for Airflow are fully managed.
It is recommended that you install this solution through the Google Cloud Platform UI.
We recommend familiarising yourself with Composer here.
Orchestra is an open source project, built on top of Composer, that provides custom Airflow operators designed to solve the needs of Advertisers.
Orchestra lets Enterprise Clients build their Advertising Data Lake out of the box and customize it to their needs.
Orchestra lets sophisticated clients automate workflows at scale for huge efficiency gains.
Orchestra is a fully open sourced Solution Toolkit for building enterprise data solutions on Airflow.
Setting up your Orchestra environment in GCP
Billing
Composer and BigQuery - two of the main Google Cloud Platform tools on which Orchestra is built - require a GCP Project with a valid billing account.
See this article for more information: Google Cloud Billing.
APIs
In your GCP Project menu (or directly through this link), access the API Library and enable the following APIs:
- Cloud Composer
- Cloud Dataproc
- Cloud Storage APIs
- BigQuery
Create a Composer environment
Follow these steps to create a Composer environment in Google Cloud Platform - please note that creating the environment can take 20 to 30 minutes.
Environment Variables, Tags and Configuration Properties (airflow.cfg) can all be left as standard and you can use the default values for number of nodes, machine types and disk size (you can use a smaller disk size if you want to save some costs).
Service Accounts
Setting up a service account
Google Cloud uses service accounts to automate tasks between services, including other Google services such as DV360 and CM.
You can see full documentation for Service Accounts here:
https://cloud.google.com/iam/docs/service-accounts
Default Service Account
By default you will see in the IAM section of your Project a default service account for Composer ("Cloud Composer Service Agent") and a default service account for Compute Engine ("Compute Engine default service account") - with their respective email addresses.
These service accounts have access to all Cloud APIs enabled for your project, making them a good fit for Orchestra. We recommend using the Compute Engine default service account as the main "Orchestra" service account, as it is the one used by the individual Compute Engine virtual machines that will run your tasks.
If you wish to use another account, you will have to give it access to BigQuery and full permissions for the Storage APIs.
Creating a new user for your service account in DV360
Your Service Account will need to be set up as a DV360 user so that it can access the required data from your DV360 account.
You need to have partner-level access to your DV360 account to be able to add a new user; follow the simple steps to create a new user in DV360, using this configuration:
- Give this user the email of the service account you wish to use.
- Select all the advertisers you want it to be able to access.
- Give Read & Write permissions.
- Save!
Configuring Orchestra
You have now set up the Composer environment in GCP and granted the proper permissions to its default Service Account.
You're ready to configure Orchestra!
Variables
The Orchestra project will require several variables to run.
These can be set via the Admin section in the Airflow UI (accessible from the list of Composer Environments, clicking on the corresponding link under "Airflow Web server").
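Variables can also be prepared as a JSON file and imported in bulk through the Airflow CLI (`airflow variables import <file>` in Airflow 2, `airflow variables -i <file>` in Airflow 1.10). The sketch below uses hypothetical variable names and values - replace them with the variables that the Orchestra DAGs you deploy actually expect.

```python
import json

# Hypothetical example values -- the keys below are placeholders, not the
# definitive list of variables Orchestra requires.
variables = {
    "gcp_project": "my-gcp-project",      # assumption: your GCP project ID
    "partner_ids": "1234567",             # assumption: your DV360 partner ID(s)
    "gcs_bucket": "my-orchestra-bucket",  # assumption: a staging bucket
}

# Write the variables to a JSON file that can be imported in bulk,
# e.g.: airflow variables import variables.json
with open("variables.json", "w") as f:
    json.dump(variables, f, indent=2)
```

Importing a file like this is equivalent to entering each variable by hand in the Admin > Variables screen of the Airflow UI.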

Adding Workflows
As with any other Airflow deployment, you will need DAG files describing your Workflows to schedule and run your tasks, plus hooks, operators and other libraries to help build those tasks.
You can find the core files for Orchestra in our GitHub repository: clone the repo (or download the files directly).
You can then design the DAGs you wish to run and add them to the dags folder.
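As a reference for what such a file looks like, here is a minimal DAG sketch in the Airflow 1.10 style used by Composer at the time - the DAG name, schedule and the single placeholder task are illustrative assumptions; a real Orchestra workflow would use the operators from the cloned repository instead.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Arguments shared by all tasks in the DAG.
default_args = {
    "owner": "airflow",
    "start_date": datetime(2020, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# A placeholder daily workflow; the name and schedule are examples only.
dag = DAG(
    "example_orchestra_dag",
    default_args=default_args,
    schedule_interval="@daily",
)

# A single dummy task standing in for the real Orchestra operators.
hello = BashOperator(
    task_id="hello",
    bash_command="echo 'Orchestra is running'",
    dag=dag,
)
```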
Upload all the DAGs and other required files to the DAGs Storage Folder that you can access from the Airflow UI.

This will automatically generate the DAGs and schedule them to run (you will be able to see them in the Airflow UI).
From now on, you can use (the Composer-managed instance of) Airflow as you normally would.
