Technical Onboarding
Step-by-step setup for the AWS account structure, Datadog monitoring, secrets management, and your first test deployment.
Introduction & Prerequisites
Ask your TACEO contact person to invite you to the relevant resources. Provide the following details for each person involved in deploying infrastructure:
- Full name
- Email address (for Datadog access)
- GitHub handle (for private repository access)
TACEO invites you to a shared Slack channel for all technical communication.
General Overview
The TACEO Network consists of resources in AWS accounts and in the Datadog monitoring platform, all created through OpenTofu projects. TACEO provides the OpenTofu sources via private GitHub repositories with read-only access, along with instructions on when and how to apply changes to the infrastructure.
Step 1 — AWS Account Structure
Request access to the private repository TaceoLabs/node-operator-aws-setup on GitHub by reaching out to your TACEO contact person. Read the README.md in the repo root and perform the actions described there.
If you plan to deviate from this setup, make sure you still support the Terraform provider interface used for accessing subaccounts:
provider "aws" {
  region = "eu-central-1"

  assume_role {
    role_arn = "arn:aws:iam::${var.aws_account_id}:role/admin"
    duration = "1h"
  }
}
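The provider block above references `var.aws_account_id`, so that variable has to be declared somewhere in the project. A minimal sketch of such a declaration, assuming the value is a plain 12-digit account ID string; the description and the validation rule are illustrative and not taken from the TACEO repositories:

```hcl
# Hypothetical declaration of the variable referenced by the provider block.
variable "aws_account_id" {
  type        = string
  description = "ID of the AWS subaccount whose admin role is assumed."

  # Assumption: reject anything that is not a 12-digit AWS account ID.
  validation {
    condition     = can(regex("^[0-9]{12}$", var.aws_account_id))
    error_message = "The aws_account_id value must be a 12-digit AWS account ID."
  }
}
```

Keeping the account ID as a string rather than a number preserves leading zeros, which AWS account IDs can contain.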
Share the AWS account IDs of the management/root account and the test account, as well as the S3 state bucket name, in the shared Slack channel.
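For orientation, an S3 state bucket is typically wired into an OpenTofu project through a backend block like the sketch below. The bucket name and state key are placeholders, and the region is assumed to match the provider configuration; the README in the TACEO repository remains the authoritative source for the actual values.

```hcl
terraform {
  backend "s3" {
    bucket = "example-operator-tofu-state"   # placeholder: your S3 state bucket name
    key    = "example/terraform.tfstate"     # placeholder: path of the state object
    region = "eu-central-1"                  # assumed to match the provider region
  }
}
```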
Step 2 — Datadog Setup
Ask your TACEO contact to set up a Datadog organization for you. Once created, you will receive an invitation by email. Set up your account and enable Multi-Factor Authentication. Let us know whether account creation worked.
Step 3 — Store Datadog Keys in AWS Secrets Manager
TACEO will share a secret with you in the Slack channel. OpenTofu uses this secret to create Datadog resources.
- Open the AWS access portal using `AdministratorAccess` on the management/root account.
- Select the `eu-central-1` region.
- Open Secrets Manager and select "Store a new Secret".
- Choose "Other type of secret".
- Switch to "Plaintext" and paste the JSON string from the Slack channel.
- Use the default Encryption Key (or add a new one). Click Next.
- Set the secret name to `datadog/apikey/terraform`.
- Proceed: Next → Next → Store.
OpenTofu projects will fetch this secret from AWS Secrets Manager to set up the Datadog integration in your deployments.
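As an illustration of how a project might read this secret, the sketch below looks it up by name and passes the decoded values to the Datadog provider. The JSON key names `api_key` and `app_key` are assumptions; the secret shared by TACEO may use different keys, and the TACEO projects may wire this up differently. The lookup also assumes the AWS provider in use can read secrets in the account where the secret was stored.

```hcl
# Look up the current version of the secret stored in the steps above.
data "aws_secretsmanager_secret_version" "datadog" {
  secret_id = "datadog/apikey/terraform"
}

locals {
  # The secret was stored as a plaintext JSON string, so decode it into a map.
  datadog_keys = jsondecode(data.aws_secretsmanager_secret_version.datadog.secret_string)
}

provider "datadog" {
  # Assumed key names; adjust to match the JSON provided by TACEO.
  api_key = local.datadog_keys["api_key"]
  app_key = local.datadog_keys["app_key"]
}
```

Reading the keys at plan time keeps them out of the repository and out of variable files.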
Step 4 — Hello World Test Deployment
Deploy an OpenTofu project from the repository TaceoLabs/node-operator-aws-test. Use the git tag deployment/simple-hello-world. Follow the README.md in the repo root and deploy the service inside your AWS test account.
This deployment creates a Datadog Dashboard with logs, traces, and metrics, plus an alert in case the system is not working properly. Feel free to play around with the system as described in the README — actual protocols will use similar infrastructure for monitoring and logging.
Step 5 — On-Call Setup
During incidents, infrastructure deployments may need to be updated on short notice. Provide TACEO with a REST API endpoint that it can contact when an incident deployment is needed. Discuss the details in the Slack channel. See the On-Call Playbook below for the full workflow.
Step 6 — Weekly Update Window
TACEO regularly sends out deployment instructions within a fixed weekly time frame. Discuss your availability with TACEO in the shared Slack channel.
Responsibilities Overview
Summary of responsibilities shared between TACEO and the Node Operator:
| Topic | TACEO | Node Op. |
|---|---|---|
| Secure credentials of the Node Operator's AWS account(s) | — | ✓ |
| Deploy/Update Infrastructure (scheduled & on-call) | — | ✓ |
| Backup procedures for protocol data | ✓ | — |
| Maintenance and bugfixes for the protocol | ✓ | — |
| Protocol monitoring | ✓ | — |
| Cost monitoring & reporting anomalies | ✓ | — |
| Optimizing protocol/AWS architecture for costs | ✓ | — |
| Securing public endpoints | ✓ | — |
As part of the protocol lifecycle, protocol nodes may be migrated between parties via AWS account handovers — accounts can be detached from one organization and attached to another with full ownership transfer.
On-Call Playbook
TACEO monitors all systems deployed by the Node Operator and reacts to malfunctions. The Node Operator is not responsible for monitoring any system.
Setup Phase
TACEO establishes a channel with the Node Operator (NO) for creating Pages in the NO's Incident Management System. TACEO then performs a test run in which it creates a Page and the NO confirms receipt.
The NO ensures that Pages created by TACEO are handled within the agreed reaction time.
Incident Workflow
- TACEO posts a message in the Slack channel containing deployment instructions (e.g. a `tofu plan` → `tofu apply` workflow).
- TACEO triggers a Page in the NO's Incident Management System. The page contains a URL pointing to the message in the Slack channel, so the NO can link the page with the resolution instructions.
- The NO reacts within the agreed reaction time and follows the instructions in the linked Slack message. All communication happens in a thread on this Slack message.
- TACEO confirms resolution by posting a final message in the thread, clearly stating "Incident Resolved". No further action is required by the NO.