Protect Sensitive Data with Terraform
 
Introduction
Terraform provides automation to provision your infrastructure in the cloud. To do this, Terraform authenticates with cloud providers (and other providers) to deploy the resources and perform the planned actions. However, the credentials Terraform needs for authentication are valuable and sensitive: they unlock access to your services, so you should always keep them secret. For example, API keys and passwords for database users are sensitive data.
If a malicious third party were to acquire the sensitive information, they would be able to breach the security systems by presenting themselves as a known trusted user. In turn, they would be able to modify, delete, and replace the resources and services that are available under the scope of the obtained keys. To prevent this from happening, it is essential to properly secure your project and safeguard its state file, which stores all the project secrets.
By default, Terraform stores the state file locally in the form of unencrypted JSON, allowing anyone with access to the project files to read the secrets. One solution is to restrict access to the files on disk. Another is to store the state remotely in a backend that encrypts the data automatically.
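Restricting access to the files on disk can be as simple as tightening file permissions. The following is a minimal sketch, assuming a Unix-like system and Terraform's default local file names (terraform.tfstate and terraform.tfstate.backup); the function name restrict_state_files is hypothetical and only used for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make local Terraform state readable only by its owner.
# Assumes the default file names terraform.tfstate and terraform.tfstate.backup.
restrict_state_files() {
  local dir="${1:-.}"
  local f
  for f in "$dir"/terraform.tfstate "$dir"/terraform.tfstate.backup; do
    # Skip files that do not exist yet (for example, before the first apply).
    [ -f "$f" ] && chmod 600 "$f"
  done
  return 0
}

# Usage (run from the project directory):
#   restrict_state_files .
```

This only limits which local users can read the file. The contents remain unencrypted, which is why the rest of the tutorial moves the state to an encrypted backend.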
In this tutorial, you’ll hide sensitive data in outputs during execution and store your state in a secure cloud object storage, which encrypts data at rest. You’ll use DigitalOcean Spaces in this tutorial as your cloud object storage. You’ll also use tfmask, which is an open source program written in Go that dynamically censors values in the Terraform execution log output.
Prerequisites
- A DigitalOcean Personal Access Token, which you can create via the DigitalOcean control panel.
- Terraform installed on your local machine and a project set up with the DigitalOcean provider.
- A DigitalOcean Space with API keys (access and secret).
Note: This tutorial has specifically been tested with Terraform 0.13.
Marking Outputs as Sensitive
In this step, you’ll hide outputs in code by setting their sensitive parameter to true. This is useful when secret values are part of the Terraform output that you’re storing indefinitely, or you need to share the output logs beyond your team for analysis.
Assuming you are in the terraform-sensitive directory, which you created as part of the prerequisites, you’ll define a Droplet and an output showing its IP address. You’ll store it in a file named droplets.tf, so create and open it for editing by running:
nano droplets.tf
Add the following lines:
terraform-sensitive/droplets.tf
resource "digitalocean_droplet" "web" {
  image  = "ubuntu-18-04-x64"
  name   = "web-1"
  region = "fra1"
  size   = "s-1vcpu-1gb"
}
output "droplet_ip_address" {
  value = digitalocean_droplet.web.ipv4_address
}
This code will deploy a Droplet called web-1 in the fra1 region, running Ubuntu 18.04 with 1GB of RAM and one CPU core. Here you’ve given the droplet_ip_address output a value, which Terraform will print in its log.
To deploy this Droplet, execute the code by running the following command:
terraform apply -var "do_token=${DO_PAT}"
Terraform will take the following actions:
Output
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
Terraform will perform the following actions:
  # digitalocean_droplet.web will be created
  + resource "digitalocean_droplet" "web" {
      + backups              = false
      + created_at           = (known after apply)
      + disk                 = (known after apply)
      + id                   = (known after apply)
      + image                = "ubuntu-18-04-x64"
      + ipv4_address         = (known after apply)
      + ipv4_address_private = (known after apply)
      + ipv6                 = false
      + ipv6_address         = (known after apply)
      + ipv6_address_private = (known after apply)
      + locked               = (known after apply)
      + memory               = (known after apply)
      + monitoring           = false
      + name                 = "web-1"
      + price_hourly         = (known after apply)
      + price_monthly        = (known after apply)
      + private_networking   = (known after apply)
      + region               = "fra1"
      + resize_disk          = true
      + size                 = "s-1vcpu-1gb"
      + status               = (known after apply)
      + urn                  = (known after apply)
      + vcpus                = (known after apply)
      + volume_ids           = (known after apply)
      + vpc_uuid             = (known after apply)
    }
Plan: 1 to add, 0 to change, 0 to destroy.
...
Enter yes when prompted. You’ll receive the following output:
Output
digitalocean_droplet.web: Creating...
...
digitalocean_droplet.web: Creation complete after 33s [id=216255733]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
Outputs:
droplet_ip_address = your_droplet_ip_address
The IP address appears in the output. If you share this output with others, or if automated deployment processes make it publicly available, it’s important to hide this data.
To censor it, you’ll need to set the sensitive attribute of the droplet_ip_address output to true.
Open droplets.tf for editing:
nano droplets.tf
Add the highlighted line:
terraform-sensitive/droplets.tf
resource "digitalocean_droplet" "web" {
  image  = "ubuntu-18-04-x64"
  name   = "web-1"
  region = "fra1"
  size   = "s-1vcpu-1gb"
}
output "droplet_ip_address" {
  value = digitalocean_droplet.web.ipv4_address
  sensitive = true
}
Save and close the file when you’re done.
Apply the project again by running:
terraform apply -var "do_token=${DO_PAT}"
The output will be:
Output
digitalocean_droplet.web: Refreshing state... [id=216255733]
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Outputs:
droplet_ip_address = <sensitive>
You’ve now explicitly censored the IP address, which is the value of the output. Censoring outputs is useful when the Terraform logs are stored in a public space, or when you want to keep the outputs defined in the code but hide their values. You’ll also want to censor outputs that contain passwords and API tokens, as they are sensitive as well.
You’ve now hidden the values of the defined outputs by marking them as sensitive. In the next step, you’ll configure Terraform to store your project’s state in encrypted cloud storage, instead of locally.
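The apply commands in this step pass the token with -var, which places it on the command line. Terraform also reads input variables from environment variables named TF_VAR_ followed by the variable name. The following is a minimal sketch of that alternative, assuming DO_PAT already holds your token and that your project declares the do_token variable.

```shell
# Assumes DO_PAT is already set in your shell, as in the earlier commands.
# Terraform maps TF_VAR_do_token to the do_token input variable.
export TF_VAR_do_token="${DO_PAT:-}"

# With the variable exported, the -var flag can be omitted (not run here):
#   terraform apply
```

This keeps the token out of the command's argument list. It does not change what Terraform prints or stores.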
Storing State in an Encrypted Remote Backend
The state file stores all information about your deployed infrastructure, including its internal relationships and secrets. By default, it’s stored in plaintext, locally on disk. Storing it remotely, in the cloud, provides a higher level of security. If the cloud storage service supports encryption at rest, it will keep the state file encrypted at all times, so that potential attackers won’t be able to gather information from it. Storing the state file encrypted remotely is different from marking outputs as sensitive: it changes only how Terraform stores the data, not whether the data is displayed.
You’ll now configure your project to store the state file in a DigitalOcean Space. As a result it will be encrypted at rest and protected with TLS in transit.
By default, the Terraform state file is called terraform.tfstate and is located in the root of every initialized directory. You can view its contents by running:
cat terraform.tfstate
The contents of the file will be similar to this:
{
  "version": 4,
  "terraform_version": "0.13.1",
  "serial": 3,
  "lineage": "926017f6-d7be-e1fa-99e4-f2a988026ed4",
  "outputs": {
    "droplet_ip_address": {
      "value": "...",
      "type": "string",
      "sensitive": true
    }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "digitalocean_droplet",
      "name": "web",
      "provider": "provider[\"registry.terraform.io/digitalocean/digitalocean\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "backups": false,
            "created_at": "...",
            "disk": 25,
            "id": "216255733",
            "image": "ubuntu-18-04-x64",
            "ipv4_address": "...",
            "ipv4_address_private": "10.135.0.3",
            "ipv6": false,
            "ipv6_address": "",
            "ipv6_address_private": null,
            "locked": false,
            "memory": 1024,
            "monitoring": false,
            "name": "web-1",
            "price_hourly": 0.00744,
            "price_monthly": 5,
            "private_networking": true,
            "region": "fra1",
            "resize_disk": true,
            "size": "s-1vcpu-1gb",
            "ssh_keys": null,
            "status": "active",
            "tags": [],
            "urn": "do:droplet:216255733",
            "user_data": null,
            "vcpus": 1,
            "volume_ids": [],
            "vpc_uuid": "fc52519c-dc84-11e8-8b13-3cfdfea9f160"
          },
          "private": "..."
        }
      ]
    }
  ]
}
The state file contains all the resources you’ve deployed, as well as all outputs and their computed values. Gaining access to this file is enough to compromise the entire deployed infrastructure. To prevent that from happening, you can store it encrypted in the cloud.
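To confirm that marking an output as sensitive does not remove its value from the state, you can search the state file for the value itself. The following is a minimal sketch, assuming the default local file name terraform.tfstate; the function name state_contains is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given literal value appears anywhere
# in the state file (defaults to terraform.tfstate in the current directory).
state_contains() {
  local value="$1"
  local file="${2:-terraform.tfstate}"
  # -F treats the value as a fixed string, so dots in an IP address are literal.
  grep -qF -- "$value" "$file"
}

# Usage, run in the project directory after terraform apply:
#   state_contains "your_droplet_ip_address" && echo "value is stored in plaintext"
```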
Terraform supports multiple backends, which are storage and retrieval mechanisms for the state. Examples are: local for local storage, pg for the Postgres database, and s3 for S3 compatible storage, which you’ll use to connect to your Space.
The backend configuration is specified under the main terraform block, which is currently in provider.tf. Open it for editing by running:
nano provider.tf
Add the following lines:
terraform-sensitive/provider.tf
terraform {
  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
      version = "1.22.2"
    }
  }
  backend "s3" {
    key      = "state/terraform.tfstate"
    bucket   = "your_space_name"
    region   = "us-west-1"
    endpoint = "https://spaces_endpoint"
    skip_region_validation      = true
    skip_credentials_validation = true
    skip_metadata_api_check     = true
  }
}
variable "do_token" {}
provider "digitalocean" {
  token = var.do_token
}
The s3 backend block first specifies the key, which is the location of the Terraform state file on the Space. Passing in state/terraform.tfstate means that you will store it as terraform.tfstate under the state directory.
The endpoint parameter tells Terraform where the Space is located, and bucket defines the exact Space to connect to. The skip_region_validation, skip_credentials_validation, and skip_metadata_api_check parameters disable checks that are not applicable to DigitalOcean Spaces. Note that region must still be set to a conforming value (such as us-west-1), even though it has no bearing on Spaces.
Remember to put in your bucket name and the Spaces endpoint, including the region, which you can find in the Settings tab of your Space. When you are done customizing the endpoint, save and close the file.
Next, put the access and secret keys for your Space in environment variables, so you’ll be able to reference them later. Run the following commands, replacing the highlighted placeholders with your key values:
export SPACE_ACCESS_KEY="your_space_access_key"
export SPACE_SECRET_KEY="your_space_secret_key"
Then, configure Terraform to use the Space as its backend by running:
terraform init -backend-config "access_key=$SPACE_ACCESS_KEY" -backend-config "secret_key=$SPACE_SECRET_KEY"
The -backend-config argument provides a way to set backend parameters at runtime, which you are using here to set the Space keys. You’ll be asked if you wish to copy the existing state to the cloud, or start anew:
Output
Initializing the backend...
Do you want to copy existing state to the new backend?
  Pre-existing state was found while migrating the previous "local" backend to the
  newly configured "s3" backend. No existing state was found in the newly
  configured "s3" backend. Do you want to copy this state to the new "s3"
  backend? Enter "yes" to copy and "no" to start with an empty state.
Enter yes when prompted. The rest of the output will be the following:
Output
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- Using previously-installed digitalocean/digitalocean v1.22.2
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Your project will now store its state in your Space. If you receive an error, double-check that you’ve provided the correct keys, endpoint, and bucket name.
The local state file has been emptied, which you can check by showing its contents:
cat terraform.tfstate
There will be no output, as expected.
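The terraform init command in this step passes the Space keys as command-line arguments. The -backend-config option also accepts a path to a file of key and value pairs, so the keys can instead live in a file with restricted permissions. The following is a minimal sketch of that approach, assuming SPACE_ACCESS_KEY and SPACE_SECRET_KEY are exported as shown earlier and contain no double quotes or backslashes; the function name write_backend_config and the file name backend.hcl are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the Space keys to a partial backend configuration file.
# Assumes SPACE_ACCESS_KEY and SPACE_SECRET_KEY are set in the environment.
write_backend_config() {
  local file="${1:-backend.hcl}"
  (
    # New files created in this subshell are accessible by their owner only.
    umask 077
    # Remove any older copy so the restrictive permissions apply to a fresh file.
    rm -f -- "$file"
    printf 'access_key = "%s"\nsecret_key = "%s"\n' \
      "$SPACE_ACCESS_KEY" "$SPACE_SECRET_KEY" > "$file"
  )
}

# Usage (not run here):
#   write_backend_config backend.hcl
#   terraform init -backend-config=backend.hcl
```

If you use a file like this, keep it out of version control, because it holds the keys in plaintext.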
You can try modifying the Droplet definition and applying it to check that the state is still being correctly managed.
Open droplets.tf for editing:
nano droplets.tf
Modify the highlighted lines:
terraform-sensitive/droplets.tf
resource "digitalocean_droplet" "web" {
  image  = "ubuntu-18-04-x64"
  name   = "test-droplet"
  region = "fra1"
  size   = "s-1vcpu-1gb"
}
output "droplet_ip_address" {
  value = digitalocean_droplet.web.ipv4_address
  sensitive = false
}
Save and close the file, then apply the project by running:
terraform apply -var "do_token=${DO_PAT}"
You will receive the following output:
Output
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place
Terraform will perform the following actions:
  # digitalocean_droplet.web will be updated in-place
  ~ resource "digitalocean_droplet" "web" {
        backups              = false
        created_at           = "2020-11-11T18:43:03Z"
        disk                 = 25
        id                   = "216419273"
        image                = "ubuntu-18-04-x64"
        ipv4_address         = "159.89.21.92"
        ipv4_address_private = "10.135.0.4"
        ipv6                 = false
        locked               = false
        memory               = 1024
        monitoring           = false
      ~ name                 = "web-1" -> "test-droplet"
        price_hourly         = 0.00744
        price_monthly        = 5
        private_networking   = true
        region               = "fra1"
        resize_disk          = true
        size                 = "s-1vcpu-1gb"
        status               = "active"
        tags                 = []
        urn                  = "do:droplet:216419273"
        vcpus                = 1
        volume_ids           = []
        vpc_uuid             = "fc52519c-dc84-11e8-8b13-3cfdfea9f160"
    }
Plan: 0 to add, 1 to change, 0 to destroy.
...
Enter yes when prompted, and Terraform will apply the new configuration to the existing Droplet, which shows that it’s correctly communicating with the Space that stores its state:
Output
digitalocean_droplet.web: Modifying... [id=216419273]
digitalocean_droplet.web: Still modifying... [id=216419273, 10s elapsed]
digitalocean_droplet.web: Modifications complete after 12s [id=216419273]
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
Outputs:
droplet_ip_address = your_droplet_ip_address
You’ve configured the s3 backend for your project, so that you’re storing the state encrypted in the cloud, in a DigitalOcean Space. In the next step, you’ll use tfmask, a tool that will dynamically censor all sensitive outputs and information in Terraform logs.
Using tfmask in CI/CD Environments
In this section, you’ll download tfmask and use it to dynamically censor sensitive data from the whole output log Terraform generates when executing a command. It will censor the variables and parameters whose names are matched by a regular expression that you provide.
Dynamically matching their names is possible when they follow a pattern (for example, contain the word password or secret). The advantage of using tfmask over only marking the outputs as sensitive is that it also censors matched parts of the resource declarations that Terraform prints out while executing. It’s imperative that you hide them when the execution logs may be public, such as in automated CI/CD environments, which often list execution logs publicly.
Compiled binaries of tfmask are available at its releases page on GitHub. For Linux, run the following command to download it:
sudo curl -L https://github.com/cloudposse/tfmask/releases/download/0.7.0/tfmask_linux_amd64 -o /usr/bin/tfmask
Mark it as executable by running:
sudo chmod +x /usr/bin/tfmask
tfmask works on the outputs of terraform plan and terraform apply by masking the values of all variables whose names are matched by a regular expression that you specify. You supply the replacement character and the regular expression using environment variables called TFMASK_CHAR and TFMASK_VALUES_REGEX, respectively.
You’ll now use tfmask to censor the name and ipv4_address of the Droplet that Terraform would deploy. First, you’ll need to set the mentioned environment variables by running:
export TFMASK_CHAR="*"
export TFMASK_VALUES_REGEX="(?i)^.*(ipv4_address|name).*$"
This regular expression matches any string that contains ipv4_address or name, and is not case sensitive.
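Because the pattern is wrapped in .* on both sides, it matches attribute names that merely contain one of the two words. You can preview which names a pattern like this would catch before running tfmask. The following is a rough sketch that approximates the pattern with grep, using -i in place of the (?i) flag; the function name would_mask and the attribute name hostname are hypothetical, and tfmask’s own matching may differ in edge cases.

```shell
#!/usr/bin/env bash
# Approximates the TFMASK_VALUES_REGEX pattern with grep: -i stands in for (?i).
# tfmask is a separate Go program, so treat this as a rough preview only.
would_mask() {
  printf '%s\n' "$1" | grep -qiE '^.*(ipv4_address|name).*$'
}

# "hostname" is a hypothetical attribute name, included to show a broad match.
for attr in ipv4_address ipv4_address_private name hostname region; do
  if would_mask "$attr"; then
    echo "$attr: masked"
  else
    echo "$attr: shown"
  fi
done
```

In this sketch, every name except region is reported as masked, including ipv4_address_private and hostname, because each contains one of the two words.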
To make Terraform plan an action for your Droplet, modify its definition:
nano droplets.tf
Modify the Droplet’s name:
terraform-sensitive/droplets.tf
resource "digitalocean_droplet" "web" {
  image  = "ubuntu-18-04-x64"
  name   = "web"
  region = "fra1"
  size   = "s-1vcpu-1gb"
}
output "droplet_ip_address" {
  value = digitalocean_droplet.web.ipv4_address
  sensitive = false
}
Save and close the file.
Because you’ve changed an attribute of the Droplet, Terraform will show its full definition in its output. Plan the configuration, but pipe it to tfmask to censor variables according to the regular expression:
terraform plan -var "do_token=${DO_PAT}" | tfmask
You’ll receive output similar to the following:
Output
...
digitalocean_droplet.web: Refreshing state... [id=216419273]
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place
Terraform will perform the following actions:
  # digitalocean_droplet.web will be updated in-place
  ~ resource "digitalocean_droplet" "web" {
        backups              = false
        created_at           = "2020-11-11T18:43:03Z"
        disk                 = 25
        id                   = "216419273"
        image                = "ubuntu-18-04-x64"
        ipv4_address         = "************"
        ipv4_address_private = "**********"
        ipv6                 = false
        locked               = false
        memory               = 1024
        monitoring           = false
      ~ name                 = "**********************************"
        price_hourly         = 0.00744
        price_monthly        = 5
        private_networking   = true
        region               = "fra1"
        resize_disk          = true
        size                 = "s-1vcpu-1gb"
        status               = "active"
        tags                 = []
        urn                  = "do:droplet:216419273"
        vcpus                = 1
        volume_ids           = []
        vpc_uuid             = "fc52519c-dc84-11e8-8b13-3cfdfea9f160"
    }
Plan: 0 to add, 1 to change, 0 to destroy.
...
Note that tfmask has censored the values for name, ipv4_address, and ipv4_address_private using the character you specified in the TFMASK_CHAR environment variable. All three match the regular expression; ipv4_address_private matches because its name contains ipv4_address.
Censoring values in the Terraform logs this way is very useful for CI/CD, where the logs may be publicly available. The benefit of tfmask is that you have full control over which variables to censor (using the regular expression). You can also specify keywords that you want to censor, which may not currently exist, but which you anticipate using in the future.
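One detail matters when piping Terraform into a masking tool in a CI/CD job: by default, a shell pipeline reports the exit status of its last command, so a failed terraform plan could go unnoticed if tfmask itself succeeds. The following is a minimal sketch of a wrapper that preserves the failure, assuming bash; the function name run_masked and the MASK_CMD variable are hypothetical.

```shell
#!/usr/bin/env bash
# Make a pipeline fail if any command in it fails, not only the last one.
set -o pipefail

# Hypothetical override point: the masking command defaults to tfmask.
MASK_CMD="${MASK_CMD:-tfmask}"

# Run the given command and pass its stdout and stderr through the masker.
run_masked() {
  "$@" 2>&1 | "$MASK_CMD"
}

# Usage in a CI job (not run here):
#   run_masked terraform plan -var "do_token=${DO_PAT}"
```

Merging stderr into the masked stream is a design choice in this sketch, so that error messages are filtered too.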
You can destroy the deployed resources by running the following command and entering yes when prompted:
terraform destroy -var "do_token=${DO_PAT}"
Conclusion
In this article, you’ve worked with a couple of ways to hide and secure sensitive data in your Terraform project. The first measure, using sensitive to hide values from the outputs, is useful when only logs are accessible, but the values themselves stay present in the state stored on disk.
To remedy that, you can opt to store the state file remotely, which you’ve achieved with DigitalOcean Spaces. This allows you to make use of encryption at rest. You also used tfmask, a tool that censors the values of variables, matched using a regular expression, during terraform plan and terraform apply.
You can also check out HashiCorp Vault to store secrets and secret data. It can be integrated with Terraform to inject secrets in resource definitions, so you’ll be able to connect your project with your existing Vault workflow.