Updating BigQuery Table Schemas

Modified on Mon, 25 Mar, 2024 at 6:08 PM

BigQuery allows you to provide several pieces of information about each column in a table, including column name (required), type (required) and description (optional). Updating the description allows you to include information that you might find in a data dictionary directly in BigQuery. There are several methods you can follow to do so, noted below. We do not recommend changing the column name or type in CTA-created tables in your project. If you do need to change a column name or type in a CTA-created table, email help@techallies.org.

Please note that you will need to be an admin or an editor in order to update descriptions via any of the methods below. Below is a list of each method listed in this article:

Method 1: Cloud Shell in the browser (recommended approach)
1. Optional: Use File Stored in GCS to Update Schema
Method 2: Locally through the CLI
1. Optional: Use File Stored in GCS to Update Schema 7
Method 3: Running SQL commands in BigQuery 9
Method 4: Executing Python Code 10
Method 5: Calling the BigQuery REST API 12
Method 6: Using the BigQuery UI 16
Other Resources 17
1. Updating JSON files 17

Method 1: Cloud Shell in the browser (recommended approach)

Cloud Shell is a free service available to all Google Cloud users through the browser that provides users a terminal and editor automatically configured to your account and project.

Navigate to BigQuery. In the top right corner, select the “Activate Cloud Shell” icon.
If this is your first time activating Cloud Shell, you’ll see a prompt letting you know Cloud Shell is free for all users. Click `Continue`.
1. While your Cloud Shell session will disappear after you close Cloud Shell, you do have the option to save scripts or configuration files if needed. Cloud Shell stores files saved in your $HOME directory across sessions. Read more about the persistent disk storage here.
Once Cloud Shell finishes launching, you can immediately start using `gcloud` and `bq` commands without needing to set anything up. These commands let you interact with Google Cloud and BigQuery. To get started, download the current schema for the table you want to update with the below code Note that you will need to be in the project that has the table(s) you want to update. Hit enter to execute the code.
```
bq show --schema --format=prettyjson DATASET.TABLE > schema.json
```
1. Replace DATASET and TABLE with their respective values.
2. This will create a JSON file named `schema.json` in the Cloud Shell home directory. You can also name the `schema.json` file to something more descriptive, like `DATASET_TABLE_schema.json` if you wish.
Once you’ve executed the command, click the `Open Editor` button above the Terminal. This will switch your Cloud Shell from a Terminal (where you run code) to an Editor (where you can see the files in your home directory - including the one you just made - and modify them).
On the left hand side, you should see `EXPLORER` with two panes: `OPEN EDITORS` and `<FIRSTNAME>_<LASTNAME>`, which is your home directory. In your home directory, there should be a README-cloudshell.txt file (you can quickly read through this if you wish), and the JSON file containing the JSON schema you just downloaded. Open the schema file.
Update the JSON file with the intended changes.
1. For an existing table schema, BigQuery allows you to update the column descriptions only. Attempting to update other fields will return an error. Do not modify or remove the other fields.
Once you finish making changes, save the file (press CTRL + S or click the `File` tab and select `Save`. Go back to the Terminal by clicking `Open Terminal` and then update the table with the modified schema file by running the below code.
```
bq update --schema schema.json -t DATASET.TABLE
```
1. Replace DATASET, and TABLE with their respective values.
2. Your table’s schema should now be updated with your changes!
Repeat steps 3 - 7 as necessary for each table.

Optional: Use File Stored in GCS to Update Schema

Instead of downloading the current schema from BigQuery and modifying it, you can download a schema file from GCS to update your table. Be aware that if you choose to do this, you need to make sure that the schema file is properly formatted as a JSON array of objects, with the fields `name` and `type` for each column matching the current schema. A valid schema file should look something like this:

[
  {
"description": "User's full name",
"name": "name",
"type": "STRING"
  },
  {
"description": "User's street address",
"name": "address",
"type": "STRING"
  },
  {
"description": "User's email address",
"name": "email",
"type": "STRING"
  }
]

Download your schema file from Google Cloud Storage (note: this replaces steps 3-6 from above).
```
gcloud storage cp gs://BUCKET_NAME/OBJECT_NAME .
```
1. Replace BUCKET_NAME, and OBJECT_NAME with their respective values.
2. The period `.` at the end means to save it in your current directory. In this case, the home directory of Cloud Shell.
Update the table using step 7 from above, replacing `schema.json` with OBJECT_NAME.

Method 2: Locally through the CLI

This is a more technical approach than Method 1, since it requires installing and setting up the gcloud CLI (command line interface) on your own machine.

Install the gcloud CLI. The BigQuery CLI is bundled with the gcloud CLI.
Login to Google Cloud through the CLI.
```
gcloud init
```
1. This will open a window in your default web browser where you can authorize access. Make sure that you’re logged into your CTA profile in the browser window when you complete this step.
Get the current schema for the table you want to update by running the below code in your terminal.
```
bq --project_id=PROJECT_ID show --schema --format=prettyjson DATASET.TABLE > schema.json
```
1. Replace PROJECT_ID, DATASET, and TABLE with their respective values.
2. This will create a JSON file named `schema.json` in your terminal’s current working directory. You can also name the `schema.json` file to something more descriptive, like `DATASET_TABLE_schema.json` if you wish.
3. The JSON file will look similar to this:[
```
  {
"description": "Placeholder description.",
"mode": "NULLABLE",
"name": "string_field_0",
"type": "STRING"
  },
  {
"description": "Placeholder description.",
"mode": "NULLABLE",
"name": "string_field_1",
"type": "STRING"
  },
  ...
]
```
Update the JSON file with the intended changes.
1. For an existing table schema, BigQuery allows you to update the column descriptions only. Attempting to update other fields will return an error. Do not modify or remove the other fields.
Once you have all the schema changes made and the file saved, update it using the BigQuery CLI with the below code.
```
bq --project_id=PROJECT_ID update --schema schema.json -t DATASET.TABLE
```
1. Replace PROJECT_ID, DATASET, and TABLE with their respective values.
2. Your table’s schema should now be updated with your changes!
Repeat steps 3-5 as necessary for each table.

Optional: Use File Stored in GCS to Update Schema

[
  {
"description": "User's full name",
"name": "name",
"type": "STRING"
  },
  {
"description": "User's street address",
"name": "address",
"type": "STRING"
  },
  {
"description": "User's email address",
"name": "email",
"type": "STRING"
  }
]

Download your schema file from Google Cloud Storage (note: this replaces steps 3-4 from above).
```
gcloud storage cp gs://BUCKET_NAME/OBJECT_NAME .
```
1. Replace BUCKET_NAME, and OBJECT_NAME with their respective values.
2. The period `.` at the end means to save it in your Terminal’s current directory.
Update the table using step 7 from above, replacing `schema.json` with OBJECT_NAME.

Method 3: Running SQL commands in BigQuery

You can execute SQL statements to modify a table to update column descriptions. This is a time-intensive process, since you can only update one column in one table at a time.

Execute the query below in BigQuery, replacing `<dataset>`, `<table>`, `<column>`, and `<description>`.

ALTER TABLE <dataset>.<table>
ALTER COLUMN <column>
SET OPTIONS (description = '<description>');

# You can do a similar query if you're modifying a view instead.
ALTER VIEW <dataset>.<view>
ALTER COLUMN <column>
SET OPTIONS (description = '<description>');

Method 4: Executing Python Code

Google provides Python libraries for use with their services. You can use the BigQuery and Google Cloud Storage libraries to download a JSON file, and alter a BigQuery table based on the contents.

Set up your Python development environment.
1. You will also need to pip install the BigQuery and Google Cloud Storage packages.
```
pip install --upgrade google-cloud-bigquery
pip install --upgrade google-cloud-storage
```
Install the gcloud CLI.
Login to Google Cloud through the CLI.
```
gcloud init
```
1. This will open a window in your default web browser where you can authorize access. Make sure that you’re logged into your CTA profile in the browser window when you complete this step.
Set up ADC (application default credentials).
1. Google uses ADC to automatically authenticate when executing Python scripts.
2. Create your credentials file.
```
gcloud auth application-default login

# Optionally revoke the credentials afterwards
gcloud auth application-default revoke
```
Now that you have your Python environment set up and authenticated, you can start writing Python scripts! Below is an example script for downloading a JSON schema file from Cloud Storage and using it to update a table in BigQuery.

from google.api_core.exceptions import Forbidden, NotFound
from google.cloud import bigquery, storage
import json
import sys

project_id = "project"
dataset_id = "dataset"
table_id = "table"
gcs_bucket = "bucket"
gcs_file = "path/to/file/schema.json"

bq = bigquery.Client()
gcs = storage.Client()

try:
    bucket = gcs.bucket(gcs_bucket)
    blob = bucket.blob(gcs_file)
    contents = blob.download_as_string()
    contents_json = json.loads(contents)
except NotFound as e:
    sys.exit(f"ERROR: File not found!\n{e}")
except Forbidden as e:
    sys.exit(f"ERROR: User does not have access to file!\n{e}")

for i in contents_json:
    column = i["name"]
    description = i["description"]
    query = f"ALTER TABLE {dataset_id}.{table_id} ALTER COLUMN {column} SET OPTIONS (description = '{description}')"

    try:
        query_job = bq.query(query)
        rows = query_job.result()
        for row in rows:
            print(row)
    except NotFound as e:
         sys.exit(f"ERROR: Table not found!\n{e}")
    except Forbidden as e:
        sys.exit(f"ERROR: User does not have access to table!\n{e}")

print(f"Schema for {project_id}.{dataset_id}.{table_id} updated successfully!")

Method 5: Calling the BigQuery REST API

Google provides REST APIs for BigQuery that you can call through the browser as long as you’re logged into your CTA account. This method is the least UX friendly and should be used sparingly by a technical user.

Go to https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/patch.
1. Double check that you’re using the `tables.patch` method, and not `tables.update`. UPDATE replaces the entire table resource, whereas PATCH only replaces fields that you include.
2. Make sure that you’re logged into your CTA account.
On the right hand side, you should automatically see a panel that says “Try this method”. If not, click the “Try it!” button
Fill in the `projectId`, `datasetId`, and `tableId` request parameters with their respective values.
In the `Request body`, click the blue + button and add the `schema` parameter. Click the blue + button inside the `schema` parameter and add the `fields` parameter. Within the `fields` parameter, you can `[Add Items]` where each item is a column in the table. You must define every existing column in the table with its current `name` and `type`, and the updated `description`.
Once you’ve finished defining the schema, click `Execute`. Do not uncheck any of the credentials boxes.
1. You may receive a pop-up window asking you to authorize the API, and if so, authorize it.
You should receive a `200 OK` response from the API showing that the table schema was successfully updated!

Method 6: Using the BigQuery UI

You can also use the BigQuery UI to update a table schema. This method is best suited for updating a small amount of columns/tables since it’s a manual and time consuming process.

Navigate to BigQuery.
Find your table in the explorer panel and double click the table name to open it (or click the vertical … button to the right and select `Open`).
In the `SCHEMA` tab, click `EDIT SCHEMA`.

Make the necessary changes to each field’s description and then click `SAVE` at the bottom of the screen.

Other Resources

Updating JSON files

Some of the methods require modifying and uploading a JSON file to BigQuery to update the schema. BigQuery requires that JSON schema files contain all the current existing columns in the table, as well as the name and type for each column. There are additional optional fields, but for the purpose of updating an existing table’s schema, you generally only need to have the name, type, and description. For more information, you can take a look at the documentation.

Example JSON schema for a table with two columns(named id and email) that are both strings with a description we’re updating:

[
    {
    "name": "id",
    "type": "STRING",
    "description": "Unique ID for this user."
    },
    {
    "name": "email",
    "type": "STRING",
    "description": "User's email address."
    }
]