Guidance on BigQuery Projects, Datasets, Tables and Views

Modified on Fri, 15 Nov, 2024 at 10:09 AM

PAD organizes and stores your data in a hierarchy of levels: projects, datasets, and tables and views. This article describes each level and shares CTA’s recommendations for using them effectively.

Projects

The highest level is a project, which is similar to a database in other SQL applications. Projects house all of the other data levels and dictate where query costs are billed. Most PAD users have one project for all of their organization’s data, which is typically what CTA recommends, but some users may have access to multiple projects depending on how their organization has set up its PAD environment. All users can also access several additional projects with public data (see more here).

To confirm which project you’re in or to switch projects, use the project drop-down in the upper left corner of your screen. You can also access all of your projects in the Explorer pane on the left side of your screen. Remember that if you query data from one project while your project drop-down is set to a different project, you may run into permissions issues.

If you’re having trouble finding your PAD project, you can see a detailed walkthrough here.

PAD users can create new projects outside of the PAD environment, but we recommend against it. CTA cannot support or sync data to any projects created outside the PAD environment. If you believe your organization has a use case for additional projects, please have your administrator contact your CTA point of contact to discuss creating additional projects for you within PAD. Note that additional projects in PAD incur additional costs, and we typically don’t recommend multiple projects for the same organization.

Datasets

Datasets are the second-highest level of the hierarchy. They live within projects and roughly resemble schemas in other SQL applications. Datasets are a good way to organize related groups of data; for example, all your VAN data might be in one dataset and all your Mobilize data in another. Users with admin, editor, or contributor roles can create datasets (though only admins can delete them), and they can create as many datasets within a project as they would like. Datasets are project-specific, meaning you need to reference your project ID when creating them.
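
To illustrate what referencing the project ID looks like when creating a dataset, here is a minimal Python sketch that builds a BigQuery CREATE SCHEMA statement. The project ID my-pad-project, the dataset name van_data, and the helper function itself are hypothetical placeholders for illustration, not something PAD provides; substitute your own project ID and dataset name.

```python
def create_dataset_sql(project_id: str, dataset_name: str) -> str:
    """Build a BigQuery DDL statement that creates a dataset in a given project."""
    # Backticks around the path allow project IDs that contain hyphens.
    return f"CREATE SCHEMA IF NOT EXISTS `{project_id}.{dataset_name}`;"


# Hypothetical names, for illustration only.
statement = create_dataset_sql("my-pad-project", "van_data")
print(statement)
```

The printed statement can be pasted into the query editor. Because the project ID is part of the dataset reference, the dataset is created in that project rather than in whichever project the drop-down happens to show.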

Tables and Views

Tables and views make up the final level of the hierarchy within PAD. They live within datasets and are what you’ll query to surface your data. They are project- and dataset-specific, meaning that you need to reference your project ID and dataset name when you create and query them. You can read more about the differences between tables and views here.
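
The project, dataset, and table levels combine into a single fully qualified name. The Python sketch below shows one way to assemble that name and use it in a query and in a view definition. All names here (my-pad-project, van_data, contacts, contacts_preview) and the helper functions are hypothetical, chosen only for illustration.

```python
def qualified_name(project_id: str, dataset_name: str, table_name: str) -> str:
    """Return a backtick-quoted project.dataset.table reference."""
    return f"`{project_id}.{dataset_name}.{table_name}`"


def preview_sql(project_id: str, dataset_name: str, table_name: str, limit: int = 10) -> str:
    """Build a query that returns a few rows from a table or view."""
    table = qualified_name(project_id, dataset_name, table_name)
    return f"SELECT * FROM {table} LIMIT {limit}"


def create_view_sql(project_id: str, dataset_name: str, view_name: str, select_sql: str) -> str:
    """Build a statement that creates (or replaces) a view from a SELECT query."""
    view = qualified_name(project_id, dataset_name, view_name)
    return f"CREATE OR REPLACE VIEW {view} AS\n{select_sql};"


# Hypothetical names, for illustration only.
query = preview_sql("my-pad-project", "van_data", "contacts", limit=5)
print(query)
print(create_view_sql("my-pad-project", "van_data", "contacts_preview", query))
```

Spelling out the full name in every query means the reference always names the project and dataset explicitly, whichever project is selected in the drop-down.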

Users with admin, editor, or contributor roles can create tables or views, and users are free to create as many as they’d like. All users can query tables and views as long as they can access the dataset and project.

PAD shows preview details about tables and views. To access them, click the table or view name in the Explorer pane, then open the Schema tab (for column names and data types) or the Details tab (for metadata).

Have questions? Reach out to help@techallies.org!