Cost monitoring in Google BigQuery can be a difficult task, especially in a growing organization with lots of (independent) stakeholders that have access to the data. If your organization is not using reserved slots (flat-rate pricing) but is billed by the number of bytes processed (on-demand pricing), costs can quickly get out of hand, and we need the means to investigate or "debug" the BigQuery usage in order to understand:
- who ran queries with a high cost
- what were the exact queries
- when did those queries run (and are they maybe even running regularly)
Previously, we had to manually set up query logging via Stackdriver as explained in the article Taking a practical approach to BigQuery cost monitoring, but in late 2019 BigQuery introduced INFORMATION_SCHEMA views as a beta feature that also contain data about BigQuery jobs via the INFORMATION_SCHEMA.JOBS_BY_* views, which became generally available (GA) on 2020-06-16.
Examples
SELECT
  creation_time,
  job_id,
  project_id,
  user_email,
  total_bytes_processed,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
Working Example
- this query selects the most interesting fields in terms of cost monitoring from the INFORMATION_SCHEMA.JOBS_BY_PROJECT view for all jobs that have been run in region US in the currently selected project (a reconstructed sketch follows this list)
- the cost_in_dollar is estimated by converting total_bytes_processed to terabytes and multiplying the result by $5.00 (which corresponds to the on-demand cost as of today, 2020-06-22). Also, we only take those costs into account if the query was not answered from the cache (see the cache_hit != true condition)
- the creation_time is converted to our local timezone
- the results are restricted to the past 30 days by using the WHERE clause to filter on the partition column creation_time
- feel free to replace JOBS_BY_PROJECT with JOBS_BY_USER or JOBS_BY_ORGANIZATION
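The full query was only linked, not embedded here, so what follows is a minimal sketch reconstructed from the description above; the Europe/Berlin timezone, the rounding, and the ordering are my assumptions:

SELECT
  -- assumption: Europe/Berlin as "our local timezone"
  DATETIME(creation_time, "Europe/Berlin") AS creation_time,
  job_id,
  project_id,
  user_email,
  total_bytes_processed,
  -- estimate the on-demand cost: bytes -> terabytes (2^40 bytes) times $5.00,
  -- counting only queries that were not answered from the cache
  IF(cache_hit != true,
     ROUND(total_bytes_processed / POW(2, 40) * 5.0, 2),
     0) AS cost_in_dollar,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  -- creation_time is the partition column; restrict to the past 30 days
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
ORDER BY
  cost_in_dollar DESC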
Notes
While playing around with the INFORMATION_SCHEMA views I've hit a couple of gotchas:
- the different views require different permissions
- the views are regionalized, i.e. we must prefix the region (see region-us in the view specification) and must run the job in that region (e.g. from the BigQuery UI via More > Query Settings > Processing location; see the bq example after this list)
- it is not possible to mix multiple regions in one query, because a query with processing location US can only access resources in location US. Though it would be very helpful for organizations that actively use different locations, something like this is not possible:
SELECT * FROM
(SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION)
UNION ALL
(SELECT * FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION)
- data is currently only kept for the past 180 days
- the JOBS_BY_USER view seems to "match" the user based on the email address. My user email address is a @googlemail.com address; in the user_email column it is stored as @gmail.com. Thus, I get no results when using JOBS_BY_USER
- JOBS_BY_USER and JOBS_BY_PROJECT will use the currently selected project by default. A different project (e.g. other-project) can be specified via
SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
- the full query is not available for JOBS_BY_ORGANIZATION
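Regarding the processing location: outside the UI, the bq CLI's --location flag can be used to run the job in the right region. A small sketch (using the currently selected default project):

bq query --location=US --nouse_legacy_sql \
  'SELECT job_id, total_bytes_processed FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT LIMIT 10'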
Use Cases
I use this approach in our organization to set up a view based on INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION that is then used as a data source for Google Data Studio. This allows me to get a quick high-level overview of all query costs and further enables me to drill down deeper if I need to. I can even find the exact queries via their job_id.
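A minimal sketch of such a view; the project, dataset, and view names are placeholders, and the dataset is assumed to live in the US location. Note that the query column is omitted, since (as mentioned above) it is not available in JOBS_BY_ORGANIZATION:

CREATE OR REPLACE VIEW `my-project.monitoring.organization_costs` AS
SELECT
  DATE(creation_time) AS day,
  project_id,
  user_email,
  job_id,
  total_bytes_processed,
  -- same cost estimation as in the working example above
  IF(cache_hit != true,
     ROUND(total_bytes_processed / POW(2, 40) * 5.0, 2),
     0) AS cost_in_dollar
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION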
Wanna stay in touch?
Since you ended up on this blog, chances are pretty high that you're into Software Development (probably PHP, Laravel, Docker or Google BigQuery) and I'm a big fan of feedback and networking.
So - if you'd like to stay in touch, feel free to shoot me an email with a couple of words about yourself and/or connect with me on LinkedIn or Twitter, or simply subscribe to my RSS feed, or go the crazy route and subscribe via mail - and don't forget to leave a comment :)