We're gonna use the REGEXP_EXTRACT_ALL function provided in the Standard SQL dialect of BigQuery to extract parameters from the query part of a URL and return them as an ARRAY.
Code
#standardSQL
SELECT
REGEXP_EXTRACT_ALL(query,r'(?:\?|&)((?:[^=]+)=(?:[^&]*))') as params,
REGEXP_EXTRACT_ALL(query,r'(?:\?|&)(?:([^=]+)=(?:[^&]*))') as keys,
REGEXP_EXTRACT_ALL(query,r'(?:\?|&)(?:(?:[^=]+)=([^&]*))') as values
FROM
table
Working Example
Result
Row | id | query | params | keys | values | description |
---|---|---|---|---|---|---|
1 | 1 | ?foo=bar | foo=bar | foo | bar | simple |
2 | 2 | ?foo=bar&bar=baz | foo=bar | foo | bar | multiple params |
bar=baz | bar | baz | ||||
3 | 3 | ?foo[]=bar&foo[]=baz | foo[]=bar | foo[] | bar | arrays |
foo[]=baz | foo[] | baz | ||||
4 | 4 | no query |
Run on BigQuery
Notes
REGEXP_EXTRACT_ALL
only excepts 1 capturing group, hence we need to mark all other groups as non-capturing with(?:
- if the URL contains a fragment part (e.g. https://example.org/?foo=bar#baz), the fragment is currently not removed.
To do so, remove the fragment prior to extraction with
REGEXP_REPLACE,
e.g. like so:
REGEXP_EXTRACT_ALL( REGEXP_EXTRACT(query, r'#.*', ''), r'(?:\?|&)(?:(?:[^=]+)=([^&]*))') as values
Links
Use cases
- compile a list of all parameters from your log files
- evaluate the frequency of parameters keys/values
Wanna stay in touch?
Since you ended up on this blog, chances are pretty high that you're into Software Development (probably PHP, Laravel, Docker or Google Big Query) and I'm a big fan of feedback and networking.
So - if you'd like to stay in touch, feel free to shoot me an email with a couple of words about yourself and/or connect with me on LinkedIn or Twitter or simply subscribe to my RSS feed or go the crazy route and subscribe via mail and don't forget to leave a comment :)