Skip to content

Latest commit

 

History

History
1815 lines (1451 loc) · 55.1 KB

aggregate-dp-functions.md

File metadata and controls

1815 lines (1451 loc) · 55.1 KB

Differentially private aggregate functions

ZetaSQL supports differentially private aggregate functions. For an explanation of how aggregate functions work, see Aggregate function calls.

You can only use differentially private aggregate functions with differentially private queries in a differential privacy clause.

Note: In this topic, the privacy parameters in the examples are not recommendations. You should work with your privacy or security officer to determine the optimal privacy parameters for your dataset and organization.

Function list

Name Summary
AVG (Differential Privacy) DIFFERENTIAL_PRIVACY-supported AVG.

Gets the differentially-private average of non-NULL, non-NaN values in a query with a DIFFERENTIAL_PRIVACY clause.
COUNT (Differential Privacy) DIFFERENTIAL_PRIVACY-supported COUNT.

Signature 1: Gets the differentially-private count of rows in a query with a DIFFERENTIAL_PRIVACY clause.

Signature 2: Gets the differentially-private count of rows with a non-NULL expression in a query with a DIFFERENTIAL_PRIVACY clause.
PERCENTILE_CONT (Differential Privacy) DIFFERENTIAL_PRIVACY-supported PERCENTILE_CONT.

Computes a differentially-private percentile across privacy unit columns in a query with a DIFFERENTIAL_PRIVACY clause.
SUM (Differential Privacy) DIFFERENTIAL_PRIVACY-supported SUM.

Gets the differentially-private sum of non-NULL, non-NaN values in a query with a DIFFERENTIAL_PRIVACY clause.
VAR_POP (Differential Privacy) DIFFERENTIAL_PRIVACY-supported VAR_POP (Differential Privacy).

Computes the differentially-private population (biased) variance of values in a query with a DIFFERENTIAL_PRIVACY clause.
ANON_AVG Deprecated. Gets the differentially-private average of non-NULL, non-NaN values in a query with an ANONYMIZATION clause.
ANON_COUNT Deprecated.

Signature 1: Gets the differentially-private count of rows in a query with an ANONYMIZATION clause.

Signature 2: Gets the differentially-private count of rows with a non-NULL expression in a query with an ANONYMIZATION clause.
ANON_PERCENTILE_CONT Deprecated. Computes a differentially-private percentile across privacy unit columns in a query with an ANONYMIZATION clause.
ANON_QUANTILES Deprecated. Produces an array of differentially-private quantile boundaries in a query with an ANONYMIZATION clause.
ANON_STDDEV_POP Deprecated. Computes a differentially-private population (biased) standard deviation of values in a query with an ANONYMIZATION clause.
ANON_SUM Deprecated. Gets the differentially-private sum of non-NULL, non-NaN values in a query with an ANONYMIZATION clause.
ANON_VAR_POP Deprecated. Computes a differentially-private population (biased) variance of values in a query with an ANONYMIZATION clause.

AVG (DIFFERENTIAL_PRIVACY)

WITH DIFFERENTIAL_PRIVACY ...
  AVG(
    expression,
    [ contribution_bounds_per_group => (lower_bound, upper_bound) ]
  )

Description

Returns the average of non-NULL, non-NaN values in the expression. This function first computes the average per privacy unit column, and then computes the final result by averaging these averages.

This function must be used with the DIFFERENTIAL_PRIVACY clause and can support the following arguments:

  • expression: The input expression. This can be any numeric input type, such as INT64.
  • contribution_bounds_per_group: A named argument with a contribution bound. Performs clamping for each group separately before performing intermediate grouping on the privacy unit column.

Return type

DOUBLE

Examples

The following differentially private query gets the average number of each item requested per professor. Smaller aggregations might not be included. This query references a table called professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity
FROM professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | pencil   | 38.5038356810269 |
 | pen      | 13.4725028762032 |
 *----------+------------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    AVG(quantity) average_quantity
FROM professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | scissors | 8                |
 | pencil   | 40               |
 | pen      | 18.5             |
 *----------+------------------*/

The following differentially private query gets the average number of each item requested per professor. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    AVG(quantity, contribution_bounds_per_group=>(0, 100)) average_quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | pencil   | 38.5038356810269 |
 | pen      | 13.4725028762032 |
 *----------+------------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
    item,
    AVG(quantity) average_quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | scissors | 8                |
 | pencil   | 40               |
 | pen      | 18.5             |
 *----------+------------------*/

Note: For more information about when and when not to use noise, see Remove noise.

COUNT (DIFFERENTIAL_PRIVACY)

  • Signature 1: Returns the number of rows in a differentially private FROM clause.
  • Signature 2: Returns the number of non-NULL values in an expression.

Signature 1

WITH DIFFERENTIAL_PRIVACY ...
  COUNT(
    *,
    [ contribution_bounds_per_group => (lower_bound, upper_bound) ]
  )

Description

Returns the number of rows in the differentially private FROM clause. The final result is an aggregation across a privacy unit column.

This function must be used with the DIFFERENTIAL_PRIVACY clause and can support the following argument:

  • contribution_bounds_per_group: A named argument with a contribution bound. Performs clamping for each group separately before performing intermediate grouping on the privacy unit column.

Return type

INT64

Examples

The following differentially private query counts the number of requests for each item. This query references a table called professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested
FROM professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | pencil   | 5               |
 | pen      | 2               |
 *----------+-----------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested
FROM professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | scissors | 1               |
 | pencil   | 4               |
 | pen      | 3               |
 *----------+-----------------*/

The following differentially private query counts the number of requests for each item. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | pencil   | 5               |
 | pen      | 2               |
 *----------+-----------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
    item,
    COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | scissors | 1               |
 | pencil   | 4               |
 | pen      | 3               |
 *----------+-----------------*/

Note: For more information about when and when not to use noise, see Remove noise.

Signature 2

WITH DIFFERENTIAL_PRIVACY ...
  COUNT(
    expression,
    [contribution_bounds_per_group => (lower_bound, upper_bound)]
  )

Description

Returns the number of non-NULL expression values. The final result is an aggregation across a privacy unit column.

This function must be used with the DIFFERENTIAL_PRIVACY clause and can support these arguments:

  • expression: The input expression. This expression can be any numeric input type, such as INT64.
  • contribution_bounds_per_group: A named argument with a contribution bound. Performs clamping per each group separately before performing intermediate grouping on the privacy unit column.

Return type

INT64

Examples

The following differentially private query counts the number of requests made for each type of item. This query references a table called professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    COUNT(item, contribution_bounds_per_group => (0,100)) times_requested
FROM professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | pencil   | 5               |
 | pen      | 2               |
 *----------+-----------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    COUNT(item, contribution_bounds_per_group => (0,100)) times_requested
FROM professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | scissors | 1               |
 | pencil   | 4               |
 | pen      | 3               |
 *----------+-----------------*/

The following differentially private query counts the number of requests made for each type of item. This query references a view called view_on_professors.

-- With noise
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    COUNT(item, contribution_bounds_per_group=>(0, 100)) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | pencil   | 5               |
 | pen      | 2               |
 *----------+-----------------*/
--Without noise (this un-noised version is for demonstration only)
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
    item,
    COUNT(item, contribution_bounds_per_group=>(0, 100)) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | scissors | 1               |
 | pencil   | 4               |
 | pen      | 3               |
 *----------+-----------------*/

Note: For more information about when and when not to use noise, see Remove noise.

PERCENTILE_CONT (DIFFERENTIAL_PRIVACY)

WITH DIFFERENTIAL_PRIVACY ...
  PERCENTILE_CONT(
    expression,
    percentile,
    contribution_bounds_per_row => (lower_bound, upper_bound)
  )

Description

Takes an expression and computes a percentile for it. The final result is an aggregation across privacy unit columns.

This function must be used with the DIFFERENTIAL_PRIVACY clause and can support these arguments:

  • expression: The input expression. This can be most numeric input types, such as INT64. NULL values are always ignored.
  • percentile: The percentile to compute. The percentile must be a literal in the range [0, 1].
  • contribution_bounds_per_row: A named argument with a contribution bounds. Performs clamping for each row separately before performing intermediate grouping on the privacy unit column.

NUMERIC and BIGNUMERIC arguments are not allowed. If you need them, cast them as the DOUBLE data type first.

Return type

DOUBLE

Examples

The following differentially private query gets the percentile of items requested. Smaller aggregations might not be included. This query references a view called professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row => (0,100)) percentile_requested
FROM professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
 /*----------+----------------------*
  | item     | percentile_requested |
  +----------+----------------------+
  | pencil   | 72.00011444091797    |
  | scissors | 8.000175476074219    |
  | pen      | 23.001075744628906   |
  *----------+----------------------*/

The following differentially private query gets the percentile of items requested. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row=>(0, 100)) percentile_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+----------------------*
 | item     | percentile_requested |
 +----------+----------------------+
 | pencil   | 72.00011444091797    |
 | scissors | 8.000175476074219    |
 | pen      | 23.001075744628906   |
 *----------+----------------------*/

SUM (DIFFERENTIAL_PRIVACY)

WITH DIFFERENTIAL_PRIVACY ...
  SUM(
    expression,
    [ contribution_bounds_per_group => (lower_bound, upper_bound) ]
  )

Description

Returns the sum of non-NULL, non-NaN values in the expression. The final result is an aggregation across privacy unit columns.

This function must be used with the DIFFERENTIAL_PRIVACY clause and can support these arguments:

  • expression: The input expression. This can be any numeric input type, such as INT64. NULL values are always ignored.
  • contribution_bounds_per_group: A named argument with a contribution bound. Performs clamping for each group separately before performing intermediate grouping on the privacy unit column.

Return type

One of the following supertypes:

  • INT64
  • UINT64
  • DOUBLE

Examples

The following differentially private query gets the sum of items requested. Smaller aggregations might not be included. This query references a view called professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    SUM(quantity, contribution_bounds_per_group => (0,100)) quantity
FROM professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------*
 | item     | quantity  |
 +----------+-----------+
 | pencil   | 143       |
 | pen      | 59        |
 *----------+-----------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    SUM(quantity) quantity
FROM professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+----------*
 | item     | quantity |
 +----------+----------+
 | scissors | 8        |
 | pencil   | 144      |
 | pen      | 58       |
 *----------+----------*/

The following differentially private query gets the sum of items requested. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    SUM(quantity, contribution_bounds_per_group=>(0, 100)) quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------*
 | item     | quantity  |
 +----------+-----------+
 | pencil   | 143       |
 | pen      | 59        |
 *----------+-----------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
    item,
    SUM(quantity) quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+----------*
 | item     | quantity |
 +----------+----------+
 | scissors | 8        |
 | pencil   | 144      |
 | pen      | 58       |
 *----------+----------*/

Note: For more information about when and when not to use noise, see Use differential privacy.

VAR_POP (DIFFERENTIAL_PRIVACY)

WITH DIFFERENTIAL_PRIVACY ...
  VAR_POP(
    expression,
    [ contribution_bounds_per_row => (lower_bound, upper_bound) ]
  )

Description

Takes an expression and computes the population (biased) variance of the values in the expression. The final result is an aggregation across privacy unit columns between 0 and +Inf. You can clamp the input values explicitly, otherwise input values are clamped implicitly. Clamping is performed per individual user values.

This function must be used with the DIFFERENTIAL_PRIVACY clause and can support these arguments:

  • expression: The input expression. This can be any numeric input type, such as INT64. NULLs are always ignored.
  • contribution_bounds_per_row: A named argument with a contribution bound. Performs clamping for each row separately before performing intermediate grouping on individual user values.

NUMERIC and BIGNUMERIC arguments are not allowed. If you need them, cast them as the DOUBLE data type first.

Return type

DOUBLE

Examples

The following differentially private query gets the population (biased) variance of items requested. Smaller aggregations may not be included. This query references a view called professors.

-- With noise
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
    item,
    VAR_POP(quantity, contribution_bounds_per_row => (0,100)) pop_variance
FROM professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
/*----------+-----------------*
 | item     | pop_variance    |
 +----------+-----------------+
 | pencil   | 642             |
 | pen      | 2.6666666666665 |
 | scissors | 2500            |
 *----------+-----------------*/

The following differentially private query gets the population (biased) variance of items requested. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    VAR_POP(quantity, contribution_bounds_per_row=>(0, 100)) pop_variance
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
 | item     | pop_variance    |
 +----------+-----------------+
 | pencil   | 642             |
 | pen      | 2.6666666666665 |
 | scissors | 2500            |
 *----------+-----------------*/

ANON_AVG (DEPRECATED)

Warning: This function has been deprecated. Use AVG (differential privacy) instead.

WITH ANONYMIZATION ...
  ANON_AVG(expression [clamped_between_clause])

clamped_between_clause:
  CLAMPED BETWEEN lower_bound AND upper_bound

Description

Returns the average of non-NULL, non-NaN values in the expression. This function first computes the average per privacy unit column, and then computes the final result by averaging these averages.

This function must be used with the ANONYMIZATION clause and can support these arguments:

  • expression: The input expression. This can be any numeric input type, such as INT64.
  • clamped_between_clause: Perform clamping per privacy unit column averages.

Return type

DOUBLE

Examples

The following differentially private query gets the average number of each item requested per professor. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | pencil   | 38.5038356810269 |
 | pen      | 13.4725028762032 |
 *----------+------------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
    item,
    ANON_AVG(quantity) average_quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | scissors | 8                |
 | pencil   | 40               |
 | pen      | 18.5             |
 *----------+------------------*/

Note: You can learn more about when and when not to use noise here.

ANON_COUNT (DEPRECATED)

Warning: This function has been deprecated. Use COUNT (differential privacy) instead.

Signature 1

WITH ANONYMIZATION ...
  ANON_COUNT(*)

Description

Returns the number of rows in the differentially private FROM clause. The final result is an aggregation across privacy unit columns. Input values are clamped implicitly. Clamping is performed per privacy unit column.

This function must be used with the ANONYMIZATION clause.

Return type

INT64

Examples

The following differentially private query counts the number of requests for each item. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    ANON_COUNT(*) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | pencil   | 5               |
 | pen      | 2               |
 *----------+-----------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
    item,
    ANON_COUNT(*) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | scissors | 1               |
 | pencil   | 4               |
 | pen      | 3               |
 *----------+-----------------*/

Note: You can learn more about when and when not to use noise here.

Signature 2

WITH ANONYMIZATION ...
  ANON_COUNT(expression [CLAMPED BETWEEN lower_bound AND upper_bound])

Description

Returns the number of non-NULL expression values. The final result is an aggregation across privacy unit columns.

This function must be used with the ANONYMIZATION clause and can support these arguments:

  • expression: The input expression. This can be any numeric input type, such as INT64.
  • CLAMPED BETWEEN clause: Perform clamping per privacy unit column.

Return type

INT64

Examples

The following differentially private query counts the number of requests made for each type of item. This query references a view called view_on_professors.

-- With noise
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | pencil   | 5               |
 | pen      | 2               |
 *----------+-----------------*/
--Without noise (this un-noised version is for demonstration only)
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
    item,
    ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+-----------------*
 | item     | times_requested |
 +----------+-----------------+
 | scissors | 1               |
 | pencil   | 4               |
 | pen      | 3               |
 *----------+-----------------*/

Note: You can learn more about when and when not to use noise here.

ANON_PERCENTILE_CONT (DEPRECATED)

Warning: This function has been deprecated. Use PERCENTILE_CONT (differential privacy) instead.

WITH ANONYMIZATION ...
  ANON_PERCENTILE_CONT(expression, percentile [CLAMPED BETWEEN lower_bound AND upper_bound])

Description

Takes an expression and computes a percentile for it. The final result is an aggregation across privacy unit columns.

This function must be used with the ANONYMIZATION clause and can support these arguments:

  • expression: The input expression. This can be most numeric input types, such as INT64. NULLs are always ignored.
  • percentile: The percentile to compute. The percentile must be a literal in the range [0, 1]
  • CLAMPED BETWEEN clause: Perform clamping per privacy unit column.

NUMERIC and BIGNUMERIC arguments are not allowed. If you need them, cast them as the DOUBLE data type first.

Return type

DOUBLE

Examples

The following differentially private query gets the percentile of items requested. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    ANON_PERCENTILE_CONT(quantity, 0.5 CLAMPED BETWEEN 0 AND 100) percentile_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+----------------------*
 | item     | percentile_requested |
 +----------+----------------------+
 | pencil   | 72.00011444091797    |
 | scissors | 8.000175476074219    |
 | pen      | 23.001075744628906   |
 *----------+----------------------*/

ANON_QUANTILES (DEPRECATED)

Warning: This function has been deprecated. Use QUANTILES (differential privacy) instead.

WITH ANONYMIZATION ...
  ANON_QUANTILES(expression, number CLAMPED BETWEEN lower_bound AND upper_bound)

Description

Returns an array of differentially private quantile boundaries for values in expression. The first element in the return value is the minimum quantile boundary and the last element is the maximum quantile boundary. The returned results are aggregations across privacy unit columns.

This function must be used with the ANONYMIZATION clause and can support these arguments:

  • expression: The input expression. This can be most numeric input types, such as INT64. NULLs are always ignored.
  • number: The number of quantiles to create. This must be an INT64.
  • CLAMPED BETWEEN clause: Perform clamping per privacy unit column.

NUMERIC and BIGNUMERIC arguments are not allowed. If you need them, cast them as the DOUBLE data type first.

Return type

ARRAY<DOUBLE>

Examples

The following differentially private query gets the five quantile boundaries of the four quartiles of the number of items requested. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    ANON_QUANTILES(quantity, 4 CLAMPED BETWEEN 0 AND 100) quantiles_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+----------------------------------------------------------------------*
 | item     | quantiles_requested                                                  |
 +----------+----------------------------------------------------------------------+
 | pen      | [6.409375,20.647684733072918,41.40625,67.30848524305556,99.80078125] |
 | pencil   | [6.849259,44.010416666666664,62.64204,65.83806818181819,98.59375]    |
 *----------+----------------------------------------------------------------------*/

ANON_STDDEV_POP (DEPRECATED)

Warning: This function has been deprecated. Use STDDEV_POP (differential privacy) instead.

WITH ANONYMIZATION ...
  ANON_STDDEV_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound])

Description

Takes an expression and computes the population (biased) standard deviation of the values in the expression. The final result is an aggregation across privacy unit columns between 0 and +Inf.

This function must be used with the ANONYMIZATION clause and can support these arguments:

  • expression: The input expression. This can be most numeric input types, such as INT64. NULLs are always ignored.
  • CLAMPED BETWEEN clause: Perform clamping per individual entity values.

NUMERIC and BIGNUMERIC arguments are not allowed. If you need them, cast them as the DOUBLE data type first.

Return type

DOUBLE

Examples

The following differentially private query gets the population (biased) standard deviation of items requested. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    ANON_STDDEV_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_standard_deviation
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+------------------------*
 | item     | pop_standard_deviation |
 +----------+------------------------+
 | pencil   | 25.350871122442054     |
 | scissors | 50                     |
 | pen      | 2                      |
 *----------+------------------------*/

ANON_SUM (DEPRECATED)

Warning: This function has been deprecated. Use SUM (differential privacy) instead.

WITH ANONYMIZATION ...
  ANON_SUM(expression [CLAMPED BETWEEN lower_bound AND upper_bound])

Description

Returns the sum of non-NULL, non-NaN values in the expression. The final result is an aggregation across privacy unit columns.

This function must be used with the ANONYMIZATION clause and can support these arguments:

  • expression: The input expression. This can be any numeric input type, such as INT64.
  • CLAMPED BETWEEN clause: Perform clamping per privacy unit column.

Return type

One of the following supertypes:

  • INT64
  • UINT64
  • DOUBLE

Examples

The following differentially private query gets the sum of items requested. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    ANON_SUM(quantity CLAMPED BETWEEN 0 AND 100) quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------*
 | item     | quantity  |
 +----------+-----------+
 | pencil   | 143       |
 | pen      | 59        |
 *----------+-----------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
    item,
    ANON_SUM(quantity) quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will not change when you run the query.
/*----------+----------*
 | item     | quantity |
 +----------+----------+
 | scissors | 8        |
 | pencil   | 144      |
 | pen      | 58       |
 *----------+----------*/

Note: You can learn more about when and when not to use noise here.

ANON_VAR_POP (DEPRECATED)

Warning: This function has been deprecated. Use VAR_POP (differential privacy) instead.

WITH ANONYMIZATION ...
  ANON_VAR_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound])

Description

Takes an expression and computes the population (biased) variance of the values in the expression. The final result is an aggregation across privacy unit columns between 0 and +Inf. You can clamp the input values explicitly, otherwise input values are clamped implicitly. Clamping is performed per individual entity values.

This function must be used with the ANONYMIZATION clause and can support these arguments:

  • expression: The input expression. This can be any numeric input type, such as INT64. NULLs are always ignored.
  • CLAMPED BETWEEN clause: Perform clamping per individual entity values.

NUMERIC and BIGNUMERIC arguments are not allowed. If you need them, cast them as the DOUBLE data type first.

Return type

DOUBLE

Examples

The following differentially private query gets the population (biased) variance of items requested. Smaller aggregations might not be included. This query references a view called view_on_professors.

-- With noise, using the epsilon parameter.
SELECT
  WITH ANONYMIZATION
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
    item,
    ANON_VAR_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_variance
FROM {{USERNAME}}.view_on_professors
GROUP BY item;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
 | item     | pop_variance    |
 +----------+-----------------+
 | pencil   | 642             |
 | pen      | 2.6666666666665 |
 | scissors | 2500            |
 *----------+-----------------*/

Supplemental materials

Clamp values in a differentially private aggregate function

In differentially private queries, aggregation clamping is used to limit the contribution of outliers. You can clamp explicitly or implicitly as follows:

Implicitly clamp values

If you don't include the contribution bounds named argument with the DIFFERENTIAL_PRIVACY clause, clamping is implicit, which means bounds are derived from the data itself in a differentially private way.

Implicit bounding works best when computed using large datasets. For more information, see Implicit bounding limitations for small datasets.

Details

In differentially private aggregate functions, explicit clamping is optional. If you don't include this clause, clamping is implicit, which means bounds are derived from the data itself in a differentially private way. The process is somewhat random, so aggregations with identical ranges can have different bounds.

Implicit bounds are determined for each aggregation. So if some aggregations have a wide range of values, and others have a narrow range of values, implicit bounding can identify different bounds for different aggregations as appropriate. Implicit bounds might be an advantage or a disadvantage depending on your use case. Different bounds for different aggregations can result in lower error. Different bounds also means that different aggregations have different levels of uncertainty, which might not be directly comparable. Explicit bounds, on the other hand, apply uniformly to all aggregations and should be derived from public information.

When clamping is implicit, part of the total epsilon is spent picking bounds. This leaves less epsilon for aggregations, so these aggregations are noisier.

Example

The following anonymized query clamps each aggregate contribution for each differential privacy ID and within a derived range from the data itself. As long as all or most values fall within this range, your results will be accurate. This query references a view called view_on_professors.

--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
  OPTIONS (
    epsilon = 1e20,
    delta = .01,
    privacy_unit_column=id
  )
  item,
  AVG(quantity) average_quantity
FROM view_on_professors
GROUP BY item;

/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | scissors | 8                |
 | pencil   | 72               |
 | pen      | 18.5             |
 *----------+------------------*/

The following anonymized query clamps each aggregate contribution for each differential privacy ID and within a derived range from the data itself. As long as all or most values fall within this range, your results will be accurate. This query references a view called view_on_professors.

--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
  OPTIONS (
    epsilon = 1e20,
    delta = .01,
    max_groups_contributed = 1
  )
  item,
  AVG(quantity) AS average_quantity
FROM view_on_professors
GROUP BY item;

/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | scissors | 8                |
 | pencil   | 72               |
 | pen      | 18.5             |
 *----------+------------------*/

Explicitly clamp values

contribution_bounds_per_group => (lower_bound,upper_bound)
contribution_bounds_per_row => (lower_bound,upper_bound)

Use the contribution bounds named argument to explicitly clamp values per group or per row between a lower and upper bound in a DIFFERENTIAL_PRIVACY clause.

Input values:

  • contribution_bounds_per_row: Contributions per privacy unit are clamped on a per-row (per-record) basis. This means the following:
    • Upper and lower bounds are applied to column values in individual rows produced by the input subquery independently.
    • The maximum possible contribution per privacy unit (and per grouping set) is the product of the per-row contribution limit and max_groups_contributed differential privacy parameter.
  • contribution_bounds_per_group: Contributions per privacy unit are clamped on a unique set of entity-specified GROUP BY keys. The upper and lower bounds are applied to values per group after the values are aggregated per privacy unit.
  • lower_bound: Numeric literal that represents the smallest value to include in an aggregation.
  • upper_bound: Numeric literal that represents the largest value to include in an aggregation.

NUMERIC and BIGNUMERIC arguments are not allowed.

Details

In differentially private aggregate functions, clamping explicitly clamps the total contribution from each privacy unit column to within a specified range.

Explicit bounds are uniformly applied to all aggregations. So even if some aggregations have a wide range of values, and others have a narrow range of values, the same bounds are applied to all of them. On the other hand, when implicit bounds are inferred from the data, the bounds applied to each aggregation can be different.

Explicit bounds should be chosen to reflect public information. For example, bounding ages between 0 and 100 reflects public information because the age of most people generally falls within this range.

Important: The results of the query reveal the explicit bounds. Do not use explicit bounds based on the entity data; explicit bounds should be based on public information.

Examples

The following anonymized query clamps each aggregate contribution for each differential privacy ID and within a specified range (0 and 100). As long as all or most values fall within this range, your results will be accurate. This query references a view called view_on_professors.

--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
  OPTIONS (
    epsilon = 1e20,
    delta = .01,
    privacy_unit_column=id
  )
  item,
  AVG(quantity, contribution_bounds_per_group=>(0,100)) AS average_quantity
FROM view_on_professors
GROUP BY item;

/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | scissors | 8                |
 | pencil   | 40               |
 | pen      | 18.5             |
 *----------+------------------*/

Notice what happens when most or all values fall outside of the clamped range. To get accurate results, ensure that the difference between the upper and lower bound is as small as possible, and that most inputs are between the upper and lower bound.

--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
  OPTIONS (
    epsilon = 1e20,
    delta = .01,
    privacy_unit_column=id
  )
  item,
  AVG(quantity, contribution_bounds_per_group=>(50,100)) AS average_quantity
FROM view_on_professors
GROUP BY item;

/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | scissors | 54               |
 | pencil   | 58               |
 | pen      | 51               |
 *----------+------------------*/

The following differentially private query clamps each aggregate contribution for each privacy unit column and within a specified range (0 and 100). As long as all or most values fall within this range, your results will be accurate. This query references a view called view_on_professors.

--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
  OPTIONS (
    epsilon = 1e20,
    delta = .01,
    max_groups_contributed = 1
  )
  item,
  AVG(quantity, contribution_bounds_per_group=>(0,100)) AS average_quantity
FROM view_on_professors
GROUP BY item;

/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | scissors | 8                |
 | pencil   | 40               |
 | pen      | 18.5             |
 *----------+------------------*/

Notice what happens when most or all values fall outside of the clamped range. To get accurate results, ensure that the difference between the upper and lower bound is as small as possible, and that most inputs are between the upper and lower bound.

--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
  OPTIONS (
    epsilon = 1e20,
    delta = .01,
    max_groups_contributed = 1
  )
  item,
  AVG(quantity, contribution_bounds_per_group=>(50,100)) AS average_quantity
FROM view_on_professors
GROUP BY item;

/*----------+------------------*
 | item     | average_quantity |
 +----------+------------------+
 | scissors | 54               |
 | pencil   | 58               |
 | pen      | 51               |
 *----------+------------------*/

Note: For more information about when and when not to use noise, see Remove noise.