Restrict data access using analysis rules

This document provides general information about analysis rules in GoogleSQL for BigQuery.

What is an analysis rule?

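An analysis rule enforces a policy for sharing data. In BigQuery, you can enforce an analysis rule on a view either by using a data clean room or by applying the analysis rule directly to the view. When you enforce an analysis rule, everyone who queries the view must abide by that analysis rule. If a query satisfies the analysis rule, it produces output that satisfies the analysis rule. If a query doesn't satisfy the analysis rule, an error is raised.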

Supported analysis rules

The following analysis rules are supported:

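  • Aggregation threshold analysis rule: Enforces the minimum number of distinct entities that must be present in a dataset. You can enforce this rule on a view with SQL statements or data clean rooms.
  • Differential privacy analysis rule: Enforces a privacy budget, which limits the data that is revealed to a subscriber when the data is protected with differential privacy. You can enforce this rule on a view with SQL statements or data clean rooms.
  • Join restriction analysis rule: Limits the types of joins that can be used with specific columns. A join doesn't need to be present in the query, and certain columns can be blocked. This rule can be included with an aggregation threshold analysis rule or a differential privacy analysis rule. You can enforce this rule on a view with SQL statements or data clean rooms.
  • List overlap analysis rule: Similar to the join restriction analysis rule, but it can't be used with other analysis rules. You can enforce this rule on a view with data clean rooms.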

Aggregation threshold analysis rule

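An aggregation threshold analysis rule enforces the minimum number of distinct entities that must be present in a dataset for statistics about that dataset to be included in the results of a query.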

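When enforced, the aggregation threshold analysis rule groups data across dimensions while ensuring that the aggregation threshold is met. It counts the number of distinct privacy units (represented by the privacy unit column) in each group, and outputs only the groups whose distinct privacy unit count satisfies the aggregation threshold.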

A view that includes this analysis rule can also include the join restriction analysis rule.
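The grouping and filtering described above can be modeled in a few lines of Python. This is a simplified illustration, not BigQuery's implementation: the helper `aggregation_threshold` is hypothetical, the rows are the sample ExamTable data used throughout this document, and a group is assumed to satisfy the threshold when its distinct privacy unit count is greater than or equal to THRESHOLD.

```python
from collections import defaultdict

# Sample ExamTable rows: (last_name, test_id, test_score).
EXAM_ROWS = [
    ("Hansen", "P91", 510), ("Wang", "U25", 500), ("Wang", "C83", 520),
    ("Wang", "U25", 460), ("Hansen", "C83", 420), ("Hansen", "C83", 560),
    ("Devi", "U25", 580), ("Devi", "P91", 480), ("Ivanov", "U25", 490),
    ("Ivanov", "P91", 540), ("Silva", "U25", 550),
]


def aggregation_threshold(rows, group_key, privacy_unit, threshold):
    """Return {group: distinct privacy unit count} for groups that meet the threshold."""
    units_per_group = defaultdict(set)
    for row in rows:
        units_per_group[group_key(row)].add(privacy_unit(row))
    # Groups with too few distinct privacy units are omitted entirely.
    return {
        group: len(units)
        for group, units in units_per_group.items()
        if len(units) >= threshold
    }


result = aggregation_threshold(
    EXAM_ROWS,
    group_key=lambda r: r[1],     # GROUP BY test_id
    privacy_unit=lambda r: r[0],  # privacy_unit_column = last_name
    threshold=3,
)
print(sorted(result.items()))  # [('P91', 3), ('U25', 4)]
```

With a threshold of 3, the C83 group is dropped because only two distinct last names (Wang and Hansen) contribute to it, which agrees with the output of the AGGREGATION_THRESHOLD query example later in this document.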

Define an aggregation threshold analysis rule for a view

You can define an aggregation threshold analysis rule for a view in a data clean room or with the following statement:

CREATE OR REPLACE VIEW VIEW_NAME
  OPTIONS (
    privacy_policy= '{
      "aggregation_threshold_policy": {
        "threshold" : THRESHOLD,
        "privacy_unit_column": "PRIVACY_UNIT_COLUMN"
      }
    }'
  )
  AS QUERY;

Replace the following values:

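  • VIEW_NAME: The path and name of the view.
  • THRESHOLD: The minimum number of distinct privacy units that must contribute to each row in the query results. If a potential row doesn't satisfy this threshold, that row is omitted from the query results.
  • PRIVACY_UNIT_COLUMN: The privacy unit column. A privacy unit column is a unique identifier for a privacy unit. A privacy unit is a value from the privacy unit column that represents the protected entity in a set of data.

    You can use only one privacy unit column, and the data type of the privacy unit column must be groupable.

    The values in the privacy unit column can't be directly projected in a query, and you can aggregate the data in this column only with aggregate functions that the analysis rule supports.

  • QUERY: The query for the view.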

The following example creates an aggregation threshold analysis rule:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"aggregation_threshold_policy": {"threshold": 3, "privacy_unit_column": "last_name"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

To review the privacy_policy syntax for CREATE VIEW, see the OPTIONS list in CREATE VIEW.

Update an aggregation threshold analysis rule for a view

You can change the aggregation threshold analysis rule for a view in a data clean room or with the following statement:

ALTER VIEW VIEW_NAME
SET OPTIONS (
  privacy_policy= '{
    "aggregation_threshold_policy": {
      "threshold" : THRESHOLD,
      "privacy_unit_column": "PRIVACY_UNIT_COLUMN"
    }
  }'
);

Replace the following values:

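  • VIEW_NAME: The path and name of the view.
  • THRESHOLD: The minimum number of distinct privacy units that must contribute to each row in the query results. If a potential row doesn't satisfy this threshold, that row is omitted from the query results.
  • PRIVACY_UNIT_COLUMN: The privacy unit column. A privacy unit column is a unique identifier for a privacy unit. A privacy unit is a value from the privacy unit column that represents the protected entity in a set of data.

    You can use only one privacy unit column, and the data type of the privacy unit column must be groupable.

    The values in the privacy unit column can't be directly projected in a query, and you can aggregate the data in this column only with aggregate functions that the analysis rule supports.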

The following example updates an aggregation threshold analysis rule:

ALTER VIEW mydataset.ExamView
SET OPTIONS (
  privacy_policy= '{"aggregation_threshold_policy": {"threshold": 50, "privacy_unit_column": "last_name"}}'
);

To review the privacy_policy syntax for ALTER VIEW, see the OPTIONS list in ALTER VIEW SET OPTIONS.

Query a view enforced by an aggregation threshold analysis rule

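You can query a view that has an aggregation threshold analysis rule with the AGGREGATION_THRESHOLD clause. The query must include aggregate functions, and you can use only aggregate functions that the analysis rule supports in this query.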

The following example queries a view that has an aggregation threshold analysis rule:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"aggregation_threshold_policy": {"threshold": 3, "privacy_unit_column": "last_name"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query an analysis-rule-enforced view called ExamView.
SELECT WITH AGGREGATION_THRESHOLD
  test_id, COUNT(DISTINCT last_name) AS student_count
FROM mydataset.ExamView
GROUP BY test_id;

/*---------+---------------*
 | test_id | student_count |
 +---------+---------------+
 | P91     | 3             |
 | U25     | 4             |
 *---------+---------------*/

For more examples of the AGGREGATION_THRESHOLD clause, see AGGREGATION_THRESHOLD clause.

Differential privacy analysis rule

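The differential privacy analysis rule enforces a privacy budget, which limits the data that is revealed to a subscriber when the data is protected with differential privacy. Once the sum of epsilon or delta across all queries reaches the total epsilon or total delta value, the privacy budget prevents any subscriber from querying the shared data. You can use this analysis rule in a view. A view that includes this analysis rule can also include the join restriction analysis rule.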

Define a differential privacy analysis rule for a view

You can define a differential privacy analysis rule for a view in a data clean room or with the following statement:

CREATE OR REPLACE VIEW VIEW_NAME
  OPTIONS (
    privacy_policy= '{
      "differential_privacy_policy": {
        "privacy_unit_column": "PRIVACY_UNIT_COLUMN",
        "max_epsilon_per_query": MAX_EPSILON_PER_QUERY,
        "epsilon_budget": EPSILON_BUDGET,
        "delta_per_query": DELTA_PER_QUERY,
        "delta_budget": DELTA_BUDGET,
        "max_groups_contributed": MAX_GROUPS_CONTRIBUTED
      }
    }'
  )
  AS QUERY;

Replace the following values:

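  • PRIVACY_UNIT_COLUMN: The column that identifies the entity in a dataset that is protected by a privacy analysis rule. This value is a JSON string.
  • MAX_EPSILON_PER_QUERY: Determines the strength of the privacy guarantee per query and prevents a single query from reaching the total epsilon. This value is a JSON number between 0.001 and 1e+15.
  • EPSILON_BUDGET: The epsilon budget, which represents the overall strength of the privacy guarantee. It applies across all differentially private queries on the view. This value must be larger than MAX_EPSILON_PER_QUERY, and is a JSON number between 0.001 and 1e+15.
  • DELTA_PER_QUERY: The upper bound, per query, on the probability of privacy loss beyond the guarantee determined by the total epsilon. Prevents a single query from reaching the total delta. This value is a JSON number between 1e-15 and 1.
  • DELTA_BUDGET: The delta budget, which represents the upper bound on the probability of overall privacy loss beyond the guarantee determined by the total epsilon. It applies across all differentially private queries on the view. This value must be larger than DELTA_PER_QUERY, and is a JSON number between 1e-15 and 1000.
  • MAX_GROUPS_CONTRIBUTED: Optional. Limits the number of groups to which an entity in the privacy unit column can contribute. The value must be a non-negative JSON integer.
  • QUERY: The query for the view.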
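Because an out-of-range value makes the policy invalid, it can help to check the values before you run the statement. The following Python sketch checks a differential_privacy_policy JSON string against the ranges listed above. The function name `validate_dp_policy` is hypothetical, the range bounds are assumed to be inclusive, and BigQuery performs its own validation, which may report problems differently.

```python
import json


def validate_dp_policy(policy_json: str) -> list:
    """Return a list of problems found in a differential_privacy_policy JSON string.

    An empty list means no problem was found. Range bounds are treated as
    inclusive, which is an assumption.
    """
    p = json.loads(policy_json)["differential_privacy_policy"]
    problems = []
    if not isinstance(p.get("privacy_unit_column"), str):
        problems.append("privacy_unit_column must be a JSON string")
    eps, eps_budget = p["max_epsilon_per_query"], p["epsilon_budget"]
    delta, delta_budget = p["delta_per_query"], p["delta_budget"]
    if not 0.001 <= eps <= 1e15:
        problems.append("max_epsilon_per_query must be between 0.001 and 1e+15")
    if not 0.001 <= eps_budget <= 1e15:
        problems.append("epsilon_budget must be between 0.001 and 1e+15")
    if not eps_budget > eps:
        problems.append("epsilon_budget must be larger than max_epsilon_per_query")
    if not 1e-15 <= delta <= 1:
        problems.append("delta_per_query must be between 1e-15 and 1")
    if not 1e-15 <= delta_budget <= 1000:
        problems.append("delta_budget must be between 1e-15 and 1000")
    if not delta_budget > delta:
        problems.append("delta_budget must be larger than delta_per_query")
    groups = p.get("max_groups_contributed")  # optional field
    # type() check rejects floats and booleans, not just negative numbers.
    if groups is not None and (type(groups) is not int or groups < 0):
        problems.append("max_groups_contributed must be a non-negative integer")
    return problems


example = (
    '{"differential_privacy_policy": {"privacy_unit_column": "last_name", '
    '"max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, '
    '"delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
print(validate_dp_policy(example))  # []
```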

The following example creates a differential privacy analysis rule:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

To review the privacy_policy syntax for CREATE VIEW, see the OPTIONS list in CREATE VIEW.

Update a differential privacy analysis rule for a view

You can change the differential privacy analysis rule for a view in a data clean room or with the following statement:

ALTER VIEW VIEW_NAME
SET OPTIONS (
  privacy_policy= '{
    "differential_privacy_policy": {
      "privacy_unit_column": "PRIVACY_UNIT_COLUMN",
      "max_epsilon_per_query": MAX_EPSILON_PER_QUERY,
      "epsilon_budget": EPSILON_BUDGET,
      "delta_per_query": DELTA_PER_QUERY,
      "delta_budget": DELTA_BUDGET,
      "max_groups_contributed": MAX_GROUPS_CONTRIBUTED
    }
  }'
);

Replace the following values:

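  • PRIVACY_UNIT_COLUMN: The column that identifies the entity in a dataset that is protected by a privacy analysis rule. This value is a JSON string.
  • MAX_EPSILON_PER_QUERY: Determines the strength of the privacy guarantee per query and prevents a single query from reaching the total epsilon. This value is a JSON number between 0.001 and 1e+15.
  • EPSILON_BUDGET: The epsilon budget, which represents the overall strength of the privacy guarantee. It applies across all differentially private queries on the view. This value must be larger than MAX_EPSILON_PER_QUERY, and is a JSON number between 0.001 and 1e+15.
  • DELTA_PER_QUERY: The upper bound, per query, on the probability of privacy loss beyond the guarantee determined by the total epsilon. Prevents a single query from reaching the total delta. This value is a JSON number between 1e-15 and 1.
  • DELTA_BUDGET: The delta budget, which represents the upper bound on the probability of overall privacy loss beyond the guarantee determined by the total epsilon. It applies across all differentially private queries on the view. This value must be larger than DELTA_PER_QUERY, and is a JSON number between 1e-15 and 1000.
  • MAX_GROUPS_CONTRIBUTED: Optional. Limits the number of groups to which an entity in the privacy unit column can contribute. The value must be a non-negative JSON integer.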

The following example updates a differential privacy analysis rule:

ALTER VIEW mydataset.ExamView
SET OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 0.01, "epsilon_budget": 1000.0, "delta_per_query": 0.005, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
);

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

To review the privacy_policy syntax for ALTER VIEW, see the OPTIONS list in ALTER VIEW SET OPTIONS.

Query a view enforced by a differential privacy analysis rule

You can query a view that has a differential privacy analysis rule with the DIFFERENTIAL_PRIVACY clause. For the syntax and more examples of the DIFFERENTIAL_PRIVACY clause, see DIFFERENTIAL_PRIVACY clause.
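The following sections show three per-query checks: epsilon must not exceed max_epsilon_per_query, delta must match delta_per_query, and max_groups_contributed must not exceed the view's value. As a rough mental model only, those checks can be sketched in Python. The class and function names are hypothetical, the order of the checks is an assumption, and the messages are borrowed from the SQL comments in this document's examples rather than from BigQuery's actual errors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DpViewPolicy:
    """The per-query limits from a view's differential_privacy_policy."""
    max_epsilon_per_query: float
    delta_per_query: float
    max_groups_contributed: Optional[int] = None


def check_dp_query(policy: DpViewPolicy, epsilon: float, delta: float,
                   max_groups_contributed: int) -> Optional[str]:
    """Return an error message if the query OPTIONS break the view's limits, else None."""
    if epsilon > policy.max_epsilon_per_query:
        return "epsilon is out of bounds"
    if delta != policy.delta_per_query:
        return "delta in query does not match delta_per_query"
    if (policy.max_groups_contributed is not None
            and max_groups_contributed > policy.max_groups_contributed):
        return "max_groups_contributed is out of bounds"
    return None


# Limits used by most of the ExamView examples in the following sections.
exam_view = DpViewPolicy(max_epsilon_per_query=10.01, delta_per_query=0.01,
                         max_groups_contributed=2)
print(check_dp_query(exam_view, epsilon=10, delta=.01, max_groups_contributed=2))    # None
print(check_dp_query(exam_view, epsilon=1e20, delta=.01, max_groups_contributed=2))  # epsilon is out of bounds
```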

Ensure that a differentially private query runs

Create a differentially private query for the analysis-rule-enforced view and make sure that the query runs.

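For example, in the following query, differentially private data is successfully returned from ExamView because epsilon, delta, and max_groups_contributed all satisfy the conditions of the differential privacy analysis rule in ExamView: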

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you create the view, wait briefly until you're sure that the view exists, and then run the following query:

-- Query an analysis-rule-enforced view called ExamView.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*---------+--------------------*
 | test_id | average_test_score |
 +---------+--------------------+
 | P91     | ???                |
 | U25     | ???                |
 *---------+--------------------*/

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

Block a query with an out-of-bounds epsilon

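Epsilon can be used to add or remove noise. The more epsilon, the less noise is added. To ensure that differentially private queries add at least a minimum amount of noise, pay close attention to the value of max_epsilon_per_query in the differential privacy analysis rule.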

For example, in the following query, the query is blocked with an error because epsilon in the DIFFERENTIAL_PRIVACY clause is higher than max_epsilon_per_query in ExamView:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you create the view, wait briefly, and then run the following query:

-- Error: epsilon is out of bounds.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

Block queries that don't have a specific delta

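Delta represents the threshold that determines whether data could be accidentally leaked. A smaller delta means a higher (stricter) threshold, and a larger delta means a lower threshold. To ensure that differentially private queries use a specific threshold, set delta_per_query in the differential privacy analysis rule.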

For example, in the following query, the query is blocked with an error because delta in the DIFFERENTIAL_PRIVACY clause doesn't match delta_per_query in ExamView:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you create the view, wait briefly, and then run the following query:

-- Error: delta in query does not match delta_per_query.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.02, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.

Block queries that exceed an epsilon budget

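Epsilon can be used to add or remove noise. Less epsilon means more noise, and more epsilon means less noise. Even when the noise is high, multiple queries over the same data can eventually reveal the un-noised version of the data. To prevent this, you can create an epsilon budget. If you want to add an epsilon budget, review the value of epsilon_budget in the differential privacy analysis rule for your view.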
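To see why repeated queries erode privacy, the following Python simulation adds noise to the same hypothetical true value many times and then averages the noisy answers. It uses the Laplace mechanism, a standard textbook mechanism whose noise scale is sensitivity divided by epsilon; this document doesn't say which mechanism BigQuery uses, and the value, sensitivity, and epsilon below are made up for illustration.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with rate 1/scale
    # is Laplace-distributed with mean 0 and the given scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


TRUE_VALUE = 50.0  # hypothetical true aggregate
SENSITIVITY = 100  # hypothetical contribution bound, e.g. (0, 100)
EPSILON = 10       # hypothetical per-query epsilon

rng = random.Random(42)
scale = SENSITIVITY / EPSILON  # larger epsilon -> smaller scale -> less noise
answers = [TRUE_VALUE + laplace_noise(scale, rng) for _ in range(10_000)]

single_error = abs(answers[0] - TRUE_VALUE)
averaged_error = abs(statistics.fmean(answers) - TRUE_VALUE)
print(f"error of one answer: {single_error:.2f}")
print(f"error of the average of {len(answers)} answers: {averaged_error:.2f}")
```

Any single answer can be far from the truth, but the average of many independent answers lands close to it. An epsilon budget is meant to stop a subscriber from collecting enough answers to do this.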

Run the following query three times. On the third run, the query is blocked because the total epsilon used would be 30, but epsilon_budget in ExamView allows only 25.6:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 25.6, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you create the view, wait briefly, and then run the following query three times:

-- Error after three query runs: epsilon budget exceeded
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.
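The arithmetic behind this example can be sketched with a small Python budget tracker. `EpsilonBudget` is a hypothetical toy class, not a BigQuery API; it assumes that a query is rejected when the epsilon it requests would push the total above epsilon_budget, and that a rejected query spends nothing.

```python
class EpsilonBudget:
    """Toy model of a view's epsilon budget (hypothetical, not a BigQuery API)."""

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    def try_spend(self, epsilon: float) -> bool:
        # Assumption: reject the query if it would push total spend above the budget.
        if self.spent + epsilon > self.budget:
            return False
        self.spent += epsilon
        return True


view_budget = EpsilonBudget(25.6)
results = [view_budget.try_spend(10) for _ in range(3)]
print(results)            # [True, True, False]
print(view_budget.spent)  # 20.0
```

The first two runs spend 20 of the 25.6 budget; the third would need another 10, bringing the total to 30, so it is rejected.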

Block queries that exceed a delta budget

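Delta represents the threshold that determines whether data could be accidentally leaked. A smaller delta means a higher threshold, and a larger delta means a lower threshold. Even when the threshold is high, multiple queries over the same data can eventually reveal the differentially private data. To prevent this, you can create a delta budget. If you want to add a delta budget, review the value of delta_budget in the differential privacy analysis rule for your view.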

Run the following query seven times. On the seventh run, the query is blocked because the total delta used would be 0.7, but delta_budget in ExamView allows only 0.6:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.1, "delta_budget": 0.6, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you create the view, wait briefly, and then run the following query seven times:

-- Error after seven query runs: delta budget exceeded
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.1, max_groups_contributed=2, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.
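The seven-run arithmetic can be checked with exact fractions in Python. This sketch assumes that each run spends 0.1 of delta (which is what a total of 0.7 after seven runs implies) and that a run is rejected when it would push the total above delta_budget. `fractions.Fraction` keeps the running total exact, so the comparison against the budget isn't affected by binary floating-point rounding. This is an illustration, not how BigQuery tracks budgets.

```python
from fractions import Fraction

DELTA_BUDGET = Fraction("0.6")
DELTA_PER_QUERY = Fraction("0.1")

spent = Fraction(0)
outcomes = []
for run in range(1, 8):
    # Assumption: a run is rejected if it would push total delta above the budget.
    if spent + DELTA_PER_QUERY > DELTA_BUDGET:
        outcomes.append((run, "blocked"))
    else:
        spent += DELTA_PER_QUERY
        outcomes.append((run, "ok"))

print(outcomes[-2:])  # [(6, 'ok'), (7, 'blocked')]
print(spent)          # 3/5
```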

Block queries that allow an entity to contribute to too many groups

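You can limit the number of groups that each entity can contribute to in a differentially private query. To ensure that each entity contributes to only a limited number of groups in differentially private queries, pay close attention to the value of max_groups_contributed in the differential privacy analysis rule.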

For example, in the following query, the query is blocked with an error because max_groups_contributed in the DIFFERENTIAL_PRIVACY clause is higher than max_groups_contributed in ExamView:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"differential_privacy_policy": {"privacy_unit_column": "last_name", "max_epsilon_per_query": 10.01, "epsilon_budget": 1000.0, "delta_per_query": 0.01, "delta_budget": 1000.0, "max_groups_contributed": 2}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

After you create the view, wait briefly, and then run the following query:

-- Error: max_groups_contributed is out of bounds.
SELECT
  WITH DIFFERENTIAL_PRIVACY
    OPTIONS(epsilon=10, delta=.01, max_groups_contributed=3, privacy_unit_column=last_name)
    test_id,
    AVG(test_score, contribution_bounds_per_group => (0,100)) AS average_test_score
FROM mydataset.ExamView
GROUP BY test_id;

-- Epsilon parameters are set very high due to the small dataset.
-- In practice, epsilon should be much smaller.
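To choose a value for max_groups_contributed, it can help to know how many groups each privacy unit actually touches. The following Python sketch counts the distinct test_id values per last_name in the sample ExamTable data and shows one way to cap contributions. The helper `bound_groups` is hypothetical, and it keeps the first groups it sees only to stay deterministic; this document doesn't describe which groups a real differential privacy implementation keeps.

```python
from collections import defaultdict

# Sample ExamTable rows: (last_name, test_id, test_score).
EXAM_ROWS = [
    ("Hansen", "P91", 510), ("Wang", "U25", 500), ("Wang", "C83", 520),
    ("Wang", "U25", 460), ("Hansen", "C83", 420), ("Hansen", "C83", 560),
    ("Devi", "U25", 580), ("Devi", "P91", 480), ("Ivanov", "U25", 490),
    ("Ivanov", "P91", 540), ("Silva", "U25", 550),
]

# Count the distinct groups (test_id values) that each privacy unit contributes to.
groups_per_unit = defaultdict(set)
for last_name, test_id, _score in EXAM_ROWS:
    groups_per_unit[last_name].add(test_id)
contributions = {name: len(ids) for name, ids in sorted(groups_per_unit.items())}
print(contributions)  # {'Devi': 2, 'Hansen': 2, 'Ivanov': 2, 'Silva': 1, 'Wang': 2}


def bound_groups(rows, max_groups):
    """Keep only rows that let each last_name contribute to at most max_groups test_ids."""
    kept_groups = defaultdict(set)
    kept_rows = []
    for last_name, test_id, score in rows:
        groups = kept_groups[last_name]
        if test_id in groups or len(groups) < max_groups:
            groups.add(test_id)
            kept_rows.append((last_name, test_id, score))
    return kept_rows


print(len(bound_groups(EXAM_ROWS, 2)))  # 11: no student touches more than 2 tests
print(len(bound_groups(EXAM_ROWS, 1)))  # 6
```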

Join restriction analysis rule

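A join restriction analysis rule limits the types of joins that can be used with specific columns in a view. You can use this analysis rule in a view. A view that includes this analysis rule can also include the aggregation threshold analysis rule or the differential privacy analysis rule.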

Define a join restriction analysis rule for a view

You can define a join restriction analysis rule for a view in a data clean room or with the following statement:

CREATE OR REPLACE VIEW VIEW_NAME
  OPTIONS (
    privacy_policy= '{
      "join_restriction_policy": {
        "join_condition": "JOIN_CONDITION",
        "join_allowed_columns": JOIN_ALLOWED_COLUMNS
      }
    }'
  )
  AS QUERY;

Replace the following values:

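  • JOIN_CONDITION: The type of join restriction to enforce on the view. This can be one of the following values:

    • JOIN_NOT_REQUIRED: A join isn't required to query this view. If a join is used, only the columns in join_allowed_columns can be used.

    • JOIN_BLOCKED: The view can't be joined along any column. Don't set join_allowed_columns in this case.

    • JOIN_ANY: At least one column in join_allowed_columns must be joined upon for this view to be queried.

    • JOIN_ALL: All columns in join_allowed_columns must be inner joined upon for this view to be queried.

  • JOIN_ALLOWED_COLUMNS: The columns that can be part of a join operation, as a JSON array of column names, for example ["test_id", "test_score"].

  • QUERY: The query for the view.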
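As a rough mental model of the four join_condition values described above, the following Python sketch decides whether a query's join columns would be accepted. It is a simplification, not BigQuery's checker: the function name `join_allowed` is hypothetical, it ignores join types (so it can't distinguish an inner join from an outer join for JOIN_ALL), and it follows only the one-line descriptions in the list above.

```python
from typing import Iterable, Optional

VALID_CONDITIONS = {"JOIN_NOT_REQUIRED", "JOIN_BLOCKED", "JOIN_ANY", "JOIN_ALL"}


def join_allowed(join_condition: str,
                 join_allowed_columns: Optional[Iterable[str]],
                 join_columns: Optional[Iterable[str]]) -> bool:
    """Return True if a query's join on the view would satisfy the join restriction.

    join_columns is None when the query doesn't join the view at all; otherwise
    it lists the view's columns that appear in the join condition.
    """
    if join_condition not in VALID_CONDITIONS:
        raise ValueError(f"unknown join_condition: {join_condition}")
    allowed = set(join_allowed_columns or [])
    if join_columns is None:
        # Per the descriptions above, JOIN_ANY and JOIN_ALL require a join.
        return join_condition in ("JOIN_NOT_REQUIRED", "JOIN_BLOCKED")
    used = set(join_columns)
    if join_condition == "JOIN_BLOCKED":
        return False                 # no join is allowed on any column
    if join_condition == "JOIN_NOT_REQUIRED":
        return used <= allowed       # any join must use only allowed columns
    if join_condition == "JOIN_ANY":
        return bool(used & allowed)  # at least one allowed column is joined on
    return allowed <= used           # JOIN_ALL: every allowed column is joined on


print(join_allowed("JOIN_ANY", ["test_score", "test_id"], ["last_name"]))   # False
print(join_allowed("JOIN_ANY", ["test_score", "test_id"], ["test_score"]))  # True
```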

The following example creates a join restriction analysis rule:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ANY", "join_allowed_columns": ["test_id", "test_score"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

Use a join restriction analysis rule with another analysis rule

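You can use a join restriction analysis rule together with an aggregation threshold analysis rule or a differential privacy analysis rule. However, after you use a join restriction with another analysis rule on a view, you can't change that analysis rule.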

The following example uses a join restriction analysis rule together with an aggregation threshold analysis rule:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ANY", "join_allowed_columns": ["test_id", "test_score"]}, "aggregation_threshold_policy": {"threshold": 3, "privacy_unit_column": "last_name"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );
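Hand-writing the privacy_policy JSON inside a SQL string literal is error-prone, especially when two rules are combined. One option, sketched here in Python, is to build the policy as a dictionary and serialize it with the standard json module. Only the policy keys come from this document; the string building around them is illustrative, and the simple f-string assumes the JSON contains no single quotes or backslashes, which would need escaping inside a GoogleSQL string literal.

```python
import json

# The combined policy from the example above, as a Python dictionary.
policy = {
    "join_restriction_policy": {
        "join_condition": "JOIN_ANY",
        "join_allowed_columns": ["test_id", "test_score"],
    },
    "aggregation_threshold_policy": {
        "threshold": 3,
        "privacy_unit_column": "last_name",
    },
}

policy_json = json.dumps(policy)
if "'" in policy_json or "\\" in policy_json:
    raise ValueError("policy JSON needs escaping before it goes in a SQL string literal")

ddl = (
    "CREATE OR REPLACE VIEW mydataset.ExamView\n"
    f"OPTIONS(privacy_policy = '{policy_json}')\n"
    "AS ( SELECT * FROM mydataset.ExamTable );"
)
print(ddl)
```

With its default separators, json.dumps produces the same compact JSON text that the example above writes by hand.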

Update a join restriction analysis rule for a view

You can change the join restriction analysis rule for a view in a data clean room or with the following statement:

ALTER VIEW VIEW_NAME
SET OPTIONS (
  privacy_policy= '{
    "join_restriction_policy": {
      "join_condition": "JOIN_CONDITION",
      "join_allowed_columns": JOIN_ALLOWED_COLUMNS
    }
  }'
);

Replace the following values:

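  • JOIN_CONDITION: The type of join restriction to enforce on the view. This can be one of the following values:

    • JOIN_NOT_REQUIRED: A join isn't required to query this view. If a join is used, only the columns in join_allowed_columns can be used.

    • JOIN_BLOCKED: The view can't be joined along any column. Don't set join_allowed_columns in this case.

    • JOIN_ANY: At least one column in join_allowed_columns must be joined upon for this view to be queried.

    • JOIN_ALL: All columns in join_allowed_columns must be inner joined upon for this view to be queried.

  • JOIN_ALLOWED_COLUMNS: The columns that can be part of a join operation, as a JSON array of column names.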

The following example updates a join restriction analysis rule:

ALTER VIEW mydataset.ExamView
SET OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ALL", "join_allowed_columns": ["test_id", "test_score"]}}'
);

To review the privacy_policy syntax for ALTER VIEW, see the OPTIONS list in ALTER VIEW SET OPTIONS.

Query a view enforced by a join restriction analysis rule

You can perform a join operation on a view that has a join restriction analysis rule. For the syntax of the JOIN operation, see Join operation.

Ensure that a join-restricted query runs

Test your join-restricted query to make sure that it runs.

例如,在以下查询中,将从 ExamViewStudentTable 成功返回已联接的数据:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ANY", "join_allowed_columns": ["test_score"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT test_id, test_score
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (test_score)
ORDER BY test_score;

/*---------+------------*
 | test_id | test_score |
 +---------+------------+
 | U25     | 490        |
 | U25     | 500        |
 | P91     | 510        |
 | U25     | 550        |
 | U25     | 580        |
 *---------+------------*/

Block join operations without a required column

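You can block a join operation that doesn't include at least one required column. To do this, include the following section in your join restriction analysis rule: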

"join_restriction_policy": {
  "join_condition": "JOIN_ANY",
  "join_allowed_columns": ["column_name", ...]
}

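For example, in the following query, the query is blocked with an error because it doesn't contain any join operation on the test_score or test_id columns between ExamView and StudentTable: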

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ANY", "join_allowed_columns": ["test_score", "test_id"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (last_name);

如需运行上述查询,请在 USING 子句中将 last_name 替换为 test_score

Block queries without a join operation

If a query must have a join operation, you can block queries that don't have one by using one of the following join restriction analysis rules:

"join_restriction_policy": {
  "join_condition": "JOIN_NOT_REQUIRED"
}

"join_restriction_policy": {
  "join_condition": "JOIN_NOT_REQUIRED",
  "join_allowed_columns": []
}

For example, in the following query, the query is blocked because it contains no join operation:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_NOT_REQUIRED"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView;

Block queries without a join operation and a required column

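If a query must have a join operation, and the join operation must include at least one required column, include the following section in your join restriction analysis rule: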

"join_restriction_policy": {
  "join_condition": "JOIN_NOT_REQUIRED",
  "join_allowed_columns": ["column_name", ...]
}

For example, in the following query, the query is blocked because the join operation doesn't include a column from the join_allowed_columns array:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_NOT_REQUIRED", "join_allowed_columns": ["test_score"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (last_name);

如需运行上述查询,请在 USING 子句中将 last_name 替换为 test_score

Block all join operations

You can block all join operations. To do this, include only the following section in your join restriction analysis rule:

"join_restriction_policy": {
  "join_condition": "JOIN_BLOCKED"
}

For example, in the following query, the query is blocked because it contains a join operation:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_BLOCKED"}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (last_name);

To run the preceding query, remove the INNER JOIN operation.

Block inner join operations without all required columns

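You can block an inner join operation that doesn't include all of the required columns. To do this, include the following section in your join restriction analysis rule: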

"join_restriction_policy": {
  "join_condition": "JOIN_ALL",
  "join_allowed_columns": ["column_name", ...]
}

For example, in the following query, the query is blocked with an error because the join operation doesn't include test_score:

-- Create a table called ExamTable.
CREATE TABLE mydataset.ExamTable AS (
  SELECT "Hansen" AS last_name, "P91" AS test_id, 510 AS test_score UNION ALL
  SELECT "Wang", "U25", 500 UNION ALL
  SELECT "Wang", "C83", 520 UNION ALL
  SELECT "Wang", "U25", 460 UNION ALL
  SELECT "Hansen", "C83", 420 UNION ALL
  SELECT "Hansen", "C83", 560 UNION ALL
  SELECT "Devi", "U25", 580 UNION ALL
  SELECT "Devi", "P91", 480 UNION ALL
  SELECT "Ivanov", "U25", 490 UNION ALL
  SELECT "Ivanov", "P91", 540 UNION ALL
  SELECT "Silva", "U25", 550);

-- Create a table called StudentTable.
CREATE TABLE mydataset.StudentTable AS (
  SELECT "Hansen" AS last_name, 510 AS test_score UNION ALL
  SELECT "Wang", 500 UNION ALL
  SELECT "Devi", 580 UNION ALL
  SELECT "Ivanov", 490 UNION ALL
  SELECT "Silva", 550);

-- Create a view that includes ExamTable.
CREATE VIEW mydataset.ExamView
OPTIONS(
  privacy_policy= '{"join_restriction_policy": {"join_condition": "JOIN_ALL", "join_allowed_columns": ["test_score", "last_name"]}}'
)
AS ( SELECT * FROM mydataset.ExamTable );

-- Query the ExamView view.
SELECT *
FROM mydataset.ExamView INNER JOIN mydataset.StudentTable USING (last_name);

To run the preceding query, replace USING (last_name) with USING (last_name, test_score).

List overlap analysis rule

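The list overlap analysis rule limits the types of joins that can be used with specific columns. A join must be present in the query, and certain columns can't be blocked. You can define and update a list overlap analysis rule for a view in a data clean room. For more information, see Share sensitive data with data clean rooms.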

Limitations

Analysis rules have the following limitations:

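  • If you've already added an analysis rule to a view, you can't switch between the aggregation threshold analysis rule and the differential privacy analysis rule.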

The aggregation threshold analysis rule has the following limitations:

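  • In a query on a view enforced by an aggregation threshold analysis rule, you can use only supported aggregate functions.
  • You can't add an aggregation threshold analysis rule to a materialized view.
  • If you use views enforced by aggregation threshold analysis rules in an aggregation threshold query, they must all have the same value for the privacy unit column.
  • If you use a view enforced by an aggregation threshold analysis rule in an aggregation threshold query, the threshold in the query must be greater than or equal to the threshold in the view.
  • Time travel is disabled on any view that has an aggregation threshold analysis rule.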

The differential privacy analysis rule has the following limitations:

  • After a view's privacy budget is exhausted, the view can't be used, and you must create a new view.

The join restriction analysis rule has the following limitations:

  • If you don't include privacy_unit_column in join_allowed_columns in a join restriction analysis rule, you might not be able to join on any columns in certain situations.

Pricing