Dan Zhang, July 2017
The Auto-Wipeout project is part of a Firebase project to help Firebase developer protect privacy of their end-users. The wipeout project is a cloud function library which can automatically delete user data on end-user account deletion. In this way, our customers (Firebase developers) can focus on making their apps special with a lower cost of writing safer apps.
In order to make using the wipeout library closer to a “one-click” experience, wipeout rules could be automatically generated by information inferred from the authorization rules. This doc explains technical details about the inference process from Real Time Database (RTDB) rules. For more general design and concepts, go to the main design doc.
Quotes from page Understand Firebase Realtime Database Rules:
Firebase Realtime Database Rules determine who has read and write access to your database, how your data is structured, and what indexes exist. These rules live on the Firebase servers and are enforced automatically at all times. Every read and write request will only be completed if your rules allow it.
More detailed references can be found here: Firebase Database Security Rules API.
The authorization rule of the RTDB is parsed to be a json object which naturally has a tree structure. We traverse the tree in a Breadth-First search for “.write” rules. Whenever we find a write rule, we first check the status implied by the write rule (discussed in "Single write rule analysis" below), and combine it with the status of its ancestor (discussed "Hierarchical write rules" below) to decide the status of the current location. If it turns out that the current place holds personal data, the patterns for the location (in the format described in the main design doc) will be recorded in wipeout configuration.
First, we discuss how to deal with a single
The $location
variable in firebase security rules. A variable
that can be used to reference the key of a $location
that was used
earlier in a rule structure. For example, replacing the variable
$uid
in /users/$uid/
with the user name ‘Barney’ will produce path
/users/Barney/
.
A sequence of constant and variables defining the path from root to
the current location of the write rule. For example, /key/$k1/$k2
is
a path pattern. It could match multiple places in the database with
different values for the variables.
A special variable which could represent Firebase authentication uid
of any single user. We use #WIPEOUT_UID
as the auth placeholder to
avoid potential conflicts with variables of the same name since # is
not a valid path component in Firebase RTDB rules.
Path patterns with variables replaced by auth placeholders according
to the write rule. For a particular user with #WIPEOUT_UID
, an
access pattern describes one or a set of paths to places they have
write access to. For example, /users/#WIPEOUT_UID
is an access
pattern describing a single path, and follow/$followee/#WIPEOUT_UID
describes a set of path.
Variables in access patterns which are not auth placeholders. They could take any value which matches the access pattern to a valid path in the database.
Only data at paths owned by the user could be deleted at wipeout.
If a user is the only one with access to a particular path then that
implies the user owns the path. (With the exception of creating an
entry: when data.value == null
, anyone can write to a place, but
only the specific user can write to it afterwards. This case the user
owns the data, but the write rule will be considered accessible by
multiple users. This is not currently dealt with).
Given this assumption, if there’s only one valid access pattern for
a path pattern (means any instance of the pattern can only be
written by one user), then user with #WIPEOUT_UID
owns the
instance(s) of the access pattern.
Now the problem becomes how do we tell the numbers of access patterns?
Recall access patterns are describe possible ways that variables in a
path could be replaced by auth placeholder. And the auth placeholders
will be later replaced by Firebase auth.uid
. So we go into the write
rule and look for variables that must match auth.uid. E.g. with rule
auth.uid == $uid
at path pattern /users/$uid
, $uid
is the variable we
are looking for.
Since write rules could be a logical combination of many sub-rules, we normalize the equation to auth.uid == EXP. The underlying meaning is that: the write rule will not evaluates to false if and only if EXP evaluates to true (only considering auth). Here EXP can be an expression consists of conjunctions and disjunctions of variables or a fixed true or false value.
In order to make future reasoning easier, we convert EXP to Disjunctive Normal Form (DNF). In other word, EXP must be disjunction (OR) of one or more conjunctions (AND) of one or more literals (variable in our case).
Now it is clear that with EXP in DNF, each conjunctive clause maps to one access pattern. In conclusion, in order to find out path patterns with single access pattern, we only need to find out the EXP with single conjunctive clause in DNF.
Here is an example based on the previous definition and reasoning.
Assume we have path pattern /key/$k1/$k2
, the table below shows the
corresponding outcomes with different write rules.
Write Rule | Access Pattern(s) | EXP in DNF | Results |
---|---|---|---|
Auth.uid == $k1 | /key/#WUID/$k2 | ($k2) | Single access |
Auth.uid == $k2 | /key/$k1/#WUID | ($k1) | Single access |
Auth.uid == $k1 && Auth.uid == $k1 | /key/#WUID/#WUID | ($k1 && $k2) | Single access |
Auth.uid == $k1 || Auth.uid == $k1 | /key/#WUID/$k2; /key/$k1/#WUID | ($k1) | ($k2) | Multiple access |
Auth.uid != null | - | (True) | Multiple access |
Auth.uid == null | - | (False) | No access |
Auth.uid == SOME_FIX_ID | - | (False) | No access |
(In this table, use #WUID instead of #WIPEOUT_UID to save space.)
As shown in the table, a write rule could lead to three access results for a path pattern
- [No access] User with uid has no write access to the any instance
of the path pattern. EXP has a fixed False value in this case.
(Note we are looking for patterns for general users , so
auth.uid == SOME_SPECIAL_UID
also implies no access ) - [Single access] Any instance of the path pattern could be written with one specific auth uid. EXP should have only one conjunctive clause in DNF in this case.
- Multiple access] Instances of the path pattern could be written with more than one auth uids. EXP has more than one conjunctive clauses in DNF or has fixed True value in this case.
The operations of the DNF expressions are implemented in expression.js. An expression object has two fields, booleanValue and conjunctionLists. booleanValue can be true or false or undefined. conjunctionLists represents EXP, and is a list of lists, each inner list represents the list of literals in a conjunction clause, and the outer list represents the disjunction of all the conjunctive clauses.
When booleanValue is True or False, the value of EXP is fixed and the conjunctionList doesn’t matter, and it will be set to an empty list. When booleanValue is undefined, we go ahead and look at the conjunctionLists.
We build EXP for write rules in a bottom-up manners.
- Any literal with the value true or false will produce
EXP = True
orEXP = False
. - Equations (Binary expression) with
auth.uid
on one side will produce aEXP = $other_side_variable
.
We always keep EXP in DNF, we could do “AND” and “OR” operations of EXPs.
- AND of two DNF expressions: cross product of the two expressions, where product of two clauses means the union of the their literals.
- OR of two DNF expressions: concatenate clauses from two expressions
The result of operations may need to be simplified according to classical logic:
- When there are true or false in the expression:
- A & 0 = 0, A | 1 = 1
- A & 1 = A, A | 0 = A
- For literals and conjunction clauses, sort and remove duplicates
because:
- Idempotence: A & A = A, A | A = A
- Commutativity: A & B = B & A, A | B = B | A
- Remove redundant clauses according using absorption
- Absorption: A | (A & B) = A
As stated in the previous section, there are three possible access status associated with a write rule, they are: NO_ACCESS, SINGLE_ACCESS, MULT_ACCESS. However, since write rules cascade, getting the access status from a single rule is not enough.
Note: Shallower security rules override rules at deeper paths. Child rules can only grant additional privileges to what parent nodes have already declared. They cannot revoke a read or write privilege.
Child rule can grant additional access, thus the access status of the child node not only depends on the child rule, but also the status of its parent.
For example, parent /keys/$k1
with rule auth.uid == $k1
has
SINGLE_ACCESS, and child /keys/$k1/$k2
with rule auth.uid == $k2
.
The child rule itself is single access, but for the node, it is
granting additional access besides its parent. So for the child node,
the resulting access patterns are [/keys/#WUID, /keys/$k1/#WUID]
and
the access status is MULT_ACCESS then.
The table below shows ACCESS of child node according to parent node access and child rule access
PARENT-NO | PARENT-SINGLE | PARENT-MULT | |
---|---|---|---|
CHILD-NO | NO | PARENT-SINGLE | MULT |
CHILD-SINGLE | CHILD-SINGLE | PARENT-SINGLE/MULT | MULT |
CHILD-MULT | MULT | MULT | MULT |
Note with Parent and child node both having SINGLE_ACCESS, there are two possible outcomes. We need to look into the child rule and see if it grants additional access to its parent. This is done by looking at the conjunction list of parent and child rule. Recall with SINGLE_ACCESS, there should be only one conjunction, each literal in the conjunction adds a restriction to auth.uid. So we look at every literal in the parents conjunction and see if any restriction has been removed in the child rule, if so the child node has MULT_ACCESS, else the child node will have same SINGLE_ACCESS as its parent.
Following the previous example, the parent node has conjunction [$k1]
and the child rule has conjunction [$k2]. The restriction that
auth.uid == $k1
has been removed in the child node this means the
child rule granted additional access and the child node has
MULT_ACCESS. If the child rule has conjunction [$k1,$k2]
, the child
rule is adding additional restriction to the parent node while keeping
the old restriction and no additional access granted. Since child
nodes can’t revoke privilege, the child node will inherit access from
its parent.
A new class Access is introduced to describe the access status of nodes and rule. It has two member:
- accessStatus: NO/SINGLE/MULT_ACCESS
- variableList: Empty list if accessStatus is NO/MULT; list of literals in conjunction if access is SINGLE
An access object can be created from expression object of write rules. Static method nodeAccess() calculates child node access from child rule access and parent node access.
Besides variables in the path, Firebase RTDB security rule also supports references to data through methods like child(), and parent(). Our auto extraction also supports a subset of references as listed below
- Predefined variables:
- data: corresponds to the current data in RTDB at the location of the currently rule. [supported]
- root: corresponds to the data at the root of the RTDB. [supported]
- newData :corresponds to the data that will result if the write is allowed. [ignored] because the restriction on new data is not related to the privilege to write (access) at a place().
- Referencing methods
- child(): Gets a reference for the location at the specified relative path. One single argument required to specify the path.[supported]
- parent(): Gets a reference for the location parent path. No argument. [supported]
- Evaluating methods
- val(): Gets the value from the reference. [supported]
- exists(): Return true if the reference contains any data. [supported]
Data references are evaluated to a string and treated as a normal literal/identifier when we parse the write rule at a particular location. They may imply additional conditions to the access pattern Here are some examples:
With path = /user/data/$uid
- newData.val() => undefined
- data.val() => val(rules,user,data,$uid)
- data.exists() => exists(rules,user,data,$uid)
- data.child(‘name’).val() => val(rules,user,data,$uid,name)
- data.child(‘name’).parent().child(‘age’).val() => val(rules,user,data,$uid,age)
- data.parent().child(auth.uid).val() => val(rules,user,data,#WIPEOUT_UID)
- root.child(‘data’).child(data.child(‘friend’)).val() => val(rules,user,data,val(rules,user,data,$uid,friend))
Besides restrictions on authentication id, there are also additional conditions on variables in the path or data references. We include these conditions in a ‘condition’ field in wipeout rules.
Condition could be
- A single data reference with
exists()
value, likedata.exists()
- Or a binary expression of Literal/Identifier/data references connected by operators like ‘==’ , ‘<’ , ‘>’, ’!=’... For example, data.val() != null.
Conditions expression additional restrictions on write accesses, and they have a ‘AND’ relationship restrictions on authentication information.
In terms of implementation, conditions are associated with expressions and access objects. An Expression object with booleanValue True or Undefined can have valid condition. Only Access objects with with SINGLE_ACCESS have conditions.
When we do AND and OR operations of Expressions, their conditions will be joined by the corresponding operator. When we are calculating child node Access based on parent node Access and child rule Access, there’s only one case where conditions will be involved: parent and child rule both have SINGLE_ACCESS and child node have SINGLE_ACCESS. In this case the resulting condition of child node is the OR of the two conditions.