Docs for NetworkX-compatible implementation
j6k4m8 committed Aug 12, 2020
1 parent 75c8dff commit 7fb3c6a
Showing 6 changed files with 393 additions and 709 deletions.
47 changes: 18 additions & 29 deletions README.md
@@ -1,40 +1,29 @@
# Grand Isomorphisms

Grand is a virtual graph database. Because DynamoDB is a true-serverless database, it makes sense to use serverless scalable technologies to run graph queries against Grand.
Subgraph isomorphism is a resource-heavy (but branch-parallelizable) algorithm that is hugely impactful for large graph analysis. State-of-the-art algorithms for this (Ullmann, VF2, BB-Graph) are heavily RAM-bound, but this is due to a large number of small processes, each of which holds a small portion of a traversal tree in memory.

In particular, subgraph isomorphism is a resource-heavy (but branch-parallelizable) algorithm that is hugely impactful for large graph analysis. State-of-the-art algorithms for this (Ullmann, VF2, BB-Graph) are heavily RAM-bound, but this is due to a large number of small processes, each of which holds a small portion of a traversal tree in memory.

_Grand-Iso_ is a subgraph isomorphism algorithm that leverages serverless technology to run in the AWS cloud at infinite scale.\*

> <small>\* You may discover that "infinite" here is predominantly bounded by your wallet, which is no joke.</small>
_Grand-Iso_ is a subgraph isomorphism algorithm that exchanges this resource limitation for a parallelizable (albeit much longer-running) partial-match queue structure.

## Pseudocode for novel "Grand-Iso" algorithm

```
- Provision a DynamoDB table for result storage.
- Accept a motif M, and a host graph H.
- Create an empty list for result storage, R.
- Create an empty queue, Q.
- Preprocessing
- Identify highest-degree node in motif, M1
- Identify second-highest degree node in motif, M2, connected to M1 by
a single edge.
- Identify all nodes with degree of M1 or greater in the host graph,
which also have all required attributes of the M1 and M2 nodes. If
neither M1 nor M2 have degree > 1 nor attributes, select M1 and M2 as
two nodes with attributes defined.
- Enumerate all paths in the host graph from M1 candidates to M2 candidates,
as candidate "backbones" in AWS SQS.
- Identify the most "interesting" node in motif M, m1.
- Add to Q a set of single-node mappings, one for each node in H that
satisfies the requirements of m1 (degree, attributes, etc.)
- Motif Search
- For each backbone candidate:
- Schedule an AWS Lambda:
- Pop the backbone from the queue.
- Traverse all shortest paths in the motif starting at the nearest
of either M1 or M2
- If multiple nodes are valid candidates, queue a new backbone with
each option, and terminate the current Lambda.
- When all paths are valid paths in the host graph, add the list
of participant nodes to a result in the DynamoDB table.
- "Pop" a backbone B from Q
- Identify as m1 the most interesting node in motif M that does not yet
have a mapping assigned in B.
- Identify all nodes that are valid mappings from the backbone to m1,
based upon degree, attributes, etc.
- If multiple nodes are valid candidates, add each new backbone to Q.
- Otherwise, when all nodes in M have a valid mapping in B to H, add
the mapping to the results set R.
- Continue while there are still backbones in Q.
- Reporting
- Return a serialization of the results from the DynamoDB table.
- Cleanup
- Delete the backbone queue
- Delete the results table (after collection)
- Return the set R to the user.
```
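
The queue-based steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: graphs are plain dicts mapping each node to its set of successors, "interestingness" is approximated by total degree, node attributes are ignored, the motif has no self-loops, and distinct motif nodes must map to distinct host nodes. The name `find_mappings` is hypothetical.

```python
from collections import deque


def find_mappings(motif, host):
    """Return every mapping {motif_node: host_node} that embeds motif in host.

    Both graphs are dicts of node -> set of successor nodes (an assumption
    of this sketch, not the library's data model).
    """
    def degree(graph, node):
        # Out-degree plus in-degree.
        return len(graph[node]) + sum(node in succ for succ in graph.values())

    # Assumed proxy for "most interesting": highest degree first.
    order = sorted(motif, key=lambda n: degree(motif, n), reverse=True)

    def consistent(backbone, m_node, h_node):
        # Every motif edge between m_node and an already-mapped node
        # must also exist between their images in the host.
        for other, image in backbone.items():
            if other in motif[m_node] and image not in host[h_node]:
                return False
            if m_node in motif[other] and h_node not in host[image]:
                return False
        return True

    results = []
    # Starting from the empty mapping produces the single-node mappings
    # for the most interesting motif node on the first pass.
    queue = deque([{}])
    while queue:
        backbone = queue.popleft()
        if len(backbone) == len(motif):
            results.append(backbone)  # every motif node is mapped
            continue
        m_node = next(n for n in order if n not in backbone)
        used = set(backbone.values())
        for h_node in host:
            if h_node in used:
                continue
            if degree(host, h_node) < degree(motif, m_node):
                continue
            if consistent(backbone, m_node, h_node):
                queue.append({**backbone, m_node: h_node})
    return results


# A directed triangle searched for in a host with one extra dangling edge.
triangle = {"a": {"b"}, "b": {"c"}, "c": {"a"}}
host = {1: {2}, 2: {3}, 3: {1, 4}, 4: set()}
matches = find_mappings(triangle, host)
```

Because each backbone in the queue is independent of the others, the loop body is the unit of work that could be handed to separate workers; memory use is traded for a potentially long queue.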
57 changes: 11 additions & 46 deletions docs/reference/grandiso/grandiso.md
@@ -1,61 +1,26 @@
## _Class_ `_GrandIsoLimit`
## *Function* `sort_motif_nodes_by_interestingness(motif: nx.Graph) -> dict`

A limit supervisor that limits the execution of a GrandIso algorithm run.

## _Class_ `GrandIso`
Sort the nodes in a motif by their interestingness.

A high-level class for managing cloud-scale subgraph isomorphism using the novel grand-iso backbone algorithm.
Most interesting nodes are defined to be those that most rapidly filter the list of nodes down to a smaller set.
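
As a rough illustration of this idea, the sketch below ranks motif nodes by degree, on the assumption that a high-degree motif node can only match high-degree host nodes and therefore narrows the candidate list fastest. The function name `rank_by_degree` and the dict-of-neighbor-sets graph representation are hypothetical stand-ins, not the documented function or its actual scoring.

```python
def rank_by_degree(motif):
    """Return {node: degree}, ordered from most to least interesting.

    `motif` is an undirected graph given as node -> set of neighbors
    (an assumption of this sketch).
    """
    degree = {node: len(neighbors) for node, neighbors in motif.items()}
    # Python dicts preserve insertion order, so the result stays sorted.
    return dict(sorted(degree.items(), key=lambda item: item[1], reverse=True))


# A star-shaped motif: the hub is the most constraining node.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
ranking = rank_by_degree(star)
```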

## _Function_ `instance_id(self)`

Get the unique instance ID for this run-instance of Grand-Iso.

## _Function_ `ready(self)`
## *Function* `find_motifs(motif: nx.DiGraph, host: nx.DiGraph) -> List[dict]`

Returns True if the instance is ready to begin execution.

## _Function_ `set_graph(self, graph: grand.Graph)`
Get a list of mappings from motif node IDs to host graph IDs.
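
To make the shape of such a mapping concrete, the following sketch checks whether a dict from motif node IDs to host node IDs is a valid match. It assumes directed edges given as `(source, target)` tuples and assumes that distinct motif nodes must map to distinct host nodes; `is_valid_mapping` is a hypothetical helper, and the example mappings are written by hand rather than produced by the library.

```python
def is_valid_mapping(motif_edges, host_edges, mapping):
    """Return True if every motif edge is present in the host under mapping."""
    host_edges = set(host_edges)
    # Assumption: distinct motif nodes map to distinct host nodes.
    if len(set(mapping.values())) != len(mapping):
        return False
    return all((mapping[u], mapping[v]) in host_edges for u, v in motif_edges)


motif_edges = [("A", "B"), ("B", "C")]   # a directed two-edge path
host_edges = [(1, 2), (2, 3), (3, 1)]    # a directed triangle

# Hand-written mappings in the {motif_id: host_id, ...} form.
candidates = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 3, "C": 1},
    {"A": 1, "B": 3, "C": 2},  # invalid: the host has no edge (1, 3)
]
valid = [m for m in candidates if is_valid_mapping(motif_edges, host_edges, m)]
```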

Set a pointer to the dynamo-backed grand Graph object.
### form

### Arguments

> - **graph** (`grand.Graph`: `None`): The graph to use (must be dynamo-backed)
### Returns

None

## _Function_ `_provision_queue(self)`

Provision the SQS for this run.

## _Function_ `_provision_backbone_lambda(self)`

Provision a new lambda function to run each backbone check.

## _Function_ `_provision_results_table(self)`

Provision a new DynamoDB table to hold the results of this run.

## _Function_ `provision_resources(self, wait: bool = True)`

Provision all cloud resources that will be required for this run.
```
[{motif_id: host_id, ...}]
```
### Arguments

> - **wait** (`bool`: `True`): Whether to wait for all resources before
returning from this call. If False, the user must await the creation of resources manually.
> - **motif** (`nx.DiGraph`: `None`): The motif graph (needle) to search for
> - **host** (`nx.DiGraph`: `None`): The host graph (haystack) to search within
### Returns
> - **List[dict]** (`None`: `None`): A list of mappings from motif node IDs to host graph IDs
> - **any]** (`None`: `None`): A tuple containing the provisioned resources
## _Function_ `destroy(self)`

Destroy this instance, force-terminating any running tasks.

Destroys and terminates all cloud resources associated with this instance, so exercise proper caution.

This function is best reserved for when a run appears to have gone astray and is running for too long.