commit 425727a
Showing 5 changed files with 386 additions and 0 deletions.
@@ -0,0 +1,40 @@
# Grand Isomorphisms

Grand is a virtual graph database that can be backed by DynamoDB. Because DynamoDB is a truly serverless database, it makes sense to use serverless, scalable technologies to run graph queries against Grand.

In particular, subgraph isomorphism is a resource-heavy (but branch-parallelizable) algorithm that is hugely impactful for large graph analysis. State-of-the-art algorithms for this (Ullmann, VF2, BB-Graph) are heavily RAM-bound, but this is due to a large number of small processes, each of which holds a small portion of a traversal tree in memory.

_Grand-Iso_ is a subgraph isomorphism algorithm that leverages serverless technology to run in the AWS cloud at infinite scale.\*

> <small>\* You may discover that "infinite" here is predominantly bounded by your wallet, which is no joke.</small>

## Pseudocode for novel "Grand-Iso" algorithm

```
- Provision a DynamoDB table for result storage.
- Preprocessing
    - Identify highest-degree node in motif, M1
    - Identify second-highest degree node in motif, M2, connected to M1 by
      a single edge.
    - Identify all nodes with degree of M1 or greater in the host graph,
      which also have all required attributes of the M1 and M2 nodes. If
      neither M1 nor M2 have degree > 1 nor attributes, select M1 and M2 as
      two nodes with attributes defined.
    - Enumerate all paths in the host graph from M1 candidates to M2 candidates,
      as candidate "backbones" in AWS SQS.
- Motif Search
    - For each backbone candidate:
        - Schedule an AWS Lambda:
            - Pop the backbone from the queue.
            - Traverse all shortest paths in the motif starting at the nearest
              of either M1 or M2
            - If multiple nodes are valid candidates, queue a new backbone with
              each option, and terminate the current Lambda.
            - When all paths are valid paths in the host graph, add the list
              of participant nodes to a result in the DynamoDB table.
- Reporting
    - Return a serialization of the results from the DynamoDB table.
- Cleanup
    - Delete the backbone queue
    - Delete the results table (after collection)
```
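
The preprocessing steps above can be sketched locally in plain Python. This is only an illustration under stated assumptions, not the Grand-Iso implementation: the `Adjacency` alias and the helpers `pick_backbone_nodes` and `candidate_backbones` are hypothetical names, graphs are undirected adjacency dicts, node attributes are ignored, and each backbone is taken to be a single host edge because M1 and M2 are joined by one motif edge.

```python
from typing import Dict, Hashable, List, Set, Tuple

# Assumption: an undirected graph stored as node -> set of neighbor nodes.
Adjacency = Dict[Hashable, Set[Hashable]]


def pick_backbone_nodes(motif: Adjacency) -> Tuple[Hashable, Hashable]:
    """Pick M1 (highest-degree motif node) and M2 (highest-degree neighbor of M1)."""
    m1 = max(motif, key=lambda n: len(motif[n]))
    m2 = max(motif[m1], key=lambda n: len(motif[n]))
    return m1, m2


def candidate_backbones(
    host: Adjacency, motif: Adjacency
) -> List[Tuple[Hashable, Hashable]]:
    """List host edges (h1, h2) whose endpoint degrees can accommodate (M1, M2)."""
    m1, m2 = pick_backbone_nodes(motif)
    d1, d2 = len(motif[m1]), len(motif[m2])
    return [
        (h1, h2)
        for h1 in host
        if len(host[h1]) >= d1  # h1 has at least the degree of M1
        for h2 in host[h1]
        if len(host[h2]) >= d2  # h2 has at least the degree of M2
    ]
```

In a real run, each returned pair would presumably be pushed to the SQS backbone queue rather than collected in a list.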
@@ -0,0 +1,107 @@
## *Class* `_GrandIsoLimit`

A limit supervisor that limits the execution of a GrandIso algorithm run.

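
The supervisor's constructor in this commit takes two limits: a total Lambda count (not a concurrency limit) and a wallclock limit in seconds, after which no new Lambdas are scheduled. The commit does not yet contain the check itself, so the following is a sketch of how such a decision might look. The function `may_schedule` and the `lambdas_scheduled`, `started_at`, and `now` parameters are hypothetical; `None` is assumed to mean "no limit", as in the unbounded supervisor.

```python
import time
from typing import Optional


def may_schedule(
    lambdas_scheduled: int,
    started_at: float,
    lambda_count_limit: Optional[int],
    wallclock_limit_seconds: Optional[float],
    now: Optional[float] = None,
) -> bool:
    """Return True if another Lambda may be scheduled under the given limits."""
    now = time.monotonic() if now is None else now
    # The count limit is a lifetime total, not a concurrency limit.
    if lambda_count_limit is not None and lambdas_scheduled >= lambda_count_limit:
        return False
    # After the wallclock limit, no new Lambdas start; running ones may finish.
    if (
        wallclock_limit_seconds is not None
        and now - started_at >= wallclock_limit_seconds
    ):
        return False
    return True
```
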
## *Class* `GrandIso`

A high-level class for managing cloud-scale subgraph isomorphism using the novel grand-iso backbone algorithm.

### Pseudocode

- Provision a DynamoDB table for result storage.
- Preprocessing
    - Identify highest-degree node in motif, M1
    - Identify second-highest degree node in motif, M2, connected to M1 by a single edge.
    - Identify all nodes with degree of M1 or greater in the host graph, which also have all required attributes of the M1 and M2 nodes. If neither M1 nor M2 have degree > 1 nor attributes, select M1 and M2 as two nodes with attributes defined.
    - Enumerate all paths in the host graph from M1 candidates to M2 candidates, as candidate "backbones" in AWS SQS.
- Motif Search
    - For each backbone candidate:
        - Schedule an AWS Lambda:
            - Pop the backbone from the queue.
            - Traverse all shortest paths in the motif starting at the nearest of either M1 or M2
            - If multiple nodes are valid candidates, queue a new backbone with each option, and terminate the current Lambda.
            - When all paths are valid paths in the host graph, add the list of participant nodes to a result in the DynamoDB table.
- Reporting
    - Return a serialization of the results from the DynamoDB table.
- Cleanup
    - Delete the backbone queue
    - Delete the results table (after collection)

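
The Motif Search step can be simulated on one machine to see how backbones fan out. The sketch below is a simplification under stated assumptions, not the code in this commit: a `deque` stands in for the SQS queue, a list stands in for the DynamoDB results table, each loop iteration plays the role of one Lambda, and a backbone is a partial motif-to-host mapping assumed to be valid already. Unlike the pseudocode, it queues every extension even when only one candidate exists. The name `search` and the `Adjacency` alias are hypothetical, and extra host edges are allowed in a match.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

# Assumption: an undirected graph stored as node -> set of neighbor nodes.
Adjacency = Dict[Hashable, Set[Hashable]]


def search(host: Adjacency, motif: Adjacency, backbones: List[Dict]) -> List[Dict]:
    """Grow each partial motif->host mapping until it covers the whole motif."""
    queue = deque(backbones)  # stand-in for the SQS backbone queue
    results = []  # stand-in for the DynamoDB results table
    while queue:
        mapping = queue.popleft()  # one iteration ~ one Lambda invocation
        unmapped = [m for m in motif if m not in mapping]
        if not unmapped:
            results.append(mapping)  # every motif node is placed: a match
            continue
        # Prefer a motif node that touches the already-mapped region.
        nxt = next(
            (m for m in unmapped if any(n in mapping for n in motif[m])),
            unmapped[0],
        )
        used = set(mapping.values())
        mapped_neighbors = [mapping[n] for n in motif[nxt] if n in mapping]
        for h in host:
            # h must be unused and adjacent to the image of every mapped neighbor.
            if h not in used and all(h in host[hn] for hn in mapped_neighbors):
                queue.append({**mapping, nxt: h})  # fork one new backbone
    return results
```

Because each motif edge is checked when its later endpoint is placed, a mapping that reaches the results list preserves every motif edge, provided the initial backbone did.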
## *Function* `instance_id(self)`

Get the unique instance ID for this run-instance of Grand-Iso.

## *Function* `ready(self)`

Returns True if the instance is ready to begin execution.

## *Function* `set_graph(self, graph: grand.Graph)`

Set a pointer to the dynamo-backed grand Graph object.

### Arguments
> - **graph** (`grand.Graph`: `None`): The graph to use (must be dynamo-backed)

### Returns
None

## *Function* `_provision_queue(self)`

Provision the SQS queue for this run.

## *Function* `_provision_backbone_lambda(self)`

Provision a new Lambda function to run each backbone check.

## *Function* `_provision_results_table(self)`

Provision a new DynamoDB table to hold the results of this run.

## *Function* `provision_resources(self, wait: bool = True)`

Provision all cloud resources that will be required for this run.

### Arguments
> - **wait** (`bool`: `True`): Whether to wait for all resources before returning from this call. If False, the user must await the creation of resources manually.

### Returns
> - **Tuple[any, any, any]**: A tuple containing the provisioned resources (queue, function, table)

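
The constructor assigns every run a UUID so that its tables, queues, and Lambdas stay isolated from other Grand-Iso runs. The provisioning methods are still stubs in this commit, so the sketch below only illustrates one way per-run resource names might be derived from that ID. The function `resource_names` and the `grand-iso-...` prefixes are hypothetical and do not appear in the source.

```python
from typing import Dict
from uuid import uuid4


def resource_names(instance_id: str) -> Dict[str, str]:
    """Derive one name per cloud resource from a run's instance ID."""
    # Hypothetical naming scheme: a fixed prefix plus the run's unique ID.
    return {
        "queue": f"grand-iso-backbones-{instance_id}",
        "function": f"grand-iso-worker-{instance_id}",
        "table": f"grand-iso-results-{instance_id}",
    }


# Two runs get disjoint resource names because their UUIDs differ.
run_a = resource_names(str(uuid4()))
run_b = resource_names(str(uuid4()))
```
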
## *Function* `destroy(self)`

Destroy this instance, force-terminating any running tasks.

Destroys and terminates all cloud resources associated with this instance, so exercise proper caution.

This function is best reserved for when a run appears to have gone astray and is running for too long.

@@ -0,0 +1,207 @@
from typing import Optional, Union
from uuid import uuid4

# Import the module itself: the annotations below refer to `grand.Graph`.
import grand


class _GrandIsoLimit:
    """
    A limit supervisor that limits the execution of a GrandIso algorithm run.

    """

    def __init__(
        self,
        parent: "GrandIso",
        lambda_count_limit: int,
        wallclock_limit_seconds: float,
    ) -> None:
        """
        Create a new GrandIso limit supervisor.

        The `lambda_count_limit` is a TOTAL number of lambda functions to run
        in the lifetime of the execution, NOT a concurrency limit.

        The wallclock limit is the total number of seconds from initial run,
        after which new lambdas will not be scheduled. A lambda that is running
        when the clock times out will continue to run to completion, and will
        not turn back into a pumpkin.

        Arguments:
            parent (GrandIso): The GrandIso algorithm pointer to monitor
            lambda_count_limit (int): The maximum number of lambdas to run
            wallclock_limit_seconds (float): The maximum number of run seconds

        Returns:
            None

        """
        self.parent = parent
        self.lambda_count_limit = lambda_count_limit
        self.wallclock_limit_seconds = wallclock_limit_seconds


class _UnboundedGrandIsoLimit(_GrandIsoLimit):
    """A limit supervisor that imposes no limits (both limits are None)."""

    def __init__(self, parent: "GrandIso") -> None:
        self.parent = parent
        self.lambda_count_limit = None
        self.wallclock_limit_seconds = None


class GrandIso:
    """
    A high-level class for managing cloud-scale subgraph isomorphism using the
    novel grand-iso backbone algorithm.

    Pseudocode:

    - Provision a DynamoDB table for result storage.
    - Preprocessing
        - Identify highest-degree node in motif, M1
        - Identify second-highest degree node in motif, M2, connected to M1 by
          a single edge.
        - Identify all nodes with degree of M1 or greater in the host graph,
          which also have all required attributes of the M1 and M2 nodes. If
          neither M1 nor M2 have degree > 1 nor attributes, select M1 and M2 as
          two nodes with attributes defined.
        - Enumerate all paths in the host graph from M1 candidates to M2 candidates,
          as candidate "backbones" in AWS SQS.
    - Motif Search
        - For each backbone candidate:
            - Schedule an AWS Lambda:
                - Pop the backbone from the queue.
                - Traverse all shortest paths in the motif starting at the nearest
                  of either M1 or M2
                - If multiple nodes are valid candidates, queue a new backbone with
                  each option, and terminate the current Lambda.
                - When all paths are valid paths in the host graph, add the list
                  of participant nodes to a result in the DynamoDB table.
    - Reporting
        - Return a serialization of the results from the DynamoDB table.
    - Cleanup
        - Delete the backbone queue
        - Delete the results table (after collection)

    """

    def __init__(
        self,
        graph: Optional["grand.Graph"] = None,
        exact_match: bool = False,
        limits: Optional[_GrandIsoLimit] = None,
        **kwargs
    ):
        """
        Initialize a new GrandIso management instance.

        Arguments:
            graph (grand.Graph: None): The dynamo-backed graph to search. May
                be left unset here and provided later with `set_graph`.
            exact_match (bool: False): Whether to allow edges in the host graph
                that were not explicitly specified in the motif
            limits (_GrandIsoLimit: None): A limit supervisor for this run. If
                omitted, the run is unbounded.

        """
        # Whether to allow edges in the host graph that were not explicitly
        # specified in the motif
        self._exact_match = exact_match

        # Attach a limit handler to prevent runaway jobs. By default, there is
        # no limit (¡CUIDADO!)
        self._limits = limits if limits else _UnboundedGrandIsoLimit(self)

        # Assign a unique identifier to this instance so that all resources
        # (such as db tables, queues, lambdas) run in isolation from other
        # Grand-Iso runs.
        self._instance_id = str(uuid4())

        # Save a pointer to the graph. If the graph is not set, then proceed
        # as usual but do not let the user run an execution until it is set.
        self._graph = graph
        self._graph_specified = self._graph is not None

    @property
    def instance_id(self):
        """
        Get the unique instance ID for this run-instance of Grand-Iso.

        """
        return self._instance_id

    def ready(self):
        """
        Returns True if the instance is ready to begin execution.

        """
        return self._graph_specified

    def set_graph(self, graph: "grand.Graph"):
        """
        Set a pointer to the dynamo-backed grand Graph object.

        Arguments:
            graph (grand.Graph): The graph to use (must be dynamo-backed)

        Returns:
            None

        """
        self._graph = graph
        self._graph_specified = True

    def _provision_queue(self):
        """
        Provision the SQS queue for this run.

        """
        raise NotImplementedError()

    def _provision_backbone_lambda(self):
        """
        Provision a new lambda function to run each backbone check.

        """
        raise NotImplementedError()

    def _provision_results_table(self):
        """
        Provision a new DynamoDB table to hold the results of this run.

        """
        raise NotImplementedError()

    def provision_resources(self, wait: bool = True):
        """
        Provision all cloud resources that will be required for this run.

        Arguments:
            wait (bool: True): Whether to wait for all resources before
                returning from this call. If False, the user must await the
                creation of resources manually.

        Returns:
            Tuple[any, any, any]: A tuple containing the provisioned resources

        """
        queue = self._provision_queue()
        function = self._provision_backbone_lambda()
        table = self._provision_results_table()

        # Waiting on resource creation is not implemented yet.
        if wait:
            raise NotImplementedError()

        return (queue, function, table)

    def destroy(self):
        """
        Destroy this instance, force-terminating any running tasks.

        Destroys and terminates all cloud resources associated with this
        instance, so exercise proper caution.

        This function is best reserved for when a run appears to have gone
        astray and is running for too long.

        """
        raise NotImplementedError()

    @property
    def limits(self):
        return self._limits
@@ -0,0 +1,11 @@
import unittest

from . import GrandIso


class TestGrandIso(unittest.TestCase):
    def test_can_create(self):
        GrandIso()

    def test_can_create_without_limits(self):
        self.assertIsNone(GrandIso().limits.wallclock_limit_seconds)
@@ -0,0 +1,21 @@
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="grand-iso",
    version="0.1.0",
    author="Jordan Matelsky",
    author_email="[email protected]",
    description="Subgraph isomorphism at cloud-scale",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/j6k4m8/grand-iso",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
)