🎉 Initial commit

aplbrain · Aug 10, 2020 · 425727a · 425727a
commit 425727a
Show file tree

Hide file tree

Showing 5 changed files with 386 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -0,0 +1,40 @@
+# Grand Isomorphisms
+
+Grand is a virtual graph database. Because DynamoDB is a true-serverless database, it makes sense to use serverless scalable technologies to run graph queries against Grand.
+
+In particular, subgraph isomorphism is a resource-heavy (but branch-parallelizable) algorithm that is hugely impactful for large graph analysis. SotA algorithms for this (Ullmann, VF2, BB-Graph) are heavily RAM-bound, but this is due to a large number of small processes each of which hold a small portion of a traversal tree in memory.
+
+_Grand-Iso_ is a subgraph isomorphism algorithm that leverages serverless technology to run in the AWS cloud at infinite scale.\*
+
+> <small>\* You may discover that "infinite" here is predominantly bounded by your wallet, which is no joke.</small>
+
+## Pseudocode for novel "Grand-Iso" algorithm
+
+```
+- Provision a DynamoDB table for result storage.
+- Preprocessing
+    - Identify highest-degree node in motif, M1
+    - Identify second-highest degree node in motif, M2, connected to M1 by
+        a single edge.
+    - Identify all nodes with degree of M1 or greater in the host graph,
+        which also have all required attributes of the M1 and M2 nodes. If
+        neither M1 nor M2 have degree > 1 nor attributes, select M1 and M2 as
+        two nodes with attributes defined.
+    - Enumerate all paths in the host graph from M1 candidates to M2 candidates,
+        as candidate "backbones" in AWS SQS.
+- Motif Search
+    - For each backbone candidate:
+        - Schedule an AWS Lambda:
+            - Pop the backbone from the queue.
+            - Traverse all shortest paths in the motif starting at the nearest
+                of either M1 or M2
+            - If multiple nodes are valid candidates, queue a new backbone with
+                each option, and terminate the current Lambda.
+            - When all paths are valid paths in the host graph, add the list
+                of participant nodes to a result in the DynamoDB table.
+- Reporting
+    - Return a serialization of the results from the DynamoDB table.
+- Cleanup
+    - Delete the backbone queue
+    - Delete the results table (after collection)
+```
diff --git a/docs/reference/grandiso/grandiso.md b/docs/reference/grandiso/grandiso.md
@@ -0,0 +1,107 @@
+## *Class* `_GrandIsoLimit`
+
+
+A limit supervisor that limits the execution of a GrandIso algorithm run.
+
+
+
+## *Class* `GrandIso`
+
+
+A high-level class for managing cloud-scale subgraph isomorphism using the novel grand-iso backbone algorithm.
+
+### Pseudocode
+
+    - Provision a DynamoDB table for result storage.     - Preprocessing
+        - Identify highest-degree node in motif, M1         - Identify second-highest degree node in motif, M2, connected to M1 by
+            a single edge.
+        - Identify all nodes with degree of M1 or greater in the host graph,
+            which also have all required attributes of the M1 and M2 nodes. If             neither M1 nor M2 have degree > 1 nor attributes, select M1 and M2 as             two nodes with attributes defined.
+        - Enumerate all paths in the host graph from M1 candidates to M2 candidates,
+            as candidate "backbones" in AWS SQS.
+    - Motif Search
+### candidate
+### Lambda
+                - Pop the backbone from the queue.                 - Traverse all shortest paths in the motif starting at the nearest
+                    of either M1 or M2
+                - If multiple nodes are valid candidates, queue a new backbone with
+                    each option, and terminate the current Lambda.
+                - When all paths are valid paths in the host graph, add the list
+                    of participant nodes to a result in the DynamoDB table.
+    - Reporting
+        - Return a serialization of the results from the DynamoDB table.
+    - Cleanup
+        - Delete the backbone queue         - Delete the results table (after collection)
+
+
+
+## *Function* `instance_id(self)`
+
+
+Get the unique instance ID for this run-instance of Grand-Iso.
+
+
+
+## *Function* `ready(self)`
+
+
+Returns True if the instance is ready to begin execution.
+
+
+## *Function* `set_graph(self, graph: grand.Graph)`
+
+
+Set a pointer to the dynamo-backed grand Graph object.
+
+### Arguments
+> - **graph** (`grand.Graph`: `None`): The graph to use (must be dynamo-backed)
+
+### Returns
+    None
+
+
+
+## *Function* `_provision_queue(self)`
+
+
+Provision the SQS for this run.
+
+
+
+## *Function* `_provision_backbone_lambda(self)`
+
+
+Provision a new lambda function to run each backbone check.
+
+
+
+## *Function* `_provision_results_table(self)`
+
+
+Provision a new DynamoDB table to hold the results of this run.
+
+
+
+## *Function* `provision_resources(self, wait: bool = True)`
+
+
+Provision all cloud resources that will be required for this run.
+
+### Arguments
+> - **wait** (`bool`: `True`): Whether to wait for all resources before
+        returning from this call. If False, the user must await the         creation of resources manually.
+
+### Returns
+> - **any]** (`None`: `None`): A tuple containing the provisioned resources
+
+
+
+## *Function* `destroy(self)`
+
+
+Destroy this instance, force-terminating any running tasks.
+
+Destroys and terminates all cloud resources associated with this instance, so exercise proper caution.
+
+This function is best reserved for when a run appears to have gone astray and is running for too long.
+
diff --git a/grandiso/__init__.py b/grandiso/__init__.py
@@ -0,0 +1,207 @@
+from typing import Union, Optional
+
+from grand import Graph
+
+from uuid import uuid4
+
+
+class _GrandIsoLimit:
+    """
+    A limit supervisor that limits the execution of a GrandIso algorithm run.
+
+    """
+
+    def __init__(
+        self,
+        parent: "GrandIso",
+        lambda_count_limit: int,
+        wallclock_limit_seconds: float,
+    ) -> None:
+        """
+        Create a new GrandIso limit supervisor.
+
+        The `lambda_count_limit` is a TOTAL number of lambda functions to run
+        in the lifetime of the execution, NOT a concurrency limit.
+
+        The wallclock limit is the total number of seconds from initial run,
+        after which new lambdas will not be scheduled. A lambda that is running
+        when the clock times out will continue to run to completion, and will
+        not turn back into a pumpkin.
+
+        Arguments:
+            parent (GrandIso): The GrandIso algorithm pointer to monitor
+            lambda_count_limit (int): The maximum number of lambdas to run
+            wallclock_limit_seconds (float): The maximum number of run seconds
+
+        Returns:
+            None
+
+        """
+        self.parent = parent
+        self.lambda_count_limit = lambda_count_limit
+        self.wallclock_limit_seconds = wallclock_limit_seconds
+
+
+class _UnboundedGrandIsoLimit(_GrandIsoLimit):
+    def __init__(self, parent: "GrandIso"):
+        self.parent = parent
+        self.lambda_count_limit = None
+        self.wallclock_limit_seconds = None
+
+
+class GrandIso:
+    """
+    A high-level class for managing cloud-scale subgraph isomorphism using the
+    novel grand-iso backbone algorithm.
+
+    Pseudocode:
+
+        - Provision a DynamoDB table for result storage.
+        - Preprocessing
+            - Identify highest-degree node in motif, M1
+            - Identify second-highest degree node in motif, M2, connected to M1 by
+                a single edge.
+            - Identify all nodes with degree of M1 or greater in the host graph,
+                which also have all required attributes of the M1 and M2 nodes. If
+                neither M1 nor M2 have degree > 1 nor attributes, select M1 and M2 as
+                two nodes with attributes defined.
+            - Enumerate all paths in the host graph from M1 candidates to M2 candidates,
+                as candidate "backbones" in AWS SQS.
+        - Motif Search
+            - For each backbone candidate:
+                - Schedule an AWS Lambda:
+                    - Pop the backbone from the queue.
+                    - Traverse all shortest paths in the motif starting at the nearest
+                        of either M1 or M2
+                    - If multiple nodes are valid candidates, queue a new backbone with
+                        each option, and terminate the current Lambda.
+                    - When all paths are valid paths in the host graph, add the list
+                        of participant nodes to a result in the DynamoDB table.
+        - Reporting
+            - Return a serialization of the results from the DynamoDB table.
+        - Cleanup
+            - Delete the backbone queue
+            - Delete the results table (after collection)
+
+    """
+
+    def __init__(
+        self,
+        graph: Optional[Union[grand.Graph]],
+        exact_match: bool = False,
+        limits: _GrandIsoLimit = None,
+        **kwargs
+    ):
+        """
+        Initialize a new GrandIso management instance.
+
+        Arguments:
+            exact_match (bool: False): Whether to allow edges in the host graph
+                that were not explicitly specified in the motif
+
+        """
+        # Whether to allow edges in the host graph that were not explicitly
+        # specified in the motif
+        self._exact_match = exact_match
+
+        # Attach a limit handler to prevent runaway jobs. By default, there is
+        # no limit (¡CUIDADO!)
+        self._limits = limits if limits else _UnboundedGrandIsoLimit(self)
+
+        # Assign a unique identifier to this instance so that all resources
+        # (such as db tables, queues, lambdas) run in isolation from other
+        # Grand-Iso runs.
+        self._instance_id = str(uuid4())
+
+        # Save a pointer to the graph. If the graph is not set, then proceed
+        # as usual but do not let the user run an execution until it is set.
+        self._graph = graph
+        self._graph_specified = True if self._graph else False
+
+    @property
+    def instance_id(self):
+        """
+        Get the unique instance ID for this run-instance of Grand-Iso.
+
+        """
+        return self._instance_id
+
+    def ready(self):
+        """
+        Returns True if the instance is ready to begin execution.
+        """
+        return self._graph_specified
+
+    def set_graph(self, graph: grand.Graph):
+        """
+        Set a pointer to the dynamo-backed grand Graph object.
+
+        Arguments:
+            graph (grand.Graph): The graph to use (must be dynamo-backed)
+
+        Returns:
+            None
+
+        """
+        self._graph = graph
+        self._graph_specified = True
+
+    def _provision_queue(self):
+        """
+        Provision the SQS for this run.
+
+        """
+        raise NotImplementedError()
+
+    def _provision_backbone_lambda(self):
+        """
+        Provision a new lambda function to run each backbone check.
+
+        """
+        raise NotImplementedError()
+
+    def _provision_results_table(self):
+        """
+        Provision a new DynamoDB table to hold the results of this run.
+
+        """
+        raise NotImplementedError()
+
+    def provision_resources(self, wait: bool = True):
+        """
+        Provision all cloud resources that will be required for this run.
+
+        Arguments:
+            wait (bool: True): Whether to wait for all resources before
+                returning from this call. If False, the user must await the
+                creation of resources manually.
+
+        Returns:
+            Tuple[any, any, any]: A tuple containing the provisioned resources
+
+        """
+        queue = self._provision_queue()
+        function = self._provision_backbone_lambda()
+        table = self._provision_results_table()
+
+        if wait:
+            raise NotImplementedError()
+
+        return (queue, function, table)
+
+    def destroy(self):
+        """
+        Destroy this instance, force-terminating any running tasks.
+
+        Destroys and terminates all cloud resources associated with this
+        instance, so exercise proper caution.
+
+        This function is best reserved for when a run appears to have gone
+        astray and is running for too long.
+
+        """
+        raise NotImplementedError()
+
+    @property
+    def limits(self):
+        return self._limits
diff --git a/grandiso/test_grandiso.py b/grandiso/test_grandiso.py
@@ -0,0 +1,11 @@
+import unittest
+
+from . import GrandIso
+
+
+class TestGrandIso(unittest.TestCase):
+    def test_can_create(self):
+        GrandIso()
+
+    def test_can_create_without_limits(self):
+        self.assertEqual(GrandIso().limits.wallclock_limit_seconds, None)
diff --git a/setup.py b/setup.py
@@ -0,0 +1,21 @@
+import setuptools
+
+with open("README.md", "r") as fh:
+    long_description = fh.read()
+
+setuptools.setup(
+    name="grand-iso",
+    version="0.1.0",
+    author="Jordan Matelsky",
+    author_email="[email protected]",
+    description="Subgraph isomorphism at cloud-scale",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    url="https://github.com/j6k4m8/grand-iso",
+    packages=setuptools.find_packages(),
+    classifiers=[
+        "Programming Language :: Python :: 3",
+        "License :: OSI Approved :: MIT License",
+        "Operating System :: OS Independent",
+    ],
+)