diff --git a/LICENSE.md b/LICENSE.md deleted file mode 100644 index 68b01d6..0000000 --- a/LICENSE.md +++ /dev/null @@ -1,40 +0,0 @@ -© 2015, Sean Chester (schester@cs.au.dk) -All rights reserved. - -This software package is called the AlphaProximity suite. The -AlphaProximity suite is free software: redistribution and use -in source and binary forms, with or without modification, are -permitted provided that the following conditions are met: - - 1. Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - 2. Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in - the documentation and/or other materials provided with the distribution. - - 3. Neither the name of the copyright holder nor the names of its - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - - 4. Any and all academic use of this, or any part of this, software must - cite the following article: - - S. Chester and G. Srivastava. 2011. "Social network privacy for - attribute disclosure attacks." In Proceedings of the 2011 - International Conference on Advances in Social Networks Analysis - and Mining (ASONAM), pp. 445-449. doi: 10.1109/ASONAM.2011.105 - - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -THE POSSIBILITY OF SUCH DAMAGE. - diff --git a/README.md b/README.md index 16228ce..d7cf942 100644 --- a/README.md +++ b/README.md @@ -1,57 +1,127 @@ +| GraphAnon, version 2.0 | +|:--------------------------------:| +| © 2015 Sean Chester | +| (sean.chester@idi.ntnu.no) | - AlphaProximity suite, version 1.0 +------------------------------------ - © 2015, Sean Chester - (sean.chester@idi.ntnu.no) - All rights reserved. -This is the AlphaProximity suite, version 1.0, with software for -transforming a vertex-labelled graph into a supergraph that is -resistance to neighbourhood attribute disclosure (NAD) attacks. -For more details about attribute disclosure attacks, you are -encouraged to read the short article: +### Table of Contents - S. Chester and G. Srivastava. 2011. "Social network privacy for - attribute disclosure attacks." In Proceedings of the 2011 - International Conference on Advances in Social Networks Analysis - and Mining (ASONAM), pp. 445-449. doi: 10.1109/ASONAM.2011.105 + * [Introduction](#introduction) + * [Requirements](#requirements) + * [Installation](#installation) + * [Documentation](#documentation) + * [License](#license) + * [Contact](#contact) + +------------------------------------ +### Introduction + -Licensing ---------- +This is the GraphAnon software suite, version 2.0, with software for +transforming a graph into supergraphs that are +resistant to identity and attribute disclosure attacks. 
+For more details about _attribute disclosure attacks_, you are +encouraged to read the short article, which presents the material +implemented in the _attribute mode_ of this software: -You are free to use, modify, and redistribute this software as you -see fit. Consult the attached LICENSE.md for more details. +> S. Chester and G. Srivastava. 2011. "Social network privacy for +> attribute disclosure attacks." In: Proceedings of the 2011 +> International Conference on Advances in Social Networks Analysis +> and Mining (ASONAM), pp. 445-449. +> doi: [10.1109/ASONAM.2011.105](https://dx.doi.org/10.1109/ASONAM.2011.105) +For more details about _identity disclosure attacks_, you are +encouraged to read either of the following articles (the +conference version or the expanded journal version), which present +the material implemented in the _identity mode_ of this software: +> S. Chester et al. 2013. "Why Waldo befriended the dummy? +> k-Anonymization of social networks with pseudo-nodes." +> Social Network Analysis and Mining 3(3): 381-399. Springer Vienna. +> doi: [10.1007/s13278-012-0084-6](https://dx.doi.org/10.1007/s13278-012-0084-6) -Documentation -------------- +The earlier conference version: -The code has been documented for doxygen. If the doc/html/ -directory is empty or stale, you can regenerate the documentation -by running the doxygen command without arguments from the -doc/ directory of this package. The doxygen settings are included -here in doc/Doxyfile. +> S. Chester et al. 2011. "k-Anonymization of social networks +> by vertex addition." In: Proceedings II of the 15th +> East-European Conference on Advances in Databases and +> Information Systems (ADBIS), pp. 107-116. 
+> url: http://ceur-ws.org/Vol-789/paper11.pdf + + +------------------------------------ +### Requirements + + +GraphAnon relies on the following packages/libraries: + + * OpenMP to parallelise the calculation of graph statistics + * C++11 for newer STL containers such as `unordered_set` -Installation ------------- +------------------------------------ +### Installation + -To generate an executable, simply type "make all" from the root directory -of this package (the same directory in which you found this README file). -The makefile will generate the executable bin/alpha_proximity. If you -encounter difficulties, try first typing "make deepclean" and ensure that -the bin/ directory exists. You can run the executable from a terminal with +To generate an executable, simply type `make all` from the root directory +of this package (the same directory in which you found this `README.md` file). +The makefile will generate the executable `bin/graphAnon`. If you +encounter difficulties, try first typing `make deepclean` and ensure that +the `bin/` directory exists. You can run the executable from a terminal with no command line arguments to get usage instructions. -Contact -------- + +------------------------------------ +### Documentation + + +The code has been documented for `doxygen`. If the `doc/html/` +directory is empty or stale, you can regenerate the documentation +by running `doxygen Doxyfile` from within the `doc/` subdirectory. +The `doxygen` settings are included in `doc/Doxyfile` and can be +freely modified to suit your preferences. 
+ + +------------------------------------ +### License + + +Copyright (c) 2015 Sean Chester + +GraphAnon, version 2.0, is distributed freely under the *MIT License*: + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +------------------------------------ +### Contact + This software suite may be updated and so you are encouraged to check -https://github.com/sean-chester/anon to ensure this is the latest -version. Also, please do not hesitate to contact the author if you have -comments, questions, or bugs to report. +[GraphAnon on GitHub](https://github.com/sean-chester/graphAnon) to ensure +this is the latest version. Do not hesitate to contact the author +if you have comments, questions, or bugs to report, but please first +consult the documentation. 
+------------------------------------ \ No newline at end of file diff --git a/bin/graphAnon b/bin/graphAnon new file mode 100755 index 0000000..dca7e78 Binary files /dev/null and b/bin/graphAnon differ diff --git a/doc/Doxyfile b/doc/Doxyfile index 4e3fee9..82e9735 100644 --- a/doc/Doxyfile +++ b/doc/Doxyfile @@ -32,19 +32,19 @@ DOXYFILE_ENCODING = UTF-8 # title of most generated pages and in a few other places. # The default value is: My Project. -PROJECT_NAME = "AlphaProximity" +PROJECT_NAME = "GraphAnon" # The PROJECT_NUMBER tag can be used to enter a project or revision number. This # could be handy for archiving the generated documentation or if some version # control system is used. -PROJECT_NUMBER = 1.0 +PROJECT_NUMBER = 2.0 # Using the PROJECT_BRIEF tag one can provide an optional one line description # for a project that appears at the top of each page and should give viewer a # quick idea about the purpose of the project. Keep the description short. -PROJECT_BRIEF = "α-proximity: protection against attribute disclosure attacks" +PROJECT_BRIEF = "Protection against attribute and identity disclosure attacks" # With the PROJECT_LOGO tag one can specify an logo or icon that is included in # the documentation. The maximum height of the logo should not exceed 55 pixels @@ -753,7 +753,7 @@ WARN_LOGFILE = # spaces. # Note: If this tag is empty the current directory is searched. -INPUT = ../src/ +INPUT = ../src/ ../README.md # This tag can be used to specify the character encoding of the source files # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses @@ -889,7 +889,7 @@ FILTER_SOURCE_PATTERNS = # (index.html). This can be useful if you have a project on for instance GitHub # and want to reuse the introduction page also for the doxygen output. 
-USE_MDFILE_AS_MAINPAGE = +USE_MDFILE_AS_MAINPAGE = README.md #--------------------------------------------------------------------------- # Configuration options related to source browsing diff --git a/doc/references.bib b/doc/references.bib index 2cd19d2..6dcb816 100644 --- a/doc/references.bib +++ b/doc/references.bib @@ -6,3 +6,45 @@ @InProceedings{asonam pages = {445--449}, note = {{http://dx.doi.org/10.1109/ASONAM.2011.105}}, } + +@Article{waldo, + author = {Chester, Sean and Kapron, Bruce M and Ramesh, Ganesh and Srivastava, Gautam and Thomo, Alex and Venkatesh, S}, + title = {Why Waldo befriended the dummy? k-Anonymization of social networks with pseudo-nodes}, + journal = {Social Network Analysis and Mining}, + year = {2013}, + month = {September}, + pages = {381--399}, + volume = {3}, + number = {3}, + publisher = {Springer Vienna}, + issn = {1869-5469}, + note = {{http://dx.doi.org/10.1007/s13278-012-0084-6}}, +} + +@InProceedings{vertex, + author = {Chester, Sean and Kapron, Bruce M and Ramesh, Ganesh and Srivastava, Gautam and Thomo, Alex and Venkatesh, S}, + title = {k-Anonymization of Social Networks By Vertex Addition}, + booktitle = {Proc. 
East-European Conference on Advances in Databases and Information Systems (ADBIS)}, + year = {2011}, + pages = {107--116}, + publisher = {CEUR-WS.org}, + note = {{http://ceur-ws.org/Vol-789/paper11.pdf}}, +} + +@InProceedings{ying, + author = {Ying, X and Pan, K and Wu, X and Guo, L}, + title = {Comparisons of randomization and k-degree anonymization schemes for privacy preserving social network publishing}, + booktitle = {Proceedings of 3rd workshop on social network mining and analysis (SNA-KDD)}, + year = {2009}, + pages = {10:1--10:10}, + publisher = {ACM}, + address = {New York}, +} + +@InProceedings{terzi, + author = {Liu, K and Terzi, E}, + title = {Towards identity anonymization on graphs}, + booktitle = {Proceedings of ACM Special Interest Group on Management of Data (SIGMOD)}, + year = {2008}, + pages = {93--106}, +} \ No newline at end of file diff --git a/makefile b/makefile index 82196a9..3859e30 100644 --- a/makefile +++ b/makefile @@ -1,8 +1,8 @@ ############################################################ -# Makefile for AlphaProximity Suite # +# Makefile for GraphAnon Suite # # # # Copyright (c) 2015, Sean Chester # -# (schester@cs.au.dk) # +# (sean.chester@idi.ntnu.no) # ############################################################ RM = rm -rf @@ -10,28 +10,29 @@ MV = mv CP = cp -rf CC = g++ -TARGET = $(OUT)/alpha_proximity +TARGET = $(OUT)/graphAnon -SRC = $(wildcard src/*.cpp) +SRC = $(wildcard src/*.cpp) $(wildcard src/labelled_graph/*.cpp) \ + $(wildcard src/unlabelled_graph/*.cpp) OBJ = $(addprefix $(OUT)/,$(notdir $(SRC:.cpp=.o))) OUT = bin LIB_DIR = # used as -L$(LIB_DIR) -INCLUDES = -I ./src/ +INCLUDES = -I ./src/:./src/labelled_graph/:./src/unlabelled_graph/ LIB = # Forces make to look these directories -VPATH = src: +VPATH = src:src/labelled_graph:src/unlabelled_graph # By default compiling for performance (optimal) CXXFLAGS = -O3 -m64 -march=native -mavx \ -Wno-deprecated -Wno-write-strings -nostdlib -Wpointer-arith \ - -Wcast-qual 
-Wcast-align -std=c++0x + -Wcast-qual -Wcast-align -std=c++0x -fopenmp -LDFLAGS=-m64 #-lrt +LDFLAGS=-m64 -mavx -march=native -fopenmp #-lrt # All Target all: $(TARGET) diff --git a/src/graph.h b/src/graph.h deleted file mode 100644 index 7d4b197..0000000 --- a/src/graph.h +++ /dev/null @@ -1,262 +0,0 @@ -/** - * @file - * @brief Definition of a simple, undirected Graph class. - * - * @date Jun 12, 2015 - * @version 1.0 - * @author Sean Chester (schester@cs.au.dk) - * - * @copyright © 2015, Sean Chester (schester@cs.au.dk) - * All rights reserved. - * - * This file is a part of the AlphaProximity suite. - * The AlphaProximity suite is free software: redistribution and use in - * source and binary forms, with or without modification, are - * permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. Neither the name of the copyright holder nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * 4. Any and all academic use of this, or any part of this, software - * must cite the article referenced here: @cite asonam . - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS - * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ - -#ifndef GRAPH_H_ -#define GRAPH_H_ - -#include /* For uint32_t */ -#include /* for std::ofstream */ - -/* STL libraries in use */ -#include -#include - -#include "label_distribution.h" - -/** - * @brief A simple, undirected, vertex-labelled graph with no self-loops that is - * equipped with methods for attribute disclosure protection. - * @todo Unit test this class, particular the is_alpha_proximal() and - * greedy() methods. - */ -class Graph { -public: - - /** - * Constructs a Graph object from a file - * @param filename The path to the input file containing the graph - * @post Constructs a new Graph object - * @warning Does minimal error-checking. If the file format is - * invalid or filename is an incorrect path, then the behaviour - * of this constructor is undefined. - * @see An example file - * consisting of the example Graph from Figure 1 of @cite asonam , - * represented in the vertex-labelled adjacency list format. - * - * Constructs a pre-built graph from a file. The file format is a - * vertex-labelled adjacency list. The first row gives white-space - * separated meta-data about the Graph, namely the number of vertices - * and then the number of distinct labels (e.g., "6 2" is a 6-node graph - * that has a binary label alphabet). - * Each subsequent line corresponds to a vertex. 
The first value is the - * label of the vertex, and the remaining variable-length space-separated - * integers are the node ids of all neighbours (e.g., "1 2 5 9" indicates a - * vertex with label 1 who is connected (only) to vertices 2, 5, and 9. - * Note that there should beexactly n+1 lines in the file, the first line - * should contain exactly two numbers, and every subsequent line must - * contain at least one number. - */ - Graph( const std::string filename ); - - - /** - * Constructs a vertex-labelled graph with n isloated vertices. - * @param num_vertices The number of vertices in the graph. - * @param num_labels The size of the label alphabet (i.e., the - * number of unique vertex labels). - * @post Constructs a new Graph object - */ - Graph( const uint32_t num_vertices, const uint32_t num_labels ); - - /** - * Destroys the graphs. - */ - virtual ~Graph(); - - /** - * Initializes empty Graph data structures: should be called - * by all overloaded constructors once n_ and l_ are set. - */ - void init(); - - /** - * Populates the graph with num_edges undirected edges, randomly chosen - * with uniform distribution. - * @param num_edges The number of edges to insert into the graph. - * @returns false if num_edges cannot be inserted; true otherwise. - * @post The graph contains num_edges more edges than it had before the - * method was invoked, unless it is impossible to add num_edges more edges - * to the graph (then no edges are added). - * - * This method iterates , picking two vertices u,v uniformly at random. - * If the edge (u,v) does not yet exist, it is added. Once num_edges - * successful edge additions have taken place, the routine terminates. If - * num_edges > n * (n - 1) - the number of edges already in the graph, the - * method returns false (failure). 
- */ - bool populate_uniformly( const uint32_t num_edges ); - - /** - * Assigns a random label to each vertex such that (to the maximum extent - * possible) every label appears with the same frequency. - */ - void evenly_distribute_labels(); - - /** - * Retrieves the percentage of possible edges tha are present in the graph. - * @return If E is the edge set and V is the vertex set, the return value is - * |E| / |V| / ( |V| - 1 ). Will also return 0 if |V| = 0. - */ - float get_occupancy(); - - bool is_complete(); - - /** - * Determines whether this graph is alpha-proximal. - * @param alpha The privacy threshold - * @return True if every vertex has a LabelDistribution within a distance - * of alpha of the global LabelDistribution - * @see Definition 2.6 of @cite asonam - */ - bool is_alpha_proximal( const float alpha ); - - /** - * Naively transforms the graph into an alpha-proximal graph by alternately - * adding a random edge and then checking if the graph is alpha-proximal. The - * algorithm is guaranteed to reach a solution because the complete graph is - * a solution (every vertex's neighbourhood LabelDistribution is exactly the global - * LabelDistribution). - * @param alpha The privacy threshold - * @post Inserts edges into the graph so that the graph is alpha-proximal - */ - void hopeful( const float alpha ); - - /** - * Transforms the graph into an alpha-proximal graph, using the Greedy - * alpha-proximity algorithm from @cite asonam (Algorithm 1), hopefully - * inducing much fewer edge additions than the hopeful algorithm. - * @param alpha The privacy threshold - * @post Inserts edges into the graph so that the graph is alpha-proximal - */ - void greedy( const float alpha ); - - /** - * Prints the graph to outstream in vertex-labelled adjacency list format - * (primarily for the purpose of testing). 
- * @param outstream The file stream to which the Graph should be output - * @see An example file - * consisting of the example Graph from Figure 1 of @cite asonam , - * represented in the vertex-labelled adjacency list format. - */ - void print( std::ofstream *outstream ); - -private: - - /** - * Inserts the undirected edge (u,v) into the graph if it does not already exist. - * @param u The source vertex of the edge - * @param v The destination vertex of the edge - * @return True if the edge was added, false if it already existed - * @post Edge (u,v) exists in the graph (irrespective of whether it was there - * prior to invoking the method) - */ - bool add_edge( const uint32_t u, const uint32_t v ); - - /** - * Inserts a random new edge into the graph if the graph is not already - * a complete graph. - * @post The graph remains unaffected if it is complete. Otherwise, - * one edge that previously was not in the graph now appears. - */ - void add_random_edge(); - - /** - * Obtains and constructs at address ld a LabelDistribution - * corresponding to the global frequencies of all labels for all - * vertices in the graph. - * @param ld The address at which the new LabelDistribution should - * be constructted. - * @post ld contains a new LabelDistribution instance. - */ - void inline get_global_ld( LabelDistribution **ld ); - - /** - * Obtains and constructs at address ld a LabelDistribution - * corresponding to the frequencies of all labels of vertices - * within the 1-hop neighbourhood of vertex v. - * @param ld The address at which the new LabelDistribution should - * be constructed. - * @param v The vertex id for whom the neighbourhood LabelDistribution - * should be calculated - * @post ld contains a new LabelDistribution instance. - */ - void inline get_neighbourhood_ld( LabelDistribution **ld, const uint32_t v ); - - - /** - * Runs an iteration of the Greedy Alpha-Proximity algorithm (Lines 2--4 in - * Algorithm 1 of @cite asonam ). 
- * @param alpha The privacy threshold - * @return The number of edges that were added to the graph during - * this iteration - * @post The graph contains new edges and has greedily moved closer to being - * alpha-proximal. - */ - uint32_t run_greedy_iteration( const float alpha ); - - /** - * The number of vertices in the graph - */ - uint32_t n_; - /** - * The number of edges in the graph. - */ - uint32_t m_; - /** - * The size of the label set. - */ - uint32_t l_; - /** - * The adjacency list: adjacency_list[i] is a set - * of node ids that are neighbours for the node with id i. - */ - std::vector< std::unordered_set < uint32_t > > adjacency_list_; - /** - * The vertex-labelling function (i.e., a mapping between vertex and - * vertex label). - */ - std::vector< uint32_t > vertex_labels_; -}; - - -#endif /* GRAPH_H_ */ diff --git a/src/label_distribution.test.cpp b/src/label_distribution.test.cpp deleted file mode 100644 index a9e1b7c..0000000 --- a/src/label_distribution.test.cpp +++ /dev/null @@ -1,103 +0,0 @@ -/** - * @file - * @brief A set of functions for unit testing the LabelDistribution class. - * - * @date Jun 12, 2015 - * @version 1.0 - * @author Sean Chester (schester@cs.au.dk) - * - * @copyright © 2015, Sean Chester (schester@cs.au.dk) - * All rights reserved. - * - * This file is a part of the AlphaProximity suite. - * The AlphaProximity suite is free software: redistribution and use in - * source and binary forms, with or without modification, are - * permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. 
Neither the name of the copyright holder nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * 4. Any and all academic use of this, or any part of this, software - * must cite the article referenced here: @cite asonam . - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS - * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ - -#include /* for uint32_t */ -#include -#include -#include - -#include "label_distribution.test.h" -#include "label_distribution.h" - -bool test_distance() { - - bool passed = true; - - /** - * @test Example from paper - * Definition 2.4 of @cite asonam contains an example of the distance - * function. This test case is exactly that example. - * The distance between <0.7, 0.2, 0.1> and <0.2, 0.4, 0.4> should be 0.7 - * (the pairwise absolute difference between the first n-1 elements). 
- */ - std::vector< uint32_t > l1_counts { 7, 2, 1 }; - std::vector< uint32_t > l2_counts { 2, 4, 4 }; - LabelDistribution *l1 = new LabelDistribution( &l1_counts ); - LabelDistribution *l2 = new LabelDistribution( &l2_counts ); - if( l1->distance( l2 ) < 0.699999 || l1->distance( l2 ) > 0.700001 ) { passed = false; } - delete l2; - delete l1; - - /** - * @test Boundary case: only one label - * This test case checks boundary case handling for a - * LabelDistribution that has only one element. The - * distance between < 5 > and < 9 > should be 0, because - * the first label has the same relative frequency. - */ - std::vector< uint32_t > l1_onelabel { 5 }; - std::vector< uint32_t > l2_onelabel { 9 }; - l1 = new LabelDistribution( &l1_onelabel ); - l2 = new LabelDistribution( &l2_onelabel ); - if( l1->distance( l2 ) < -0.000001 || l1->distance( l2 ) > 0.000001 ) { passed = false; } - delete l2; - delete l1; - - /** - * @test Malformed input: inequal lengths - * This test case ensures that LabelDistributions with inequal - * lengths will throw the LD_INCOMPARABLE error condition. - */ - std::vector< uint32_t > l1_badlengths { 5 }; - std::vector< uint32_t > l2_badlengths { 9, 4 }; - l1 = new LabelDistribution( &l1_badlengths ); - l2 = new LabelDistribution( &l2_badlengths ); - if( l1->distance( l2 ) != LD_INCOMPARABLE ) { passed = false; } - delete l2; - delete l1; - - - return passed; -} - - diff --git a/src/label_distribution.test.h b/src/label_distribution.test.h deleted file mode 100644 index a9b0c14..0000000 --- a/src/label_distribution.test.h +++ /dev/null @@ -1,54 +0,0 @@ -/** - * @file - * @brief Definition of test methods for the LabelDistribution class. - * - * @date Jun 12, 2015 - * @version 1.0 - * @author Sean Chester (schester@cs.au.dk) - * - * @copyright © 2015, Sean Chester (schester@cs.au.dk) - * All rights reserved. - * - * This file is a part of the AlphaProximity suite. 
- * The AlphaProximity suite is free software: redistribution and use in - * source and binary forms, with or without modification, are - * permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. Neither the name of the copyright holder nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * 4. Any and all academic use of this, or any part of this, software - * must cite the article referenced here: @cite asonam . - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS - * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ - -#ifndef LABEL_DISTRIBUTION_TEST_H_ -#define LABEL_DISTRIBUTION_TEST_H_ - -/** - * Asserts the correctness of the distance() function in the - * LabelDistribution class, by executing a series of unit tests. - * @return True if all the tests pass; false if any test fails. 
- */ -bool test_distance(); - -#endif /* LABEL_DISTRIBUTION_TEST_H_ */ diff --git a/src/label_distribution.cpp b/src/labelled_graph/label_distribution.cpp similarity index 55% rename from src/label_distribution.cpp rename to src/labelled_graph/label_distribution.cpp index e25886d..8d182fd 100644 --- a/src/label_distribution.cpp +++ b/src/labelled_graph/label_distribution.cpp @@ -2,43 +2,32 @@ * @file * @brief An implementation of the LabelDistribution class. * - * @date Jun 12, 2015 - * @version 1.0 - * @author Sean Chester (schester@cs.au.dk) + * @date 22 Oct 2015 + * @version 1.1 + * @author Sean Chester (sean.chester@idi.ntnu.no) * - * @copyright © 2015, Sean Chester (schester@cs.au.dk) - * All rights reserved. - * - * This file is a part of the AlphaProximity suite. - * The AlphaProximity suite is free software: redistribution and use in - * source and binary forms, with or without modification, are - * permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. Neither the name of the copyright holder nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * 4. Any and all academic use of this, or any part of this, software - * must cite the article referenced here: @cite asonam . - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS - * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. + * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
*/ #include /* for uint32_t */ diff --git a/src/label_distribution.h b/src/labelled_graph/label_distribution.h similarity index 63% rename from src/label_distribution.h rename to src/labelled_graph/label_distribution.h index a42ab6e..00d1102 100644 --- a/src/label_distribution.h +++ b/src/labelled_graph/label_distribution.h @@ -3,43 +3,32 @@ * @brief The definition of a Label Distribution (Definition 2.3 in @cite asonam) * @see label_distribution.test.h (The unit test suite for this class) * - * @date Jun 12, 2015 - * @version 1.0 - * @author Sean Chester (schester@cs.au.dk) + * @date 22 Oct 2015 + * @version 1.1 + * @author Sean Chester (sean.chester@idi.ntnu.no) * - * @copyright © 2015, Sean Chester (schester@cs.au.dk) - * All rights reserved. - * - * This file is a part of the AlphaProximity suite. - * The AlphaProximity suite is free software: redistribution and use in - * source and binary forms, with or without modification, are - * permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. Neither the name of the copyright holder nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * 4. Any and all academic use of this, or any part of this, software - * must cite the article referenced here: @cite asonam . - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS - * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. + * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
*/ #ifndef LABEL_DISTRIBUTION_H_ diff --git a/src/labelled_graph/label_distribution.test.cpp b/src/labelled_graph/label_distribution.test.cpp new file mode 100644 index 0000000..3aa7605 --- /dev/null +++ b/src/labelled_graph/label_distribution.test.cpp @@ -0,0 +1,92 @@ +/** + * @file + * @brief A set of functions for unit testing the LabelDistribution class. + * + * @date 22 Oct 2015 + * @version 1.1 + * @author Sean Chester (sean.chester@idi.ntnu.no) + * + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. + * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include /* for uint32_t */ +#include +#include +#include + +#include "label_distribution.test.h" +#include "label_distribution.h" + +bool test_distance() { + + bool passed = true; + + /** + * @test Example from paper + * Definition 2.4 of @cite asonam contains an example of the distance + * function. This test case is exactly that example. + * The distance between <0.7, 0.2, 0.1> and <0.2, 0.4, 0.4> should be 0.7 + * (the pairwise absolute difference between the first n-1 elements). + */ + std::vector< uint32_t > l1_counts { 7, 2, 1 }; + std::vector< uint32_t > l2_counts { 2, 4, 4 }; + LabelDistribution *l1 = new LabelDistribution( &l1_counts ); + LabelDistribution *l2 = new LabelDistribution( &l2_counts ); + if( l1->distance( l2 ) < 0.699999 || l1->distance( l2 ) > 0.700001 ) { passed = false; } + delete l2; + delete l1; + + /** + * @test Boundary case: only one label + * This test case checks boundary case handling for a + * LabelDistribution that has only one element. The + * distance between < 5 > and < 9 > should be 0, because + * the first label has the same relative frequency. + */ + std::vector< uint32_t > l1_onelabel { 5 }; + std::vector< uint32_t > l2_onelabel { 9 }; + l1 = new LabelDistribution( &l1_onelabel ); + l2 = new LabelDistribution( &l2_onelabel ); + if( l1->distance( l2 ) < -0.000001 || l1->distance( l2 ) > 0.000001 ) { passed = false; } + delete l2; + delete l1; + + /** + * @test Malformed input: inequal lengths + * This test case ensures that LabelDistributions with inequal + * lengths will throw the LD_INCOMPARABLE error condition. 
+ */ + std::vector< uint32_t > l1_badlengths { 5 }; + std::vector< uint32_t > l2_badlengths { 9, 4 }; + l1 = new LabelDistribution( &l1_badlengths ); + l2 = new LabelDistribution( &l2_badlengths ); + if( l1->distance( l2 ) != LD_INCOMPARABLE ) { passed = false; } + delete l2; + delete l1; + + + return passed; +} + + diff --git a/src/labelled_graph/label_distribution.test.h b/src/labelled_graph/label_distribution.test.h new file mode 100644 index 0000000..c1acb6a --- /dev/null +++ b/src/labelled_graph/label_distribution.test.h @@ -0,0 +1,43 @@ +/** + * @file + * @brief Definition of test methods for the LabelDistribution class. + * + * @date 22 Oct 2015 + * @version 1.1 + * @author Sean Chester (sean.chester@idi.ntnu.no) + * + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. + * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef LABEL_DISTRIBUTION_TEST_H_ +#define LABEL_DISTRIBUTION_TEST_H_ + +/** + * Asserts the correctness of the distance() function in the + * LabelDistribution class, by executing a series of unit tests. + * @return True if all the tests pass; false if any test fails. + */ +bool test_distance(); + +#endif /* LABEL_DISTRIBUTION_TEST_H_ */ diff --git a/src/graph.cpp b/src/labelled_graph/labelled_graph.cpp similarity index 63% rename from src/graph.cpp rename to src/labelled_graph/labelled_graph.cpp index 23cc65b..c18fae0 100644 --- a/src/graph.cpp +++ b/src/labelled_graph/labelled_graph.cpp @@ -1,44 +1,33 @@ /** * @file - * @brief Implementation of the Graph class in graph.h + * @brief Implementation of the LabelledGraph class in labelled_graph.h * - * @date Jun 12, 2015 - * @version 1.0 - * @author Sean Chester (schester@cs.au.dk) + * @date 22 Oct 2015 + * @version 2.0 + * @author Sean Chester (sean.chester@idi.ntnu.no) * - * @copyright © 2015, Sean Chester (schester@cs.au.dk) - * All rights reserved. - * - * This file is a part of the AlphaProximity suite. - * The AlphaProximity suite is free software: redistribution and use in - * source and binary forms, with or without modification, are - * permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. Neither the name of the copyright holder nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * 4. Any and all academic use of this, or any part of this, software - * must cite the article referenced here: @cite asonam . 
- * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS - * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. + * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. */ #include /* for uint32_t */ @@ -53,9 +42,9 @@ #include #include -#include "graph.h" /* implementing this class. */ +#include "labelled_graph.h" /* implementing this class. */ -void Graph::init() { +void LabelledGraph::init() { /* Initialize adjacency list with n_ empty vectors and every vertex * to have the same label. */ adjacency_list_.reserve( n_ ); @@ -71,10 +60,10 @@ void Graph::init() { srand (time(NULL)); } -Graph::Graph( const uint32_t num_vertices, const uint32_t num_labels ) : - n_ ( num_vertices ), l_ ( num_labels ) { init(); } +LabelledGraph::LabelledGraph( const uint32_t num_vertices, const uint32_t num_labels ) : + UnlabelledGraph( num_vertices ), l_ ( num_labels ) { init(); } -Graph::Graph( const std::string filename ) { +LabelledGraph::LabelledGraph( const std::string filename ) { std::string line; std::cout << filename << std::endl; std::ifstream infile( filename ); @@ -89,7 +78,7 @@ Graph::Graph( const std::string filename ) { /* check whether n_ was at least read correctly -- the only real * error checking done in this constructor. */ - if( n_ < 0 ) { + if( n_ <= 0 ) { std::cerr << "Did not parse a positive number of vertices from input file. " << "Did you format the file correctly and specify the correct path?" << std::endl; @@ -126,58 +115,9 @@ Graph::Graph( const std::string filename ) { } } -Graph::~Graph() {} +LabelledGraph::~LabelledGraph() {} -bool Graph::add_edge( const uint32_t u, const uint32_t v ) { - if( adjacency_list_[ u ].count( v ) > 0 || u == v ) { return false; } - adjacency_list_[ u ].insert( v ); - adjacency_list_[ v ].insert( u ); - ++m_; - return true; -} - -void Graph::add_random_edge() { - - /* Error checking -- are there edges to add? 
*/ - if( is_complete() ) { return; } - - while( true ) { - /* get random edge */ - const uint32_t u = rand() % n_; - const uint32_t v = rand() % n_; - - /* add it if it doesn't yet exist */ - if ( add_edge( u, v ) ) { return; } - } -} - - -bool Graph::populate_uniformly( const uint32_t num_edges ) { - /* error checking: can we add this many edges? */ - if ( num_edges > n_ * ( n_ - 1 ) - m_ ) { return false; } - - /* create a list of all possible edges and randomly shuffle the list */ - std::vector< std::pair < uint32_t, uint32_t > > possible_edges; - for( uint32_t i = 0; i < n_; ++i ) { - for( uint32_t j = i + 1; j < n_; ++j ) { - possible_edges.push_back( std::pair< uint32_t, uint32_t > ( i, j ) ); - } - } - std::random_shuffle( possible_edges.begin(), possible_edges.end() ); - - /* Add the first num_edges randomly shuffled edges that do not already - * exist in the graph. - */ - uint32_t num_added = 0; - for( auto it = possible_edges.begin(); it != possible_edges.end(); ++it ) { - if ( add_edge( it->first, it->second ) ) { - if( ++num_added == num_edges ) { return true; } /* Done! */ - } - } - return false; /* should be an unreachable statement! 
*/ -} - -void Graph::evenly_distribute_labels() { +void LabelledGraph::evenly_distribute_labels() { const uint32_t vertices_per_label = n_ / l_; uint32_t labels_left = n_ - vertices_per_label; @@ -216,14 +156,7 @@ void Graph::evenly_distribute_labels() { } } -bool Graph::is_complete() { return m_ == n_ * ( n_ - 1 ); } - -float Graph::get_occupancy() { - if( n_ == 0 ) { return 0; } - else return m_ / (float) ( n_ * ( n_ - 1 ) ) * 2; /* x2 because undirected */ -} - -void Graph::print( std::ofstream *outstream ) { +void LabelledGraph::print( std::ofstream *outstream ) { (*outstream) << n_ << " " << l_ << std::endl; for( uint32_t i = 0; i < n_; ++i ) { (*outstream) << vertex_labels_[ i ] << " "; @@ -234,7 +167,7 @@ void Graph::print( std::ofstream *outstream ) { } } -void inline Graph::get_global_ld( LabelDistribution **ld ) { +void inline LabelledGraph::get_global_ld( LabelDistribution **ld ) { /* Initialize an empty solution. */ std::vector< uint32_t > counts; @@ -251,7 +184,8 @@ void inline Graph::get_global_ld( LabelDistribution **ld ) { *ld = new LabelDistribution( &counts ); } -void inline Graph::get_neighbourhood_ld( LabelDistribution **ld, const uint32_t v ) { +void inline LabelledGraph::get_neighbourhood_ld( LabelDistribution **ld, + const uint32_t v ) { /* Initialize an empty solution. 
*/ std::vector< uint32_t > counts; @@ -271,7 +205,7 @@ void inline Graph::get_neighbourhood_ld( LabelDistribution **ld, const uint32_t *ld = new LabelDistribution( &counts ); } -bool Graph::is_alpha_proximal( const float alpha ) { +bool LabelledGraph::is_alpha_proximal( const float alpha ) { LabelDistribution *global, *neighbourhood; float max_distance = 0; @@ -291,7 +225,7 @@ bool Graph::is_alpha_proximal( const float alpha ) { return max_distance <= alpha; } -void Graph::hopeful( const float alpha ) { +void LabelledGraph::hopeful( const float alpha ) { bool leaks_privacy = !is_alpha_proximal( alpha ); while( leaks_privacy && !is_complete() ) { if( is_alpha_proximal( alpha ) ) { leaks_privacy = false; } @@ -299,7 +233,7 @@ void Graph::hopeful( const float alpha ) { } } -uint32_t Graph::run_greedy_iteration( const float alpha ) { +uint32_t LabelledGraph::run_greedy_iteration( const float alpha ) { LabelDistribution *global, *neighbourhood; std::vector< std::pair< uint32_t, uint32_t > > visit_order; @@ -366,7 +300,7 @@ uint32_t Graph::run_greedy_iteration( const float alpha ) { return num_edges_added; } -void Graph::greedy( const float alpha ) { +void LabelledGraph::greedy( const float alpha ) { bool leaks_privacy = !is_alpha_proximal( alpha ); while( leaks_privacy && !is_complete() ) { const uint32_t num_new_edges = run_greedy_iteration( alpha ); diff --git a/src/labelled_graph/labelled_graph.h b/src/labelled_graph/labelled_graph.h new file mode 100644 index 0000000..c3e5d84 --- /dev/null +++ b/src/labelled_graph/labelled_graph.h @@ -0,0 +1,183 @@ +/** + * @file + * @brief Definition of a simple, undirected, vertex-labelled Graph class. + * + * @date 22 Oct 2015 + * @version 2.0 + * @author Sean Chester (sean.chester@idi.ntnu.no) + * + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. 
+ * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef LABELLED_GRAPH_H_ +#define LABELLED_GRAPH_H_ + +#include /* For uint32_t */ +#include /* for std::ofstream */ + +/* STL libraries in use */ +#include +#include + +#include "../unlabelled_graph/unlabelled_graph.h" +#include "label_distribution.h" + +/** + * @brief A simple, undirected, vertex-labelled graph with no self-loops that is + * equipped with methods for attribute disclosure protection. + * @extends UnlabelledGraph + * @todo Unit test this class, in particular the is_alpha_proximal() and + * greedy() methods. + */ +class LabelledGraph : public UnlabelledGraph { +public: + + /** + * Constructs a LabelledGraph object from a file + * @param filename The path to the input file containing the graph + * @post Constructs a new LabelledGraph object + * @warning Does minimal error-checking.
If the file format is + * invalid or filename is an incorrect path, then the behaviour + * of this constructor is undefined. + * @see An example file + * consisting of the example LabelledGraph from Figure 1 of @cite asonam , + * represented in the vertex-labelled adjacency list format. + */ + LabelledGraph( const std::string filename ); + + + /** + * Constructs a vertex-labelled graph with n isolated vertices. + * @param num_vertices The number of vertices in the graph. + * @param num_labels The size of the label alphabet (i.e., the + * number of unique vertex labels). + * @post Constructs a new LabelledGraph object + */ + LabelledGraph( const uint32_t num_vertices, const uint32_t num_labels ); + + /** + * Destroys the LabelledGraph. + */ + virtual ~LabelledGraph(); + + /** + * Initializes empty LabelledGraph data structures: should be called + * by all overloaded constructors once n_ and l_ are set. + */ + void init(); + + /** + * Assigns a random label to each vertex such that (to the maximum extent + * possible) every label appears with the same frequency. + */ + void evenly_distribute_labels(); + + /** + * Determines whether this graph is alpha-proximal. + * @param alpha The privacy threshold + * @return True if every vertex has a LabelDistribution within a distance + * of alpha of the global LabelDistribution + * @see Definition 2.6 of @cite asonam + */ + bool is_alpha_proximal( const float alpha ); + + /** + * Naively transforms the graph into an alpha-proximal graph by alternately + * adding a random edge and then checking if the graph is alpha-proximal. The + * algorithm is guaranteed to reach a solution because the complete graph is + * a solution (every vertex's neighbourhood LabelDistribution is exactly the global + * LabelDistribution).
+ * @param alpha The privacy threshold + * @post Inserts edges into the graph so that the graph is alpha-proximal + */ + void hopeful( const float alpha ); + + /** + * Transforms the graph into an alpha-proximal graph, using the Greedy + * alpha-proximity algorithm from @cite asonam (Algorithm 1), hopefully + * inducing far fewer edge additions than the hopeful algorithm. + * @param alpha The privacy threshold + * @post Inserts edges into the graph so that the graph is alpha-proximal + */ + void greedy( const float alpha ); + + /** + * Prints the graph to outstream in vertex-labelled adjacency list format + * (primarily for the purpose of testing). + * @param outstream The file stream to which the Graph should be output + * @see An example file + * consisting of the example Graph from Figure 1 of @cite asonam , + * represented in the vertex-labelled adjacency list format. + */ + void print( std::ofstream *outstream ); + +private: + + /** + * Obtains and constructs at address ld a LabelDistribution + * corresponding to the global frequencies of all labels for all + * vertices in the graph. + * @param ld The address at which the new LabelDistribution should + * be constructed. + * @post ld contains a new LabelDistribution instance. + */ + void inline get_global_ld( LabelDistribution **ld ); + + /** + * Obtains and constructs at address ld a LabelDistribution + * corresponding to the frequencies of all labels of vertices + * within the 1-hop neighbourhood of vertex v. + * @param ld The address at which the new LabelDistribution should + * be constructed. + * @param v The vertex id for which the neighbourhood LabelDistribution + * should be calculated + * @post ld contains a new LabelDistribution instance. + */ + void inline get_neighbourhood_ld( LabelDistribution **ld, const uint32_t v ); + + + /** + * Runs an iteration of the Greedy Alpha-Proximity algorithm (Lines 2--4 in + * Algorithm 1 of @cite asonam ).
+ * @param alpha The privacy threshold + * @return The number of edges that were added to the graph during + * this iteration + * @post The graph contains new edges and has greedily moved closer to being + * alpha-proximal. + */ + uint32_t run_greedy_iteration( const float alpha ); + + + /* Private member variables. */ + /** + * The vertex-labelling function (i.e., a mapping between vertex and + * vertex label). + */ + std::vector< uint32_t > vertex_labels_; + uint32_t l_; /**< The size of the label set. */ + +}; + + +#endif /* LABELLED_GRAPH_H_ */ diff --git a/src/main.cpp b/src/main.cpp index 6546a5e..d4508c4 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -1,79 +1,46 @@ /** * @file - * @brief The main driver for the AlphaProximity suite. + * @brief The main driver for the GraphAnon suite. * It parses user input to generate and anonymise a Graph. * - * @date Jun 12, 2015 - * @version 1.0 - * @author Sean Chester (schester@cs.au.dk) + * @date 22 Oct 2015 + * @version 2.0 + * @author Sean Chester (sean.chester@idi.ntnu.no) * - * @copyright © 2015, Sean Chester (schester@cs.au.dk) - * All rights reserved. - * - * This file is a part of the AlphaProximity suite. - * The AlphaProximity suite is free software: redistribution and use in - * source and binary forms, with or without modification, are - * permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * 3. Neither the name of the copyright holder nor the names of its - * contributors may be used to endorse or promote products derived - * from this software without specific prior written permission. - * - * 4. 
Any and all academic use of this, or any part of this, software - * must cite the article referenced here: @cite asonam . - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS - * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * - * @mainpage - * AlphaProximity Suite, version 1.0 - * This is the AlphaProximity suite, an implementation of the algorithms and experiments - * that appeared in the 2011 article "Social network privacy for attribute disclosure attacks" - * by S. Chester and G. Srivastava @cite asonam . which introduces a greedy algorithm for - * calculating alpha-proximal graphs - * that are resistant to attribute disclosure attacks. - * - * An attribute disclosure attack occurs when an adversary targets an individual in a network - * not necessarily with the intention of identifying the target, but instead to learn a sensitive - * attribute. For example, an adversary may not need to identify you to learn your political - * affiliation; it can sometimes be sufficient to learn the distribution of political affiliations - * among your friends. This can perhaps suggest a higher probability estimate of - * your political affiliation than the adversary would have known prior to the attack. - * For more information, you are encouraged to read the five page article. 
- * - * This documentation describes the code. For licensing details, consult the LICENSE - * file in the root directory. For general installation instructions, consult the README - * file, also in the root directory. The code consists primarily of a Graph class. The - * greedy() and hopeful() member functions anonymise a Graph instance using Algorithm 1 - * from the paper and a naive randomized approach, respectively. Running either anonymisation - * function will modify the Graph instance so that its global LabelDistribution is within a - * factor of α of the LabelDistribution for any neighbourhood in the Graph. - * - * If, after consulting this documentation, you still have questions about the code, please - * contact the author (schester@cs.au.dk). + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. + * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. */ #include /* For std::cout, std::endl */ #include /* For std::find */ +#include /* For strcmp() */ + +#include "labelled_graph/labelled_graph.h" +#include "unlabelled_graph/unlabelled_graph.h" +#include "labelled_graph/label_distribution.test.h" -#include "graph.h" -#include "label_distribution.test.h" +/* STL containers in use */ +#include /** * Finds a specified option among the command line arguments @@ -103,18 +70,24 @@ void print_usage_instructions( const char *bin_path ) { << bin_path << " [-option value]" << std::endl << std::endl; std::cout << "\tPossible options include:" << std::endl; std::cout << "\t\t[-h] or [--help] shows these usage instructions" << std::endl; + std::cout << "\t\t[-mode {identity,attribute} [type of anonymization to conduct]]" << std::endl; std::cout << "\t\t[-f [path to input file]]" << std::endl; + std::cout << "\t\t[-format {adjList, edgeList, adjListVL} [format of input file (adjList by default)]]" << std::endl; std::cout << "\t\t[-o [path to output file]]" << std::endl; - std::cout << "\t\t[-alpha [privacy threshold]]" << std::endl; + std::cout << "\t\t[-k [identity privacy threshold]]" << std::endl; + std::cout << "\t\t[-alpha [attribute privacy threshold]]" << std::endl; std::cout << "\t\t[-n [number of vertices in random graph]]" << std::endl; std::cout << "\t\t[-occ [occupancy rate in random graph (i.e., percentage of possible edges)]]" << std::endl; - std::cout << "\t\t[-l [label set size in random graph]]" << std::endl << std::endl; + std::cout << "\t\t[-l [label set size in random graph]]" << std::endl; + std::cout << "\t\t[-stats [enables printing of graph properties to stdout]]" << std::endl; + std::cout << "\t\t[-hide-additional [enables the 
anonymisation of newly added vertices]]" << std::endl << std::endl; std::cout << "\tNote that if an input file is specified, all random graph parametres are ignored. " << std::endl << "\tIf no input file is specified, -n, -occ, and -l are mandatory. " << std::endl << "\t-alpha, the privacy threshold, is always mandatory." << std::endl << std::endl; std::cout << "\tExample usage:" << std::endl; - std::cout << "\t\t" << bin_path << " -alpha 0.10001 -f ./workloads/paper_example.adjList -o private_graph.adjList" << std::endl; - std::cout << "\t\t" << bin_path << " -alpha 0.05 -n 100 -occ .01 -l 2" << std::endl << std::endl; + std::cout << "\t\t" << bin_path << " -mode attribute -alpha 0.10001 -f ./workloads/asonam11_example.adjList -o private_graph.adjList" << std::endl; + std::cout << "\t\t" << bin_path << " -mode attribute -alpha 0.05 -n 100 -occ .01 -l 2" << std::endl << std::endl; + std::cout << "\t\t" << bin_path << " -mode identity -k 3 -f ./workloads/snam_example1.adjList -o anon_graph.adjList -stats" << std::endl << std::endl; std::cout << "\tOutput:" << std::endl; std::cout << "\t\tThe input graph is made alpha-secure from a neighbourhood attribute disclosure (NAD) " << std::endl; std::cout << "\t\tattack. The extent to which the graph is modified is echoed to stdout in the form: " << std::endl; @@ -126,25 +99,40 @@ void print_usage_instructions( const char *bin_path ) { } /** - * Main driver method that constructs a new Graph, anonymises it, and - * then reports the change to the graph's occupancy rate. + * Echoes to stdout statistics (namely clustering coefficient, + * hop plot, and average path length) for a graph. + * @param g The graph for which to print out statistics. 
+ */ +void inline print_stats( UnlabelledGraph *g ) { + std::cout << "|V|: " << g->num_vertices() << std::endl; + std::cout << "|E|: " << g->num_edges() << std::endl; + std::cout << "Occ: " << g->get_occupancy() << std::endl; + std::cout << " CC: " << g->clustering_coefficient() << std::endl; + std::cout << " SC: " << g->subgraph_centrality( 120 ) << std::endl; + HopPlot hop_plot; + g->hop_plot( &hop_plot ); + std::cout << " HP: "; + for( auto it = hop_plot.begin(); it != hop_plot.end(); ++it ) { std::cout << it->first << ":" << it->second << " "; } + std::cout << std::endl; + std::cout << "APL: " << g->average_path_length< true >( &hop_plot ) << std::endl; + std::cout << " HM: " << g->harmonic_mean( &hop_plot ) << std::endl; +} + + +/** + * Runs the software to create a alpha-proximal graph, + * according to command-line specifications. * @param argc The number of command line arguments provided by the user * @param argv An array of strings, each string containing a command * line argument. - * @returns 0 on graceful exit. (Theoretically, other values could have - * been returned, but every path through this method returns 0). + * @returns 0 on successful computation + * 1 if there is an error in the user input + * 2 if there is a software error (either a unit test + * fails or the algorithm fails to produce an alpha-proximal graph) */ -int main(int argc, char** argv) { +uint32_t run_attribute_mode( int argc, char** argv ) { - Graph *g; - - /* Parse input parametres to create a graph. 
*/ - if( argc == 1 || getCmdOption( argv, argv + argc, "-h", false ) > 0 - || getCmdOption( argv, argv + argc, "--help", false ) > 0 ) { - - print_usage_instructions( *argv ); - return 0; - } + LabelledGraph *g = nullptr; char *filename = getCmdOption( argv, argv + argc, "-f", true ); char *alpha = getCmdOption( argv, argv + argc, "-alpha", true ); @@ -154,10 +142,10 @@ int main(int argc, char** argv) { std::cerr << std::endl << "\tYou must specify a value for alpha (e.g., -alpha 0.1)" << std::endl; - return 0; + return 1; } if( filename != 0 ) { - g = new Graph( filename ); + g = new LabelledGraph( filename ); } else { /* Gather parametres for a random graph */ @@ -165,7 +153,7 @@ int main(int argc, char** argv) { char *occupancy = getCmdOption( argv, argv + argc, "-occ", true ); char *alphabet_size = getCmdOption( argv, argv + argc, "-l", true ); if( graph_size > 0 && occupancy > 0 && alphabet_size > 0 ) { - g = new Graph( atoi( graph_size ), atoi( alphabet_size ) ); + g = new LabelledGraph( atoi( graph_size ), atoi( alphabet_size ) ); g->evenly_distribute_labels(); const uint32_t num_edges = atof( occupancy ) * atoi( graph_size ) * ( atoi( graph_size ) - 1 ) / 2; /* 1/2 b/c undirected */ @@ -177,35 +165,36 @@ int main(int argc, char** argv) { << "\tYou must specify all values for the random graph " << "or provide an input file (e.g., -n 100 =occ .01 -l 2)" << std::endl; - return 0; + + delete g; + return 1; } } - const float original_occupancy = g->get_occupancy(); /* Run unit tests first. */ if( !test_distance() ) { std::cerr << "Failed unit test of LabelDistribution" << " distance function! Aborting." << std::endl; - exit(0); + + delete g; + return 2; } + /* Execute algorithm. */ g->greedy( atof( alpha ) ); if( !g->is_alpha_proximal( atof( alpha ) ) ) { - std::cerr << "This instance was not evidently not solved. "; - std::cerr << "The software must have a bug?"; - exit(0); + std::cerr << "This instance was evidently not solved. 
"; + std::cerr << "The software must have a bug? "; + std::cerr << "You should contact the developer."; + + delete g; + return 2; } - /* Echo output (final occupancy, and % occupancy change). Note that - * the paper @cite asonam uses the relative difference in occupancy - * as the evaluation measure, not the absolute difference and - * the reported results here are as in the paper. - */ - const float new_occupancy = g->get_occupancy(); - const float occupancy_change = ( new_occupancy - original_occupancy ) / original_occupancy; - std::cout << original_occupancy << " " << new_occupancy - << " " << occupancy_change << std::endl; + /* If requested in command line args, echo to stdout the orig graph stats. */ + char *stats = getCmdOption( argv, argv + argc, "-stats", false ); + if( stats > 0 ) { print_stats( g ); } /* If requested in command line args, write output Graph to file. */ @@ -216,7 +205,154 @@ int main(int argc, char** argv) { g->print( &outfile ); outfile.close(); } + + /* clean up. */ + delete g; + return 0; +} + + +/** + * Runs the software to create a k-degree-anonymous graph, + * according to command-line specifications. + * @param argc The number of command line arguments provided by the user + * @param argv An array of strings, each string containing a command + * line argument. 
+ * @returns 0 on successful computation + * 1 if there is an error in the user input + * 2 if there is a software error (the algorithm fails + * to produce a k-degree-anonymous graph) + */ +uint32_t run_identity_mode( int argc, char** argv ) { + + UnlabelledGraph *g = nullptr; + + char *filename = getCmdOption( argv, argv + argc, "-f", true ); + char *k = getCmdOption( argv, argv + argc, "-k", true ); + + + if( k == 0 ) { + + std::cerr << std::endl + << "\tYou must specify a privacy threshold, k (e.g., -k 5)" + << std::endl; + return 1; + } + + if( filename != 0 ) { + char *format = getCmdOption( argv, argv + argc, "-format", true ); + if( format == 0 || strcmp( format, "adjList" ) == 0 ) { + g = new UnlabelledGraph( filename, FILE_TYPE_ADJLIST ); + } + else if( strcmp( format, "edgeList" ) == 0 ) { + g = new UnlabelledGraph( filename, FILE_TYPE_EDGELIST ); + } + else if( strcmp( format, "adjListVL" ) == 0 ) { + g = new UnlabelledGraph( filename, FILE_TYPE_ADJLIST_VL ); + } + else { + std::cerr << std::endl + << "\tFormat \"" << format << "\" not supported." + << std::endl; + + return 1; + } + } + else { + /* Gather parameters for a random graph */ + char *graph_size = getCmdOption( argv, argv + argc, "-n", true ); + char *occupancy = getCmdOption( argv, argv + argc, "-occ", true ); + + if( graph_size > 0 && occupancy > 0 ) { + g = new UnlabelledGraph( atoi( graph_size ) ); + const uint32_t num_edges = atof( occupancy ) * atoi( graph_size ) * + ( atoi( graph_size ) - 1 ) / 2; /* 1/2 b/c undirected */ + g->populate_uniformly( num_edges ); + } + else { + std::cerr << std::endl + << "\tYou must specify all values for the random graph " + << "or provide an input file (e.g., -n 100 -occ .01)" + << std::endl; + + delete g; + return 1; + } + } + const float original_occupancy = g->get_occupancy(); + + /* Determine whether or not all vertices should be hidden. 
*/ + char *hide_all = getCmdOption( argv, argv + argc, "-hide-additional", false ); + + /* Execute algorithm. */ + if( hide_all > 0 ) { + g->hide_waldo< true >( atoi( k ) ); + if( !g->is_anonymous( atoi( k ) ) ) { + std::cerr << "This instance was evidently not solved. "; + std::cerr << "Did you ensure k <= n?" << std::endl; + + delete g; + return 2; + } + } + else { g->hide_waldo< false >( atoi( k ) ); } + /* If requested in command line args, echo to stdout the anon graph stats. */ + char *stats = getCmdOption( argv, argv + argc, "-stats", false ); + if( stats > 0 ) { print_stats( g ); } + + /* If requested in command line args, write output Graph to file. */ + char *output_filename = getCmdOption( argv, argv + argc, "-o", true ); + if( output_filename > 0 ) { + std::ofstream outfile; + outfile.open( output_filename ); + g->print( &outfile ); + outfile.close(); + } + + /* clean up. */ delete g; return 0; } + +/** + * Main driver method that constructs a new Graph, anonymises it, and + * then reports the change to the graph's occupancy rate. + * @param argc The number of command line arguments provided by the user + * @param argv An array of strings, each string containing a command + * line argument. + * @returns 0 on graceful exit. (Theoretically, other values could have + * been returned, but every path through this method returns 0). + */ +int main(int argc, char** argv) { + + /* Parse input parametres to create a graph. 
*/ + if( argc == 1 || getCmdOption( argv, argv + argc, "-h", false ) > 0 + || getCmdOption( argv, argv + argc, "--help", false ) > 0 ) { + + print_usage_instructions( *argv ); + return 0; + } + + char *mode = getCmdOption( argv, argv + argc, "-mode", true ); + if( mode == 0 ) { + //print_usage_instructions( *argv ); + std::cerr << std::endl + << "\tYou must specify an operation mode (e.g., -mode attribute)" + << std::endl; + return 0; + } + + if( strcmp( mode, "attribute" ) == 0 ) { + run_attribute_mode( argc, argv ); + } + else if( strcmp( mode, "identity" ) == 0) { + run_identity_mode( argc, argv ); + } + else { + std::cerr << "Mode \"" << mode << "\" not supported. Please try either "; + std::cerr << "\"identity\" or \"attribute\" instead." << std::endl; + } + + return 0; +} diff --git a/src/unlabelled_graph/unlabelled_graph.cpp b/src/unlabelled_graph/unlabelled_graph.cpp new file mode 100644 index 0000000..ed2d9e2 --- /dev/null +++ b/src/unlabelled_graph/unlabelled_graph.cpp @@ -0,0 +1,428 @@ +/** + * @file + * @brief Implementation of the UnlabelledGraph class in unlabelled_graph.h + * + * @date 22 Oct 2015 + * @version 2.0 + * @author Sean Chester (sean.chester@idi.ntnu.no) + * @see unlabelled_graph.tpp for the implementation of templated functions. + * + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. 
+ * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + + + +#include /* for uint32_t */ +#include /* for random_shuffle */ +#include /* for cout, endl */ +#include /* for srand, rand */ +#include /* for ffs and std::string */ +#include /* for ifstream, infile */ +#include /* for istringstream, getline */ + +/* STL stuff in use. */ +#include +#include +#include +#include + +#include "omp.h" + +#include "unlabelled_graph.h" /* implementing this class. */ + +void UnlabelledGraph::init() { + /* Initialize adjacency list with n_ empty vectors and every vertex + * to have the same label. */ + adjacency_list_.reserve( n_ ); + for( uint32_t i = 0; i < n_ ; ++i ) { + adjacency_list_.push_back( std::unordered_set< uint32_t >() ); + } + + /* Originally, there are no edges yet (every vertex is isolated). 
*/ + m_ = 0; + + /* initialize random seed for generating random edges later */ + srand (time(NULL)); +} + +UnlabelledGraph::UnlabelledGraph( const uint32_t num_vertices ) : + n_ ( num_vertices ) { init(); } + + +UnlabelledGraph::UnlabelledGraph() : n_ ( 0 ) { init(); } + +UnlabelledGraph::UnlabelledGraph( const std::string filename, const uint32_t file_type ) { + std::string line; + std::cout << filename << std::endl; + std::ifstream infile( filename ); + + /* first parse the graph and label alphabet sizes from + * the first line of the file + */ + std::getline( infile, line ); + std::istringstream iss( line ); + iss >> n_; + + /* check whether n_ was at least read correctly -- the only real + * error checking done in this constructor. + */ + if( n_ <= 0 ) { + std::cerr << "Did not parse a positive number of vertices from input file. " + << "Did you format the file correctly and specify the correct path?" + << std::endl; + return; + } + + /* Init like other constructors now that the data structure sizes + * are known. + */ + init(); + + if( file_type == FILE_TYPE_ADJLIST || file_type == FILE_TYPE_ADJLIST_VL ) { + + uint32_t v; + + /* check if extra meta data for label set size and throw out. */ + if( file_type == FILE_TYPE_ADJLIST_VL ) { + iss >> v; + } + + /* Iterate exactly enough times to fill the data structures, + * irrespective of the length of the file. Each iteration handles + * one vertex, and the file is assumed to be sorted by increasing vertex + * id, with all vertex ids represented as integers in a contiguous sequence + * starting from 0. + */ + for( uint32_t u = 0; u < n_; ++u ) { + + /* grab line related to vertex u. */ + std::getline( infile, line ); + std::istringstream iss( line ); + + /* if labelled, throw out label first. */ + if( file_type == FILE_TYPE_ADJLIST_VL ) { + iss >> v; + } + + /* all numbers on the adjacency list line + * neighbours of u: add them to u's adjacency list. 
+ * Note: undirected graph, so also reciprocally adds + * (v, u), even if that isn't in the input file + */ + while( iss >> v ) { add_edge( u, v ); } + } + } + else if( file_type == FILE_TYPE_EDGELIST ) { + uint32_t u, v; + /* Iterate every edge in the input file. */ + while( std::getline( infile, line ) ) { + std::istringstream iss( line ); + iss >> u >> v; + add_edge( u, v ); + } + } +} + +UnlabelledGraph::~UnlabelledGraph() {} + +uint32_t UnlabelledGraph::num_vertices() { return n_; } +uint32_t UnlabelledGraph::num_edges() { return m_; } + +bool UnlabelledGraph::add_edge( const uint32_t u, const uint32_t v ) { + if( adjacency_list_[ u ].count( v ) > 0 || u == v ) { return false; } + if( adjacency_list_[ v ].count( u ) > 0 ) { return false; } + adjacency_list_[ u ].insert( v ); + adjacency_list_[ v ].insert( u ); + ++m_; + return true; +} + +void UnlabelledGraph::add_vertices( const uint32_t num_vertices ) { + + for( uint32_t i = 0; i < num_vertices ; ++i ) { + adjacency_list_.push_back( std::unordered_set< uint32_t >() ); + } + n_ += num_vertices; +} + +void UnlabelledGraph::add_random_edge() { + + /* Error checking -- are there edges to add? */ + if( is_complete() ) { return; } + + while( true ) { + /* get random edge */ + const uint32_t u = rand() % n_; + const uint32_t v = rand() % n_; + + /* add it if it doesn't yet exist */ + if ( add_edge( u, v ) ) { return; } + } +} + + +bool UnlabelledGraph::populate_uniformly( const uint32_t num_edges ) { + /* error checking: can we add this many edges? 
*/ + if ( num_edges > n_ * ( n_ - 1 ) / 2 - m_ ) { return false; } + + /* create a list of all possible edges and randomly shuffle the list */ + std::vector< std::pair < uint32_t, uint32_t > > possible_edges; + for( uint32_t i = 0; i < n_; ++i ) { + for( uint32_t j = i + 1; j < n_; ++j ) { + possible_edges.push_back( std::pair< uint32_t, uint32_t > ( i, j ) ); + } + } + std::random_shuffle( possible_edges.begin(), possible_edges.end() ); + + /* Add the first num_edges randomly shuffled edges that do not already + * exist in the graph. + */ + uint32_t num_added = 0; + for( auto it = possible_edges.begin(); it != possible_edges.end(); ++it ) { + if ( add_edge( it->first, it->second ) ) { + if( ++num_added == num_edges ) { return true; } /* Done! */ + } + } + return false; /* should be an unreachable statement! */ +} + +bool UnlabelledGraph::is_complete() { return m_ == n_ * ( n_ - 1 ) / 2; } + +bool UnlabelledGraph::is_anonymous( const uint32_t k ) { + + /* First calculate the counts for every degree in the graph. */ + std::unordered_map< uint32_t, uint32_t > degree_counts; + for (uint32_t i = 0; i < n_; ++i ) { + const uint32_t next_degree = adjacency_list_[ i ].size(); + if( degree_counts.count( next_degree ) == 0 ) { + degree_counts[ next_degree ] = 1; + } + else { ++degree_counts[ next_degree ]; } + } + + /* Then ensure every count is at least k. */ + for( auto it = degree_counts.begin(); it != degree_counts.end(); ++it ) { + if( it->second < k ) { return false; } + } + return true; +} + +float UnlabelledGraph::get_occupancy() { + if( n_ == 0 ) { return 0; } + else return m_ / (float) ( n_ * ( n_ - 1 ) ) * 2; /* x2 because undirected */ +} + +float UnlabelledGraph::clustering_coefficient() { + + //return clustering_coefficient_brute_force(); + + uint64_t closed_triangles = 0; + uint64_t possible_triangles = 0; + + /* First count denominator -- how many open triangles exist. 
*/ +#pragma omp parallel for reduction( +: possible_triangles ) + for( uint32_t i = 0; i < n_; ++i ) { + NeighbourList *my_neighbours = &( adjacency_list_[ i ] ); + possible_triangles += my_neighbours->size() * ( my_neighbours->size() - 1 ); + } + + /* Then count numerator -- how many closed triangles exist. */ +#pragma omp parallel for reduction( +: closed_triangles ) + for( uint32_t u = 0; u < n_; ++u ) { + for( auto v = adjacency_list_[ u ].begin(); v != adjacency_list_[ u ].end(); ++v ) { + for( auto w = adjacency_list_[ u ].begin(); w != adjacency_list_[ u ].end(); ++w ) { + if( *v == *w ) { continue; } + auto closure = std::find( adjacency_list_[ *v ].begin(), adjacency_list_[ *v ].end(), *w ); + if( closure != adjacency_list_[ *v ].end() ) { ++closed_triangles; } + } + } + } + + return closed_triangles / (float) possible_triangles; +} + +float UnlabelledGraph::clustering_coefficient_brute_force() { + uint64_t closed_triangles = 0; + uint64_t possible_triangles = 0; + + /* Iterate all ordered triplets of vertices. */ +#pragma omp parallel for reduction ( +: closed_triangles, possible_triangles ) + for( uint32_t u = 0; u < n_; ++u ) { + for( uint32_t v = 0; v < n_; ++v ) { + if( u == v ) { continue; } + for( uint32_t w = 0; w < n_; ++w ) { + if( u == w || v == w ) { continue; } + if( adjacency_list_[ u ].count( v ) == 1 && + adjacency_list_[ v ].count( w ) == 1 ) { + + ++possible_triangles; + if( adjacency_list_[ u ].count( w ) == 1 ) { + ++closed_triangles; + } + } + } + } + } + + return closed_triangles / (float) possible_triangles; +} + + +void UnlabelledGraph::hop_plot( HopPlot *hop_plot ) { + + std::unordered_set< uint32_t > visited; + std::queue< std::pair< uint32_t, uint32_t > > q; /* (vertex, path length) pairs. 
*/ + uint32_t num_threads; + +#pragma omp parallel + { + num_threads = omp_get_num_threads(); + } + HopPlot hopplots[ num_threads ]; + +#pragma omp parallel for private( visited, q ) + for( uint32_t i = 0; i < n_; ++i ) { + + HopPlot *my_hop_plot = hopplots + omp_get_thread_num(); + + /* clear queue and visited set, although set i so we don't loop. */ + visited.clear(); + visited.insert( i ); + + /* Init queue to contain all direct neighbours of vertex i. */ + for( auto it = adjacency_list_[ i ].begin(); it != adjacency_list_[ i ].end(); ++it ) { + q.push( std::pair< uint32_t, uint32_t >( *it, 1 ) ); + visited.insert( *it ); + } + /* Add all neighbours of i to hop plot score = 1. */ + if( my_hop_plot->count( 1 ) == 1 ) { my_hop_plot->at( 1 ) += adjacency_list_[ i ].size(); } + else { (*my_hop_plot)[ 1 ] = adjacency_list_[ i ].size(); } + + /* Iterate breadth-first through remaining paths. */ + while( ! q.empty() ) { + const uint32_t v = q.front().first; + const uint32_t d = q.front().second; + q.pop(); + + for( auto it = adjacency_list_[ v ].begin(); it != adjacency_list_[ v ].end(); ++it ) { + if( visited.count( *it ) == 0 ) { + visited.insert( *it ); + q.push( std::pair< uint32_t, uint32_t >( *it, d + 1 ) ); + if( my_hop_plot->count( d + 1 ) == 0 ) { (*my_hop_plot)[ d + 1 ] = 1; } + else { ++my_hop_plot->at( d + 1 ); } + } + } + } + } + + /* Reduce all the hop plots from each thread. */ + for( uint32_t t = 0; t < num_threads; ++t ) { + for( auto it = hopplots[ t ].begin(); it != hopplots[ t ].end(); ++it ) { + if( hop_plot->count( it->first ) == 0 ) { (*hop_plot)[ it->first ] = it->second; } + else { hop_plot->at( it->first ) += it->second; } + } + } +} + + +float UnlabelledGraph::harmonic_mean( HopPlot *hop_plot ) { + float h = 0; + + for( auto it = hop_plot->begin(); it != hop_plot->end(); ++it ) { + h += it->second / (float) it->first; + } + return ( h == 0 ? 
-1 : n_ * ( n_ - 1 ) / h ); +} + +/* Computes sc by repeatedly exponentiating matrix and summing diagonals. */ +double UnlabelledGraph::subgraph_centrality( const uint32_t limit ) { + double summation = 0; + double factorial = 1; + + /* First, create double-buffer adjacency matrix explicitly. + * Need doubles to avoid overflow in matrix. */ + double *adjacency_matrix = new double[ n_ * n_ ]; + double *adjacency_matrix_to_lth = new double[ n_ * n_ ]; + double *new_values = new double[ n_ * n_ ]; + + /* Populate adjacency matrix. */ +#pragma omp parallel for + for( uint32_t i = 0; i < n_; ++i ) { + const uint32_t offset = i * n_; + + for( uint32_t j = 0; j < n_; ++j ) { + adjacency_matrix[ offset + j ] = 0; + adjacency_matrix_to_lth[ offset + j ] = 0; + } + + NeighbourList *neighbours = &( adjacency_list_[ i ] ); + for( auto it = neighbours->cbegin(); it != neighbours->cend(); ++it ) { + adjacency_matrix[ offset + *it ] = 1; + adjacency_matrix_to_lth[ offset + *it ] = 1; + } + } + + /* Iterate over path lengths */ + for( uint32_t l = 2; l <= limit; ++l ) { + factorial *= l; + + /* Raise adjacency matrix to next power. 
*/ +#pragma omp parallel for reduction ( +: summation ) + for( uint32_t i = 0; i < n_; ++i ) { + const uint32_t row_offset = i * n_; + for( uint32_t j = 0; j < n_; ++j ) { + double cell_value = 0; + for( uint32_t k = 0; k < n_; ++k ) { + const uint32_t transpose_offset = k * n_; + cell_value += adjacency_matrix[ row_offset + k ] + * adjacency_matrix_to_lth[ transpose_offset + j]; + } + new_values[ row_offset + j ] = cell_value; + if( j == i ) { + + /* divide by factorial and add to running sum */ + summation += cell_value / factorial; + } + } + } + + /* swap buffers */ + double *tmp = adjacency_matrix_to_lth; + adjacency_matrix_to_lth = new_values; + new_values = tmp; + } + + delete [] adjacency_matrix; delete [] adjacency_matrix_to_lth; delete [] new_values; + return summation / n_; +} + +void UnlabelledGraph::print( std::ofstream *outstream ) { + (*outstream) << n_ << std::endl; + for( uint32_t i = 0; i < n_; ++i ) { + for( auto it = adjacency_list_[i].begin(); it != adjacency_list_[i].end(); ++it ) { + (*outstream) << *it << " "; + } + (*outstream) << std::endl; + } +} diff --git a/src/unlabelled_graph/unlabelled_graph.h b/src/unlabelled_graph/unlabelled_graph.h new file mode 100644 index 0000000..b30b58e --- /dev/null +++ b/src/unlabelled_graph/unlabelled_graph.h @@ -0,0 +1,378 @@ +/** + * @file + * @brief Definition of a simple, undirected, unlabelled Graph class. + * + * @date 22 Oct 2015 + * @version 2.0 + * @author Sean Chester (sean.chester@idi.ntnu.no) + * + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. 
+ * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef UNLABELLED_GRAPH_H_ +#define UNLABELLED_GRAPH_H_ + +#include /* For uint32_t */ +#include /* for std::ofstream */ + +/* STL libraries in use */ +#include +#include +#include + +/** + * The file format is an adjacency list. + * The first row gives white-space separated meta-data about + * the UnlabelledGraph, namely the number of vertices (e.g., "6" is a 6-node + * graph). Each subsequent line corresponds to one vertex with a variable-length + * list of space-separated integers indicating the node ids of all neighbours + * (e.g., "1 2 5 9" indicates a vertex who is connected (only) to vertices + * 1, 2, 5, and 9). Note that there should be exactly n+1 lines in the file, + * the first line should contain exactly one number, and every subsequent line + * can contain zero or more vertex ids. 
+ */ +#define FILE_TYPE_ADJLIST 0 + +/** + * The file format is a vertex-labelled adjacency list. + * The first row gives white-space + * separated meta-data about the LabelledGraph, namely the number of vertices + * and then the number of distinct labels (e.g., "6 2" is a 6-node graph + * that has a binary label alphabet). + * Each subsequent line corresponds to a vertex. The first value is the + * label of the vertex, and the remaining variable-length space-separated + * integers are the node ids of all neighbours (e.g., "1 2 5 9" indicates a + * vertex with label 1 who is connected (only) to vertices 2, 5, and 9). + * Note that there should be exactly n+1 lines in the file, the first line + * should contain exactly two numbers, and every subsequent line must + * contain at least one number. + */ +#define FILE_TYPE_ADJLIST_VL 1 + +/** + * Input file is a list of edges. The first line is a single number indicating + * the number of vertices in the graph. Each subsequent line gives two white-space + * separated integers representing the origin and destination of an edge, respectively. + * E.g., Figure 1 of @cite waldo would be represented as follows: + * 4 + * 0 1 + * 1 0 + * 1 2 + * 1 3 + * 2 1 + * 2 3 + * 3 1 + * 3 2 + */ +#define FILE_TYPE_EDGELIST 2 + + +/** + * A HopPlot is a histogram of path lengths in a graph. + * It maps each integer i = 1, 2, ... to the number of + * vertex pairs in the graph that are reachable with a + * shortest path of exactly i hops. + */ +typedef std::map< uint32_t, uint64_t > HopPlot; + +/** + * A DegreeSequence is a list of the degrees for each of the n_ + * vertices in a graph, sorted in descending order. + */ +typedef std::vector< std::pair< uint32_t, uint32_t > > DegreeSequence; + +/** + * A NeighbourList is a set of neighbours for a given vertex. + * If vertex i is in the list, then the vertex to whom this + * NeighbourList is associated is connected to vertex i. 
+ */ +typedef std::unordered_set < uint32_t > NeighbourList; + +/** + * An AdjacencyList is a format for representing the connectivity of + * a graph. It is a list of length n_, where the i'th element is the list + * of neighbours for the i'th vertex. + */ +typedef std::vector< NeighbourList > AdjacencyList; + +/** + * @brief A simple, undirected, unlabelled graph with no self-loops that is + * equipped with methods for identity disclosure protection. + */ +class UnlabelledGraph { +public: + + /** + * Constructs an UnlabelledGraph object from a file + * @param filename The path to the input file containing the graph + * @param file_type Indicates the format of the input file. + * @post Constructs a new UnlabelledGraph object + * @warning Does minimal error-checking. If the file format is + * invalid or filename is an incorrect path, then the behaviour + * of this constructor is undefined. + * @see An example file + * consisting of the example UnlabelledGraph from Figure 1 of @cite waldo , + * represented in the adjacency list format. + */ + UnlabelledGraph( const std::string filename, const uint32_t file_type ); + + + /** + * Constructs an unlabelled graph with n isolated vertices. + * @param num_vertices The number of vertices in the graph. + * @post Constructs a new UnlabelledGraph object + */ + UnlabelledGraph( const uint32_t num_vertices ); + + /** + * Empty constructor to create an UnlabelledGraph with no vertices + * and no edges. + */ + UnlabelledGraph(); + + /** + * Destroys the UnlabelledGraph. + */ + virtual ~UnlabelledGraph(); + + /** + * Initializes empty UnlabelledGraph data structures: should be called + * by all overloaded constructors once n_ is set. + */ + void init(); + + /** + * Accessor method to retrieve the number of vertices in the graph, |V|. + */ + uint32_t num_vertices(); + + /** + * Accessor method to retrieve the number of edges in the graph, |E|. 
+ */ + uint32_t num_edges(); + + /** + * Populates the UnlabelledGraph with num_edges undirected edges, + * randomly chosen with uniform distribution. + * @param num_edges The number of edges to insert into the graph. + * @returns false if num_edges cannot be inserted; true otherwise. + * @post The graph contains num_edges more edges than it had before the + * method was invoked, unless it is impossible to add num_edges more edges + * to the graph (then no edges are added). + * + * This method iterates, picking two vertices u,v uniformly at random. + * If the edge (u,v) does not yet exist, it is added. Once num_edges + * successful edge additions have taken place, the routine terminates. If + * num_edges exceeds n * (n - 1) minus the number of edges already in the + * graph, the method returns false (failure). + */ + bool populate_uniformly( const uint32_t num_edges ); + + /** + * Retrieves the fraction of possible edges that are present in the graph. + * @return If E is the edge set and V is the vertex set, the return value is + * |E| / |V| / ( |V| - 1 ). Will also return 0 if |V| = 0. + */ + float get_occupancy(); + + /** + * Calculates the clustering coefficient of the graph. + */ + float clustering_coefficient(); + + /** + * Calculates the harmonic mean of the graph from a hop plot. + * @param hop_plot The hop plot generated for this graph + * @returns The harmonic mean of the graph. + * @pre Assumes that hop_plot has been populated with a call to hop_plot() + */ + float harmonic_mean( HopPlot *hop_plot ); + + /** + * Calculates the average path length of the graph from a hop plot. + * @tparam include_self_paths A boolean indicating whether paths of + * the form (u,u) with length 0 should be included; if so, the denominator + * of the APL expression will increase. Note that this option is adopted + * inconsistently in @cite waldo : the football dataset is calculated with a + * false value and the netscience dataset is calculated with a true value, for + * example.
To obtain the values from @cite ying use _false_. + * @param hop_plot The hop plot generated for this graph + * @returns The average path length between any two connected vertices. + * @pre Assumes that hop_plot has been populated with a call to hop_plot() + */ + template < bool include_self_paths > + float average_path_length( HopPlot *hop_plot ); + + /** + * Calculates the subgraph centrality of the graph. + * @param limit The maximum walk length over which to compute subgraph + * centrality. + * @returns The subgraph centrality of the graph. + */ + double subgraph_centrality( const uint32_t limit ); + + /** + * Populates the hop plot for this graph. + * @param hop_plot A map from path length i to the number of ordered + * vertex pairs whose shortest path between them is of length i. + * @post The hop_plot map is first cleared and then populated with the + * hop plot data for this graph. + * @see average_path_length() + */ + void hop_plot( HopPlot *hop_plot ); + + bool is_complete(); + + /** + * Determines whether or not the UnlabelledGraph is + * k-degree-anonymous. + * @param k The privacy threshold, k. Every vertex must + * have the same degree as at least k-1 other vertices in + * order to be k-degree-anonymous. + * @returns True if the UnlabelledGraph is k-degree-anonymous; + * false if not. + */ + bool is_anonymous( const uint32_t k ); + + /** + * Modifies the UnlabelledGraph so that it is k-degree-anonymous + * using the algorithm from @cite waldo . + * @tparam hide_new_vertices A boolean flag indicating whether or + * not the newly added vertices should also be anonymised. Note that + * the experiments in @cite waldo have this flag set to _false_. + * @param k The privacy threshold, k. + * @post The UnlabelledGraph is modified to be a super-graph with + * a larger vertex set and edge set such that it is k-degree-anonymous.
+ * @see is_anonymous() + * @note You may observe some minor variances from the anonymisation + * reported in @cite waldo because the order of vertices within an + * equivalence class is not clearly defined. + */ + template < bool hide_new_vertices > + void hide_waldo( const uint32_t k ); + + /** + * Prints the graph to outstream in adjacency list format + * (primarily for the purpose of testing). + * @param outstream The file stream to which the UnlabelledGraph should be output + * @see An example file + * consisting of the example Graph from Figure 1 of @cite waldo , + * represented in the adjacency list format. + */ + void print( std::ofstream *outstream ); + +protected: + + /** + * Inserts the undirected edge (u,v) into the graph if it does not already exist. + * @param u The source vertex of the edge + * @param v The destination vertex of the edge + * @return True if the edge was added, false if it already existed + * @post Edge (u,v) exists in the graph (irrespective of whether it was there + * prior to invoking the method) + */ + bool add_edge( const uint32_t u, const uint32_t v ); + + /** + * Adds a specified number of isolated vertices to the graph. + * @param num_vertices The number of vertices that should be added + * to the graph. + * @post The graph contains num_vertices more vertices (all isolated) + * than before execution of the subroutine. + */ + void add_vertices( const uint32_t num_vertices ); + + /** + * Inserts a random new edge into the graph if the graph is not already + * a complete graph. + * @post The graph remains unaffected if it is complete. Otherwise, + * one edge that previously was not in the graph now appears. + */ + void add_random_edge(); + + /** + * Populates the degree sequence for this UnlabelledGraph. + * @param degrees A vector to populate with the degree sequence, where each + * element is a pair of the form (degree, vertex id). 
+ * @post degrees is emptied and then populated with a list of degrees + * for each vertex, not necessarily unique and in descending order. + */ + void inline retrieve_degree_sequence( DegreeSequence *degrees ); + + /** + * Returns the path length between vertex u and vertex v. + * @param u The id of the source vertex + * @param v The id of the destination vertex + * @returns -1 if u and v are disconnected; otherwise, the + * minimum number of edges that must be traversed in order to + * reach v from u. + */ + int inline calculate_path_length( uint32_t u, uint32_t v ); + + + /* Member variables */ + + uint32_t n_; /**< The number of vertices in the graph. */ + uint32_t m_; /**< The number of edges in the graph. */ + + /** + * The adjacency list: adjacency_list_[i] is a set + * of node ids that are neighbours for the node with id i. + */ + AdjacencyList adjacency_list_; + +private: + + /** + * Calculates the average path length of the graph in a slow + * but definitely correct manner by executing a breadth-first + * search for shortest paths between every pair of vertices + * in the graph. + * @tparam include_self_paths A boolean indicating whether paths of + * the form (u,u) with length 0 should be included; if so, the denominator + * of the APL expression will increase. This option is adopted in @cite waldo + * in Figures 10-13, but the football dataset in Table 2 is mistakenly + * calculated with a false value. To obtain the values from @cite ying + * one should again use _true_. + * @returns The average path length between any two connected + * vertices. + * @note This method is slow and primarily for testing purposes. + * @see average_path_length() + */ + template < bool include_self_paths > + float average_path_length_brute_force(); + + /** + * Calculates the clustering coefficient of the graph in a slow + * but definitely correct manner. + * @returns The clustering coefficient of the graph. + * @note This method is slow and primarily for testing purposes.
+ * @see clustering_coefficient() + */ + float clustering_coefficient_brute_force(); +}; + +#include "unlabelled_graph.tpp" + +#endif /* UNLABELLED_GRAPH_H_ */ diff --git a/src/unlabelled_graph/unlabelled_graph.tpp b/src/unlabelled_graph/unlabelled_graph.tpp new file mode 100644 index 0000000..680cf71 --- /dev/null +++ b/src/unlabelled_graph/unlabelled_graph.tpp @@ -0,0 +1,256 @@ +/** + * @file Implementation of templated functions for unlabelled_graph.h + * + * @date 6 Nov 2015 + * @author Sean Chester (sean.chester@idi.ntnu.no) + * + * @copyright Copyright (c) 2015 Sean Chester + * + * This file is part of the GraphAnon suite. + * GraphAnon, version 2.0, is distributed freely under the *MIT License*: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include + +#include "omp.h" + +int inline UnlabelledGraph::calculate_path_length( uint32_t u, uint32_t v ) { + std::unordered_set< uint32_t > visited; + std::queue< std::pair< uint32_t, uint32_t > > q; /* (vertex, path length) pairs. */ + + /* Check if source and destination are the same. */ + if( u == v ) { return 0; } + + /* Insert all direct neighbours into the visited list. */ + for( auto it = adjacency_list_[ u ].begin(); it != adjacency_list_[ u ].end(); ++it ) { + + /* If neighbour is v, we are done. */ + if( *it == v ) { + return 1; + } + /* Otherwise, push it onto the queue for revisiting in breadth-first order. */ + else { + q.push( std::pair< uint32_t, uint32_t >( *it, 1 ) ); + visited.insert( *it ); + } + } + + while( !q.empty() ) { + + /* Pop top off the queue. */ + const uint32_t vertex = q.front().first; + const uint32_t num_hops = q.front().second; + q.pop(); + + /* Iterate neighbours of vertex to see if they are v. */ + std::unordered_set< uint32_t > *neighbours = &( adjacency_list_[ vertex ] ); + for( auto it = neighbours->begin(); it != neighbours->end(); ++it ) { + /* First check if we have found our destination. */ + if( *it == v ) { return num_hops + 1; } + + /* Otherwise, push it onto the queue if we have not already visited it. */ + else if( visited.count( *it ) == 0 ) { + visited.insert( *it ); + q.push( std::pair< uint32_t, uint32_t >( *it, num_hops + 1 ) ); + } + } + } + return -1; +} + +template < bool include_self_paths > +float UnlabelledGraph::average_path_length_brute_force() { + uint32_t sum_of_path_lengths = 0; + uint32_t number_of_connected_paths = 0; + + /* Iterate all pairs of distinct vertices. */ +#pragma omp parallel for reduction ( +: sum_of_path_lengths, number_of_connected_paths ) + for( uint32_t u = 0; u < n_; ++u ) { + for( uint32_t v = ( include_self_paths ?
u : u + 1 ); v < n_; ++v ) { + const int path_length = calculate_path_length( u , v ); + if( path_length >= 0 ) { + if( u < v ) { /* count both (u,v) and (v,u) when u < v */ + number_of_connected_paths += 2; + sum_of_path_lengths += 2 * path_length; + } + else { /* u == v */ + ++number_of_connected_paths; + sum_of_path_lengths += path_length; + } + } + } + } + + if ( number_of_connected_paths > 0 ) { + return sum_of_path_lengths / (float) number_of_connected_paths; + } + else { return 0; } +} + +template < bool include_self_paths > +float UnlabelledGraph::average_path_length( HopPlot *hop_plot ) { + + //return average_path_length_brute_force< include_self_paths >(); + + /* init with/without the paths (i,i) of length 0 */ + uint64_t sum = 0; + uint64_t count = ( include_self_paths ? n_ : 0 ); + + /* Iterate hop plot to process all length > 0 paths */ + for( auto it = hop_plot->begin(); it != hop_plot->end(); ++it ) { + if( it->second > 0 ) { + sum += it->first * it->second; + count += it->second; + } + } + + return ( count == 0 ? 0 : sum / (float) count ); +} + +/** + * Optimally k-anonymizes the degree sequence such that max_deficiency is minimized. + * @param degrees The original degree sequence as pairs of (degree, vertex id) + * @param k The privacy threshold, k. + * @return The maximum deficiency calculated to transform the original degree sequence + * into a k-anonymous one. + * @post The degree sequence, degrees, is modified such that every element that appears + * in the list appears at least k times. + * @see Section 3.1 and Table 1 of @cite waldo + */ +uint32_t inline anonymize_degree_sequence( DegreeSequence *degrees, const uint32_t k ) { + + const uint32_t n = degrees->size(); + + /* arrays to store dynamic programming results. */ + uint32_t costs[ n ]; + uint32_t starts[ n ]; + + /* trivially populate first 2k - 1 positions, since cannot split.
*/ + for( uint32_t i = 0; i < 2 * k - 1; ++i ) { + starts[ i ] = 0; + costs[ i ] = degrees->at( 0 ).first - degrees->at( i ).first; + } + + /* compute best split for remaining n - (2k - 1) positions. */ + for( uint32_t i = 2 * k - 1; i < n; ++i ) { + const uint32_t range_end = i - k; + const uint32_t range_start = ( k - 1 > i - 2 * k + 1 ? k - 1 : i - 2 * k + 1 ); + + uint32_t best_split_pos = range_start + 1; + + const uint32_t cost_left = costs[ range_start ]; + const uint32_t cost_right = degrees->at( range_start + 1 ).first - degrees->at( i ).first; + uint32_t best_cost = ( cost_left > cost_right ? cost_left : cost_right ); + uint32_t best_sum = cost_left + cost_right; + + for( uint32_t j = range_start + 1; j <= range_end; ++j ) { + const uint32_t cost_left = costs[ j ]; + const uint32_t cost_right = degrees->at( j + 1 ).first - degrees->at( i ).first; + const uint32_t full_cost = ( cost_left > cost_right ? cost_left : cost_right ); + const uint32_t sum_cost = cost_left + cost_right; + if( full_cost < best_cost || ( full_cost == best_cost && sum_cost < best_sum ) ) { + best_split_pos = j + 1; + best_cost = full_cost; + best_sum = sum_cost; + } + } + starts[ i ] = best_split_pos; + costs[ i ] = best_cost; + } + + /* Update degrees to k-anonymize the degree sequence by replaying the + * dynamic programming results backwards. */ + for( int i = n - 1; i >= 0; i = starts[ i ] - 1 ) { + const uint32_t block_start = starts[ i ]; + const uint32_t block_degree = degrees->at( block_start ).first; + for( uint32_t j = block_start + 1; j <= i; ++j ) { + degrees->at( j ).first = block_degree; + } + } + + /* Return max deficiency. */ + return costs[ n - 1 ]; +} + +void inline UnlabelledGraph::retrieve_degree_sequence( DegreeSequence *degrees ) { + + /* First create list of pairs. 
*/ + degrees->clear(); + for( uint32_t i = 0; i < n_; ++i ) { + const uint32_t next_degree = adjacency_list_[ i ].size(); + degrees->push_back( std::pair< uint32_t, uint32_t >( next_degree, i ) ); + } + + /* Then sort them by descending degree. */ + std::sort( degrees->begin(), degrees->end(), + std::greater< std::pair< uint32_t, uint32_t > >() ); +} + +template < bool hide_new_vertices > +void UnlabelledGraph::hide_waldo( const uint32_t k ) { + + /* Section 3.1: First anonymize degree sequence. */ + DegreeSequence degrees; + retrieve_degree_sequence( &degrees ); + DegreeSequence anon_degrees( degrees ); + uint32_t max_def = anonymize_degree_sequence( &anon_degrees, k ); + + /* Section 3.2: Augment graph with min # vertices. */ + if( max_def > 0 ) { + const uint32_t first_new_vertex = n_; + if ( hide_new_vertices ) { + const uint32_t md_or_k = ( max_def > k ? max_def : k ); + const uint32_t new_vertices = ( md_or_k % 2 ? md_or_k : md_or_k + 1 ); + add_vertices( new_vertices ); + } + else { add_vertices( max_def ); } + + /* Section 3.3: Add new edges cyclically to anonymize original graph. */ + uint32_t cursor = first_new_vertex; + for( uint32_t i = 0; i < first_new_vertex; ++i ) { + const uint32_t deficiency = anon_degrees[ i ].first - degrees[ i ].first; + for( uint32_t j = 0; j < deficiency; ++j ) { + add_edge( degrees[ i ].second, cursor ); + if( cursor == n_ - 1 ) { cursor = first_new_vertex; } + else { ++cursor; } + } + } + + /* finally, check whether the new vertices are k-anonymous, or whether the + * pairing procedure is necessary. */ + if( hide_new_vertices && cursor != first_new_vertex ) { + if( !
is_anonymous( k ) ) { + while( cursor < n_ - 1 ) { + add_edge( cursor, cursor + 1 ); + cursor += 2; + } + if( cursor == n_ - 1 ) { + add_edge( n_ - 1, first_new_vertex ); + for( cursor = first_new_vertex + 1; cursor < n_; cursor += 2 ) { + add_edge( cursor, cursor + 1 ); + } + } + } + } + } + return; +} diff --git a/workloads/paper_example.adjList b/workloads/asonam11_example.adjList similarity index 100% rename from workloads/paper_example.adjList rename to workloads/asonam11_example.adjList diff --git a/workloads/data_sources.md b/workloads/data_sources.md new file mode 100644 index 0000000..e149cef --- /dev/null +++ b/workloads/data_sources.md @@ -0,0 +1,9 @@ +#### Scalability Tests + * [Enron](http://snap.stanford.edu/data/) + * [Net Science](http://www-personal.umich.edu/~mejn/netdata/) + * Prefuse --- We never could recall where this came from and no longer have it. + * [Football](http://www-personal.umich.edu/~mejn/netdata/) + +#### Comparability Tests + * [polblogs](http://www.casos.cs.cmu.edu/computational_tools/datasets/external/polblogs/index11.php) + \ No newline at end of file diff --git a/workloads/gml_to_edgelist.sh b/workloads/gml_to_edgelist.sh new file mode 100755 index 0000000..478f91e --- /dev/null +++ b/workloads/gml_to_edgelist.sh @@ -0,0 +1,4 @@ +cat <( grep -o "source [0-9]\{1,\}" $1 | grep -o "[0-9]\{1,\}" ) <(grep -o "target [0-9]\{1,\}" $1 | grep -o "[0-9]\{1,\}") | sort -n | uniq | nl -v 0 | awk '{ print $2 " " $1}' > .tmp_map; \ +paste -d' ' <( grep -o "source [0-9]\{1,\}" $1 | grep -o "[0-9]\{1,\}" ) <(grep -o "target [0-9]\{1,\}" $1 | grep -o "[0-9]\{1,\}") > .tmp_edgelist; \ +tail -n 1 .tmp_map | awk '{ V = $2 + 1; print V }' > $2; \ +awk 'NR==FNR {map[$1]=$2; next}{print map[$1],map[$2]}' .tmp_map .tmp_edgelist >> $2; rm .tmp_map .tmp_edgelist diff --git a/workloads/snam_example1.adjList b/workloads/snam_example1.adjList new file mode 100644 index 0000000..1236eb5 --- /dev/null +++ b/workloads/snam_example1.adjList @@ -0,0 +1,5 @@ 
+4 +1 2 3 +0 2 +0 1 +0 diff --git a/workloads/snam_example2.adjList b/workloads/snam_example2.adjList new file mode 100644 index 0000000..584e911 --- /dev/null +++ b/workloads/snam_example2.adjList @@ -0,0 +1,8 @@ +7 +1 2 3 4 5 +0 +0 +0 4 +0 3 5 +0 4 6 +5