Skip to content

daichou03/AnchoredDensestSubgraph

 
 

Repository files navigation

README - Last update: 20241025

THIS IS THE RELEASE BRANCH FOR PAPER:

On Density-based Local Community Search

Depending on your objective, follow steps as below:

flowchart LR
    A>"Getting Started"]
    B>"(Optional) Change LP Solver"]
    C1>"Run One Instance"]
    C20>"Download Datasets for Case Study"]
    C2>"Replicate Case Study"]
    C3>"Experiment"]
    A --> B
    B --> C1
    B --> C20
    C20 --> C2
    C20 --> C3
    style B stroke-dasharray: 5 5
    style C20 stroke-dasharray: 5 5
Loading

flowchart LR
    A["Getting Started"]
Loading

Prerequisites

Julia

Packages required

Installing packages in Julia for example:

using Pkg
Pkg.add("MatrixNetworks")
Pkg.add("MAT")
Pkg.add("StatsBase")
Pkg.add("JuMP")
Pkg.add("HiGHS")  # Skip if using other LP Solver
Pkg.add("CSV")
Pkg.add("DataFrames")

Working folder

For all tasks, open cmd/bash, navigate to the repository's root directory, then enter the ./src folder and run julia


flowchart LR
    B["(Optional) Change LP Solver"]
    style B stroke-dasharray: 5 5
Loading

By default, the LP (Linear-Programming) solver HiGHS would be (installed and) used.

If you want to use another LP solver, take CPLEX for example, you need to refer to:


flowchart LR
    C1["Run One Instance"]
Loading
include("LP_algorithm.jl")

Read/create a graph

Load a toy graph in ./Example_small/ (there are some other toy graphs in the same folder for exploration):

A = readIN("lobster.in", "../Example_small/")

This graph is same as:

A = sparse([1,1,1,2,2,3,3,4,2,3,4,3,4,4,5,5], [2,3,4,3,4,4,5,5,1,1,1,2,2,3,3,4], ones(Float64, 16), 5, 5)

Run GADS under graph A with $R = [1,2], x (\omega_{12}) = 1, y (\omega_{24}) = -1$ (see Section 1.4):

R = [1,2]
x = 1
y = -1
# 7 Numbers in weight corresponds to the Weight Configuration with 10 edge weights as Definition 1 (in paper)
# minus 3 edge weights excluded by Definition 4-C2 (edge weights of edges between V_3 and V_4).
# Note: GADS algorithm only guarantee to work on 0 <= x <= 2, y <= 0 and all other weights same as below.
weight = [2,x,0,0,0,0,y]  # Weight Configuration Ω
ds = DoSolveLocalADS(SOLVER_LP_ADSS, A, R, false, false, DEFAULT_LP_SOLVER, weight)
ds.alpha_star  # The Local Densest Graph of G under weight configuration Ω and seed set R
ds.source_nodes  # The local density of this subgraph

flowchart LR
    C20["Download Datasets for Case Study"]
    style C20 stroke-dasharray: 5 5
Loading

Here for dataset used in case study, including:

  • DBLP graph data csdblp.in
  • Test seed sets csdblp.anchor
  • Author names corresponding to ID ent.author

Choose "Download all", extract the content to the root folder of the repository so it would look like

.
├── src/
├── ...
├── AnchorNodes/
│   ├── Baseline/
│   │   ├── csdblp.anchor
├── CaseStudy/
│   ├── DBLP/
│   │   ├── Raw
│   │   │   ├── ent.author
├── Example_SCC/
│   ├── csdblp.in

flowchart LR
    C2["Replicate Case Study"]
Loading

To replicate the query demonstrated in A.1 Case Study:

include("LP_algorithm.jl")
include("CS_DBLP.jl")  # Loading graph and author names, takes time. Variable B would be the dblp graph.
queryID = 8  # The one used in Case Study
dataName = "csdblp"
R = readAnchors(dataName, "Baseline")[queryID]
x = 2
y = 0
weight = [2,x,0,0,0,0,y]
resultSet = DoSolveLocalADS(SOLVER_LP_ADSS, B, R, false, false, DEFAULT_LP_SOLVER, weight).source_nodes  # List of ID of authors of S_{2,0} in Case Study
allNames[resultSet]  # Names of authors of S_{2,0} in Case Study

flowchart LR
    C3["Experiment"]
Loading

Not in the original paper, but you can perform bulk evaluation with a set of seed nodes.

In this example, we use the same dataset for Case Study to show how to perform the experiment.

include("LP_compare_test.jl")
data_name = "csdblp"
# Defines to find seed sets. "Baseline" is default.
# Say your data graph file is .\Example_SCC\csdblp.in
# Then the seed sets file should be prepared as .\AnchorNodes\Baseline\csdblp.anchor
seedset_folder="Baseline"
# For solvers:
# Set the 1st entry to true to evaluate the flow network algorithm (FNLA) in paper "Anchored Densest Subgraph".
# Set the 2nd entry to true to evaluate the linear programming based algorithm (LPLAS) in this paper with edge weights = weight.
# Note that weight only affects LPLAS, as FNLA only works on one specific edge weights.
solvers = [false, true]
x = 1
y = -1
weight = [2,x,0,0,0,0,y]
test_name = "myTest"  # Change output folder
seedset_size = 100  # Evaluate first seedset_size many seed sets of the aforementioned seed sets file. 0 means all in seed sets file.
ProcessAndOutputAlgorithms(data_name, seedset_folder, solvers, weight, test_name, seedset_size)

The output files would appear under .\LPCompResults\. For example, the above test would produce these 2 files:

  • csdblp-LPLAS-myRun.lpcompsets: result set of each evaluation
  • csdblp-LPLAS-myRun.lpcompstats: stats:
    • alpha: max local density
    • ext_time: total time taken
    • int_time: For LPLAS, (sum of) time taken reported by LP solver only. For FNLA equals to ext_time
    • lnsize: Number of nodes in the last working graph (of the local algorithm)
    • lmsize: Number of edges in the last working graph
    • iters: number of iterative expansions
    • ssize: cardinality of the result set

flowchart LR
    D["Accepted input graph file format by Utils_io.readIN()"]
Loading

This algorithm can work on other data graphs you downloaded, for example, uk2007 (large!).

All our experimental data graphs can be found at SNAP, KONECT and Network Repository.

In our experiment, we preprocessed all the data graphs we used so that they can be loaded by readIN() (see above for syntax):

  • The first line contains two integers - the number of vertices (n) and the number of edges (m), respectively.
  • For the next m lines, each line contains two integers representing an (unweighted, undirected) edge; all vertices are 1-indexed.
  • Any contents after these m lines will not be processed; Any line starts with # and % is ignored.

If it does not meet the above conditions, you may need to preprocess the raw data first.

Example of a legitimate .in file:

% "Lobster" toy graph
# 4-clique:
5 8  
1 2  
1 3  
1 4  
2 3  
2 4  
3 4  
# + An extra extruding triangle:
3 5  
4 5  

This line is also a comment.

Alternatively, a readRaw() does the same as readIN() but assumes the file does not have a header line for # vertices and # edges.

Instead, you pass the number of vertices and the number of edges as the parameters:

A = readRaw("zebra.txt", 27, 111, "../Example_raw")

flowchart LR
    Acknowledgements
Loading

This code is a fork of HypergraphFlowClustering by Nate Veldt. We are grateful for [Nate Veldt]'s contributions, such as:

  • maxflow.jl with modifications.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Julia 92.1%
  • Gnuplot 7.9%