Add the LightGlue matcher (#285)
Support our newest LightGlue matcher: https://github.com/cvg/LightGlue

---------

Co-authored-by: Paul-Edouard Sarlin <[email protected]>
Phil26AT and sarlinpe authored Jul 11, 2023
1 parent 61e0cd0 commit 9fdab1e
Showing 6 changed files with 137 additions and 8,301 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -1,10 +1,10 @@
# hloc - the hierarchical localization toolbox

-This is `hloc`, a modular toolbox for state-of-the-art 6-DoF visual localization. It implements [Hierarchical Localization](https://arxiv.org/abs/1812.03506), leveraging image retrieval and feature matching, and is fast, accurate, and scalable. This codebase won the indoor/outdoor localization challenges at [CVPR 2020](https://sites.google.com/view/vislocslamcvpr2020/home) and [ECCV 2020](https://sites.google.com/view/ltvl2020/), in combination with [SuperGlue](https://psarlin.com/superglue/), our graph neural network for feature matching.
+This is `hloc`, a modular toolbox for state-of-the-art 6-DoF visual localization. It implements [Hierarchical Localization](https://arxiv.org/abs/1812.03506), leveraging image retrieval and feature matching, and is fast, accurate, and scalable. This codebase combines and makes easily accessible years of research on image matching and Structure-from-Motion.

With `hloc`, you can:

-- Reproduce [our CVPR 2020 winning results](https://www.visuallocalization.net/workshop/cvpr/2020/) on outdoor (Aachen) and indoor (InLoc) datasets
+- Reproduce state-of-the-art results on multiple indoor and outdoor visual localization benchmarks
- Run Structure-from-Motion with SuperPoint+SuperGlue to localize with your own datasets
- Evaluate your own local features or image retrieval for visual localization
- Implement new localization pipelines and debug them easily 🔥
@@ -43,13 +43,13 @@ jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root

The toolbox is composed of scripts, which roughly perform the following steps (see the code sketch after the list):

-1. Extract SuperPoint local features for all database and query images
+1. Extract local features, like [SuperPoint](https://arxiv.org/abs/1712.07629) or [DISK](https://arxiv.org/abs/2006.13566), for all database and query images
2. Build a reference 3D SfM model
   1. Find covisible database images, with retrieval or a prior SfM model
-   2. Match these database pairs with SuperGlue
+   2. Match these database pairs with [SuperGlue](https://psarlin.com/superglue/) or the faster [LightGlue](https://github.com/cvg/LightGlue)
   3. Triangulate a new SfM model with COLMAP
3. Find database images relevant to each query, using retrieval
-4. Match the query images with SuperGlue
+4. Match the query images
5. Run the localization
6. Visualize and debug

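A condensed sketch of these steps with the `hloc` Python API, adapted from the pipeline notebooks; the dataset paths and the `num_matched` value are illustrative:

```python
from pathlib import Path

from hloc import (extract_features, match_features, pairs_from_retrieval,
                  reconstruction)

images = Path('datasets/mydataset/images/')  # illustrative dataset layout
outputs = Path('outputs/mydataset/')
sfm_pairs = outputs / 'pairs-netvlad.txt'

# 1. Extract local features (SuperPoint) and global descriptors (NetVLAD).
retrieval_path = extract_features.main(
    extract_features.confs['netvlad'], images, outputs)
feature_conf = extract_features.confs['superpoint_aachen']
feature_path = extract_features.main(feature_conf, images, outputs)

# 2. Build the reference SfM model: find covisible pairs with retrieval,
# match them with LightGlue, and reconstruct with COLMAP.
pairs_from_retrieval.main(retrieval_path, sfm_pairs, num_matched=5)
match_path = match_features.main(
    match_features.confs['superpoint+lightglue'], sfm_pairs,
    feature_conf['output'], outputs)
model = reconstruction.main(
    outputs / 'sfm', images, sfm_pairs, feature_path, match_path)
```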
Expand Down Expand Up @@ -93,8 +93,8 @@ We show in [`pipeline_SfM.ipynb`](https://nbviewer.jupyter.org/github/cvg/Hierar

## Results

-- Supported local feature extractors: [SuperPoint](https://arxiv.org/abs/1712.07629), [D2-Net](https://arxiv.org/abs/1905.03561), [SIFT](https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf), and [R2D2](https://arxiv.org/abs/1906.06195).
-- Supported feature matchers: [SuperGlue](https://arxiv.org/abs/1911.11763) and nearest neighbor search with ratio test, distance test, and/or mutual check.
+- Supported local feature extractors: [SuperPoint](https://arxiv.org/abs/1712.07629), [DISK](https://arxiv.org/abs/2006.13566), [D2-Net](https://arxiv.org/abs/1905.03561), [SIFT](https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf), and [R2D2](https://arxiv.org/abs/1906.06195).
+- Supported feature matchers: [SuperGlue](https://arxiv.org/abs/1911.11763), its faster follow-up [LightGlue](https://github.com/cvg/LightGlue), and nearest neighbor search with ratio test, distance test, and/or mutual check. hloc also supports dense matching with [LoFTR](https://github.com/zju3dv/LoFTR).
- Supported image retrieval: [NetVLAD](https://arxiv.org/abs/1511.07247), [AP-GeM/DIR](https://github.com/naver/deep-image-retrieval), [OpenIBL](https://github.com/yxgeee/OpenIBL), and [CosPlace](https://github.com/gmberton/CosPlace).

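Each supported method is exposed as a configuration dictionary, so the available options can be inspected directly (a sketch; the exact conf names may differ across versions):

```python
from hloc import extract_features, match_features

print(list(extract_features.confs))  # e.g. 'superpoint_aachen', 'disk', 'netvlad', ...
print(list(match_features.confs))    # e.g. 'superpoint+lightglue', 'disk+lightglue', 'NN-ratio', ...
```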
Using NetVLAD for retrieval, we obtain the following best results:
8,346 changes: 67 additions & 8,279 deletions demo.ipynb

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions hloc/match_features.py
@@ -21,6 +21,20 @@
    - model: the model configuration, as passed to a feature matcher.
'''
confs = {
    'superpoint+lightglue': {
        'output': 'matches-superpoint-lightglue',
        'model': {
            'name': 'lightglue',
            'features': 'superpoint',
        },
    },
    'disk+lightglue': {
        'output': 'matches-disk-lightglue',
        'model': {
            'name': 'lightglue',
            'features': 'disk',
        },
    },
    'superglue': {
        'output': 'matches-superglue',
        'model': {
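The `features` entry names the feature type the LightGlue weights were trained for, which is why `superpoint+lightglue` and `disk+lightglue` differ only in that field. A minimal usage sketch for the DISK variant, assuming `images`, `outputs`, and `sfm_pairs` as in the pipeline sketch above:

```python
from hloc import extract_features, match_features

feature_conf = extract_features.confs['disk']
matcher_conf = match_features.confs['disk+lightglue']

# Extract DISK features, then match the image pairs with LightGlue.
feature_path = extract_features.main(feature_conf, images, outputs)
match_path = match_features.main(
    matcher_conf, sfm_pairs, feature_conf['output'], outputs)
```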
25 changes: 25 additions & 0 deletions hloc/matchers/lightglue.py
@@ -0,0 +1,25 @@
from lightglue import LightGlue as LightGlue_

from ..utils.base_model import BaseModel

class LightGlue(BaseModel):
    default_conf = {
        'features': 'superpoint',
        'depth_confidence': 0.95,
        'width_confidence': 0.99,
    }
    required_inputs = [
        'image0', 'keypoints0', 'descriptors0',
        'image1', 'keypoints1', 'descriptors1',
    ]

    def _init(self, conf):
        # LightGlue is instantiated for a given feature type ('superpoint',
        # 'disk', ...); the remaining conf entries are passed through.
        self.net = LightGlue_(conf.pop('features'), **conf)

    def _forward(self, data):
        # hloc stores descriptors as (dim, num_keypoints); LightGlue expects
        # (num_keypoints, dim), so swap the last two dimensions.
        data['descriptors0'] = data['descriptors0'].transpose(-1, -2)
        data['descriptors1'] = data['descriptors1'].transpose(-1, -2)

        # Regroup the flat dict into the two per-image dicts that LightGlue
        # expects, stripping the trailing '0'/'1' from each key.
        return self.net({
            'image0': {k[:-1]: v for k, v in data.items() if k[-1] == '0'},
            'image1': {k[:-1]: v for k, v in data.items() if k[-1] == '1'}
        })
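A toy smoke test of this wrapper with random SuperPoint-like inputs; the tensor shapes are assumed from hloc's feature storage (descriptors as `(dim, num_keypoints)`), and the first call downloads pretrained LightGlue weights:

```python
import torch

from hloc.matchers.lightglue import LightGlue

matcher = LightGlue({'features': 'superpoint'}).eval()
data = {
    'image0': torch.rand(1, 1, 480, 640),  # only the shape is used
    'keypoints0': torch.rand(1, 512, 2) * torch.tensor([640., 480.]),
    'descriptors0': torch.rand(1, 256, 512),  # (batch, dim, num_keypoints)
    'image1': torch.rand(1, 1, 480, 640),
    'keypoints1': torch.rand(1, 512, 2) * torch.tensor([640., 480.]),
    'descriptors1': torch.rand(1, 256, 512),
}
with torch.no_grad():
    pred = matcher(data)
print(pred['matches0'].shape)  # (1, 512): index of the match in image 1, or -1
```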
38 changes: 23 additions & 15 deletions hloc/utils/viz_3d.py
@@ -80,7 +80,9 @@ def plot_camera(
        color: str = 'rgb(0, 0, 255)',
        name: Optional[str] = None,
        legendgroup: Optional[str] = None,
-        size: float = 1.0):
+        fill: bool = False,
+        size: float = 1.0,
+        text: Optional[str] = None):
"""Plot a camera frustum from pose and intrinsic matrix."""
W, H = K[0, 2]*2, K[1, 2]*2
corners = np.array([[0, 0], [W, 0], [W, H], [0, H], [0, 0]])
@@ -92,32 +94,31 @@ def plot_camera(
        scale = 1.0
    corners = to_homogeneous(corners) @ np.linalg.inv(K).T
    corners = (corners / 2 * scale) @ R.T + t

-    x, y, z = corners.T
-    rect = go.Scatter3d(
-        x=x, y=y, z=z, line=dict(color=color), legendgroup=legendgroup,
-        name=name, marker=dict(size=0.0001), showlegend=False)
-    fig.add_trace(rect)
+    legendgroup = legendgroup if legendgroup is not None else name

    x, y, z = np.concatenate(([t], corners)).T
    i = [0, 0, 0, 0]
    j = [1, 2, 3, 4]
    k = [2, 3, 4, 1]

-    pyramid = go.Mesh3d(
-        x=x, y=y, z=z, color=color, i=i, j=j, k=k,
-        legendgroup=legendgroup, name=name, showlegend=False)
-    fig.add_trace(pyramid)
+    if fill:
+        pyramid = go.Mesh3d(
+            x=x, y=y, z=z, color=color, i=i, j=j, k=k,
+            legendgroup=legendgroup, name=name, showlegend=False,
+            hovertemplate=text.replace('\n', '<br>') if text else None)
+        fig.add_trace(pyramid)

    triangles = np.vstack((i, j, k)).T
    vertices = np.concatenate(([t], corners))
    tri_points = np.array([
        vertices[i] for i in triangles.reshape(-1)
    ])

    x, y, z = tri_points.T

    pyramid = go.Scatter3d(
        x=x, y=y, z=z, mode='lines', legendgroup=legendgroup,
-        name=name, line=dict(color=color, width=1), showlegend=False)
+        name=name, line=dict(color=color, width=1), showlegend=False,
+        hovertemplate=text.replace('\n', '<br>') if text else None)
    fig.add_trace(pyramid)
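The frustum helper can also be called on its own; a small sketch with made-up intrinsics and an identity pose (all values illustrative):

```python
import numpy as np

from hloc.utils import viz_3d

fig = viz_3d.init_figure()
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])  # 640x480 pinhole
R, t = np.eye(3), np.zeros(3)  # world-from-camera rotation, camera center
viz_3d.plot_camera(fig, R, t, K, color='rgb(0, 255, 0)', name='cam0',
                   fill=True, text='cam0\nillustrative')
fig.show()
```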


@@ -134,6 +135,7 @@ def plot_camera_colmap(
        image.projection_center(),
        camera.calibration_matrix(),
        name=name or str(image.image_id),
        text=image.summary(),
        **kwargs)


@@ -156,16 +158,22 @@
        min_track_length: int = 2,
        points: bool = True,
        cameras: bool = True,
+        points_rgb: bool = True,
        cs: float = 1.0):
    # Filter outliers
    bbs = rec.compute_bounding_box(0.001, 0.999)
    # Filter points, use original reproj error here
-    xyzs = [p3D.xyz for _, p3D in rec.points3D.items() if (
+    p3Ds = [p3D for _, p3D in rec.points3D.items() if (
        (p3D.xyz >= bbs[0]).all() and
        (p3D.xyz <= bbs[1]).all() and
        p3D.error <= max_reproj_error and
        p3D.track.length() >= min_track_length)]
+    xyzs = [p3D.xyz for p3D in p3Ds]
+    if points_rgb:
+        pcolor = [p3D.color for p3D in p3Ds]
+    else:
+        pcolor = color
    if points:
-        plot_points(fig, np.array(xyzs), color=color, ps=1, name=name)
+        plot_points(fig, np.array(xyzs), color=pcolor, ps=1, name=name)
    if cameras:
        plot_cameras(fig, rec, color=color, legendgroup=name, size=cs)
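With the new `points_rgb` flag, the point cloud takes the original COLMAP point colors instead of a single trace color; a usage sketch mirroring the demo notebook (the reconstruction path is illustrative):

```python
import pycolmap

from hloc.utils import viz_3d

rec = pycolmap.Reconstruction('outputs/mydataset/sfm')
fig = viz_3d.init_figure()
viz_3d.plot_reconstruction(fig, rec, color='rgba(255, 0, 0, 0.5)',
                           name='mapping', points_rgb=True)
fig.show()
```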
1 change: 1 addition & 0 deletions requirements.txt
@@ -10,3 +10,4 @@ h5py
pycolmap>=0.3.0
kornia>=0.6.7
gdown
git+https://github.com/cvg/LightGlue
