Add the LightGlue matcher (#285)
Support our newest LightGlue matcher: https://github.com/cvg/LightGlue

---------

Co-authored-by: Paul-Edouard Sarlin <[email protected]>
Phil26AT and sarlinpe authored Jul 11, 2023
1 parent 61e0cd0 commit 9fdab1e
Showing 6 changed files with 137 additions and 8,301 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -1,10 +1,10 @@
# hloc - the hierarchical localization toolbox

-This is `hloc`, a modular toolbox for state-of-the-art 6-DoF visual localization. It implements [Hierarchical Localization](https://arxiv.org/abs/1812.03506), leveraging image retrieval and feature matching, and is fast, accurate, and scalable. This codebase won the indoor/outdoor localization challenges at [CVPR 2020](https://sites.google.com/view/vislocslamcvpr2020/home) and [ECCV 2020](https://sites.google.com/view/ltvl2020/), in combination with [SuperGlue](https://psarlin.com/superglue/), our graph neural network for feature matching.
+This is `hloc`, a modular toolbox for state-of-the-art 6-DoF visual localization. It implements [Hierarchical Localization](https://arxiv.org/abs/1812.03506), leveraging image retrieval and feature matching, and is fast, accurate, and scalable. This codebase combines and makes easily accessible years of research on image matching and Structure-from-Motion.

With `hloc`, you can:

-- Reproduce [our CVPR 2020 winning results](https://www.visuallocalization.net/workshop/cvpr/2020/) on outdoor (Aachen) and indoor (InLoc) datasets
+- Reproduce state-of-the-art results on multiple indoor and outdoor visual localization benchmarks
- Run Structure-from-Motion with SuperPoint+SuperGlue to localize with your own datasets
- Evaluate your own local features or image retrieval for visual localization
- Implement new localization pipelines and debug them easily 🔥
@@ -43,13 +43,13 @@ jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root

The toolbox is composed of scripts, which roughly perform the following steps (see the code sketch after the list):

-1. Extract SuperPoint local features for all database and query images
+1. Extract local features, like [SuperPoint](https://arxiv.org/abs/1712.07629) or [DISK](https://arxiv.org/abs/2006.13566), for all database and query images
2. Build a reference 3D SfM model
   1. Find covisible database images, with retrieval or a prior SfM model
-   2. Match these database pairs with SuperGlue
+   2. Match these database pairs with [SuperGlue](https://psarlin.com/superglue/) or the faster [LightGlue](https://github.com/cvg/LightGlue)
   3. Triangulate a new SfM model with COLMAP
3. Find database images relevant to each query, using retrieval
-4. Match the query images with SuperGlue
+4. Match the query images
5. Run the localization
6. Visualize and debug

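A condensed sketch of these steps with the `hloc` Python API, adapted from the pipeline notebooks; the dataset paths and the `num_matched` value are illustrative:

```python
from pathlib import Path

from hloc import (extract_features, match_features, pairs_from_retrieval,
                  reconstruction)

images = Path('datasets/mydataset/images/')  # illustrative dataset layout
outputs = Path('outputs/mydataset/')
sfm_pairs = outputs / 'pairs-netvlad.txt'

# 1. Extract local features (SuperPoint) and global descriptors (NetVLAD).
retrieval_path = extract_features.main(
    extract_features.confs['netvlad'], images, outputs)
feature_conf = extract_features.confs['superpoint_aachen']
feature_path = extract_features.main(feature_conf, images, outputs)

# 2. Build the reference SfM model: find covisible pairs with retrieval,
# match them with LightGlue, and reconstruct with COLMAP.
pairs_from_retrieval.main(retrieval_path, sfm_pairs, num_matched=5)
match_path = match_features.main(
    match_features.confs['superpoint+lightglue'], sfm_pairs,
    feature_conf['output'], outputs)
model = reconstruction.main(
    outputs / 'sfm', images, sfm_pairs, feature_path, match_path)
```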
Expand Down Expand Up @@ -93,8 +93,8 @@ We show in [`pipeline_SfM.ipynb`](https://nbviewer.jupyter.org/github/cvg/Hierar

## Results

-- Supported local feature extractors: [SuperPoint](https://arxiv.org/abs/1712.07629), [D2-Net](https://arxiv.org/abs/1905.03561), [SIFT](https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf), and [R2D2](https://arxiv.org/abs/1906.06195).
-- Supported feature matchers: [SuperGlue](https://arxiv.org/abs/1911.11763) and nearest neighbor search with ratio test, distance test, and/or mutual check.
+- Supported local feature extractors: [SuperPoint](https://arxiv.org/abs/1712.07629), [DISK](https://arxiv.org/abs/2006.13566), [D2-Net](https://arxiv.org/abs/1905.03561), [SIFT](https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf), and [R2D2](https://arxiv.org/abs/1906.06195).
+- Supported feature matchers: [SuperGlue](https://arxiv.org/abs/1911.11763), its faster follow-up [LightGlue](https://github.com/cvg/LightGlue), and nearest neighbor search with ratio test, distance test, and/or mutual check. hloc also supports dense matching with [LoFTR](https://github.com/zju3dv/LoFTR).
- Supported image retrieval: [NetVLAD](https://arxiv.org/abs/1511.07247), [AP-GeM/DIR](https://github.com/naver/deep-image-retrieval), [OpenIBL](https://github.com/yxgeee/OpenIBL), and [CosPlace](https://github.com/gmberton/CosPlace).

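Each supported method is exposed as a configuration dictionary, so the available options can be inspected directly (a sketch; the exact conf names may differ across versions):

```python
from hloc import extract_features, match_features

print(list(extract_features.confs))  # e.g. 'superpoint_aachen', 'disk', 'netvlad', ...
print(list(match_features.confs))    # e.g. 'superpoint+lightglue', 'disk+lightglue', 'NN-ratio', ...
```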
Using NetVLAD for retrieval, we obtain the following best results:
8,346 changes: 67 additions & 8,279 deletions demo.ipynb

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions hloc/match_features.py
@@ -21,6 +21,20 @@
    - model: the model configuration, as passed to a feature matcher.
'''
confs = {
    'superpoint+lightglue': {
        'output': 'matches-superpoint-lightglue',
        'model': {
            'name': 'lightglue',
            'features': 'superpoint',
        },
    },
    'disk+lightglue': {
        'output': 'matches-disk-lightglue',
        'model': {
            'name': 'lightglue',
            'features': 'disk',
        },
    },
    'superglue': {
        'output': 'matches-superglue',
        'model': {
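The `features` entry names the feature type the LightGlue weights were trained for, which is why `superpoint+lightglue` and `disk+lightglue` differ only in that field. A minimal usage sketch for the DISK variant, assuming `images`, `outputs`, and `sfm_pairs` as in the pipeline sketch above:

```python
from hloc import extract_features, match_features

feature_conf = extract_features.confs['disk']
matcher_conf = match_features.confs['disk+lightglue']

# Extract DISK features, then match the image pairs with LightGlue.
feature_path = extract_features.main(feature_conf, images, outputs)
match_path = match_features.main(
    matcher_conf, sfm_pairs, feature_conf['output'], outputs)
```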
25 changes: 25 additions & 0 deletions hloc/matchers/lightglue.py
@@ -0,0 +1,25 @@
from lightglue import LightGlue as LightGlue_

from ..utils.base_model import BaseModel

class LightGlue(BaseModel):
    default_conf = {
        'features': 'superpoint',
        'depth_confidence': 0.95,
        'width_confidence': 0.99,
    }
    required_inputs = [
        'image0', 'keypoints0', 'descriptors0',
        'image1', 'keypoints1', 'descriptors1',
    ]

    def _init(self, conf):
        # LightGlue is instantiated for a given feature type ('superpoint',
        # 'disk', ...); the remaining conf entries are passed through.
        self.net = LightGlue_(conf.pop('features'), **conf)

    def _forward(self, data):
        # hloc stores descriptors as (dim, num_keypoints); LightGlue expects
        # (num_keypoints, dim), so swap the last two dimensions.
        data['descriptors0'] = data['descriptors0'].transpose(-1, -2)
        data['descriptors1'] = data['descriptors1'].transpose(-1, -2)

        # Regroup the flat dict into the two per-image dicts that LightGlue
        # expects, stripping the trailing '0'/'1' from each key.
        return self.net({
            'image0': {k[:-1]: v for k, v in data.items() if k[-1] == '0'},
            'image1': {k[:-1]: v for k, v in data.items() if k[-1] == '1'}
        })
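A toy smoke test of this wrapper with random SuperPoint-like inputs; the tensor shapes are assumed from hloc's feature storage (descriptors as `(dim, num_keypoints)`), and the first call downloads pretrained LightGlue weights:

```python
import torch

from hloc.matchers.lightglue import LightGlue

matcher = LightGlue({'features': 'superpoint'}).eval()
data = {
    'image0': torch.rand(1, 1, 480, 640),  # only the shape is used
    'keypoints0': torch.rand(1, 512, 2) * torch.tensor([640., 480.]),
    'descriptors0': torch.rand(1, 256, 512),  # (batch, dim, num_keypoints)
    'image1': torch.rand(1, 1, 480, 640),
    'keypoints1': torch.rand(1, 512, 2) * torch.tensor([640., 480.]),
    'descriptors1': torch.rand(1, 256, 512),
}
with torch.no_grad():
    pred = matcher(data)
print(pred['matches0'].shape)  # (1, 512): index of the match in image 1, or -1
```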
38 changes: 23 additions & 15 deletions hloc/utils/viz_3d.py
@@ -80,7 +80,9 @@ def plot_camera(
        color: str = 'rgb(0, 0, 255)',
        name: Optional[str] = None,
        legendgroup: Optional[str] = None,
-        size: float = 1.0):
+        fill: bool = False,
+        size: float = 1.0,
+        text: Optional[str] = None):
"""Plot a camera frustum from pose and intrinsic matrix."""
W, H = K[0, 2]*2, K[1, 2]*2
corners = np.array([[0, 0], [W, 0], [W, H], [0, H], [0, 0]])
@@ -92,32 +94,31 @@ def plot_camera(
        scale = 1.0
    corners = to_homogeneous(corners) @ np.linalg.inv(K).T
    corners = (corners / 2 * scale) @ R.T + t

-    x, y, z = corners.T
-    rect = go.Scatter3d(
-        x=x, y=y, z=z, line=dict(color=color), legendgroup=legendgroup,
-        name=name, marker=dict(size=0.0001), showlegend=False)
-    fig.add_trace(rect)
+    legendgroup = legendgroup if legendgroup is not None else name

    x, y, z = np.concatenate(([t], corners)).T
    i = [0, 0, 0, 0]
    j = [1, 2, 3, 4]
    k = [2, 3, 4, 1]

-    pyramid = go.Mesh3d(
-        x=x, y=y, z=z, color=color, i=i, j=j, k=k,
-        legendgroup=legendgroup, name=name, showlegend=False)
-    fig.add_trace(pyramid)
+    if fill:
+        pyramid = go.Mesh3d(
+            x=x, y=y, z=z, color=color, i=i, j=j, k=k,
+            legendgroup=legendgroup, name=name, showlegend=False,
+            hovertemplate=text.replace('\n', '<br>') if text else None)
+        fig.add_trace(pyramid)

    triangles = np.vstack((i, j, k)).T
    vertices = np.concatenate(([t], corners))
    tri_points = np.array([
        vertices[i] for i in triangles.reshape(-1)
    ])

    x, y, z = tri_points.T

    pyramid = go.Scatter3d(
        x=x, y=y, z=z, mode='lines', legendgroup=legendgroup,
-        name=name, line=dict(color=color, width=1), showlegend=False)
+        name=name, line=dict(color=color, width=1), showlegend=False,
+        hovertemplate=text.replace('\n', '<br>') if text else None)
    fig.add_trace(pyramid)
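The frustum helper can also be called on its own; a small sketch with made-up intrinsics and an identity pose (all values illustrative):

```python
import numpy as np

from hloc.utils import viz_3d

fig = viz_3d.init_figure()
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])  # 640x480 pinhole
R, t = np.eye(3), np.zeros(3)  # world-from-camera rotation, camera center
viz_3d.plot_camera(fig, R, t, K, color='rgb(0, 255, 0)', name='cam0',
                   fill=True, text='cam0\nillustrative')
fig.show()
```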


@@ -134,6 +135,7 @@ def plot_camera_colmap(
        image.projection_center(),
        camera.calibration_matrix(),
        name=name or str(image.image_id),
        text=image.summary(),
        **kwargs)


@@ -156,16 +158,22 @@
        min_track_length: int = 2,
        points: bool = True,
        cameras: bool = True,
+        points_rgb: bool = True,
        cs: float = 1.0):
    # Filter outliers
    bbs = rec.compute_bounding_box(0.001, 0.999)
    # Filter points, use original reproj error here
-    xyzs = [p3D.xyz for _, p3D in rec.points3D.items() if (
+    p3Ds = [p3D for _, p3D in rec.points3D.items() if (
        (p3D.xyz >= bbs[0]).all() and
        (p3D.xyz <= bbs[1]).all() and
        p3D.error <= max_reproj_error and
        p3D.track.length() >= min_track_length)]
+    xyzs = [p3D.xyz for p3D in p3Ds]
+    if points_rgb:
+        pcolor = [p3D.color for p3D in p3Ds]
+    else:
+        pcolor = color
    if points:
-        plot_points(fig, np.array(xyzs), color=color, ps=1, name=name)
+        plot_points(fig, np.array(xyzs), color=pcolor, ps=1, name=name)
    if cameras:
        plot_cameras(fig, rec, color=color, legendgroup=name, size=cs)
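With the new `points_rgb` flag, the point cloud takes the original COLMAP point colors instead of a single trace color; a usage sketch mirroring the demo notebook (the reconstruction path is illustrative):

```python
import pycolmap

from hloc.utils import viz_3d

rec = pycolmap.Reconstruction('outputs/mydataset/sfm')
fig = viz_3d.init_figure()
viz_3d.plot_reconstruction(fig, rec, color='rgba(255, 0, 0, 0.5)',
                           name='mapping', points_rgb=True)
fig.show()
```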
1 change: 1 addition & 0 deletions requirements.txt
@@ -10,3 +10,4 @@ h5py
pycolmap>=0.3.0
kornia>=0.6.7
gdown
git+https://github.com/cvg/LightGlue
