%!TEX root = main.tex
\chapter{Introduction}\label{chap:intro}
\section{Background}\label{sect:background}
Depth estimation from stereo image pairs is the task of inferring the
depth of each pixel, essentially adding a third dimension to a
two-dimensional image. Many applications can benefit from this
information; one of the earliest uses was in the field of
photogrammetry, for automatically constructing topographic elevation
maps from aerial images. In robotics, depth information is vital for
navigation and manipulation. 3D reconstruction from images enables
view interpolation and image-based rendering, allowing a user to
freely choose a synthetic viewpoint other than the views straight out
of the cameras.

Figure \ref{fig:applications-of-depth-information}, borrowed from
Richard Szeliski's excellent book \textit{Computer Vision: Algorithms
and Applications}~\cite{computer-vision-book}, shows some of the more
recent applications researchers have built on top of depth
information.
\begin{figure}
\centering
\begin{subfigure}[b]{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{images/input.png}
\caption{}
\end{subfigure}
~
\begin{subfigure}[b]{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{images/computed-depth.png}
\caption{}
\end{subfigure}
~
\begin{subfigure}[b]{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{images/synthesized-view.png}
\caption{}
\end{subfigure}
% newline
\begin{subfigure}[b]{0.24\textwidth}
\centering
\includegraphics[width=\textwidth]{images/three-view.png}
\caption{}
\end{subfigure}
~
\begin{subfigure}[b]{0.2\textwidth}
\centering
\includegraphics[width=\textwidth]{images/input-face.png}
\caption{}
\end{subfigure}
~
\begin{subfigure}[b]{0.2\textwidth}
\centering
\includegraphics[width=\textwidth]{images/synthesized-face.png}
\caption{}
\end{subfigure}
~
\begin{subfigure}[b]{0.2\textwidth}
\centering
\includegraphics[width=\textwidth]{images/z-key.png}
\caption{}
\end{subfigure}
% newline
\begin{subfigure}[b]{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{images/3d-input-left.png}
\caption{}
\end{subfigure}
~
\begin{subfigure}[b]{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{images/3d-input-right.png}
\caption{}
\end{subfigure}
~
\begin{subfigure}[b]{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{images/3d-reconstruction.png}
\caption{}
\end{subfigure}
\caption{Applications of stereo vision: (a) input image, (b)
computed depth map, and (c) new view generation from multi-view
stereo \cite{matthies-kanade-szeliski}; (d) view morphing between
two images \cite{seitz-dyer}; (e-f) 3D face modeling (images
courtesy of Frederic Devernay); (g) z-keying live and
computer-generated imagery \cite{kanade-yoshida-oda}; (h-j)
building 3D surface models from multiple video streams in
Virtualized Reality \cite{kanade-rander-narayanan}}
\label{fig:applications-of-depth-information}
\end{figure}
\section{Problem Statement}\label{sect:prob-statement}
There is a wealth of algorithms for disparity calculation, each with
its own characteristics: some are fast but coarse, some produce good
quality but are slow, and some handle object boundaries well but
struggle in large textureless areas. These algorithms have been
researched heavily for decades, but were until recently severely
limited by the processing power of contemporary computers. With the
advent of cheap parallel computing devices, in the form of consumer
GPUs and multi-core CPUs, and with frameworks such as OpenCL and CUDA
maturing, super-computer-like number crunching has never been as fast
and easy to do on home computers.

The goal of this thesis is to provide an implementation of the
algorithm most likely to achieve real-time estimation on
high-definition input.
\section{Main Contributions}\label{sect:contributions}
The main contribution of this thesis is a working OpenCL
implementation for GPUs able to run the depth estimation pipeline in
real time on high-definition images. The evolution from a simple CPU
implementation to the final real-time GPU implementation is documented
and analyzed, including many of the applied GPU architecture
optimization approaches.

The secondary contribution is the \textit{compute mask} technique.
Sequential video frames contain a lot of redundant information. This
is especially true for statically mounted cameras filming a scene,
e.g.\ a theater stage. In such a scenario, the only moving objects in
the frames will often be just an actor or two, making recalculation of
the background redundant since it will yield the same values as in the
previous frame. The compute mask is a binary map indicating, for each
pixel, whether its disparity needs to be calculated or can be copied
from the disparity map of the previous frame.
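The following sketch illustrates the idea in plain C; the names
(\texttt{mask}, \texttt{prev\_disp}, \texttt{match\_pixel}) are
hypothetical, and this is only a sketch of the concept, not the OpenCL
implementation developed in this thesis:

\begin{verbatim}
/* Illustrative sketch of the compute mask idea (hypothetical names,
 * not the actual OpenCL kernel): pixels flagged in the mask are
 * re-matched, static pixels reuse the previous frame's disparity. */
#include <stdint.h>
#include <stddef.h>

typedef uint8_t (*disparity_fn)(const uint8_t *left,
                                const uint8_t *right,
                                int width, int x, int y);

void disparity_with_mask(const uint8_t *left, const uint8_t *right,
                         const uint8_t *mask,      /* 1 = recompute  */
                         const uint8_t *prev_disp, /* previous frame */
                         uint8_t *disp, int width, int height,
                         disparity_fn match_pixel)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            size_t i = (size_t)y * width + x;
            /* Reuse the old disparity for unchanged pixels instead
             * of running the expensive matching step again. */
            disp[i] = mask[i]
                ? match_pixel(left, right, width, x, y)
                : prev_disp[i];
        }
}
\end{verbatim}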
\section{Limitations}\label{sect:limitations}
\subsubsection{Discretized disparity space}
The implementation produces 8-bit, single-channel gray-scale disparity
maps of the same dimensions as the input, which limits the number of
disparity levels to 256. This level of quantization may be fine for
some applications, such as robotic navigation, but can lead to
unappealing results for others, such as image-based view synthesis.
\subsubsection{Camera calibration and stereo rectification}
Input images must be stereo rectified. For a block matching algorithm
to be as effective as possible, it must be able to assume that
corresponding points lie along the same horizontal scan-line
(x-axis) in both images. Camera calibration and stereo rectification
are such well-understood topics in computer vision that most stereo
correspondence algorithms assume the cameras are calibrated and the
inputs are rectified.
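To illustrate why this assumption matters, the following minimal C
sketch (an assumed illustration, not the implementation developed in
this thesis) performs a brute-force sum-of-absolute-differences block
match in which the matching window is shifted along the x-axis only;
the constants are arbitrary and border handling is omitted for
brevity:

\begin{verbatim}
/* Minimal SAD block matcher that searches along the horizontal
 * scan-line only; constants and names are illustrative. */
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

#define WIN      3   /* half window: (2*WIN+1)^2 pixels per block */
#define MAX_DISP 64  /* illustrative disparity search range       */

/* Assumes the window and the shifted window stay inside the image;
 * border handling is omitted for brevity. */
uint8_t match_pixel(const uint8_t *left, const uint8_t *right,
                    int width, int x, int y)
{
    int best_d = 0;
    unsigned best_cost = UINT_MAX;
    for (int d = 0; d <= MAX_DISP; ++d) {        /* shift in x only */
        unsigned cost = 0;
        for (int dy = -WIN; dy <= WIN; ++dy)
            for (int dx = -WIN; dx <= WIN; ++dx) {
                int l = left [(y + dy) * width + (x + dx)];
                int r = right[(y + dy) * width + (x + dx - d)];
                cost += (unsigned)abs(l - r);    /* absolute diff */
            }
        if (cost < best_cost) { best_cost = cost; best_d = d; }
    }
    return (uint8_t)best_d;
}
\end{verbatim}

Without rectification, corresponding points could lie anywhere along
an oblique epipolar line, and this simple one-dimensional search
would no longer find them.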
\begin{figure}
\begin{subfigure}[b]{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{images/rectification-example.png}
\caption{}
\end{subfigure}
~
\begin{subfigure}[b]{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{images/rectification-example-2.png}
\caption{}
\end{subfigure}
\begin{subfigure}[b]{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{images/rectification-example-3.png}
\caption{}
\end{subfigure}
~
\begin{subfigure}[b]{0.48\textwidth}
\centering
\includegraphics[width=\textwidth]{images/rectification-example-4.png}
\caption{}
\end{subfigure}
\caption{An example showing various stages of Loop and
    Zhang's~\cite{loop-zhang} proposed rectification algorithm. (a)
    Original image pair overlaid with several epipolar lines; (b)
    image pair transformed so that the epipolar lines are parallel to
    each other in each image; (c) rectified image pair; (d) final
    shearing transformation}
\end{figure}
To convert disparity values to depth values, the intrinsic and
extrinsic matrices of the cameras that took the input images must be
provided. More specifically, the baseline (the distance between the
two cameras) and the lens's focal length are needed in the conversion
formula, as will be explained in chapter
\ref{chap:depthestimation_theory}.
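For reference, the standard relation for a rectified stereo pair
expresses the depth $Z$ of a point in terms of the baseline $B$, the
focal length $f$ (expressed in pixels), and the disparity $d$:
\[
  Z = \frac{f B}{d}.
\]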
\section{Outline}
Chapter \ref{chap:depthestimation_theory} presents the core ideas of
stereo depth estimation in computer vision, as well as related work in
the field. The OpenCL framework is presented in chapter
\ref{chap:arch}, along with some optimization strategies for GPU
devices. Chapter \ref{chap:impl} presents the code that has been
implemented. It starts with the most basic CPU depth estimator
implementation, followed by a port to OpenCL that is as direct as
possible. From there, various optimizations are applied, followed
finally by the refinement kernels.

Lastly, chapter \ref{chap:eval} evaluates the results. It shows the
improvements most of the optimizations bring, as well as the quality
of the depth maps produced by the different methods presented in
chapter \ref{chap:depthestimation_theory}.