This repository has been archived by the owner on Jul 14, 2021. It is now read-only.

Mykel comments, cleaned up tex files
mossr committed May 1, 2021
1 parent 69dc11e commit e2226f3
Showing 10 changed files with 23 additions and 70 deletions.
36 changes: 0 additions & 36 deletions MSCS-TODO.txt

This file was deleted.

2 changes: 1 addition & 1 deletion algorithms/mcts-pw-algorithm-part2.tex
@@ -37,7 +37,7 @@
\State $\bar{a}^* \leftarrow \textproc{UpdateBestAction}(s,Q)$
\State \Return $R(p,e,d,\tau)$
\ElsIf {$d = \floor{d_\text{max}/2}$}
-\State $\bar{a} \leftarrow \bar{a}^*$ \algorithmiccomment{feed best action}
+\State $\bar{a} \leftarrow \bar{a}^*$ \algorithmiccomment{exploit the best action}
\Else
\State $\bar{a} \leftarrow \textproc{SampleAction}(s,Q)$
\EndIf
2 changes: 1 addition & 1 deletion algorithms/monte-carlo-algorithm.tex
@@ -7,7 +7,7 @@
\Function{DirectMonteCarlo$(s_0,n,d)$}{}
\For {$1 \to n$}
\State \textproc{Initialize}$(\bar{\mathcal{S}})$
-\State \textproc{Rollout}$(s_0, d)$ \algorithmiccomment{without action feeding}
+\State \textproc{Rollout}$(s_0, d)$ \algorithmiccomment{without exploiting the best action}
\EndFor
\EndFunction
\end{algorithmic}
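For illustration, the DirectMonteCarlo procedure above can be sketched as follows; `simulate` is a hypothetical stand-in for the Rollout call and its failure bookkeeping, not the thesis implementation.

```python
def direct_monte_carlo(simulate, s0, n, d):
    """Direct Monte Carlo baseline: n independent episodes of depth d from s0,
    collecting any failures found. No best-action exploitation is used."""
    failures = []
    for _ in range(n):
        trajectory, is_failure = simulate(s0, d)  # fresh random episode
        if is_failure:
            failures.append(trajectory)
    return failures
```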
5 changes: 0 additions & 5 deletions chapters/abstract.tex
@@ -5,22 +5,17 @@
The original cross-entropy method relies on a sufficient number of objective function calls to accurately estimate the optimal parameters of the proposal distribution and may become stuck in local minima.
The variants we introduce attempt to address these concerns; the primary idea is to use every sample to build a surrogate model, offloading computation from an expensive system under test.
To test our approach, we created a parameterized test objective function with many local minima and a single global minimum, where the test function can be adjusted to control the spread and distinction of the minima.
-% Experiments were run to stress the cross-entropy method variants and results indicate that the surrogate model-based approach reduces local minima convergence using the same number of function evaluations.
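As context for the local-minima concern described above, a minimal cross-entropy method loop for a one-dimensional minimization looks like the sketch below; the objective `S`, the Gaussian proposal, and all parameter values are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def cross_entropy_method(S, mu, sigma, n_samples=100, n_elite=10, n_iters=50):
    """Minimal cross-entropy method sketch for minimizing objective S.
    The Gaussian proposal is refit to the elite samples each iteration;
    with too few objective calls the estimates are noisy and the proposal
    can collapse onto a local minimum."""
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        x = rng.normal(mu, sigma, size=n_samples)             # sample proposal
        elite = x[np.argsort([S(xi) for xi in x])[:n_elite]]  # best samples
        mu, sigma = elite.mean(), elite.std() + 1e-12         # refit proposal
    return mu
```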

To find failure events and their likelihoods in computationally expensive open-loop systems, we propose a modification to the black-box stress testing approach called \textit{adaptive stress testing}.
This modification generalizes adaptive stress testing to be broadly applied to episodic systems, where a reward is only received at the end of an episode.
To test this approach, we analyze an aircraft trajectory predictor from a developmental commercial flight management system.
The intention of this work is to find likely failures and report them back to the developers so they can address and potentially resolve shortcomings of the system before deployment.
We use a modified Monte Carlo tree search algorithm with progressive widening as our adversarial reinforcement learner with the goal of finding potential problems otherwise not found by traditional requirements-based avionics testing.
-% Results indicate that our adaptive stress testing approach finds more failures with higher likelihoods relative to the baselines.

When validating a system that relies on a static validation dataset, one could exhaustively evaluate the entire dataset, yet that process may be computationally intractable, especially when validating minor modifications to the system under test.
To address this, we reformulate the problem to intelligently select candidate validation data points that we predict are likely to cause a failure, using knowledge of the system failures experienced so far.
We propose an adaptive black-box validation framework that will learn system weaknesses over time and exploit this knowledge to propose validation samples that will likely result in a failure.
To further reduce computational load, we use a low-dimensional encoded representation of inputs to train the adversarial failure classifier, which will select candidate failures to evaluate.
-% Experiments were run to test our approach against a random candidate selection process and we also compare against full knowledge of the true system failures.
-% We stress test a black-box neural network classifier trained on the MNIST dataset,
-% and results show that using our framework, the adversarial failure classifier selects failures about $3$ times more often than random.
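The candidate-selection idea described above can be sketched as follows; `encode` and `failure_classifier` are hypothetical stand-ins for the low-dimensional encoder and the adversarial failure classifier, not the thesis implementation.

```python
import numpy as np

def select_candidates(encode, failure_classifier, dataset, k):
    """Rank validation inputs by predicted failure probability and return
    the indices of the k most promising candidates to evaluate."""
    z = np.array([encode(x) for x in dataset])    # low-dimensional encodings
    p_fail = failure_classifier.predict_proba(z)  # predicted P(failure)
    return np.argsort(-p_fail)[:k]                # most likely failures first
```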

A motivating principle of this work is a commitment to open-source software.
We believe everyone benefits when we treat ideas for black-box validation as collaborative and non-competitive, resulting in a net increase in safety for all.
12 changes: 4 additions & 8 deletions chapters/cem_variants.tex
@@ -162,12 +162,9 @@ \subsection{Surrogate Models}
This means a GP can be sampled to generate random functions, which can then be fit to our given data $\mat{X}$.
A Gaussian process is parameterized by a mean function $\m(\mat{X})$ and kernel function $\mat{K}(\mat{X},\mat{X})$, which captures the relationship between data points as covariance values.
We denote a Gaussian process that produces estimates $\hat{\vec{y}}$ as:
-\begin{align*}
-\hat{\vec{y}} &\sim\mathcal{N}\left(\vec{m}(\mat{X}),\vec{K}(\mat{X},\mat{X})\right)\\
-&= \begin{bmatrix}
-\hat{S}(\vec{x}_1), \ldots, \hat{S}(\vec{x}_n)
-\end{bmatrix}
-\end{align*}
+\begin{equation*}
+\hat{\vec{y}} \sim\mathcal{N}\left(\vec{m}(\mat{X}),\vec{K}(\mat{X},\mat{X})\right)
+\end{equation*}
where
\begin{gather*}
\vec{m}(\mat{X}) = \begin{bmatrix} m(\vec{x}_1), \ldots, m(\vec{x}_n) \end{bmatrix}\\
@@ -179,7 +176,6 @@ \subsection{Surrogate Models}
\end{gather*}
Note that we use the zero-mean function $m(\vec{x}_i) = 0$, which is conventional.
For the kernel function $k(\vec{x}_i, \vec{x}_j)$, we use the squared exponential kernel with variance $\sigma^2$ and characteristic length-scale $\ell$, where larger $\ell$ values increase the correlation between successive data points, thus smoothing out the generated functions. The squared exponential kernel is defined as:
-% Isotropic Squared Exponential kernel (covariance): \exp(-\frac{r^2}{2\ell^2})
\begin{align*}
k(\vec{x},\vec{x}^\prime) = \sigma^2\exp\left(- \frac{(\vec{x} - \vec{x}^\prime)^\top(\vec{x} - \vec{x}^\prime)}{2\ell^2}\right)
\end{align*}
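A minimal sketch of drawing random functions from this zero-mean GP prior with the squared exponential kernel, assuming scalar inputs for simplicity (NumPy; all names and values are illustrative):

```python
import numpy as np

def squared_exponential(X, Xp, sigma2=1.0, ell=1.0):
    """k(x, x') = sigma^2 exp(-(x - x')^2 / (2 ell^2)) for scalar inputs."""
    return sigma2 * np.exp(-(X[:, None] - Xp[None, :]) ** 2 / (2 * ell**2))

X = np.linspace(0, 5, 50)
K = squared_exponential(X, X, sigma2=1.0, ell=1.0)  # larger ell => smoother draws
rng = np.random.default_rng(0)
# Draw three random functions from the GP prior (jitter for numerical stability).
samples = rng.multivariate_normal(np.zeros(len(X)), K + 1e-8 * np.eye(len(X)), size=3)
```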
@@ -529,7 +525,7 @@ \subsection{Results and Analysis} \label{sec:cem_results}
\small
\centering
\caption{\label{tab:cem_results} Experimental Results.}
-\begin{tabular}{cllll} % p{3cm}
+\begin{tabular}{cllll}
\toprule
\textbf{Exper.} & \textbf{Algorithm} & \textbf{Runtime} & $\bar{b}_v$ & $\bar{b}_d$\\
\midrule
8 changes: 4 additions & 4 deletions chapters/episodic_ast.tex
@@ -348,7 +348,7 @@ \section{Experiments}
\label{sec:ast_experiments}
Experiments were run to test the AST approach using MCTS-PW against direct Monte Carlo (MC) simulations as a na\"ive baseline and the cross-entropy method as an importance sampling baseline.
We also perform Monte Carlo sampling over the routes in the navigational database as another baseline.
-Algorithm \ref{alg:mc} describes the direct Monte Carlo simulation approach for $n$ episodes, starting at an initial state $s_0$, with a rollout depth $d$. Note the rollout function does not use the action feeding procedure described in \cref{sec:ast_mcts}.
+Algorithm \ref{alg:mc} describes the direct Monte Carlo simulation approach for $n$ episodes, starting at an initial state $s_0$, with a rollout depth $d$. Note that, unlike the rollout in \cref{sec:ast_mcts}, this rollout function does not exploit the best action.
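For comparison, the midway best-action exploitation in the MCTS-PW rollout can be sketched as follows; the simulator interface (`sim.step`, `sim.reward`) and function names are hypothetical stand-ins, not the thesis implementation.

```python
import math

def rollout(sim, s, d, d_max, best_action, sample_action):
    """Rollout with end-of-depth evaluation that exploits the stored
    best action exactly once, at the midpoint depth."""
    if d == 0:
        return sim.reward(s)             # end-of-depth evaluation
    if d == math.floor(d_max / 2):
        a = best_action                  # exploit the best action midway
    else:
        a = sample_action(s)             # otherwise sample an action as usual
    return rollout(sim, sim.step(s, a), d - 1, d_max, best_action, sample_action)
```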

\input{algorithms/monte-carlo-algorithm.tex}

@@ -469,8 +469,8 @@ \subsection{Results and Analysis}\label{sec:ast_results}
This is because CEM uses importance sampling; after re-weighting the samples by the true distribution, we expect these extremely small likelihood values.
AST has the lowest mean miss distance $\bar{X}_d$; the large standard deviation results from large differences between the miss distances of failure and non-failure events.
Each approach finds its first failure early in the experiment, with AST finding failures the earliest.
-The effect of feeding the best action midway through the rollout accelerates finding these failures.
-Once found, AST will exploit the failures to maximize their likelihood.
+Exploiting the best action midway through the rollout accelerates finding these failures.
+Once found, AST will optimize the failures to maximize their likelihood.
We see that AST finds failures in about $88\%$ of episodes (i.e., system executions), whereas standard MC and CEM find failures in about $0.1\%$ and $0.48\%$ of episodes, respectively.


@@ -519,7 +519,7 @@ \section{Discussion}
\label{sec:ast_discussion}
Adaptive stress testing was extended for sequential systems with episodic reward to find likely failures in FMS trajectory predictors.
To improve search performance, we used Monte Carlo tree search with progressive widening and modified the rollout with end-of-depth evaluations.
-We feed the best action midway through the rollout to encourage exploration of promising actions, resulting in exploiting failures to maximize their likelihood.
+We replace the current action with the best action midway through the rollout to encourage further exploration of promising actions, exploiting failures to maximize their likelihood.
A simulation environment was constructed to evaluate the trajectory predictor, and a navigational database was sampled to compare to existing methods of finding failures during development.
Performance of AST using MCTS-PW was compared against direct Monte Carlo simulations and the cross-entropy method.
Results suggest that the AST approach finds more failures with both higher severity and higher relative likelihood.
1 change: 0 additions & 1 deletion diagrams/arc-length.tex
@@ -47,7 +47,6 @@

\draw[->, rotate around={\ang:(end)}] (end) -- node (nextstraight) [very near end, label = {above:$\ell_2$},] {} (end |-, 0);

-% xshift=-4pt,yshift=0pt
\draw [,thick,decorate,decoration={brace,amplitude=30pt}] (start) -- (end) node [black,midway,above=33pt,pos=0.65] {$\alpha r$};
\draw [red,,thick,decorate,decoration={brace,amplitude=20pt}] (failend) -- (failstart) node [red,midway,below=27pt,pos=0.63] {$\beta$};

4 changes: 0 additions & 4 deletions kill-latexmk.bat

This file was deleted.

1 change: 0 additions & 1 deletion main.tex
@@ -9,7 +9,6 @@

\input{preamble/preamble}
\addbibresource{references/references.bib}
-% TODO. More authors not et al.?

\begin{document}
\title{Algorithms for efficient validation of black-box systems}
22 changes: 13 additions & 9 deletions suthesis-ms-2e.sty
@@ -110,6 +110,9 @@
% changed "dissertation" to "thesis"
% removed "committee of graduate studies" mentions

+% Modified Apr 2021 by Robert Moss
+% removed signature lines and bolded advisor names

%%%%%
%%%%% PRELIMS
%%%%%
@@ -453,7 +456,7 @@ smaller print.


\newlength{\signaturespace}
-\setlength{\signaturespace}{.5in}
+\setlength{\signaturespace}{.1in}


\long\def\signature#1{%
@@ -466,8 +469,9 @@ of Master of Science.
\par
\vspace{\signaturespace}
%\hbox to 4in{\hfil\shortstack{\vrule width 3in height 0.4pt\\ #1}}
-\hbox to 5in{\hfil\begin{tabular}{@{}l@{}}\vrule width 3in height
-0.4pt depth 0pt\\ #1\end{tabular}}
+%\hbox to 5in{\hfil\begin{tabular}{@{}l@{}}\vrule width 3in height
+% 0.4pt depth 0pt\\ #1\end{tabular}}
+#1
\end{minipage}
\end{flushright}}

@@ -490,25 +494,25 @@
\def\thepage{}
\thispagestyle{myheadings}
\markboth{\rm \@author}{\rm \@author}\fi
-\signature{(\@principaladviser)\quad Principal \advis@r}
+\signature{\hfill\textbf{\@principaladviser, Principal \advis@r}}
\vfill
% if second principal advisor
\if*\@coprincipaladviser \else
-\signature{(\@coprincipaladviser)\quad Principal \advis@r}
+\signature{\hfill\textbf{\@coprincipaladviser, Principal \advis@r}}
\vfill\fi
\if*\@firstreader \else
-\signature{(\@firstreader)}
+\signature{\hfill\textbf{\@firstreader}}
\vfill\fi
\if*\@secondreader \else
-\signature{(\@secondreader)}
+\signature{\hfill\textbf{\@secondreader}}
\vfill\fi
% if thirdreader then do \signature\@thirdreader \vfill
\if*\@thirdreader \else
-\signature{(\@thirdreader)}
+\signature{\hfill\textbf{\@thirdreader}}
\vfill\fi
% if fourthreader then do \signature\@fourthreader \vfill
\if*\@fourthreader \else
-\signature{(\@fourthreader)}
+\signature{\hfill\textbf{\@fourthreader}}
\vfill\fi
}

