diff --git a/README.md b/README.md index 5854535..88b55c0 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# AI Verification: Constrained Deep Learning [![Open in MATLAB Online](https://www.mathworks.com/images/responsive/global/open-in-matlab-online.svg)](https://matlab.mathworks.com/open/github/v1?repo=matlab-deep-learning/constrained-deep-learning) +# AI Verification: Constrained Deep Learning Constrained deep learning is an advanced approach to training deep neural networks by incorporating domain-specific constraints into the learning process. By integrating these constraints into the construction and training of neural networks, you can guarantee desirable behaviour in safety-critical scenarios where such guarantees are paramount. @@ -6,7 +6,8 @@ This project aims to develop and evaluate deep learning models that adhere to pr

- +

@@ -32,12 +33,12 @@ The repository contains several introductory, interactive examples as well as lo ### Introductory Examples (Short) Below are links for markdown versions of MATLAB Live Scripts that you can view in GitHub®. -- [Fully Input Convex Neural Networks in 1-Dimension](examples/convex/introductory/PoC_Ex1_1DFICNN.md) -- [Fully Input Convex Neural Networks in n-Dimensions](examples/convex/introductory/PoC_Ex2_nDFICNN.md) -- [Partially Input Convex Neural Networks in n-Dimensions](examples/convex/introductory/PoC_Ex3_nDPICNN.md) -- [Fully Input Monotonic Neural Networks in 1-Dimension](examples/monotonic/introductory/PoC_Ex1_1DFMNN.md) -- [Fully Input Monotonic Neural Networks in n-Dimensions](examples/monotonic/introductory/PoC_Ex2_nDFMNN.md) -- [Lipschitz Continuous Neural Networks in 1-Dimension](examples/lipschitz/introductory/PoC_Ex1_1DLNN.md) +- [Fully input convex neural networks in 1-dimension](examples/convex/introductory/PoC_Ex1_1DFICNN.md) +- [Fully input convex neural networks in n-dimensions](examples/convex/introductory/PoC_Ex2_nDFICNN.md) +- [Partially input convex neural networks in n-dimensions](examples/convex/introductory/PoC_Ex3_nDPICNN.md) +- [Fully input monotonic neural networks in 1-dimension](examples/monotonic/introductory/PoC_Ex1_1DFMNN.md) +- [Fully input monotonic neural networks in n-dimensions](examples/monotonic/introductory/PoC_Ex2_nDFMNN.md) +- [Lipschitz continuous neural networks in 1-dimensions](examples/lipschitz/introductory/PoC_Ex1_1DLNN.md) These examples make use of [custom training loops](https://uk.mathworks.com/help/deeplearning/deep-learning-custom-training-loops.html) and the [`arrayDatastore`](https://uk.mathworks.com/help/matlab/ref/matlab.io.datastore.arraydatastore.html) object. To learn more, click the links. @@ -70,13 +71,7 @@ As discussed in [1] (see 3.4.1.5), in certain situations, small violations in th ## Technical Articles -This repository focuses on the development and evaluation of deep learning models that adhere to constraints crucial for safety-critical applications, such as predictive maintenance for industrial machinery and equipment. Specifically, it focuses on enforcing monotonicity, convexity, and Lipschitz continuity within neural networks to ensure predictable and controlled behavior. - -By emphasizing constraints like monotonicity, constrained neural networks ensure that predictions of the Remaining Useful Life (RUL) of components behave intuitively: as a machine's condition deteriorates, the estimated RUL should monotonically decrease. This is crucial in applications like aerospace or manufacturing, where an accurate and reliable estimation of RUL can prevent failures and save costs. - -Alongside monotonicity, Lipschitz continuity is also enforced to guarantee model robustness and controlled behavior. This is essential in environments where safety and precision are paramount such as control systems in autonomous vehicles or precision equipment in healthcare. - -Convexity is especially beneficial for control systems as it inherently provides boundedness properties. For instance, by ensuring that the output of a neural network lies within a convex hull, it is possible to guarantee that the control commands remain within a safe and predefined operational space, preventing erratic or unsafe system behaviors. This boundedness property, derived from the convex nature of the model's output space, is critical for maintaining the integrity and safety of control systems under various conditions. 
+This repository focuses on the development and evaluation of deep learning models that adhere to constraints crucial for safety-critical applications, such as predictive maintenance for industrial machinery and equipment. Specifically, it focuses on enforcing monotonicity, convexity, and Lipschitz continuity within neural networks to ensure predictable and controlled behavior. By emphasizing constraints like monotonicity, constrained neural networks ensure that predictions of the Remaining Useful Life (RUL) of components behave intuitively: as a machine's condition deteriorates, the estimated RUL should monotonically decrease. This is crucial in applications like aerospace or manufacturing, where an accurate and reliable estimation of RUL can prevent failures and save costs. Alongside monotonicity, Lipschitz continuity is also enforced to guarantee model robustness and controlled behavior. This is essential in environments where safety and precision are paramount such as control systems in autonomous vehicles or precision equipment in healthcare. Convexity is especially beneficial for control systems as it inherently provides boundedness properties. For instance, by ensuring that the output of a neural network lies within a convex hull, it is possible to guarantee that the control commands remain within a safe and predefined operational space, preventing erratic or unsafe system behaviors. This boundedness property, derived from the convex nature of the model's output space, is critical for maintaining the integrity and safety of control systems under various conditions. These technical articles explain key concepts of AI verification in the context of constrained deep learning. They include discussions on how to achieve the specified constraints in neural networks at construction and training time, as well as deriving and proving useful properties of constrained networks in AI verification applications. It is not necessary to go through these articles in order to explore this repository, however, you can find references and more in depth discussion here. @@ -90,4 +85,4 @@ These technical articles explain key concepts of AI verification in the context - [3] Gouk, Henry, et al. “Regularisation of Neural Networks by Enforcing Lipschitz Continuity.” Machine Learning, vol. 110, no. 2, Feb. 2021, pp. 393–416. DOI.org (Crossref), https://doi.org/10.1007/s10994-020-05929-w - [4] Kitouni, Ouail, et al. Expressive Monotonic Neural Networks. arXiv:2307.07512, arXiv, 14 July 2023. arXiv.org, http://arxiv.org/abs/2307.07512. -Copyright 2024, The MathWorks, Inc. +Copyright (c) 2024, The MathWorks, Inc. diff --git a/conslearn/+conslearn/+convex/buildFICNN.m b/conslearn/+conslearn/+convex/buildFICNN.m index b141ce4..3a20e70 100644 --- a/conslearn/+conslearn/+convex/buildFICNN.m +++ b/conslearn/+conslearn/+convex/buildFICNN.m @@ -13,13 +13,13 @@ % % BUILDFICNN name-value arguments: % -% 'PositiveNonDecreasingActivation' - Specify the positive, convex, +% 'ConvexNonDecreasingActivation' - Specify the convex, % non-decreasing activation functions. % The options are 'softplus' or 'relu'. % The default is 'softplus'. % % The construction of this network corresponds to Eq 2 in [1] with the -% exception that the application of the positive, non-decreasing activation +% exception that the application of the convex, non-decreasing activation % function on the network output is not applied. This maintains convexity % but permits positive and negative network outputs. 
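For reference, a minimal usage sketch of the renamed name-value argument. The call below mirrors the introductory examples later in this diff; the layer sizes are illustrative assumptions rather than recommended values.

```matlab
% Build a small fully input convex network using the renamed
% ConvexNonDecreasingActivation option (formerly PositiveNonDecreasingActivation).
inputSize = 1;                       % illustrative sizes only
numHiddenUnits = [16 8 4 1];
ficnnet = buildConstrainedNetwork("fully-convex", inputSize, numHiddenUnits, ...
    ConvexNonDecreasingActivation="relu");
```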
% @@ -31,7 +31,7 @@ arguments inputSize (1,:) numHiddenUnits (1,:) - options.PositiveNonDecreasingActivation = 'softplus' + options.ConvexNonDecreasingActivation = 'softplus' end % Construct the correct input layer @@ -43,7 +43,7 @@ end % Loop over construction of hidden units -switch options.PositiveNonDecreasingActivation +switch options.ConvexNonDecreasingActivation case 'relu' pndFcn = @(k)reluLayer(Name="pnd_" + k); case 'softplus' @@ -68,10 +68,10 @@ % Add a cascading residual connection for ii = 2:depth - tempLayers = fullyConnectedLayer(numHiddenUnits(ii),Name="fc_y_+_" + ii); + tempLayers = fullyConnectedLayer(numHiddenUnits(ii),Name="fc_y_" + ii); lgraph = addLayers(lgraph,tempLayers); - lgraph = connectLayers(lgraph,"input","fc_y_+_" + ii); - lgraph = connectLayers(lgraph,"fc_y_+_" + ii,"add_" + ii + "/in2"); + lgraph = connectLayers(lgraph,"input","fc_y_" + ii); + lgraph = connectLayers(lgraph,"fc_y_" + ii,"add_" + ii + "/in2"); end % Initialize dlnetwork diff --git a/conslearn/+conslearn/+convex/buildPICNN.m b/conslearn/+conslearn/+convex/buildPICNN.m index 37384d5..ed9032a 100644 --- a/conslearn/+conslearn/+convex/buildPICNN.m +++ b/conslearn/+conslearn/+convex/buildPICNN.m @@ -13,7 +13,7 @@ % % BUILDPICNN name-value arguments: % -% 'PositiveNonDecreasingActivation' - Specify the positive, convex, +% 'ConvexNonDecreasingActivation' - Specify the convex, % non-decreasing activation functions. % The options are 'softplus' or 'relu'. % The default is 'softplus'. @@ -32,7 +32,7 @@ % default value is 1. % % The construction of this network corresponds to Eq 3 in [1] with the -% exception that the application of the positive, non-decreasing activation +% exception that the application of the convex, non-decreasing activation % function on the network output is not applied. This maintains convexity % but permits positive and negative network outputs. Additionally, and in % keeping with the notation used in the reference, in this implementation @@ -50,7 +50,7 @@ arguments inputSize (1,:) {iValidateInputSize(inputSize)} numHiddenUnits (1,:) - options.PositiveNonDecreasingActivation = 'softplus' + options.ConvexNonDecreasingActivation = 'softplus' options.Activation = 'tanh' options.ConvexChannelIdx = 1 end @@ -63,7 +63,7 @@ convexInputSize = numel(convexChannels); % Prepare the two types of valid activation functions -switch options.PositiveNonDecreasingActivation +switch options.ConvexNonDecreasingActivation case 'relu' pndFcn = @(k)reluLayer(Name="pnd_" + k); case 'softplus' diff --git a/conslearn/buildConstrainedNetwork.m b/conslearn/buildConstrainedNetwork.m index a85e264..1a3d852 100644 --- a/conslearn/buildConstrainedNetwork.m +++ b/conslearn/buildConstrainedNetwork.m @@ -19,7 +19,7 @@ % % These options and default values apply to convex constrained networks: % -% PositiveNonDecreasingActivation - Positive, convex, non-decreasing +% ConvexNonDecreasingActivation - Convex, non-decreasing % ("fully-convex") activation functions. % ("partially-convex") The options are "softplus" or "relu". % The default is "softplus". @@ -96,10 +96,10 @@ iValidateInputSize(inputSize)} numHiddenUnits (1,:) {mustBeInteger,mustBeReal,mustBePositive} % Convex - options.PositiveNonDecreasingActivation {... + options.ConvexNonDecreasingActivation {... mustBeTextScalar, ... - mustBeMember(options.PositiveNonDecreasingActivation,["relu","softplus"]),... 
- iValidateConstraintWithPositiveNonDecreasingActivation(options.PositiveNonDecreasingActivation, constraint)} + mustBeMember(options.ConvexNonDecreasingActivation,["relu","softplus"]),... + iValidateConstraintWithConvexNonDecreasingActivation(options.ConvexNonDecreasingActivation, constraint)} options.ConvexChannelIdx (1,:) {... iValidateConstraintWithConvexChannelIdx(options.ConvexChannelIdx, inputSize, constraint), ... mustBeNumeric,mustBePositive,mustBeInteger} @@ -131,15 +131,15 @@ switch constraint case "fully-convex" % Set defaults - if ~any(fields(options) == "PositiveNonDecreasingActivation") - options.PositiveNonDecreasingActivation = "softplus"; + if ~any(fields(options) == "ConvexNonDecreasingActivation") + options.ConvexNonDecreasingActivation = "softplus"; end net = conslearn.convex.buildFICNN(inputSize, numHiddenUnits, ... - PositiveNonDecreasingActivation=options.PositiveNonDecreasingActivation); + ConvexNonDecreasingActivation=options.ConvexNonDecreasingActivation); case "partially-convex" % Set defaults - if ~any(fields(options) == "PositiveNonDecreasingActivation") - options.PositiveNonDecreasingActivation = "softplus"; + if ~any(fields(options) == "ConvexNonDecreasingActivation") + options.ConvexNonDecreasingActivation = "softplus"; end if ~any(fields(options) == "Activation") options.Activation = "tanh"; @@ -148,7 +148,7 @@ options.ConvexChannelIdx = 1; end net = conslearn.convex.buildPICNN(inputSize, numHiddenUnits,... - PositiveNonDecreasingActivation=options.PositiveNonDecreasingActivation,... + ConvexNonDecreasingActivation=options.ConvexNonDecreasingActivation,... Activation=options.Activation,... ConvexChannelIdx=options.ConvexChannelIdx); case "fully-monotonic" @@ -259,9 +259,9 @@ function iValidateConstraintWithMonotonicTrend(param, constraint) end end -function iValidateConstraintWithPositiveNonDecreasingActivation(param, constraint) +function iValidateConstraintWithConvexNonDecreasingActivation(param, constraint) if ( ~isequal(constraint, "fully-convex") && ~isequal(constraint,"partially-convex") ) && ~isempty(param) - error("'PositiveNonDecreasingActivation' is not an option for constraint " + constraint); + error("'ConvexNonDecreasingActivation' is not an option for constraint " + constraint); end end diff --git a/conslearn/trainConstrainedNetwork.m b/conslearn/trainConstrainedNetwork.m index e627007..15c7e92 100644 --- a/conslearn/trainConstrainedNetwork.m +++ b/conslearn/trainConstrainedNetwork.m @@ -167,6 +167,16 @@ end end end + +% Update the training monitor status +if trainingOptions.TrainingMonitor + if monitor.Stop == 1 + monitor.Status = "Training stopped"; + else + monitor.Status = "Training complete"; + end +end + end %% Helpers diff --git a/documentation/AI-Verification-Convexity.md b/documentation/AI-Verification-Convexity.md index bd58099..db8bf4f 100644 --- a/documentation/AI-Verification-Convexity.md +++ b/documentation/AI-Verification-Convexity.md @@ -36,7 +36,7 @@ and remain within the set. A function $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is convex on $S\subset \mathbb{R}^n$ provided $S$ is a convex set, and for any $\lambda\in[0, 1]$, the following holds: -$$ f((1−\lambda)x+\lambda y) \leq (1−\lambda)f(x)+ \lambda f(y) $$ +$f((1−\lambda)x+\lambda y) \leq (1−\lambda)f(x)+ \lambda f(y)$ This means that the line segment connecting any two points on the graph of the function lies above or on the graph. 
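As a quick numeric illustration of the convexity inequality above, the following sketch checks it for a single pair of points; the quadratic is an arbitrary stand-in for a convex function.

```matlab
% Check f((1-lambda)*x + lambda*y) <= (1-lambda)*f(x) + lambda*f(y)
f = @(x) x.^2;                       % arbitrary known-convex function (assumption)
x = -1; y = 2; lambda = 0.3;
lhs = f((1-lambda)*x + lambda*y);
rhs = (1-lambda)*f(x) + lambda*f(y);
assert(lhs <= rhs)                   % the chord lies on or above the graph
```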
@@ -90,9 +90,9 @@ This means that if you take any two inputs to the network and any convex combination of them, then the resulting outputs will respect the convexity inequality. -The recurrence equation defined in Eq. 2 in [1] gives a fully input convex neural network 'k-layer' architecture and is transcribed here for brevity: +The recurrence equation defined in Eq. 2 in [1] gives a fully input convex neural network '$k$-layer' architecture and is transcribed here for brevity: -$$ z_{i+1} = g_i (W_i^{(z)}z_i + W_i^{(y)} + b_i) $$ +$ z_{i+1} = g_i (W_i^{(z)}z_i + W_i^{(y)} y + b_i) $ Here, the network input is denoted $y$, $z_0,W_0^{(z)}=0$, and $g_i$ is an activation function. You can view a ‘2-layer’ FICNN architecture in Figure 3. @@ -105,14 +105,14 @@ architecture in Figure 3. To guarantee convexity of the network, FICNNs require activation -functions $g_i$ that are positive and non-decreasing. For example, see the -positive, non-decreasing relu layer “pnd\_1” in Fig 3. Another common +functions $g_i$ that are convex and non-decreasing. For example, see the +convex, non-decreasing relu layer “pnd\_1” in Fig 3. Another common choice of activation function is the softplus function. Additionally, -the weights in certain parts of the network, particularly those -associated with the input or the input's interaction with latent layers, -are constrained to be non-negative to maintain the convexity property. -In the figure above, the weight matrices for the fully connected layers -“fc\_z\_+\_2” and “fc\_y\_+\_2” are constrained to be positive (as +the weights of all fully-connected layers, except those directly connected +to the input, must be constrained to be non-negative to preserve the +convexity property. +In the figure above, the weight matrix for the fully connected layer +“fc\_z\_+\_2” is constrained to be positive (as indicated by the “\_+\_” in the layer name). Note that in this implementation, the final activation function, $g_k$, is not applied. This still guarantees convexity but removes the restriction that outputs of the network must be non-negative. **Partially Input Convex Neural Network (PICNN)** @@ -141,13 +141,13 @@ Here, $\tilde{g}_i$ is any activation function, $u_0=x$ where $x$ are the set of To guarantee convexity of the network, PICNNs require activation -functions in the $z$ ‘output’ evolution to be positive and +functions in the $z$ ‘output’ evolution to be convex and non-decreasing (see layer “pnd\_0” in Fig 4), but allow freedom for activation functions evolving the state, such as $tanh$  activation layers (see layer “nca\_0” in Fig 4). As with FICNNs, the weights in certain parts of the network are constrained to be non-negative to maintain the partial convexity property. In the figure above, the weight matrix for -the fully connected layer “fc\_z\_+\_1” are constrained to be positive +the fully connected layer “fc\_z\_+\_1” is constrained to be positive (as indicated by the “\_+\_” in the layer name). All other fully connected weight matrices in Fig 4 are unconstrained, giving freedom to fit any purely feedforward network – see proposition 2 [1]. Note again that in our implementation, the final activation function, $g_k$, is not applied. This still guarantees partial convexity but removes the restriction that outputs of the network must be non-negative. @@ -176,11 +176,11 @@ discussed above. 
**One-Dimensional ICNN** Recall that a function $f:\mathbb{R}\rightarrow\mathbb{R}$ is convex on $S\subset \mathbb{R}$ provided $S$ is a convex set and if for all $x,y\in S$ and for any $\lambda\in[0, 1]$, the following inequality holds, -$f((1−\lambda)x+\lambda y) \leq (1−\lambda)f(x)+ \lambda f(y)$. Intervals are convex sets in $\mathbb{R}$ and it immediately follows from the definition of convexity that for $S = [a,b]$, the upper bound on the interval is, +$f((1−\lambda)x+\lambda y) \leq (1−\lambda)f(x)+ \lambda f(y)$. Intervals are convex sets in $\mathbb{R}$ and it immediately follows from the definition of convexity that for $S = [a,b]$, the upper bound on the interval is, -$$ f(x) \leq max(f(a),f(b)) $$ +$ f(x) \leq max(f(a),f(b))$ -To find the minimum of $f$ on the interval, you could use an optimization routine, such as projected gradient descent, interior-point +To find the minimum of $f$ on the interval, you could use an optimization routine, such as projected gradient descent, interior-point methods, or barrier methods. However, you can use the properties of convex functions to accelerate the search in certain scenarios. @@ -190,15 +190,15 @@ If $f(a) \gt f(b)$, then either the minimum is at $x=b$ or the minimum lies strictly in the interior of the interval, $x \in (a,b)$. To assess whether the minimum is at $x=b$, look at the derivative, $\nabla f(x)$, at the interval bounds. If $f$ is not differentiable at the interval bounds, for example the network has relu activation -functions that define a set of non-differentiable points in $\mathbb{R}$, evaluate +functions that define a set of non-differentiable points in $\mathbb{R}$, evaluate both the left and right derivative of $f$ at the interval bounds instead. Then examine the sign of the directional derivatives at the interval bounds, -directed to the interior of the interval: $sgn( \nabla f(a), -\nabla f(b) ) = (\pm , \pm)$. Note that the sign of 0 is taken as positive in this discussion. +directed to the interior of the interval: $sgn(\nabla f(a), -\nabla f(b)) = (\pm,\pm)$. Note that the sign of 0 is taken as positive in this discussion. If $f$ is differentiable at the interval bounds, then there are two possible sign -combinations since $\nabla f(a) \leq m \lt 0$ where $m$ is the gradient of the chord. +combinations since $ \nabla f(a) \leq m \lt 0 $ where $m$ is the gradient of the chord. -- $sgn(\nabla f(a), -\nabla f(b)) = (−,+)$, then the minimum must lie at $x = b$, i.e., $f(x) \geq f(b)$. +- $sgn(\nabla f(a), -\nabla f(b)) = (−,+)$, then the minimum must lie at $x = b$, i.e., $f(x) \geq f(b)$. - $sgn(\nabla f(a), -\nabla f(b)) = (-,−)$, then the minimum must lie in the interior of the interval, $x \in (a,b)$. If $f$ is not differentiable at the interval bounds, then there are still two @@ -240,7 +240,7 @@ possible sign combinations since, at $x=b$, convexity means that $-\nabla f(b+\e In the case that $f(a) = f(b)$, the function must either be constant and the minimum is $f(a) = f(b)$. Or the minimum again -lies in the interior. If $sgn(\nabla f(a)) = +$, then $\nabla f(a) = 0$ else this violates convexity since $f(a) = f(b)$. Similar is true for +lies in the interior. If $sgn(\nabla f(a)) = +$, then $\nabla f(a) = 0$ else this violates convexity since $f(a) = f(b)$. Similar is true for $-sgn(\nabla f(b)) = +$. In this case, all sign combinations are possible owing to possible non-differentiability of $f$ at the interval bounds: @@ -262,8 +262,8 @@ convex functions. This idea can be extended to many intervals. 
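A minimal sketch of the interval-bound logic described above for a differentiable 1-D convex function; the quadratic and the use of fminbnd are illustrative assumptions, not the routine used by this repository.

```matlab
% Upper and lower bounds of a differentiable convex f on [a,b] using only
% endpoint values and endpoint derivatives, falling back to optimization
% when the minimum is interior.
f  = @(x) (x - 0.2).^2;              % stand-in for a 1-D convex network
df = @(x) 2*(x - 0.2);               % its derivative
a = -1; b = 1;

upperBound = max(f(a), f(b));        % f(x) <= max(f(a), f(b)) on [a,b]

if df(a) >= 0                        % inward derivative at a (sign of 0 taken as +)
    lowerBound = f(a);
elseif -df(b) >= 0                   % inward derivative at b
    lowerBound = f(b);
else
    lowerBound = f(fminbnd(f, a, b));    % minimum lies in the interior
end
```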
Take a 1-dimensional ICNN. Consider subdividing the operational design domain into a union of intervals $I_i$, where $I_i = [a_i,a_{i+1}]$ and $a_i \lt a_{i+1}$. A tight lower and upper bound on each interval can be computed with a -single forward pass through the network of all interval boundary values in the union of intervals, a -single backward pass through the network to compute derivatives at the interval boundary values, and +single forward pass through the network of all interval bound values in the union of intervals, a +single backward pass through the network to compute derivatives at the interval bound values, and one final convex optimization on the interval containing the global minimum. Furthermore, since bounds are computed at forward and backward passes through the network, you can compute a 'boundedness metric' during @@ -279,35 +279,36 @@ and $sgn(0) = +$. The previous discussion focused on 1-dimensional convex functions, however, this idea extends to n-dimensional convex functions, $f:\mathbb{R}^n \rightarrow \mathbb{R}$. Note that a vector valued convex function is convex in each output, so it is sufficient to keep the target as $\mathbb{R}$. In the discussion in this section, take the convex set to be the n-dimensional hypercube, $H_n$, with vertices, $V_n = {(\pm 1,\pm 1, \dots,\pm 1)}$. General convex hulls will be discussed later. -An important property of convex functions in n-dimensions is that every 1-dimensional restriction also defines a convex function. This is easily seen from the -definition. Define $g:\mathbb{R} \rightarrow \mathbb{R}$ as $g(t) = f(t\hat{n}) \text{ where } \hat{n}$ is +An important property of convex functions in n-dimensions is that every 1-dimensional restriction also defines a convex function. This is easily seen from the +definition. Define $g:\mathbb{R} \rightarrow \mathbb{R}$ as $g(t) = f(t\hat{n})$ where $\hat{n}$ is some unit vector in $\mathbb{R}^n$. Then, by definition of convexity of $f$, letting $x = t\hat{n}$ and $y = t'\hat{n}$, it follows that, -$$ g((1−\lambda)t+\lambda t') \leq (1−\lambda)g(t)+ \lambda g(t') $$ +$g((1−\lambda)t+\lambda t') \leq (1−\lambda)g(t)+ \lambda g(t')$ -Note that the restriction to 1-dimensional convex functions will be used several times in the following discussion. +Note that the restriction to 1-dimensional convex functions will be used several times in the following discussion. To determine an upper bound of $f$ on the hypercube, note that any point in $H_n$ can be expressed as a convex combination of its vertices, i.e., for $z \in H_n$, it follows that $z = \sum_i \lambda_i v_i$ where $\sum_i \lambda_i = 1$ and $v_i \in V_n$. Therefore, using the definition of convexity in the first inequality and that $\lambda_i \leq 1$ in the second inequality, -$$ f(z) = f(\sum_i \lambda_i v_i) \leq \sum \lambda_i f(v_i) \leq \underset{v \in V_n}{\text{max }} f(v) $$ +$ f(z) = f(\sum_i \lambda_i v_i) \leq \sum \lambda_i f(v_i) \leq \underset{v \in V_n}{\text{max }} f(v) $. -Consider now the lower bound of $f$ over a hypercubic grid. Here we take the -approach of looking for hypercubes where there is a guarantee that the -minimum lies at a vertex of the hypercube and when this guarantee is not met, fall back to solving the convex optimization over that particular -hypercubic. For the n-dimensional approach, we will split the +Consider now the lower bound of $f$ over the hypercube. 
Here we take the +approach of looking for cases where there is a guarantee that the +minimum lies at a vertex of the hypercube and when this guarantee cannot +be met, falling back to solving the convex optimization over this +hypercubic domain. For the n-dimensional approach, we will split the discussion into differentiable and non-differentiable $f$, and consider these separately. **Multi-Dimensional Differentiable Convex Functions** -Consider the derivatives evaluated at each vertex of a hypercube. For each $\nabla f(v)$, $v \in V_n$, take the directional derivatives, +Consider the derivatives evaluated at each vertex of the hypercube. For each $\nabla f(v)$, $v \in V_n$, take the directional derivatives, pointing inward along a hypercubic edge. Without loss of generality, recall $V_n = \{(±1,±1,…,±1) \in \mathbb{R}^n\}$ and therefore the hypercube is aligned along the standard basis vectors $e_i$. The $\text{i}^{\text{th}}$-directional derivative, pointing inward, is defined as, -$$ −sgn(v_i)e_i\cdot \nabla f(v) e_i = −sgn(v_i) \nabla_i f(v) $$ +$ −sgn(v_i)e_i\cdot \nabla f(v) e_i = −sgn(v_i) \nabla_i f(v)$ where $sgn(v_i)$ denotes the sign of $\text{i}^{\text{th}}$ component of the vertex $v$, and the minus ensures the directional @@ -321,7 +322,8 @@ construction on a cube.
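Referring back to the vertex-maximum upper bound derived earlier, the bound is cheap to evaluate in practice; a small sketch with an illustrative convex function follows.

```matlab
% Upper bound of a convex f over the hypercube [-1,1]^n from its 2^n vertices.
f = @(x) sum(x.^2, 1);                   % illustrative convex function, columns are points
n = 3;
V = 2*((dec2bin(0:2^n-1) - '0')') - 1;   % n-by-2^n matrix of vertices (+/-1 entries)
upperBound = max(f(V));                  % f(z) <= max over vertices for any z in the cube
```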

-Analogous to the 1-dimensional case, analyze the signatures of the derivatives at the vertices. The notation $(\pm,\pm,…,\pm)_v$ denotes the overall sign of $−sgn(v_i)\nabla_i f(v)$ at $v$ for each $i$, and is used in the rest of this article. +Analogous to the 1-dimensional case, analyze the +signatures of the derivatives at the vertices. The notation $(\pm,...,\pm)_v $ denotes the overall sign of $−sgn(v_i)\nabla_i f(v)$ at $v$ for each $i$, and is used in the rest of this article. **Lemma**: @@ -337,10 +339,12 @@ vector in direction $z-w$. Since the directional derivatives at $w$ pointing inwards are all positive, and $f$ is differentiable, the derivative along the line at $w$, pointing inwards, is given by, -$$ \hat{n} \cdot \nabla f(w) = \sum_i -|n_i|\cdot sgn(w_i) \cdot \nabla_i f(w) = \sum_i |n_i| \cdot (-sgn(w_i) \cdot \nabla_i f(w)) \geq 0 $$ +$ \hat{n} \cdot \nabla f(w) = \sum_i -|n_i|\cdot sgn(w_i) \cdot \nabla_i f(w) = \sum_i |n_i| \cdot (-sgn(w_i) \cdot \nabla_i f(w)) \geq 0 $ -and is positive, as $\hat{n} = - |n_i| \cdot sgn(w_i) \cdot e_i $. -The properties proved previously can then be applied to this 1-dimensional restriction. Hence, a vertex with inward +is positive, as $\hat{n} = - |n_i| \cdot sgn(w_i) \cdot e_i $. +The properties proved previously can then be applied to this 1-dimensional restriction, i.e., if the +gradient of $f$ at an interval bound, directed into the interval, is positive, then $f$ has +a minimum value at this interval bound. Hence, a vertex with inward directional derivative signature $(+,+,…,+)$ is a lower bound for $f$ over the hypercube. ◼ If there are multiple vertices sharing this signature, then since every @@ -351,9 +355,9 @@ at vertices sharing these signatures so it is sufficient to select any. If no vertex has signature $(+,+,…,+)$, solve for the minimum using a convex optimization routine over this hypercube. Since all local minima are -global minima, there is at least one hypercube requiring this approach. +global minima, there is at least one hypercube requiring this solution. If the function has a flat section at its minima, there may be other -hypercubes, also without a vertex with all positive signature. Note that empirically, +hypercubes in the operational design domain, also without a vertex with all positive signature. Note that empirically, this seldom happens for convex neural networks as it requires fine tuning of the parameters to create such a landscape. @@ -377,7 +381,7 @@ As depicted in figure 7, the vertices $w$ of the square (hypercube of dimension bisecting these directional derivatives, into the interior of the square, has a negative gradient. This is because the vertex is at the intersection of two planes and is a non-differentiable point, so the derivative through this point is path -dependent. This is a well-known property of non-differentiable functions and breaks the assertion that this vertex is the minimum of $f$ over this +dependent. This is a well-known observation, but it breaks the assertion that this vertex is the minimum of $f$ over this square region. From this example, it is clear the minimum lies at the apex at $(0,0)$. To ameliorate this issue, in the case that the convex function is @@ -388,13 +392,17 @@ $relu$ operations. In practice, this means that a vertex may be a non-differentiable point if the network has pre-activations to $relu$ layers that have exact zeros. In practice, this is seldom the case. 
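A compact sketch of the vertex signature test from the lemma above, with a fallback to a constrained solver when no vertex has the all-positive signature; the function, gradient, and use of fmincon (Optimization Toolbox) are illustrative assumptions rather than this repository's implementation.

```matlab
% Lower bound of a differentiable convex f over the hypercube [-1,1]^n.
f     = @(x) sum((x - 0.3).^2);          % stand-in convex function (column input)
gradf = @(x) 2*(x - 0.3);                % its gradient
n = 2;
V = 2*((dec2bin(0:2^n-1) - '0')') - 1;   % vertices as columns

lowerBound = [];
for k = 1:size(V, 2)
    v = V(:, k);
    inward = -sign(v).*gradf(v);         % inward directional derivatives at vertex v
    if all(inward >= 0)                  % signature (+,+,...,+), sign of 0 taken as +
        lowerBound = f(v);               % this vertex attains the lower bound
        break
    end
end
if isempty(lowerBound)                   % minimum is interior: solve the convex problem
    xmin = fmincon(f, zeros(n,1), [], [], [], [], -ones(n,1), ones(n,1));
    lowerBound = f(xmin);
end
```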
The probability of this occurring can be further reduced by offsetting any -hypercube or hypercubic grid origin by a small random perturbation. If there are -any zeros in these pre-activations, lower bounds for hypercubes that contain that vertex can be recomputed using -a convex optimization routine instead. +hypercube or hypercubic grid origin by a small random perturbation. It +is assumed, for efficiency of computing bounds during training, that the convex neural network is differentiable everywhere. For final post-training analysis, this implementation checks the $relu$ +pre-activations for any exact zeros for all vertices. If there are +any zeros in these pre-activations, lower bounds for hypercubes that contain that vertex are recomputed using +a minimization routine. As a demonstration that these bounds are +correct, in the examples, we also run the minimization routine on every +hypercube to show that the bounds agree. As a final comment, for general convex hulls, the argument for the upper bound value of the function over the convex hull trivially extends, defined as the largest function value over the set of points defining the hull. The lower bound should be determined using an optimization routine, constrained to the set of points in the convex hull. **References** - [1] Amos, Brandon, et al. Input Convex Neural Networks. arXiv:1609.07152, arXiv, 14 June 2017. arXiv.org, https://doi.org/10.48550/arXiv.1609.07152. -- [2] Ławryńczuk, Maciej. “Input Convex Neural Networks in Nonlinear Predictive Control: A Multi-Model Approach.” Neurocomputing, vol. 513, Nov. 2022, pp. 273–93. ScienceDirect, https://doi.org/10.1016/j.neucom.2022.09.108. +- [2] Ławryńczuk, Maciej. “Input Convex Neural Networks in Nonlinear Predictive Control: A Multi-Model Approach.” Neurocomputing, vol. 513, Nov. 2022, pp. 273–93. ScienceDirect, https://doi.org/10.1016/j.neucom.2022.09.108. \ No newline at end of file diff --git a/documentation/AI-Verification-Lipschitz.md b/documentation/AI-Verification-Lipschitz.md index 24864bd..ff9b3be 100644 --- a/documentation/AI-Verification-Lipschitz.md +++ b/documentation/AI-Verification-Lipschitz.md @@ -4,7 +4,7 @@ In the field of deep learning, neural networks have demonstrated remarkable succ Lipschitz continuity is a mathematical concept that describes the rate at which a function's output can change with respect to changes in its input. Formally, a function $f:X \rightarrow Y$ is said to be Lipschitz continuous if there exists a constant $\lambda \geq 0$ such that for all $x_1$ and $x_2$ in the domain *X*, the following inequality holds: -$$|f(x_1) - f(x_2)| \leq \lambda |x_1 - x_2|$$ +$|f(x_1) - f(x_2)| \leq \lambda |x_1 - x_2|$ Here, $\lambda$ is referred to as the Lipschitz constant. It essentially bounds the gradient (or the steepness) of the function, ensuring that the output does not change too dramatically for small changes in the input. @@ -18,19 +18,19 @@ Enforcing Lipschitz continuity in neural networks is not straightforward. Calcul The choice of the p-norm in the context of Lipschitz constraints has a significant impact on the way distances are measured between points and consequently how to define and enforce Lipschitz continuity in neural networks. The p-norm (or Lp norm) is a generalization of the Euclidean distance and is defined for a vector *x* in a real or complex space as: -$$||x||_p = (|x_1|^p + |x_2|^p + ... 
+ |x_n|^p)^{(1/p)}$ where $|x_i|$ denotes the absolute value of the i-th component of the vector *x*, and $p \geq 1$. When talking about Lipschitz continuity using a p-norm, it corresponds to the inequality: -$$||f(x_1) - f(x_2)||_p \leq \lambda_p ||x_1 - x_2||_p$$ +$||f(x_1) - f(x_2)||_p \leq \lambda_p ||x_1 - x_2||_p$ where *f* is the function representing the neural network, and $\lambda_p$ is the Lipschitz constant for the choice of norm *p*. This choice of *p* determines the geometry of the space in which to measure the distances and can have several implications: - $\ell_1$-Norm (Manhattan Distance) -When $p = 1$, the $\ell_1$-norm sums the absolute values of the components of the vector. This norm is less sensitive to outliers than the $\ell_2$-norm and can lead to sparser solutions in optimization problems. In the context of Lipschitz continuity, using the 1-norm can result in a model that is robust to small changes in many input dimensions simultaneously. +When `p = 1`, the $\ell_1$-norm sums the absolute values of the components of the vector. This norm is less sensitive to outliers than the $\ell_2$-norm and can lead to sparser solutions in optimization problems. In the context of Lipschitz continuity, using the 1-norm can result in a model that is robust to small changes in many input dimensions simultaneously. - $\ell_2$-Norm (Euclidean Distance) -The $\ell_2$-norm $p = 2$ is the most commonly used norm, representing the straight-line distance between two points. It is rotationally invariant and often leads to smoother and more isotropic gradients. When enforcing Lipschitz continuity with the $\ell_2$-norm, the model is encouraged to be robust to perturbations in any direction in the input space. +The $\ell_2$-norm (`p = 2`) is the most commonly used norm, representing the straight-line distance between two points. It is rotationally invariant and often leads to smoother and more isotropic gradients. When enforcing Lipschitz continuity with the $\ell_2$-norm, the model is encouraged to be robust to perturbations in any direction in the input space. - $\ell_\infty$-Norm (Maximum Norm) The $\infty$-norm takes the maximum absolute value among the components of the vector. It measures the largest change in any single dimension. In the context of Lipschitz continuity, this norm is concerned with the worst-case scenario, where the model is robust to the largest change in any single input dimension. @@ -60,7 +60,7 @@ As an explicit example, consider the $\ell_p$-Lipschitz constrained network with You can compute an upper bound Lipschitz constant for this network by taking the product of the Lipschitz constants for each layer. For the relu activation, $\lambda_p = 1$. For the fully connected layers, the Lipschitz constant is given by $||W||_p$, and a suitable proximal operator that ensures the network has upper bound Lipschitz constant, $\lambda_p = 2$, is -$$W \rightarrow \frac{1}{max(1,||W||_p/\sqrt{\lambda_p})}W$$ +$W \rightarrow \frac{1}{max(1,||W||_p/\sqrt{\lambda_p})}W$. This ensures that the product of Lipschitz constants is at most $\lambda_p$. There are alternative proximal operators, some of which depend on the p-norm, for example using the $\ell_1$-norm as discussed in [2]. @@ -73,4 +73,4 @@ Lipschitz continuity offers a mathematical framework to understand and potential **References** - [1] Gouk, Henry, et al. “Regularisation of Neural Networks by Enforcing Lipschitz Continuity.” Machine Learning, vol. 110, no. 2, Feb. 2021, pp. 393–416. 
DOI.org (Crossref), https://doi.org/10.1007/s10994-020-05929-w -- [2] Kitouni, Ouail, et al. Expressive Monotonic Neural Networks. arXiv:2307.07512, arXiv, 14 July 2023. arXiv.org, http://arxiv.org/abs/2307.07512. +- [2] Kitouni, Ouail, et al. Expressive Monotonic Neural Networks. arXiv:2307.07512, arXiv, 14 July 2023. arXiv.org, http://arxiv.org/abs/2307.07512. \ No newline at end of file diff --git a/documentation/AI-Verification-Monotonicity.md b/documentation/AI-Verification-Monotonicity.md index 747e5b0..2d67f04 100644 --- a/documentation/AI-Verification-Monotonicity.md +++ b/documentation/AI-Verification-Monotonicity.md @@ -27,9 +27,15 @@ To circumvent these challenges, an alternative approach is to construct neural n - **Constrained Weights**: Ensuring that all weights in the network are non-negative can guarantee monotonicity. You can achieve this by using techniques like weight clipping or transforming weights during training. - **Architectural Considerations**: Designing network architectures that facilitate monotonic behavior. For example, architectures that avoid certain types of skip connections or layer types that could introduce non-monotonic behavior. -The approach taken in this repository is to utilize a combination of activation function, weight and architectural restrictions and is based on the construction outlined in [1]. Ref [1] discusses the derivation in the context of row vector representations of network inputs however MATLAB utilizes a column vector representation of network inputs. This means that the 1-norm discussed in [1] is replaced by the $\infty$-norm for implementations in MATLAB. +The approach taken in this repository is to utilize a combination of these three aspects and is based on the construction outlined in [1]. Because [1] discusses the derivation in the context of row vector representations of network inputs, we derive the result for column vector inputs here, as MATLAB utilizes a column vector representation of network inputs. -Note that for different choices of p-norm, the derivation in [1] still yields a monotonic function $f$, however there may be couplings between the magnitudes of the partial derivatives (shown for p=2 in [1]). By default, the implementation in this repository sets $p=\infty$ for monotonic networks but other values are explored as these may yield better fits. +Consider a scalar network $f:\mathbb{R}^n \rightarrow \mathbb{R}$ where $f(x) = g(x) + \lambda \sum_{k \in S} x_k$, $S$ denotes the set of monotonically dependent input indices and $g:\mathbb{R}^n \rightarrow \mathbb{R}$ is a Lipschitz continuous network, i.e., $\forall x,y \in \mathbb{R}^n$, $|g(x)-g(y)| \leq \lambda ||x-y||_p$. For a monotonically decreasing network, $f(x) = g(x) - \lambda \sum_{k \in S} x_k$. + +Take $p=\infty$. The matrix $\infty$-norm is the maximum absolute sum of each row, i.e., $||A||_\infty = max_i \sum_j |a_{ij}| $. Therefore, for multi-layer perceptron networks $g$ (as discussed in [1]), an upper bound on $\lambda$ is $\prod_i ||W^{(i)}||_\infty$ where $W^{(i)}$ is the weight matrix of the $i$-th fully connected layer. It follows from Lipschitz continuity that $|| \nabla g ||_\infty \leq \lambda$ where, since $\nabla g$ is also taken as a column vector, $|| \nabla g ||_\infty = max_k |\partial g/\partial x_k| \leq \lambda$. 
Hence the choice of $\infty$-norm decouples the magnitudes of the directional derivatives in the monotonic features for column vector inputs and column vector gradients, or in other words, each partial derivative is free to take any value in the interval $[-\lambda, \lambda]$. + +From the definition of $f$, $\partial f/\partial x_k = \partial g/\partial x_k + \lambda \geq 0$ for $k \in S$ and hence the network $f$ is monotonic in $x_k$ for $k \in S$ by construction. The decoupling of the magnitudes of the directional derivatives means that the partial derivatives of $f$ can be as large as $2\lambda$ in each monotonic direction. + +Note that for different choices of p-norm, the derivation above still yields a monotonic function $f$, however there may be couplings between the magnitudes of the partial derivatives (shown for p=2 in [1]). By default, the implementation in this repository sets $p=\infty$ for monotonic networks but other values are explored as these may yield better fits. A simple monotonic architecture is shown in Figure 1. @@ -50,7 +56,7 @@ The main challenge with expressive monotonic networks is to balance the inherent For networks constructed to be monotonic, verification becomes more straightforward and comes down to architectural and weight inspection, i.e., provided the network architecture is of a specified monotonic topology, and that the weights in the network are appropriately related - see [1] - then the network is monotonic. -In summary, while verifying monotonicity in general neural networks is complex due to non-linearities and high dimensionality, constructing networks with inherent monotonic properties simplifies verification. By using constrained architectures and weights, you can design networks that are guaranteed to be monotonic, thus facilitating the verification process and making the network more suitable for applications where monotonic behavior is essential. +In summary, while verifying monotonicity in general neural networks is complex due to non-linearities and high dimensionality, constructing networks with inherent monotonic properties simplifies verification. By using monotonic activation functions and ensuring non-negative weights, you can design networks that are guaranteed to be monotonic, thus facilitating the verification process and making the network more suitable for applications where monotonic behavior is essential. **References** diff --git a/documentation/figures/ficnn_network.jpg b/documentation/figures/ficnn_network.jpg index 384d5b2..5625902 100644 Binary files a/documentation/figures/ficnn_network.jpg and b/documentation/figures/ficnn_network.jpg differ diff --git a/examples/convex/classificationCIFAR10/TrainICNNOnCIFAR10Example.md b/examples/convex/classificationCIFAR10/TrainICNNOnCIFAR10Example.md index bef92d1..fad1e0b 100644 --- a/examples/convex/classificationCIFAR10/TrainICNNOnCIFAR10Example.md +++ b/examples/convex/classificationCIFAR10/TrainICNNOnCIFAR10Example.md @@ -76,9 +76,9 @@ plot(ficnnet) ```
-

- -

+

+ +

# Specify Training Options @@ -127,9 +127,9 @@ trained_ficnnet = trainConstrainedNetwork("fully-convex",ficnnet,mbqTrain,... ```
-

- -

+

+ +

# Evaluate Trained Network @@ -160,7 +160,7 @@ disp("Training accuracy: " + (1-trainError)*100 + "%") ``` ```matlabTextOutput -Training accuracy: 97.7364% +Training accuracy: 90.4848% ``` Compute the accuracy on the test set. @@ -173,7 +173,7 @@ disp("Test accuracy: " + (1-testError)*100 + "%") ``` ```matlabTextOutput -Test accuracy: 31.5848% +Test accuracy: 27.4554% ``` The network's output has been constrained to be convex in every pixel in every colour. Even with this level of restriction, the network is able to fit reasonably well to the training data. You can see poor accuracy on the test data set but, as discussed at the start of the example, it is not anticipated that such a fully input convex network comprising fully connected operations should generalize well to natural image classification. @@ -190,9 +190,9 @@ cm.RowSummary = "row-normalized"; ```
-

- -

+

+ +

To summarise, the fully input convex network is able to fit to the training data set, which is labelled natural images. The training can take a considerable amount of time owing to the weight projection to the constrained set after each gradient update, which slows down training convergence. Nevertheless, this example illustrates the flexibility and expressivity convex neural networks have to correctly classifying natural images. diff --git a/examples/convex/classificationCIFAR10/TrainICNNOnCIFAR10Example.mlx b/examples/convex/classificationCIFAR10/TrainICNNOnCIFAR10Example.mlx index b09e1c8..d495b3e 100644 Binary files a/examples/convex/classificationCIFAR10/TrainICNNOnCIFAR10Example.mlx and b/examples/convex/classificationCIFAR10/TrainICNNOnCIFAR10Example.mlx differ diff --git a/examples/convex/classificationCIFAR10/downloadCIFARData.m b/examples/convex/classificationCIFAR10/downloadCIFARData.m index 5370538..095c383 100644 --- a/examples/convex/classificationCIFAR10/downloadCIFARData.m +++ b/examples/convex/classificationCIFAR10/downloadCIFARData.m @@ -1,7 +1,5 @@ function downloadCIFARData(destination) -% Copyright 2024 The MathWorks, Inc. - url = 'https://www.cs.toronto.edu/~kriz/cifar-10-matlab.tar.gz'; unpackedData = fullfile(destination,'cifar-10-batches-mat'); diff --git a/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig1.jpg b/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig1.jpg deleted file mode 100644 index 4ad884c..0000000 Binary files a/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig1.jpg and /dev/null differ diff --git a/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig1.png b/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig1.png new file mode 100644 index 0000000..6f49779 Binary files /dev/null and b/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig1.png differ diff --git a/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig2.jpg b/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig2.jpg deleted file mode 100644 index 4d819c8..0000000 Binary files a/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig2.jpg and /dev/null differ diff --git a/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig2.png b/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig2.png new file mode 100644 index 0000000..b11b61f Binary files /dev/null and b/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig2.png differ diff --git a/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig3.jpg b/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig3.jpg deleted file mode 100644 index 4530e3d..0000000 Binary files a/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig3.jpg and /dev/null differ diff --git a/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig3.png b/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig3.png new file mode 100644 index 0000000..6f1bea2 Binary files /dev/null and b/examples/convex/classificationCIFAR10/figures/TrainICNN_Fig3.png differ diff --git a/examples/convex/classificationCIFAR10/loadCIFARData.m b/examples/convex/classificationCIFAR10/loadCIFARData.m index 7a23aa7..a896d98 100644 --- a/examples/convex/classificationCIFAR10/loadCIFARData.m +++ b/examples/convex/classificationCIFAR10/loadCIFARData.m @@ -1,7 +1,5 @@ function [XTrain,YTrain,XTest,YTest] = loadCIFARData(location) -% Copyright 2024 The MathWorks, Inc. 
- location = fullfile(location,'cifar-10-batches-mat'); [XTrain1,YTrain1] = loadBatchAsFourDimensionalArray(location,'data_batch_1.mat'); diff --git a/examples/convex/introductory/PoC_Ex1_1DFICNN.md b/examples/convex/introductory/PoC_Ex1_1DFICNN.md index f9daa44..4a41724 100644 --- a/examples/convex/introductory/PoC_Ex1_1DFICNN.md +++ b/examples/convex/introductory/PoC_Ex1_1DFICNN.md @@ -33,9 +33,9 @@ xlabel("x") ```
-

- -

+

+ +

# Prepare Data @@ -60,7 +60,7 @@ As discussed in [AI Verification: Convex](../../../documentation/AI-Verification inputSize = 1; numHiddenUnits = [16 8 4 1]; ficnnet = buildConstrainedNetwork("fully-convex",inputSize,numHiddenUnits, ... - PositiveNonDecreasingActivation="relu") + ConvexNonDecreasingActivation="relu") ``` ```matlabTextOutput @@ -92,9 +92,9 @@ end ```
-

- -

+

+ +

# Train FICNN @@ -119,9 +119,9 @@ trained_ficnnet = trainConstrainedNetwork("fully-convex",ficnnet,mbqTrain, ... ```
-

- -

+

+ +

Evaluate the accuracy on the true underlying convex function from an independent random sampling from the interval [-1,1]. @@ -138,7 +138,7 @@ lossAgainstUnderlyingSignal = gpuArray single - 0.0338 + 0.0362 ``` # Train Unconstrained MLP @@ -180,9 +180,9 @@ end ```
-

- -

+

+ +

Specify the training options and then train the network using the trainnet function. @@ -204,39 +204,40 @@ trained_mlpnet = trainnet(mbqTrain,mlpnet,lossMetric,options); Iteration Epoch TimeElapsed LearnRate TrainingLoss _________ _____ ___________ _________ ____________ 1 1 00:00:00 0.05 0.26302 - 50 50 00:00:07 0.045 0.12781 - 100 100 00:00:11 0.03645 0.12262 - 150 150 00:00:14 0.032805 0.10849 - 200 200 00:00:18 0.026572 0.1102 - 250 250 00:00:22 0.021523 0.11806 - 300 300 00:00:25 0.019371 0.10301 - 350 350 00:00:29 0.015691 0.096023 - 400 400 00:00:32 0.012709 0.10675 - 450 450 00:00:36 0.011438 0.097555 - 500 500 00:00:39 0.0092651 0.094147 - 550 550 00:00:43 0.0075047 0.090284 - 600 600 00:00:46 0.0067543 0.088997 - 650 650 00:00:50 0.0054709 0.086944 - 700 700 00:00:53 0.0044315 0.085979 - 750 750 00:00:57 0.0039883 0.085362 - 800 800 00:01:00 0.0032305 0.08497 - 850 850 00:01:04 0.0026167 0.08464 - 900 900 00:01:07 0.0023551 0.084311 - 950 950 00:01:11 0.0019076 0.084135 - 1000 1000 00:01:15 0.0015452 0.083962 - 1050 1050 00:01:19 0.0013906 0.083793 - 1100 1100 00:01:22 0.0011264 0.08367 - 1150 1150 00:01:26 0.0009124 0.08356 - 1200 1200 00:01:29 0.00082116 0.083461 + 50 50 00:00:05 0.045 0.12781 + 100 100 00:00:08 0.03645 0.12262 + 150 150 00:00:11 0.032805 0.10938 + 200 200 00:00:14 0.026572 0.10655 + 250 250 00:00:17 0.021523 0.11237 + 300 300 00:00:20 0.019371 0.104 + 350 350 00:00:23 0.015691 0.10177 + 400 400 00:00:26 0.012709 0.097083 + 450 450 00:00:28 0.011438 0.094851 + 500 500 00:00:31 0.0092651 0.092311 + 550 550 00:00:34 0.0075047 0.093058 + 600 600 00:00:36 0.0067543 0.089904 + 650 650 00:00:39 0.0054709 0.088938 + 700 700 00:00:42 0.0044315 0.087454 + 750 750 00:00:45 0.0039883 0.086143 + 800 800 00:00:48 0.0032305 0.085586 + 850 850 00:00:51 0.0026167 0.085192 + 900 900 00:00:54 0.0023551 0.08487 + 950 950 00:00:57 0.0019076 0.084659 + 1000 1000 00:00:59 0.0015452 0.084424 + 1050 1050 00:01:02 0.0013906 0.084303 + 1100 1100 00:01:05 0.0011264 0.084138 + 1150 1150 00:01:08 0.0009124 0.084049 + 1200 1200 00:01:11 0.00082116 0.083947 Training stopped: Max epochs completed ```
-

- -

+

+ +

+ Evaluate the accuracy on an independent random sampling from the interval [-1,1]. Observe that the loss against the underlying monotonic signal here is higher as the network has fitted to the sinusoidal contamination. ```matlab @@ -244,7 +245,7 @@ lossAgainstUnderlyingSignal = computeLoss(trained_mlpnet,xTest,tTest,lossMetric) ``` ```matlabTextOutput -lossAgainstUnderlyingSignal = 0.0699 +lossAgainstUnderlyingSignal = 0.0696 ``` # Network Comparison @@ -264,11 +265,12 @@ legend("FICNN","MLP","Training Data") ```
-

- -

+

+ +

+ It is visually evident that the MLP solution is not convex over the interval but the FICNN is convex, owing to its convex construction and constrained learning. # Guaranteed Bounds for FICNN @@ -318,9 +320,9 @@ title("Guarantees of upper and lower bounds for FICNN network"); ```
-

- -

+

+ +

# Violated Bounds for MLP @@ -351,9 +353,9 @@ grid on; ```
-

- -

+

+ +

# Helper Functions diff --git a/examples/convex/introductory/PoC_Ex1_1DFICNN.mlx b/examples/convex/introductory/PoC_Ex1_1DFICNN.mlx index 7b1e4be..87ac560 100644 Binary files a/examples/convex/introductory/PoC_Ex1_1DFICNN.mlx and b/examples/convex/introductory/PoC_Ex1_1DFICNN.mlx differ diff --git a/examples/convex/introductory/PoC_Ex2_nDFICNN.md b/examples/convex/introductory/PoC_Ex2_nDFICNN.md index ef691e0..e9a16e4 100644 --- a/examples/convex/introductory/PoC_Ex2_nDFICNN.md +++ b/examples/convex/introductory/PoC_Ex2_nDFICNN.md @@ -31,9 +31,9 @@ ylabel("x2") ```
-

- -

+

+ +

# Prepare Data @@ -58,7 +58,7 @@ In this proof of concept example, build a 2\-dimensional FICNN using fully conne inputSize = 2; numHiddenUnits = [16 8 4 1]; ficnnet = buildConstrainedNetwork("fully-convex",inputSize,numHiddenUnits,... - PositiveNonDecreasingActivation="softplus") + ConvexNonDecreasingActivation="softplus") ``` ```matlabTextOutput @@ -90,10 +90,8 @@ end ```
-

- -

-
+

+ # Train FICNN @@ -117,9 +115,9 @@ trained_ficnnet = trainConstrainedNetwork("fully-convex",ficnnet,mbqTrain,... ```

-

- -

+

+ +

Evaluate the accuracy on the training set. @@ -133,7 +131,7 @@ loss = gpuArray single - 0.0134 + 0.0156 ``` Plot the network predictions with the training data. @@ -151,9 +149,9 @@ legend("Training Data","Network Prediction",Location="northwest") ```
-

- -

+

+ +

# Guaranteed Bounds for 2\-D FICNN @@ -191,9 +189,9 @@ hold off ```
-

- -

+

+ +

# Helper Functions diff --git a/examples/convex/introductory/PoC_Ex2_nDFICNN.mlx b/examples/convex/introductory/PoC_Ex2_nDFICNN.mlx index db1e5be..dfdc467 100644 Binary files a/examples/convex/introductory/PoC_Ex2_nDFICNN.mlx and b/examples/convex/introductory/PoC_Ex2_nDFICNN.mlx differ diff --git a/examples/convex/introductory/PoC_Ex3_nDPICNN.md b/examples/convex/introductory/PoC_Ex3_nDPICNN.md index 78b74b1..5e705cc 100644 --- a/examples/convex/introductory/PoC_Ex3_nDPICNN.md +++ b/examples/convex/introductory/PoC_Ex3_nDPICNN.md @@ -31,11 +31,12 @@ ylabel("x2") ```
-

- -

+

+ +

+ Observe the overall underlying convex behavior in x2 given x1, and non\-convex behavior in x1 given x2. # Prepare Data @@ -61,7 +62,7 @@ In this proof of concept example, build a 2\-dimensional PICNN using fully conne inputSize = 2; numHiddenUnits = [32 8 1]; picnnet = buildConstrainedNetwork("partially-convex",inputSize,numHiddenUnits,... - PositiveNonDecreasingActivation="softplus",... + ConvexNonDecreasingActivation="softplus",... Activation="tanh",... ConvexChannelIdx=2) ``` @@ -95,9 +96,9 @@ end ```
-

- -

+

+ +

# Train PICNN @@ -122,9 +123,9 @@ trained_picnnet = trainConstrainedNetwork("partially-convex",picnnet,mbqTrain,.. ```
-

- -

+

+ +

Evaluate the accuracy on the training set. @@ -138,7 +139,7 @@ loss = gpuArray single - 0.0265 + 0.0275 ``` Plot the network predictions with the training data. @@ -156,9 +157,9 @@ legend("Training Data","Network Prediction",Location="northwest") ```
-

- -

+

+ +

# Guaranteed Bounds for 2\-D PICNN in 1\-D Restrictions @@ -187,9 +188,9 @@ xlabel("x2") ```
-

- -

+

+ +

As in the 1\-dimensional convex case, compute bounds for 1\-dimensional restrictions for fixed x1. @@ -235,9 +236,9 @@ title("Guarantees of upper and lower bounds for PICNN network for fixed x1=" + x ```
-

- -

+

+ +

# Helper Functions diff --git a/examples/convex/introductory/PoC_Ex3_nDPICNN.mlx b/examples/convex/introductory/PoC_Ex3_nDPICNN.mlx index 962295f..6ffd164 100644 Binary files a/examples/convex/introductory/PoC_Ex3_nDPICNN.mlx and b/examples/convex/introductory/PoC_Ex3_nDPICNN.mlx differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig1.jpg b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig1.jpg deleted file mode 100644 index ee27a8c..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig1.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig1.png b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig1.png new file mode 100644 index 0000000..fe7189b Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig1.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig2.jpg b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig2.jpg deleted file mode 100644 index 010d41f..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig2.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig2.png b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig2.png new file mode 100644 index 0000000..bc3612c Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig2.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig3.jpg b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig3.jpg deleted file mode 100644 index 85ae223..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig3.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig3.png b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig3.png new file mode 100644 index 0000000..92c3b24 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig3.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig4.jpg b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig4.jpg deleted file mode 100644 index ccb3b25..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig4.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig4.png b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig4.png new file mode 100644 index 0000000..f48a056 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig4.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig5.jpg b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig5.jpg deleted file mode 100644 index a56182c..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig5.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig5.png b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig5.png new file mode 100644 index 0000000..6ee4cbe Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig5.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig6.jpg b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig6.jpg deleted file mode 100644 index bf57a1b..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig6.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig6.png b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig6.png new file 
mode 100644 index 0000000..18de7e5 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig6.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig7.jpg b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig7.jpg deleted file mode 100644 index 52248a6..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig7.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig7.png b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig7.png new file mode 100644 index 0000000..b6ea101 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig7.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig8.jpg b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig8.jpg deleted file mode 100644 index 8b9595e..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig8.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig8.png b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig8.png new file mode 100644 index 0000000..43ccf46 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex1_1DFICNN_Fig8.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig1.jpg b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig1.jpg deleted file mode 100644 index 3f4a295..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig1.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig1.png b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig1.png new file mode 100644 index 0000000..18f4234 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig1.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig2.jpg b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig2.jpg deleted file mode 100644 index 010d41f..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig2.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig2.png b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig2.png new file mode 100644 index 0000000..49731f3 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig2.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig3.jpg b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig3.jpg deleted file mode 100644 index 4bb16a4..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig3.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig3.png b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig3.png new file mode 100644 index 0000000..3759686 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig3.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig4.jpg b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig4.jpg deleted file mode 100644 index 3a5c37b..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig4.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig4.png b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig4.png new file mode 100644 index 0000000..8c0aeb3 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig4.png differ diff --git 
a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig5.jpg b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig5.jpg deleted file mode 100644 index 4f9fc15..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig5.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig5.png b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig5.png new file mode 100644 index 0000000..5c0712d Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex2_nDFICNN_Fig5.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig1.jpg b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig1.jpg deleted file mode 100644 index 4a0dbc9..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig1.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig1.png b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig1.png new file mode 100644 index 0000000..bc28a12 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig1.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig2.jpg b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig2.jpg deleted file mode 100644 index 188fd11..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig2.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig2.png b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig2.png new file mode 100644 index 0000000..890a058 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig2.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig3.jpg b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig3.jpg deleted file mode 100644 index eb97474..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig3.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig3.png b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig3.png new file mode 100644 index 0000000..54d4879 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig3.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig4.jpg b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig4.jpg deleted file mode 100644 index a01dfa3..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig4.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig4.png b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig4.png new file mode 100644 index 0000000..be4e800 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig4.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig5.jpg b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig5.jpg deleted file mode 100644 index 77a2db2..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig5.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig5.png b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig5.png new file mode 100644 index 0000000..0e2bd78 Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig5.png differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig6.jpg b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig6.jpg deleted file mode 100644 index 
4f6ccff..0000000 Binary files a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig6.jpg and /dev/null differ diff --git a/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig6.png b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig6.png new file mode 100644 index 0000000..aed903e Binary files /dev/null and b/examples/convex/introductory/figures/PoC_Ex3_nDPICNN_Fig6.png differ diff --git a/examples/convex/neuralODE/TrainConvexNeuralODENetworkWithEulerODESolverExample.md b/examples/convex/neuralODE/TrainConvexNeuralODENetworkWithEulerODESolverExample.md index dc3bde7..1106c69 100644 --- a/examples/convex/neuralODE/TrainConvexNeuralODENetworkWithEulerODESolverExample.md +++ b/examples/convex/neuralODE/TrainConvexNeuralODENetworkWithEulerODESolverExample.md @@ -23,22 +23,23 @@ where $A$ is a 2\-by\-2 matrix. The neural network of this example takes as input an initial condition and computes the ODE solution through the learned neural ODE model.
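For reference, the target trajectories that the network learns from can be generated by integrating the linear ODE directly. The following is only a minimal sketch; the matrix $A$, initial condition, and time grid are illustrative, and the example's own synthesis code appears in the Synthesize Data of Target Dynamics section.

```matlab
% Integrate dx/dt = A*x with ode45 to obtain ground-truth trajectories.
% Values below are illustrative, not the ones used by the example.
A  = [-0.1 -1; 1 -0.1];                 % 2-by-2 system matrix
x0 = [2; 0];                            % initial condition
t  = linspace(0, 15, 400);              % time grid
[~, x] = ode45(@(t,x) A*x, t, x0);
xTrain = x.';                           % state components as rows, so that
                                        % size(xTrain,1) gives the state size
plot(xTrain(1,:), xTrain(2,:)); grid on
```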
The neural ODE operation, given an initial condition, outputs the solution of an ODE model. In this example, specify the ODE model as a fully input convex neural network (FICNN) block of '2\-layer' depth, i.e., a fully connected layer, a softplus layer, and a second fully connected layer whose output is then combined with the input via a residual fully connected operation.
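As a rough sketch, the FICNN block described above amounts to the following forward pass (hypothetical parameter names; in the example the block is built and initialised with buildConstrainedNetwork rather than written by hand):

```matlab
% "2-layer" FICNN block used as the ODE model. Convexity of each output in
% the input holds because softplus is convex and non-decreasing and the
% weights acting on the hidden activations (p.Wz) are kept non-negative.
function dy = ficnnOdeModel(y, p)
    softplus = @(x) log(1 + exp(x));                 % convex, non-decreasing
    z  = softplus(fullyconnect(y, p.Wy, p.by));      % state -> hidden
    dy = fullyconnect(z, p.Wz, p.bz) + ...           % hidden -> state, p.Wz >= 0
         fullyconnect(y, p.Wres, p.bres);            % residual path on the input
end
```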
In this example, the ODE that defines the model is solved numerically using the Euler method. Unlike the higher\-order Runge\-Kutta (4,5) pair of Dormand and Prince \[2\], the Euler method for solving ODEs is a first\-order, linear procedure and so preserves convexity. That is, an Euler update procedure with a convex network governing the dynamics of the physical system preserves overall convexity of the output, $y(t+1)$, with respect to the input, $y(t)$, i.e., $y(t+1)=g(y(t))$ where $g:{\mathbb{R}}^2 \to {\mathbb{R}}^2$ is fully input convex in each output.

-In this example, you use forwardEuler, an implementation of a forward Euler method for dlarray that behaves similar to dlode45. For more information, see [forwardEuler](./forwardEuler.md).
+
+In this example, you use forwardEuler, an implementation of a forward Euler method for dlarray that behaves similarly to dlode45. For more information, see [forwardEuler](./forwardEuler.m).

# Synthesize Data of Target Dynamics

@@ -69,9 +70,9 @@ grid on
```
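The convexity-preserving Euler update described above can be sketched as a simple rollout loop (a hypothetical helper shown only for illustration; the example itself uses forwardEuler):

```matlab
% Forward Euler rollout of a learned ODE model odeModel(y,params).
% Each update y <- y + dt*odeModel(y,params) adds a convex function of y to
% a linear term, so every one-step map y(t) -> y(t+1) remains input convex.
function y = eulerRollout(odeModel, y0, params, dt, numSteps)
    y = y0;
    for k = 1:numSteps
        y = y + dt*odeModel(y, params);   % first-order, linear update
    end
end
```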
# Define and Initialize Model Parameters

@@ -84,7 +85,7 @@ dt = t(2);
timesteps = (0:neuralOdeTimesteps)*dt;
```

-Construct a 2\-dimensional FICNN using fully connected layers and softplus activation functions. For more information on the architectural construction, see [AI Verification: Convex](../../../documentation/AI-Verification-Convexity.md), or, for a proof\-of\-concept example, see [PoC_Ex2_nDFICNN](../introductory/PoC_Ex2_nDFICNN.md). The first fully connected operation takes as input a vector of size stateSize and increases its length to hiddenSize. Conversely, the subsequent fully connected operation takes as input a vector of length hiddenSize and decreases its length to stateSize. The residual connection applies a fully connected operation that takes stateSize to stateSize.
+Construct a 2\-dimensional FICNN using fully connected layers and softplus activation functions. For more information on the architectural construction, see [AI Verification: Convex](../../../documentation/AI-Verification-Convexity.md), or, for a proof\-of\-concept example, see [PoC_Ex2_nDFICNN](../ProofOfConcept/PoC_Ex2_nDFICNN.md). The first fully connected operation takes as input a vector of size stateSize and increases its length to hiddenSize. Conversely, the subsequent fully connected operation takes as input a vector of length hiddenSize and decreases its length to stateSize. The residual connection applies a fully connected operation that takes stateSize to stateSize.

```matlab
stateSize = size(xTrain,1);
@@ -92,7 +93,7 @@ hiddenSize = 20;
numHiddenUnits = [hiddenSize stateSize];
neuralOdeFICNN = buildConstrainedNetwork("fully-convex",stateSize,numHiddenUnits,...
- PositiveNonDecreasingActivation="softplus")
+ ConvexNonDecreasingActivation="softplus")
```

```matlabTextOutput
@@ -116,9 +117,9 @@ plot(neuralOdeFICNN)
```
## Define Model Function

@@ -226,15 +227,15 @@ end
```
# Evaluate Model

@@ -293,9 +294,9 @@ plotTrueAndPredictedSolutions(xTrue4, xPred4);
```
# Formal Boundedness Guarantees

@@ -392,9 +393,9 @@ end
```
From the figure, you observe that the true trajectory (black line) is always within the red bounding box at any point in time. This guarantees the bounded behaviour of all trajectories for a given region of initial conditions, time step sizes and total time evolution. diff --git a/examples/convex/neuralODE/TrainConvexNeuralODENetworkWithEulerODESolverExample.mlx b/examples/convex/neuralODE/TrainConvexNeuralODENetworkWithEulerODESolverExample.mlx index b3f7f1b..45881ab 100644 Binary files a/examples/convex/neuralODE/TrainConvexNeuralODENetworkWithEulerODESolverExample.mlx and b/examples/convex/neuralODE/TrainConvexNeuralODENetworkWithEulerODESolverExample.mlx differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig2.jpg b/examples/convex/neuralODE/figures/neuralODE_Fig2.jpg deleted file mode 100644 index b7e1cef..0000000 Binary files a/examples/convex/neuralODE/figures/neuralODE_Fig2.jpg and /dev/null differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig2.png b/examples/convex/neuralODE/figures/neuralODE_Fig2.png new file mode 100644 index 0000000..a17d8e3 Binary files /dev/null and b/examples/convex/neuralODE/figures/neuralODE_Fig2.png differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig3.jpg b/examples/convex/neuralODE/figures/neuralODE_Fig3.jpg deleted file mode 100644 index 3cb89e6..0000000 Binary files a/examples/convex/neuralODE/figures/neuralODE_Fig3.jpg and /dev/null differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig3.png b/examples/convex/neuralODE/figures/neuralODE_Fig3.png new file mode 100644 index 0000000..54b231a Binary files /dev/null and b/examples/convex/neuralODE/figures/neuralODE_Fig3.png differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig4.jpg b/examples/convex/neuralODE/figures/neuralODE_Fig4.jpg deleted file mode 100644 index 4b81673..0000000 Binary files a/examples/convex/neuralODE/figures/neuralODE_Fig4.jpg and /dev/null differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig4.png b/examples/convex/neuralODE/figures/neuralODE_Fig4.png new file mode 100644 index 0000000..0b31e3c Binary files /dev/null and b/examples/convex/neuralODE/figures/neuralODE_Fig4.png differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig5.jpg b/examples/convex/neuralODE/figures/neuralODE_Fig5.jpg deleted file mode 100644 index cc11a8c..0000000 Binary files a/examples/convex/neuralODE/figures/neuralODE_Fig5.jpg and /dev/null differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig5.png b/examples/convex/neuralODE/figures/neuralODE_Fig5.png new file mode 100644 index 0000000..a78e561 Binary files /dev/null and b/examples/convex/neuralODE/figures/neuralODE_Fig5.png differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig6.jpg b/examples/convex/neuralODE/figures/neuralODE_Fig6.jpg deleted file mode 100644 index d7d6962..0000000 Binary files a/examples/convex/neuralODE/figures/neuralODE_Fig6.jpg and /dev/null differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig6.png b/examples/convex/neuralODE/figures/neuralODE_Fig6.png new file mode 100644 index 0000000..9a606b0 Binary files /dev/null and b/examples/convex/neuralODE/figures/neuralODE_Fig6.png differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig7.jpg b/examples/convex/neuralODE/figures/neuralODE_Fig7.jpg deleted file mode 100644 index e05cf40..0000000 Binary files a/examples/convex/neuralODE/figures/neuralODE_Fig7.jpg and /dev/null differ diff --git 
a/examples/convex/neuralODE/figures/neuralODE_Fig7.png b/examples/convex/neuralODE/figures/neuralODE_Fig7.png new file mode 100644 index 0000000..e4b95c0 Binary files /dev/null and b/examples/convex/neuralODE/figures/neuralODE_Fig7.png differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig8.jpg b/examples/convex/neuralODE/figures/neuralODE_Fig8.jpg deleted file mode 100644 index 750b263..0000000 Binary files a/examples/convex/neuralODE/figures/neuralODE_Fig8.jpg and /dev/null differ diff --git a/examples/convex/neuralODE/figures/neuralODE_Fig8.png b/examples/convex/neuralODE/figures/neuralODE_Fig8.png new file mode 100644 index 0000000..43a1f45 Binary files /dev/null and b/examples/convex/neuralODE/figures/neuralODE_Fig8.png differ diff --git a/tests/system/tFullyInputConvexNetwork.m b/tests/system/tFullyInputConvexNetwork.m index e74d543..953c096 100644 --- a/tests/system/tFullyInputConvexNetwork.m +++ b/tests/system/tFullyInputConvexNetwork.m @@ -28,7 +28,7 @@ function verifyNetworkOutputIsFullyConvex(testCase, PndActivationFunctionSet, Ta inputSize = 1; numHiddenUnits = [16 8 4 1]; ficnn = buildConstrainedNetwork("fully-convex",inputSize,numHiddenUnits, ... - PositiveNonDecreasingActivation=PndActivationFunctionSet); + ConvexNonDecreasingActivation=PndActivationFunctionSet); % Train fully convex network. Use just 1 epoch. maxEpochs = 1; diff --git a/tests/system/tPartiallyInputConvexNetwork.m b/tests/system/tPartiallyInputConvexNetwork.m index a10615a..47251ba 100644 --- a/tests/system/tPartiallyInputConvexNetwork.m +++ b/tests/system/tPartiallyInputConvexNetwork.m @@ -44,7 +44,7 @@ function verifyNetworkIsPartiallyConvex(testCase, PndActivationFunctionSet, Acti inputSize = 2; numHiddenUnits = [32 8 1]; picnn = buildConstrainedNetwork("partially-convex",inputSize,numHiddenUnits,... - PositiveNonDecreasingActivation=PndActivationFunctionSet,... + ConvexNonDecreasingActivation=PndActivationFunctionSet,... Activation=ActivationFunctionSet,... ConvexChannelIdx=2); diff --git a/tests/unit/conslearn/convex/tbuildFICNN.m b/tests/unit/conslearn/convex/tbuildFICNN.m index 14dcd8b..ac96bf9 100644 --- a/tests/unit/conslearn/convex/tbuildFICNN.m +++ b/tests/unit/conslearn/convex/tbuildFICNN.m @@ -103,7 +103,7 @@ function verifyActivationLayersAreCorrect(testCase, FullyConnectedLayerSizesSet, % Build convex neural network net = conslearn.convex.buildFICNN([28, 28, 1], FullyConnectedLayerSizesSet, ... - PositiveNonDecreasingActivation = ActivationFunctionSet.Input); + ConvexNonDecreasingActivation = ActivationFunctionSet.Input); % Get indices for activation layers pndLayerIdx = find(contains({net.Layers.Name}, "pnd")); diff --git a/tests/unit/conslearn/convex/tbuildPICNN.m b/tests/unit/conslearn/convex/tbuildPICNN.m index b8e399c..f8ed11f 100644 --- a/tests/unit/conslearn/convex/tbuildPICNN.m +++ b/tests/unit/conslearn/convex/tbuildPICNN.m @@ -71,7 +71,7 @@ function verifyPndActivationLayersAreCorrect(testCase, FullyConnectedLayerSizesS % Build network net = conslearn.convex.buildPICNN(inputSize, numHiddenUnits, ... 
- PositiveNonDecreasingActivation=PndActivationFunctionSet.Input); + ConvexNonDecreasingActivation=PndActivationFunctionSet.Input); % Get indices for activation layers pndLayerIdx = iFindLayerIdxWithName(net, "pnd"); diff --git a/tests/unit/conslearn/monotonic/tmakeParametersMonotonic.m b/tests/unit/conslearn/monotonic/tmakeParametersMonotonic.m index ee975ec..a3a6b87 100644 --- a/tests/unit/conslearn/monotonic/tmakeParametersMonotonic.m +++ b/tests/unit/conslearn/monotonic/tmakeParametersMonotonic.m @@ -10,7 +10,7 @@ function verifyCorrectResultForSimpleCase(testCase, ValidInputs) out = conslearn.monotonic.makeParametersMonotonic( ... ValidInputs.W, ValidInputs.lambda, ValidInputs.pNorm); - testCase.verifyEqual(extractdata(out), extractdata(ValidInputs.ExpectedOutput), AbsTol=1e-12); + testCase.verifyEqual(out, ValidInputs.ExpectedOutput, AbsTol=1e-12); end end diff --git a/tests/unit/tbuildConstrainedNetwork.m b/tests/unit/tbuildConstrainedNetwork.m index d5e6c00..f5cea01 100644 --- a/tests/unit/tbuildConstrainedNetwork.m +++ b/tests/unit/tbuildConstrainedNetwork.m @@ -47,7 +47,7 @@ function canBuildFullyConvexNetworkWithOptionalInputs(testCase, ValidInputSizeSe pndActivation = ValidPndActivationSet; fcn = @() buildConstrainedNetwork(constraint, inputSize, numHiddenUnits, ... - "PositiveNonDecreasingActivation", pndActivation); + "ConvexNonDecreasingActivation", pndActivation); net = testCase.verifyWarningFree(fcn); @@ -63,7 +63,7 @@ function canBuildPartiallyConvexNetworkWithOptionalInputs(testCase, ValidInputSi convexChannelIdx = ValidConvexChannelIdxSet; fcn = @() buildConstrainedNetwork(constraint, inputSize, numHiddenUnits, ... - "PositiveNonDecreasingActivation", pndActivation, ... + "ConvexNonDecreasingActivation", pndActivation, ... "Activation", activation, ... "ConvexChannelIdx", convexChannelIdx); @@ -355,8 +355,8 @@ function errorsForLipschitzConstrainedAndInvalidNameValuePairs(testCase, Invalid Name = "UpperBoundLipschitzConstant", ... Value = 1); -param.PositiveNonDecreasingActivation = struct( ... - "Name", "PositiveNonDecreasingActivation", ... +param.ConvexNonDecreasingActivation = struct( ... + "Name", "ConvexNonDecreasingActivation", ... "Value", "relu"); param.ConvexChannelIdx = struct( ... @@ -373,8 +373,8 @@ function errorsForLipschitzConstrainedAndInvalidNameValuePairs(testCase, Invalid Name = "UpperBoundLipschitzConstant", ... Value = 1); -param.PositiveNonDecreasingActivation = struct( ... - "Name", "PositiveNonDecreasingActivation", ... +param.ConvexNonDecreasingActivation = struct( ... + "Name", "ConvexNonDecreasingActivation", ... "Value", "relu"); param.ConvexChannelIdx = struct( ... @@ -387,8 +387,8 @@ function errorsForLipschitzConstrainedAndInvalidNameValuePairs(testCase, Invalid Name = "ResidualScaling", ... Value = 1); -param.PositiveNonDecreasingActivation = struct( ... - "Name", "PositiveNonDecreasingActivation", ... +param.ConvexNonDecreasingActivation = struct( ... + "Name", "ConvexNonDecreasingActivation", ... "Value", "relu"); param.ConvexChannelIdx = struct( ...
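For reference, a minimal sketch of building a fully convex network with the renamed ConvexNonDecreasingActivation name-value argument (the input size, hidden unit sizes, and activation below are illustrative):

```matlab
% Build a fully input convex network; softplus is a convex,
% non-decreasing activation.
inputSize = 1;
numHiddenUnits = [16 8 4 1];
ficnn = buildConstrainedNetwork("fully-convex", inputSize, numHiddenUnits, ...
    ConvexNonDecreasingActivation="softplus");
```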