diff --git a/VERSION b/VERSION index 21222ce..2c3fc41 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -v2.5.0 +v2.5.1 diff --git a/headersources/versionheader.h b/headersources/versionheader.h index 9ae5f6b..59231e5 100644 --- a/headersources/versionheader.h +++ b/headersources/versionheader.h @@ -1,3 +1,3 @@ // define rarray version (i.e. latest git tag) -#define RA_VERSION "v2.5.0" -#define RA_VERSION_NUMBER 2005000 +#define RA_VERSION "v2.5.1" +#define RA_VERSION_NUMBER 2005001 diff --git a/rarray b/rarray index 464b3e6..466bced 100644 --- a/rarray +++ b/rarray @@ -30,8 +30,8 @@ #if __cplusplus >= 201103L //begin #include "versionheader.h" -#define RA_VERSION "v2.5.0" -#define RA_VERSION_NUMBER 2005000 +#define RA_VERSION "v2.5.1" +#define RA_VERSION_NUMBER 2005001 //end #include "versionheader.h" diff --git a/rarraydoc.pdf b/rarraydoc.pdf index 1c7f5eb..ffd8ebb 100644 Binary files a/rarraydoc.pdf and b/rarraydoc.pdf differ diff --git a/rarraydoc.tex b/rarraydoc.tex index a6f2836..ed1441b 100644 --- a/rarraydoc.tex +++ b/rarraydoc.tex @@ -20,38 +20,33 @@ \setlength{\parskip}{1mm} -\title{\texttt{rarray}: Multidimensional Runtime Arrays for \cxx} +\title{\texttt{rarray}: Reference-Counted Multidimensional Arrays for \cxx} \author{Ramses van Zon%\\ %\it\small SciNet High Performance Computing Consortium, University %of Toronto, Toronto, Ontario, Canada \vspace{-8pt}} -\date{February, 2023 (version 2.5.0)\vspace{-7mm}} +\date{May, 2023 (version 2.5.1)\vspace{-7mm}} \maketitle \section{For the impatient: the what, why and how of rarray} -\noindent\textbf{What:} +\noindent\textbf{What:}\\ +Reference-counted and non-owning multidimensional arrays with runtime dimensions. -Rarray provides multidimensional arrays with dimensions determined at runtime. +\noindent\textbf{What not:}\\ +No strides, no linear algebra, overloaded operators etc. -\noindent\textbf{What not:} - -No linear algebra, overloaded operators etc. 
- -\noindent\textbf{Why:} - -Usually faster than alternatives, - -Uses the same accessors as compile-time (automatic) arrays, - -Data is guarranteed to be contiguous for easy interfacing with -libraries (CBLAS, LAPACKE). - -\noindent\textbf{How:} +\noindent\textbf{Why:}\\ +Usually faster than alternatives.\\ +Uses the same accessors as automatic arrays.\\ +Requires only the C++11 standard.\\ +Data is contiguous to allow interfacing with +libraries like BLAS, LAPACK, FFTW, etc. +\noindent\textbf{How:}\\ The header file \texttt{rarray} provides the type \texttt{rarray<T,R>}, where \texttt{T} is any type and {\tt R} is the rank. Element access uses repeated square brackets. Copying rarrays or passing them to functions mean shallow copies, unless explicitly asking for a deep copy. Streaming I/O is also part of the \texttt{rarray} header. \ @@ -89,10 +84,10 @@ \section{For the impatient: the what, why and how of rarray} \rule{0pt}{14pt}A rarray copy of an existing automatic array:& \texttt{rarray h=RARRAY(f).copy();} \\ -\rule{0pt}{14pt}Output a rarray to screen:& +\rule{0pt}{14pt}Output a rarray to console:& \texttt{std::cout << h << std::endl;} \\ -\rule{0pt}{14pt}Read a rarray from keyboard:& +\rule{0pt}{14pt}Read a rarray from console:& \texttt{std::cin >> h;} \\\hline \end{tabular}} @@ -107,31 +102,53 @@ \section{Introduction} While C and thus C++ has some support for multidimensional arrays whose sizes are known at compile time, the support for arrays with sizes that are known only at runtime, is limited. For one-dimensional -arrays, C++ has a reasonable allocation construction in the operators -\texttt{new} and \texttt{delete}. A standard way to allocate a +arrays, C++ has reasonable allocation and deallocation constructs in the operators +\texttt{new} and \texttt{delete} in the standard. A standard way to allocate a
A standard way to allocate a one-dimensional array is as follows: \vspace{-5pt}\begin{framed}\vspace{-14pt}% \begin{verbatim} -float* a; int n = 1000; +float* a; a = new float[n]; a[40] = 2.4; delete[] a; \end{verbatim}% -\vspace{-12pt}\end{framed}\vspace{-5pt}% -It is important to note that this code also works if \texttt{n} was not known yet, e.g., if it was passed as a function argument or read in as input. - -In the above code snippet, the new/delete construct assigns the address of the array to a pointer. This pointer does not remember its size, so this is not really an 'array'. The standard C++ library does provide a one-dimensional array that remembers it size in the form of the \texttt{std::vector}, e.g. +\vspace{-12pt}\end{framed}\vspace{-5pt}\noindent% +It is important to note that this code also works if \texttt{n} was +not known yet at compile time, e.g., if it was passed as a function +argument or read in as input. + +This style of allocation with a +``raw'' pointer is discouraged in C++ in favor of using ``smart'' +pointers, which is possible since the C++17 standard: \vspace{-5pt}\begin{framed}\vspace{-14pt}% \begin{verbatim} -const int n = 1000; +int n = 1000; +std::unique_ptr a(new float[n]); +a[40] = 2.4; +// a gets deallocated automatically, or one can explicitly call a.reset(nullptr) +\end{verbatim}% +\vspace{-12pt}\end{framed}\vspace{-5pt}\noindent% +A unique pointer \texttt{a} cannot be copied. Instead of +\texttt{unique\_ptr} one can use \texttt{shared\_ptr}, which can be +copied and keeps a reference counter to know when to deallocate the +memory. Automatic deallocation happens when \texttt{a} goes out of scope. + +In the above code snippets, the \texttt{new} construct and the +\texttt{std::unique\_ptr/std::shared\_ptr} assign the address of the +array to a pointer. These pointers do not remember its size, so they are not really an 'array'. 
The standard C++ library does provide a one-dimensional array that remembers its size, in the form of the \texttt{std::vector}, e.g. +\vspace{-5pt}\begin{framed}\vspace{-14pt}% +\begin{verbatim} +int n = 1000; std::vector<float> a(n); a[40] = 2.4; -a.clear(); +// a gets deallocated automatically; a.clear() removes the elements but need not free the memory \end{verbatim}% \vspace{-12pt}\end{framed}\vspace{-5pt}% -Multi-dimensional runtime-allocated arrays are not supported by \cxx. +Multi-dimensional runtime-allocated arrays are not yet supported by +\cxx\ (the C++23 standard adds a non-owning multidimensional array view, +\texttt{std::mdspan}). The textbook \cxx\ solution for multidimensional arrays that are dynamically allocated during runtime, is as follows: \vspace{-5pt}\begin{framed}\vspace{-14pt}% @@ -145,17 +162,16 @@ \section{Introduction} } \end{verbatim}% \vspace{-12pt}\end{framed}\vspace{-5pt}% -Drawbacks of this solution are: +Apart from the fact that this will soon be obsolete, drawbacks of this solution are: \begin{itemize} - \item the non-contiguous buffer for the elements, making it unusable + \item the elements are not stored contiguously in memory, making + this multi-dimensional array unusable for many numerical libraries, - \item having to keep track of array dimensions, - \item having the intermediate pointers be non-const, so the + \item one has to keep track of array dimensions, and pass them along + to functions, + \item the intermediate pointers are non-const, so the internal pointer structure can be changed - (conceptually, \texttt{a} ought to be of type \texttt{float*const*const*}, which would prevent this, but then one wouldn't be able to - assign to be intermediate pointers in the above code, and therefore wouldn't be able - to create the array except with some delicate const casts\footnote{Truth - be told, the rarray library does that, but only internally}). + whereas, conceptually, \texttt{a} ought to be of type \texttt{float*const*const*}.
\end{itemize} At first, there seems to be no shortage of libraries to fill this lack of \cxx\ support for dynamic multi-dimensional arrays, such as @@ -165,13 +181,16 @@ \section{Introduction} \item Eigen; \item Armadillo; and \item Nested \texttt{vector}s from the Standard Template Library. +\item Kokkos's reference implementation of the C++23 mdspan template. \end{itemize} These typically do have some runtime overhead compared to the above textbook solution, or do not allow arbitrary ranks. In contrast, the purpose of the rarray library is to be a minimal interface for runtime multidimensional arrays of arbitrary rank with -\emph{minimal to no performance overhead} compared to the textbook solution. +\emph{minimal to no performance overhead} compared to the textbook +solution. Of the alternatives listed above, only the mdspan implementation in +Kokkos likewise has no overhead. \noindent{\bf Example:\vspace{-7pt}} \begin{framed}\vspace{-14pt}% @@ -193,8 +212,9 @@ \section{Introduction} \begin{enumerate}\itemsep1pt\parskip3pt \item To have dynamically allocated multidimensional arrays that -combine the convenience of automatic c++ arrays with that of the -typical textbook dynamically allocated pointer-to-pointer +offer the convenience of automatic \cxx\ arrays while being compatible +with the +typical textbook-style dynamically allocated pointer-to-pointer structure. The compatibility requirement with pointer-to-pointer structures @@ -203,18 +223,25 @@ \section{Introduction} \item To be as fast as pointer-to-pointer structures. -\item To have rarrays know their sizes, so that can be passed to +\item To have rarrays know their sizes, so that they can be passed to functions as a single argument. 
-\item To enable interplay with libraries such as BLAS and LAPACK: this +\item To enable interfacing with libraries such as BLAS and LAPACK: this is achieved by guarranteeing contiguous elements in the multi-dimensional array, and a way to get this data out.
-Relatedly, it should be allowed to use an existing buffer. - -The guarrantee of contiguity means strided arrays are not supported. + The guarantee of contiguity means strided arrays are not supported. + +\item To avoid dangling references (by utilizing reference counting). + + +\item To allow rarrays to hold non-owning views that use an existing buffer, + without having to use a separate type. + +\item To avoid some of the cluttered semantics around \texttt{const} + correctness when converting to pointer-to-pointer structures when + interfacing with legacy code. -\item To avoid some of the cluttered sematics around \texttt{const} correctness when converting to pointer-to-pointer structers. \end{enumerate} \noindent{\bf Features of rarray:\vspace{-3pt}} @@ -277,16 +304,18 @@ \subsection{Defining a multidimensional rarray} or, using an external, pre-allocated buffer, as \begin{framed}\vspace{-18pt}% \begin{verbatim} - float* pre_alloc_data=new float[256*256*256]; - rarray<float,3> s(pre_alloc_data,256,256,256); + std::unique_ptr<float[]> pre_alloc_data(new float[256*256*256]); + rarray<float,3> s(pre_alloc_data.get(),256,256,256); s[1][2][3] = 105; // do whatever you need with s - delete[] pre_alloc_data; - s.clear(); + s.clear(); // optional: release the reference held by s + pre_alloc_data.reset(nullptr); // optional explicit deallocation \end{verbatim}% \vspace{-14pt} \end{framed} -Without the \texttt{delete[]} statement in the latter example, there would be a memory leak. This reflects that rarray is in this case not responsible +Note that \texttt{s} will hold dangling references (often leading to +segmentation faults) if \texttt{pre\_alloc\_data} is deallocated while \texttt{s} is +still in use. This reflects that rarray is in this case not responsible for the content. The data pointer can also be retrieved using \texttt{s.data()}. The \texttt{s.clear()} statement ensures there are no dangling references to this data left in \texttt{s}.
@@ -299,7 +328,7 @@ \subsection{Defining a multidimensional rarray} \subsection{Shorthand rarray types: rvector, rmatrix, rtensor} -When compiling in c++11 mode, there are short cut types for +For convenience, rarray defines shortcut types for one-dimensional, two dimensional and three dimensional arrays, called rvector, rmatrix and rtensor, respectively. The following equivalences hold: @@ -312,7 +341,7 @@ \subsection{Shorthand rarray types: rvector, rmatrix, rtensor} \vspace{-18pt}\end{framed}\noindent for any type \texttt{T}. -\subsection{Accessing the elements} +\subsection{Accessing elements of an rarray} The elements of rarray objects are accessed using the repeated square bracket notation as for automatic \cxx\ arrays. Thus, if \texttt{s} is a \texttt{rarray} of rank \texttt R, the elements are accessed using \texttt{R} times an index of the form \texttt{[n$_i$]}, i.e. \texttt{s[n$_0$][n$_1$]\dots[n$_{\texttt{R}-1}$]} @@ -351,7 +380,9 @@ \subsection{Copying and function arguments} copy for built-in types. For C-style arrays, however, only the pointer to the first element gets copied, not the whole array. The latter is called a shallow copy. Rarrays use shallow copies much like -pointers, but uses atomic reference counting to know when memory can be released (similar to the \texttt{std::shared\_ptr} of C++11). +pointers, but use atomic reference counting to know when memory can +be released (similar to \texttt{std::shared\_ptr<T>} of C++11 and +\texttt{std::shared\_ptr<T[]>} of C++17). What does this essentially mean?
Well: \begin{enumerate} @@ -467,12 +498,26 @@ \subsection{Optional bounds checking} \section{Comparison with standard alternatives} -Compared to the textbook method (page 3) or the rarray method (page 4) -of declaring an array, the more-or-less equivalent automatic array version +Compared to the old textbook method of declaring an array (see above) or to the rarray method: \vspace{-5pt}\begin{framed}\vspace{-14pt}% \begin{verbatim} - float arr[256][256][256]; +#include <rarray> +int main() { + int n = 256; + rarray<float,3> arr(n,n,n); + arr[1][2][3] = 105; +} +\end{verbatim} +\vspace{-14pt}\end{framed} +\noindent +the more-or-less equivalent automatic array version +\vspace{-5pt}\begin{framed}\vspace{-14pt}% +\begin{verbatim} +int main() { + const int n = 256; // must be a compile-time constant + float arr[n][n][n]; arr[1][2][3] = 105; +} \end{verbatim} \vspace{-14pt}\end{framed} \noindent @@ -503,9 +548,30 @@ \section{Comparison with standard alternatives} \vspace{-14pt}\end{framed}\vspace{-8pt} \noindent which is complicated, is non-contiguous in memory, and likely -slower. +slower. -%\pagebreak[4] +C++23 adds a non-owning multidimensional array view, \texttt{std::mdspan}, which should work +roughly as follows: +\vspace{-5pt}\begin{framed}\vspace{-14pt}% +%TEST THIS +\begin{verbatim} + #include <memory> + #include <mdspan> + int main() { + int n = 256; // size per dimension + std::unique_ptr<float[]> p(new float[n*n*n]); // or a vector or a shared_ptr + using exts = std::dextents<int,3>; // three runtime extents + std::mdspan<float,exts> v(p.get(), exts(n,n,n)); + v[1,2,3] = 105; // assign to element (for example) + } +\end{verbatim}% +\vspace{-14pt}\end{framed}\vspace{-8pt} +This example declares all types explicitly; the class template argument +deduction of C++17 would allow a somewhat briefer version. + + +\pagebreak[4] \section{Class definition} \subsection{Interface} @@ -537,8 +603,9 @@ \subsection{Interface} T**... noconst_ptr_array() const; // converts to a T**... rarray& const_ref() const; // convert to const elements rarray& operator=(const rarray &a);// shallow assignment - operator T*const*...
(); // enables element access for assignment - operator const T*const*... () const; // enables element access with [] + operator[](size_t i) const; // enables const element access + operator[](size_t i); // enables element access for assignment + rarray at(size_t i); // retrieve the ith 'row' with bounds checking }; \end{verbatim} \end{framed} @@ -749,20 +816,20 @@ \subsection{linspace} \begin{verbatim} #include int main() { - rvector r = linspace(-1.0, 1.0, 101); + rvector r = linspace(-1.0, 1.0, 101); ... } \end{verbatim} The first argument of linspace is allowed to be greater than the last, in which case, decreasing values are generated. The two arguments are -allowed to be equal as well, and generates a vector with all equal values. In that case, \texttt{end\_incl} can not +allowed to be equal as well, which generates a vector with all equal values. In that case, \texttt{end\_incl} can not be set to false. The case where the number of points is 1 and \texttt{end\_incl=false} is ill defined. Note that for integer types, using linspace without specifying their number (i.e. \texttt{linspace(n1,n2)}) gives the same values as are -generated by the range function without a stepsize and with the +generated by the xrange function without a stepsize and with the endvalue one higher (i.e., \texttt{xrange(n1,n2+1)}). @@ -816,13 +883,6 @@ \subsection{Profiling} calls of rarray that could simply be optimized away, and would pollute the sampling. -A hint regarding the current state of gprof and gcc (Mar 2016): the -newer gcc compilers encode symbols -differently than earlier versions, and gprof relies on the earlier -format. This can impede e.g. profiling by line number. Compiling (and -linking) with the \texttt{-gstabs} flags enables the earlier way of encoding -symbols in the application, and allows for gprof to function fully. 
- \subsection{Memory overhead using the rarray class} The memory overhead here comes from having to store the dimensions and a pointer-to-pointer structure. The latter account for most of the memory overhead. A rarray object of 100$\times$100$\times$100$\times$100 doubles on a 64-bit machine will have a memory overhead of a bit over 1\%. In general, the memory overhead as a percentage is roughly 100\% divided by the last dimension. Therefore, avoid rarrays with a small last dimension such as 100$\times$100$\times$100$\times$2. @@ -979,7 +1039,7 @@ \subsection{Conversions for function arguments} \vspace{-14pt} \end{framed}\vspace{-8pt} -rarray objects are also easy to pass to function that do not use \texttt{rarray}s. Because there are, by design, no automatic conversions of a +rarray objects are also easy to pass to functions that do not use \texttt{rarray}s. Because there are, by design, no automatic conversions of a rarray, this is done using methods. There are two main ways that such functions expect a multidimensional @@ -1169,7 +1229,7 @@ \section{Installation} respectively. Note that this will fail on recent MacOS versions, in which case, try \texttt{sudo make install PREFIX=/usr/local}. -To modify rarray, do not edit these file separately. Instead, you +To modify rarray, do not edit the rarray header file, as this is a generated file. Instead, you should edit the files in the \texttt{headersources} directory. You can use the included Makefile to assemble the rarray headers with \begin{verbatim}