RFC: Relax FieldHandler #625
Conversation
Codecov Report
Patch coverage:
@@            Coverage Diff             @@
##           master     #625      +/-   ##
==========================================
+ Coverage   92.39%   92.48%   +0.08%
==========================================
  Files          29       29
  Lines        4446     4444       -2
==========================================
+ Hits         4108     4110       +2
+ Misses        338      334       -4
src/Dofs/ConstraintHandler.jl
Outdated
```diff
@@ -306,7 +306,7 @@ function add_prescribed_dof!(ch::ConstraintHandler, constrained_dof::Int, inhomo
     return ch
 end

-function _add!(ch::ConstraintHandler, dbc::Dirichlet, bcfaces::Set{Index}, interpolation::Interpolation, field_dim::Int, offset::Int, bcvalue::BCValues, cellset::Set{Int}=Set{Int}(1:getncells(ch.dh.grid))) where {Index<:BoundaryIndex}
+function _add!(ch::ConstraintHandler, dbc::Dirichlet, bcfaces::Set{Index}, interpolation::Interpolation, field_dim::Int, offset::Int, bcvalue::BCValues, cellset=1:getncells(ch.dh.grid)) where {Index<:BoundaryIndex}
```
Here it might be nice to keep the `Set`, since it is used on L319 to check `cellidx ∉ cellset`. That check is still fast with a `UnitRange`, but if you pass a vector it will be slow(er).
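For illustration, a minimal standalone sketch (not Ferrite code) of why the container type matters for the `∉` check; the comments only indicate relative cost:

```julia
using BenchmarkTools

cellset_range = 1:1_000_000               # UnitRange: bounds check only
cellset_vec   = collect(cellset_range)    # Vector: linear scan
cellset_set   = Set(cellset_range)        # Set: hash lookup

cellidx = 999_999
@btime $cellidx ∉ $cellset_range   # essentially free
@btime $cellidx ∉ $cellset_vec     # scans the whole vector, orders of magnitude slower
@btime $cellidx ∉ $cellset_set     # constant time
```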
Some rough benchmarks seem to suggest that it is beneficial to create the set if `length(needles) > length(haystack)`, i.e. if the number of lookups we will do is larger than the size of the collection. It is still pretty fast though, so perhaps just leave it as a vector if one is passed. In most cases it will already be a set anyway, since it will come from a grid cellset, and those are sets.
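A rough sketch of that heuristic (hypothetical helper, not part of Ferrite's API):

```julia
# Only build a Set when the expected number of lookups ("needles") exceeds
# the size of the collection ("haystack").
function maybe_set(haystack, nlookups::Int)
    return nlookups > length(haystack) ? Set(haystack) : haystack
end

cellset = collect(1:10_000)
maybe_set(cellset, 50)       # few lookups: keep the Vector
maybe_set(cellset, 100_000)  # many lookups: build a Set
```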
I also thought about this, but decided to always create the `Set`. We gain significant performance benefits if the `Vector` is large, while we definitely get some overhead from the `Set` construction. However, for small vectors (where it may be enough to not convert to a `Set` up front) the overhead is rather small, so searching for a good heuristic might not really be worth it here. I would still go for the conversion to `Set`, as the speedup can be really significant for large problems (O(N) lookups in a `Vector` vs. roughly O(1) in a `Set`).
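A minimal sketch of this "always convert" approach (hypothetical helper, not the actual Ferrite code): normalize whatever the caller passes into a `Set` once, so all subsequent `∈`/`∉` checks are cheap.

```julia
normalize_cellset(cellset::Set{Int}) = cellset   # already a Set, no copy needed
normalize_cellset(cellset) = Set{Int}(cellset)   # Vector, UnitRange, ...

cellset = normalize_cellset(1:1_000_000)
999_999 ∈ cellset   # constant-time lookup from here on
```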
src/Dofs/MixedDofHandler.jl
Outdated
```julia
# cellnumbers = issorted(fh.cellset) ? sort(collect(fh.cellset)) : fh.cellset
nextdof = _close!(
    dh,
    cellnumbers,
    fh.cellset,
```
@fredrikekre I am not sure what we should do here. I can add another dispatch to `_close!` which just creates a sorted vector if the input cellset is of type `Set{Int}`, to preserve the exact same dof assignment across all cases. However, the closing algorithm does not require a vector to work, which is why I decided to allow different dof assignments depending on the specific dof handler at hand.
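A rough sketch of that extra dispatch (hypothetical names and signature, not Ferrite's actual `_close!`): if the cellset arrives as a `Set{Int}`, sort it into a `Vector` once so the resulting dof numbering matches the previous, vector-based behaviour.

```julia
_close_sketch!(dh, cellnumbers::Vector{Int}) = cellnumbers  # stand-in for the real closing algorithm
_close_sketch!(dh, cellset::Set{Int}) = _close_sketch!(dh, sort!(collect(cellset)))

_close_sketch!(nothing, Set([3, 1, 2]))  # delegates with [1, 2, 3]
```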
If you think that the sorted order is really that essential, we could also recommend a sorted set or ordered set as the data structure.
It would perhaps be good to make this more deterministic. Maybe we should make the cellsets in the grid vectors, or keep both a vector version and a set version of the data.
Mhh. Maybe we can add the set type as a type parameter to the grid.
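A very rough sketch of that idea (hypothetical type, not Ferrite's `Grid`): carry the cellset container type as a type parameter, so one grid could use `Set`, `BitSet`, `OrderedSet`, etc. for all of its cellsets.

```julia
struct SketchGrid{S}
    cellsets::Dict{String, S}
end
SketchGrid{S}() where {S} = SketchGrid{S}(Dict{String, S}())

g = SketchGrid{BitSet}()
g.cellsets["left"] = BitSet(1:10)
```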
But then all sets need to have the same type. I am thinking we could either maintain both data structures but lazily initialize them as needed, or alternatively use sorted vectors and make sets when needed. Probably we can get away with doing it only in a few places, such as in `close!` etc.
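A sketch of the "maintain both, lazily initialize" variant (hypothetical type, not part of Ferrite): iterate over a sorted `Vector`, and build the `Set` only when a membership check is actually needed.

```julia
mutable struct LazyCellSet
    cells::Vector{Int}               # sorted, used for iteration
    set::Union{Set{Int}, Nothing}    # built on demand for `in` checks
end
LazyCellSet(cells) = LazyCellSet(sort!(collect(cells)), nothing)

function Base.in(i::Int, cs::LazyCellSet)
    cs.set === nothing && (cs.set = Set(cs.cells))
    return i in cs.set
end
Base.iterate(cs::LazyCellSet, state...) = iterate(cs.cells, state...)
Base.length(cs::LazyCellSet) = length(cs.cells)

cs = LazyCellSet(Set([5, 1, 3]))
3 in cs       # true; the Set is created on first use
collect(cs)   # [1, 3, 5], always ascending
```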
I think it makes sense to have the slow path for `Set` and the optimized path for `UnitRange`s, which I will document as the recommended way to set up the subdomains, if we agree on this. `UnitRange`s keep the caches hot, are small, and all operations on them are fast. What do you think?
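A sketch of that two-path idea (hypothetical helper, not Ferrite API): a `UnitRange` already supports fast `in` and sorted iteration, so it can be used directly, while a `Set` takes the slower path via a sorted copy.

```julia
cells_for_iteration(cellset::UnitRange{Int}) = cellset            # fast path: contiguous, sorted, tiny
cells_for_iteration(cellset::Set{Int}) = sort!(collect(cellset))  # slow path: copy and sort

cells_for_iteration(1:100)           # returned as-is
cells_for_iteration(Set([7, 2, 9]))  # [2, 7, 9]
```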
We can use `OrderedSet` (after JuliaCollections/OrderedCollections.jl#100) and make sure to `sort!` in `addcellset!`, for example.
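For illustration, assuming OrderedCollections.jl is available: since sorting an `OrderedSet` in place needs the linked PR, the sketch below simply builds the `OrderedSet` from an already-sorted vector, which gives the same result.

```julia
using OrderedCollections

cells = Set([42, 7, 19])                    # e.g. what a user passes to addcellset!
cellset = OrderedSet(sort!(collect(cells))) # sorted once, insertion order preserved
collect(cellset)                            # [7, 19, 42], iteration order is ascending
```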
…ribution across different data types for the cell numbering.
Following the guideline in #629, here is the first set of benchmarks.

Reproducer:

```julia
using Ferrite
using BenchmarkTools
grid = generate_grid(Quadrilateral, (1000, 1000));
∂Ω = union(
getfaceset(grid, "left"),
getfaceset(grid, "right"),
getfaceset(grid, "top"),
getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:v, ∂Ω, (x, t) -> [0, 0]);
ip_v = Lagrange{2,RefCube,2}();
ip_s = Lagrange{2,RefCube,1}();
@btime MixedDofHandler($grid);
@btime add!(dh, $(:v), $2, $ip_v) setup=(dh=MixedDofHandler($grid)) evals=1;
function setup_dh(grid, T)
dh = T(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}())
add!(dh, :s, 1, Lagrange{2,RefCube,1}())
return dh
end
@btime close!(dh) setup=(dh=setup_dh($grid, $MixedDofHandler)) evals=1;
function setup_dhclosed(grid, T)
dh = T(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}())
add!(dh, :s, 1, Lagrange{2,RefCube,1}())
close!(dh)
return dh
end
@btime ConstraintHandler(dh) setup=(dh=setup_dhclosed($grid, $MixedDofHandler)) evals=1;
function setup_ch(grid, T)
dh = setup_dhclosed(grid, T)
return ConstraintHandler(dh)
end
@btime add!(ch, $dbc) setup=(ch = setup_ch($grid, $MixedDofHandler)) evals=1;
function setup_ch2(grid, T, dbc)
dh = setup_dhclosed(grid, T)
ch = ConstraintHandler(dh)
add!(ch, dbc)
return ch
end
@btime close!(ch) setup=(ch = setup_ch2($grid, $MixedDofHandler, $dbc)) evals=1;
```
This changes `FieldHandler.cellset` to be a sorted `OrderedSet` instead of a `Set`. This ensures that loops over sub-domains are done in ascending cell order. Since e.g. cells, node coordinates, and dofs are stored in ascending cell order this gives a significant performance boost to loops over sub-domains, i.e. assembly-style loops. In particular, this removes the performance gap between `MixedDofHandler` and `DofHandler` in the `create_sparsity_pattern` benchmark in #629. This is a minimal/initial step towards #625 that can be done before the `DofHandler` merge and rework of `FieldHandler`/`SubDofHandler`.
This changes `FieldHandler.cellset` to be a `BitSet` (which is sorted) instead of a `Set`. This ensures that loops over sub-domains are done in ascending cell order. Since e.g. cells, node coordinates and dofs are stored in ascending cell order this gives a significant performance boost to loops over sub-domains, i.e. assembly-style loops. In particular, this removes the performance gap between `MixedDofHandler` and `DofHandler` in the `create_sparsity_pattern` benchmark in #629. This is a minimal/initial step towards #625 that can be done before the `DofHandler` merge and rework of `FieldHandler`/`SubDofHandler`.
This patch creates a `BitSet` of `FieldHandler.cellset` in loops, in particular in `close!(::DofHandler)` and `create_sparsity_pattern(::DofHandler)`. Since `BitSet`s are sorted, this ensures that these loops are done in ascending cell order, which gives a performance boost due to better memory locality. This is an even smaller change than #654 (and #625), which should be completely non-breaking since the type of `FieldHandler.cellset` is not changed. Larger refactoring, such as using `BitSet` or `OrderedSet`, will be done in the `FieldHandler`/`SubDofHandler` rework.
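A minimal standalone illustration of that approach (not the actual patch): wrapping an unordered `Set` in a `BitSet` before looping yields ascending cell order, without changing the stored type of the cellset itself.

```julia
cellset = Set([10, 3, 7, 1])     # stand-in for FieldHandler.cellset
for cellidx in BitSet(cellset)   # iterates 1, 3, 7, 10
    # ... dof distribution / sparsity pattern work for cellidx ...
end
```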
I will redo this on master if there is still interest.
Okay, we should really address this:

```julia
using Ferrite, SparseArrays, BenchmarkTools
grid = generate_grid(Quadrilateral, (1000, 1000));
ip = Lagrange{RefQuadrilateral, 1}()
qr = QuadratureRule{RefQuadrilateral}(2)
cellvalues = CellValues(qr, ip);
dh = DofHandler(grid)
add!(dh, :u, ip)
close!(dh);
K = create_sparsity_pattern(dh)
ch = ConstraintHandler(dh);
∂Ω = union(
getfaceset(grid, "left"),
getfaceset(grid, "right"),
getfaceset(grid, "top"),
getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:u, ∂Ω, (x, t) -> 0)
add!(ch, dbc);
close!(ch)
function assemble_element!(Ke::Matrix, fe::Vector, cellvalues::CellValues)
n_basefuncs = getnbasefunctions(cellvalues)
## Reset to 0
fill!(Ke, 0)
fill!(fe, 0)
## Loop over quadrature points
for q_point in 1:getnquadpoints(cellvalues)
## Get the quadrature weight
dΩ = getdetJdV(cellvalues, q_point)
## Loop over test shape functions
for i in 1:n_basefuncs
δu = shape_value(cellvalues, q_point, i)
∇δu = shape_gradient(cellvalues, q_point, i)
## Add contribution to fe
fe[i] += δu * dΩ
## Loop over trial shape functions
for j in 1:n_basefuncs
∇u = shape_gradient(cellvalues, q_point, j)
## Add contribution to Ke
Ke[i, j] += (∇δu ⋅ ∇u) * dΩ
end
end
end
return Ke, fe
end
function assemble_global(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
## Allocate the element stiffness matrix and element force vector
n_basefuncs = getnbasefunctions(cellvalues)
Ke = zeros(n_basefuncs, n_basefuncs)
fe = zeros(n_basefuncs)
## Allocate global force vector f
f = zeros(ndofs(dh))
## Create an assembler
assembler = start_assemble(K, f)
## Loop over all cells
for cell in CellIterator(dh)
## Reinitialize cellvalues for this cell
reinit!(cellvalues, cell)
## Compute element contribution
assemble_element!(Ke, fe, cellvalues)
## Assemble Ke and fe into K and f
assemble!(assembler, celldofs(cell), Ke, fe)
end
return nothing
end
function assemble_global2(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
## Allocate the element stiffness matrix and element force vector
n_basefuncs = getnbasefunctions(cellvalues)
Ke = zeros(n_basefuncs, n_basefuncs)
fe = zeros(n_basefuncs)
## Allocate global force vector f
f = zeros(ndofs(dh))
## Create an assembler
assembler = start_assemble(K, f)
## Loop over all cells
for cell in CellIterator(dh.subdofhandlers[1])
## Reinitialize cellvalues for this cell
reinit!(cellvalues, cell)
## Compute element contribution
assemble_element!(Ke, fe, cellvalues)
## Assemble Ke and fe into K and f
assemble!(assembler, celldofs(cell), Ke, fe)
end
return nothing
end
@btime assemble_global(cellvalues, K, dh);
@btime assemble_global2(cellvalues, K, dh);  # 528.541 ms (12 allocations: 7.65 MiB)
```
This change should eliminate the internal `collect` calls. If a user strongly depends on the ordering while only having a `cellset::Set{Int}`, then the `collect` should be done manually, once, during construction (instead of several times internally).

I also noticed that the `FieldHandler` was marked `mutable`. Is this legacy, or is there some deeper reason downstream in some framework utilizing Ferrite as a backend?
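A sketch of that user-side workaround (hypothetical usage, assuming the relaxed signatures in this PR accept a plain `Vector`): if the ordering matters and only a `Set{Int}` is at hand, collect and sort it once when setting up the sub-domain, rather than relying on internal collects.

```julia
cellset = Set([8, 2, 5])                 # e.g. obtained from a grid cellset
ordered_cells = sort!(collect(cellset))  # done once, up front
# ... then pass `ordered_cells` wherever the cellset is needed ...
```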
TODOs