
RFC: Relax FieldHandler #625

Closed · termi-official wants to merge 5 commits into master from do/relax-fieldhandler

Conversation

termi-official (Member) commented Mar 21, 2023

This change should eliminate the internal collect calls. If a user strongly depends on the ordering while only having a cellset::Set{Int}, then the collect should be done manually, once, during construction (instead of several times internally), as sketched below.
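
For illustration, a minimal sketch of collecting once during setup (addcellset!/getcellset are existing Ferrite API; the cellset name "inclusions" and the predicate are hypothetical):

using Ferrite

grid = generate_grid(Quadrilateral, (10, 10))
addcellset!(grid, "inclusions", x -> x[1] > 0)   # hypothetical subdomain
cellset = getcellset(grid, "inclusions")         # ::Set{Int}, unordered
ordered_cells = sort!(collect(cellset))          # collect (and sort) once, up front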

I also noticed that the FieldHandler was marked mutable. Is this legacy, or is there some deeper reason downstream in some framework utilizing Ferrite as a backend?

TODOs

  • OrderedSet for grid cellset
  • Test coverage
  • Docs
  • Devdocs

codecov-commenter commented Mar 21, 2023

Codecov Report

Patch coverage: 57.14% and project coverage change: +0.08% 🎉

Comparison is base (cc81e6c) 92.39% compared to head (309d385) 92.48%.


Additional details and impacted files
@@            Coverage Diff             @@
##           master     #625      +/-   ##
==========================================
+ Coverage   92.39%   92.48%   +0.08%     
==========================================
  Files          29       29              
  Lines        4446     4444       -2     
==========================================
+ Hits         4108     4110       +2     
+ Misses        338      334       -4     
Impacted Files                      Coverage Δ
src/PointEval/PointEvalHandler.jl   92.69% <ø> (ø)
src/Dofs/MixedDofHandler.jl         82.22% <36.36%> (+2.30%) ⬆️
src/Dofs/ConstraintHandler.jl       95.85% <75.00%> (-0.20%) ⬇️
src/L2_projection.jl                100.00% <100.00%> (ø)


@@ -306,7 +306,7 @@ function add_prescribed_dof!(ch::ConstraintHandler, constrained_dof::Int, inhomo
     return ch
 end

-function _add!(ch::ConstraintHandler, dbc::Dirichlet, bcfaces::Set{Index}, interpolation::Interpolation, field_dim::Int, offset::Int, bcvalue::BCValues, cellset::Set{Int}=Set{Int}(1:getncells(ch.dh.grid))) where {Index<:BoundaryIndex}
+function _add!(ch::ConstraintHandler, dbc::Dirichlet, bcfaces::Set{Index}, interpolation::Interpolation, field_dim::Int, offset::Int, bcvalue::BCValues, cellset=1:getncells(ch.dh.grid)) where {Index<:BoundaryIndex}
Member:

Here it might be nice to keep the Set, since it is used on L319 to check cellidx ∉ cellset. Still fast with a UnitRange, but if you pass a vector it will be slow(er).
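
For illustration, a rough membership-cost sketch (sizes are hypothetical):

using BenchmarkTools

haystack_range = 1:10^6
haystack_vec   = collect(haystack_range)
@btime 999_999 ∉ $haystack_range   # O(1): two bound comparisons for a UnitRange
@btime 999_999 ∉ $haystack_vec     # O(N): linear scan through the Vector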

Member:

Some rough benchmarks seem to suggest that it is beneficial to create the set if length(needles) > length(haystack), i.e. if the number of lookups we will do is larger than the size of the collection. Still pretty fast though, so perhaps just leave it as a vector if one passes that. In most cases it will already be a set, since it will come from a grid cellset, which are sets.
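
A minimal sketch of that heuristic (maybe_set is a hypothetical helper, not Ferrite API):

# Build a Set only when the expected number of lookups exceeds the
# collection size; otherwise the O(N) construction cost is not amortized.
function maybe_set(haystack::AbstractVector{Int}, nlookups::Int)
    return nlookups > length(haystack) ? Set(haystack) : haystack
end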

Member Author:

I also thought about this, but decided to always create the Set. We gain significant performance benefits if the Vector is large, while we definitely get some overhead from the Set construction. However, for small vectors (where it may be enough to not convert to a Set upfront) the overhead is rather small, so searching for a good heuristic might not really be worth it here. I would still go for the conversion to Set, as the speedup can be really significant for large problems (O(N) vs. O(1) expected per lookup).
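
A rough sketch of the tradeoff being described (sizes are hypothetical):

using BenchmarkTools

needles      = rand(1:10^6, 10^5)
haystack_vec = collect(1:10^6)
haystack_set = Set(haystack_vec)                   # one-time O(N) construction cost
@btime count(n -> n ∈ $haystack_vec, $needles);    # linear scan per lookup
@btime count(n -> n ∈ $haystack_set, $needles);    # expected O(1) per lookup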

Comment on lines 243 to 246

     # cellnumbers = issorted(fh.cellset) ? sort(collect(fh.cellset)) : fh.cellset
     nextdof = _close!(
         dh,
-        cellnumbers,
+        fh.cellset,
Member Author:

@fredrikekre I am not sure what we should do here. I can add another dispatch to _close! which just creates a sorted vector if the input cellset is of type Set{Int}, to preserve the exact same dof assignment across all cases. However, the closing algorithm does not require a vector to work, which is why I decided to allow different dof assignments depending on the specific dof handler at hand.

Member Author:

If you think that the sorted order is really that essential, we can also recommend SortedSet or OrderedSet as data structures.

Member:

Perhaps it would be good to make it more deterministic. Maybe we should make the cellsets in the grid vectors, or keep both a vector version and a set version of the data.

Member Author:

Mhh. Maybe we can add the set type as a type parameter to the grid.

Member:

But then all sets need to have the same type. I am thinking we could maintain both data structures but lazily initialize them as needed; alternatively, use sorted vectors and make sets when needed. Probably we can get away with doing it only in a few places, such as in close! etc.

Member Author:

I think it makes sense to have the slow path for Set and optimized paths for UnitRanges, which I will document as the recommended way to set up subdomains, if we agree on this. UnitRanges keep the caches hot, are small, and all operations are fast (see the sketch below). What do you think?
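
A small sketch of why UnitRanges are attractive here (the subdomain split is hypothetical): with contiguous cell numbering each subdomain stores only two integers and supports constant-time membership checks.

n = 500_000
subdomain_a = 1:n                  # two Ints, ordered, O(1) membership
subdomain_b = (n + 1):1_000_000
@assert 42 ∈ subdomain_a && 42 ∉ subdomain_b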

Member:

We can use OrderedSet (after JuliaCollections/OrderedCollections.jl#100) and make sure to sort! in addcellset! for example.
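
A minimal sketch of that approach, assuming an OrderedCollections release that includes the sort!(::OrderedSet) method added by that PR:

using OrderedCollections

cells = OrderedSet([5, 2, 9, 1])   # iterates in insertion order
sort!(cells)                       # reorder in place to ascending order
collect(cells)                     # [1, 2, 5, 9]: deterministic iteration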

Dennis Ogiermann added 2 commits March 22, 2023 13:16
…ribution across different data types for the cell numbering.
termi-official (Member Author) commented Mar 22, 2023

Following the guidelines in #629, here is the first set of benchmarks.

Benchmark                  | Master                                  | PR
---------------------------|-----------------------------------------|------------------------------------------
Construction               |                                         |
MixedDofHandler(grid)      | 1.671 ms (8 allocations: 15.26 MiB)     | 1.544 ms (8 allocations: 15.26 MiB)
add!(dh, name, dim[, ip])  | 34.745 ms (11 allocations: 18.00 MiB)   | 1.610 μs (3 allocations: 192 bytes)
close!(dh)                 | 2.318 s (297 allocations: 791.71 MiB)   | 2.200 s (294 allocations: 776.46 MiB)
Constraints                |                                         |
ConstraintHandler(dh)      | 9.930 μs (12 allocations: 992 bytes)    | 9.500 μs (12 allocations: 992 bytes)
add!(ch, dbc::Dirichlet)   | 3.885 ms (8121 allocations: 4.04 MiB)   | 40.786 ms (8132 allocations: 22.05 MiB) → 3.437 ms (8127 allocations: 4.04 MiB)
close!(ch)                 | 2.156 s (12071 allocations: 288.21 MiB) | 1.894 s (12071 allocations: 288.21 MiB)

Reproducer

using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000));
∂Ω = union(
    getfaceset(grid, "left"),
    getfaceset(grid, "right"),
    getfaceset(grid, "top"),
    getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:v, ∂Ω, (x, t) -> [0, 0]);
ip_v = Lagrange{2,RefCube,2}();
ip_s = Lagrange{2,RefCube,1}();

@btime MixedDofHandler($grid);
@btime add!(dh, $(:v), $2, $ip_v) setup=(dh=MixedDofHandler($grid)) evals=1;

function setup_dh(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}()) 
    add!(dh, :s, 1, Lagrange{2,RefCube,1}()) 
    return dh
end

@btime close!(dh) setup=(dh=setup_dh($grid, $MixedDofHandler)) evals=1;

function setup_dhclosed(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}()) 
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    close!(dh)
    return dh
end
@btime ConstraintHandler(dh)  setup=(dh=setup_dhclosed($grid, $MixedDofHandler)) evals=1;

function setup_ch(grid, T)
    dh = setup_dhclosed(grid, T)
    return ConstraintHandler(dh)
end
@btime add!(ch, $dbc) setup=(ch = setup_ch($grid, $MixedDofHandler)) evals=1;

function setup_ch2(grid, T, dbc)
    dh = setup_dhclosed(grid, T)
    ch = ConstraintHandler(dh)
    add!(ch, dbc)
    return ch
end
@btime close!(ch) setup=(ch = setup_ch2($grid, $MixedDofHandler, $dbc)) evals=1;

@fredrikekre fredrikekre added this to the 0.4.0 milestone Mar 23, 2023
fredrikekre added a commit that referenced this pull request Mar 29, 2023
This changes `FieldHandler.cellset` to be a sorted `OrderedSet` instead
of a `Set`. This ensures that loops over sub-domains are done in
ascending cell order.

Since e.g. cells, node coordinates, and dofs are stored in ascending
cell order, this gives a significant performance boost to loops over
sub-domains, i.e. assembly-style loops. In particular, this removes the
performance gap between `MixedDofHandler` and `DofHandler` in the
`create_sparsity_pattern` benchmark in #629.

This is a minimal/initial step towards #625 that can be done before the
`DofHandler` merge and rework of `FieldHandler`/`SubDofHandler`.
fredrikekre added a commit that referenced this pull request Mar 29, 2023
This changes `FieldHandler.cellset` to be a `BitSet` (which is sorted)
instead of a `Set`. This ensures that loops over sub-domains are done in
ascending cell order.

Since e.g. cells, node coordinates and dofs are stored in ascending cell
order, this gives a significant performance boost to loops over
sub-domains, i.e. assembly-style loops. In particular, this removes the
performance gap between `MixedDofHandler` and `DofHandler` in the
`create_sparsity_pattern` benchmark in #629.

This is a minimal/initial step towards #625 that can be done before the
`DofHandler` merge and rework of `FieldHandler`/`SubDofHandler`.
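
For reference, a quick sketch of the property this relies on: BitSet stores its members in a bit vector, so iteration is always in ascending order.

bs = BitSet([9, 1, 5])
@assert collect(bs) == [1, 5, 9]   # members come out sorted regardless of insertion order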
fredrikekre added a commit that referenced this pull request Mar 30, 2023
This patch creates a `BitSet` of `FieldHandler.cellset` in loops, in
particular in `close!(::DofHandler)` and
`create_sparsity_pattern(::DofHandler)`. Since `BitSet`s are sorted, this
ensures that these loops are done in ascending cell order, which gives a
performance boost due to better memory locality.

This is an even smaller change than #654 (and #625), which should be
completely non-breaking since the type of `FieldHandler.cellset` is not
changed. Larger refactoring, such as using `BitSet` or `OrderedSet`, will
be done in the `FieldHandler`/`SubDofHandler` rework.
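
A minimal sketch of the loop-time pattern described (set contents hypothetical):

cellset = Set([3, 1, 2])           # unordered, as currently stored in FieldHandler
for cellidx in BitSet(cellset)     # sorted copy built once, at loop time
    # cells are visited in ascending order: 1, 2, 3
end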
termi-official (Member Author):
I will redo this on master if there is still interest.

@termi-official termi-official deleted the do/relax-fieldhandler branch October 23, 2023 14:28
termi-official (Member Author):
Okay, we should really address this:

using Ferrite, SparseArrays, BenchmarkTools
grid = generate_grid(Quadrilateral, (1000, 1000));

ip = Lagrange{RefQuadrilateral, 1}()
qr = QuadratureRule{RefQuadrilateral}(2)
cellvalues = CellValues(qr, ip);

dh = DofHandler(grid)
add!(dh, :u, ip)
close!(dh);

K = create_sparsity_pattern(dh)

ch = ConstraintHandler(dh);

∂Ω = union(
    getfaceset(grid, "left"),
    getfaceset(grid, "right"),
    getfaceset(grid, "top"),
    getfaceset(grid, "bottom"),
);

dbc = Dirichlet(:u, ∂Ω, (x, t) -> 0)
add!(ch, dbc);

close!(ch)

function assemble_element!(Ke::Matrix, fe::Vector, cellvalues::CellValues)
    n_basefuncs = getnbasefunctions(cellvalues)
    ## Reset to 0
    fill!(Ke, 0)
    fill!(fe, 0)
    ## Loop over quadrature points
    for q_point in 1:getnquadpoints(cellvalues)
        ## Get the quadrature weight
        dΩ = getdetJdV(cellvalues, q_point)
        ## Loop over test shape functions
        for i in 1:n_basefuncs
            δu  = shape_value(cellvalues, q_point, i)
            ∇δu = shape_gradient(cellvalues, q_point, i)
            ## Add contribution to fe
            fe[i] += δu * dΩ
            ## Loop over trial shape functions
            for j in 1:n_basefuncs
                ∇u = shape_gradient(cellvalues, q_point, j)
                ## Add contribution to Ke
                Ke[i, j] += (∇δu ⋅ ∇u) * dΩ
            end
        end
    end
    return Ke, fe
end

function assemble_global(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
    ## Allocate the element stiffness matrix and element force vector
    n_basefuncs = getnbasefunctions(cellvalues)
    Ke = zeros(n_basefuncs, n_basefuncs)
    fe = zeros(n_basefuncs)
    ## Allocate global force vector f
    f = zeros(ndofs(dh))
    ## Create an assembler
    assembler = start_assemble(K, f)
    ## Loop over all cells
    for cell in CellIterator(dh)
        ## Reinitialize cellvalues for this cell
        reinit!(cellvalues, cell)
        ## Compute element contribution
        assemble_element!(Ke, fe, cellvalues)
        ## Assemble Ke and fe into K and f
        assemble!(assembler, celldofs(cell), Ke, fe)
    end
    return nothing
end


function assemble_global2(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
    ## Allocate the element stiffness matrix and element force vector
    n_basefuncs = getnbasefunctions(cellvalues)
    Ke = zeros(n_basefuncs, n_basefuncs)
    fe = zeros(n_basefuncs)
    ## Allocate global force vector f
    f = zeros(ndofs(dh))
    ## Create an assembler
    assembler = start_assemble(K, f)
    ## Loop over all cells
    for cell in CellIterator(dh.subdofhandlers[1])
        ## Reinitialize cellvalues for this cell
        reinit!(cellvalues, cell)
        ## Compute element contribution
        assemble_element!(Ke, fe, cellvalues)
        ## Assemble Ke and fe into K and f
        assemble!(assembler, celldofs(cell), Ke, fe)
    end
    return nothing
end

@btime assemble_global(cellvalues, K, dh);
@btime assemble_global2(cellvalues, K, dh);

528.541 ms (12 allocations: 7.65 MiB)
1.499 s (15 allocations: 7.65 MiB)
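
A hedged sketch of a possible workaround in the spirit of the earlier discussion, assuming CellIterator(dh, cells) accepts a vector of cell indices (as used for subdomain iteration): sort the subdomain's cellset once outside the hot loop and iterate that, restoring ascending-order traversal.

## Hypothetical workaround: collect and sort the subdomain's cellset once,
## then iterate in ascending cell order.
sorted_cells = sort!(collect(dh.subdofhandlers[1].cellset))
for cell in CellIterator(dh, sorted_cells)
    reinit!(cellvalues, cell)
    ## ... assemble as in assemble_global above
end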
