[GPU] LSTMSequence and LSTMCell optimization #26767
Conversation
commit 232d272f11fbe65e82fa9787260a8b9d34b57d20  Author: michal-miotk <[email protected]>  Date: Mon Jul 29 11:17:47 2024 +0000  wip
commit e642ca3  Author: michal-miotk <[email protected]>  Date: Sun Jul 28 22:08:24 2024 +0000  wip
commit c6b74d3  Author: michal-miotk <[email protected]>  Date: Fri Jul 26 14:10:26 2024 +0000  wip
commit 0451429  Author: michal-miotk <[email protected]>  Date: Thu Jul 25 20:35:11 2024 +0000  wip3
commit 1164592  Author: michal-miotk <[email protected]>  Date: Tue Aug 6 09:25:45 2024 +0000  wip
commit 8b2c049  Author: michal-miotk <[email protected]>  Date: Tue Aug 6 09:24:02 2024 +0000  wip
commit 886b412  Author: michal-miotk <[email protected]>  Date: Mon Aug 5 14:59:14 2024 +0000  wip
commit 08fb207  Author: michal-miotk <[email protected]>  Date: Sun Aug 4 20:21:38 2024 +0000  wip, errors on half
commit 125884d  Author: michal-miotk <[email protected]>  Date: Sat Aug 3 23:59:58 2024 +0000  wip
commit af4f209  Author: michal-miotk <[email protected]>  Date: Fri Aug 2 17:58:38 2024 +0000  wip
commit 12626fc  Author: michal-miotk <[email protected]>  Date: Fri Aug 2 10:52:15 2024 +0000  wip
commit dfdd052  Author: michal-miotk <[email protected]>  Date: Thu Aug 1 15:38:41 2024 +0000  wip
commit 54ee912  Author: michal-miotk <[email protected]>  Date: Thu Aug 1 11:01:55 2024 +0000  only bfyx layout
commit 240fe4a  Author: michal-miotk <[email protected]>  Date: Thu Aug 1 10:34:45 2024 +0000  two outputs from prim
commit bc775be  Author: michal-miotk <[email protected]>  Date: Wed Jul 31 22:13:14 2024 +0000  wip
commit d1cfd60  Author: michal-miotk <[email protected]>  Date: Wed Jul 31 22:07:06 2024 +0000  wip
commit 7d18884  Author: michal-miotk <[email protected]>  Date: Wed Jul 31 19:19:04 2024 +0000  begin of handling reverse
commit 39f64af  Author: michal-miotk <[email protected]>  Date: Wed Jul 31 15:37:06 2024 +0000  betterbetter
commit 67b3c9a  Author: michal-miotk <[email protected]>  Date: Wed Jul 31 13:12:39 2024 +0000  better
commit 6ded5aa  Author: michal-miotk <[email protected]>  Date: Wed Jul 31 10:12:31 2024 +0000  wip
commit 1ccdacc  Author: michal-miotk <[email protected]>  Date: Tue Jul 30 23:07:21 2024 +0000  wip
commit ab1307c  Author: michal-miotk <[email protected]>  Date: Tue Jul 30 22:00:50 2024 +0000  test passed
commit bc65969  Author: michal-miotk <[email protected]>  Date: Tue Jul 30 15:37:20 2024 +0000  wip
commit 03cbf57  Author: michal-miotk <[email protected]>  Date: Tue Jul 30 15:15:06 2024 +0000  only 2 outputs
commit fd5f3dc  Author: michal-miotk <[email protected]>  Date: Tue Jul 30 14:47:12 2024 +0000  wip
commit 939d23c  Author: michal-miotk <[email protected]>  Date: Tue Jul 30 11:34:56 2024 +0000  wip
commit 2bb561f  Author: michal-miotk <[email protected]>  Date: Tue Jul 30 09:28:03 2024 +0000  added to binary buffer
commit 1ef83ff  Author: michal-miotk <[email protected]>  Date: Mon Jul 29 22:30:57 2024 +0000  not works
…tion only in gpu plugin
Signed-off-by: Michal Miotk <[email protected]>
Signed-off-by: Michal Miotk <[email protected]>
Signed-off-by: Michal Miotk <[email protected]>
std::vector<cldnn::activation_func> activations;
std::vector<cldnn::activation_additional_params> activation_params;
GetLSTMActivationParams(op, activations, activation_params);
float clip = op->get_clip();

assert(!inputs[5].pid.empty());
if (p.use_new_shape_infer()) {
I suggest replacing it with OPENVINO_ASSERT to ensure that the method is called correctly.
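For reference, a minimal sketch of the suggested replacement (the error messages are illustrative only, not taken from the PR):

// plain assert() is compiled out in release builds; OPENVINO_ASSERT always checks
// and reports a message, so a misuse of this conversion path is caught early
OPENVINO_ASSERT(!inputs[5].pid.empty(), "[GPU] LSTMCell conversion expects input 5 to be provided");
OPENVINO_ASSERT(p.use_new_shape_infer(), "[GPU] LSTMCell conversion expects new shape infer to be enabled");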
done
op_mode, 1, axis, num_splits));
p.add_primitive(*op, cldnn::reshape(outputCellID, cldnn::input_info(outputCellCropID),
                     false, outSzPt, op->get_output_partial_shape(1)));
p.add_primitive(*op, cldnn::lstm_cell(layerName+".out0", inputs[0], inputs[1], inputs[2], inputs[3], inputs[4], inputs[5], \
Is it? I think item 2 is still relevant: you still pass this layerName + "_md_write.1" argument, and the corresponding parameters are still present in the primitive API.
@@ -278,6 +278,9 @@ ov::SupportedOpsMap Plugin::query_model(const std::shared_ptr<const ov::Model>&

ExecutionConfig config = m_configs_map.at(device_id);
config.set_user_property(orig_config);
if (ctx->get_engine().get_device_info().supports_immad) {
These two changes are not needed either.
done
    p.add_primitive(*op, cldnn::crop(cellStr, cldnn::input_info(lstm_elt_id), hiddenSz, cellCropSz));
}
const float clip = op->get_clip();
if (op->get_input_shape(2).size() != 3 || op->get_input_shape(3).size() != 1 \
nit: there are also redundant backslashes here and in other places. Please remove them.
done
p.add_primitive(*op, cldnn::reshape(layerName + ".out0", concatStr, tensor_from_dims(op->get_output_shape(0))), {layerName});
p.add_primitive(*op, cldnn::reshape(layerName + ".out1", hiddenStr, tensor_from_dims(op->get_output_shape(1))));
p.add_primitive(*op, cldnn::reshape(layerName + ".out2", cellStr, tensor_from_dims(op->get_output_shape(2))));
if (p.use_new_shape_infer()) {
Use OPENVINO_ASSERT here as well.
done
public:
    using parent::parent;

    program_node& input() const { return get_dependency(0); }
Likely the same unused methods as in the lstm_seq primitive.
done
std::vector<format::type> in_fmts(node.get_dependencies().size(), format::any);
std::vector<format::type> out_fmts(node.get_outputs_count(), format::any);

size_t out_rank = node.get_output_layout().get_rank();
for (size_t idx = 0 ; idx < node.get_dependencies().size() ; idx++) {
    if (node.get_dependency(idx).is_constant())
        continue;

    auto target_format = format::get_default_format(out_rank);

    in_fmts[idx] = target_format;
}
out_fmts[0] = format::ybfx;

return {in_fmts, out_fmts};
I think this code should actually query oneDNN for the required tensor formats (as is done for convolutions). You can do it in the next PR.
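A rough sketch of how that could look, modeled on the convolution flow; the descriptor lookup and the format-conversion helpers below are assumptions, not existing plugin API:

// Hypothetical: ask oneDNN which layouts its chosen LSTM primitive expects,
// instead of hard-coding the default format / ybfx
auto prim_desc = get_lstm_primitive_descriptor(node);                                      // assumed helper
for (size_t idx = 0; idx < node.get_dependencies().size(); idx++) {
    if (node.get_dependency(idx).is_constant())
        continue;
    in_fmts[idx] = convert_format_from_onednn(prim_desc.src_desc(static_cast<int>(idx)));  // assumed helper
}
out_fmts[0] = convert_format_from_onednn(prim_desc.dst_desc(0));                           // assumed helper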
ok
return node.get_input_layout(0).format == cldnn::format::bfyx || node.get_input_layout(0).format == cldnn::format::fbyx \
       || node.get_input_layout(0).format == cldnn::format::ybfx;
I think the tensor format is not the only restriction. At least we need:
1. type checks
2. an info.arch == gpu_arch::unknown check (see other impls)
3. padding checks
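As a very rough sketch of what such an extended check could look like (the helper for the padding test and the exact call chain are assumptions):

// Hypothetical support check combining format, data type, architecture and padding
const auto& info = node.get_program().get_engine().get_device_info();
if (info.arch == gpu_arch::unknown)                 // 2. unknown architecture -> no oneDNN impl
    return false;
auto in_layout = node.get_input_layout(0);
if (in_layout.data_type != data_types::f16 && in_layout.data_type != data_types::f32)  // 1. type check
    return false;
if (has_non_zero_padding(in_layout))                // 3. padding check (assumed helper)
    return false;
return in_layout.format == cldnn::format::bfyx || in_layout.format == cldnn::format::fbyx
       || in_layout.format == cldnn::format::ybfx;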
1. and 2. are done, 3. is not done.
3. done
    int i = 0;
    auto& input = instance.input_memory(i);
    auto offset = onednn::get_offset(instance.get_input_layout(i),
                                     _pd.dnnl::primitive_desc_base::src_desc(i));
    auto mem = input.get_onednn_memory(_pd.dnnl::primitive_desc_base::src_desc(i), offset);
    args.insert({DNNL_ARG_SRC_LAYER, mem});
}

{
    int i = 1;
    auto& input = instance.input_memory(i);
    auto offset = onednn::get_offset(instance.get_input_layout(i),
                                     _pd.dnnl::primitive_desc_base::src_desc(i));
    auto mem = input.get_onednn_memory(_pd.dnnl::primitive_desc_base::src_desc(i), offset);
    args.insert({DNNL_ARG_SRC_ITER, mem});
}

{
    int i = 2;
    auto& input = instance.input_memory(i);
    auto offset = onednn::get_offset(instance.get_input_layout(i),
                                     _pd.dnnl::primitive_desc_base::src_desc(i));
    auto mem = input.get_onednn_memory(_pd.dnnl::primitive_desc_base::src_desc(i), offset);
    args.insert({DNNL_ARG_SRC_ITER_C, mem});
}
I think this code can be done in a loop if you store these DNNL_ARG_SRC_LAYER, DNNL_ARG_SRC_ITER, etc. argument ids in a vector. Same for the weights and dst buffers.
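Something along these lines (a direct rewrite of the three blocks above; only the argument vector is new):

// map each source input index to its oneDNN runtime argument id and set them in one loop
const std::vector<int> src_args = {DNNL_ARG_SRC_LAYER, DNNL_ARG_SRC_ITER, DNNL_ARG_SRC_ITER_C};
for (int i = 0; i < static_cast<int>(src_args.size()); i++) {
    auto& input = instance.input_memory(i);
    auto offset = onednn::get_offset(instance.get_input_layout(i),
                                     _pd.dnnl::primitive_desc_base::src_desc(i));
    auto mem = input.get_onednn_memory(_pd.dnnl::primitive_desc_base::src_desc(i), offset);
    args.insert({src_args[i], mem});
}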
done
auto hiddenSize = reorder_params->get_output_layout().get_shape()[1] / 4;
auto cropSize = cldnn::tensor{dir_num, static_cast<int>(hiddenSize), 1, 1};
std::string crop_id_b = input_id + "_c";
auto get_crop_node = [&](int cropNum) -> cldnn::program_node& {
    auto crop_id = primitive_id(crop_id_b + std::to_string(cropNum));
    auto crop_prim = std::make_shared<cldnn::crop>(crop_id, input_id, cropSize, cldnn::tensor{0, static_cast<int>(cropNum*hiddenSize), 0, 0});
    return p.get_or_create(crop_prim);
};
auto& crop0_node = get_crop_node(0);
auto& crop1_node = get_crop_node(1);
auto& crop2_node = get_crop_node(2);
auto& crop3_node = get_crop_node(3);
std::vector<input_info> con_input{input_info(crop1_node.id()), input_info(crop0_node.id()), input_info(crop2_node.id()), input_info(crop3_node.id())};
Can it be done with some kind of Slice/StridedSlice primitive?
It can be; actually I've deleted one crop, but I don't think it would be easy to end up with fewer nodes using the StridedSlice primitive.
…output of node Signed-off-by: Michal Miotk <[email protected]>
Signed-off-by: Michal Miotk <[email protected]>
Looks good to me, left a couple of minor suggestions
@@ -120,6 +120,9 @@ concatenation_inst::typed_primitive_inst(network& network, concatenation_node co
if (dim == node.get_primitive()->axis) {
    concat_count += input_mem_size[dim];
} else {
    if (i.first->get_outputs_count() > 1 && i.first->get_user_index(node) > 0) {
Did you try to use the port number i.second to obtain the proper output layout here?
done
    }
}
p.get_processing_order().calc_processing_order(p);
It makes sense to call this recalculation only for the LSTM case.
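For example, roughly (assuming the pass records whether it actually rewrote any LSTM node; the flag name is illustrative):

// only pay for a full processing-order recalculation if this pass changed the graph
bool lstm_rewritten = false;
// ... set lstm_rewritten = true wherever an LSTMSequence/LSTMCell node is replaced ...
if (lstm_rewritten) {
    p.get_processing_order().calc_processing_order(p);
}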
done
dispatchData.gws[2] = input.Batch().v;
dispatchData.gws[1] = input.Feature().v;
dispatchData.gws[0] = input.Y().v*input.X().v;
dispatchData.lws = {1, 1, 1};
This work-group configuration might not provide optimal performance; we may consider optimizing it in the future.
ok
Signed-off-by: Michal Miotk <[email protected]>
Signed-off-by: Michal Miotk <[email protected]>
…: multiouput prim Signed-off-by: Michal Miotk <[email protected]>
Overall, LGTM. Please check the performance carefully
seed = hash_combine(seed, initial_hidden_state.pid);
seed = hash_combine(seed, initial_cell_state.pid);
seed = hash_combine(seed, seq_lenghts.pid);
seed = hash_combine(seed, W.pid);
seed = hash_combine(seed, R.pid);
seed = hash_combine(seed, B.pid);
Comparison and hashing of the primitive ids prevents primitive reuse when we have multiple instances of the same op. So you should hash/compare only a presence flag for each input. As an example you can look at the convolution op.
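A minimal sketch of the suggested hashing, based on the pattern described for convolution (only the presence of each input contributes to the hash):

// hash whether each input is connected rather than its concrete primitive id,
// so identical LSTM instances can share one compiled implementation
seed = hash_combine(seed, !initial_hidden_state.pid.empty());
seed = hash_combine(seed, !initial_cell_state.pid.empty());
seed = hash_combine(seed, !seq_lenghts.pid.empty());
seed = hash_combine(seed, !W.pid.empty());
seed = hash_combine(seed, !R.pid.empty());
seed = hash_combine(seed, !B.pid.empty());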
done
auto input_i_layout = i.first->get_output_layout();
auto input_mem_size = input_i_layout.get_dims();
if (i.first->get_outputs_count() > 1 && i.second > 0) {
    input_i_layout = i.first->get_output_layout(false, i.second);
    input_mem_size = input_i_layout.get_dims();
}
I think that can be changed to
auto input_i_layout = i.first->get_output_layout(false, i.second);
auto input_mem_size = input_i_layout.get_dims();
can't it?
done
bool cell_state_check = one_of(in2_dt, {data_types::f16, data_types::bf16, data_types::f32}) &&
                        one_of(out2_dt, {data_types::f16, data_types::bf16, data_types::f32});
bool f16_case = everyone_is(data_types::f16, in0_dt, in1_dt, in3_dt, in4_dt, out0_dt, out1_dt);
bool bf16_case = everyone_is(data_types::bf16, in0_dt, in1_dt, in3_dt, in4_dt, out0_dt, out1_dt);
bf16 is not supported by the GPU plugin for now. I think it can be removed from here as well.
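After dropping bf16, the checks above would reduce to something like:

// bf16 removed: only f16/f32 are relevant for the GPU plugin right now
bool cell_state_check = one_of(in2_dt, {data_types::f16, data_types::f32}) &&
                        one_of(out2_dt, {data_types::f16, data_types::f32});
bool f16_case = everyone_is(data_types::f16, in0_dt, in1_dt, in3_dt, in4_dt, out0_dt, out1_dt);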
done
if (node.get_preferred_impl_type() == impl_types::onednn && node.get_preferred_output_fmt() != format::any) {
    first_out_fmt = node.get_preferred_output_fmt();
}
Why do you consider only the first output port?
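A sketch of handling all three output ports (assuming get_preferred_output_fmt() can take a port index; second_out_fmt and third_out_fmt are the variables used in the return statement below):

// take the onednn-preferred format for every output port, not only port 0
if (node.get_preferred_impl_type() == impl_types::onednn) {
    if (node.get_preferred_output_fmt(0) != format::any) first_out_fmt  = node.get_preferred_output_fmt(0);
    if (node.get_preferred_output_fmt(1) != format::any) second_out_fmt = node.get_preferred_output_fmt(1);
    if (node.get_preferred_output_fmt(2) != format::any) third_out_fmt  = node.get_preferred_output_fmt(2);
}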
done
return {cldnn::layout{ShapeType{lstm_batch_size, num_directions, lstm_seq_length, lstm_hidden_size}, input_layout.data_type, first_out_fmt}, \
        cldnn::layout{ShapeType{lstm_batch_size, num_directions, lstm_hidden_size}, input_layout.data_type, second_out_fmt}, \
        cldnn::layout{ShapeType{lstm_batch_size, num_directions, lstm_hidden_size}, input_layout.data_type, third_out_fmt}};
Suggested change (drop the redundant trailing backslashes):
return {cldnn::layout{ShapeType{lstm_batch_size, num_directions, lstm_seq_length, lstm_hidden_size}, input_layout.data_type, first_out_fmt},
        cldnn::layout{ShapeType{lstm_batch_size, num_directions, lstm_hidden_size}, input_layout.data_type, second_out_fmt},
        cldnn::layout{ShapeType{lstm_batch_size, num_directions, lstm_hidden_size}, input_layout.data_type, third_out_fmt}};
done
const std::vector<activation_additional_params>& activation_params = {},
const lstm_weights_order& offset_order = lstm_weights_order::iofz,
const ov::op::RecurrentSequenceDirection direction = ov::op::RecurrentSequenceDirection::FORWARD,
const padding& output_padding = padding(),
I think the padding arg is not needed, as it's always left at its default.
done
}
OPENVINO_ASSERT(!inputs[5].pid.empty());
OPENVINO_ASSERT(p.use_new_shape_infer());
p.add_primitive(*op, cldnn::lstm_cell(layerName+".out0", inputs[0], inputs[1], inputs[2], inputs[3], inputs[4], inputs[5], cldnn::input_info(),
I think this ".out0" suffix is not needed for the new shape infer.
done
Signed-off-by: Michal Miotk <[email protected]>
Signed-off-by: Michal Miotk <[email protected]>
cmp_fields(W) &&
cmp_fields(R) &&
cmp_fields(B) &&
Here you also shouldn't compare string values, but rather check the presence of the inputs.
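Roughly, the comparison could check presence only (rhs_casted follows the usual pattern of the other operator== implementations and is an assumption here):

// inputs are structurally equal if the same optional inputs are connected on both sides;
// their concrete primitive ids do not matter for implementation reuse
return W.pid.empty() == rhs_casted.W.pid.empty() &&
       R.pid.empty() == rhs_casted.R.pid.empty() &&
       B.pid.empty() == rhs_casted.B.pid.empty();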
done
Signed-off-by: Michal Miotk <[email protected]>
a88bf5a
Details:
Tickets: