Multimodal pipeline improvement #581

andreygetmanov · 2022-03-01T19:15:47Z

Fixes to the composer, minor changes, and optimization of the multimodal tools

pep8speaks · 2022-03-01T19:16:02Z

Hello @andreygetmanov! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-04-07 14:19:52 UTC

cases/multi_modal_genre_prediction.py

Dreamlone · 2022-03-03T10:52:45Z

examples/advanced/multi_modal_pipeline.py

+                                                                          train_text, test_text)

    pipeline.fit(input_data=fit_data)


The line pipeline.fit(input_data=fit_data) raises an error.

The dataset has many classes and is fairly imbalanced, so when it was split into train/test, not all classes ended up in both subsets. I changed the dataset and reduced the number of classes; it should work now.

so when it was split into train/test, not all classes ended up in both subsets

It would be good to fix this, of course. But that can be a separate PR.

It would be good to fix this, of course.

This example uses a small dataset. I figured that changing the dataset and reducing the number of classes is a fine approach for a demonstration example. But we could probably do something else as well.

I think we can indeed keep the simplified version for now, and add proper support for heavily imbalanced datasets in follow-up PRs.
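
For reference, a minimal sketch of one way to make sure every class ends up in both subsets, as a possible starting point for the follow-up work on imbalanced datasets. This is not FEDOT code: `stratified_split`, its parameters, and the toy genre labels are hypothetical names chosen for illustration, and the rule of dropping classes with fewer than two samples is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, min_per_class=2, seed=None):
    """Return (train_idx, test_idx) so every kept class appears in both parts.

    Classes with fewer than `min_per_class` samples are dropped, because a
    single sample cannot be placed in both the train and the test subset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        if len(indices) < max(min_per_class, 2):
            continue  # too rare to appear in both subsets
        rng.shuffle(indices)
        # At least one sample goes to test, and at least one stays in train.
        n_test = min(max(1, round(len(indices) * test_fraction)), len(indices) - 1)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical, deliberately imbalanced label list.
genres = ["drama"] * 10 + ["comedy"] * 5 + ["horror"] * 2 + ["western"]
train_idx, test_idx = stratified_split(genres, test_fraction=0.2, seed=0)
train_classes = {genres[i] for i in train_idx}
test_classes = {genres[i] for i in test_idx}
```

With these toy labels the single "western" sample is dropped and the remaining three genres are present in both subsets. Whether rare classes should be dropped, merged, or oversampled is a design decision this sketch does not settle.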

codecov · 2022-04-01T13:44:48Z

Codecov Report

Merging #581 (9716b28) into master (3fa9b2f) will decrease coverage by 0.04%.
The diff coverage is 89.79%.

@@            Coverage Diff             @@
##           master     #581      +/-   ##
==========================================
- Coverage   86.51%   86.47%   -0.05%     
==========================================
  Files         153      153              
  Lines       11208    11230      +22     
==========================================
+ Hits         9697     9711      +14     
- Misses       1511     1519       +8

Impacted Files	Coverage Δ
fedot/core/operations/evaluation/text.py	`75.25% <50.00%> (-1.34%)`	⬇️
...lementations/data_operations/text_preprocessing.py	`89.85% <86.66%> (-2.21%)`	⬇️
fedot/core/data/data.py	`87.81% <88.88%> (+4.34%)`	⬆️
fedot/core/data/load_data.py	`91.91% <100.00%> (+0.16%)`	⬆️
...ore/operations/evaluation/evaluation_interfaces.py	`89.38% <100.00%> (+1.53%)`	⬆️
...aluation/operation_implementations/models/keras.py	`86.31% <100.00%> (+0.44%)`	⬆️
fedot/core/validation/compose/time_series.py	`86.36% <0.00%> (-13.64%)`	⬇️
...on_implementations/models/discriminant_analysis.py	`91.78% <0.00%> (-4.11%)`	⬇️
fedot/core/composer/gp_composer/gp_composer.py	`85.83% <0.00%> (-2.50%)`	⬇️
fedot/core/optimisers/gp_comp/evaluating.py	`68.25% <0.00%> (-1.59%)`	⬇️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3fa9b2f...9716b28.

Dreamlone

It would also be worth writing a test for the multimodal functionality that already exists. There don't seem to be any such tests yet; at least I couldn't find one for the "images + text + tables" case.

Dreamlone

Overall everything looks fine; the only thing left is to sort out random_state = 42. Maybe @nicl-nno can advise whether random_state should be set via randint here, so that the split does differ from run to run. Or was it designed this way for a specific reason?

Dreamlone · 2022-04-06T09:59:35Z

And a unit test is still needed, let's not forget.

andreygetmanov · 2022-04-06T15:14:01Z

It would also be worth writing a test for the multimodal functionality that already exists. There don't seem to be any such tests yet; at least I couldn't find one for the "images + text + tables" case.

https://github.com/nccr-itmo/FEDOT/blob/master/test/unit/pipelines/test_multi_modal.py

This test generates and runs a pipeline for the "images + text + tables" case. Does its functionality need to be extended somehow? I don't quite understand what could be added to it.

nicl-nno · 2022-04-06T18:23:43Z

so that the split does differ from run to run

In an example that doesn't seem necessary.

Dreamlone · 2022-04-07T12:26:04Z

In an example that doesn't seem necessary.

For an example it is definitely not necessary. But my question is whether such hardcoding should stay in internal functions. That is, if the random seed in _split_any never changes, the algorithm will always produce the same split on every run.
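
To illustrate the point about the hardcoded seed, here is a small sketch of a split helper where the seed is a parameter that defaults to `None` rather than a fixed constant. The function `split_indices` and its signature are hypothetical; this does not reproduce the actual `_split_any` implementation, which is not shown in this thread.

```python
import random


def split_indices(n_samples, test_fraction=0.2, random_state=None):
    """Shuffle range(n_samples) and split it into train and test index lists.

    random_state=None gives a different split on each run; passing an int
    makes the split reproducible.
    """
    rng = random.Random(random_state)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# Reproducible on request: the same seed always yields the same split.
train_a, test_a = split_indices(100, random_state=42)
train_b, test_b = split_indices(100, random_state=42)
```

With this shape, examples and tests can still pass `random_state=42` explicitly, while internal callers that omit it get a fresh split on each run.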

Dreamlone

Please rebase, and I think it can be merged.

nicl-nno · 2022-04-07T12:30:15Z

In an example that doesn't seem necessary.

For an example it is definitely not necessary. But my question is whether such hardcoding should stay in internal functions. That is, if the random seed in _split_any never changes, the algorithm will always produce the same split on every run.

OK, this can indeed be removed. Although it was not introduced in this PR.

andreygetmanov · 2022-04-07T14:03:45Z

OK, this can indeed be removed. Although it was not introduced in this PR.

Should I create an issue for this?

- fixed the optimizer error in the multimodal pipeline
- fixed bug #564 'Example multi_modal_pipeline_genres failed'
- deleted the rating prediction example
- optimized the import of NLP libraries
- changed the data for the multimodal example
- upgraded the stemmer from Porter to Snowball
- fixed a bug in merging multimodal data
- fixed a bug where multimodal data was shuffled while loading
- CNN now works on multioutput tasks
- fixed a bug with the incorrect type and shape of multioutput predictions

- removed a warning raised while scaling image data
- minor changes for readability
- test_multi_modal.py updated to match the new structure of multi_modal_pipeline.py

- stopwords and other nltk packages are no longer re-downloaded if they are already present
- keras.Input changed to the recommended keras.layers.InputLayer
- test_multi_modal.py moved to the multimodal folder
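
The "download only if missing" behaviour for nltk packages mentioned in the commit notes could look roughly like the sketch below. It is an illustration, not the code from this PR: `ensure_resources` and `ensure_nltk_resources` are hypothetical helper names, and the resource list (stopwords, punkt) is an assumption. The only NLTK calls used are `nltk.data.find`, which raises `LookupError` for a missing resource, and `nltk.download`.

```python
def ensure_resources(resources, find, download):
    """Download only the resources that `find` cannot locate.

    `resources` maps a package name to its lookup path; `find` must raise
    LookupError for a missing resource. Returns the names that were downloaded.
    """
    downloaded = []
    for name, path in resources.items():
        try:
            find(path)
        except LookupError:
            download(name)
            downloaded.append(name)
    return downloaded


def ensure_nltk_resources():
    """Apply the check to NLTK; the import is deferred until first use."""
    import nltk  # lazy import: the cost is paid only when text preprocessing runs

    # Assumed resource list, for illustration only.
    resources = {
        "stopwords": "corpora/stopwords",
        "punkt": "tokenizers/punkt",
    }
    return ensure_resources(resources, nltk.data.find, nltk.download)
```

Passing `find` and `download` as arguments keeps the check testable without network access, and importing nltk inside the function avoids loading it for pipelines that contain no text operations.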

nicl-nno · 2022-04-07T14:12:21Z

You can just remove it.

Dreamlone · 2022-04-07T14:23:48Z

You can just remove it.

OK, let's not do it in this PR; I suggest removing it separately.

andreygetmanov requested a review from nicl-nno March 1, 2022 19:15

andreygetmanov force-pushed the multimodal branch 3 times, most recently from 3000903 to b5e8871 Compare March 1, 2022 19:23

nicl-nno reviewed Mar 2, 2022

cases/multi_modal_genre_prediction.py (outdated, resolved)

Dreamlone requested changes Mar 3, 2022

andreygetmanov force-pushed the multimodal branch from b5e8871 to a312679 Compare March 4, 2022 14:48

Dreamlone requested changes Mar 5, 2022

cases/multi_modal_genre_prediction.py (outdated, resolved)

examples/advanced/multi_modal_pipeline.py (resolved)

andreygetmanov force-pushed the multimodal branch 3 times, most recently from 093fc5c to 3d89d56 Compare March 15, 2022 15:04

Dreamlone reviewed Mar 17, 2022

examples/advanced/multi_modal_pipeline.py (outdated, resolved)

examples/advanced/multi_modal_pipeline.py (outdated, resolved)

andreygetmanov force-pushed the multimodal branch from 3d89d56 to 486959d Compare March 18, 2022 15:00

nicl-nno mentioned this pull request Mar 22, 2022

DataMerger refactor #610

Closed

andreygetmanov force-pushed the multimodal branch from 486959d to fb6f8c5 Compare April 1, 2022 13:23

nicl-nno reviewed Apr 1, 2022

fedot/core/operations/evaluation/evaluation_interfaces.py (outdated, resolved)

andreygetmanov force-pushed the multimodal branch 2 times, most recently from 0c27d48 to 32353a4 Compare April 1, 2022 15:54

nicl-nno requested a review from Dreamlone April 1, 2022 16:30

nicl-nno approved these changes Apr 1, 2022

Dreamlone linked an issue Apr 2, 2022 that may be closed by this pull request

Example multi_modal_pipeline_genres failed #564

Closed

Dreamlone reviewed Apr 2, 2022

examples/advanced/multi_modal_pipeline.py (outdated, resolved)

cases/multi_modal_genre_prediction.py (outdated, resolved)

Dreamlone requested changes Apr 2, 2022

andreygetmanov force-pushed the multimodal branch from 8c1b416 to 6cd90a3 Compare April 5, 2022 16:00

nicl-nno approved these changes Apr 5, 2022

Dreamlone reviewed Apr 6, 2022

andreygetmanov force-pushed the multimodal branch from 04da0ad to 3337a68 Compare April 6, 2022 17:35

Dreamlone requested changes Apr 7, 2022

andreygetmanov added 3 commits April 7, 2022 17:11

- multimodal data is now prepared as a single MultiModalData object

c636d1e

- added pipeline tuning to multi_modal_genre_prediction.py

9716b28

andreygetmanov force-pushed the multimodal branch from 3337a68 to 9716b28 Compare April 7, 2022 14:19

Dreamlone approved these changes Apr 7, 2022

andreygetmanov merged commit 92aede4 into master Apr 7, 2022

andreygetmanov deleted the multimodal branch April 13, 2022 13:27
