MultiModalData class improvement #789

Merged
merged 20 commits into from
Sep 5, 2022

andreygetmanov
Collaborator

CSV files with text and tabular columns can now be read and split into separate data sources in a single step.

  • MultiModalData.from_csv method added
  • text fields are detected automatically if they are not specified by the user
  • tests added
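The idea behind `from_csv` can be illustrated with a small, stdlib-only sketch: read the CSV once, decide per column whether it holds free text (unless the user already listed the text columns), and route text and tabular columns into separate sources. This is not FEDOT's implementation. The helpers `looks_like_text` and `split_csv_sources`, the `UNIQUE_RATIO` threshold, and the rule that numeric-looking columns are never text are hypothetical. Only the value `ALLOWED_NAN_PERCENT = 0.9` comes from this PR's diff, and using it as a cut-off for mostly-empty text columns is an assumption.

```python
import csv
import io
from typing import Dict, List, Optional

# Value copied from the constant in this PR's diff; using it as a cut-off for
# mostly-empty text columns is an assumption made for this sketch.
ALLOWED_NAN_PERCENT = 0.9
# Hypothetical threshold: free-text columns are assumed to be mostly unique.
UNIQUE_RATIO = 0.95


def _is_missing(value: str) -> bool:
    return value.strip().lower() in ('', 'nan')


def _is_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def looks_like_text(values: List[str]) -> bool:
    """Hypothetical heuristic: non-numeric values that are mostly unique."""
    present = [v for v in values if not _is_missing(v)]
    if not present or all(_is_number(v) for v in present):
        return False
    return len(set(present)) / len(present) >= UNIQUE_RATIO


def split_csv_sources(csv_text: str, target: str,
                      text_columns: Optional[List[str]] = None
                      ) -> Dict[str, Dict[str, List[str]]]:
    """Split CSV feature columns (target excluded) into 'table' and 'text' sources."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = {name: [row[name] for row in rows]
               for name in reader.fieldnames if name != target}
    if text_columns is None:
        # Detect text columns automatically unless the user listed them.
        text_columns = [name for name, vals in columns.items() if looks_like_text(vals)]
    sources: Dict[str, Dict[str, List[str]]] = {'table': {}, 'text': {}}
    for name, vals in columns.items():
        if name not in text_columns:
            sources['table'][name] = vals
            continue
        nan_share = sum(_is_missing(v) for v in vals) / len(vals) if vals else 1.0
        if nan_share <= ALLOWED_NAN_PERCENT:
            sources['text'][name] = vals
        # otherwise the text column is almost entirely empty and is dropped
    return sources
```

Passing `text_columns=[]` skips detection and keeps every feature column tabular, which mirrors the "unless specified by the user" rule above.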

andreygetmanov requested a review from nicl-nno July 21, 2022 14:47

pep8speaks commented Jul 21, 2022

Hello @andreygetmanov! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 34:5: F841 local variable 'prediction' is assigned to but never used

Line 261:17: F402 import 'field' from line 4 shadowed by loop variable

Line 7:1: F403 'from fedot.core.data.array_utilities import *' used; unable to detect undefined names
Line 29:21: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 30:31: F405 'find_common_elements' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 32:30: F541 f-string is missing placeholders
Line 59:72: F405 'Optional' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 80:32: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 88:44: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 107:50: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 107:69: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 109:25: F405 'atleast_2d' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 111:45: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 111:59: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 113:16: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 115:53: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 115:66: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 119:46: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 119:75: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 122:22: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 124:18: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 139:36: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 146:50: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 146:69: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 148:38: F405 'atleast_4d' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 161:53: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 161:66: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 163:16: F405 'flatten_extra_dim' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 168:45: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 168:59: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 173:53: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities
Line 173:66: F405 'np' may be undefined, or defined from star imports: fedot.core.data.array_utilities

Line 29:1: W391 blank line at end of file

Comment last updated at 2022-09-05 14:00:25 UTC


codecov bot commented Jul 21, 2022

Codecov Report

Merging #789 (15a363b) into master (9ae9151) will increase coverage by 0.04%.
The diff coverage is 97.57%.

❗ Current head 15a363b differs from the pull request's most recent head 70e5af6. Consider uploading reports for commit 70e5af6 to get more accurate results.

@@            Coverage Diff             @@
##           master     #789      +/-   ##
==========================================
+ Coverage   87.73%   87.78%   +0.04%     
==========================================
  Files         193      194       +1     
  Lines       13230    13343     +113     
==========================================
+ Hits        11608    11713     +105     
- Misses       1622     1630       +8     
Impacted Files Coverage Δ
fedot/core/data/data_detection.py 97.01% <97.01%> (ø)
fedot/core/data/multi_modal.py 88.88% <97.29%> (+2.37%) ⬆️
fedot/preprocessing/preprocessing.py 98.76% <98.07%> (-0.55%) ⬇️
fedot/core/data/data.py 86.82% <100.00%> (+0.54%) ⬆️
fedot/core/data/data_preprocessing.py 93.24% <100.00%> (+0.09%) ⬆️
fedot/core/data/merge/data_merger.py 98.93% <100.00%> (ø)
...lementations/data_operations/ts_transformations.py 86.40% <0.00%> (-1.70%) ⬇️
...entations/models/ts_implementations/statsmodels.py 94.38% <0.00%> (-0.52%) ⬇️
...edot/core/optimisers/gp_comp/operators/mutation.py 93.27% <0.00%> (-0.45%) ⬇️
... and 4 more


aPovidlo self-requested a review July 21, 2022 14:59
Dreamlone

This comment was marked as resolved.

ALLOWED_NAN_PERCENT = 0.9


class DataDetector:
Collaborator

I don't quite understand the purpose of this class.

Collaborator Author

> I don't quite understand the purpose of this class.

If you mean DataDetector, it is an abstraction over the two classes that follow it. They have methods with similar mechanics, so I decided to make that explicit by creating an abstract class.

Collaborator

I do see the methods with similar mechanics. So why not implement them once in DataDetector and use them from there, rather than separately for each class? Or does it work some other way?

Collaborator Author

> I do see the methods with similar mechanics. So why not implement them once in DataDetector and use them from there, rather than separately for each class? Or does it work some other way?

It is an abstraction template for these classes: in this class the methods are simply unified. I plan to extend it in the future as needed.
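The thread above contrasts two designs: an abstract base class that only declares the shared methods (the author's template) and one that also implements the common mechanics once (the reviewer's suggestion). A minimal Python sketch of the second option follows. It is not the code in data_detection.py; apart from the name `DataDetector`, the classes `TextColumnDetector` and `NumericColumnDetector`, the methods `find_columns` and `column_matches`, and the detection rules are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


def _is_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


class DataDetector(ABC):
    """Base class: the shared column scan is implemented once for all detectors."""

    def find_columns(self, columns: Dict[str, List[str]]) -> List[str]:
        # Common mechanics: walk every column and keep those the subclass accepts.
        return [name for name, values in columns.items() if self.column_matches(values)]

    @abstractmethod
    def column_matches(self, values: List[str]) -> bool:
        """Subclass-specific rule deciding whether a column holds 'its' data type."""


class TextColumnDetector(DataDetector):  # hypothetical subclass
    def column_matches(self, values: List[str]) -> bool:
        # Any non-empty, non-numeric value marks the column as text.
        return any(not _is_number(v) for v in values if v)


class NumericColumnDetector(DataDetector):  # hypothetical subclass
    def column_matches(self, values: List[str]) -> bool:
        # All non-empty values must parse as numbers.
        present = [v for v in values if v]
        return bool(present) and all(_is_number(v) for v in present)
```

With this split, supporting a new data type only means writing `column_matches`; the column scan stays in one place, and instantiating `DataDetector` directly still raises `TypeError` because of the abstract method.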

andreygetmanov force-pushed the multimodal_csv branch 2 times, most recently from 87e78dd to 38c8796 August 30, 2022 15:53
Collaborator

MorrisNein left a comment

I support the idea of separating the code for different data types.
Perhaps separate modules would have been enough instead of separate classes, but at this stage it doesn't really matter; there is no need to try to make everything perfect right away. As the code grows more complex, it will become clearer which approach is more convenient.

UPD: I wrote here about a bug in a MultiModalData property, but I was the one who was mistaken. Everything is fine there.

CSV files with text and tabular data can now be read in a single step

- from_csv method added
- text fields are detected automatically
- tests added
- added docstring for _column_contains_text
- multimodal_wine dataset moved to a more appropriate place
- protected funcs of multi_modal.py are now protected
- refactoring of text preprocessing
- refactoring of multimodal data test
- text columns that contain too many NaNs are now dropped
…ing DataDetector parent in data_detection.py
- added description for multimodal_text_num_example.py
- DataDetection classes now look better
- refactoring of prepare_multimodal_ts logic
- preprocessing now excludes some data types using decorators
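The last commit message mentions excluding some data types from preprocessing with decorators. One way such a decorator could look is sketched below. `DataType`, `Data`, `exclude_data_types` and `fill_gaps` are hypothetical stand-ins rather than FEDOT's classes, and the sketch assumes the data object is the first positional argument of the wrapped step.

```python
from dataclasses import dataclass
from enum import Enum
from functools import wraps
from typing import Callable


class DataType(Enum):
    """Hypothetical stand-in for the library's data type enumeration."""
    TABLE = 'table'
    TEXT = 'text'
    TS = 'ts'


@dataclass
class Data:
    """Hypothetical minimal data container."""
    data_type: DataType
    features: list


def exclude_data_types(*excluded: DataType) -> Callable:
    """Make the decorated preprocessing step a no-op for the given data types."""
    def decorator(func: Callable[..., Data]) -> Callable[..., Data]:
        @wraps(func)  # keep the step's name and docstring
        def wrapper(data: Data, *args, **kwargs) -> Data:
            if data.data_type in excluded:
                return data  # excluded types pass through unchanged
            return func(data, *args, **kwargs)
        return wrapper
    return decorator


@exclude_data_types(DataType.TEXT, DataType.TS)
def fill_gaps(data: Data) -> Data:
    """Example step: replace missing values (None) with 0.0."""
    return Data(data.data_type, [0.0 if x is None else x for x in data.features])
```

Declaring the exclusions on each step keeps visible, at the definition site, which data types a step skips, instead of scattering type checks through the preprocessing pipeline.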
andreygetmanov merged commit e708a83 into master Sep 5, 2022
andreygetmanov deleted the multimodal_csv branch September 5, 2022 14:15
Development

Successfully merging this pull request may close these issues.

Implement automatic definition of text fields in dataset
6 participants