MultiModalData class improvement #789

Merged
merged 20 commits into from
Sep 5, 2022
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
99121be  MultiModalData class improvement (andreygetmanov, Jul 21, 2022)
47a53e9  minor changes: (andreygetmanov, Jul 21, 2022)
b64819b  added 2 tests on MultiModalData.from_csv use (andreygetmanov, Aug 8, 2022)
c1e7ecb  fixed bug with incorrect multimodal data preprocessing (andreygetmanov, Aug 10, 2022)
c240d69  fixed bug with incorrect multimodal data preprocessing (andreygetmanov, Aug 11, 2022)
74731c2  added data for tests (andreygetmanov, Aug 11, 2022)
2fbc879  rewrote path in test_multimodal_data.py by Path (andreygetmanov, Aug 12, 2022)
fa8ab7b  task now is defined by str, not by Task class (andreygetmanov, Aug 12, 2022)
3bf4cbe  added substitution of nans to '' in text features (andreygetmanov, Aug 19, 2022)
9b9a0d0  tests of multimodal data class are finished (andreygetmanov, Aug 22, 2022)
9d58be8  text and ts preparation methods are now in distinct classes inherit… (andreygetmanov, Aug 23, 2022)
20bb336  refactoring of prepare_multimodal_ts_data method (andreygetmanov, Aug 30, 2022)
ebd5f34  refactoring structure of test_text_data_only (andreygetmanov, Aug 30, 2022)
11e953a  refactoring structure of test_multimodal_data_from_csv (andreygetmanov, Aug 30, 2022)
cad1d58  table and text preprocessing are now distinguished for easier reada… (andreygetmanov, Aug 30, 2022)
ef31bae  removed duplicate of array_to_input_data method (andreygetmanov, Sep 2, 2022)
21f057a  flake corrections (andreygetmanov, Sep 2, 2022)
15a363b  minor changes (andreygetmanov, Sep 2, 2022)
1265ed4  decorators moved to the distinguished file (andreygetmanov, Sep 5, 2022)
70e5af6  minor changes (andreygetmanov, Sep 5, 2022)
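Two of the commits above describe data-handling rules rather than refactoring: fa8ab7b makes the task a str instead of a Task object, and 3bf4cbe substitutes '' for NaN values in text features. The sketch below illustrates both rules in plain Python. `TaskType`, `parse_task` and `fill_text_nans` are hypothetical names, not FEDOT's API, and only two task names are included for brevity.

```python
import math
from enum import Enum
from typing import List, Optional


class TaskType(Enum):
    # Hypothetical stand-in for a task enum; FEDOT's real classes may differ.
    classification = 'classification'
    regression = 'regression'


def parse_task(task: str) -> TaskType:
    """Map a task name such as 'classification' to the matching enum member."""
    try:
        return TaskType(task)  # Enum lookup by value
    except ValueError:
        valid = ', '.join(t.value for t in TaskType)
        raise ValueError(f'Unknown task {task!r}; expected one of: {valid}') from None


def fill_text_nans(values: List[Optional[object]]) -> List[str]:
    """Replace missing entries (None or float NaN) in a text column with ''."""
    result = []
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            result.append('')
        else:
            result.append(str(value))
    return result
```

For example, `parse_task('classification')` returns `TaskType.classification`, and `fill_text_nans(['Crisp', float('nan'), None])` returns `['Crisp', '', '']`. Filling with '' rather than dropping rows keeps the text column the same length as the table columns.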
54 changes: 19 additions & 35 deletions examples/advanced/multimodal_text_num_example.py
@@ -1,49 +1,33 @@
-import os
 from pathlib import Path

 from fedot.api.main import Fedot

-from fedot.core.data.data import InputData
 from fedot.core.data.data_split import train_test_data_setup
 from fedot.core.data.multi_modal import MultiModalData
-from fedot.core.repository.dataset_types import DataTypesEnum
-from fedot.core.repository.tasks import Task, TaskTypesEnum
 from fedot.core.utils import fedot_project_root


-def prepare_multi_modal_data(files_path: str, task: Task) -> MultiModalData:
+def run_multi_modal_example(file_path: str, is_visualise=True) -> float:
     """
-    Imports data from 2 different sources (table and text)
-
-    :param files_path: path to data
-    :param task: task to solve
-    :return: MultiModalData object which contains table and text data
+    This is an example of FEDOT use on multimodal data.
+    The data is taken and adapted from Wine Reviews dataset (winemag-data_first150k):
+    https://www.kaggle.com/datasets/zynicide/wine-reviews
+    and contains information about wine country, region, price, etc.
+    Column that contains text features is 'description'.
+    Other columns contain numerical and categorical features.
+    The aim is to predict wine variety, so it's a classification task.
+
+    :param file_path: path to the file with multimodal data
+    :param is_visualise: if True, then final pipeline will be visualised
+
+    :return: F1 metrics of the model
     """

-    path = os.path.join(str(fedot_project_root()), files_path)
-
-    # import of table data
-    path_table = os.path.join(path, 'multimodal_wine_table.csv')
-    data_num = InputData.from_csv(path_table, task=task, target_columns='variety')
-
-    # import of text data
-    path_text = os.path.join(path, 'multimodal_wine_text.csv')
-    data_text = InputData.from_csv(path_text, data_type=DataTypesEnum.text, task=task, target_columns='variety')
-
-    data = MultiModalData({
-        'data_source_table': data_num,
-        'data_source_text': data_text
-    })
-
-    return data
-
-
-def run_multi_modal_example(files_path: str, is_visualise=True) -> float:
-    task = Task(TaskTypesEnum.classification)
-
-    data = prepare_multi_modal_data(files_path, task)
+    task = 'classification'
+    path = Path(fedot_project_root(), file_path)
+    data = MultiModalData.from_csv(file_path=path, task=task, target_columns='variety', index_col=None)
     fit_data, predict_data = train_test_data_setup(data, shuffle_flag=True, split_ratio=0.7)

-    automl_model = Fedot(problem='classification', timeout=10)
+    automl_model = Fedot(problem=task, timeout=10)
     automl_model.fit(features=fit_data,
                      target=fit_data.target)

@@ -59,4 +43,4 @@ def run_multi_modal_example(files_path: str, is_visualise=True) -> float:


 if __name__ == '__main__':
-    run_multi_modal_example(files_path='examples/data/multimodal_wine', is_visualise=True)
+    run_multi_modal_example(file_path='examples/data/multimodal_wine.csv', is_visualise=True)
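The example now replaces two separate `InputData.from_csv` calls (one table file, one text file) with a single `MultiModalData.from_csv` call on one CSV, followed by `train_test_data_setup(..., shuffle_flag=True, split_ratio=0.7)`. As a rough, stdlib-only illustration of what such a step has to do, the sketch below splits one CSV into a table source and a text source (reusing the `'data_source_table'` / `'data_source_text'` keys from the old code) and applies one shuffled 70/30 split to both. It is not FEDOT's implementation: `split_sources`, `train_test_indices`, `select` and the sample rows are hypothetical, and the text column (`'description'`) is passed explicitly here, whereas how FEDOT detects text columns is not shown in this diff.

```python
import csv
import io
import random
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_sources(rows: List[Row], target: str,
                  text_columns: List[str]) -> Dict[str, List[Row]]:
    """Split each CSV row into a table part and a text part (hypothetical helper).

    The target column is kept in both parts so each source can be used on its own.
    """
    sources: Dict[str, List[Row]] = {'data_source_table': [], 'data_source_text': []}
    for row in rows:
        # Every non-text column (including the target) goes to the table source.
        table_part = {k: v for k, v in row.items() if k not in text_columns}
        # Text columns plus the target go to the text source.
        text_part = {k: row[k] for k in text_columns}
        text_part[target] = row[target]
        sources['data_source_table'].append(table_part)
        sources['data_source_text'].append(text_part)
    return sources


def train_test_indices(n_rows: int, split_ratio: float = 0.7,
                       seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle row indices once and cut them at split_ratio."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * split_ratio)
    return indices[:cut], indices[cut:]


def select(rows: List[Row], indices: List[int]) -> List[Row]:
    """Pick rows in the given index order."""
    return [rows[i] for i in indices]


# Made-up rows for illustration only; they are not taken from multimodal_wine.csv.
SAMPLE_CSV = """country,price,description,variety
A,10,crisp and dry,X
B,25,full-bodied with dark fruit,Y
C,15,light and floral,X
"""

if __name__ == '__main__':
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    sources = split_sources(rows, target='variety', text_columns=['description'])
    train_idx, test_idx = train_test_indices(len(rows), split_ratio=0.7)
    # The same indices are applied to every source so the rows stay aligned.
    fit_sources = {name: select(part, train_idx) for name, part in sources.items()}
    predict_sources = {name: select(part, test_idx) for name, part in sources.items()}
    print(len(fit_sources['data_source_table']), len(predict_sources['data_source_text']))
```

Shuffling the indices once and reusing them for every source is the design choice that keeps row i of the table source paired with row i of the text source; shuffling each source independently would silently mismatch features and targets.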
780 changes: 780 additions & 0 deletions examples/data/multimodal_wine.csv

Large diffs are not rendered by default.