Fast topological features #1252

kasyanovse · 2024-01-19T15:21:36Z

This is a 🙋 feature or enhancement.

Summary

A sped-up version of the topological features (about 30 times faster). It differs quite significantly from the regular topological features:

Put all the code into a single class.
Dropped giotto-tda in favor of giotto-ph.
Changed how the final features are computed from the topological ones for maximum speedup.

Context

Inspired by #1241.

pep8speaks · 2024-01-19T15:21:43Z

Hello @kasyanovse! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-01-27 18:04:17 UTC

github-actions · 2024-01-19T15:22:31Z

Code in this pull request still contains PEP8 errors, please write the /fix-pep8 command in the comments below to create commit with automatic fixes.

Comment last updated at

codecov · 2024-01-19T15:27:37Z

Codecov Report

Attention: 30 lines in your changes are missing coverage. Please review.

Comparison is base (5e726e9) 80.05% compared to head (8f895c3) 79.84%.

Files	Patch %	Lines
...erations/topological/fast_topological_extractor.py	30.23%	30 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1252      +/-   ##
==========================================
- Coverage   80.05%   79.84%   -0.21%     
==========================================
  Files         149      150       +1     
  Lines       10278    10322      +44     
==========================================
+ Hits         8228     8242      +14     
- Misses       2050     2080      +30


kasyanovse · 2024-01-19T15:45:47Z

/fix-pep8


valer1435 · 2024-01-24T08:39:36Z

It would be nice to have a test checking that the resulting features are roughly the same as in the regular version.

v1docq · 2024-01-24T08:46:30Z

...aluation/operation_implementations/data_operations/topological/fast_topological_extractor.py

+                             maxdim=self.max_homology_dimension,
+                             coeff=2,
+                             metric='euclidean',
+                             n_threads=1,
+                             collapse_edges=False)["dgms"]
+        result = list()


It would be good to expose at least the metric, n_threads, and the homology dimension as hyperparameters. They will strongly affect the resulting diagrams.

a749678
b2c5f3e

Done. Parallelization is implemented at the level of the transform method.
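A minimal sketch of the idea in this reply, under stated assumptions: the parameter names `metric`, `max_homology_dimension` and `n_jobs`, their defaults, and the `extract_row` placeholder are illustrative and are not taken from the PR. It only shows reading the settings from a params mapping and parallelizing across rows inside `transform`, instead of inside the homology call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Mapping, Optional, Sequence

# Hypothetical defaults; the real names and values in the PR may differ.
DEFAULTS = {'metric': 'euclidean', 'max_homology_dimension': 1, 'n_jobs': 1}


def read_params(params: Optional[Mapping] = None) -> dict:
    """Merge user-supplied hyperparameters over the illustrative defaults."""
    merged = dict(DEFAULTS)
    merged.update(params or {})
    return merged


def extract_row(row: Sequence[float], metric: str, max_homology_dimension: int) -> list:
    """Placeholder for the per-window feature extraction (not the PR's code)."""
    return [min(row), max(row), float(max_homology_dimension)]


def transform(rows, params=None, extractor=extract_row):
    """Apply the extractor to every row, in parallel across rows."""
    cfg = read_params(params)

    def work(row):
        return extractor(row, cfg['metric'], cfg['max_homology_dimension'])

    with ThreadPoolExecutor(max_workers=cfg['n_jobs']) as pool:
        # map() preserves input order, so features stay aligned with rows
        return list(pool.map(work, rows))
```

A thread pool only helps when the per-row work releases the GIL; a process pool would be the usual alternative for pure-Python extractors.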

v1docq · 2024-01-24T08:48:51Z

...aluation/operation_implementations/data_operations/topological/fast_topological_extractor.py

+class FastTopologicalFeaturesImplementation(DataOperationImplementation):
+    def __init__(self, params: Optional[OperationParameters] = None):
+        super().__init__(params)
+        self.points_count = params.get('points_count')
+        self.max_homology_dimension = 1
+        self.feature_funs = (lambda x: np.quantile(x, (0.1, 0.25, 0.5, 0.75, 0.9)), )
+        self.shape = None


The point cloud used to be implemented through a trajectory-matrix abstraction, which tied these classes together and made it possible to define algebra over them (addition, subtraction, division). For FEDOT this is of course unnecessary, but just keep it in mind. Someone who has never seen a Vietoris-Rips filtration or persistence diagrams
will have to learn giotto right away. So dropping custom abstract classes in favor of out-of-the-box solutions is not always the most obvious choice.

Agreed, the code of the regular topo features reads better, but hopefully few people will ever need to dig into this code :)
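For readers unfamiliar with the term used above: a trajectory matrix stacks overlapping windows of a series as rows, and each row can then be treated as one point of a point cloud. A minimal sketch, with the function name and the `window` and `stride` parameters chosen for illustration only (they are not taken from either implementation):

```python
def trajectory_matrix(series, window, stride=1):
    """Return overlapping windows of `series` as rows (a Hankel-style embedding)."""
    if window < 1 or window > len(series):
        raise ValueError('window must be between 1 and len(series)')
    # each row is one point of the point cloud; stride > 1 thins the cloud
    return [list(series[i:i + window])
            for i in range(0, len(series) - window + 1, stride)]
```

A larger stride yields fewer points, which is one plausible way to trade diagram detail for speed.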

kasyanovse · 2024-01-24T18:26:42Z

It would be nice to have a test checking that the resulting features are roughly the same as in the regular version.

Different features are generated from the topology here, so such a test makes no sense. The picture shows a comparison of predictions for the lagged-topo-ridge pipeline. I would not say there are fundamental differences, although fast_topo did not capture the low-frequency components. That is a sacrifice made for speed; if needed, quality can be improved by adding statistical features to the quantiles.
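The quantile idea, and the suggested extension with statistical features, can be sketched as follows. This is a dependency-free illustration, not the PR's code: the PR applies `np.quantile` with the same five levels, but what it applies them to is not shown here, so computing them over finite lifetimes (death minus birth) is an assumption, as are the names `diagram_features` and `with_stats`.

```python
import math
from statistics import mean, pstdev

QUANTILES = (0.1, 0.25, 0.5, 0.75, 0.9)  # same levels as in the PR's feature_funs


def quantile(sorted_values, q):
    """Linear-interpolation quantile (the default rule of np.quantile)."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def diagram_features(diagram, with_stats=False):
    """Turn (birth, death) pairs into a fixed-length feature vector."""
    # Assumption: features are built from finite lifetimes (death - birth)
    lifetimes = sorted(d - b for b, d in diagram if math.isfinite(d))
    if not lifetimes:
        lifetimes = [0.0]  # keep the output length fixed for empty diagrams
    features = [quantile(lifetimes, q) for q in QUANTILES]
    if with_stats:
        # optional extra statistics, as suggested in the comment above
        features += [mean(lifetimes), pstdev(lifetimes)]
    return features
```

The output length depends only on the number of quantile levels and the `with_stats` flag, never on the number of points in the diagram, which is what makes it usable as input to a downstream model such as ridge.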

Code for generating the picture

from time import perf_counter

import numpy as np
from matplotlib import pyplot as plt

from fedot.core.pipelines.node import PipelineNode
from fedot.core.pipelines.pipeline import Pipeline
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
from fedot.core.data.data import InputData
from fedot.core.repository.dataset_types import DataTypesEnum
from fedot.core.data.data_split import train_test_data_setup


RANDOM_SEED = 100


def get_data(data_length=500, test_length=100):
    # (amplitude, relative frequency) pairs of the sine components
    harmonics = [(0.1, 0.9), (0.1, 1), (0.1, 1.1), (0.05, 2), (0.05, 5), (1, 0.02)]
    time = np.linspace(0, 100, data_length)
    series = np.zeros_like(time)
    for amplitude, frequency in harmonics:
        series += amplitude * np.sin(frequency * 2 * np.pi / time[-1] * 25 * time)

    data = InputData(idx=np.arange(0, series.shape[0]),
                     features=series,
                     target=series,
                     task=Task(TaskTypesEnum.ts_forecasting,
                               TsForecastingParams(forecast_length=test_length)),
                     data_type=DataTypesEnum.ts)
    # split_ratio is the share of points that go to the training part
    return train_test_data_setup(data,
                                 validation_blocks=1,
                                 split_ratio=(data_length - test_length) / data_length)


def plot_ppl(ppls, train, test, labels):
    _, ax = plt.subplots()
    limits = len(test.target)
    ax.plot(train.idx[-limits:], train.target[-limits:], label='train')
    ax.plot(test.idx, test.target, label='test')
    for label, ppl in zip(labels, ppls):
        predict = ppl.predict(test).predict
        ax.plot(test.idx[-len(predict):], predict, label=label)
    ax.legend()


if __name__ == '__main__':
    train, test = get_data()
    node = PipelineNode('lagged')
    node = PipelineNode('fast_topological_features', nodes_from=[node])
    node = PipelineNode('ridge', nodes_from=[node])
    ppl1 = Pipeline(node)
    t0 = perf_counter()
    ppl1.fit(train)
    ppl1.predict(test)
    print(perf_counter() - t0)

    train, test = get_data()
    node = PipelineNode('lagged')
    node = PipelineNode('topological_features', nodes_from=[node])
    node = PipelineNode('ridge', nodes_from=[node])
    ppl2 = Pipeline(node)
    t0 = perf_counter()
    ppl2.fit(train)
    ppl2.predict(test)
    print(perf_counter() - t0)

    plot_ppl([ppl1, ppl2], train, test, ('fast_topo', 'topo'))
    # without this call the figure is never displayed when run as a script
    plt.show()

Lopa10ko

lgtm :)
I've run out of nit-picks

kasyanovse added 2 commits January 17, 2024 19:49

add fast topo

c3f9f9c

fix fast topo

2388af2

kasyanovse added the enhancement (New feature or request) and time series (related to time series processing) labels Jan 19, 2024

kasyanovse self-assigned this Jan 19, 2024

github-actions bot and others added 2 commits January 19, 2024 15:46

Automated autopep8 fixes

5dd70ac

pep8

e39eecc

kasyanovse requested a review from valer1435 January 19, 2024 15:56

add to initial assumption

619849e

nicl-nno requested a review from Lopa10ko January 22, 2024 10:27

Lopa10ko requested changes Jan 22, 2024

fedot/api/api_utils/assumptions/task_assumptions.py (outdated, resolved)

fedot/core/repository/data/data_operation_repository.json (resolved)

Lopa10ko reviewed Jan 22, 2024

...aluation/operation_implementations/data_operations/topological/fast_topological_extractor.py (outdated, resolved)

kasyanovse added 2 commits January 22, 2024 19:26

make code more clear in fit method

18815b8

add fast_topoligical_features to docs

328b4b3

valer1435 requested a review from v1docq January 24, 2024 08:27

v1docq reviewed Jan 24, 2024

kasyanovse added 5 commits January 24, 2024 20:50

fix aligment for pipeline builders in TSForecastingAssumptions

3972d10

add topo to FedotBuilder docs

7f742a4

fix table in docs

e2b0249

add params to fast_topo

a749678

change params and add it to tuner search space

b2c5f3e

kasyanovse requested review from Lopa10ko and v1docq January 24, 2024 18:31

Lopa10ko approved these changes Jan 24, 2024

v1docq approved these changes Jan 26, 2024

valer1435 approved these changes Jan 26, 2024

kasyanovse added 8 commits January 27, 2024 10:05

fix integration tests

29f1013

add new param stride

002551c

fix param

0c9ddfc

dirty speedup

7b755ea

delete ica from initial assumption due to instability

1b4db86

fix documentation

ee2ecd0

fix test

355a134

delete fast_topo from assumption and delete fast_train tag

8f895c3

kasyanovse merged commit 5efd0fb into master Jan 27, 2024
9 of 10 checks passed

kasyanovse deleted the add_topo2_speedup2 branch January 27, 2024 19:54
