Fast topological features #1252
Hello @kasyanovse! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-01-27 18:04:17 UTC
Code in this pull request still contains PEP8 errors.
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1252      +/-   ##
==========================================
- Coverage   80.05%   79.84%   -0.21%
==========================================
  Files         149      150       +1
  Lines       10278    10322      +44
==========================================
+ Hits         8228     8242      +14
- Misses       2050     2080      +30
/fix-pep8
...aluation/operation_implementations/data_operations/topological/fast_topological_extractor.py
I would like to see a test checking that the resulting features are roughly (+-) the same as in the regular one.
maxdim=self.max_homology_dimension,
coeff=2,
metric='euclidean',
n_threads=1,
collapse_edges=False)["dgms"]
result = list()
It would be good to expose at least the metric, n_threads, and the homology dimension as hyperparameters. They will strongly affect the resulting diagrams.
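A minimal sketch of what this suggestion could look like, assuming the parameters arrive as a plain dictionary. The class `RipsSettings` and its method `from_params` are hypothetical names introduced only for illustration; the defaults mirror the values hard-coded in the excerpt above (and `max_homology_dimension = 1` in the constructor).

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class RipsSettings:
    """Hypothetical container for the settings the review asks to expose."""
    max_homology_dimension: int = 1
    metric: str = 'euclidean'
    n_threads: int = 1

    @classmethod
    def from_params(cls, params: Optional[Mapping[str, Any]] = None) -> 'RipsSettings':
        # Fall back to the currently hard-coded value when a key is missing
        params = params or {}
        defaults = cls()
        return cls(
            max_homology_dimension=int(params.get('max_homology_dimension',
                                                  defaults.max_homology_dimension)),
            metric=str(params.get('metric', defaults.metric)),
            n_threads=int(params.get('n_threads', defaults.n_threads)),
        )


settings = RipsSettings.from_params({'metric': 'manhattan', 'n_threads': 4})
print(settings)
```

With such a container the call site could pass `settings.metric` and the other fields instead of literals, which keeps the old behaviour when no parameters are given.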
class FastTopologicalFeaturesImplementation(DataOperationImplementation):
    def __init__(self, params: Optional[OperationParameters] = None):
        super().__init__(params)
        self.points_count = params.get('points_count')
        self.max_homology_dimension = 1
        self.feature_funs = (lambda x: np.quantile(x, (0.1, 0.25, 0.5, 0.75, 0.9)), )
        self.shape = None
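To illustrate what a quantile-based feature function like the one in `feature_funs` produces, here is a rough pure-Python sketch. It assumes that the summary is taken over point lifetimes (death minus birth) of a persistence diagram, which the excerpt above does not confirm; `quantile` and `diagram_features` are hypothetical helpers, and the interpolation imitates the default linear method of `np.quantile`.

```python
from typing import List, Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of a non-empty sequence."""
    if not values:
        raise ValueError('quantile of an empty sequence is undefined')
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def diagram_features(diagram: Sequence[Tuple[float, float]],
                     qs: Sequence[float] = (0.1, 0.25, 0.5, 0.75, 0.9)) -> List[float]:
    # Assumption: features summarise lifetimes (death - birth) of the diagram points
    lifetimes = [death - birth for birth, death in diagram]
    return [quantile(lifetimes, q) for q in qs]


# Toy diagram of (birth, death) pairs with lifetimes 1, 2, 3, 4, 5
diagram = [(0.0, 1.0), (0.0, 2.0), (0.5, 3.5), (1.0, 5.0), (2.0, 7.0)]
features = diagram_features(diagram)
print(features)
```

Whatever the number of points in a diagram, the result has a fixed length of five values, which is what makes such a summary usable as a feature vector.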
The point cloud was implemented through the trajectory matrix abstraction, which tied these classes together and made it possible to define algebra over them (addition, subtraction, division). For FEDOT this is of course unnecessary, but just keep it in mind. Someone who has never seen a Vietoris-Rips filtration or persistence diagrams will have to learn giotto right away. So dropping our own abstract classes in favor of out-of-the-box solutions is not always the most obvious choice.
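For readers unfamiliar with the term: a trajectory matrix stacks sliding windows of a series as rows, and each row can then be read as one point of a point cloud. The sketch below only illustrates that idea; `trajectory_matrix` is a hypothetical helper and is not the code of either implementation discussed here.

```python
from typing import List, Sequence


def trajectory_matrix(series: Sequence[float], window: int) -> List[List[float]]:
    """Stack every sliding window of length `window` as one row."""
    if not 1 <= window <= len(series):
        raise ValueError('window must be between 1 and len(series)')
    return [list(series[start:start + window])
            for start in range(len(series) - window + 1)]


# Each row is one point of the cloud, here in 3-dimensional space
cloud = trajectory_matrix([1, 2, 3, 4, 5], window=3)
print(cloud)
```

A Vietoris-Rips filtration is then built on the pairwise distances between these rows.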
Agreed, the code of the regular topological features reads better, but hopefully few people will ever need to dig into this code :)
Different features are generated from the topology here, so there is no point in that.
Comparison of predictions for the fast and the regular topological features (figure).
Code for generating the picture:

import logging
from time import perf_counter
import pickle

import numpy as np
from matplotlib import pyplot as plt

from fedot.core.pipelines.node import PipelineNode
from fedot.core.pipelines.pipeline import Pipeline
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
from fedot.api.main import Fedot
from fedot.core.data.data import InputData
from fedot.core.repository.dataset_types import DataTypesEnum
from fedot.core.data.data_split import train_test_data_setup

RANDOM_SEED = 100


def get_data(data_length=500, test_length=100):
    # Synthetic series: a sum of sine harmonics given as (amplitude, frequency factor)
    harmonics = [(0.1, 0.9), (0.1, 1), (0.1, 1.1), (0.05, 2), (0.05, 5), (1, 0.02)]
    time = np.linspace(0, 100, data_length)
    data = np.zeros_like(time)
    for amplitude, frequency in harmonics:
        data += amplitude * np.sin(frequency * 2 * np.pi / time[-1] * 25 * time)
    data = InputData(idx=np.arange(0, data.shape[0]),
                     features=data,
                     target=data,
                     task=Task(TaskTypesEnum.ts_forecasting,
                               TsForecastingParams(forecast_length=test_length)),
                     data_type=DataTypesEnum.ts)
    # Hold out the last test_length points for testing
    return train_test_data_setup(data,
                                 validation_blocks=1,
                                 split_ratio=(data_length - test_length) / data_length)


def plot_ppl(ppls, train, test, labels):
    # Plot the tail of the train part, the test part, and each pipeline's forecast
    _, ax = plt.subplots()
    limits = len(test.target)
    ax.plot(train.idx[-limits:], train.target[-limits:], label='train')
    ax.plot(test.idx, test.target, label='test')
    for label, ppl in zip(labels, ppls):
        predict = ppl.predict(test).predict
        ax.plot(test.idx[-len(predict):], predict, label=label)
    ax.legend()


if __name__ == '__main__':
    # Pipeline with the fast topological features
    train, test = get_data()
    node = PipelineNode('lagged')
    node = PipelineNode('fast_topological_features', nodes_from=[node])
    node = PipelineNode('ridge', nodes_from=[node])
    ppl1 = Pipeline(node)
    t0 = perf_counter()
    ppl1.fit(train)
    ppl1.predict(test)
    print(perf_counter() - t0)

    # Pipeline with the regular topological features
    train, test = get_data()
    node = PipelineNode('lagged')
    node = PipelineNode('topological_features', nodes_from=[node])
    node = PipelineNode('ridge', nodes_from=[node])
    ppl2 = Pipeline(node)
    t0 = perf_counter()
    ppl2.fit(train)
    ppl2.predict(test)
    print(perf_counter() - t0)

    plot_ppl([ppl1, ppl2], train, test, ('fast_topo', 'topo'))
    plt.show()
lgtm :)
I have run out of nit-picks
This is a 🙋 feature or enhancement.
Summary
An accelerated version of the topological features (30 times faster). They differ quite significantly from the regular topological features:
Context
Inspired by #1241.