feat: Kaggle loop update (Feature & Model) #241

Merged on Sep 11, 2024 (70 commits)
Changes from 4 commits

Commits (70):
49541c6 Init todo (you-n-g, Jul 17, 2024)
aa4c7e5 Evaluation & dataset (taozhiwang, Jul 23, 2024)
c51a6f0 Generate new data (taozhiwang, Jul 23, 2024)
90bd7e3 dataset generation (taozhiwang, Jul 24, 2024)
864f5a0 add the result (taozhiwang, Jul 24, 2024)
f9b57b9 Analysis (taozhiwang, Jul 24, 2024)
db82b67 Factor update (taozhiwang, Jul 24, 2024)
52dc938 Updates (taozhiwang, Jul 25, 2024)
702c830 Reformat analysis.py (taozhiwang, Jul 25, 2024)
ac80c93 CI fix (taozhiwang, Jul 25, 2024)
9357628 Merge branch 'main' into benchmark (you-n-g, Jul 25, 2024)
3088525 Merge pull request #112 from microsoft/benchmark (taozhiwang, Jul 25, 2024)
ab552ff Merge branch 'main' of https://github.com/microsoft/RD-Agent (xisen-w, Jul 26, 2024)
fc01635 Merge branch 'main' of https://github.com/microsoft/RD-Agent (xisen-w, Jul 29, 2024)
681af16 Merge branch 'main' of https://github.com/microsoft/RD-Agent (xisen-w, Jul 30, 2024)
a4643de Merge branch 'main' of https://github.com/microsoft/RD-Agent (xisen-w, Jul 31, 2024)
cc48faa Merge branch 'main' of https://github.com/microsoft/RD-Agent (xisen-w, Aug 6, 2024)
3ff2406 Merge branch 'main' of https://github.com/microsoft/RD-Agent (xisen-w, Aug 26, 2024)
96a5e9e Merge branch 'main' of https://github.com/microsoft/RD-Agent (xisen-w, Aug 28, 2024)
fa25aaf Revised Preprocessing & Supported Random Forest (xisen-w, Aug 28, 2024)
5537ff0 Revised to support three models with feature (xisen-w, Aug 30, 2024)
818bf0b Further revised prompts (xisen-w, Aug 30, 2024)
d3f91cb Slight Revision (xisen-w, Aug 30, 2024)
79bdf4c docs: update contributors (#230) (Hytn, Aug 28, 2024)
94a22cb Revised to support three models with feature (xisen-w, Aug 30, 2024)
e8294a6 Further revised prompts (xisen-w, Aug 30, 2024)
a8e8dd9 Slight Revision (xisen-w, Aug 30, 2024)
f218b93 Merge branch 'model-loop-update' of https://github.com/microsoft/RD-A… (xisen-w, Aug 30, 2024)
ce8eeed feat: kaggle model and feature (#238) (peteryang1, Sep 2, 2024)
a8b2df9 feat: continue kaggle feature and model coder (#239) (peteryang1, Sep 2, 2024)
c718143 finish the first round of runner (#240) (peteryang1, Sep 3, 2024)
b2b7572 Optimized the factor scenario and added the front-end. (WinstonLiyt, Sep 3, 2024)
fed9a69 fix a small bug (WinstonLiyt, Sep 4, 2024)
05db6f1 fix a typo (WinstonLiyt, Sep 4, 2024)
88047f1 update the kaggle scenario (WinstonLiyt, Sep 4, 2024)
33b7b69 delete model_template folder (peteryang1, Sep 4, 2024)
913ce10 use experiment to run data preprocess script (peteryang1, Sep 4, 2024)
0135d21 add source data to scenarios (peteryang1, Sep 4, 2024)
0de82d2 minor fix (peteryang1, Sep 4, 2024)
ecbef88 minor bug fix (peteryang1, Sep 4, 2024)
e8981ef train.py debug (taozhiwang, Sep 8, 2024)
4da3957 fixed a bug in train.py and added some TODOs (WinstonLiyt, Sep 8, 2024)
b902a9e For Debugging (xisen-w, Sep 9, 2024)
e6f95f5 fix two small bugs in based_exp (WinstonLiyt, Sep 9, 2024)
0ac70c0 fix some bugs (WinstonLiyt, Sep 9, 2024)
a6b603a update preprocess (WinstonLiyt, Sep 9, 2024)
b4dd339 fix a bug in preprocess (WinstonLiyt, Sep 9, 2024)
fcd0f20 fix a bug in train.py (WinstonLiyt, Sep 9, 2024)
1e90441 reformat (WinstonLiyt, Sep 9, 2024)
121f5e0 Follow-up (xisen-w, Sep 9, 2024)
f915bc0 Merge branch 'model-loop-model-debug' into model-loop-update (xisen-w, Sep 9, 2024)
efb18a5 fix a bug in train.py (WinstonLiyt, Sep 9, 2024)
91ca2e9 fix a bug in workspace (WinstonLiyt, Sep 9, 2024)
966da07 fix a bug in feature duplication (WinstonLiyt, Sep 9, 2024)
d084e75 fix a bug in feedback (WinstonLiyt, Sep 9, 2024)
993b39e fix a bug in preprocessed data (WinstonLiyt, Sep 10, 2024)
84e2447 Merge branch 'model-loop-update' of https://github.com/microsoft/RD-A… (xisen-w, Sep 10, 2024)
7effdf0 fix a bug om feature engineering (WinstonLiyt, Sep 10, 2024)
25a64c6 Merge branch 'main' into model-loop-update (WinstonLiyt, Sep 10, 2024)
275e526 fix a ci error (WinstonLiyt, Sep 10, 2024)
d1fa409 Debugged & Connected (xisen-w, Sep 10, 2024)
2aab52d Merge branch 'model-loop-update' of https://github.com/microsoft/RD-A… (xisen-w, Sep 10, 2024)
c2dfbb8 Fixed error on feedback & added other fixes (xisen-w, Sep 11, 2024)
ffb2ff5 fix CI errors (WinstonLiyt, Sep 11, 2024)
3695a7b fix a CI bug (WinstonLiyt, Sep 11, 2024)
1de9557 fix: fix_dotenv_error (#257) (SunsetWolf, Sep 10, 2024)
09812dc chore(main): release 0.2.1 (#249) (you-n-g, Sep 10, 2024)
e0bc856 init a scenario for kaggle feature engineering (WinstonLiyt, Aug 26, 2024)
097d9f3 delete error codes (WinstonLiyt, Sep 11, 2024)
4ab8b96 Delete rdagent/app/kaggle_feature/conf.py (WinstonLiyt, Sep 11, 2024)

rdagent/log/ui/app.py (2 changes: 1 addition & 1 deletion)
@@ -9,7 +9,6 @@
 import pandas as pd
 import plotly.express as px
 import plotly.graph_objects as go
-from rdagent.scenarios.kaggle.experiment.scenario import KGScenario
 import streamlit as st
 from plotly.subplots import make_subplots
 from streamlit import session_state as state
@@ -28,6 +27,7 @@
 from rdagent.log.ui.qlib_report_figure import report_figure
 from rdagent.scenarios.data_mining.experiment.model_experiment import DMModelScenario
 from rdagent.scenarios.general_model.scenario import GeneralModelScenario
+from rdagent.scenarios.kaggle.experiment.scenario import KGScenario
 from rdagent.scenarios.qlib.experiment.factor_experiment import QlibFactorScenario
 from rdagent.scenarios.qlib.experiment.factor_from_report_experiment import (
     QlibFactorFromReportScenario,

rdagent/scenarios/kaggle/developer/runner.py (3 changes: 0 additions & 3 deletions)
@@ -68,7 +68,6 @@ class KGFactorRunner(KGCachedRunner[KGFactorExperiment]):
     def init_develop(self, exp: KGFactorExperiment) -> KGFactorExperiment:
         """
         For the initial development, the experiment serves as a benchmark for feature engineering.
-        #TODO Not entirely sure this is written correctly
         """
         self.build_from_SOTA(exp)
         if RUNNER_SETTINGS.cache_result:
@@ -88,8 +87,6 @@ def init_develop(self, exp: KGFactorExperiment) -> KGFactorExperiment:
         return exp

     def develop(self, exp: KGFactorExperiment) -> KGFactorExperiment:
-        # TODO This runs the first-round SOTA, i.e. the run without feature engineering. In later rounds, exp.based_experiments[-1] should all have at…
-        # TODO But for some reason exp.based_experiments is empty here, even though it is defined in proposal.py
         if exp.based_experiments and exp.based_experiments[-1].result is None:
             exp.based_experiments[-1] = self.init_develop(exp.based_experiments[-1])
         self.build_from_SOTA(exp)
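The `develop` method above follows a lazy, run-once baseline pattern. If the most recent based-on experiment has no result yet, it is first evaluated through `init_develop` (the run without feature engineering), and only then is the new experiment built on top of it. Below is a minimal, self-contained sketch of that control flow. It assumes a hypothetical `Experiment` dataclass and `ToyRunner` class standing in for RD-Agent's real experiment and runner types. It also assumes, as a toy stand-in for `build_from_SOTA`, that building from SOTA simply means inheriting the baseline's features.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experiment:
    """Hypothetical stand-in for an RD-Agent experiment object."""

    name: str
    features: List[str] = field(default_factory=list)
    based_experiments: List["Experiment"] = field(default_factory=list)
    result: Optional[float] = None


class ToyRunner:
    """Hypothetical runner showing lazy, run-once evaluation of the baseline."""

    def __init__(self) -> None:
        self.runs: List[str] = []  # names of experiments actually executed

    def _run(self, exp: Experiment) -> float:
        # Toy score: one point per feature. A real runner would train and score a model.
        self.runs.append(exp.name)
        return float(len(exp.features))

    def init_develop(self, exp: Experiment) -> Experiment:
        # Baseline run: evaluate the experiment as-is, with no new features.
        exp.result = self._run(exp)
        return exp

    def develop(self, exp: Experiment) -> Experiment:
        # Evaluate the baseline only if it has no result yet (i.e. the first round).
        if exp.based_experiments and exp.based_experiments[-1].result is None:
            exp.based_experiments[-1] = self.init_develop(exp.based_experiments[-1])
        # Toy stand-in for build_from_SOTA: inherit the baseline's features.
        if exp.based_experiments:
            exp.features = exp.based_experiments[-1].features + exp.features
        exp.result = self._run(exp)
        return exp


runner = ToyRunner()
baseline = Experiment("baseline", features=["raw"])
exp1 = runner.develop(Experiment("exp1", features=["f1"], based_experiments=[baseline]))
exp2 = runner.develop(Experiment("exp2", features=["f2"], based_experiments=[baseline]))
print(runner.runs)  # ['baseline', 'exp1', 'exp2']: the baseline runs only once
```

Because `init_develop` stores the result on the shared baseline object, every later experiment that points at the same baseline skips the re-run.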
[file path not shown]
@@ -1,9 +1,9 @@
 import pandas as pd
-from sklearn.preprocessing import LabelEncoder
-from sklearn.model_selection import train_test_split
 from sklearn.compose import ColumnTransformer
 from sklearn.impute import SimpleImputer
+from sklearn.model_selection import train_test_split
 from sklearn.pipeline import Pipeline
+from sklearn.preprocessing import LabelEncoder
 from sklearn.preprocessing import OneHotEncoder


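The imports in the hunk above (`ColumnTransformer`, `SimpleImputer`, `Pipeline`, `OneHotEncoder`) are the standard scikit-learn building blocks for a tabular preprocessing step. The sketch below shows one common way to combine them. The column names, imputation strategies and toy data are assumptions for illustration, not the contents of the file changed in this PR.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): one numeric and one categorical column, each with a gap.
df = pd.DataFrame(
    {
        "num": [1.0, np.nan, 3.0],
        "cat": pd.Series(["a", "b", np.nan], dtype=object),
    }
)

# Numeric columns: fill missing values with the column mean.
numeric = SimpleImputer(strategy="mean")

# Categorical columns: fill with the most frequent value, then one-hot encode.
categorical = Pipeline(
    steps=[
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric, ["num"]),
        ("cat", categorical, ["cat"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X = preprocessor.fit_transform(df)
print(X.shape)  # (3, 3): one numeric column plus two one-hot columns
```

Wrapping imputation and encoding in one `ColumnTransformer` means the same fitted object can later be applied to test data with `preprocessor.transform(...)`.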
rdagent/scenarios/kaggle/experiment/meta_tpl/train.py (8 changes: 4 additions & 4 deletions)
@@ -1,11 +1,11 @@
from fea_share_preprocess import preprocess_script
import importlib.util
import random
from pathlib import Path
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from fea_share_preprocess import preprocess_script
from pathlib import Path
import random
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.preprocessing import LabelEncoder


# Set random seed for reproducibility
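The template's `# Set random seed for reproducibility` step is usually done by seeding every random number generator the script uses. Below is a minimal sketch, assuming only Python's `random` module and NumPy's legacy global RNG are involved. The helper name `set_seed` and the seed value are illustrative and not taken from the PR. Model libraries with their own RNGs generally need a seed passed through their own parameters as well.

```python
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


# Re-seeding with the same value reproduces the same draws.
set_seed(42)
first = (random.random(), np.random.rand(3).tolist())
set_seed(42)
second = (random.random(), np.random.rand(3).tolist())
print(first == second)  # True
```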

rdagent/scenarios/kaggle/experiment/prompts.yaml (6 changes: 5 additions & 1 deletion)
@@ -91,7 +91,11 @@ kg_feature_interface: |-
 return X.fillna(0) # Example feature processing
 ```

-Ensure that your code meets these requirements and produces a feature-engineered DataFrame that contains only the newly engineered columns, aligning with the user's data and objectives.
+To Note:
+1. Ensure that your code meets these requirements and produces a feature-engineered DataFrame that contains only the newly engineered columns, aligning with the user's data and objectives.
+2. Ensure that the index of the output DataFrame matches the index of the original DataFrame. For example:
+Incorrect: `normalized_df = pd.DataFrame(normalized_features, columns=X.columns)`
+Correct: `normalized_df = pd.DataFrame(normalized_features, columns=X.columns, index=X.index)`

 kg_model_interface: |-
 Your code should contain several parts:
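The two rules added to `kg_feature_interface` can be shown together. The feature function returns only the newly engineered columns, and it passes `index=X.index` so its output lines up with the original rows when the two are concatenated. Below is a minimal pandas sketch. The function name `normalize_features`, the `_norm` column suffix and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def normalize_features(X: pd.DataFrame) -> pd.DataFrame:
    """Return z-score-normalized versions of X's columns as new columns only."""
    values = X.to_numpy(dtype=float)
    normalized = (values - values.mean(axis=0)) / values.std(axis=0)
    # Passing index=X.index keeps the output aligned with the input rows.
    return pd.DataFrame(
        normalized, columns=[f"{c}_norm" for c in X.columns], index=X.index
    )


# Toy input whose index does not start at 0 (e.g. after a train/test split).
X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]}, index=[10, 20, 30])

# Correct: indices match, so concatenation keeps 3 rows.
good = pd.concat([X, normalize_features(X)], axis=1)

# The "Incorrect" form from the prompt drops the index (defaults to 0, 1, 2),
# so pd.concat aligns on the union of indices and produces NaN-filled rows.
wrong = pd.DataFrame(normalize_features(X).to_numpy(), columns=["a_norm", "b_norm"])
misaligned = pd.concat([X, wrong], axis=1)

print(good.shape, misaligned.shape)  # (3, 4) (6, 4)
```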

requirements.txt (8 changes: 6 additions & 2 deletions)
@@ -63,6 +63,10 @@ st-theme
 selenium
 kaggle

-#model related
+# tool
+seaborn
+setuptools-scm
+
+# This is a temporary package installed to pass the test_import test
 xgboost
-lightgbm
+lightgbm