The original feature preprocess on datasets #33

khan-yin · 2023-05-21T14:47:06Z

Hello author, I am a student currently focusing on GNNs. I am curious about how the original features were preprocessed for the node classification datasets. I would like to know whether the original features encode heterogeneity or not: for example, were they generated by metapath2vec, TransE, etc., or perhaps by random walks? I ask because I noticed that on some datasets the original features (feats-type 0) do not even work better than using only the target node features with zero features for the other nodes (feats-type 1). Thanks a lot. Looking forward to your reply. 😆

1049451037 · 2023-05-21T15:21:21Z

Hi. Thank you for your attention. The original features depend on the dataset. For example, the paper node features in ACM and DBLP are paper keyword n-grams. The author node features are aggregated from the paper features, as suggested in HAN and MAGNN. Maybe this early aggregation causes the worse performance. For other information, you can refer to the dataset preprocessing scripts:

ACM: 2c6535d
IMDB: 4e05b43
DBLP: https://github.com/cynricfu/MAGNN/blob/master/preprocess_DBLP.ipynb
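
To make the "aggregated features from papers" idea concrete, here is a minimal sketch and not the actual preprocessing code (see the scripts linked above for that). It assumes paper features are bag-of-words count vectors and that author features are the plain sum of the vectors of the papers each author wrote. The function names, the toy matrices, and the choice of sum over mean are illustrative assumptions. The second helper mimics the feats-type 1 setting as described in the question: keep the target node type's features and replace the others with zeros.

```python
import numpy as np


def aggregate_author_features(paper_feats, author_paper):
    """Sum the feature vectors of each author's papers (assumed aggregation).

    paper_feats:  (num_papers, vocab_size) bag-of-words counts
    author_paper: (num_authors, num_papers) binary authorship matrix
    """
    return author_paper @ paper_feats


def keep_target_only(features_by_type, target_type):
    """Hypothetical helper: zero out features of all non-target node types."""
    return {
        node_type: feats if node_type == target_type else np.zeros_like(feats)
        for node_type, feats in features_by_type.items()
    }


# Toy data: 3 papers over a 3-word vocabulary, 2 authors.
paper_feats = np.array([[1, 0, 2],
                        [0, 1, 0],
                        [1, 1, 0]], dtype=np.float32)
author_paper = np.array([[1, 1, 0],   # author 0 wrote papers 0 and 1
                         [0, 0, 1]],  # author 1 wrote paper 2
                        dtype=np.float32)

author_feats = aggregate_author_features(paper_feats, author_paper)
masked = keep_target_only({"paper": paper_feats, "author": author_feats},
                          target_type="author")
```

With this kind of early aggregation, the author features are a function of the paper features alone, so they add no independent signal of their own, which is consistent with the guess above about the weaker feats-type 0 results.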

1049451037 mentioned this issue Jun 26, 2023

Could you provide the data preprocessing code for the baseline datasets such as IMDB and DBLP? #27

Open
