
Commit

Merge pull request #171 from rllm-team/develop
Update docs.
JianwuZheng413 authored Jan 16, 2025
2 parents c1b6361 + c3761ac commit 005c4f5
Showing 18 changed files with 66 additions and 167 deletions.
34 changes: 4 additions & 30 deletions docs/source/tutorial/gnns.rst
@@ -74,18 +74,15 @@ Finally, we need to implement a :obj:`train()` function and a :obj:`test()` func

.. code-block:: python
def train():
for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.adj)
    loss = loss_fn(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()
@torch.no_grad()
def test():
with torch.no_grad():
    model.eval()
    out = model(data.x, data.adj)
    pred = out.argmax(dim=1)
@@ -94,29 +91,6 @@ Finally, we need to implement a :obj:`train()` function and a :obj:`test()` func
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        correct = float(pred[mask].eq(data.y[mask]).sum().item())
        accs.append(correct / int(mask.sum()))
    return accs
metric = "Acc"
best_val_acc = best_test_acc = 0
times = []
for epoch in range(1, args.epochs + 1):
    start = time.time()
    train_loss = train()
    train_acc, val_acc, test_acc = test()
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_test_acc = test_acc
    times.append(time.time() - start)
    print(
        f"Epoch: [{epoch}/{args.epochs}] "
        f"Train Loss: {train_loss:.4f} Train {metric}: {train_acc:.4f} "
        f"Val {metric}: {val_acc:.4f}, Test {metric}: {test_acc:.4f} "
    )
print(f"Mean time per epoch: {torch.tensor(times).mean():.4f}s")
print(f"Total time: {sum(times):.4f}s")
print(f"Best test acc: {best_test_acc:.4f}")
print(f"Accuracy: {acc:.4f}")
>>> 0.8150
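The per-split accuracy in the `test()` function above, `pred[mask].eq(y[mask]).sum() / mask.sum()`, counts correct predictions only at positions where the boolean mask is `True`. A plain-Python sketch of the same computation, assuming lists stand in for the `pred`, `data.y`, and mask tensors (the helper name `masked_accuracy` and the toy data are hypothetical):

```python
def masked_accuracy(pred, y, mask):
    """Fraction of masked positions where pred matches y.

    Plain-list stand-in for pred[mask].eq(y[mask]).sum() / mask.sum().
    """
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        raise ValueError("mask selects no nodes")
    correct = sum(1 for i in selected if pred[i] == y[i])
    return correct / len(selected)


# Hypothetical toy data: 6 nodes with argmax predictions and true labels.
pred = [0, 1, 2, 1, 0, 2]
y = [0, 1, 1, 1, 2, 2]
train_mask = [True, True, True, False, False, False]
val_mask = [False, False, False, True, False, False]
test_mask = [False, False, False, False, True, True]

# One accuracy per split, in the same train/val/test order as the tutorial.
accs = [masked_accuracy(pred, y, m) for m in (train_mask, val_mask, test_mask)]
print(accs)
```

Each mask selects a disjoint set of nodes, so the three accuracies are computed from the same forward pass without leaking labels between splits.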
69 changes: 19 additions & 50 deletions docs/source/tutorial/rtls.rst
@@ -1,9 +1,9 @@
Design of RTLs
==============

What is a RTL?
What is RTL?
----------------
In machine learning, **Relational Table Learnings (RTLs)** typically refers to the learning of relational table data, which consists of multiple interconnected tables with significant heterogeneity. In an RTL, the input comprises multiple table signals that are interrelated. A typical RTL architecture consists of one or more Transforms followed by multiple Convolution layers, as detailed in *Understanding Transform* and *Understanding Convolution*.
In machine learning, **Relational Table Learnings (RTLs)** typically refers to the learning of relational table data, which consists of multiple interconnected tables with significant heterogeneity. In an RTL, the input comprises multiple table signals that are interrelated. A typical RTL architecture consists of one or more Transforms followed by multiple Convolution layers, as detailed in **Understanding Transforms** and **Understanding Convolutions**.


Construct a BRIDGE
@@ -41,7 +41,7 @@ For convenience, we will construct a basic homogeneous graph here, even though m

.. code-block:: python
from utils import build_homo_graph, reorder_ids
from examples.bridge.utils import build_homo_graph, reorder_ids
# Original movie id in datasets is unordered, so we reorder them.
ordered_rating = reorder_ids(
@@ -106,62 +106,31 @@ After initializing the data, we instantiate the model. Since the task of the TML
    table_encoder=t_encoder,
    graph_encoder=g_encoder,
).to(device)
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=args.lr,
    weight_decay=args.wd,
)
optimizer = torch.optim.Adam(model.parameters())
Finally, we need to implement a :obj:`train()` function and a :obj:`test()` function, the latter of which does not require gradient tracking. The model can then be trained on the training and validation sets, and the classification results can be obtained from the test set.
Finally, we jointly train the model and evaluate the results on the test set.

.. code-block:: python
def train() -> float:
    model.train()
    optimizer.zero_grad()
    logits = model(
        table=user_table,
        non_table=movie_embeddings,
        adj=adj,
    )
    loss = F.cross_entropy(logits[train_mask].squeeze(), y[train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()
for epoch in range(50):
    optimizer.zero_grad()
    logits = model(
        table=user_table,
        non_table=movie_embeddings,
        adj=adj,
    )
    loss = F.cross_entropy(logits[train_mask].squeeze(), y[train_mask])
    loss.backward()
    optimizer.step()
@torch.no_grad()
def test():
with torch.no_grad():
    model.eval()
    logits = model(
        table=user_table,
        non_table=movie_embeddings,
        adj=adj,
    )
    preds = logits.argmax(dim=1)
    accs = []
    for mask in [train_mask, val_mask, test_mask]:
        correct = float(preds[mask].eq(y[mask]).sum().item())
        accs.append(correct / int(mask.sum()))
    return accs
start_time = time.time()
best_val_acc = best_test_acc = 0
for epoch in range(1, args.epochs + 1):
    train_loss = train()
    train_acc, val_acc, test_acc = test()
    print(
        f"Epoch: [{epoch}/{args.epochs}]"
        f"Loss: {train_loss:.4f} train_acc: {train_acc:.4f} "
        f"val_acc: {val_acc:.4f} test_acc: {test_acc:.4f} "
    )
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_test_acc = test_acc
print(f"Total Time: {time.time() - start_time:.4f}s")
print(
    "BRIDGE result: "
    f"Best Val acc: {best_val_acc:.4f}, "
    f"Best Test acc: {best_test_acc:.4f}"
)
acc = (preds[test_mask] == y[test_mask]).sum(dim=0) / test_mask.sum()
print(f'Accuracy: {acc:.4f}')
>>> 0.3860
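The previous BRIDGE loop reports the test accuracy from the epoch with the highest validation accuracy, not the highest test accuracy seen. That selection rule can be sketched in isolation, assuming the per-epoch `(val_acc, test_acc)` pairs are already collected (`select_best` and the numbers are hypothetical, not results from this commit):

```python
def select_best(history):
    """Return (best_val, test_at_best_val) from per-epoch (val, test) pairs.

    Mirrors the loop's `if val_acc > best_val_acc:` rule: the test accuracy
    is only recorded at epochs where validation accuracy strictly improves.
    """
    best_val = best_test = 0.0
    for val_acc, test_acc in history:
        if val_acc > best_val:
            best_val, best_test = val_acc, test_acc
    return best_val, best_test


# Hypothetical per-epoch (val_acc, test_acc) values, not real results.
history = [(0.30, 0.28), (0.36, 0.35), (0.36, 0.40), (0.34, 0.41)]
best_val, best_test = select_best(history)
print(f"Best Val acc: {best_val:.4f}, Best Test acc: {best_test:.4f}")
```

Because the comparison is strict, a later epoch that only ties the best validation accuracy does not replace the recorded test accuracy, even if its test accuracy is higher.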
101 changes: 27 additions & 74 deletions docs/source/tutorial/tnns.rst
@@ -1,8 +1,8 @@
Design of TNNs
===============
What is a TNN?
What is TNN?
----------------
In machine learning, **Table/Tabular Neural Networks (TNNs)** are recently emerging neural networks specifically designed to process tabular data. In a TNN, the input is structured tabular data, usually organized in rows and columns. A typical TNN architecture consists of an initial Transform followed by multiple Convolution layers, as detailed in *Understanding Transform* and *Understanding Convolution*.
In machine learning, **Table/Tabular Neural Networks (TNNs)** are recently emerging neural networks specifically designed to process tabular data. In a TNN, the input is structured tabular data, usually organized in rows and columns. A typical TNN architecture consists of an initial Transform followed by multiple Convolution layers, as detailed in *Understanding Transforms* and *Understanding Convolutions*.


Construct a TabTransformer
@@ -11,34 +11,28 @@ In this tutorial, we will learn the basic workflow of using `[TabTransformer] <h

First, we use the :obj:`Titanic` dataset as an example, which can be loaded using the built-in dataloaders. Also, we instantiate a :obj:`TabTransformerTransform`, corresponding to the :obj:`TabTransformer` method. After applying the transformation and shuffling the data, we proceed to split the dataset into training, testing, and validation sets, following standard practices in deep learning.
.. code-block:: python
import argparse
import os.path as osp
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from rllm.types import ColType
from rllm.datasets import Titanic
from rllm.transforms.table_transforms import TabTransformerTransform
# Set random seed and device
torch.manual_seed(args.seed)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load dataset
path = osp.join(osp.dirname(osp.realpath(__file__)), "..", "data")
data = Titanic(cached_dir=path)[0]
path = osp.join(osp.dirname(osp.realpath(__file__)), "data")
data = Titanic(cached_dir=path, forced_reload=True)[0]
# Transform data
transform = TabTransformerTransform(out_dim=args.emb_dim)
emb_dim = 32
transform = TabTransformerTransform(out_dim=emb_dim)
data = transform(data).to(device)
data.shuffle()
# Split dataset, here the ratio of train-val-test is 80%-10%-10%
train_loader, val_loader, test_loader = data.get_dataloader(
    train_split=0.8, val_split=0.1, test_split=0.1, batch_size=args.batch_size
    train_split=0.8, val_split=0.1, test_split=0.1, batch_size=128
)
Next, we construct a simple :obj:`TabTransformer` model using the :obj:`TabTransformerConv` layer. Note that the first layer needs to pass in the metadata for initialization of the pre-encoder.
@@ -83,77 +77,36 @@ Next, we construct a simple :obj:`TabTransformer` model using the :obj:`TabTrans
# Set up model and optimizer
model = TabTransformer(
    hidden_dim=args.emb_dim,
    hidden_dim=emb_dim,
    out_dim=data.num_classes,
    num_layers=args.num_layers,
    num_heads=args.num_heads,
    num_layers=2,
    num_heads=8,
    metadata=data.metadata,
).to(device)
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=args.lr,
    weight_decay=args.wd,
)
optimizer = torch.optim.Adam(model.parameters(),)
Finally, we need to implement a :obj:`train()` function and a :obj:`test()` function, the latter of which does not require gradient tracking. The model can then be trained on the training and validation sets, and the classification results can be obtained from the test set.
Finally, we train our model and get the classification results on the test set.

.. code-block:: python
import time
def train(epoch: int) -> float:
    model.train()
    loss_accum = total_count = 0.0
    for batch in tqdm(train_loader, desc=f"Epoch: {epoch}"):
for epoch in range(50):
    for batch in train_loader:
        x, y = batch
        pred = model.forward(x)
        loss = F.cross_entropy(pred, y.long())
        pred = model(x)
        loss = F.cross_entropy(pred, y)
        optimizer.zero_grad()
        loss.backward()
        loss_accum += float(loss) * y.size(0)
        total_count += y.size(0)
        optimizer.step()
    return loss_accum / total_count
@torch.no_grad()
def test(loader: DataLoader) -> float:
with torch.no_grad():
    model.eval()
    correct = total = 0
    for batch in loader:
        feat_dict, y = batch
        pred = model.forward(feat_dict)
        _, predicted = torch.max(pred, 1)
        total += y.size(0)
        correct += (predicted == y).sum().item()
    accuracy = correct / total
    return accuracy
metric = "Acc"
best_val_metric = best_test_metric = 0
times = []
for epoch in range(1, args.epochs + 1):
    start = time.time()
    train_loss = train(epoch)
    train_metric = test(train_loader)
    val_metric = test(val_loader)
    test_metric = test(test_loader)
    if val_metric > best_val_metric:
        best_val_metric = val_metric
        best_test_metric = test_metric
    times.append(time.time() - start)
    print(
        f"Train Loss: {train_loss:.4f}, Train {metric}: {train_metric:.4f}, "
        f"Val {metric}: {val_metric:.4f}, Test {metric}: {test_metric:.4f}"
    )
print(f"Mean time per epoch: {torch.tensor(times).mean():.4f}s")
print(f"Total time: {sum(times):.4f}s")
print(
    f"Best Val {metric}: {best_val_metric:.4f}, "
    f"Best Test {metric}: {best_test_metric:.4f}"
)
correct = 0
for tf in test_loader:
x, y = batch
pred = model(x)
pred_class = pred.argmax(dim=-1)
correct += (y == pred_class).sum()
acc = int(correct) / len(test_dataset)
print(f'Accuracy: {acc:.4f}')
>>> 0.8082
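In the new test-set loop above, the loop variable is `tf` while the body unpacks `batch`, and the result is divided by `len(test_dataset)`, a name the tutorial snippet never defines. A batch-wise accuracy that avoids both issues can be sketched as follows, assuming each batch has already been reduced to a pair of predicted classes and labels (plain lists stand in for tensors; `loader_accuracy` and the data are hypothetical):

```python
def loader_accuracy(batches):
    """Accuracy over an iterable of (predicted_classes, labels) batches.

    Each batch's own predictions are compared with that batch's labels, and
    the total is divided by the number of samples seen, so no separate
    dataset length is needed.
    """
    correct = total = 0
    for pred_class, y in batches:
        correct += sum(int(p == t) for p, t in zip(pred_class, y))
        total += len(y)
    return correct / total


# Hypothetical batches of argmax predictions and labels (last batch is smaller).
test_batches = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),
    ([0, 0, 1, 0], [0, 1, 1, 0]),
    ([1, 1], [1, 1]),
]
acc = loader_accuracy(test_batches)
print(f"Accuracy: {acc:.4f}")
```

Counting labels as they are consumed keeps the metric correct when the final batch is smaller than the configured batch size.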
3 changes: 3 additions & 0 deletions examples/bridge/bridge_tml1m.py
@@ -14,6 +14,7 @@

sys.path.append("./")
sys.path.append("../")
sys.path.append("../../")
from rllm.datasets import TML1MDataset
from rllm.transforms.graph_transforms import GCNTransform
from rllm.transforms.table_transforms import TabTransformerTransform
@@ -128,6 +129,8 @@ def test():
for mask in [train_mask, val_mask, test_mask]:
correct = float(preds[mask].eq(y[mask]).sum().item())
accs.append(correct / int(mask.sum()))
print(mask.sum())
exit(0)
return accs


2 changes: 1 addition & 1 deletion rllm/data/__init__.py
@@ -1,4 +1,4 @@
from .dataset import Dataset # noqa
from ..datasets.dataset import Dataset # noqa
from .graph_data import BaseGraph, GraphData, HeteroGraphData # noqa
from .table_data import BaseTable, TableData, TableDataset # noqa
from .storage import BaseStorage, NodeStorage, EdgeStorage, recursive_apply # noqa
2 changes: 1 addition & 1 deletion rllm/datasets/adult.py
@@ -6,7 +6,7 @@

from rllm.types import ColType
from rllm.data.table_data import TableData
from rllm.data.dataset import Dataset
from rllm.datasets.dataset import Dataset
from rllm.utils.download import download_url


2 changes: 1 addition & 1 deletion rllm/datasets/bank_marketing.py
@@ -6,7 +6,7 @@

from rllm.types import ColType
from rllm.data.table_data import TableData
from rllm.data.dataset import Dataset
from rllm.datasets.dataset import Dataset
from rllm.utils.download import download_url


2 changes: 1 addition & 1 deletion rllm/datasets/churn_modelling.py
@@ -6,7 +6,7 @@

from rllm.types import ColType
from rllm.data.table_data import TableData
from rllm.data.dataset import Dataset
from rllm.datasets.dataset import Dataset
from rllm.utils.download import download_url


File renamed without changes.
2 changes: 1 addition & 1 deletion rllm/datasets/dblp.py
@@ -7,7 +7,7 @@
import scipy.sparse as sp
import torch

from rllm.data.dataset import Dataset
from rllm.datasets.dataset import Dataset
from rllm.data.graph_data import HeteroGraphData
from rllm.utils.graph_utils import sparse_mx_to_torch_sparse_tensor
from rllm.utils.download import download_url
2 changes: 1 addition & 1 deletion rllm/datasets/imdb.py
@@ -9,7 +9,7 @@

# import sys
# sys.path.append('../')
from rllm.data.dataset import Dataset
from rllm.datasets.dataset import Dataset
from rllm.data.graph_data import HeteroGraphData
from rllm.utils.sparse import sparse_mx_to_torch_sparse_tensor
from rllm.utils.extract import extract_zip
2 changes: 1 addition & 1 deletion rllm/datasets/planetoid.py
@@ -13,7 +13,7 @@

# import sys
# sys.path.append('../')
from rllm.data.dataset import Dataset
from rllm.datasets.dataset import Dataset
from rllm.data.graph_data import GraphData
from rllm.utils.sparse import sparse_mx_to_torch_sparse_tensor
from rllm.datasets.utils import index2mask
2 changes: 1 addition & 1 deletion rllm/datasets/sjtutables/tacm12k.py
@@ -8,7 +8,7 @@

from rllm.types import ColType
from rllm.data.table_data import TableData
from rllm.data.dataset import Dataset
from rllm.datasets.dataset import Dataset
from rllm.utils.download import download_url
from rllm.utils.extract import extract_zip
