Add MNIST code and description

chizuchizu · Mar 24, 2021 · b4efe9a
1 parent a5b5a72

Showing 2 changed files with 41 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -28,3 +28,16 @@ Your token file exists.
 # Bad (token file not found)
 Your token file doesn't exist.
 ```
+
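+### About the datasets
+#### MNIST
+
+This project uses the dataset from [Digit Recognizer | Kaggle](https://www.kaggle.com/c/digit-recognizer).  
+It is the MNIST data converted to CSV so that it is easier to work with.
+
+- `train.csv`: the original data, unmodified  
+- `train_400.csv`: the data after resizing each image from 28x28 pixels to 20x20 pixels
+
+`train_400.csv` was generated with `src/resize.py`.
+
+Both files are already committed to the GitHub repository, so they are available as soon as you clone it.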
diff --git a/src/resize.py b/src/resize.py
@@ -0,0 +1,28 @@
+from tqdm import tqdm
+import pandas as pd
+import numpy as np
+import cv2
+
+train = pd.read_csv("data/train.csv")
+img_size = 20
+pixels = img_size ** 2  # number of pixels per resized image
+
+save = np.zeros(
+    (
+        train.shape[0],
+        1 + pixels
+    ),
+    dtype=int
+)
+
+for i in tqdm(range(train.shape[0])):
+    image = train.iloc[i, 1:].values.reshape(28, 28).astype(np.uint8)
+
+    image = cv2.resize(image, dsize=(img_size, img_size)).flatten()
+
+    save[i, 0] = train.iloc[i, 0]
+    save[i, 1:] = image
+
+save = pd.DataFrame(save)
+
+save.to_csv(f"data/train_{pixels}.csv", index=False)
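
For readers who want to consume the file that `src/resize.py` writes, the following is a minimal sketch of its layout, not part of this commit. Because the script saves a `pd.DataFrame` with default column names and `index=False`, the CSV should have a header row `0,1,...,400`, with the label in column 0 and the 400 flattened pixel values in columns 1 to 400. The helper names `parse_row` and `read_resized` are hypothetical, and the row used here is synthetic stand-in data rather than real MNIST pixels.

```python
import csv
import io

IMG_SIZE = 20            # matches img_size in src/resize.py
PIXELS = IMG_SIZE ** 2   # 400 pixel columns per row


def parse_row(row):
    """Split one CSV row into (label, image), where image is a 20x20 nested list."""
    values = [int(v) for v in row]
    if len(values) != 1 + PIXELS:
        raise ValueError(f"expected {1 + PIXELS} columns, got {len(values)}")
    label, flat = values[0], values[1:]
    # The script flattens each image row by row, so slice it back the same way.
    image = [flat[r * IMG_SIZE:(r + 1) * IMG_SIZE] for r in range(IMG_SIZE)]
    return label, image


def read_resized(fileobj):
    """Read every sample from a file object laid out like data/train_400.csv."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row written by DataFrame.to_csv
    return [parse_row(row) for row in reader]


# Synthetic stand-in for the real file: one header line and one fake sample.
header = ",".join(str(c) for c in range(1 + PIXELS))
fake_row = ",".join(["7"] + [str(p % 256) for p in range(PIXELS)])
samples = read_resized(io.StringIO(header + "\n" + fake_row + "\n"))

label, image = samples[0]
print(label, len(image), len(image[0]))
```

To read the real data, pass an open file handle instead of the in-memory string, for example `open("data/train_400.csv", newline="")`, assuming the script is run from the repository root.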