- Downloading the data files (train.csv, eval.csv)
If you repeat the "Load CSV with tf.data" exercise, the download is needed only the first time; on subsequent runs this step can be skipped.
Download:
>>> TRAIN_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/train.csv"
>>> TEST_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/eval.csv"
>>> tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
Downloading data from https://storage.googleapis.com/tf-datasets/titanic/train.csv
32768/30874 [===============================] - 0s 1us/step
>>> tf.keras.utils.get_file("eval.csv", TEST_DATA_URL)
Downloading data from https://storage.googleapis.com/tf-datasets/titanic/eval.csv
16384/13049 [=====================================] - 0s 1us/step
>>>
At this point, no matter what the current directory is, a .keras directory is created under ~ with the following contents:
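The skip-on-subsequent-runs behaviour mentioned above can be sketched with a plain existence check (the helper below is illustrative, not part of the tutorial; tf.keras.utils.get_file itself performs a similar cache check and skips the download when the file already exists):

```python
import os

def fetch_once(path, download):
    """Call download() only when no cached copy exists at path.

    download is expected to create the file at path."""
    if os.path.exists(path):
        return "cached"
    download()
    return "downloaded"
```

The first call with a given path downloads; later calls find the cached file and return immediately.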
- Loading train.csv, eval.csv
Set the paths to the files train.csv and eval.csv:
>>> train_file_path = "[absolute path]/.keras/datasets/train.csv"
>>> test_file_path = "[absolute path]/.keras/datasets/eval.csv"
"~/.keras/datasets/….csv" does not work (see note).
Configure how the files are read:
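The reason "~" fails is that TensorFlow passes the path to the filesystem as-is, without shell-style tilde expansion (hence the NotFoundError shown in the note). One way to obtain the required absolute path without hard-coding it is to expand "~" in Python first; a minimal sketch:

```python
import os

# Expand "~" to the user's home directory before handing the
# path to make_csv_dataset, which does not do tilde expansion.
train_file_path = os.path.expanduser("~/.keras/datasets/train.csv")
test_file_path = os.path.expanduser("~/.keras/datasets/eval.csv")
```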
>>> LABELS = [0, 1]
>>> LABEL_COLUMN = 'survived'
>>> def get_dataset(file_path, **kwargs):
...     dataset = tf.data.experimental.make_csv_dataset(
...         file_path,
...         batch_size=5,
...         label_name=LABEL_COLUMN,
...         na_value="?",
...         num_epochs=1,
...         ignore_errors=True,
...         **kwargs)
...     return dataset
...
>>>
Here "batch_size=5" means: the data is returned in batches of 5 records at a time.
(Artificially small to make examples easier to show.)
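The effect of batch_size can be illustrated without TensorFlow: consecutive rows are grouped five at a time, with the final group possibly shorter. A plain-Python sketch of that grouping:

```python
def batch(records, batch_size):
    # Group records into consecutive batches of batch_size;
    # the final batch may be shorter than batch_size.
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]

rows = list(range(12))                   # stand-in for 12 CSV rows
print([len(b) for b in batch(rows, 5)])  # → [5, 5, 2]
```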
Load:
>>> raw_train_data = get_dataset(train_file_path)
WARNING:tensorflow:From /home/pi/venv/lib/python3.7/site-packages/
tensorflow_core/python/data/experimental/ops/readers.py:540:
parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops)
is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length,
num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead.
If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
>>> raw_test_data = get_dataset(test_file_path)
Note: |
If you use "~/.keras/datasets/….csv" when setting the paths to train.csv and eval.csv,
then get_dataset, which loads train.csv and eval.csv, returns the following error:
|
>>> raw_train_data = get_dataset(train_file_path)
Traceback (most recent call last):
File "", line 1, in
File "", line 9, in get_dataset
File "/home/pi/venv/lib/python3.7/site-packages/
tensorflow_core/python/data/experimental/ops/readers.py",
line 588, in make_csv_dataset_v1
compression_type, ignore_errors))
File "/home/pi/venv/lib/python3.7/site-packages/
tensorflow_core/python/data/experimental/ops/readers.py",
line 437, in make_csv_dataset_v2
filenames = _get_file_names(file_pattern, False)
File "/home/pi/venv/lib/python3.7/site-packages/
tensorflow_core/python/data/experimental/ops/readers.py",
line 970, in _get_file_names
file_names = list(gfile.Glob(file_pattern))
File "/home/pi/venv/lib/python3.7/site-packages/
tensorflow_core/python/lib/io/file_io.py",
line 363, in get_matching_files
return get_matching_files_v2(filename)
File "/home/pi/venv/lib/python3.7/site-packages/
tensorflow_core/python/lib/io/file_io.py",
line 384, in get_matching_files_v2
compat.as_bytes(pattern))
tensorflow.python.framework.errors_impl.
NotFoundError: ~/.keras/datasets; No such file or directory
|