Progress so far:
$ source [path to venv]/venv/bin/activate
(venv) $ python
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> train_file_path = "[absolute path]/.keras/datasets/train.csv"
>>> test_file_path = "[absolute path]/.keras/datasets/eval.csv"
>>> LABEL_COLUMN = 'survived'
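>>> def get_dataset(file_path, **kwargs):
...     dataset = tf.data.experimental.make_csv_dataset(
...         file_path,
...         batch_size=5,             # five records per batch
...         label_name=LABEL_COLUMN,  # column returned separately as the label
...         na_value="?",             # string treated as a missing value
...         num_epochs=1,             # read the file once
...         ignore_errors=True,       # skip records that fail to parse
...         **kwargs)
...     return dataset
...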
>>>
>>> raw_train_data = get_dataset(train_file_path)
>>> raw_test_data = get_dataset(test_file_path)
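To make concrete what one element of such a dataset looks like, here is a minimal plain-Python sketch that groups CSV rows into batches of (features, labels), with one list per column. It is not how TensorFlow implements this; the helper name iter_csv_batches and the inline sample data are assumptions for illustration, and values stay as strings here, whereas make_csv_dataset also infers column types.

```python
import csv
import io

def iter_csv_batches(csv_file, label_name, batch_size=5):
    """Yield (features, labels); features maps column name -> list of values."""
    reader = csv.DictReader(csv_file)
    feature_names = [name for name in reader.fieldnames if name != label_name]
    features = {name: [] for name in feature_names}
    labels = []
    for row in reader:
        labels.append(row[label_name])
        for name in feature_names:
            features[name].append(row[name])
        if len(labels) == batch_size:
            yield features, labels
            features = {name: [] for name in feature_names}
            labels = []
    if labels:  # final, possibly shorter batch
        yield features, labels

# Hypothetical sample: a few rows and columns in the style of train.csv.
SAMPLE_CSV = """survived,sex,age
0,male,22.0
1,female,38.0
1,female,26.0
"""

batches = list(iter_csv_batches(io.StringIO(SAMPLE_CSV), "survived", batch_size=2))
first_features, first_labels = batches[0]
print(first_features)
print(first_labels)
```

Each batch is column-oriented: one entry per feature column, holding that column's values for all rows in the batch, with the label column split off.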
Settings that make the numbers easier to read when the data is displayed:
>>> import numpy as np
>>> np.set_printoptions(precision=3, suppress=True)
Here,
precision=3 : display up to three decimal places
suppress=True : do not use exponential notation ("... e- ...")
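The effect of these two options can be checked on a small array. The sketch below uses the np.printoptions context manager (available in NumPy 1.15 and later) instead of np.set_printoptions, so that the setting is only temporary; the sample values are made up for illustration.

```python
import numpy as np

# Made-up values: one very small number and one ordinary number.
values = np.array([0.00001234, 1.5])

# Text produced with the print options currently in effect.
default_text = np.array2string(values)

# Text produced with the options used in this section, applied temporarily.
with np.printoptions(precision=3, suppress=True):
    compact_text = np.array2string(values)
    third_text = np.array2string(np.array([1 / 3]))

print(default_text)
print(compact_text)
print(third_text)
```

With NumPy's default settings the first line should use exponential notation, while the second should not.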
Display the first element of raw_train_data, that is, one batch (a group of five records).
A function to display it:
>>> def show_batch(dataset):
...     for batch, label in dataset.take(1):
...         for key, value in batch.items():
...             print("{:20s}: {}".format(key, value.numpy()))
...
>>>
(Removing ".take(1)", which means "just one batch", makes the function display all of the data.)
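The two ingredients of show_batch, the "{:20s}" format field and the limit to one batch, can be tried without TensorFlow. In this sketch a plain list of (features, label) pairs stands in for the dataset and itertools.islice plays the role of .take(1); the values and the helper name format_batch are assumptions for illustration.

```python
from itertools import islice

# Stand-in for a dataset: a list of (features, label) batches with made-up values.
batches = [
    ({"sex": ["male", "female"], "age": [28.0, 38.0]}, [0, 1]),
    ({"sex": ["female", "male"], "age": [26.0, 25.0]}, [1, 0]),
]

def format_batch(features):
    """Return one line per column, with the key left-aligned in a 20-character field."""
    return ["{:20s}: {}".format(key, value) for key, value in features.items()]

# islice(batches, 1) plays the role of dataset.take(1): only the first batch is visited.
first_only = [format_batch(features) for features, _ in islice(batches, 1)]

# Without the limit, every batch is visited.
everything = [format_batch(features) for features, _ in batches]

for line in first_only[0]:
    print(line)
```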
Running show_batch (see note):
>>> show_batch(raw_train_data)
sex                 : [b'male' b'female' b'female' b'male' b'female']
age                 : [28. 38. 28. 25. 26.]
n_siblings_spouses  : [0 1 8 1 1]
parch               : [0 5 2 0 1]
fare                : [ 7.229 31.388 69.55 17.8 26. ]
class               : [b'Third' b'Third' b'Third' b'Third' b'Second']
deck                : [b'unknown' b'unknown' b'unknown' b'unknown' b'unknown']
embark_town         : [b'Cherbourg' b'Southampton' b'Southampton' b'Southampton' b'Southampton']
alone               : [b'y' b'n' b'n' b'n' b'n']
Compare with the unformatted output:
>>> for batch in raw_train_data.take(1):
...     print(batch)
...
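(OrderedDict([
('sex', <tf.Tensor: ...>),
('age', <tf.Tensor: ...>),
('n_siblings_spouses', <tf.Tensor: ...>),
('parch', <tf.Tensor: ...>),
('fare', <tf.Tensor: ...>),
('class', <tf.Tensor: ...>),
('deck', <tf.Tensor: ...>),
('embark_town', <tf.Tensor: ...>),
('alone', <tf.Tensor: ...>)
]), <tf.Tensor: ...>)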
Compare with the contents of train.csv:
Open another shell in the terminal and use the head command to look at the beginning of train.csv:
$ head ~/.keras/datasets/train.csv
survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,male,22.0,1,0,7.25,Third,unknown,Southampton,n
1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
1,female,26.0,0,0,7.925,Third,unknown,Southampton,y
1,female,35.0,1,0,53.1,First,C,Southampton,n
0,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y
0,male,2.0,3,1,21.075,Third,unknown,Southampton,n
1,female,27.0,0,2,11.1333,Third,unknown,Southampton,n
1,female,14.0,1,0,30.0708,Second,unknown,Cherbourg,n
1,female,4.0,1,1,16.7,Third,G,Southampton,n
The order of the records differs from that of raw_train_data.
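The difference in order is expected: make_csv_dataset shuffles records by default (its shuffle argument defaults to True), so passing shuffle=False through the **kwargs of get_dataset should keep the file order. The sketch below imitates buffer-based shuffling in plain Python to illustrate the idea; it is not TensorFlow's code, and buffered_shuffle and its parameters are hypothetical names.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in randomized order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Once the buffer is full, emit a randomly chosen element.
            yield buffer.pop(rng.randrange(len(buffer)))
    # Drain whatever is left, still in random order.
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))

rows = list(range(10))  # stand-in for the record numbers of a CSV file
shuffled = list(buffered_shuffle(rows, buffer_size=4, seed=0))
print(shuffled)
```

Every record still appears exactly once; only the order changes. With a buffer of size 1 the original order is preserved.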
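Note: if eager execution has not been enabled (tf.enable_eager_execution()), show_batch fails with the following error:
Traceback (most recent call last):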
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in show_batch
  File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2115, in __iter__
    return iter(self._dataset)
  File "/home/pi/venv/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 347, in __iter__
    raise RuntimeError("__iter__() is only supported inside of tf.function "
RuntimeError: __iter__() is only supported inside of tf.function or when eager execution is enabled.
>>>
Remarks
To use only specific columns of the dataset (for example 'age', 'n_siblings_spouses', 'class', 'deck', 'alone'):
>>> SELECT_COLUMNS = ['survived', 'age', 'n_siblings_spouses', 'class', 'deck', 'alone']
>>>
>>> temp_dataset = get_dataset(train_file_path, select_columns=SELECT_COLUMNS)
>>>
>>> show_batch(temp_dataset)
age                 : [40. 28. 28. 53. 20.]
n_siblings_spouses  : [1 0 0 2 0]
class               : [b'Third' b'Third' b'Third' b'First' b'Third']
deck                : [b'unknown' b'unknown' b'unknown' b'C' b'unknown']
alone               : [b'n' b'y' b'y' b'n' b'y']
>>>
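Note that SELECT_COLUMNS includes 'survived' even though it does not appear in the output: it is the label column, so show_batch, which prints only the features, does not show it. The idea of column selection can be sketched in plain Python as follows; the helper name select_columns and the sample data are assumptions for illustration, not part of TensorFlow.

```python
import csv
import io

def select_columns(csv_file, columns):
    """Yield each row as a dict restricted to the requested columns."""
    for row in csv.DictReader(csv_file):
        yield {name: row[name] for name in columns}

# Hypothetical sample: a few rows and columns in the style of train.csv.
SAMPLE_CSV = """survived,sex,age,class
0,male,22.0,Third
1,female,38.0,First
"""

selected = list(select_columns(io.StringIO(SAMPLE_CSV), ["survived", "age", "class"]))
print(selected)
```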