Data defines a set of observations. There are three ways to read data in Edward. They follow the three ways to read data in TensorFlow.
Preloaded data. A constant or variable in the TensorFlow graph holds all the data. This setting is the fastest to work with and is recommended if the data fits in memory.
Represent the data as NumPy arrays or TensorFlow tensors.
import numpy as np
import tensorflow as tf

x_data = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])     # as a NumPy array
x_data = tf.constant([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])  # or as a TensorFlow tensor
During inference, we store the data in TensorFlow variables internally to prevent copying it more than once in memory. As an example, see the getting started notebook.
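For instance, here is a minimal sketch of inference with preloaded data, loosely following the Beta-Bernoulli model from the getting started notebook; the softplus variational parameterization is an assumption made for illustration.

import edward as ed
from edward.models import Bernoulli, Beta

# Model: ten coin flips with unknown bias p.
p = Beta(1.0, 1.0)
x = Bernoulli(probs=p, sample_shape=10)

# Assumed variational family; softplus keeps the parameters positive.
qp = Beta(tf.nn.softplus(tf.Variable(0.5)),
          tf.nn.softplus(tf.Variable(0.5)))

# Bind the preloaded x_data to the observed variable x.
inference = ed.KLqp({p: qp}, data={x: x_data})
inference.run(n_iter=500)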
Feeding. Manual code provides the data when running each step of inference. This setting provides the finest level of control, which is useful for experimentation.
Represent the data as TensorFlow placeholders, which are nodes in the graph that are fed at runtime.
x_data = tf.placeholder(tf.float32, [100, 25]) # placeholder of shape (100, 25)
During inference, the user must manually feed the placeholders. At each step, call inference.update() while passing in a feed_dict dictionary as an argument, which binds placeholders to realized values. If the values do not change over inference updates, one can also bind the placeholder to values within the data argument when first constructing inference. As an example, see the variational auto-encoder script.
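As a sketch of this pattern, assume inference has been constructed with the placeholder x_data above as an observed value, and that next_batch is a hypothetical helper returning a (100, 25) NumPy array:

inference.initialize()
tf.global_variables_initializer().run(session=ed.get_session())

for _ in range(inference.n_iter):
  x_batch = next_batch(size=100)  # hypothetical data-loading helper
  info_dict = inference.update(feed_dict={x_data: x_batch})
  inference.print_progress(info_dict)

inference.finalize()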
Reading from files. An input pipeline reads the data from files at the beginning of a TensorFlow graph. This setting is recommended if the data does not fit in memory.
filename_queue = tf.train.string_input_producer(...)  # queue over the input file names
reader = tf.SomeReader()  # e.g., tf.TFRecordReader() or tf.TextLineReader()
...
Represent the data as TensorFlow tensors, where the tensors are the output of data readers. During inference, each update is automatically evaluated over a new batch tensor produced by the data readers. As an example, see the data unit test.
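For instance, here is a sketch of a complete input pipeline, under the assumption that the data lives in a CSV file with 25 float columns; the file name and column layout are hypothetical:

filename_queue = tf.train.string_input_producer(["data.csv"])  # hypothetical file
reader = tf.TextLineReader()
_, row = reader.read(filename_queue)
columns = tf.decode_csv(row, record_defaults=[[0.0]] * 25)
x_single = tf.stack(columns)  # tensor of shape (25,)
x_batch = tf.train.batch([x_single], batch_size=100)  # tensor of shape (100, 25)

# Pass the batch tensor in the data argument when constructing inference,
# e.g. data={x: x_batch}; each update then consumes a fresh batch. The queue
# runners must be started, e.g. with tf.train.start_queue_runners(), before
# any updates are run.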