Date
Date columns are used to represent dates and times. They are stored
as a long integer holding the number of milliseconds elapsed since
January 1, 1970, the same origin that Java uses. A string
representation of the time is used for display. Options under
Tools:Options specify the default date formats for reading and
displaying date values.
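
As an illustration of this storage scheme, the following Python sketch converts between a date and a millisecond count and formats the count for display. The function names and the format string are hypothetical, and the sketch assumes the origin is midnight UTC, as it is in Java; the product's own handling of time zones is not described here.

```python
from datetime import datetime, timedelta, timezone

# Assumed origin: midnight UTC on January 1, 1970 (the Java epoch).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_millis(dt):
    """Return whole milliseconds between the origin and dt."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

def from_millis(ms):
    """Return the UTC datetime that lies ms milliseconds after the origin."""
    return EPOCH + timedelta(milliseconds=ms)

def display(ms, fmt="%Y-%m-%d %H:%M:%S"):
    """Format a stored value as a string; fmt stands in for a display option."""
    return from_millis(ms).strftime(fmt)

stored = to_millis(datetime(1970, 1, 2, tzinfo=timezone.utc))  # one day later
text = display(stored)
```

Only the integer is stored; the string is produced on demand, so changing the display format does not change the underlying value.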
Output Caches
The pipeline is designed to pass blocks of data between components,
rather than passing all of the data at once. It is this capability that
allows the product to scale to handle a very large number of rows of
data.
Global and component level settings are available to determine
whether a copy of the data is stored for each node output. By default,
each computed output has a corresponding copy of the data in an
output cache file.
If all of the computations can be performed in a blockwise fashion
with a single pass through the data, all of the data can be passed from
node to node without storing the values. This would be the case in a
network containing a series of Read Text File, Create Columns,
and Write Text File nodes. The advantage of not caching output
values is reduced disk space usage.
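
A single-pass, blockwise network of this kind can be pictured with Python generators. This is only a sketch of the idea, not the product's implementation: the three functions are hypothetical stand-ins for the Read Text File, Create Columns, and Write Text File nodes, and a list plays the role of the output file.

```python
def read_text_blocks(lines, block_size):
    """Parse comma-separated lines and yield them in blocks of block_size rows."""
    block = []
    for line in lines:
        block.append([float(v) for v in line.split(",")])
        if len(block) == block_size:
            yield block
            block = []
    if block:  # final, possibly shorter, block
        yield block

def create_columns(blocks):
    """Append a new column (the row sum) to every row, one block at a time."""
    for block in blocks:
        yield [row + [sum(row)] for row in block]

def write_text(blocks, out):
    """Write each block as it arrives and return the number of rows written."""
    count = 0
    for block in blocks:
        for row in block:
            out.append(",".join(str(v) for v in row))
            count += 1
    return count

source = ["1,2", "3,4", "5,6", "7,8", "9,10"]
written = []
row_count = write_text(create_columns(read_text_blocks(source, block_size=2)), written)
```

Each stage holds at most one block at a time, so no intermediate copy of the full data set is needed between the nodes.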
The advantage of caching values is that it provides greater
interactivity. Additional components can be hooked to an output and
executed without having to recompute the previous component
outputs. The data at the output can also be viewed in the viewer.
This interactivity is the reason the product caches node outputs by
default.
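
The trade-off can be sketched as follows. The class below is a hypothetical illustration, not the product's API: with caching on, the first run writes each output block to a temporary file, and later requests read that file instead of recomputing; with caching off, every request recomputes.

```python
import json
import os
import tempfile

class CachingNode:
    """Hypothetical node whose output may be stored in a cache file."""

    def __init__(self, compute, cache_output=True):
        self.compute = compute            # callable returning an iterator of blocks
        self.cache_output = cache_output  # stands in for the cache setting
        self.cache_path = None
        self.compute_count = 0

    def output(self):
        """Yield output blocks, from the cache file if one already exists."""
        if self.cache_path is not None:
            with open(self.cache_path) as f:
                for line in f:
                    yield json.loads(line)
            return
        self.compute_count += 1
        if not self.cache_output:
            yield from self.compute()
            return
        fd, path = tempfile.mkstemp(suffix=".cache")
        with os.fdopen(fd, "w") as f:
            for block in self.compute():
                f.write(json.dumps(block) + "\n")
                yield block
        self.cache_path = path  # set only after a complete pass

cached = CachingNode(lambda: iter([[1, 2], [3]]))
first = list(cached.output())   # computes and fills the cache
second = list(cached.output())  # served from the cache file

uncached = CachingNode(lambda: iter([[1, 2], [3]]), cache_output=False)
list(uncached.output())
list(uncached.output())         # recomputed, nothing stored
```

The cached node computes once and uses file space; the uncached node uses no file space but computes on every request. A real implementation would also remove the cache file when the node is invalidated.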
Some computational components cannot operate in a blockwise
fashion with a single pass through the data. For example, logistic
regression needs to make multiple passes through the data as it
performs numerical optimization. For this type of node, the
preceding output caches corresponding to its inputs will be created
regardless of the cache settings.
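
To see why a multi-pass component needs its input stored, consider this minimal sketch of logistic regression fitted by plain gradient descent. It is an assumption-laden illustration, not the algorithm the product uses: the function name, learning rate, and data are invented, and `read_cache` stands in for rereading a cached input from the start.

```python
import math

def fit_logistic(get_blocks, n_passes=50, lr=0.5):
    """Fit y ~ sigmoid(w * x + b); each iteration is one full pass over the data.

    get_blocks must return a fresh iterator over blocks of (x, y) rows
    on every call, which is what a stored copy of the input provides.
    """
    w, b = 0.0, 0.0
    for _ in range(n_passes):
        grad_w = grad_b = 0.0
        n = 0
        for block in get_blocks():
            for x, y in block:
                p = 1.0 / (1.0 + math.exp(-(w * x + b)))
                grad_w += (p - y) * x
                grad_b += p - y
                n += 1
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Stand-in for a cached input: two blocks that can be reread at will.
cached_blocks = [[(-2.0, 0), (-1.0, 0)], [(1.0, 1), (2.0, 1)]]
passes = []

def read_cache():
    passes.append(1)  # record each pass through the data
    return iter(cached_blocks)

w, b = fit_logistic(read_cache, n_passes=50)
```

A stream that can be read only once would be exhausted after the first iteration, which is why the inputs to such a node must be available for rereading.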
Node State
Each node is always in one of three states: created, configured, or
computed.