KNIME Python Script (Labs) API
This document lists the API of the module knime_io
that functions as the main contact point between KNIME
and Python in the KNIME Python Script (Labs) node.
Please refer to the KNIME Python Integration Guide for more details on how to set up
and use the node.
Inputs and outputs
These properties can be used to retrieve data from or pass data back to KNIME Analytics Platform. The length of the input and output lists depends on the number of input and output ports of the node.
Example:
If you have a Python Script (Labs) node configured with two input tables and one input object, you can access the two tables via knime_io.input_tables[0] and knime_io.input_tables[1], and the input object via knime_io.input_objects[0].
- knime_io.flow_variables: Dict[str, Any] = {}
A dictionary of flow variables provided by the KNIME workflow. New flow variables can be added to the output of the node by adding them to the dictionary. Supported flow variable types are numbers, strings, booleans and lists thereof.
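As a rough illustration of the type rule above, the following sketch checks whether a value qualifies as a flow variable before adding it. It uses a plain dict as a stand-in for knime_io.flow_variables so that it runs outside KNIME. The helpers is_supported_flow_variable and add_flow_variable are hypothetical names, not part of knime_io, and the node's actual validation may differ.

```python
from typing import Any, Dict

# Stand-in for knime_io.flow_variables so the sketch runs outside KNIME.
flow_variables: Dict[str, Any] = {"model_name": "baseline"}

# Numbers, strings and booleans (bool is listed explicitly for readability).
SCALAR_TYPES = (bool, int, float, str)


def is_supported_flow_variable(value: Any) -> bool:
    """Hypothetical check: a supported scalar, or a list of supported scalars."""
    if isinstance(value, SCALAR_TYPES):
        return True
    if isinstance(value, list):
        return all(isinstance(item, SCALAR_TYPES) for item in value)
    return False


def add_flow_variable(name: str, value: Any) -> None:
    """Add a flow variable to the dictionary, rejecting unsupported types."""
    if not is_supported_flow_variable(value):
        raise TypeError(f"Unsupported flow variable type: {type(value).__name__}")
    flow_variables[name] = value


add_flow_variable("row_count", 42)
add_flow_variable("thresholds", [0.1, 0.5, 0.9])
```

Inside the node, the same assignments would target knime_io.flow_variables directly.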
- knime_io.input_objects: List = <knime_table._FixedSizeListView object>
A list of the input objects of this script node, using zero-based indices. Input objects are Python objects that are passed in from another Python script node’s output_object port. This can, for instance, be used to pass trained models between Python nodes.
- knime_io.input_tables: List[knime_table.ReadTable] = <knime_table._FixedSizeListView object>
The input tables of this script node. Tables are available in the same order as the port connectors are displayed alongside the node, using zero-based indexing.
- knime_io.output_images: List = <knime_table._FixedSizeListView object>
The output images of this script node. The value passed to the output port should be an array of bytes encoding an SVG or PNG image.
Example:
    import io
    from matplotlib import pyplot

    data = knime_io.input_tables[0].to_pandas()
    # Render the plot into an in-memory buffer as SVG
    buffer = io.BytesIO()
    pyplot.figure()
    pyplot.plot('x', 'y', data=data)
    pyplot.savefig(buffer, format='svg')
    knime_io.output_images[0] = buffer.getvalue()
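The output port expects raw bytes rather than a figure object. The sketch below builds a minimal SVG document by hand and encodes it to bytes, which is the kind of value that would be assigned to knime_io.output_images[0]. The helper polyline_svg_bytes is a hypothetical name used for illustration only; in practice a plotting library would usually produce the bytes.

```python
from typing import Iterable, Tuple


def polyline_svg_bytes(points: Iterable[Tuple[float, float]],
                       width: int = 200, height: int = 100) -> bytes:
    """Render points as an SVG polyline and return the UTF-8 encoded bytes."""
    coords = " ".join(f"{x},{y}" for x, y in points)
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
        f'<polyline points="{coords}" fill="none" stroke="black"/>'
        f"</svg>"
    )
    return svg.encode("utf-8")


image_bytes = polyline_svg_bytes([(0, 90), (50, 40), (100, 60), (200, 10)])
# Inside the node, this value would be assigned to the output port:
# knime_io.output_images[0] = image_bytes
```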
- knime_io.output_objects: List = <knime_table._FixedSizeListView object>
The output objects of this script node. Each output object can be an arbitrary Python object as long as it can be pickled. Use this to, for example, pass a trained model to another Python script node.
Example:
    model = torchvision.models.resnet18()
    ...
    # train/finetune model
    ...
    knime_io.output_objects[0] = model
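Because output objects must be picklable, it can help to verify this before assigning them. The following sketch round-trips an object through pickle. The dict used as a model and the helper is_picklable are hypothetical stand-ins for illustration, and the node's internal serialization may differ in detail.

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if obj can be serialized with pickle.dumps."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Hypothetical stand-in for a trained model.
model = {"weights": [0.2, 0.8], "bias": 0.1}

# Round trip: what one node pickles, the next node can restore.
restored = pickle.loads(pickle.dumps(model))
```

Objects that hold live resources, such as generators or locks, typically fail this check and cannot be passed through an output object port.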
- knime_io.output_tables: List[knime_table.WriteTable] = <knime_table._FixedSizeListView object>
The output tables of this script node. You should assign a WriteTable or BatchWriteTable to each output port of this node. See the factory methods knime_io.write_table() and knime_io.batch_write_table() below.
Example:
knime_io.output_tables[0] = knime_io.write_table(my_pandas_df)
Factory methods
Use these methods to fill knime_io.output_tables.
- knime_io.batch_write_table() -> knime_table.BatchWriteTable
Factory method to create an empty BatchWriteTable that can be filled batch by batch.
Example:
    table = knime_io.batch_write_table()
    table.append(df_1)
    table.append(df_2)
    knime_io.output_tables[0] = table
- knime_io.write_table(data: Union[knime_table.ReadTable, pandas.DataFrame, pyarrow.Table], sentinel: Optional[Union[str, int]] = None) -> knime_table.WriteTable
Factory method to create a WriteTable given a ReadTable, a pandas.DataFrame or a pyarrow.Table. If the input is a pyarrow.Table, its first column must contain unique row identifiers of type ‘string’.
Example:
knime_io.output_tables[0] = knime_io.write_table(my_pandas_df, sentinel="min")
- Parameters
data – A ReadTable, pandas.DataFrame or a pyarrow.Table
sentinel – Interpret the following values in integral columns as missing values:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- a special integer value that should be interpreted as a missing value
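To make the sentinel semantics concrete, the sketch below resolves "min" and "max" to the corresponding signed integer limits and marks matching values as missing. It is written in plain Python for illustration; resolve_sentinel and mask_missing are hypothetical helper names, and this is not how knime_io performs the conversion internally.

```python
from typing import List, Optional, Union


def resolve_sentinel(sentinel: Union[str, int], bits: int) -> int:
    """Map "min"/"max" to the limits of a signed integer of the given width."""
    if sentinel == "min":
        return -(2 ** (bits - 1))
    if sentinel == "max":
        return 2 ** (bits - 1) - 1
    if isinstance(sentinel, int):
        return sentinel
    raise ValueError(f"Unsupported sentinel: {sentinel!r}")


def mask_missing(values: List[int], sentinel: Union[str, int],
                 bits: int = 64) -> List[Optional[int]]:
    """Replace every occurrence of the sentinel value with None (missing)."""
    marker = resolve_sentinel(sentinel, bits)
    return [None if value == marker else value for value in values]


int32_min = resolve_sentinel("min", 32)
column = [7, int32_min, 12]
masked = mask_missing(column, "min", bits=32)
```

The to_pandas() and to_pyarrow() methods described below use the same sentinel options in the opposite direction, inserting the chosen value wherever a value is missing.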
Classes
- class knime_table.Batch
A batch is a part of a table containing data. A batch should always fit into system memory, thus all methods accessing the data will be processed immediately and synchronously.
It can be sliced before the data is accessed as pandas.DataFrame or pyarrow.RecordBatch.
- __getitem__(slicing: Union[slice, Tuple[slice, Union[slice, List[int], List[str]]]]) -> knime_table.SlicedDataView
Creates a view of this batch by slicing specific rows and columns. The slicing syntax is similar to that of numpy arrays, but columns can also be addressed as index lists or via a list of column names.
- Parameters
row_slice – A slice object describing which rows to use.
column_slice – Optional. A slice object, a list of column indices, or a list of column names.
- Returns
A SlicedDataView that can be converted to pandas or pyarrow.
Examples:
Get the full batch:
full_batch = batch[:]
Get the first 100 rows of columns 1,2,3,4:
sliced_batch = batch[:100, 1:5]
Get all rows of the columns “name” and “age”:
sliced_batch = batch[:, ["name", "age"]]
The returned views cannot be sliced further, but they can be converted to pandas or pyarrow.
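The slicing forms above can be mimicked on a small in-memory table. The class below is a hypothetical stand-in, not knime_table.Batch, and only illustrates how a row slice combined with a column slice, an index list, or a name list selects data.

```python
from typing import List, Sequence, Tuple, Union

ColumnSelector = Union[slice, List[int], List[str]]


class TinyBatch:
    """Hypothetical in-memory table supporting batch-style slicing."""

    def __init__(self, column_names: Sequence[str], rows: Sequence[Sequence]):
        self.column_names = tuple(column_names)
        self.rows = [tuple(row) for row in rows]

    def _column_indices(self, selector: ColumnSelector) -> List[int]:
        all_indices = list(range(len(self.column_names)))
        if isinstance(selector, slice):
            return all_indices[selector]
        # A list may hold column names or column indices.
        return [self.column_names.index(c) if isinstance(c, str) else c
                for c in selector]

    def __getitem__(self, slicing: Union[slice, Tuple[slice, ColumnSelector]]):
        if isinstance(slicing, tuple):
            row_slice, column_selector = slicing
        else:
            # Only a row slice was given, so keep all columns.
            row_slice, column_selector = slicing, slice(None)
        indices = self._column_indices(column_selector)
        names = [self.column_names[i] for i in indices]
        rows = [tuple(row[i] for i in indices) for row in self.rows[row_slice]]
        return TinyBatch(names, rows)


batch = TinyBatch(["id", "name", "age"],
                  [(1, "Ada", 36), (2, "Bob", 41), (3, "Cy", 29)])
first_two = batch[:2, 1:3]
by_name = batch[:, ["name", "age"]]
```

Unlike the SlicedDataView described here, this stand-in returns another sliceable object; that simplification is for brevity only.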
- abstract property column_names: Tuple[str, ...]
Returns the list of column names.
- abstract property num_columns: int
Returns the number of columns in the table.
- abstract property num_rows: int
Returns the number of rows in the table.
If the table is not completely available yet because batches are still appended to it, querying the number of rows blocks until all data is available.
- property shape: Tuple[int, int]
Returns a tuple in the form (numRows, numColumns) representing the shape of this table.
If the table is not completely available yet because batches are still appended to it, querying the shape blocks until all data is available.
- abstract to_pandas(sentinel: Optional[Union[str, int]] = None) -> pandas.DataFrame
Access the batch or table as a pandas.DataFrame.
- Parameters
sentinel – Replace missing values in integral columns by the given value, one of:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- an integer value that should be inserted for each missing value
- Raises
IndexError – If rows or columns were requested outside of the available shape
- abstract to_pyarrow(sentinel: Optional[Union[str, int]] = None) -> Union[pyarrow.RecordBatch, pyarrow.Table]
Access this batch or table as a pyarrow.RecordBatch or pyarrow.Table. The returned type depends on the type of the underlying object. When called on a ReadTable, returns a pyarrow.Table.
- Parameters
sentinel – Replace missing values in integral columns by the given value, one of:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- an integer value that should be inserted for each missing value
- Raises
IndexError – If rows or columns were requested outside of the available shape
- class knime_table.ReadTable
A KNIME ReadTable gives access to the data provided by KNIME, either in full (must fit into memory) or split into row-wise batches.
- __getitem__(slicing: Union[slice, Tuple[slice, Union[slice, List[int], List[str]]]]) -> knime_table.SlicedDataView
Creates a view of this ReadTable by slicing rows and columns. The slicing syntax is similar to that of numpy arrays, but columns can also be addressed as index lists or via a list of column names.
- Parameters
row_slice – A slice object describing which rows to use.
column_slice – Optional. A slice object, a list of column indices, or a list of column names.
- Returns
A SlicedDataView that can be converted to pandas or pyarrow.
Examples:
Get the first 100 rows of columns 1,2,3,4:
sliced_table = table[:100, 1:5]
Get all rows of the columns “name” and “age”:
sliced_table = table[:, ["name", "age"]]
The returned views cannot be sliced further, but they can be converted to pandas or pyarrow.
- __len__() -> int
Returns the number of batches of this table.
- abstract batches() -> Iterator[knime_table.Batch]
Returns a generator for the batches in this table. If the generator is advanced to a batch that is not available yet, it will block until the data is present. len(my_read_table) gives the static number of batches within the table, which is not updated.
Example:
    processed_table = knime_io.batch_write_table()
    for batch in knime_io.input_tables[0].batches():
        input_batch = batch.to_pandas()
        # process the batch
        processed_table.append(input_batch)
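The batch-wise pattern can be tried outside KNIME with plain Python. In the sketch below, iter_batches is a hypothetical generator standing in for ReadTable.batches(), and a list collects the processed batches in place of a BatchWriteTable.

```python
from typing import Iterator, List


def iter_batches(rows: List[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


rows = list(range(10))
processed_batches = []
for batch in iter_batches(rows, batch_size=4):
    # Process one batch at a time, as in the loop over batches() above.
    processed_batches.append([value * 2 for value in batch])
```

The last batch may be shorter than the others, so per-batch logic should not assume a fixed number of rows.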
- abstract property column_names: Tuple[str, ...]
Returns the list of column names.
- abstract property num_batches: int
Returns the number of batches in this table.
If the table is not completely available yet because batches are still appended to it, querying the number of batches blocks until all data is available.
- abstract property num_columns: int
Returns the number of columns in the table.
- abstract property num_rows: int
Returns the number of rows in the table.
If the table is not completely available yet because batches are still appended to it, querying the number of rows blocks until all data is available.
- property shape: Tuple[int, int]
Returns a tuple in the form (numRows, numColumns) representing the shape of this table.
If the table is not completely available yet because batches are still appended to it, querying the shape blocks until all data is available.
- abstract to_pandas(sentinel: Optional[Union[str, int]] = None) -> pandas.DataFrame
Access the batch or table as a pandas.DataFrame.
- Parameters
sentinel – Replace missing values in integral columns by the given value, one of:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- an integer value that should be inserted for each missing value
- Raises
IndexError – If rows or columns were requested outside of the available shape
- abstract to_pyarrow(sentinel: Optional[Union[str, int]] = None) -> Union[pyarrow.RecordBatch, pyarrow.Table]
Access this batch or table as a pyarrow.RecordBatch or pyarrow.Table. The returned type depends on the type of the underlying object. When called on a ReadTable, returns a pyarrow.Table.
- Parameters
sentinel – Replace missing values in integral columns by the given value, one of:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- an integer value that should be inserted for each missing value
- Raises
IndexError – If rows or columns were requested outside of the available shape
- class knime_table.WriteTable
A table that can be filled as a whole.
- abstract property column_names: Tuple[str, ...]
Returns the list of column names.
- abstract property num_batches: int
Returns the number of batches in this table.
If the table is not completely available yet because batches are still appended to it, querying the number of batches blocks until all data is available.
- abstract property num_columns: int
Returns the number of columns in the table.
- abstract property num_rows: int
Returns the number of rows in the table.
If the table is not completely available yet because batches are still appended to it, querying the number of rows blocks until all data is available.
- property shape: Tuple[int, int]
Returns a tuple in the form (numRows, numColumns) representing the shape of this table.
If the table is not completely available yet because batches are still appended to it, querying the shape blocks until all data is available.
- class knime_table.BatchWriteTable
A table that can be filled batch by batch.
- abstract append(data: Union[knime_table.Batch, pandas.DataFrame, pyarrow.RecordBatch], sentinel: Optional[Union[str, int]] = None)
Appends a batch with the given data to the end of this table. The number of columns, as well as their data types, must match that of the previous batches in this table. Note that this cannot take a pyarrow.Table as input. With pyarrow, it can only process batches, which can be created as follows from some input table.
Example:
    processed_table = knime_io.batch_write_table()
    for batch in knime_io.input_tables[0].batches():
        input_batch = batch.to_pandas()
        # process the batch
        processed_table.append(input_batch)
- Parameters
data – A batch, a pandas.DataFrame or a pyarrow.RecordBatch
sentinel – Only used if data is a pandas.DataFrame or pyarrow.RecordBatch. Interpret the following values in integral columns as missing values:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- a special integer value that should be interpreted as a missing value
- Raises
ValueError – If the new batch does not have the same columns as previous batches in this WriteTable.
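The column-consistency requirement can be pictured with a small stand-in. TinyBatchWriter below is hypothetical and keeps batches as lists of dicts; it only illustrates the rule that every appended batch must have the same columns as the first one, raising ValueError otherwise. Data type checks are omitted.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


class TinyBatchWriter:
    """Hypothetical writer that enforces identical columns across batches."""

    def __init__(self) -> None:
        self.column_names: Optional[Tuple[str, ...]] = None
        self.batches: List[List[Row]] = []

    def append(self, batch: List[Row]) -> None:
        for row in batch:
            names = tuple(row.keys())
            if self.column_names is None:
                # The first appended row fixes the expected columns.
                self.column_names = names
            elif names != self.column_names:
                raise ValueError(
                    f"Expected columns {self.column_names}, got {names}")
        self.batches.append(batch)


writer = TinyBatchWriter()
writer.append([{"id": 1, "score": 0.5}, {"id": 2, "score": 0.7}])
writer.append([{"id": 3, "score": 0.9}])

try:
    writer.append([{"id": 4, "label": "x"}])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```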
- abstract property column_names: Tuple[str, ...]
Returns the list of column names.
- static create() -> knime_table.BatchWriteTable
Create an empty BatchWriteTable.
- abstract property num_batches: int
Returns the number of batches in this table.
If the table is not completely available yet because batches are still appended to it, querying the number of batches blocks until all data is available.
- abstract property num_columns: int
Returns the number of columns in the table.
- abstract property num_rows: int
Returns the number of rows in the table.
If the table is not completely available yet because batches are still appended to it, querying the number of rows blocks until all data is available.
- property shape: Tuple[int, int]
Returns a tuple in the form (numRows, numColumns) representing the shape of this table.
If the table is not completely available yet because batches are still appended to it, querying the shape blocks until all data is available.
Node Extension Development
These classes can be used by developers to implement their own Python nodes for KNIME.
- class knime_node.PythonNode
Extend this class to provide a pure Python based node extension to KNIME Analytics Platform.
Users can either use the decorators @kn.input_port, @kn.output_port, and @kn.view, or populate the input_ports, output_ports, and view attributes.
Use the Python logging facilities and their .warn and .error methods to write warnings and errors to the KNIME console.
Example:
    import pickle
    from typing import List, Union

    @kn.node("My Predictor", node_type="Predictor", icon_path="icon.png", category="/")
    @kn.input_port("Trained Model", "Trained fancy machine learning model", id="org.example.my.model")
    @kn.input_table("Data", "The data on which to predict")
    class MyPredictor():
        def configure(self, in_specs: List[Union[kn.Schema, kn.BinaryPortObjectSpec]]):
            return in_specs[1]

        def execute(self, input_data: List[Union[kn.Table, bytes]]):
            model = self._load_model_from_bytes(input_data[0])
            table = input_data[1]
            new_col = model.predict(table.to_pandas())
            # return [table.append_column(new_col)]

        def _load_model_from_bytes(self, data):
            return pickle.loads(data)
- abstract configure(config_context: knime_node.ConfigurationContext, *inputs)
Configure this Python node.
- Parameters
config_context – The ConfigurationContext providing KNIME utilities during configuration
*inputs – Each input table spec or binary port spec will be added as parameter, in the same order that the ports were defined.
- Returns
Either a single spec, or a tuple or list of specs. The number of specs must match the number of defined output ports, and they must be returned in this order.
- Raises
InvalidConfigurationError – If the input configuration does not satisfy this node’s requirements.
- abstract execute(exec_context: knime_node.ExecutionContext, *inputs)
Execute this Python node.
- Parameters
exec_context – The ExecutionContext providing KNIME utilities during execution
*inputs – Each input table or binary port object will be added as parameter, in the same order that the ports were defined. Tables will be provided as a kn.Table, while binary data will be a plain Python bytes object.
- Returns
Either a single output object (table or binary), or a tuple or list of objects. The number of output objects must match the number of defined output ports, and they must be returned in this order. Tables must be provided as a kn.Table or kn.BatchOutputTable, while binary data should be returned as a plain Python bytes object.
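As an illustration of the return contract, the following sketch normalizes a return value to a list and checks it against the number of output ports. normalize_outputs is a hypothetical helper, not part of knime_node, and the framework's own validation may behave differently.

```python
from typing import Any, List


def normalize_outputs(result: Any, num_output_ports: int) -> List[Any]:
    """Wrap a single output in a list and verify the output count."""
    if isinstance(result, (tuple, list)):
        outputs = list(result)
    else:
        outputs = [result]
    if len(outputs) != num_output_ports:
        raise ValueError(
            f"Expected {num_output_ports} outputs, got {len(outputs)}")
    return outputs


# A single binary output and a (table, binary) pair; the string "table"
# is only a placeholder for a real table object.
single = normalize_outputs(b"model-bytes", num_output_ports=1)
pair = normalize_outputs(("table", b"model-bytes"), num_output_ports=2)
```

The same shape applies to configure(), where specs take the place of output objects.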
- class knime_node_table.Table
public API
- static from_pandas()
Factory method to create a Table given a pandas.DataFrame. The index of the data frame will be used as RowKey by KNIME.
Example:
Table.from_pandas(my_pandas_df, sentinel="min")
- Parameters
data – A pandas.DataFrame
sentinel – Interpret the following values in integral columns as missing values:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- a special integer value that should be interpreted as a missing value
- static from_pyarrow()
Factory method to create a Table given a pyarrow.Table. The first column of the pyarrow.Table must contain unique row identifiers of type ‘string’.
Example:
Table.from_pyarrow(my_pyarrow_table, sentinel="min")
- Parameters
data – A pyarrow.Table
sentinel – Interpret the following values in integral columns as missing values:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- a special integer value that should be interpreted as a missing value
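The row identifier requirement for pyarrow input can be checked before building a table. The sketch below works on a plain list of identifiers, for example the values of the first column; validate_row_ids is a hypothetical helper and not part of the KNIME API.

```python
from typing import List, Sequence


def validate_row_ids(row_ids: Sequence[object]) -> List[str]:
    """Return a list of problems; an empty list means the ids are usable."""
    problems = []
    if not all(isinstance(row_id, str) for row_id in row_ids):
        problems.append("row identifiers must be strings")
    if len(set(row_ids)) != len(row_ids):
        problems.append("row identifiers must be unique")
    return problems


ok = validate_row_ids(["Row0", "Row1", "Row2"])
duplicated = validate_row_ids(["Row0", "Row0"])
wrong_type = validate_row_ids([0, 1, 2])
```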
- abstract property schema: knime_schema.Schema
The schema of this table, containing column names, types, and potentially metadata.
- to_batches() -> Iterator[knime_node_table.Table]
Returns a generator over the batches in this table. A batch is part of the table with all columns, but only a subset of the rows. A batch should always fit into memory (max size currently 64 MB). The table being passed to execute() is already present in batches, so accessing the data this way is very efficient.
Example:
    output_table = BatchOutputTable.create()
    for batch in my_table.to_batches():
        input_batch = batch.to_pandas()
        # process the batch
        output_table.append(Table.from_pandas(input_batch))
- to_pandas(sentinel: Optional[Union[str, int]] = None) -> pandas.DataFrame
Access this table as a pandas.DataFrame.
- Parameters
sentinel – Replace missing values in integral columns by the given value, one of:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- an integer value that should be inserted for each missing value
- to_pyarrow(sentinel: Optional[Union[str, int]] = None) -> Union[pyarrow.Table, pyarrow.RecordBatch]
Access this table as a pyarrow.Table.
- Parameters
sentinel – Replace missing values in integral columns by the given value, one of:
- "min": min int32 or min int64, depending on the type of the column
- "max": max int32 or max int64, depending on the type of the column
- an integer value that should be inserted for each missing value
- class knime_schema.Schema(types: List[knime_schema.KnimeType], names: List[str], metadata: Optional[List] = None)
A schema defines the data types and names of the columns inside a table. Additionally it can hold metadata for the individual columns.
- property column_names: List[str]
Returns the list of column names.
- classmethod from_columns(columns: List[knime_schema.Column])
Create a schema from a list of columns.
- classmethod from_knime_dict(table_schema: dict) -> knime_schema.Schema
Construct a Schema from a dict that was retrieved from KNIME in JSON-encoded form as the input to a node’s configure() method.
KNIME provides table information with a RowKey column at the beginning, which is dropped before the created schema is returned.
- classmethod from_types(types: List[knime_schema.KnimeType], names: List[str], metadata: Optional[List] = None)
Create a schema from a list of column data types, names and metadata.
- property num_columns
The number of columns in this schema.
- to_knime_dict() -> Dict
Convert this Schema into a dict that can then be JSON-encoded and sent to KNIME as the result of a node’s configure() method.
Because KNIME expects a row key column as the first column of the schema, but the KNIME Python table schema does not include one, a row key column is inserted here.
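The row key handling described for from_knime_dict() and to_knime_dict() amounts to dropping and re-inserting a leading column. The sketch below shows that round trip on a list of (name, type) pairs. The pair layout, the column name "RowKey" with type "string", and both helper names are assumptions made for illustration; the actual dict structure exchanged with KNIME is not shown in this document.

```python
from typing import List, Tuple

Column = Tuple[str, str]
ROW_KEY: Column = ("RowKey", "string")  # assumed name and type


def drop_row_key(columns: List[Column]) -> List[Column]:
    """Remove the leading row key column, as when reading a schema from KNIME."""
    return columns[1:]


def insert_row_key(columns: List[Column]) -> List[Column]:
    """Prepend a row key column, as when sending a schema back to KNIME."""
    return [ROW_KEY] + columns


from_knime = [ROW_KEY, ("name", "string"), ("age", "int64")]
python_side = drop_row_key(from_knime)
back_to_knime = insert_row_key(python_side)
```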