.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_convert_pipeline_vectorizer.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_convert_pipeline_vectorizer.py:


Train, convert and predict with ONNX Runtime
============================================

This example demonstrates an end-to-end scenario. It starts by
training a scikit-learn pipeline whose input is not a regular
vector but a dictionary ``{ int: float }``, because its first step
is a ``DictVectorizer``.

.. contents::
    :local:

Train a pipeline
++++++++++++++++

The first step retrieves the Boston dataset and converts each row
into a dictionary ``{ int: float }``.

.. GENERATED FROM PYTHON SOURCE LINES 22-32

.. code-block:: default

    import pandas
    # Note: load_boston was removed in scikit-learn 1.2,
    # so this example needs an older release.
    from sklearn.datasets import load_boston
    boston = load_boston()
    X, y = boston.data, boston.target

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    # One dictionary {column index: value} per observation.
    X_train_dict = pandas.DataFrame(X_train[:, 1:]).T.to_dict().values()
    X_test_dict = pandas.DataFrame(X_test[:, 1:]).T.to_dict().values()

.. GENERATED FROM PYTHON SOURCE LINES 33-34

We create a pipeline.

.. GENERATED FROM PYTHON SOURCE LINES 34-44

.. code-block:: default

    from sklearn.pipeline import make_pipeline
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.feature_extraction import DictVectorizer
    pipe = make_pipeline(
                DictVectorizer(sparse=False),
                GradientBoostingRegressor())

    pipe.fit(X_train_dict, y_train)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Pipeline(steps=[('dictvectorizer', DictVectorizer(sparse=False)),
                    ('gradientboostingregressor', GradientBoostingRegressor())])

.. GENERATED FROM PYTHON SOURCE LINES 45-47

We compute the predictions on the test set
and report the :math:`R^2` score.

.. GENERATED FROM PYTHON SOURCE LINES 47-52

.. code-block:: default

    from sklearn.metrics import r2_score

    pred = pipe.predict(X_test_dict)
    print(r2_score(y_test, pred))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    0.848444978558249

.. GENERATED FROM PYTHON SOURCE LINES 53-59

Conversion to ONNX format
+++++++++++++++++++++++++

We use the module ``sklearn-onnx``
to convert the model into ONNX format.

.. GENERATED FROM PYTHON SOURCE LINES 59-69

.. code-block:: default

    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import (
        FloatTensorType, Int64TensorType, DictionaryType, SequenceType)

    # The model input is a dictionary mapping int64 keys to float values.
    initial_type = [('float_input', DictionaryType(Int64TensorType([1]), FloatTensorType([])))]
    onx = convert_sklearn(pipe, initial_types=initial_type)
    with open("pipeline_vectorize.onnx", "wb") as f:
        f.write(onx.SerializeToString())

.. GENERATED FROM PYTHON SOURCE LINES 70-72

We load the model with ONNX Runtime and look at
its input and output.

.. GENERATED FROM PYTHON SOURCE LINES 72-82

.. code-block:: default

    import onnxruntime as rt
    from onnxruntime.capi.onnxruntime_pybind11_state import InvalidArgument

    sess = rt.InferenceSession("pipeline_vectorize.onnx")

    inp, out = sess.get_inputs()[0], sess.get_outputs()[0]
    print("input name='{}' and shape={} and type={}".format(inp.name, inp.shape, inp.type))
    print("output name='{}' and shape={} and type={}".format(out.name, out.shape, out.type))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    input name='float_input' and shape=[] and type=map(int64,tensor(float))
    output name='variable' and shape=[None, 1] and type=tensor(float)

.. GENERATED FROM PYTHON SOURCE LINES 83-85

We compute the predictions.
We could try to do that in one call:

.. GENERATED FROM PYTHON SOURCE LINES 85-91

.. code-block:: default

    try:
        pred_onx = sess.run([out.name], {inp.name: X_test_dict})[0]
    except (RuntimeError, InvalidArgument) as e:
        print(e)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: ((seq(map(int64,tensor(float))))) , expected: ((map(int64,tensor(float))))

.. GENERATED FROM PYTHON SOURCE LINES 92-94

The call fails because, for a model starting with a ``DictVectorizer``,
ONNX Runtime expects one observation at a time.

.. GENERATED FROM PYTHON SOURCE LINES 94-96

.. code-block:: default

    # Run the session once per observation and keep the scalar prediction.
    pred_onx = [sess.run([out.name], {inp.name: row})[0][0, 0] for row in X_test_dict]

.. GENERATED FROM PYTHON SOURCE LINES 97-98

We compare them with the predictions of the original scikit-learn pipeline.

.. GENERATED FROM PYTHON SOURCE LINES 98-100

.. code-block:: default

    print(r2_score(pred, pred_onx))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    0.9999999999999528

.. GENERATED FROM PYTHON SOURCE LINES 101-103

The predictions are very similar. *ONNX Runtime* uses floats instead of
doubles, which explains the small discrepancies.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  1.592 seconds)

.. _sphx_glr_download_auto_examples_plot_convert_pipeline_vectorizer.py:

.. only:: html

  .. rst-class:: sphx-glr-signature

    Gallery generated by Sphinx-Gallery
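
The closing remark about floats and doubles can be illustrated without any model. The following is a minimal sketch, not part of the original example: the helper ``to_float32`` is a hypothetical name introduced here, and it only shows the rounding that occurs when a 64-bit double is stored as a 32-bit float. It does not reproduce the exact discrepancy measured above.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float.

    Hypothetical helper for illustration only: it packs the value as an
    IEEE 754 single-precision number and reads it back as a double.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 0.1                      # not exactly representable in binary
rounded = to_float32(value)      # the nearest single-precision value

# Relative error introduced by storing the value in single precision.
relative_error = abs(rounded - value) / abs(value)

print("double :", repr(value))
print("float32:", repr(rounded))
print("relative error:", relative_error)
```

Values that are exactly representable in single precision, such as ``0.5``, pass through unchanged. Other values pick up a small relative error, which is consistent with the two sets of predictions agreeing closely but not exactly.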