This is a work-in-progress that shows how the pre-alpha version of Algoneer works. Please let us know if you should encounter any problems.
Algoneer provides blackbox tests that work on
objects. A Model is an
Algorithm that has been
trained with a specific
To get started we need to install Algoneer.
You need Python version >=3.6 to run Algoneer.
pip3 install algoneer
Algoneer aims to be technology-agnostic and provides wrappers for the most popular data processing and machine learning libraries. In this tutorial, we are going to use Algoneer in conjunction with pandas and scikit-learn. In case you have not installed them already, you can do so by running:
pip3 install pandas scikit-learn
Algoneer also provides a separate package with several example datasets that make it easy to get started. We can also install them using pip:
pip3 install algoneer_datasets
That’s it, we’re good to go! Let’s start using Algoneer by loading an example dataset and running a test on it.
You can find the whole example code on GitHub.
from algoneer.dataschema import DataSchema, AttributeSchema as AS # we define the data schema for the bike dataset, which helps Algoneer to automatically # run tests on the dataset and any models derived from it class BikeSchema(DataSchema): # these are the regressands, which have the "x" role instant = AS(type=AS.Type.Integer, roles=["x"]) season = AS(type=AS.Type.Categorical, roles=["x"]) yr = AS(type=AS.Type.Integer, roles=["x"]) mnth = AS(type=AS.Type.Integer, roles=["x"]) holiday = AS(type=AS.Type.Boolean, roles=["x"]) weekday = AS(type=AS.Type.Integer, roles=["x"]) workingday = AS(type=AS.Type.Boolean, roles=["x"]) weathersit = AS(type=AS.Type.Categorical, roles=["x"]) temp = AS(type=AS.Type.Numerical, roles=["x"]) atemp = AS(type=AS.Type.Numerical, roles=["x"]) hum = AS(type=AS.Type.Numerical, roles=["x"]) windspeed = AS(type=AS.Type.Numerical, roles=["x"]) # this is the regressor, which has the "y" role cnt = AS(type=AS.Type.Integer, roles=["y"])
from algoneer_datasets.bike_sharing import path from algoneer.dataset.pandas import PandasDataset # we read the CSV data into a pandas dataframe import pandas as pd df = pd.read_csv(path+'/data.csv.gz') # we wrap the dataframe with an Algoneer dataset using the bike schema ds = PandasDataset(BikeSchema(), df)
This creates a
PandasDataSet that contains
the bike sharing data. This dataset is just a thin wrapper around a pandas
dataframe and adds functionality that is helpful when using the dataset for
testing. Notably, it includes a
contains information about all attributes in the dataset.
Now, to test a machine learning model with Algoneer we first need to train one. To do this, we can again import a model from the example datasets library:
from sklearn.ensemble import RandomForestRegressor from algoneer.algorithm.sklearn import SklearnAlgorithm # we wrap the random forest classifier using the SklearnAlgorithm class algo = SklearnAlgorithm(RandomForestRegressor, n_estimators=100) # we produce a model by training the algorithm with a dataset model = algo.fit(ds)
Algorithm class is just a thin wrapper
around existing algorithms, in this case a scikit-learn random forest regressor.
Now that we have trained our model, we can run a simple black box test on it:
from algoneer.methods.blackbox.shap import SHAP shap = SHAP()
This so-called partial dependence plot is a simple test that quantifies the average effect that a given attribute has on the prediction of a machine learning model. You can read more about the test here.
Let’s run it on our model:
result = shap.run(model, ds, max_datapoints=100)
Here, max_datapoints specifies the number of datapoints that we use to average the effect of the attribute.