
Introduction to Skippa

Written by
DSL
Published on
December 30, 2021

Summary

Most data scientists are familiar with pandas and scikit-learn.
The usual workflow starts with data cleaning in pandas, continues with further preprocessing in pandas or with scikit-learn transformers such as StandardScaler and OneHotEncoder, and ends with fitting a machine learning algorithm from scikit-learn.
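As a rough illustration of that usual workflow, here is a minimal sketch on a made-up toy DataFrame. The column names (`age`, `city`, `label`) and the data are assumptions for illustration only and do not come from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "age": [25.0, 32.0, None, 51.0, 46.0, 38.0],
    "city": ["A", "B", "A", None, "B", "A"],
    "label": [0, 1, 0, 1, 1, 0],
})

# Step 1: data cleaning in pandas (fill missing values by hand).
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Step 2: preprocessing with scikit-learn transformers, one column group at a time.
scaler = StandardScaler()
encoder = OneHotEncoder(handle_unknown="ignore")
age_scaled = scaler.fit_transform(df[["age"]])
city_encoded = encoder.fit_transform(df[["city"]]).toarray()

# Step 3: glue the pieces back together and fit the algorithm.
X = np.hstack([age_scaled, city_encoded])
model = LogisticRegression().fit(X, df["label"])
```

Note that the fill values, the scaler, the encoder and the model are all separate objects here. To score new data, each of them has to be kept and applied again in the same order.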
Now there are some problems with this workflow:
1. The development phase of your workflow is quite complex and requires a lot of code.
2. It is difficult to reproduce the workflow for making predictions in the deployment phase.
3. Existing solutions to reduce these problems are not good enough (yet).

Skippa is a package designed to:

  • drastically simplify development
  • pack all data cleaning and pre-processing along with the algorithm into a single pipeline file
  • reuse the interface from pandas and scikit-learn, which you are already familiar with

Skippa helps you easily define data cleaning and pre-processing transformations.
It works roughly as follows:

from skippa import Skippa, columns
from sklearn.linear_model import LogisticRegression

X, y = get_training_data(...)

pipeline = (
    Skippa()
    .impute(columns(dtype_include='object'), strategy='most_frequent')
    .impute(columns(dtype_include='number'), strategy='median')
    .scale(columns(dtype_include='number'), type='standard')
    .onehot(columns(['category1', 'category2']))
    .model(LogisticRegression())
)
pipeline.fit(X, y)

predictions = pipeline.predict_proba(X)
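
For comparison, the same kind of steps can be approximated in plain scikit-learn with a `ColumnTransformer` inside a `Pipeline`. This is only a sketch for contrast, not a description of how Skippa works internally. The toy data and the `amount` column are hypothetical, columns are selected by name rather than by dtype, and `pickle` is used here just to illustrate storing one fitted object; it is not presented as Skippa's own save mechanism.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; 'amount' is a made-up numeric column.
X = pd.DataFrame({
    "amount": [10.0, np.nan, 30.0, 45.0, 52.0, 18.0],
    "category1": ["a", "b", np.nan, "a", "b", "a"],
    "category2": ["x", "x", "y", np.nan, "y", "x"],
})
y = [0, 0, 1, 1, 1, 0]

# Numeric columns: impute with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: impute with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, ["amount"]),
    ("cat", categorical_steps, ["category1", "category2"]),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression()),
])
pipeline.fit(X, y)
probabilities = pipeline.predict_proba(X)

# Preprocessing and model travel together as one object,
# so a single round trip through pickle restores everything.
restored = pickle.loads(pickle.dumps(pipeline))
```

The plain scikit-learn version needs noticeably more scaffolding (nested pipelines, named steps, explicit column groups) than the chained Skippa calls, which is the simplification the package is aiming for.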

☝️ Skippa does not claim to solve every problem, does not cover all the functionality you might ever need, and is not a highly scalable solution. It should, however, greatly simplify more than 80% of regular pandas/scikit-learn-based machine learning projects.

Links

You can read the rest of the blog here > Introduction Skippa [ENG]
