
Introduction to Skippa

Written by DSL
Published on December 30, 2021

Summary

Any data scientist is probably familiar with pandas and scikit-learn.
The usual workflow starts with cleaning the data in pandas. It continues with further preprocessing in pandas or with scikit-learn transformers such as StandardScaler and OneHotEncoder. It ends with training a machine learning model, typically also from scikit-learn.
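For comparison, here is a minimal sketch of that usual workflow written directly in pandas and scikit-learn. It roughly mirrors the Skippa pipeline shown later in this post. The toy data and the column names (age, income, category1, category2) are hypothetical. ColumnTransformer is one common way to combine per-column preprocessing in scikit-learn, but not the only one.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; np.nan marks missing values
X = pd.DataFrame({
    "age": [25, np.nan, 47, 35, 52, 29],
    "income": [30_000, 52_000, np.nan, 41_000, 80_000, 38_000],
    "category1": ["a", "b", np.nan, "a", "b", "a"],
    "category2": ["x", "x", "y", np.nan, "y", "y"],
})
y = pd.Series([0, 1, 1, 0, 1, 0])

numeric = ["age", "income"]
categorical = ["category1", "category2"]

preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the median, then standardize
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    # Categorical columns: fill gaps with the most frequent value, then one-hot encode
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

# Preprocessing and model combined into one estimator
pipeline = Pipeline([("preprocess", preprocess), ("model", LogisticRegression())])
pipeline.fit(X, y)
probabilities = pipeline.predict_proba(X)
```

ColumnTransformer applies its branches side by side rather than one after another. It also drops any column that is not listed unless you pass remainder="passthrough". This bookkeeping is part of the extra code the next paragraph refers to.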
However, this workflow has some problems:
1. The development phase is quite complex and requires a lot of code.
2. It is difficult to reproduce the workflow when making predictions in the implementation (production) phase.
3. Existing solutions to these problems are not good enough (yet).

Skippa is a package designed to:

  • drastically simplify development
  • pack all data cleaning and preprocessing, along with the algorithm, into a single pipeline file
  • reuse the pandas and scikit-learn interfaces you are already familiar with
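The benefit of a single pipeline file is that the fitted preprocessing steps travel together with the model. As a result, production predictions apply exactly the transformations learned during development. The following minimal sketch illustrates the idea with a plain scikit-learn Pipeline and Python's pickle module. It is not Skippa's own mechanism for storing pipelines, and the data is hypothetical.

```python
import pickle

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric training data with one missing value
X_train = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 180.0], [4.0, 260.0]])
y_train = np.array([0, 0, 1, 1])

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # learns the column medians
    ("scale", StandardScaler()),                   # learns means and std devs
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Development side: serialize preprocessing and model as one object
blob = pickle.dumps(pipeline)

# Production side: restore it and predict directly on raw, unprocessed rows
restored = pickle.loads(blob)
X_new = np.array([[2.5, np.nan]])
print(restored.predict_proba(X_new))
```

In practice you would write the object to a file, for example with pickle.dump or joblib.dump. Only load such files from trusted sources, because unpickling can execute arbitrary code.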

Skippa helps you easily define data cleaning and pre-processing transformations.
It works roughly as follows:

from skippa import Skippa, columns
from sklearn.linear_model import LogisticRegression

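# get_training_data is a placeholder for your own data-loading code
X, y = get_training_data(...)

pipeline = (
    Skippa()
    # fill missing values: most frequent value for text, median for numbers
    .impute(columns(dtype_include='object'), strategy='most_frequent')
    .impute(columns(dtype_include='number'), strategy='median')
    # standardize the numeric columns
    .scale(columns(dtype_include='number'), type='standard')
    # one-hot encode the named categorical columns
    .onehot(columns(['category1', 'category2']))
    # finish the pipeline with a scikit-learn estimator
    .model(LogisticRegression())
)
pipeline.fit(X, y)

# predicting through the pipeline applies the same preprocessing before the model
predictions = pipeline.predict_proba(X)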
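The column selectors in the example pick columns by data type. You can preview which columns each selector would match with pandas' own DataFrame.select_dtypes. This assumes dtype_include follows the usual pandas dtype categories; we have not verified how Skippa resolves these selectors internally. The data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; text columns are created explicitly with object dtype
X = pd.DataFrame({
    "age": [25, np.nan, 47],                                         # float64
    "income": [30_000, 52_000, 41_000],                              # int64
    "category1": pd.Series(["a", "b", np.nan], dtype=object),
    "category2": pd.Series(["x", "y", "y"], dtype=object),
})

# Columns a dtype_include='object' selector would target
object_cols = X.select_dtypes(include="object").columns.tolist()
# Columns a dtype_include='number' selector would target (ints and floats)
number_cols = X.select_dtypes(include="number").columns.tolist()

print(object_cols)  # ['category1', 'category2']
print(number_cols)  # ['age', 'income']
```

Checking this up front helps catch surprises, such as an ID column stored as a number that would otherwise be scaled.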

☝️ Skippa does not claim to solve every problem. It does not cover all the functionality you might ever need, and it is not a highly scalable solution. It should, however, greatly simplify more than 80% of regular pandas/sklearn-based machine learning projects.

Links

You can read the rest of the blog here: Introduction Skippa [ENG]

Questions? Please contact us
