PyTorch Tabular is designed to make the standard modelling pipeline easy enough for practitioners and standard enough for production deployment.
Earlier this month, PyTorch Tabular v0.7.0 was released on PyPI. This latest version aims to make deep learning with tabular data easy and accessible for real-world use cases and research. The core principles behind the library’s design are low-friction usability, easy customization, and ease of implementation and scalability.
Why PyTorch Tabular?
Despite being unreasonably effective on modalities such as images and text, deep learning has always lagged behind gradient boosting on tabular data, both in popularity and in performance. In recent years, newer models built explicitly for tabular data have improved the performance of deep learning models. Popularity remains a challenge, however, because there is no standard library for deep learning on tabular data comparable to scikit-learn.
PyTorch Tabular, developed by Manu Joseph, is a new deep learning library that makes working with tabular data quick and easy. The library is built on the PyTorch and PyTorch Lightning frameworks and works directly with pandas DataFrames. In addition, several cutting-edge models such as NODE and TabNet have already been implemented in the library behind a uniform API.
The models currently available in PyTorch Tabular include:
- FeedForward Network: a simple feed-forward network with embedding layers for the categorical columns.
- Neural Oblivious Decision Ensembles (NODE): a model for deep learning on tabular data presented at ICLR 2020. On many datasets, it has beaten well-tuned gradient boosting models.
- TabNet: developed by Google Research, Attentive Interpretable Tabular Learning uses ‘sparse attention’ across multiple decision steps to model the outcome.
- Mixture Density Networks: these use Gaussian components to approximate the target function and provide a probabilistic prediction.
- AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks attempts to learn interactions between the features and create a better representation, which is then used in the downstream task.
- TabTransformer: an adaptation of the transformer model for tabular data that creates contextual representations for categorical features.
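To make the mixture density idea from the list above concrete, here is a minimal sketch in plain Python. It is not PyTorch Tabular's implementation: the function names and the example numbers are hypothetical, and in a real mixture density network the weights, means and standard deviations would be produced by a neural network for each input row rather than written by hand.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a single Gaussian component at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, means, stds):
    """Density of the whole mixture: a weighted sum of the component densities."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, stds))


def mixture_mean(weights, means):
    """Point estimate derived from the mixture (its expected value)."""
    return sum(w * m for w, m in zip(weights, means))


# Hypothetical parameters for one row; the weights must sum to 1.
weights = [0.7, 0.3]
means = [10.0, 20.0]
stds = [1.0, 2.0]

point_estimate = mixture_mean(weights, means)
density_at_10 = mixture_pdf(10.0, weights, means, stds)
density_at_15 = mixture_pdf(15.0, weights, means, stds)
```

The probabilistic prediction is the full density rather than a single number: here the target is far more likely to be near 10 than near 15, even though the point estimate lies between the two components.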
PyTorch Tabular Design
According to the author, PyTorch Tabular was designed to make the standard modelling pipeline easy enough for practitioners and standard enough for production use, with a focus on customization to allow widespread use in research. To meet these objectives, PyTorch Tabular has adopted a ‘config-driven’ approach. Five configuration classes control the whole process: DataConfig, ModelConfig, TrainerConfig, OptimizerConfig and ExperimentConfig. These configurations can be set programmatically or through YAML files, making the work of data scientists and ML engineers easier.
In addition, PyTorch Tabular uses a BaseModel, an abstract class that implements the standard parts of any model definition, such as loss and metric calculation, alongside a Data Module and a Tabular Model. The Data Module unifies and standardizes data processing. The Tabular Model merges the configurations, initializes the correct model and the data module, and handles training and prediction through methods such as “fit” and “predict”.
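The config-driven pattern described above can be illustrated with a short, standard-library-only sketch. This is not PyTorch Tabular's API: the class names prefixed with Sketch, their fields, and the placeholder "training" step are all assumptions made for illustration, and a plain dict stands in for settings that would be parsed from a YAML file.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class SketchDataConfig:
    """Hypothetical stand-in for a data configuration."""
    target: str
    continuous_cols: List[str] = field(default_factory=list)
    categorical_cols: List[str] = field(default_factory=list)


@dataclass
class SketchTrainerConfig:
    """Hypothetical stand-in for a trainer configuration."""
    batch_size: int = 64
    max_epochs: int = 10


class SketchTabularModel:
    """Merges the configs and exposes fit/predict as a thin orchestrator."""

    def __init__(self, data_config, trainer_config):
        self.data_config = data_config
        self.trainer_config = trainer_config
        self._baseline = None

    def fit(self, rows):
        # Placeholder "training": remember the mean of the target column.
        self._baseline = mean(row[self.data_config.target] for row in rows)
        return self

    def predict(self, rows):
        if self._baseline is None:
            raise RuntimeError("call fit() before predict()")
        return [self._baseline for _ in rows]


# Configuration built programmatically.
data_config = SketchDataConfig(
    target="price", continuous_cols=["area"], categorical_cols=["city"]
)

# Configuration built from a dict, standing in for a parsed YAML file.
trainer_settings = {"batch_size": 128, "max_epochs": 5}
trainer_config = SketchTrainerConfig(**trainer_settings)

train_rows = [
    {"area": 50.0, "city": "A", "price": 100.0},
    {"area": 80.0, "city": "B", "price": 200.0},
]
model = SketchTabularModel(data_config, trainer_config).fit(train_rows)
predictions = model.predict([{"area": 65.0, "city": "A"}])
```

Keeping every setting in a config object means the same experiment can be reproduced from a file, while the orchestrating class stays small: it only wires the configs together and offers the familiar “fit” and “predict” entry points.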
Deep learning for tabular data is becoming increasingly popular in the research community and in industry. With this growing popularity, it becomes essential to have a user-friendly, unified API for tabular data, similar to what scikit-learn has done for classical machine learning algorithms. PyTorch Tabular aims to lower the barriers to entry for new SOTA deep learning model architectures and to reduce the “engineering” effort for researchers and developers.