Understanding various types of data transformation libraries

In this episode I speak about data transformation frameworks available for the data scientist who writes Python code.
The usual suspect is clearly Pandas, the most widely used library and the de facto standard. However, when data volumes grow beyond what a single machine can comfortably handle and the computation has to be distributed (following a map-reduce paradigm), Pandas no longer performs as expected. Other frameworks come into play in such contexts.
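
To make the map-reduce idea a bit more concrete, here is a minimal sketch in plain Python of a per-key sum computed that way. It is only an illustration under simplifying assumptions: the helper names (`partition`, `map_partition`, `merge`) and the sample rows are hypothetical, everything runs in a single process, and the partitions merely stand in for the nodes of a cluster. Real distributed frameworks add scheduling, data shuffling and fault tolerance on top of this pattern.

```python
from functools import reduce


def partition(rows, n_parts):
    """Split rows into n_parts chunks (hypothetical stand-ins for cluster nodes)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def map_partition(rows):
    """Map step: sum values per key using only the rows of one partition."""
    partial = {}
    for key, value in rows:
        partial[key] = partial.get(key, 0) + value
    return partial


def merge(left, right):
    """Reduce step: combine two partial results into a new dictionary."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Made-up sample data: (key, value) pairs.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

# Each partition is processed independently, then the partial sums are merged.
partials = [map_partition(chunk) for chunk in partition(rows, 3)]
totals = reduce(merge, partials, {})
```

The point of the pattern is that `map_partition` never needs to see the whole dataset, so each partition could live on a different machine; only the much smaller partial results have to be brought together in the reduce step.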

In this episode I explain which frameworks are the closest equivalents to Pandas in big data contexts.

This article has been published from a wire agency feed without modifications to the text. Only the headline has been changed.