I’ve been looking for a fast and effective dataframe library for a while. Pandas is the de facto dataframe library of choice in Python and data science, thanks to its ease of use and rich feature set.
However, most of the time it is used for previewing data, for analysis, and as the interface between data files and ML algorithms.
There are a lot of other Python dataframe libraries, and some of them offer interesting performance advantages, for example in memory usage and processing speed.
The best one I have been able to find on those fronts is Vaex:
It uses memory-mapped files instead of loading all the data into RAM. Therefore, it is possible to work with files larger than the available memory, 60 GB files for example.
On top of that, according to all the comparisons I have seen so far, it is the fastest dataframe tool.
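To make the memory-mapping idea more concrete, here is a minimal sketch using only Python's standard `mmap` module. It is not how Vaex is implemented internally; the file layout (one float64 per row) and the helper names `write_column`, `read_row`, and `demo` are assumptions made up for this illustration. The point is only that a single value can be read from a file on disk without loading the whole file into RAM.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-disk layout for this sketch: one little-endian float64 per row.
RECORD = struct.Struct("<d")


def write_column(path, n_rows):
    """Write n_rows float64 values (0.0, 1.0, 2.0, ...) to a binary file."""
    with open(path, "wb") as f:
        for i in range(n_rows):
            f.write(RECORD.pack(float(i)))


def read_row(path, row):
    """Read one value by mapping the file instead of reading it all into memory."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages in only what is touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (value,) = RECORD.unpack_from(mm, row * RECORD.size)
            return value


def demo():
    """Create a small temporary file and read a single row from it."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_column(path, 1000)
        return read_row(path, 750)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The same pattern scales to files far larger than RAM, because the operating system loads only the pages that are actually accessed. A library built on this idea can expose such a file as a dataframe while keeping memory usage low.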
A few links about it:
I think this should be enough to get you interested in Vaex 🙂
If you find something wrong with it, or know of an even better framework, please leave a comment.