In this post, we look at some considerations to keep in mind when building an ETL pipeline similar to the data collection and exploration seen...
Over the last few months, Bodo and Saturn Cloud have worked together to provide a joint solution with Bodo software running within Saturn Cloud resources. Data scientists can now access...
Python is the language of choice for AI and machine learning (ML), but SQL has been used...
In this post, we will run the code using Bodo, which means the data will be distributed in chunks across processes. Bodo's documentation provides more information about the parallel execution model. If you want to run the example using pandas only (without Bodo), simply...
Bodo allows machine learning practitioners to rapidly explore data and build complex pipelines. Using Bodo, developers can seamlessly scale their code from a laptop to Bodo's platform. In this series, we will...
Bodo’s mission is to enable easy access to high-performance computing; to build a platform that makes working with petabyte-scale datasets as fast and straightforward as running pandas on small datasets using a laptop. We believe...
The Snowflake Data Cloud simplifies data management for data engineers at a near-unlimited scale, while the Bodo Platform brings extreme performance and scalability to large-scale Python data processing. Snowflake and Bodo have combined forces to give data teams...
The overhead of master-executor systems like Spark has been justified as a “necessary evil” for achieving resilience. However, we have shown that Bodo can achieve much higher resilience without the extra overhead.
Amazon S3 is one of the most popular technologies that data engineers use to store data in a data lake. A typical application is reading compressed Parquet files as part of the extract step in an ETL (extract-transform-load) pipeline...