A guide to using the cloud at every step of the data science workflow
Data science is one of the fastest-growing industries in the world, using modern, cutting-edge technology to improve the way we use data. However, if you’ve worked in data science, you probably know that one day you will inevitably find yourself staring at an Excel sheet. There’s nothing wrong with Excel; it’s just not the kind of tool you would expect to use when working in one of the most modern industries.
Many organizations have begun using modern cloud infrastructure, but not to its full extent. As a result, many data scientists find themselves pulling data from a cloud data warehouse just to train a model on their local machine. There’s nothing wrong with that either, but what if we could bring the entire data science workflow to the cloud? Well, we can!
From data cleaning to model deployment, there’s a cloud-based tool that you can use to modernize your workflow. In this article, I’m going to go through each step of the data science workflow, show how you can move it to the cloud, and provide some examples along the way. Feel free to skip around if you’ve already modernized part of your workflow, but if you want the 100% cloud data science experience, stay tuned!
Data Collection and Storage on the Cloud
Chances are you’re already familiar with the benefits of storing data on the cloud, but in case you haven’t heard: it’s pretty great! Storing your data on the cloud lets you access your data from anywhere with an internet connection, integrate it easily with other cloud services, scale your storage capacity to as much as you need, create backups for recovery, and many other very helpful things.
Whether you need a data warehouse, a data lake, or object storage, your data has to live somewhere if you want to make it available to other applications. There are tons of services that offer cloud data storage; some of the more popular ones include:
AWS S3
Azure Blob Storage
Google Cloud Storage
Hadoop
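To make this a little more concrete, here’s a minimal sketch of what pushing a small dataset to object storage can look like in Python. It assumes an S3-style client, such as the one returned by boto3.client("s3"), that exposes put_object(Bucket=..., Key=..., Body=...). The helper names rows_to_csv_bytes and upload_rows, along with the bucket and key in the usage comment, are made up for illustration.

```python
import csv
import io


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of dicts to UTF-8 CSV bytes, header row included."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def upload_rows(client, bucket, key, rows, fieldnames):
    """Upload rows as a single CSV object and return its s3:// URI.

    `client` is assumed to be an S3-style client with a
    put_object(Bucket=..., Key=..., Body=...) method, as in boto3.
    """
    body = rows_to_csv_bytes(rows, fieldnames)
    client.put_object(Bucket=bucket, Key=key, Body=body)
    return f"s3://{bucket}/{key}"


# Example usage (requires boto3 and configured AWS credentials;
# the bucket and key below are hypothetical):
#
#   import boto3
#   client = boto3.client("s3")
#   rows = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]
#   upload_rows(client, "my-example-bucket", "raw/sales.csv", rows, ["id", "amount"])
```

Building the CSV in memory keeps the sketch short, but for larger files you would probably want to write to disk and upload the file instead. Azure Blob Storage and Google Cloud Storage have their own Python clients with a similar upload pattern.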