5 ½ Ways to get your Data to the Cloud

Jenny Brown
6 min read · Sep 10, 2019


*Check out the updated version of this post, here*

Once you’ve got a bucket, you can put data (objects) in it — but where to start? There are plenty of options, and we’ll go over 5.5 primary ways to get your data up to the cloud. Without giving too much away, the .5 method covers movement from one cloud to another.

We’ll organize them in order of “easiest” to “less easy” so you know what you’re getting into, but keep in mind that these should all be relatively straightforward, especially if they align with your goals.

1. The Google Cloud Console

The Cloud Console is a very straight-to-the-point option, as it really only requires you to log in and follow a few steps.

This is your option if you have a simple serving situation or static data.

You open the Cloud Storage browser, select the bucket you want to upload to, and then select the objects to upload, from individual files to folders.

This route provides you with an in-browser experience where you can easily click to create buckets and folders, and then choose, or even drag and drop the files from your local machine to upload.

2. The Command Line Tool

Also known as gsutil, the command line tool will help you where the console has a tough time. While the console is fine for easy uploads and simple serving situations, production environments often call for an automated, command line solution.

For this, we provide gsutil, a Python application that lets you access Cloud Storage from the command line. It gives you the ability to do all sorts of things, like creating buckets, moving objects, or even editing metadata.

To use it, simply run the gsutil program with any of its many command line options. For example, this command uploads a directory of files from your local machine to your GCS bucket using parallel upload.

gsutil -m cp -r dir gs://my-bucket

And this command uses a wildcard to list specific objects, including noncurrent versions, along with their version-specific URLs.

gsutil ls -a gs://bucket/object1 gs://bucket/images/*.jpg

You can also use this command to run parallel composite uploads, which help with performance by splitting large files into components that upload in parallel, cutting total transfer time. We'll devote an entire blog to the ins and outs of that later on.

gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile gs://your-bucket

3. Client Libraries

At some point, you might need to interface with GCS directly from your code, rather than going out to a command line. You can include the client libraries in your code and call a simple API to get data into a bucket or folder. And regardless of your language, we've got you covered: C++, C#, Go, Java, Node.js, PHP, Python and Ruby. You can read more about Client Libraries in this general explanation, and specifically about Cloud Storage client libraries, from installation to authentication, in this article for all languages.

There are even code samples to get you up and running, like this Python sample for uploading to a bucket.

from google.cloud import storage

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print('File {} uploaded to {}.'.format(
        source_file_name,
        destination_blob_name))
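
Once the library is installed (pip install google-cloud-storage) and your environment is authenticated, for instance with Application Default Credentials, calling it is a one-liner. The bucket and file names here are just placeholders:

upload_blob('my-bucket', 'local/photo.jpg', 'uploads/photo.jpg')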

4. REST APIs

If none of that does the trick, or if you’re in a locked down firewall situation, there’s always the good-old REST APIs, JSON and XML, which can let you kick off an HTTP POST request to upload data directly to a bucket or folder.

It’s a bit more complex, but it’s there, in case it fits your use case.

JSON API

The JSON API is intended for software developers who are familiar with web programming. It's fully compatible with the Client Libraries, and is designed for accessing and manipulating your Cloud Storage projects in a more programmatic way. The API is activated by default for new projects, and easy to turn on for existing projects from the Google Cloud Storage JSON API page in the Console API Library. With the API you can do three kinds of uploads. Simple uploads, for smaller files that don't include metadata and won't be too disrupted if the connection fails:

uploadType=media

Multipart uploads, for similarly sized files that do include metadata:

uploadType=multipart

And resumable uploads, for files of any size, without needing to worry about a connection interruption:

uploadType=resumable
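
To make that concrete, here's a minimal sketch of a simple upload in Python using the requests library. It assumes you already have an OAuth 2.0 access token with write access to the bucket; the bucket and object names are placeholders:

import requests

def simple_upload(access_token, bucket_name, object_name, data):
    # Simple upload: a single request carrying only the media,
    # with the object name passed as a query parameter.
    url = ('https://storage.googleapis.com/upload/storage/v1/b/'
           '{}/o?uploadType=media&name={}'.format(bucket_name, object_name))
    response = requests.post(
        url,
        headers={
            'Authorization': 'Bearer {}'.format(access_token),
            'Content-Type': 'application/octet-stream',
        },
        data=data,
    )
    response.raise_for_status()
    return response.json()  # Metadata of the newly created object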

XML API

The Cloud Storage XML API is also aimed at developers with web programming familiarity: those comfortable with RESTful services and with creating applications that operate through HTTP requests.

When you’re using tools and libraries that have to work across different storage providers, or when you’re migrating from another provider to Google Cloud Storage, the XML API will be helpful, as it provides a web interface for making HTTP requests and handling HTTP responses.

The Cloud Storage XML API provides two standard HTTP methods for uploading data: POST Object and PUT Object. Unless you need to use HTML forms (usually through a web browser) to upload objects, we strongly recommend using PUT Object instead of POST Object.
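
For comparison, here's the same kind of upload sketched against the XML API with PUT Object. Again this assumes an OAuth 2.0 access token, and the names are placeholders:

import requests

def put_object(access_token, bucket_name, object_name, data):
    # PUT Object: the bucket and object names form the URL path.
    url = 'https://storage.googleapis.com/{}/{}'.format(
        bucket_name, object_name)
    response = requests.put(
        url,
        headers={'Authorization': 'Bearer {}'.format(access_token)},
        data=data,
    )
    response.raise_for_status()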

5. Google Cloud Storage Transfer Appliance

Depending on your amount of data, an offline data transfer might make the most sense. For a tremendous amount of data, you can use our secure, rackable, high-capacity storage server to upload data directly from your datacenter or on-prem systems.

All you need to do is request the Transfer Appliance, which you can do from the console, and after a quick consultation we ship you a device in the best size for your needs.

From there you just connect it for a secure high speed transfer, and send it back to us. We’ll alert you when your data is ready, and you’re good to go from there. More information on the Transfer Appliance can be found here.

5.5 Storage Transfer Service

If your data is already online in another cloud solution, you can quickly import it into Cloud Storage. This service will get you up and running with your multicloud environment, and can even help you transfer data within Cloud Storage, from one bucket to another. More information on that, and a helpful step-by-step guide, here.

The Transfer Service transfers data from an online data source (an HTTP/HTTPS location) to an online data sink, which is essentially the data's destination. The service performs each transfer with a transfer operation, which you schedule and configure through a transfer job. This makes transfer and synchronization between data sources and sinks much easier, even when it comes to repeating schedules, deleting source data, or reconfiguring buckets.

You can work with the Storage Transfer Service in the Google Cloud Platform Console UI, which is easiest, and also with the Google API Client Library of your choice, or even the REST APIs.
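
As a rough sketch of the programmatic route, here's how creating a one-time bucket-to-bucket transfer job might look with the Google API Client Library for Python. The project and bucket names are placeholders, and it assumes Application Default Credentials are set up:

import googleapiclient.discovery

def create_transfer_job(project_id, source_bucket, sink_bucket):
    client = googleapiclient.discovery.build('storagetransfer', 'v1')
    job = {
        'description': 'One-time bucket-to-bucket transfer',
        'status': 'ENABLED',
        'projectId': project_id,
        'transferSpec': {
            'gcsDataSource': {'bucketName': source_bucket},
            'gcsDataSink': {'bucketName': sink_bucket},
        },
        'schedule': {
            # Matching start and end dates make the job run just once.
            'scheduleStartDate': {'year': 2019, 'month': 9, 'day': 10},
            'scheduleEndDate': {'year': 2019, 'month': 9, 'day': 10},
        },
    }
    return client.transferJobs().create(body=job).execute()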

You may recall that gsutil can also help you transfer data between Cloud Storage and other locations. While it could handle a transfer from another cloud provider, we'd suggest using the Storage Transfer Service instead. If you're transferring data from an on-premises location, gsutil could be a better option, but it will depend on your use case.

And Now That We're in the Cloud…

All of the methods we describe are laid out in further detail in this article about uploading objects, so feel free to check it out if you have more questions. Of course, there are more than a few different things to do with your data once it's uploaded into Google Cloud Storage, but that's for the next blog.


Written by Jenny Brown

Google Cloud Developer Advocate, Thinker, Feeler, Adventurer, Surfer, Burner. Opinions are my own, but I’m happy to share.
