Creating a Bucket — A Deep Dive

*Check out the updated version of this post here*

Buckets are the foundation of everything you do with your data in Google Cloud Storage (GCS).

Anything you want to store in Cloud Storage has to live in a bucket before you can do anything with it, so buckets are the obvious starting point.

Before you jump in, there are some decisions to make, so that when you create your bucket you’re prepared to specify four things: a globally unique, static name; a storage class; a location; and an access control model.
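To make those four decisions concrete, here’s a minimal Python sketch that bundles them into a dictionary shaped like the bucket resource the Cloud Storage JSON API accepts (`name`, `storageClass`, `location`, `iamConfiguration.uniformBucketLevelAccess`). The helper `bucket_request_body` and every value passed to it are hypothetical placeholders. It only builds the request body and doesn’t call any Google API.

```python
import json


def bucket_request_body(name: str, storage_class: str, location: str,
                        uniform_access: bool) -> dict:
    """Bundle the four creation-time decisions into a JSON API-style bucket resource.

    Hypothetical helper: it only builds a dict, it doesn't create anything.
    """
    return {
        "name": name,                   # globally unique, and it can't be changed later
        "storageClass": storage_class,  # e.g. "REGIONAL", "MULTI_REGIONAL", "NEARLINE", "COLDLINE"
        "location": location,           # a region such as "us-east1" or a multi-region such as "US"
        "iamConfiguration": {
            # True: permissions only at the bucket level; False: per-object ACLs allowed too
            "uniformBucketLevelAccess": {"enabled": uniform_access},
        },
    }


body = bucket_request_body("jbro-team-assets", "NEARLINE", "us-east1", uniform_access=True)
print(json.dumps(body, indent=2))
```

With the `gsutil` CLI, the same choices map onto `gsutil mb -c NEARLINE -l us-east1 -b on gs://BUCKET_NAME`, where `-b on` turns on uniform bucket-level access.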

Your Globally Unique Static Name

As with any data storage system you work with (your desktop or laptop operating system, for example), your bucket needs a name so that you can reference it later. But unlike your local operating system, naming a cloud bucket comes with a few restrictions.

Firstly, understand that all bucket names, for all buckets (FOR ALL USERS!), share a single namespace. In simpler terms, every bucket name has to be unique across all of Cloud Storage, and you should treat bucket names as publicly visible (even though access to those buckets might not be). So picking your bucket name is a lot like choosing a username for a new website or social service: you need to make it descriptive enough for you to know what it is, but unique enough that no one else has it (I like jbro-roxxx-546, personally...).

Secondly, there’s a set of restrictions on the name itself. The main ones are a character limit and limits on which characters you can use (you can have fun discovering these yourself, or click here for an exhaustive list).
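If you’d rather not discover the rules by trial and error, a local pre-check can catch the obvious mistakes before you try to create the bucket. Here’s a minimal Python sketch. It assumes the naming rules as documented at the time of writing:

- lowercase letters, digits, dashes, underscores and dots only (no uppercase);
- 3 to 63 characters, or up to 222 for dotted names;
- no names that look like an IP address;
- no "goog" prefix and no "google" in the name.

`is_valid_bucket_name` is a hypothetical helper, not part of any Google library. It doesn’t cover everything: close misspellings of "google" and the domain verification required for dotted names are not checked. It also can’t tell you whether someone already owns the name. Only an actual create call can.

```python
import ipaddress
import re

# Lowercase letters, digits, dashes, underscores and dots only;
# the first and last characters must be a letter or a digit.
_NAME_CHARS = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Local pre-check against a subset of the documented bucket naming rules."""
    # 3-63 characters, or up to 222 if the name contains dots.
    max_len = 222 if "." in name else 63
    if not 3 <= len(name) <= max_len:
        return False
    if _NAME_CHARS.fullmatch(name) is None:
        return False
    # Each dot-separated component can be at most 63 characters.
    if any(len(part) > 63 for part in name.split(".")):
        return False
    # Names that look like an IPv4 address in dotted-decimal notation are rejected.
    try:
        ipaddress.IPv4Address(name)
        return False
    except ValueError:
        pass
    # Reserved words. GCS also rejects close misspellings of "google"
    # (such as "g00gle"), which this sketch does not try to detect.
    if name.startswith("goog") or "google" in name:
        return False
    return True


for candidate in ["jbro-roxxx-546", "JBRO-ROXXX-546", "192.168.5.4", "my-google-bucket"]:
    print(candidate, is_valid_bucket_name(candidate))
```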

Finally, understand that the name of your bucket is FOREVER. Buckets can’t be renamed after they’re created. If you want to change the name, you have to do a weird dance: create a new bucket with the name you want, copy your contents over, and then delete the old bucket (check this article for help with moving and renaming buckets).
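That dance can be scripted. This is a minimal sketch, assuming the `gsutil` CLI. It only builds the three command lines and doesn’t run anything, and the bucket names are placeholders.

```python
from typing import List


def rename_bucket_commands(old: str, new: str, location: str) -> List[List[str]]:
    """Build (but don't run) the gsutil commands for the create/copy/delete dance."""
    return [
        # 1. Create a bucket with the name you actually want.
        ["gsutil", "mb", "-l", location, f"gs://{new}"],
        # 2. Copy every object over; -m parallelizes, -r recurses into "folders".
        ["gsutil", "-m", "rsync", "-r", f"gs://{old}", f"gs://{new}"],
        # 3. After verifying the copy, delete the old bucket and everything in it.
        ["gsutil", "rm", "-r", f"gs://{old}"],
    ]


for cmd in rename_bucket_commands("jbro-old-name", "jbro-new-name", "us-east1"):
    print(" ".join(cmd))
```

To actually run the commands, you could pass each list to `subprocess.run(cmd, check=True)`. Keeping the last step manual is a reasonable safeguard, since it’s irreversible. Bucket-level settings such as IAM policies aren’t copied by `rsync`, so re-apply them on the new bucket.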


Someone already took this name! Drats!

Storage Classes

Once your name is chosen, the next thing you need to do is decide what TYPE of bucket you’re creating. Since GCS is a cloud-based service, there are lots of nuances to consider:

- how your data is stored;
- where it’s stored;
- what kind of performance and resiliency you want in case of an emergency.

As such, GCS defines four storage classes (bucket types) you can choose from: Regional, Multi-Regional, Nearline, and Coldline.

Real-time content serving

If you’re going to serve the content of this bucket to your users, then the Regional and Multi-Regional bucket types are what you should be looking for. Both offer low latency and high durability, which is perfect for serving content for your real-time application needs (either to users or to your backends). By default, every object you add to the bucket inherits the bucket’s storage class. The only difference between the two is location, or rather, the location of the data centers your content is served from.

A quick aside — Regions and zones

I might be getting ahead of myself here, but GCP services are available at locations worldwide, and those locations are divided into regions and zones. A region is a specific geographic location, such as us-central1. Each region contains a number of zones, which are isolated deployment areas inside that region. Alright, back to it!

Regional || Multiregional

A Regional bucket lets you pick a single region, meaning one specific geographic area, in which to store your data and from which to serve it. This is ideal in situations where you have a single on-prem system, or a compute backend housed in the same geographic area. Basically, you know that no one outside of that area is going to access the content, so there’s no need to copy or store it in any other geo-locations.

A Multi-Regional bucket is for when you expect content to be accessed by clients in multiple geographic locations. GCS stores the data in this bucket redundantly in at least two geographic places within a large area (such as the United States or the European Union), so it can be served with low latency to users spread across that area. Multi-Regional buckets are ideal for reducing distance-based latency for widely distributed users. They’re also really important for protecting your data from geographic outages.

In general, if you’re serving content to users across a spectrum of geolocations, Multi-Regional buckets are the way to go.

Backups & Recovery Serving

Besides serving your data with low latency to a large number of locations around the world, GCS is also really good at storing data you don’t need right now. This is ideal when you want to back up your data, keep copies for regulatory reasons, or plan for disaster recovery.

A Nearline bucket is for content you only need occasionally. If you access it about once a month or less, Nearline is the right place to store it, given its lower storage price and higher access price.

A Coldline bucket, on the other hand, is where you store data you only access about once a year.

The tradeoff is straightforward: these classes let you pay less to store the data (compared to their real-time counterparts). In exchange, the operational costs (the price to access the data) are higher when you do access it.
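To see that tradeoff in numbers, here’s a small Python sketch that totals a month’s storage and retrieval charges per class and picks the cheapest. The prices in it are made-up placeholders, chosen only to show the shape of the tradeoff; check the Cloud Storage pricing page for real ones. It also ignores operation charges, network egress, and the minimum storage durations that Nearline (30 days) and Coldline (90 days) carry.

```python
# Illustrative placeholder prices in USD, NOT current list prices:
# storage class -> (storage per GB-month, retrieval per GB read)
PRICES = {
    "REGIONAL": (0.020, 0.00),
    "NEARLINE": (0.010, 0.01),
    "COLDLINE": (0.007, 0.05),
}


def monthly_cost(storage_class: str, stored_gb: float, read_gb: float) -> float:
    """Storage charge plus retrieval charge for one month.

    Ignores operation charges, network egress, and minimum storage durations.
    """
    storage_price, retrieval_price = PRICES[storage_class]
    return stored_gb * storage_price + read_gb * retrieval_price


def cheapest_class(stored_gb: float, read_gb: float) -> str:
    """Pick the class with the lowest monthly cost for this access pattern."""
    return min(PRICES, key=lambda cls: monthly_cost(cls, stored_gb, read_gb))


# 1 TB stored, with three different amounts read back per month
print(cheapest_class(1000, 10))    # COLDLINE: rarely touched
print(cheapest_class(1000, 100))   # NEARLINE
print(cheapest_class(1000, 3000))  # REGIONAL: read constantly
```

With these placeholder prices, reading back 10 GB of a 1 TB bucket costs about $7.50 a month in Coldline. Reading 100 GB makes Nearline ($11) cheaper than Coldline ($12). Reading the whole dataset several times a month makes the Regional class the clear winner.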

Change your mind?

There are many situations where you might need to change the TYPE of bucket you’ve chosen. For example, you upload a lot of data and serve it, but then need to upload a new version while keeping the old version for historical records. Patterns like this are why you can change the storage class of a bucket after it’s created, by following the steps in this article. Keep in mind that changing the bucket’s default storage class only applies to objects added afterwards. Objects already in the bucket keep their class until they’re rewritten.
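Assuming the `gsutil` CLI, here’s a sketch of both halves of a class change:

- setting the bucket’s new default storage class (`gsutil defstorageclass set`);
- rewriting objects that are already stored into that class (`gsutil rewrite -s`).

The bucket and object names are hypothetical placeholders. The function only builds the commands and doesn’t run them.

```python
from typing import Iterable, List


def change_class_commands(bucket: str, new_class: str,
                          existing_objects: Iterable[str] = ()) -> List[List[str]]:
    """Build (but don't run) gsutil commands to move a bucket to a new storage class."""
    # New default: applies to objects written to the bucket from now on.
    cmds = [["gsutil", "defstorageclass", "set", new_class, f"gs://{bucket}"]]
    # Objects already in the bucket keep their class until they're rewritten.
    for name in existing_objects:
        cmds.append(["gsutil", "rewrite", "-s", new_class, f"gs://{bucket}/{name}"])
    return cmds


for cmd in change_class_commands("jbro-reports", "COLDLINE", ["2018/report-v1.csv"]):
    print(" ".join(cmd))
```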

Location

As I mentioned above, GCP services are divided into regions and zones, and the next step in creating your bucket is figuring out where, geographically, it should call home base.
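Zone names make the region/zone hierarchy easy to see: a zone’s name is its region’s name plus a single-letter suffix. Here’s a tiny Python sketch, assuming that `<region>-<letter>` naming pattern holds for the zones you use; `region_of_zone` is a hypothetical helper. Note that Cloud Storage buckets are placed in regions (or dual- and multi-regions), not in individual zones.

```python
def region_of_zone(zone: str) -> str:
    """Return the region a zone belongs to, e.g. 'us-central1-a' -> 'us-central1'.

    Assumes the usual '<region>-<letter>' zone naming pattern.
    """
    region, sep, suffix = zone.rpartition("-")
    if not sep or len(suffix) != 1 or not suffix.isalpha():
        raise ValueError(f"not a zone name: {zone!r}")
    return region


print(region_of_zone("us-central1-a"))   # us-central1
print(region_of_zone("europe-west1-b"))  # europe-west1
```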

While the choice of bucket type is driven by redundancy and access frequency, your location is all about where your clients access the data from. It also depends on what time to first byte you expect when caching is turned off.

That’s a lot of regions!

Depending on the bucket type you’ve chosen, you’ll pick either a regional location or a multi-regional location.

A regional location is a specific geographic place, such as a city. Regions are listed by areas you’re likely to be familiar with.

A multi-regional location is a large geographic area, such as the United States, that contains at least two geographic places.

Just as with storage classes, choosing a multi-regional location will increase your geo-redundancy, and help if you have distributed users, whereas regional locations will help you optimize latency and network bandwidth for users that are grouped in the same region.

To get the best of both worlds, you could even look into a dual-region location. It gives you regional-like performance with the added geo-redundancy of having two specific regions instead of one!
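The rule of thumb above (one client area means regional, two means dual-region, more means multi-region) fits in a few lines of Python. This is a hypothetical helper, not an official tool. The predefined dual-region codes and pairs in it are as listed at the time of writing, so check the Cloud Storage locations page for the current list. The sketch also doesn’t check whether Cloud Storage actually offers a given combination (dual-regions and multi-regions each stay within one continent), so treat a `None` code as "go look it up".

```python
from typing import Iterable, Optional, Tuple

# Predefined dual-region codes and the region pairs they cover
# (as listed at the time of writing; check the Cloud Storage locations page).
PREDEFINED_DUAL_REGIONS = {
    frozenset({"us-central1", "us-east1"}): "NAM4",
    frozenset({"europe-north1", "europe-west4"}): "EUR4",
    frozenset({"asia-northeast1", "asia-northeast2"}): "ASIA1",
}


def suggest_location(client_regions: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Rule of thumb: one client region -> regional, two -> dual-region, more -> multi-region.

    Returns (location type, location code if this sketch knows one, else None).
    """
    regions = frozenset(client_regions)
    if not regions:
        raise ValueError("need at least one client region")
    if len(regions) == 1:
        return "regional", next(iter(regions))
    if len(regions) == 2:
        return "dual-region", PREDEFINED_DUAL_REGIONS.get(regions)
    return "multi-region", None


print(suggest_location(["us-east1"]))
print(suggest_location(["us-east1", "us-central1"]))
```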

Controlling Access

After establishing the name, location, and storage class of your bucket, you’ll want to decide on the access control model you’d like for the bucket — meaning who and what has access to the contents.

Truth be told, there are a lot of nuances in this decision that depend on how your bucket is going to be used in the future. Unless you’ve done this a few times and kicked the tires, you might not spot them at first glance.

In simple terms, Cloud Storage uses GCP Identity and Access Management (IAM) policies to determine who has access to a specific resource (for example, you might want your finance team to have access, but not your PR team). For GCS, you need to figure out how fine-grained that control has to be.
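An IAM policy is essentially a list of bindings, each pairing a role with the members who hold it. Here’s a minimal Python sketch of granting the finance team read access. It assumes the JSON shape of an IAM policy (`bindings`, each with a `role` and `members`). The group address is a hypothetical placeholder, and `add_binding` is a local helper that edits a policy dictionary rather than calling the API.

```python
import copy


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy with `member` granted `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical group address: let the finance team read objects in the bucket.
policy = add_binding({"bindings": []}, "roles/storage.objectViewer",
                     "group:finance@example.com")
print(policy)
```

For a public bucket the member would be `allUsers`. With the CLI, that grant is `gsutil iam ch allUsers:objectViewer gs://BUCKET_NAME`.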

Setting permissions uniformly at the bucket level means that you won’t be able to set per-object access later on. This is perfect for situations where you have a public bucket and want to serve assets to your website or clients. You wouldn’t want to set the access on each object individually, because that’s insane.

If you’re serving content based on access control lists, however, then combining object-level and bucket-level permissions will give you the control you need to handle these situations as they come up.
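With fine-grained access, a read can be granted at either level. Here is a deliberately simplified Python model of that idea. It assumes the bucket’s IAM members and the object’s ACL entries (in the JSON API’s `entity`/`role` shape) have already been fetched. `can_read` is hypothetical. The real evaluation also covers groups, domains, project roles, and more.

```python
from typing import Dict, Iterable, Set


def can_read(email: str, bucket_viewers: Set[str],
             object_acl: Iterable[Dict[str, str]]) -> bool:
    """Simplified fine-grained check: bucket-level IAM OR a per-object ACL entry grants read.

    bucket_viewers holds IAM members (e.g. "user:a@example.com", "allUsers");
    object_acl holds JSON API-style entries like {"entity": "user-a@example.com", "role": "READER"}.
    Groups, domains and project roles are ignored in this sketch.
    """
    if f"user:{email}" in bucket_viewers or "allUsers" in bucket_viewers:
        return True
    entities = {f"user-{email}", "allUsers"}
    return any(entry["entity"] in entities and entry["role"] in ("READER", "OWNER")
               for entry in object_acl)


viewers = {"user:cfo@example.com"}
report_acl = [{"entity": "user-auditor@example.com", "role": "READER"}]
print(can_read("auditor@example.com", viewers, report_acl))  # True: the object ACL grants it
print(can_read("pr@example.com", viewers, report_acl))       # False
```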

Get Served!

Creating a GCS bucket is the first step in your adventure of serving content to the masses. Each of the options I’ve covered above has its own nuances, so stay tuned, because I’m going to dig deeper into each one in future videos and blog posts.

Google Cloud Developer Advocate, Thinker, Feeler, Adventurer, Surfer, Burner. Opinions are my own, but I’m happy to share.