
Fixing Google Crawl Errors with AWS S3 Hosting and a CloudFront CDN

Ugh. I truly hate Amazon Web Services.

Sure, if you’ve got years of network admin experience, a cloud computing degree, or just general good sense and an endless amount of patience, AWS will be a breeze. If you’re anything like me (impatient, easily confused, and quick to anger), then AWS can be the stuff of horrible, horrible nightmares…

So with that in mind, I’m going to write a quick post to try and remember how I’ve done stuff, and how you might avoid some of the problems I’ve been facing of late.

TL;DR

If you’re redirecting from http to https, or from one domain to another, you’ll more than likely need to create multiple S3 buckets: one to actually host your site, and others to catch each extra URL, with Route 53 pointing at them and S3 redirecting to your main bucket. Also, if you’re using React Router, you’ll need to configure AWS CloudFront with a Custom Error Response directing the browser to your main index.html file (where React Router will then route to the correct place). This matters mainly for Google’s web crawlers, which will otherwise get a 404 and drop the subpages of your app from Google’s indexes. Ugh.

MY SET-UP

So, with my FretMap app, here’s the situation:

I basically just want to host a static website (no backend, no server-side rendering) in an AWS S3 bucket, and have CloudFront serve it up from different spots across the world. Simple, right? Well, not for me.

My first attempt at using AWS was buying into the Elastic Beanstalk promise of:

…Quickly deploy and manage applications in the AWS Cloud without worrying about the infrastructure that runs those applications.

Which obviously turned out to be just a sales pitch, as I spent at least a couple of weeks wrestling with config, https certification and redirect hell, caching issues, and oodles of impenetrable AWS documentation, and ended up with nothing to show for it. I even forked out to Amazon for support (they charge you a minimum of one month’s support fees and only respond occasionally within 9–5 working hours – often you’ll wait a couple of days for a response).

So eventually I gave up and decided to try hosting my website in an AWS S3 bucket (usually these are used for hosting images, files, etc., but they can be used to host static websites too).

After setting up an S3 bucket, I used Travis CI to grab my site from Github, run tests on it, and, if they pass, deploy the site to my S3 bucket (read up on that here). So far so good, but a problem I faced was redirecting the http version of my site to https (and also www to non-www). To do this, it seems you’ll need to create multiple S3 buckets: one to host your site, and others that you attach a Route 53 domain name to and configure to redirect all requests to your main bucket, as in the sketch below.
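For reference, here’s roughly what one of those redirect buckets looks like as an API call instead of console clicks – a minimal sketch using the AWS SDK for JavaScript (v2), with made-up bucket and domain names. You’d still need to point a Route 53 alias record at the bucket’s website endpoint separately.

```js
// Configure a "catch" bucket that hosts no files and simply
// redirects every request to the main domain over https.
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'eu-west-1' });

s3.putBucketWebsite({
  Bucket: 'www.example.com', // hypothetical catch bucket
  WebsiteConfiguration: {
    RedirectAllRequestsTo: {
      HostName: 'example.com', // hypothetical main domain
      Protocol: 'https',
    },
  },
}, (err) => {
  if (err) console.error(err);
  else console.log('Redirect bucket configured');
});
```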

Here are a couple of good resources on how to do that:

WON’T SOMEBODY PLEASE THINK OF THE ROUTING

For the React Router side of things, it’s important to remember that the subpages of your static single-page React app don’t actually exist on the server; they’re generated by JS on the fly. This is important because:

Server: mysite.com → ALL GOOD! (return 200)
Server: mysite.com/about → WHAT? THAT DOESN’T EXIST! (return 404)
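To make that concrete, here’s a bare-bones sketch of the client-side routing (assuming React Router v4/v5 – your routes will differ). The /about page only exists once this JS has loaded in the browser; the server has no /about file to serve.

```js
import React from 'react';
import ReactDOM from 'react-dom';
import { BrowserRouter, Switch, Route } from 'react-router-dom';

const Home = () => <h1>Home</h1>;
const About = () => <h1>About</h1>;

// These routes exist only client-side: the server just has
// index.html, so a direct request for /about matches nothing.
ReactDOM.render(
  <BrowserRouter>
    <Switch>
      <Route exact path="/" component={Home} />
      <Route path="/about" component={About} />
    </Switch>
  </BrowserRouter>,
  document.getElementById('root')
);
```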

The above problem means that your users (and search engines) won’t be able to reach any of your subpages by hitting them directly (though the pages will likely work if you land on the homepage first). To get around this with AWS and CloudFront, you’ll need to do the following:

CONFIGURE STATIC WEB HOSTING IN S3

Go to:

  1. AWS
  2. S3
  3. Click on the name of your bucket where your site is being hosted
  4. Properties
  5. Static Website Hosting
  6. Use this bucket to host a Website
  7. Make sure the Index document and the Error document both reference your main root file (i.e. index.html) – there’s a scripted version of this below
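Those console steps boil down to a single API call, if you’d rather script it. A minimal sketch with the AWS SDK for JavaScript (v2) and a made-up bucket name – the key point is that the Error document also points at index.html, so unknown paths fall through to React Router.

```js
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'eu-west-1' });

// Both documents reference the root file: any path S3 can't find
// serves index.html, and React Router takes it from there.
s3.putBucketWebsite({
  Bucket: 'example.com', // hypothetical hosting bucket
  WebsiteConfiguration: {
    IndexDocument: { Suffix: 'index.html' },
    ErrorDocument: { Key: 'index.html' },
  },
}, (err) => {
  if (err) console.error(err);
  else console.log('Static website hosting configured');
});
```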

CONFIGURE CLOUDFRONT TO CREATE A CUSTOM ERROR RESPONSE

  1. AWS
  2. CloudFront
  3. Click on the ID name of each of your CloudFront Distributions
  4. Error Pages (tab)
  5. Click Create Custom Error Response
  6. Set HTTP Error Code to 404: Not Found, Response Page Path to /index.html, and HTTP Response Code to 200

Note: Make sure your Response Page Path relates to your root document, i.e. index.html.
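Again, if you’d rather script this than click through the console, here’s a sketch of the same change via the CloudFront API (AWS SDK for JavaScript v2, with a made-up distribution ID). CloudFront makes you fetch the whole existing config and its ETag before you can update anything.

```js
const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();
const DISTRIBUTION_ID = 'E1ABCDEFGHIJKL'; // hypothetical

cloudfront.getDistributionConfig({ Id: DISTRIBUTION_ID }, (err, data) => {
  if (err) return console.error(err);

  const config = data.DistributionConfig;
  // Turn 404s into a 200 serving the root document, so deep
  // links resolve for users and crawlers alike.
  config.CustomErrorResponses = {
    Quantity: 1,
    Items: [{
      ErrorCode: 404,
      ResponsePagePath: '/index.html',
      ResponseCode: '200',
      ErrorCachingMinTTL: 0,
    }],
  };

  cloudfront.updateDistribution({
    Id: DISTRIBUTION_ID,
    IfMatch: data.ETag, // required for updates
    DistributionConfig: config,
  }, (err2) => {
    if (err2) console.error(err2);
    else console.log('Custom error response configured');
  });
});
```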

CONCLUSION

As it stands, my website (including subpages) is navigable by users and indexable by search engines. You can test this using Google Webmasters > Fetch as Google, which should now hopefully return nice happy 200s rather than angry red 404s.

As a side note, a cause of Fetch as Google returning a blank page, or a soft 404 / temporarily unavailable, can be down to it using an older JavaScript engine which won’t run your JavaScript if you’re using ES6+. Make sure you have all your nice new JS features transpiled correctly through Babel, with any missing browser APIs patched by something like Polyfill.io.
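For example, here’s a minimal Babel config – a sketch assuming @babel/preset-env, with the browser targets purely illustrative:

```js
// babel.config.js – compiles ES6+ syntax down for older JS engines.
// Babel only rewrites syntax; missing runtime APIs still need a
// polyfill (e.g. Polyfill.io or core-js).
module.exports = {
  presets: [
    ['@babel/preset-env', { targets: '> 0.25%, not dead' }],
  ],
};
```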

PS. Sorry this post is rambling; I honestly can’t remember a lot of what I had to do to get my site working, which is almost completely down to the complexity of using AWS for idiots like myself – but hopefully there’s something here that can be of use.