A couple of weeks ago I took my first, tentative steps with hosting on AWS. As part of a mini project for a Python/Django tutorial on Udemy, I was supposed to deploy my finished Django blog on Elastic Beanstalk (EB). This post is an overview of the litany of things that went wrong, and how I overcame each of them. At the end I discuss my overall impression of the AWS experience as a complete beginner. Let's dive in.
Uploading a zip file? Not so fast...
The method shown by the teacher on Udemy was fairly simple: create a new application on AWS with the required environment (Python 3.7), upload a zip of the necessary project files, edit a couple of environment variables (inexplicably called "environment properties" on EB), click 'deploy' and presto! Your site is up and running.
When I tried to replicate these steps, a couple of things went wrong. Despite checking multiple times that I had exactly the right file structure in my zipped application folder, when I uploaded it to EB I got the spectacularly unhelpful message of "Validation error" (with no further detail).
What's more, clicking through to the environment properties panel, I was confronted with an empty box with no possible way to add any variables (sorry, Amazon, I'm going to keep calling them variables, just like 99.999% of other developers on the planet). Presumably this was somehow related to the cryptic validation error.
For reference, when creating a new application with the default demo app provided by AWS, this is what the working environment properties panel looks like:
Since I was using a Python 3.8 environment (the most recent available) whereas in the tutorial Python 3.7 was used, I wondered if maybe this environment was just borked. So I tried changing to Python 3.7 and re-uploading, but no dice.
After numerous failed attempts at repackaging the project, deleting and recreating the application, re-uploading the files and so on, and with nothing more to go on than "Validation error", I realised I would need to dive into the CLI to make any progress. From the complete absence of results online concerning my error, it seems either nobody uses the file upload method anyway, or I'm the first person in history to encounter a problem with it.
Bad gateway! Bad! BAD!
Fortunately there's plenty of documentation online about how to create an environment and deploy your application from the CLI. After installing the awsebcli package, following the steps in the linked page was pretty straightforward. This gave me an environment where I was actually able to edit the environment variables, so it was a step in the right direction. The environment was, however, showing a degraded health status, and on opening the page in the browser I was confronted with the following:
At least with a working environment I actually had some log files to look at to help me diagnose the problem. It seemed like a pretty high-level error: No module named 'application'
This sent me down a rabbit hole where I followed a few dead ends before working out the real cause of the problem. Various answers on StackOverflow suggested the following possibilities, none of which seemed to apply to me (although I tried various versions of each proposed fix just in case):
- A misconfigured django.config file
- Incompatibility between Amazon Linux 2 and Amazon Linux 1 environments
- Needing gunicorn to be installed
In despair, I finally tunnelled into the EC2 instance hosting my application via SSH (reader beware: you must set up SSH access when creating your environment: it cannot be allowed retroactively except by following some extremely convoluted process. I learned this the hard way and eventually just deleted and recreated my environment!). When I got there, what I saw made no sense. Crucial application files were missing! Among them was the .ebextensions/django.config file which points EB to the WSGI file.
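One quick sanity check for a "No module named ..." error is to ask Python directly whether a module can be found from the project root. The sketch below is only an illustration: my_blog.wsgi is the module path that appears in this post's django.config, application is the name from the error message, and the helper module_exists is a made-up name for this example.

```python
import importlib.util


def module_exists(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "my_blog") is itself missing.
        return False


if __name__ == "__main__":
    # Run from the directory that contains manage.py.
    for name in ("my_blog.wsgi", "application"):
        print(name, "->", "found" if module_exists(name) else "missing")
```

Run on the server over SSH, a check like this would report the WSGI module as missing if the project files never made it into the deployed bundle.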
It was only then that, looking further into the mystery, I discovered a neat little feature of eb deploy which is not mentioned explicitly in the AWS Django deployment guide but which I would have found if I'd been smart enough to click the link in the following note:
It turns out, to quote the documentation,
By default, the EB CLI deploys the latest commit in the current branch, using the commit ID and message as the application version label and description, respectively. If you want to deploy to your environment without committing, you can use the --staged option to deploy changes that have been added to the staging area.
My dumb ass hadn't committed the changes after creating the config file! A git commit and an eb deploy later, my environment was showing as healthy and I was finally able to load the page!
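Since eb deploy ships the latest commit by default, one way to avoid this trap is to check for uncommitted or untracked files before deploying. What follows is a rough sketch rather than a polished tool: it relies on the output of git status --porcelain, the function names are invented for this example, and it does not handle renames or quoted paths.

```python
import subprocess


def parse_porcelain(output: str) -> list[str]:
    """Extract paths from `git status --porcelain` output.

    Each line is two status characters, a space, then the path.
    """
    return [line[3:] for line in output.splitlines() if line.strip()]


def uncommitted_paths() -> list[str]:
    """Ask git which files would NOT be part of the latest commit."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)
```

Calling uncommitted_paths() from the project root before eb deploy and refusing to continue when the list is non-empty would have flagged the uncommitted django.config straight away.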
Database blues
...haha, of course I'm joking, the environment was still red and I had a new error in my log files. FML.
The error this time was related to the version of SQLite: deterministic=True requires SQLite 3.8.3 or higher. Sure enough, the Python sqlite3 documentation states that "this flag is supported by SQLite 3.8.3 or higher, NotSupportedError will be raised if used with older versions". Casting around on StackOverflow, it seems there are two possible solutions to this problem:
Option 1 wasn't acceptable because I already had the data I wanted to deploy in a SQLite database, so I went for option 2. As a side note, I know that nobody actually uses SQLite databases in production, but I really wanted to stick to the tutorial as closely as possible (one of the later stages would be to migrate from SQLite to PostgreSQL, so I knew the fix I implemented here would end up being redundant, but once I face a technical problem I'm always determined to get to the bottom of it!).
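A quick way to see which SQLite library a given Python interpreter is linked against is to ask the sqlite3 module itself. This is a minimal sketch; the 3.8.3 threshold comes from the error message above, and the helper name supports_deterministic is just for illustration.

```python
import sqlite3

# Minimum SQLite version quoted in the error message.
MIN_DETERMINISTIC = (3, 8, 3)


def supports_deterministic(version_info=sqlite3.sqlite_version_info) -> bool:
    """Return True if the given SQLite version accepts deterministic=True."""
    return tuple(version_info) >= MIN_DETERMINISTIC


print("SQLite library version:", sqlite3.sqlite_version)
print("deterministic=True supported:", supports_deterministic())
```

Running this both locally and on the EC2 instance shows at a glance whether the two environments differ.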
Once I'd overcome this error, I ran into yet another problem with the database which seemed to make no sense: sqlite3.OperationalError: no such table: blog_post.
What the hell!? I checked the database on my local machine using DB Browser for SQLite, and the blog_post table was definitely there. I then connected to the EC2 instance via SSH to confirm the database file was there, which it was. It was only on using sqlite3 in the shell to examine the file on EC2 that I realised...the database was completely empty! Not only was the blog_post table missing, but there were no tables at all!
I made sure my deployment was up-to-date with git status and ran eb deploy again, to no avail. This was a complete mystery. I then tried modifying the file, adding some data locally before committing and deploying again, to see if I could force it to update. Except...after adding data, git status was still clean. I...wha...GOD DAMN IT, MY SQLITE DATABASE IS IN .GITIGNORE!!! It seems that when you deploy to EB with no db.sqlite3 file in your project and you're using sqlite3, an empty database 'helpfully' gets created for you! (Strictly speaking this is SQLite's doing rather than EB's: connecting to a database file that doesn't exist simply creates a new, empty one.)
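The empty-database behaviour is easy to reproduce locally with Python's sqlite3 module, which by default creates a new database when asked to open a path that doesn't exist instead of raising an error. The sketch below uses a throwaway temporary directory, and list_tables is a helper name invented for this example.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in the SQLite file at `db_path`."""
    conn = sqlite3.connect(db_path)  # creates the database if it is missing
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db.sqlite3")
    print("exists before connecting:", os.path.exists(path))
    print("tables:", list_tables(path))
    print("exists after connecting:", os.path.exists(path))
```

Pointing list_tables at the deployed db.sqlite3 gives the same information as poking around with the sqlite3 shell: an empty list means the file is a freshly created database, not the one from the local machine.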
With that SNAFU cleared up, I finally had a working deployment and could, for the first time, load my page in the browser. There was just one small snag...
O static file, where art thou?
My page loaded, but with none of the static files. No images, no CSS, nothing. Once again StackOverflow came to the rescue, although I inadvertently ended up implementing two different solutions, either one of which would have worked.
All the suggestions were along the lines of updating the django.config file to point to the correct static files directories. In the tutorial no mention was made of this: the only config in that file concerned the WSGIPath.
Playing around with the static file paths in my settings.py file, I stumbled upon a partial solution. In development, the path to the static files is given as follows (and similarly for MEDIA_ROOT and MEDIA_URL):
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/3.2/howto/static-files/
STATIC_ROOT = BASE_DIR / 'static_files'
STATIC_URL = '/static/'
where BASE_DIR = Path(__file__).resolve().parent.parent. It turns out that for whatever reason this doesn't work when the application is deployed on EB. It was only once I removed BASE_DIR from the STATIC_ROOT path that I ended up seeing the files. Obviously this was kind of annoying because I would need to change it each time I wanted to run in debug mode!
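One way to avoid editing settings.py by hand each time would be to choose the path based on an environment variable. This is only a sketch of the idea, not what the tutorial does: DJANGO_DEPLOYED is a hypothetical variable that would have to be set in the EB environment properties, and both function names are made up for this example.

```python
import os
from pathlib import Path


def pick_static_root(base_dir, deployed: bool):
    """Relative path when deployed, BASE_DIR-anchored path in development."""
    if deployed:
        return "static_files"
    return Path(base_dir) / "static_files"


def is_deployed(environ=os.environ) -> bool:
    """True when the (hypothetical) DJANGO_DEPLOYED variable is set to "1"."""
    return environ.get("DJANGO_DEPLOYED") == "1"


# In settings.py this might be used as:
# STATIC_ROOT = pick_static_root(BASE_DIR, is_deployed())
```

The same switch could be reused for MEDIA_ROOT, so that the development and deployed configurations live side by side in one file.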
In the end, I went ahead and added the static file paths to the config as suggested in numerous answers on StackOverflow, ending up with the following (which allows the proxy server to serve the files):
option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: my_blog.wsgi
  aws:elasticbeanstalk:environment:proxy:staticfiles:
    "/static": "static_files"
    "/files": "uploads"
This finally allowed me to see my demo blog deployed on Elastic Beanstalk for the first time!
The rest of the Udemy tutorial was thankfully much easier to implement; I only came across a few minor bugs when following the steps to migrate to a PostgreSQL database on RDS and serve static files from a dedicated S3 bucket. Each of these was quickly solved with a Google search. A couple of fun examples are:
- RDS only supports the free tier for PostgreSQL up to version 12. It's a bit disconcerting when you try to create a database instance and see that it's going to cost something like $250/month!
- For absolutely no discernible reason, the interface to edit a policy in S3 shows up in Spanish!
Final thoughts
So, what is my overall impression of using AWS to deploy a basic website? I have to say that the interface and the documentation both leave a lot to be desired. I'm not the only one who feels this way: my brother, who has far more experience than I at deploying websites with numerous hosting providers, calls AWS "the environment of maximum frustration". Apparently,
every AWS service is dreadful with non-existent documentation and terrible UX, but each is awful in its own special way.
I don't know if I'd agree that the documentation is non-existent: there's plenty of it, but it's hard to navigate, and really key 'gotchas' that should be placed front-and-centre (like the eb deploy dependence on Git commits, or the fact that you can't retroactively allow SSH to EC2) are buried several layers deep. I will agree that from the beginning (useless, cryptic "Validation error" message) to the end (S3 bucket policy editor in Spanish for some reason), the UX is just horrible.
Another aspect of AWS that's slightly intimidating is their opaque pricing. Various services are free up to a certain amount of usage in the first 12 months, which is great for something like a tutorial where you really have no desire to pay for something you're subsequently going to delete, but there are so many services and so many dependencies between services that there's always a worry you'll inadvertently end up using a paid service and getting a surprise bill. Compare this to other hosting providers with a fixed monthly tariff, where you know exactly what services you're getting and how much it'll cost you.
So, on balance I feel AWS is terrible for beginners. Why do I think this is a good thing? Well, markets are better for everyone (except Jeff Bezos) when there's real competition and consumers have a choice. In this case, it seems people working on small projects are better catered to by smaller providers like DigitalOcean, Linode and Heroku. When you don't need the massive array of different services that AWS provides, you'll probably have an easier time using one of these services. This gives these smaller companies a niche to expand into and a market to capture. Imagine if hosting on AWS were as easy as shopping on Amazon: nobody would ever go anywhere else, and Amazon (already by far the largest cloud computing provider) would end up swallowing all the competition and annihilating the market. Bezos would cackle with glee as he stuffed another billion dollars into his mattress. That's not a world I want to live in, and neither should you.