This article was written and submitted by an external contributor. The Google App Engine team thanks Aral Balkan for his time and expertise.
NOTE (Jan 2009): This project is no longer being actively maintained.
Aral Balkan
When I started developing the web site for the <head> web conference on Google App Engine towards the end of last year, it dawned on me pretty quickly that one of my main development challenges would be to find a way to export all of the data we would be collecting in case something went wrong with the deployment datastore. The difference between having a backup and not was the difference between braving a few hours of down-time and canceling the conference because we had lost all of our registration data. Since Google App Engine does not come with a data backup solution (Google is working on its own datastore import and export utility for large datasets, which is on the roadmap for the first quarter of 2009), it was up to me to build my own.
This is what prompted me to write Google App Engine Backup and Restore (or Gaebar, for short).
While designing Gaebar, I knew that having backups alone would not be enough if there wasn't an easy way to restore those backups. The real value of a backup, after all, is in the restore.
Backing up an entire datastore is not a trivial task on Google App Engine since App Engine currently has just one modus operandi. App Engine supports a massively-scalable model based on requests and responses where each response must return within ten seconds. It doesn't currently have a mode of operation that isn't required to scale but which supports long-running processes. The current workaround is to fake long-running processes by using a client-side engine and breaking up batch operations into bite-sized portions that work within the massively-scalable request-response model. Building administrative features in this way is somewhat akin to using a Ferrari to go from your bedroom to the bathroom: it's overkill and quite uncomfortable but can be made to work if you want it badly enough. This is the method that Gaebar uses to enable easy backups and restores on the Google App Engine platform.
Gaebar backs up your datastore, a few rows at a time, into Python functions and stores these functions in Python modules called code shards, each of which is less than 1MB in size. To restore your datastore, Gaebar runs these functions, in successive requests, to recreate the models and entities in your datastore.
In other words, Gaebar backs up your datastore to Python code and then runs that code to restore it. And, it does all this while staying within the 10 second execution limit and the 1MB size limit on data structures that are currently in effect on the deployment environment.
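The idea of backing up to Python code can be sketched in a few lines. This is only an illustration in modern Python 3, not Gaebar's actual implementation: the names make_shards and restore_shard, the use of plain dicts as rows, and the list standing in for the datastore are all assumptions made for the example.

```python
MAX_SHARD_BYTES = 1000000  # stand-in for the 1MB limit described above


def make_shards(rows, max_bytes=MAX_SHARD_BYTES):
    """Render rows (plain dicts) as Python source, split into size-capped shards."""
    shards, current, size = [], [], 0
    for i, row in enumerate(rows):
        # Each row becomes one function that returns the row's data.
        func = "def row_%d():\n    return %r\n" % (i, row)
        func_size = len(func.encode("utf-8"))
        # Start a new shard when adding this function would exceed the cap.
        if current and size + func_size > max_bytes:
            shards.append("".join(current))
            current, size = [], 0
        current.append(func)
        size += func_size
    if current:
        shards.append("".join(current))
    return shards


def restore_shard(source, datastore):
    """Run every row function in one shard; one call stands in for one request."""
    namespace = {}
    exec(source, namespace)
    names = [n for n in namespace if n.startswith("row_")]
    # Replay rows in their original order (numeric, not alphabetical).
    for name in sorted(names, key=lambda n: int(n[4:])):
        datastore.append(namespace[name]())
```

In this sketch, a backup is the list returned by make_shards, and a restore calls restore_shard once per shard, which mirrors the one-shard-per-request pattern described above.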
Gaebar automates the backup and restore process as much as possible, going so far as to automatically download your backups from the deployment environment onto your local development server.
And you control the whole process via an easy-to-use, if pink, online interface.
First off, understand that Gaebar only works with Django projects. So, if you're using webapp, I'm sorry to say that you're currently out of luck. (If you happen to port Gaebar to webapp, please do let me know.)
Before you can run Gaebar, you must patch your dev_appserver.py as per the instructions here: http://aralbalkan.com/1440 (and please star issue 616 if you'd like Google to fix this so we can remove this step).
This is required in order to override some of the local development server restrictions to allow the automatic download of backups. Gaebar will not work unless you implement this patch.
The easiest way to get started with Gaebar is to download one of the two test suite applications, Gaebar Gaed or Gaebar App-Engine-Patch.
The test suite applications are built on the Google App Engine Helper and app-engine-patch skeletons, respectively, and come pre-configured with Gaebar. You can simply download one of the test apps and follow the instructions to populate a test datastore, run a backup/restore, and run the test suite to make sure that everything is working.
For your own projects, the easiest way to get started with Gaebar is if you're building a new Django application. You can simply download the Gaebar Gaed Skeleton which contains a base Google App Engine Django install, including a zipped up Django 1.x and Gaebar.
If you already have a Django project, you need to add the Gaebar app to it. (Gaebar is just a regular Django app.) An overview of how to do this is provided below, but you should really read the readme.txt file that comes with Gaebar for full step-by-step instructions.
Start by downloading Gaebar. You can download Gaebar Beta 3 as a zip from GitHub or, if you're a Gitball Wizard, you can use Git to add the Gaebar master branch to your project as a submodule.
Once you have downloaded Gaebar, install it into your Django project by following this general overview: edit urls.py to add the URL mappings for Gaebar, update app.yaml and index.yaml, and edit settings.py to add the Gaebar settings that specify, among others, the location of your local development server and the models that you want to back up. (Again, for step-by-step instructions, see the readme.txt file that comes with Gaebar.)
Once you've installed Gaebar, you can test that everything is working by firing up your local development server and accessing Gaebar at the /gaebar URL (e.g., http://localhost:8000/gaebar). You should see the Gaebar interface.
Since backing up your local datastore isn't terribly exciting, you will most likely want to deploy your application to the deployment environment (e.g., http://your_app_id.appspot.com) and start your first backup.
Important: Make sure that your local development server is running before you start a backup as Gaebar will automatically hit your local server to download your remote backup to your local machine.
Once your backup is complete, you can find it in the /gaebar/backups/ folder. Go ahead and take a look at what it contains. It's just lots of pickled Python code.
Once you have a backup, you have several options for where you can restore it, based on the use cases below.
Restore to: your local development server.
Once you have restored the data to your local development server, I would highly suggest that you take a backup of your datastore.
You can find your datastore files in a temporary folder on your local machine (e.g., on my Mac, it's /var/folders/bz/bzDU030xHXK-jKYLMXnTzk+++TI/-Tmp-/). To find the folder on your own machine, flush the datastore (./manage.py flush) and you will see the path to your datastore folder printed in the resulting output.
Once you know the path of the temporary folder that holds your datastore, copy the django_my_project_name_.datastore and django_my_project_name_.datastore.history files somewhere safe. Then, if you ever need to restore the datastore again, you can simply copy those files back into the temporary folder instead of having to go through the whole restore process again. This will save you a lot of time, especially when working with large datastores.
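If you find yourself copying these files often, a small script can do it for you. The following is a minimal sketch using only the Python standard library. The function name copy_datastore_files and its parameters are made up for this example, and you need to supply your own temporary folder path and file base name.

```python
import shutil
from pathlib import Path

# The two file suffixes mentioned above.
SUFFIXES = (".datastore", ".datastore.history")


def copy_datastore_files(src_dir, dest_dir, base_name):
    """Copy <base_name>.datastore and <base_name>.datastore.history, if present."""
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for suffix in SUFFIXES:
        src = src_dir / (base_name + suffix)
        if src.exists():
            # copy2 also preserves timestamps where the platform allows it.
            shutil.copy2(src, dest_dir / src.name)
            copied.append(src.name)
    return copied
```

To save the files, pass the temporary folder as src_dir and your safe location as dest_dir. To put them back later, swap the two arguments.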
Restore to: the same application instance that you backed up from on the deployment environment.
In the unlikely event that disaster strikes and the datastore for your live application becomes either corrupted or lost, you can restore your datastore from your backup.
I would only recommend doing this if something has really gone terribly wrong on your live application and you absolutely need to revert to a backed up version of the datastore. This would be a good time to mention that Gaebar does not come with any warranties (go ahead, read the license, I'll wait around till you're done) and that restoring is a destructive process. If, for any reason, your restore fails, and data is corrupted or lost due to a bug in your code, you will not be able to revert your datastore to its previous state or recover your data. Developer beware!
Restore to: a different application instance on the deployment environment.
Now here's where it gets interesting! You're not limited to restoring your backup to the same instance of your application that you backed up from. Simply change the app name in your app.yaml, deploy, and run the restore to restore your backup to a different application instance.
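The app name lives on the application: line of app.yaml. If you switch between live and staging often, you could script the change. The helper below is a hypothetical sketch (with_app_id is not part of Gaebar or the App Engine SDK) that assumes a single top-level application: line and leaves the rest of the file untouched.

```python
import re


def with_app_id(app_yaml_text, new_app_id):
    """Return app.yaml text with its top-level `application:` value replaced."""
    new_text, count = re.subn(
        r"(?m)^application:[ \t]*\S+[ \t]*$",
        # A function replacement avoids backslash escapes in new_app_id.
        lambda match: "application: " + new_app_id,
        app_yaml_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no top-level 'application:' line found")
    return new_text
```

You would read app.yaml, pass its text through with_app_id along with your staging app ID, write the result back, and then deploy as usual.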
What this means is that you can now have a staging application, running on the same deployment environment as your live application, that you can test with using live data. (You use the staging application in much the same way as you would use a staging server in traditional web development.)
Regardless of whether you decide to use a staging application in your workflow, I would highly recommend that you do a restore to a different application instance after you've backed up your live datastore for the first time to make sure that the backup restores successfully. A bad time to realize that it doesn't would be when you have to perform disaster recovery on your live application instance.
Google App Engine is an exciting new cloud platform with the potential to revolutionize how we build and deploy web applications. You must remember, however, that it is currently a pre-release product and is under continuous development. As such, it is understandable that it may currently lack certain features. One such feature that has, until recently, been missing is a simple means to back up the data in your application's datastore. Gaebar is a recently released, open source, third-party Django application that fills this gap and brings an easy datastore backup and restore solution to Google App Engine.
So, what are you waiting for? Back up your datastore already and start sleeping better at night!
Downloads: The current version of Gaebar, as of this writing, is Beta 3. You can find the latest version on the master branch of the Gaebar GitHub repository.
Aral is a developer, professional speaker, consultant, author, and entrepreneur. Last year, his company Naklab staged a fully-virtual web conference called <head>, using Flash Platform technologies alongside App Engine. Aral blogs at aralbalkan.com and you can tweet to him @aral.