Google App Engine

Updating Your Model's Schema

Mark Ivey, Google Engineer
June 2008

This is one of a series of in-depth articles discussing App Engine's datastore. For the other articles in the series, see Related links.

If you are maintaining a successful app, you will eventually find a reason to change your schema. This article walks through an example showing the two basic steps needed to update an existing schema:

  1. Updating the Model class
  2. Updating existing Entities in the datastore (this step isn't always necessary; we'll talk more about when to do it below).

Before We Start

While updating your schema, you may need to temporarily prevent your users from editing data in your application. Whether or not this is necessary depends on your application, but there are a few situations (like trying to add a sequential index value to each entity) where it is much easier to correctly update existing entities if no other edits are happening.

Updating Your Models

Here's an example of a simple picture model:

from google.appengine.ext import db


class Picture(db.Model):
    author = db.UserProperty()
    png_data = db.BlobProperty()
    name = db.StringProperty(default='')  # Unique name.

Let's update this so each picture can have a rating. To store the ratings, we'll store the number of votes and the average value of the votes. Updating the model is fairly easy: we just add two new properties.

class Picture(db.Model):
    author = db.UserProperty()
    png_data = db.BlobProperty()
    name = db.StringProperty(default='')  # Unique name.
    num_votes = db.IntegerProperty(default=0)
    avg_rating = db.FloatProperty(default=0.0)  # FloatProperty only accepts float values.

Now all new entities going into the datastore will get a default rating of 0. Note that existing entities in the datastore don't automatically get modified, so they won't have these properties.
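Storing the vote count next to the average is what allows a new vote to be folded in without re-reading earlier votes. The following is a minimal sketch of that arithmetic in plain Python. The helper name add_vote is hypothetical (the article does not show the voting code), and the function works on bare numbers rather than on Picture entities.

```python
def add_vote(num_votes, avg_rating, vote):
    """Return the updated (num_votes, avg_rating) after one new vote.

    Hypothetical helper for illustration; not part of the original article.
    """
    # Recover the sum of all earlier votes, then add the new one.
    total = avg_rating * num_votes + vote
    num_votes += 1
    return num_votes, total / float(num_votes)


# A picture starts at the model defaults: 0 votes and an average of 0.0.
votes, avg = add_vote(0, 0.0, 4)
votes, avg = add_vote(votes, avg, 2)
```

In a real handler the two returned values would be assigned to num_votes and avg_rating before saving the entity.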

Updating Existing Entities

The App Engine datastore doesn't require all entities to have the same set of properties. After updating your models to add new properties, existing entities will continue to exist without these properties. In some situations, this is fine, and you don't need to do any more work. When would you want to go back and update existing entities so they also have the new properties? One situation would be when you want to do a query based on the new properties. In our example with Pictures, queries like "Most popular" or "Least popular" wouldn't return existing pictures, because they don't (yet) have the ratings properties. To fix this, we'll need to update the existing entities in the datastore.
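To make the query problem concrete, here is a small simulation in plain Python. It is only a sketch: the entities are ordinary dicts, and most_popular is a hypothetical stand-in for a query ordered by avg_rating, written on the assumption stated above that such a query only matches entities that carry the property.

```python
# Stand-ins for stored entities: plain dicts, not real datastore entities.
stored = [
    {'name': 'old_cat'},  # Saved before the schema change, so no rating properties.
    {'name': 'new_dog', 'num_votes': 3, 'avg_rating': 4.5},
    {'name': 'new_fox', 'num_votes': 1, 'avg_rating': 2.0},
]


def most_popular(entities):
    """Mimic a query ordered by avg_rating, highest first."""
    # Only entities that actually have the property can match.
    rated = [e for e in entities if 'avg_rating' in e]
    rated.sort(key=lambda e: e['avg_rating'], reverse=True)
    return [e['name'] for e in rated]


before = most_popular(stored)

# "Updating" the old entities: fill in the defaults and save them again.
for entity in stored:
    entity.setdefault('num_votes', 0)
    entity.setdefault('avg_rating', 0.0)

after = most_popular(stored)
```

Before the update, old_cat is missing from the result; afterwards it appears at the end of the list with its default rating.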

Conceptually, updating existing entities is easy. You just need to write a request handler that loads each entity, sets the value of the new property, and saves the entity. There are two gotchas that we'll need to work around:

  • Queries are limited to 1000 results. If there are more than 1000 entities in the datastore, it will take more than one query to access them all.
  • Requests have a fairly short deadline before they get killed. If there are a significant number of entities, the request handler won't be able to update them all in a single request.

The solution to these two problems is to make the handler update only a small batch of entities for each request. By making multiple requests, we can work through all of the existing entities without hitting query limits or request deadlines. To keep things simple, we'll update only a single entity with each request, so the handler will do this:

  1. Load one entity
  2. Set the value of the property (if the property has a default value, this will happen automatically)
  3. Save the entity
  4. Use a meta refresh tag to send the browser to the URL which will update the next entity

A word of caution: when writing a query that retrieves entities in batches, avoid OFFSET (which doesn't work for large sets of data) and instead limit the amount of data returned by using a WHERE condition. If your data already has a unique property of some sort, this is fairly easy. In this example, the Picture model's name property is unique across items, so we'll use a WHERE clause based on the name.

Here's the code:

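# Request handler for the URL /update_datastore
import urllib

from google.appengine.ext import webapp
from google.appengine.ext.webapp import template

import models  # The application's module that defines Picture.


class UpdateDatastore(webapp.RequestHandler):  # Class name is illustrative.
    def get(self):
        name = self.request.get('name', None)
        if name is None:
            # First request, just get the first name out of the datastore.
            pic = models.Picture.gql('ORDER BY name DESC').get()
            if pic is None:
                self.redirect('/')  # No pictures stored, nothing to update.
                return
            name = pic.name

        # Fetch the entity to update plus the one that follows it, if any.
        q = models.Picture.gql('WHERE name <= :1 ORDER BY name DESC', name)
        pics = q.fetch(limit=2)
        if not pics:
            self.redirect('/')  # No entity at or before this name.
            return
        current_pic = pics[0]
        if len(pics) == 2:
            next_name = pics[1].name
            # Encode before quoting so non-ASCII names don't break quote().
            next_url = '/update_datastore?name=%s' % urllib.quote(
                next_name.encode('utf-8'))
        else:
            next_name = 'FINISHED'
            next_url = '/'  # Finished processing, go back to main page.
        # In this example, the default values of 0 for num_votes and avg_rating
        # are acceptable, so we don't need to do anything other than call put().
        current_pic.put()

        context = {
            'current_name': current_pic.name,
            'next_name': next_name,
            'next_url': next_url,
        }
        self.response.out.write(
            template.render('update_datastore.html', context))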

The template that goes along with this shows which entity we are currently at and uses a meta refresh tag to send the browser to the next entity:

<html>
<head>
  <meta http-equiv="refresh" content="0;url={{ next_url }}"/>
</head>
<body>
  <h3>Update Datastore</h3>
  <ul>
    <li>Updated: {{ current_name }}</li>
    <li>About to update: {{ next_name }}</li>
  </ul>
</body>
</html>
    

If you don't have a property with unique values across your entities, the above example won't work directly (it will get stuck when it hits entities with identical values). You'll need to expand it to be able to work across sections of identical entities. The general concept is the same, however: use WHERE to limit the size of the query, and work over the data in batches, calling put() for each entity.
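One way to get past runs of identical values is to remember a pair (property value, unique key) as the position reached so far, instead of the property value alone. The sketch below simulates that idea in plain Python with an in-memory list. The names fetch_batch and update_all, the 'key' and 'author' fields, and the batch size are all assumptions made for illustration, and how the pair comparison would be written as real datastore queries is not shown here.

```python
def fetch_batch(entities, cursor, batch_size):
    """Return up to batch_size entities whose (author, key) sorts after cursor.

    Stand-in for a datastore query. `entities` is a list of dicts and
    `cursor` is None on the first call.
    """
    ordered = sorted(entities, key=lambda e: (e['author'], e['key']))
    if cursor is not None:
        ordered = [e for e in ordered if (e['author'], e['key']) > cursor]
    return ordered[:batch_size]


def update_all(entities, batch_size=2):
    """Walk every entity in batches and return how many batches were needed."""
    cursor, batches = None, 0
    while True:
        batch = fetch_batch(entities, cursor, batch_size)
        if not batch:
            return batches
        for entity in batch:
            # Stands in for applying the defaults and calling put().
            entity.setdefault('num_votes', 0)
            entity.setdefault('avg_rating', 0.0)
        # The key breaks ties, so identical authors can't stall the walk.
        cursor = (batch[-1]['author'], batch[-1]['key'])
        batches += 1


pictures = [{'key': k, 'author': a} for k, a in
            [(1, 'ann'), (2, 'bob'), (3, 'bob'), (4, 'bob'), (5, 'cy')]]
batches = update_all(pictures)
```

Here three pictures share the author 'bob', and a batch boundary falls in the middle of them. Because the position includes the unique key, the next batch resumes after the last picture handled rather than starting the 'bob' run again.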

Removing Deleted Properties from the Datastore

If you remove a property from your model, you will find that existing entities still have the property. It will still be shown in the admin console and will still be present in the datastore. To really clean out the old data, you need to cycle through your entities and remove the data from each one.

  1. Make sure you have removed the properties from the model definition.
  2. If your model class inherits from db.Model, temporarily switch it to inherit from db.Expando. (db.Model instances can't be modified dynamically, which is what we need to do in the next step.)
  3. Cycle through existing entities (as described above). For each entity, use delattr to delete the obsolete property and then save the entity.
  4. If your model originally inherited from db.Model, don't forget to change it back after updating all the data.
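The per-entity part of step 3 can be sketched as follows. This is plain Python that runs without the App Engine SDK: FakeExpando is a made-up stand-in for a dynamic entity (it is not the real db.Expando), and strip_property and the property name obsolete_caption are hypothetical names chosen for the example.

```python
class FakeExpando(object):
    """Stand-in for a dynamic (Expando-style) entity; not the real db.Expando."""

    def __init__(self, **props):
        self.__dict__.update(props)
        self.saved = False

    def put(self):
        # A real entity would be written back to the datastore here.
        self.saved = True


def strip_property(entity, prop_name):
    """Delete prop_name from entity if present, then save it.

    Returns True if the property was removed, False if it was already gone.
    """
    if not hasattr(entity, prop_name):
        return False
    delattr(entity, prop_name)
    entity.put()
    return True


pic = FakeExpando(name='sunset', obsolete_caption='old text')
removed = strip_property(pic, 'obsolete_caption')
removed_again = strip_property(pic, 'obsolete_caption')
```

Checking with hasattr first means the helper can be run again over entities that were already cleaned, which is useful if the update is interrupted and restarted.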

The Future

This iterative-request-handler method for updating existing entities will work today. However, we are working on some solutions for offline processing. When those become available, they may provide a more compelling way to modify your data, without imposing the server load that this method does. You might consider subscribing to the App Engine blog to stay up to date on new developments.