Google App Engine

Using the High Replication Datastore

The High Replication Datastore provides higher availability for your reads and writes because it stores data synchronously in multiple data centers. Only the back end changes; the Datastore API is identical, so you use the same programming interfaces no matter which Datastore you're using.

However, in the High Replication Datastore, queries across entity groups (in other words, non-ancestor queries) may return stale results. In order to return strongly consistent query results in the High Replication environment, you need to query over a single entity group. This type of query is called an ancestor query.

Ancestor queries work because entity groups are a unit of consistency: all operations are applied to the entire group. Ancestor queries won't return data until the entire entity group is up to date. Thus, the data returned from ancestor queries on entity groups is strongly consistent.
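
The difference between the two kinds of query can be illustrated with a toy model. The sketch below is not how the Datastore is implemented; the class ToyDatastore, its methods, and the 'guestbook:default' group name are all hypothetical, and the single "serving replica" is a deliberate simplification. It only shows the rule described above: a non-ancestor query returns whatever the replica has applied so far, while an ancestor query first brings that one entity group up to date.

```python
# Toy model only: NOT the real Datastore implementation or API.

class ToyDatastore(object):
    def __init__(self):
        self.log = {}      # group -> every committed entity, in commit order
        self.applied = {}  # group -> entities the serving replica has applied

    def put(self, group, entity):
        # The write is committed, but the serving replica has not applied it yet.
        self.log.setdefault(group, []).append(entity)

    def global_query(self):
        # Non-ancestor query: returns only what the replica has applied so far.
        results = []
        for group in sorted(self.applied):
            results.extend(self.applied[group])
        return results

    def ancestor_query(self, group):
        # Ancestor query: bring this one entity group up to date, then read it.
        self.applied[group] = list(self.log.get(group, []))
        return list(self.applied[group])


store = ToyDatastore()
store.put('guestbook:default', 'Hello')
print(store.global_query())                       # [] (stale: write not applied)
print(store.ancestor_query('guestbook:default'))  # ['Hello']
```

In this model the staleness is permanent until an ancestor query runs; in the real service the replicas catch up on their own, usually within seconds.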

If your application relies on strongly consistent results for certain queries, you may have to change the way your application stores entities. This page discusses best practices for working with data stored in the High Replication Datastore. Let's look at how this works using the sample guestbook applications for Master/Slave and High Replication Datastores, respectively.

In the Master/Slave Datastore

In the Master/Slave Datastore, the sample guestbook application creates a new root entity for each greeting:

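from google.appengine.ext import webapp

class Guestbook(webapp.RequestHandler):
  def post(self):
    # No parent is given, so each Greeting is a root entity in its own entity group.
    greeting = Greeting()
    ...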

We then query on the Greeting class for the ten most recent greetings:

class MainPage(webapp.RequestHandler):
  def get(self):
    self.response.out.write('<html><body>')
    # Non-ancestor query: it spans every Greeting entity group.
    greetings = db.GqlQuery("SELECT * FROM Greeting ORDER BY date DESC LIMIT 10")

This scheme works well because the Master/Slave Datastore defaults to strongly consistent results for all queries: by default, it reads from and writes to the master replica only.

If you run the same query in the High Replication Datastore, the data center that executes the query may not yet have seen a newly written Greeting, so the results can omit it.

In the High Replication Datastore

In the High Replication Datastore, the sample guestbook application uses a parent key of kind Guestbook, with the value of guestbook_name as the key name, and saves each greeting in the entity group defined by that parent key:

class Guestbook(webapp.RequestHandler):
  def post(self):
    guestbook_name = self.request.get('guestbook_name')
    # The parent key places this Greeting in the guestbook's entity group.
    greeting = Greeting(parent=guestbook_key(guestbook_name))
    ...

Queries for greetings use the parent Guestbook key to perform an ancestor query, which finds only the Greeting entities added to that specific guestbook:

class MainPage(webapp.RequestHandler):
  def get(self):
    self.response.out.write('<html><body>')
    guestbook_name = self.request.get('guestbook_name')

    # Ancestor query: restricted to the entity group rooted at the guestbook key.
    greetings = db.GqlQuery("SELECT * "
                            "FROM Greeting "
                            "WHERE ANCESTOR IS :1 "
                            "ORDER BY date DESC LIMIT 10",
                            guestbook_key(guestbook_name))
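
To see why the ancestor query finds only the greetings in one guestbook, it can help to think of a key as a path that starts at the root entity. The following is a minimal sketch of that idea in plain Python, not the SDK's key class: keys are modeled as tuples of (kind, name_or_id) pairs, and every function name here (toy_guestbook_key, toy_child_key, has_ancestor, toy_ancestor_query) is hypothetical. The sample's own guestbook_key helper is not shown in this section, so nothing below should be read as its definition.

```python
# Toy model: a key is a tuple of (kind, name_or_id) pairs, root first.

def toy_guestbook_key(guestbook_name):
    return (('Guestbook', guestbook_name),)

def toy_child_key(parent_key, kind, entity_id):
    # A child's key is its parent's path plus one more (kind, id) pair.
    return parent_key + ((kind, entity_id),)

def has_ancestor(key, ancestor_key):
    # True when the key's path starts with the ancestor's path.
    return key[:len(ancestor_key)] == ancestor_key

def toy_ancestor_query(entities, kind, ancestor_key, limit=10):
    """entities is a list of (key, date, content) tuples."""
    matches = [e for e in entities
               if e[0][-1][0] == kind and has_ancestor(e[0], ancestor_key)]
    matches.sort(key=lambda e: e[1], reverse=True)  # ORDER BY date DESC
    return matches[:limit]


family = toy_guestbook_key('family')
work = toy_guestbook_key('work')
entities = [
    (toy_child_key(family, 'Greeting', 1), 100, 'Hi from family'),
    (toy_child_key(work, 'Greeting', 2), 200, 'Hi from work'),
    (toy_child_key(family, 'Greeting', 3), 300, 'Hello again'),
]
for key, date, content in toy_ancestor_query(entities, 'Greeting', family):
    print(content)
```

Only the two greetings under the 'family' guestbook are printed, newest first; the greeting stored under 'work' belongs to a different entity group and is never considered.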

Usage Notes

The High Replication code sample above writes to a single entity group per guestbook. This makes queries on a single guestbook strongly consistent, but it also limits changes to that guestbook to one write per second (the supported limit for entity groups). A single entity group per guestbook is therefore not ideal when you expect heavy write traffic. If your app is likely to see heavy write usage, consider a different design. For example, you can put recent posts in memcache with an expiration, and then display a mix of recent posts from memcache and posts retrieved from the Datastore.
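
The memcache approach can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the sample application: ToyCache is an in-memory stand-in for a cache with expiration, posts are assumed to be dicts with 'id', 'date', and 'content' fields, and the helper names remember_post and merged_greetings are hypothetical. In a real app the cache calls would presumably map onto the memcache service, and the Datastore results would come from a query.

```python
import time

class ToyCache(object):
    """In-memory stand-in for a cache with per-entry expiration."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave as a cache miss
            return None
        return value


def remember_post(cache, guestbook_name, post, ttl_seconds=60, keep=10):
    # Keep the newest posts under one cache key per guestbook.
    key = 'recent:' + guestbook_name
    recent = [post] + (cache.get(key) or [])
    cache.set(key, recent[:keep], ttl_seconds)


def merged_greetings(cache, guestbook_name, datastore_posts, limit=10):
    # Combine possibly stale query results with recently cached posts,
    # dropping duplicates by id and ordering newest first.
    cached = cache.get('recent:' + guestbook_name) or []
    by_id = {}
    for post in list(datastore_posts) + list(cached):
        by_id[post['id']] = post
    merged = sorted(by_id.values(), key=lambda p: p['date'], reverse=True)
    return merged[:limit]
```

The read-modify-write in remember_post is not atomic, so two simultaneous posts could overwrite each other's cache entry; that is tolerable here only because the cache is a short-lived display aid and the Datastore remains the source of truth.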

With eventual consistency, more than 99.9% of your writes are available for queries within a few seconds. The goal is to find a caching solution that shows the current user their own data during the period in which they're posting to your app. The caching solution might involve memcache, a cache in a cookie, some state you put in the URL, or something else entirely. If the solution provides the data for the current user in the context of their posts, it will likely be sufficient to make the eventual consistency of High Replication completely acceptable. Remember, if you do a get(), put(), or a transaction, you will always see the most recently written data.