Google Code offered in: English - Español - 日本語 - 한국어 - Português - Pусский - 中文(简体) - 中文(繁體)
This is part two of a five-part series on effectively scaling your App Engine-based apps. To see the other articles in the series, see Related links.
This article was written for SDK version 1.1.7. As of release 1.3.1, query cursors (Java | Python) have superseded the techniques described below and are now the recommended method for paging through large datasets.
One of the typical web functions you may need to do is pagination, that is, displaying large datasets one page at a time. If you have too many items to display on a single web page you will display a subset of them, say 20, with a link to the next page of 20 items. For example, the comments on a blog would be paged if there are a lot of comments.
Now you may be tempted to use
fetch(limit, offset=0)
to do your
paging, but this has limitations. From the documentation:
Note: fetch() returns a maximum of 1000 results. If more than 1000 entities match the query, and either no limit is specified or a limit larger than 1000 is used, only the first 1000 results are returned by fetch().
So you can use fetch() only if there are going to be less than 1000 entities. Even if you have less than 1000 entities you may still want to avoid it as the time it takes grows as the value of offset increases. From the documentation:
The query has performance characteristics that correspond linearly with the offset amount plus the limit.
If your application needs more than that then you will need to do something different. We will use a suggestion box as our working example. We will start with a naive implementation and slowly evolve it into a good working example of paging.
PAGESIZE = 10 class Suggestion(db.Model): suggestion = db.StringProperty() when = db.DateTimeProperty(auto_now_add=True)
The first requirement for paging is having an indexed property that you can run an inequality filter against. In this case let's construct a way to page through the suggestions in reverse chronological order. To get the first set of 10 suggestions:
def get(self): suggestions = Suggestion.all().order("-when").fetch(PAGESIZE) # ..render template..
But we need to know if we should provide a "next" link to the next page if there happens to be enough data for another page, so let's request PAGESIZE+1 entities, and if we actually get back PAGESIZE+1 entities then we know there are enough entities for another page:
def get(self): next = None suggestions = Suggestion.all().order("-when").fetch(PAGESIZE + 1) if len(suggestions) == PAGESIZE + 1: next = suggestions[-1].when suggestions = suggestions[:PAGESIZE] # ..render template..
Now when we generate our HTML for the page of suggestions we can
use the value of next
to determine whether or not to present a
next
link. We also need a way to handle a request for the next
page, which we do by adding a query parameter bookmark
that
contains the value of when
that we should start the next page
at:
def get(self): next = None bookmark = self.request.get("bookmark") if bookmark: suggestions = Suggestion.all().order("-when") .filter('when <=', bookmark).fetch(PAGESIZE+1) else: suggestions = Suggestion.all().order("-when").fetch(PAGESIZE+1) if len(suggestions) == PAGESIZE+1: next = suggestions[-1].when suggestions = suggestions[:PAGESIZE] # ..render template..
If you are paging and using a property that is unique across all entities then you're done and the above code should work for you. Unfortunately the code presented above has a problem for this particular example because the value of the 'when' field isn't guaranteed to be unique across all the entities, that is, it's possible that more than one suggestion could come in at the exact same time, as in this example:
offset | suggestion | when |
---|---|---|
... | ... | ... |
9 | Stock Jolt | 2008-10-26 04:38:00 |
10 | Allow dogs in the office | 2008-10-26 03:35:58 |
11 | Allow cats in the office | 2008-10-26 03:35:58 |
12 | Buy some multicolored exercise balls | 2008-10-26 01:10:03 |
13 | ... | ... |
If we present a page with the 10 most recent suggestions you would end up with '2008-10-26 04:38:02' as the place to pick up the next 10 in the list. Unfortunately #10 and #11 have the same value for 'when' and you would get overlap, the item at spot #10 appears as the last element on the first page and also as the first element on the second page. You can see how this problem would get worse the more duplicates there were.
While we want to page in reverse chronological order, the 'when' field may not be unique. What we need is a way to guarantee that it is unique. One way to guarantee uniqueness is to have a single counter that we increment each time we add a Suggestion entity to the datastore. The value of the counter is appended to each 'when' value and guarantees they are unique:
offset | suggestion | when |
---|---|---|
... | ... | ... |
9 | Stock Jolt | 2008-10-26 04:38:00|09 |
10 |
Allow dogs in the office |
2008-10-26 03:35:58|10 |
11 |
Allow cats in the office | 2008-10-26 03:35:58|11 |
12 | Buy some multicolored exercise balls | 2008-10-26 01:10:03|12 |
13 | ... | ... |
Let's update our model to reflect this change. We will have a
'created' property which is the date and time that the Suggestion
was created and change when
to be a StringProperty since
it will now contain the date and time concatenated with another
string that makes it unique. The when
property is the one we will
sort by when paginating.
class Suggestion(db.Model): suggestion = db.StringProperty() created = db.DateTimeProperty(auto_now_add=True) when = db.StringProperty()
If the rate at which you add suggestions is low then this is an
acceptable solution, but if they are coming in quickly then the
counter ends up becoming a bottleneck as it can only be updated so
quickly. We need a way to generate uniqueness but also allow quick
updates, and if you've read the article on counter sharding then this is a
familiar problem with a familiar solution. In this case we are
going to shard our counters over the users that are adding
suggestions and use the user id as part of the unique value we add
to each value of when
.
So we need a Contributor:
class Contributor (db.Model): count = db.IntegerProperty(default=0) def unique_user(user): """ Creates a unique string by using an increasing counter sharded per user. """ email = user.email() def txn(): contributor = Contributor.get_by_key_name(email) if contributor == None: contributor = Contributor(key_name=email) contributor.count += 1 contributor.put() return contributor.count count = db.run_in_transaction(txn) return email + "|" + str(count)
Now this looks good:
offset | suggestion | when |
---|---|---|
... | ... | ... |
9 | Stock Jolt | 2008-10-26 04:38:00|joe@bitworking.org|1 |
10 | Allow dogs in the office | 2008-10-26 03:35:58|fred@example.org|1 |
11 | Allow cats in the office | 2008-10-26 03:35:58|joe@bitworking.org|2 |
12 | Buy some multicolored exercise balls | 2008-10-26 01:10:03|joe@bitworking.org|3 |
13 | ... | ... |
But we do have one more problem, and that is that we now expose the
email address of the person making the suggestion in our when
value, which will get exposed in our 'next' link URI. To get around
that final problem we can obfuscate the data we append to each
when
value by putting it through an MD5 hash:
def _unique_user(user): """ Creates a unique string by using an increasing counter sharded per user. The resulting string is hashed to keep the users email address private. """ email = user.email() def txn(): contributor = Contributor.get_by_key_name(email) if contributor == None: contributor = Contributor(key_name=email) contributor.count += 1 contributor.put() return contributor.count count = db.run_in_transaction(txn) return hashlib.md5( email + "|" + str(count)).hexdigest()
offset | suggestion | when |
---|---|---|
... | ... | ... |
9 | Stock Jolt | 2008-10-26 04:38:00|aee15ab24b7b3718596e3acce04fba85 |
10 | Allow dogs in the office | 2008-10-26 03:35:58|404a3235076f6651914358680acf3cb5 |
11 | Allow cats in the office | 2008-10-26 03:35:58|7574b989df099d4e2b95619c9cf0c2a0 |
12 | Buy some multicolored exercise balls | 2008-10-26 01:10:03|c675e87cc990a718979afecc93a77bc1 |
13 | ... | ... |
All of the above is about paginating entities by a specific
property. If you just need to see all of the entities of a
particular kind and you either don't care about the order in which
they appear or the __key__
order is sufficient, then you can do
your paging using the __key__
property. The value of this property
is the entity key so you are guaranteed that it is unique, and you
don't need to use any of the techniques above to add 'uniqueness'
to it.
def get(self): next = None bookmark = self.request.get("__key__") if bookmark: suggestions = Suggestion.all().order("__key__").filter('__key__ >=', bookmark).fetch(PAGESIZE+1) else: suggestions = Suggestion.all().order("-when").fetch(PAGESIZE+1) if len(suggestions) == PAGESIZE+1: next = suggestions[-1].when suggestions = suggestions[:PAGESIZE] # ..render template..
You can read more about the __key__ property and how to use in the App Engine documentation.
__key__
and a non-unique propertyThe final example of paging is using the __key__
property in
conjunction with a non-unique property. This technique is nice
because it doesn't require that addition of a new property to the
model, but it does require the addition of a new index, and it also
uses more queries so it is less efficient. If you would like to
read more about this technique Ryan Barrett posted
a longer write-up to the Google App Engine discussion
group.
Code for this article can be found in the
Google App Engine Samples project on code.google.com
and it contains a full example of creating a unique index and also
a full example of paging with __key__
and a non-unique
property.
In summary, to do paging you need a property that has a unique value for each entity and is indexed in the direction you want to page. Always fetch N+1 entities to determine in you have enough data for a 'next' page. And finally, if you don't have uniqueness in the property then you can use a single counter, or counters sharded over some attribute such as the user, to make that property unique.