English

Google App Engine

NDB Queries

Experimental!

NDB is an experimental, innovative, and rapidly changing new feature for App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to NDB. We will inform the community when this feature is no longer experimental.

An application can use queries to search the Datastore for entities that match specific search criteria called filters.

Overview

An application can use queries to search the Datastore for entities that match specific search criteria called filters. For example, an application that keeps track of several guestbooks could use a query to retrieve messages from one guestbook, ordered by date:

from google.appengine.ext import ndb

class Greeting(ndb.Model):
  """Models an individual Guestbook entry with content and date."""
  content = ndb.StringProperty()
  date = ndb.DateTimeProperty(auto_now_add=True)

  @classmethod
  def query_book(cls, ancestor_key):
    return cls.query(ancestor=ancestor_key).order(-cls.date)


class MainPage(webapp.RequestHandler):
  def get(self):
    guestbook_name = self.request.get('guestbook_name')
    ancestor_key = ndb.Key("Book", guestbook_name or "*notitle*")
    greetings = Greeting.query_book(ancestor_key).fetch(20)

    self.response.out.write('<html><body>')
    for greeting in greetings:
      self.response.out.write('<blockquote>%s</blockquote>' %
                              cgi.escape(greeting.content))

The App Engine Datastore natively supports filters for exact matches (the == operator) and comparisons (the <, <=, > and >= operators). It supports combining multiple filters using a Boolean AND operator, with some limitations (see below).

In addition to the native operators, the API supports the != operator, combining groups of filters using the Boolean OR operator, and the IN operator, which test for equality to one of a list of possible values (like Python's 'in' operator). These operations don't map 1:1 to the Datastore's native operators; thus they are a little quirky and slow, relatively. They are implemented using in-memory merging of result streams. Note that p != v is implemented as "p < v OR p > v". (This matters for repeated properties.)

Limitations: The Datastore enforces some restrictions on queries. Violating these will cause it to raise exceptions. For example, combining too many filters, using inequalities for multiple properties, or combining an inequality with a sort order on a different property are all currently disallowed. Also filters referencing multiple properties sometimes require secondary indexes to be configured.

Unsupported: The Datastore does not directly support substring matches, case-insensitive matches, or so-called full-text search. There are ways to implement case-insensitive matches and even full-text search using computed properties.

Filtering by Property Values

Usually you don't want to retrieve all entities of a given kind; you want only those with a specific value or range of values for some property.

Property objects overload some operators to return filter expressions that can be used to control a query: for example, to find all Account entities whose userid property has the exact value 42, you can use the expression

qry = Account.query(Account.userid == 42)

NDB supports these operators:

property == value
property < value
property <= value
property > value
property >= value
property != value
property.IN([value1, value2])

To filter for an inequality, you can use syntax like the following:

qry = Account.query(Account.userid >= 40)

This finds all Account entities whose userid property is greater than or equal to 40.

Two of these operators, != and IN, are implemented as combinations of the others, and are a little quirky as described in != and IN.

You can specify multiple filters:

qry = Account.query(Account.userid >= 40, Account.userid < 50)

This combines the specified filter arguments, returning all Account entities whose userid value is greater than or equal to 40 and less than 50.

Note: As mentioned earlier, the Datastore rejects queries using inequality filtering on more than one property.

Instead of specifying an entire query filter in a single expression, you may find it more convenient to build it up in steps: for example:

qry1 = Account.query() # Retrieve all Account entitites
qry2 = qry1.filter(Account.userid >= 40) # Filter on userid >= 40
qry3 = qry2.filter(Account.userid < 50) # Filter on userid < 50 as well

qry3 is equivalent to the qry variable from the previous example. Note that query objects are immutable, so the construction of qry2 does not affect qry1 and the construction of qry3 does not affect qry1 or qry2.

The != and IN Operators

The != (not-equal) and IN (membership) operators are implemented by combining other filters using the OR operator. The first of these,

property != value

is implemented as

(property < value) OR (property > value)

For example,

qry = Article.query(Article.tags != 'perl')

is equivalent to

qry = Article.query(query.OR(Article.tags < 'perl',
                             Article.tags > 'perl'))

Note: Perhaps surprisingly, this query does not search for Article entities that don't include 'perl' as a tag! Rather, it finds all entities with at least one tag unequal to 'perl'. For example, the following entity would be included in the results, even though it has 'perl' as one of its tags:

Article(title='Perl + Python = Parrot',
stars=5,
tags=['python', 'perl'])

However, this one would not be included:

Article(title='Introduction to Perl',
stars=3,
tags=['perl'])

There is no way to query for entities that do not include a tag equal to 'perl'.

Similarly, the IN operator

property IN [value1, value2, ...]

which tests for membership in a list of possible values, is implemented as

(property == value1) OR (property == value2) OR ...

For example,

qry = Article.query(Article.tags.IN(['python', 'ruby', 'php']))

is equivalent to

qry = Article.query(query.OR(Article.tags == 'python',
                             Article.tags == 'ruby',
                             Article.tags == 'php'))

Note: Queries using OR de-duplicate their results: the result stream doesn't include entity more than once, even if an entity matches two or more subqueries.

Querying for Repeated Properties

The Article class defined in the preceding section also serves as an example of querying for repeated properties. Notably, a filter like

Article.tags == 'python'

uses a single value, even though Article.tags is a repeated property. You cannot compare repeated properties to list objects (the Datastore won't understand it), and a filter like

Article.tags.IN(['python', 'ruby', 'php'])

does something completely different from searching for Article entities whose tags value is the list ['python', 'ruby', 'php']: it searches for entities whose tags value (regarded as a list) contains at least one of those values.

Querying for a value of None on a repeated property has undefined behavior; don't do that.

Combining AND and OR Operators

You can nest AND and OR operators arbitrarily: for example,

qry = Article.query(query.AND(Article.tags == 'python',
                              query.OR(Article.tags.IN(['ruby', 'jruby']),
                                       query.AND(Article.tags == 'php',
                                                 Article.tags != 'perl'))))

Due to OR's implementation, a query of this form that is too complex might fail with an exception. These filters are normalized so that there is (at most) a single OR operator at the top of the expression tree and a single level of AND operators right below that. This expansion uses the standard rules for obtaining a disjunctive normal form for a Boolean expression, along with the expansions already given for != and IN. Altogether, the normalized form of the example above is (using an informal notation):

OR(AND(tags == 'python', tags == 'ruby'),
   AND(tags == 'python', tags == 'jruby'),
   AND(tags == 'python', tags == 'php', tags < 'perl'),
   AND(tags == 'python', tags == 'php', tags > 'perl'))

Caution: For some filters, this normalization can cause a combinatorial explosion. Consider the AND of 3 OR clauses with 2 basic clauses each. When normalized, this becomes an OR of 8 AND clauses with 3 basic clauses each: that is, 6 terms become 24.

Specifying Sort Orders

You can use the order() method to specify the order in which a query returns its results. This method takes a list of arguments, each of which is either a property object (to be sorted in ascending order) or its negation (denoting descending order). For example:

qry = Greeting.query().order(Greeting.message, -Greeting.userid)

This retrieves all Greeting entities, sorted by ascending value of their message property. Runs of consecutive entities with the same message property will be sorted by descending value of their userid property. You can use multiple order() calls to the same effect:

qry = Greeting.query().order(Greeting.message).order(-Greeting.userid)

Note: When combining filters with order(), the Datastore rejects certain combinations. In particular, when you use an inequality filter, the first sort order (if any) must specify the same property as the filter. Also, you sometimes need to configure a secondary index.

Ancestor Queries

Ancestor queries constrain query results to entities descended from a specific ancestor, identified by its key:

key = ndb.Key(BlogPost, 12345)
qry = Comment.query(ancestor=key)

You can combine this with filters and sort orders:

qry = Comment.query(Comment.tags == 'python', ancestor=key).order(Comment.date)

This is particularly useful inside transactions, which are limited to operating on entities descended from a common ancestor (or, for cross-group transactions, a few common ancestors).

Query Attributes

Query objects have the following read-only data attributes:

Attribute Type Default Description
kind str None Kind name (usually the class name)
ancestor Key None Ancestor specified to query
filters FilterNode None Filter expression
orders Order None Sort orders

Printing a query object (or calling str() or repr() on it) produces a nicely-formatted string representation:

print Employee.query()
Query(kind='Employee')
print Employee.query(ancestor=Key(Manager, 1))
Query(kind='Employee', ancestor=Key('Manager', 1))

Filtering for Structured Property Values

A query can filter directly for the field values of structured properties. For example, a query for all contacts with an address whose city is 'Amsterdam' would look like

Contact.query(Contact.address.city == 'Amsterdam')

If you combine multiple such filters, the filters may match different Address sub-entities within the same Contact entity. For example:

Contact.query(Contact.address.city == 'Amsterdam',  # Beware!
              Contact.address.street == 'Spear St')

may find contacts with an address whose city is 'Amsterdam' and another (different) address whose street is 'Spear St'. However, at least for equality filters, you can create a query that returns only results with multiple values in a single sub-entity:

Contact.query(Contact.address == Address(city='San Francisco',
                                         street='Spear St'))

If a sub-entity has any property values equal to None, they are ignored. Thus, it doesn't make sense to filter for a sub-entity property value of None.

Query Iterators

While a query is in progress, its state is held in an iterator object. (Most applications won't use them directly; it's normally more straightforward to call fetch(20) than to manipulate the iterator object.) There are two basic ways to get such an object:

  • using Python's built-in iter() function on a Query object
  • calling the Query object's iter() method

The first supports the use of a Python for loop (which implicitly calls the iter() function) to loop over a query.

for greeting in greetings:
      self.response.out.write('<blockquote>%s</blockquote>' %
                              cgi.escape(greeting.content))

The second way, using the Query object's iter() method, allows you to pass options to the iterator to affect its behavior. For example, to use a keys-only query in a for loop, you can write this:

for key in qry.iter(keys_only=True):
  print key

Query iterators have other useful methods:

Method Description
__iter__() Part of Python's iterator protocol.
next() Returns the next result or raises the exception StopIteration if there is none.

has_next() Returns True if a subsequent next() call will return a result, False if it will raise StopIteration.

Blocks until the answer to this question is known and buffers the result (if any) until you retrieve it with next().
probably_has_next() Like has_next(), but uses a faster (and sometimes inaccurate) shortcut.

May return a false positive (True when next() would actually raise StopIteration), but never a false negative (False when next() would actually return a result).
cursor_before() Returns a query cursor representing a point just before the last result returned.

Raises an exception if no cursor is available (in particular, if the produce_cursors query option was not passed).
cursor_after() Returns a query cursor representing a point just after the last result returned.

Raises an exception if no cursor is available (in particular, if the produce_cursors query option was not passed).

Query Cursors

A query cursor is a small opaque data structure representing a resumption point in a query. Using cursors takes several steps:

  1. Run a query using a query iterator with the query option produce_cursors set.
  2. Extract and save a cursor from the query iterator after iterating to some point.
  3. Later pass the saved cursor to another query (actually it must be the same one, though not necessarily the same query object) as the start_cursor or end_cursor option, to resume the query from or restrict it to this point.

Here is an example:

from google.appengine.datastore.datastore_query import Cursor

q = Person.query()
it = q.iter(produce_cursors=True)

p1 = it.next()
print p1.key

p2 = it.next()
print p2.key

p3 = it.next()
print p3.key

c = it.cursor_after() # The position right after p3
cstr = c.to_websafe_string()
print cstr

# Start a new query that continues where the first one stopped
c2 = Cursor.from_websafe_string(cstr)
it2 = q.iter(start_cursor=c2)
p4 = it2.next()
print p4.key
p5 = it2.next()
print p5.key

Note the use of to_websafe_string() and from_websafe_string() to serialize and deserialize the cursor. This allows you to pass a cursor to a client on the web in the response to one request, and receive it back from the client in a later request.

Occasionally, calling cursor_after() or cursor_before() may make a blocking Datastore call, rerunning part of the query in order to extract a cursor that points to the middle of a batch.

The Query method fetch_page() simplifies the process by doing most of the work for you. It works somewhat like fetch(), but it returns a triple (results, cursor, more). The returned more flag indicates whether there are more results; a UI can use this, for example, to suppress a "Next Page" button or link. To request subsequent pages, pass the cursor returned by one fetch_page() call into the next. The example above simplifies to:

from google.appengine.datastore.datastore_query import Cursor

q = Person.query()
res, c, more = q.fetch_page(3)
print res