Google App Engine

Datastore Overview

The App Engine Datastore provides robust, scalable storage for your web application, with an emphasis on read and query performance. An application creates entities, with data values stored as properties of an entity. The application can perform queries over entities. All queries are pre-indexed for fast results over very large data sets.

Introducing the Datastore

The App Engine Datastore holds data objects known as entities. An entity has one or more properties, named values of one of several supported data types: for instance, a property can be a string, an integer, or even a reference to another entity.

Note: The Python 2.7 runtime supports only the High Replication Datastore (HRD).

Unlike traditional relational databases, the App Engine Datastore uses a distributed architecture to automatically manage scaling to very large data sets. While the Datastore interface has many of the same features as traditional databases, it differs from them in the way it describes relationships between data objects. Entities of the same kind can have different properties, and different entities can have properties with the same name but different value types. These unique characteristics imply a different way of designing and managing data to take advantage of the ability to scale automatically. This documentation explains how to design your application to take the greatest advantage of the Datastore's distributed architecture.

The Datastore can execute multiple operations in a single transaction. By definition, a transaction cannot succeed unless every one of its operations succeeds; if any of the operations fails, the transaction is automatically rolled back. This is especially useful for distributed web applications, where multiple users may be accessing or manipulating the same data at the same time.

Introducing the Python Datastore API

In Python, Datastore entities are created from Python objects; the object's attributes become properties of the entity. To create a new entity, you call the constructor of the entity's model class (optionally designating a parent entity), set the object's attributes, and then save the object by calling a method such as put(). By default, the name of the model class becomes the entity's kind. To update an existing entity, you retrieve the entity's object (for example, by using a query), modify its properties, and then save it again.

In the Python API, a model describes a kind of entity, including the types and configuration for its properties. An application defines a model using Python classes, with class attributes describing the properties. Entities of a given kind are represented by instances of the corresponding model class, with instance attributes representing the property values. An entity can be created with the class constructor method, then stored by calling the put() method:

import datetime
from google.appengine.ext import db
from google.appengine.api import users


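class Employee(db.Model):
  # Each class attribute declares one property of the Employee kind.
  name = db.StringProperty(required=True)
  role = db.StringProperty(required=True,
                           choices=set(["executive", "manager", "producer"]))
  hire_date = db.DateProperty()
  # Unindexed properties cannot be used in query filters or sort orders.
  new_hire_training_completed = db.BooleanProperty(indexed=False)
  account = db.UserProperty()


# Create an entity with the class constructor, then store it with put().
e = Employee(name="John",
             role="manager",
             account=users.get_current_user())
e.hire_date = datetime.datetime.now().date()
e.put()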

The Datastore API provides two interfaces for queries: a query object interface and an SQL-like query language called GQL. A query returns entities in the form of instances of the model classes that can be modified and put back into the Datastore:

training_registration_list = [users.User("Alfred.Smith@example.com"),
                              users.User("jharrison@example.com"),
                              users.User("budnelson@example.com")]
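# IN matches entities whose account equals any user in the list.
employees_trained = db.GqlQuery("SELECT * FROM Employee WHERE account IN :1",
                                training_registration_list)
for e in employees_trained:
  # Results are Employee instances; modify and save each one.
  e.new_hire_training_completed = True
  db.put(e)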

Entities, Properties, and Keys

Data objects in the App Engine Datastore are known as entities. An entity has one or more named properties, each of which can have one or more values. Property values can belong to a variety of data types, including integers, floating-point numbers, strings, dates, binary data, and others. A query on a property with multiple values tests whether any of the values meets the query criteria. This makes such properties useful for membership testing.
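The "any value matches" rule for multi-valued properties can be illustrated with a small sketch. This is a toy model only: plain dictionaries stand in for entities, and the helper name matches_equality is hypothetical, not part of the Datastore API.

```python
# Toy model only: plain dicts stand in for entities; not the Datastore API.

def matches_equality(entity, prop, wanted):
    """Return True if any value of the property equals `wanted`."""
    if prop not in entity:
        return False  # entities lacking the property never match
    values = entity[prop]
    if not isinstance(values, list):
        values = [values]  # treat a single value as a one-element list
    return wanted in values


articles = [
    {"title": "Scaling reads", "tags": ["datastore", "indexes"]},
    {"title": "Intro to GQL", "tags": ["gql"]},
    {"title": "Untagged note"},
]

# Membership test: which articles carry the tag "indexes"?
tagged = [a["title"] for a in articles if matches_equality(a, "tags", "indexes")]
print(tagged)
```

Only the first article is selected, because one of its tag values equals the value being tested.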

Datastore entities are schemaless: unlike traditional relational databases, the App Engine Datastore does not require that all entities of a given kind have the same properties or that all of an entity's values for a given property be of the same data type. If a formal schema is needed, the application itself is responsible for ensuring that entities conform to it; the Python SDK includes a rich library of data modeling features for this purpose.

Each entity has a key that uniquely identifies it. The key consists of the following components:

  • A kind, which categorizes it for the purpose of Datastore queries
  • An identifier, which can be either
    • a key name string
    • an integer numeric ID
  • An optional ancestor path locating the entity within the Datastore hierarchy

Kinds and Identifiers

Each Datastore entity is of a particular kind, which categorizes the entity for the purpose of queries: for instance, a human resources application might represent each employee at a company with an entity of kind Employee. In addition, each entity has its own identifier, assigned when the entity is created. Because the identifier is part of the entity's key, it is associated permanently with the entity and cannot be changed. It can be assigned in either of two ways:

  • Your application can specify its own identifier string for the entity (called the key name).
  • You can have the Datastore automatically assign the entity an integer numeric ID.

Ancestor Paths

Entities in the Datastore form a hierarchically structured space similar to the directory structure of a file system. When you create an entity, you can optionally designate another entity as its parent; the new entity is a child of the parent entity. This association between an entity and its parent is permanent, and cannot be changed once the entity is created. An entity without a parent is a root entity. The Datastore will never assign the same numeric ID to two entities with the same parent, or to two root entities (those without a parent).

An entity's parent, parent's parent, and so on recursively, are its ancestors; its children, children's children, and so on, are its descendants. The sequence of entities beginning with a root entity and proceeding from parent to child, leading to a given entity, constitutes that entity's ancestor path. The complete key identifying the entity consists of a sequence of kind-identifier pairs specifying its ancestor path and terminating with those of the entity itself:

Person:GreatGrandpa / Person:Grandpa / Person:Dad / Person:Me

For a root entity, the ancestor path is empty and the key consists solely of the entity's own kind and identifier:

Person:GreatGrandpa
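The structure of a key can be sketched in plain Python by treating it as a tuple of (kind, identifier) pairs ordered from the root down. This is an illustrative model under that assumption; the helper names (make_key, parent_of, root_of, format_key) are hypothetical and are not Datastore API calls.

```python
# Toy model only: a key is a tuple of (kind, identifier) pairs, root first.
# These helper names are hypothetical, not part of the Datastore API.

def make_key(*pairs):
    """Build a key from (kind, identifier) pairs, ordered root to entity."""
    if not pairs:
        raise ValueError("a key needs at least one (kind, identifier) pair")
    return tuple(pairs)


def parent_of(key):
    """Return the parent's key, or None for a root entity."""
    return key[:-1] or None


def root_of(key):
    """Return the key of the root entity at the top of the ancestor path."""
    return key[:1]


def format_key(key):
    """Render a key in the Kind:identifier / Kind:identifier notation."""
    return " / ".join("%s:%s" % (kind, ident) for kind, ident in key)


me = make_key(("Person", "GreatGrandpa"), ("Person", "Grandpa"),
              ("Person", "Dad"), ("Person", "Me"))
print(format_key(me))
print(format_key(parent_of(me)))
print(format_key(root_of(me)))
print(parent_of(root_of(me)))  # a root entity has no parent
```

Because the parent is encoded in the key itself, changing an entity's parent would change its key, which is why the association is permanent.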

Queries and Indexes

In addition to retrieving entities from the Datastore directly by their keys, an application can perform a query to retrieve them by the values of their properties. A query operates on entities of a given kind; it can specify filters on the entities' property values and keys, and can return zero or more entities as results. (To conserve memory and improve performance, the query should, whenever possible, specify a limit on the number of results returned.) A query can also specify sort orders to order the results by their property values. The results include all entities that have at least one value for every property named in the filters and sort orders, and whose property values meet all the specified filter criteria. The query can return either the entire entities or just their keys.

Every query uses an index, a table containing the query's results in the desired order. Indexes for some types of query are provided automatically; an App Engine application can define additional indexes for itself in a configuration file named index.yaml. The Datastore updates the indexes incrementally to reflect any changes the application makes to its entities. Thus the correct results of all queries are immediately available directly from the indexes, with no further computation needed.

The development web server automatically adds suggestions to the configuration file when it encounters queries that do not yet have indexes configured. You can tune indexes manually by editing the file before uploading the application; see the article Index Selection and Advanced Search for more information.

Note: This mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of query common in other database technologies: in particular, joins and aggregate queries aren't supported within the query engine.
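To see why index-backed queries stay fast on large data sets, consider the following sketch. It is a toy model, not the Datastore's implementation: a sorted Python list of (value, key) rows stands in for a single-property index, and the class name PropertyIndex and its methods are hypothetical.

```python
import bisect

# Toy model only: a sorted list of (value, key) rows stands in for a
# single-property index. Names here are hypothetical.


class PropertyIndex(object):
    def __init__(self):
        self.rows = []  # kept sorted by (value, key)

    def update(self, key, old_value, new_value):
        """Incrementally maintain the index when an entity changes."""
        if old_value is not None:
            self.rows.remove((old_value, key))
        if new_value is not None:
            bisect.insort(self.rows, (new_value, key))

    def scan(self, low, high, limit=None):
        """Return keys whose value lies in [low, high], in index order."""
        # Binary search for the first row with value >= low.
        i = bisect.bisect_left(self.rows, (low,))
        results = []
        # Read consecutive rows until the range or the limit is exhausted.
        while i < len(self.rows):
            value, key = self.rows[i]
            if value > high or (limit is not None and len(results) >= limit):
                break
            results.append(key)
            i += 1
        return results


salary_index = PropertyIndex()
for key, salary in [("alice", 50), ("bob", 70), ("carol", 60), ("dave", 90)]:
    salary_index.update(key, None, salary)
salary_index.update("bob", 70, 55)  # bob's value changes; his row moves

print(salary_index.scan(50, 60))
print(salary_index.scan(50, 100, limit=2))
```

After the initial search, the scan touches only rows that belong to the result, so the work done depends on the number of results rather than on the total number of rows.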

Transactions and Entity Groups

Every attempt to create, update, or delete an entity takes place in the context of a transaction. A single transaction can include any number of such operations. To maintain the consistency of the data, the transaction ensures that all of the operations it contains are applied to the Datastore as a unit or, if any of the operations fails, that none of them are applied.

You can perform multiple actions on an entity within a single transaction. For example, suppose you want to increment a counter field in an object. To do so, you need to read the value of the counter, calculate the new value, and then store it back. Without a transaction, it is possible for another process to increment the counter between the time you read the value and the time you update it, causing your application to overwrite the updated value. Doing the read, calculation, and write in a single transaction ensures that no other process can interfere with the increment.

A single transaction can apply to multiple entities, so long as the entities are descended from a common ancestor. Such entities are said to belong to the same entity group. For applications using the Master/Slave Datastore, all entities retrieved, created, updated, or deleted in a transaction must be in the same entity group; in the High Replication Datastore (HRD), the entities in a transaction can belong either to a single entity group or to different entity groups (see Cross-Group Transactions). In designing your data model, you should determine which entities you need to be able to process in the same transaction. Then, when you create those entities, place them in the same entity group by declaring them with a common ancestor. This tells App Engine that the entities will be updated together, so it can store them in a way that supports transactions.

The Datastore uses optimistic concurrency to manage transactions. When two or more application instances try to change the same entity group at the same time (either by updating existing entities or by creating new ones), the first instance to commit its changes will succeed and all others will fail on commit. The other instances can then retry their transactions to apply them to the updated data. Note that because the Datastore works this way, using entity groups limits the number of concurrent writes you can do to any entity in a given group.
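The commit-or-retry behavior described above can be sketched with a toy model. It assumes a single version number per entity group as the conflict detector, which is a simplification chosen for illustration; the names EntityGroup, ConflictError, and increment are hypothetical and do not come from the Datastore API.

```python
# Toy model only: one version number per entity group stands in for the
# Datastore's conflict detection. All names here are hypothetical.


class ConflictError(Exception):
    pass


class EntityGroup(object):
    def __init__(self):
        self.version = 0
        self.data = {}

    def begin(self):
        """Start a transaction: remember the version and take a snapshot."""
        return self.version, dict(self.data)

    def commit(self, read_version, new_data):
        """Store the changes only if nobody committed since `begin`."""
        if read_version != self.version:
            raise ConflictError("entity group changed since transaction began")
        self.data = new_data
        self.version += 1


def increment(group, name, retries=3, interfere=None):
    """Read-modify-write a counter, retrying when the commit conflicts."""
    for attempt in range(retries):
        version, snapshot = group.begin()
        snapshot[name] = snapshot.get(name, 0) + 1
        if interfere is not None and attempt == 0:
            interfere()  # simulate a concurrent writer on the first attempt
        try:
            group.commit(version, snapshot)
            return attempt + 1  # number of attempts used
        except ConflictError:
            continue  # start over from the updated data
    raise ConflictError("gave up after %d attempts" % retries)


group = EntityGroup()
increment(group, "hits")
# A second writer commits between our read and our commit.
attempts = increment(group, "hits", interfere=lambda: increment(group, "hits"))
print(group.data["hits"], attempts)
```

The interfering writer causes the first commit to fail, the retry re-reads the updated counter, and no increment is lost: three increments yield a count of 3 after two attempts.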

Cross-Group Transactions

Applications using the HRD can perform transactions on entities belonging to different entity groups. Transactions of this type, called cross-group (XG) transactions, extend the behavior users experience with single-group transactions: an XG transaction will succeed as long as no concurrent transaction touches any of the entity groups to which it applies. This gives you more flexibility in organizing your data, because you aren't forced to put disparate pieces of data under the same ancestor just to perform atomic writes on them.

Note: The first read of an entity group in an XG transaction may throw a TransactionFailedError exception if there is a conflict with other transactions accessing that same entity group. This means that an XG transaction that performs only reads can fail with a concurrency exception.

XG transactions can be used across a maximum of five entity groups. An XG transaction that touches only a single entity group behaves like a single-group transaction, with the same performance and cost. In an XG transaction that touches multiple entity groups, operations cost the same as the equivalent single-group operations with regard to billing and resource usage, but may experience higher latency.
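Which entity groups a transaction touches follows directly from the keys involved, since every entity belongs to the group of its root ancestor. The following sketch assumes keys are modeled as tuples of (kind, identifier) pairs, root first; the function names and the constant are hypothetical, and the check only mirrors the limits stated above rather than calling the Datastore.

```python
# Toy model only: keys are tuples of (kind, identifier) pairs, root first,
# and an entity group is identified by its root pair. Names are hypothetical.

MAX_XG_ENTITY_GROUPS = 5  # limit stated in the text above


def entity_groups(keys):
    """Return the set of root (kind, identifier) pairs the keys belong to."""
    return set(key[0] for key in keys)


def check_transaction(keys, cross_group=False):
    """Raise ValueError if the keys span more entity groups than allowed."""
    groups = entity_groups(keys)
    allowed = MAX_XG_ENTITY_GROUPS if cross_group else 1
    if len(groups) > allowed:
        raise ValueError("transaction touches %d entity groups; at most %d allowed"
                         % (len(groups), allowed))
    return groups


order = (("Customer", "alice"), ("Order", 1))
invoice = (("Customer", "alice"), ("Invoice", 7))
stock = (("Warehouse", "east"), ("Item", "widget"))

# Same root entity: a single-group transaction is enough.
print(len(check_transaction([order, invoice])))

# Two different roots: this needs a cross-group transaction.
print(len(check_transaction([order, invoice, stock], cross_group=True)))
```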

As in single-group transactions, you cannot perform a non-ancestor query in an XG transaction. You can, however, perform ancestor queries on separate entity groups. Nontransactional (non-ancestor) queries may see all, some, or none of the results of a previously committed transaction. (For background on this issue, see Understanding Datastore Writes: Commit, Apply, and Data Visibility.) However, such nontransactional queries are more likely to see the results of a partially committed XG transaction than those of a partially committed single-group transaction.

Differences from SQL

The App Engine Datastore differs from a traditional relational database in several important ways:

  • The App Engine Datastore is designed to scale, allowing applications to maintain high performance as they receive more traffic:
    • Datastore writes scale by automatically distributing data as necessary.
    • Datastore reads scale because the only supported queries are those whose performance scales with the size of the result set (as opposed to the data set). This means that a query whose result set contains 100 entities performs the same whether it searches over a hundred entities or a million. This property is the key reason some types of query are not supported.
  • Because all queries on App Engine are served by pre-built indexes, the types of query that can be executed are more restrictive than those allowed on a relational database with SQL. The following are not supported:
    • Join operations
    • Inequality filtering on multiple properties
    • Filtering of data based on results of a subquery
  • Unlike traditional relational databases, the App Engine Datastore doesn't require entities of the same kind to have a consistent property set (although you can choose to enforce such a requirement in your own application code). It is not currently possible for a query to return only a subset of the result entities' properties.

For more in-depth information about the design of the Datastore, read our series of articles on Mastering the Datastore.

Understanding Datastore Writes: Commit, Apply, and Data Visibility

For a full discussion of this topic, see the articles Life of a Datastore Write and Transaction Isolation in App Engine.

For App Engine applications, data is written to the Datastore in two phases:

  1. In the Commit phase, the entity data is recorded in a log.
  2. The Apply phase consists of two actions performed in parallel:
    • The entity data is written.
    • The index rows for the entity are written. (Note that this can take longer than writing the data itself.)

In the High Replication Datastore (HRD), the write operation returns immediately after the Commit phase and the Apply phase then takes place asynchronously; the Master/Slave Datastore usually doesn't return until the Apply phase is complete (both the data and the indexes have been written).

If a failure occurs during the Commit phase, there are automatic retries; but if failures continue, the Datastore returns an error message that your application receives as an exception. If the Commit phase succeeds but the Apply fails, the Apply is rolled forward to completion when one of the following occurs:

  • Periodic Datastore sweeps check for uncompleted Commit jobs and apply them.
  • Certain application operations (gets, puts, deletes, and ancestor queries) that use the affected entity group cause any changes that have been committed but not yet applied to be completed before proceeding with the new operation.

    Note: In the Master/Slave Datastore, ancestor queries trigger an Apply only when they are included within a transaction.

This Datastore write behavior can have several implications for how and when data is visible to your application at different parts of the Commit and Apply phases:

  • Because Datastore gets and ancestor queries apply any outstanding modifications before executing, these operations always see a consistent view of all previous successful transactions. This means that a get operation (looking up an updated entity by its key) is guaranteed to see the latest version of that entity.
  • In the HRD, as much as a few hundred milliseconds may elapse from the time a write operation returns until the transaction is completely applied. In this case, queries spanning more than one entity group cannot determine whether there are any outstanding modifications before executing and may return stale results. (This is usually not an issue with the Master/Slave Datastore, because the entire transaction is normally completed before the operation returns.)
  • The timing of concurrent query requests may affect their results. If an entity initially satisfies a query but is later changed so that it no longer does, the entity may still be included in the query's result set; it will be omitted only if the query executes after the Apply phase of the update has been completed (that is, after the indexes have been written).
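The visibility rules above can be made concrete with a toy model of the two write phases. This is a deliberately simplified sketch: it tracks a single indexed property named color, applies all pending writes at once rather than per entity group, and uses hypothetical names (ToyDatastore, apply_pending, query_color) that are not part of the Datastore API.

```python
# Toy model only: imitates the Commit and Apply phases described above.
# Class and method names are hypothetical, not Datastore API.


class ToyDatastore(object):
    def __init__(self):
        self.log = []       # committed but not yet applied writes
        self.entities = {}  # key -> properties (the entity data)
        self.index = {}     # key -> indexed "color" value (the index rows)

    def put(self, key, props):
        """Commit phase: record the write in the log, then return."""
        self.log.append((key, dict(props)))

    def apply_pending(self):
        """Apply phase: write entity data and index rows for logged writes."""
        for key, props in self.log:
            self.entities[key] = props
            self.index[key] = props.get("color")
        self.log = []

    def get(self, key):
        """Lookup by key: rolls outstanding writes forward first."""
        self.apply_pending()
        return self.entities.get(key)

    def query_color(self, color):
        """Non-ancestor query: reads the index as it is, so may be stale."""
        return sorted(k for k, v in self.index.items() if v == color)


store = ToyDatastore()
store.put("car1", {"color": "red"})
store.apply_pending()

store.put("car1", {"color": "blue"})  # committed, not yet applied
stale = store.query_color("red")      # the old index row is still there
fresh = store.get("car1")             # the get applies the pending write
after = store.query_color("red")      # the index now reflects the update
print(stale, fresh, after)
```

The query issued between Commit and Apply still returns the entity under its old value, while the lookup by key sees the latest version and, as a side effect, brings the index up to date.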

Datastore Statistics

The Datastore maintains statistics about the data stored for an application, such as how many entities there are of a given kind, or how much space is used by property values of a given type. You can view these statistics in the Administration Console under Datastore > Statistics.

Quotas and Limits

Data sent to the Datastore by the app counts toward the Data Sent to (Datastore) API limit. Data received by the app from the Datastore counts toward the Data Received from (Datastore) API limit.

The total amount of data currently stored in the Datastore for the app cannot exceed the Stored Data (billable) limit. This includes all entity properties and keys and the indexes necessary to support querying these entities. See How Entities and Indexes are Stored for a complete breakdown of the metadata required to store entities and indexes at the Bigtable level.

For more information on system-wide safety limits, see Limits, and the "Quota Details" section of the Admin Console.

In addition to system-wide safety limits, the following limits apply specifically to the use of the Datastore:

Limit                                                        Amount
maximum entity size                                          1 megabyte
maximum number of values in all indexes for an entity (1)    5,000 values

  1. An entity uses one value in an index for every column × every row that refers to the entity, in all indexes. The number of index values for an entity can grow large if an indexed property has multiple values, requiring multiple rows with repeated values in the table.
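The footnote's "column × row" rule lends itself to a quick estimate. The sketch below assumes that an index holds one row for every combination of the entity's values across the indexed properties; that assumption is an interpretation of the footnote, and the function name index_values is hypothetical.

```python
# Back-of-the-envelope estimate based on the footnote above. Assumes one
# index row per combination of the entity's values (an interpretation).

def index_values(value_counts):
    """Estimate the values one entity uses in one index.

    `value_counts` lists how many values the entity has for each
    indexed property (column) of that index.
    """
    rows = 1
    for count in value_counts:
        rows *= count  # one row per combination of values
    return rows * len(value_counts)  # every row holds one value per column


# Single-property index, property with 10 values: 10 rows x 1 column.
print(index_values([10]))

# Two-property index, 3 values and 2 values: 6 rows x 2 columns.
print(index_values([3, 2]))

# Two multi-valued properties with 50 values each already hit the limit.
print(index_values([50, 50]))
```

Under this assumption, an entity with two 50-valued properties in the same index would use 5,000 values in that index alone, which shows how quickly multi-valued properties can approach the limit.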