English | Site Directory

Datastore Java API Overview

The Google App Engine datastore provides robust scalable data storage for your web application. The datastore is designed with web applications in mind, with an emphasis on read and query performance. It stores data entities with properties, organized by application-defined kinds. It can perform queries over entities of the same kind, with filters and sort orders on property values and keys. All queries are pre-indexed for fast results over very large data sets. The datastore supports transactional updates, using entity groupings defined by the application as the unit of transactionality in the distributed data network.

Introducing the Datastore

The App Engine datastore stores and performs queries over data objects, known as entities. An entity has one or more properties, named values of one of several supported data types. A property can be a reference to another entity.

The datastore can execute multiple operations in a single transaction, and roll back the entire transaction if any of the operations fail. This is especially useful for distributed web applications, where multiple users may be accessing or manipulating the same data object at the same time.

Unlike traditional databases, the datastore uses a distributed architecture to manage scaling to very large data sets. An App Engine application can optimize how data is distributed by describing relationships between data objects, and by defining indexes for queries.

The App Engine datastore is strongly consistent, but it's not a relational database. While the datastore interface has many of the same features of traditional databases, the datastore's unique characteristics imply a different way of designing and managing data to take advantage of the ability to scale automatically.

Java Data Interfaces

Datastore entities are schemaless: Two entities of the same kind are not obligated to have the same properties, or use the same value types for the same properties. The application is responsible for ensuring that entities conform to a schema when needed. The Java SDK includes implementations of the Java Data Objects (JDO) and Java Persistence API (JPA) interfaces for modeling and persisting data. These standard interfaces include mechanisms for defining classes for data objects, and for performing queries. The datastore also provides a low-level API, which you can use to implement other interface adapters, or just use directly in your applications.

This documentation describes JDO in detail. Most other sources of information about JDO also apply to App Engine. Please refer to the documentation in this guide for information about implementation differences and unimplemented features.

JPA is also supported. This documentation describes JPA briefly; see Using JPA.

JDO uses annotations on Java classes ("plain old Java objects," or "POJOs") to describe how instances of the class are stored in the datastore as entities, and how entities are recreated as instances when retrieved from the datastore. The following is an example of a simple data class:

Employee.java

import java.util.Date;
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Employee {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long id;

    @Persistent
    private String firstName;

    @Persistent
    private String lastName;

    @Persistent
    private Date hireDate;

    public Employee(String firstName, String lastName, Date hireDate) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.hireDate = hireDate;
    }

    // Accessors for the fields.  JDO doesn't use these, but your application does.

    public Long getId() {
        return id;
    }

    public String getFirstName() {
        return firstName;
    } 

    // ... other accessors...
}

You interact with the datastore using a PersistenceManager object, which you obtain from a PersistenceManagerFactory object. Because setting up a PersistenceManagerFactory is expensive, you must make sure to get the instance only once in the lifetime of your application (which may span many requests). An easy way to do this is to wrap it in a singleton class, like this:

PMF.java

import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManagerFactory;

public final class PMF {
    private static final PersistenceManagerFactory pmfInstance =
        JDOHelper.getPersistenceManagerFactory("transactions-optional");

    private PMF() {}

    public static PersistenceManagerFactory get() {
        return pmfInstance;
    }
}

You can instantiate instances of data classes and store them in the datastore:

import java.util.Date;
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;

import Employee;
import PMF;

// ...
        Employee employee = new Employee("Alfred", "Smith", new Date());

        PersistenceManager pm = PMF.get().getPersistenceManager();

        try {
            pm.makePersistent(employee);
        } finally {
            pm.close();
        }

JDO includes a query interface called JDOQL. You can use JDOQL to retrieve entities as instances of this class, as follows:

import java.util.List;
import Employee;

// ...
        String query = "select from " + Employee.class.getName() + " where lastName == 'Smith'";
        List<Employee> employees = (List<Employee>) pm.newQuery(query).execute();

Entities and Properties

A data object in the App Engine datastore is known as an entity. An entity has one or more properties, named values of one of several data types, including integers, floating point values, strings, dates, binary data, and more.

Each entity also has a key that uniquely identifies the entity. The simplest key has a kind and a unique numeric ID provided by the datastore. The ID can also be a string provided by the application.

An application can fetch an entity from the datastore by using its key, or by performing a query that matches the entity's properties. A query can return zero or more entities, and can return the results sorted by property values. A query can also limit the number of results returned by the datastore to conserve memory and run time.

Unlike relational databases, the App Engine datastore does not require that all entities of a given kind have the same properties. The application can specify and enforce its data model using libraries included with the SDK, or its own code.

A property can have one or more values. A property with multiple values can have values of mixed types. A query on a property with multiple values tests whether any of the values meets the query criteria. This makes such properties useful for testing for membership.

Queries and Indexes

An App Engine datastore query operates on every entity of a given kind (a data class). It specifies zero or more filters on entity property values and keys, and zero or more sort orders. An entity is returned as a result for a query if the entity has at least one value (possibly null) for every property mentioned in the query's filters and sort orders, and all of the filter criteria are met by the property values.

Every datastore query uses an index, a table that contains the results for the query in the desired order. An App Engine application defines its indexes in a configuration file. The development web server automatically adds suggestions to this file as it encounters queries that do not yet have indexes configured. You can tune indexes manually by editing the file before uploading the application. As the application makes changes to datastore entities, the datastore updates the indexes with the correct results. When the application executes a query, the datastore fetches the results directly from the corresponding index.

This mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of queries you may be used to from other database technologies.

Transactions and Entity Groups

With the App Engine datastore, every attempt to create, update or delete an entity happens in a transaction. A transaction ensures that every change made to the entity is saved to the datastore, or, in the case of failure, none of the changes are made. This ensures consistency of data within an entity.

You can perform multiple actions on an entity within a single transaction using the transaction API. For example, say you want to increment a counter field in an object. To do so, you need to read the value of the counter, calculate the new value, then store it. Without a transaction, it is possible for another process to increment the counter between the time you read the value and the time you update the value, causing your app to overwrite the updated value. Doing the read, calculation and write in a single transaction ensures that no other process interferes with the increment.

You can make changes to multiple entities within a single transaction. To support this, App Engine needs to know in advance which entities will be updated together, so it knows to store them in a way that supports transactions. You must declare that an entity belongs to the same entity group as another entity when you create the entity. All entities fetched, created, updated or deleted in a transaction must be in the same entity group.

Entity groups are defined by a hierarchy of relationships between entities. To create an entity in a group, you declare that the entity is a child of another entity already in the group. The other entity is the parent. An entity created without a parent is a root entity. A root entity without any children exists in an entity group by itself. Each entity has a path of parent-child relationships from a root entity to itself (the shortest path being no parent). This path is an essential part of the entity's complete key. A complete key can be represented by the kind and ID or key name of each entity in the path.

The datastore uses optimistic concurrency to manage transactions. While one app instance is applying changes to entities in an entity group, all other attempts to update any entity in the group fail instantly. The app can try the transaction again to apply it to the updated data.

Quotas and Limits

Each call to the datastore API counts toward the Datastore API Calls quota. Note that some library calls result in multiple calls to the underlying datastore API.

Data sent to the datastore by the app counts toward the Data Sent to (Datastore) API quota. Data received by the app from the datastore counts toward the Data Received from (Datastore) API quota.

The total amount of data currently stored in the datastore for the app cannot exceed the Stored Data (adjustable) quota. This includes entity properties and keys, but does not include indexes.

The amount of CPU time consumed by datastore operations applies to the following quotas:

  • CPU Time (adjustable)
  • Datastore CPU Time

For more information on quotas, see Quotas, and the "Quota Details" section of the Admin Console.

In addition to quotas, the following limits apply to the use of the datastore:

Limit Amount
maximum entity size 1 megabyte
maximum number of values in an index for an entity (1) 1,000 values
maximum number of entities in a batch put or batch delete 500 entities
maximum number of entities in a batch get 1,000 entities
maximum results offset for a query 1,000
  1. An entity uses one value in an index for every column × every row that refers to the entity, in all indexes. The number of indexes values for an entity can grow large if an indexed property has multiple values, requiring multiple rows with repeated values in the table.