Google App Engine

Memcache Python API Overview

High-performance, scalable web applications often use a distributed in-memory data cache in front of, or in place of, robust persistent storage for some tasks. App Engine includes a memory cache service for this purpose.

Caching Data in Python

The following example demonstrates several ways to set values in Memcache using the Python API.

from google.appengine.api import memcache

# Add a value if it doesn't exist in the cache, with a cache expiration of 1 hour.
memcache.add(key="weather_USA_98105", value="raining", time=3600)

# Set several values, overwriting any existing values for these keys.
memcache.set_multi({ "USA_98105": "raining",
                     "USA_94105": "foggy",
                     "USA_94043": "sunny" },
                     key_prefix="weather_", time=3600)

# Atomically increment an integer value.
memcache.set(key="counter", value=0)
memcache.incr("counter")
memcache.incr("counter")
memcache.incr("counter")

When to Use a Memory Cache

One use of a memory cache is to speed up common datastore queries. If many requests make the same query with the same parameters, and changes to the results do not need to appear on the web site right away, the app can cache the results in the memcache. Subsequent requests can check the memcache, and only perform the datastore query if the results are absent or expired. Session data, user preferences, and any other queries performed on most pages of a site are good candidates for caching.
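The check-then-query pattern described above can be sketched as follows. This is a minimal illustration rather than App Engine code: DictCache is a hypothetical in-memory stand-in that mimics only the get() and add() calls of the memcache module, and run_weather_query is a made-up placeholder for a datastore query. In a real app, the memcache module itself would presumably be passed in as the cache.

```python
class DictCache(object):
    """Hypothetical in-memory stand-in for the memcache module (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like memcache.get(), return None on a cache miss.
        return self._data.get(key)

    def add(self, key, value, time=0):
        # Like memcache.add(), store the value only if the key is absent.
        # This stand-in ignores the expiration time.
        if key in self._data:
            return False
        self._data[key] = value
        return True


def cached_query(cache, key, run_query, time=3600):
    """Return the cached result for key, calling run_query() only on a miss."""
    result = cache.get(key)
    if result is None:
        result = run_query()
        cache.add(key, result, time=time)
    return result


query_calls = []


def run_weather_query():
    # Placeholder for an expensive datastore query (assumed, not a real API).
    query_calls.append(1)
    return "raining"


cache = DictCache()
first = cached_query(cache, "weather_USA_98105", run_weather_query)
second = cached_query(cache, "weather_USA_98105", run_weather_query)
```

Only the first call runs the query; the second is served from the cache. Because a miss is signalled by None, this sketch cannot cache a query whose legitimate result is None.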

Memcache may be useful for other temporary values. However, when considering whether to store a value solely in the memcache and not backed by other persistent storage, be sure that your application behaves acceptably when the value is suddenly not available. Values can expire from the memcache at any time, and may be expired prior to the expiration deadline set for the value. For example, if the sudden absence of a user's session data would cause the session to malfunction, that data should probably be stored in the datastore in addition to the memcache.
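One way to follow this advice for session data is to write to both stores and treat the cache as optional on reads. The sketch below uses plain dicts as stand-ins for memcache and the datastore, and the function names are invented for illustration; it shows only the control flow, not real App Engine calls.

```python
def save_session(cache, datastore, session_id, data):
    """Write the session to durable storage first, then to the cache."""
    datastore[session_id] = data
    cache[session_id] = data


def load_session(cache, datastore, session_id):
    """Read from the cache, falling back to durable storage on a miss."""
    data = cache.get(session_id)
    if data is None:
        data = datastore.get(session_id)
        if data is not None:
            cache[session_id] = data  # Repopulate the cache for later requests.
    return data


cache = {}      # Stands in for memcache (assumption for this sketch).
datastore = {}  # Stands in for the datastore (assumption for this sketch).

save_session(cache, datastore, "session_42", {"user": "alice"})
del cache["session_42"]  # Simulate the value being evicted early.
restored = load_session(cache, datastore, "session_42")
```

The session survives the simulated eviction because the cache is never the only copy.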

Using Compare-and-Set in Python

This section describes compare and set behavior in the Python API. The Java API provides similar functionality.

What is Compare and Set?

The compare and set feature provides a way to safely update a memcache key when multiple concurrently handled requests need to update that same key atomically. Without compare and set, such updates are subject to race conditions.

For a complete discussion of the compare and set feature for Python, see Guido van Rossum's blog post Compare-And-Set in Memcache.

Key Logical Components of Compare and Set

The Client object is required for compare and set because certain state information is stored away in it by the methods that support compare and set. (You cannot use the memcache functions, which are stateless.)

When you retrieve keys, you must use the memcache Client methods that support compare and set: gets() or get_multi() with the for_cas parameter set to True. The gets() operation internally receives two values from the memcache service: the value stored for the key and a timestamp (also known as the cas_id). The timestamp is an opaque number; only the memcache service knows what it means. The important thing is that each time the value associated with a memcache key is updated, the associated timestamp is changed. The gets() operation stores this timestamp in a Python dict on the Client object, using the key passed to gets() as the dict key.

When you update a key, you must use the memcache Client methods that support compare and set: cas() or cas_multi(). The cas() operation internally adds the timestamp to the request it sends to the memcache service. The service then compares the timestamp received with a cas() operation to the timestamp currently associated with the key. If they match, it updates the value and the timestamp, and returns success. If they don't match, it leaves the value and timestamp alone, and returns failure. Note that the service does not send the new timestamp back with a successful response. The only way to retrieve the timestamp is to call gets().

The other key logical component is the App Engine memcache service and its behavior with regard to compare and set. The App Engine memcache service itself behaves atomically. That is, when two concurrent requests (for the same app ID) use memcache, they go to the same memcache service instance (called a shard for historical reasons), and the memcache service has enough internal locking that concurrent requests for the same key are properly serialized. In particular, this means that two cas() requests for the same key do not actually run in parallel: the service handles the first request to completion (that is, updating the value and timestamp) before it starts handling the second request.
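The timestamp comparison described above can be illustrated with a toy, single-process model. ToyMemcacheService is a hypothetical class written for this sketch and is not part of any App Engine API. Unlike the real Client, which hides the cas_id in its internal dict, this model hands the cas_id back to the caller so that the comparison is visible.

```python
class ToyMemcacheService(object):
    """Toy model of the service side of compare and set (illustration only)."""

    def __init__(self):
        self._values = {}   # key -> (value, cas_id)
        self._next_id = 1

    def set(self, key, value):
        # Every update gives the key a fresh, opaque timestamp.
        self._values[key] = (value, self._next_id)
        self._next_id += 1

    def gets(self, key):
        """Return (value, cas_id) for the key."""
        return self._values[key]

    def cas(self, key, value, cas_id):
        """Store value only if cas_id still matches the key's current timestamp."""
        _, current_id = self._values[key]
        if cas_id != current_id:
            return False  # Someone else updated the key in the meantime.
        self.set(key, value)
        return True


service = ToyMemcacheService()
service.set("counter", 0)

# Two requests read the same value and the same timestamp.
value_a, id_a = service.gets("counter")
value_b, id_b = service.gets("counter")

# The service handles the two cas() requests one after the other.
first_ok = service.cas("counter", value_a + 1, id_a)
second_ok = service.cas("counter", value_b + 1, id_b)
final_value, _ = service.gets("counter")
```

The first cas() succeeds and changes the timestamp, so the second one, still carrying the old timestamp, is rejected and the counter ends at 1 rather than silently losing an update.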

Using Compare and Set

To use the compare and set feature:

  1. Instantiate a memcache Client object.
  2. Use a retry loop:
    1. Within the retry loop, get the key using gets() (or get_multi() with the for_cas parameter set to True).
    2. Within the retry loop, update the key value using cas() or cas_multi().

The following snippet shows one way to use this feature:

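from google.appengine.api import memcache

def bump_counter(key):
    client = memcache.Client()
    while True:  # Retry loop
        # gets() remembers the key's timestamp on the client for the cas() call.
        counter = client.gets(key)
        assert counter is not None, 'Uninitialized counter'
        if client.cas(key, counter + 1):
            break  # The update succeeded; stop retrying.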

The retry loop is necessary because without it this code doesn't actually avoid race conditions; it only detects them. The memcache service guarantees that when used in the pattern shown here (that is, using gets() and cas()), if two or more different client instances happen to be involved in a race condition, only the first one to execute the cas() operation will succeed (return True), while the second one (and later ones) will fail (return False).

Another refinement you should add to this sample code is to set a limit on the number of retries, to avoid an infinite loop in worst-case scenarios where there is a lot of contention for the same counter (meaning more requests are trying to update the counter than the memcache service can process in real time).
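A bounded version of the retry loop might look like the sketch below. The max_retries parameter and its default of 10 are assumptions chosen for illustration, and the client is passed in so that the function can be exercised without the memcache service. ContendedClient is a hypothetical stub that only mimics the gets() and cas() calls; in an app you would presumably pass memcache.Client() instead.

```python
def bump_counter(client, key, max_retries=10):
    """Increment the counter at key; return False if every attempt lost the race."""
    for _ in range(max_retries):
        counter = client.gets(key)
        if counter is None:
            raise ValueError('Uninitialized counter')
        if client.cas(key, counter + 1):
            return True
    return False  # Too much contention; let the caller decide what to do.


class ContendedClient(object):
    """Hypothetical stub: cas() fails a fixed number of times to mimic contention."""

    def __init__(self, value, failures):
        self.value = value
        self.failures = failures

    def gets(self, key):
        return self.value

    def cas(self, key, new_value):
        if self.failures > 0:
            self.failures -= 1
            return False  # Pretend another request won the race.
        self.value = new_value
        return True


light = ContendedClient(value=5, failures=2)
succeeded = bump_counter(light, "counter")  # The third attempt wins.

heavy = ContendedClient(value=5, failures=100)
gave_up = not bump_counter(heavy, "counter", max_retries=3)
```

Returning False instead of looping forever lets the caller choose whether to drop the update, log it, or try again later.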

How Cached Data Expires

By default, values stored in memcache are retained as long as possible. Values may be evicted from the cache when a new value is added to the cache if the cache is low on memory. When values are evicted due to memory pressure, the least recently used values are evicted first.
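The least-recently-used policy can be pictured with a small toy cache. TinyLRUCache is a hypothetical class for illustration only: it limits the number of entries, whereas the real service evicts based on memory pressure, and nothing here reflects how the service is actually implemented.

```python
from collections import OrderedDict


class TinyLRUCache(object):
    """Toy cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # Oldest entry first, newest entry last.

    def get(self, key):
        if key not in self._data:
            return None
        value = self._data.pop(key)
        self._data[key] = value  # Re-insert so the key counts as recently used.
        return value

    def set(self, key, value):
        self._data.pop(key, None)
        self._data[key] = value
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # Drop the least recently used entry.


lru = TinyLRUCache(capacity=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")     # "a" is now more recently used than "b".
lru.set("c", 3)  # The cache is full, so "b" is evicted.
```

Reading "a" before adding "c" is what saves it: the entry that has gone longest without being touched is the one that disappears.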

The app can provide an expiration time when a value is stored, as either a number of seconds relative to when the value is added, or as an absolute Unix epoch time in the future (a number of seconds from midnight January 1, 1970). The value will be evicted no later than this time, though it may be evicted for other reasons.
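The two forms of expiration time can be computed as in the sketch below. The helper names are invented for illustration, and the commented-out calls at the end only indicate where the values would be passed as the time argument.

```python
import calendar
import datetime


def relative_expiry(hours):
    """Expiration as a number of seconds relative to when the value is stored."""
    return int(hours * 3600)


def absolute_expiry(when_utc):
    """Expiration as an absolute Unix epoch time, given a naive UTC datetime."""
    return calendar.timegm(when_utc.timetuple())


one_hour = relative_expiry(1)
new_year = absolute_expiry(datetime.datetime(2030, 1, 1, 0, 0, 0))

# Either value would be passed as the time argument, for example:
#   memcache.set(key="weather_USA_98105", value="raining", time=one_hour)
#   memcache.set(key="weather_USA_98105", value="raining", time=new_year)
```

calendar.timegm() is used rather than time.mktime() so that the datetime is interpreted as UTC regardless of the server's local time zone.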

Under rare circumstances, values may also disappear from the cache prior to expiration for reasons other than memory pressure. While memcache is resilient to server failures, memcache values are not saved to disk, so a service failure may cause values to become unavailable.

In general, an application should not expect a cached value to always be available.

Quotas and Limits

Each Memcache call counts toward the Memcache API Calls quota.

Data sent by the application to the memcache counts toward the Data Sent to (Memcache) API quota. Data received from the memcache counts toward the Data Received from (Memcache) API quota.

For more information on quotas, see Quotas, and the "Quota Details" section of the Admin Console.

In addition to quotas, the following limits apply to the use of the Memcache service:

Limit                             Amount
Maximum size of a cached value    1 megabyte
  • A key can be any size. If the key is larger than 250 bytes, it is hashed to a 250-byte value before storing or retrieving.
  • The "multi" batch operations can have any number of elements. The total size of the call and the total size of the data fetched must not exceed 32 megabytes.
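An app that caches values of unpredictable size may want to check them against the value limit before calling the service. The sketch below is one rough way to do that. MAX_VALUE_BYTES is an assumed reading of "1 megabyte", and pickle length is only an estimate of how the API serializes a value, so treat both as assumptions to verify against the API you are using.

```python
import pickle

# Assumed reading of the "1 megabyte" limit; verify against the API's constant.
MAX_VALUE_BYTES = 1000000


def estimated_size(value):
    """Rough serialized size: byte strings count as-is, other values by pickle length."""
    if isinstance(value, bytes):
        return len(value)
    return len(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))


def split_by_size(mapping):
    """Split a dict into entries that fit the value limit and keys that do not."""
    fits, too_big = {}, []
    for key, value in mapping.items():
        if estimated_size(value) <= MAX_VALUE_BYTES:
            fits[key] = value
        else:
            too_big.append(key)
    return fits, too_big


fits, too_big = split_by_size({"small": "sunny", "huge": b"x" * 2000000})
```

The entries in fits could then go to a batch call such as set_multi(), while the keys in too_big need another home, for example the datastore.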