Prospective Search is an experimental, innovative, and rapidly changing new feature for App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Prospective Search. We will inform the community when this feature is no longer experimental.
Prospective search is a querying service that allows your application to match search queries against real-time data streams. For every document presented, prospective search returns the ID of every registered query that matches the document.
Prospective search allows you to register a large set of queries and simultaneously match the queries against a single document. It is particularly useful for applications that process streaming data.
To understand prospective search, it's helpful to compare it to the conventional retrospective search model. In a retrospective search application, such as Google search, the application must build, or have access to, an index of the data to be searched. Needing to pre-index the data makes it difficult and expensive to create real-time applications, because each query must be executed separately against a potentially large index.
In a prospective search application, such as Google Alerts, you register search queries and match them against new documents in real time, as the documents are inserted into your application. This allows you to create applications that efficiently monitor incoming live data. You are not limited to using existing, indexed data.
Applications often use both retrospective and prospective search capabilities to get the best of both worlds. For example, an application can use retrospective search to find matching documents indexed in the past while using prospective search to find matching documents as soon as they arrive.
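The prospective model can be illustrated with a toy, in-memory sketch. This is not the Prospective Search API: `ToyProspectiveIndex` and its predicate-based queries are hypothetical names invented for illustration. It only shows the order of operations, with queries registered first and each arriving document checked against all of them.

```python
class ToyProspectiveIndex:
    """Toy model: queries are registered first, documents arrive later."""

    def __init__(self):
        # Maps a subscription ID to a predicate over a document dict.
        self._queries = {}

    def subscribe(self, sub_id, predicate):
        self._queries[sub_id] = predicate

    def match(self, document):
        """Return the IDs of every registered query that matches the document."""
        return sorted(
            sub_id for sub_id, predicate in self._queries.items()
            if predicate(document)
        )


index = ToyProspectiveIndex()
index.subscribe("roses", lambda doc: "rose" in doc["body"].lower())
index.subscribe("long", lambda doc: doc["length"] > 100)

body = "A rose by any other name would smell as sweet."
matched = index.match({"author": "Rose Jones", "body": body, "length": len(body)})
print(matched)
```

No index of past documents is built here; the only stored state is the set of queries.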
The life of a typical prospective search application looks something like this:
Here's a summary of the essential function calls:
Function | Description |
---|---|
get_subscription() | Returns information about a single subscription such as the state, the query, the expiration time, and the subscription ID. |
list_subscriptions() | Returns information about a specified number of subscriptions, such as the state, the query, the expiration time, and the subscription ID. |
list_topics() | Lists all topics currently in existence. |
match() | Matches a document against all subscriptions within a topic. Returns results through the Task Queue rather than returning them directly, so that the application can scale. |
subscribe() | Registers subscriptions made up of a subscription ID and a query for a given topic. Expect a delay of a few seconds between the subscribe() call and when the subscription starts serving. |
unsubscribe() | Removes a subscription. |
Note: Subscriptions may not serve immediately after the subscribe() call. Once the subscription has an OK status, it is guaranteed to serve. Typically, the delay is within a couple of seconds. Check the status with get_subscription() or list_subscriptions().
Prospective search applications may match queries against one or more streams of documents. Developers separate streams of documents by assigning a unique topic to documents they want grouped together and matched against a given set of queries. Generally, developers assign the same topic to documents of the same schema or format, but this convention is not enforced.
Topics are not defined as a separate step; instead, topics are created as a side effect of the subscribe() call. As soon as a new topic is passed to subscribe(), the topic exists. As soon as the last subscription using a given topic is deleted, the topic ceases to exist.
Documents are assigned to a particular topic when calling match(). The topic name is either specified explicitly in the match() call or taken from the class name of the document. See Creating Documents.
Use list_topics() to list all topics that currently exist.
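The topic lifecycle described above can be modeled in a few lines of plain Python. This is a sketch under stated assumptions: `ToyTopics` is a hypothetical stand-in, and its method signatures are simplified and do not mirror the real `subscribe()`, `unsubscribe()`, or `list_topics()` calls.

```python
class ToyTopics:
    """Toy model: a topic exists only while it has at least one subscription."""

    def __init__(self):
        # Maps a topic name to {subscription ID: query string}.
        self._subscriptions = {}

    def subscribe(self, topic, sub_id, query):
        # Passing a new topic name creates the topic as a side effect.
        self._subscriptions.setdefault(topic, {})[sub_id] = query

    def unsubscribe(self, topic, sub_id):
        subs = self._subscriptions.get(topic, {})
        subs.pop(sub_id, None)
        if not subs:
            # Removing the last subscription removes the topic as well.
            self._subscriptions.pop(topic, None)

    def list_topics(self):
        return sorted(self._subscriptions)


topics = ToyTopics()
topics.subscribe("Comment", "sub-1", "rose")
topics.subscribe("Comment", "sub-2", "author:bob")
after_subscribe = topics.list_topics()

topics.unsubscribe("Comment", "sub-1")
after_first_removal = topics.list_topics()

topics.unsubscribe("Comment", "sub-2")
after_last_removal = topics.list_topics()
```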
You may also specify a result_key argument in the match() call; it is returned with the matching results. A result_key can be useful if you know, for example, that returned documents are too large for the task queue. In this case, you can choose to store the documents in a database and use the identifying result_key to retrieve them later.
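The trade-off behind result_key can be sketched as an application-side decision: send the document itself when it is small, otherwise store it and send only the key. Everything in this sketch is hypothetical, including the `MAX_PAYLOAD_BYTES` budget, the `build_result_payload` helper, and the dict used as a stand-in for a database; none of it is part of the Prospective Search API.

```python
import json

# Hypothetical payload budget for this sketch; not an actual Task Queue limit.
MAX_PAYLOAD_BYTES = 1024


def build_result_payload(document, matched_ids, result_key, store):
    """Return what would travel with the match result for this document."""
    encoded = json.dumps(document).encode("utf-8")
    if len(encoded) <= MAX_PAYLOAD_BYTES:
        # Small document: send it along with the matching subscription IDs.
        return {"id": matched_ids, "document": document}
    # Large document: keep it in the store and send only the identifying key.
    store[result_key] = document
    return {"id": matched_ids, "key": result_key}


store = {}  # stand-in for a database
small = build_result_payload({"body": "short"}, ["sub-1"], "comment-1", store)
large = build_result_payload({"body": "x" * 5000}, ["sub-1"], "comment-2", store)

# The handler that receives the result looks the document up by its key.
recovered = store[large["key"]]
```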
The document is a class derived from db.Model. It contains a set of properties which correspond to fields, and queries can match against these fields. For example, the following code sample creates a definition using db.Model.
```python
from google.appengine.ext import db


class Comment(db.Model):
    author = db.StringProperty()
    body = db.TextProperty()
    length = db.IntegerProperty()
```
The example document will have the topic "Comment", derived from the class name, unless it is explicitly overridden in the match() call. The document defines two string fields named author and body, and one integer field named length.
Note: Even though the document is assigned to a topic at this point, the topic does not exist in prospective search until the subscribe() function registers a query associated with this topic.
Here's how to populate the db.Model object with data from a data source and create an instance of the document:
```python
comment = Comment()
comment.author = "Rose Jones"
comment.body = "A rose by any other name would smell as sweet."
comment.length = len(comment.body)
```
This example stores a string, text, and an integer in the appropriate fields.
Prospective search matches the following properties:
db.StringProperty
db.IntegerProperty
db.BooleanProperty
db.FloatProperty
db.TextProperty
Prospective search also supports list properties. A condition on a list property is checked against every value in the list and matches if any value matches. The following list properties are supported:
db.StringListProperty()
db.ListProperty()
For db.ListProperty, prospective search supports the following types:
str
unicode
bool
int (32-bit int range only)
float
db.Text
Prospective search uses a simple query language allowing you to query the contents of a document's fields. This query language supports numeric and text expressions and uses a field:value syntax. The field identifies the name of a property defined as part of the Entity or derived document class. The value defines the query on the specified field: a string or numerical value. Text fields and queries can be unicode strings.
Prospective search supports all space-delimited languages. Prospective search supports some languages not segmented by spaces (specifically, Chinese, Japanese, Korean, and Thai). For these languages, prospective search segments the text automatically.
The simplest type of query consists only of a string or text type value. The value can be a word or phrase to be matched against any supported string or text fields in the document. Queries are not case sensitive.
For example, to find all documents with the word "rose" (regardless of case) in any string or text field in the document, use a query like the following:
rose
This simple query matches against any supported string or text field in the document. If your documents are "Comments" as defined in the Creating Documents section, the query matches if the word "rose" appears in the author or body fields. If the schema defines additional string or text fields, such as a subject or email, rose also matches the contents of those fields.
To match a phrase, surround the query in quotes as follows:
"any other name"
Note: Queries built with the value by itself only match against string or text type fields.
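The behavior of value-only queries can be approximated in plain Python. This is a rough sketch, not the service's matcher: the helper names are hypothetical, and the `\w+` tokenization is an assumption that will differ from the real segmentation, especially for languages not delimited by spaces.

```python
import re


def tokens(text):
    """Lowercase the text and split it into word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )


def simple_query_matches(document, phrase):
    """Check the phrase against every string-valued field, ignoring other types."""
    return any(
        contains_phrase(value, phrase)
        for value in document.values()
        if isinstance(value, str)
    )


comment = {
    "author": "Rose Jones",
    "body": "A rose by any other name would smell as sweet.",
    "length": 46,
}
word_hit = simple_query_matches(comment, "ROSE")
phrase_hit = simple_query_matches(comment, "any other name")
numeric_miss = simple_query_matches(comment, "46")
```

The last call does not match because the value 46 lives in an integer field, and value-only queries consider string and text fields only.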
To create more complex queries that reference specific fields, use both the field and value in your query. Use a colon to delineate the two as follows:
field:value
This syntax allows you to reference any supported field defined in a schema by name. For example, to search for "Rose" only within the author field of a comment document as defined in Creating Documents, use the following query:
author:rose
To search the body field for the phrase "any other name", use the following query:
body:"any other name"
To match against multiple fields at the same time, list a series of field:value pairs together with a space between them as follows:
author:"Rose Jones" body:rose
The query language supports a number of Boolean operators as well as parentheses for grouping parts of the query together. The supported Boolean operators are AND, OR, and NOT. Always use uppercase for Boolean operators. Lowercase words are treated as part of the field or value portions of the query.
By default, when you create queries that match multiple fields at the same time, each value is combined with a Boolean AND. For the query as a whole to match, all the specified values must match.
You can also explicitly specify this by using the AND Boolean operator. The following two queries are equivalent:
author:rose body:"any other name"
author:rose AND body:"any other name"
Use the OR operator if you only want to know whether either of two values matches. You can use more than one OR in a query. For example:
author:("bob" OR ("rose" OR "tom") AND "jones")
This example matches any document whose author field contains either "Rose Jones", "Tom Jones", or "Bob".
For an example of Boolean NOT, see the following:
author:rose NOT body:filligree
This example matches any document whose author field contains "rose" but whose body field does not contain "filligree".
Use parentheses to create more complex queries combining supported Boolean operators.
For example:
(author:Thomas OR author:Jones) AND (NOT body:rose)
This example matches documents with author "Thomas" or "Jones" only if the body field of the comment does not contain "rose".
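The grouping semantics of the last query can be made concrete with predicate combinators in plain Python. This is only a sketch of the Boolean logic: `field`, `AND`, `OR`, and `NOT` here are hypothetical helper functions, not a parser for the query language, and the word matching is a simplified assumption.

```python
import re


def field(name, word):
    """Predicate: the named field contains `word` as a whole word, ignoring case."""
    def predicate(doc):
        return word.lower() in re.findall(r"\w+", str(doc.get(name, "")).lower())
    return predicate


def AND(*predicates):
    return lambda doc: all(p(doc) for p in predicates)


def OR(*predicates):
    return lambda doc: any(p(doc) for p in predicates)


def NOT(predicate):
    return lambda doc: not predicate(doc)


# Mirrors: (author:Thomas OR author:Jones) AND (NOT body:rose)
query = AND(
    OR(field("author", "Thomas"), field("author", "Jones")),
    NOT(field("body", "rose")),
)

tom = {"author": "Tom Jones", "body": "It's not unusual."}
rose = {"author": "Rose Jones", "body": "A rose by any other name."}
bob = {"author": "Bob", "body": "Hello."}
results = [query(tom), query(rose), query(bob)]
```

The second document fails on the NOT clause and the third fails on the author clause, so only the first matches.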
Numeric operators only match against numeric fields. Supported numeric operators are as follows:
< > <= >= =
For "not equal to", use the Boolean NOT with a numeric field name such as length. For example:
NOT length = 15
This example matches documents whose length is not 15.
You can combine numeric operators with text and Boolean operators. For example:
author:"Rose Jones" length > 15
This query matches comments whose length field is greater than 15 and whose author field is "Rose Jones". In the Comment example, length holds the number of characters in body.
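The numeric operators map directly onto Python's comparison functions. The following sketch is illustrative only: `NUMERIC_OPS` and `numeric_matches` are hypothetical names, and the rule that non-numeric fields never match is taken from the statement above that numeric operators only match against numeric fields.

```python
import operator

# The five numeric operators listed above, mapped to comparison functions.
NUMERIC_OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "=": operator.eq,
}


def numeric_matches(document, field_name, op, value):
    """Apply a numeric condition; non-numeric fields never match."""
    field_value = document.get(field_name)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(field_value, bool) or not isinstance(field_value, (int, float)):
        return False
    return NUMERIC_OPS[op](field_value, value)


comment = {"author": "Rose Jones", "length": 46}
longer_than_15 = numeric_matches(comment, "length", ">", 15)
# "Not equal to" is expressed as NOT combined with "=".
not_equal_15 = not numeric_matches(comment, "length", "=", 15)
text_field = numeric_matches(comment, "author", ">", 15)
```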
The Prospective Search API returns match results by creating events on the TaskQueue. This section describes how to process the match events.
The match() call defines which TaskQueue to use, how many subscription IDs to send per TaskQueue task, and what additional information to send (such as the document itself, or a key to identify the document).
To receive the matching subscription IDs, you must first map a request URL to your match response handler:
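```python
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util


def main(argv):
    # Route match results posted by the Task Queue to MatchResponseHandler.
    application = webapp.WSGIApplication(
        [('/', MainHandler),
         ('/_ah/prospective_search', MatchResponseHandler)],
        debug=True)
    util.run_wsgi_app(application)
```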
In MatchResponseHandler you can access the parameters of the POST request, which include the matching subscription IDs and the document sent for matching:
```python
from google.appengine.api import prospective_search
from google.appengine.ext import webapp


class MatchResponseHandler(webapp.RequestHandler):
    """MatchResponseHandler receives match results from TaskQueue."""

    def post(self):
        # List of subscription IDs that matched.
        sub_ids = self.request.get_all('id')
        # Document from the match request, either a Python dict or db.Model,
        # if result_return_document=True in the match() call.
        doc = prospective_search.get_document(self.request)
        # Topic from the match request.
        topic = self.request.get('topic')
        # Key specified in the match() call.
        key = self.request.get_all('key')
        # Total number of matching subscriptions for the match request
        # that generated this result event.
        results_count = self.request.get_all('results_count')
        # Index of the first subscription in this match result batch:
        # 0 <= results_offset < results_count.
        results_offset = self.request.get_all('results_offset')
```
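Because results arrive in batches, a handler may need to combine several events before it has every matching subscription ID for a document. The sketch below shows one way to do that with results_offset and results_count. `MatchCollector` is a hypothetical helper, not part of the API. It assumes the values have already been converted from request strings to integers, and that a redelivered task repeats the same offset.

```python
class MatchCollector:
    """Accumulates match result batches per document key (hypothetical helper)."""

    def __init__(self):
        # Maps a result key to {results_offset: [subscription IDs]}.
        self._batches = {}

    def add_batch(self, key, sub_ids, results_offset, results_count):
        """Record one batch; return True once all results for `key` have arrived."""
        batches = self._batches.setdefault(key, {})
        # Keying by offset means a redelivered batch overwrites itself
        # instead of being counted twice.
        batches[results_offset] = list(sub_ids)
        received = sum(len(ids) for ids in batches.values())
        return received >= results_count

    def all_ids(self, key):
        """Return the collected subscription IDs in offset order."""
        batches = self._batches.get(key, {})
        return [sub_id for offset in sorted(batches) for sub_id in batches[offset]]


collector = MatchCollector()
first = collector.add_batch("doc-1", ["sub-a", "sub-b"], 0, 3)
retry = collector.add_batch("doc-1", ["sub-a", "sub-b"], 0, 3)  # redelivery
last = collector.add_batch("doc-1", ["sub-c"], 2, 3)
collected = collector.all_ids("doc-1")
```

In a real handler this state would have to live somewhere shared, such as the datastore, since separate tasks may run on different instances.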