Conversion is an experimental, innovative, and rapidly changing new feature for App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Conversion. We will inform the community when this feature is no longer experimental.
The App Engine Conversion API converts documents between common filetypes using Google's infrastructure for efficiency and scale. The API enables conversions between HTML, PDF, text, and image formats, synchronously or asynchronously, with an option to perform optical character recognition (OCR). You can use it for functions such as:
The API supports the following types of files:
html - CSS is supported as long as it is included within the HTML page; external CSS files are not supported. Paged Media CSS is supported.
txt
gif, jpg, bmp - When converting from text-based formats into *.png, the resulting document has one asset for each page of the text file. For conversions from *.pdf files, the contents of a page are determined by the pagination in the file. For *.html files, content is placed into *.pdf and then converted from *.pdf to *.png. For *.txt files, page size is determined by system defaults.
png files may be used as the output for a conversion, but not the input.
pdf
See Conversion Paths for more information about conversions between these filetypes.
The Conversion API employs a few core concepts and data structures:
We expect all input asset strings (e.g., HTML and TXT) to be encoded in UTF-8. We support the most common fonts used across the web, and at least one font each for sans-serif, serif, and monospace.
Synchronous conversions apply to situations in which the user expects the converted file immediately. Users must wait for the time it takes the API to perform the conversion, so when using the synchronous API, avoid time-intensive operations (such as OCR). For example, you would use synchronous conversions to dynamically generate a PDF invoice when a customer clicks on a download link.

    from google.appengine.api import conversion

    # Create a conversion request from HTML to PNG.
    asset = conversion.Asset("text/html", "<b>some data</b>", "test.html")
    conversion_obj = conversion.Conversion(asset, "image/png")

    # Blocks until the conversion has finished.
    result = conversion.convert(conversion_obj)
    if result.assets:
        # In most cases all data is returned in a single asset; a
        # multi-page image result has one asset per page.
        for asset in result.assets:
            doSomethingWithAsset(asset.data)
    else:
        handleError(result.error_code, result.error_text)

The asynchronous API allows you to perform conversions in the background for jobs in which the user does not expect an immediate result. Asynchronous conversions are useful for time-intensive operations (such as OCR). For example, use asynchronous conversions for background jobs that generate invoices and email them to a number of customers.

    from google.appengine.api import conversion

    # Create a conversion request from HTML to PNG.
    asset = conversion.Asset("text/html", "<b>some data</b>", "test.html")
    conversion_obj = conversion.Conversion(asset, "image/png")

    # Start the conversion without blocking.
    rpc = conversion.create_rpc()
    conversion.make_convert_call(rpc, conversion_obj)

    # ... Perform another task

    # Wait for the conversion to finish and collect the result.
    result = rpc.get_result()
    if result.assets:
        # In most cases all data is returned in a single asset; a
        # multi-page image result has one asset per page.
        for asset in result.assets:
            doSomethingWithAsset(asset.data)
    else:
        handleError(result.error_code, result.error_text)

You can add more assets to a document (such as images in an HTML page) simply by defining the asset and calling the conversion function as follows:

    subasset = conversion.Asset("image/gif", image_data, "static/icon.gif")
    # conversion_obj is the Conversion created for the primary HTML asset.
    conversion_obj.add_asset(subasset)

Note: The name of the additional asset (in this case, static/icon.gif) is used by the primary HTML file to reference the image, so it must be the same as the corresponding HTML tag's src attribute. The API does not currently support conversion using external CSS files, but you can get around this by adding the CSS directly to the HTML page.
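
Because an additional asset is matched to the HTML by name, a quick check before submitting a conversion can catch mismatches. The following is a minimal sketch in standalone Python 3 that does not call the Conversion API. The names `_SrcCollector` and `missing_assets` are hypothetical helpers introduced for illustration, and the sketch assumes that only `src` attributes need to be checked.

```python
from html.parser import HTMLParser


class _SrcCollector(HTMLParser):
    """Collects the value of every src attribute found in an HTML string."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "src" and value:
                self.sources.append(value)


def missing_assets(html, asset_names):
    """Return the src values in html that have no asset with the same name."""
    collector = _SrcCollector()
    collector.feed(html)
    collector.close()
    known = set(asset_names)
    return [src for src in collector.sources if src not in known]


page = '<p>Logo: <img src="static/icon.gif"> <img src="static/logo.png"></p>'
print(missing_assets(page, ["static/icon.gif"]))
```

In this sketch, any name returned by `missing_assets` is referenced by the page but would not be supplied with the conversion.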
To perform multiple conversions in one request, pass a list of conversions to the convert function; the results are returned in the same order:

    conversions = [
        conversion.Conversion(
            conversion.Asset("text/html", "<b>data1</b>"), "text/plain"),
        conversion.Conversion(
            conversion.Asset("text/plain", "data2"), "text/html")]
    results = conversion.convert(conversions)

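The Conversion API also allows you to provide more specific conversion options. Some options apply to *.png only, such as the image width and the first and last page to convert; others apply to *.txt, *.html, and *.pdf only.
The following example converts pages 2 through 10 of an HTML document to *.png files that are 1,000 pixels wide.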

    asset = conversion.Asset("text/html", "<b>some data</b>", "test.html")
    # Named conversion_obj so that the conversion module is not shadowed.
    conversion_obj = conversion.Conversion(
        asset, "image/png", image_width=1000, first_page=2, last_page=10)

Conversions can be performed in any direction between PDF, HTML, TXT, and image formats, and OCR will be employed if necessary. Note that while PNG, GIF, JPEG, and BMP image formats are supported as input formats, only PNG is available for output.
Please refer to the following table for the complete list of conversions:
Input MIME Type | Output MIME Type | Category |
---|---|---|
image/bmp, image/gif, image/jpeg, image/png | text/html | OCR |
image/bmp, image/gif, image/jpeg, image/png | application/pdf | OCR |
image/bmp, image/gif, image/jpeg, image/png | text/plain | OCR |
application/pdf | text/html | OCR |
application/pdf | text/plain | OCR |
application/pdf | image/png | Printable generation |
text/html | application/pdf | Printable generation |
text/html | image/png | Printable generation |
text/html | text/plain | HTML to TXT conversion |
text/plain | text/html | TXT to HTML conversion |
text/plain | application/pdf | Printable generation |
text/plain | image/png | Printable generation |
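
The table above can also be expressed as a lookup, which is handy for rejecting unsupported paths before making a request. This is a sketch in standalone Python that does not call the Conversion API. `CONVERSION_PATHS`, `conversion_category`, and `suggests_async` are hypothetical names, and treating every OCR path as a candidate for the asynchronous API is an assumption based on the description of OCR as time-intensive.

```python
# Supported (input MIME type, output MIME type) pairs and their category,
# transcribed from the conversion table.
IMAGE_TYPES = ("image/bmp", "image/gif", "image/jpeg", "image/png")

CONVERSION_PATHS = {
    ("application/pdf", "text/html"): "OCR",
    ("application/pdf", "text/plain"): "OCR",
    ("application/pdf", "image/png"): "Printable generation",
    ("text/html", "application/pdf"): "Printable generation",
    ("text/html", "image/png"): "Printable generation",
    ("text/html", "text/plain"): "HTML to TXT conversion",
    ("text/plain", "text/html"): "TXT to HTML conversion",
    ("text/plain", "application/pdf"): "Printable generation",
    ("text/plain", "image/png"): "Printable generation",
}
# Every image input type converts to the same three outputs through OCR.
for image_type in IMAGE_TYPES:
    for output_type in ("text/html", "application/pdf", "text/plain"):
        CONVERSION_PATHS[(image_type, output_type)] = "OCR"


def conversion_category(input_type, output_type):
    """Return the table's category for a path, or None if it is not listed."""
    return CONVERSION_PATHS.get((input_type, output_type))


def suggests_async(input_type, output_type):
    """Return True for OCR paths (assumed here to be the slow ones)."""
    return conversion_category(input_type, output_type) == "OCR"


print(conversion_category("text/html", "image/png"))
print(suggests_async("image/jpeg", "text/plain"))
```

A `None` result from `conversion_category` means the pair does not appear in the table, for example image-to-image conversions.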
Each Conversion request counts toward the Conversion API Calls quota. At this stage, we offer a small free quota of 100 conversions per day.
In addition to quotas, the following limits also apply:
Limit | Amount |
---|---|
file size | 2 megabytes |
maximum conversions per request | 10 conversions |
maximum deadline | 60 seconds* |
*Beware the 60s time limit, particularly when invoking OCR on large PDF documents.
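
The size and per-request limits can be enforced on the client side before calling the API. The following standalone Python sketch shows one way to do so. `oversized` and `batches` are hypothetical helpers, and the sketch assumes that "2 megabytes" means 2 * 1024 * 1024 bytes and that the limit applies to each asset individually.

```python
# Assumption: "2 megabytes" is read as 2 MiB and applies per asset.
MAX_FILE_BYTES = 2 * 1024 * 1024
MAX_CONVERSIONS_PER_REQUEST = 10


def oversized(assets):
    """Return the names of (name, data) assets whose data exceeds the limit."""
    return [name for name, data in assets if len(data) > MAX_FILE_BYTES]


def batches(conversions, size=MAX_CONVERSIONS_PER_REQUEST):
    """Split a list of conversions into lists of at most `size` items."""
    return [conversions[i:i + size] for i in range(0, len(conversions), size)]


# Placeholder strings stand in for real Conversion objects.
pending = ["conversion-%d" % n for n in range(23)]
print([len(batch) for batch in batches(pending)])
```

Each batch could then be passed to a single convert call, keeping in mind that every conversion in a batch still counts toward the daily quota.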