A database engine is a subclass of sqlalchemy.sql.Engine, and is the point at which SQLAlchemy provides a layer of abstraction on top of the various DBAPI2 database modules. For all databases supported by SA, there is a specific "implementation" module, found in the sqlalchemy.databases package, that provides all the objects an Engine needs in order to perform its job. A typical user of SQLAlchemy never needs to deal with these modules directly. For many purposes, the only knowledge that's needed is how to create an Engine for a particular connection URL. When dealing with direct execution of SQL statements, one should also be aware of the Result, Connection, and Transaction objects. The primary public facing objects are:
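- URL, which identifies a particular database and is usually created automatically from the connect string passed to the create_engine() function.
- Engine; the Engine normally dealt with is an instance of sqlalchemy.engine.base.ComposedSQLEngine.
- Connection, which represents a connection to the database and executes both literal SQL text and SQL clause constructs.
- Transaction, which represents a transaction on a single Connection and provides begin(), commit() and rollback() methods that support basic "nestable" behavior, meaning an outermost transaction is maintained against multiple nested calls to begin/commit.
- ResultProxy, which represents the results of an execution; it can be iterated over and closed.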
Underneath the public-facing API of ComposedSQLEngine, several components are provided by database implementations to supply the full behavior, including components that generate and issue the appropriate CREATE and DROP statements.
Engines exist for SQLite, Postgres, MySQL, and Oracle, using the Pysqlite, Psycopg2 (Psycopg1 will work to some degree, but its typing model is not supported; install Psycopg2!), MySQLdb, and cx_Oracle modules. There is also preliminary support for MS-SQL using adodbapi or pymssql, as well as Firebird. For each engine, a distinct Python module exists in the sqlalchemy.databases package, which provides implementations of some of the objects mentioned in the previous section.
Downloads for each DBAPI at the time of this writing are as follows:
The SQLAlchemy Wiki contains a page of database notes, describing whatever quirks and behaviors have been observed. It's a good place to check for issues with specific databases (see the Database Notes page).
SQLAlchemy 0.2 indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:
driver://username:password@host:port/database
Available drivernames are sqlite, mysql, postgres, oracle, mssql, and firebird. For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database. The URL is typically sent as a string to the create_engine() function:
```python
# postgres
pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

# sqlite (note the four slashes for an absolute path)
sqlite_db = create_engine('sqlite:////absolute/path/to/database.txt')
sqlite_db = create_engine('sqlite:///relative/path/to/database.txt')
sqlite_db = create_engine('sqlite://')  # in-memory database

# mysql
mysql_db = create_engine('mysql://localhost/foo')

# oracle via TNS name
oracle_db = create_engine('oracle://scott:tiger@dsn')

# oracle will feed host/port/SID into cx_Oracle.makedsn
oracle_db = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
```
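To see why an absolute SQLite path needs four slashes, it can help to split these URLs into their parts. The following is a minimal sketch, written for Python 3 with only the standard library's urllib.parse.urlsplit; the parse_db_url helper and the dictionary keys it returns are hypothetical and are not SQLAlchemy's own URL parser.

```python
from urllib.parse import urlsplit, unquote


def parse_db_url(url):
    """Split an RFC-1738 style database URL into parts (hypothetical helper)."""
    parts = urlsplit(url)
    # The path keeps its leading "/" separator. Stripping exactly one character
    # turns 'sqlite:////abs/path' into '/abs/path' and 'sqlite:///rel/path'
    # into 'rel/path'; an empty path (e.g. 'sqlite://') means no database name.
    database = parts.path[1:] if parts.path else None
    return {
        "drivername": parts.scheme,
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
        "database": database or None,
    }
```

For example, parse_db_url('sqlite:////absolute/path/to/database.txt') yields the database '/absolute/path/to/database.txt', while the three-slash form yields the relative name 'relative/path/to/database.txt'.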
The Engine will create its first connection to the database when a SQL statement is executed. As concurrent statements are executed, the underlying connection pool will grow to a default size of five connections, and will allow a default "overflow" of ten. Since the Engine is essentially "home base" for the connection pool, it follows that you should keep a single Engine per database established within an application, rather than creating a new one for each connection.
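The pool-size and overflow behavior described above can be pictured with a toy pool. This is a minimal Python 3 sketch, not SQLAlchemy's QueuePool: the SimplePool class is hypothetical, and where QueuePool would make the caller wait for a connection to become free, this version simply raises an error once pool_size + max_overflow connections are checked out.

```python
import threading


class SimplePool:
    """Hypothetical size-limited pool with overflow (not SQLAlchemy's QueuePool)."""

    def __init__(self, creator, pool_size=5, max_overflow=10):
        self._creator = creator          # callable returning a new DBAPI connection
        self._pool_size = pool_size
        self._max_overflow = max_overflow
        self._idle = []                  # open connections waiting to be reused
        self._checked_out = 0            # connections currently in use
        self._lock = threading.Lock()

    def connect(self):
        with self._lock:
            if self._idle:
                conn = self._idle.pop()  # reuse an already-open connection
            elif self._checked_out < self._pool_size + self._max_overflow:
                conn = self._creator()   # connections are opened lazily, on demand
            else:
                # QueuePool would block here until a connection is returned
                raise RuntimeError("pool exhausted")
            self._checked_out += 1
            return conn

    def release(self, conn):
        with self._lock:
            self._checked_out -= 1
            if len(self._idle) < self._pool_size:
                self._idle.append(conn)  # keep up to pool_size connections open
            else:
                conn.close()             # overflow connections are discarded
```

Because the pool lives inside this one object, creating a new pool per request would defeat it, which is the same reasoning behind keeping a single Engine per database.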
Keyword options can also be specified to create_engine(), following the string URL as follows:
```python
import psycopg as psycopg1  # the Psycopg1 DBAPI module

db = create_engine('postgres://...', encoding='latin1', echo=True, module=psycopg1)
```
Options that can be specified include the following:
- strategy='plain' : selects the strategy used by the Engine. The available strategies are plain, which is the default, and threadlocal, which applies a "thread-local context" to implicit executions performed by the Engine. This context is further described in Implicit Connection Contexts.
- pool=None : an instance of sqlalchemy.pool.Pool to be used as the underlying source for connections, overriding the engine's connect arguments (pooling is described in Connection Pooling). If None, a default Pool (usually QueuePool, or SingletonThreadPool in the case of SQLite) will be created using the engine's connect arguments.
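  Example:
```python
from sqlalchemy import *
import sqlalchemy.pool as pool
import MySQLdb

def getconn():
    # MySQLdb takes the database name via the "db" keyword
    return MySQLdb.connect(user='ed', db='mydb')

# connection arguments come from getconn(), so the URL only names the driver
engine = create_engine('mysql://', pool=pool.QueuePool(getconn, pool_size=20, max_overflow=40))
```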
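- pool_size=5 : the number of connections to keep open inside the connection pool. This is used with QueuePool as well as SingletonThreadPool as of 0.2.7.
- max_overflow=10 : the number of connections to allow in connection pool "overflow", that is, connections that can be opened above and beyond the pool_size setting. This is only used with QueuePool.
- pool_timeout=30 : the number of seconds to wait before giving up on getting a connection from the pool. This is only used with QueuePool.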
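- echo=False : if True, the Engine will log all statements, as well as a repr() of their parameter lists, to the engine's logger. The echo attribute of ComposedSQLEngine can be modified at any time to turn logging on and off. If set to the string "debug", result rows will be printed to the standard output as well.
- logger=None : a file-like object to which logging output is sent when echo is enabled. This defaults to an internal logging object which references sys.stdout.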
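- use_ansi=True : used only by Oracle; when False, the Oracle driver supports Oracle versions 8 and earlier, which do not support the LEFT OUTER JOIN SQL syntax and instead require the "Oracle join" syntax <column1>(+)=<column2> in order to achieve a LEFT OUTER JOIN.
- threaded=True : used by cx_Oracle; sets the threaded parameter of the connection indicating thread-safe usage. cx_Oracle docs indicate setting this flag to False will speed performance by 10-15%. While this defaults to False in cx_Oracle, SQLAlchemy defaults it to True, preferring stability over early optimization.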
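- convert_unicode=False : if set to True, all String/character based types will convert Unicode values to raw byte values going into the database, and raw byte values to Python Unicode values coming out in result sets. This provides Unicode conversion engine-wide; for conversion on a column-by-column level, use the Unicode column type instead.
- encoding='utf-8' : the encoding to use for all Unicode translations, both by engine-wide Unicode conversion and by the Unicode type object.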
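The combined effect of convert_unicode and encoding can be sketched as a pair of conversions: one applied to bind parameters on the way in, one applied to result values on the way out. This is a hypothetical illustration in Python 3, where str and bytes play the roles of Python 2's unicode and str; the UnicodeConverter class and its method names are assumptions, not part of SQLAlchemy.

```python
class UnicodeConverter:
    """Hypothetical sketch of engine-wide Unicode conversion (not SQLAlchemy code)."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding

    def bind_param(self, value):
        # text going into the database becomes raw bytes in the configured encoding
        return value.encode(self.encoding) if isinstance(value, str) else value

    def result_value(self, value):
        # raw bytes coming back in a result set are decoded to text again
        return value.decode(self.encoding) if isinstance(value, bytes) else value
```

Non-string values such as integers pass through both methods unchanged, mirroring the idea that only String/character based types are converted.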
In this section we describe the SQL execution interface available from an Engine instance. Note that when using the Object Relational Mapper (ORM), as well as when dealing with "bound" metadata objects (described later), SQLAlchemy deals with the Engine for you and you generally don't need to know much about it; in those cases, you can skip this section and go to Database Meta Data.
The Engine provides a connect() method which returns a Connection object. This object provides methods by which literal SQL text as well as SQL clause constructs can be compiled and executed.
```python
engine = create_engine('sqlite:///:memory:')
connection = engine.connect()
result = connection.execute("select * from mytable where col1=:col1", col1=5)
for row in result:
    print row['col1'], row['col2']
connection.close()
```
The close method on Connection does not actually close the underlying connection to the database, but rather indicates that the underlying resources can be returned to the connection pool. When using the connect() method, the DBAPI connection referenced by the Connection object is not referenced anywhere else.
The Connection object will also automatically return its resources to the connection pool when the object is garbage collected, i.e. when its __del__() method is called. When using the standard C implementation of Python, this method is usually called as soon as the object is dereferenced. With other Python implementations such as Jython, this is not guaranteed.
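A minimal sketch of this close-or-garbage-collect behavior, assuming a hypothetical PooledConnection wrapper (not SQLAlchemy's Connection class) around a connection from Python 3's sqlite3 module; the return_to_pool callable stands in for the connection pool. The __del__() fallback only fires promptly on CPython, whose reference counting runs it as soon as the last reference disappears.

```python
import sqlite3


class PooledConnection:
    """Hypothetical wrapper: close() hands the DBAPI connection back to a pool."""

    def __init__(self, dbapi_conn, return_to_pool):
        self._conn = dbapi_conn
        self._return = return_to_pool

    def close(self):
        if self._conn is not None:
            self._return(self._conn)  # give the connection back instead of closing it
            self._conn = None         # makes close() safe to call more than once

    def __del__(self):
        # fallback when the wrapper is garbage collected without an explicit close()
        self.close()


returned = []
conn = PooledConnection(sqlite3.connect(':memory:'), returned.append)
conn.close()
```

Calling close() explicitly is the portable choice; relying on __del__() works on CPython but may be delayed on other implementations.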
The execute method on Engine and Connection can also receive SQL clause constructs, which are described in Constructing SQL Queries via Python Expressions:
```python
connection = engine.connect()
result = connection.execute(select([table1], table1.c.col1==5))
for row in result:
    print row['col1'], row['col2']
connection.close()
```
Both Connection and Engine fulfill an interface known as Connectable, which specifies common functionality between the two objects, such as getting a Connection and executing queries. Therefore, most SQLAlchemy functions which take an Engine as a parameter with which to execute SQL will also accept a Connection:
```python
engine = create_engine('sqlite:///:memory:')

# specify some Table metadata
metadata = MetaData()
table = Table('sometable', metadata, Column('col1', Integer))

# create the table with the Engine
table.create(engine=engine)

# drop the table with a Connection off the Engine
connection = engine.connect()
table.drop(engine=connection)
```
An implicit connection refers to connections that are allocated by the Engine internally. There are two general cases when this occurs: when using the various execute() methods that are available off the Engine object itself, and when calling the execute() method on constructed SQL objects.
```python
engine = create_engine('sqlite:///:memory:')
result = engine.execute("select * from mytable where col1=:col1", col1=5)
for row in result:
    print row['col1'], row['col2']
result.close()
```
When using implicit connections, the returned ResultProxy has a close() method which will return the resources used by the underlying Connection.
The strategy keyword argument to create_engine() affects the algorithm used to retrieve the underlying DBAPI connection used by implicit executions. When set to plain, each implicit execution requests a unique connection from the connection pool, which is returned to the pool when the resulting ResultProxy falls out of scope (i.e. its __del__() method is called) or its close() method is called. If a second implicit execution occurs while the ResultProxy from the previous execution is still open, then a second connection is pulled from the pool.
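The bookkeeping of the plain strategy can be modeled with a small hypothetical PlainEngine class (Python 3, sqlite3 in-memory databases). Neither PlainEngine nor ImplicitResult is SQLAlchemy's API: each execute() checks out its own connection, and that connection goes back to a simple list standing in for the pool only when the result is closed.

```python
import sqlite3


class ImplicitResult:
    """Hypothetical result object that owns the connection used to produce it."""

    def __init__(self, engine, conn, cursor):
        self._engine, self._conn, self._cursor = engine, conn, cursor

    def __iter__(self):
        return iter(self._cursor)

    def close(self):
        if self._conn is not None:
            self._cursor.close()
            self._engine._checkin(self._conn)  # connection returns to the "pool"
            self._conn = None


class PlainEngine:
    """Hypothetical sketch of the 'plain' strategy for implicit executions."""

    def __init__(self, creator):
        self._creator = creator
        self.idle = []          # stands in for the connection pool
        self.checked_out = 0

    def _checkout(self):
        self.checked_out += 1
        return self.idle.pop() if self.idle else self._creator()

    def _checkin(self, conn):
        self.checked_out -= 1
        self.idle.append(conn)

    def execute(self, sql, params=()):
        conn = self._checkout()  # every execution gets its own connection
        return ImplicitResult(self, conn, conn.execute(sql, params))
```

With two results open at once, two connections are checked out; both return to the pool only after each result is closed.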
When strategy is set to threadlocal, the Engine still checks out a connection which can be closed in the same manner via the ResultProxy, except that if a connection is already checked out in the current thread, that same connection is used. When all ResultProxy objects are closed, the connection is returned to the pool normally.
It is crucial to note that the plain and threadlocal strategies do not affect the connect() method on the Engine: connect() always returns a unique connection. Implicit connections use a different method off of Engine for their operations, called contextual_connect().
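The threadlocal behavior of contextual_connect() can be sketched with threading.local and a per-thread reference count. In this hypothetical ThreadLocalConnector (Python 3, not SQLAlchemy's implementation), pool_connect and pool_release are assumed callables standing in for the connection pool, and each release() call corresponds to one ResultProxy being closed.

```python
import threading


class ThreadLocalConnector:
    """Hypothetical sketch: one shared, reference-counted connection per thread."""

    def __init__(self, pool_connect, pool_release):
        self._connect = pool_connect
        self._release = pool_release
        self._local = threading.local()  # separate attributes for every thread

    def contextual_connect(self):
        if getattr(self._local, "count", 0) == 0:
            # first user in this thread checks a connection out of the pool
            self._local.conn = self._connect()
            self._local.count = 0
        self._local.count += 1
        return self._local.conn

    def release(self):
        self._local.count -= 1
        if self._local.count == 0:
            # last user in this thread hands the connection back to the pool
            self._release(self._local.conn)
            self._local.conn = None
```

Two calls in the same thread return the same object, a call from another thread gets its own, and the connection is released only after every user in the thread has released it.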
The plain strategy is better suited to an application that ensures the explicit release of the resources used by each execution. This is because each execution uses its own distinct connection resource, and as long as those resources remain open, multiple connections can be checked out from the pool quickly. Since the connection pool will block further requests when too many connections have been checked out, failing to keep track of this can impact an application's stability.
```python
db = create_engine('mysql://localhost/test', strategy='plain')

# execute one statement and receive results. r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results. r2 now references a *second* DBAPI connection resource.
r2 = db.execute("select * from table2")

for row in r1:
    pass  # process rows from table1
for row in r2:
    pass  # process rows from table2

# release connection 1
r1.close()

# release connection 2
r2.close()
```
Advantages of plain include that connection resources are immediately returned to the connection pool, without any reliance upon the __del__() method; there is no chance of resources being left around by a Python implementation that doesn't necessarily call __del__() immediately.
The threadlocal strategy is better suited to a programming style which relies upon the __del__() method of Connection objects in order to return them to the connection pool, rather than explicitly calling close() on the ResultProxy object. This is because all of the executions within a single thread will share the same connection, if one has already been checked out in the current thread. Using this style, an application will use at most one connection per thread within the scope of all implicit executions.
```python
db = create_engine('mysql://localhost/test', strategy='threadlocal')

# execute one statement and receive results. r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results. r2 now references the *same* resource as r1
r2 = db.execute("select * from table2")

for row in r1:
    pass  # process rows from table1
for row in r2:
    pass  # process rows from table2

# dereference r1. the connection is still held by r2.
r1 = None

# dereference r2. with no more references to the underlying connection resources, they
# are returned to the pool.
r2 = None
```
Advantages of threadlocal include that resources can be left to clean up after themselves, application code can be more minimal, it's guaranteed that only one connection is used per thread, and there is no chance of a "connection pool block", which is when an execution hangs because the current thread has already checked out all remaining resources.
To get at the actual Connection object which is used by implicit executions, call the contextual_connect() method on Engine:
```python
# threadlocal strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')
conn1 = db.contextual_connect()
conn2 = db.contextual_connect()

# both Connection objects share the same underlying DBAPI connection
assert conn1.connection is conn2.connection
```
When the plain strategy is used, the contextual_connect() method is synonymous with the connect() method; both return a distinct connection from the pool.
The Connection object provides a begin() method which returns a Transaction object. This object is usually used within a try/except clause so that it is guaranteed to either commit() or rollback():
```python
trans = connection.begin()
try:
    r1 = connection.execute(table1.select())
    connection.execute(table1.insert(), col1=7, col2='this is some data')
    trans.commit()
except:
    trans.rollback()
    raise
```
The Transaction object also handles "nested" behavior by keeping track of the outermost begin/commit pair. In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.
```python
# method_a starts a transaction and calls method_b
def method_a(connection):
    trans = connection.begin()  # open a transaction
    try:
        method_b(connection)
        trans.commit()  # transaction is committed here
    except:
        trans.rollback()  # this rolls back the transaction unconditionally
        raise

# method_b also starts a transaction
def method_b(connection):
    trans = connection.begin()  # open a transaction - this runs in the context of method_a's transaction
    try:
        connection.execute("insert into mytable values ('bat', 'lala')")
        connection.execute(mytable.insert(), col1='bat', col2='lala')
        trans.commit()  # transaction is not committed yet
    except:
        trans.rollback()  # this rolls back the transaction unconditionally
        raise

# open a Connection and call method_a
conn = engine.connect()
method_a(conn)
conn.close()
```
Above, method_a is called first, which calls connection.begin(). Then it calls method_b. When method_b calls connection.begin(), it just increments a counter that is decremented when it calls commit(). If either method_a or method_b calls rollback(), the whole transaction is rolled back. The transaction is not committed until method_a calls the commit() method.
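The counter described above can be sketched on top of a plain DBAPI connection. NestingConnection and NestedTransaction are hypothetical names, the code is Python 3 using sqlite3 (which opens the real transaction implicitly before the first INSERT), and this simplified version lets any rollback() end the whole transaction immediately; SQLAlchemy's own Transaction class may handle further edge cases, such as committing an outer transaction after an inner rollback, differently.

```python
import sqlite3


class NestedTransaction:
    """Handle returned by NestingConnection.begin() (hypothetical)."""

    def __init__(self, owner):
        self._owner = owner

    def commit(self):
        owner = self._owner
        owner.depth -= 1
        if owner.depth == 0:
            owner.dbapi_conn.commit()  # only the outermost commit() issues COMMIT

    def rollback(self):
        owner = self._owner
        owner.depth = 0                # any rollback() abandons the whole transaction
        owner.dbapi_conn.rollback()


class NestingConnection:
    """Wraps a DBAPI connection and counts nested begin() calls (hypothetical)."""

    def __init__(self, dbapi_conn):
        self.dbapi_conn = dbapi_conn
        self.depth = 0

    def begin(self):
        # a nested begin() only increments the counter; sqlite3 itself starts the
        # real transaction before the first INSERT/UPDATE/DELETE
        self.depth += 1
        return NestedTransaction(self)

    def execute(self, sql, params=()):
        return self.dbapi_conn.execute(sql, params)
```

Running method_a and method_b from the example above against a NestingConnection would show the inner commit() leaving the transaction open until the outer commit() runs.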
Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in SessionTransaction.