This tutorial provides a relatively simple walking tour through the basic concepts of SQLAlchemy. You may wish to skip it and dive into the main manual which is more reference-oriented. The examples in this tutorial comprise a fully working interactive Python session, and are guaranteed to be functioning courtesy of doctest.
Installing SQLAlchemy from scratch is most easily achieved with setuptools. Just run this from the command line:
# easy_install SQLAlchemy
This command will download the latest version of SQLAlchemy from the Python Cheese Shop and install it to your system.
Otherwise, you can install from the distribution using the setup.py script:
# python setup.py install
SQLAlchemy is designed to operate with a DBAPI implementation built for a particular database, and includes support for the most popular databases. If you have one of the supported DBAPI implementations, you can proceed to the following section. Otherwise SQLite is an easy-to-use database to get started with, which works with plain files or in-memory databases.
To work with SQLite, you'll need the pysqlite driver, and on most platforms the SQLite library itself.
Note that the SQLite library download is not required with Windows, as the Windows Pysqlite library already includes it linked in. Pysqlite and SQLite can also be installed on Linux or FreeBSD via pre-made packages or from sources.
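If you're unsure whether the driver is present, a quick import check at the Python prompt will tell you (a small sketch; the module name differs between pysqlite 1.x and 2.x):

try:
    from pysqlite2 import dbapi2 as sqlite  # pysqlite 2.x
except ImportError:
    import sqlite                           # pysqlite 1.x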
SQLAlchemy provides the entire namespace of everything you'll need under the module name sqlalchemy. For the purposes of this tutorial, we will import its full list of symbols into our own local namespace.
>>> from sqlalchemy import *
After our imports, the next thing we need is a handle to the desired database, represented by an Engine object. This object handles the business of managing connections and dealing with the specifics of a particular database. Below, we will make a SQLite connection to a file-based database called "tutorial.db".
>>> db = create_engine('sqlite:///tutorial.db')
For full information on creating database engines, including those for SQLite and others, see Database Engines.
A core philosophy of SQLAlchemy is that tables and domain classes are different beasts. For this reason, SQLAlchemy provides constructs that represent tables by themselves (known as table metadata). So we will begin by constructing table metadata objects and performing SQL operations with them directly. Later, we will look into SQLAlchemy's Object Relational Mapper (ORM), which provides an additional layer of abstraction onto table metadata, allowing us to load and save objects of any arbitrary Python class.
Firstly, your Tables have to belong to a collection called MetaData. We will create a handy form of MetaData that automatically connects to our Engine (connecting a schema object to an Engine is called binding):
>>> metadata = BoundMetaData(db)
An equivalent operation is to create the BoundMetaData object directly with an Engine URL, which invokes create_engine for us:
>>> metadata = BoundMetaData('sqlite:///tutorial.db')
Now, when we tell "metadata" about the tables in our database, we can issue CREATE statements for those tables, as well as create and execute SQL statements derived from them, without needing to open or close any connections; that will be all done automatically. Note that this feature is entirely optional. SQLAlchemy includes full support for explicit Connections used with schema and SQL constructs that are entirely unbound to any Engine.
For the purposes of this tutorial, we will stick with "bound" objects, as it makes the code simpler and easier to read.
With metadata as our established home for tables, let's make a Table for it:
>>> users_table = Table('users', metadata,
...     Column('user_id', Integer, primary_key=True),
...     Column('user_name', String(40)),
...     Column('password', String(10))
... )
As you might have guessed, we have just defined a table named users which has three columns: user_id (which is a primary key column), user_name and password. Currently it is just an object that doesn't necessarily correspond to an existing table in our database. To actually create the table, we use the create() method. To make it interesting, we will have SQLAlchemy echo the SQL statements it sends to the database, by setting the echo flag on the Engine associated with our BoundMetaData:
>>> metadata.engine.echo = True
>>> users_table.create()
CREATE TABLE users(
    user_id INTEGER NOT NULL PRIMARY KEY,
    user_name VARCHAR(40),
    password VARCHAR(10)
)
...
Alternatively, the users table might already exist (such as if you're running examples from this tutorial for the second time), in which case you can just skip the create() method call. You can even skip defining the individual columns in the users table and ask SQLAlchemy to load its definition from the database:
>>> users_table = Table('users', metadata, autoload=True)
>>> list(users_table.columns)[0].name
'user_id'
Documentation on table metadata is available in Database Meta Data.
Inserting is achieved via the insert() method, which defines a clause object (known as a ClauseElement) representing an INSERT statement:
>>> i = users_table.insert()
>>> i
<sqlalchemy.sql.Insert object at 0x...>
>>> print i
INSERT INTO users (user_id, user_name, password) VALUES (?, ?, ?)
Since we created this insert statement object from the users table which is bound to our Engine, the statement itself is also bound to the Engine, and supports executing itself. The execute() method of the clause object will compile the object into a string according to the underlying dialect of the Engine to which the statement is bound, and will then execute the resulting statement.
>>> i.execute(user_name='Mary', password='secure')
INSERT INTO users (user_name, password) VALUES (?, ?)
['Mary', 'secure']
COMMIT
<sqlalchemy.engine.base.ResultProxy instance at 0x...>
>>> i.execute({'user_name':'Tom'}, {'user_name':'Fred'}, {'user_name':'Harry'})
INSERT INTO users (user_name) VALUES (?)
[['Tom'], ['Fred'], ['Harry']]
COMMIT
<sqlalchemy.engine.base.ResultProxy instance at 0x...>
Note that the VALUES clause of each INSERT statement was automatically adjusted to correspond to the parameters sent to the execute() method. This is because the compilation step of a ClauseElement takes into account not just the constructed SQL object and the specifics of the type of database being used, but the execution parameters sent along as well.
When constructing clause objects, SQLAlchemy will bind all literal values into bind parameters. On the construction side, bind parameters are always treated as named parameters. At compilation time, SQLAlchemy will convert them into their proper format, based on the paramstyle of the underlying DBAPI. This works equally well for all named and positional bind parameter formats described in the DBAPI specification.
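As a quick illustration, a named bind parameter constructed with bindparam() is converted to the qmark paramstyle of our SQLite DBAPI at compilation time (a small sketch; the parameter name 'name' is arbitrary):

>>> i = users_table.insert(values={'user_name': bindparam('name')})
>>> print i
INSERT INTO users (user_name) VALUES (?)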
Documentation on inserting: Inserts.
Let's check that the data we have put into the users table is actually there. The procedure is analogous to the insert example above, except you now call the select() method off the users table:
>>> s = users_table.select()
>>> print s
SELECT users.user_id, users.user_name, users.password
FROM users
>>> r = s.execute()
SELECT users.user_id, users.user_name, users.password
FROM users
[]
This time, we won't ignore the return value of execute(). It's an instance of ResultProxy, which is a result-holding object that behaves very similarly to the cursor object one deals with directly with a database API:
>>> r
<sqlalchemy.engine.base.ResultProxy instance at 0x...>
>>> r.fetchone()
(1, u'Mary', u'secure')
>>> r.fetchall()
[(2, u'Tom', None), (3, u'Fred', None), (4, u'Harry', None)]
Query criteria for the select are specified using Python expressions, using the Column objects in the Table as a base. All expressions constructed from Column objects are themselves instances of ClauseElement, just like the Select, Insert, and Table objects themselves.
>>> r = users_table.select(users_table.c.user_name=='Harry').execute()
SELECT users.user_id, users.user_name, users.password
FROM users
WHERE users.user_name = ?
['Harry']
>>> row = r.fetchone()
>>> print row
(4, u'Harry', None)
Pretty much the full range of standard SQL operations is supported as constructed Python expressions, including joins, ordering, grouping, functions, correlated subqueries, unions, etc. Documentation on selecting: Simple Select.
You can see that when we print out the rows returned by an execution result, it prints the rows as tuples. These rows in fact support both the list and dictionary interfaces. The dictionary interface allows the addressing of columns by string column name, or even the original Column object:
>>> row.keys()
['user_id', 'user_name', 'password']
>>> row['user_id'], row[1], row[users_table.c.password]
(4, u'Harry', None)
Addressing the columns in a row based on the original Column object is especially handy, as it eliminates the need to work with literal column names altogether.
Result sets also support iteration. We'll show this with a slightly different form of select that allows you to specify the specific columns to be selected:
>>> for row in select([users_table.c.user_id, users_table.c.user_name]).execute():
...     print row
SELECT users.user_id, users.user_name
FROM users
[]
(1, u'Mary')
(2, u'Tom')
(3, u'Fred')
(4, u'Harry')
Let's create a second table, email_addresses, which references the users table. To define the relationship between the two tables, we will use the ForeignKey construct. We will also issue the CREATE statement for the table in one step:
>>> email_addresses_table = Table('email_addresses', metadata,
...     Column('address_id', Integer, primary_key=True),
...     Column('email_address', String(100), nullable=False),
...     Column('user_id', Integer, ForeignKey('users.user_id'))).create()
CREATE TABLE email_addresses(
    address_id INTEGER NOT NULL PRIMARY KEY,
    email_address VARCHAR(100) NOT NULL,
    user_id INTEGER REFERENCES users(user_id)
)
...
Above, the email_addresses table is related to the users table via ForeignKey('users.user_id'). The ForeignKey constructor can take a Column object or a string representing the table and column name. When using the string argument, the referenced table must exist within the same MetaData object; that's where it looks for the other table!
Next, let's put a few rows in:
>>> email_addresses_table.insert().execute(
...     {'email_address':'tom@tom.com', 'user_id':2},
...     {'email_address':'mary@mary.com', 'user_id':1})
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
[['tom@tom.com', 2], ['mary@mary.com', 1]]
COMMIT
<sqlalchemy.engine.base.ResultProxy instance at 0x...>
With two related tables, we can now construct a join amongst them using the join method:
>>> r = users_table.join(email_addresses_table).select().execute()
SELECT users.user_id, users.user_name, users.password, email_addresses.address_id, email_addresses.email_address, email_addresses.user_id
FROM users JOIN email_addresses ON users.user_id = email_addresses.user_id
[]
>>> print [row for row in r]
[(1, u'Mary', u'secure', 2, u'mary@mary.com', 1), (2, u'Tom', None, 1, u'tom@tom.com', 2)]
The join method is also a standalone function in the sqlalchemy namespace. The join condition is figured out from the foreign keys of the Table objects given. The condition (also called the "ON clause") can be specified explicitly, such as in this example where we locate all users that used their email address as their password:
>>> print join(users_table, email_addresses_table,
...     and_(users_table.c.user_id==email_addresses_table.c.user_id,
...     users_table.c.password==email_addresses_table.c.email_address)
... )
users JOIN email_addresses ON users.user_id = email_addresses.user_id AND users.password = email_addresses.email_address
Now that we have a little bit of Table and SQL operations covered, let's look into SQLAlchemy's ORM (object relational mapper). With the ORM, you associate Tables (and other Selectable units, like queries and table aliases) with Python classes, into units called Mappers. Then you can execute queries that return lists of object instances, instead of result sets. The object instances themselves are associated with an object called a Session, which automatically tracks changes on each object and supports a "save all at once" operation called a flush.
A Mapper is usually created once per Python class, and at its core primarily means to say, "objects of this class are to be stored as rows in this table". Let's create a class called User, which will represent a user object that is stored in our users table:
>>> class User(object):
...     def __repr__(self):
...         return "(User %s,password:%s)" % (self.user_name, self.password)
The class is a new-style class (i.e. it extends object) and does not require a constructor (although one may be provided if desired). We just have one __repr__ method on it which will display basic information about the User. Note that the __repr__ method references the instance variables user_name and password, which otherwise aren't defined. While we are free to explicitly define these attributes and treat them normally, this is optional; SQLAlchemy's Mapper construct will manage them for us, since their names correspond to the names of columns in the users table. Let's create a mapper, and observe that these attributes are now defined:
>>> usermapper = mapper(User, users_table)
>>> u1 = User()
>>> print u1.user_name
None
>>> print u1.password
None
The mapper function returns a new instance of Mapper. As it is the first Mapper we have created for the User class, it is known as the class's primary mapper. We generally don't need to hold onto the usermapper instance variable; SA's ORM can automatically locate this Mapper when it deals with the class, or instances of that class.
After you create a Mapper, all operations with that Mapper require the usage of an important object called a Session. All objects loaded or saved by the Mapper must be attached to a Session object, which represents a kind of "workspace" of objects that are loaded into memory. A particular object instance can only be attached to one Session at a time.
By default, you have to create a Session object explicitly before you can load or save objects. There are several ways to manage sessions, but the most straightforward is to just create one, via the create_session() function:
>>> session = create_session()
>>> session
<sqlalchemy.orm.session.Session object at 0x...>
The Session has all kinds of methods on it to retrieve and store objects, and also to view their current status. The Session also provides an easy interface which can be used to query the database, by giving you an instance of a Query object corresponding to a particular Python class:
>>> query = session.query(User)
>>> print query.select_by(user_name='Harry')
SELECT users.user_name AS users_user_name, users.password AS users_password, users.user_id AS users_user_id
FROM users
WHERE users.user_name = ? ORDER BY users.oid
['Harry']
[(User Harry,password:None)]
All querying for objects is performed via an instance of Query. The various select methods on an instance of Mapper also use an underlying Query object to perform the operation. A Query is always bound to a specific Session.
Let's turn off the database echoing for a moment, and try out a few methods on Query. Methods that end with the suffix _by primarily take keyword arguments which correspond to properties on the object. Other methods take ClauseElement objects, which are constructed by using Column objects inside of Python expressions, in the same way as we did with our SQL select example in the previous section of this tutorial. Using ClauseElement structures to query objects is more verbose but more flexible:
>>> metadata.engine.echo = False
>>> print query.select(User.c.user_id==3)
[(User Fred,password:None)]
>>> print query.get(2)
(User Tom,password:None)
>>> print query.get_by(user_name='Mary')
(User Mary,password:secure)
>>> print query.selectfirst(User.c.password==None)
(User Tom,password:None)
>>> print query.count()
4
Notice that our User class has a special attribute c attached to it. This 'c' represents the columns on the User mapper's Table object. Saying User.c.user_name is synonymous with saying users_table.c.user_name, recalling that User is the Python class and users_table is our Table object.
With a little experience in loading objects, let's see what it's like to make changes. First, let's create a new user "Ed". We do this by just constructing the new object. Then, we just add it to the session:
>>> ed = User()
>>> ed.user_name = 'Ed'
>>> ed.password = 'edspassword'
>>> session.save(ed)
>>> ed in session
True
Let's also make a few changes on some of the objects in the database. We will load them with our Query object, and then change some things.
>>> mary = query.get_by(user_name='Mary')
>>> harry = query.get_by(user_name='Harry')
>>> mary.password = 'marysnewpassword'
>>> harry.password = 'harrysnewpassword'
At the moment, nothing has been saved to the database; all of our changes are in memory only. What happens if some other part of the application also tries to load 'Mary' from the database and make some changes before we had a chance to save it? Assuming that the same Session is used, loading 'Mary' from the database a second time will issue a second query in order to locate the primary key of 'Mary', but will return the same object instance as the one already loaded. This behavior is due to an important property of the Session known as the identity map:
>>> mary2 = query.get_by(user_name='Mary')
>>> mary is mary2
True
With the identity map, a single Session can be relied upon to keep all loaded instances straight.
As far as the issue of the same object being modified in two different Sessions, that's an issue of concurrency detection; SQLAlchemy does some basic concurrency checks when saving objects, with the option for a stronger check using version ids. See Mapper Arguments for more details.
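As a hypothetical sketch of the stronger check, the mapper is handed a version id column; note that our tutorial's users table doesn't actually have such a column:

# hypothetical: assumes a 'version' Integer column on the mapped table
mapper(User, users_table, version_id_col=users_table.c.version)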
With a new user "ed" and some changes made on "Mary" and "Harry", let's also mark "Fred" as deleted:
>>> fred = query.get_by(user_name='Fred')
>>> session.delete(fred)
Then, to send all of our changes to the database, we flush() the Session. Let's turn echo back on to see this happen:
>>> metadata.engine.echo = True
>>> session.flush()
BEGIN
UPDATE users SET password=? WHERE users.user_id = ?
['marysnewpassword', 1]
UPDATE users SET password=? WHERE users.user_id = ?
['harrysnewpassword', 4]
INSERT INTO users (user_name, password) VALUES (?, ?)
['Ed', 'edspassword']
DELETE FROM users WHERE users.user_id = ?
[3]
COMMIT
When our User object contains relationships to other kinds of information, such as a list of email addresses, we can indicate this by using a function when creating the Mapper called relation(). While there is a lot you can do with relations, we'll cover a simple one here. First, recall that our users table has a foreign key relationship to another table called email_addresses. A single row in email_addresses has a column user_id that references a row in the users table; since many rows in the email_addresses table can reference a single row in users, this is called a one-to-many relationship.
First, let's deal with the email_addresses table by itself. We will create a new class Address which represents a single row in the email_addresses table, and a corresponding Mapper which will associate the Address class with the email_addresses table:
>>> class Address(object):
...     def __init__(self, email_address):
...         self.email_address = email_address
...     def __repr__(self):
...         return "(Address %s)" % (self.email_address)
>>> mapper(Address, email_addresses_table)
<sqlalchemy.orm.mapper.Mapper object at 0x...>
Next, we associate the User and Address classes together by creating a relation using relation(), and then adding that relation to the User mapper, using the add_property function:
>>> usermapper.add_property('addresses', relation(Address))
The relation() function takes either a class or a Mapper as its first argument, and has many options to further control its behavior. The User mapper has now placed an additional property on each User instance called addresses. SQLAlchemy will automatically determine that this relationship is a one-to-many relationship, and will subsequently create addresses as a list. When a new User is created, this list begins empty.
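A quick doctest-style sketch of that, assuming the mapper configuration above:

>>> u = User()
>>> u.addresses
[]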
Let's see what we get for the email addresses already in the database. Since we have made a change to the mapper's configuration, it's best that we clear out our Session, which is currently holding onto every User object we have already loaded:
>>> session.clear()
We can then treat the addresses attribute on each User object like a regular list:
>>> mary = query.get_by(user_name='Mary')
SELECT users.user_name AS users_user_name, users.password AS users_password, users.user_id AS users_user_id
FROM users
WHERE users.user_name = ? ORDER BY users.oid LIMIT 1 OFFSET 0
['Mary']
>>> print [a for a in mary.addresses]
SELECT email_addresses.user_id AS email_addresses_user_id, email_addresses.address_id AS email_addresses_address_id, email_addresses.email_address AS email_addresses_email_address
FROM email_addresses
WHERE ? = email_addresses.user_id ORDER BY email_addresses.oid
[1]
[(Address mary@mary.com)]
Adding to the list is just as easy. New Address objects will be detected and saved when we flush the Session:
>>> mary.addresses.append(Address('mary2@gmail.com'))
>>> session.flush()
BEGIN
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['mary2@gmail.com', 1]
COMMIT
Main documentation for using mappers: Data Mapping
You may have noticed from the example above that when we say session.flush(), SQLAlchemy prints BEGIN and COMMIT to indicate a transaction with the database. The flush() method, since it may execute many statements in a row, will automatically use a transaction in order to execute these instructions. But what if we want to use flush() inside of a larger transaction? This is performed via the SessionTransaction object, which we can establish using session.create_transaction(). Below, we will perform a more complicated SELECT statement, make several changes to our collection of users and email addresses, and then create a new user with two email addresses, within the context of a transaction. We will perform a flush() in the middle of it to write the changes we have so far, and then allow the remaining changes to be written when we finally commit() the transaction. We enclose our operations within a try/except block to ensure that resources are properly freed:
>>> transaction = session.create_transaction()
>>> try:
...     (ed, harry, mary) = session.query(User).select(
...         User.c.user_name.in_('Ed', 'Harry', 'Mary'), order_by=User.c.user_name
...     )
...     del mary.addresses[1]
...     harry.addresses.append(Address('harry2@gmail.com'))
...     session.flush()
...     print "***flushed the session***"
...     fred = User()
...     fred.user_name = 'fred_again'
...     fred.addresses.append(Address('fred@fred.com'))
...     fred.addresses.append(Address('fredsnewemail@fred.com'))
...     session.save(fred)
...     transaction.commit()
... except:
...     transaction.rollback()
...     raise
BEGIN
SELECT users.user_name AS users_user_name, users.password AS users_password, users.user_id AS users_user_id
FROM users
WHERE users.user_name IN (?, ?, ?) ORDER BY users.user_name
['Ed', 'Harry', 'Mary']
SELECT email_addresses.user_id AS email_addresses_user_id, email_addresses.address_id AS email_addresses_address_id, email_addresses.email_address AS email_addresses_email_address
FROM email_addresses
WHERE ? = email_addresses.user_id ORDER BY email_addresses.oid
[4]
UPDATE email_addresses SET user_id=? WHERE email_addresses.address_id = ?
[None, 3]
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['harry2@gmail.com', 4]
***flushed the session***
INSERT INTO users (user_name, password) VALUES (?, ?)
['fred_again', None]
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['fred@fred.com', 6]
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['fredsnewemail@fred.com', 6]
COMMIT
Main documentation: Session / Unit of Work
That covers a quick tour through the basic idea of SQLAlchemy, in its simplest form. Beyond that, one should familiarize oneself with the basics of Sessions, the various patterns that can be used to define different kinds of Mappers and relations among them, the rudimentary SQL types that are available when constructing Tables, and the basics of Engines, SQL statements, and database Connections.
A database engine is a subclass of sqlalchemy.sql.Engine, and is the starting point for where SQLAlchemy provides a layer of abstraction on top of the various DBAPI2 database modules. For all databases supported by SA, there is a specific "implementation" module, found in the sqlalchemy.databases package, that provides all the objects an Engine needs in order to perform its job. A typical user of SQLAlchemy never needs to deal with these modules directly. For many purposes, the only knowledge that's needed is how to create an Engine for a particular connection URL. When dealing with direct execution of SQL statements, one would also be aware of Result, Connection, and Transaction objects. The primary public-facing objects are:
Engine: the gateway object, returned by the create_engine() function; its standard implementation is sqlalchemy.engine.base.ComposedSQLEngine.

Connection: represents a single connection to the database, procured from the Engine's connection pool.

Transaction: represents a database transaction on a Connection; together with Connection's begin() method, its commit() and rollback() methods support basic "nestable" behavior, meaning an outermost transaction is maintained against multiple nested calls to begin/commit.
Underneath the public-facing API of ComposedSQLEngine, several components are provided by database implementations to provide the full behavior, including dialect objects describing the specifics of the particular database, and schema generation/dropping objects which emit the CREATE and DROP statements.
Engines exist for SQLite, Postgres, MySQL, and Oracle, using the Pysqlite, Psycopg2 (Psycopg1 will work to some degree, but its typing model is not supported; install Psycopg2!), MySQLdb, and cx_Oracle modules. There is also preliminary support for MS-SQL using adodbapi or pymssql, as well as Firebird. For each engine, a distinct Python module exists in the sqlalchemy.databases package, which provides implementations of some of the objects mentioned in the previous section.
Downloads for each DBAPI at the time of this writing are as follows:
The SQLAlchemy Wiki contains a page of database notes, describing whatever quirks and behaviors have been observed. It's a good place to check for issues with specific databases. Database Notes
SQLAlchemy 0.2 indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:
$ driver://username:password@host:port/database
Available driver names are sqlite, mysql, postgres, oracle, mssql, and firebird. For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database. The URL is typically sent as a string to the create_engine() function:
# postgres
pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

# sqlite (note the four slashes for an absolute path)
sqlite_db = create_engine('sqlite:////absolute/path/to/database.txt')
sqlite_db = create_engine('sqlite:///relative/path/to/database.txt')
sqlite_db = create_engine('sqlite://')  # in-memory database

# mysql
mysql_db = create_engine('mysql://localhost/foo')

# oracle via TNS name
oracle_db = create_engine('oracle://scott:tiger@dsn')

# oracle will feed host/port/SID into cx_oracle.makedsn
oracle_db = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
The Engine will create its first connection to the database when a SQL statement is executed. As concurrent statements are executed, the underlying connection pool will grow to a default size of five connections, and will allow a default "overflow" of ten. Since the Engine is essentially "home base" for the connection pool, it follows that you should keep a single Engine per database established within an application, rather than creating a new one for each connection.
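A minimal sketch of that pattern, using a hypothetical application module to hold the single Engine:

# myapp/db.py - hypothetical module holding the application's one Engine
from sqlalchemy import create_engine

engine = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

Other modules then import and reuse this engine rather than calling create_engine() themselves, letting its pool manage connections across the whole application.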
Keyword options can also be specified to create_engine(), following the string URL as follows:
db = create_engine('postgres://...', encoding='latin1', echo=True, module=psycopg1)
Options that can be specified include the following:
strategy='plain' : selects the algorithm used for implicit executions. Two strategies exist: plain, which is the default, and threadlocal, which applies a "thread-local context" to implicit executions performed by the Engine. This context is further described in Implicit Connection Contexts.

pool=None : an instance of sqlalchemy.pool.Pool to be used as the underlying source for connections, overriding the engine's connect arguments (pooling is described in Connection Pooling). If None, a default Pool (usually QueuePool, or SingletonThreadPool in the case of SQLite) will be created using the engine's connect arguments.
Example:
from sqlalchemy import *
import sqlalchemy.pool as pool
import MySQLdb

def getconn():
    return MySQLdb.connect(user='ed', dbname='mydb')

engine = create_engine('mysql', pool=pool.QueuePool(getconn, pool_size=20, max_overflow=40))
pool_size=5 : the number of connections to keep open inside the connection pool. Used by QueuePool as well as SingletonThreadPool as of 0.2.7.

max_overflow=10 : the number of connections to allow in the connection pool's "overflow", i.e. connections that can be opened above and beyond the pool_size setting. Used by QueuePool.

pool_timeout=30 : the number of seconds to wait before giving up on acquiring a connection from the pool. Used by QueuePool.
echo=False : if True, the Engine will log all statements it executes. The echo attribute of ComposedSQLEngine can be modified at any time to turn logging on and off. If set to the string "debug", result rows will be printed to the standard output as well.

logger=None : a file-like object where logging output will be sent; defaults to sys.stdout.
use_ansi=True : used only by Oracle; when False, the Oracle non-ANSI join syntax of <column1>(+)=<column2> must be used in order to achieve a LEFT OUTER JOIN.

threaded=True : used only by cx_Oracle; sets the threaded parameter of the connection indicating thread-safe usage. cx_Oracle docs indicate setting this flag to False will speed performance by 10-15%. While this defaults to False in cx_Oracle, SQLAlchemy defaults it to True, preferring stability over early optimization.

convert_unicode=False : if True, String/character-based types will convert Python Unicode values to raw byte values going into the database, and raw byte values to Python Unicode coming out in result sets; for unicode conversion on an individual column basis, use the Unicode column type instead.

encoding='utf-8' : the encoding used for Unicode conversions, both by the convert_unicode flag and by the Unicode type object.
In this section we describe the SQL execution interface available from an Engine instance. Note that when using the Object Relational Mapper (ORM), as well as when dealing with "bound" metadata objects (described later), SQLAlchemy deals with the Engine for you and you generally don't need to know much about it; in those cases, you can skip this section and go to Database Meta Data.
The Engine provides a connect() method which returns a Connection object. This object provides methods by which literal SQL text as well as SQL clause constructs can be compiled and executed.
engine = create_engine('sqlite:///:memory:')
connection = engine.connect()
result = connection.execute("select * from mytable where col1=:col1", col1=5)
for row in result:
    print row['col1'], row['col2']
connection.close()
The close method on Connection does not actually remove the underlying connection to the database, but rather indicates that the underlying resources can be returned to the connection pool. When using the connect() method, the DBAPI connection referenced by the Connection object is not referenced anywhere else.
In both execution styles above, the Connection object will also automatically return its resources to the connection pool when the object is garbage collected, i.e. when its __del__() method is called. When using the standard C implementation of Python, this method is usually called as soon as the object is dereferenced. With other Python implementations such as Jython, this is not so guaranteed.
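Given that, a common defensive pattern (just a sketch, not required by SQLAlchemy) is to release the connection explicitly with try/finally rather than relying on __del__():

connection = engine.connect()
try:
    result = connection.execute("select * from mytable")
    for row in result:
        print row['col1']
finally:
    # always returned to the pool, even if an exception is raised
    connection.close()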
The execute method on Engine and Connection can also receive SQL clause constructs, which are described in Constructing SQL Queries via Python Expressions:
connection = engine.connect()
result = connection.execute(select([table1], table1.c.col1==5))
for row in result:
    print row['col1'], row['col2']
connection.close()
Both Connection and Engine fulfill an interface known as Connectable which specifies common functionality between the two objects, such as getting a Connection and executing queries. Therefore, most SQLAlchemy functions which take an Engine as a parameter with which to execute SQL will also accept a Connection:
engine = create_engine('sqlite:///:memory:')

# specify some Table metadata
metadata = MetaData()
table = Table('sometable', metadata, Column('col1', Integer))

# create the table with the Engine
table.create(engine=engine)

# drop the table with a Connection off the Engine
connection = engine.connect()
table.drop(engine=connection)
An implicit connection refers to connections that are allocated by the Engine internally. There are two general cases when this occurs: when using the various execute() methods that are available off the Engine object itself, and when calling the execute() method on constructed SQL objects, which are described in Constructing SQL Queries via Python Expressions.
engine = create_engine('sqlite:///:memory:')
result = engine.execute("select * from mytable where col1=:col1", col1=5)
for row in result:
    print row['col1'], row['col2']
result.close()
When using implicit connections, the returned ResultProxy has a close() method which will release the resources used by the underlying Connection.
The strategy keyword argument to create_engine() affects the algorithm used to retrieve the underlying DBAPI connection used by implicit executions. When set to plain, each implicit execution requests a unique connection from the connection pool, which is returned to the pool when the resulting ResultProxy falls out of scope (i.e. __del__() is called) or its close() method is called. If a second implicit execution occurs while the ResultProxy from the previous execution is still open, then a second connection is pulled from the pool.
When strategy is set to threadlocal, the Engine still checks out a connection which is closeable in the same manner via the ResultProxy, except the connection it checks out will be the same connection as one which is already checked out, assuming the operation is in the same thread. When all ResultProxy objects are closed, the connection is returned to the pool normally.
It is crucial to note that the plain and threadlocal contexts do not impact the connect() method on the Engine; connect() always returns a unique connection. Implicit connections use a different method on Engine for their operations, called contextual_connect().
The plain strategy is better suited to an application that ensures the explicit releasing of the resources used by each execution. This is because each execution uses its own distinct connection resource, and as those resources remain open, multiple connections can be checked out from the pool quickly. Since the connection pool will block further requests when too many connections have been checked out, not keeping track of this can impact an application's stability.
db = create_engine('mysql://localhost/test', strategy='plain')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references a *second* DBAPI connection resource.
r2 = db.execute("select * from table2")

for row in r1:
    ...
for row in r2:
    ...

# release connection 1
r1.close()

# release connection 2
r2.close()
Advantages to plain include that connection resources are immediately returned to the connection pool, without any reliance upon the __del__() method; there is no chance of resources being left around by a Python implementation that doesn't necessarily call __del__() immediately.
The threadlocal strategy is better suited to a programming style which relies upon the __del__() method of Connection objects in order to return them to the connection pool, rather than explicitly issuing a close() call upon the ResultProxy object. This is because all of the executions within a single thread will share the same connection, if one has already been checked out in the current thread. Using this style, an application will use at most one connection per thread within the scope of all implicit executions.
db = create_engine('mysql://localhost/test', strategy='threadlocal')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references the *same* resource as r1
r2 = db.execute("select * from table2")

for row in r1:
    ...
for row in r2:
    ...

# dereference r1.  the connection is still held by r2.
r1 = None

# dereference r2.  with no more references to the underlying connection resources, they
# are returned to the pool.
r2 = None
Advantages to threadlocal include that resources can be left to clean up after themselves, application code can be more minimal, it's guaranteed that only one connection is used per thread, and there is no chance of a "connection pool block", which is when an execution hangs because the current thread has already checked out all remaining resources.
To get at the actual Connection object which is used by implicit executions, call the contextual_connect() method on Engine:
# threadlocal strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

conn1 = db.contextual_connect()
conn2 = db.contextual_connect()

>>> conn1.connection is conn2.connection
True
When the plain strategy is used, the contextual_connect() method is synonymous with the connect() method; both return a distinct connection from the pool.
The Connection object provides a begin() method which returns a Transaction object. This object is usually used within a try/except clause so that it is guaranteed to rollback() or commit():
trans = connection.begin()
try:
    r1 = connection.execute(table1.select())
    connection.execute(table1.insert(), col1=7, col2='this is some data')
    trans.commit()
except:
    trans.rollback()
    raise
The Transaction object also handles "nested" behavior by keeping track of the outermost begin/commit pair. In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed:
# method_a starts a transaction and calls method_b
def method_a(connection):
    trans = connection.begin()  # open a transaction
    try:
        method_b(connection)
        trans.commit()  # transaction is committed here
    except:
        trans.rollback()  # this rolls back the transaction unconditionally
        raise

# method_b also starts a transaction
def method_b(connection):
    trans = connection.begin()  # open a transaction - this runs in the context of method_a's transaction
    try:
        connection.execute("insert into mytable values ('bat', 'lala')")
        connection.execute(mytable.insert(), col1='bat', col2='lala')
        trans.commit()  # transaction is not committed yet
    except:
        trans.rollback()  # this rolls back the transaction unconditionally
        raise

# open a Connection and call method_a
conn = engine.connect()
method_a(conn)
conn.close()
Above, method_a is called first, which calls connection.begin(). Then it calls method_b. When method_b calls connection.begin(), it just increments a counter that is decremented when it calls commit(). If either method_a or method_b calls rollback(), the whole transaction is rolled back. The transaction is not committed until method_a calls the commit() method.
Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in SessionTransaction.
The core of SQLAlchemy's query and object mapping operations is database metadata, which are Python objects that describe tables and other schema-level objects. Metadata objects can be created by explicitly naming the various components and their properties, using the Table, Column, ForeignKey, Index, and Sequence objects imported from sqlalchemy.schema. There is also support for reflection, which means you only specify the name of the entities and they are recreated from the database automatically.
A collection of metadata entities is stored in an object aptly named MetaData. This object takes an optional name parameter:
from sqlalchemy import *

metadata = MetaData(name='my metadata')
Then to construct a Table, use the Table class:
users = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)
The specific datatypes for each Column, such as Integer, String, etc. are described in The Types System, and exist within the module sqlalchemy.types as well as the global sqlalchemy namespace.
Foreign keys are most easily specified by the ForeignKey object within a Column object. For a composite foreign key, i.e. a foreign key that contains multiple columns referencing a composite primary key, an explicit syntax is provided which allows the correct table CREATE statements to be generated:
# a table with a composite primary key
invoices = Table('invoices', metadata,
    Column('invoice_id', Integer, primary_key=True),
    Column('ref_num', Integer, primary_key=True),
    Column('description', String(60), nullable=False)
)

# a table with a composite foreign key referencing the parent table
invoice_items = Table('invoice_items', metadata,
    Column('item_id', Integer, primary_key=True),
    Column('item_name', String(60), nullable=False),
    Column('invoice_id', Integer, nullable=False),
    Column('ref_num', Integer, nullable=False),
    ForeignKeyConstraint(['invoice_id', 'ref_num'], ['invoices.invoice_id', 'invoices.ref_num'])
)
Above, the invoice_items table will have ForeignKey objects automatically added to the invoice_id and ref_num Column objects as a result of the additional ForeignKeyConstraint object.
The MetaData object supports some handy methods, such as getting a list of Tables in the order (or reverse) of their dependency:
>>> for t in metadata.table_iterator(reverse=False):
...     print t.name
users
user_prefs
And Table provides an interface to the table's properties as well as those of its columns:
employees = Table('employees', metadata,
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)

# access the column "employee_id":
employees.columns.employee_id

# or just
employees.c.employee_id

# via string
employees.c['employee_id']

# iterate through all columns
for c in employees.c:
    # ...

# get the table's primary key columns
for primary_key in employees.primary_key:
    # ...

# get the table's foreign key objects:
for fkey in employees.foreign_keys:
    # ...

# access the table's MetaData:
employees.metadata

# access the table's Engine, if its MetaData is bound:
employees.engine

# access a column's name, type, nullable, primary key, foreign key
employees.c.employee_id.name
employees.c.employee_id.type
employees.c.employee_id.nullable
employees.c.employee_id.primary_key
employees.c.employee_dept.foreign_key

# get the "key" of a column, which defaults to its name, but can
# be any user-defined string:
employees.c.name.key

# access a column's table:
employees.c.employee_id.table is employees

# get the table related by a foreign key
fcolumn = employees.c.employee_dept.foreign_key.column.table
A MetaData object can be associated with one or more Engine instances. This allows the MetaData and the elements within it to perform operations automatically, using the connection resources of that Engine. This includes being able to "reflect" the columns of tables, as well as to perform create and drop operations, without needing to pass an Engine or Connection around. It also allows SQL constructs to be created which know how to execute themselves (called "implicit execution").
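For instance, a construct built against a bound table can execute itself; no Engine or Connection appears in the code (a brief sketch using an in-memory SQLite database):

meta = BoundMetaData('sqlite:///:memory:')
mytable = Table('mytable', meta, Column('col1', Integer))
mytable.create()

# the insert locates the Engine through the table's bound MetaData
mytable.insert().execute(col1=5)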
To bind MetaData to a single Engine, use BoundMetaData:
engine = create_engine('sqlite://', **kwargs)

# create BoundMetaData from an Engine
meta = BoundMetaData(engine)

# create the Engine and MetaData in one step
meta = BoundMetaData('postgres://db/', **kwargs)
Another form of MetaData exists which allows connecting to any number of engines, within the context of the current thread. This is DynamicMetaData:
meta = DynamicMetaData()

meta.connect(engine)                    # connect to an existing Engine
meta.connect('mysql://user@host/dsn')   # create a new Engine and connect
DynamicMetaData is ideal for applications that need to use the same set of Tables for many different database connections in the same process, such as a CherryPy web application which handles multiple application instances in one process.
Some users prefer to create Table objects without specifying a MetaData object, having Tables scoped on an application-wide basis. For them the default_metadata object and the global_connect() function are supplied. default_metadata is simply an instance of DynamicMetaData that exists within the sqlalchemy namespace, and global_connect() is a synonym for default_metadata.connect(). Defining a Table that has no MetaData argument will automatically use this default metadata as follows:
from sqlalchemy import *

# a Table with just a name and its Columns
mytable = Table('mytable',
    Column('col1', Integer, primary_key=True),
    Column('col2', String(40))
)

# connect all the "anonymous" tables to a postgres uri in the current thread
global_connect('postgres://foo:bar@lala/test')

# create all tables in the default metadata
default_metadata.create_all()

# the table is bound
mytable.insert().execute(col1=5, col2='some value')
Once you have a BoundMetaData or a connected DynamicMetaData, you can create Table objects without specifying their columns, just their names, using autoload=True:
>>> messages = Table('messages', meta, autoload = True)
>>> [c.name for c in messages.columns]
['message_id', 'message_name', 'date']
At the moment the Table is constructed, it will query the database for the columns and constraints of the messages table.
Note that if a reflected table has a foreign key referencing another table, then the metadata for the related table will be loaded as well, even if it has not been defined by the application:
>>> shopping_cart_items = Table('shopping_cart_items', meta, autoload = True)
>>> print shopping_cart_items.c.cart_id.table.name
shopping_carts
To get direct access to 'shopping_carts', simply instantiate it via the Table constructor. You'll get the same instance of the shopping cart Table as the one that is attached to shopping_cart_items:
>>> shopping_carts = Table('shopping_carts', meta)
>>> shopping_carts is shopping_cart_items.c.cart_id.table
True
This works because when the Table constructor is called for a particular name and MetaData object, if the table has already been created then the instance returned will be the same as the original. This is a singleton constructor:
>>> news_articles = Table('news', meta,
...     Column('article_id', Integer, primary_key = True),
...     Column('url', String(250), nullable = False)
... )
>>> othertable = Table('news', meta)
>>> othertable is news_articles
True
Individual columns can be overridden with explicit values when reflecting tables; this is handy for specifying custom datatypes, constraints such as primary keys that may not be configured within the database, etc.
>>> mytable = Table('mytable', meta,
...     Column('id', Integer, primary_key=True),   # override reflected 'id' to have primary key
...     Column('mydata', Unicode(50)),             # override reflected 'mydata' to be Unicode
...     autoload=True)
Some databases support the concept of multiple schemas. A Table can reference this by specifying the schema keyword argument:
financial_info = Table('financial_info', meta,
    Column('id', Integer, primary_key=True),
    Column('value', String(100), nullable=False),
    schema='remote_banks'
)
Within the MetaData collection, this table will be identified by the combination of financial_info and remote_banks. If another table called financial_info is referenced without the remote_banks schema, it will refer to a different Table. ForeignKey objects can reference columns in this table using the form remote_banks.financial_info.id.
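For example, a hypothetical referencing table could look like this, assuming the financial_info table above:

transactions = Table('transactions', meta,
    Column('id', Integer, primary_key=True),
    # references the schema-qualified table above
    Column('financial_info_id', Integer, ForeignKey('remote_banks.financial_info.id'))
)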
ON UPDATE and ON DELETE clauses to a table create are specified within the ForeignKeyConstraint object, using the on_update and on_delete keyword arguments:
foobar = Table('foobar', meta,
    Column('id', Integer, primary_key=True),
    Column('lala', String(40)),
    ForeignKeyConstraint(['lala'], ['hoho.lala'], on_update="CASCADE", on_delete="CASCADE"))
Many table, schema, or column names require quoting to be enabled. Reasons for this include names that are the same as a database reserved word, or identifiers that use MixedCase, where the database would normally "fold" the case convention into lower or uppercase (such as Postgres). SQLAlchemy currently has no internal logic to automatically determine when quoting should be used, so it's off unless explicitly enabled for a particular column or table. Turning on quoting for a column or table identifier is performed manually via the quote=True flag on Column or Table, as well as the quote_schema=True flag for Table.
table2 = Table('WorstCase2', metadata,
    # desc is a reserved word so quote this column
    Column('desc', Integer, quote=True, primary_key=True),

    # MixedCase uses a mixed case convention, so quote this column
    Column('MixedCase', Integer, quote=True),

    # Union is both a reserved word and mixed case
    Column('Union', Integer, quote=True),

    # normal_column doesn't require quoting
    Column('normal_column', String(30)),

    # the table name uses mixed case, so turn on quoting for the table ident
    quote=True)
Tables may support database-specific options, such as MySQL's engine option that can specify "MyISAM", "InnoDB", and other backends for the table:
addresses = Table('engine_email_addresses', meta,
    Column('address_id', Integer, primary_key = True),
    Column('remote_user_id', Integer, ForeignKey(users.c.user_id)),
    Column('email_address', String(20)),
    mysql_engine='InnoDB'
)
Creating and dropping individual tables can be done via the create() and drop() methods of Table; these methods take an optional engine parameter which references an Engine or a Connection. If not supplied, the Engine bound to the MetaData will be used; otherwise, an error is raised:
meta = BoundMetaData('sqlite:///:memory:')

employees = Table('employees', meta,
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)
employees.create()
Tables can be dropped via the drop() method:
employees.drop(connectable=e)
The create() and drop() methods also support an optional keyword argument checkfirst, which will issue the database's appropriate pragma statements to check if the table exists before creating or dropping:
employees.create(connectable=e, checkfirst=True)
employees.drop(checkfirst=False)
Entire groups of Tables can be created and dropped directly from the MetaData object with create_all() and drop_all(). These methods always check for the existence of each table before creating or dropping. Each method takes an optional engine keyword argument which can reference an Engine or a Connection. If no engine is specified, the underlying bound Engine, if any, is used:
engine = create_engine('sqlite:///:memory:')

metadata = MetaData()

users = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)

metadata.create_all(connectable=engine)
SQLAlchemy includes flexible constructs in which to create default values for columns upon the insertion of rows, as well as upon update. These defaults can take several forms: a constant, a Python callable to be pre-executed before the SQL is executed, a SQL expression or function to be pre-executed before the SQL is executed, a pre-executed Sequence (for databases that support sequences), or a "passive" default, which is a default function triggered by the database itself upon insert, the value of which can then be post-fetched by the engine, provided the row has a primary key with which to locate it.
A basic default is most easily specified by the "default" keyword argument to Column. This defines a value, function, or SQL expression that will be pre-executed to produce the new value, before the row is inserted:
# a function to create primary key ids
i = 0
def mydefault():
    global i
    i += 1
    return i

t = Table("mytable", meta,
    # function-based default
    Column('id', Integer, primary_key=True, default=mydefault),

    # a scalar default
    Column('key', String(10), default="default")
)
The "default" keyword can also take SQL expressions, including select statements or direct function calls:
t = Table("mytable", meta,
    Column('id', Integer, primary_key=True),

    # define 'create_date' to default to now()
    Column('create_date', DateTime, default=func.now()),

    # define 'key' to pull its default from the 'keyvalues' table
    Column('key', String(20), default=keyvalues.select(keyvalues.c.type=='type1', limit=1))
)
The "default" keyword argument is shorthand for using a ColumnDefault object in a column definition. This syntax is optional, but is required for other types of defaults, further described below:
Column('mycolumn', String(30), ColumnDefault(func.get_data()))
Similar to an on-insert default is an on-update default, which is most easily specified by the "onupdate" keyword to Column, which also can be a constant, plain Python function or SQL expression:
t = Table("mytable", meta,
    Column('id', Integer, primary_key=True),

    # define 'last_updated' to be populated with current_timestamp (the ANSI-SQL version of now())
    Column('last_updated', DateTime, onupdate=func.current_timestamp())
)
To use an explicit ColumnDefault object to specify an on-update, use the "for_update" keyword argument:
Column('mycolumn', String(30), ColumnDefault(func.get_data(), for_update=True))
A PassiveDefault indicates a column default that is executed upon INSERT by the database. This construct is used to specify a SQL function that will be specified as "DEFAULT" when creating tables.
t = Table('test', meta,
    Column('mycolumn', DateTime, PassiveDefault("sysdate"))
)
A create call for the above table will produce:
CREATE TABLE test (
    mycolumn datetime default sysdate
)
PassiveDefault also sends a message to the Engine that data is available after an insert. The object-relational mapper system uses this information to post-fetch rows after the insert, so that instances can be refreshed with the new data. Below is a simplified version:
# table with passive defaults
mytable = Table('mytable', engine,
    Column('my_id', Integer, primary_key=True),

    # an on-insert database-side default
    Column('data1', Integer, PassiveDefault("d1_func")),
)

# insert a row
r = mytable.insert().execute(name='fred')

# check the result: were there defaults fired off on that row ?
if r.lastrow_has_defaults():
    # postfetch the row based on primary key.
    # this only works for a table with primary key columns defined
    primary_key = r.last_inserted_ids()
    row = table.select(table.c.id == primary_key[0])
When Tables are reflected from the database using autoload=True, any DEFAULT values set on the columns will be reflected in the Table object as PassiveDefault instances.
Current Postgres support does not rely upon OIDs to determine the identity of a row. This is because the usage of OIDs has been deprecated with Postgres and they are disabled by default for table creates as of PG version 8. Psycopg2's "cursor.lastrowid" function only returns OIDs. Therefore, when inserting a new row which has passive defaults set on the primary key columns, the default function is still pre-executed since SQLAlchemy would otherwise have no way of retrieving the row just inserted.
A table with a sequence looks like:
table = Table("cartitems", meta,
    Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)
The Sequence is used with Postgres or Oracle to indicate the name of a database sequence that will be used to create default values for a column. When a table with a Sequence on a column is created in the database by SQLAlchemy, the database sequence object is also created. Similarly, the database sequence is dropped when the table is dropped. Sequences are typically used with primary key columns. When using Postgres, if an integer primary key column defines no explicit Sequence or other default method, SQLAlchemy will create the column with the SERIAL keyword, and will pre-execute a sequence named "tablename_columnname_seq" in order to retrieve new primary key values, if they were not otherwise explicitly stated. Oracle, which has no "auto-increment" keyword, requires that a Sequence be created for a table if automatic primary key generation is desired.
A Sequence object can be defined on a Table that is then used for a non-sequence-supporting database. In that case, the Sequence object is simply ignored. Note that a Sequence object is entirely optional for all databases except Oracle, as other databases offer options for auto-creating primary key values, such as AUTOINCREMENT, SERIAL, etc. SQLAlchemy will use these default methods for creating primary key values if no Sequence is present on the table metadata.
A sequence can also be specified with optional=True, which indicates the Sequence should only be used on a database that requires an explicit sequence, and not those that supply some other method of providing integer values. At the moment, it essentially means "use this sequence only with Oracle and not Postgres".
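For example, a minimal sketch based on the "cartitems" table above; the only addition is the optional=True flag:

table = Table("cartitems", meta,
    # the sequence is used on Oracle; ignored on Postgres, where SERIAL suffices
    Column("cart_id", Integer, Sequence('cart_id_seq', optional=True), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)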
Indexes can be defined on table columns, including named indexes, unique or non-unique indexes, and multiple-column indexes. Indexes are included along with table create and drop statements. They are not used for any kind of run-time constraint checking; SQLAlchemy leaves that job to the expert on constraint checking, the database itself.
boundmeta = BoundMetaData('postgres://scott:tiger@localhost/test')
mytable = Table('mytable', boundmeta,
    # define a unique index
    Column('col1', Integer, unique=True),

    # define a unique index with a specific name
    Column('col2', Integer, unique='mytab_idx_1'),

    # define a non-unique index
    Column('col3', Integer, index=True),

    # define a non-unique index with a specific name
    Column('col4', Integer, index='mytab_idx_2'),

    # pass the same name to multiple columns to add them to the same index
    Column('col5', Integer, index='mytab_idx_2'),

    Column('col6', Integer),
    Column('col7', Integer)
)

# create the table. all the indexes will be created along with it.
mytable.create()

# indexes can also be specified standalone
i = Index('mytab_idx_3', mytable.c.col6, mytable.c.col7, unique=False)

# which can then be created separately (will also get created with table creates)
i.create()
A Table object created against a specific MetaData object can be re-created against a new MetaData using the tometadata method:
# create two metadata
meta1 = BoundMetaData('sqlite:///querytest.db')
meta2 = MetaData()

# load 'users' from the sqlite engine
users_table = Table('users', meta1, autoload=True)

# create the same Table object for the plain metadata
users_table_2 = users_table.tometadata(meta2)
Note: This section describes how to use SQLAlchemy to construct SQL queries and receive result sets. It does not cover the object relational mapping capabilities of SQLAlchemy; that is covered later on in Data Mapping. However, both areas of functionality work similarly in how selection criteria are constructed, so if you are interested just in ORM, you should probably skim through basic WHERE Clause construction before moving on.
Once you have used the sqlalchemy.schema module to construct your tables and/or reflect them from the database, performing SQL queries using those table metadata objects is done via the sqlalchemy.sql package. This package defines a large set of classes, each of which represents a particular kind of lexical construct within a SQL query; all are descendants of the common base class sqlalchemy.sql.ClauseElement. A full query is represented via a structure of ClauseElements. A set of reasonably intuitive creation functions is provided by the sqlalchemy.sql package to create these structures; these functions are described in the rest of this section.
Executing a ClauseElement structure can be performed in two general ways. You can use an Engine or a Connection object's execute() method, to which you pass the query structure; this is known as explicit style. Or, if the ClauseElement structure is built upon Table metadata which is bound to an Engine directly, you can simply call execute() on the structure itself, known as implicit style. In both cases, the execution returns a cursor-like object (more on that later). The same clause structure can be executed repeatedly. The ClauseElement is compiled into a string representation by an underlying Compiler object, which is associated with the Engine via its Dialect.
The examples below all include a dump of the generated SQL corresponding to the query object, as well as a dump of the statement's bind parameters. In all cases, bind parameters are shown as named parameters using the colon format (i.e. ':name'). When the statement is compiled into a database-specific version, the named-parameter statement and its bind values are converted to the proper paramstyle for that database automatically.
For this section, we will mostly use the implicit style of execution, meaning the Table objects are associated with an instance of BoundMetaData, and constructed ClauseElement objects support self-execution. Assume the following configuration:
from sqlalchemy import *

metadata = BoundMetaData('sqlite:///mydb.db', strategy='threadlocal', echo=True)

# a table to store users
users = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses = Table('addresses', metadata,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key=True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords = Table('userkeywords', metadata,
    Column('user_id', INT, ForeignKey("users.user_id")),
    Column('keyword_id', INT, ForeignKey("keywords.keyword_id"))
)
A select is done by constructing a Select object with the proper arguments, adding any extra arguments if desired, then calling its execute() method.
from sqlalchemy import *

# use the select() function defined in the sql package
s = select([users])

# or, call the select() method off of a Table object
s = users.select()

# then, call execute on the Select object:
result = s.execute()
# the SQL text of any clause object can also be viewed via the str() call:
>>> str(s)
SELECT users.user_id, users.user_name, users.password
FROM users
As mentioned above, ClauseElement structures can also be executed with a Connection object explicitly:
engine = create_engine('sqlite:///myfile.db')
conn = engine.connect()

s = users.select()
result = conn.execute(s)

conn.close()
For queries that don't contain any tables, ClauseElements that represent a fully executable statement support an engine keyword parameter which can bind the object to an Engine, thereby allowing implicit execution:
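A minimal sketch of implicit execution for a table-less query, assuming the db engine created at the start of this section (the literal type= argument is described further below):

# a select with no tables; the engine keyword allows implicit execution
s = select([literal('foo', type=String) + 'bar'], engine=db)
result = s.execute()
print result.fetchone()[0]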
The object returned by execute() is a sqlalchemy.engine.ResultProxy object, which acts much like a DBAPI cursor object in the context of a result set, except that the rows returned can address their columns by ordinal position, column name, or even column object:
# select rows, get resulting ResultProxy object
result = users.select().execute()

# get one row
row = result.fetchone()

# get the 'user_id' column via integer index:
user_id = row[0]

# or column name
user_name = row['user_name']

# or column object
password = row[users.c.password]

# or column accessor
password = row.password

# ResultProxy object also supports fetchall()
rows = result.fetchall()

# or get the underlying DBAPI cursor object
cursor = result.cursor

# close the result. If the statement was implicitly executed
# (i.e. without an explicit Connection), this will
# return the underlying connection resources back to
# the connection pool. de-referencing the result
# will also have the same effect. if an explicit Connection was
# used, then close() does nothing.
result.close()
A common need when writing statements that reference multiple tables is to create labels for columns, thereby separating columns from different tables with the same name. The Select construct supports automatic generation of column labels via the use_labels=True parameter:
c = select([users, addresses],
    users.c.user_id==addresses.c.address_id,
    use_labels=True).execute()
The table name part of the label is affected if you use a construct such as a table alias:
person = users.alias('person')
c = select([person, addresses],
    person.c.user_id==addresses.c.address_id,
    use_labels=True).execute()
Labels are also generated so that they never exceed 30 characters. Most databases impose a limit on the length of identifiers; Oracle in particular has a rather short limit of 30:
long_named_table = users.alias('this_is_the_person_table')
c = select([long_named_table], use_labels=True).execute()
You can also specify custom labels on a per-column basis using the label() function:
c = select([users.c.user_id.label('id'), users.c.user_name.label('name')]).execute()
Calling select off a table automatically generates a column clause which includes all the table's columns, in the order they are specified in the source Table object.
But in addition to selecting all the columns off a single table, any set of columns can be specified, as well as full tables, and any combination of the two:
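For instance, a brief sketch using the tables defined above:

# individual columns from two different tables
result = select([users.c.user_name, addresses.c.street]).execute()

# a full table combined with an individual column from another
result = select([users, addresses.c.zip]).execute()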
The WHERE condition is the named keyword argument whereclause, or the second positional argument to the select() constructor and the first positional argument to the select() method of Table.
WHERE conditions are constructed using column objects, literal values, and functions defined in the sqlalchemy.sql module. Column objects override the standard Python operators to provide clause compositional objects, which compile down to SQL operations:
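A simple equality comparison illustrates this (the SQL shown is a sketch of the compiled output; the exact bind parameter name may differ by version):

c = users.select(users.c.user_id==7).execute()

SELECT users.user_id, users.user_name, users.password
FROM users
WHERE users.user_id = :users_user_id
{'users_user_id': 7}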
Notice that the literal value "7" was broken out of the query and placed into a bind parameter. Databases such as Oracle must parse incoming SQL and create a "plan" when new queries are received, which is an expensive process. By using bind parameters, the same query with various literal values can have its plan compiled only once, and used repeatedly with less overhead.
More where clauses:
# another comparison operator
c = select([users], users.c.user_id>7).execute()

# OR keyword
c = users.select(or_(users.c.user_name=='jack', users.c.user_name=='ed')).execute()

# AND keyword
c = users.select(and_(users.c.user_name=='jack', users.c.password=='dog')).execute()

# NOT keyword
c = users.select(not_(
    or_(users.c.user_name=='jack', users.c.password=='dog')
)).execute()

# IN clause
c = users.select(users.c.user_name.in_('jack', 'ed', 'fred')).execute()

# join users and addresses together
c = select([users, addresses], users.c.user_id==addresses.c.address_id).execute()

# join users and addresses together, but dont specify "addresses" in the
# selection criterion. The WHERE criterion adds it to the FROM list
# automatically.
c = select([users], and_(
    users.c.user_id==addresses.c.user_id,
    users.c.user_name=='fred'
)).execute()
Select statements can also generate a WHERE clause based on the parameters you give them. If a given parameter matches the name of a column or its "label" (the combined tablename + "_" + column name), and does not already correspond to a bind parameter in the select object, it will be added as an equality comparison against that column. This is a shortcut for creating a full WHERE clause:
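A sketch of this shortcut, based on the behavior described above; since 'user_name' matches a column name and no such bind parameter exists, an equality comparison is generated:

# generates "WHERE users.user_name = :user_name"
result = users.select().execute(user_name='jack')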
Supported column operators so far are all the numerical comparison operators, i.e. '==', '>', '>=', etc., as well as like(), startswith(), endswith(), between(), and in_(). Boolean operators include not_(), and_() and or_(), which can also be used inline via '~', '&', and '|'. Math operators are '+', '-', '*', '/'. Any custom operator can be specified via the op() function shown below.
# "like" operator users.select(users.c.user_name.like('%ter')) # equality operator users.select(users.c.user_name == 'jane') # in opertator users.select(users.c.user_id.in_(1,2,3)) # and_, endswith, equality operators users.select(and_(addresses.c.street.endswith('green street'), addresses.c.zip=='11234')) # & operator subsituting for 'and_' users.select(addresses.c.street.endswith('green street') & (addresses.c.zip=='11234')) # + concatenation operator select([users.c.user_name + '_name']) # NOT operator users.select(~(addresses.c.street == 'Green Street')) # any custom operator select([users.c.user_name.op('||')('_category')]) # "null" comparison via == (converts to IS) sqlusers.select(users.c.user_name==None).execute()
# or via explicit null() construct sqlusers.select(users.c.user_name==null()).execute()
Functions can be specified using the func keyword:
select([func.count(users.c.user_id)]).execute()

users.select(func.substr(users.c.user_name, 1) == 'J').execute()
Functions also are callable as standalone values:
# call the "now()" function time = func.now(engine=myengine).scalar() # call myfunc(1,2,3) myvalue = func.myfunc(1, 2, 3, engine=db).execute() # or call them off the engine db.func.now().scalar()
You can drop in a literal value anywhere there isn't a column to attach to, via the literal keyword:
select([literal('foo') + literal('bar'), users.c.user_name]).execute()

# literals have all the same comparison functions as columns
select([literal('foo') == literal('bar')], engine=myengine).scalar()
Literals also take an optional type parameter. This can sometimes be significant; for example, when using the "+" operator with SQLite, the String type is detected and the operator is converted to "||":
select([literal('foo', type=String) + 'bar'], engine=e).execute()
The ORDER BY clause of a select statement can be specified as individual columns to order by within a list specified via the order_by parameter, with optional usage of the asc() and desc() functions:
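For example, a brief sketch using the users table from above:

# order by user_name descending, then user_id ascending
c = select([users],
           order_by=[desc(users.c.user_name), asc(users.c.user_id)]).execute()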
DISTINCT, LIMIT, and OFFSET are specified as keyword arguments:
c = select([users.c.user_name], distinct=True).execute()

c = users.select(limit=10, offset=20).execute()
The Oracle driver does not support LIMIT and OFFSET directly, but instead wraps the generated query into a subquery and uses the "rownum" variable to control the rows selected (this is somewhat experimental).
As some of the examples indicated above, a regular inner join can be implicitly stated, just like in a SQL expression, by just specifying the tables to be joined as well as their join conditions:
addresses.select(addresses.c.user_id==users.c.user_id).execute()
There is also an explicit join constructor, which can be embedded into a select query via the from_obj parameter of the select statement:
addresses.select(from_obj=[
    addresses.join(users, addresses.c.user_id==users.c.user_id)
]).execute()
The join constructor can also be used by itself:
join(users, addresses, users.c.user_id==addresses.c.user_id).select().execute()
The join criterion in a join() call is optional. If not specified, the condition will be derived from the foreign key relationships of the two tables. If no criterion can be constructed, an exception will be raised.
join(users, addresses).select().execute()
Notice that this is the first example where the FROM criterion of the select statement is explicitly specified. In most cases, the FROM criterion is automatically determined from the columns requested as well as the WHERE clause. The from_obj keyword argument indicates a list of explicit FROM clauses to be used in the statement.
A join can be created on its own using the join or outerjoin functions, or can be created off of an existing Table or other selectable unit via the join or outerjoin methods:
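A short sketch of both spellings, using the tables defined earlier:

# the standalone outerjoin() function
j = outerjoin(users, addresses, users.c.user_id==addresses.c.user_id)

# or the equivalent method off of the Table itself
j = users.outerjoin(addresses, users.c.user_id==addresses.c.user_id)

# either form can then be selected from
result = j.select().execute()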
Aliases are used primarily when you want to use the same table more than once as a FROM expression in a statement:
address_b = addresses.alias('addressb')

# select users who have an address on Green street as well as Orange street
users.select(and_(
    users.c.user_id==addresses.c.user_id,
    addresses.c.street.like('%Green%'),
    users.c.user_id==address_b.c.user_id,
    address_b.c.street.like('%Orange%')
)).execute()
SQLAlchemy allows the creation of select statements from not just Table objects, but from a whole class of objects that implement the Selectable interface. This includes Tables, Aliases, Joins and Selects. Therefore, if you have a Select, you can select from the Select:
>>> s = users.select()
>>> str(s)
SELECT users.user_id, users.user_name, users.password FROM users

>>> s = s.select()
>>> str(s)
SELECT user_id, user_name, password FROM
(SELECT users.user_id, users.user_name, users.password FROM users)
Any Select, Join, or Alias object supports the same column accessors as a Table:
>>> s = users.select()
>>> [c.key for c in s.columns]
['user_id', 'user_name', 'password']
When you use use_labels=True in a Select object, the label version of the column names become the keys of the accessible columns. In effect you can create your own "view objects":
s = select([users, addresses], users.c.user_id==addresses.c.user_id, use_labels=True)

select([
    s.c.users_user_name, s.c.addresses_street, s.c.addresses_zip
], s.c.addresses_city=='San Francisco').execute()
To specify a SELECT statement as one of the selectable units in a FROM clause, it usually should be given an alias.
s = users.select().alias('u')
select([addresses, s]).execute()
Select objects can be used in a WHERE condition, in operators such as IN:
# select user ids for all users whose name starts with a "p"
s = select([users.c.user_id], users.c.user_name.like('p%'))

# now select all addresses for those users
addresses.select(addresses.c.user_id.in_(s)).execute()
The sql package supports embedding select statements into other select statements as the criterion in a WHERE condition, or as one of the "selectable" objects in the FROM list of the query. It does not at the moment directly support embedding a SELECT statement as one of the column criterion for a statement, although this can be achieved via direct text insertion, described later.
Subqueries can be used in the column clause of a select statement by specifying the scalar=True flag:
select([table2.c.col1, table2.c.col2,
        select([table1.c.col1], table1.c.col2==7, scalar=True)])
When a select object is embedded inside of another select object, and both objects reference the same table, SQLAlchemy makes the assumption that the table should be correlated from the child query to the parent query. To disable this behavior, specify the flag correlate=False to the Select statement.
# make an alias of a regular select.
s = select([addresses.c.street], addresses.c.user_id==users.c.user_id).alias('s')
>>> str(s)
SELECT addresses.street FROM addresses, users
WHERE addresses.user_id = users.user_id

# now embed that select into another one. the "users" table is removed from
# the embedded query's FROM list and is instead correlated to the parent query
s2 = select([users, s.c.street])
>>> str(s2)
SELECT users.user_id, users.user_name, users.password, s.street
FROM users, (SELECT addresses.street FROM addresses
WHERE addresses.user_id = users.user_id) s
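To sketch the correlate=False case: the embedded select below keeps "users" in its own FROM list rather than correlating it to the enclosing query:

# correlate=False: the inner select renders its own FROM clause for "users"
s = select([addresses.c.street],
           addresses.c.user_id==users.c.user_id, correlate=False).alias('s')
s2 = select([users, s.c.street])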
An EXISTS clause can function as a higher-scaling version of an IN clause, and is usually used in a correlated fashion:
# find all users who have an address on Green street:
users.select(
    exists(
        [addresses.c.address_id],
        and_(
            addresses.c.user_id==users.c.user_id,
            addresses.c.street.like('%Green%')
        )
    )
)
Unions come in two flavors, UNION and UNION ALL, which are available via module level functions or methods off a Selectable:
union(
    addresses.select(addresses.c.street=='123 Green Street'),
    addresses.select(addresses.c.street=='44 Park Ave.'),
    addresses.select(addresses.c.street=='3 Mill Road'),
    order_by=[addresses.c.street]
).execute()

users.select(
    users.c.user_id==7
).union_all(
    users.select(
        users.c.user_id==9
    ),
    order_by=[users.c.user_id]   # order_by is an argument to union_all()
).execute()
Throughout all these examples, SQLAlchemy is busy creating bind parameters wherever literal expressions occur. You can also specify your own bind parameters with your own names, and use the same statement repeatedly. The bind parameters, shown here in the "named" format, will be converted to the appropriate named or positional style according to the database implementation being used.
s = users.select(users.c.user_name==bindparam('username'))

# execute implicitly
s.execute(username='fred')

# execute explicitly
conn = engine.connect()
conn.execute(s, username='fred')
executemany() is also available by supplying multiple dictionary arguments instead of keyword arguments to the execute() method of ClauseElement or Connection. Examples can be found later in the sections on INSERT/UPDATE/DELETE.
By throwing the compile() method onto the end of any query object, the query can be "compiled" by the SQLEngine into a sqlalchemy.sql.Compiled object just once, and the resulting compiled object reused, which eliminates repeated internal compilation of the SQL string:
s = users.select(users.c.user_name==bindparam('username')).compile()
s.execute(username='fred')
s.execute(username='jane')
s.execute(username='mary')
The sql package tries to allow free textual placement in as many ways as possible. In the examples below, note that the from_obj parameter is used only when no other information exists within the select object with which to determine table metadata. Also note that in a query where there isn't even table metadata used, the SQLEngine to be used for the query has to be explicitly specified:
# strings as column clauses
select(["user_id", "user_name"], from_obj=[users]).execute()

# strings for full column lists
select(
    ["user_id, user_name, password, addresses.*"],
    from_obj=[users.alias('u'), addresses]).execute()

# functions, etc.
select([users.c.user_id, "process_string(user_name)"]).execute()

# where clauses
users.select(and_(users.c.user_id==7, "process_string(user_name)=27")).execute()

# subqueries
users.select(
    "exists (select 1 from addresses where addresses.user_id=users.user_id)").execute()

# custom FROM objects
select(
    ["*"],
    from_obj=["(select user_id, user_name from users)"],
    engine=db).execute()

# a full query
text("select user_name from users", engine=db).execute()
Use the format ':paramname' to define bind parameters inside of a text block. They will be converted to the appropriate format upon compilation:
t = engine.text("select foo from mytable where lala=:hoho")
r = t.execute(hoho=7)
Bind parameters can also be explicit, which allows typing information to be added. Just specify them as a list with keys that match those inside the textual statement:
t = engine.text("select foo from mytable where lala=:hoho",
                bindparams=[bindparam('hoho', type=types.String)])
r = t.execute(hoho="im hoho")
Result-row type processing can be added via the typemap argument, which is a dictionary of return columns mapped to types:
# specify DateTime type for the 'foo' column in the result set
# sqlite, for example, uses result-row post-processing to construct dates
t = engine.text("select foo from mytable where lala=:hoho",
                bindparams=[bindparam('hoho', type=types.String)],
                typemap={'foo':types.DateTime}
)
r = t.execute(hoho="im hoho")

# 'foo' is a datetime
year = r.fetchone()['foo'].year
One of the primary motivations for a programmatic SQL library is to allow the piecemeal construction of a SQL statement based on program variables. All the above examples typically show Select objects being created all at once. The Select object also includes "builder" methods to allow building up an object. The below example is a "user search" function, where users can be selected based on primary key, user name, street address, keywords, or any combination:
# note: the keyword list parameter is named keyword_names
# to avoid shadowing the keywords table defined above
def find_users(id=None, name=None, street=None, keyword_names=None):
    statement = users.select()
    if id is not None:
        statement.append_whereclause(users.c.user_id==id)
    if name is not None:
        statement.append_whereclause(users.c.user_name==name)
    if street is not None:
        # append_whereclause joins "WHERE" conditions together with AND
        statement.append_whereclause(users.c.user_id==addresses.c.user_id)
        statement.append_whereclause(addresses.c.street==street)
    if keyword_names is not None:
        statement.append_from(
            users.join(userkeywords, users.c.user_id==userkeywords.c.user_id).join(
                keywords, userkeywords.c.keyword_id==keywords.c.keyword_id))
        statement.append_whereclause(keywords.c.name.in_(*keyword_names))
        # to avoid multiple repeats, set query to be DISTINCT:
        statement.distinct = True
    return statement.execute()

find_users(id=7)

find_users(street='123 Green Street')

find_users(name='Jack', keyword_names=['jack', 'foo'])
An INSERT involves just one table. The Insert object is used via the insert() function, and the specified columns determine what columns show up in the generated SQL. If primary key columns are left out of the criterion, the SQL generator will try to populate them as specified by the particular database engine and sequences, i.e. relying upon an auto-incremented column or explicitly calling a sequence beforehand. Insert statements, as well as updates and deletes, can also execute multiple parameters in one pass via specifying an array of dictionaries as parameters.
The values to be populated for an INSERT or an UPDATE can be specified to the insert()/update() functions as the values named argument, or the query will be compiled based on the values of the parameters sent to the execute() method.
# basic insert
users.insert().execute(user_id=1, user_name='jack', password='asdfdaf')

# insert just user_name, NULL for others
# will auto-populate primary key columns if they are configured
# to do so
users.insert().execute(user_name='ed')

# INSERT with a list:
users.insert(values=(3, 'jane', 'sdfadfas')).execute()

# INSERT with user-defined bind parameters
i = users.insert(
    values={'user_name':bindparam('name'), 'password':bindparam('pw')}
)
i.execute(name='mary', pw='adas5fs')

# INSERT many - if no explicit 'values' parameter is sent,
# the first parameter list in the list determines
# the generated SQL of the insert (i.e. what columns are present)
# executemany() is used at the DBAPI level
users.insert().execute(
    {'user_id':7, 'user_name':'jack', 'password':'asdfasdf'},
    {'user_id':8, 'user_name':'ed', 'password':'asdffcadf'},
    {'user_id':9, 'user_name':'fred', 'password':'asttf'},
)
Updates work a lot like INSERTs, except there is an additional WHERE clause that can be specified.
# change 'jack' to 'ed'
users.update(users.c.user_name=='jack').execute(user_name='ed')

# use bind parameters
u = users.update(users.c.user_name==bindparam('name'),
                 values={'user_name':bindparam('newname')})
u.execute(name='jack', newname='ed')

# update a column to another column
users.update(values={users.c.password:users.c.user_name}).execute()

# multi-update
users.update(users.c.user_id==bindparam('id')).execute(
    {'id':7, 'user_name':'jack', 'password':'fh5jks'},
    {'id':8, 'user_name':'ed', 'password':'fsr234ks'},
    {'id':9, 'user_name':'mary', 'password':'7h5jse'},
)
A correlated update lets you update a table using selection from another table, or the same table:
s = select([addresses.c.city], addresses.c.user_id==users.c.user_id)
users.update(
    and_(users.c.user_id>10, users.c.user_id<20),
    values={users.c.user_name:s}
).execute()
A delete is formulated like an update, except there's no values clause:
users.delete(users.c.user_id==7).execute()

users.delete(users.c.user_name.like(bindparam('name'))).execute(
    {'name':'%Jack%'},
    {'name':'%Ed%'},
    {'name':'%Jane%'},
)

users.delete(exists())
Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes.
The Mapper's role is to perform SQL operations upon the database, associating individual table rows with instances of those classes, and individual database columns with properties upon those instances, to transparently associate in-memory objects with a persistent database representation.
When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors keep track of changes to object attributes; these changes will be stored to the database when the application "flushes" the current state of objects (known as a Unit of Work).
Two objects provide the primary interface for interacting with Mappers and the "unit of work" in SA 0.2: the Query object and the Session object. Query deals with selecting objects from the database, whereas Session provides a context for loaded objects and the ability to communicate changes on those objects back to the database.
The primary method on Query for loading objects is its select() method, which has similar arguments to a sqlalchemy.sql.Select object. But this select method executes automatically and returns results, instead of awaiting an execute() call; and instead of returning a cursor-like object, it returns a list of objects.
The three configurational elements to be defined, i.e. the Table metadata, the user-defined class, and the Mapper, are typically defined as module-level variables, and may be defined in any fashion suitable to the application, with the only requirement being that the class and table metadata are described before the mapper. For the sake of example, we will be defining these elements close together, but this should not be construed as a requirement; since SQLAlchemy is not a framework, those decisions are left to the developer or an external framework.
Also, keep in mind that the examples in this section deal with explicit Session objects mapped directly to Engine objects, which represents the most explicit style of using the ORM. Options exist for how this is configured, including binding Table objects directly to Engines (described in Binding MetaData to an Engine), as well as using the "Threadlocal" plugin which provides various code shortcuts by using an implicit Session associated with the current thread (described in threadlocal).
First, the metadata/mapper configuration code:
from sqlalchemy import *

# metadata
meta = MetaData()

# table object
users_table = Table('users', meta,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# class definition
class User(object):
    pass

# create a mapper and associate it with the User class.
# technically we don't really need the 'usermapper' variable.
usermapper = mapper(User, users_table)
Note that no database definitions are required. Next we will define an Engine and connect a Session to it, and perform a simple select:
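Here is a minimal sketch of that step; the filename 'mydata.db' is arbitrary, and create_all() issues the CREATE TABLE statement for us:

# create an engine and issue CREATE TABLE statements
engine = create_engine('sqlite:///mydata.db')
meta.create_all(engine)

# create a Session bound to the engine and select some User objects
session = create_session(bind_to=engine)
userlist = session.query(User).select(users_table.c.user_id <= 10)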
The method session.query(class_or_mapper) returns a Query object. Below is a synopsis of things you can do with Query:
# get a query from a Session based on class:
query = session.query(User)

# get a query from a Session given a Mapper:
query = session.query(usermapper)

# select_by, which takes keyword arguments. the
# keyword arguments represent property names and the values
# represent values which will be compared via the = operator.
# the comparisons are joined together via "AND".
result = query.select_by(name='john', street='123 green street')

# select_by can also combine ClauseElements with key/value properties.
# all ClauseElements and keyword-based criterion are combined together
# via "AND".
result = query.select_by(users_table.c.user_name=='john',
                         addresses_table.c.zip_code=='12345',
                         street='123 green street')

# get_by, which takes the same arguments as select_by
# returns a single scalar result or None if no results
user = query.get_by(id=12)

# "dynamic" versions of select_by and get_by - everything past the
# "select_by_" or "get_by_" is used as the key, and the function argument
# as the value
result = query.select_by_name('fred')
u = query.get_by_name('fred')

# get an object directly from its primary key. this will bypass the SQL
# call if the object has already been loaded
u = query.get(15)

# get an object that has a composite primary key of three columns.
# the order of the arguments matches that of the table meta data.
myobj = query.get((27, 3, 'receipts'))

# using a WHERE criterion
result = query.select(or_(users_table.c.user_name == 'john',
                          users_table.c.user_name=='fred'))

# using a WHERE criterion to get a scalar
u = query.selectfirst(users_table.c.user_name=='john')

# selectone() is a stricter version of selectfirst() which
# will raise an exception if there is not exactly one row
u = query.selectone(users_table.c.user_name=='john')

# using a full select object
result = query.select(users_table.select(users_table.c.user_name=='john'))
Some of the above examples illustrate the usage of the mapper's Table object to provide the columns for a WHERE clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c to the class itself, which can be used just like the table metadata to access the columns of the table:
userlist = session.query(User).select(User.c.user_id==12)
When objects corresponding to mapped classes are created or manipulated, all changes are logged by the Session object. The changes are then written to the database when an application calls flush(). This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The flush() operation batches its SQL statements into a transaction, and can also perform optimistic concurrency checks (using a version id column) to ensure the proper number of rows were in fact affected (not supported with the current MySQL drivers).
The Unit of Work is a powerful tool, and has some important concepts that should be understood in order to use it effectively. See the Session / Unit of Work section for a full description on all its operations.
When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes. New objects by default must be explicitly added to the Session; however, this can be made automatic by using threadlocal or SessionContext.
mapper(User, users_table)

# create a new User
myuser = User()
myuser.user_name = 'jane'
myuser.password = 'hello123'

# create another new User
myuser2 = User()
myuser2.user_name = 'ed'
myuser2.password = 'lalalala'

# create a Session and save them
sess = create_session()
sess.save(myuser)
sess.save(myuser2)

# load a third User from the database
myuser3 = sess.query(User).select(User.c.user_name=='fred')[0]
myuser3.user_name = 'fredjones'

# save all changes
sess.flush()
The mapped class can also specify whatever methods and/or constructor it wants:
class User(object):
    def __init__(self, user_name, password):
        self.user_id = None
        self.user_name = user_name
        self.password = password
    def get_name(self):
        return self.user_name
    def __repr__(self):
        return "User id %s name %s password %s" % (
            repr(self.user_id), repr(self.user_name), repr(self.password))

mapper(User, users_table)

sess = create_session()
u = User('john', 'foo')
sess.save(u)
sess.flush()
>>> u
User id 1 name 'john' password 'foo'
SQLAlchemy will only include modified object attributes in the UPDATE statements generated upon flush. This conserves database traffic and also interacts correctly with "deferred" attributes, which are mapped object attributes against the mapper's primary table that aren't loaded until referenced by the application.
So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation function provided by the orm module.
So with our User class, let's also define the User as having one or more mailing addresses. First, the table metadata:
from sqlalchemy import *

metadata = MetaData()

# define user table
users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# define user address table
addresses_table = Table('addresses', metadata,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)
Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the user_id column into a parent-child relationship. When a Mapper wants to indicate a relation of one object to another, the ForeignKey relationships are the default method by which the relationship is determined (options also exist to describe the relationships explicitly).
So then let's define two classes: the familiar User class, as well as an Address class:
class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class Address(object):
    def __init__(self, street, city, state, zip):
        self.street = street
        self.city = city
        self.state = state
        self.zip = zip
And then a Mapper that will define a relationship of the User and the Address classes to each other as well as their table metadata. We will add an additional mapper keyword argument properties, which is a dictionary relating the names of class attributes to database relationships, in this case a relation object against a newly defined mapper for the Address class:
mapper(Address, addresses_table)
mapper(User, users_table, properties = {
    'addresses' : relation(Address)
})
Let's do some operations with these classes and see what happens:
engine = create_engine('sqlite:///mydb.db')
metadata.create_all(engine)

session = create_session(bind_to=engine)

u = User('jane', 'hihilala')
u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

session.save(u)
session.flush()
A lot just happened there! The Mapper figured out how to relate rows in the addresses table to the users table, and also upon flush had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have their new primary and foreign key attributes populated.
Also notice that when we created a Mapper on the User class which defined an addresses relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a property function which returns an instance of sqlalchemy.util.HistoryArraySet. This object fulfills the full set of Python list accessors, but maintains a unique set of objects (based on their in-memory identity), and also tracks additions and deletions to the list:
del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

session.flush()
Note that when creating a relation with the relation() function, the target can either be a class, in which case the primary mapper for that class is used as the target, or a Mapper instance itself, as returned by the mapper() function.
In the previous example, a single address was removed from the addresses attribute of a User object, resulting in the corresponding database row being updated to have a user_id of None. But now there's a mailing address with no user_id floating around in the database, of no use to anyone. How can we avoid this? This is achieved by using the cascade parameter of relation:
clear_mappers()  # clear mappers from the previous example

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
    'addresses' : relation(Address, cascade="all, delete-orphan")
})

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

session.flush()
In this case, with the delete-orphan cascade rule set, the element that was removed from the addresses list was also removed from the database. Specifying cascade="all, delete-orphan" means that every persistence operation performed on the parent object will be cascaded to the child object or objects handled by the relation, and additionally that each child object cannot exist without being attached to a parent. Such a relationship indicates that the lifecycle of the Address objects is bounded by that of their parent User object.
Cascading is described fully in Cascade rules.
By creating relations with the backref keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, independently of database operations. Below, the User mapper is created with an addresses property, and the corresponding Address mapper receives a "backreference" to the User object via the property name user:
mapper(Address, addresses_table)
mapper(User, users_table, properties = {
    'addresses' : relation(Address, backref='user')
})

u = User('fred', 'hi')
a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

# append a1 to u
u.addresses.append(a1)

# attach u to a2
a2.user = u

# the bi-directional relation is maintained
>>> u.addresses == [a1, a2]
True
>>> a1.user is u and a2.user is u
True
The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property (i.e. a second relation()) is placed on the child mapper. The default arguments to this property can be overridden using the backref() function:
mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user' : relation(User, backref=backref('addresses', cascade="all, delete-orphan"))
})
The backref() function is often used to set up a bi-directional one-to-one relationship. This is because the relation() function by default creates a "one-to-many" relationship when presented with a primary key/foreign key relationship, but the backref() function can redefine the uselist property to make it a scalar:
mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user' : relation(User, backref=backref('address', uselist=False))
})
We've seen how the relation specifier affects the saving of an object and its child items; how does it affect selecting them? By default, the relation keyword indicates that the related property should be attached to a lazy loader when instances of the parent object are loaded from the database; this is just a callable function that, when accessed, will invoke a second SQL query to load the child objects of the parent.
# define a mapper
mapper(User, users_table, properties = {
    'addresses' : relation(mapper(Address, addresses_table))
})

# select users where username is 'jane', get the first element of the list
# this will incur a load operation for the parent table
user = session.query(User).select(User.c.user_name=='jane')[0]

# iterate through the User object's addresses. this will incur an
# immediate load of those child items
for a in user.addresses:
    print repr(a)
For mappers that have relationships, the select_by method of the Query object can create queries that include automatically created joins. Just specify a key in the argument list which is not present in the primary mapper's list of properties or columns, but is present in the property list of one of its relationships:
l = session.query(User).select_by(street='123 Green Street')
The above example is shorthand for:
l = session.query(User).select(and_(
    Address.c.user_id==User.c.user_id,
    Address.c.street=='123 Green Street')
)
All keyword arguments sent to select_by are used to create query criterion. This means that familiar select keyword options like order_by and limit are not directly available. To enable these options with select_by, you can try the SelectResults extension, which offers methods off the result of a select or select_by, such as order_by() and array slicing functions that generate new queries.
Also, select_by will not create joins derived from Column-based expressions (i.e. ClauseElement objects); the reason is that a Column-based expression may include many columns, and select_by has no way to know which columns in the expression correspond to properties and which don't (it also prefers not to dig into column expressions, which may be very complex). The next section describes some ways to combine Column expressions with select_by's auto-joining capabilities.
Feature Status: Alpha API
The join_to method of Query is a component of the select_by operation, and is given a key name in order to return a "join path" from the Query's mapper to the mapper which is referenced by a relation() of the given name:
>>> q = session.query(User)
>>> j = q.join_to('addresses')
>>> print j
users.user_id=addresses.user_id
join_to can also be given the name of a column-based property, in which case it will locate a path to the nearest mapper which has that property as a column:
>>> q = session.query(User)
>>> j = q.join_to('street')
>>> print j
users.user_id=addresses.user_id
Also available is the join_via function, which is similar to join_to, except instead of traversing through all properties to find a path to the given key, it is given an explicit path to the target property:
>>> q = session.query(User)
>>> j = q.join_via(['orders', 'items'])
>>> print j
users.c.user_id==orders.c.user_id AND orders.c.item_id==items.c.item_id
Expressions produced by join_to and join_via can be used with select to create more complicated query criterion across multiple relations:
>>> l = q.select(
...     (addresses_table.c.street=='some address') &
...     (items_table.c.item_name=='item #4') &
...     q.join_to('addresses') &
...     q.join_via(['orders', 'items'])
... )
With just a single parameter, lazy=False, specified to the relation object, the parent and child SQL queries can be joined together.
mapper(Address, addresses_table)
mapper(User, users_table, properties = {
    'addresses' : relation(Address, lazy=False)
})

users = session.query(User).select(User.c.user_name=='Jane')

for u in users:
    print repr(u)
    for a in u.addresses:
        print repr(a)
Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.
The generation of this query is also immune to the effects of additional joins being specified in the original query. To use our select_by example above, joining against the "addresses" table to locate users with a certain street results in this behavior:
users = session.query(User).select_by(street='123 Green Street')
The join implied by passing the "street" parameter is stated as an additional join between the addresses and users tables. Also, since the eager join is "aliasized", no name conflict occurs.
The options method on the Query object provides an easy way to get alternate forms of a mapper query from an original one. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload(), lazyload() and noload():
# user mapper with lazy addresses
mapper(User, users_table, properties = {
    'addresses' : relation(mapper(Address, addresses_table))
})

# query object
query = session.query(User)

# make an eager loading query
eagerquery = query.options(eagerload('addresses'))
u = eagerquery.select()

# make another query that won't load the addresses at all
plainquery = query.options(noload('addresses'))

# multiple options can be specified
myquery = oldquery.options(lazyload('tracker'), noload('streets'), eagerload('members'))

# to specify a relation on a relation, separate the property names by a "."
myquery = oldquery.options(eagerload('orders.items'))
The above examples focused on the "one-to-many" relationship. To do other forms of relationship is easy, as the relation function can usually figure out what you want:
metadata = MetaData()

# a table to store a user's preferences for a site
prefs_table = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key=True),
    Column('stylename', String(20)),
    Column('save_password', Boolean, nullable=False),
    Column('timezone', CHAR(3), nullable=False)
)

# user table with a 'preference_id' column
users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16), nullable=False),
    Column('password', String(20), nullable=False),
    Column('preference_id', Integer, ForeignKey("user_prefs.pref_id"))
)

# engine and some test data
engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)
engine.execute(prefs_table.insert(),
    dict(pref_id=1, stylename='green', save_password=1, timezone='EST'))
engine.execute(users_table.insert(),
    dict(user_name='fred', password='45nfss', preference_id=1))

# classes
class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class UserPrefs(object):
    pass

mapper(UserPrefs, prefs_table)

mapper(User, users_table, properties = dict(
    preferences = relation(UserPrefs, lazy=False, cascade="all, delete-orphan"),
))

# select
session = create_session(bind_to=engine)
user = session.query(User).get_by(user_name='fred')
save_password = user.preferences.save_password

# modify
user.preferences.stylename = 'bluesteel'

# flush
session.flush()
The relation function handles a basic many-to-many relationship when you specify the association table:
metadata = MetaData()

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key=True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key=True),
    Column('keyword_name', String(50))
)

itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
)

engine = create_engine('sqlite:///')
metadata.create_all(engine)

# class definitions
class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    pass

mapper(Keyword, keywords_table)

# define a mapper that does many-to-many on the 'itemkeywords' association
# table
mapper(Article, articles_table, properties = dict(
    keywords = relation(Keyword, secondary=itemkeywords_table, lazy=False)
))

session = create_session(bind_to=engine)

article = Article()
article.headline = 'a headline'
article.body = 'this is the body'
article.keywords.append(Keyword('politics'))
article.keywords.append(Keyword('entertainment'))
session.save(article)

session.flush()

# select articles based on a keyword. select_by will handle the extra joins.
articles = session.query(Article).select_by(keyword_name='politics')

a = articles[0]

# clear out keywords with a new list
a.keywords = []
a.keywords.append(Keyword('topstories'))
a.keywords.append(Keyword('government'))

# flush
session.flush()
Many-to-many can also be done with an association object, which adds additional information about how two items are related. This association object is set up in basically the same way as any other mapped object. However, since an association table typically has no primary key columns, you have to tell the mapper what columns will compose its "primary key", which are the two (or more) columns involved in the association. Also, the relation function needs an additional hint as to the fact that this mapped object is an association object, via the "association" argument, which points to the class or mapper representing the other side of the association.
from sqlalchemy import *

metadata = MetaData()

users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16), nullable=False),
)

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key=True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key=True),
    Column('keyword_name', String(50))
)

# add "attached_by" column which will reference the user who attached this keyword
itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")),
    Column('attached_by', Integer, ForeignKey("users.user_id"))
)

engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)

# class definitions
class User(object):
    pass

class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    pass

class KeywordAssociation(object):
    pass

mapper(User, users_table)
mapper(Keyword, keywords_table)

# mapper for KeywordAssociation
# specify "primary key" columns manually
mapper(KeywordAssociation, itemkeywords_table,
    primary_key = [itemkeywords_table.c.article_id, itemkeywords_table.c.keyword_id],
    properties = {
        'keyword' : relation(Keyword, lazy=False),
        'user' : relation(User, lazy=False)
    }
)

# Article mapper, relates to Keyword via KeywordAssociation
mapper(Article, articles_table, properties = {
    'keywords' : relation(KeywordAssociation, lazy=False, association=Keyword)
})

session = create_session(bind_to=engine)

# select by keyword
alist = session.query(Article).select_by(keyword_name='jacks_stories')

# the user who attached each keyword is available
for a in alist:
    for k in a.keywords:
        if k.keyword.keyword_name == 'jacks_stories':
            print k.user.user_name
Keep in mind that the association object works a little differently from a plain many-to-many relationship. Members have to be added to the list via instances of the association object, which in turn point to the associated object:
user = User()
user.user_name = 'some user'

article = Article()

assoc = KeywordAssociation()
assoc.keyword = Keyword('blue')
assoc.user = user

assoc2 = KeywordAssociation()
assoc2.keyword = Keyword('green')
assoc2.user = user

article.keywords.append(assoc)
article.keywords.append(assoc2)

session.flush()
The concept behind the Unit of Work is to track modifications to a set of objects, and then be able to flush those changes to the database in a single operation. There are a lot of advantages to this, including that your application doesn't need to worry about individual save operations on objects, nor about the required order for those operations, nor about excessive repeated calls to save operations that would be more efficiently aggregated into one step. It also simplifies database transactions, providing a neat package with which to insert into the traditional database begin/commit phase.
SQLAlchemy's unit of work functionality is centralized in the Session object. Transactional capability, which rides on top of the transactions provided by Engine objects, is provided by the SessionTransaction object.
When dealing with mapped instances with regards to Sessions, an instance may be attached or unattached to a particular Session. An instance also may or may not correspond to an actual row in the database. The product of these two binary conditions yields us four general states a particular instance can have within the perspective of the Session:
Transient - a transient instance exists within memory only and is not associated with any Session. It also has no database identity and does not have a corresponding record in the database. When a new instance of a class is constructed, and no default session context exists with which to automatically attach the new instance, it is a transient instance. The instance can then be saved to a particular session in which case it becomes a pending instance. If a default session context exists, new instances are added to that Session by default and therefore become pending instances immediately.
Pending - a pending instance is a Session-attached object that has not yet been assigned a database identity. When the Session is flushed (i.e. changes are persisted to the database), a pending instance becomes persistent.
Persistent - a persistent instance has a database identity and a corresponding record in the database, and is also associated with a particular Session. By "database identity" we mean the object is associated with a table or relational concept in the database combined with a particular primary key in that table. Objects that are loaded by SQLAlchemy in the context of a particular session are automatically considered persistent, as are formerly pending instances which have been subject to a session flush().
Detached - a detached instance is an instance which has a database identity and a corresponding row in the database, but is not attached to any Session. This occurs when an instance has been removed from a Session, either because the session itself was cleared or closed, or because the instance was explicitly removed from it. The object can be re-attached to a Session, in which case it becomes persistent again. Detached instances are useful when an application needs to represent a long-running operation across multiple Sessions, needs to store an object in a serialized state and then restore it later (such as within an HTTP "session" object), or in some cases where code needs to load instances locally which will later be associated with some other Session.
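As a rough sketch of these four states and the transitions between them (assuming a mapped User class as in earlier examples; the methods used here are all described below):

session = create_session()

user = User()            # transient: no Session, no database identity
session.save(user)       # pending: attached to the Session, no identity yet
session.flush()          # persistent: row is INSERTed, identity assigned

session.expunge(user)    # detached: has an identity, but no Session
session.update(user)     # re-attached: the instance is persistent again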
A new Session object is constructed via the create_session() function:
session = create_session()
A common option used with create_session() is to specify a specific Engine or Connection to be used for all operations performed by this Session:
# create an engine
e = create_engine('postgres://some/url')

# create a Session that will use this engine for all operations.
# it will open and close Connections as needed.
session = create_session(bind_to=e)

# open a Connection
conn = e.connect()

# create a Session that will use this specific Connection for all operations
session = create_session(bind_to=conn)
The session to which an object is attached can be acquired via the object_session() function, which returns the appropriate Session if the object is pending or persistent, or None if the object is transient or detached:
session = object_session(obj)
It is possible to install a default "threadlocal" session context by importing a mod called sqlalchemy.mods.threadlocal. This mod creates a familiar SA 0.1 keyword objectstore in the sqlalchemy namespace. The objectstore may be used directly like a session; all session actions performed on sqlalchemy.objectstore will be proxied to the thread-local Session:
# install 'threadlocal' mod (only need to call this once per application)
import sqlalchemy.mods.threadlocal

# then 'objectstore' is available within the 'sqlalchemy' namespace
from sqlalchemy import objectstore

# flush the current thread-local session using the objectstore directly
objectstore.flush()

# which is the same as this (assuming we are still on the same thread):
session = objectstore.get_session()
session.flush()
We will now cover some of the key concepts used by the Session and its underlying Unit of Work.
A primary concept of the Session's underlying Unit of Work is that it keeps track of all persistent instances; recall that a persistent instance has a database identity and is attached to a Session. In particular, the Unit of Work must ensure that only one copy of a particular persistent instance exists within the Session at any given time. The UOW accomplishes this task using a dictionary known as an Identity Map. When a Query is used to issue select or get requests to the database, it will in nearly all cases result in an actual SQL execution and a corresponding traversal of the rows received from that execution. However, when the underlying mapper instantiates objects corresponding to the result set rows, it checks the session's identity map first before instantiating a new object, and returns the instance already present in the identity map if one exists, essentially ignoring the object state represented by that row. There are several ways to override this behavior and truly refresh an already-loaded instance, described later, but the main idea is that once your instance is loaded into a particular Session, it will never change its state without your explicit approval, regardless of what the database says about it.
For example: below, two separate calls to load an instance with database identity "15" are issued, and the results assigned to two separate variables. However, since the same Session was used, the two variables reference the same instance:
mymapper = mapper(MyClass, mytable)

session = create_session()
obj1 = session.query(MyClass).selectfirst(mytable.c.id==15)
obj2 = session.query(MyClass).selectfirst(mytable.c.id==15)

>>> obj1 is obj2
True
The Identity Map is an instance of weakref.WeakValueDictionary, so that when an in-memory object falls out of scope, it will be removed automatically. However, this may not be instantaneous if there are circular references upon the object. To guarantee that an instance is removed from the identity map before removing references to it, use the expunge() method, described later.
The Session supports an iterator interface in order to see all objects in the identity map:
for obj in session:
    print obj
As well as __contains__():
if obj in session:
    print "Object is present"
The identity map itself is accessible via the identity_map accessor:
>>> session.identity_map.values()
[<__main__.User object at 0x712630>, <__main__.Address object at 0x712a70>]
The identity of each object instance is available via the _instance_key property attached to each object instance, and is a tuple consisting of the object's class and an additional tuple of primary key values, in the order that they appear within the table definition:
>>> obj._instance_key
(<class 'test.tables.User'>, (7,))
At the moment that an object is assigned this key within a flush() operation, it is also added to the session's identity map.
The get() method on Query, which retrieves an object based on primary key identity, also checks the Session's identity map first to save a database round-trip if possible. In the case of an object lazy-loading a single child object, the get() method is used as well, so scalar-based lazy loads may in some cases not query the database at all; this is particularly important for backreference relationships, as it can save a lot of queries.
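For instance, a sketch of the round-trip savings (assuming a mapped User class whose identity "5" has already been loaded in this session):

# the first get() may issue a SELECT; the second finds identity "5"
# already in the identity map and issues no SQL at all
u1 = session.query(User).get(5)
u2 = session.query(User).get(5)
assert u1 is u2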
The next concept is that in addition to the Session storing a record of all objects loaded or saved, it also stores lists of all newly created (i.e. pending) objects, lists of all persistent objects whose attributes have been modified, and lists of all persistent objects that have been marked as deleted. These lists are used when a flush() call is issued to save all changes. After the flush occurs, these lists are all cleared out.
These records are all tracked by a collection of Set objects (which are a SQLAlchemy-specific instance called a HashSet) that are also viewable off the Session:
# pending objects recently added to the Session
session.new

# persistent objects with modifications
session.dirty

# persistent objects that have been marked as deleted via session.delete(obj)
session.deleted
Unlike the identity map, the new, dirty, and deleted lists are not weak-referencing. This means that if you abandon all references to new or modified objects within a session, they are still present and will be saved on the next flush operation, unless they are removed from the Session explicitly (more on that later). The new list may change in a future release to be weak-referencing; however, for the deleted list, one can see that it's quite natural for an object marked as deleted to have no references in the application, yet a DELETE operation is still required.
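A brief illustration of this non-weak-referencing behavior (a hypothetical mapped User class):

user = User()
session.save(user)
del user            # abandon the application's only reference

# the pending instance is still tracked by session.new, and the
# next flush() will still INSERT it
session.flush()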
The query() function takes a class or Mapper as an argument, along with an optional entity_name parameter, and returns a new Query object which will issue mapper queries within the context of this Session. If a Mapper is passed, the Query uses that mapper. Otherwise, if a class is passed, the Query locates the primary mapper for that class and uses it to construct the Query.
# query from a class
session.query(User).select_by(name='ed')

# query from a mapper
query = session.query(usermapper)
x = query.get(1)

# query from a class mapped with entity name 'alt_users'
q = session.query(User, entity_name='alt_users')
y = q.options(eagerload('orders')).select()
entity_name is an optional keyword argument sent with a class object, in order to further qualify which primary mapper should be used; this applies only if a Mapper was created with that particular class/entity name combination, else an exception is raised. All of the methods on Session which take a class or mapper argument also take the entity_name argument, so that a given class can be properly matched to the desired primary mapper.
All instances retrieved by the returned Query object will be stored as persistent instances within the originating Session.
The Session's get() method, given a class or mapper, a scalar or tuple-based identity, and an optional entity_name keyword argument, creates a Query corresponding to the given mapper or class/entity_name combination, and calls the get() method with the given identity value. If the object already exists within this Session, it is simply returned; otherwise it is queried from the database. If the instance is not found, the method returns None.
# get Employer primary key 5
employer = session.get(Employer, 5)

# get Report composite primary key 7,12, using mapper 'report_mapper_b'
report = session.get(Report, (7,12), entity_name='report_mapper_b')
load() is similar to get(), except it will raise an exception if the instance does not exist in the database. It will also load the object's data from the database in all cases, and will overwrite any changes on an object already present in the session with the latest data from the database.
# load Employer primary key 5
employer = session.load(Employer, 5)

# load Report composite primary key 7,12, using mapper 'report_mapper_b'
report = session.load(Report, (7,12), entity_name='report_mapper_b')
save() is called with a single transient (unsaved, unattached) instance as an argument, which is then added to the Session and becomes pending. When the session is next flushed, the instance will be saved to the database, upon which it becomes persistent (saved, attached). If the given instance is not transient, meaning it is either attached to an existing Session or it has a database identity, an exception is raised.
user1 = User(name='user1')
user2 = User(name='user2')

session.save(user1)
session.save(user2)

session.flush()     # write changes to the database
save() is called automatically for new instances by the class's associated mapper, if a default Session context is in effect (such as a thread-local session), which means that newly created instances automatically become pending. If there is no default session available, then the instance remains transient (unattached) until it is explicitly added to a Session via the save() method.
A transient instance can also be automatically saved if it is associated with a parent object which specifies save-update within its cascade rules, and that parent is already attached or becomes attached to a Session. For more information on cascade, see the next section.
The save_or_update() method, covered later, is a convenience method which will call the save() or update() method as appropriate, depending on whether or not the instance has a database identity (but the instance still must be unattached).
This is the main gateway to what the Unit of Work does best: save everything! It should be clear by now what a flush looks like:
session.flush()
It can also be called with a list of objects; in this form, the flush operation will be limited to the objects specified in the list, as well as any child objects within private relationships for a delete operation:
# saves only user1 and address2. all other modified
# objects remain present in the session.
session.flush([user1, address2])
This second form of flush should be used carefully as it will not necessarily locate other dependent objects within the session, whose database representation may have foreign constraint relationships with the objects being operated upon.
A common misconception about the flush() operation is that once performed, the newly persisted instances will automatically have related objects attached to them, based on the values of primary key identities that have been assigned to the instances before they were persisted. An example would be: you create a new Address object, set address.user_id to 5, and then flush() the session. The erroneous assumption is that there is now a User object of identity "5" attached to the Address object; in fact, this is not the case. If you were to refresh() the Address, invalidating its current state and re-loading it, then it would have the appropriate User object present.
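Here is a sketch of that scenario, assuming an Address mapper with a 'user' relation to User as in earlier examples (the names are illustrative):

address = Address()
address.user_id = 5       # manually assign the foreign key value
session.save(address)
session.flush()           # the row is INSERTed with user_id=5

print address.user        # None - flush() did not populate the relation

session.refresh(address)  # invalidate and re-load the instance
print address.user        # the User with identity 5 is now present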
This misunderstanding is related to the observed behavior of backreferences (Backreferences), which automatically associate an instance "A" with another instance "B" in response to the manual association of instance "B" with instance "A" by the user. The backreference operation occurs completely externally to the flush() operation, and is pretty much the only example of a SQLAlchemy feature that manipulates the relationships of persistent objects.
The primary guideline for dealing with flush() is: the developer is responsible for maintaining in-memory objects and their relationships to each other; the unit of work is responsible for maintaining the database representation of those in-memory objects. The typical pattern is that the manipulation of objects is the way changes get communicated to the unit of work, so that when the flush occurs, the objects are already in their correct in-memory representation and problems don't arise. The manipulation of identifier attributes like integer key values, as well as deletes in particular, are a frequent source of confusion.
The close() method first calls clear(), removing all objects from this Session, and then ensures that any transactional resources are closed.
The delete method places an instance into the Unit of Work's list of objects to be marked as deleted:
# mark two objects to be deleted
session.delete(obj1)
session.delete(obj2)

# flush
session.flush()
The delete operation will have an effect on instances that are attached to the deleted instance according to the cascade style of the relationship; cascade rules are described further in the following section. By default, associated instances may need to be updated in the database to reflect that they are no longer associated with the parent object, before the parent is deleted. If the relationship specifies cascade="delete", then the associated instance will also be deleted upon flush, assuming it is still attached to the parent. If the relationship additionally includes the delete-orphan cascade style, the associated instance will be deleted if it is still attached to the parent, or is unattached to any other parent.
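As a sketch, using the Order/Item mappers from the cascade example in the next section, where 'items' specifies "all, delete-orphan":

order = session.query(Order).get(5)
session.delete(order)     # mark the Order as deleted
session.flush()           # its Items are DELETEd in the same flush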
The delete() operation has no relationship to the in-memory status of the instance, including usage of the del Python statement. An instance marked as deleted and flushed will still exist within memory until references to it are freed; similarly, removing an instance from memory via the del statement will have no effect, since the persistent instance will still be referenced by its Session. Obviously, if the instance is removed from the Session and then totally dereferenced, it will no longer exist in memory, but it also won't exist in any Session and is therefore not deleted from the database.
The clear() method detaches all instances from the Session, sending them to the detached or transient state as applicable, and replaces the underlying UnitOfWork with a new one.
session.clear()
The clear() method is particularly useful with a "default context" session such as a thread-local session, which can stay attached to the current thread to handle a new set of objects without having to re-attach a new Session.
To assist with the Unit of Work's "sticky" behavior, the refresh() and expire() methods allow individual objects to have all of their attributes immediately re-loaded from the database, or marked as "expired", which will cause a re-load to occur upon the next access of any of the object's mapped attributes. This includes all relationships, so lazy-loaders will be re-initialized and eager relationships will be repopulated. Any changes marked on the object are discarded:
# immediately re-load attributes on obj1, obj2
session.refresh(obj1)
session.refresh(obj2)

# expire objects obj1, obj2; attributes will be reloaded
# on the next access:
session.expire(obj1)
session.expire(obj2)
Expunge removes an object from the Session, sending persistent instances to the detached state, and pending instances to the transient state:
session.expunge(obj1)
Use expunge when you'd like to remove an object altogether from memory, such as before calling del on it, which will prevent any "ghost" operations from occurring when the session is flushed.
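For example:

# detach the instance before abandoning it, so that the next
# flush() will not act upon it
session.expunge(obj)
del obj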
The bind_mapper() and bind_table() methods each receive two arguments: in the case of bind_mapper(), a Mapper and an Engine or Connection instance; in the case of bind_table(), a Table instance or other Selectable (such as an Alias, Select, etc.) and an Engine or Connection instance.
engine1 = create_engine('sqlite:///file1.db')
engine2 = create_engine('mysql://localhost')

sqlite_connection = engine1.connect()

sess = create_session()

# bind mymapper operations to a single SQLite connection
sess.bind_mapper(mymapper, sqlite_connection)

# bind operations with the email_addresses_table to mysql
sess.bind_table(email_addresses_table, engine2)
Normally, when a Session is created via create_session() with no arguments, the Session has no awareness of individual Engines, and when mappers use the Session to retrieve connections, the underlying MetaData each Table is associated with is expected to be "bound" to an Engine, else no engine can be located and an exception is raised. A second form of create_session() takes the argument bind_to=engine_or_connection, in which case all SQL operations performed by this Session use the single Engine or Connection (collectively known as a Connectable) passed to the constructor. With bind_mapper() and bind_table(), the operations of individual mappers and/or tables are bound to distinct engines or connections, thereby overriding not only the engine which may be "bound" to the underlying MetaData, but also the Engine or Connection which may have been passed to the create_session() function. Configurations which interact with multiple explicit database connections at one time must use either or both of these methods in order to associate Session operations with the appropriate connection resource.
Binding a Mapper to a resource takes precedence over a Table bind, meaning if mapper A is associated with table B, and the Session binds mapper A to connection X and table B to connection Y, an operation with mapper A will use connection X, not connection Y.
The update() method is used only with detached instances. A detached instance exists only if its Session was cleared or closed, or the instance was expunge()d from its session. update() will re-attach the detached instance to this Session, bringing it back to the persistent state and allowing any changes on the instance to be saved when the Session is next flushed. If the instance is already attached to an existing Session, an exception is raised.
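A sketch of re-attaching a detached instance (assuming a mapped User class):

# 'user' was loaded by a previous Session that has since been closed
session.update(user)        # 'user' is now persistent in this Session
user.user_name = 'newname'
session.flush()             # the change is written via an UPDATE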
A detached instance can also be automatically updated if it is associated with a parent object which specifies save-update within its cascade rules, and that parent is already attached or becomes attached to a Session. For more information on cascade, see the next section.
The save_or_update() method is a convenience method which will call the save() or update() method as appropriate, depending on whether or not the instance has a database identity (but the instance still must be unattached).
This method is a combination of the save() and update() methods: it examines the given instance for a database identity (i.e. whether it is transient or detached), and calls the implementation of save() or update() as appropriate. Use save_or_update() to add unattached instances to a session when you're not sure if they were newly created or not. Like save() and update(), save_or_update() cascades along the save-update cascade indicator, described in the cascade section below.
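For example (a hypothetical 'obj' which may be newly constructed or may be detached from an earlier session):

# save() or update() is chosen based on whether obj has a
# database identity; obj just needs to be unattached
session.save_or_update(obj)
session.flush()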
Feature Status: Alpha Implementation
merge() is used to return the persistent version of an instance that is not attached to this Session. When passed an instance, if an instance with the same database identity already exists within this Session, it is returned. If the instance does not exist in this Session, it is loaded from the database and then returned.
A future version of merge() will also update the Session's instance with the state of the given instance (hence the name "merge").
This method is useful for bringing in objects which may have been restored from a serialization, such as those stored in an HTTP session:
# deserialize an object
myobj = pickle.loads(mystring)

# "merge" it. if the session already had this object in the
# identity map, then you get back the one from the current session.
myobj = session.merge(myobj)
Note that merge() does not associate the given instance with the Session; it remains detached (or attached to whatever Session it was already attached to).
Feature Status: Alpha Implementation
Mappers support the concept of configurable cascade behavior on relation()s. This behavior controls how the Session should treat the instances that have a parent-child relationship with another instance that is operated upon by the Session. Cascade is indicated as a comma-separated list of string keywords, with the possible values all, delete, save-update, refresh-expire, merge, expunge, and delete-orphan.
Cascading is configured by setting the cascade keyword argument on a relation():
mapper(Order, order_table, properties={
    'items' : relation(Item, items_table, cascade="all, delete-orphan"),
    'customer' : relation(User, users_table, user_orders_table, cascade="save-update"),
})
The above mapper specifies two relations, items and customer. The items relationship specifies "all, delete-orphan" as its cascade value, indicating that all save, update, merge, expunge, refresh, delete and expire operations performed on a parent Order instance should also be performed on the child Item instances attached to it (save and update are cascaded using the save_or_update() method, so that the database identity of the instance doesn't matter). The delete-orphan cascade value additionally indicates that if an Item instance is no longer associated with an Order, it should also be deleted. The "all, delete-orphan" cascade argument allows a so-called lifecycle relationship between an Order and an Item object.
The customer relationship specifies only the "save-update" cascade value, indicating that most operations will not be cascaded from a parent Order instance to a child User instance, except for when the Order is attached to a particular Session via the save(), update(), or save_or_update() method.
Additionally, when a child item is attached to a parent item whose relationship specifies the "save-update" cascade value, the child is automatically passed to save_or_update() (and the operation is further cascaded onward from the child item).
Note that cascading doesn't do anything that isn't possible by manually calling Session methods on individual instances within a hierarchy, it merely automates common operations on a group of associated instances.
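For instance, the save-update cascade on the 'items' relation above is roughly the equivalent of doing this by hand:

session.save(order)
for item in order.items:
    # each child is passed to save_or_update(), as described earlier
    session.save_or_update(item)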
The default value for cascade on relation()s is save-update, and the private=True keyword argument is a synonym for cascade="all, delete-orphan".
SessionTransaction is a multi-engine transaction manager, which aggregates one or more Engine/Connection pairs and keeps track of a Transaction object for each one. As the Session receives requests to execute SQL statements, it uses the Connection that is referenced by the SessionTransaction. At commit time, the underlying Session is flushed, and each Transaction is then committed.
Example usage is as follows:
sess = create_session()
trans = sess.create_transaction()
try:
    item1 = sess.query(Item).get(1)
    item2 = sess.query(Item).get(2)
    item1.foo = 'bar'
    item2.bar = 'foo'
    trans.commit()
except:
    trans.rollback()
    raise
The create_transaction() method creates a new SessionTransaction object but does not declare any connection/transaction resources. At the point of the first get() call, a connection resource is opened off the engine that corresponds to the Item class's mapper and is stored within the SessionTransaction with an open Transaction. When trans.commit() is called, the flush() method is called on the Session, and the corresponding UPDATE statements are issued to the database within the scope of the transaction already opened; afterwards, the underlying Transaction is committed, and connection resources are freed.
SessionTransaction, like the Transaction off of Connection, also supports "nested" behavior, and is safe to pass to other functions which then issue their own begin()/commit() pair; only the outermost begin()/commit() pair actually affects the transaction, and any call to rollback() within a particular call stack will issue a rollback.
Note that while SessionTransaction is capable of tracking multiple transactions across multiple databases, it currently is in no way a fully functioning two-phase commit engine; generally, when dealing with multiple databases simultaneously, there is the distinct possibility that a transaction can succeed on the first database and fail on the second, which for some applications may be an invalid state. If this is an issue, it's best to either refrain from spanning transactions across databases, or to look into some of the available technologies in this area, such as Zope, which offers a two-phase commit engine; some users have already created their own SQLAlchemy/Zope hybrid implementations to deal with scenarios like these.
The SessionTransaction can interact with direct SQL queries in two general ways: either specific Connection objects can be associated with the SessionTransaction, which are then usable both for direct SQL and within flush() operations performed by the SessionTransaction, or the Connection object automatically referenced within the SessionTransaction can be accessed directly.
To associate a specific Connection with the SessionTransaction, use the add() method:
connection = engine.connect()
trans = session.create_transaction()
try:
    trans.add(connection)
    connection.execute(mytable.update(), {'col1':4, 'col2':17})
    session.flush()     # flush() operation will use the same connection
    trans.commit()
except:
    trans.rollback()
    raise
The add() method will key the Connection's underlying Engine to this SessionTransaction. When mapper operations are performed against this Engine, the Connection explicitly added will be used. This overrides any other Connection objects that the underlying Session was associated with, corresponding to the underlying Engine of that Connection. However, if the SessionTransaction itself is already associated with a Connection, then an exception is thrown.
The other way is just to use the Connection referenced by the SessionTransaction. This is performed via the connection() method, and requires passing in a class or Mapper which indicates which underlying Connection should be returned (recall that different Mappers may use different underlying Engines). If the class_or_mapper argument is None, then the Session must have been globally bound to a specific Engine when constructed, else the method returns None.
trans = session.create_transaction()
try:
    # get the Connection used by the UserClass' Mapper
    connection = trans.connection(UserClass)
    connection.execute(mytable.update(), {'col1':4, 'col2':17})
    trans.commit()
except:
    trans.rollback()
    raise
The connection() method also exists on the Session object itself, and can be called regardless of whether or not a SessionTransaction is in progress. If a SessionTransaction is in progress, it will return the connection referenced by the transaction. If an Engine is being used with the threadlocal strategy, the Connection returned will correspond to the connection resources that are bound to the current thread, if any (i.e. it is obtained by calling contextual_connection()).
The transactions issued by SessionTransaction, as well as those issued internally by the Session's flush() operation, use the same Transaction object off of Connection that is publicly available. Recall that this object supports "nestable" behavior, meaning any number of actors can call begin() off a particular Connection object, and they will all be managed within the scope of a single transaction. Therefore, the flush() operation can similarly take place within the scope of a regular Transaction:
connection = engine.connect()                 # Connection
session = create_session(bind_to=connection)  # Session bound to the Connection

trans = connection.begin()                    # start transaction
try:
    stuff = session.query(MyClass).select()   # Session operation uses connection
    stuff[2].foo = 'bar'

    # use connection explicitly
    connection.execute(mytable.insert(), dict(id=12, value="bar"))

    session.flush()   # Session flushes with "connection", using transaction "trans"
    trans.commit()    # commit
except:
    trans.rollback()  # or rollback
    raise
The session module can log an extensive display of its "flush plans", which is a graph of its internal representation of objects before they are written to the database. To turn this logging on:
# make a Session with echo_uow
session = create_session(echo_uow=True)
The flush() operation will then dump to standard output displays like the following:
Task dump:

 UOWTask(6034768, 'User/users/None')
  |
  |- Save User(6016624)
  |       |-Process User(6016624).addresses
  |
  |- UOWTask(6034832, 'Address/email_addresses/None')
  |       |- Save Address(6034384)
  |       |- Save Address(6034256)
  |       |----
  |
  |----
The above graph can be read straight downwards to determine the order of operations. It indicates "save User 6016624, process each element in the 'addresses' list on User 6016624, save Address 6034384, Address 6034256".
Of course, one can also get a good idea of the order of operations just by logging the actual SQL statements executed.
This section details all the options available to Mappers, as well as advanced patterns.
To start, here are the tables we will work with again:
from sqlalchemy import *

metadata = MetaData()

# a table to store users
users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses_table = Table('addresses', metadata,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key=True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords_table = Table('userkeywords', metadata,
    Column('user_id', INT, ForeignKey("users.user_id")),
    Column('keyword_id', INT, ForeignKey("keywords.keyword_id"))
)
When mappers are constructed, by default the column names in the Table metadata are used as the names of attributes on the mapped class. This can be customized within the properties by stating the key/column combinations explicitly:
user_mapper = mapper(User, users_table, properties={
    'id' : users_table.c.user_id,
    'name' : users_table.c.user_name,
})
In situations where column names overlap in a mapper against multiple tables, columns may be referenced together with a list:
# join users and addresses
usersaddresses = sql.join(users_table, addresses_table,
                          users_table.c.user_id == addresses_table.c.user_id)

m = mapper(User, usersaddresses, properties={
    'id' : [users_table.c.user_id, addresses_table.c.user_id],
})
A common request is the ability to create custom class properties that override the behavior of setting/getting an attribute. Currently, the easiest way to do this in SQLAlchemy is how it would be done in any Python program; define your attribute with a different name, such as "_attribute", and use a property to get/set its value. The mapper just needs to be told of the special name:
class MyClass(object):
    def _set_email(self, email):
        self._email = email
    def _get_email(self):
        return self._email
    email = property(_get_email, _set_email)

mapper(MyClass, mytable, properties={
    # map the '_email' attribute to the "email" column
    # on the table
    '_email' : mytable.c.email
})
It is also possible to use the select_by and get_by functions on Query using the original property name, by establishing a synonym:
mapper(MyClass, mytable, properties={
    # map the '_email' attribute to the "email" column
    # on the table
    '_email' : mytable.c.email,

    # make a synonym 'email'
    'email' : synonym('_email')
})

# now you can select_by(email)
result = session.query(MyClass).select_by(email='john@smith.com')
The synonym keyword is currently an Alpha Feature.
Feature Status: Alpha API
A one-to-many or many-to-many relationship results in a list-holding element being attached to all instances of the class. Currently, this list is an instance of sqlalchemy.util.HistoryArraySet, a wrapper that decorates an underlying list object. The implementation of this list can be controlled, and can in fact be any object that implements list-style append and __iter__ methods. A common need is for a list-based relationship to actually be a dictionary; this can be achieved by subclassing dict to have list-like behavior.
In this example, a class MyClass is defined, which is associated with a parent object MyParent. The collection of MyClass objects on each MyParent object will be a dictionary, storing each MyClass instance keyed to its name attribute.
# a class to be stored in the list
class MyClass(object):
    def __init__(self, name):
        self.name = name

# create a dictionary that will act like a list, and store
# instances of MyClass
class MyDict(dict):
    def append(self, item):
        self[item.name] = item
    def __iter__(self):
        return iter(self.values())

# parent class
class MyParent(object):
    # this class-level attribute provides the class to be
    # used by the 'myclasses' attribute
    myclasses = MyDict

# mappers, constructed normally
mapper(MyClass, myclass_table)
mapper(MyParent, myparent_table, properties={
    'myclasses' : relation(MyClass)
})

# elements on 'myclasses' can be accessed via string keyname
myparent = MyParent()
myparent.myclasses.append(MyClass('this is myclass'))
myclass = myparent.myclasses['this is myclass']
When creating relations on a mapper, most examples so far have illustrated the mapper and relationship joining up based on the foreign keys of the tables they represent. In fact, this "automatic" inspection can be completely circumvented using the primaryjoin and secondaryjoin arguments to relation, as in this example which creates a User object that has a relationship to all of its Addresses which are in Boston:
class User(object):
    pass
class Address(object):
    pass

mapper(Address, addresses_table)
mapper(User, users_table, properties={
    'boston_addresses' : relation(Address, primaryjoin=
        and_(users_table.c.user_id==addresses_table.c.user_id,
             addresses_table.c.city=='Boston'))
})
Many-to-many relationships can be customized by one or both of primaryjoin and secondaryjoin, shown below with just the default many-to-many relationship explicitly set:
class User(object):
    pass
class Keyword(object):
    pass

mapper(Keyword, keywords_table)
mapper(User, users_table, properties={
    'keywords' : relation(Keyword,
        secondary=userkeywords_table,
        primaryjoin=users_table.c.user_id==userkeywords_table.c.user_id,
        secondaryjoin=userkeywords_table.c.keyword_id==keywords_table.c.keyword_id
    )
})
The previous example leads into the idea of joining against the same table multiple times. Below is a User object that has lists of its Boston and New York addresses:
mapper(User, users_table, properties={
    'boston_addresses' : relation(Address, primaryjoin=
        and_(users_table.c.user_id==addresses_table.c.user_id,
             addresses_table.c.city=='Boston')),
    'newyork_addresses' : relation(Address, primaryjoin=
        and_(users_table.c.user_id==addresses_table.c.user_id,
             addresses_table.c.city=='New York')),
})
Both lazy and eager loading support multiple joins equally well.
This feature allows particular columns of a table to not be loaded by default, instead being loaded later on when first referenced. It is essentially "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when it's not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.
book_excerpts = Table('books', db,
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo', Binary)
)

class Book(object):
    pass

# define a mapper that will load each of 'excerpt' and 'photo' in
# separate, individual-row SELECT statements when each attribute
# is first referenced on the individual object instance
mapper(Book, book_excerpts, properties={
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo' : deferred(book_excerpts.c.photo)
})
Deferred columns can be placed into groups so that they load together:
book_excerpts = Table('books', db,
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo1', Binary),
    Column('photo2', Binary),
    Column('photo3', Binary)
)

class Book(object):
    pass

# define a mapper with a 'photos' deferred group. when one photo is referenced,
# all three photos will be loaded in one SELECT statement. The 'excerpt' will
# be loaded separately when it is first referenced.
mapper(Book, book_excerpts, properties={
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo1' : deferred(book_excerpts.c.photo1, group='photos'),
    'photo2' : deferred(book_excerpts.c.photo2, group='photos'),
    'photo3' : deferred(book_excerpts.c.photo3, group='photos')
})
Keyword options to the relation function include:

foreignkey - a column (or list of columns) on one side of the relationship's join condition that contains the ForeignKey pointing to the other column in an equality expression. Specifying it here can override the normal foreign key properties of the join condition, which is useful for self-referential table relationships, join conditions where a ForeignKey is not present, or where the same column might appear on both sides of the join condition.

private - setting private=True is the equivalent of setting cascade="all, delete-orphan", and indicates the lifecycle of child objects should be contained within that of the parent. See the example in datamapping_relations_cycle.

backref - indicates the name of a property to be placed on the related class pointing back to the parent, or a backref() construct for more configurability. See Backreferences.
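As a minimal sketch of the backref option, assuming the users/addresses tables from earlier examples, a single relation can maintain both directions:

mapper(Address, addresses_table)
mapper(User, users_table, properties={
    # creates 'addresses' on User, plus a 'user' attribute on Address
    'addresses' : relation(Address, backref='user')
})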
By default, mappers will attempt to ORDER BY the "oid" column of a table, or the primary key column, when selecting rows. This can be modified in several ways.
The "order_by" parameter can be sent to a mapper, overriding the per-engine ordering if any. A value of None means that the mapper should not use any ordering. A non-None value, which can be a column, an asc
or desc
clause, or an array of either one, indicates the ORDER BY clause that should be added to all select queries:
# disable all ordering
usermapper = mapper(User, users_table, order_by=None)

# order by a column
usermapper = mapper(User, users_table, order_by=users_table.c.user_id)

# order by multiple items
usermapper = mapper(User, users_table,
                    order_by=[users_table.c.user_id, desc(users_table.c.user_name)])
"order_by" can also be specified to an individual select
method, overriding all other per-engine/per-mapper orderings:
# order by a column
l = usermapper.select(users_table.c.user_name=='fred',
                      order_by=users_table.c.user_id)

# order by multiple criteria
l = usermapper.select(users_table.c.user_name=='fred',
                      order_by=[users_table.c.user_id, desc(users_table.c.user_name)])
For relations, the "order_by" property can also be specified to all forms of relation:
# order address objects by address id
usermapper = mapper(User, users_table, properties={
    'addresses' : relation(mapper(Address, addresses_table),
                           order_by=addresses_table.c.address_id)
})

# eager load with ordering - the ORDER BY clauses of parent/child will be organized properly
usermapper = mapper(User, users_table, properties={
    'addresses' : relation(mapper(Address, addresses_table),
                           order_by=desc(addresses_table.c.email_address),
                           lazy=False)
}, order_by=users_table.c.user_id)
You can limit rows in a regular SQL query by specifying limit and offset. A Mapper can handle the same concepts:
class User(object):
    pass

mapper(User, users_table)
r = session.query(User).select(limit=20, offset=10)
However, things get tricky when dealing with eager relationships, since a straight LIMIT of rows does not represent the count of items when joining against other tables to load related items as well. So here is what SQLAlchemy will do when you use limit or offset with an eager relationship:
class User(object):
    pass
class Address(object):
    pass

mapper(User, users_table, properties={
    'addresses' : relation(mapper(Address, addresses_table), lazy=False)
})

r = session.query(User).select(User.c.user_name.like('F%'), limit=20, offset=10)
The main WHERE clause as well as the limiting clauses are coerced into a subquery; this subquery represents the desired result of objects. A containing query, which handles the eager relationships, is joined against the subquery to produce the result.
The options method on the Query object, introduced earlier, produces a new Query object by creating a copy of the underlying Mapper and placing modified properties on it. The options method is also directly available off the Mapper object itself, so that the newly copied Mapper can be dealt with directly. The options method takes a variable number of MapperOption objects which know how to change specific things about the mapper. The five available options are eagerload, lazyload, noload, deferred and extension.
An example of a mapper with a lazy load relationship, upgraded to an eager load relationship:
class User(object):
    pass
class Address(object):
    pass

# a 'lazy' relationship
mapper(User, users_table, properties={
    'addresses' : relation(mapper(Address, addresses_table), lazy=True)
})

# copy the mapper and convert 'addresses' to be eager
eagermapper = class_mapper(User).options(eagerload('addresses'))
The defer and undefer options can control the deferred loading of attributes:
# set the 'excerpt' deferred attribute to load normally
m = book_mapper.options(undefer('excerpt'))

# set the referenced mapper 'photos' to defer its loading of the column 'imagedata'
m = book_mapper.options(defer('photos.imagedata'))
Feature Status: Alpha Implementation
Inheritance in databases comes in three forms: single table inheritance, where several types of classes are stored in one table, concrete table inheritance, where each type of class is stored in its own table, and multiple table inheritance, where the parent/child classes are stored in their own tables that are joined together in a select.
There is also a concept of polymorphic loading, which indicates whether multiple kinds of classes can be loaded in one pass.
SQLAlchemy supports all three kinds of inheritance. Additionally, true polymorphic loading is supported in a straightforward way for single table inheritance, and has some more manually-configured features that can make it happen for concrete and multiple table inheritance.
Working examples of polymorphic inheritance come with the distribution in the directory examples/polymorphic.
Here are the classes we will use to represent an inheritance relationship:
class Employee(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name

class Manager(Employee):
    def __init__(self, name, manager_data):
        self.name = name
        self.manager_data = manager_data
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name + " " + self.manager_data

class Engineer(Employee):
    def __init__(self, name, engineer_info):
        self.name = name
        self.engineer_info = engineer_info
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name + " " + self.engineer_info
Each class supports a common name attribute, while the Manager class has its own attribute manager_data and the Engineer class has its own attribute engineer_info.
Single table inheritance, where all classes are stored in one table, will support polymorphic loading via the Employee mapper.
employees_table = Table('employees', metadata,
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('manager_data', String(50)),
    Column('engineer_info', String(50)),
    Column('type', String(20))
)

employee_mapper = mapper(Employee, employees_table,
                         polymorphic_on=employees_table.c.type)
manager_mapper = mapper(Manager, inherits=employee_mapper,
                        polymorphic_identity='manager')
engineer_mapper = mapper(Engineer, inherits=employee_mapper,
                         polymorphic_identity='engineer')
For concrete table inheritance, without polymorphic loading, you just define a separate mapper for each class.
managers_table = Table('managers', metadata,
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('manager_data', String(50)),
)

engineers_table = Table('engineers', metadata,
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('engineer_info', String(50)),
)

manager_mapper = mapper(Manager, managers_table)
engineer_mapper = mapper(Engineer, engineers_table)
With polymorphic loading, the SQL query to do the actual polymorphic load must be constructed, usually as a UNION. There is a helper function to create these UNIONs called polymorphic_union.
pjoin = polymorphic_union({
    'manager' : managers_table,
    'engineer' : engineers_table
}, 'type', 'pjoin')

employee_mapper = mapper(Employee, pjoin, polymorphic_on=pjoin.c.type)
manager_mapper = mapper(Manager, managers_table,
                        inherits=employee_mapper, concrete=True,
                        polymorphic_identity='manager')
engineer_mapper = mapper(Engineer, engineers_table,
                         inherits=employee_mapper, concrete=True,
                         polymorphic_identity='engineer')
A future release of SQLAlchemy might better merge the generated UNION into the mapper construction phase.
Multiple table inheritance, like concrete table inheritance, can be done non-polymorphically, or, with a little more complexity, polymorphically:
employees = Table('employees', metadata,
    Column('person_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('type', String(30)))

engineers = Table('engineers', metadata,
    Column('person_id', Integer, ForeignKey('employees.person_id'), primary_key=True),
    Column('engineer_info', String(50)),
)

managers = Table('managers', metadata,
    Column('person_id', Integer, ForeignKey('employees.person_id'), primary_key=True),
    Column('manager_data', String(50)),
)

person_mapper = mapper(Employee, employees)
mapper(Engineer, engineers, inherits=person_mapper)
mapper(Manager, managers, inherits=person_mapper)
Polymorphic:
person_join = polymorphic_union({
    'engineer' : employees.join(engineers),
    'manager' : employees.join(managers),
    'person' : employees.select(employees.c.type=='person'),
}, None, 'pjoin')

person_mapper = mapper(Employee, employees,
                       select_table=person_join,
                       polymorphic_on=person_join.c.type,
                       polymorphic_identity='person')
mapper(Engineer, engineers, inherits=person_mapper, polymorphic_identity='engineer')
mapper(Manager, managers, inherits=person_mapper, polymorphic_identity='manager')
The join condition in a multiple table inheritance relationship can be specified explicitly, using inherit_condition:
AddressUser.mapper = mapper(
    AddressUser,
    addresses_table, inherits=User.mapper,
    inherit_condition=users_table.c.user_id==addresses_table.c.user_id
)
Mappers can be constructed against arbitrary relational units (called Selectables) as well as plain Tables. For example, the join keyword from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed in to a mapper as the table.
# a class
class AddressUser(object):
    pass

# define a Join
j = join(users_table, addresses_table)

# map to it - the identity of an AddressUser object will be
# based on (user_id, address_id) since those are the primary keys involved
m = mapper(AddressUser, j, properties={
    'user_id' : [users_table.c.user_id, addresses_table.c.user_id]
})
A second example:
# many-to-many join on an association table
j = join(users_table, userkeywords,
         users_table.c.user_id==userkeywords.c.user_id).join(keywords,
         userkeywords.c.keyword_id==keywords.c.keyword_id)

# a class
class KeywordUser(object):
    pass

# map to it - the identity of a KeywordUser object will be
# (user_id, keyword_id) since those are the primary keys involved
m = mapper(KeywordUser, j, properties={
    'user_id' : [users_table.c.user_id, userkeywords.c.user_id],
    'keyword_id' : [userkeywords.c.keyword_id, keywords.c.keyword_id]
})
In both examples above, "composite" columns were added as properties to the mappers; these are aggregations of multiple columns into one mapper property, which instructs the mapper to keep both of those columns set at the same value.
Similar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:
s = select([customers,
            func.count(orders.c.customer_id).label('order_count'),
            func.max(orders.c.price).label('highest_order')],
           customers.c.customer_id==orders.c.customer_id,
           group_by=[c for c in customers.c]
)

class Customer(object):
    pass

m = mapper(Customer, s)
Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "ordercount" and "highestorder" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary keys of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.
The first mapper created for a certain class is known as that class's "primary mapper." Other mappers can be created as well; these come in two varieties.

The first is the secondary mapper: a mapper constructed with the keyword argument non_primary=True, which represents a load-only mapper. Objects that are loaded with a secondary mapper will have their save operation processed by the primary mapper. It is also invalid to add new relation()s to a non-primary mapper. To use this mapper with the Session, specify it to the query method:
example:
# primary mapper
mapper(User, users_table)

# make a secondary mapper to load User against a join
othermapper = mapper(User, users_table.join(someothertable), non_primary=True)

# select
result = session.query(othermapper).select()
The second is the entity name mapper: a mapper constructed with the keyword argument entity_name. Instances loaded with this mapper will be totally managed by this new mapper and have no connection to the original one. Most methods on Session include an optional entity_name parameter in order to specify this condition.
example:
# primary mapper
mapper(User, users_table)

# make an entity name mapper that stores User objects in another table
mapper(User, alternate_users_table, entity_name='alt')

# make two User objects
user1 = User()
user2 = User()

# save one in the "users" table
session.save(user1)

# save the other in the "alternate_users_table"
session.save(user2, entity_name='alt')
session.flush()

# select from the alternate mapper
session.query(User, entity_name='alt').select()
Oftentimes it is necessary for two mappers to be related to each other. With a datamodel that consists of Users that store Addresses, you might have an Address object and want to access the "user" attribute on it, or have a User object and want to get the list of Address objects. The easiest way to do this is via the backref keyword described in Backreferences. However, even when backreferences are used, it is sometimes necessary to explicitly specify the relations on both mappers pointing to each other.
Achieving this involves creating the first mapper by itself, then creating the second mapper referencing the first, then adding a reference on the first mapper back to the second:
usermapper = mapper(User, users_table)

mapper(Address, addresses_table, properties={
    'user' : relation(User)
})

usermapper.add_property('addresses', relation(Address))
Note that with a circular relationship as above, you cannot declare both relationships as "eager" relationships, since that produces a circular query situation which will generate a recursion exception. So what if you want to load an Address and its User eagerly? Just use eager options:
eagerquery = session.query(Address).options(eagerload('user'))
s = eagerquery.select(Address.c.address_id==12)
A self-referential mapper is a mapper that is designed to operate with an adjacency list table. This is a table that contains one or more foreign keys back to itself, and is usually used to create hierarchical tree structures. SQLAlchemy's default model of saving items based on table dependencies is not sufficient in this case, as an adjacency list table introduces dependencies between individual rows. Fortunately, SQLAlchemy will automatically detect a self-referential mapper and do the extra lifting to make it work.
# define a self-referential table
trees = Table('treenodes', engine,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
)

# treenode class
class TreeNode(object):
    pass

# mapper defines "children" property, pointing back to TreeNode class,
# with the mapper unspecified. it will point back to the primary
# mapper on the TreeNode class.
TreeNode.mapper = mapper(TreeNode, trees, properties={
    'children' : relation(TreeNode, cascade="all, delete-orphan"),
})

# or, specify the circular relationship after establishing the original mapper:
mymapper = mapper(TreeNode, trees)
mymapper.add_property('children', relation(mymapper, cascade="all, delete-orphan"))
This kind of mapper goes through a lot of extra effort when saving and deleting items, to determine the correct dependency graph of nodes within the tree.
A self-referential mapper where there is more than one relationship on the table requires that all join conditions be explicitly spelled out. Below is a self-referring table that contains a "parent_node_id" column to reference parent/child relationships, and a "root_node_id" column which points child nodes back to the ultimate root node:
# define a self-referential table with several relations
trees = Table('treenodes', engine,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('root_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
)

# treenode class
class TreeNode(object):
    pass

# define the "children" property as well as the "root" property
TreeNode.mapper = mapper(TreeNode, trees, properties={
    'children' : relation(TreeNode,
        primaryjoin=trees.c.parent_node_id==trees.c.node_id,
        cascade="all, delete-orphan"
    ),
    'root' : relation(TreeNode,
        primaryjoin=trees.c.root_node_id==trees.c.node_id,
        foreignkey=trees.c.node_id,
        uselist=False
    )
})
The "root" property on a TreeNode is a many-to-one relationship. By default, a self-referential mapper declares relationships as one-to-many, so the extra parameter foreignkey
, pointing to a column or list of columns on the remote side of a relationship, is needed to indicate a "many-to-one" self-referring relationship.
Both TreeNode examples above are available in functional form in the examples/adjacencytree directory of the distribution.
Take any result set and feed it into a mapper to produce objects. Multiple mappers can be combined to retrieve unrelated objects from the same row in one step. The instances method on mapper takes a ResultProxy object, which is the result type generated from SQLEngine, and delivers object instances.
class User(object):
    pass
User.mapper = mapper(User, users_table)

# select users
c = users_table.select().execute()

# get objects
userlist = User.mapper.instances(c)

# define a second class/mapper
class Address(object):
    pass
Address.mapper = mapper(Address, addresses_table)

# select users and addresses in one query
s = select([users_table, addresses_table],
           users_table.c.user_id==addresses_table.c.user_id)

# execute it, and process the results with the User mapper, chained to the Address mapper
r = User.mapper.instances(s.execute(), Address.mapper)

# result rows are an array of objects, one for each mapper used
for entry in r:
    user = entry[0]
    address = entry[1]
Other arguments not covered above include a Selectable, which will take the place of the Mapper's main table argument when performing queries.
Mappers can have functionality augmented or replaced at many points in their execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs. Methods that are not overridden return the special value sqlalchemy.orm.mapper.EXT_PASS, which indicates the operation should proceed normally.
class MapperExtension(object):
    def select_by(self, query, *args, **kwargs):
        """overrides the select_by method of the Query object"""

    def select(self, query, *args, **kwargs):
        """overrides the select method of the Query object"""

    def create_instance(self, mapper, session, row, imap, class_):
        """called when a new object instance is about to be created from a row.
        the method can choose to create the instance itself, or it can return
        None to indicate normal object creation should take place.

        mapper - the mapper doing the operation

        row - the result row from the database

        imap - a dictionary that is storing the running set of objects collected
        from the current result set

        class_ - the class we are mapping.
        """

    def append_result(self, mapper, session, row, imap, result, instance, isnew, populate_existing=False):
        """called when an object instance is being appended to a result list.

        If this method returns True, it is assumed that the mapper should do the
        appending, else if this method returns False, it is assumed that the
        append was handled by this method.

        mapper - the mapper doing the operation

        row - the result row from the database

        imap - a dictionary that is storing the running set of objects collected
        from the current result set

        result - an instance of util.HistoryArraySet(), which may be an attribute
        on an object if this is a related object load (lazy or eager).  use
        result.append_nohistory(value) to append objects to this list.

        instance - the object instance to be appended to the result

        isnew - indicates if this is the first time we have seen this object
        instance in the current result set.  if you are selecting from a join,
        such as an eager load, you might see the same object instance many times
        in the same result set.

        populate_existing - usually False, indicates if object instances that
        were already in the main identity map, i.e. were loaded by a previous
        select(), get their attributes overwritten
        """

    def populate_instance(self, mapper, session, instance, row, identitykey, imap, isnew):
        """called right before the mapper, after creating an instance from a row,
        passes the row to its MapperProperty objects which are responsible for
        populating the object's attributes.

        If this method returns True, it is assumed that the mapper should do the
        population, else if this method returns False, it is assumed that the
        population was handled by this method.

        Essentially, this method is used to have a different mapper populate
        the object:

            def populate_instance(self, mapper, session, instance, row, identitykey, imap, isnew):
                othermapper.populate_instance(session, instance, row, identitykey, imap, isnew, frommapper=mapper)
                return True
        """

    def before_insert(self, mapper, connection, instance):
        """called before an object instance is INSERTed into its table.

        this is a good place to set up primary key values and such that aren't
        handled otherwise."""

    def before_update(self, mapper, connection, instance):
        """called before an object instance is UPDATEd"""

    def after_update(self, mapper, connection, instance):
        """called after an object instance is UPDATEd"""

    def after_insert(self, mapper, connection, instance):
        """called after an object instance has been INSERTed"""

    def before_delete(self, mapper, connection, instance):
        """called before an object instance is DELETEd"""

    def after_delete(self, mapper, connection, instance):
        """called after an object instance is DELETEd"""
To use MapperExtension, make your own subclass of it and just send it off to a mapper:
m = mapper(User, users_table, extension=MyExtension())
Multiple extensions will be chained together and processed in order; they are specified as a list:
m = mapper(User, users_table, extension=[ext1, ext2, ext3])
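As a brief illustration, below is a sketch of an extension which overrides just the before_insert hook to stamp each instance before it is written. The TimestampExtension class and its "created" attribute are hypothetical, and the import location follows the EXT_PASS path given above:

import datetime
from sqlalchemy.orm.mapper import MapperExtension, EXT_PASS

class TimestampExtension(MapperExtension):
    # sets a 'created' attribute on each instance just before its
    # INSERT statement executes; all other hooks are inherited and
    # return EXT_PASS, so normal processing takes place for them
    def before_insert(self, mapper, connection, instance):
        instance.created = datetime.datetime.now()
        return EXT_PASS

m = mapper(User, users_table, extension=TimestampExtension())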
The package sqlalchemy.types defines the datatype identifiers which may be used when defining metadata. This package includes a set of generic types, a set of SQL-specific subclasses of those types, and a small extension system used by specific database connectors to adapt these generic types into database-specific type objects.
SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes.
The standard set of generic types is:
class String(TypeEngine):
    def __init__(self, length=None)

class Integer(TypeEngine)

class SmallInteger(Integer)

class Numeric(TypeEngine):
    def __init__(self, precision=10, length=2)

class Float(Numeric):
    def __init__(self, precision=10)

# DateTime, Date and Time types deal with datetime objects from the Python datetime module
class DateTime(TypeEngine)

class Date(TypeEngine)

class Time(TypeEngine)

class Binary(TypeEngine):
    def __init__(self, length=None)

class Boolean(TypeEngine)

# converts unicode strings to raw bytes
# as bind params, raw bytes to unicode as
# rowset values, using the unicode encoding
# setting on the engine (defaults to 'utf-8')
class Unicode(TypeDecorator):
    impl = String

# uses the pickle protocol to serialize data
# in/out of Binary columns
class PickleType(TypeDecorator):
    impl = Binary
More specific subclasses of these types are available, which various database engines may choose to implement specifically, allowing finer grained control over types:
class FLOAT(Numeric)
class TEXT(String)
class DECIMAL(Numeric)
class INT(Integer)
INTEGER = INT
class TIMESTAMP(DateTime)
class DATETIME(DateTime)
class CLOB(String)
class VARCHAR(String)
class CHAR(String)
class BLOB(Binary)
class BOOLEAN(Boolean)
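These names can be used in Column definitions just like the generic types, for cases where a particular SQL type name is desired in CREATE TABLE output. A brief sketch (the "documents" table here is hypothetical):

documents = Table('documents', engine,
    Column('document_id', INT, primary_key=True),

    # rendered as VARCHAR(200) and TEXT, where the database supports them
    Column('title', VARCHAR(200)),
    Column('body', TEXT)
)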
When using a specific database engine, these types are adapted even further via a set of database-specific subclasses defined by the database engine. There may eventually be more type objects that are defined for specific databases. An example of this would be Postgres' Array type.
Type objects are specified to table meta data using either the class itself, or an instance of the class. Creating an instance of the class allows you to specify parameters for the type, such as string length, numerical precision, etc.:
mytable = Table('mytable', engine,
    # define type using a class
    Column('my_id', Integer, primary_key=True),

    # define type using an object instance
    Column('value', Numeric(7,4))
)
User-defined types can be created, to support either database-specific types, or customized pre-processing of query parameters as well as post-processing of result set data. You can make your own classes to perform these operations. To augment the behavior of a TypeEngine type, such as String, the TypeDecorator class is used:
import sqlalchemy.types as types

class MyType(types.TypeDecorator):
    """basic type that decorates String, prefixes values with "PREFIX:" on
    the way in and strips it off on the way out."""
    impl = types.String

    def convert_bind_param(self, value, engine):
        return "PREFIX:" + value

    def convert_result_value(self, value, engine):
        return value[7:]
The Unicode and PickleType classes are subclasses of TypeDecorator and can themselves be subclassed directly.
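For example, a brief sketch of a hypothetical TrimmedUnicode type which strips surrounding whitespace before the standard unicode conversion takes place:

import sqlalchemy.types as types

class TrimmedUnicode(types.Unicode):
    # hypothetical subclass of Unicode: strips surrounding whitespace
    # from the value, then defers to Unicode's own bind param conversion
    def convert_bind_param(self, value, engine):
        if value is not None:
            value = value.strip()
        return types.Unicode.convert_bind_param(self, value, engine)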
To build a type object from scratch, which will not have a corresponding database-specific implementation, subclass TypeEngine:
import sqlalchemy.types as types

class MyType(types.TypeEngine):
    def __init__(self, precision=8):
        self.precision = precision

    def get_col_spec(self):
        return "MYTYPE(%s)" % self.precision

    def convert_bind_param(self, value, engine):
        return value

    def convert_result_value(self, value, engine):
        return value
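The new type is then usable within table metadata just like the built-in types; a brief sketch:

mytable = Table('mytable', engine,
    Column('my_id', Integer, primary_key=True),

    # get_col_spec() above renders this as MYTYPE(16) in the CREATE TABLE statement
    Column('value', MyType(16))
)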
This section describes the connection pool module of SQLAlchemy. The Pool object it provides is normally embedded within an Engine instance. For most cases, explicit access to the pool module is not required. However, the Pool object can be used on its own, without the rest of SA, to manage DBAPI connections; this section describes that usage. Also, this section will describe in more detail how to customize the pooling strategy used by an Engine.
At the base of any database helper library is a system of efficiently acquiring connections to the database. Since the establishment of a database connection is typically a somewhat expensive operation, an application needs a way to get at database connections repeatedly without incurring the full overhead each time. Particularly for server-side web applications, a connection pool is the standard way to maintain a group of database connections which are reused across many requests. Connection pools are typically configured with a certain "size", which represents how many connections can be in use simultaneously without resorting to establishing brand-new connections.
Any DBAPI module can be "proxied" through the connection pool using the following technique (note that the usage of 'psycopg2' is just an example; substitute whatever DBAPI module you'd like):
import sqlalchemy.pool as pool
import psycopg2 as psycopg

psycopg = pool.manage(psycopg)

# then connect normally
connection = psycopg.connect(database='test', user='scott', password='tiger')
This produces a sqlalchemy.pool.DBProxy object which supports the same connect() function as the original DBAPI module. Upon connection, a connection proxy object is returned, which delegates its calls to a real DBAPI connection object. This connection object is stored persistently within a connection pool (an instance of sqlalchemy.pool.Pool) that corresponds to the exact connection arguments sent to the connect() function.
The connection proxy supports all of the methods on the original connection object, most of which are proxied via __getattr__(). The close() method will return the connection to the pool, and the cursor() method will return a proxied cursor object. Both the connection proxy and the cursor proxy will also return the underlying connection to the pool after they have both been garbage collected, which is detected via the __del__() method.
Additionally, when connections are returned to the pool, a rollback() is issued on the connection unconditionally. This is to release any locks still held by the connection that may have resulted from normal activity.
By default, the connect() method will return the same connection that is already checked out in the current thread. This allows a particular connection to be used in a given thread without needing to pass it around between functions. To disable this behavior, specify use_threadlocal=False to the manage() function.
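For example, a brief sketch:

import sqlalchemy.pool as pool
import psycopg2

# each call to connect() now checks out a distinct connection from the
# pool, rather than reusing the one already held by the current thread
psycopg = pool.manage(psycopg2, use_threadlocal=False)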
When proxying a DBAPI module through the pool module, options exist for how the connections should be pooled:
use_threadlocal=True - if set to True, repeated calls to connect() within the same application thread will return the same connection object that is already checked out, as described above. To instead guarantee a single distinct connection per thread, use the option poolclass=SingletonThreadPool, in which case the use_threadlocal parameter is not used.

poolclass=QueuePool - the Pool class used to maintain connections. The default QueuePool uses the Python Queue.Queue class to maintain a list of available connections. A developer can supply his or her own Pool class to supply a different pooling algorithm. Also included is the SingletonThreadPool, which provides a single distinct connection per thread and is required with SQLite.

pool_size=5 - used by QueuePool as well as SingletonThreadPool as of 0.2.7 - the size of the pool to be maintained. This is the largest number of connections that will be kept persistently in the pool. Note that the pool begins with no connections; once this number of connections is requested, that number of connections will remain.

max_overflow=10 - used by QueuePool - the maximum overflow size of the pool. When the number of checked-out connections reaches the size set in pool_size, additional connections will be returned up to this limit. When those additional connections are returned to the pool, they are disconnected and discarded. It follows then that the total number of simultaneous connections the pool will allow is pool_size + max_overflow, and the total number of "sleeping" connections the pool will allow is pool_size. max_overflow can be set to -1 to indicate no overflow limit; no limit will be placed on the total number of concurrent connections.

timeout=30 - used by QueuePool - the timeout before giving up on returning a connection, if none are available and the max_overflow has been reached.
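These options are passed through the manage() function along with the connect arguments; for example, a brief sketch (the particular option values here are arbitrary):

import sqlalchemy.pool as pool
import psycopg2

# proxy psycopg2 through a QueuePool that keeps at most 10 persistent
# connections, allows 20 more during bursts, and waits up to 60 seconds
# for a connection before giving up
psycopg = pool.manage(psycopg2, pool_size=10, max_overflow=20, timeout=60)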
One level below using a DBProxy to make transparent pools is creating the pool yourself. The pool module comes with two implementations of connection pools: QueuePool and SingletonThreadPool. While QueuePool uses Queue.Queue to provide connections, SingletonThreadPool provides a single per-thread connection which SQLite requires.
Constructing your own pool involves passing a callable used to create a connection. Through this method, custom connection schemes can be made, such as a connection that automatically executes some initialization commands at startup. The options from the previous section can be used as they apply to QueuePool or SingletonThreadPool.
import sqlalchemy.pool as pool
import psycopg2

def getconn():
    c = psycopg2.connect(user='ed', host='127.0.0.1', dbname='test')
    # execute an initialization function on the connection before returning
    c.cursor().execute("setup_encodings()")
    return c

p = pool.QueuePool(getconn, max_overflow=10, pool_size=5, use_threadlocal=True)
Or with SingletonThreadPool:
import sqlalchemy.pool as pool
import sqlite

def getconn():
    return sqlite.connect(filename='myfile.db')

# SQLite connections require the SingletonThreadPool
p = pool.SingletonThreadPool(getconn)
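Once constructed, the pool hands out connections via its connect() method; calling close() on the proxied connection returns it to the pool rather than actually closing it, as described earlier. A brief sketch (the table name is arbitrary):

# check a connection out of the pool
conn = p.connect()

cursor = conn.cursor()
cursor.execute("select * from sometable")

# return the connection to the pool
conn.close()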
SQLAlchemy has a variety of extensions and "mods" available which provide extra functionality to SA, either via explicit usage or by augmenting the core behavior.
Authors: Mike Bayer and Daniel Miller
Establishes threadlocal as the default strategy for new ComposedSQLEngine objects, installs a threadlocal SessionContext that is attached to all Mappers via a global MapperExtension, and establishes the global SessionContext under the name sqlalchemy.objectstore. Usually this is used in combination with Tables that are associated with BoundMetaData or DynamicMetaData, so that the Session does not need to be bound to any Engine explicitly.
import sqlalchemy.mods.threadlocal
from sqlalchemy import *

metadata = BoundMetaData('sqlite:///')

user_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(50), nullable=False)
)

class User(object):
    pass

mapper(User, user_table)

# thread local session
session = objectstore.get_session()

# "user" object is added to the session automatically
user = User()

session.flush()
All Mapper objects constructed after the threadlocal import will receive a default MapperExtension which implements the get_session() method, returning the Session that is associated with the current thread by the global SessionContext. All newly constructed objects will automatically be attached to the Session corresponding to the current thread, i.e. they will skip the "transient" state and go right to "pending".
This occurs because when a Mapper is first constructed for a class, it decorates the class's __init__() method in a manner like the following:
oldinit = class_.__init__   # the previous init method

def __init__(self):
    session = ext.get_session()   # get Session from this Mapper's MapperExtension
    if session is EXT_PASS:
        session = None
    if session is not None:
        session.save(self)        # attach to the current session
    oldinit(self)                 # call previous init method
An instance can be redirected at construction time to a different Session by specifying the keyword parameter _sa_session:
# create a new session distinct from the thread-local session
session = create_session()

# make a new User that is saved to this session
myuser = User(_sa_session=session)
Similarly, the entity_name parameter, which specifies an alternate Mapper to be used when attaching this instance to the Session, can be specified via _sa_entity_name:
myuser = User(_sa_session=session, _sa_entity_name='altentity')
The MapperExtension object's get_session() method is also used by the Query object to locate a Session with which to store newly loaded instances, if the Query is not already associated with a specific Session. As a result, the Query can be constructed standalone from a mapper or class:
# create a Query from a class
query = Query(User)

# specify entity name
query = Query(User, entity_name='foo')

# create a Query from a mapper
query = Query(mapper)
The objectstore is an instance of SessionContext, available in the sqlalchemy namespace, which provides a proxy to the underlying Session bound to the current thread. objectstore can be treated just like the Session itself:
objectstore.save(instance)
objectstore.flush()
objectstore.clear()
With get_session() handling the details of providing a Session in all cases, the assign_mapper function provides some of the functionality of Query and Session directly off the mapped instances themselves. This is a "monkeypatch" function that creates a primary mapper, attaches the mapper to the class, and also attaches the methods get, select, select_by, selectone, get_by, join_to, join_via, flush, delete, expire, refresh, expunge, merge, update, save, and save_or_update:
# "assign" a mapper to the User class/users table assign_mapper(User, users) # methods are attached to the class for selecting userlist = User.select_by(user_id=12) myuser = User.get(1) # mark an object as deleted for the next commit myuser.delete() # flush the changes on a specific object myotheruser.flush()
A more generic version of assign_mapper that works with any SessionContext is available in the assignmapper plugin.
The threadlocal mod also establishes threadlocal as the default strategy when calling the create_engine() function. This strategy is specified by the strategy keyword argument to create_engine(), and can still be overridden to be "plain" or "threadlocal" explicitly.
An Engine created with the threadlocal strategy will use a thread-locally managed connection object for all implicit statement executions and schema operations. Recall from Database Engines that an implicit execution is an execution where the Connection object is opened and closed internally, and the connect() method on Engine is not used; such as:
result = table.select().execute()
Above, the result variable holds onto a ResultProxy which is still referencing a connection returned by the connection pool. The threadlocal strategy means that a second execute() statement in the same thread will use the same connection as the one referenced by result, assuming result is still referenced in memory.
The Mapper, Session, and Query implementations work equally well with either the default or threadlocal engine strategies. However, using the threadlocal strategy means that Session operations will use the same underlying connection as that of straight execute() calls with constructed SQL objects:
# assume "threadlocal" strategy is enabled, and there is no transaction in progress result = table.select().execute() # 'result' references a DBAPI connection, bound to the current thread object = session.select() # the 'select' operation also uses the current thread's connection, # i.e. the same connection referenced by 'result' result.close() # return the connection to the pool. now there is no connection # associated with the current thread. the next execution will re-check out a # connection and re-attach to the current thread.
Author: Daniel Miller
This plugin is a generalized version of the objectstore object provided by the threadlocal plugin:
import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

ctx = SessionContext(sqlalchemy.create_session)

class User(object):
    pass

mapper(User, users_table, extension=ctx.mapperextension)

# 'u' is automatically added to the current session of 'ctx'
u = User()

# get the current session and flush
ctx.current.flush()
The construction of each Session instance can be customized by providing a "creation function" which returns a new Session. The "scope" to which the session is associated, which by default is the current thread, can be customized by providing a "scope callable" which returns a hashable key that represents the current scope:
import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

# create an engine
someengine = sqlalchemy.create_engine('sqlite:///')

# a function to return a Session bound to our engine
def make_session():
    return sqlalchemy.create_session(bind_to=someengine)

# global declaration of "scope"
scope = "scope1"

# a function to return the current "session scope"
def global_scope_func():
    return scope

# create SessionContext with our two functions
ctx = SessionContext(make_session, scopefunc=global_scope_func)

# get the session corresponding to "scope1", bound to engine "someengine":
session = ctx.current

# switch the "scope"
scope = "scope2"

# get the session corresponding to "scope2", bound to engine "someengine":
session = ctx.current
Author: Mike Bayer
This is a generic version of the assign_mapper function present in the threadlocal mod. It requires an explicit SessionContext:
import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext
from sqlalchemy.ext.assignmapper import assign_mapper

# session context
ctx = SessionContext(sqlalchemy.create_session)

# assign mapper to class MyClass using table 'sometable', getting
# Sessions from 'ctx'.
assign_mapper(ctx, MyClass, sometable)
Author: Jonathan LaCour
ActiveMapper is a so-called "declarative layer" which allows the construction of a class, a Table, and a Mapper all in one step:
class Person(ActiveMapper):
    class mapping:
        id          = column(Integer, primary_key=True)
        full_name   = column(String)
        first_name  = column(String)
        middle_name = column(String)
        last_name   = column(String)
        birth_date  = column(DateTime)
        ssn         = column(String)
        gender      = column(String)
        home_phone  = column(String)
        cell_phone  = column(String)
        work_phone  = column(String)
        prefs_id    = column(Integer, foreign_key=ForeignKey('preferences.id'))
        addresses   = one_to_many('Address', colname='person_id', backref='person')
        preferences = one_to_one('Preferences', colname='prefs_id', backref='person')

    def __str__(self):
        s =  '%s\n' % self.full_name
        s += '  * birthdate: %s\n' % (self.birth_date or 'not provided')
        s += '  * fave color: %s\n' % (self.preferences.favorite_color or 'Unknown')
        s += '  * personality: %s\n' % (self.preferences.personality_type or 'Unknown')
        for address in self.addresses:
            s += '  * address: %s\n' % address.address_1
            s += '             %s, %s %s\n' % (address.city, address.state, address.postal_code)
        return s

class Preferences(ActiveMapper):
    class mapping:
        __table__        = 'preferences'
        id               = column(Integer, primary_key=True)
        favorite_color   = column(String)
        personality_type = column(String)

class Address(ActiveMapper):
    class mapping:
        id          = column(Integer, primary_key=True)
        type        = column(String)
        address_1   = column(String)
        city        = column(String)
        state       = column(String)
        postal_code = column(String)
        person_id   = column(Integer, foreign_key=ForeignKey('person.id'))
More discussion on ActiveMapper can be found at Jonathan LaCour's Blog as well as the SQLAlchemy Wiki.
Author: Jonathan Ellis
SqlSoup creates mapped classes on the fly from tables, which are automatically reflected from the database based on name. It is essentially a nicer version of the "row data gateway" pattern.
>>> from sqlalchemy.ext.sqlsoup import SqlSoup
>>> soup = SqlSoup('sqlite:///')
>>> users = soup.users.select()
>>> users.sort()
>>> users
[Class_Users(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1),
 Class_Users(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0)]
Read more about SqlSoup on Jonathan Ellis' Blog.
Author: Jason Pellerin
The ProxyEngine is used to "wrap" an Engine, and via subclassing ProxyEngine one can instrument the functionality of an arbitrary Engine instance through the decorator pattern. It also provides a connect() method which will send all Engine requests to different underlying engines. Its functionality in that regard is largely superseded now by DynamicMetaData, which is a better solution.
from sqlalchemy.ext.proxy import ProxyEngine

proxy = ProxyEngine()

proxy.connect('postgres://user:pw@host/db')
Author: Jonas Borgström
SelectResults gives transformative behavior to the results returned from the select and select_by methods of Query. It supports three modes of operation: per-query, per-mapper, and per-application.
from sqlalchemy.ext.selectresults import SelectResults

query = session.query(MyClass)

res = SelectResults(query, table.c.column == "something")

# add an order clause
res = res.order_by([table.c.column])

# fetch and print the top ten instances
for x in res[:10]:
    print x.column2

# execute the query
x = list(res)

# count how many instances have column2 > 42
# and column == "something"
print res.filter(table.c.column2 > 42).count()
Per mapper:
from sqlalchemy.ext.selectresults import SelectResultsExt

mapper(MyClass, mytable, extension=SelectResultsExt())

session.query(MyClass).select(mytable.c.column=="something").order_by([mytable.c.column])[2:7]
Or across an application via the selectresults mod:
import sqlalchemy.mods.selectresults

mapper(MyClass, mytable)

session.query(MyClass).select(mytable.c.column=="something").order_by([mytable.c.column])[2:7]
For a full listing of methods, see the generated documentation.
defines the base components of SQL expression trees.
represents a 'thing that can produce Compiled objects and execute them'.
represents the behavior of a particular database. Used by Compiled objects.
represents a dictionary/iterator of bind parameter key names/values. Includes parameters compiled with a Compiled object as well as additional arguments passed to the Compiled object's get_params() method. Parameter values will be converted as per the TypeEngine objects present in the bind parameter objects. The non-converted value can be retrieved via the get_original method. For Compiled objects that compile positional parameters, the values() iteration of the object will return the parameter values in the correct order.
represents a compiled SQL expression. the __str__ method of the Compiled object should produce the actual text of the statement. Compiled objects are specific to the database library that created them, and also may or may not be specific to the columns referenced within a particular set of bind parameters. In no case should the Compiled object be dependent on the actual values of those bind parameters, even though it may reference those values as defaults.
base class for elements of a programmatically constructed SQL expression.
represents a textual column clause in a SQL statement. May or may not be bound to an underlying Selectable.
the schema module provides the building blocks for database metadata. This means all the entities within a SQL database that we might want to look at, modify, or create and delete are described by these objects, in a database-agnostic way.
A structure of SchemaItems also provides a "visitor" interface which is the primary method by which other methods operate upon the schema. The SQL package extends this structure with its own clause-specific objects as well as the visitor interface, so that the schema package "plugs in" to the SQL package.
builds upon MetaData to provide the capability to bind to an Engine implementation.
represents a column in a database table. this is a subclass of sql.ColumnClause and represents an actual existing column in a table, in a similar fashion as TableClause/Table.
A plain default value on a column. this could correspond to a constant, a callable function, or a SQL clause.
builds upon MetaData to provide the capability to bind to multiple Engine implementations on a dynamically alterable, thread-local basis.
defines a column-level ForeignKey constraint between two columns.
ForeignKey is specified as an argument to a Column object.
One or more ForeignKey objects are used within a ForeignKeyConstraint object which represents the table-level constraint definition.
table-level foreign key constraint, represents a collection of ForeignKey objects.
Represents an index of columns from a database table
represents a collection of Tables and their associated schema constructs.
a default that takes effect on the database side
represents a sequence, which applies to Oracle and Postgres databases.
represents a relational database table. This subclasses sql.TableClause to provide a table that is "wired" to an engine. Whereas TableClause represents a table as it's used in a SQL expression, Table represents a table as it's created in the database.
Be sure to look at sqlalchemy.sql.TableImpl for additional methods defined on a Table.
interface for an object that can provide an Engine and a Connection object which corresponds to that Engine.
Connects a ConnectionProvider, a Dialect and a CompilerFactory together to provide a default implementation of SchemaEngine.
represents a single DBAPI connection returned from the underlying connection pool. Provides execution support for string-based SQL statements as well as ClauseElement, Compiled and DefaultGenerator objects. provides a begin method to return Transaction objects.
Adds behavior to the execution of queries to provide support for column defaults, differences between paramstyles, quirks in post-execution behavior, and generally consistent behavior across various DBAPIs.
The Dialect should also implement the following two attributes:
positional - True if the paramstyle for this Dialect is positional
paramstyle - the paramstyle to be used (some DBAPIs support multiple paramstyles)
supports_autoclose_results - usually True; if False, indicates that rows returned by fetchone() might not be just plain tuples, and may be "live" proxy objects which still require the cursor to be open in order to be read (such as pyPgSQL which has active filehandles for BLOBs). in that case, an auto-closing ResultProxy cannot automatically close itself after results are consumed.
convert_unicode - True if unicode conversion should be applied to all str types
encoding - type of encoding to use for unicode, usually defaults to 'utf-8'
defines an interface that returns raw Connection objects (or compatible).
a messenger object for a Dialect that corresponds to a single execution. The Dialect should provide an ExecutionContext via the create_execution_context() method. The pre_exec and post_exec methods will be called for compiled statements, after which it is expected that the various methods last_inserted_ids, last_inserted_params, etc. will contain appropriate values, if applicable.
wraps a DBAPI cursor object to provide access to row columns based on integer position, case-insensitive column name, or by schema.Column object. e.g.:
row = fetchone()
col1 = row[0] # access via integer position
col2 = row['col2'] # access via name
col3 = row[mytable.c.mycol] # access via Column object.
ResultProxy also contains a map of TypeEngine objects and will invoke the appropriate convert_result_value() method before returning columns.
defines different strategies for creating new instances of sql.Engine. by default there are two, one which is the "thread-local" strategy, one which is the "plain" strategy. new strategies can be added by constructing a new EngineStrategy object which will add itself to the list of available strategies, or by replacing one of the existing names. this can be accomplished via a mod; see the sqlalchemy/mods package for details.
defines a function that receives input arguments and produces an instance of sql.Engine, typically an instance of sqlalchemy.engine.base.ComposedSQLEngine or a subclass.
the mapper package provides object-relational functionality, building upon the schema and sql packages and tying operations to class properties and constructors.
Persists object instances to and from schema.Table objects via the sql package. Instances of this class should be constructed through this package's mapper() or relation() function.
base implementation for an object that provides overriding behavior to various Mapper functions. For each method in MapperExtension, a result of EXT_PASS indicates the functionality is not overridden.
encapsulates the object-fetching operations provided by Mappers.
encapsulates a set of objects being operated upon within an object-relational operation.
provides a connection pool implementation, which optionally manages connections on a thread local basis. Also provides a DBAPI2 transparency layer so that pools can be managed automatically, based on module type and connect arguments, simply by calling regular DBAPI connect() methods.
proxies a DBAPI2 connect() call to a pooled connection keyed to the specific connect parameters.
uses Queue.Queue to maintain a fixed-size list of connections.
Maintains one connection per thread, never moving a connection to another thread. this is used for SQLite.
A simple wrapper for ScopedRegistry that provides a "current" property which can be used to get, set, or remove the session in the current scope.
By default this object provides thread-local scoping, which is the default scope provided by sqlalchemy.util.ScopedRegistry.
Usage:

engine = create_engine(...)

def session_factory():
    return Session(bind_to=engine)

context = SessionContext(session_factory)

s = context.current                           # get thread-local session
context.current = Session(bind_to=other_engine)   # set current session
del context.current                           # discard the thread-local session (a new one will
                                              # be created on the next call to context.current)
a mapper extension that provides sessions to a mapper using SessionContext
Builds a query one component at a time via separate method calls, each call transforming the previous SelectResults instance into a new SelectResults instance with further limiting criterion added. When interpreted in an iterator context (such as via calling list(selectresults)), executes the query.
a MapperExtension that provides SelectResults functionality for the results of query.select_by() and query.select()
raised for all those conditions where invalid arguments are sent to constructed objects. This error generally corresponds to construction time state errors.
corresponds to internal state being detected as invalid
sqlalchemy was asked to do something it can't do, return nonexistent data, etc. This error generally corresponds to runtime state errors.
sqlalchemy was asked to load a table's definition from the database, but the table doesn't exist.
raised when the execution of a SQL statement fails. includes accessors for the underlying exception, as well as the SQL and bind parameters
raised when a connection pool times out on getting a connection
An SQLEngine proxy that automatically connects when necessary.
Engine proxy for lazy and late initialization. This engine will delegate access to a real engine set with connect().