The core of SQLAlchemy's query and object mapping operations is database metadata, which are Python objects that describe tables and other schema-level objects. Metadata objects can be created by explicitly naming the various components and their properties, using the Table, Column, ForeignKey, Index, and Sequence objects imported from sqlalchemy.schema
. There is also support for reflection, which means you only specify the name of the entities and they are recreated from the database automatically.
A collection of metadata entities is stored in an object aptly named MetaData
. This object takes an optional name
parameter:
from sqlalchemy import * metadata = MetaData(name='my metadata')
Then to construct a Table, use the Table
class:
users = Table('users', metadata, Column('user_id', Integer, primary_key = True), Column('user_name', String(16), nullable = False), Column('email_address', String(60), key='email'), Column('password', String(20), nullable = False) ) user_prefs = Table('user_prefs', metadata, Column('pref_id', Integer, primary_key=True), Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False), Column('pref_name', String(40), nullable=False), Column('pref_value', String(100)) )
The specific datatypes for each Column, such as Integer, String, etc. are described in The Types System, and exist within the module sqlalchemy.types
as well as the global sqlalchemy
namespace.
Foreign keys are most easily specified by the ForeignKey
object within a Column
object. For a composite foreign key, i.e. a foreign key that contains multiple columns referencing multiple columns to a composite primary key, an explicit syntax is provided which allows the correct table CREATE statements to be generated:
# a table with a composite primary key invoices = Table('invoices', metadata, Column('invoice_id', Integer, primary_key=True), Column('ref_num', Integer, primary_key=True), Column('description', String(60), nullable=False) ) # a table with a composite foreign key referencing the parent table invoice_items = Table('invoice_items', metadata, Column('item_id', Integer, primary_key=True), Column('item_name', String(60), nullable=False), Column('invoice_id', Integer, nullable=False), Column('ref_num', Integer, nullable=False), ForeignKeyConstraint(['invoice_id', 'ref_num'], ['invoices.invoice_id', 'invoices.ref_num']) )
Above, the invoice_items
table will have ForeignKey
objects automatically added to the invoice_id
and ref_num
Column
objects as a result of the additional ForeignKeyConstraint
object.
The MetaData
object supports some handy methods, such as getting a list of Tables in the order (or reverse) of their dependency:
>>> for t in metadata.table_iterator(reverse=False): ... print t.name users user_prefs
And Table
provides an interface to the table's properties as well as that of its columns:
employees = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('employee_name', String(60), nullable=False, key='name'), Column('employee_dept', Integer, ForeignKey("departments.department_id")) ) # access the column "EMPLOYEE_ID": employees.columns.employee_id # or just employees.c.employee_id # via string employees.c['employee_id'] # iterate through all columns for c in employees.c: # ... # get the table's primary key columns for primary_key in employees.primary_key: # ... # get the table's foreign key objects: for fkey in employees.foreign_keys: # ... # access the table's MetaData: employees.metadata # access the table's Engine, if its MetaData is bound: employees.engine # access a column's name, type, nullable, primary key, foreign key employees.c.employee_id.name employees.c.employee_id.type employees.c.employee_id.nullable employees.c.employee_id.primary_key employees.c.employee_dept.foreign_key # get the "key" of a column, which defaults to its name, but can # be any user-defined string: employees.c.name.key # access a column's table: employees.c.employee_id.table is employees >>> True # get the table related by a foreign key fcolumn = employees.c.employee_dept.foreign_key.column.table
A MetaData object can be associated with one or more Engine instances. This allows the MetaData and the elements within it to perform operations automatically, using the connection resources of that Engine. This includes being able to "reflect" the columns of tables, as well as to perform create and drop operations without needing to pass an Engine
or Connection
around. It also allows SQL constructs to be created which know how to execute themselves (called "implicit execution").
To bind MetaData
to a single Engine
, use BoundMetaData
:
engine = create_engine('sqlite://', **kwargs) # create BoundMetaData from an Engine meta = BoundMetaData(engine) # create the Engine and MetaData in one step meta = BoundMetaData('postgres://db/', **kwargs)
Another form of MetaData
exists which allows connecting to any number of engines, within the context of the current thread. This is DynamicMetaData
:
meta = DynamicMetaData() meta.connect(engine) # connect to an existing Engine meta.connect('mysql://user@host/dsn') # create a new Engine and connect
DynamicMetaData
is ideal for applications that need to use the same set of Tables
for many different database connections in the same process, such as a CherryPy web application which handles multiple application instances in one process.
Some users prefer to create Table
objects without specifying a MetaData
object, having Tables scoped on an application-wide basis. For them the default_metadata
object and the global_connect()
function is supplied. default_metadata
is simply an instance of DynamicMetaData
that exists within the sqlalchemy
namespace, and global_connect()
is a synonym for default_metadata.connect()
. Defining a Table
that has no MetaData
argument will automatically use this default metadata as follows:
from sqlalchemy import * # a Table with just a name and its Columns mytable = Table('mytable', Column('col1', Integer, primary_key=True), Column('col2', String(40)) ) # connect all the "anonymous" tables to a postgres uri in the current thread global_connect('postgres://foo:bar@lala/test') # create all tables in the default metadata default_metadata.create_all() # the table is bound mytable.insert().execute(col1=5, col2='some value')
Once you have a BoundMetaData
or a connected DynamicMetaData
, you can create Table
objects without specifying their columns, just their names, using autoload=True
:
>>> messages = Table('messages', meta, autoload = True) >>> [c.name for c in messages.columns] ['message_id', 'message_name', 'date']
At the moment the Table is constructed, it will query the database for the columns and constraints of the messages
table.
Note that if a reflected table has a foreign key referencing another table, then the metadata for the related table will be loaded as well, even if it has not been defined by the application:
>>> shopping_cart_items = Table('shopping_cart_items', meta, autoload = True) >>> print shopping_cart_items.c.cart_id.table.name shopping_carts
To get direct access to 'shopping_carts', simply instantiate it via the Table constructor. You'll get the same instance of the shopping cart Table as the one that is attached to shoppingcartitems:
>>> shopping_carts = Table('shopping_carts', meta) >>> shopping_carts is shopping_cart_items.c.cart_id.table.name True
This works because when the Table constructor is called for a particular name and MetaData
object, if the table has already been created then the instance returned will be the same as the original. This is a singleton constructor:
>>> news_articles = Table('news', meta, ... Column('article_id', Integer, primary_key = True), ... Column('url', String(250), nullable = False) ... ) >>> othertable = Table('news', meta) >>> othertable is news_articles True
Individual columns can be overridden with explicit values when reflecting tables; this is handy for specifying custom datatypes, constraints such as primary keys that may not be configured within the database, etc.
>>> mytable = Table('mytable', meta, ... Column('id', Integer, primary_key=True), # override reflected 'id' to have primary key ... Column('mydata', Unicode(50)), # override reflected 'mydata' to be Unicode ... autoload=True)
Some databases support the concept of multiple schemas. A Table
can reference this by specifying the schema
keyword argument:
financial_info = Table('financial_info', meta, Column('id', Integer, primary_key=True), Column('value', String(100), nullable=False), schema='remote_banks' )
Within the MetaData
collection, this table will be identified by the combination of financial_info
and remote_banks
. If another table called financial_info
is referenced without the remote_banks
schema, it will refer to a different Table
. ForeignKey
objects can reference columns in this table using the form remote_banks.financial_info.id
.
Tables
may support database-specific options, such as MySQL's engine
option that can specify "MyISAM", "InnoDB", and other backends for the table:
addresses = Table('engine_email_addresses', meta, Column('address_id', Integer, primary_key = True), Column('remote_user_id', Integer, ForeignKey(users.c.user_id)), Column('email_address', String(20)), mysql_engine='InnoDB' )
Creating and dropping individual tables can be done via the create()
and drop()
methods of Table
; these methods take an optional engine
parameter which references an Engine
or a Connection
. If not supplied, the Engine
bound to the MetaData
will be used, else an error is raised:
meta = BoundMetaData('sqlite:///:memory:') employees = Table('employees', meta, Column('employee_id', Integer, primary_key=True), Column('employee_name', String(60), nullable=False, key='name'), Column('employee_dept', Integer, ForeignKey("departments.department_id")) ) sqlemployees.create()
drop()
method:
sqlemployees.drop(engine=e)
The create()
and drop()
methods also support an optional keyword argument checkfirst
which will issue the database's appropriate pragma statements to check if the table exists before creating or dropping:
employees.create(engine=e, checkfirst=True) employees.drop(checkfirst=False)
Entire groups of Tables can be created and dropped directly from the MetaData
object with create_all()
and drop_all()
. These methods always check for the existence of each table before creating or dropping. Each method takes an optional engine
keyword argument which can reference an Engine
or a Connection
. If no engine is specified, the underlying bound Engine
, if any, is used:
engine = create_engine('sqlite:///:memory:') metadata = MetaData() users = Table('users', metadata, Column('user_id', Integer, primary_key = True), Column('user_name', String(16), nullable = False), Column('email_address', String(60), key='email'), Column('password', String(20), nullable = False) ) user_prefs = Table('user_prefs', metadata, Column('pref_id', Integer, primary_key=True), Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False), Column('pref_name', String(40), nullable=False), Column('pref_value', String(100)) ) sqlmetadata.create_all(engine=engine)
SQLAlchemy includes flexible constructs in which to create default values for columns upon the insertion of rows, as well as upon update. These defaults can take several forms: a constant, a Python callable to be pre-executed before the SQL is executed, a SQL expression or function to be pre-executed before the SQL is executed, a pre-executed Sequence (for databases that support sequences), or a "passive" default, which is a default function triggered by the database itself upon insert, the value of which can then be post-fetched by the engine, provided the row provides a primary key in which to call upon.
A basic default is most easily specified by the "default" keyword argument to Column. This defines a value, function, or SQL expression that will be pre-executed to produce the new value, before the row is inserted:
# a function to create primary key ids i = 0 def mydefault(): global i i += 1 return i t = Table("mytable", meta, # function-based default Column('id', Integer, primary_key=True, default=mydefault), # a scalar default Column('key', String(10), default="default") )
The "default" keyword can also take SQL expressions, including select statements or direct function calls:
t = Table("mytable", meta, Column('id', Integer, primary_key=True), # define 'create_date' to default to now() Column('create_date', DateTime, default=func.now()), # define 'key' to pull its default from the 'keyvalues' table Column('key', String(20), default=keyvalues.select(keyvalues.c.type='type1', limit=1)) )
The "default" keyword argument is shorthand for using a ColumnDefault object in a column definition. This syntax is optional, but is required for other types of defaults, futher described below:
Column('mycolumn', String(30), ColumnDefault(func.get_data()))
Similar to an on-insert default is an on-update default, which is most easily specified by the "onupdate" keyword to Column, which also can be a constant, plain Python function or SQL expression:
t = Table("mytable", meta, Column('id', Integer, primary_key=True), # define 'last_updated' to be populated with current_timestamp (the ANSI-SQL version of now()) Column('last_updated', DateTime, onupdate=func.current_timestamp()), )
To use an explicit ColumnDefault object to specify an on-update, use the "for_update" keyword argument:
Column('mycolumn', String(30), ColumnDefault(func.get_data(), for_update=True))
A PassiveDefault indicates an column default that is executed upon INSERT by the database. This construct is used to specify a SQL function that will be specified as "DEFAULT" when creating tables.
t = Table('test', meta, Column('mycolumn', DateTime, PassiveDefault("sysdate")) )
A create call for the above table will produce:
CREATE TABLE test ( mycolumn datetime default sysdate )
PassiveDefault also sends a message to the Engine
that data is available after an insert. The object-relational mapper system uses this information to post-fetch rows after the insert, so that instances can be refreshed with the new data. Below is a simplified version:
# table with passive defaults mytable = Table('mytable', engine, Column('my_id', Integer, primary_key=True), # an on-insert database-side default Column('data1', Integer, PassiveDefault("d1_func")), # an on-update database-side default Column('data2', Integer, PassiveDefault("d2_func", for_update=True)) ) # insert a row r = mytable.insert().execute(name='fred') # check the result: were there defaults fired off on that row ? if r.lastrow_has_defaults(): # postfetch the row based on primary key. # this only works for a table with primary key columns defined primary_key = r.last_inserted_ids() row = table.select(table.c.id == primary_key[0])
When Tables are reflected from the database using autoload=True
, any DEFAULT values set on the columns will be reflected in the Table object as PassiveDefault instances.
Current Postgres support does not rely upon OID's to determine the identity of a row. This is because the usage of OIDs has been deprecated with Postgres and they are disabled by default for table creates as of PG version 8. Pyscopg2's "cursor.lastrowid" function only returns OIDs. Therefore, when inserting a new row which has passive defaults set on the primary key columns, the default function is still pre-executed since SQLAlchemy would otherwise have no way of retrieving the row just inserted.
A table with a sequence looks like:
table = Table("cartitems", meta, Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True), Column("description", String(40)), Column("createdate", DateTime()) )
The Sequence is used with Postgres or Oracle to indicate the name of a database sequence that will be used to create default values for a column. When a table with a Sequence on a column is created in the database by SQLAlchemy, the database sequence object is also created. Similarly, the database sequence is dropped when the table is dropped. Sequences are typically used with primary key columns. When using Postgres, if an integer primary key column defines no explicit Sequence or other default method, SQLAlchemy will create the column with the SERIAL keyword, and will pre-execute a sequence named "tablenamecolumnnameseq" in order to retrieve new primary key values, if they were not otherwise explicitly stated. Oracle, which has no "auto-increment" keyword, requires that a Sequence be created for a table if automatic primary key generation is desired.
A Sequence object can be defined on a Table that is then used for a non-sequence-supporting database. In that case, the Sequence object is simply ignored. Note that a Sequence object is entirely optional for all databases except Oracle, as other databases offer options for auto-creating primary key values, such as AUTOINCREMENT, SERIAL, etc. SQLAlchemy will use these default methods for creating primary key values if no Sequence is present on the table metadata.
A sequence can also be specified with optional=True
which indicates the Sequence should only be used on a database that requires an explicit sequence, and not those that supply some other method of providing integer values. At the moment, it essentially means "use this sequence only with Oracle and not Postgres".
Indexes can be defined on table columns, including named indexes, non-unique or unique, multiple column. Indexes are included along with table create and drop statements. They are not used for any kind of run-time constraint checking; SQLAlchemy leaves that job to the expert on constraint checking, the database itself.
boundmeta = BoundMetaData('postgres:///scott:tiger@localhost/test') mytable = Table('mytable', boundmeta, # define a unique index Column('col1', Integer, unique=True), # define a unique index with a specific name Column('col2', Integer, unique='mytab_idx_1'), # define a non-unique index Column('col3', Integer, index=True), # define a non-unique index with a specific name Column('col4', Integer, index='mytab_idx_2'), # pass the same name to multiple columns to add them to the same index Column('col5', Integer, index='mytab_idx_2'), Column('col6', Integer), Column('col7', Integer) ) # create the table. all the indexes will be created along with it. mytable.create() # indexes can also be specified standalone i = Index('mytab_idx_3', mytable.c.col6, mytable.c.col7, unique=False) # which can then be created separately (will also get created with table creates) i.create()
A Table
object created against a specific MetaData
object can be re-created against a new MetaData using the tometadata
method:
# create two metadata meta1 = BoundMetaData('sqlite:///querytest.db') meta2 = MetaData() # load 'users' from the sqlite engine users_table = Table('users', meta1, autoload=True) # create the same Table object for the plain metadata users_table_2 = users_table.tometadata(meta2)