| misc

Want to store some JSON in a database with SQLAlchemy?

TL;DR: scroll down for the final solution.

First attempt

Many RDBMS (MySQL included) do not natively support JSON types sadface.

We can use sqlalchemy.types.TypeDecorator to create a custom data type that implicitly casts itself from dict to str and back. This is also called ‘coercing’ (implicitly casting data types).

The column in the database is VARCHAR or TEXT (a string) but in Python, it is a dictionary.

In the following snippet, the class TextPickleType is responsible for translating between the two:

import json
import sqlalchemy
from sqlalchemy.types import TypeDecorator

SIZE = 256

class TextPickleType(TypeDecorator):
    impl = sqlalchemy.Text(SIZE)

    def process_bind_param(self, value, dialect):
        if value is not None:
            value = json.dumps(value)

        return value

    def process_result_value(self, value, dialect):
        if value is not None:
            value = json.loads(value)
        return value

class SomeModel(Base):
    __tablename__ = 'the_table'
    id = Column(Integer, primary_key=True)
    json_field = Column(TextPickleType())

sm = SomeModel(json_field={'data': {'nested': 'value'}})
session.add(sm)
session.commit()
session.flush()

The problem with this approach is that SQLAlchemy does not automatically track changes to the dictionary for you:

# does not work
sm = session.query(SomeModel).first()
sm.json_field['data']['nested'] = 'value2'
session.commit()
session.flush()

To get around this problem, you could manually mark the object as ‘dirty’ so SQLAlchemy knows it needs to be updated:

from sqlalchemy.orm.attributes import flag_modified

# works
sm = session.query(SomeModel).first()
sm.json_field['data']['nested'] = 'value2'

flag_modified(sm, 'json_field')

session.commit()
session.flush()

Manually marking objects as dirty is no fun at all, so let’s make a class that does this for us.

Tracking changes

We’ll introduce an instance of sqlalchemy.ext.mutable.Mutable, which is a handy class that will track changes to the dictionary by automatically setting flag_modified:

import json

import sqlalchemy
from sqlalchemy.ext.mutable import Mutable
from sqlalchemy.types import TypeDecorator

SIZE = 256

class TextPickleType(TypeDecorator):
    impl = sqlalchemy.Text(SIZE)

    def process_bind_param(self, value, dialect):
        if value is not None:
            value = json.dumps(value)

        return value

    def process_result_value(self, value, dialect):
        if value is not None:
            value = json.loads(value)
        return value

class MutableDict(Mutable, dict):
    @classmethod
    def coerce(cls, key, value):
        "Convert plain dictionaries to MutableDict."

        if not isinstance(value, MutableDict):
            if isinstance(value, dict):
                return MutableDict(value)
            return Mutable.coerce(key, value)
        else:
            return value

    def __setitem__(self, key, value):
        "Detect dictionary set events and emit change events."
        dict.__setitem__(self, key, value)
        self.changed()

    def __delitem__(self, key):
        "Detect dictionary del events and emit change events."
        dict.__delitem__(self, key)
        self.changed()


class SomeModel(Base):
    __tablename__ = 'the_table'
    id = Column(Integer, primary_key=True)
    json_field = Column(MutableDict.as_mutable(TextPickleType))

But unfortunately, this only works for the first key(s) in the column json_field:

# works
sm = session.query(SomeModel).first()
sm.json_field['data'] = 'value2'

# does not work
sm = session.query(SomeModel).first()
sm.json_field['data']['nested'] = 'value2'

So how can we track changes to nested keys and values?

Tracking nested changes

The following blogpost (link) by Elmer de Looff is dedicated to ‘deep tracking’ of objects so that SQLAlchemy knows they’ve been mutated. Elmer made a library for this called SQLAlchemy-json.

pip install sqlalchemy-json

It has the following features:

  • Track nested changes
  • Uses JSONType from the package sqlalchemy_utils to provide support depending on the underlying RDBMS implementation (coercing when JSON is not natively supported).
  • A remove function for TrackedList

The following full snippet should be everything you need to store JSON in a database:

import json
import os

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy_utils import JSONType
from sqlalchemy_utils import force_auto_coercion
from sqlalchemy_json import NestedMutable
force_auto_coercion()

if os.path.isfile('/tmp/test.db'):
    os.remove('/tmp/test.db')
ENGINE = create_engine('sqlite:////tmp/test.db')

Base = declarative_base()


class SomeModel(Base):
    __tablename__ = 'the_table'
    id = Column(Integer, primary_key=True)
    json_field = Column(NestedMutable.as_mutable(JSONType))

Example usage:

# create tables
Base.metadata.create_all(bind=ENGINE)

# create session
Session = sessionmaker(autoflush=False, autocommit=False)
Session.configure(bind=ENGINE)
session = Session()

# adds a database obj
sm = SomeModel(json_field={'data': {'nested': 'value'}})
session.add(sm)
session.commit()
session.flush()

# verify that its there
sm = session.query(SomeModel).first()
print(json.dumps(sm.json_field, indent=4, sort_keys=True))

# modify a nested value
sm = session.query(SomeModel).first()
sm.json_field['data']['nested'] = 'value2'
session.commit()
session.flush()

# observe the change
sm = session.query(SomeModel).first()
print(json.dumps(sm.json_field, indent=4, sort_keys=True))

One last thing; please be aware that this code won’t work on Postgres 9.1 and lower, as the JSON type was introduced in 9.2. It works fine for the rest of the databases that SQLAlchemy supports.

Have fun!