Last updated Dec 8, 2017

Hibernate session and transaction management for bulk operations

These are guidelines related to the development of Confluence. The guidelines mainly apply to Atlassian employees, but reading them should provide insight for third-party plugin developers as well, so we decided to make them public.

This page describes the best practice for managing the Hibernate 2 flushing and clearing process when performing operations on large numbers of objects in Confluence. For general information about dealing with Hibernate and Spring in normal situations, see the Hibernate Sessions and Transaction Management Guidelines.

Understanding the underlying mechanisms Hibernate uses to track state is critical to understanding how this manual session and transaction management works. These details of Hibernate are described below, after a quick overview and sample code showing how to work around the problem in practice.

The problem

One significant problem with ORMs like Hibernate is that they, by design, keep a reference to every object retrieved from the database for the length of the session. When you are performing an operation on a large dataset, this means that the objects added or retrieved by Hibernate aren't eligible for garbage collection. In various places in Confluence, we work around this with manual session and transaction management to limit the amount of memory needed for bulk operations.

The solution: how to ensure memory is released from Hibernate

In order to ensure you don't retain objects in the Hibernate session, you need to:

  • commit the active transaction
    • this automatically "flushes the session", synchronising any changes made to Hibernate objects to the database, as well as committing those changes
  • clear the Hibernate session.

Using the native Hibernate and Spring library classes, this amounts to the following code. Put your batch-processing logic inside the callback passed to the TransactionTemplate execute method.

import net.sf.hibernate.SessionFactory;
import org.springframework.orm.hibernate.SessionFactoryUtils;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionDefinition;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.interceptor.DefaultTransactionAttribute;
import org.springframework.transaction.support.TransactionCallback;
import org.springframework.transaction.support.TransactionTemplate;

public class MyAction
{
    /** transaction settings that suspend the existing transaction and start a new one that will commit independently */
    private static final TransactionDefinition REQUIRES_NEW_TRANSACTION =
        new DefaultTransactionAttribute(TransactionDefinition.PROPAGATION_REQUIRES_NEW);

    private final SessionFactory sessionFactory;
    private final PlatformTransactionManager transactionManager;

    public MyAction(SessionFactory sessionFactory, PlatformTransactionManager transactionManager) {
        this.sessionFactory = sessionFactory;
        this.transactionManager = transactionManager;
    }

    public void execute() {
        // iterate over your batches (`batches` is a placeholder for your own collection of work units)
        for (final Object batch : batches)
        {
            new TransactionTemplate(transactionManager, REQUIRES_NEW_TRANSACTION).execute(new TransactionCallback()
            {
                @Override
                public Object doInTransaction(TransactionStatus status)
                {

                    // ... process batch of objects ...

                    return null;
                }
            });
            SessionFactoryUtils.getSession(sessionFactory, false).clear();
        }
    }
}

Committing the active transaction ensures the session is flushed before the commit happens. It also ensures the executions list in the session doesn't maintain a reference to any persistent objects. Clearing the session ensures that any objects attached to the session in the ID-to-object mappings are no longer referenced. See below for more information about why these can cause problems.

To be confident that you are not committing changes made in the transaction by something higher in the stack, this code opens a new transaction with the propagation setting of REQUIRES_NEW. This suspends the higher-level transaction and commits only those changes made at the lower level.
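The sample code above iterates over `batches` without showing where they come from. As a minimal sketch, assuming the work can be described by a list of object IDs held in memory, the batches could be built as follows. The class name `BatchPartitioner` and its `partition` method are hypothetical and are not part of Confluence, Hibernate or Spring. Partitioning IDs rather than loaded objects is a suggestion here: any persistent objects your own code keeps in a list stay strongly referenced no matter what the session does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a list of work items (for example, object IDs) into fixed-size batches. */
class BatchPartitioner {

    /** Returns consecutive sublists of at most batchSize elements, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch would then be loaded and processed inside its own transaction callback, so that the objects for one batch can be released before the next batch starts.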

Because the session is cleared at the completion of each batch, changes made higher in the stack to objects attached to the Hibernate session will be discarded. For this reason, you should normally run bulk operations on a separate thread. The thread should do its own session management as described in the Hibernate session management guidelines. Most of the places in Confluence where bulk operations occur run either on a separate thread or in upgrade tasks outside the scope of any request to avoid this problem.
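One way to move a bulk operation off the calling thread is sketched below using only the JDK's `ExecutorService`. The class `BulkOperationRunner` and the thread name `bulk-operation` are hypothetical. The session setup and teardown that the worker thread needs is deliberately left as comments, because it depends on the session management guidelines mentioned above and is not shown here.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical runner: executes a bulk operation on its own thread and waits for it to finish. */
class BulkOperationRunner {

    /** Runs the operation on a dedicated thread and returns that thread's name. */
    static String runOnSeparateThread(Runnable bulkOperation) {
        ExecutorService executor = Executors.newSingleThreadExecutor(task -> {
            Thread thread = new Thread(task, "bulk-operation");
            thread.setDaemon(true);
            return thread;
        });
        try {
            Future<String> result = executor.submit(() -> {
                // Assumption: the worker opens and binds its own session here.
                bulkOperation.run();
                // Assumption: the worker closes its session here.
                return Thread.currentThread().getName();
            });
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for bulk operation", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Bulk operation failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

Because the worker thread has its own session, clearing that session between batches cannot discard changes pending on the request thread's session.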

Relationship between the transaction and the session

Confluence uses the HibernateTransactionManager which is provided with Spring 2.0. This is responsible for creating database transactions when requested by an interceptor within the application.

When a transaction is opened, it is passed the session currently associated with the thread. If no session is active, a new one is created.

What happens when you flush the session

Flushing the session will run a "dirty check" on each object attached to the Hibernate session. This means any object which has been retrieved by or added by Hibernate will have its internal state checked against the instance state map that Hibernate keeps internally. For many objects, a dirty check is very expensive because it means checking the state of every dependent object as well as the object itself.

The dirty check executes inside SessionImpl.flushEntity which, if it determines some data has changed, will add a ScheduledUpdate object to the list of updates maintained in the session. It also executes SessionImpl.flushCollections for all the mapped collections on the object, which will register the fact that cached collections need to be updated with the changes.

Once all the attached objects have been checked for updates, the scheduled updates to objects and their collections are executed. This occurs in the SessionImpl.execute method, which iterates through all the necessary updates, executes SQL, and empties the collections.
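The cost of the dirty check is easier to see in a toy model. The sketch below is not Hibernate's implementation. It is a simplified illustration in which each attached object is a `String[]` of field values and the "instance state map" is a map of copies taken at attach time. The class `DirtyCheckModel` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of snapshot-based dirty checking; NOT Hibernate's real implementation. */
class DirtyCheckModel {
    /** Live objects attached to the "session", keyed by ID. */
    private final Map<Long, String[]> attached = new LinkedHashMap<>();
    /** Copy of each object's state as it was when attached or last flushed. */
    private final Map<Long, String[]> snapshots = new LinkedHashMap<>();

    void attach(long id, String[] state) {
        attached.put(id, state);
        snapshots.put(id, state.clone());
    }

    /** Compares EVERY attached object with its snapshot and returns the IDs that need an update. */
    List<Long> flush() {
        List<Long> scheduledUpdates = new ArrayList<>();
        for (Map.Entry<Long, String[]> entry : attached.entrySet()) {
            String[] current = entry.getValue();
            if (!Arrays.equals(current, snapshots.get(entry.getKey()))) {
                scheduledUpdates.add(entry.getKey());
                // After the update is scheduled, the new state becomes the baseline.
                snapshots.put(entry.getKey(), current.clone());
            }
        }
        return scheduledUpdates;
    }
}
```

Even in this model, every flush walks all attached objects whether or not they changed, so the work per flush grows with the number of objects the session has accumulated.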

If the query cache is enabled, which it always is in Confluence, Hibernate keeps a reference to every "execution" (insert, update or delete) that it runs against the database until the transaction is committed. This means that flushing and clearing the session isn't sufficient to clear all references to the attached objects; they will still be referenced by SessionImpl.executions until the transaction is committed.

What happens when you clear the session

Clearing the session empties the list of queued updates and the ID-based mapping that the session maintains for all attached objects. This means any updates which weren't flushed will be lost.

The executions list, which keeps track of the post-transaction executions, is not cleared when the session is cleared. As mentioned above, that means that flushing and clearing the session is not sufficient to clear all references to attached objects; they will still be strongly referenced by the Session until the transaction is committed.

What happens when you commit the transaction

The transaction which is managed by the HibernateTransactionManager in Confluence, an instance of net.sf.hibernate.transaction.JDBCTransaction, maintains a reference to the session it is associated with. When commit is called on the transaction (usually by the outermost transaction interceptor), it flushes the session, calls session.connection().commit() to send a commit statement directly via JDBC to the database, then finally calls SessionImpl.afterTransactionCompletion with a boolean indicating whether the transaction succeeded.

The purpose of SessionImpl.afterTransactionCompletion is to execute any post-transaction tasks on statements which have already been sent to the database. In the normal situation, this means updating persister (second-level) caches in Hibernate and releasing any locks held on these caches.

In practice, this means committing the transaction is the only way to release all resources which are held by the Hibernate session to track changes made during that transaction. Flushing the session is not sufficient. See above for recommendations on how to commit and clear the session to ensure memory usage during bulk operations is not excessive.
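The interaction between flush, clear and commit described in the last three sections can be summarised in a toy model. This is a deliberately simplified sketch and not the real SessionImpl: the class `SessionReferenceModel`, its fields and its methods are hypothetical, and it only tracks which collections hold a strong reference to an object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the references a session holds; NOT Hibernate's real SessionImpl. */
class SessionReferenceModel {
    private final Map<Long, Object> entitiesById = new HashMap<>();
    private final List<Object> queuedUpdates = new ArrayList<>();
    private final List<Object> executions = new ArrayList<>();

    void save(long id, Object entity) {
        entitiesById.put(id, entity);
        queuedUpdates.add(entity);
    }

    /** Flushing runs the queued updates, but each one is remembered until commit. */
    void flush() {
        executions.addAll(queuedUpdates);
        queuedUpdates.clear();
    }

    /** Clearing drops the ID mapping and unflushed updates, but not the executions. */
    void clear() {
        entitiesById.clear();
        queuedUpdates.clear();
    }

    /** Committing flushes first, then releases the executions list. */
    void commit() {
        flush();
        executions.clear();
    }

    /** Number of distinct objects still strongly referenced by this model. */
    int stronglyReferencedObjects() {
        Set<Object> distinct = Collections.newSetFromMap(new IdentityHashMap<>());
        distinct.addAll(entitiesById.values());
        distinct.addAll(queuedUpdates);
        distinct.addAll(executions);
        return distinct.size();
    }
}
```

In this model, flushing and clearing leaves every saved object referenced by the executions list, and committing without clearing leaves every object referenced by the ID mapping. Only the combination of commit and clear brings the count to zero.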
