Warlock's Thoughts: May 2011

@JoinFetch is very helpful when you try to minimize the number of SQL queries standing behind JPA query. I wrote about it in one of my previous posts: JPA Demystified (episode 1) - @OneToMany and @ManyToOne mappings, but it brings also some threats (see JPA - insert instead of update). This post will describe one of them, related to limiting number of results returned by JPA query.

Let's use same entities as in JPA Demystified (episode 1) - @OneToMany and @ManyToOne mappings. We have Employer:

@Entity
@Table(name = "EMPLOYERS")
public class Employer implements Serializable {
    ...
    private List<Employee> employees = new ArrayList<Employee>();
    ...
    @OneToMany(mappedBy = "employer", cascade = CascadeType.ALL)
    @JoinFetch(JoinFetchType.OUTER)
    public List getEmployees() {
        return employees; 
    }
    ...
}

and Employee:

@Entity
@Table(name = "EMPLOYEES")
public class Employee implements Serializable {
    ...
    private Employer employer;
    ...
    @ManyToOne(optional = false)
    @JoinColumn(name = "EMPLOYER_ID")
    @JoinFetch(JoinFetchType.INNER)
    public Employer getEmployer() {
        return employer;
    }
    ...
}

As you see, nothing special, well maybe except the @JoinFetch annotation on employees property of Employer.

We will use the query fetching only few employers from the DB:

    ...
    @PersistenceContext
    private EntityManager entityManager;
    ...
    TypedQuery<Employer> query = entityManager.createQuery("select e from Employer e", Employer.class);
    query.setMaxResults(5);    
    List<Employer> employers = query.getResultList();
    Assert.assertEquals(5, employers.size());    
    employers.get(0).getEmployees().get(0);
    ...

Suppose that we have 8 employers in the database, each of them have 5 employees. When you run the above code you'll receive: java.lang.AssertionError: expected:<5> but was:<1>. Why?

The truth is hidden in the dark ;) - JPA query will produce following SQL (target DB is MySQL):

SELECT 
    t1.ID AS a1, t1.BUSINESS_NAME AS a2, 
    t0.ID AS a3, t0.FIRST_NAME AS a4, t0.EMPLOYER_ID AS a5 
FROM EMPLOYERS t1 
    LEFT OUTER JOIN EMPLOYEES t0 ON (t0.EMPLOYER_ID = t1.ID) 
LIMIT 0, 5

As you see, @JoinFetch annotation forced EclipseLink to fetch employers and employees in one SQL query, but limiting number of records is performed the same way as for fetching the employers only, and therefore the result set will be limited to: Employer 1 - Employee 1, Employer 1 - Employee 2, ... , Employer 1 - Employee 5. This result set will be in turn converted by EclipseLink to one employer with 5 employees attached, and the result of the query will be 1 Employer, instead of expected 5.

Good Lord! Is there any hope for the people looking for some queries optimization?! - Of course my dear Watson :) - we can always replace one EclipseLink specific annotation (@JoinFetch) with another (@BatchFetch) or the JPA Query hints ...

Let's focus on the @BatchFetch annotation first. We will modify the Employer entity in the following way:

...
    @OneToMany(mappedBy = "employer", cascade = CascadeType.ALL)
    @BatchFetch(BatchFetchType.JOIN)
    public List<Employee> getEmployees() {
        return employees;
    }
    ...

The @BatchFetch annotation tells EclipseLink to fetch all related objects in one query, so instead of one query (when using @JoinFetch) we will have two (one for fetching employers, and second one for fetching the related employees). As you probably suppose, our code will work correctly this time, at least on the Java level ;) - because when we check the generated SQL, we will see:

SELECT ID AS a1, BUSINESS_NAME AS a2 FROM EMPLOYERS LIMIT 0, 5

SELECT t1.ID, t1.FIRST_NAME, t1.EMPLOYER_ID, t0.ID, t0.BUSINESS_NAME 
FROM EMPLOYERS t0, EMPLOYERS t2, EMPLOYEES t1 
WHERE ((t1.EMPLOYER_ID = t2.ID) AND (t0.ID = t1.EMPLOYER_ID))

As you see it's not the optimal choice - we fetch only 5 employers and ALL employees, from which only those who are bound to one of our 5 employers are used. Waste of DB and network resources :(

When you try EXISTS type of @BatchFetch the queries looks like this:

SELECT ID AS a1, BUSINESS_NAME AS a2 FROM EMPLOYERS LIMIT 0, 5

SELECT t1.ID, t1.FIRST_NAME, t1.EMPLOYER_ID, t0.ID, t0.BUSINESS_NAME 
FROM EMPLOYERS t0, EMPLOYEES t1 
WHERE (
    EXISTS (
        SELECT t2.ID FROM EMPLOYERS t2 
        WHERE (t1.EMPLOYER_ID = t2.ID)
    ) AND (t0.ID = t1.EMPLOYER_ID)
)

Still not the optimal choice, same as the above we fetch 5 employers and ALL employees :(

Only the IN type produces the SQL, which can be useful for us:

SELECT ID AS a1, BUSINESS_NAME AS a2 FROM EMPLOYERS LIMIT 0, 5

SELECT t1.ID, t1.FIRST_NAME, t1.EMPLOYER_ID, t0.ID, t0.BUSINESS_NAME 
FROM EMPLOYERS t0, EMPLOYEES t1 
WHERE ((t1.EMPLOYER_ID IN (1, 2, 3, 4, 5)) AND (t0.ID = t1.EMPLOYER_ID))

As you see @BatchFetch annotation give us possibility to fetch exactly 5 employers, and limit the number of required SQL queries to 2. There is one potential problem with IN type usage, which is related to limit in number of expressions for SQL IN clause for some databases (ex. Oracle - max 1000 expressions).

If you prefer, for some reason, to control the employees fetching on per-query basis, you have to switch from the annotations usage to Query Hints, for ex.

query.setHint(QueryHints.BATCH, "e.employees");
query.setHint(QueryHints.BATCH_TYPE, BatchFetchType.IN);

Few lectures for the dessert:

A Friend of mine asked me lately how would I define @OneToOne mapping in JPA with shared primary key. Well, we will definitely need an example ;) - Suppose that we have two entities: Primus and Secundus. Both entities have primary key using Long Java type. They are related 1-1, and Secundus should use the same primary key as Primus. Let's start with following state of Primus:

@Entity
@Table(name = "PRIMUS")
public class Primus {

    private Long id;
    private String name;
    private Secundus secundus;

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    @OneToOne(cascade = CascadeType.PERSIST, mappedBy = "primus")
    public Secundus getSecundus() {
        return secundus;
    }
...
}

Primus is more important from our point of view, it should be persisted first, and then Secundus, therefore we define the relation between them using mappedBy attribute. We also use cascading here to be able to persist both entities at once. Note, that we will only try to persist the entities, that's why there is only one type of cascade.

Let's take a look at Secundus now:

@Entity
@Table(name = "SECUNDUS")
public class Secundus {

    private Long id;
    private String name;
    private Primus primus;

    @Id
    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    @OneToOne
    @PrimaryKeyJoinColumn
    public Primus getPrimus() {
        return primus;
    }
...
}

Note, that we don't define the method of generating Secundus identifier here. We also use Secundus identifier (primary key) as the join column for the primus property (see @PrimaryKeyJoinColumn).

Let's try to test the persisting of those two with following code:

 ... 
    @PersistenceContext
    private EntityManager entityManager;

    @Test
    @Transactional(readOnly = false)
    public void test01() {
        final Primus primus = new Primus();
        final Secundus secundus = new Secundus();
        primus.setSecundus(secundus);
        entityManager.persist(primus);
    }
...

The above code and our current JPA mappings will lead to the following SQL queries (used database is MySQL 5.5):

INSERT INTO PRIMUS (NAME) VALUES (null)
SELECT LAST_INSERT_ID()
INSERT INTO SECUNDUS (ID, NAME) VALUES (null, null)

and of course fail at the second insert.

As you probably suppose the Primus identifier is set right after the select last_insert_id() query is performed, let's try to use this knowledge to initialize the Secundus identifier - we will modify the Primus method setId as visible below:

    public void setId(Long id) {
        this.id = id;
        if (null != secundus) {
            secundus.setId(id);
        }
    }

This time the queries looks good:

INSERT INTO PRIMUS (NAME) VALUES (null)
SELECT LAST_INSERT_ID()
INSERT INTO SECUNDUS (ID, NAME) VALUES (12, null)

and both entities are written into database :).

At this point I would consider one more modification of Primus - which is not required for entities persisting, but will assure that both entities are bound in proper way:

    public void setSecundus(Secundus secundus) {
        this.secundus = secundus;
        if (null != secundus) {
            secundus.setPrimus(this);
        }
    }

I'm curious if you have any other solution for this situation, feel free to share it with me :)

PS: The above solution does work for following JPA providers: EclipseLink (2.1) or Hibernate (3.4), but doesn't work for OpenJPA (2.1).

For JPA 2.0+ based solution see my post: @OneToOne with shared primary key, revisited :)

Warlock's Thoughts

Sunday, May 29, 2011

EclipseLink - JPA Queries optimization - @JoinFetch, @BatchFetch and Query Hints

Wednesday, May 4, 2011

@OneToOne with shared primary key