Sunday, May 29, 2011

EclipseLink - JPA Queries optimization - @JoinFetch, @BatchFetch and Query Hints

@JoinFetch is very helpful when you try to minimize the number of SQL queries standing behind JPA query. I wrote about it in one of my previous posts: JPA Demystified (episode 1) - @OneToMany and @ManyToOne mappings, but it brings also some threats (see JPA - insert instead of update). This post will describe one of them, related to limiting number of results returned by JPA query.

Let's use same entities as in JPA Demystified (episode 1) - @OneToMany and @ManyToOne mappings. We have Employer:
@Entity
@Table(name = "EMPLOYERS")
public class Employer implements Serializable {
    ...
    private List<Employee> employees = new ArrayList<Employee>();
    ...
    @OneToMany(mappedBy = "employer", cascade = CascadeType.ALL)
    @JoinFetch(JoinFetchType.OUTER)
    public List getEmployees() {
        return employees; 
    }
    ...
}
and Employee:
@Entity
@Table(name = "EMPLOYEES")
public class Employee implements Serializable {
    ...
    private Employer employer;
    ...
    @ManyToOne(optional = false)
    @JoinColumn(name = "EMPLOYER_ID")
    @JoinFetch(JoinFetchType.INNER)
    public Employer getEmployer() {
        return employer;
    }
    ...
}
As you see, nothing special, well maybe except the @JoinFetch annotation on employees property of Employer.

We will use the query fetching only few employers from the DB:
    ...
    @PersistenceContext
    private EntityManager entityManager;
    ...
    TypedQuery<Employer> query = entityManager.createQuery("select e from Employer e", Employer.class);
    query.setMaxResults(5);    
    List<Employer> employers = query.getResultList();
    Assert.assertEquals(5, employers.size());    
    employers.get(0).getEmployees().get(0);
    ...
Suppose that we have 8 employers in the database, each of them have 5 employees. When you run the above code you'll receive: java.lang.AssertionError: expected:<5> but was:<1>. Why?

The truth is hidden in the dark ;) - JPA query will produce following SQL (target DB is MySQL):
SELECT 
    t1.ID AS a1, t1.BUSINESS_NAME AS a2, 
    t0.ID AS a3, t0.FIRST_NAME AS a4, t0.EMPLOYER_ID AS a5 
FROM EMPLOYERS t1 
    LEFT OUTER JOIN EMPLOYEES t0 ON (t0.EMPLOYER_ID = t1.ID) 
LIMIT 0, 5
As you see, @JoinFetch annotation forced EclipseLink to fetch employers and employees in one SQL query, but limiting number of records is performed the same way as for fetching the employers only, and therefore the result set will be limited to: Employer 1 - Employee 1, Employer 1 - Employee 2, ... , Employer 1 - Employee 5. This result set will be in turn converted by EclipseLink to one employer with 5 employees attached, and the result of the query will be 1 Employer, instead of expected 5.

Good Lord! Is there any hope for the people looking for some queries optimization?! - Of course my dear Watson :) - we can always replace one EclipseLink specific annotation (@JoinFetch) with another (@BatchFetch) or the JPA Query hints ...

Let's focus on the @BatchFetch annotation first. We will modify the Employer entity in the following way:
...
    @OneToMany(mappedBy = "employer", cascade = CascadeType.ALL)
    @BatchFetch(BatchFetchType.JOIN)
    public List<Employee> getEmployees() {
        return employees;
    }
    ...
The @BatchFetch annotation tells EclipseLink to fetch all related objects in one query, so instead of one query (when using @JoinFetch) we will have two (one for fetching employers, and second one for fetching the related employees). As you probably suppose, our code will work correctly this time, at least on the Java level ;) - because when we check the generated SQL, we will see:
SELECT ID AS a1, BUSINESS_NAME AS a2 FROM EMPLOYERS LIMIT 0, 5

SELECT t1.ID, t1.FIRST_NAME, t1.EMPLOYER_ID, t0.ID, t0.BUSINESS_NAME 
FROM EMPLOYERS t0, EMPLOYERS t2, EMPLOYEES t1 
WHERE ((t1.EMPLOYER_ID = t2.ID) AND (t0.ID = t1.EMPLOYER_ID))
As you see it's not the optimal choice - we fetch only 5 employers and ALL employees, from which only those who are bound to one of our 5 employers are used. Waste of DB and network resources :(

When you try EXISTS type of @BatchFetch the queries looks like this:
SELECT ID AS a1, BUSINESS_NAME AS a2 FROM EMPLOYERS LIMIT 0, 5

SELECT t1.ID, t1.FIRST_NAME, t1.EMPLOYER_ID, t0.ID, t0.BUSINESS_NAME 
FROM EMPLOYERS t0, EMPLOYEES t1 
WHERE (
    EXISTS (
        SELECT t2.ID FROM EMPLOYERS t2 
        WHERE (t1.EMPLOYER_ID = t2.ID)
    ) AND (t0.ID = t1.EMPLOYER_ID)
)
Still not the optimal choice, same as the above we fetch 5 employers and ALL employees :(

Only the IN type produces the SQL, which can be useful for us:
SELECT ID AS a1, BUSINESS_NAME AS a2 FROM EMPLOYERS LIMIT 0, 5

SELECT t1.ID, t1.FIRST_NAME, t1.EMPLOYER_ID, t0.ID, t0.BUSINESS_NAME 
FROM EMPLOYERS t0, EMPLOYEES t1 
WHERE ((t1.EMPLOYER_ID IN (1, 2, 3, 4, 5)) AND (t0.ID = t1.EMPLOYER_ID))
As you see @BatchFetch annotation give us possibility to fetch exactly 5 employers, and limit the number of required SQL queries to 2. There is one potential problem with IN type usage, which is related to limit in number of expressions for SQL IN clause for some databases (ex. Oracle - max 1000 expressions).

If you prefer, for some reason, to control the employees fetching on per-query basis, you have to switch from the annotations usage to Query Hints, for ex.
query.setHint(QueryHints.BATCH, "e.employees");
query.setHint(QueryHints.BATCH_TYPE, BatchFetchType.IN);
Few lectures for the dessert:

No comments:

Post a Comment