Enterprise Java Development@TOPIC@

Chapter 81. JPA

81.1. Lazy and Eager Fetching
81.1.1. Get Parent
81.1.2. Get Parent and Children
81.2. Obtaining Instance Counts
81.2.1. Query Parts
81.2.2. Collection Size in DAO from Relation
81.2.3. Row Count in DAO from Query
81.2.4. Row Count in DB using Count() Query
81.2.5. Row Count in DB using Count Query without JOIN
81.3. Query Loops
81.3.1. Query Parts
81.3.2. Query Loops in DAO
81.3.3. Query Loops using DB Subquery
81.4. Paging
81.4.1. Query Parts
81.4.2. Paging within DAO
81.4.3. Paging within DB
81.5. Summary

Avoiding inefficient uses of JPA.

Cost impacts of lazy and eager fetch joins

Get just the root parent of an object graph

Get just the root parent of an object graph with fetch=LAZY relationships


  • Two one-to-many relationships (cast and genres) -- genres is a non-entity ElementCollection

  • One one-to-one relationship (director)

  • All use fetch=LAZY


  • Only parent table is queried

  • Children are lazily loaded


  • Unique scan of Movie primary key index to satisfy where clause and locate rowId

  • Row access by rowId to satisfy select clause


* "Thick" parent has an extra Movie.plot text property mapped to potentially make child joins more expensive. Thin parent does not have Movie plot mapped so the results can focus on access to smaller, related entities.

Fetch=LAZY Speeds Access to Single Entity Core Information

Choosing fetch=LAZY for relationships allows efficient access to the core information for that entity without paying a price for loading unnecessary relations.

Get just the root parent of an object graph with fetch=EAGER relationships

Figure 81.8. Relationships


<entity class="ejava.jpa.examples.tuning.bo.Movie">
    <attributes>
        <many-to-one name="director" fetch="EAGER">
            <join-column name="DIRECTOR_ID"/>
        </many-to-one>
        <one-to-many name="cast" fetch="EAGER" mapped-by="movie"/>
        
        <element-collection name="genres" fetch="EAGER">
            <column name="GENRE"/>
            <collection-table name="JPATUNE_MOVIEGENRE">
                <join-column name="MOVIE_ID"/>
                <unique-constraint>
                    <column-name>MOVIE_ID</column-name>
                    <column-name>GENRE</column-name>
                </unique-constraint>
            </collection-table>
        </element-collection>

        <transient name="plot"/>
    </attributes>
</entity>
public class MovieRole {

...
    @ManyToOne(optional=false, fetch=FetchType.LAZY,
            cascade={CascadeType.DETACH})
    @JoinColumn(name="ACTOR_ID")
    private Actor actor;

<entity class="ejava.jpa.examples.tuning.bo.MovieRole">
    <attributes>
        <many-to-one name="actor" fetch="EAGER">
            <join-column name="ACTOR_ID"/>
        </many-to-one>
    </attributes>
</entity>
public class Actor {

...
    @OneToOne(optional=false, fetch=FetchType.EAGER,
            cascade={CascadeType.PERSIST, CascadeType.DETACH})
    @MapsId
    @JoinColumn(name="PERSON_ID")
    private Person person;
public class Director {

...
    @OneToOne(optional=false, fetch=FetchType.EAGER,
            cascade={CascadeType.PERSIST, CascadeType.DETACH})
    @JoinColumn(name="PERSON_ID")
    @MapsId
    private Person person;

  • Movie.director, Movie.genres, Movie.cast relationships overridden in XML to be fetch=EAGER

  • MovieRole.actor relationship overridden in XML to be fetch=EAGER

  • Director.person and Actor.person already fetch=EAGER


  • Single query (as before), but this time it also brings in the entire object graph


  • Explain plan shows use of indexes and no full table scans (a missing FK index for MovieRole.movie could have been costly)

  • Execution plan is significantly more costly than lazy alternative


Fetch=EAGER for Single Entity adds Noticeable Overhead

Choosing fetch=EAGER for relationships will make access to a single entity within the relationship more expensive to access. Consider creating a value query, value query with a result class, or a native query when querying for only single entity under these conditions.

Get just the root parent and *ONLY* the parent of an object graph with fetch=EAGER relationships by using a NativeQuery and Result Class



  • We define an alternate JPA-QL query for provider to use that does not include relationships

  • Result class not mapped beyond this query -- does not have to be an entity class


  • Value query with result class constructor permits a bypass of fetch=EAGER relationships


  • Execution plan identical to fetch=LAZY case


* Thick/Thin or Eager/Lazy should not matter to this approach since we only access a specific set of fields in the Movie that does not include Movie.plot.

Use JPA Value or Native Queries to Override Relationship Definitions

JPA provides many escape hatches to achieve desired results. Use JPAQL value queries with Object[], Tuple, or custom result class -or- use SQL to provide efficient override of fetch=EAGER definitions.

Result Class Instances are not Managed

Even though we used an entity class in this example, it is being used as a simple POJO and the returned instance is not managed. Result Classes for value queries provide convenient type-safe access to results. However, they are unmanaged and cannot be used for follow-on entity operations.

Get parent and children in an object graph

Get parent and children in an object graph with fetch=LAZY relationships

Get parent and children in an object graph with JOIN FETCH query



  • EAGER aspects adding by query adding FETCH to JOIN query construct


  • Query identical to fetch=EAGER case



Make use of JPA Queries to Achieve Tuned-for-Use Query

JPA provides several means to access an entity and its relationships. The properties defined for the relationship help define the default query semantics -- which may not be appropriate for all uses. Leverage JOIN FETCH queries when needing to fully load specific relationships prior to ending the transaction.

Pulling back entire rows from table rather than just the count

Get size of a relationship collection.


  • DAO first gets Movie by primary key

  • DAO follows up by accessing size of relationship


  • Provider will issue follow-on query for all entities in relation

  • Entity data not used by DAO -- i.e. wasted


  • Provider will issue one query for entire object graph, to include entities in relation

  • Entity data not used by DAO -- i.e. even more data is wasted


Using Relation Collections to Just Obtain Size is Inefficient

Pulling back relationship graphs to just obtain the count in that relationship is inefficient and requires DB to do much more work than it has to.

Query for relationships and count resulting rows


  • DAO forms query for just related entities


  • Provider forms query for related entities


  • Provider forms query for related entities and fetch=EAGER relationship specification causes additional MovieRole.actor to be immediately obtained -- except through separate queries by primary key.


Using fetch=EAGER Relationships can Magnify Data Retrieval Issues

When the entity model over-uses fetch=EAGER, the negative results can be magnified when pulling back unnecessary information from the database.

Performing separate subqueries off initial query

Using DAO to perform initial query and follow-on queries based on results


  • DAO makes N separate queries based on results of first query

  • Distinct was not required in query since it was issued per-movie. Distinct was handed within the DAO using a Java Set to hold the results and hashcode()/equals() implemented within Person class.


  • SQL generated queries single table


  • Range scan of compound index(actor_id, person_id) used to satisfy the person_id in where clause and movie_id for select clause

* Query was optimized to bypass Actor table


  • Range scan on compound_index(movie_id, actor_id) to satisfy where condition for Movie.id and locate MovieRole rowId

  • Unique scan of Person primary key index to obtain rowId for Person.id=MovieRole.actor.id

  • Person row accessed by rowId to satisfy select clause

* Query was optimized to bypass Actor table

** Compound index(movie_id, actor_id) permitted MovieRole table and lookup of MovieRole.actorId to be bypassed.


Beware of Query Loops Driven from Query Results

Query loops driven off of previous query results can be a sign the follow-queries could have been combined with the initial query to be accomplished within the database within a single statement.

Also beware of how the loop size can grow over the lifetime of the application. The number may grow significantly after testing/initial deployment to a significant size (i.e., from 100 to 100K loops). Defining as a single statement and applying paging limits can help speed the originally flawed implementation.

Expressing nest query as subquery to resolve within database.


  • Distinct was required to remove duplicates


  • SQL generated uses a non-correlated subquery


  • Range scan of MovieRole composite(actor_id, movie_id) used to satisfy subquery where clause for person_id and obtain movie_id

  • Range scan of MovieRole composite(movie_id, actor_id) used to join movie_ids with root query and obtain person_id

  • Unique scan of Person primary key index to locate rowId

  • Person row accessed by rowId

* cast4_ and movierole0_ are both MovieRoles. cast4_ is joined with Movie and subject Person in subquery. movierole0_ is joined with Movie and associated Person in root query.

** execution plan indicates subquery could have been re-written as a JOIN


Delegate Query Logic to Database

If possible, express query as a single transaction to the database rather than pulling data back and re-issuing subqueries from the DAO. Apply paging constraints in order to address issues with excess growth over time.

Placing reasonable limits on amount of data returned

Implementing paging within the DAO

Figure 81.66. DAO Code/JPAQL

public Collection<Person> oneStepFromPersonByDAO(Person p, Integer offset, Integer limit, String orderBy) {

    Collection<Person> result = new HashSet<Person>();
    //performing core query
    List<String> movieIds = createQuery(
            "select role.movie.id from MovieRole role " +
            "where role.actor.person.id=:personId", String.class)
            .setParameter("personId", p.getId())
            .getResultList();
    
    //loop through results and issue sub-queries
    int pos=0;
    for (String mid: movieIds) {
        List<Person> people = createQuery(
                "select role.actor.person from MovieRole role " +
                "where role.movie.id=:movieId", Person.class)
                .setParameter("movieId", mid)
                .getResultList();
        if (offset==null || (offset!=null && pos+people.size() > offset)) {
            for(Person pp: people) {
                if (offset==null || pos>=offset) {
                    result.add(pp);
                    if (limit!=null && result.size() >= limit) { break; }
                }
                pos+=1; 
            }
        } else {
            pos+=people.size();
        }
        if (limit!=null && result.size() >= limit) { break; }
    }
    return result;
}

  • DAO logic gets complicated when self-implementing paging.

  • OrderBy is not implemented by this DAO algorithm. All rows in the table would be required in order to perform ordering within the DAO.




  • This example performs better or equal to previous loop/subquery example when earlier pages are requested -- thus fewer rows returned

  • fetch=EAGER/LAZY have no impact on this test since the entity queried for has no relationships

Performance Degrades in DAO as Row Sizes Increase

Paging within the Java code starts off near equivalent to the database solution for small row sizes but gradually gets worse when row sizes increase and later pages are requested.

Implementing paging within the database


  • JPA calls to TypedQuery.setFirstResult(), setMaxResults(), and adding the orderBy to the text of the query are abstracted behind withQuery() and createQuery() helper functions within the DAO example class and are not shown here


  • Generated SQL is essentially the same as the looping/subquery example except with the addition of "order by" and paging constructs.


  • Execution plan is similar as described in the looping/subquery example

  • Extra cost to sort rows prior to paging operation added


DB-based Paging Scales

Since only the requested page of rows is returned, delegating paging to the database provides consistent performance for varying row quantities.

Doing Paging within DB enables Sorting

Implementing paging across multiple subquery calls to the database can get ugly and error-prone. Implementing paging for a single query issued to the database is quite trivial.