Enterprise Java Development@TOPIC@

JPA Tuning

Performance Topics Related to JPA and SQL

Revision: v2013-11-14

Built on: 2014-03-07 00:12 EST

Abstract

This presentation provides information for JPA/SQL developers to better understand how database constructs and database access decisions can impact application performance. It provides a brief discussion of tools that can be used and how to review an execution plan. Relative comparisons between approaches are provided to help show the costs and benefits of different approaches.


Topics
1. SQL Tuning
1.1. Reasons for Inefficient SQL Performance
1.2. Execution Plan
1.3. Diagnostic Tools
1.3.1. Client/DAO Result
1.3.2. EXPLAIN PLAN
1.3.3. AUTOTRACE
1.3.4. Display Cursor Execution Plan within V$PLAN
1.4. Summary
2. Example Domain Model: Movies
2.1. Class Model
2.2. Database Schema
2.3. Database Size
2.4. Prepare DB Between Tests
3. Table Access
3.1. Full Table Scan
3.1.1. Full Table Scan: Unconstrained Access
3.1.2. Full Table Scan: Using Where (without Index)
3.1.3. RowId Scan: Using Where (with Index)
3.1.4. Full Table Scan: Invalidating Index using Function applied to Row Column
3.1.5. RowId Scan: Using Index with Function applied to Row Column
3.1.6. Full Table Scan: Invalidating Index by using Leading Wildcards
3.2. Order By
3.2.1. Order By using Sort
3.2.2. Order By using Index
3.2.3. Order By using Index DESC
3.2.4. Order By using Reverse Index DESC
3.3. Summary
4. Indexes
4.1. Index Range Scan
4.2. Unique Index Scan
4.3. Composite Index
4.3.1. Query Parts
4.3.2. First Term Indexed
4.3.3. First and Second Term Indexed (using Composite Index)
4.4. Index Fast Full Scan (with Composite Index)
4.4.1. Query Parts
4.4.2. Option: Use Range Scan and RowId Access
4.4.3. Option: Use Range Scan Alone with Composite Index
4.4.4. Option: Fast Full Scan
4.5. Summary
5. Joins
5.1. Foreign Keys
5.1.1. Query Parts
5.1.2. No Indexes
5.1.3. Perform Query with Support for Foreign Key Index
5.1.4. Foreign Key and Where Columns Indexed
5.1.5. Foreign Key, Where, and Join Columns Indexed
5.2. Join Types
5.2.1. Nested Loop Join
5.2.2. Hash Join
5.2.3. Sort Merge Join
5.3. Summary
6. JPA
6.1. Lazy and Eager Fetching
6.1.1. Get Parent
6.1.2. Get Parent and Children
6.2. Obtaining Instance Counts
6.2.1. Query Parts
6.2.2. Collection Size in DAO from Relation
6.2.3. Row Count in DAO from Query
6.2.4. Row Count in DB using Count() Query
6.2.5. Row Count in DB using Count Query without JOIN
6.3. Query Loops
6.3.1. Query Parts
6.3.2. Query Loops in DAO
6.3.3. Query Loops using DB Subquery
6.4. Paging
6.4.1. Query Parts
6.4.2. Paging within DAO
6.4.3. Paging within DB
6.5. Summary
7. H2 Execution Plans
7.1. Column Index
7.2. Summary
8. JPA/SQL Tuning Summary
8.1. Other Topics

Figure 1.3. EXPLAIN PLAN Example using Text SQL Commands

 
EXPLAIN PLAN FOR
select * from
    ( select
        movie0_.TITLE as col_0_0_,
        person2_.FIRST_NAME as col_1_0_,
        person2_.LAST_NAME as col_2_0_ 
    from JPATUNE_MOVIE movie0_ 
    inner join JPATUNE_DIRECTOR director1_ on movie0_.DIRECTOR_ID=director1_.PERSON_ID 
    inner join JPATUNE_PERSON person2_ on director1_.PERSON_ID=person2_.ID 
    order by title DESC ) 
where
    rownum <= :limit;
SET LINESIZE 100
SET PAGESIZE 0
select * from table(DBMS_XPLAN.DISPLAY());
plan FOR succeeded.
Plan hash value: 857441453
 
----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                |   774K|   477M|       | 25766   (1)| 00:05:10 |
|*  1 |  COUNT STOPKEY            |                |       |       |       |            |          |
|   2 |   VIEW                    |                |   774K|   477M|       | 25766   (1)| 00:05:10 |
|*  3 |    SORT ORDER BY STOPKEY  |                |   774K|    45M|    53M| 25766   (1)| 00:05:10 |
|*  4 |     HASH JOIN             |                |   774K|    45M|    11M| 14333   (1)| 00:02:52 |
|*  5 |      HASH JOIN            |                |   271K|  8746K|  5568K|  5115   (1)| 00:01:02 |
|   6 |       INDEX FAST FULL SCAN| DIRECTOR_PK    |   271K|  2385K|       |   191   (1)| 00:00:03 |
|   7 |       TABLE ACCESS FULL   | JPATUNE_PERSON |  1637K|    37M|       |  1854   (1)| 00:00:23 |
|*  8 |      TABLE ACCESS FULL    | JPATUNE_MOVIE  |   774K|    20M|       |  7169   (1)| 00:01:27 |
----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM<=TO_NUMBER(:LIMIT))
   3 - filter(ROWNUM<=TO_NUMBER(:LIMIT))
   4 - access("MOVIE0_"."DIRECTOR_ID"="DIRECTOR1_"."PERSON_ID")
   5 - access("DIRECTOR1_"."PERSON_ID"="PERSON2_"."ID")
   8 - filter("MOVIE0_"."DIRECTOR_ID" IS NOT NULL)






  • Access paths - strategies used to access data within table

  • Indexes - good for when accessing small amounts of data out of much larger data set

  • Full table scans - good for small tables and unrestricted row access

  • Any row can be *functionally* accessed with either method

  • Use execution plans to help use the appropriate technique

Database engine derives all information from the table directly from the physical storage location for the table. Table storage is not arranged in any guaranteed order.

Order returned results by one or more columns

  • Can be unique or non-unique

  • Can be simple or composite

  • Can be normal (ascending) or descending

  • Can be reverse key (for monotonically incrementing column values) to balance B*-tree

  • Can be function-based (to address normalization uses)

  • Can be used to implement sort for "order by"

  • Can be used to implement the entire table (Index-organized Table(IOT))

  • Can be traversed in different ways

    • Unique Scan - used with "unique" or "primary key" indexes to return a single value

    • Range Scan - used with "unique" and "non-unique" indexes to return multiple matching rows

    • Full Scan - used with composite indexes where leading where column is not part of index (i.e., can use col2 of composite)

    • Fast Full Scan - used when all columns of query (where and select) are contained within composite index -- table is skipped

    • ...

  • Can be coalesced or rebuilt with

    ALTER INDEX (index-name) (COALESCE | REBUILD)
    • Coalesce - repairs index in place. Good for small repairs

    • Rebuild - totally rebuilds index. Good for large repairs

Indexes with multiple columns to match the where clause and optionally the select and join clauses as well.

Incorporate terms from select clause into composite index to bypass table access.

Improving performance of foreign key joins.

Used when driving table is small and join condition used to access second table

Used when driving table too large to fit into memory and join based on equality

Avoiding inefficient uses of JPA.

Cost impacts of lazy and eager fetch joins

Get just the root parent of an object graph

Get just the root parent of an object graph with fetch=LAZY relationships


  • Two one-to-many relationships (cast and genres) -- genres is a non-entity ElementCollection

  • One one-to-one relationship (director)

  • All use fetch=LAZY


  • Only parent table is queried

  • Children are lazily loaded


  • Unique scan of Movie primary key index to satisfy where clause and locate rowId

  • Row access by rowId to satisfy select clause


* "Thick" parent has an extra Movie.plot text property mapped to potentially make child joins more expensive. Thin parent does not have Movie plot mapped so the results can focus on access to smaller, related entities.

Fetch=LAZY Speeds Access to Single Entity Core Information

Choosing fetch=LAZY for relationships allows efficient access to the core information for that entity without paying a price for loading unnecessary relations.

Get just the root parent of an object graph with fetch=EAGER relationships

Figure 6.8. Relationships


<entity class="ejava.jpa.examples.tuning.bo.Movie">
    <attributes>
        <many-to-one name="director" fetch="EAGER">
            <join-column name="DIRECTOR_ID"/>
        </many-to-one>
        <one-to-many name="cast" fetch="EAGER" mapped-by="movie"/>
        
        <element-collection name="genres" fetch="EAGER">
            <column name="GENRE"/>
            <collection-table name="JPATUNE_MOVIEGENRE">
                <join-column name="MOVIE_ID"/>
                <unique-constraint>
                    <column-name>MOVIE_ID</column-name>
                    <column-name>GENRE</column-name>
                </unique-constraint>
            </collection-table>
        </element-collection>

        <transient name="plot"/>
    </attributes>
</entity>
public class MovieRole {

...
    @ManyToOne(optional=false, fetch=FetchType.LAZY,
            cascade={CascadeType.DETACH})
    @JoinColumn(name="ACTOR_ID")
    private Actor actor;

<entity class="ejava.jpa.examples.tuning.bo.MovieRole">
    <attributes>
        <many-to-one name="actor" fetch="EAGER">
            <join-column name="ACTOR_ID"/>
        </many-to-one>
    </attributes>
</entity>
public class Actor {

...
    @OneToOne(optional=false, fetch=FetchType.EAGER,
            cascade={CascadeType.PERSIST, CascadeType.DETACH})
    @MapsId
    @JoinColumn(name="PERSON_ID")
    private Person person;
public class Director {

...
    @OneToOne(optional=false, fetch=FetchType.EAGER,
            cascade={CascadeType.PERSIST, CascadeType.DETACH})
    @JoinColumn(name="PERSON_ID")
    @MapsId
    private Person person;

  • Movie.director, Movie.genres, Movie.cast relationships overridden in XML to be fetch=EAGER

  • MovieRole.actor relationship overridden in XML to be fetch=EAGER

  • Director.person and Actor.person already fetch=EAGER


  • Single query (as before), but this time it also brings in the entire object graph


  • Explain plan shows use of indexes and no full table scans (a missing FK index for MovieRole.movie could have been costly)

  • Execution plan is significantly more costly than lazy alternative


Fetch=EAGER for Single Entity adds Noticeable Overhead

Choosing fetch=EAGER for relationships will make access to a single entity within the relationship more expensive to access. Consider creating a value query, value query with a result class, or a native query when querying for only single entity under these conditions.

Get just the root parent and *ONLY* the parent of an object graph with fetch=EAGER relationships by using a NativeQuery and Result Class



  • We define an alternate JPA-QL query for provider to use that does not include relationships

  • Result class not mapped beyond this query -- does not have to be an entity class


  • Value query with result class constructor permits a bypass of fetch=EAGER relationships


  • Execution plan identical to fetch=LAZY case


* Thick/Thin or Eager/Lazy should not matter to this approach since we only access a specific set of fields in the Movie that does not include Movie.plot.

Use JPA Value or Native Queries to Override Relationship Definitions

JPA provides many escape hatches to achieve desired results. Use JPAQL value queries with Object[], Tuple, or custom result class -or- use SQL to provide efficient override of fetch=EAGER definitions.

Result Class Instances are not Managed

Even though we used an entity class in this example, it is being used as a simple POJO and the returned instance is not managed. Result Classes for value queries provide convenient type-safe access to results. However, they are unmanaged and cannot be used for follow-on entity operations.

Get parent and children in an object graph

Get parent and children in an object graph with fetch=LAZY relationships

Get parent and children in an object graph with JOIN FETCH query



  • EAGER aspects adding by query adding FETCH to JOIN query construct


  • Query identical to fetch=EAGER case



Make use of JPA Queries to Achieve Tuned-for-Use Query

JPA provides several means to access an entity and its relationships. The properties defined for the relationship help define the default query semantics -- which may not be appropriate for all uses. Leverage JOIN FETCH queries when needing to fully load specific relationships prior to ending the transaction.

Pulling back entire rows from table rather than just the count

Get size of a relationship collection.


  • DAO first gets Movie by primary key

  • DAO follows up by accessing size of relationship


  • Provider will issue follow-on query for all entities in relation

  • Entity data not used by DAO -- i.e. wasted


  • Provider will issue one query for entire object graph, to include entities in relation

  • Entity data not used by DAO -- i.e. even more data is wasted


Using Relation Collections to Just Obtain Size is Inefficient

Pulling back relationship graphs to just obtain the count in that relationship is inefficient and requires DB to do much more work than it has to.

Query for relationships and count resulting rows


  • DAO forms query for just related entities


  • Provider forms query for related entities


  • Provider forms query for related entities and fetch=EAGER relationship specification causes additional MovieRole.actor to be immediately obtained -- except through separate queries by primary key.


Using fetch=EAGER Relationships can Magnify Data Retrieval Issues

When the entity model over-uses fetch=EAGER, the negative results can be magnified when pulling back unnecessary information from the database.

Performing separate subqueries off initial query

Using DAO to perform initial query and follow-on queries based on results


  • DAO makes N separate queries based on results of first query

  • Distinct was not required in query since it was issued per-movie. Distinct was handed within the DAO using a Java Set to hold the results and hashcode()/equals() implemented within Person class.


  • SQL generated queries single table


  • Range scan of compound index(actor_id, person_id) used to satisfy the person_id in where clause and movie_id for select clause

* Query was optimized to bypass Actor table


  • Range scan on compound_index(movie_id, actor_id) to satisfy where condition for Movie.id and locate MovieRole rowId

  • Unique scan of Person primary key index to obtain rowId for Person.id=MovieRole.actor.id

  • Person row accessed by rowId to satisfy select clause

* Query was optimized to bypass Actor table

** Compound index(movie_id, actor_id) permitted MovieRole table and lookup of MovieRole.actorId to be bypassed.


Beware of Query Loops Driven from Query Results

Query loops driven off of previous query results can be a sign the follow-queries could have been combined with the initial query to be accomplished within the database within a single statement.

Also beware of how the loop size can grow over the lifetime of the application. The number may grow significantly after testing/initial deployment to a significant size (i.e., from 100 to 100K loops). Defining as a single statement and applying paging limits can help speed the originally flawed implementation.

Expressing nest query as subquery to resolve within database.


  • Distinct was required to remove duplicates


  • SQL generated uses a non-correlated subquery


  • Range scan of MovieRole composite(actor_id, movie_id) used to satisfy subquery where clause for person_id and obtain movie_id

  • Range scan of MovieRole composite(movie_id, actor_id) used to join movie_ids with root query and obtain person_id

  • Unique scan of Person primary key index to locate rowId

  • Person row accessed by rowId

* cast4_ and movierole0_ are both MovieRoles. cast4_ is joined with Movie and subject Person in subquery. movierole0_ is joined with Movie and associated Person in root query.

** execution plan indicates subquery could have been re-written as a JOIN


Delegate Query Logic to Database

If possible, express query as a single transaction to the database rather than pulling data back and re-issuing subqueries from the DAO. Apply paging constraints in order to address issues with excess growth over time.

Placing reasonable limits on amount of data returned

Implementing paging within the DAO

Figure 6.66. DAO Code/JPAQL

public Collection<Person> oneStepFromPersonByDAO(Person p, Integer offset, Integer limit, String orderBy) {

    Collection<Person> result = new HashSet<Person>();
    //performing core query
    List<String> movieIds = createQuery(
            "select role.movie.id from MovieRole role " +
            "where role.actor.person.id=:personId", String.class)
            .setParameter("personId", p.getId())
            .getResultList();
    
    //loop through results and issue sub-queries
    int pos=0;
    for (String mid: movieIds) {
        List<Person> people = createQuery(
                "select role.actor.person from MovieRole role " +
                "where role.movie.id=:movieId", Person.class)
                .setParameter("movieId", mid)
                .getResultList();
        if (offset==null || (offset!=null && pos+people.size() > offset)) {
            for(Person pp: people) {
                if (offset==null || pos>=offset) {
                    result.add(pp);
                    if (limit!=null && result.size() >= limit) { break; }
                }
                pos+=1; 
            }
        } else {
            pos+=people.size();
        }
        if (limit!=null && result.size() >= limit) { break; }
    }
    return result;
}

  • DAO logic gets complicated when self-implementing paging.

  • OrderBy is not implemented by this DAO algorithm. All rows in the table would be required in order to perform ordering within the DAO.




  • This example performs better or equal to previous loop/subquery example when earlier pages are requested -- thus fewer rows returned

  • fetch=EAGER/LAZY have no impact on this test since the entity queried for has no relationships

Performance Degrades in DAO as Row Sizes Increase

Paging within the Java code starts off near equivalent to the database solution for small row sizes but gradually gets worse when row sizes increase and later pages are requested.

Implementing paging within the database


  • JPA calls to TypedQuery.setFirstResult(), setMaxResults(), and adding the orderBy to the text of the query are abstracted behind withQuery() and createQuery() helper functions within the DAO example class and are not shown here


  • Generated SQL is essentially the same as the looping/subquery example except with the addition of "order by" and paging constructs.


  • Execution plan is similar as described in the looping/subquery example

  • Extra cost to sort rows prior to paging operation added


DB-based Paging Scales

Since only the requested page of rows is returned, delegating paging to the database provides consistent performance for varying row quantities.

Doing Paging within DB enables Sorting

Implementing paging across multiple subquery calls to the database can get ugly and error-prone. Implementing paging for a single query issued to the database is quite trivial.