Enterprise Java Development@TOPIC@
Improving performance of foreign key joins.
Select N properties for object M matching certain criteria
Figure 80.1. JPA Query
"select r from MovieRole r join r.movie m where m.title=:title and m.releaseDate=:releaseDate", params={title=Tremors, releaseDate=1990-07-01}
Figure 80.2. Generated SQL
select movierole0_.ID as ID1_3_, movierole0_.ACTOR_ID as ACTOR3_3_, movierole0_.MOVIE_ID as MOVIE4_3_, movierole0_.MOVIE_ROLE as MOVIE2_3_ from JPATUNE_MOVIEROLE movierole0_ inner join JPATUNE_MOVIE movie1_ on movierole0_.MOVIE_ID=movie1_.ID where movie1_.TITLE=? and movie1_.RELEASE_DATE=?
Perform query without any index support for where, select, or join clauses
Full table scan of Movie to satisfy where clause and to obtain movie_id for join
Full table scan of MovieRole to locate MovieRoles.movie_id and to obtain rows to satisfy select clause
* Large portion of cost spent looking for children
The cost of performing a full table scan of the child table looking for children to match a specific parent is magnified when the overall query selects multiple parent rows. The child table must be scanned for each parent.
Address the biggest cost item in the execution plan by adding an index on the foreign key column.
Full table scan of Movie to satisfy where clause and to obtain movie_id for join
Range scan of foreign key index used to locate MovieRoles.movie_id
MovieRole rows accessed by rowId to satisfy select clause
* Eliminated much of the cost to locate children. Still have cost to locate parent
Add indexes to foreign key columns of child tables when the child table is frequently joined with parent rows. The index will go unused if the child is simply searched alone.
Figure 80.8. DB Index(es)
create index movie_title_rdate_idx on jpatune_movie(title, release_date) create index movierole_movie_fkx on jpatune_movierole(movie_id)
Range scan of composite used to satisfy where clause
Movie rows accessed by rowId to obtain Movie.id for join
(remaining steps are same as prior approach)
Range scan of foreign key index used to locate MovieRoles.movie_id
MovieRole rows accessed by rowId to satisfy select clause
* Eliminated much of the cost to locate children. Still have cost small cost obtaining Movie.id
Figure 80.10. Relative Test Result
FKs and Where Indexed.Get Children:warmups=2, rounds=11, ave=0.18 [+- 0.01]
Figure 80.11. DB Index(es)
create index movie_title_rdate_id_idx on jpatune_movie(title, release_date, id) create index movierole_movie_fkx on jpatune_movierole(movie_id)
Range scan of composite used to satisfy where clause and obtain Movie.id for join
No access to Movie table required
(remaining steps are same as prior approach)
Range scan of foreign key index used to locate MovieRoles.movie_id
MovieRole rows accessed by rowId to satisfy select clause
* The last rowId access is necessary to return the full MovieRole object
Figure 80.13. Relative Test Result
FKs, Where, and Join Indexed.Get Children:warmups=2, rounds=11, ave=0.20 [+- 0.01]
By adding the join column of the parent to the composite index used to locate the parent rows, you can save row accesses to the parent table (looking for the join column value). This is a small improvement relative to the other indexes added earlier because you are only eliminating an already efficient (but not the most efficient) means of locating row data.
Nested Loop Join - small driving table, join condition used to access second table
Hash Join - larger data joins based on equality matches through hashing functions
Sort-merge Join - larger data joins based on non-equality joins
Used when driving table is small and join condition used to access second table
Select all from table M joined with table P where P.cols match narrow condition
Figure 80.14. JPA Query
"select m from Movie m join m.cast as r where r.role=:role", params={role=Valentine McKee}, limit=2
Figure 80.15. Generated SQL
select * from ( select movie0_.ID as ID1_2_, movie0_.DIRECTOR_ID as DIRECTOR7_2_, movie0_.MINUTES as MINUTES2_2_, movie0_.PLOT as PLOT3_2_, movie0_.RATING as RATING4_2_, movie0_.RELEASE_DATE as RELEASE5_2_, movie0_.TITLE as TITLE6_2_ from JPATUNE_MOVIE movie0_ inner join JPATUNE_MOVIEROLE cast1_ on movie0_.ID=cast1_.MOVIE_ID where cast1_.MOVIE_ROLE=? ) where rownum <= ?
No indexes on where clause or foreign key columns
Full table scan of MovieRoles table performed to satisfy role from where clause
Nested Loop using driving table (MovieRole) foreign key (MovieRole.movie_id)
Unique Scan of Movie PK Index (SYS_C0012387) used to locate join rows (MovieRole.movie_id = Movie.id)
RowId scan used to access row data for Movie
Figure 80.17. Relative Test Result
No Table or FK Indexes.Small Driving Table:warmups=2, rounds=11, ave=0.28 [+- 0.04]
Most of the time spent within this query plan is attributed to the full table scan due to the lack of an index matching the where clause.
Single index on where clause column(s)
Range scan on index of MovieRoles.roles column used to satisfy where clause
MoveRoles row accessed using RowId from MoviesRoles.roles index to obtain movie_id
(from here is is the same as the previous full table scan case)
Nested Loop using driving table (MovieRole) foreign key (MovieRole.movie_id)
Unique Scan of Movie PK Index (SYS_C0012387) used to locate join rows (MovieRole.movie_id = Movie.id)
RowId scan used to access row data for Movie
Figure 80.20. Relative Test Result
Table Indexes Only.Small Driving Table:warmups=2, rounds=11, ave=0.18 [+- 0.03]
Adding the foreign key to an (composite) index will eliminate need to access row for foreign key during join.
Separate indexes; one for where clause and second for FK for join
Figure 80.21. DB Index(es)
create index movie_role_idx on jpatune_movierole(movie_role) create index movie_role_movie_fdx on jpatune_movierole(movie_id)
There is no difference between this and the previous case
Foreign Key indexes help locate child rows when join coming from parent
Index identifies rowId of the child having the parent value -- not the rowId of the parent
Figure 80.23. Relative Test Result
Individual Table and FK Indexes.Small Driving Table:warmups=2, rounds=11, ave=0.19 [+- 0.03]
A foreign key index (from the child to the parent) is not used when the child is the driving table and the parent is being located. In this case, the primary key index of the parent is used instead.
Single composite index containing columns from where clause and foreign key
Figure 80.24. DB Index(es)
create index movie_role_movie_cdx on jpatune_movierole(movie_role, movie_id)
Range scan on composite index of MovieRoles used to satisfy where clause
Composite index provides movie_id without having to access MovieRole row
(from here is is the same as the previous cases)
Nested Loop using driving table (MovieRole) foreign key (MovieRole.movie_id)
Unique Scan of Movie PK Index (SYS_C0012387) used to locate join rows (MovieRole.movie_id = Movie.id)
RowId scan used to access row data for Movie
Figure 80.26. Relative Test Result
Composite with FK Index.Small Driving Table:warmups=2, rounds=11, ave=0.19 [+- 0.03]
Adding the foreign key column to an index used to satisfy the where clause can eliminate the need for the DB to access the row for the foreign key during a join operation.
Used when driving table too large to fit into memory and join based on equality
Select all from table M joined with table P where P.cols match broad condition
Figure 80.27. JPA Query
-- Bounded "select m from Movie m join m.cast as r where r.role like :role", params={role=V%}, limit=2
-- Offset Bounded ... offset=1000, limit=2
-- Unbounded ... params={role=V%}
* Used multiple forms of query since Hash algorithm seemed to better detect an early completion limit than the nested loop approach.
Figure 80.28. Generated SQL
select * from ( select movie0_.ID as ID1_2_, movie0_.DIRECTOR_ID as DIRECTOR7_2_, movie0_.MINUTES as MINUTES2_2_, movie0_.PLOT as PLOT3_2_, movie0_.RATING as RATING4_2_, movie0_.RELEASE_DATE as RELEASE5_2_, movie0_.TITLE as TITLE6_2_ from JPATUNE_MOVIE movie0_ inner join JPATUNE_MOVIEROLE cast1_ on movie0_.ID=cast1_.MOVIE_ID where cast1_.MOVIE_ROLE like ? ) where rownum <= ?
No indexes on where clause or foreign key columns
Used When:
Joining large amounts of data
Joining on equality (equi-join) and not relative or function-based
Full table scan of MovieRoles table performed to satisfy role from where clause
DB creates in-memory hash table of foreign key values (i.e., multiple rows/FKs may hash into single hash entry)
Full table scan of Movie table performed to locate and hash(movie.id) -- primary key index is not used
Unique rows located within each hash bucket -- empties discarded
Matching Movies forwarded to next level
Figure 80.30. Relative Test Result
No Table or FK Indexes.Large Driving Table (Bounded):warmups=2, rounds=11, ave=0.17 [+- 0.01] No Table or FK Indexes.Large Driving Table Offset (Bounded):warmups=2, rounds=11, ave=0.26 [+- 0.02] No Table or FK Indexes.Large Driving Table (Unbounded:warmups=2, rounds=11, ave=31.51 [+- 1.25]
Single index on where clause column(s)
Range scan on index of MovieRoles.roles column used to satisfy where clause
MoveRoles row accessed using RowId from MoviesRoles.roles index to obtain movie_id
DB creates in-memory hash table of foreign key values
(from here is is the same as the previous full table scan case)
Full table scan of Movie table performed to locate and hash(movie.id) -- primary key index is not used
Unique rows located within each hash bucket -- empties discarded
Matching Movies forwarded to next level
Figure 80.33. Relative Test Result
Table Indexes Only.Large Driving Table (Bounded):warmups=2, rounds=11, ave=0.19 [+- 0.03] Table Indexes Only.Large Driving Table Offset (Bounded):warmups=2, rounds=11, ave=0.26 [+- 0.04] Table Indexes Only.Large Driving Table (Unbounded:warmups=2, rounds=11, ave=31.40 [+- 1.38]
Single composite index containing columns from where clause and foreign key
Figure 80.34. DB Index(es)
create index movie_role_movie_cdx on jpatune_movierole(movie_role, movie_id)
Range scan on composite index of MovieRoles used to satisfy where clause
Composite index provides movie_id without having to access MovieRole row
(from here is is the same as the previous cases)
Full table scan of Movie table performed to locate and hash(movie.id) -- primary key index is not used
Unique rows located within each hash bucket -- empties discarded
Matching Movies forwarded to next level
Figure 80.36. Relative Test Result
Composite with FK Index.Large Driving Table (Bounded):warmups=2, rounds=11, ave=0.18 [+- 0.01] Composite with FK Index.Large Driving Table Offset (Bounded):warmups=2, rounds=11, ave=0.22 [+- 0.03] Composite with FK Index.Large Driving Table (Unbounded:warmups=2, rounds=11, ave=30.20 [+- 1.24]
Used:
Over Nested Loops when no suitable driving table can be selected (i.e. fit in memory)
Over Hash Join when join is by non-equality comparison
Basic Algorithm
No driving table selected
Both tables ordered according to a common key
Ordered rows from both tables are then merged
Figure 80.37. Altered SQL
select * from ( select /*+ use_merge(cast1_ movie0_) */ movie0_.ID as ID1_2_, ... where cast1_.MOVIE_ROLE like ? ) where rownum <= ?
Results from MovieRole (movie_role=:role) and Movie (id=MovieRole.id) sorted at considerable cost
Sorted results merged to form result
Non-equijoins eliminate DB-flexibility to locate matching rows between tables and force the database to order tables based on the key column prior to forming joined result.