不稳定的执行计划(Oracle执行计划稳定性/不稳定性)
本文是翻译Kerry Osborne的Unstable Plans (Oracle Plan Stability/Instability)这篇文章,翻译如有不当的地方,敬请谅解,请尊重原创和翻译劳动成果,转载的时候请注明出处。谢谢!文中有些翻译的地方按自己的理解进行了调整,但是尽量符合原文意思。请知晓!Oracle基于成本的优化器(Cost Based Optimizer,CBO)有时候表现得非常不稳定,这可能是最令人沮丧的事情之一。它的性能表现有时似乎是随机的。解决这些问题也颇具挑战性。这有点像把你的车送去修车厂,因为它发出奇怪的声音,但当修车师傅在场时,它又从不发出那种声音。幸运的是,我们有ASH(Active Session History)和AWR(Automatic Workload Repository)工具,它们可以捕获大量关于数据库当时正在做什么的信息。
这里有两个脚本,我发现它们很有用。
脚本unstable_plans.sql
第一个脚本我命名为unstable_plans.sql. 如下所示
----------------------------------------------------------------------------------------<br>--<br>-- File name: unstable_plans.sql<br>--<br>-- Purpose: Attempts to find SQL statements with plan instability.<br>--<br>-- Author: Kerry Osborne<br>--<br>-- Usage: This scripts prompts for two values, both of which can be left blank.<br>--<br>-- min_stddev: the minimum "normalized" standard deviation between plans <br>-- (the default is 2)<br>--<br>-- min_etime: only include statements that have an avg. etime > this value<br>-- (the default is .1 second)<br>--<br>-- See http://kerryosborne.oracle-guy.com/2008/10/unstable-plans/ for more info.<br>---------------------------------------------------------------------------------------<br><br>set lines 155<br>col execs for 999,999,999<br>col min_etime for 999,999.99<br>col max_etime for 999,999.99<br>col avg_etime for 999,999.999<br>col avg_lio for 999,999,999.9<br>col norm_stddev for 999,999.9999<br>col begin_interval_time for a30<br>col node for 99999<br>break on plan_hash_value on startup_time skip 1<br>select * from (<br>select sql_id, sum(execs), min(avg_etime) min_etime, max(avg_etime) max_etime, stddev_etime/min(avg_etime) norm_stddev<br>from (<br>select sql_id, plan_hash_value, execs, avg_etime,<br>stddev(avg_etime) over (partition by sql_id) stddev_etime<br>from (<br>select sql_id, plan_hash_value,<br>sum(nvl(executions_delta,0)) execs,<br>(sum(elapsed_time_delta)/decode(sum(nvl(executions_delta,0)),0,1,sum(executions_delta))/1000000) avg_etime<br>-- sum((buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta))) avg_lio<br>from DBA_HIST_SQLSTAT S, DBA_HIST_SNAPSHOT SS<br>where ss.snap_id = S.snap_id<br>and ss.instance_number = S.instance_number<br>and executions_delta > 0<br>group by sql_id, plan_hash_value<br>)<br>)<br>group by sql_id, stddev_etime<br>)<br>where norm_stddev > nvl(to_number('&min_stddev'),2)<br>and max_etime > nvl(to_number('&min_etime'),.1)<br>order by norm_stddev<br>/<br>它可以用来显示那些执行时间存在明显/显著差异的SQL语句(它也可以修改为查找逻辑I/O存在差异的SQL,但我会留给读者来完成这个练习)。它使用分析函数来计算SQL执行计划的平均执行时间的标准方差。因此,那些具有多个计划且计划之间响应时间差异很大的语句将被该脚本返回。脚本会提示输入两个值:第一个是最小标准差数量,第二个是最小执行时间(通常,如果一个语句有时执行时间为0.005秒,有时为0.02秒,我并不在意,尽管从统计学上来说这是一个很大的波动)。顺便说一下,这两个输入都有默认值。
脚本awr_plan_change.sql
第二个脚本为awr_plan_change.sql,如下所示:
set lines 155<br>col execs for 999,999,999<br>col avg_etime for 999,999.999<br>col avg_lio for 999,999,999.9<br>col begin_interval_time for a30<br>col node for 99999<br>break on plan_hash_value on startup_time skip 1<br>select ss.snap_id, ss.instance_number node, begin_interval_time, sql_id, plan_hash_value,<br>nvl(executions_delta,0) execs,<br>(elapsed_time_delta/decode(nvl(executions_delta,0),0,1,executions_delta))/1000000 avg_etime,<br>(buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta)) avg_lio<br>from DBA_HIST_SQLSTAT S, DBA_HIST_SNAPSHOT SS<br>where sql_id = nvl('&sql_id','4dqs2k5tynk61')<br>and ss.snap_id = S.snap_id<br>and ss.instance_number = S.instance_number<br>and executions_delta > 0<br>order by 1, 2, 3<br>/<br>它显示了给定语句的执行计划随时间的变化信息,以及一些统计数据,例如平均执行时间(average elapsed time)和平均逻辑I/O读写(average lio’s)。
总之,这里有一个使用这两个脚本的示例(顺便说一下,示例是在11gR1数据库上进行的,但这些脚本在10g上也能正常工作)。
> sqlplus / as sysdba<br><br>SQL*Plus: Release 11.1.0.6.0 - Production on Tue Oct 7 15:44:20 2008<br><br>Copyright (c) 1982, 2007, Oracle. All rights reserved.<br><br>Connected to:<br>Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production<br>With the Partitioning, OLAP, Data Mining and Real Application Testing options<br><br>SQL> alter system set optimizer_mode=rule;<br><br>System altered.<br><br>SQL> @awr_snap <br><br>PL/SQL procedure successfully completed.<br><br>SQL> @unstable_plans<br>Enter value for min_stddev: <br>Enter value for min_etime: <br><br>SQL_ID SUM(EXECS) MIN_ETIME MAX_ETIME NORM_STDDEV<br>------------- ---------- ----------- ----------- -------------<br>c5by9gdw3814r 14 .03 .12 2.1274<br>848dyu9288c3h 16 .02 .16 2.1309<br>2am60vd2kw8ux 11 .05 .22 2.4976<br>frxg53fac2n8j 5 .03 .30 4.2479<br>0qa98gcnnza7h 62 25.58 314.34 7.9833<br>g0jvz8csyrtcf 2 .09 1.19 8.2304<br>2cn0kc8u4b81w 545 .02 .42 12.8022<br>9wt62290ah0f7 6 .01 .47 38.5857<br>d8mayxqw0wnpv 1373 .01 .85 48.3874<br><br>9 rows selected.<br><br>SQL> /<br>Enter value for min_stddev: <br>Enter value for min_etime: 2<br><br>SQL_ID SUM(EXECS) MIN_ETIME MAX_ETIME NORM_STDDEV<br>------------- ---------- ----------- ----------- -------------<br>0qa98gcnnza7h 62 25.58 314.34 7.9833<br><br>SQL> @find_sql<br>Enter value for sql_text: <br>Enter value for address: <br>Enter value for sql_id: 0qa98gcnnza7h<br><br>SQL_ID CHILD PLAN_HASH EXECS ETIME AVG_ETIME USERNAME SQL_TEXT<br>------------- ------ ---------- ------------ ------------- ------------- ------------- -----------------------------------------<br>0qa98gcnnza7h 0 3723858078 5 356.53 71.31 SYS select avg(pk_col) from kso.skew where co<br> l1 > 0<br><br>0qa98gcnnza7h 1 568322376 1 7.92 7.92 SYS select avg(pk_col) from kso.skew where co<br> l1 > 0<br><br>0qa98gcnnza7h 2 568322376 10 52.14 5.21 SYS select avg(pk_col) from kso.skew where co<br> l1 > 0<br><br>0qa98gcnnza7h 3 568322376 30 1,064.19 35.47 KSO select avg(pk_col) from kso.skew where co<br> l1 > 0<br><br>0qa98gcnnza7h 4 3723858078 10 4,558.62 455.86 KSO select avg(pk_col) from kso.skew where co<br> l1 > 0<br><br>SQL> @awr_plan_change<br>Enter value for sql_id: 0qa98gcnnza7h<br><br> SNAP_ID NODE BEGIN_INTERVAL_TIME SQL_ID PLAN_HASH_VALUE EXECS AVG_ETIME AVG_LIO<br>---------- ------ ------------------------------ ------------- --------------- ------------ ------------ --------------<br> 3206 1 02-OCT-08 08.00.38.743 AM 0qa98gcnnza7h 568322376 4 10.359 121,722.8<br> 3235 1 03-OCT-08 01.00.44.932 PM 0qa98gcnnza7h 1 10.865 162,375.0<br> 3235 1 03-OCT-08 01.00.44.932 PM 0qa98gcnnza7h 3723858078 1 127.664 28,913,271.0<br> 3236 1 03-OCT-08 01.28.09.000 PM 0qa98gcnnza7h 568322376 1 7.924 162,585.0<br> 3236 1 03-OCT-08 01.28.09.000 PM 0qa98gcnnza7h 3723858078 1 86.682 27,751,123.0<br> 3305 1 06-OCT-08 10.00.11.988 AM 0qa98gcnnza7h 4 64.138 22,616,931.5<br> 3305 1 06-OCT-08 10.00.11.988 AM 0qa98gcnnza7h 568322376 2 5.710 81,149.0<br> 3306 1 06-OCT-08 11.00.16.490 AM 0qa98gcnnza7h 6 5.512 108,198.5<br> 3307 1 06-OCT-08 12.00.20.716 PM 0qa98gcnnza7h 2 3.824 81,149.0<br> 3328 1 07-OCT-08 08.39.20.525 AM 0qa98gcnnza7h 30 35.473 156,904.7<br> 3335 1 07-OCT-08 03.00.20.950 PM 0qa98gcnnza7h 3723858078 10 455.862 28,902,128.6<br><br>11 rows selected.<br><br>SQL> @dplan_awr<br>Enter value for sql_id: 0qa98gcnnza7h<br>Enter value for plan_hash_value: <br><br>PLAN_TABLE_OUTPUT<br>-----------------------------------------------------------------------------------------------------------------------------------------------------------<br>SQL_ID 0qa98gcnnza7h<br>--------------------<br>select avg(pk_col) from kso.skew where col1 > 0<br><br>Plan hash value: 568322376<br><br>---------------------------------------------------------------------------<br>| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |<br>---------------------------------------------------------------------------<br>| 0 | SELECT STATEMENT | | | | 44497 (100)| |<br>| 1 | SORT AGGREGATE | | 1 | 11 | | |<br>| 2 | TABLE ACCESS FULL| SKEW | 32M| 335M| 44497 (2)| 00:08:54 |<br>---------------------------------------------------------------------------<br><br>SQL_ID 0qa98gcnnza7h<br>--------------------<br>select avg(pk_col) from kso.skew where col1 > 0<br><br>Plan hash value: 3723858078<br><br>--------------------------------------------------<br>| Id | Operation | Name |<br>--------------------------------------------------<br>| 0 | SELECT STATEMENT | |<br>| 1 | SORT AGGREGATE | |<br>| 2 | TABLE ACCESS BY INDEX ROWID| SKEW |<br>| 3 | INDEX RANGE SCAN | SKEW_COL1 |<br>--------------------------------------------------<br><br>Note<br>-----<br> - rule based optimizer used (consider using cbo)<br><br>33 rows selected.<br>几点简短的评论,unstable_plan.sql 脚本显示了存在多个执行计划/执行计划存在切换的SQL语句。它们中的大多数都是亚秒级响应时间(大多数的响应时间都在一秒以内)。真正引人注目的是sql_id为0qa98gcnnza7h的语句。它被执行了62次,其中一个执行计划的平均执行时间约为25秒,而另一个计划的平均执行时间则约为314秒。awr_plan_changes.sql 脚本显示此语句一直在两个计划(3723858078 和 568322376)之间切换。3723858078 是效率较低的计划(每次执行执行 25M 左右的逻辑 I/O),而计划 568322376要好得多(每次执行只执行大约 120K 的逻辑I/O)。
当绑定变量窥探(bind variable peeking)出现问题时,我们通常会看到这种现象,即在两个或三个选项之间来回切换执行计划。这篇文章不是专门讨论绑定变量窥探问题的,因此我不会深入介绍所有细节(如果你想了解更多信息,只需在此站点上搜索相关内容)。但我要说的是,无论使用何种绑定变量(在本例中为 568322376 的执行计划),单个计划通常是可以接受的。使用 Outline、SQL Profile 或 Baseline(取决于您使用的 Oracle 版本)让 Oracle 使用特定计划非常容易。而我们在这个案例中正是通过这种方式解决了问题。
参考资料 1: http://kerryosborne.oracle-guy.com/2008/10/08/unstable-plans/
扫描上面二维码关注我如果你真心觉得文章写得不错,而且对你有所帮助,那就不妨帮忙“推荐"一下,您的“推荐”和”打赏“将是我最大的写作动力!本文版权归作者所有,欢迎转载,但未经作者同意必须保留此段声明,且在文章页面明显位置给出原文连接.
来源:程序园用户自行投稿发布,如果侵权,请联系站长删除
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!
页:
[1]