Bo's Oracle Station

【博客文章2022】 In-Memory Column Store系列2

2022-4-2 16:03| 发布者: admin| 查看: 98| 评论: 0|原作者: Bo Tang

摘要: In-Memory Column Store系列2
【博客文章2022】In-Memory Column Store系列2

Author: Bo Tang

1. 查询中的In-Memory加载过程

    默认,当Oracle首次查询访问具有in-memory定义的表时,就会加载它进入in-memory column store。这个过程是通过后台进程异步地完成的。加载进入IMCU中,每个列就是一个cu:

[oracle@classroom demo]$ ps aux | grep ora_w.*_rcat
oracle    195676  0.0  0.7 2593160 121856 ?      Ss   04:09   0:00 ora_w000_rcat
oracle    195680  0.0  0.9 2594336 150708 ?      Ss   04:09   0:00 ora_w001_rcat
oracle    195764  0.0  0.6 2593180 105324 ?      Ss   04:09   0:00 ora_w002_rcat
oracle    195950  0.0  0.9 2595224 147008 ?      Ss   04:09   0:00 ora_w003_rcat
oracle    195981  0.0  0.9 2602944 158468 ?      Ss   04:09   0:00 ora_w004_rcat
oracle    198035  0.0  1.0 2596252 166256 ?      Ss   04:19   0:00 ora_w005_rcat
oracle    198044  0.0  0.6 2593152 100148 ?      Ss   04:19   0:00 ora_w006_rcat
oracle    198055  0.0  0.9 2595372 145828 ?      Ss   04:19   0:00 ora_w007_rcat
oracle    220764  0.0  0.0 224908  1000 pts/16   S+   06:22   0:00 grep --color=auto ora_w.*_rcat
 
    如果对象有被加载进内存中的in-memory column store,它会被显示在V$IM_SEGMENTS和V$IM_USER_SEGMENTS;如果它没有被加载就不会被显示。V$IM_USER_SEGMENTS和V$IM_SEGMENTS的区别是,它只显示当前登录数据库的用户的对象:

SQL>  create table sales2 inmemory as select * from sales;
Table created.
SQL>  select segment_name, populate_status, bytes_not_populated, bytes/1024/1024, inmemory_size/1024/1024 from v$im_user_segments;
-------------------------------------------------------------------------------------------------------------------------------
没有结果
SQL> conn  / as sysdba
Connected.
SQL>  select owner, segment_name, populate_status, bytes_not_populated, bytes/1024/1024, inmemory_size/1024/1024  from v$im_segments;
-----------------------------------------------------------------------------------------------------------------------
没有结果

    我们来查询sales2表,目的是让该表被自动加载到in-memory column store: 

SQL> select count(*) from sales2;
  COUNT(*)
----------
    918843
SQL>  select segment_name, populate_status, bytes_not_populated, bytes/1024/1024, inmemory_size/1024/1024 from v$im_user_segments;
-------------------------------------------------------------------------------------------------------------------------------
SALES2    COMPLETED    0    36    6.4375
SQL> conn / as sysdba
Connected.
SQL>  select owner, segment_name, populate_status, bytes_not_populated, bytes/1024/1024, inmemory_size/1024/1024  from v$im_segments;
-----------------------------------------------------------------------------------------------------------------------
SH    SALES2    COMPLETED    0    34.6328125    6.4375

    bytes_not_populated为0,并且populate_status为COMPLETED,说明整张表被完全加载进入in-memory column store。bytes指表实际在硬盘上存储的大小,而inmemory_size指它经过压缩(5倍)后加载在内存中的大小。该表的压缩算法是query low:

SQL> select  inmemory, inmemory_compression, inmemory_distribute, inmemory_duplicate, inmemory_priority  from user_tables where table_name='SALES2';
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ENABLED    FOR QUERY LOW    AUTO    NO DUPLICATE    NONE

    还可以使用DBMS_INMEMORY包强制进行加载:
    或者使用DBMS_INMEMORY包强制进行重新加载:
    同时,优先级也决定了表加载的顺序:如果要发生加载,那么高优先级的表如果没有加载完毕,低优先级的表不会加载
    SH执行以下的查询语句触发inmemory的加载:

SQL> select a.table_name, b.inmemory_priority, b.populate_status,
to_char(c.createtime, 'YYYY-MM-DD:hh24:mi:ss.ff2') start_pop,
to_char(max(d.timestamp),'YYYY-MM-DD:hh24:mi:ss.ff2') finish_pop
from dba_tables a, v$im_segments b, v$im_segments_detail c, v$im_header d, dba_objects e
where a.table_name in ('CUSTOMERS','SALES2','COSTS')
and a.table_name = b.segment_name
and e.object_id = c.baseobj
and c.dataobj = d.objd
group by a.table_name, b.inmemory_priority, b.populate_status,c.createtime
order by finish_pop, a.table_name;
-------------------------------------------------------------------------------------------------------------------------------------------------------
COSTS    LOW    COMPLETED    2022-04-22:16:48:52.59    2022-04-22:16:48:53.01
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:52.59    2022-04-22:16:48:53.01
SALES2    HIGH    COMPLETED    2022-04-22:16:48:52.59    2022-04-22:16:48:53.01
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.25    2022-04-22:16:48:53.26
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.25    2022-04-22:16:48:53.26
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.25    2022-04-22:16:48:53.26
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.31    2022-04-22:16:48:53.31
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.31    2022-04-22:16:48:53.31
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.31    2022-04-22:16:48:53.31
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.33    2022-04-22:16:48:53.34
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.33    2022-04-22:16:48:53.34
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.33    2022-04-22:16:48:53.34
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.35    2022-04-22:16:48:53.36
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.35    2022-04-22:16:48:53.36
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.35    2022-04-22:16:48:53.36
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.38    2022-04-22:16:48:53.39
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.38    2022-04-22:16:48:53.39
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.38    2022-04-22:16:48:53.39
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.40    2022-04-22:16:48:53.42
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.40    2022-04-22:16:48:53.42
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.40    2022-04-22:16:48:53.42
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.43    2022-04-22:16:48:53.44
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.43    2022-04-22:16:48:53.44
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.43    2022-04-22:16:48:53.44
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.46    2022-04-22:16:48:53.46
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.46    2022-04-22:16:48:53.46
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.46    2022-04-22:16:48:53.46
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.48    2022-04-22:16:48:53.48
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.48    2022-04-22:16:48:53.48
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.48    2022-04-22:16:48:53.48
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.50    2022-04-22:16:48:53.51
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.50    2022-04-22:16:48:53.51
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.50    2022-04-22:16:48:53.51
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.53    2022-04-22:16:48:53.53
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.53    2022-04-22:16:48:53.53
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.53    2022-04-22:16:48:53.53
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.55    2022-04-22:16:48:53.56
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.55    2022-04-22:16:48:53.56
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.55    2022-04-22:16:48:53.56
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.58    2022-04-22:16:48:53.59
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.58    2022-04-22:16:48:53.59
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.58    2022-04-22:16:48:53.59
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.60    2022-04-22:16:48:53.62
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.60    2022-04-22:16:48:53.62
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.60    2022-04-22:16:48:53.62
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.64    2022-04-22:16:48:53.64
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.64    2022-04-22:16:48:53.64
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.64    2022-04-22:16:48:53.64
COSTS    LOW    COMPLETED    2022-04-22:16:48:52.58    2022-04-22:16:48:53.65
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:52.58    2022-04-22:16:48:53.65
SALES2    HIGH    COMPLETED    2022-04-22:16:48:52.58    2022-04-22:16:48:53.65
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.67    2022-04-22:16:48:53.68
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.67    2022-04-22:16:48:53.68
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.67    2022-04-22:16:48:53.68
COSTS    LOW    COMPLETED    2022-04-22:16:48:53.66    2022-04-22:16:48:53.79
CUSTOMERS    MEDIUM    COMPLETED    2022-04-22:16:48:53.66    2022-04-22:16:48:53.79
SALES2    HIGH    COMPLETED    2022-04-22:16:48:53.66    2022-04-22:16:48:53.79

    从上面的结果可以看出:每一轮加载过程中,如果优先级高的没有完成加载(FINISH_POP),那么优先级低的就不会开始下一轮加载(START_POP)。
    在inmemory加载过程中伴随着压缩,inmemory压缩用得好不仅节省内存空间而且还可以通过减少IO以提高性能,下面来讨论压缩算法。默认的压缩选项是MEMCOMPRESS FOR QUERY LOW,该选项由于在查询时不需要解压缩数据,所以提供了最快的读取数据的速度。其他的选项添加了额外的压缩层,读取数据的时候需要解压缩。

压缩语法

解释

NO MEMCOMPRESS

 没有压缩

MEMCOMPRESS FOR DML

 DML优化过的最小压缩

MEMCOMPRESS FOR QUERY LOW

 对查询优化的,不需要解压缩数据的压缩选项。这是默认选项。算法是Dictionary Encoding, Run Length Encoding and Bit-Packing.

MEMCOMPRESS FOR QUERY HIGH

 对查询优化的,同时空间节省的选项

MEMCOMPRESS FOR CAPACITY LOW

 平衡的,但是倾向于空间节省的选项。算法是在默认的算法基础上,加上对Oracle Database优化过的OZIP. 需要先解压缩才能应用where条件。

MEMCOMPRESS FOR CAPACITY HIGH

 空间最节省的选项. 需要先解压缩才能应用where条件。


   我们可以使用Oracle Compression Advisor来预判压缩比率,进而来预先知道inmemory_size要设置多大。Oracle Compression Advisor不仅可以建议硬盘上表的Basic压缩和row store advanced压缩的压缩率,还可以建议inmemeory的压缩算法:
   下面使用算法dbms_compression.comp_inmemory_capacity_high建议:


declare
  v_blkcnt_cmp number;
  v_blkcnt_uncmp  number;
  v_row_cmp number;
  v_row_uncmp number;
  v_cmp_ratio number;
  v_comptype_str  varchar2(200);
BEGIN
DBMS_COMPRESSION.GET_COMPRESSION_RATIO(scratchtbsname => 'EXAMPLE',
                                                                                             ownname =>'SH',
                                                                                             objname =>'SALES2',
                                                                                             subobjname =>null,
                                                                                             comptype => dbms_compression.comp_inmemory_capacity_high,
                                                                                             blkcnt_cmp => v_blkcnt_cmp,
                                                                                             blkcnt_uncmp =>  v_blkcnt_uncmp,
                                                                                             row_cmp =>v_row_cmp,
                                                                                             row_uncmp => v_row_uncmp,
                                                                                             cmp_ratio =>  v_cmp_ratio,
                                                                                             comptype_str =>v_comptype_str);                                                 
DBMS_OUTPUT.PUT_LINE('Blk count compressed = ' || v_blkcnt_cmp);
DBMS_OUTPUT.PUT_LINE('Blk count uncompressed = ' || v_blkcnt_uncmp);
DBMS_OUTPUT.PUT_LINE('Row count per block compressed = ' || v_row_cmp);
DBMS_OUTPUT.PUT_LINE('Row count per block uncompressed = ' || v_row_uncmp);
DBMS_OUTPUT.PUT_LINE('ratio: '||v_cmp_ratio);
DBMS_OUTPUT.PUT_LINE('Compression type = ' || v_comptype_str);
end;
/
------------------------------------------------------------------------------------------------------------------------------------------
anonymous block completed
Blk count compressed = 0
Blk count uncompressed = 4432
Row count per block compressed = 0
Row count per block uncompressed = 207
ratio: 14.2
Compression type = "In-memory Memcompress Capacity High"
 

    下面使用算法dbms_compression.comp_inmemory_query_low建议:


declare
  v_blkcnt_cmp number;
  v_blkcnt_uncmp  number;
  v_row_cmp number;
  v_row_uncmp number;
  v_cmp_ratio number;
  v_comptype_str  varchar2(200);
BEGIN
DBMS_COMPRESSION.GET_COMPRESSION_RATIO(scratchtbsname => 'EXAMPLE',
                                                                                             ownname =>'SH',
                                                                                             objname =>'SALES2',
                                                                                             subobjname =>null,
                                                                                             comptype => dbms_compression.comp_inmemory_query_low,
                                                                                             blkcnt_cmp => v_blkcnt_cmp,
                                                                                             blkcnt_uncmp =>  v_blkcnt_uncmp,
                                                                                             row_cmp =>v_row_cmp,
                                                                                             row_uncmp => v_row_uncmp,
                                                                                             cmp_ratio =>  v_cmp_ratio,
                                                                                             comptype_str =>v_comptype_str);                                                 
DBMS_OUTPUT.PUT_LINE('Blk count compressed = ' || v_blkcnt_cmp);
DBMS_OUTPUT.PUT_LINE('Blk count uncompressed = ' || v_blkcnt_uncmp);
DBMS_OUTPUT.PUT_LINE('Row count per block compressed = ' || v_row_cmp);
DBMS_OUTPUT.PUT_LINE('Row count per block uncompressed = ' || v_row_uncmp);
DBMS_OUTPUT.PUT_LINE('ratio: '||v_cmp_ratio);
DBMS_OUTPUT.PUT_LINE('Compression type = ' || v_comptype_str);
end;
/

------------------------------------------------------------------------------------------------------------------------------------------
anonymous block completed
Blk count compressed = 0
Blk count uncompressed = 4432
Row count per block compressed = 0
Row count per block uncompressed = 207
ratio: 4.6
Compression type = "In-memory Memcompress Query Low"

 

    In-memory Columnar压缩与硬盘上的表的Hybrid Columnar Compression密切相关。它们内部都采用了Intel CPU的SIMD向量处理指令集(包括Intel AVX256比特向量寄存器)。把表逆时针旋转90度后,一个向量指令可以处理许多原来的行,导致每秒可以访问10亿原来的行(现在的列)。而不采用向量处理指令,每秒最多只能访问100万行。In-memory columnar压缩与硬盘上的表的Hybrid Columnar Compression最大的不同是:前者对内存进行向量处理,而后者对硬盘进行向量处理。


路过

雷人

握手

鲜花

鸡蛋

QQ|手机版|Bo's Oracle Station   

GMT+8, 2022-4-25 09:32 , Processed in 0.058094 second(s), 21 queries .

返回顶部