Bo's Oracle Station

【博客文章2025】在Oracle数据库中使用DOMAIN INDEX实现JSON数据的全文搜索

2025-1-20 17:29| 发布者: admin| 查看: 62| 评论: 0|原作者: Bo Tang

摘要: 本博客介绍了如何在Oracle数据库中为JSON数据创建DOMAIN索引以加速查询。对使用JSON_TEXTCONTAINS运算符进行全文搜索查询给出实验案例。这样,我们在Oracle数据库中使用DOMAIN INDEX实现了JSON数据的全文搜索。
【博客文章2025】在Oracle数据库中使用DOMAIN INDEX实现JSON数据的全文搜索


Author: Bo Tang

1. 在Oracle数据库中为JSON数据创建DOMAIN索引:

    AI应用倾向于在JSON结构中存储信息。如果要从JSON结构中查询某些词,比如:要显示运输地址中包含国家名的采购订单,可以这样查询(实验环境来自:https://www.botangdb.com/mytec/mytec_basicadmin/202501/00900133.html):

SELECT doc FROM j1
 WHERE JSON_EXISTS(doc,'$.ShippingInstructions.Address.country');



DOC
{ "Number":4, "Reference":"钱先生-20241202", "Requestor":"钱先生", "User":"SIRQIAN", "CostCenter":"A", "ShippingInstructions": { "name":"钱先生", "Address": { "street":"第2街道", "city":"长春", "province":"吉林", "zipCode":130000, "country":"中国" }, "Phone": [ { "type":"固定电话", "number":"0431-88777223" } ] }, "Special Instructions":"", "LineItems": [ { "ItemNumber":1, "Part": { "Description":"《大教堂》", "UnitPrice":19.95, "UPCCode":53396040194 }, "Quantity":7.0 } ] }
{ "Number":1, "Reference":"赵先生-20241201", "Requestor":"赵先生", "User":"SIRZHAO", "CostCenter":"A", "ShippingInstructions": { "name":"赵先生", "Address": { "street":"第1街道", "city":"哈尔滨", "province":"黑龙江", "zipCode":150000, "country":"中国" }, "Phone": [ { "type":"固定电话", "number":"0451-86779817" } ] }, "Special Instructions":"平寄邮件", "LineItems": [ { "ItemNumber":1, "Part": { "Description":"《哲学与社会——老年沉思录》", "UnitPrice":19.95, "UPCCode":43396040144 }, "Quantity":7.0 }, { "ItemNumber":2, "Part": { "Description":"《陈平原文集》", "UnitPrice":19.95, "UPCCode":43396040145 }, "Quantity":7.0 }, { "ItemNumber":3, "Part": { "Description":"《蘑菇猎人:探寻北美野生蘑菇的地下世界》", "UnitPrice":19.95, "UPCCode":12236101345 }, "Quantity":1.0 }, { "ItemNumber":4, "Part": { "Description":"《楼兰新史(增订本)》", "UnitPrice":19.95, "UPCCode":85391756323 }, "Quantity":8.0 }, { "ItemNumber":5, "Part": { "Description":"《人类思想发展史:关于古代近东思辨思想的讨论》", "UnitPrice":19.95, "UPCCode":13023009592 }, "Quantity":8.0 }, { "ItemNumber":6, "Part": { "Description":"《韩家往事》", "UnitPrice":19.95, "UPCCode":27616864451 }, "Quantity":8.0 } ] }






Execution Plan
----------------------------------------------------------
Plan hash value: 2534316378

-----------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |  1310 |    12    (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| J1      |     1 |  1310 |    12    (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS        |          |       |       |        |          |
|   3 |    BITMAP INDEX FULL SCAN        | IBMP_J1 |       |       |        |          |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(JSON_EXISTS2("DOC" FORMAT JSON ,
          '$.ShippingInstructions.Address.country' FALSE ON ERROR TYPE(LAX) )=1)



Statistics
----------------------------------------------------------
      1  recursive calls
      0  db block gets
      6  consistent gets
      0  physical reads
      0  redo size
       1196  bytes sent via SQL*Net to client
       1071  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      2  rows processed

    从执行计划上来看,COST为12。访问路径中使用了位图函数索引,这是因为对这种索引进行的全扫描代替了全表访问(受上个实验的影响)。如果该表包含数万以上的行,查询将会非常慢,将需要使用索引。对于这种类型的搜索,只有DOMAIN索引适合。
    依靠上下文基础结构为JSON数据编制索引:

CREATE INDEX  idomain_j1
ON j1  ( doc)
indextype is ctxsys.context
parameters('section group
CTXSYS.JSON_SECTION_GROUP');

Index IDOMAIN_J1 已创建。

    再次执行相同的查询,查看执行计划:

Execution Plan
----------------------------------------------------------
Plan hash value: 280323407

------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |     1 |  1322 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| J1     |     1 |  1322 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX            | IDOMAIN_J1 |     |     |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("J1"."DOC",'HASPATH(/ShippingInstructions/Addres
          s/country)')>0)


Statistics
----------------------------------------------------------
       2373  recursive calls
     20  db block gets
       3000  consistent gets
     52  physical reads
       2884  redo size
       1196  bytes sent via SQL*Net to client
       1071  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
    110  sorts (memory)
      0  sorts (disk)
      2  rows processed

    再次查询时,使用了IDOMAIN_J1 DOMAIN索引。谓词信息显示:通过搜索大于0的ShippingInstructions.Address.country键值作为访问路径。COST从原来的12下降为4

2. 使用JSON_TEXTCONTAINS运算符进行全文搜索查询:

    JSON_TEXTCONTAINS 运算符在JSON 路径下执行全文搜索,使用到了IDOMAIN_J1 DOMAIN索引:

SELECT count(*) FROM j1
WHERE JSON_TEXTCONTAINS(
doc,'$.LineItems.Part.Description','Attraction');  2    3  

  COUNT(*)
----------
     0


Execution Plan
----------------------------------------------------------
Plan hash value: 3425708830

-------------------------------------------------------------------------------
| Id  | Operation     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |          |     1 |  1322 |     4    (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |          |     1 |  1322 |        |          |
|*  2 |   DOMAIN INDEX     | IDOMAIN_J1 |       |       |     4    (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("J1"."DOC",'(Attraction) INPATH
          (/LineItems/Part/Description)')>0)


Statistics
----------------------------------------------------------
     15  recursive calls
      0  db block gets
     19  consistent gets
      0  physical reads
      0  redo size
    585  bytes sent via SQL*Net to client
    108  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      2  sorts (memory)
      0  sorts (disk)
      1  rows processed

    谓词信息显示:通过搜索大于0的LineItems.Part.Description键值作为访问路径。如果不存在 JSON 搜索索引,则使用JSON_TEXTCONTAINS运算符的搜索将失败:

drop index idomain_j1;

Index dropped.

SQL> SELECT count(*) FROM j1
WHERE JSON_TEXTCONTAINS(
doc,'$.LineItems.Part.Description','Attraction');  2    3  
SELECT count(*) FROM j1
                     *
ERROR at line 1:
ORA-40467: JSON_TEXTCONTAINS() cannot be evaluated without a JSON-enabled context index
Help: https://docs.oracle.com/error-help/db/ora-40467/






路过

雷人

握手

鲜花

鸡蛋

QQ|手机版|Bo's Oracle Station   

GMT+8, 2025-1-21 22:21 , Processed in 0.039093 second(s), 21 queries .

返回顶部