【博客文章2025】在Oracle数据库中使用DOMAIN INDEX实现JSON数据的全文搜索
Author: Bo Tang
1. 在Oracle数据库中为JSON数据创建DOMAIN索引:
AI应用倾向于在JSON结构中存储信息。如果要从JSON结构中查询某些词,比如:要显示运输地址中包含国家名的采购订单,可以这样查询(实验环境来自:https://www.botangdb.com/mytec/mytec_basicadmin/202501/00900133.html):
SELECT doc FROM j1 WHERE JSON_EXISTS(doc,'$.ShippingInstructions.Address.country');
DOC |
{
"Number":4,
"Reference":"钱先生-20241202",
"Requestor":"钱先生",
"User":"SIRQIAN",
"CostCenter":"A",
"ShippingInstructions":
{
"name":"钱先生",
"Address":
{
"street":"第2街道",
"city":"长春",
"province":"吉林",
"zipCode":130000,
"country":"中国"
},
"Phone":
[
{
"type":"固定电话",
"number":"0431-88777223"
}
]
},
"Special Instructions":"",
"LineItems":
[
{
"ItemNumber":1,
"Part":
{
"Description":"《大教堂》",
"UnitPrice":19.95,
"UPCCode":53396040194
},
"Quantity":7.0
}
]
} |
{
"Number":1,
"Reference":"赵先生-20241201",
"Requestor":"赵先生",
"User":"SIRZHAO",
"CostCenter":"A",
"ShippingInstructions":
{
"name":"赵先生",
"Address":
{
"street":"第1街道",
"city":"哈尔滨",
"province":"黑龙江",
"zipCode":150000,
"country":"中国"
},
"Phone":
[
{
"type":"固定电话",
"number":"0451-86779817"
}
]
},
"Special Instructions":"平寄邮件",
"LineItems":
[
{
"ItemNumber":1,
"Part":
{
"Description":"《哲学与社会——老年沉思录》",
"UnitPrice":19.95,
"UPCCode":43396040144
},
"Quantity":7.0
},
{
"ItemNumber":2,
"Part":
{
"Description":"《陈平原文集》",
"UnitPrice":19.95,
"UPCCode":43396040145
},
"Quantity":7.0
},
{
"ItemNumber":3,
"Part":
{
"Description":"《蘑菇猎人:探寻北美野生蘑菇的地下世界》",
"UnitPrice":19.95,
"UPCCode":12236101345
},
"Quantity":1.0
},
{
"ItemNumber":4,
"Part":
{
"Description":"《楼兰新史(增订本)》",
"UnitPrice":19.95,
"UPCCode":85391756323
},
"Quantity":8.0
},
{
"ItemNumber":5,
"Part":
{
"Description":"《人类思想发展史:关于古代近东思辨思想的讨论》",
"UnitPrice":19.95,
"UPCCode":13023009592
},
"Quantity":8.0
},
{
"ItemNumber":6,
"Part":
{
"Description":"《韩家往事》",
"UnitPrice":19.95,
"UPCCode":27616864451
},
"Quantity":8.0
}
]
}
Execution Plan ---------------------------------------------------------- Plan hash value: 2534316378
----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1310 | 12 (0)| 00:00:01 | |* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| J1 | 1 | 1310 | 12 (0)| 00:00:01 | | 2 | BITMAP CONVERSION TO ROWIDS | | | | | | | 3 | BITMAP INDEX FULL SCAN | IBMP_J1 | | | | | -----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter(JSON_EXISTS2("DOC" FORMAT JSON , '$.ShippingInstructions.Address.country' FALSE ON ERROR TYPE(LAX) )=1)
Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 6 consistent gets 0 physical reads 0 redo size 1196 bytes sent via SQL*Net to client 1071 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed
|
|
从执行计划上来看,COST为12。访问路径中使用了位图函数索引,这是因为对这种索引进行的全扫描代替了全表访问(受上个实验的影响)。如果该表包含数万以上的行,查询将会非常慢,将需要使用索引。对于这种类型的搜索,只有DOMAIN索引适合。
依靠上下文基础结构为JSON数据编制索引:
CREATE INDEX idomain_j1 ON j1 ( doc) indextype is ctxsys.context parameters('section group CTXSYS.JSON_SECTION_GROUP');
Index IDOMAIN_J1 已创建。
|
再次执行相同的查询,查看执行计划:
Execution Plan ---------------------------------------------------------- Plan hash value: 280323407
------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 1322 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| J1 | 1 | 1322 | 4 (0)| 00:00:01 | |* 2 | DOMAIN INDEX | IDOMAIN_J1 | | | 4 (0)| 00:00:01 | ------------------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
2 - access("CTXSYS"."CONTAINS"("J1"."DOC",'HASPATH(/ShippingInstructions/Addres s/country)')>0)
Statistics ---------------------------------------------------------- 2373 recursive calls 20 db block gets 3000 consistent gets 52 physical reads 2884 redo size 1196 bytes sent via SQL*Net to client 1071 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 110 sorts (memory) 0 sorts (disk) 2 rows processed
|
再次查询时,使用了IDOMAIN_J1 DOMAIN索引。谓词信息显示:通过搜索大于0的ShippingInstructions.Address.country键值作为访问路径。COST从原来的12下降为4。
2. 使用JSON_TEXTCONTAINS运算符进行全文搜索查询:
JSON_TEXTCONTAINS 运算符在JSON 路径下执行全文搜索,使用到了IDOMAIN_J1 DOMAIN索引:
SELECT count(*) FROM j1 WHERE JSON_TEXTCONTAINS( doc,'$.LineItems.Part.Description','Attraction'); 2 3
COUNT(*) ---------- 0
Execution Plan ---------------------------------------------------------- Plan hash value: 3425708830
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1322 | 4 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 1322 | | | |* 2 | DOMAIN INDEX | IDOMAIN_J1 | | | 4 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
2 - access("CTXSYS"."CONTAINS"("J1"."DOC",'(Attraction) INPATH (/LineItems/Part/Description)')>0)
Statistics ---------------------------------------------------------- 15 recursive calls 0 db block gets 19 consistent gets 0 physical reads 0 redo size 585 bytes sent via SQL*Net to client 108 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 2 sorts (memory) 0 sorts (disk) 1 rows processed
|
谓词信息显示:通过搜索大于0的LineItems.Part.Description键值作为访问路径。如果不存在 JSON 搜索索引,则使用JSON_TEXTCONTAINS运算符的搜索将失败:
drop index idomain_j1;
Index dropped.
SQL> SELECT count(*) FROM j1 WHERE JSON_TEXTCONTAINS( doc,'$.LineItems.Part.Description','Attraction'); 2 3 SELECT count(*) FROM j1 * ERROR at line 1: ORA-40467: JSON_TEXTCONTAINS() cannot be evaluated without a JSON-enabled context index Help: https://docs.oracle.com/error-help/db/ora-40467/
|
|