2019年12月29日 (日)

RDFナレッジ・グラフ / 備忘録 19c

5年ぐらい前に触ったことがある程度で、すっかり忘れてしまった。しばらくぶりで思い出してパタパタしてみたら、Apache Jenaも含めていろいろ変わっていたので備忘録

RDFナレッジ・グラフ開発者ガイド RDFナレッジ・グラフの概要

R2RML: RDB to RDF Mapping Language

A Direct Mapping of Relational Data to RDF

いろいろ思い出しながら....

$ sqlplus sys@orclcdb as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Dec 28 00:51:52 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle. All rights reserved.

Enter password:

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> col namespace for a30
SQL> col value for a40
SQL> col description for a50
SQL> set linesize 400
SQL> set tab off
SYS@orclcdb> select * from MDSYS.RDF_PARAMETER;

NAMESPACE ATTRIBUTE VALUE DESCRIPTION
------------------- ----------------- ------------- --------------------------------------------------
COMPONENT RDFCTX INSTALLED Semantic (Text) Search component
COMPONENT RDFOLS INSTALLED RDF Optional component for OLS support
MDSYS SEM_VERSION 19.1.0.0.0 VALID

SYS@orclcdb> r
1 SELECT namespace, attribute, value FROM mdsys.rdf_parameter
2 WHERE namespace='MDSYS'
3 AND attribute IN ('FLOAT_DOUBLE_DECIMAL',
4 'XSD_TIME', 'XSD_BOOLEAN',
5* 'DATA_CONVERSION_CHECK')

no rows selected

SYS@orclcdb> conn system@orcl
Enter password:
Connected.
SYSTEM@orcl> create tablespace rdf_users datafile 'rds_users01.dbf' size 128m reuse autoextend on next 64m maxsize unlimited segment space management auto;

Tablespace created.

SYSTEM@orcl> create user rdfuser identified by hogehoge default tablespace rdf_users temporary tablespace temp;

User created.

SYSTEM@orcl> grant create view to rdfuser;

Grant succeeded.

SYSTEM@orcl> grant unlimited tablespace to rdfuser;

Grant succeeded.

SYSTEM@orcl> grant unlimited tablespace to mdsys;

Grant succeeded.

SYSTEM@orcl> grant select any dictionary to rdfuser;

Grant succeeded.

SYSTEM@orcl> grant connect, resource to rdfuser;

Grant succeeded.

SYSTEM@orcl> execute sem_apis.create_sem_network('RDF_USERS', network_owner=>'RDFUSER', network_name=>'LOCALNET');

PL/SQL procedure successfully completed.


SYSTEM@orcl> conn rdfuser@orcl
Enter password:
Connected.
RDFUSER@orcl>

RDFUSER@orcl> l
1 create table r2rmlview_nt_staging_tab (
2 rdf$stc_sub varchar2(4000) not null
3 ,rdf$stc_pred varchar2(4000) not null
4 ,rdf$stc_obj varchar2(4000) not null
5 )
6* nologging
RDFUSER@orcl> /

Table created.

RDFUSER@orcl> l
1 create table rdfview_export_tab (
2 rdf$stc_sub varchar2(4000) not null
3 ,rdf$stc_pred varchar2(4000) not null
4 ,rdf$stc_obj varchar2(4000) not null
5 )
6* nologging
RDFUSER@orcl> /

Table created.

RDFでいろいろやるには sga_targetやsga_max_sizeはそれなりに必要なので適当に調整

そして、Apache Jenaのrdfcatもdeprecatedになってて、これまたしばし時代に追いつく作業を....

RDFUSER@orcl> !rdfcat --help
------------------------------------------------------------------
DEPRECATED: Please use 'riot' instead.
http://jena.apache.org/documentation/io/#command-line-tools
------------------------------------------------------------------

------------------------------------
DEPRECATED: Please use riot instead.
------------------------------------

Usage: java jena.rdfcat (option|input)*
Concatenates the contents of zero or more input RDF documents.
Options: -out N3 | N-TRIPLE | RDF/XML | RDF/XML-ABBREV
-n expect subsequent inputs in N3 syntax
-x expect subsequent inputs in RDF/XML syntax
-t expect subsequent inputs in N-TRIPLE syntax
-[no]include include rdfs:seeAlso and owl:imports
input can be filename, URL, or - for stdin
Recognised aliases for -n are: -n3 -ttl or -N3
Recognised aliases for -x are: -xml -rdf or -rdfxml
Recognised aliases for -t are: -ntriple
Output format aliases: x, xml or rdf for RDF/XML, n, n3 or ttl for N3, t or ntriple for N-TRIPLE
See the Javadoc for jena.rdfcat for additional details.

rdfcatに変えて、riot ってやつをつかわなきゃいけなくなったっぽい。

RDFUSER@orcl> !riot -version
Jena: VERSION: 3.13.1
Jena: BUILD_DATE: 2019-10-06T18:57:39+0000
RIOT: VERSION: 3.13.1
RIOT: BUILD_DATE: 2019-10-06T18:57:39+0000

ちなみに、SHユーザのEMPLOYEES表や索引をRDFUSERユーザへコピーした上で、以下のような、RDFVIEWとRelational表と列のマッピングをTurtle (Terse RDF Triple Language)で定義した。

RDFUSER@orcl> !cat test_real_rdf_r2rml.ttl
@prefix rr: .
@prefix xsd: .
@prefix ex: .

ex:TriplesMap_Employees
rr:logicalTable [ rr:tableName "EMPLOYEES" ];
rr:subjectMap [
rr:template "http://r2rml.com/employees/{EMPLOYEE_ID}";
rr:class ex:Employees;
];

rr:predicateObjectMap [
rr:predicate ex:FirstName;
rr:objectMap [ rr:column "FIRST_NAME" ];
];

rr:predicateObjectMap [
rr:predicate ex:LastName;
rr:objectMap [ rr:column "LAST_NAME" ];
];

rr:predicateObjectMap [
rr:predicate ex:ManagerId;
rr:objectMap [ rr:column "MANAGER_ID" ];
].

5年ぐらい前の記憶ではrdfcatを利用して変換していたが、今は、 riot というコマンドを利用してTurtle(Terse RDF Triple Language)N-Triplesへ変換するみたいね。

RDFUSER@orcl> !riot --out=N-TRIPLE test_real_rdf_r2rml.ttl > test_real_rdf_r2rml_use_riot.nt

RDFUSER@orcl> !cat test_real_rdf_r2rml_use_riot.nt
_:Bf70c7f0d1b418dc63ad89dbcea313cd1 "EMPLOYEES" .
_:Bf70c7f0d1b418dc63ad89dbcea313cd1 .
_:Bdd4cf5eb9ecb0b12212d342518513827 "http://r2rml.com/employees/{EMPLOYEE_ID}" .
_:Bdd4cf5eb9ecb0b12212d342518513827 .
_:Bdd4cf5eb9ecb0b12212d342518513827 .
_:B310abfde7d61aac8f303fbd1f4ba5db8 .
_:Bfe48b8372e4da59d0ce99e0e24b894ad "FIRST_NAME" .
_:B310abfde7d61aac8f303fbd1f4ba5db8 _:Bfe48b8372e4da59d0ce99e0e24b894ad .
_:B310abfde7d61aac8f303fbd1f4ba5db8 .
_:B11c76919901c01b44cb0e2507a997c28 .
_:B5474fc7a1b7c7302521175dac3c58028 "LAST_NAME" .
_:B11c76919901c01b44cb0e2507a997c28 _:B5474fc7a1b7c7302521175dac3c58028 .
_:B11c76919901c01b44cb0e2507a997c28 .
_:Be7008cecff6b130ea15b4ba060cee0fb .
_:B0f41936e13c4655082ee47f866dc9f61 "MANAGER_ID" .
_:Be7008cecff6b130ea15b4ba060cee0fb _:B0f41936e13c4655082ee47f866dc9f61 .
_:Be7008cecff6b130ea15b4ba060cee0fb .


SQL*Loaderを使用したステージング表へのN-Triple形式のデータのロードを参考

RDFUSER@orcl> !cat test_real_rdf_r2rml_nt_load.ctl
UNRECOVERABLE
LOAD DATA
TRUNCATE
into table r2rmlview_nt_staging_tab
when (1) <> '#'
(
RDF$STC_sub CHAR(4000) terminated by whitespace
"(
CASE
WHEN substr(:RDF$STC_sub,1,1)='<' AND substr(:RDF$STC_sub,-1,1)='>' AND
length(:RDF$STC_sub)>2
THEN :RDF$STC_sub
WHEN substr(:RDF$STC_sub,1,2)='_:' AND
REGEXP_LIKE(:RDF$STC_sub,'^(_:)[[:alpha:]][[:alnum:]]*$')
THEN :RDF$STC_sub
WHEN substr(:RDF$STC_sub,1,1) NOT IN ('\"','<','#') AND
substr(:RDF$STC_sub,-1,1) NOT IN ('\"','>')
THEN ('<' || SDO_RDF.replace_rdf_prefix(:RDF$STC_sub) || '>')
WHEN substr(:RDF$STC_sub,1,1)='#'
THEN SDO_RDF.raise_parse_error(
'Ignored Comment Line starting with ', :RDF$STC_sub)
ELSE SDO_RDF.raise_parse_error('Invalid Subject', :RDF$STC_sub)
END
)",
RDF$STC_pred CHAR(4000) terminated by whitespace
"(
CASE
WHEN substr(:RDF$STC_pred,1,1)='<' AND substr(:RDF$STC_pred,-1,1)='>' AND
length(:RDF$STC_pred)>2
THEN :RDF$STC_pred
WHEN substr(:RDF$STC_pred,1,2) != '_:' AND
substr(:RDF$STC_pred,1,1) NOT IN ('\"','<') AND
substr(:RDF$STC_pred,-1,1) NOT IN ('\"','>')
THEN ('<' || SDO_RDF.replace_rdf_prefix(:RDF$STC_pred) || '>')
ELSE SDO_RDF.raise_parse_error('Invalid Predicate', :RDF$STC_pred)
END
)",
--
-- right-trimming of WHITESPACEs is reqd for "RDF$STC_obj"
-- (due to absence of "TERMINATED BY WHITESPACE")
--
-- For ease of editing below replace
-- "rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13))" with ":xy".
-- and then replace back
--
RDF$STC_obj CHAR(4000)
"(
CASE
WHEN substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),1,1)='<' AND
substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),-1,1)='>' AND
length(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)))>2
THEN rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13))
WHEN substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),1,1)='\"' AND
substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),-1,1)='\"' AND
length(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)))>1
THEN rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13))
WHEN substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),1,2)='_:' AND
REGEXP_LIKE(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),
'^(_:)[[:alpha:]][[:alnum:]]*$')
THEN rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13))
WHEN substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),1,1)
NOT IN ('\"','<') AND
substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),-1,1)
NOT IN ('\"','>')
THEN ('<' ||
SDO_RDF.replace_rdf_prefix(
rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13))) ||
'>')
WHEN substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),1,1)='\"' AND
substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),-1,1)
NOT IN ('\"','>') AND
instr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),'\"\@',-1)>1 AND
REGEXP_LIKE(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),
'^\"[[:print:]]*\"\@[[:alpha:]]+(-[[:alnum:]]+)*$')
THEN rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13))
WHEN (substr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),1,1)='\"' AND
instr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),'\"^^',-1)>1 AND
(length(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)))-
(instr(rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)),'\"^^',-1)+4)
)>0)
THEN SDO_RDF.pov_typed_literal(
rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)))
ELSE SDO_RDF.raise_parse_error(
'Invalid Object', rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13)))
END
)"
)

RDFUSER@orcl> exit


$ sqlldr userid=rdfuser@orcl control=test_real_rdf_r2rml_nt_load.ctl data=test_real_rdf_r2rml_use_riot.nt direct=true skip=0 load=1000000 discardmax=0 bad=test_real_rdf_r2rml_nt_load.bad discard=test_real_rdf_r2rml_nt_load.rej log=test_real_rdf_r2rml_nt_load.log errors=1000000
Password:

SQL*Loader: Release 19.0.0.0.0 - Production on Sat Dec 28 02:05:44 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.

Path used: Direct, LOAD=1000000

Load completed - logical record count 17.

Table R2RMLVIEW_NT_STAGING_TAB:
17 Rows successfully loaded.

Check the log file:
test_real_rdf_r2rml_nt_load.log
for more information about the load.

そして、SEM_MATCH()関数を利用してモデルと問い合わせると....

RDFUSER@orcl> @test_real_rdf_cre_rdfview.sql

PL/SQL procedure successfully completed.

RDFUSER@orcl> l
1 SELECT
2 s
3 , p
4 , o
5 FROM
6 TABLE (
7 SEM_MATCH (
8 '{?s ?p ?o}'
9 , SEM_MODELS('TEST_MODEL1')
10 , null
11 , null
12 , null
13 , null
14 , ' '
15 , null
16 , null
17 , 'RDFUSER'
18 , 'LOCALNET'
19 )
20 )
21 ORDER BY
22 s
23 ,p
24*
RDFUSER@orcl>
RDFUSER@orcl> @test_query_rdfview.sql

S P O
---------------------------------------- -------------------------------------------------- ----------------------------------------
http://r2rml.com/employees/100 http://r2rml.com/ns#FirstName Steven
http://r2rml.com/employees/100 http://r2rml.com/ns#LastName King
http://r2rml.com/employees/100 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://r2rml.com/ns#Employees
http://r2rml.com/employees/101 http://r2rml.com/ns#FirstName Neena
http://r2rml.com/employees/101 http://r2rml.com/ns#LastName Kochhar
http://r2rml.com/employees/101 http://r2rml.com/ns#ManagerId 100
http://r2rml.com/employees/101 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://r2rml.com/ns#Employees
http://r2rml.com/employees/102 http://r2rml.com/ns#FirstName Lex
http://r2rml.com/employees/102 http://r2rml.com/ns#LastName De Haan
http://r2rml.com/employees/102 http://r2rml.com/ns#ManagerId 100

・・・中略・・・

http://r2rml.com/employees/205 http://r2rml.com/ns#FirstName Shelley
http://r2rml.com/employees/205 http://r2rml.com/ns#LastName Higgins
http://r2rml.com/employees/205 http://r2rml.com/ns#ManagerId 101
http://r2rml.com/employees/205 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://r2rml.com/ns#Employees
http://r2rml.com/employees/206 http://r2rml.com/ns#FirstName William
http://r2rml.com/employees/206 http://r2rml.com/ns#LastName Gietz
http://r2rml.com/employees/206 http://r2rml.com/ns#ManagerId 205
http://r2rml.com/employees/206 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://r2rml.com/ns#Employees

| | コメント (0)