In Oracle Text, a letter with a diacritic is by default treated as a different character from its base letter. For example, searching for the word Daniel will not find Daniél. The full-text index can be configured to map letters with diacritics to their base letters by enabling a setting on the lexer.
By default, Blueriq uses the CTXSYS.DEFAULT_LEXER, which comes preconfigured with various settings depending on the language used when the database was installed. For example, if the Oracle database was installed with the Dutch language, the default lexer has composite indexing and alternate spelling enabled for the Dutch language. A custom lexer does not inherit these settings, so explicitly set every default-lexer setting that you want to keep.
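Before defining a custom lexer, it can help to see which settings the default lexer carries on a given installation. The query below is a minimal sketch, assuming the connected user can read the Oracle Text dictionary view CTX_PREFERENCE_VALUES and that explicitly set attributes are listed there. Attributes left at their built-in defaults may not appear.

```sql
-- List the attributes that are explicitly set on the default lexer preference.
select prv_attribute, prv_value
from   ctx_preference_values
where  prv_owner = 'CTXSYS'
and    prv_preference = 'DEFAULT_LEXER'
order  by prv_attribute;
```

Each returned row is a candidate for a matching ctx_ddl.set_attribute call on the custom lexer.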
Creating the lexer
A custom lexer that maps letters with diacritics to their base letters can be created as follows:
begin
  ctx_ddl.create_preference('example_lexer', 'BASIC_LEXER');
  -- yes = transform diacritics, no = do not transform diacritics
  ctx_ddl.set_attribute('example_lexer', 'base_letter', 'yes');
end;
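As a sketch of how a default-lexer setting could be carried over, the block below recreates the lexer with base-letter conversion plus Dutch composite indexing. The composite value is an assumption for a Dutch installation and should be replaced by whatever settings the default lexer has on the database in question. The drop step is only there so the script can be rerun.

```sql
begin
  -- Drop an earlier definition, if any, so the script can be rerun.
  begin
    ctx_ddl.drop_preference('example_lexer');
  exception
    when others then
      null; -- assumed cause: the preference did not exist yet
  end;

  ctx_ddl.create_preference('example_lexer', 'BASIC_LEXER');

  -- Map letters with diacritics to their base letters.
  ctx_ddl.set_attribute('example_lexer', 'base_letter', 'yes');

  -- Assumption: the default lexer used Dutch composite indexing,
  -- so the same setting is repeated here to keep that behavior.
  ctx_ddl.set_attribute('example_lexer', 'composite', 'dutch');
end;
/
```

Recreating the preference does not change an existing index. The index only picks up the new definition through one of the steps described in the following sections.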
Setting the lexer at index creation time
The custom lexer is specified as an index parameter:
drop index aq_fulltext_index;

create index aq_fulltext_index on aq_fulltext(text)
  indextype is ctxsys.context
  parameters ('datastore aq_fulltext_uds lexer example_lexer sync(every "sysdate+1/24")');
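After the index has been created, a quick check can confirm that indexing succeeded and that the lexer behaves as intended. This is a minimal sketch, assuming the index is owned by the connected user and that the indexed data contains a row with the word Daniél.

```sql
-- Show any errors that were logged while the index was being built or synchronized.
select err_index_name, err_timestamp, err_text
from   ctx_user_index_errors
where  err_index_name = 'AQ_FULLTEXT_INDEX'
order  by err_timestamp desc;

-- With base_letter enabled, a search without the diacritic is expected
-- to match rows that contain the word written with the diacritic.
select count(*)
from   aq_fulltext
where  contains(text, 'Daniel') > 0;
```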
Changing the lexer without recreating the index
The lexer can also be changed without dropping the index first:
alter index aq_fulltext_index parameters ('replace metadata lexer example_lexer');
alter index aq_fulltext_index rebuild;
For this option, keep the following points in mind:
- The 'alter index rebuild parameters' statement may not be used to change the lexer. Changing the lexer and rebuilding the index must be done in two separate statements.
- Changing the value of an attribute on a lexer preference has no effect on an existing full-text index. The 'alter index parameters' statement must be used to apply the change.
- After the lexer is changed, the effects do not become visible until the index is rebuilt.
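To confirm that the index has picked up the new lexer settings, the lexer attributes stored with the index can be inspected. This is a sketch, assuming the index is owned by the connected user and that the CTX_USER_INDEX_VALUES view reflects the metadata change. The exact way attribute values are displayed may differ between Oracle versions.

```sql
-- Show the lexer attributes currently stored with the index.
select ixv_object, ixv_attribute, ixv_value
from   ctx_user_index_values
where  ixv_index_name = 'AQ_FULLTEXT_INDEX'
and    ixv_class = 'LEXER'
order  by ixv_attribute;
```

If the base_letter attribute is listed but searches still distinguish between Daniel and Daniél, the index most likely has not been rebuilt yet.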