
Search

Technical Overview

The search is a central helper component in XELOS and provides the search and indexing logic.

Starting with XELOS 8, the search index is being migrated to a new logic. In prior versions, each module implemented an add_to_index() function in its controller and primarily worked with the old document ID triplet (instance_id, post_type and post_id). The new system works solely with the centralized document_index_id. The indexing logic has therefore been moved to the document model, which has to implement the SearchableDocumentModelInterface. In the new logic, the indexing flow is as follows:

  1. Frontend: The document is saved using the model's save() method
  2. Frontend: A Document Index Event is generated
  3. Backend / Daemon: The search receives the Index Event and calls the model's onSearchIndexUpdate() function (if implemented), which returns a SearchIndexRecord object
  4. Backend / Daemon: All Search Pre-Processor hooks are called to allow other modules to enhance the search index entry (e.g. Translation)
  5. Backend / Daemon: The SearchIndexRecord is finally pre-processed by the search module to add additional data (e.g. tags, attachment full text, ...)
  6. Backend / Daemon: The configured search backend adapter stores the content of the SearchIndexRecord object in its index
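The following minimal, self-contained PHP sketch illustrates only the order of calls in steps 3 to 6. IndexRecord, Preprocessor, TagPreprocessor and handleIndexEvent() are hypothetical stand-ins, not the XELOS classes. Steps 4 and 5 are folded into one list of preprocessors, and a plain array stands in for the search backend adapter.

```php
<?php
// Hypothetical sketch of indexing steps 3-6 (requires PHP 8.0+).
// None of these names are part of the XELOS API.

final class IndexRecord
{
    /** @var array<string, mixed> indexed fields */
    public array $fields = [];

    public function __construct(public int $documentIndexId) {}
}

interface Preprocessor
{
    public function preProcess(IndexRecord $record): IndexRecord;
}

final class TagPreprocessor implements Preprocessor
{
    public function preProcess(IndexRecord $record): IndexRecord
    {
        // Steps 4/5: enrich the record with additional data (here: fixed tags)
        $record->fields['tags'] = ['example'];
        return $record;
    }
}

/**
 * Step 3: ask the model for its record; steps 4/5: run all preprocessors;
 * step 6: hand the result to the "backend" (an array keyed by document_index_id).
 *
 * @param callable(IndexRecord): IndexRecord $onSearchIndexUpdate
 * @param Preprocessor[] $preprocessors
 * @param array<int, array<string, mixed>> $backend
 */
function handleIndexEvent(int $documentIndexId, callable $onSearchIndexUpdate, array $preprocessors, array &$backend): void
{
    $record = $onSearchIndexUpdate(new IndexRecord($documentIndexId));
    foreach ($preprocessors as $preprocessor) {
        $record = $preprocessor->preProcess($record);
    }
    $backend[$record->documentIndexId] = $record->fields;
}
```

Calling handleIndexEvent() with a callback that sets a title leaves the backend entry holding both the model's title and the tags added by the preprocessor.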

SSL encryption

The Elastic documentation describes how to configure the Docker container for SSL-encrypted traffic.

The host URL has to be changed from http:// to https://.

Additionally, if the CN in the server certificate does not match the host name, the following constant has to be set in config.custom.php:

define('XF_ELASTIC_SSL_VERIFY_SERVER', false);

SearchIndexRecord

The SearchIndexRecord provides the basic model to transfer indexable information to the indexing backend. It implements the basic index field structure and gives developers a convenient interface.

Search Adapter

The SEARCH module uses Elasticsearch as the default backend.

The formerly optional SQL adapter was deprecated in XELOS 8 and has been removed in favor of the faster Elasticsearch adapter starting with XELOS 9.

Languages

The search supports indexing of foreign-language content. While the SQL adapter only supports a basic implementation, the Elasticsearch adapter provides a full-featured one.

Each document (Document Index) may have multiple language versions of the same content. The search distinguishes between content in another language that was created by a user (e.g. a news post published in one or more specific languages) and translated content that was generated automatically (e.g. by the translation module). This distinction avoids displaying every document in every language on systems where auto-translation is enabled.

Facet Filtering

The filtering of i18n documents on the search page is implemented similarly to Google's search logic:

  • If the user searches for a document, all language variants are searched, because it cannot be known in which language the search term is written. All matching documents are displayed. If multiple variants of one document exist, only one variant is displayed; if a variant in the user's language is found, that variant is displayed.

  • If the user applies the language filter, only documents whose variant in the selected language matches the search term are displayed.

  • Documents that only have a language-unspecific version (locale = *) are included in the unfiltered search (default result) but are excluded when the result is filtered by a specific language.
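The three rules above can be expressed as a small post-processing function. This is a sketch under assumptions: each hit is a hypothetical array with document_index_id and locale keys that already matches the search term, and filterLanguageVariants() is an illustrative name. The real implementation may perform this filtering inside the search backend rather than in PHP.

```php
<?php
// Hypothetical sketch of the language facet rules (not the XELOS implementation).
// Each hit: ['document_index_id' => int, 'locale' => string], already matching the search term.

/**
 * @param array<int, array{document_index_id: int, locale: string}> $hits
 * @return array<int, array{document_index_id: int, locale: string}>
 */
function filterLanguageVariants(array $hits, string $userLocale, ?string $languageFilter = null): array
{
    $result = [];
    foreach ($hits as $hit) {
        $id = $hit['document_index_id'];

        if ($languageFilter !== null) {
            // Filtered search: only variants in the selected language count;
            // language-unspecific variants (locale '*') never match a filter
            if ($hit['locale'] === $languageFilter) {
                $result[$id] ??= $hit;
            }
            continue;
        }

        // Unfiltered search: one variant per document, preferring the user's language
        $current = $result[$id] ?? null;
        if ($current === null || ($hit['locale'] === $userLocale && $current['locale'] !== $userLocale)) {
            $result[$id] = $hit;
        }
    }
    return array_values($result);
}
```

With a German-speaking user, a document matching in both EN_GB and DE_DE is shown once in its DE_DE variant, while a locale '*' document appears only in the unfiltered result.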

For Developers

Helper: Search

The search library included in XELOS delivers search functionality across all modules via an SQL or Elasticsearch interface.

Adding documents to the search index

If you already have a module with a document model, the easiest way to add your document data to the central search is to implement the SearchableDocumentModelInterface in your DocumentModel:

MyDocumentModel.php (Example)

<?php
use XELOS\Framework\Module\Model\DocumentModel;
use XELOS\Framework\Module\Model\Interfaces\SearchableDocumentInterface;
use XELOS\Modules\Search\SearchIndexRecord;

class MyDocumentModel extends DocumentModel implements SearchableDocumentInterface {
    # [...]

    public function onSearchIndexUpdate(SearchIndexRecord $searchIndexRecord): SearchIndexRecord {

        // Set Base-Index
        $searchIndexRecord
            ->setIndexContent($this->myContent)
            ->setTitle($this->title)
            ->setSummary($this->description);

        // If additional language versions are available -> Add them as translated content
        $searchIndexRecord->addTranslatedContent('EN_GB')->addTitle($this->title_en);

        // Return index record
        return $searchIndexRecord;
    }
}

Make sure that your document model is registered for index_events by implementing the SearchableDocumentModelInterface.

Updating the index

Newly defined meta data only reaches the search index once the affected instances have been re-indexed. For example, the Lookbook enables filtering by its browsable properties (used on the browse page); to make all existing profiles filterable, the Lookbook instance has to be re-indexed.

Hook: Search Preprocessors

The Search Preprocessor hooks allow other modules to add indexing data to each document or to transform existing indexing data. These hooks are called after the original document has provided its indexing information and before this information is written to the search index. For this, the hook must be registered as a search.search_preprocessor hook and implement XELOS\Modules\Search\Hook\Base\SearchPreprocessor. The base function preProcessDocument() is called each time a document is indexed and allows adding and modifying the index data (e.g. adding data or searchable content based on custom rules). It should be used to enrich the search index with additional information and can be combined with the SearchFacets hook to make this information searchable for the user.
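A hypothetical sketch of a preprocessor that adds a category name as meta data based on a custom rule. RecordStub and SearchPreprocessorStub only imitate the roles of SearchIndexRecord and XELOS\Modules\Search\Hook\Base\SearchPreprocessor; the real constructors, method signatures and hook registration may differ.

```php
<?php
// Hypothetical stand-ins (requires PHP 8.0+); not the XELOS classes.

final class RecordStub
{
    /** @var array<string, mixed> */
    private array $metaData = [];

    public function __construct(public string $title, public int $categoryId) {}

    public function addMetaData(string $key, $value): self
    {
        $this->metaData[$key] = $value;
        return $this;
    }

    /** @return array<string, mixed> */
    public function getMetaData(): array
    {
        return $this->metaData;
    }
}

abstract class SearchPreprocessorStub
{
    // Called once per document, after the model has filled the record
    abstract public function preProcessDocument(RecordStub $record): RecordStub;
}

final class CategoryNamePreprocessor extends SearchPreprocessorStub
{
    /** @param array<int, string> $categoryNames category ID => display name */
    public function __construct(private array $categoryNames) {}

    public function preProcessDocument(RecordStub $record): RecordStub
    {
        // Custom rule: store the category name so it can be used for facets
        if (isset($this->categoryNames[$record->categoryId])) {
            $record->addMetaData('category', $this->categoryNames[$record->categoryId]);
        }
        return $record;
    }
}
```

Documents with an unknown category pass through unchanged, so the preprocessor never blocks indexing.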

The hook can also provide an onBackgroundUpdate() function, which the search calls on a regular basis to allow background updates of the index. This can be used to mass-update your custom fields: the search index is static and is only updated when the original document changes. If your additional index data changes independently (e.g. when category names change or after mass updates), these index fields may need to be kept in sync. For this, you can call the helper function $search->action->indexer->updateDocumentsByCallback(), which updates documents using a callback. Use this technique for fast data updates to the search index, as it is much faster and more efficient than a full maintenance run on each document. A maintenance run also triggers a re-index, but may be slow on large systems and therefore cannot be executed daily.
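To illustrate the idea behind updateDocumentsByCallback(), the sketch below patches selected fields of already stored index entries instead of re-indexing whole documents. The function updateIndexEntriesByCallback(), its parameters and the array-based index are assumptions for illustration; the real helper's signature may differ.

```php
<?php
// Hypothetical sketch of a callback-based bulk update of stored index entries.
// The array-based "index" stands in for the search backend.

/**
 * Applies $callback to every entry selected by $matches and returns how many
 * entries actually changed. The original documents are not re-indexed.
 *
 * @param array<int, array<string, mixed>> $index document_index_id => fields
 * @param callable(array<string, mixed>): bool $matches
 * @param callable(array<string, mixed>): array<string, mixed> $callback
 */
function updateIndexEntriesByCallback(array &$index, callable $matches, callable $callback): int
{
    $updated = 0;
    foreach ($index as $id => $fields) {
        if (!$matches($fields)) {
            continue;
        }
        $newFields = $callback($fields);
        if ($newFields !== $fields) {
            $index[$id] = $newFields;
            $updated++;
        }
    }
    return $updated;
}
```

Renaming a category, for instance, would only touch the entries whose category_id matches, leaving all other entries untouched.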

Adding additional faceted filters to the search results

The search index has been extended so that modules can add meta data to the index via the SearchIndexRecord:

/** @var SearchIndexRecord $baseIndexRecord */
$baseIndexRecord->addMetaData('extension', $this->get_file_extension());
# Alternatively, set multiple meta data entries at once
$meta = [
  'filesize' => '9894',
  'extension' => 'pptx',
  // ...
];
$baseIndexRecord->setMetaData($meta);

Alternatively, other modules can add meta data via the SearchPreprocessor hook:

$baseIndexRecord->addMetaData('extension', $this->get_file_extension(), $this->mod->context);

The given instance ID ($this->mod->context) determines which hook later handles the display and filtering of the faceted values. If no context is given, the context ID of the search index record's document index is used.

The added meta data requires a \XELOS\Modules\Search\Hook\Base\SearchMetaData hook. The hook returns the labels for the keys and the values of the meta data array.
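As a rough illustration of what such a hook supplies, the hypothetical class below returns labels for meta data keys and maps indexed values to display labels, falling back to the raw value. FileMetaDataLabels, getKeyLabels() and getValueLabel() are invented names; the actual interface is defined by \XELOS\Modules\Search\Hook\Base\SearchMetaData.

```php
<?php
// Hypothetical sketch of key and value labels for faceted meta data.
// Class and method names are illustrative, not the XELOS hook interface.

final class FileMetaDataLabels
{
    /** @return array<string, string> meta data key => label */
    public function getKeyLabels(): array
    {
        return ['extension' => 'File type', 'filesize' => 'File size'];
    }

    /** Label for a single indexed value; unknown values are shown as-is */
    public function getValueLabel(string $key, string $value): string
    {
        $labels = [
            'extension' => ['pptx' => 'PowerPoint', 'pdf' => 'PDF document'],
        ];
        return $labels[$key][$value] ?? $value;
    }
}
```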

It is also possible to define custom aggregations that return values which are not stored in the index. For example, the System DMS stores the file size of each DMS file in the index but returns a faceted filter for files smaller than 1 MB, between 1 and 5 MB, and larger than 5 MB. To provide such custom values, a custom search query is implemented in getSearchQuery().

See \XELOS\Modules\SystemDMS\Hook\Search\SearchMetaData for details.
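The file size buckets described above map naturally onto an Elasticsearch range aggregation. The sketch below builds such an aggregation as a PHP array. The field name (e.g. meta.filesize), the bucket keys and the choice of 1 MB = 1,048,576 bytes are assumptions, and how the System DMS hook embeds this in getSearchQuery() is not shown.

```php
<?php
// Sketch of an Elasticsearch "range" aggregation for file size facets.
// Field name and bucket keys are assumptions for illustration.

const MB = 1024 * 1024;

/** @return array<string, mixed> fragment for the "aggs" section of a search request */
function buildFilesizeRangeAggregation(string $field): array
{
    return [
        'filesize_ranges' => [
            'range' => [
                'field'  => $field,
                // In Elasticsearch range aggregations, "from" is inclusive and "to" is exclusive
                'ranges' => [
                    ['key' => 'small',  'to' => 1 * MB],
                    ['key' => 'medium', 'from' => 1 * MB, 'to' => 5 * MB],
                    ['key' => 'large',  'from' => 5 * MB],
                ],
            ],
        ],
    ];
}
```

Encoding json_encode(['aggs' => buildFilesizeRangeAggregation('meta.filesize')]) yields the aggs part of a request body; the returned bucket counts can then be labelled by the SearchMetaData hook.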

TODOs

  • Final migration away from models
  • New module class structure (Refactoring)
  • Call Document Index Update instead of internal search index update function
  • Migrate all modules to use new searchable document integration
  • For Re-Indexing: Keep HASH or TIMESTAMP of indexed attachments to check if re-indexing of attachments is necessary (avoid high cpu/io load for text extraction in case of no file changes)
  • Remove JSON + RSS references to hit->path

BUGS

-