Applicable: This tutorial applies to Confluence 7.0 or higher.
Level of experience: Advanced. You should complete at least one intermediate tutorial before working through this tutorial.
If you are updating your app to be compatible with Confluence 8.0 and newer, see Upgrading for Confluence 8.0.
The Extractor2 plugin module allows you to hook into the mechanism that Confluence uses to populate its search indexes. Each time content is created or updated in Confluence, it is passed through a chain of extractors that assemble the fields and data that will be added to the search indexes for that content. By writing your own extractor, you can add information to the content index.
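The idea of an extractor chain can be illustrated without any Confluence dependencies. The following is a minimal sketch only: SketchExtractor, ExtractorChainSketch, buildDocument, and demoChain are hypothetical names invented for this illustration, and the merging logic is an assumption about the general pattern, not Confluence's actual indexing code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT the Confluence Extractor2 interface.
interface SketchExtractor {
    StringBuilder extractText(Object searchable);
    Map<String, String> extractFields(Object searchable);
}

class ExtractorChainSketch {

    // Passes the content through every extractor and merges the results into one "document".
    static Map<String, String> buildDocument(Object searchable, List<SketchExtractor> chain) {
        StringBuilder text = new StringBuilder();
        Map<String, String> document = new LinkedHashMap<>();
        for (SketchExtractor extractor : chain) {
            StringBuilder extracted = extractor.extractText(searchable);
            if (extracted.length() > 0) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(extracted);
            }
            document.putAll(extractor.extractFields(searchable));
        }
        // All extracted text ends up in a single default field, named "content" here.
        document.put("content", text.toString());
        return document;
    }

    // Two toy extractors: one contributes text, the other contributes a field.
    static List<SketchExtractor> demoChain() {
        SketchExtractor textOnly = new SketchExtractor() {
            @Override
            public StringBuilder extractText(Object searchable) {
                return new StringBuilder(String.valueOf(searchable));
            }

            @Override
            public Map<String, String> extractFields(Object searchable) {
                return Collections.emptyMap();
            }
        };
        SketchExtractor fieldOnly = new SketchExtractor() {
            @Override
            public StringBuilder extractText(Object searchable) {
                return new StringBuilder();
            }

            @Override
            public Map<String, String> extractFields(Object searchable) {
                int length = String.valueOf(searchable).length();
                return Collections.singletonMap("length", String.valueOf(length));
            }
        };
        return Arrays.asList(textOnly, fieldOnly);
    }

    public static void main(String[] args) {
        System.out.println(buildDocument("Release notes", demoChain()));
    }
}
```

Each extractor in the sketch sees the same content object and contributes independently, which is why a custom extractor can add data without touching the built-in ones.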
To help you become familiar with the extractor2 module, we've created a simple plugin demonstrating how to create an extractor. In the following sections we'll explain how different parts of the plugin are implemented and fit together.
To complete this tutorial, you'll need to be familiar with basic command-line tools such as rm, ls, and curl.
You can find the source code for this tutorial on Atlassian Bitbucket.
To clone the repository, run the following command:
    git clone https://bitbucket.org/atlassian_tutorial/confluence-extractor2-tutorial.git
Alternatively, you can download the source as a ZIP archive.
This tutorial was last tested with Confluence 7.0 using Atlassian SDK 8.0.2.
The ExtraCommentDataExtractor implements two methods, extractText and extractFields. They receive a searchable object, which is the Confluence content object (e.g. Page, Attachment, Comment) that is being saved and passed through the extractor chain.
    public class ExtraCommentDataExtractor implements Extractor2 {
        private final CommentManager commentManager;

        @Autowired
        public ExtraCommentDataExtractor(@ComponentImport CommentManager commentManager) {
            this.commentManager = requireNonNull(commentManager, "commentManager");
        }

        public StringBuilder extractText(Object searchable) { ... }

        public Collection<FieldDescriptor> extractFields(Object searchable) { ... }
    }
The result of the extractText method is appended to the default text field "content", which is used in a regular Confluence site search. The result of the extractFields method is a collection of fields to be added to the search index document.
The @Autowired and @ComponentImport annotations on the constructor ask Confluence to inject CommentManager into the extractor when it is created.
ExtraCommentDataExtractor#extractText concatenates the comments of the given page.
    ...
    public StringBuilder extractText(Object searchable) {
        StringBuilder builder = new StringBuilder();
        if (searchable instanceof Page) {
            Page page = (Page) searchable;
            // Join the bodies of all comments on the page into one searchable string.
            builder.append(commentManager.getPageComments(page.getId(), page.getCreationDate()).stream()
                    .map(ContentEntityObject::getBodyAsString)
                    .collect(joining(" ")));
        }
        return builder;
    }
    ...
ExtraCommentDataExtractor#extractFields demonstrates how to create fields of different types.
First we'll define the mapping for each field:
    public class ExtraCommentFields implements FieldMappingsProvider {
        public static final TextFieldMapping CREATOR = TextFieldMapping.builder("comment-creator")
                .store(true)
                .analyzer(new TwoGramAnalyzerDescriptor())
                .build();
        public static final DateFieldMapping MODIFIED = DateFieldMapping.builder("comment-modified").store(true).build();
        public static final IntFieldMapping COUNT = IntFieldMapping.builder("comment-count").store(true).build();
        public static final DoubleFieldMapping SCORE = DoubleFieldMapping.builder("comment-score").store(true).build();

        @Override
        public Collection<FieldMapping> getFieldMappings() {
            return List.of(MODIFIED, COUNT, CREATOR, SCORE);
        }
    }
ExtraCommentDataExtractor#extractFields then extracts the fields for each page.
    ...
    public Collection<FieldDescriptor> extractFields(Object searchable) {
        Page page = getPage(searchable);
        if (page == null) {
            return emptyList();
        }

        List<Comment> comments = commentManager.getPageComments(page.getId(), page.getCreationDate());
        if (comments.isEmpty()) {
            return emptyList();
        }

        ImmutableList.Builder<FieldDescriptor> builder = ImmutableList.builder();

        // One "comment-creator" field per comment author.
        comments.stream()
                .map(ConfluenceEntityObject::getCreator)
                .filter(Objects::nonNull)
                .map(ConfluenceUser::getLowerName)
                .filter(Objects::nonNull)
                .forEach(username -> builder.add(ExtraCommentFields.CREATOR.createField(username)));

        // Modification date of the last comment, and the total number of comments.
        Comment lastComment = comments.get(comments.size() - 1);
        builder.add(ExtraCommentFields.MODIFIED.createField(lastComment.getLastModificationDate()));
        builder.add(ExtraCommentFields.COUNT.createField(comments.size()));

        // Score derived from the average comment body length.
        int commentTextLength = comments.stream()
                .mapToInt(x -> x.getBodyAsString().length())
                .sum();
        double commentScore = Math.log1p((double) commentTextLength / comments.size());
        builder.add(ExtraCommentFields.SCORE.createField(commentScore));

        return builder.build();
    }
    ...
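To make the comment-score arithmetic in extractFields concrete, here is a small standalone sketch that reproduces only that calculation on plain strings. CommentScoreSketch and its score method are hypothetical names for this illustration; the formula itself (log1p of the mean comment body length) is taken from the code above.

```java
import java.util.Arrays;
import java.util.List;

class CommentScoreSketch {

    // Same arithmetic as the tutorial: log1p of the mean comment body length.
    // The caller must pass a non-empty list; the tutorial code returns early when
    // there are no comments, which avoids a 0/0 division that would yield NaN here.
    static double score(List<String> commentBodies) {
        int totalLength = commentBodies.stream()
                .mapToInt(String::length)
                .sum();
        return Math.log1p((double) totalLength / commentBodies.size());
    }

    public static void main(String[] args) {
        // Hypothetical comment bodies with lengths 10, 17 and 4 (mean of 31/3).
        List<String> bodies = Arrays.asList("Looks good", "Please add a test", "Done");
        System.out.println(score(bodies));
    }
}
```

Because log1p grows slowly, a page with very long comments gets a higher score than one with short comments, but not proportionally higher. Whether that weighting suits your own ranking needs is a design choice for your app.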
Here is an example atlassian-plugin.xml file containing a single search extractor:
    ...
    <field-mappings-provider key="extraCommentFields"
                             index="CONTENT"
                             class="com.atlassian.confluence.plugins.extractor.tutorial.ExtraCommentFields" />
    <extractor2 name="extraCommentDataExtractor"
                key="extraCommentDataExtractor"
                class="com.atlassian.confluence.plugins.extractor.tutorial.ExtraCommentDataExtractor"
                priority="1100">
    </extractor2>
    ...
One way to see how an extractor works is to attach a debugger to a running Confluence instance and step through the extractText and extractFields methods of your Extractor2 implementation.
The old Extractor module will be removed in Confluence 8.0 and replaced by the Extractor2 module. This is part of an initiative to make the Confluence search API independent of the underlying information retrieval implementation, Lucene, which will enable future upgrades to the library without breaking changes to the API.
No functionality is lost when rewriting Extractor classes as Extractor2 classes.
To learn how, take an example inspired by how the internal CommentExtractor was rewritten.
    public class CommentExtractor implements Extractor {
        @Override
        public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable) {
            if (searchable instanceof Comment) {
                Comment comment = (Comment) searchable;
                ContentEntityObject owner = comment.getContainer();
                defaultSearchableText.append(comment.getTitle());

                // only add the URL if this comment belongs to a page as others currently have no UI
                if (owner instanceof AbstractPage) {
                    AbstractPage page = (AbstractPage) owner;
                    // use id based url to avoid dependency on page title (and the link breaking if the page title is renamed)
                    document.add(new Field(PageContentEntityObjectExtractor.FieldNames.PAGE_URL_PATH,
                            GeneralUtil.getIdBasedPageUrl(page), Field.Store.YES, Field.Index.NO));
                }

                if (owner != null) {
                    // Add the type of owner this is attached to.
                    document.add(new Field(PageContentEntityObjectExtractor.FieldNames.CONTAINER_CONTENT_TYPE,
                            owner.getType(), Field.Store.NO, Field.Index.NOT_ANALYZED));
                    document.add(new Field(PageContentEntityObjectExtractor.FieldNames.PAGE_DISPLAY_TITLE,
                            owner.getDisplayTitle(), Field.Store.YES, Field.Index.NO));
                }
            }
        }
    }
    public class CommentExtractor implements Extractor2 {
        @Override
        public StringBuilder extractText(Object searchable) {
            StringBuilder resultBuilder = new StringBuilder();
            if (searchable instanceof Comment) {
                Comment comment = (Comment) searchable;
                resultBuilder.append(comment.getTitle());
            }
            // Return the builder that was appended to, not a fresh empty one.
            return resultBuilder;
        }

        @Override
        public Collection<FieldDescriptor> extractFields(Object searchable) {
            final ImmutableList.Builder<FieldDescriptor> resultBuilder = ImmutableList.builder();
            if (searchable instanceof Comment) {
                Comment comment = (Comment) searchable;
                ContentEntityObject owner = comment.getContainer();

                // only add the URL if this comment belongs to a page as others currently have no UI
                if (owner instanceof AbstractPage) {
                    AbstractPage page = (AbstractPage) owner;
                    // use id based url to avoid dependency on page title (and the link breaking if the page title is renamed)
                    resultBuilder.add(SearchFieldMappings.PAGE_URL_PATH.createField(GeneralUtil.getIdBasedPageUrl(page)));
                }

                if (owner != null) {
                    resultBuilder.add(SearchFieldMappings.CONTAINER_CONTENT_TYPE.createField(owner.getType()));
                    resultBuilder.add(SearchFieldMappings.PAGE_DISPLAY_TITLE.createField(owner.getDisplayTitle()));
                }
            }
            return resultBuilder.build();
        }
    }
Rather than adding to the default searchable text via a StringBuffer, implement the extractText method and add the text to the StringBuilder it returns.
A FieldMapping corresponds to a mapping in OpenSearch. For each field type, there is an equivalent FieldMapping implementation. You can use the createField(value) method on the FieldMapping to create a FieldDescriptor.
A FieldDescriptor corresponds to a Field on an individual Document in Lucene or OpenSearch. Rather than creating the Document, describe the document with a Collection of FieldDescriptor.
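The relationship between a mapping and its descriptors can be modelled in a few lines of plain Java. This is a sketch under stated assumptions: SketchFieldMapping, SketchFieldDescriptor, and FieldMappingSketch are hypothetical stand-ins that only mimic the shape of the idea, and they are not the Confluence FieldMapping or FieldDescriptor classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: one named value destined for one document.
final class SketchFieldDescriptor {
    final String name;
    final Object value;

    SketchFieldDescriptor(String name, Object value) {
        this.name = name;
        this.value = value;
    }

    @Override
    public String toString() {
        return name + "=" + value;
    }
}

// Hypothetical stand-in: the schema of a field, declared once and shared by all documents.
final class SketchFieldMapping<T> {
    final String name;
    final boolean stored;

    SketchFieldMapping(String name, boolean stored) {
        this.name = name;
        this.stored = stored;
    }

    // A single mapping can produce many descriptors, one per value.
    SketchFieldDescriptor createField(T value) {
        return new SketchFieldDescriptor(name, value);
    }
}

class FieldMappingSketch {
    static final SketchFieldMapping<Integer> COUNT = new SketchFieldMapping<>("comment-count", true);
    static final SketchFieldMapping<String> CREATOR = new SketchFieldMapping<>("comment-creator", true);

    // Describes a document as a list of descriptors instead of building the document itself.
    static List<SketchFieldDescriptor> describe(int count, List<String> creators) {
        List<SketchFieldDescriptor> fields = new ArrayList<>();
        fields.add(COUNT.createField(count));
        for (String creator : creators) {
            fields.add(CREATOR.createField(creator));
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> creators = new ArrayList<>();
        creators.add("alice");
        creators.add("bob");
        System.out.println(describe(2, creators));
    }
}
```

Keeping the schema (mapping) separate from the values (descriptors) is what lets the extractor stay unaware of how the underlying index stores a document.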
XML configuration files used to define indexed fields for specified content types are being replaced by the Extractor2 module in Confluence 8.0. The new Extractor2 module provides a greater range of functionality.
To learn how, take the example of rewriting a Page.lucene.xml configuration file.
    <configuration>
        <field type="UnIndexed" fieldName="versionComment" attributeName="versionComment"/>
        <field type="Text" fieldName="content-name-unstemmed" attributeName="title"/>
        <field type="Keyword" fieldName="exact-title" attributeName="title"/>
    </configuration>
    public class PageExtractor implements Extractor2 {
        @Override
        public StringBuilder extractText(Object searchable) {
            return new StringBuilder();
        }

        @Override
        public Collection<FieldDescriptor> extractFields(Object searchable) {
            final ImmutableList.Builder<FieldDescriptor> resultBuilder = ImmutableList.builder();
            if (searchable instanceof Page) {
                Page page = (Page) searchable;
                if (page.isVersionCommentAvailable()) {
                    resultBuilder.add(SearchFieldMappings.LAST_UPDATE_DESCRIPTION.createField(page.getVersionComment()));
                }
                String title = page.getTitle();
                if (!isBlank(title)) {
                    resultBuilder.add(SearchFieldMappings.UNSTEMMED_TITLE_FIELD_NAME.createField(page.getTitle()));
                    resultBuilder.add(SearchFieldMappings.EXACT_TITLE.createField(page.getTitle()));
                }
            }
            return resultBuilder.build();
        }
    }
Attribute names can be rewritten as getter methods on the content. For example, title becomes getTitle().
Extractor2 allows field values to be defined with more complex logic. Rather than a single getter call, multiple services can be called for data and multiple transformations can be applied. In the above example, blank and null checks are performed before creating the fields.
FieldMapping provides the following additional functionality:
- TextFieldMapping can specify custom analysis, rather than relying on the Confluence default.
- TextFieldMapping.isStored() allows choosing whether or not to store an indexed field.