List of Questions:
How is the indexing performed?
How does Collexis generate its search results?
What makes Collexis different?
How does Collexis deal with low concept density documents or queries?
Where can Collexis be applied?
What can Collexis do when there is no thesaurus for a specific type of business?
What are the limitations of Collexis?
How accurate is Collexis?
What databases does Collexis support?
How does Collexis deal with different languages?
Is integration difficult?
How long has Collexis been in business?
Q: How is the indexing performed?
A: Indexing is the process of creating a Conceptual Fingerprint from a text. In Collexis, this automated indexing mechanism performs the following steps on the text: removing stop words; normalizing the text; selecting concepts by comparison with the thesaurus; clustering the concepts; and assigning each concept a relative weight using a set of algorithms that measure its specificity, similarity and frequency.
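The indexing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Collexis' actual algorithm: the stop-word list and thesaurus are tiny hypothetical dictionaries, concepts are selected by a greedy longest-phrase lookup, clustering is omitted, and the weight combines only frequency with an idf-style specificity score (the similarity component is left out). The names `THESAURUS`, `select_concepts` and `fingerprint` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and thesaurus. A real thesaurus maps many
# synonymous (possibly multi-word) terms to a single concept identifier.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "with"}
THESAURUS = {
    "heart attack": "C_MI",
    "myocardial infarction": "C_MI",
    "aspirin": "C_ASPIRIN",
    "acetylsalicylic acid": "C_ASPIRIN",
    "blood pressure": "C_BP",
}

def normalize(text):
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def select_concepts(tokens, thesaurus, max_len=3):
    """Greedy longest-match lookup of token n-grams in the thesaurus."""
    concepts = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in thesaurus:
                concepts.append(thesaurus[phrase])
                i += n
                break
        else:  # no thesaurus phrase starts at this token
            i += 1
    return concepts

def fingerprint(text, doc_freq, n_docs):
    """Weight each concept by its frequency times an idf-style specificity."""
    counts = Counter(select_concepts(normalize(text), THESAURUS))
    return {
        concept: tf * math.log((1 + n_docs) / (1 + doc_freq.get(concept, 0)))
        for concept, tf in counts.items()
    }

# "heart attack" and "myocardial infarction" both map to C_MI,
# so that concept is counted twice.
fp = fingerprint(
    "Aspirin after a heart attack lowers the risk of myocardial infarction.",
    doc_freq={"C_MI": 10, "C_ASPIRIN": 2},  # hypothetical corpus statistics
    n_docs=100,
)
```

Because synonyms collapse onto one concept identifier, the fingerprint counts what the text is about rather than which words it happens to use.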
Q: How does Collexis generate its search results?
A: Collexis employs vector matching: it compares a search query with the Fingerprints of the records in a Collexion. The outcome is an accurate and relevant list of records pointing to content items and/or experts. A query can also be deliberately extended with a sizable piece of text, which adds context to it. This context helps the system improve the accuracy of the query and return references to content items that are contextually related. The system administrator can enlarge or reduce the set of returned documents by setting a threshold on the “distance” between the query and the records: only records within that distance are returned. A search query can be matched against multiple Collexions at the same time.
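A minimal sketch of vector matching with a distance threshold over several collections might look like this. It assumes fingerprints are sparse `concept -> weight` dictionaries, takes distance as 1 minus cosine similarity, and models each Collexion as a plain dictionary of records; cosine similarity is a common choice for vector matching, not necessarily the measure Collexis uses. `match` and `max_distance` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse fingerprints (concept -> weight)."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match(query_fp, collexions, max_distance=0.5):
    """Match a query fingerprint against several collections at once.

    Distance is taken as 1 - cosine similarity; only records whose distance
    is at most max_distance are returned, closest first.
    """
    hits = []
    for collexion, records in collexions.items():
        for record_id, fp in records.items():
            distance = 1.0 - cosine(query_fp, fp)
            if distance <= max_distance:
                hits.append((distance, collexion, record_id))
    return sorted(hits)

# Hypothetical collections of document and expert fingerprints.
collexions = {
    "articles": {"a1": {"C_MI": 2.0, "C_ASPIRIN": 1.0}, "a2": {"C_BP": 1.0}},
    "experts": {"e1": {"C_ASPIRIN": 3.0}},
}
query = {"C_ASPIRIN": 1.0, "C_MI": 1.0}
results = match(query, collexions)
```

Lowering `max_distance` plays the role of the administrator's threshold: it shrinks the result set to only the closest records, while raising it admits more distant, partially matching ones.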
Q: What makes Collexis different?
A: First, unlike full-text search engines, Collexis uses thesauri for information retrieval. Its high-quality search is based on semantics defined in a thesaurus or ontology: synonymous terms and terms in different languages are linked to a single concept. Hierarchical relations between concepts, links between definitions and terms, and other semantic relationships are also used by the search applications. This helps highlight the terms most relevant to the searcher’s query.
Additionally, Collexis’ matching technology is unique. It computes “distances” between the query and the content items being searched, which makes it possible to find partially matching documents. Users do not have to construct a complicated (Boolean) search query; they can simply enter free text without the risk of getting “no results” because they used many search terms. In fact, with matching technology, using more search terms generally yields faster and more accurate results.
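The difference between a Boolean query and distance-based partial matching can be shown with a toy example. Here the fraction of query terms a document contains stands in for a distance measure; this simplified score and the helper names `boolean_and` and `partial_match` are illustrative assumptions, not Collexis' matching algorithm.

```python
def boolean_and(query_terms, docs):
    """Return only the documents that contain every query term."""
    q = set(query_terms)
    return [doc_id for doc_id, terms in docs.items() if q <= terms]

def partial_match(query_terms, docs):
    """Rank documents by the fraction of query terms they contain."""
    q = set(query_terms)
    scored = [(len(q & terms) / len(q), doc_id) for doc_id, terms in docs.items()]
    return sorted((s for s in scored if s[0] > 0), reverse=True)

docs = {
    "d1": {"aspirin", "stroke"},
    "d2": {"aspirin", "heart", "attack"},
    "d3": {"insulin"},
}
query = ["aspirin", "heart", "attack", "dosage"]
print(boolean_and(query, docs))    # [] because no document contains "dosage"
print(partial_match(query, docs))  # [(0.75, 'd2'), (0.25, 'd1')]
```

Adding the extra term "dosage" empties the Boolean result, whereas the ranked approach still returns the closest documents.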
Yet another differentiator is that the Fingerprints generated by the software are easy to manipulate computationally: they can be aggregated, associated, clustered, and so on. These manipulations allow Collexis to provide information that goes beyond the level of a single document. Searchers can see information distributed over different documents, as well as discernible patterns in a group of documents, e.g., a group of documents written by one author or belonging to a particular semantic category.
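As one illustration of aggregation, an author profile could be built by averaging the fingerprints of that author's documents, giving a fingerprint that can be matched like any other. Averaging is an assumed aggregation rule chosen for simplicity; `aggregate` and `group_by_author` are hypothetical helpers.

```python
from collections import defaultdict

def aggregate(fingerprints):
    """Average concept weights over several fingerprints.

    A concept missing from a fingerprint counts as weight 0 in the average.
    """
    total = defaultdict(float)
    for fp in fingerprints:
        for concept, weight in fp.items():
            total[concept] += weight
    n = len(fingerprints)
    return {c: w / n for c, w in total.items()} if n else {}

def group_by_author(docs):
    """Build one aggregated fingerprint per author from (author, fingerprint) pairs."""
    by_author = defaultdict(list)
    for author, fp in docs:
        by_author[author].append(fp)
    return {author: aggregate(fps) for author, fps in by_author.items()}

# Hypothetical document fingerprints tagged with their authors.
docs = [
    ("smith", {"C_MI": 2.0}),
    ("smith", {"C_MI": 1.0, "C_BP": 1.0}),
    ("jones", {"C_ASPIRIN": 1.0}),
]
profiles = group_by_author(docs)
```

Because a profile has the same shape as a document fingerprint, the same matching routine can rank experts as easily as documents.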
Q: How does Collexis deal with low concept density documents or queries?
A: One standard option is to index a document without a thesaurus. This process includes most of the indexing steps (stop-word removal, normalization, etc.) but generates a fingerprint with word-based entries instead of concept entries. Because Collexis can work with multiple thesauri simultaneously, such a “free text” fingerprint can be used alongside a thesaurus-based fingerprint to account for terms that are not present in the thesaurus. These word-based entries can consist of any number of consecutive words (bigrams, trigrams, etc.). Naturally, a free text fingerprint lacks the advantages of a thesaurus-based fingerprint, such as multilingualism and synonymy.
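A free text fingerprint of word n-grams, kept side by side with concept entries, could be sketched as follows. The stop-word list, the `word:`/`concept:` key prefixes and the helper names are assumptions for illustration. In this sketch, n-grams are formed after stop-word removal, so a bigram can span a removed word; a real system may handle this differently.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}  # hypothetical list

def word_fingerprint(text, max_n=2):
    """Count word n-grams (1..max_n) after normalization and stop-word removal."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]
    grams = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return dict(grams)

def combine(concept_fp, word_fp):
    """Keep both kinds of entries side by side; prefixes avoid key collisions."""
    combined = {"concept:" + c: w for c, w in concept_fp.items()}
    combined.update({"word:" + g: n for g, n in word_fp.items()})
    return combined

fp = word_fingerprint("Graphene coating of graphene sheets")
combined = combine({"C_COATING": 1.0}, fp)
```

Here "graphene" is not in the (hypothetical) thesaurus, yet it still contributes to matching through its word-based entries.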
Q: Where can Collexis be applied?
A: Collexis can be applied wherever it is important to retrieve information in a swift, easy, and high-quality manner, whether within or outside an organization. Typical application fields include knowledge discovery (e.g., drug discovery), policy making (trend analysis, uncovering hidden relationships, etc.), and competitor analysis (gap mining, comparison, searching patent databases).
Q: What can Collexis do when there is no thesaurus for a specific type of business?
A: We can build one with relative ease. Collexis offers tools that analyze documents and generate candidate terms for inclusion in a thesaurus. Where existing lists of terms are available, they can serve as a starting point, so the thesaurus can be expanded quickly.
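The kind of candidate-term generation described above can be approximated with simple frequency counting: suggest word n-grams that occur often in a document set but are not yet in the term list. This is a stand-in technique under assumptions (hypothetical stop-word list, a `min_count` cutoff, names `candidate_terms` and `known_terms`), not a description of the Collexis tools.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

def candidate_terms(documents, known_terms, min_count=2, max_n=2):
    """Suggest frequent word n-grams that are not yet in the thesaurus."""
    counts = Counter()
    for doc in documents:
        tokens = [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOP_WORDS]
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [
        (term, count) for term, count in counts.most_common()
        if count >= min_count and term not in known_terms
    ]

documents = ["Lithium battery recycling", "Recycling of lithium battery packs"]
suggestions = candidate_terms(documents, known_terms={"lithium"})
```

A thesaurus editor would still review the suggestions; frequency alone cannot tell a meaningful phrase from a coincidental word pair.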
Q: What are the limitations of Collexis?
A: Aside from practical considerations, there is no real limit to the amount of content that can be fingerprinted by and stored in Collexis.
Q: How accurate is Collexis?
A: Because the Collexis fingerprint creation algorithm uses a thesaurus, it yields highly accurate fingerprints. Consequently, a search returns an accurate and relevant list of content and/or experts. Unlike with Boolean search techniques, a query cannot be over-specified. Quite the contrary: using a sizable piece of text as the query only improves the search result.
Q: What databases does Collexis support?
A: Any database whose contents can be converted to text entries, which covers almost everything. Collexis can deal with situations where an organization has a multitude of databases within many different database systems, containing both structured and unstructured information. The system processes content from any type of database as well as information that is not stored in databases, such as web pages or e-mails.
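As an example of converting database records to text entries, the sketch below uses Python's built-in `sqlite3` module to flatten each row into `column: value` lines that could then be indexed like any other document. The table, the text layout and the helper name `rows_as_text` are assumptions for illustration.

```python
import sqlite3

def rows_as_text(conn, table):
    """Yield (rowid, text) pairs, flattening each row into 'column: value' lines.

    The table name is interpolated directly, so it must come from a trusted source.
    NULL values are skipped.
    """
    cur = conn.execute(f"SELECT rowid, * FROM {table}")
    columns = [d[0] for d in cur.description][1:]  # skip the rowid column
    for row in cur:
        rowid, values = row[0], row[1:]
        text = "\n".join(
            f"{col}: {val}" for col, val in zip(columns, values) if val is not None
        )
        yield rowid, text

# Hypothetical in-memory table standing in for an organization's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patents (title TEXT, abstract TEXT, notes TEXT)")
conn.execute(
    "INSERT INTO patents VALUES (?, ?, ?)",
    ("Battery anode", "Silicon anode for lithium cells", None),
)
entries = list(rows_as_text(conn, "patents"))
```

Keeping the column names in the text preserves some of the record's structure while giving the indexer plain text to work with.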
Q: How does Collexis deal with different languages?
A: The concepts used in the conceptual fingerprinting algorithm are real-world entities rather than language-defined terms or phrases. Using a multilingual thesaurus, Collexis can match a query in one language with a fingerprint collection referring to information items in other languages.
Q: Is integration difficult?
A: Collexis can be easily integrated into any desired application. It supports open technology and comes with Java and .NET development kits.
Q: How long has Collexis been in business?
A: After numerous information-sharing achievements by its founders, the Collexis company was established in Europe in 1999, followed by its U.S. counterpart in 2006.
Other questions? Mail us at info@collexis.com!