Onix Full Text Indexing and Retrieval Toolkit
Fast, Scalable, and Easy to Use
 

Overview

Onix is a full text indexing and retrieval library that combines high performance with flexibility. When you integrate the Onix library into your application project, you give your application superior performance and reliability.

Onix is used in a wide variety of solutions ranging from electronic publishing to document management and imaging. Onix is also used for web crawlers, search engines, mail routers and in many other unique applications.

We encourage you to evaluate our demonstration source code, which shows how easy it is to use Onix or integrate it into your current project. The demonstrations show how to program with the Onix library and also give you an idea of its speed and power.

If you have any questions about integrating Onix into your own projects, please feel free to contact us. We provide very competitive licensing and free technical support.

 

Fast Indexing Speed and Low Memory Use

The underlying indexing technology Onix uses provides extremely high indexing throughput. High throughput allows you to index very large datasets in minimal time. Onix can index gigabytes of data even under very tight memory constraints. 

Onix was designed from the start with a scalable architecture to run extremely fast on systems where memory use is critical. It runs even faster when memory is plentiful.

 

Incredibly Fast Query Processing Speed

Onix uses a unique lookup scheme which allows the index for a given query term to be located, on average, in a single disk seek. This efficiency is available even for large indexes. This means that if query speed is important to your application, Onix will outperform its competitors. When your indexes are being searched interactively, this quick response time is crucial for your users' experience.

 

Multiple Indexing Formats

Flexibility in toolkit libraries is crucial. What meets one customer's need may not meet another's. To provide you with maximal flexibility, Onix supports a variety of index formats. Each contains differing amounts of information about the text indexed, allowing you to trade off index size, performance, and query flexibility. With Onix you won't be trapped in a "one size fits all" toolkit.

Onix's primary index styles are "record level" and "word level." Record level indexes store which words occur in a document or record. Word level indexes store both the record a word occurs in and the location of the word within the record. Record level indexes provide the smallest index size. Word level indexes are larger, since they store both a record number and a word number, but they provide the additional benefit of being able to rapidly perform phrase searches and proximity searches. Each style of indexing has its own advantages and disadvantages. Onix allows you to choose which of the indexing modes is most appropriate to your particular application's needs.
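To make the difference between the two styles concrete, here is a minimal C++ sketch. It is not Onix code and does not use the Onix API: the names RecordIndex, WordIndex, index_record, and phrase_search are hypothetical, and simple in-memory maps stand in for a real on-disk index. It shows why a phrase search needs word positions, which only the word level style stores.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy in-memory indexes; illustrative only, not Onix's data structures.
using RecordId = std::size_t;
using WordPos = std::size_t;

// Record level: word -> records that contain it.
using RecordIndex = std::map<std::string, std::set<RecordId>>;
// Word level: word -> (record, position within record) pairs.
using WordIndex = std::map<std::string, std::set<std::pair<RecordId, WordPos>>>;

// Index one record (already split into words) in both styles.
void index_record(RecordId rec, const std::vector<std::string>& words,
                  RecordIndex& rindex, WordIndex& windex) {
    for (WordPos pos = 0; pos < words.size(); ++pos) {
        rindex[words[pos]].insert(rec);
        windex[words[pos]].insert(std::make_pair(rec, pos));
    }
}

// Two-word phrase search. It relies on positions, so it can only be
// answered from the word level index.
std::set<RecordId> phrase_search(const WordIndex& windex,
                                 const std::string& first,
                                 const std::string& second) {
    assert(!first.empty() && !second.empty());  // query words must be non-empty
    std::set<RecordId> hits;
    auto a = windex.find(first);
    auto b = windex.find(second);
    if (a == windex.end() || b == windex.end()) return hits;
    for (const auto& occ : a->second) {
        // "second" must sit at the very next position in the same record.
        if (b->second.count(std::make_pair(occ.first, occ.second + 1)) > 0) {
            hits.insert(occ.first);
        }
    }
    return hits;
}
```

The record level map stores one entry per word per record, while the word level map stores one entry per occurrence, which is the size difference described above.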

 

Flexible Query Operations

Onix provides powerful query operators to help you find the information you need. Onix supports the standard Boolean operators (and, or, not) as well as range searches, phrase searches, and proximity searches.
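As a rough illustration of what the Boolean operators compute, the sketch below applies them to sorted lists of record numbers using the C++ standard library. It is a generic example, not Onix's query engine or API; Postings, op_and, op_or, and op_and_not are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// A posting list: sorted, duplicate-free record numbers for one term.
using Postings = std::vector<std::size_t>;

// a AND b: records that contain both terms.
Postings op_and(const Postings& a, const Postings& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    Postings out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// a OR b: records that contain either term.
Postings op_or(const Postings& a, const Postings& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    Postings out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}

// a NOT b: records that contain the first term but not the second.
Postings op_and_not(const Postings& a, const Postings& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    Postings out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
    return out;
}
```

Because both inputs are sorted, each operator is a single linear pass over the two lists.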

Onix supports wildcards in query terms. This allows a wide variety of word forms to be located easily. Onix is flexible enough to find not only regular English words but also Unicode words, and it can even search for binary data. This enables you to index and search a wide variety of languages and character sets. In addition, Onix's flexible query language allows you to search for terms such as "and", "or", and "not", which many systems surprisingly do not allow.

Onix also provides the flexibility of indexing multiple forms of a word in the same location. You can, for instance, index corrected forms of words or even translations of words. Onix is smart enough to correctly handle phrases or proximities with these multiple word forms.

Some applications need to process or analyze queries manually. Onix doesn't lock you into using only what our query language provides. We provide a variety of functions for examining the wordlist as well as for performing the various query operations. These functions let you narrow a search or show the user the words in the index as they type.
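The "words in the index as they type" case can be pictured as a prefix lookup over a sorted wordlist. The following is a small sketch under that assumption; it uses a plain sorted vector rather than any Onix wordlist function, and words_with_prefix is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Return every word in a sorted wordlist that starts with `prefix`.
std::vector<std::string> words_with_prefix(
        const std::vector<std::string>& sorted_words,
        const std::string& prefix) {
    assert(std::is_sorted(sorted_words.begin(), sorted_words.end()));
    std::vector<std::string> out;
    // Binary search for the first word that is not less than the prefix.
    auto it = std::lower_bound(sorted_words.begin(), sorted_words.end(), prefix);
    // All matches are contiguous from there, because the list is sorted.
    for (; it != sorted_words.end() &&
           it->compare(0, prefix.size(), prefix) == 0; ++it) {
        out.push_back(*it);
    }
    return out;
}
```

The same lookup also covers a trailing wildcard query such as index*, since that is a prefix match.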

 

Low Index Overhead

Onix uses a proprietary index compression technology to compress indexes into the smallest space possible. Smaller indexes mean less disk I/O, which speeds up query processing. Compression also reduces the amount of disk space required to store indexes.
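Onix's compression scheme is proprietary and is not described here. To give a feel for how index compression works in general, the sketch below uses a common textbook technique instead: store the gaps between sorted record numbers and write each gap as a variable-length byte sequence. The function names are hypothetical and nothing here reflects Onix's actual format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode sorted record numbers as gaps. Each gap is written 7 bits at a time,
// low bits first, with the high bit set on every byte except the last.
std::vector<std::uint8_t> encode_postings(const std::vector<std::uint32_t>& sorted_ids) {
    std::vector<std::uint8_t> out;
    std::uint32_t prev = 0;
    for (std::uint32_t id : sorted_ids) {
        assert(id >= prev);  // input must be sorted
        std::uint32_t gap = id - prev;
        prev = id;
        while (gap >= 0x80) {
            out.push_back(static_cast<std::uint8_t>((gap & 0x7F) | 0x80));
            gap >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(gap));
    }
    return out;
}

// Reverse of encode_postings: rebuild each gap, then add it to the running total.
std::vector<std::uint32_t> decode_postings(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> ids;
    std::uint32_t prev = 0;
    std::uint32_t gap = 0;
    int shift = 0;
    for (std::uint8_t b : bytes) {
        gap |= static_cast<std::uint32_t>(b & 0x7F) << shift;
        if (b & 0x80) {
            shift += 7;          // more bytes follow for this gap
        } else {
            prev += gap;         // last byte of the gap: emit the record number
            ids.push_back(prev);
            gap = 0;
            shift = 0;
        }
    }
    return ids;
}
```

For example, the record numbers 3, 7, 130, 131, and 100000 encode to 7 bytes with this scheme, versus 20 bytes as raw 32-bit integers.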

Some retrieval systems generate indexes that can be as much as four times the size of your indexed text. While this may be sufficient for small retrieval needs, large amounts of text quickly render these systems unwieldy and slow.

The indexes Onix generates are significantly smaller. A record level index is between 8% and 20% of the original text's size. A word level index, which contains additional information, is generally around 45% of the size of the original text. It is important to note that these figures reflect standard text with every word indexed, including words such as "and", "to", "be", "the", and "been". If you skip these words, called stop words, you can reduce your index size even further. Onix provides utilities that allow you to skip stop words during indexing if you wish.
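Skipping stop words amounts to filtering each record's words before they reach the indexer. The sketch below shows that step in isolation; it is not one of the Onix utilities, and remove_stop_words is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Drop stop words from a tokenized record before it is indexed.
std::vector<std::string> remove_stop_words(
        const std::vector<std::string>& words,
        const std::unordered_set<std::string>& stop_words) {
    std::vector<std::string> kept;
    for (const std::string& word : words) {
        if (stop_words.count(word) == 0) kept.push_back(word);
    }
    assert(kept.size() <= words.size());  // filtering never adds words
    return kept;
}
```

With the stop list "and", "to", "be", "the", "been", the words "to be or not to be" reduce to "or" and "not", so only two of the six words would be indexed.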

Onix's proprietary index compression technology makes it possible to index all the terms found within your texts without significantly adding index overhead.

 

Scalability

Onix was designed from the beginning to work as well with small systems as it does with large ones. This lets Onix easily index data ranging from a few hundred kilobytes to multiple terabytes. Onix also allows you to distribute your index across multiple disks or machines, so you can bring more storage space and processing power online as needed.
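One way to picture a distributed index is that each disk or machine holds part of the records, answers the query on its own, and the partial results are then merged. The sketch below shows only that merge step. It assumes record numbers are global across shards, and it is a generic illustration rather than a description of how Onix distributes an index; merge_shard_results is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

using Postings = std::vector<std::size_t>;  // sorted record numbers

// Combine the sorted hit lists returned by each shard into one sorted list.
Postings merge_shard_results(const std::vector<Postings>& per_shard_hits) {
    Postings merged;
    for (const Postings& hits : per_shard_hits) {
        assert(std::is_sorted(hits.begin(), hits.end()));
        Postings next;
        std::merge(merged.begin(), merged.end(), hits.begin(), hits.end(),
                   std::back_inserter(next));
        merged.swap(next);
    }
    return merged;
}
```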

Onix generates dynamic indexes that allow you to add to an existing index long after the original index was generated.

 

Stemming

Onix provides a stemmer for English words. Stemming reduces all the forms of a word to a single token. For example, the token it generates for the words "engineering," "engineered," and "engineer" is "engineer." If you index the stemmed form of words rather than the words themselves, you can more easily find similar words. For instance, searching for "engineer" would find all the various forms of the word that exist in English. Stemming words before they are indexed can also reduce your index size by 26-38%.

Onix's stemmer uses the Porter algorithm to stem its words. The Porter stemmer is one of the most widely used stemming algorithms for English.

Since stemming is optional you remain in full control of how your text is indexed.
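The idea of mapping several word forms to one token can be shown with a deliberately tiny suffix stripper. This is not the Porter algorithm, which applies a much larger set of rules with conditions on the remaining stem, and it is not Onix's stemmer; toy_stem is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Toy stemmer: strips one of a few common suffixes.
// Far cruder than Porter; it only illustrates "many forms, one token".
std::string toy_stem(const std::string& word) {
    assert(word.find(' ') == std::string::npos);  // expects a single word
    static const char* const suffixes[] = {"ing", "ed", "s"};
    for (const char* suffix : suffixes) {
        const std::size_t n = std::strlen(suffix);
        // Require at least three letters to remain after stripping.
        if (word.size() >= n + 3 &&
            word.compare(word.size() - n, n, suffix) == 0) {
            return word.substr(0, word.size() - n);
        }
    }
    return word;
}
```

Here "engineering", "engineered", and "engineer" all map to "engineer", so indexing the stemmed token lets a single query term match every form.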

 

Multiple Character Set Support

Onix supports a wide variety of character sets including (but not limited to) Unicode, EBCDIC, ASCII, and ANSI. Onix also provides support functions to help with character normalization for Unicode. This allows you to optionally remove accents and other character set features.  
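Accent removal can be sketched as a lookup that folds accented code points to their base letters. The example below uses a small hand-written table for a few Latin letters; it is an assumption-laden stand-in, not Onix's normalization functions, and full Unicode normalization works from the standard's decomposition data rather than a table like this. strip_accents is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Fold a few accented Latin letters to their unaccented base letters.
// Code points not in the table pass through unchanged.
std::u32string strip_accents(const std::u32string& text) {
    static const std::unordered_map<char32_t, char32_t> fold = {
        {U'\u00E0', U'a'}, {U'\u00E1', U'a'}, {U'\u00E2', U'a'}, {U'\u00E4', U'a'},
        {U'\u00E7', U'c'},
        {U'\u00E8', U'e'}, {U'\u00E9', U'e'}, {U'\u00EA', U'e'}, {U'\u00EB', U'e'},
        {U'\u00F1', U'n'},
        {U'\u00F6', U'o'}, {U'\u00FC', U'u'},
    };
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text) {
        auto it = fold.find(c);
        out.push_back(it == fold.end() ? c : it->second);
    }
    assert(out.size() == text.size());  // one code point in, one code point out
    return out;
}
```

Applying the same folding to both the indexed text and the query terms lets a search for "cafe" match text containing "café".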

You can easily search words from each of these character sets. Surprisingly, some retrieval toolkits allow for the indexing but not searching of Unicode characters.

Onix can not only search standard character sets but can even index and search binary data, which many toolkits cannot do. Onix provides enormous flexibility in both what and how you index.

 

Easy To Use API

Onix's power lies in its easy-to-use API. Many programmers who use Onix are surprised by how easy it is to integrate into their applications. The API is simple and straightforward, so it takes very little time to see how to integrate Onix into your project. The API was also designed to be very flexible. This allows you to apply Onix to your unique application needs with minimal extra coding.

We provide sample code that shows how each of the functions is used. We also provide programs that show you how to put all of the functions together into a final solution.

The following simple pseudocode shows an indexing application where each document is considered to be a record:
 

ixCreateIndex(IndexName);
ixOpenIndex(IndexName);
ixStartIndexingSession();
for (Each Document) {
   for (Each Word In Document) {
      ixIndexWord(Word);
   }
   ixIncrementRecord();     /* each document is one record */
}
ixEndIndexingSession();
ixCloseIndex();


 

Most other indexing and searching operations are just as simple.

Portability

Onix is written in ANSI C++ and runs on a wide variety of compilers and platforms. We have ported it to DOS, Windows, MacOS, OS X, Linux, Solaris, BeOS, and BSD Unix. If we don't yet support your operating system, we will port it for you.
 

Unlimited Support

Onix comes with virtually unlimited support. We are dedicated to helping you get your project up and running as quickly as possible.

As you integrate Onix into your applications, you can be assured that we will be there to help you. This is a free service to our customers.

 

License Structure

We offer a wide variety of licenses for the Onix text indexing and retrieval engine. 

We offer licenses for integrating Onix into web crawlers, directories, end user products, and corporate in-house applications. We also offer source code licenses to qualified applicants.

Please email us for further details.
 

Contact Information

Lextek International
1051 Fir
Provo, UT 84606
801.655.1994  (Voice)
801.373.5342  (Fax)
sales@Lextek.com (EMail)

 



© 2000, Lextek International. All rights reserved. Site Development by Simeo Corp. & Mousetrix