Semantic Indexing

Semantic indexing is the name for a family of techniques for searching and organizing large data collections. Its goal is to find patterns in unstructured data (documents without descriptors such as keywords or special tags) and to use those patterns to provide more effective search and categorization. These techniques work from patterns in the data rather than from knowledge of a particular language, so they are language-agnostic: a collection does not have to be in English, or even in any human language at all. For example, algorithms adapted from a text search engine have shown promising preliminary results in protein structure prediction.

Latent Semantic Indexing (LSI), also called latent semantic analysis (LSA), was originally described in a 1990 paper by Deerwester, Dumais, Furnas, Landauer, and Harshman, and it remains an active area of research. You can find links to journal articles and other LSI websites on our references page. This has been added to the semantics web section of Deep Web Research Subject Tracer™ Information Blog and Bot Research Subject Tracer™ Information Blog.
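The text describes what semantic indexing is for but not how LSI finds its "patterns". Below is a minimal sketch of the classic recipe, offered under several assumptions. It builds a term-document count matrix, keeps only the top k singular directions (a truncated SVD), projects both documents and the query into that k-dimensional space, and ranks documents by cosine similarity. To stay dependency-free, the sketch approximates the SVD with power iteration plus deflation instead of calling a library routine. It also projects documents and queries onto the same left singular vectors, which is one common variant; folding the query in with a Σ_k⁻¹ scaling is another. The function names (`build_term_doc_matrix`, `top_left_singular_vectors`, `lsi_scores`), the whitespace tokenizer, the use of raw counts with no term weighting, and the four-document toy corpus are hypothetical illustrations, not details from the source.

```python
import math
import random


def build_term_doc_matrix(docs):
    """Return (vocab, A) where A[i][j] counts term vocab[i] in docs[j].
    Assumption: naive whitespace tokenization, raw counts, no weighting."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for word in doc.split():
            A[index[word]][j] += 1.0
    return vocab, A


def top_left_singular_vectors(A, k, iters=500, seed=0):
    """Approximate the top-k left singular vectors of A (a truncated SVD)
    via power iteration with deflation on the symmetric matrix M = A A^T."""
    n, m = len(A), len(A[0])
    M = [[sum(A[i][c] * A[j][c] for c in range(m)) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)  # seeded so results are reproducible
    vectors = []
    for _ in range(k):
        u = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(M[i][j] * u[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # k exceeds the matrix rank; stop early
                return vectors
            u = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue of M, i.e. the squared singular value.
        lam = sum(u[i] * sum(M[i][j] * u[j] for j in range(n)) for i in range(n))
        vectors.append(u)
        # Deflate so the next pass converges to the next singular direction.
        M = [[M[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]
    return vectors


def project(vec, basis):
    """Coordinates of a term-space vector in the k-dimensional latent space."""
    return [sum(b[i] * vec[i] for i in range(len(vec))) for b in basis]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def lsi_scores(docs, query, k=2):
    """Cosine similarity between the query and each document in LSI space."""
    vocab, A = build_term_doc_matrix(docs)
    basis = top_left_singular_vectors(A, k)
    index = {w: i for i, w in enumerate(vocab)}
    q = [0.0] * len(vocab)
    for word in query.split():
        if word in index:  # terms outside the vocabulary are ignored
            q[index[word]] += 1.0
    q_latent = project(q, basis)
    scores = []
    for j in range(len(docs)):
        column = [A[i][j] for i in range(len(vocab))]
        scores.append(cosine(q_latent, project(column, basis)))
    return scores


if __name__ == "__main__":
    corpus = [
        "car engine repair",
        "automobile engine repair",
        "cooking pasta recipe",
        "pasta sauce recipe",
    ]
    for doc, score in zip(corpus, lsi_scores(corpus, "car")):
        print(f"{score:6.3f}  {doc}")
```

In this toy corpus, "automobile engine repair" never contains the word "car". In plain keyword matching it would therefore score zero for the query "car". In the two-dimensional latent space it scores close to 1, because it shares "engine" and "repair" with the document that does contain "car". This is the kind of pattern the text describes semantic indexing as exploiting. A production system would more likely use an optimized SVD routine and a weighting scheme such as TF-IDF or log-entropy, but those choices are beyond what the source specifies.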