Target Word Clustering (native SenseClusters)

Target word clustering takes multiple contexts as input, each of which includes a single target word marked with a special XML tag known as "head". The objective is to cluster those contexts to discover the different meanings of the target word. This is based on the idea that words that occur in similar contexts will have similar meanings. Likewise, contexts of a target word that resemble one another are likely to reflect the same meaning of that word, so each resulting cluster corresponds to one of its meanings.
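To make the input concrete, here is a minimal Python sketch that pulls the target word and its surrounding words out of one context. Only the "head" tag comes from the description above. The enclosing "context" element, the example sentence, and the helper name split_context are assumptions made for illustration, and this is not SenseClusters' own parser.

```python
import xml.etree.ElementTree as ET

def split_context(xml_text):
    """Return (left_words, target, right_words) for one context.

    Assumes the context is a single XML element containing exactly one
    <head> child that marks the target word (hypothetical layout).
    """
    root = ET.fromstring(xml_text)
    head = root.find("head")
    if head is None:
        raise ValueError("context has no <head> element")
    left = (root.text or "").split()    # text before the target word
    target = (head.text or "").strip()  # the marked target word
    right = (head.tail or "").split()   # text after the target word
    return left, target, right

# Hypothetical example context; the wrapper element name is an assumption.
example = "<context>She sat on the river <head>bank</head> and watched the water</context>"
left, target, right = split_context(example)
```

The words in left and right are the material that the clustering step works with; the target word itself is the same in every context, so it carries no information for separating them.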

In SenseClusters native mode, a word co-occurrence matrix is created either from a separate set of training data or from the data to be clustered. The matrix provides word vectors that replace each of the words surrounding the target word in a context, and these word vectors are averaged together to create a representation of the context. The premise is that contexts made up of words that co-occur with some of the same other words will be similar to each other.
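The averaging step can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the SenseClusters implementation: the window size of 2, the use of raw counts, the toy training sentences, the cosine comparison, and all function names are hypothetical choices made for this example.

```python
from collections import defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def context_vector(context_words, vectors, vocab):
    """Average the co-occurrence vectors of the context words that have one."""
    known = [w for w in context_words if w in vectors]
    if not known:
        return [0.0] * len(vocab)
    return [sum(vectors[w].get(v, 0) for w in known) / len(known) for v in vocab]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy training data (an assumption): two topics with no shared words.
training = [
    "river water flows past muddy shore".split(),
    "muddy river shore floods".split(),
    "loan money earns interest".split(),
    "deposit money earn interest loan".split(),
]
vectors = cooccurrence_vectors(training)
vocab = sorted(vectors)

# Words surrounding the target word in three hypothetical contexts.
ctx_river_1 = context_vector(["river", "water"], vectors, vocab)
ctx_river_2 = context_vector(["muddy", "shore"], vectors, vocab)
ctx_money = context_vector(["money", "loan"], vectors, vocab)

sim_same = cosine(ctx_river_1, ctx_river_2)
sim_diff = cosine(ctx_river_1, ctx_money)
```

The first two contexts share no words with each other, yet their averaged vectors overlap because their words co-occur with some of the same other words in the training data, so sim_same is greater than sim_diff. That is the premise described above; a clustering algorithm would then group the context vectors.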