This page provides a downloadable version of the Contextual Self-Organizing Map (click here to download), a software package that applies a corpus-based algorithm to derive semantic representations of words. The algorithm relies on the analysis of contextual information extracted from a text corpus, that is, on word co-occurrences in a large-scale electronic database of text. Specifically, a target word is represented as the combination of two averages taken over all the text in the corpus: the average of all words preceding the target and the average of all words following it. This representation can be further processed by a self-organizing map (SOM; Kohonen, 2001), an unsupervised neural network model that provides efficient data extraction and representation. Because the SOM preserves topography, it projects the statistical structure of the contexts onto a 2-D space, such that words with similar meanings cluster together, forming groups that correspond to lexically meaningful categories. Such a representation system has applications in a variety of areas, including computational modeling of language acquisition and processing. In this package we present specific examples in two languages (English and Chinese) to illustrate how the method is used to extract semantic representations for words.
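The two steps described above (averaging contexts, then mapping with a SOM) can be illustrated with a minimal Python sketch. It is not the code shipped in the package. Several details are assumptions made only for illustration: each word is given a fixed random code vector, only the immediately adjacent word on each side counts as context, the two averages are combined by concatenation, and the SOM uses a basic Gaussian neighborhood with linearly decaying learning rate and radius. The names `context_vectors`, `train_som`, and `best_unit` are hypothetical.

```python
import math
import random


def context_vectors(tokens, dim=8, seed=0):
    """Return {word: mean preceding-word vector + mean following-word vector}."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    # Assumption: every word gets a fixed random code vector.
    code = {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}
    prev_sum = {w: [0.0] * dim for w in vocab}
    next_sum = {w: [0.0] * dim for w in vocab}
    prev_n = {w: 0 for w in vocab}
    next_n = {w: 0 for w in vocab}
    for i, w in enumerate(tokens):
        if i > 0:  # accumulate the word immediately before the target
            for k, v in enumerate(code[tokens[i - 1]]):
                prev_sum[w][k] += v
            prev_n[w] += 1
        if i + 1 < len(tokens):  # accumulate the word immediately after
            for k, v in enumerate(code[tokens[i + 1]]):
                next_sum[w][k] += v
            next_n[w] += 1
    reps = {}
    for w in vocab:
        left = [s / max(prev_n[w], 1) for s in prev_sum[w]]
        right = [s / max(next_n[w], 1) for s in next_sum[w]]
        reps[w] = left + right  # concatenation: 2 * dim values per word
    return reps


def best_unit(weights, x):
    """Grid coordinates of the unit whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows=4, cols=4, epochs=50, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                             # decaying learning rate
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighborhood
        for x in data:
            br, bc = best_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


tokens = "the cat sat on the mat the dog sat on the rug".split()
reps = context_vectors(tokens, dim=4)
som = train_som(list(reps.values()), rows=3, cols=3, epochs=30)
for word in sorted(reps):
    print(word, best_unit(som, reps[word]))
```

In this toy corpus "cat" and "dog" occur in identical contexts (preceded by "the", followed by "sat"), so they receive the same representation and land on the same map unit. With a real corpus, words with merely similar contexts would be expected to fall on nearby units.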
© Xiaowei Zhao & Ping Li, August, 2010