Statistical-Associational Correlation and Symbol Reasoning may be mutually reinforcing. The example of LSA.

by M. Robert Showalter

One might organize the history of cognitive science and AI as a "war" between symbolic and statistical views of neural function. This isn't the place to review that conflict. I feel that there are strong reasons to believe that the brain is BOTH a statistical associator and a symbol manipulator⁽¹⁾. I believe that the two roles can often be mutually reinforcing, with each kind of processing supplementing and checking the other. This view connects well with the S-K model.

A brief comment on intuition seems in order. I believe that intuition is one of many important values. I believe that a right description of the brain must make sense to us, once it is found. That belief is based on faith and experience. Things that work well almost always do make sense. Even so, intuition has its limits. When we speak of brain function at levels approaching the neural, we must accept that we are beyond the reach of introspection. Some kinds of processing may seem less "natural" to us than others. For me, symbol processing seems somewhat intuitive and association does not seem intuitive at all. Those feelings may be natural, but they are nonetheless a source of bias. We do NOT know how the brain works. We are trying to construct models that make sense, and that can lead to sharper questions and better models in the future. I believe that, if brains are both statistical and symbol processing entities, it is easier to imagine how they do what they do. I'm proceeding on that basis.

Here is Nick Chater, referring to the contrast and false conflict between statistical and symbol processing approaches to mind:

". . . . The advocate of statistical methods pursues the possibility that aspects of cognition can be understood in terms of the apparatus of probability, statistics, information theory, and decision theory. The advocate of symbolic methods pursues the claim that aspects of cognition involve the formal manipulation of structured symbolic representations. . . These are independent and entirely compatible claims about the nature of mind; they do not stand in competition. . . . . . If the debate between statistical and symbolic ideas seems ill conceived, the debate between neural networks (a special case of statistics) and symbolic ideas seems equally ill conceived.⁽²⁾ "

Statistics has strengths and limitations. Keynes refers to a dog, living in a religiously observant household:

"If a dog is generally given scraps at table, that is sufficient for him to judge it to be reasonable to be there."

but

"It would take the dog a long time to find out that he was given scraps except on fast days, and that there was the same number of these in every year⁽³⁾."

Statistical inference lets us infer information that we are not explicitly told. But the inference can be prohibitively expensive or inferentially impossible with the time, attention, and computational resources actually available.

Symbol manipulation has different strengths and limitations. Symbols (words) can convey information or misinformation much more efficiently and compactly than statistical inference could. Imagine trying to convey the symbolic information in a book or newspaper statistically! It would be impossible and, to me, unthinkable. Even so, the process of checking symbolic information draws on experience that is fundamentally associative and, implicitly, statistical. People understand and believe what they are told in light of their experience. People create their language and their ideas in light of their experience. In these senses, we are both symbolic and statistical.

I find it useful to offer an example that is not poetry, but that I nonetheless regard as evocative at a number of levels. The passage is inescapably symbolic in form and content. It would be impossible to convey the message of this passage statistically. If the passage has meaning to the reader, the passage inherently involves imagery, schemata, and sequential patterns. Even so, my experience of this passage seems to me to involve multiple latent connections that go well beyond logic, and far into the realm of the associative:

"Man's production is always, and of necessity, a social enterprise. Men together produce a human environment, with the totality of its socio-cultural and psychological formations. None of these formations may be understood as products of man's biological constitution, which, as indicated, provides only the outer limits for human productive activity.⁽⁴⁾"

At one level, this passage is a mass of abstractions. But the abstractions are multiply connected to associations that the passage evokes. Those associations go well beyond any meanings that might be found in the passage by constructions of dictionary usages. My experience of this passage, and passages like it, reinforces my sense that I am both symbolic and associative, and that the symbolic and the associative reinforce and focus each other. For me the associative resembles the statistical in important ways, and particularly resembles the sort of statistical relations that might be called "latent semantics."

Latent semantic analysis, LSA, is a statistical correlation technique that finds fits and, within limits, fashions multidimensional logic spaces to "make sense" of the content that it "digests."⁽⁵⁾ LSA makes no effort to be brain-like - it is top-of-the-line statistical analysis. LSA uses prodigious computational resources. Even so, LSA, more than any effort I have seen, illustrates what association and statistics might do as a part of neural logic.
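To make the raw material of such an analysis concrete, here is a minimal Python sketch, not the LSA authors' code, of a word-by-context count table of the kind from which the correlations are drawn. The three passages, the stopword list, and the function names are hypothetical, chosen only for illustration. As I understand it, the published method also transforms the raw counts before analysis; that step is omitted here.

```python
from collections import Counter

# Hypothetical toy passages; real LSA runs used a very large text corpus.
PASSAGES = [
    "the doctor spoke to the nurse",
    "the physician spoke to the nurse",
    "the guitar and the drum played",
]
STOPWORDS = {"the", "to", "and"}  # illustrative choice, not taken from the paper


def word_by_context_counts(passages, stopwords=frozenset()):
    """Return (vocab, matrix) where matrix[i][j] counts word i in passage j."""
    counts = [Counter(w for w in p.lower().split() if w not in stopwords)
              for p in passages]
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for c in counts] for w in vocab]
    return vocab, matrix


def shared_windows(matrix, vocab, a, b):
    """Number of passages in which both words appear at least once."""
    row_a, row_b = matrix[vocab.index(a)], matrix[vocab.index(b)]
    return sum(1 for x, y in zip(row_a, row_b) if x and y)


vocab, matrix = word_by_context_counts(PASSAGES, STOPWORDS)
for word, row in zip(vocab, matrix):
    print(f"{word:10s} {row}")

# "doctor" and "nurse" share a passage; "doctor" and "physician" never do.
print(shared_windows(matrix, vocab, "doctor", "nurse"))
print(shared_windows(matrix, vocab, "doctor", "physician"))
```

Note that nothing in the table records what any word means. Each word is represented only by the passages in which it occurs.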

The words "latent semantic" in LSA are interesting - they refer to unconscious statistical correlations between words, without any definitional information at all. The "latent semantics" of a word in a particular data set is the set of statistical correlations between that word and other words in that particular data set. LSA simulations were intended to address the following question, and did so successfully:

Could a "simple" linear statistical correlation model acquire a knowledge of a word's "probable meaning similarities" if given a large amount of natural language text?

The answer was yes. The following results were shown, judged by success in executing a multiple choice word synonym test used for ESL students⁽⁶⁾.

In a meaning-free correlational sense, the LSA program acquired "word knowledge" per unit text read, at a rate comparable to the rate a child learns words from text. (The machine and human learning were not the same.)

MOST of the knowledge in the correlations came from indirect inferences that combine information across more than two samples.
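A small worked example may show what an "indirect inference" of this kind looks like. Landauer and Dumais describe the underlying computation as a singular value decomposition of an event-by-context matrix. The Python sketch below is not their code: it uses a hypothetical five-word, three-passage count table, and it swaps in plain power iteration with deflation for a library SVD routine so that it runs with the standard library alone. All names are illustrative.

```python
import math

# Hypothetical word-by-context counts (rows: words, columns: three passages).
# "doctor" and "physician" never share a passage; each shares one with "nurse".
WORDS = ["doctor", "physician", "nurse", "guitar", "drum"]
COUNTS = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_right_singular_vectors(matrix, k, iterations=500):
    """Top-k right singular vectors, by power iteration with deflation on A^T A.

    Assumes k does not exceed the rank of the matrix.
    """
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    vectors = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # fixed start keeps runs repeatable
        for _step in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            length = math.sqrt(sum(x * x for x in w))
            v = [x / length for x in w]
        eigenvalue = sum(v[i] * gram[i][j] * v[j]
                         for i in range(n) for j in range(n))
        vectors.append(v)
        # Deflate: subtract the direction just found so the next pass finds another.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return vectors


def reduced_word_vectors(matrix, k):
    """Coordinates of each word (row) along the top-k directions."""
    basis = top_right_singular_vectors(matrix, k)
    return [[sum(a * b for a, b in zip(row, v)) for v in basis] for row in matrix]


raw = cosine(COUNTS[0], COUNTS[1])          # doctor vs physician, raw counts
two_d = reduced_word_vectors(COUNTS, k=2)   # reduced to two dimensions
three_d = reduced_word_vectors(COUNTS, k=3) # full rank: no reduction
print(round(raw, 3))
print(round(cosine(two_d[0], two_d[1]), 3))
print(round(abs(cosine(three_d[0], three_d[1])), 3))
```

In the raw counts the similarity of "doctor" and "physician" is zero, because they never occur together. In the two-dimensional space the two words coincide, because both occur with "nurse". With all three dimensions kept, the similarity falls back to zero. In this toy case, at least, the indirect inference comes entirely from the reduction in dimensionality.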

LSA carries or conveys NO meaning in the ordinary human sense. LSA assumes that

"two words that appear in the same window of discourse - a phrase, a sentence, a paragraph . . tend to come from nearby locations in semantic space⁽⁷⁾."

Using many, most, or all of the statistical tests for joint occurrence that seem to be computationally available, LSA proceeds on the assumption that

". . . the psychological similarity between any two words is reflected in the way they cooccur in small subsamples of language, that the source of language samples produces words in a way that ensures a mostly orderly stochastic mapping between semantic similarity and output distance. It then fits all the pairwise similarities into a common space of high but not unlimited dimensionality. Because, as we see later, the model predicts what words should occur in the same contexts, an organism using such a mechanism could, either by evolving or learning, adaptively adjust the number of dimensions on the basis of trial and error . . . . (in the same way that) we have varied the dimensionality of the simulation model to achieve best results.⁽⁸⁾"

On the basis of connections in time and similarities in statistical context, LSA connects words. LSA programs can do so well enough to score impressively (~ 50% correct) on the multiple choice synonym tests that are the usual tests for human word knowledge. LSA does so with no definitions at all.
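The way such a test can be taken with no definitions at all may be sketched as follows. This is a hedged illustration in Python, not the procedure reported in the paper: the word vectors are invented by hand rather than derived from text, the two test items are hypothetical, and the use of four alternatives per item is an assumption of the sketch. The program simply picks the alternative whose vector lies closest in direction to the vector of the stem word.

```python
import math

# Hypothetical low-dimensional word vectors, invented for illustration only.
VECTORS = {
    "physician":  [0.9, 0.1, 0.0],
    "doctor":     [0.8, 0.2, 0.1],
    "automobile": [0.1, 0.9, 0.1],
    "car":        [0.0, 0.8, 0.2],
    "river":      [0.1, 0.1, 0.9],
    "ladder":     [0.4, 0.4, 0.4],
}

# Hypothetical items: (stem word, alternatives, keyed answer).
ITEMS = [
    ("physician", ["river", "doctor", "ladder", "car"], "doctor"),
    ("automobile", ["doctor", "ladder", "car", "river"], "car"),
]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def choose_synonym(stem, alternatives, vectors):
    """Pick the alternative most similar in direction to the stem word."""
    return max(alternatives, key=lambda w: cosine(vectors[stem], vectors[w]))


def score(items, vectors):
    """Fraction of items on which the chosen alternative matches the key."""
    correct = sum(choose_synonym(stem, alts, vectors) == answer
                  for stem, alts, answer in items)
    return correct / len(items)


for stem, alts, answer in ITEMS:
    print(stem, "->", choose_synonym(stem, alts, VECTORS), "(key:", answer + ")")
print(score(ITEMS, VECTORS))
```

With four alternatives per item, blind guessing would be right about a quarter of the time, which gives a baseline against which a score may be judged.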

Landauer and Dumais draw this basic conclusion:

" . . . with respect to (correlations) supposed to allow the learning of language and other large bodies of complexly structured knowledge, domains in which there are very many facts each weakly related to very many others, effective simulation may require data sets of the same size and content as those encountered by human learners. Formally, that is because weak local constraints can combine to produce strong local effects in aggregate⁽⁹⁾."

Such correlation is, of course, vastly beyond the capacities of PDP connectionism. But a particular computational arrangement is not assumed. As Landauer and Dumais state:

"We, of course, intend no claim that the mind or brain actually computes a singular value decomposition on a perfectly remembered event-by-context matrix of its lifetime experience using the mathematical machinery of complex sparse-matrix manipulation algorithms. What we suppose is merely that the mind-brain stores and reprocesses its input in some manner that has approximately the same effect⁽¹⁰⁾."

I'll suggest that a S-K model might have that approximate effect.

LSA is the best illustration I have encountered of the potential power of correlation (that is, the potential power of complicated association) with nearly unlimited computational resources devoted to it. That power is great. That power also seems strongly complementary to inherently sequential and inherently symbolic logical processes.

One point here seems directly relevant to instruction. If there IS much latent, inexpressible, extensive information in our brains, this is a STRONG argument for the power (but not the infallibility) of human feelings of intuition. It is a STRONG argument for dialectic kinds of instruction that attend to the comfort of teachers and learners. If there IS much latent, inexpressible, extensive information in our brains, this is a STRONG argument against over-reliance on "logical rigor" and stark "simple solutions" when the objective is to persuade and to teach material so that it can be remembered and used by young or old people.

The success of LSA occurs without any use of sequence information - that is, without syntax, without logic, and without morphology. LSA shows that much can be done without any of the sequence information students of language and learning have often thought essential⁽¹¹⁾. One need not argue that syntax, logic and morphology are unimportant. The success of LSA does show that very sophisticated association logic, without syntax, logic, and morphology, can be a powerful, and arguably essential, supplement to syntax, logic, and morphology, and that sophisticated association might be an essential source of the neuro-logical power that people and animals have.

It is reasonable to argue that statistical and associationist reasoning is PART of what goes on in brains. The S-K model, because it can involve broadcasting, may be much more capable of association reasoning than PDP models. A resonant broadcast signal (correlated with time, or some particular aspect of context) may associate far more neural elements, and far more separated neural elements, than could be connected by conduction (and particularly, than could be connected by Kelvin-Rall conduction). In the S-K model, many such associations might occur at the same time. I'll argue that the computational breadth and power needed for a real-time LSA might exist in brains operating according to the S-K model.

Neural function is plainly more complicated than the association-correlation of LSA. Syntax, logic, and morphology ARE important. The definition of "consensus cognitive science" used here depends on schema theory⁽¹²⁾ ⁽¹³⁾. Schema manipulations are more than just the correlation of "meaningless marks." The "little computer programs" we conceptualize as schemata (and use to describe aspects of our language, vision, and other world knowledge) depend on sequence, and so involve more than the sequence-less information LSA uses.

Even so, the existence of association mechanisms capable of rapid and multiple associations that combine information across many samples may be useful in many ways, for both learning and for thinking. Such capabilities, operating in parallel with sequential patterns, may make levels of performance possible that would be impossible without the association.

As I advocate the S-K model for brain function, I'll advocate both statistical and symbolic processing, and will assume that both work together.

Paraphrasing and following Nick Chater, I'll advocate statistical methods that presume that aspects of cognition can be understood in terms of the apparatus of probability, statistics, information theory, and decision theory, and will, at the same time, advocate symbolic methods that presume that aspects of cognition involve the formal manipulation of structured symbolic representations.

There IS no inherent conflict between statistical and symbolic approaches, so long as one does not ask the brain to be simpler than it can reasonably be expected to be. I believe that LSA illustrates that statistics can powerfully supplement symbolic operations in a sufficiently parallel brain with prodigiously powerful processing. To describe animal and human behavior, one must assume such a brain. The S-K model is proposed as a step toward conceptualizing how such a brain might work.

NOTES:

1. It probably also makes sense to talk of other neural roles, such as pattern recognition, that do not easily classify as either statistics or symbol manipulation.

2. Chater, N. "Neural Networks: The New Statistical Models of Mind" in CONNECTIONIST MODELS OF MEMORY AND LANGUAGE J.P.Levy, D. Bairaktaris, J.A.Bullinaria, P.Cairns, eds University College London Press, 1995, p. 221.

3. Keynes, J.M. A TREATISE ON PROBABILITY 1929; Harper Torchbook, Harper & Row, N.Y. 1957 Chapter 28, p 332.

4. Berger, P.L., and Luckmann, T. THE SOCIAL CONSTRUCTION OF REALITY: A Treatise in the Sociology of Knowledge Anchor, Doubleday, Garden City N.Y. 1967 p. 51.

5. Landauer T.K. and Dumais, S.T. "A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge" Psychological Review, v 104, n.2, 211-240, 1997.

6. 80 synonyms dropped from English as a Second Language standardized tests. op. cit. p. 220.

7. op. cit. p. 215.

8. op. cit. p. 216.

9. op. cit. p. 214.

10. op. cit. p. 218.

11. Landauer, T.K., Laham, D., Rehder, B. & Schreiner, M.E. "How well can passage meaning be derived without using word order: A comparison of Latent Semantic Analysis and humans" Proceedings of the Cognitive Science Society, 1997. http://lsa.colorado.edu/papers.htm

12. Johnson-Laird, P.N. op. cit.

13. Davidson, D. op. cit.