Statistical-Associational Correlation and Symbol Reasoning may be mutually reinforcing. The example of LSA.
by M. Robert Showalter
One might organize the history of cognitive science and AI as a "war" between symbolic and statistical views of neural function. This isn't the place to review that conflict. I feel that there are strong reasons to believe that the brain is BOTH a statistical associator and a symbol manipulator(1). I believe that the two roles can often be mutually reinforcing, with each kind of processing supplementing and checking the other. This view connects well with the S-K model.
A brief comment on intuition seems in order. I believe
that intuition is one of many important values. I believe that a right
description of the brain must make sense to us, once it is found. That
belief is based on faith and experience. Things that work well almost always
do make sense. Even so, intuition has its limits. When we speak of brain
function at levels approaching the neural, we must accept that we are beyond
the reach of introspection. Some kinds of processing may seem less "natural"
to us than others. For me, symbol processing seems somewhat intuitive and
association does not seem intuitive at all. Those feelings may be natural,
but they are nonetheless a source of bias. We do NOT know how the brain
works. We are trying to construct models that make sense, and that can
lead to sharper questions and better models in the future. I believe that,
if brains are both statistical and symbol processing entities, it is easier
to imagine how they do what they do. I'm proceeding on that basis.
Here is Nick Chater, referring to the contrast and false
conflict between statistical and symbol processing approaches to mind:
". . . . The advocate of statistical methods pursues the possibility
that aspects of cognition can be understood in terms of the apparatus of
probability, statistics, information theory, and decision theory. The advocate
of symbolic methods pursues the claim that aspects of cognition involve
the formal manipulation of structured symbolic representations. . . These
are independent and entirely compatible claims about the nature of mind;
they do not stand in competition. . . . . . If the debate between statistical
and symbolic ideas seems ill conceived, the debate between neural networks
(a special case of statistics) and symbolic ideas seems equally ill conceived.(2)"
Statistics has strengths and limitations. Keynes refers
to a dog, living in a religiously observant household:
"If a dog is generally given scraps at table, that is sufficient
for him to judge it to be reasonable to be there."
but
"It would take the dog a long time to find out that he was given
scraps except on fast days, and that there was the same number of these
in every year(3)."
Statistical inference lets us infer information that we
are not explicitly told. But the inference can be prohibitively expensive
or inferentially impossible with the time, attention, and computational
resources actually available.
Symbol manipulation has different strengths and limitations.
Symbols (words) can convey information or misinformation much more efficiently
and compactly than statistical inference could. Imagine trying to convey
the symbolic information in a book or newspaper statistically! It would
be impossible and, to me, unthinkable. Even so, the process of checking
symbolic information involves experience that fundamentally involves association,
including implicitly statistics. People understand and believe what they
are told in light of their experience. People create their language and
their ideas in light of their experience. In these senses, we are both
symbolic and statistical.
I find it useful to offer an example that is not poetry,
but that I nonetheless regard as evocative at a number of levels.
The passage is inescapably symbolic in form and content. It would be impossible
to convey the message of this passage statistically. If the passage has
meaning to the reader, the passage inherently involves imagery, schemata,
and sequential patterns. Even so, my experience of this passage seems to
me to involve multiple latent connections that go well beyond logic, and
far into the realm of the associative:
"Man's production is always, and of necessity, a social enterprise.
Men together produce a human environment, with the totality of its socio-cultural
and psychological formations. None of these formations may be understood
as products of man's biological constitution, which, as indicated, provides
only the outer limits for human productive activity.(4)"
At one level, this passage is a mass of abstractions.
But the abstractions are multiply connected to associations that the passage
evokes. Those associations go well beyond any meanings that might be found
in the passage by constructions of dictionary usages. My experience of
this passage, and passages like it, reinforces my sense that I am both
symbolic and associative, and that the symbolic and the associative reinforce
and focus each other. For me the associative resembles the statistical
in important ways, and particularly resembles the sort of statistical
relations that might be called "latent semantics."
Latent semantic analysis, LSA, is a statistical correlation
technique that finds fits and, within limits, fashions multidimensional
logic spaces to "make sense" of the content that it "digests."(5)
LSA makes no effort to be brain-like - it is top-of-the-line statistical
analysis. LSA uses prodigious computational resources. Even so, LSA, more
than any effort I have seen, illustrates what association and statistics
might do as a part of neural logic.
The words "latent semantic" in LSA are interesting
- they refer to unconscious statistical correlations between words, without
any definitional information at all. The "latent semantics" of
a word in a particular data set are the statistical correlations between
that word and other words in that particular data set. LSA simulations
were intended to address the following question, and did so successfully:
Could a "simple" linear statistical correlation
model acquire a knowledge of a word's "probable meaning similarities"
if given a large amount of natural language text?
The answer was yes. The following results were shown,
judged by success in executing a multiple choice word synonym test used
for ESL students(6).
In a meaning-free correlational sense, the LSA program
acquired "word knowledge" per unit text read, at a rate comparable
to the rate a child learns words from text. (The machine and human learning
were not the same.)
MOST of the knowledge in the correlations came from indirect
inferences that combine information across more than two samples.
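A toy calculation may make this kind of indirect inference concrete. The sketch below is only an illustration under stated assumptions, not the Landauer and Dumais procedure: the five words, the three contexts, and the helper names (`cosine`, `reduced_word_vectors`) are hypothetical, the dimension reduction is a plain truncated singular value decomposition, and the cell weighting that the published method applies before the decomposition is omitted. In this toy data "doctor" and "physician" never appear in the same context, but each appears with "nurse."

```python
import numpy as np

words = ["doctor", "physician", "nurse", "tree", "leaf"]

# Rows are words, columns are contexts (hypothetical toy "passages"):
#   c0 = {doctor, nurse}, c1 = {physician, nurse}, c2 = {tree, leaf}
X = np.array([
    [1, 0, 0],   # doctor
    [0, 1, 0],   # physician
    [1, 1, 0],   # nurse
    [0, 0, 1],   # tree
    [0, 0, 1],   # leaf
], dtype=float)


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def reduced_word_vectors(matrix, k):
    """Word coordinates in the k largest singular dimensions."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


i, j = words.index("doctor"), words.index("physician")

# Direct evidence only: the two words share no context at all.
raw_similarity = cosine(X[i], X[j])

# After reduction to two dimensions, the shared neighbor "nurse"
# pulls the two words onto the same axis.
vectors = reduced_word_vectors(X, k=2)
latent_similarity = cosine(vectors[i], vectors[j])

print(f"raw similarity:    {raw_similarity:.3f}")
print(f"latent similarity: {latent_similarity:.3f}")
```

In the raw counts the two words look unrelated. In the reduced space they come out nearly identical, although no single context ever contained both. The similarity is carried entirely by what each word shares with a third.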
LSA carries or conveys NO meaning in the ordinary human
sense. LSA assumes that
"two words that appear in the same window of discourse - a phrase,
a sentence, a paragraph . . tend to come from nearby locations in semantic
space(7)."
Using many, most, or all of the statistical tests for
joint occurrence that seem to be computationally available, LSA proceeds
on the assumption that
". . . the psychological similarity between any two words is reflected
in the way they cooccur in small subsamples of language, that the source
of language samples produces words in a way that ensures a mostly orderly
stochastic mapping between semantic similarity and output distance. It
then fits all the pairwise similarities into a common space of high but
not unlimited dimensionality. Because, as we see later, the model predicts
what words should occur in the same contexts, an organism using such a
mechanism could, either by evolving or learning, adaptively adjust the
number of dimensions on the basis of trial and error . . . . (in the same
way that) we have varied the dimensionality of the simulation model to
achieve best results.(8)"
On the basis of connections in time and similarities in
statistical context, LSA connects words. LSA programs can do
so well enough to score impressively (~ 50% correct) on the multiple choice
synonym tests that are the usual tests for human word knowledge. LSA does
so with no definitions at all.
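The form of such a synonym test can be sketched in a few lines. This is a hedged illustration, not the program or the data used in the published work: the four passages, the vocabulary, the choice of two dimensions, and the function `pick_synonym` are all hypothetical, and a real run would use a very large body of text and a few hundred dimensions. The sketch counts words per passage, reduces the counts with a truncated singular value decomposition, and answers a multiple choice item by picking the alternative whose vector lies closest to the stem word.

```python
import numpy as np

# Hypothetical toy passages; "car" and "automobile" never share a passage.
passages = [
    "car engine road",
    "automobile engine road",
    "flower garden",
    "flower petal garden",
]

vocab = sorted({w for p in passages for w in p.split()})
row = {w: i for i, w in enumerate(vocab)}

# Word-by-passage count matrix: entry (i, j) counts word i in passage j.
counts = np.zeros((len(vocab), len(passages)))
for j, p in enumerate(passages):
    for w in p.split():
        counts[row[w], j] += 1

# Keep the k largest singular dimensions as the shared space.
k = 2
u, s, _vt = np.linalg.svd(counts, full_matrices=False)
vectors = u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pick_synonym(stem, options):
    """Return the option whose vector is closest (by cosine) to the stem."""
    return max(options,
               key=lambda w: cosine(vectors[row[stem]], vectors[row[w]]))


answer = pick_synonym("car", ["automobile", "flower", "petal"])
print(answer)
```

No definition of any word enters the calculation. The choice rests only on which words kept company with which others across the passages.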
Landauer and Dumais draw this basic conclusion:
" . . . with respect to (correlations) supposed to allow the learning
of language and other large bodies of complexly structured knowledge, domains
in which there are very many facts each weakly related to very many others,
effective simulation may require data sets of the same size and content
as those encountered by human learners. Formally, that is because weak
local constraints can combine to produce strong local effects in aggregate(9)."
Such correlation is, of course, vastly beyond the capacities
of PDP connectionism. But a particular computational arrangement is not
assumed. As Landauer and Dumais state:
" We, of course, intend no claim that the mind or brain actually
computes a singular value decomposition on a perfectly remembered event-by-context
matrix of its lifetime experience using the mathematical machinery of complex
sparse-matrix manipulation algorithms. What we suppose is merely that the
mind-brain stores and reprocesses its input in some manner that has approximately
the same effect(10)."
I'll suggest that an S-K model might have that approximate effect.
LSA is the best illustration I have encountered of the
potential power of correlation (that is, the potential power of complicated
association) with nearly unlimited computational resources devoted to it.
That power is great. That power also seems strongly complementary to inherently
sequential and inherently symbolic logical processes.
One point here seems directly relevant to instruction.
If there IS much latent, inexpressible,
extensive information in our brains, this is a STRONG argument for the
power (but not the infallibility) of human feelings of intuition. It is
a STRONG argument for dialectic kinds of instruction that attend to the
comfort of teachers and learners. If there IS much latent, inexpressible,
extensive information in our brains, this is a STRONG argument against
over-reliance on "logical rigor" and stark "simple solutions"
when the objective is to persuade and to teach material so that it can
be remembered and used by young or old people.
The success of LSA occurs without any use of sequence
information - that is, without syntax, without logic, and without morphology.
LSA shows that much can be done without any of the sequence information
students of language and learning have often thought essential(11).
One need not argue that syntax, logic and morphology are unimportant. The
success of LSA does show that very sophisticated association, working without
syntax, logic, or morphology, can be a powerful and arguably essential
supplement to them, and that sophisticated association
might be an essential source of the neuro-logical power that people and
animals have.
It is reasonable to argue that statistical and associationist
reasoning is PART of what goes on in brains. The S-K model, because it
can involve broadcasting, may be much more capable of association reasoning
than PDP models. A resonant broadcast signal (correlated with time, or
some particular aspect of context) may associate far more neural elements,
and far more separated neural elements, than could be connected by conduction
(and particularly, than could be connected by Kelvin-Rall conduction).
In the S-K model, many such associations might occur at the same time.
I'll argue that the computational breadth and power needed for a real-time
LSA might exist in brains operating according to the S-K model.
Neural function is plainly more complicated than the association-correlation
of LSA. Syntax, logic, and morphology ARE important. The definition of
"consensus cognitive science" used here depends on schema theory(12)(13).
Schema manipulations are more than just the correlation of "meaningless
marks." The "little computer programs" we conceptualize as schemata (and
use to describe aspects of our language, vision, and other world knowledge)
depend on sequence, and so involve more than the sequence-less information
LSA uses.
Even so, the existence of association mechanisms capable
of rapid and multiple associations that combine information across many
samples may be useful in many ways, for both learning and for thinking.
Such capabilities, operating in parallel with sequential patterns, may
make levels of performance possible that would be impossible without the
association.
As I advocate the S-K model for brain function, I'll advocate
both statistical and symbolic processing, and will assume that both work
together.
Paraphrasing and following Nick Chater, I'll advocate
statistical methods that presume that aspects of cognition can be understood
in terms of the apparatus of probability, statistics, information theory,
and decision theory, and will, at the same time advocate symbolic methods
that presume that aspects of cognition involve the formal manipulation
of structured symbolic representations.
There IS no inherent conflict between statistical and
symbolic approaches, so long as one does not ask the brain to be simpler
than it can reasonably be expected to be. I believe that LSA illustrates
that statistics can powerfully supplement symbolic operations in a sufficiently
parallel brain with prodigiously powerful processing. To describe animal
and human behavior, one must assume such a brain. The S-K model is proposed
as a step toward conceptualizing how such a brain might work.
NOTES:
1. It probably also makes sense to talk of other neural roles, such as pattern recognition, that do not easily classify as either statistics or symbol manipulation.
2. Chater, N. "Neural Networks: The New Statistical Models of Mind" in CONNECTIONIST MODELS OF MEMORY AND LANGUAGE J.P.Levy, D. Bairaktaris, J.A.Bullinaria, P.Cairns, eds University College London Press, 1995, p. 221
3. Keynes, J.M. A TREATISE ON PROBABILITY 1929; Harper Torchbook, Harper & Row, N.Y. 1957 Chapter 28, p 332.
4. Berger, P.L., and Luckmann, T. THE SOCIAL CONSTRUCTION OF REALITY: A Treatise in the Sociology of Knowledge Anchor, Doubleday, Garden City N.Y. 1967 p. 51.
5. Landauer T.K. and Dumais, S.T. "A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge" Psychological Review, v 104, n.2, 211-240, 1997.
6. 80 synonyms dropped from English as a Second Language standardized tests. op. cit. p 220.
11. Landauer, T.K., Laham, D., Rehder, B. & Schreiner, M.E. "How well can passage meaning be derived without using word order: A comparison of Latent Semantic Analysis and humans" Proceedings of the Cognitive Science Society, 1997. http://lsa.colorado.edu/papers.htm