What is text, really?
In the article about Natural Language Processing (NLP) we touched on the question of what text really is, and on the difficulty of answering it precisely. Obviously, we can describe text visually (graphical representations of an alphabet), we can point to the grammar of the language the text is written in, or to the fact that the text has a phonetic equivalent. Still, certain phenomena are left unanswered -- phenomena like the following example we have already mentioned:
If I asked you "what does dughfsdkjh mean?", you would hopefully answer that the word is nonsense.
But if I first gave you the following text:
"Jack turned to John: so have you tried the fresh dughfsdkjh?
They are next to bananas on the stand over there, arrived together with
Thai coconuts. You really should try it, so juicy and delicious!"
You cannot help but have an intuitive guess that it is some sort of tropical fruit you haven't heard of yet.
Since today we work so hard on making machines understand and use our language, it is worth spending some time to understand the nature of the verbal/written communication that we as humans usually take for granted.
Value or a Reference?
One of the beautiful things about computers is that we create them instinctively in our own image, so that now, as the creators of these machines, we can look into the intricacies of existing solutions and find reflections of ourselves.
In programming languages we differentiate between working with the direct value of a variable and working with a reference (pointer) to the data stored in that variable. In the picture above, the value of the number "42" can be found at the address/reference "Block 3" in some imaginary computer memory.

For those further away from programming, an easy way to think about it is the following example. A taxi driver asked to pick up a person at some address in some city and deliver them to the airport does not need to know who the person is. The driver just needs to know that the person satisfies some interface (a human who can pay and who fits, together with the luggage, into the dimensions of the car). This is working with a reference [we treat the address as a black box]. If the driver is tasked to drive one particular person, that is working with the value [driving a physical person to the airport loads the "individual value" into the taxi]. The essential aspect here is not the fact of the drive happening, but the way we assume to work with the information.
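As a minimal sketch of the same distinction, here is a hypothetical example in Python (the language and the names are choices of this illustration, not something tied to the article): an aliased dictionary behaves like a reference, while a deep copy behaves like a value.

    # A minimal sketch of value vs. reference, using an imaginary "memory"
    # where the address "block_3" holds the value 42.
    import copy

    memory = {"block_3": 42}

    by_reference = memory             # another name for the very same data
    by_value = copy.deepcopy(memory)  # an independent copy of the data itself

    memory["block_3"] = 100           # the stored value changes...

    print(by_reference["block_3"])    # 100 -- the reference sees the change
    print(by_value["block_3"])        # 42  -- the copied value does not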
Hopefully it is clear so far: there are people and there are addresses where they live; there are values and there are references to those values; there are computers operating on numbers and communicating with each other through numbers. What does all of this have to do with the nature of text? Let's take it a step further.
This picture shows the cone mosaics of the retina found in human eyes. These are the colour-specific pixels of the cameras through which we see the world. Each pixel is colour-specific and sends an electrical impulse when a photon of the corresponding wavelength hits it. And while you have most probably compared the two halves of the picture already, my actual point is that the precise configuration of this mosaic differs between all humans. It is a rather obvious and well-known fact -- we have different DNA, different food and different overall growing conditions of the organism. There are variations in eyeball sizes, in the density of the pixels and in the relative populations of the specific colours among them. But let's dwell on this fact a little longer.
In text or speech we name colours -- like "red" or "purple". While we are small kids, before any language structures have taken hold in our brains, our parents show us those fancy little books: they show us a "green" tree, a "red" apple. They take us on a walk and we wait at a "red" traffic light for the "green" one.
If you have followed me so far, hopefully you see the same facts: (1) no two humans in the world physically receive the exact same colour information for the exact same visual input (understood as the combination of retinal electrical signals); (2) in our childhood we go through a stage of conditioning of the basic "labelling" of the objects we see around us, with particular words such as colour names being those labels. We can even run a small mathematical-induction-like argument to see that every word we use in speech is either introduced through other words we have heard or introduced as a reference to an experience. This, in turn, leads us to the conclusion that all words are references to information -- not the actual value of the information.
This might come as a surprise. After all, in multiple languages I know there are sayings like "I explain it to them in plain language X, but they don't understand". Those sayings are phrased in a form implying "how ridiculous it is -- I tell them the right words, but they don't follow me -- while they obviously should".
However, it is only natural for us not to get each other's words. Let's see which things we assume we all understand. As we are the same species, we share most of our biological means and needs. Every human who has experienced dehydration knows what "thirst" is -- we just need to agree on a way to reference the concept to each other. Similarly, everybody knows the feeling of "delicious". Now imagine us trying to explain how the particular combination of amino acids in a particular meal matches the body's need to recover muscles after a marathon [most of us don't even have the language to describe it, nor the sensory resolution to map it clearly to words in the first place]. And what about expecting a person who has never experienced thirst to understand what "thirst" is?
In the picture above, the retina on the right is an example of the retina of a colour-blind person (blind to red in this case). A colour-blind person can still use the word "red" to describe a particular light on the traffic light, but the meaning behind the word is different: where a person with the retina on the left sees a distinct colour, the colour-blind person distinguishes the red and green traffic lights mainly by their position, top-most versus bottom-most.
This is hopefully both a clear and a somewhat scary observation: we cannot really recover the original information completely from the words we hear from other people. Indeed, we started from the observation that no two retinas are the same -- and the same goes for the information we store and decode on their basis. For example, a colour-blind person would use the differentiation between green and blue in an attempt to imagine the differentiation between red and green. Which leads us to the next point.
Text and the dimensionality of information
So we have made our thought-experiment journey, and we now see that words are references to information experienced by human beings. In the example of colour blindness we see how words can be used by association, without actual anchoring in the value the word is expected to cover.
We all naturally agree that experiencing something is not the same as a description of the experience (an actually delicious meal vs. a friend talking about having had a delicious meal). Yet somehow, when it comes to abstract concepts, we easily assume that everybody who speaks "the Word" is on the same page. From school, hopefully, we remember how some classmates were good at memorising formulas and applying them to tasks, versus classmates who actually solved the thing every single time. The tricky part is that the end result on paper is the same, and the words spoken are the same.

So here we touch the somewhat psychologically uncomfortable topic of the dimensionality/resolution of the information physically existing in our brains. Usually we find it comfortable to talk about it only on the surface -- it is not hard to admit that an Olympic athlete "knows/feels" the body in a higher resolution than I do (higher precision of information on what 'balance' is, higher precision of information on what an internally experienced 'oxygen level in blood' is, etc.). But if I said to somebody directly "we don't see the problem at the same level of resolution / the same number of dimensions" -- usually offence is taken. Since there is nobody really talking to you right now, I hope that no offence is taken and we can continue peacefully with the topic, keeping in mind, of course, our personal limits on the perception of information and on what we truly know.
The difficulty with language is that it allows us to mimic knowing something we don't actually know (including mimicking it to ourselves). We can learn medical sentences and speak like doctors without actually knowing anything about the human body. And this is what previous-generation NLP approaches like the one discussed used to do: a machine focused on the complexity of the language, not of the problem. A machine similar to a student memorising material for a chosen subject and living in the assumption of a particular physical form of the exam; a mapper of input words to output words.
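As a toy sketch of such a word mapper (the phrases and the dictionary here are invented purely for illustration), consider a machine that pairs memorised questions with memorised answers and holds no concept behind either:

    # A toy "mapper of input words to output words": it recalls memorised
    # phrases and has no model of the concepts behind them.
    canned_answers = {
        "what are the symptoms of dehydration?":
            "Common symptoms include thirst, dry mouth and fatigue.",
        "is the top light of the traffic light red?":
            "If the top light is lit, you should stop.",
    }

    def reply(question: str) -> str:
        # Verbatim lookup: no understanding, only recall of word patterns.
        return canned_answers.get(question.lower().strip(), "I am not sure.")

    print(reply("What are the symptoms of dehydration?"))    # sounds like a doctor...
    print(reply("Should I drink water if my mouth is dry?")) # ...until the mapping is missing

The first answer sounds knowledgeable; the second exposes that there was never anything behind the words.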
Such a machine might give an intelligent impression in the beginning, especially to people who are not used to differentiating between a word and the concept it represents. However, after a little while it becomes visible that the dimensionality of such a simplistic NLP approach does not match the dimensionality of the real concepts the words aim to describe.
Just to sketch the scale of the problem we are talking about, let's have a look at our species. We have six basic senses: sight, hearing, balance, smell, taste, touch. Those six senses can be described as electrical signals from the perspective of the central nervous system (like sensors on a robot). We have already mentioned briefly that the resolution of each sense varies from individual to individual, and that there is a tendency for underused or overused senses to decay. There are at least 14 interoception senses -- senses from within the body, like hunger or needing to wee. There are emotions, which cannot be mapped as clearly to electrical signals but which play an important role as a higher-level aggregator of information. Emotions are more difficult to quantify, but from daily living it is easy to see that the human capacity for anger, fear or compassion also varies from individual to individual -- and so do the ranges of resolution, the shades of the experience of an emotion. And there is knowledge: a web connecting experiences and observations, interleaving the body's memory of action to define a skill (that of a carpenter, for instance); and there is a vastness of variety in what knowledge can be -- webs connecting senses, emotions, time, causes and effects among all of those.
Now we put all of this richness of human experience into a thought-stream called text, and we somehow expect to be well understood by other humans. Even better -- we want to be understood by machines which do not even share the basic senses with us.
Where does this all leave us?
I don't feel like I'm qualified to answer this question better than you, my dear reader.
I can only say that, in my current understanding, human daily experience in the senses alone is 20-dimensional (each sense representing a dimension). Practically it is even more than that, because senses like sight are 4-dimensional (red, green, blue and low-light types of sensors), and senses like touch track not only single sense data (hot/cold, touch/no touch) but also certain subjective, abstract data associated with the shapes and sensations of the touch.
On top of that we have a multitude of emotion/knowledge dimensions.
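As a rough back-of-the-envelope tally of those dimensions (the counts follow the ones given above; the breakdown is illustrative, not a physiological model), here is a small sketch:

    # Counting the "dimensions" of momentary sensory experience, following
    # the tally above: 6 basic senses + at least 14 interoception senses.
    basic_senses = ["sight", "hearing", "balance", "smell", "taste", "touch"]
    interoception_senses = 14          # hunger, thirst, needing to wee, ...

    print(len(basic_senses) + interoception_senses)   # 20

    # Sight alone can be expanded into its sensor types, so the count only grows;
    # emotions and knowledge are not even counted here.
    sight_channels = ["red", "green", "blue", "low-light"]
    print(len(basic_senses) - 1 + len(sight_channels) + interoception_senses)  # 23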
We map it all to the spoken or written form of language, which encodes those experiences so that they can be somewhat decoded on the other side by a different human being. For that reason the text follows a certain grammar and provides us with the ability to reference concepts. This way we are able to start in the domain of concepts commonly shared among humans and use them to refer to higher-dimensional, less common experience.
Maybe more-or-less this, probably much more than this.
But we can be pretty sure what text is not -- it is not objective physical information on its own (other than the vibrations of sound when we speak or the constellations of pixels on a screen). Text works only because we as a group have agreed on its meaning and on the way the language can evolve. As for any chosen single word of a particular language -- we could rename red to blue, keep it this way for several generations, and our actual physical behaviours would not change for as long as we stay in sync. Words are not reality, but pointers to reality. Unfortunately, we are very skilled at forgetting this.