Jeff Hawkins is the founder of Palm and Handspring. He is also the author of the 2004 book “On Intelligence”, in which he discusses his ideas about how the brain works and relates recent neuroscience discoveries to the analysis of large data sets. Earlier this month, Jeff published a paper on Hierarchical Temporal Memory (HTM), which argues that advances in modeling the neocortex are bringing us close to useful algorithms that can run on real-world computers.
HTM offers new computer programming possibilities and new ways of thinking about programming that have widespread uses. It may be particularly helpful in artificial intelligence and robotics, where real-time learning and prediction (anticipating events and being prepared to react) are very important.
There are three key points to understand about HTM: hierarchy, “sparse distributed representation” and time.
- Hierarchy is an efficient and robust method of recognizing and predicting patterns by passing information from region to region in the neocortex until a stable memory is formed.
The benefit of hierarchical organization is efficiency. It significantly reduces training time and memory usage because patterns learned at each level of the hierarchy are reused when combined in novel ways at higher levels. This lets the system learn something new without having to relearn its components, in the same way you don’t have to relearn the alphabet every time you come across a new word.
- Sparse distributed representation describes how only a minimal number of cells is needed to recognize and predict patterns of information. When an HTM receives a signal, it compares it to previously learned spatial and temporal patterns.
- Time denotes the idea that this is a continuously active and dynamic process that allows learning by inference and prediction. If time were somehow stopped, nothing could be inferred, predicted or learned.
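The “sparse distributed representation” idea above can be sketched in a few lines. This is a minimal illustration, not Numenta’s code: the cell indices, pattern sizes and the `overlap` threshold are all assumptions made here for demonstration.

```python
# Illustrative sketch: comparing two sparse distributed representations
# (SDRs), each modeled as the set of indices of its active cells.
# All indices and thresholds below are invented for this example.

def overlap(sdr_a: set[int], sdr_b: set[int]) -> int:
    """Number of active cells the two representations share."""
    return len(sdr_a & sdr_b)

# A previously learned pattern and a new input, out of a large region.
learned = {3, 40, 101, 977, 2048}
incoming = {3, 40, 101, 977, 4096}   # one active cell differs

# A high overlap relative to the pattern size suggests a match.
match = overlap(learned, incoming) >= 4
print(match)  # True: 4 of the 5 active cells coincide
```

Because only a handful of cells out of a vast population are active, even partial overlaps are statistically very unlikely to occur by chance, which is what makes this comparison robust.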
First, a little bit of neuroscience.
HTM is based on models of how the brain works, specifically the neocortex, which makes up roughly 75% of the brain’s mass. It should be noted that HTM is not a simulation but a model of how the brain itself processes information.
As Jeff Hawkins likes to demonstrate, the neocortex can be imagined as a folded-up tea towel. The compressed contours are all squeezed together, which results in a high density of cells for a given space and a correspondingly high level of connectivity. However, if this notional tea towel representing the topography of the neocortex were laid out flat, it would be only six layers of cells deep.
Each of these layers is formed slightly differently and seems to play a different role. Most information travels horizontally within a layer, but there is a vertical component as well: information can traverse the layers via columns of cells.
The neocortex is divided into two kinds of regions. One type receives input directly from our sensory apparatus (eyes, ears and so on). The other type, which forms the vast majority of the neocortex, receives input only after the signal has passed through other regions. This transfer of information from region to region forms a hierarchy. At each stage, a region matches the information it receives against previously learned spatial and temporal patterns. Being able to successfully match new inputs to previously stored sequences is the essence of recognizing and predicting patterns. As the information passes from region to region, its predictions are further refined until it has a solid value in a given region, which can be termed a memory.
Also, as information moves up through the hierarchy it becomes a more stable and robust memory. This is because each region acts as a pattern recognition system. If enough cells light up in an array across the layers and columns that corresponds with previously experienced stimuli, then the region can be said to be acting as a predictor. Based on all the previous stimuli it has received (a sight, a sound, a touch and so on), it can, by having the same cells activated again and again over time, predict whether someone is looking at a ball or a tin can.
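The matching of new inputs against stored sequences can be sketched as follows. This is a drastic simplification, assuming a first-order transition count rather than HTM’s distributed cell states; the sequences and function names are invented for illustration.

```python
# Minimal sketch of sequence-based prediction: learn which input tends
# to follow which, then predict the most frequently observed successor.
# Real HTM regions do this with distributed cell activations, not counts.
from collections import defaultdict

transitions = defaultdict(lambda: defaultdict(int))

def learn(sequence):
    """Record each observed transition in the sequence."""
    for prev, nxt in zip(sequence, sequence[1:]):
        transitions[prev][nxt] += 1

def predict(current):
    """Return the most frequently observed successor, if any."""
    followers = transitions[current]
    return max(followers, key=followers.get) if followers else None

# Repeated exposure to similar sequences of stimuli...
learn(["round", "smooth", "ball"])
learn(["round", "smooth", "ball"])
learn(["round", "shiny", "tin-can"])

# ...lets the system anticipate what comes next.
print(predict("smooth"))  # ball
```

The essential point survives the simplification: prediction falls out of having stored sequences to compare new input against.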
A very important point to take into account here is the importance of time. Because prediction has to take place across time, there can be no static view at any given point. This is an important departure from conventional computing, where time is largely irrelevant.
Like many of nature’s systems, this hierarchical method of processing information is highly efficient and remarkably robust. Only the cells most relevant to making an accurate representation of a pattern are lit up. A region can contain 10^6 cells, but thanks to the differing natures of each layer and the array of columns, as few as 20 cells need to be lit up to represent a specific stimulus.
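A quick back-of-the-envelope check shows why 20 active cells out of 10^6 is enough: the number of distinct 20-cell patterns is the binomial coefficient C(10^6, 20), which is astronomically large.

```python
# Capacity of a sparse code: how many distinct ways can 20 cells be
# chosen out of 10**6? (The figures 20 and 10**6 come from the article.)
import math

n, k = 10**6, 20
codes = math.comb(n, k)               # exact binomial coefficient
print(math.floor(math.log10(codes)))  # 101 -> roughly 10**101 patterns
```

So a region has vastly more possible representations than it could ever need, which is also why two unrelated stimuli almost never collide on the same active cells.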
The algorithms that constitute HTM have now been developed into pseudocode, which can be found at the site of Jeff Hawkins’s company Numenta. A fuller description can be found in Chapter 4 of the HTM paper.
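To give a flavor of what that pseudocode describes, here is a loose Python rendering of the spatial pooler’s core overlap-and-inhibition step. This is this article’s own hedged sketch, not Numenta’s actual pseudocode: the function name, data layout and parameters are all assumptions.

```python
# Illustrative sketch of a spatial pooler step: score each column by how
# well its synapses overlap the input, then keep only a sparse winner set.
# Names, thresholds and the dict-of-sets layout are invented here.

def spatial_pool(input_bits, columns, active_count=2, min_overlap=1):
    """Pick the columns whose synapses best overlap the active input bits."""
    # 1. Overlap phase: count the input bits each column's synapses touch.
    overlaps = {
        col: len(synapses & input_bits) for col, synapses in columns.items()
    }
    # 2. Inhibition phase: keep only the top 'active_count' columns that
    #    clear the minimum overlap threshold, yielding a sparse activation.
    eligible = [c for c, o in overlaps.items() if o >= min_overlap]
    eligible.sort(key=lambda c: overlaps[c], reverse=True)
    return set(eligible[:active_count])

columns = {
    "c0": {1, 2, 3},
    "c1": {3, 4, 5},
    "c2": {7, 8, 9},
}
print(spatial_pool({2, 3, 4}, columns))  # c0 and c1 win the inhibition
```

The real algorithm adds learning (synapse permanences), boosting and topology, but the two-phase overlap-then-inhibit shape is the heart of it.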
The most immediate practical uses are in areas with huge amounts of data containing a great deal of time-based statistics. These are a good place to start because plenty of data already exists to provide a solid basis for pattern recognition and detection.
- Credit card fraud – large amounts of pre-existing data allows for quick implementation of the algorithm
- Large sensor environments – handling data from multiple security cameras for instance
- Web click prediction – being able to statistically predict movement through a site
Since HTM works in a purely statistical manner, it just has to have enough data to build up patterns that it can compare new data with.
HTM is at heart a memory based system. Like a biological system, the learning algorithms in an HTM region are capable of “on-line learning”, i.e. they continually learn from each new input. As the patterns in the input change, the HTM region will gradually change too and new memory is formed.
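The on-line learning described above can be sketched as a Hebbian-style update, assuming the scalar synapse “permanences” the HTM paper uses; the increments and names here are this article’s assumptions, not Numenta’s values.

```python
# Minimal sketch of on-line learning: on every input, synapses that
# agree with the active input are reinforced and the rest decay, so the
# stored patterns gradually track changes in the input stream.
# The inc/dec values and names are invented for illustration.

def update_permanences(permanences, active_synapses, inc=0.05, dec=0.02):
    """Nudge each synapse's permanence up or down, clamped to [0, 1]."""
    return {
        syn: min(1.0, p + inc) if syn in active_synapses
        else max(0.0, p - dec)
        for syn, p in permanences.items()
    }

perms = {"s0": 0.30, "s1": 0.30}
perms = update_permanences(perms, active_synapses={"s0"})
print(perms)  # s0 reinforced (about 0.35), s1 decayed (about 0.28)
```

Because the update happens on every single input, there is no separate training phase: learning and recognition are the same ongoing process, which matches the article’s point about time.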
These new algorithms promise greater efficiency because instead of needing lots of power to handle lots of information, we can use the hierarchical system to handle only what is needed for pattern recognition and prediction to take place. Much of this is based on what has been learned from the sparse distributed representation that takes place in the regions of the neocortex.
The challenge is in adapting the thinking of programmers to treat time as part of the computing process. The HTM model requires the constant handling of predictive processes, which can only take place over time.
But for now an important juncture has been reached: we can see how modeling of the neocortex could lead to real-world computing applications.
In his highly entertaining and informative video presentation “Advances in Modeling Neocortex and its Impact on Machine Intelligence”, Jeff Hawkins does an excellent job of setting the context for HTM and outlining its importance.