Computational Attention & Applications
***
Matei Mancas
_____________________________________________________________
[Computation attention model] A proposed computational attention model based on rarity and signal simplification
[Engineering applications] Some applications of computational attention in the engineering field
[Predictions for neuroscience] Predictions of the proposed model about brain organisation
[Attention models comparison and validation] Some code and mouse-tracking data
[Related links] Other links on (computational) attention
[Contact] Here you can contact me and download the PDF version of the book based on my PhD thesis
Welcome to the Computational Attention Web Page!
Effect of attention on an image as observed (eye tracking, Dr. Le Meur - Thomson R&D) and as predicted by the presented model.
Why this web page?
This web page deals with computational attention (sometimes called saliency or, more specifically, visual attention). An increasing number of papers in the engineering field include attention-based techniques. Although this page presents one specific point of view on computational attention, it may serve as a good introduction and as a starting point for more exhaustive research.
The scope of this page may be summarized in the following points:
- It introduces my work on computational attention, initiated by my PhD thesis at the Engineering Faculty of Mons.
- It explains the very important cross-disciplinary impact that computational attention in general may have within computer science and artificial intelligence.
- It provides several predictions, arising from the proposed model, about how the brain may be organized. These predictions are only hypotheses, and only neuroscience studies can confirm or refute them.
- It provides some code and a website with online mouse-tracking results for still images.
- It provides links to most of the web pages dealing with attention in the engineering field.
What is Attention?
"Everybody knows what attention is." (William James - Principles of Psychology, 1890)
Attention is a simplification or filtering process which transforms a huge, unstructured set of acquired data into a smaller, structured one while preserving the main information. All cognitive processes need attention; humans pay attention (consciously or unconsciously) at every moment from birth to death. Attention is even used during dreams and the REM (Rapid Eye Movement) sleep phase.
Attention is not specific to humans: it is used by any living being, from humans to insects.
Attention is the beginning of intelligence: there is no intelligence without attention!
Just as attention is the beginning of intelligence in biology, computational attention may be the starting point of artificial intelligence in engineering applications.
Computational attention provides machines with human-like reactions and behaviours and leaves them free to make decisions, even in unexpected situations:
- A computer which pays attention is able to be surprised and interested in novel data.
- A computer which pays attention is able to understand novel situations and to choose the important data it will learn.
Computational Attention Model
This section provides a short description of the philosophy of the proposed attention model. More details concerning this computational attention model are available in Chapter 3 of my thesis.
Rarity: the key of attention
Rare messages are more important! This example shows two messages: m1, "I love you!", and m2, "I hate you!". If your partner tells you every day for several days that he (she) loves you, the attention you pay to this sentence will decrease logarithmically as the probability of this sentence grows. But if he (she) says "I hate you!" just once, you will be surprised and pay maximum attention. Of course, if he (she) tells you "I hate you!" every day, you will also logarithmically get used to it...
Many publications have defined dozens of "salient" features which can attract human gaze. However, a salient feature can become less salient if it is repeated many times in an image or a signal. Attention is not driven by one specific feature: heterogeneous or homogeneous, dark or bright, symmetric or asymmetric, moving or static objects can all attract attention.
Human beings are more interested in pairs of features and their variations. More specifically, humans are attracted by the features which are in the minority in a signal.
That is why attention seems to be based on rare regions or events.
Here, rarity is a synonym for anomalous and interesting areas. Humans, and more generally all life forms, are programmed to find and isolate an anomaly in a natural scene, because this anomaly could be either a food source or a danger (such as a predator). This definition is thus based on the evolutionary concept of survival, for which vision and hearing are the most important senses.
An intuition about attention is that the main factor leading to saliency is event rarity or novelty, which induces an impression of abnormality and surprise. Rarity can be quantified by the simplest concept of information theory: self-information. For a message m_i with occurrence probability p(m_i), the self-information is I(m_i) = -log(p(m_i)). This equation (which appears in the top image) shows that the attention attracted by a message decreases as its occurrence probability increases, following a logarithmic law.
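The love/hate example can be made concrete with a minimal sketch, assuming that the probability of each message is estimated by its relative frequency in the observed stream and that logarithms are taken in base 2, so that attention is measured in bits. The function name `self_information` is hypothetical.

```python
# Illustrative sketch: self-information of messages, with probabilities
# estimated from relative frequencies (an assumption, not the thesis code).
import math
from collections import Counter

def self_information(messages):
    """Return {message: -log2 p(message)}, with p estimated from counts."""
    counts = Counter(messages)
    total = len(messages)
    return {m: -math.log2(c / total) for m, c in counts.items()}

# Seven days of "I love you!" followed by a single "I hate you!".
stream = ["I love you!"] * 7 + ["I hate you!"]
for message, bits in self_information(stream).items():
    print(f"{message}: {bits:.2f} bits")
```

With seven "I love you!" and one "I hate you!", the frequent message carries about 0.19 bits and the rare one 3 bits, so the rare message dominates attention.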
The proposed model
Serial data simplification! We propose a three-step model which progressively selects the important regions in multimedia data.
The proposed model is based on a double influence:
- The bottom-up information takes into account the rarity of the input signal features to compute an attention (or saliency) map. This is a universal algorithm which can be applied to any signal.
- The top-down information takes into account a priori knowledge about the current situation. This approach is able to adapt to a particular situation or application.
Bottom-up:
The originality of this model is to propose a serial simplification of the input data using three steps:
- The low-level step: it uses simple features such as amplitude, intensity, movement or colour, which are available in the early neural pathways; for vision, these features are available in the superior colliculus and the lateral geniculate nucleus. This first analysis eliminates most of the homogeneous or regular textures, which get a very low attention score compared with the other regions.
- The medium-level step: it uses behavioural features such as size, orientation, motion direction or duration, which are available mainly in the cortical areas (such as the primary cortex). These features are computed only on the regions of the signal that remain after the low-level step. The result of the medium-level step is an attention map in which areas containing rare shapes are highlighted.
- The high-level step: it uses complex features extracted from log-polar patches. The attention map computed by the previous steps is inspected serially, from the most important area to the others, by comparing their log-polar patches. The region with the lowest mean correlation score is the most important. A high correlation score between an inspected patch in the current signal and a patch previously stored in memory can lead to object recognition. This is also the starting point of the top-down influence: a region in the current signal which correlates highly with a patch in memory will appear very interesting.
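The choice of log-polar patches is due, first, to the radial density of cone receptors in the retina (a biological reason) and, second, to the fact that log-polar patches turn scale changes and rotations into simple translations, which makes their comparison largely insensitive to these transformations.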
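Log-polar sampling can be illustrated with a minimal NumPy sketch, assuming nearest-neighbour interpolation and a grid of `n_rho` log-spaced radii by `n_theta` angles (these names and default sizes are hypothetical, not the thesis parameters). Samples are dense near the centre and sparse in the periphery, and a rotation of the image about the centre becomes a circular shift along the angle axis.

```python
# Illustrative sketch of log-polar sampling; parameters are hypothetical.
import numpy as np

def log_polar_patch(image, center, n_rho=32, n_theta=64, r_min=1.0, r_max=None):
    """Sample `image` on a log-polar grid centred on `center` = (row, col).

    Rows of the result index log-radius, columns index angle.
    Nearest-neighbour sampling; coordinates outside the image are clipped.
    """
    image = np.asarray(image, dtype=float)
    cy, cx = center
    if r_max is None:
        # Largest circle that fits inside the image around the centre.
        r_max = min(cy, cx, image.shape[0] - 1 - cy, image.shape[1] - 1 - cx)
    # Geometrically spaced radii: equal steps in log(r).
    radii = np.exp(np.linspace(np.log(r_min), np.log(r_max), n_rho))
    angles = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    rows = np.clip(np.rint(cy + rr * np.sin(aa)).astype(int), 0, image.shape[0] - 1)
    cols = np.clip(np.rint(cx + rr * np.cos(aa)).astype(int), 0, image.shape[1] - 1)
    return image[rows, cols]

rng = np.random.default_rng(0)
img = rng.random((65, 65))
patch = log_polar_patch(img, (32, 32))
rotated = log_polar_patch(np.rot90(img), (32, 32))
# A 90-degree rotation is (up to rounding) a shift of n_theta / 4 = 16 columns.
match = np.mean(rotated == np.roll(patch, -16, axis=1))
print(patch.shape, match)
```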
Top-down:
The model includes two top-down information classes:
- Normality information or atlases: for a given application, "normal data" is included in the computation when available. Normal data increases the occurrence probability of normal events and therefore makes unusual events rarer.
- Task-oriented information: knowledge about the mean behaviour of observers, or about specific objects which attract attention (such as faces), can help to model a weighted map to be mixed with the bottom-up attention map.
Relative influence of Bottom-up and Top-down attention
From left to right: top-down models of attention for natural scene images, advertisements and web sites.
Observers' behavior can be modeled by using eye-tracking or alternative methods such as mouse-tracking to record their gaze path. The mean of the gaze paths of several observers is called a priority map: for a given image, it highlights the areas where a set of observers mostly looks on average. A top-down model can be obtained as the mean of the priority maps of a specific set of images (images with a common meaning). The more specific the set of images, the more accurate the top-down model.
A very interesting conclusion of this article is that the more one knows about an image, the larger the top-down part of the influence will be. On the other hand, for an unknown image, the bottom-up attention mechanism will be very important. Thus, although top-down information always plays an important role in attention, its share in the attention process depends on how much knowledge an average observer has about a given kind of image.
On one hand, bottom-up attention is oriented towards learning which areas of an image are the most relevant, mainly for new images and situations. On the other hand, top-down attention uses already learnt situations to select some areas of the current image by inhibiting those where there is very little chance of finding relevant information. The interaction between bottom-up and top-down attention aims at optimizing reactions to both novel and already experienced situations.
Applications of Computational Attention
As there is no intelligence without attention for living beings, computational attention is useful each time intelligence has to be added to a computer algorithm.
That is why computational attention has a huge application field in computer science and engineering. This section aims to show some applications of computational attention within the signal processing area; nevertheless, any area which needs smart data processing needs attention algorithms.
This list of applications is of course not exhaustive: if you have references or links to other applications using attention in signal processing, data optimization, etc., do not hesitate to send them to me by e-mail.
Automatic pathology detection and localisation in medical imaging
Computational attention results on medical images when using a bilateral symmetry feature. Top: CT-scan images with the segmented tumours (red contour). Bottom: results after using the bilateral symmetry rarity and the topographical reconstruction (red: higher pathology probability).
The first step in medical imaging is to identify a robust feature which alerts the radiologist to the possible presence of a pathology.
Two main features were used to apply computational attention in medical imaging: the pixel grey-level and the bilateral symmetry.
The pixel grey-level:
Global low-level attention can be applied directly on the CT-scan image in order to find rare grey-levels. A low-level atlas built by using healthy CT scan slices brings some top-down information about grey-levels and enhances the results.
Nevertheless, this approach only works on images where the main feature for recognising tumours is the difference between the grey-level of pathological pixels and that of normal tissues. This is not always the case for the head and neck images which were used here.
However, the grey-level difference may be the main feature for other body parts which have no specific geometric properties. The liver may be such a case, where grey-level variations alone should be enough to detect pathologies.
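The global grey-level rarity can be sketched as follows, assuming the CT slice has already been quantised to integer grey-levels in [0, levels) (real CT data in Hounsfield units would first need such a mapping). Each pixel receives the self-information of its grey-level; pooling the histogram with healthy slices, here a hypothetical `atlas` array, makes normal grey-levels more probable and abnormal ones rarer.

```python
# Illustrative sketch: grey-level rarity with an optional healthy atlas.
# Assumes integer grey-levels in [0, levels); names are hypothetical.
import numpy as np

def rarity_map(image, atlas=None, levels=256):
    """Self-information -log2 p(g) of each pixel's grey-level g.

    p(g) comes from the histogram of `image`, pooled with the histogram of the
    healthy reference data `atlas` when one is given.
    """
    image = np.asarray(image, dtype=int)
    counts = np.bincount(image.ravel(), minlength=levels).astype(float)
    if atlas is not None:
        counts += np.bincount(np.asarray(atlas, dtype=int).ravel(), minlength=levels)
    p = counts / counts.sum()
    return -np.log2(p[image])

img = np.full((10, 10), 100)
img[5, 5] = 200                      # one unusual grey-level
healthy = np.full((10, 10), 100)     # hypothetical healthy reference slice
alone = rarity_map(img)
pooled = rarity_map(img, atlas=healthy)
print(alone[5, 5], pooled[5, 5])
```

The unusual pixel scores log2(100) bits on its own and log2(200) bits once the healthy slice is pooled in, while the normal pixels score even lower.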
The use of symmetry:
The bilateral symmetry is a very interesting feature for many body parts. It works well in the head and neck, and it also works (even better) in the brain and other symmetric areas.
Symmetry is detected on the log-polar representation of the CT-scan image centred on the throat. A low-level atlas models the specialist's knowledge about symmetry in the head and neck by adding symmetry measures from normal slices (top-down information). Results are encouraging and they are computed on the whole patient volume: most of the tumour slices are detected, and this technique is efficient enough to highlight the main areas which have a high probability of containing pathological regions.
Close to the airway, the most asymmetric grey-level, which is the most likely to be part of the tumour, is used to choose a seed for a proposed topographical distance which is able to coarsely reconstruct the tumour.
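The symmetry idea can be illustrated with a deliberately simplified sketch. Instead of the log-polar representation centred on the throat used in the thesis, it assumes the body is already centred, so that the left-right mirror of the slice approximates the healthy contralateral side. It also assumes, purely for illustration, that the lesion is brighter than its mirror counterpart, and it uses a hypothetical boolean `near_airway` mask to restrict the seed search.

```python
# Simplified illustration, not the thesis method: mirror-based asymmetry
# and seed selection. Function and mask names are hypothetical.
import numpy as np

def signed_asymmetry(image):
    """Difference between each pixel and its mirror about the vertical mid-line."""
    image = np.asarray(image, dtype=float)
    return image - image[:, ::-1]

def pick_seed(image, near_airway=None):
    """(row, col) of the pixel that is brightest relative to its mirror, optionally inside a mask."""
    score = signed_asymmetry(image)
    if near_airway is not None:
        score = np.where(near_airway, score, -np.inf)
    row, col = np.unravel_index(np.argmax(score), score.shape)
    return int(row), int(col)

slice_ = np.zeros((5, 6))
slice_[2, 1] = 10.0      # a bright spot with no counterpart on the other side
print(np.abs(signed_asymmetry(slice_))[2])   # asymmetry shows on both sides
print(pick_seed(slice_))                     # the sign picks the bright side
```

The absolute asymmetry marks both the spot and its mirror position; keeping the sign is what lets the seed land on the brighter side.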
The image on the left shows some results of this technique.
Details on the application of computational attention in medical imaging are available in Chapter 5 of my PhD thesis.
IN BRIEF
Computational attention can automatically provide a 3D attention map that a radiologist may use to examine first the volumes which are most likely to be pathological. This map is also very useful in preventing the radiologist from missing areas which may be pathological.
Automatic defect localisation in machine vision
Computational attention results in machine vision applications. Left: Initial apples with the segmented defects (red contour), Middle: result after the global low-level approach, Right: result after using a healthy apple atlas (top-down information).
The global low-level rarity approach is applied after a simple pre-processing step which eliminates the background and part of the uneven illumination caused by the apple boundaries. The use of an atlas of concatenated healthy fruits (top-down information) significantly improves efficiency.
The results are very interesting. A database of 256 fruits (red image modality only) was used to test the algorithm. Some defects are very well highlighted, and even simple thresholding should be able to provide defect segmentation.
Other defects are more visible in other imaging modalities (blue, green, etc.), so for them the results on the red modality are not clear enough to perform a simple thresholding-based segmentation.
Nevertheless, the results are very useful for choosing the important areas from which a more complex classifier should pick its features to classify the fruit.
Computational attention is very useful before any complex classification task as it reduces the candidate pixels and data confusion and it speeds up the classification process.
This technique may be used not only for fruit defect detection but for defect detection on any industrial product. More details concerning the use of computational attention in machine vision are available in Chapter 6 of my PhD thesis.
IN BRIEF
Computational attention techniques are able to roughly locate defects in machine vision applications. These detected areas contain the most "interesting", and thus the most informative, features, which should be used to train supervised classifiers such as support vector machines or artificial neural networks.
An attentive computer is able to choose the data it will learn: attention is the first step towards self-training machines.
Automatic event detection in audio signals
Computational attention results in audio processing. From left to right: initial spectrogram of an audio event, low-level attention map, the low-level attention map after using top-down information, medium-level attention map.
In the low-level step, attention is computed on each frequency line of the spectrogram. This method is efficient and can be adapted to real-time analysis.
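The per-frequency-line computation can be sketched as follows, assuming a magnitude spectrogram stored as a 2D array with one row per frequency line and one column per frame, and a hypothetical uniform quantisation into `n_bins` levels within each line. Because each line is handled independently, the same loop could run on a sliding window of recent frames for real-time use.

```python
# Illustrative sketch: low-level rarity on each frequency line of a spectrogram.
# The quantisation scheme and n_bins are hypothetical choices.
import numpy as np

def spectrogram_rarity(spec, n_bins=16):
    """Self-information of each cell's quantised magnitude within its own frequency line."""
    spec = np.asarray(spec, dtype=float)
    out = np.zeros_like(spec)
    for f, line in enumerate(spec):
        span = line.max() - line.min()
        if span == 0:
            continue  # a constant line carries no surprise: attention stays 0
        levels = np.minimum(((line - line.min()) / span * n_bins).astype(int), n_bins - 1)
        p = np.bincount(levels, minlength=n_bins)[levels] / line.size
        out[f] = -np.log2(p)
    return out

spec = np.ones((2, 100))
spec[1] = 0.0
spec[1, 50] = 1.0        # a short event on the second frequency line
att = spectrogram_rarity(spec)
print(att[1].argmax(), att[0].max())
```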
The three-level (3L) model is also relevant for audio signals: a list of events is available after the low-level step. This list is simplified and modified by the medium-level step. Finally, the high-level step should select very few events from this list; these more significant events are recognised and memorised. Little by little, the initial raw signal is reduced and structured around the most important events. Data is transformed into a meaningful structure: attention is thus a filter with a crucial role in understanding and in memory.
Results are effective for audio event detection: the beginning of an event is detected even in very noisy environments. More details on audio event detection are available in Chapter 8 of my PhD thesis.
IN BRIEF
Computational attention is very useful in audio surveillance applications, where unexpected (and thus impossible to model) events may occur. In this case an operator can receive an alert for audio tracks reaching a given level of attention. Moreover, smart recording systems can be set up using this technique.
If only the events are recorded over a whole night, the surveillance operator only needs to listen to a few minutes of recordings to get a summary of what happened during the night.
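The alerting and smart-recording idea can be sketched as follows, assuming a per-frame attention score has already been computed (for example, the maximum over frequency lines of a low-level map). The threshold, the frame duration and the function name are hypothetical; only the returned segments would be stored or sent to the operator.

```python
# Illustrative sketch: keep only the frames whose attention exceeds a threshold.
import numpy as np

def events_above(scores, threshold):
    """(start, end) frame pairs, end exclusive, of the runs where scores >= threshold."""
    above = np.asarray(scores) >= threshold
    # Pad with False on both sides so that every run has a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], above, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

scores = [0, 5, 6, 0, 0, 7]
segments = events_above(scores, threshold=5)
frame_seconds = 0.5  # hypothetical frame hop
kept = sum(end - start for start, end in segments) * frame_seconds
print(segments, kept)
```

Here only 3 of the 6 frames (1.5 s) would be kept; over a long recording with few events the saving is much larger.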
Anisotropic signal filtering
Attention-based anisotropic image filtering. From left to right: initial image and increasingly low-passed versions. Homogeneous areas and spatial textures become more and more blurred, while the important areas of the image remain unfiltered or are less filtered.
Attention-based anisotropic audio filtering. From left to right: initial audio spectrogram and increasingly low-passed versions. Temporally homogeneous areas and temporal textures become very blurred, while important audio events remain unfiltered.
Computational attention can be used for signal filtering and enhancement. As some areas are more important than others, a loss of information in the less important areas may be better accepted by humans, or may matter less in specific applications.
A very interesting property of the computational attention model is that it gives a low attention score to homogeneous textures: the more regular a texture is, the less important it is, as it repeats itself.
The attention score is thus a good measure of texture regularity. This is one of the differences between attention-based anisotropic filtering and classical methods such as that of Perona and Malik, which are mainly based on preserving important edges.
The proposed global attention model is useful for keeping defects and anomalous regions unfiltered while filtering uninteresting noisy or regular textured areas. This is the second major difference between the proposed approach and classical approaches: defects or pathologies are better captured by their rarity in an image than by their edge strength. The proposed filtering technique keeps defects intact.
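The principle can be illustrated with a simplified sketch that is not the thesis filter: a per-pixel blend between the original image and a box-blurred copy, where a normalised attention map decides how much of the original is kept. Low-attention areas (regular, textured or homogeneous) are smoothed, while high-attention areas such as a defect stay untouched regardless of their edge strength. Calling it repeatedly, or with a larger hypothetical `radius`, gives increasingly low-passed versions.

```python
# Simplified illustration, not the thesis filter: attention-weighted blend
# between an image and its box-blurred copy. Names are hypothetical.
import numpy as np

def box_blur(image, radius=2):
    """Mean filter of size (2 * radius + 1) squared, with edge replication at the borders."""
    image = np.asarray(image, dtype=float)
    k = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros_like(image)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)

def attention_filter(image, attention, radius=2):
    """Blend each pixel between the original and a blurred copy according to its attention."""
    image = np.asarray(image, dtype=float)
    a = np.asarray(attention, dtype=float)
    span = a.max() - a.min()
    a = (a - a.min()) / span if span > 0 else np.zeros_like(a)
    # a = 1: pixel kept as is; a = 0: pixel fully replaced by the local mean.
    return a * image + (1.0 - a) * box_blur(image, radius)

rng = np.random.default_rng(1)
image = rng.random((8, 8))
attention = np.zeros((8, 8))
attention[3, 3] = 1.0        # a single "defect" pixel with maximal attention
out = attention_filter(image, attention, radius=1)
print(out[3, 3] == image[3, 3])
```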
More details about attention-based anisotropic filtering are available in Chapter 7 (sections 7.1 to 7.3) of my thesis.
IN BRIEF
An attention-based anisotropic filter is different from a classical edge-based anisotropic filter. The relative perceptual importance of the areas in a signal is better taken into account with an attention-based approach: some areas with low edge scores may be important, while others with very high edge amplitudes (such as textures) may be much less important. Moreover, attention maps can be computed not only for still images but also for video sequences and audio signals.
Signal coding and smart transmission
Computational attention has a natural application in lossy image or signal coding. As for the previous application, it is less harmful to lose information in uninteresting areas than in interesting ones.
The idea is to use higher compression rates (lower resolution) for the less important areas and lower compression rates (higher resolution) for important regions. Three different methods were used for attention-based lossy compression, and they led to interesting conclusions.
Three main criteria were used to assess compression:
- The compression gain compared with the equivalent isotropic compression
- The complexity of the algorithm (does the compression algorithm need many changes or not?)
- The flexibility of the algorithm (the ability to transmit the important information first and the less important information afterwards on a heterogeneous network)
Region-based coding approach: lower JPEG compression rates were used for important areas
This first approach is the direct implementation of the idea of compressing each region of the image differently. It could be implemented using the region-of-interest (ROI) coding of the JPEG 2000 standard, but here the basic JPEG technique was used.
The image is divided into squares, and each square is coded with a compression rate which is inversely proportional to its attention score.
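The block-level decision can be sketched as follows, assuming square blocks of a hypothetical size `block` and a linear mapping onto a JPEG-style quality factor between hypothetical bounds `q_min` and `q_max` (higher quality meaning lower compression). This linear map is a simplification of the relation described above, and passing the values to an actual encoder is not shown.

```python
# Illustrative sketch: per-block quality factors from an attention map.
# Block size and quality bounds are hypothetical.
import numpy as np

def block_qualities(attention, block=8, q_min=20, q_max=90):
    """Map the mean attention of each block x block square to an integer quality factor."""
    a = np.asarray(attention, dtype=float)
    h, w = a.shape[0] // block, a.shape[1] // block
    # Crop to a whole number of blocks, then average inside each block.
    means = a[:h * block, :w * block].reshape(h, block, w, block).mean(axis=(1, 3))
    span = means.max() - means.min()
    norm = (means - means.min()) / span if span > 0 else np.zeros_like(means)
    # Most attended block -> q_max (least compression); least attended -> q_min.
    return np.rint(q_min + norm * (q_max - q_min)).astype(int)

att = np.zeros((16, 16))
att[:8, :8] = 1.0
print(block_qualities(att))
```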
The compression gain is not huge compared with classical JPEG image compression, and this gain depends on the JPEG quality of the higher-resolution regions.
The complexity of the algorithm is quite high.
The flexibility is very good.
Texture-based coding. First row: initial images. Second row: texture areas as computed by the attention-based algorithm (white areas may be considered as textures, black areas cannot). Third row: image reconstruction with no change for the non-texture regions and by tiling a square for the texture regions.
This second approach uses the texture redundancy and the ability of the presented attention model to find regular textures. Once a sufficiently regular texture is found, only one square is coded, and the rest of the texture is reconstructed by simply tiling the same initial square several times.
Colour information is here used in addition to the regularity information to find textures with similar colour properties and similar regularity.
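The reconstruction side of this texture-based approach can be sketched in a few lines, assuming the texture region is a rectangle of known `shape` and that a single coded square `patch` (grey-level or colour) is transmitted for it. Detecting regular textures and matching colour are not shown, and the function name is hypothetical.

```python
# Illustrative sketch: rebuild a texture region by tiling one coded square.
import numpy as np

def tile_fill(shape, patch):
    """Rebuild a rectangular texture region of `shape` (rows, cols) by repeating `patch`."""
    patch = np.asarray(patch)
    ph, pw = patch.shape[:2]
    # Ceiling division: enough repetitions to cover the region, then crop.
    reps = (-(-shape[0] // ph), -(-shape[1] // pw)) + (1,) * (patch.ndim - 2)
    return np.tile(patch, reps)[:shape[0], :shape[1]]

patch = np.array([[1, 2], [3, 4]])
print(tile_fill((3, 5), patch))
```

Only the ph x pw pixels of the patch need to be coded for the whole region, which is where the gain comes from.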
The compression gain compared to the classical JPEG image compression is better than in the previous method but it remains limited.
The complexity of the algorithm is very high.
The flexibility is lower: changing the overall quality of the result requires adjusting a fairly complex set of parameters.
Anisotropic filtering-based coding. The anisotropic filtering technique is used before classical compression (JPEG). Upper line: frames of 3 video sequences at time T, very heavily filtered. Bottom line: the frames at time T+1, less filtered.
Finally, the third method uses a smart signal pre-processing based on the anisotropic filtering presented in the previous application. The main difference with the two other methods is in the fact that computational attention is used as a preprocessing step and no adaptation is required for the compression algorithm.
This approach seems the simplest and the most efficient.
The compression gain compared with classical JPEG image compression is good but remains limited. Moreover, the compression gain depends little on the JPEG quality of the higher-resolution regions.
The complexity of the algorithm is very low. No changes to the compression algorithm are required: only a pre-processing step is added.
The flexibility is very good.
More details concerning the application of computational attention to image coding are available in Chapter 7 of my thesis.
IN BRIEF
The work on attention-based compression led to several interesting conclusions. One of them is that the JPEG algorithm is good! Significant improvements in compression rate can be achieved only if almost all the information in the less important areas is lost. However, this disturbs the human visual system, and the compression artefacts are very visible within these regions. The main advantage of attention-based compression lies in the flexibility of the algorithm and in transmission over the network. The best method implemented here is the third one, based on anisotropic filtering: for small compression-rate improvements (compared with classical JPEG) the image can even be enhanced (most of the noise is low-pass filtered); the compression algorithm does not need to be altered, which ensures decoding compatibility with existing JPEG codecs (only the compression side changes, through the anisotropic pre-processing); and finally the flexibility of the algorithm is very interesting, as smart progressive image transmission can easily be achieved.
Image ergonomics
Web-site visual ergonomics study. From left to right: initial web site, regions attended by human gaze (obtained by mouse-tracking tests), predicted attention map.
Perceptual zoom in images: automatic zoom (larger box) and maximum zoom (smaller box).
Two applications have been developed within the image ergonomics concept. The first one is an automatic perceptual zoom for images displayed on small screens. The second application quantifies the efficiency of visual documents: do they draw customers' attention to the important message?
Perceptual zoom:
An increasing number of portable devices able to provide multimedia content, such as smartphones, PDAs and iPods, are available. Their screens remain small compared with classical screens (computer, TV, etc.), and displaying images on them may be problematic. A method which automatically zooms onto the regions of interest of multimedia documents is therefore very important for the users' visual comfort.
The left image shows the results of the proposed attention-based perceptual zoom. This zoom is obtained by thresholding the low-pass filtered attention map. Tests were made on a database of 100 images, and a threshold is enough to zoom onto the perceptually most interesting areas.
An interesting conclusion of this study is that this threshold should not be too selective, as important areas need some context to be accepted by humans. In the left image, the automatic zoom is the larger yellow box: for all the images, the cropped image is meaningful. The smaller yellow box is the maximum zoom: a stronger zoom would make the cropped area more than 20 times smaller than the previous one, which is not realistic, as it would imply too much information loss.
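The perceptual zoom can be sketched as follows, assuming a box blur as the low-pass filter and a hypothetical `threshold` expressed as a fraction of the maximum of the smoothed map. Lowering the threshold enlarges the box and keeps more context, in line with the observation that the threshold should not be too selective.

```python
# Illustrative sketch: zoom box from a low-pass filtered attention map.
# The filter choice, threshold and radius are hypothetical.
import numpy as np

def box_blur(image, radius):
    """Mean filter of size (2 * radius + 1) squared, with edge replication."""
    image = np.asarray(image, dtype=float)
    k = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros_like(image)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)

def zoom_box(attention, threshold=0.5, radius=3):
    """Inclusive (top, left, bottom, right) box around the smoothed map above threshold * max."""
    smooth = box_blur(attention, radius)
    rows, cols = np.nonzero(smooth >= threshold * smooth.max())
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())

att = np.zeros((20, 20))
att[8:12, 5:9] = 1.0
tight = zoom_box(att, threshold=0.5, radius=1)
loose = zoom_box(att, threshold=0.3, radius=1)   # lower threshold: more context
print(tight, loose)
```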
Visual documents efficiency:
Computational attention can provide an attention map which predicts human eye behavior. The top image shows, in the middle, the actual regions which attract human gaze (mouse-tracking results) and, on the right, the predicted map. These maps are quite similar, which suggests that human gaze may be predicted for specific applications by using both bottom-up and top-down information.
The top-down information was obtained by averaging the results of observers in a mouse-tracking test on web sites, and it shows the importance of the top-left corner in this kind of document. Other top-down information must be added, such as face detection (a face is a very powerful top-down attention attractor; this may be due to the memory organization predicted in the next section), and the rarity of text versus images within the document (a small amount of text in a document full of images must be highlighted, and a few images in a text document must also be highlighted).
The visual document efficiency application which was developed provides a prediction of human gaze using both bottom-up and top-down information, a check that the important regions are really seen by customers and, if this is not the case, suggestions to enhance the saliency of the important regions. This system can be applied in many situations: advertisements, curriculum vitae evaluation, merchandising (where should I put my product in a supermarket in order to be seen by customers?), web site evaluation, evaluation of slide-based oral presentations, validation of emergency sign efficiency...
More details concerning the application of computational attention to image ergonomics are available in Chapter 9 of my thesis.
IN BRIEF
Computational attention is very useful in image ergonomics. It is able to provide accurate perceptual zooming on images. It can also provide quantitative and objective evaluation of the efficiency of any visual document.
Object video tracking
Object tracking and image registration applications both use the high-level attention step of the proposed attention model.
Attention-based video tracking. Top line: low-level tracking, high-level tracking initialisation, target is lost, target is found again. Bottom line: fixation on the target before losing it, fixation on target candidate 1, fixation on target candidate 2.
There are two object tracking strategies in humans: low-level and high-level tracking.
Low-level tracking
The top image shows the principle of the two tracking systems. Low-level tracking is used when there is no possible confusion between the tracked object (the target) and any other object. In the left image of the top figure, the target is within a red box. A second, yellow box represents the target's immediate neighborhood. If no other object is within this immediate neighborhood, there is no possible confusion, and the target can be followed reflexively, with very little cognitive load, using only the luminance information.
In order to model this tracking, the target in frame at time T+1 is the object overlapping the target from frame at time T.
The second image shows the limitations of this first approach: if another object (blue box) comes into the immediate neighborhood of the target (yellow box), there is a possibility of confusion between the target (red box) and this object (blue box). In this case, the high-level tracking is initiated.
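The low-level rule and the switch to high-level tracking can be sketched in plain Python, assuming objects have already been detected in each frame as axis-aligned boxes `(top, left, bottom, right)` with exclusive bottom/right edges, and using a hypothetical `margin` in pixels for the immediate neighbourhood (the yellow box). All function names are hypothetical.

```python
# Illustrative sketch of low-level tracking by box overlap; names are hypothetical.

def overlap(a, b):
    """Intersection area of two boxes (top, left, bottom, right); bottom/right are exclusive."""
    h = min(a[2], b[2]) - max(a[0], b[0])
    w = min(a[3], b[3]) - max(a[1], b[1])
    return max(h, 0) * max(w, 0)

def grow(box, margin):
    """Immediate neighbourhood of a box (the yellow box), `margin` pixels larger on each side."""
    top, left, bottom, right = box
    return (top - margin, left - margin, bottom + margin, right + margin)

def low_level_step(target, objects, margin=10):
    """Follow the target from frame T to frame T+1.

    Returns (new_target, needs_high_level): new_target is the object at T+1 that
    overlaps the target at T the most (None if nothing overlaps); needs_high_level
    is True when the target is lost or another object enters its neighbourhood.
    """
    areas = [overlap(target, obj) for obj in objects]
    if not areas or max(areas) == 0:
        return None, True
    i = areas.index(max(areas))
    zone = grow(objects[i], margin)
    confusion = any(overlap(zone, obj) > 0 for j, obj in enumerate(objects) if j != i)
    return objects[i], confusion

target_t = (10, 10, 20, 20)
print(low_level_step(target_t, [(12, 12, 22, 22), (100, 100, 110, 110)]))
print(low_level_step(target_t, [(12, 12, 22, 22), (25, 25, 35, 35)]))
```

In the first call the distant object is ignored and low-level tracking continues; in the second, the other object enters the neighbourhood and the sketch asks for high-level tracking.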
High-level tracking
A fixation on the target gathers log-polar information about its spatial and colour properties. This fixation is shown in the first image of the bottom line of the top figure; it requires a conscious step of looking at the target, and thus a high cognitive load.
If the target and the object fuse, the target is lost. After a certain time, the object and the target split again (third image of the first line of the top figure). At this point we no longer know which one is the target and which one is the other object: that is why both candidates are within green boxes.
To find the target, a fixation is made on both objects (the middle and right images of the bottom line of the top figure). Then, comparing the previous fixation of the target with the two fixations of the candidate objects allows us to find which one is the target (red box in the right image of the top line of the top figure).
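The comparison step can be sketched with normalised cross-correlation between the stored fixation patch of the target and the patch taken on each candidate, for instance log-polar patches such as those described in the model section. The choice of Pearson correlation and the function names are assumptions for illustration.

```python
# Illustrative sketch: re-identify the target among candidates by patch correlation.
import numpy as np

def ncc(a, b):
    """Normalised (Pearson) correlation between two equally sized patches."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def reidentify(target_patch, candidate_patches):
    """Index of the candidate whose patch best matches the stored target fixation."""
    scores = [ncc(target_patch, c) for c in candidate_patches]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(2)
target = rng.random((32, 64))
candidates = [rng.random((32, 64)), 0.8 * target + 0.1]  # candidate 1: same object, dimmer
best, scores = reidentify(target, candidates)
print(best, [round(s, 3) for s in scores])
```

Because the correlation is normalised, a uniform change in brightness or contrast of the target does not prevent it from being recognised.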
More details about attention-based tracking are available in Chapter 10 (sections 10.1 and 10.2) of my thesis.
IN BRIEF
Two levels of object tracking were found. Low-level tracking is reflexive and has a low computational load; it is used when no confusion is possible between the target and another object. If confusion between the target and another object is possible, high-level tracking, which needs several fixations and comparisons, is used: this second step is conscious and needs a high cognitive load (proportional to the number of possible distractors).
Image registration
Attention-based image registration. Top line from left to right: reference image, attention map with the highest importance point and again the reference image. Bottom line from left to right: the image to be registered, its attention map with its two most important points, and the registered image.
Image registration is very important in applications which need image comparison. A good registration between two images guarantees that both of them are in the same coordinate system and a comparison between them makes sense.
The low- and medium-level attention maps of both the reference image and the image to be registered are computed, as in the middle column of the top figure. These maps are weighted by a centered Gaussian, because a centered Gaussian top-down model is the best one for natural scenes.
Moreover, this weighting gives more importance to the center of the images, where the interesting features have the best chance of being common to both images.
Indeed, as scale or rotation transformations may occur between the reference image and the one to be registered, features located on the border of the images may not be found in both images.
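The center-weighting step and the selection of the most important points can be sketched as follows, assuming the attention map is a 2-D NumPy array. The Gaussian width (`sigma_frac`, a quarter of each image dimension here), the minimum distance between selected points and the function names are assumptions, not values from the thesis.

```python
import numpy as np

def center_weighted(attention_map, sigma_frac=0.25):
    """Multiply an attention map by a Gaussian centered on the image center."""
    h, w = attention_map.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sy, sx = sigma_frac * h, sigma_frac * w
    gauss = np.exp(-(((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2) / 2.0)
    return attention_map * gauss

def top_k_points(weighted, k=2, min_dist=5):
    """Greedily pick the k highest values that are at least min_dist pixels apart.
    Returns (row, col) tuples, most important first."""
    order = np.argsort(weighted, axis=None)[::-1]
    points = []
    for flat in order:
        r, c = np.unravel_index(flat, weighted.shape)
        if all((r - pr) ** 2 + (c - pc) ** 2 >= min_dist ** 2 for pr, pc in points):
            points.append((int(r), int(c)))
            if len(points) == k:
                break
    return points

# Two equally strong responses: the one near the center wins.
m = np.zeros((21, 21))
m[10, 12] = 1.0
m[0, 0] = 1.0
print(top_k_points(center_weighted(m), k=2))  # [(10, 12), (0, 0)]
```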
The log-polar patch centered on the most important point of the reference image is compared with a list of log-polar patches centered on the most important points of the image to be registered. In this example, the second most important point of the image to be registered matches the most important point of the reference image with a very high correlation score.
Log-polar patches can be correlated even when the image is rescaled or rotated. A log-polar representation turns rescaling and rotation into translations (a rotation becomes a shift along the angle axis, a rescaling a shift along the log-radius axis), and simple similarity measures like linear correlation handle translations well.
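The rotation-to-translation property can be checked with a small log-polar sampler. This sketch makes several assumptions. It uses nearest-neighbour sampling. Rows index log-radius and columns index angle. The grid sizes and function names are hypothetical. `np.rot90` is used as the rotation because it is exact on the pixel grid.

```python
import numpy as np

def log_polar_patch(image, center, n_rho=24, n_theta=64, r_min=1.0, r_max=None):
    """Nearest-neighbour log-polar sampling around center=(row, col).
    Rows index log-radius, columns index angle."""
    cy, cx = center
    h, w = image.shape
    if r_max is None:
        r_max = min(cy, cx, h - 1 - cy, w - 1 - cx)
    rho = np.exp(np.linspace(np.log(r_min), np.log(r_max), n_rho))
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rows = np.rint(cy + rho[:, None] * np.sin(theta)[None, :]).astype(int)
    cols = np.rint(cx + rho[:, None] * np.cos(theta)[None, :]).astype(int)
    return image[np.clip(rows, 0, h - 1), np.clip(cols, 0, w - 1)]

def linear_correlation(a, b):
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def best_angular_shift(lp_ref, lp_other):
    """Circular shift s (in angle bins) such that np.roll(lp_ref, s, axis=1)
    correlates best with lp_other."""
    scores = [linear_correlation(np.roll(lp_ref, s, axis=1), lp_other)
              for s in range(lp_ref.shape[1])]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
img = rng.random((41, 41))
center = (20, 20)
lp = log_polar_patch(img, center)
lp_rot = log_polar_patch(np.rot90(img), center)  # image rotated by 90 degrees
print(best_angular_shift(lp, lp_rot))  # 48: a quarter turn of the 64 angle bins
```

A rescaling about the same center would instead shift the rows, which is why a plain correlation search over shifts is enough to compare the patches.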
When a common point is found in both the reference image and the image to be registered, the values of scale and rotation changes can be roughly obtained from the translations of the log-polar patches.
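Recovering scale and rotation from those translations can be sketched as follows. The sketch assumes log-polar patches sampled with `n_rho` log-spaced radii between `r_min` and `r_max` and `n_theta` equally spaced angles. The grid parameters and function names are hypothetical. A shift of `d_rho` rows corresponds to a scale factor exp(d_rho * step), with step = (ln r_max - ln r_min) / (n_rho - 1). A shift of `d_theta` columns corresponds to a rotation of 360 * d_theta / n_theta degrees. The exhaustive search is chosen for readability; phase correlation would be a faster alternative.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def estimate_shift(lp_ref, lp_mov, max_rho_shift=6):
    """Exhaustive search for (d_rho, d_theta) such that
    lp_mov[i, j] ~ lp_ref[i - d_rho, j - d_theta].
    The angle axis is circular; the log-radius axis is compared on the
    overlapping rows only."""
    n_rho, n_theta = lp_ref.shape
    best_score, best = -np.inf, (0, 0)
    for d_theta in range(n_theta):
        shifted = np.roll(lp_ref, d_theta, axis=1)
        for d_rho in range(-max_rho_shift, max_rho_shift + 1):
            if d_rho >= 0:
                a, b = shifted[:n_rho - d_rho], lp_mov[d_rho:]
            else:
                a, b = shifted[-d_rho:], lp_mov[:n_rho + d_rho]
            score = ncc(a, b)
            if score > best_score:
                best_score, best = score, (d_rho, d_theta)
    return best

def to_scale_rotation(d_rho, d_theta, n_rho, n_theta, r_min, r_max):
    """Convert a log-polar translation (in bins) into (scale, rotation in degrees)."""
    step = (np.log(r_max) - np.log(r_min)) / (n_rho - 1)
    if d_theta > n_theta // 2:  # wrap to a signed angle
        d_theta -= n_theta
    return float(np.exp(d_rho * step)), 360.0 * d_theta / n_theta

rng = np.random.default_rng(0)
ref = rng.random((32, 64))
mov = np.roll(ref, (3, 5), axis=(0, 1))  # simulated log-polar translation
d_rho, d_theta = estimate_shift(ref, mov)
print(to_scale_rotation(d_rho, d_theta, 32, 64, 1.0, 16.0))
```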
This point is then centered and the inverse transforms are applied to the image to be registered. The result is shown in the third column of the top figure: on top the reference image and on bottom the registered image.
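Applying the inverse transform can be sketched as backward warping with a similarity transform (rotation, scale, translation) built from the single matched point. This sketch makes several assumptions. Points are given as (x, y) = (column, row). The scale and angle describe the image to be registered relative to the reference. Sampling is nearest-neighbour. The function names are hypothetical.

```python
import numpy as np

def ref_to_mov_matrix(p_ref, p_mov, scale, angle_rad):
    """2x3 matrix A with [x_mov, y_mov] = A @ [x_ref, y_ref, 1]: the matched
    point p_ref is sent to p_mov, and its neighbourhood is scaled and rotated."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot_scale = scale * np.array([[c, -s], [s, c]])
    t = np.asarray(p_mov, float) - rot_scale @ np.asarray(p_ref, float)
    return np.hstack([rot_scale, t[:, None]])

def register(moving, A, out_shape):
    """Backward warping: each pixel of the reference frame fetches the pixel of
    the moving image it maps to (nearest neighbour); pixels falling outside the
    moving image are set to 0."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    xm, ym = A @ coords
    xi, yi = np.rint(xm).astype(int), np.rint(ym).astype(int)
    inside = (xi >= 0) & (xi < moving.shape[1]) & (yi >= 0) & (yi < moving.shape[0])
    out = np.zeros(h * w, dtype=moving.dtype)
    out[inside] = moving[yi[inside], xi[inside]]
    return out.reshape(h, w)

# The reference is pasted at offset (row 5, col 7) in a larger image.
rng = np.random.default_rng(0)
ref = rng.random((20, 20))
moving = np.zeros((30, 30))
moving[5:25, 7:27] = ref
A = ref_to_mov_matrix((0, 0), (7, 5), 1.0, 0.0)
print(np.array_equal(register(moving, A, ref.shape), ref))  # True
```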
A single corresponding point is enough to register the images with log-polar patches. Even if the result is not very precise, the method is efficient and fast. Humans behave similarly: a precise comparison between two images needs several fixations on both images and a very high cognitive load.
More details about the attention-based image registration are available in my thesis at Chapter 10 section 10.3.
|
IN BRIEF
Computational attention is useful for obtaining human-like reactions. This is also the case in image registration, which is not very precise but very fast and efficient. At least one eye fixation per image is needed, but in most cases several fixations are made on both images.
|
Other applications
There are of course many other applications of computational attention in signal processing, data optimization, etc. This section lists some other applications that were not specifically described in my thesis.
If you know of other applications of attention and saliency in the engineering field, please send them to me by e-mail.
- Lightning localization.
This application is in the field of audio event detection. Several microphones can be used to detect thunder automatically and thus compute the place of impact of the corresponding lightning.
Important top-down information is available: thunder has a mostly low-frequency signature.
- Object recognition.
Computational attention is very useful in CBIR (Content-Based Image Retrieval) thanks to its ability to select important points and to compare them efficiently with log-polar patches, which are not sensitive to rotations or scale changes.
This is a major application of computational attention, and it requires low-level, medium-level and high-level analysis.
- Robotics.
Robots are moving computers intended to react like living beings. In robotics, attention is crucial for selecting the right data to analyze, for example in self-localization: interesting points already seen are important clues that the corresponding areas have already been inspected.
- Architecture study.
This application is in the field of visual ergonomics. In order to improve the saliency of building shapes or to study the impact of illumination on the way humans perceive their surroundings, attention maps may have an important impact.
Top-down information on the way people look at a city can also be used to improve the overall results: this top-down information differs between a classical European city (more horizontal) and New York, for example (more vertical).
- Automatic focus
Also in the field of image ergonomics, today's digital cameras focus automatically by maximizing edge amplitude. Attention-based focus is more efficient, especially when textures do not necessarily need to be in focus.
- Elaborate HCI (Human-Computer Interfaces) such as avatars
This application falls within the framework of man-machine communication. An avatar can move its eyes very naturally when it focuses on areas that would really attract human eyes. The person looking at the avatar is very important, but if somebody runs behind it, the avatar should be disturbed by this event as a human would be!
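The contrast drawn under "Automatic focus" between edge maximization and attention-based focus can be sketched on a focus stack, meaning a set of frames taken at different focus settings. This is a minimal illustration under stated assumptions. The focus measure is gradient energy, which is one common choice among many. The attention map is supplied externally. `pick_focus` is a hypothetical name.

```python
import numpy as np

def gradient_energy(frame):
    """Per-pixel squared gradient magnitude (central differences)."""
    gy, gx = np.gradient(np.asarray(frame, float))
    return gx ** 2 + gy ** 2

def pick_focus(stack, attention=None):
    """Index of the frame with the highest focus score.
    Without an attention map this is plain edge-energy maximization; with one,
    edges only count where the attention map is high."""
    scores = []
    for frame in stack:
        energy = gradient_energy(frame)
        if attention is not None:
            energy = energy * attention
        scores.append(energy.sum())
    return int(np.argmax(scores))

# Toy stack: frame 0 has a sharp texture on the left, frame 1 a sharp object
# edge on the right; the attention map covers the right half only.
a = np.zeros((20, 20)); a[:, :10] = np.tile([0.0, 0.0, 1.0, 1.0], 3)[:10]
b = np.zeros((20, 20)); b[:, 15:] = 1.0
attention = np.zeros((20, 20)); attention[:, 10:] = 1.0
print(pick_focus([a, b]), pick_focus([a, b], attention))  # 0 1
```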
|
[^ Back to Menu ^]
Computational Attention Predictions for Neuroscience
Neuroscience and psychology currently provide evidence and experiments that engineers have used extensively to build computational attention models. The engineering field has taken information from psychologists and neuroscientists but has provided few results back to them.
Nevertheless, computational models are becoming more and more sophisticated, which leads to an interesting question: will computational models soon be able to predict some aspects of the brain's cognitive organisation? If the answer is yes, a very interesting interaction between engineers, psychologists and neuroscientists could be achieved.
The presented computational model leads to three main predictions, not yet checked, about the attentional mechanism and the memory organisation. These three hypotheses are detailed below in increasing order of importance.
Hypothesis I: on visual features
The first prediction is that the attention mechanism may have three levels and not only two. This means that two categories of features exist:
- Primitive features like amplitude, intensity, movement or colour, which should be perceived first.
- Behavior features like size (intensity spatial variation), orientation, speed (intensity temporal variation), movement direction or duration (amplitude time variation), which may be perceived just after the primitive features.
The bottom figure shows that primitive features are likely to be computed within the very early visual path (like SC or LGN), while behavior features have to wait for area V1 to be computed.
More details about the two classes of features are available in my thesis at Chapter 3, sections 3.1.2 and 3.1.3.
|
Some visual features may be processed serially. Primitive features and global behavior features are computed in the early visual path, in structures like the Superior Colliculus (SC) or the Lateral Geniculate Nucleus (LGN). Behavior features requiring local computation have to wait until the primary visual cortex (V1) to be computed. As some features are computed before others along the visual path, they may also be perceived serially.
|
IN BRIEF
There are two classes of features which are computed and perceived serially.
|
Hypothesis II: on the role of the Lateral Geniculate Nucleus (LGN)
The second prediction deals with the role of the Lateral Geniculate Nucleus (LGN). Being an early structure, the LGN can very rapidly provide important information about the current eye fixation: global direction, shape and colour are available. Moreover, a Fourier decomposition of the log-polar fixation amounts to a Fourier-Mellin-like processing. The bottom figure shows the LGN structure at the bottom right. It provides global orientation (Theta arrow) and global frequency decompositions from layer 1 (green, low frequency LF) to layer 6 (brown, high frequency HF).
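The Fourier-Mellin-like property mentioned above can be illustrated with a few lines of NumPy. Assume a fixation resampled in log-polar coordinates, with rows for log-radius and columns for angle (an assumed layout). A rotation of the fixation is then a circular shift of the columns. The magnitude of the Fourier transform along the angle axis ignores that shift, because a shift only changes the phase. This sketch shows the signal-processing property only. The name `angular_spectrum` is hypothetical.

```python
import numpy as np

def angular_spectrum(lp_patch):
    """Magnitude of the FFT along the angle axis (columns) of a log-polar patch.
    Column 0 holds the mean of each ring (lowest frequency); higher columns
    hold finer angular detail."""
    return np.abs(np.fft.fft(lp_patch, axis=1))

rng = np.random.default_rng(1)
lp = rng.random((16, 32))          # hypothetical fixation: 16 radii x 32 angles
rotated = np.roll(lp, 5, axis=1)   # same fixation rotated by 5 angle bins
print(np.allclose(angular_spectrum(lp), angular_spectrum(rotated)))  # True
```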
There is another interesting point: 20% of the afferent inputs to the LGN come from the retina, while 80% come from the cortex. This means that the LGN receives a huge amount of data from the cortex (massive cortical feedback).
One hypothesis is that the LGN compares the current fixation with long-term memory coming from the cortex. If this is true, the LGN may play a major role in attention and pattern recognition, beyond the role of neural relay currently attributed to it. This hypothesis also implies a more important role for the primitive brain in general.
More details about the role of LGN are available in my thesis at Chapter 2 section 2.2.4.
|
The LGN may have an important role in fixation comparison and recognition. It has an important cortical feedback and a structure (bottom-right image) which is close to a Fourier decomposition of the log-polar fixation.
|
IN BRIEF
The LGN may be a key structure in high-level attention and object recognition, and not only a neural relay: it is a "primitive cortex", and it may not have lost all its functions just because of the development of the cortex.
|
Hypothesis III: on the memory structure
The third prediction concerns the memory structure. It arises from the proposed foveated patches approach and from the observation that the importance of top-down information depends on the structure of the fixated image. As with the other two predictions, there is no evidence of such a structure so far.
Memory may be split into two parts:
- A master concept map may contain the basic concepts. For each concept, a pointer may then redirect further processing to the corresponding specialized memory area. One hypothesis places the master concept map in the inferior occipital gyrus.
- The specialized memory area contains more specific information about a basic concept, and it may be more or less developed: the specialized memory area for faces is highly developed, while the one for stones, for example, should be much less developed (unless the person is a geologist).
One hypothesis places the specialized memory in the lateral fusiform gyrus.
The master map contains concepts which are common to many people within the same culture, but also personal concepts which may not be shared by many people. Interesting top-down information should mainly be generated by concepts which are common to many people (text, faces, etc.).
The interesting point is that the master concept map only includes pointers to the specialized memory area. Further learning about each concept is not simply concatenated to the master concept map, but added in parallel brain areas which act like plug-ins on the master concept memory. The importance of the specialized memory depends on each person, but some common concepts may have an important specialized memory: this is the case with faces. In this case, the top-down influence is very important.
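The pointer-based organisation described above can be pictured as a toy data structure. This is only an analogy under assumptions, not a brain model. The class names are hypothetical. Measuring top-down influence by the amount of detail stored in a plug-in is an assumption, not a claim from the thesis.

```python
from dataclasses import dataclass, field

@dataclass
class SpecializedMemory:
    """Hypothetical plug-in holding detailed knowledge about one concept."""
    concept: str
    details: list = field(default_factory=list)

    def learn(self, item):
        self.details.append(item)

class MasterConceptMap:
    """Hypothetical master map: it stores only pointers to the plug-ins,
    never the details themselves."""

    def __init__(self):
        self._pointers = {}

    def plug_in(self, concept):
        # Create the specialized area on first use; further learning is added
        # there, in parallel, instead of being concatenated to the master map.
        if concept not in self._pointers:
            self._pointers[concept] = SpecializedMemory(concept)
        return self._pointers[concept]

    def lookup(self, concept):
        return self._pointers.get(concept)

    def top_down_weight(self, concept):
        # Assumption: a more developed plug-in exerts a stronger top-down influence.
        memory = self.lookup(concept)
        return 0 if memory is None else len(memory.details)

brain = MasterConceptMap()
for item in ["eyes", "mouth", "nose"]:
    brain.plug_in("face").learn(item)
brain.plug_in("stone").learn("granite")
print(brain.top_down_weight("face"), brain.top_down_weight("stone"))  # 3 1
```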
More details and proofs about this parallel plug-in memory structure are available in my thesis at Chapter 10 section 10.5.2.
|
There may be two areas in memory: a general concept area and a specific memory area containing more specialized details about each concept
|
IN BRIEF
Memory may be split into two areas: a master concept map containing general concepts and a specialized area containing details for each of the concepts in the master concept map. These specialized areas act as plug-ins on the master concept map. This plug-in approach is similar to the way the cortex comes as a plug-in on the primitive brain. The brain appears to be a concatenation of layers plugged into one another, with an onion-like layered architecture.
This architecture opens the system to any evolution and new capacities may be added over the old ones ad infinitum.
|
[^ Back to Menu ^]
Attention Models Comparison and Validation
Here you may find some codes of attention models to compare with yours. You can also find "gold-standard" mouse-tracking results to validate your models or ideas on attention. Also, if you have a specific set of images you can get a top-down model for this set of images.
However, validation is not an easy task and there is no universal method for it. If you want to know more about attention model validation issues, mouse-tracking and its use to build top-down models, please see Chapter 3, section 3.7 of my PhD thesis.
Do not hesitate to send me links to other free codes of attention algorithms (even very simple ones, like some of those below).
Bottom-up attention models comparison:
Here are some codes of computational attention models:
A very simple global rarity algorithm inspired by the algorithms presented in my PhD thesis. This is not the LG1 or the LG2 method, but only a fast implementation of the global approach. No local information or spatial orientation is used here. May be interesting for images with rare, low-contrast defects. Free to use for research purposes. For still images.
A very simple local contrast algorithm inspired by the local part of the LG1 algorithm presented in my PhD thesis. This is not the LG1 algorithm, and no global information or spatial orientation is taken into account here. May be interesting for images where local contrast is the most important cue. Free to use for research purposes. For still images.
The Saliency Toolbox was updated by Dirk Walther and is inspired by the famous Itti & Koch saliency model. Free to use for research purposes. For still images.
A very simple top-down video attention model. It uses a classical motion detection algorithm, then adds top-down information from a motion normality atlas. It was written based on my PhD thesis. Free for research purposes. May be interesting for fixed-background videos containing temporal textures such as moving trees or flickering lights. For video sequences.
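The ideas behind the codes listed above can be sketched in a few lines of NumPy. These are not the downloadable codes, nor LG1 or LG2. Three assumptions are made here, and all function names and parameters are hypothetical:
- global rarity is the self-information -log2 p of each pixel's grey level over the whole image;
- local contrast is the absolute difference between a pixel and its local mean;
- the video model scores frame-difference motion against a per-pixel motion-frequency atlas.

```python
import numpy as np

def global_rarity(img, bins=256):
    """Self-information of each pixel's grey level, with p estimated from the
    histogram of the whole image (no local or orientation information).
    Assumes grey levels in [0, 1]."""
    q = np.clip((np.asarray(img, float) * bins).astype(int), 0, bins - 1)
    p = np.bincount(q.ravel(), minlength=bins) / q.size
    return -np.log2(p[q])

def local_contrast(img, radius=2):
    """Absolute difference between each pixel and the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood (no global information)."""
    img = np.asarray(img, float)
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.abs(img - windows.mean(axis=(-2, -1)))

def motion_rarity(frames, threshold=0.1):
    """Frame-difference motion in the last frame, weighted by how unusual motion
    is at each pixel according to a normality atlas learned on the previous
    frames (Laplace-smoothed per-pixel motion frequency). Needs >= 3 frames."""
    frames = np.asarray(frames, float)
    motion = np.abs(np.diff(frames, axis=0)) > threshold
    history, current = motion[:-1], motion[-1]
    p_motion = (history.sum(axis=0) + 1) / (history.shape[0] + 2)
    return current * -np.log2(p_motion)

# A flickering pixel (temporal texture) versus a pixel that moves only now.
frames = np.zeros((6, 5, 5))
frames[:, 0, 0] = np.arange(6) % 2
frames[5, 4, 4] = 1.0
s = motion_rarity(frames)
print(s[4, 4] > s[0, 0])  # True
```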
|
Attention models evaluation and top-down models:
A free mouse-tracking utility was set up at the TCTS Lab of FPMs. You can upload your images and get the mouse-tracking results on these images. You may also upload entire sets of specific images and then ask the website administrator for the corresponding top-down model.
This tool is called Validattention and it is available at http://tcts.fpms.ac.be/~mousetrack.
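Turning mouse-tracking results into a top-down model, and scoring a saliency map against them, can be sketched as below. This is not how Validattention computes its models. Three assumptions are made here. First, a density map is built from mouse positions blurred with a Gaussian. Second, the top-down model of an image set is the average of the per-image density maps. Third, agreement is measured with the linear correlation coefficient, one common choice among several. The function names and `sigma` are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def density_map(points, shape, sigma=2.0):
    """Accumulate (row, col) mouse positions, blur with a separable Gaussian and
    normalise to sum 1. Each image dimension must be at least the kernel length,
    because np.convolve(mode="same") returns the length of the longer input."""
    acc = np.zeros(shape)
    for r, c in points:
        acc[r, c] += 1.0
    k = gaussian_kernel(sigma)

    def blur(v):
        return np.convolve(v, k, mode="same")

    acc = np.apply_along_axis(blur, 0, acc)
    acc = np.apply_along_axis(blur, 1, acc)
    return acc / acc.sum()

def top_down_model(point_sets, shape, sigma=2.0):
    """Average of the per-image density maps of an image set."""
    return np.mean([density_map(p, shape, sigma) for p in point_sets], axis=0)

def correlation(saliency, reference):
    """Linear correlation coefficient between two maps of the same shape."""
    return float(np.corrcoef(saliency.ravel(), reference.ravel())[0, 1])

model = top_down_model([[(16, 16)], [(16, 16), (15, 17)]], (32, 32))
other = density_map([(3, 3)], (32, 32))
print(correlation(other, model) < correlation(model, model))  # True
```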
|
[^ Back to Menu ^]
Computational Attention Community
Here you will find a list of links related to computational attention either in the theoretical or in the applicative field. It is important to notice the following things:
- This list is not exhaustive and if you know a web page dealing with computational attention and which is not listed here, please send it to me by e-mail.
- The list is in chronological order, which does not mean that the first pages present better results than the last ones, or the reverse.
- I can only list here web pages links (no direct links to documents like PDFs)...I cannot list all the articles dealing with computational attention. You can make a small webpage with your own links to your articles and a small presentation of your work and send it to me.
Koch Laboratory is headed by C. Koch, who initiated with S. Ullman a model of a biological visual attention mechanism which led to one of the first biologically-inspired algorithmic implementations of attention.
Itti Laboratory is the lab of L. Itti, who built one of the most famous visual attention algorithms, based on the Koch and Ullman architecture. Current research in this lab deals with surprise and top-down models of attention.
F. Stentiford webpage presents a computational attention model based on a global approach of comparisons between pixel neighborhoods in an image. This work was one of the sources which inspired my own approach.
O. Boiman webpage presents a computational attention model based on a global approach of comparisons between pixel patches in a signal. Results seem impressive.
J.K. Tsotsos webpage presents work on both the neurological and the computational sides of attention. The proposed selective tuning model shows an interesting mix of bottom-up and top-down processes in a unique framework.
N. Bruce webpage presents an information-theory based computational approach. The idea of the use of self-information is common with the model presented on this page. Nevertheless the two models are quite different.
Center for perceptual systems presents a "computation" section with very interesting ideas. At the beginning of my work I was impressed by a talk by Prof. W. Geisler, which was another starting point of my research on attention.
Aude Oliva research page: interesting bottom-up and top-down approach. Gist and fast scene understanding are also presented.
Feng-Gui web page: interesting interface to get, in "real time", a bottom-up attention map of the images you upload. Details on the algorithm are unavailable for the moment. Mainly oriented towards advertising and website ergonomics, but some other applications are available.
[^ Back to Menu ^]
Contact me
Do not hesitate to contact me if you have questions or suggestions! You may find my postal and e-mail addresses on my homepage at http://tcts.fpms.ac.be/~mancas/.
The PDF version of the book edited after my PhD thesis is available here.
[^ Back to Menu ^]
|