(Released on August 26, 2008)
Image, speech and text (language) information, which are closely related to human visual and auditory perception, play important roles in society, the economy, national security and other fields, and will continue to grow rapidly in the coming period. Humans can perceive and understand such information directly, and computers can also process it; however, the processing ability of computers falls far short of that of humans, and their processing efficiency cannot meet the requirements of today's society. Drawing on human cognitive mechanisms and the latest relevant achievements in mathematics to establish new computational models and methods will significantly improve computers' ability to comprehend this kind of information and to process it efficiently. This will not only strongly promote the rapid development of information science but also make significant contributions to economic and social development.
I. Scientific Targets
Focusing on major national needs, and aiming to contribute to national security and public safety, to promote the development of information services and related industries, and to raise the national standard of living and health, the overall scientific targets of this Major Research Plan are as follows: drawing on human visual and auditory cognitive mechanisms and giving full play to the interdisciplinary advantages of the information, life and mathematical sciences, to study and construct new computational models and methods that improve computers' ability to comprehend unstructured visual and auditory perception information and their efficiency in processing massive heterogeneous information, and to overcome the bottlenecks in image, speech and text (language) information processing. The expected progress is as follows: important progress in basic theoretical research on visual and auditory information processing; major breakthroughs in three key technologies, namely collaborative computing of visual and auditory information, Chinese language understanding, and brain-computer interfaces related to visual and auditory perception; and, by integrating these research achievements, the development of an unmanned vehicle verification platform capable of perceiving natural environments and making intelligent decisions, whose main performance indicators should reach the advanced international level. Through these efforts, this Major Research Plan aims to enhance China's overall research strength in visual and auditory information processing, to cultivate outstanding talents and teams with international influence, and to provide research environments and technical support for national security and social development.
II. Key Scientific Issues
Focusing on key scientific issues such as “perceptual feature extraction, representation and integration”, “machine learning and understanding of perception data” and “collaborative computing of multi-modal information”, this Major Research Plan will organize and implement research in the following four main areas.
1. Image and Visual Information Computing
The main research topics are the cognitive mechanisms of image and visual information computing, the extraction and selection of basic visual features, object recognition and image content understanding, and behavior analysis of moving objects in complex scenes. The goals are to propose high-performance computational models of image and visual information, to obtain original, internationally recognized research achievements (including high-level papers in journals such as Nature, Science and IEEE Trans. PAMI), and to cultivate outstanding talents and research teams with international influence.
2. Speech and Auditory Information Computing
The main research topics are the mechanisms of auditory perception and auditory scene analysis of speech, speech recognition and synthesis in natural environments, and the analysis and understanding of spoken dialogue. The goals are to obtain original research results with international influence, to propose effective computational models of speech and auditory information, to publish high-level papers in the leading international journals of this field, and to cultivate outstanding talents and research teams with international influence.
3. Natural Language (Chinese) Understanding
The main research topics are the cognitive mechanisms of language processing, the modeling of language knowledge and computational models of semantics, machine translation methods based on semantic understanding, network-oriented moderate understanding models of Chinese and a suite of supporting analysis tools, and key techniques supporting the analysis, recognition and understanding of spoken dialogue in natural environments. Building on existing related results in China, a large-scale, high-standard Chinese semantic knowledge base will be comprehensively constructed. These results will be applied to typical language (Chinese) information processing systems to significantly improve their ability to comprehend natural language (sentences, paragraphs and discourse), and will be verified in network-based information retrieval, filtering and knowledge acquisition.
4. Collaborative Computing of Multi-Modal Information and Brain-Computer Interface
The main research topics are the cognitive mechanisms and computational models of collaboration among multi-modal perception information, pattern recognition and environment interaction based on the fusion of visual and auditory information, cross-modal video information retrieval, and the filtering of sensitive information on networks. The goals are to significantly improve the precision of cross-modal video information retrieval and to markedly enhance overall research strength in this field.
Research will also address methods and techniques for brain signal extraction, brain region localization and functional brain network analysis related to visual and auditory cognition; techniques for signal transmission, processing and control in brain-computer interaction; and typical applications of brain-computer interfaces related to visual and auditory cognition. The results should be verified or applied in improving quality of life and in the functional rehabilitation of persons with disabilities, and should provide new techniques for extending and improving human behavior-control abilities.
III. Key Technologies and Integration and Verification Platform
Building on the research described above, this Major Research Plan will further study and develop key technologies and an integration and verification platform for visual and auditory information processing.
1. Key Technology of Collaborative Computing of Visual and Auditory Information
The main topics are collaborative computational models of visual and auditory information for machines and techniques for implementing them as systems; pattern recognition techniques based on the fusion of visual and auditory information and the corresponding verification systems; and techniques for cross-modal video information retrieval and the filtering of sensitive information on networks, together with their applications. Using multi-modal collaborative computational models, the precision of network video information retrieval should exceed the best foreign level of the same period by 5%-10%, and the results should be verified in areas such as network information security and services.
2. Key Technology of Natural Language (Chinese) Understanding
The main topics are a standardized semantic knowledge base of common Chinese vocabulary and techniques for constructing it, techniques for implementing the network-oriented moderate understanding model of Chinese and a suite of supporting analysis tools, and key techniques supporting the analysis, recognition and understanding of spoken dialogue in natural environments. Building on existing related results in China, a comprehensive Chinese semantic knowledge base will be constructed, covering at least 50,000 words of common Chinese vocabulary, together with a semantically annotated balanced Chinese corpus of at least 10 million words. These results will be applied in Chinese processing systems in network environments, where the accuracy of information retrieval and knowledge acquisition should be significantly higher than that achieved with the best available techniques.
3. Key Technology of Brain-Computer Interface Related to Visual and Auditory Cognition
The main topics are techniques for brain signal extraction, brain region localization and functional brain network analysis related to visual and auditory cognition; techniques for signal transmission, processing and control in brain-computer interaction and their system implementation; and typical applications of brain-computer interfaces related to visual and auditory cognition. The proposed information extraction and analysis techniques for non-invasive brain-computer interfaces should be at the international leading level of the same period and should be verified or applied in improving quality of life and in the functional rehabilitation of persons with disabilities.
4. Integration and Verification Platform of Unmanned Vehicle
By integrating the achievements of the basic theoretical and key technology research described above, and by combining traditional visual computing models with new visual cognitive models, this work aims to achieve new breakthroughs in environment perception and modeling; to realize multi-sensor, cross-modal and cross-scale information fusion; to generate high-quality three-dimensional scene cognition maps; to construct a high-performance verification platform for unmanned intelligent vehicles; and to provide new key technologies for intelligence-assisted safe driving based on comprehensive analysis of the human-vehicle-road state, to be verified or applied in defense, intelligence-assisted safe driving and other fields of major importance.
IV. Research Projects to Be Funded in 2008
This Major Research Plan will fund applications in three forms, “fostering projects”, “key funding projects” and “integrated projects”, which differ in funding intensity and goals. Applications with good innovative academic ideas and research value that still require further exploration will be funded as “fostering projects”. Applications with stronger innovative academic ideas and research value, a good research foundation and accumulated achievements, and a major potential contribution to the overall targets of this Major Research Plan will be funded as “key funding projects”. Applications that play a decisive role in achieving the overall goal of the Major Research Plan will be funded as “integrated projects” with greater funding intensity. Based on annual progress or the results of implementation inspections, this Major Research Plan may appropriately adjust the funding of approved projects (by suspending a project or providing additional funding). In 2008, this Major Research Plan will fund “key funding projects” and related “fostering projects” in the following areas.
1. Collaborative Computing of Multi-Modal Information
1) Research Direction of “Key Funding Project”: The Internet-oriented cross-media mining and search engine
This project will combine new methods from multiple disciplines, such as natural language understanding, image and video analysis and cross-media data mining, to study effective methods for mining the content of Web text, images and video and effective techniques for Web analysis, to construct high-precision, high-speed and robust vertical search algorithms, and to develop an Internet cross-media search engine for specific users.
Assessment Objective: The precision and recall rates for text content in specific domains should exceed 90%, and those for image and video content in specific domains should exceed 70%; quasi-real-time cross-media content mining and search on the Internet should be achievable.
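To make the two retrieval thresholds concrete, the following minimal Python sketch computes precision (correct hits divided by items returned) and recall (correct hits divided by relevant items) for a single query and compares them with the 90% text target and the 70% image/video target. The function names, the set-of-item-IDs setup and the reading of "more than" as a strict inequality are assumptions made for illustration; the guide does not specify an evaluation protocol or test collection.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for one query, given sets of item IDs."""
    hits = len(retrieved & relevant)  # items that were both returned and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Thresholds from the assessment objective; the dictionary keys are hypothetical labels.
THRESHOLDS = {"text": 0.90, "image_video": 0.70}


def meets_target(modality: str, retrieved: set, relevant: set) -> bool:
    """Hypothetical check: both precision and recall must strictly exceed the target."""
    p, r = precision_recall(retrieved, relevant)
    t = THRESHOLDS[modality]
    return p > t and r > t


if __name__ == "__main__":
    retrieved = set(range(1, 11))  # the system returned items 1..10
    relevant = set(range(2, 12))   # the ground truth is items 2..11
    print(precision_recall(retrieved, relevant))             # (0.9, 0.9)
    print(meets_target("text", retrieved, relevant))         # False: 0.9 is not above 0.9
    print(meets_target("image_video", retrieved, relevant))  # True
```

The example shows why the wording matters: a system that returns exactly 90% correct results sits on the text threshold without exceeding it.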
2) Research Directions of “Fostering Project”:
(1) The mechanism of selective attention in the interaction of text, image and speech perception information, and semantic feature extraction;
(2) “Ambient smart” collaborative computing of multi-modal information.
2. Natural Language (Chinese) Understanding
1) Research Direction I of “Key Funding Project”: The semantic computing and understanding of Web text
To establish a high-standard, large-scale and extensible semantic knowledge base (including core semantic annotation at the word, sentence and discourse levels); to set up a semantic computing framework and computational models for large-scale Chinese text; to study key content-based techniques for Web text information retrieval, event detection and content summarization; and to achieve understanding of Web content for specific semantic objectives.
Assessment Objective: The semantic computing and understanding methods developed should be clearly superior to non-semantic methods; the accuracy of information retrieval for specific Web texts should be at least 20% higher than that of existing techniques.
2) Research Direction II of “Key Funding Project”: The content analysis and understanding of multi-modal oral dialogue
To establish a model of human spoken dialogue that centers on spoken language while integrating cross-modal information such as speech, vision, behavior and emotion, so as to support spoken dialogue understanding and voice-based human-machine interaction.
Assessment Objective: To develop a prototype multi-modal natural human-machine spoken dialogue system for areas such as in-car voice navigation and voice communication, in which the understanding accuracy for specific dialogues exceeds 90%, the correct response rate of human-machine dialogue exceeds 80%, and the task completion rate exceeds 90%.
3) Research Directions of “Fostering Project”:
(1) Psychological studies of the cognitive mechanisms of the Chinese language;
(2) Computational models of Chinese semantics at the sentence and discourse levels;
(3) Mechanisms of auditory perception and auditory scene analysis;
(4) Resource sharing and evaluation methods for semantic computing and understanding.
3. Brain-Computer Interface
1) Research Direction I of “Key Funding Project”: The key techniques of man-machine interaction related to the visual and auditory perception
To study techniques for brain signal extraction and brain region localization related to visual and auditory cognition, techniques for information transmission, processing and control in brain-computer interaction, and application techniques for brain-computer interfaces related to vision and hearing.
Assessment Objective: To propose and implement online, automatic pattern-learning strategies for brain-computer interaction, and to improve the robustness and adaptability of brain-computer interface systems. The research on the extraction, analysis and application of EEG information should be at the international leading level of the same period and should be verified or applied in the functional rehabilitation of persons with disabilities.
2) Research Directions of “Fostering Project”:
(1) Vehicle navigation and control techniques based on brain-computer interfaces;
(2) Non-invasive acquisition, transmission and processing of EEG signals;
(3) New concepts and methods for brain-computer interaction paradigms.
4. Cognitive Mechanism of Driving Behavior
1) Research Direction of “Key Funding Project”: The cognitive mechanism and neural representation of driving behavior, focusing on selective attention and its relationship to action
Selective attention is an essential cognitive function in human driving behavior. A fundamental question for research on selective attention, and on attention in driving behavior, is: “What does attention select?” This project is expected to make a substantive breakthrough on this major question in cognitive science, in particular by proposing and developing an original, systematic theory of “object”-based selective attention, and to apply it in computational models of environment perception for unmanned driving. On the one hand, it should establish a scientific definition of the concept of object representation and an accurate description of the cognitive mechanism by which object representation interacts with driving actions; on the other hand, it should use various brain functional imaging methods to identify the neural representations in the cerebral cortex that underlie object-based attention and driving actions. The ultimate aim is to provide a cognitive-science basis for a new model of driving behavior that incorporates active vision and an “attention shift” mechanism.
2) Research Directions of “Fostering Project”:
(1) Perceptual learning in driving behavior;
(2) Cognitive mechanisms of eye movement and visual attention, and active vision;
(3) Behavioral psychology and cognitive structure models of drivers.
5. Integration and Verification Platform of Unmanned Vehicle
1) Research Direction of “Key Funding Project”: The key techniques for unmanned vehicle and system platform
Assessment Objective: While complying with traffic laws, the vehicle should drive autonomously on the following three types of road.
(1) Urban road: The unmanned vehicle is required to safely enter and exit multi-lane traffic scenes and to be capable of lane keeping, lane changing and overtaking, over a required driving distance of approximately 5 kilometers. In some sections, the vehicle should be able to pass a series of parallel-parked vehicles and roadblocks and drive to a designated parking spot. Test environment: the test road is relatively crowded and includes several intersections; the vehicle should be able to identify obstacles and re-plan its route.
(2) Highway: The required driving distance is about 2,000 kilometers, with less than 3% of the distance driven under manual intervention. The unmanned vehicle should be able to overtake and merge into traffic safely and effectively, accurately identify common highway traffic signs, and take correct and safe driving actions. Test environment: the test road includes several viaduct junctions.
(3) Rural road: On a variety of road surfaces (dirt, gravel, cement, asphalt, etc.), the required driving distance is about 200 kilometers, with less than 4% of the mileage driven under manual intervention. The unmanned vehicle should be able to keep its lane, safely follow and overtake the vehicle ahead, and stop, restart and detour; it should also identify obstacles and avoid collisions with pedestrians, bicycles and other objects such as roadside trees and utility poles. Test environment: some sections are in poor condition (uneven surfaces, varying road widths, no lane markings, etc.), and the test road includes several forks.
The above assessment objectives are the final inspection objectives for the integration and verification platform required by this Major Research Plan; applicants for this “key funding project” may decompose them and propose staged goals.
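The manual-intervention limits translate into concrete distance budgets: 3% of about 2,000 km on the highway is about 60 km, and 4% of about 200 km on rural roads is about 8 km. The minimal Python sketch below assumes that the intervention share is measured as the distance driven under manual control divided by the total distance driven; the guide does not define the metric more precisely, and the function names and road labels are hypothetical.

```python
# Limits from the assessment objective: road type -> (nominal distance in km, max manual share).
# The road-type keys are hypothetical labels.
ROAD_LIMITS = {
    "highway": (2000.0, 0.03),
    "rural": (200.0, 0.04),
}


def manual_share(manual_km: float, total_km: float) -> float:
    """Assumed metric: fraction of the total distance driven under manual intervention."""
    if total_km <= 0:
        raise ValueError("total_km must be positive")
    return manual_km / total_km


def passes(road: str, manual_km: float, total_km: float) -> bool:
    """Hypothetical check: the manual share must be strictly below the road's limit."""
    _, limit = ROAD_LIMITS[road]
    return manual_share(manual_km, total_km) < limit


def max_manual_km(road: str) -> float:
    """Manual-driving budget implied by the nominal distance and the limit."""
    nominal_km, limit = ROAD_LIMITS[road]
    return nominal_km * limit


if __name__ == "__main__":
    print(max_manual_km("highway"))         # 60.0
    print(max_manual_km("rural"))           # 8.0
    print(passes("highway", 45.0, 2000.0))  # True (2.25% manual)
    print(passes("rural", 9.0, 200.0))      # False (4.5% manual)
```

Because the share is computed against the distance actually driven, an applicant who decomposes the objective into stages could reuse the same check on each staged test run.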
2) Research Directions of “Fostering Project”:
(1) Environment perception for vehicle driving based on cognitive mechanisms and multi-sensor information fusion;
(2) Three-dimensional sensors usable in complex driving environments;
(3) Methods and techniques for generating high-quality three-dimensional maps of driving environments;
(4) Key techniques for vehicle-assisted safe driving;
(5) Test environment design and evaluation methods for unmanned vehicle driving;
(6) High-reliability local path planning methods for complex driving environments.
V. Basic Principles for Project Selection
To ensure that the overall targets are achieved, this Major Research Plan encourages the following types of research:
(1) Exploratory research with original ideas and unique characteristics;
(2) Research on key technologies closely related to the overall targets;
(3) Interdisciplinary and collaborative research involving the life, mathematical and information sciences;
(4) Research with the participation of outstanding overseas scientists.
VI. Notes for Application
1) Before writing proposals, applicants should read this Project Guide carefully. The research content and goals of each proposal should be closely related to the scientific targets of this Major Research Plan. Proposals outside the scope of the Project Guide will not be accepted.
2) Within the research directions of this year's Project Guide, applicants may determine the project title, scientific goals, research content, technical approach and corresponding research budget themselves.
3) When filling in the application form, applicants must enter “Major Research Plan” in the funding category field, “Fostering Project” or “Key Funding Project” in the sub-category field, and “Cognitive Computing of Visual and Auditory Information” in the annotation field. An appropriate application code should be selected according to the specific research content.
4) Proposals will be handled by the Department of Information Sciences.
5) In 2008, about 20 “fostering projects” will be funded, each with a funding intensity of no less than 500,000 RMB Yuan and an implementation period of 3 years, and about 8 “key funding projects” will be funded, each with a funding intensity of no less than 3 million RMB Yuan and an implementation period of 4 years; no “integrated projects” will be funded in 2008. The total project funding for 2008 is approximately 35 million RMB Yuan.
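The 2008 figures can be cross-checked: at the minimum intensities, about 20 fostering projects at 0.5 million RMB Yuan and about 8 key funding projects at 3 million RMB Yuan come to 10 + 24 = 34 million RMB Yuan, just under the approximately 35 million total. The short Python sketch below reproduces this arithmetic; the project counts are the approximate numbers stated above, and the names `PLAN_2008` and `minimum_commitment` are hypothetical, so the result is only indicative.

```python
# Approximate 2008 figures from the guide: category -> (number of projects, minimum RMB per project).
# The dictionary and its keys are hypothetical names for illustration.
PLAN_2008 = {
    "fostering": (20, 500_000),
    "key_funding": (8, 3_000_000),
}
TOTAL_BUDGET_RMB = 35_000_000  # "approximately 35 million RMB Yuan"


def minimum_commitment(plan: dict[str, tuple[int, int]]) -> int:
    """Sum of (count x minimum intensity) over all project categories."""
    return sum(count * per_project for count, per_project in plan.values())


if __name__ == "__main__":
    committed = minimum_commitment(PLAN_2008)
    print(committed)                     # 34000000
    print(TOTAL_BUDGET_RMB - committed)  # 1000000 available above the minimums
```

The small margin suggests that most projects will be funded close to their stated minimum intensities.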