The research and teaching program will rely on the following themes:
Axis 1 - Reinforcement learning and stochastic optimization/simulation
In many cases, the “machine” that automatically produces decisions or predictions interacts with the environment from which the data comes (e.g. the “market impact” phenomenon in finance). Learning must then strike a compromise between “exploration” and “exploitation”, and can be formulated as a stochastic optimization problem whose solution requires advanced simulation methods. The search for an optimal balance between exploration and exploitation can sometimes rely on human expertise, both to define the best actions or decisions and to identify more easily the parameters of the models governing the evolution of the system. This is called active learning.
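As a rough illustration of the exploration/exploitation compromise, the sketch below runs an epsilon-greedy strategy on a simulated Bernoulli bandit. It is only one simple strategy among many and is not presented as a method of the Chair; the function name `epsilon_greedy`, the reward probabilities and the value of `epsilon` are all assumptions chosen for the example.

```python
import random

def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy strategy on a Bernoulli bandit (illustrative)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each action was taken
    estimates = [0.0] * n_arms   # running mean reward of each action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try an action uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: take the action with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # The environment answers with a random 0/1 reward (simulated here).
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen action.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward

# Hypothetical reward probabilities, unknown to the learner.
counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(e, 2) for e in estimates], total_reward)
```

A larger `epsilon` spends more decisions on exploration and fewer on exploiting what has already been learned; choosing this balance well is the optimization problem described above.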
Axis 2 - Graph-mining and social network analysis
In applications linked to the Internet or to the exploration of social networks, for example, the data naturally takes the form of a graph, whose size is often too large to allow direct visualisation. The automatic extraction of the properties of very large networks is one of the research axes of this Chair. Its applications range from the study of information diffusion in a social network to the analysis of the hidden Web.
Axis 3 - Ranking and Anomaly Detection
In applications such as the processing of massive digital databases for the design of search or recommendation engines, the goal is not to learn to predict a label probabilistically associated with an observation (as in supervised classification), but to learn to sort the possible values of the random observation vector in the same order as the one induced by the posterior probability. However, ranking does not reduce to the (supervised) problem of learning an order; it can also refer to the aggregation of orders or preferences, with applications to meta-search engines and database middleware.
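The aggregation of orders mentioned above can be illustrated with the Borda count, a classical and deliberately simple aggregation rule. This is a minimal sketch that assumes every input ranking orders the same set of items; the function name `borda_aggregate` and the three example rankings are hypothetical.

```python
def borda_aggregate(rankings):
    """Merge full rankings of the same items into one consensus order."""
    scores = {item: 0 for item in rankings[0]}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # The top item earns n - 1 points, the last item earns 0.
            scores[item] += n - 1 - position
    # Highest total score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical result lists returned by three search engines.
engine_results = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["a", "c", "b", "d"],
]
consensus = borda_aggregate(engine_results)
print(consensus)  # ['a', 'b', 'c', 'd']
```

Here the totals are 8, 6, 3 and 1 points for `a`, `b`, `c` and `d`, which gives the consensus order printed above. A meta-search engine faces the harder version of this problem, where the lists are partial and of unequal reliability.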
Axis 4 - Cloud Learning and Distributed Learning Algorithms
Networks (the Internet, social networks, etc.) have led to a real explosion in the volume of databases. For example, at the end of 2013 the complete volume of the Internet represented 1 yottabyte, and in 2014 about 50 billion pages were indexed by Google, for about 3 billion users. Such amounts of data can only be stored in a distributed way. Beyond the storage issue, it is the analysis of these “data clouds” that is the real challenge today. This is why we are pursuing a major research effort on the asynchronous decentralisation of supervised and unsupervised online learning algorithms (e.g. consensus algorithms, gossip, etc.), on software architectures that abstract this decentralisation, and on the design of online and distributed learning algorithms under explicit capacity constraints (computation time, memory, etc.).
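To give an idea of what a gossip algorithm does, the sketch below simulates randomized pairwise averaging: at each step two nodes chosen at random replace their local values with their common mean, so that all nodes drift towards the global average without any central coordinator. It is a single-process simulation under simplifying assumptions (every node can talk to every other node, no message loss); the function name `gossip_average` and the local values are hypothetical.

```python
import random

def gossip_average(values, rounds=2000, seed=0):
    """Simulate randomized pairwise gossip averaging on a complete network."""
    rng = random.Random(seed)
    x = list(values)  # x[i] is the current local value held by node i
    n = len(x)
    for _ in range(rounds):
        # Two distinct nodes wake up and exchange their values.
        i, j = rng.sample(range(n), 2)
        mean = (x[i] + x[j]) / 2.0
        # Both keep the average; the sum over all nodes is unchanged.
        x[i] = x[j] = mean
    return x

# Hypothetical local statistics held by five machines (global mean: 5.0).
local_values = [1.0, 5.0, 9.0, 3.0, 7.0]
final = gossip_average(local_values)
print([round(v, 6) for v in final])
```

Because each exchange preserves the sum of the values, the only point on which all nodes can agree is the global mean. In distributed learning the same idea is applied to model parameters or gradients rather than to scalar values.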
Axis 5 - High-Dimensional Learning and Time Series/Data Streams
Signal processing techniques (e.g. filtering, computational harmonic analysis, source separation) are still largely unknown in the machine learning field, and a deeper knowledge of them could lead to the development of massively multivariate, multi-scale and multi-frequency approaches for prediction and exploration. The rate at which certain databases are updated (finance, e-commerce, Internet, etc.), sometimes in real time, motivates the search for efficient learning methods in a sequential and adaptive context. The capacity to represent complex data efficiently is often a key aspect of learning.
Beyond the search for a mathematical formulation of the statistical learning problems mentioned above, for algorithmic solutions, for a framework establishing their theoretical validity and for experimental proofs of concept, the research work will also address the control and statistical evaluation of the performance of the proposed approaches.