  • 13:30 - 14:00 Accueil des Participants / Participants Welcome
  • 14:00 - 14:15 Mots d'Accueil / Welcome Address
  • 14:15 - 14:35 Maurizio Vichi (Sapienza University of Rome, Italy).
    Abstract. In the era of Big Data and the Knowledge Society, Statistics and Data Analysis are changing drastically. In this presentation, I will highlight the major contribution of Professor Edwin Diday to modern Data Analysis and Statistical Science.
  • 14:35 - 14:55 Yves Lechevallier (INRIA, France). The “carrée” function. An efficient and useful framework allowing the convergence of the “Nuées Dynamiques” method.
  • 14:55 - 15:15 Christiane Guinot (PhG-BioConsulting, Paris & University of Tours, France). Application of Multivariate Data Analysis Techniques: Decorative Tattoos and Reasons for Their Removal.
    Abstract. Although some tattoos are still handmade by amateurs, they are increasingly done with an electric tattoo machine by paid artists. Nevertheless, some people eventually regret their tattoos and decide to have them removed. Laser treatment is currently the benchmark, but the result depends on the natural color of the skin, the depth to which the pigments have been injected into the dermis, and the quantity and chemical nature of the pigments. In addition, the procedure, which requires several sessions two months apart for the same skin site, is painful and expensive. The objective of our study was to describe a large sample of French people requesting laser removal of a tattoo, to identify groups of subjects with similar characteristics, and to investigate the reasons for getting the tattoo and the reasons for its removal. The data were analysed using Multivariate Data Analysis techniques in three stages. 1) Processing of textual data: textual analysis method (SPAD software, MOTS and SEGME procedures). 2) Description of the subjects and their tattoos: SAS software, FREQ procedure (chi-squared test or Fisher's exact test estimated by the Monte Carlo method). 3) Search for and description of profiles: multiple correspondence analysis and ascending hierarchical classification (SAS software, CORRESP, BENZECRI, PLOT and CLUSTER procedures; Ward's minimum variance method), followed by descriptive analysis (FREQ procedure: chi-squared test or Fisher's exact test estimated by the Monte Carlo method). Four distinct profiles were identified. Types 1 and 3 were made up of individuals with amateur, monochrome, custom tattoos. Their tattoo was done immediately after the decision to get it, and the motivation for its removal was often associated with a feeling of "embarrassment or shame". Type 1, mainly made up of men over forty with a primary education level, had medium or large tattoos. Type 3, by contrast, composed of men and women over forty with a primary or secondary education level, had small tattoos. Types 2 and 4 consisted of highly educated individuals with tattoos made by professionals. Unlike for types 1 and 3, getting the tattoo was preceded by a period of reflection, and the motivation for its removal was linked to aesthetic reasons. Type 2, mostly made up of women under thirty with secondary or higher education, had small tattoos. Type 4, mostly made up of men under forty, had medium to large custom tattoos. A number of reasons may lead people to change their appearance with tattoos, including assertiveness and self-esteem, as well as a "fashion" effect linked to the increasing popularity of tattoos in all social strata. However, tattoos can become a social problem and create a psychological burden. It would therefore be desirable to set up clear information campaigns on tattooing and tattoo removal, targeted at the different subject profiles, as well as education programs for adolescents.
  • 15:15 - 15:35 Véronique Cariou (ONIRIS, INRA, Nantes-Atlantique, France). Clustering Multiblock Datasets in Sensometrics.
    Abstract. Over the last decade, developments in sensory evaluation and consumer studies have led to the cluster analysis of higher-order data structures. Within the scope of sensometrics, multiblock and three-way structures are considered here. In a multiblock structure, the same products (in rows) are evaluated according to sets of attributes (in columns) by different subjects, with each subject associated with one block. This situation arises in some rapid profiling techniques such as Projective Mapping or Free Sorting. In other situations, the subjects are asked to evaluate the same set of products on the basis of the same set of attributes, generating a three-way structure Product × Attribute × Subject, as in Check-All-That-Apply (CATA) experiments. Within the multiblock and three-way framework, cluster analysis has attracted growing interest in the last few years as a way to detect potential segments of subjects. After a review of existing strategies, two methods are discussed: CLUSTATIS and CLV3W. Both extend the cluster analysis of variables around latent components (Vigneau and Qannari, 2003) to higher-order data structures, in order to partition blocks into clusters. With CLUSTATIS, for the cluster analysis of multiblock datasets (Llobell et al., 2020), a latent configuration is determined by the STATIS method alongside the clusters of blocks. Alternatively, CLV3W (Cariou and Wilderjans, 2018) seeks a clusterwise model (Diday, 1978), which corresponds to a clusterwise PARAFAC model applied to the 2nd mode (or, respectively, the 3rd mode) of the three-way data array. These approaches have been implemented in the R packages ClustBlock and ClustVarLV. They are illustrated on datasets from Sensory Analysis and Consumer Studies.
    References
    Cariou, V., & Wilderjans, T. F. (2018). Consumer segmentation in multi-attribute product evaluation by means of non-negatively constrained CLV3W. Food Quality and Preference, 67, 18-26.
    Diday, E. (1978). Analyse canonique du point de vue de la classification automatique. Rapport de Recherche INRIA, n° 293.
    Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2020). Analysis and clustering of multiblock datasets by means of the STATIS and CLUSTATIS methods. Application to sensometrics. Food Quality and Preference, 79, 103520.
    Vigneau, E., & Qannari, E. M. (2003). Clustering of variables around latent components. Communications in Statistics - Simulation and Computation, 32(4), 1131-1150.
  • 15:35 - 16:15 Pause Café / Coffee Break
  • 16:15 - 16:35 Paula Brito (University of Porto, Portugal). Symbolic Data Analysis: The Legacy of E. Diday to the Big Data Generation.
    Abstract. In the 1980s, E. Diday introduced Symbolic Data Analysis (SDA) as a new approach for representing and analysing data containing information that classical data models cannot capture, and for providing self-explanatory results from multivariate data analysis methodologies. The “model” for data representation should take intrinsic variability into account, so that, for example, both the elements and the clusters of a given set can be represented in the same language. From the early logic-rooted models for symbolic data representation, the discipline evolved towards tabular representations, defining new variable types to account for data variability: set-valued, interval-valued and distributional-valued variables. By design, this framework allows representing and analysing complex data with internal variability, such as biological species, without resorting to statistical summaries (means, medians); it progressively became clear, however, that its scope is much broader than initially considered. Symbolic Data Analysis is most relevant in Big Data Mining applications, where huge sets of observations are collected but the data should be analysed at a higher level, which requires a data aggregation step. This is the case for official data, but also for large surveys, company databases (e.g. department stores) or sensor data. SDA provides criteria for aggregating data while preserving their intrinsic variability, and methods for the multivariate analysis of the resulting complex symbolic data. Recent approaches also focus on developing statistical methodologies to infer properties of the underlying big datasets from higher-level symbolic summaries. Symbolic Data Analysis is a powerful tool for such complex Big Data analysis; it will certainly play a central role in modern statistics and data analysis.
  • 16:35 - 16:55 Monique Noirhomme-Fraiture (University of Namur, Belgium). The European Adventure of Symbolic Data Analysis.
    Abstract. From 1995 to 2004, symbolic data analysis received funding from the European Commission through IST projects. This made it possible to bring communities of researchers together and later gave rise to the SDA workshops. We wish to present the main participants in these projects and their contributions. Edwin, as scientific advisor, naturally played a leading role.
  • 16:55 - 17:15 Boris Béranger (University of New South Wales, Sydney, Australia). Estimating Equations for Data Summaries.
    Abstract. In the current data-centric world, considerable effort goes into analysing complex and big datasets. Symbolic data analysis (SDA) offers opportunities for the statistical modelling of data (symbols) possessing internal variation, which results either from measurement imprecision or from applying an aggregation function to reduce the size of the original dataset (data summaries). Traditionally, much of the work in this field has relied on the strong assumption that the underlying data are uniformly distributed, with conclusions drawn at the symbol level. Recently, new methods have been developed in which a parametric assumption is made about the underlying distribution, which is then estimated directly from the symbolic data. However, such assumptions are sometimes unsuitable, and working within a non-parametric framework is preferable. We thus propose a non-parametric counterpart to these parametric methods, relying on concepts from estimating equations. This first requires defining an extension of the empirical likelihood to the symbolic context, which then allows us to derive symbolic estimating equations and to estimate summary statistics of the underlying data (e.g. mean, variance, quantiles) directly from the symbolic dataset. This is possible provided that the structure of the microdata within each symbol is well defined. Furthermore, we provide new non-parametric procedures to improve the estimation of the within-symbol structure of the microdata for interval-valued and histogram-valued data. Improvements are demonstrated through various simulation studies, and the utility of the proposed framework is illustrated on real data analyses.
  • 17:15 - 17:35 Francisco de Carvalho (Federal University of Pernambuco, Recife, Brazil). The use of adaptive distances on the dynamical clustering algorithm.
  • 17:40 - 18:20 Edwin Diday (University of Paris Dauphine). Advances in Data Science.
    Abstract. The aim of this talk is mainly to give explanatory tools for understanding and extracting knowledge from standard, complex and big data. First, we recall some basic principles of Data Science: what are complex data? What are classes and classes of complex data? Which kinds of internal class variability can be considered? Then, we define “symbolic data”, which express the within-class variability, and we give some advantages of this kind of class description. In practice, and in standard machine learning, the classes are often given. When they are not given, clustering can be used to build them with the Dynamic Clustering Method (DCM), from which piecewise Regression and Canonical Analysis, Mixture decomposition, Adaptive distances and the like can be obtained. Another way of obtaining classes is by hierarchical or, more generally, pyramidal clustering. Describing these classes yields, by aggregation, a symbolic data table. We say that the description of a class is much more explanatory when it is given by symbolic variables (closer to the natural language of the users) than by its usual analytical multidimensional description. The explanatory and characteristic power of classes can then be measured by criteria based on their symbolic data description, which provides a way of comparing clustering methods by their explanatory power. These criteria are defined in a Symbolic Data Analysis framework for categorical variables, based on three random variables defined on the ground population. Tools are then given for ranking individuals, classes and their symbolic descriptive variables from the most to the least characteristic. These characteristics are not only explanatory but can also express the concordance or discordance of a class with the other classes. Finally, an improvement of the standard TF-IDF, LDA (latent Dirichlet allocation) and the BLS likelihood of symbols are suggested, together with several directions for research and applications.
    Recent publications
    Afonso F., Diday E., Toque C. (2018). Data Science par Analyse des Données Symboliques. TECHNIP, 448 pages.
    Billard L., Diday E. (2019). Clustering Methodology for Symbolic Data. John Wiley & Sons Ltd. Print ISBN: 9780470713938, Online ISBN: 9781119010401, DOI: 10.1002/9781119010401.
    Diday E., Rong G., Saporta G., Wang H. (editors and co-authors) (2020). Advances in Data Science (Symbolic, Complex and Network Data). ISTE WILEY Science Publishing Ltd.
    Emilion R., Diday E. (2020). Likelihood in the Symbolic Context. Chapter 2 in Advances in Data Science, edited by Diday E., Rong G., Saporta G., Wang H. ISTE WILEY Science Publishing Ltd. http://www.iste.co.uk/book.php?id=1597
    Diday E. (2020). Explanatory Tools for Machine Learning in the Symbolic Data Analysis Framework. Chapter 1 in Advances in Data Science, edited by Diday E., Rong G., Saporta G., Wang H. ISTE WILEY Science Publishing Ltd. http://www.iste.co.uk/book.php?id=1597
  • 18:20 - 18:30 Clôture de la journée / Closing Session
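Several abstracts in this programme (Brito, Béranger, Diday) rely on the aggregation step of Symbolic Data Analysis, in which individual observations are grouped into classes and each class is described by interval-valued or distributional-valued variables. The following is a minimal illustrative sketch of that idea in Python, not the speakers' implementation: the function name `aggregate_to_symbolic`, the record layout and the sample store data are all hypothetical, and the choice of (min, max) intervals and relative frequencies is only one simple way to summarise a class.

```python
from collections import Counter, defaultdict


def aggregate_to_symbolic(records, class_key, numeric_vars, categorical_vars):
    """Aggregate individual records into one symbolic description per class.

    Numeric variables become intervals (min, max); categorical variables
    become relative-frequency distributions over the observed categories.
    This is an illustrative assumption, not a reference SDA implementation.
    """
    # Group the microdata by class label.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[class_key]].append(rec)

    table = {}
    for label, members in groups.items():
        description = {}
        # Interval-valued variables: keep the observed range in the class.
        for var in numeric_vars:
            values = [m[var] for m in members]
            description[var] = (min(values), max(values))
        # Distributional-valued variables: keep category proportions.
        for var in categorical_vars:
            counts = Counter(m[var] for m in members)
            total = sum(counts.values())
            description[var] = {cat: n / total for cat, n in counts.items()}
        table[label] = description
    return table


# Hypothetical microdata: one record per purchase, grouped by store.
records = [
    {"store": "A", "basket": 12.0, "payment": "card"},
    {"store": "A", "basket": 30.0, "payment": "cash"},
    {"store": "A", "basket": 18.0, "payment": "card"},
    {"store": "A", "basket": 20.0, "payment": "card"},
    {"store": "B", "basket": 5.0, "payment": "cash"},
    {"store": "B", "basket": 9.0, "payment": "cash"},
]

symbolic_table = aggregate_to_symbolic(records, "store", ["basket"], ["payment"])
for label, description in symbolic_table.items():
    print(label, description)
```

Each row of the resulting table describes a class rather than an individual, so the within-class variability (the basket range, the payment mix) stays visible instead of being collapsed into a single mean.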
