Marc Santolini
Long-term fellow
I am a network scientist, visiting researcher at the Network Science Institute in Boston and a research collaborator at Harvard Medical School. I am also the co-founder of Just One Giant Lab, a nonprofit initiative aimed at developing decentralized open science using smart digital tools.
Marc's Bio

I am a long-term research fellow at CRI, where I use Network Science to understand collective phenomena underlying collaborative research and learning. In my project "iGEM TIES (Team Interactions Study)", I use sensors and communication data to map the interactions among team members and build predictors of team performance and learning, focusing on the iGEM competition as a model for collaborative research.

Personal webpage: https://marcsantolini.com/


My background is highly interdisciplinary. I majored in Physics and minored in philosophy of science at ENS, Paris. I followed up with a PhD in statistical physics applied to gene regulatory networks, and during my postdoc I have been investigating biology at all scales, from network medicine (protein interactome analysis) and personalized medicine, to hospital network analysis, to the making of biology itself by studying the iGEM competition, an international student competition in synthetic biology. I have organized several satellites at various conferences (on Network Medicine at NetSci, on Hybrid Collective Intelligence at CCS) and am a co-organizer of the flagship conference in complex systems, the International Conference on Complex Systems in Cambridge, MA. More generally, I am interested in how decentralized open science can solve health-related problems at scale in an inclusive way, and in 2018 I received the Sage Bionetworks “Young Investigators Award” for my work on “Algorithms and the role of the individual”.


My current research interests explore the role of networks in various domains, from biology to social science, using large-scale datasets. Behind all my research projects stands the question of how collective phenomena emerge from elementary parts: cells from proteins, organisms from cells, teams from individuals, etc. These collective phenomena exhibit macroscopic observables, or "phenotypes". As such, understanding the disease state of an individual through the lens of the molecular protein interaction network involves a framework similar to the one used to understand the performance of a team through the lens of team member interactions. My research combines network science, for representing the data and extracting relevant features, with machine learning, data science and physics, to understand how these features relate to the phenotype in question. It usually results in a set of predictions that can be experimentally tested for validation.


The current projects I'm pursuing are varied and can be grouped under several umbrellas:


Team Science


  • Large-scale analysis of team success in the iGEM competition. In collaboration with the Barabasi Lab at Northeastern University, we use data extracted from the open wiki pages of 2,000+ teams that have participated in the International Genetically Engineered Machine (iGEM) competition over the past 10 years. From these wikis, we can extract information on the collaborative structure of each project (who did what, and when), which teams collaborated with other teams, and how teams recombined BioBricks produced by other teams to make new, innovative BioBricks. Features from this data are then associated with team success (medals, prizes, finalist status) to explore their importance in performance and success.
  • Dissecting iGEM team interactions. In this project, we work with iGEM teams to study their interactions in the lab using proximity sensors (a Bluetooth-enabled smartphone app) as well as communication data (e-mail, Slack...). This is an ongoing study with the goal of studying 20+ teams in 2019. The proximity sensors are developed in collaboration with the Matter Lab at the Child Mind Institute in NYC.


Collaborative learning


  • Problem-based learning. We leverage a dataset of 1,000 teams from a medical school that have pursued a curriculum of collaborative problem-based learning. Students interact on a Moodle platform, which allows us to reconstruct their interaction networks. Network properties can then be associated with grades to identify healthy collaborative practices in learning.
  • The social network of learners. In collaboration with Orange Labs, we also study another dataset of 400+ learners from Madagascar for which we have answers to quizzes as well as interactions through mobile phones over a period of 3 months.


Science of Science / Science Design


  • Microscopic patterns of science in the making. Here we study the editing of laboratory notebooks and the progress of knowledge at the microscopic level, the one that occurs before publication.
  • Measuring social bias in research. We analyse how laboratories around the world have created the "interactome" map of human protein-protein interactions over the past 30 years. By comparing this map with the recently obtained "ground truth" systematic map (obtained by checking random pairs of proteins for interactions), we can disentangle the importance of biology (involvement of a protein in a disease), topology (how far that protein is in the known network) and social bias (who published about that protein, and are they a collaborator?) in the evolution of knowledge about the interactome.
  • Science Design. In this project, we build recommendation systems for large open science projects. This work is done in the context of the JOGL (Just One Giant Lab) platform, with the key question being: if there are 1,000 people working on the same open science project, how can we filter information so that each person can contribute significantly without being overwhelmed by all of the project information?


Network Medicine


  • Role of non-coding RNAs in diseases. In collaboration with the Sharma Lab at Harvard Medical School, we study the impact of miRNAs and lincRNAs in the interactome to prioritize their role in diseases. We also built a disease similarity map based on shared miRNAs, which stratifies disease subtypes at high resolution.
  • Multi-layer interactome. We use information on cell-to-cell and tissue-to-tissue communication to build the multi-layer interactome representing the interactions between tissue-specific layers. Using this map, we study the spread of perturbations in the organism.
  • Personalized medicine. Using data from stressors or drug treatments at the individual level, we study how gene expression relates to the response using machine learning and biological networks.
  • Virus-host protein interactions. In collaboration with Thomas Rolland and Yves Jacob from the Pasteur Institute, we use experimental data on interactions between virus and host proteins to decipher the interactome effects that explain the differing virulence of various Ebola virus strains.
  • Hospital network mobility. In collaboration with the Dhand Lab at Northeastern University, we study the mobility of millions of patients in the hospital system in California.


Network Physiology


  • Social network modules. Based on the "disease module" hypothesis, in which genes linked to a similar disease state cluster in the same neighborhood of the interactome, we study similar effects in the context of "social" biomarkers that are associated with social properties. In particular, we look at the role of hormones and their link to social network properties, such as betweenness centrality in a team. For this, we are designing low-cost DIY solutions to measure the levels of several hormones from saliva samples, using color changes on a strip that can be recorded with a smartphone.
  • Large-scale database on emotions. In collaboration with CRI fellow Felix Schoeller, we investigate what stimuli cause strong emotions, such as chills, using large-scale data collection and analysis. To do so, we extracted a large dataset of YouTube videos for which viewers experienced strong emotions (goosebumps), as identified through analysis of the comments. Using video metadata as well as the underlying relational network, we reconstruct the various types of stimuli underlying these strong emotional experiences to build a first large-scale unsupervised investigation of what causes chills.


AI and networks


  • Building an AI network scientist. In collaboration with CRI fellow Remy Kusters, we investigate how to use neural networks to (1) learn and recognize various types of networks (Barabási-Albert, Erdős-Rényi, Watts-Strogatz...) using convolutional neural networks, and (2) learn the finer network structure that distinguishes networks of different sizes corresponding to team networks in the iGEM competition, with the goal of predicting their final performance (medals).
Biomarkers of social interactions
Mapping the link between hormones, stress and social networks.
The Paleome: a network analysis of dinosaur ecosystems
Reconstructing dinosaur networks from site collection open data to better understand ancestral ecosystems.
iGEM TIES (Team IntEraction Study): mapping social interaction networks of iGEM scientific teams
In this project, we study the collaboration patterns of iGEM teams underlying their performance and learning using data-driven social network analyses
Social bias in collective knowledge production
Studying the decades-long discovery of the protein interaction network to measure social bias in knowledge production
The AI network scientist
Classification of graphs using Neural Networks
Quantifying innovation and innovators in Science
Mapping scientific trajectories to uncover the universal patterns behind paradigms
Big data for collective emotion analysis
We leverage large-scale data from YouTube and Twitter to analyse the dynamics of collective emotions
Network seminars at CRI
This event brings together network scientists to discuss open network problems
ArchaeoEcology: a large-scale investigation of human ecosystems in deep time
This collaboration with Stefani Crabtree aims at investigating the network impact of humans in their ecosystems from archeological sites
Quantifying massive collaborations in open-source communities
We study the 7,000 most popular repositories from GitHub to quantify the universal organizational principles behind large-scale self-organized communities.
A quantitative study of interdisciplinarity and team performance in the iGEM competition
Quantifying how background and skills diversity affect team performance in iGEM teams
Mobility analysis
Analysis of mobility data for social good. Analysis of individual trajectories and global open data on human travel.
Collaborative learning from student forum and phone call data
Reconstructing student interaction networks from online and phone call data to predict grades.
Radicalization of digital communities
Use large-scale datasets from Twitter to understand the dynamics of radicalization of digital communities
Can recommender systems enhance collective intelligence for patient-led research?
We are investigating how to make it easier for patients to do research together on the questions that matter most to them
COVID-19 Symptoms Twitter Analysis
Understanding how local tweets about symptoms correlate with case numbers