Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- AI could help with the next pandemic—but not with this one
“The hype outstrips the reality. In fact, the narrative that has appeared in many news reports and breathless press releases, that AI is a powerful new weapon against diseases, is only partly true and risks becoming counterproductive”
- Coronavirus: We Need Better Data Hygiene
In the case of the coronavirus, the data so far are simply terrible, and it is imperative that we recognize this
- What makes forecasting hard?
Forecasting pandemics is harder than many people think
- Simulating a pandemic: some visualizations we liked
Washington Post, Melting Asphalt, and 3Blue1Brown
- AI Employed to Model Progression, Research Drugs, Therapies to Fight Coronavirus
“A team at Boston Children’s Hospital is using machine learning to scour social posts, news reports, data from public health channels and information supplied by doctors, for warning signs of how the virus is moving”
- Lung Infection Quantification of COVID-19 in CT Images with Deep Learning
In this paper, a deep learning based segmentation system is developed to automatically quantify infection regions of interest and their volumetric ratios in lungs
- A graph-based functional API for building complex scikit-learn pipelines
baikal is a graph-based, functional API for building complex machine learning pipelines of objects that implement the scikit-learn API. It is mostly inspired by the excellent Keras API for Deep Learning
- Why your brain is not a computer
For decades it has been the dominant metaphor in neuroscience. But could this idea have been leading us astray all along?
- A Gentle Introduction to tidymodels
tidymodels focuses on making all the tasks around fitting the model much easier
- A visual debugger for Jupyter
“Today, after several months of development, we are glad to announce the first public release of the Jupyter visual debugger”
- StyleGAN2 Distillation for Feed-forward Image Manipulation
- Lagrangian Neural Networks
“In this project, we introduce a complementary class of models called Lagrangian Neural Networks (LNNs)”
- AutoML-Zero
“AutoML-Zero aims to automatically discover computer programs that can solve machine learning tasks, starting from empty or random programs and using only basic math operations”
- Finding Mona Lisa in the Game of Life
Reversing the Game of Life using SAT solvers
- GIG: A Practical Method for Explaining Diverse Ensemble Machine Learning Models
Generalized Integrated Gradients (GIG) is Zest AI’s new credit assignment algorithm that overcomes the limitations of Shapley by applying the tools of measure theory
- AI learned to realistically change the time of day in the photo
- Can computers ever replace the classroom?
“With 850 million children worldwide shut out of schools, tech evangelists claim now is the time for AI education. But as the technology’s power grows, so too do the dangers that come with it”
- Ten Considerations Before You Create Another Chart About COVID-19
Do more to understand the numbers than just downloading and diving right into a dataset
- Microsoft can filter out the sound of you eating potato chips on a conference call
Microsoft has built new technology for its Teams software that can identify your voice and filter out any other sounds
- Musical Robot Learns to Sing
Has Album Dropping on Spotify
- A constructive prediction of the generalization error across scales (ICLR 2020)
“The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. We present a functional form which approximates well the generalization error in practice. We show that the form both fits the observations well across scales, and provides accurate predictions from small- to large-scale models and data”
- Microsoft’s AI determines whether statements about video clips are true
“Researchers affiliated with Carnegie Mellon, the University of California at Santa Barbara, and Microsoft’s Dynamics 365 AI Research describe a challenge that tasks AI with inferring whether a statement is entailed or contradicted by a given video clip. The idea is to spur investigations into video-and-language understanding, they say, which could enhance tools used in the enterprise for automatic meeting transcription”
- Fast subsets of large datasets with Pandas and SQLite
Let’s say you have a large amount of data, too large to fit in memory, and you want to load part of it into Pandas
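
As a rough illustration of the idea behind that last link, here is a minimal sketch rather than the linked article's own code. It assumes a CSV source, and the function names, the `visits` table, and the `city` column are all hypothetical. The data is copied into SQLite in chunks so it never has to fit in memory at once, one column is indexed, and only the matching rows are loaded into Pandas.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def write_demo_csv(csv_path, n_rows=1000):
    """Write a small stand-in for a 'too large' CSV (hypothetical data)."""
    cities = ["Leuven", "Ghent", "Antwerp", "Brussels"]
    df = pd.DataFrame({
        "id": range(n_rows),
        "city": [cities[i % len(cities)] for i in range(n_rows)],
        "value": [i * 0.5 for i in range(n_rows)],
    })
    df.to_csv(csv_path, index=False)


def csv_to_sqlite(csv_path, db_path, table, index_column, chunksize=100):
    """Copy the CSV into SQLite one chunk at a time, then index one column."""
    conn = sqlite3.connect(db_path)
    try:
        # chunksize makes read_csv yield DataFrames of at most that many rows
        for chunk in pd.read_csv(csv_path, chunksize=chunksize):
            chunk.to_sql(table, conn, if_exists="append", index=False)
        # The index lets SQLite find matching rows without a full table scan
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS idx_{table}_{index_column} "
            f"ON {table} ({index_column})"
        )
        conn.commit()
    finally:
        conn.close()


def load_subset(db_path, table, column, value):
    """Load only the rows where `column` equals `value` into a DataFrame."""
    conn = sqlite3.connect(db_path)
    try:
        # Table and column names are trusted here; the value is parameterized
        query = f"SELECT * FROM {table} WHERE {column} = ?"
        return pd.read_sql_query(query, conn, params=(value,))
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "demo.csv")
        db_path = os.path.join(tmp, "demo.sqlite")
        write_demo_csv(csv_path)
        csv_to_sqlite(csv_path, db_path, "visits", "city")
        print(load_subset(db_path, "visits", "city", "Leuven").head())
```

The one-time cost is the copy into SQLite. After that, each subset query reads only the indexed rows it needs, which is the trade-off that makes this approach attractive when the same large file is sliced repeatedly.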