Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches
“We considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques.”
- AI can read your emotions. Should it?
Advertisers, tech giants and border forces are using face tracking software to monitor our moods – whether we like it or not
- The fight against deepfakes
Earlier this month, Rep. Adam Schiff, chairman of the House Intelligence Committee, expressed concern that Google, Facebook, and Twitter have no clear plan to deal with the problem.
- AI Algorithms Need FDA-Style Drug Trials
Opinion: Algorithms cause permanent side effects on society. They need clinical tests.
- I wasn’t getting hired as a Data Scientist. So I sought data on who is
Instead of focusing on skills thought to be required of data scientists, we can look at what they have actually done before
- Having mastered Space Invaders, chess, and Go, AI tackles video soccer
Google’s artificial-intelligence researchers have created a football simulator for training the next generation of machine-learning algorithms.
- Seeing how computers ‘think’ helps humans stump machines and reveals AI weaknesses
Researchers have figured out how to reliably create questions that challenge computers and reflect the complexity of human language through a human-computer collaboration, developing a dataset of more than 1,200 questions that, while easy for people to answer, stump the best computer answering systems today.
- Nvidia’s latest breakthroughs in conversational AI
Trains BERT in under an hour, launches Project Megatron to train transformer based models at scale.
- Exploring DNA with Deep Learning
Neural networks are changing the way that Lex Flagel studies DNA.
- This AI startup claims to automate app making but actually just uses humans
Who could have seen that coming?
- Building a State of the Art Bacterial Classifier with Paperspace Gradient and Fast.ai
“In this post, we’re going to be demonstrating how to build a state of the art Bacterial Classification model on Gradient using the Fast.ai machine learning library.”
- Nvidia CEO calls AI ‘the single most powerful force’ as earnings beat expectations
While gaming remains Nvidia’s primary source of growth, the company’s artificial intelligence business is growing rapidly.
- AI is here to stay. Now we need to ensure everyone benefits
How to raise the potential of AI?
- AI is in danger of becoming too male – new research
“A review published by the AI Now Institute earlier this year, showed that less than 20% of the researchers applying to prestigious AI conferences are women, and that only a quarter of undergraduates studying AI at Stanford and the University of California at Berkeley are female.”
- When students get stuck, Socratic can help
“Socratic, a mobile learning app we acquired last year, now uses AI technology to help high school and university students when they’re doing school work outside the classroom.”
- Model-agnostic feature importance through ablation
Feature ablation is a technique for calculating feature importances that works for all machine learning models. (Interview question: which feature importance technique is also model agnostic and works better?)
- Reverse Engineering the Big Mac™ Index
“Although the BMI indicates which countries are ‘expensive’ and which are ‘cheap’, it doesn’t explain why. Economists have written extensively on the why, for example this paper specifically about the BMI. Partly for fun, and partly because economists’ explanations are not entirely satisfying, I decided to attempt to reverse engineer the BMI.”
- Models for integrating data science teams within organizations
“In this post, I compare some of the popular models of integrating data science teams within organizations.”
- A new tool uses AI to spot text written by AI
We’ve come full circle.
- Machine Learning and Data Science Applications in Industry
Looking for inspiration on where and how data science is being applied? This curated list has it all!
- An Interactive, Automated 3D Reconstruction of a Fly Brain
Google visualizes a fly brain using cloud computing and neural networks.
- Listening to the neural network gradient norms during training
A somewhat strange idea, and not the most pleasant to listen to, but it points to interesting alternative ways of following model training.
- A Gentle Introduction to the Progressive Growing GAN
Progressive Growing GAN is an extension to the GAN training process that allows for the stable training of generator models that can output large high-quality images.
- When BERT meets Pytorch
A walkthrough of using BERT with PyTorch for a multilabel classification use case.
- How To Get A Data Science Hiring Manager To Take You Seriously
Some good advice.
- MineRL: Towards AI in Minecraft
“We want to solve Minecraft using state-of-the-art Machine Learning! To do so, we have created one of the largest imitation learning datasets with over 60 million frames of recorded human player data”
- Python vs Rust for Neural Networks
“I’ll definitely reach for rust in the future when I need to write optimized low-level code with minimal dependencies. However using it as a full replacement for python or C++ will require a more stabilized and well-developed ecosystem of packages.”
- spaCy meets PyTorch-Transformers: Fine-tune BERT, XLNet and GPT-2
“In this post we introduce our new wrapping library, spacy-pytorch-transformers. It features consistent and easy-to-use interfaces to several models, which can extract features to power your NLP pipelines.”
- ‘Anonymised’ data can never be totally anonymous, says study
Findings say it is impossible for researchers to fully protect real identities in datasets
- L2 Regularization and Batch Norm
“In particular, when used together with batch normalization in a convolutional neural net with typical architectures, an L2 objective penalty no longer has its original regularizing effect. Instead it becomes essentially equivalent to an adaptive adjustment of the learning rate!”
- King – Man + Woman = King?
Some of the best known examples used to explain the power of prominent Natural Language Processing tools (like Word2Vec) only seem to work with some cheating.
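
One commonly described form of this "cheating" is that analogy queries leave the input words out of the candidate answers. The sketch below illustrates that effect under stated assumptions: the 3-dimensional vectors are hypothetical and hand-picked (not taken from Word2Vec or any trained model), and the `VECTORS` table and `analogy` helper are names invented for this illustration.

```python
import math

# Hypothetical toy embeddings (royalty, gender, everydayness).
# Hand-picked for illustration only; not from a trained model.
VECTORS = {
    "king":  (1.0, 0.5, 0.2),
    "queen": (1.0, -0.5, 0.2),
    "man":   (0.2, 0.2, 1.0),
    "woman": (0.2, -0.2, 1.0),
    "apple": (0.0, 0.0, 0.5),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, exclude_inputs):
    """Return the word nearest to vec(a) - vec(b) + vec(c).

    When exclude_inputs is True, the three query words are removed
    from the candidates before the nearest neighbour is chosen.
    """
    target = tuple(
        x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    skipped = {a, b, c} if exclude_inputs else set()
    candidates = [word for word in VECTORS if word not in skipped]
    return max(candidates, key=lambda word: cosine(VECTORS[word], target))


raw = analogy("king", "man", "woman", exclude_inputs=False)
filtered = analogy("king", "man", "woman", exclude_inputs=True)
print(raw, filtered)
```

With these toy vectors, the unfiltered query returns "king" and only the filtered query returns "queen". Real embeddings may behave differently, so this is only an illustration of the mechanism.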