Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- One Dataset, Visualized 25 Ways
“‘Let the data speak.’ It’s a common saying for chart design. The premise — strip out the bits that don’t help patterns in your data emerge — is fine, but people often misinterpret the mantra to mean that they should make a stripped down chart and let the data take it from there.” Great article on data viz!
- NanoNets: How to use Deep Learning when you have Limited Data
“One common barrier for using deep learning to solve problems is the amount of data needed to train a model. The requirement of large data arises because of the large number of parameters in the model that machines have to learn.”
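To make that parameter count concrete, here is a rough sketch (our illustration, not code from the article): even a small convolutional network, defined here in PyTorch for hypothetical 32×32 colour images, carries on the order of a million learnable parameters, each of which has to be pinned down by the training data.

```python
# Rough illustration of why data requirements grow with model size:
# even a modest CNN has roughly a million learnable parameters.
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 3*32*9 + 32 = 896 params
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32*64*9 + 64 = 18,496 params
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 256),                   # dominates the total (assumes 32x32 inputs)
    nn.ReLU(),
    nn.Linear(256, 10),
)

n_params = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {n_params:,}")      # roughly 1.07 million
```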
- What is a GPU and Why Do I Care? A Businessperson’s Guide
While 2016 was the year of the GPU for a number of reasons, the truth is that outside of a few core disciplines (deep learning, virtual reality, autonomous vehicles), the case for using GPUs in general-purpose computing remains somewhat unclear.
- AI Could Transform the Science of Counting Crowds
The Trump administration’s controversial claim that its recent presidential inauguration had “the largest audience to witness an inauguration, period,” has inadvertently highlighted the fact that counting crowds remains a painstaking and inexact science. But the rise of artificial intelligence could soon spare crowd scientists the task of manually counting heads.
- 6 areas of AI and machine learning to watch closely
“Here are six areas of AI that are particularly noteworthy in their ability to impact the future of digital products and services.”
- Unlearning descriptive statistics
“If you’ve ever used an arithmetic mean, a Pearson correlation or a standard deviation to describe a dataset, I’m writing this for you. Better numbers exist to summarize location, association and spread: numbers that are easier to interpret and that don’t act up with wonky data and outliers.”
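As a quick illustration of the kind of trade-off the post is about (the specific alternatives below — median, median absolute deviation, Spearman rank correlation — are our picks, not necessarily the post’s), here is how classical and robust summaries react to a single outlier:

```python
# How classical summaries react to one wonky value, versus robust alternatives.
import numpy as np
from scipy import stats

clean = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3])
wonky = np.append(clean, 50.0)                   # one wild measurement

print("mean:  ", clean.mean(), "->", wonky.mean())           # jumps badly
print("median:", np.median(clean), "->", np.median(wonky))   # barely moves

print("std:", clean.std(ddof=1), "->", wonky.std(ddof=1))    # explodes
print("MAD:", stats.median_abs_deviation(clean), "->",
      stats.median_abs_deviation(wonky))                     # stays put

x = np.arange(len(wonky))
print("Pearson: ", stats.pearsonr(x, wonky)[0])   # dominated by the outlier
print("Spearman:", stats.spearmanr(x, wonky)[0])  # rank-based, far less affected
```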
- Scaling Recommendation Engine: 15,000 to 130M Users in 24 Months
“Delivering users with precise product recommendations (recs) is the creative force that drives Retention Science to continue to iterate, improve and innovate. In this post, our team unveils our iteration from a minimum viable product to a production-ready solution.”
- Beringei from Facebook
Beringei is a high-performance, in-memory storage engine for time series data.
- missingno: Missing data visualization module for Python
Messy datasets? Missing values? missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset. It’s built using matplotlib, so it’s fast, and takes any pandas DataFrame input that you throw at it, so it’s flexible. Just pip install missingno to get started.
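A minimal usage sketch (the DataFrame below is made up for illustration; the missingno calls shown are the library’s standard matrix, bar and heatmap plots):

```python
# Quick look at missingness patterns in a toy DataFrame with gaps.
import numpy as np
import pandas as pd
import missingno as msno
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "age":     [34, np.nan, 52, 41, np.nan, 29],
    "income":  [52000, 61000, np.nan, np.nan, 48000, 39000],
    "segment": ["A", "B", "B", np.nan, "A", "C"],
})

msno.matrix(df)    # nullity matrix: one row per record, gaps shown as blanks
msno.bar(df)       # bar chart of non-missing counts per column
msno.heatmap(df)   # pairwise nullity correlation between columns
plt.show()
```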
- Clipper: A Low-Latency Online Prediction Serving System (paper)
“In this paper, we introduce Clipper, the first general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks.”
- Understanding Deep Learning Requires Rethinking Generalization (paper)
“Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice.”
- CommAI: Evaluating the first steps towards a useful general AI (paper)
“With machine learning successfully applied to new daunting problems almost every day, general AI starts looking like an attainable goal. However, most current research focuses instead on important but narrow applications, such as image classification or machine translation. We believe this to be largely due to the lack of objective ways to measure progress towards broad machine intelligence. In order to fill this gap, we propose here a set of concrete desiderata for general AI, together with a platform to test machines on how well they satisfy such desiderata, while keeping all further complexities to a minimum.”
- Deep learning algorithm does as well as dermatologists in identifying skin cancer
In hopes of creating better access to medical care, Stanford researchers have trained an algorithm to diagnose skin cancer.