Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Algorithms Designed to Fight Poverty Can Actually Make It Worse
How algorithms designed to alleviate poverty can perpetuate it instead.
- Artificial Intelligence Hits the Barrier of Meaning
Machine learning algorithms don’t yet understand things the way humans do, with sometimes disastrous consequences.
- On Hold for 45 Minutes? It Might Be Your Secret Customer Score
Retailers, wireless carriers and others crunch data to determine what shoppers are worth for the long term, and how well to treat them.
- This Thermometer Tells Your Temperature, Then Tells Firms Where to Advertise
Kinsa says its smart thermometers are in more than 500,000 American households.
- A controversial artwork created by AI has hauled in $435,000 at auction
“A portrait created using an AI program has fetched $435,000 in auction at Christie’s, blowing the expected price of $7,000 to $10,000 out of the water.”
- Develop and control: Xi Jinping urges China to use artificial intelligence in race for tech future
Beijing must ensure firm grasp of area’s core technologies, Chinese leader says.
- China can apparently now identify citizens based on the way they walk
The “gait recognition” technology is already being used by police in Beijing and Shanghai where it can identify individuals even when their face is obscured or their back is turned.
- Baidu’s Chinese-to-English translator finishes your sentence for you
Baidu has developed a new technique for smoother real-time machine translation. Also see demo movies here.
- World’s first AI news anchor unveiled in China
The ‘tireless’ artificial news readers simulate the voice, facial movements, and gestures of real-life broadcasters.
- I’m an Amazon Employee. My Company Shouldn’t Sell Facial Recognition Tech to Police.
Amazon’s ‘Rekognition’ program shouldn’t be used as a tool for mass surveillance.
- These new tricks can outsmart deepfake videos
… for now.
- PDC 2 – Walking Through the Official Document on China’s Social Credit System
“If you follow news on politics or China, you probably know the existence of Social Credit system in China. But all the Western media are too fixated on the citizen score and surveillance part. Let me walk you through the real Social Credit system.”
- Scaling Machine Learning at Uber with Michelangelo
“In this article, we reflect on the evolution of ML at Uber from the platform perspective over the last three years.”
- Uber’s Big Data Platform: 100+ Petabytes with Minute Latency
“In this article, we dive into Uber’s Hadoop platform journey and discuss what we are building next to expand this rich and complex ecosystem.”
- Importance of Skepticism in Data Science
An online notebook discussing the importance of being skeptical of data science results, with examples.
- Building a fly brain in a computer
Researchers from two different CIFAR programs collaborate to show fruit flies can do more than previously thought possible.
- Can Optical Illusions fool Artificial Intelligence too?
“The software predictions are matching human experience.”
- Reinforcement Learning with Prediction-Based Rewards
Impressive results from OpenAI: “We’ve developed Random Network Distillation (RND), a prediction-based method for encouraging reinforcement learning agents to explore their environments through curiosity, which for the first time exceeds average human performance on Montezuma’s Revenge.”
- Spinning Up in Deep RL
More from OpenAI: “We’re releasing Spinning Up in Deep RL, an educational resource designed to let anyone learn to become a skilled practitioner in deep reinforcement learning. Spinning Up consists of crystal-clear examples of RL code, educational exercises, documentation, and tutorials.”
- My secret sauce to be in top 2% of a kaggle competition
“I developed some standard ways to explore features and build better machine learning models. These simple, but powerful techniques helped me get a top 2% rank in Instacart Market Basket Analysis competition.”
- Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform
“Horizon is an end-to-end platform designed to solve industry applied RL problems where datasets are large (millions to billions of observations), the feedback loop is slow (vs. a simulator), and experiments must be done with care because they don’t run in a simulator.” An interesting take on the normal RL environments!
- Generating custom photo-realistic faces using AI
Controlled image synthesis and editing using a novel TL-GAN model.
- Zero-shot learning: Using text to more accurately identify images
Researchers at Facebook have developed a new, more accurate ZSL model that uses neural net architectures called generative adversarial networks (GANs) to read and analyze text articles, and then visually identify the objects they describe. This novel approach to ZSL allows machines to classify objects based on category, and then use that information to identify other similar objects, as opposed to learning each object individually, as other models do.
- Extended Isolation Forest for Anomaly Detection
“This is a simple package implementation for the Extended Isolation Forest method. It is an improvement on the original algorithm Isolation Forest. The original algorithm suffers from an inconsistency in producing anomaly scores due to slicing operations. Even though the slicing hyperplanes are selected at random, they are always parallel to the coordinate reference frame. The shortcoming can be seen in score maps as presented in the example notebooks in this repository. In order to improve the situation, we propose an extension which allows the hyperplanes to be taken at random angles. The way in which this is done gives rise to multiple levels of extension depending on the dimensionality of the problem. For an N dimensional dataset, Extended Isolation Forest has N levels of extension, with 0 being identical to the case of standard Isolation Forest, and N-1 being the fully extended version.”
- A (Long) Peek into Reinforcement Learning
“In this post, we are gonna briefly go over the field of Reinforcement Learning (RL), from fundamental concepts to classic algorithms. Hopefully, this review is helpful enough so that newbies would not get lost in specialized terms and jargons while starting.”
- Distributed Filesystems for Deep Learning
More training data gives predictable gains in prediction accuracy.
- Basilica: Extract usable features from high-dimensional data
“Basilica is an API that embeds high-dimensional data like images and text. You send us e.g. an image, and we send you back a vector of floats.”
- Raster Vision: A New Open Source Framework for Deep Learning on Satellite and Aerial Imagery
Azavea announces the release of Raster Vision, a new open source framework for deep learning on satellite and aerial imagery.
- Improving AI language understanding by combining multiple word representations
“A novel approach to using word embeddings (where words or phrases are mapped to sequences of numbers that represent their meaning) for natural language processing (NLP) that dynamically selects the right types of embeddings for the task at hand. These dynamic meta-embeddings outperform similar models that use a single type of word embedding. Facebook’s AI researchers have open-sourced this code.”
- Drilling Down on Depth Sensing and Deep Learning
“This post explores two independent innovations and the potential for combining them in robotics. Two years before the AlexNet results on ImageNet were released in 2012, Microsoft rolled out the Kinect for the X-Box. This class of low-cost depth sensors emerged just as Deep Learning boosted Artificial Intelligence by accelerating performance of hyper-parametric function approximators leading to surprising advances in image classification, speech recognition, and language translation. Today, Deep Learning is also showing promise for end-to-end learning of playing video games and performing robotic manipulation tasks.”
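
For readers curious about the Extended Isolation Forest entry above, here is a minimal pure-Python sketch of the idea it describes: splitting hyperplanes taken at random angles, with an extension level between 0 (axis-parallel, as in standard Isolation Forest) and N-1 (fully extended). This is not the package's own code. The function names `random_split`, `goes_left` and `partition` are hypothetical, and drawing the normal vector from a Gaussian and the intercept from the bounding box of the data are assumptions made for illustration.

```python
import random


def random_split(points, extension_level, rng):
    """Pick one random splitting hyperplane for the points at a tree node.

    Hypothetical helper: returns (normal, intercept) as lists of floats.
    """
    dim = len(points[0])
    if not 0 <= extension_level <= dim - 1:
        raise ValueError("extension_level must be between 0 and dim - 1")

    # Assumption: each coordinate of the normal vector is drawn from N(0, 1).
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Zero out coordinates so that only extension_level + 1 stay non-zero.
    # Level 0 leaves a single non-zero coordinate, i.e. an axis-parallel cut.
    for i in rng.sample(range(dim), dim - extension_level - 1):
        normal[i] = 0.0

    # Assumption: the intercept is uniform inside the data's bounding box.
    intercept = [
        rng.uniform(min(p[i] for p in points), max(p[i] for p in points))
        for i in range(dim)
    ]
    return normal, intercept


def goes_left(point, normal, intercept):
    """A point goes left when (point - intercept) . normal <= 0."""
    return sum((x - p) * n for x, p, n in zip(point, intercept, normal)) <= 0.0


def partition(points, normal, intercept):
    """Split the points into the two sides of the hyperplane."""
    left = [p for p in points if goes_left(p, normal, intercept)]
    right = [p for p in points if not goes_left(p, normal, intercept)]
    return left, right


# Usage: one split of a small 3-dimensional dataset at extension level 2.
rng = random.Random(0)
data = [(0.0, 0.0, 1.0), (1.0, 2.0, 0.5), (3.0, 1.0, 2.0), (2.0, 4.0, 0.0)]
normal, intercept = random_split(data, extension_level=2, rng=rng)
left, right = partition(data, normal, intercept)
```

A full forest would apply such splits recursively on random subsamples and score each point by its average path length; that part is left out to keep the sketch short.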