Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- Aided by machine learning, scientists find a novel antibiotic able to kill superbugs in mice
A study published Thursday in the journal Cell describes how researchers at the Massachusetts Institute of Technology used machine learning to identify a molecule that appears capable of countering some of the world’s most formidable pathogens.
- The real test of an AI machine is when it can admit to not knowing something
Mark Zuckerberg and Brussels both have ideas on AI regulation, but it’s a Cambridge statistician who has produced something intelligible.
- The New Business of AI (and How It’s Different From Traditional Software)
“At a technical level, artificial intelligence seems to be the future of software. AI is showing remarkable progress on a range of difficult computer science problems, and the job of software developers – who now work with data as much as source code – is changing fundamentally in the process.”
- AI could constantly scan the internet for data privacy violations, a quicker, easier way to enforce compliance
Novel technologies for machines to understand data privacy laws and enforce compliance with them using artificial intelligence.
- Computer Vision Basics in Microsoft Excel (using just formulas)
“Computer Vision is often seen by software developers and others as a hard field to get into. In this article, we’ll learn Computer Vision from basics using sample algorithms implemented within Microsoft Excel, using a series of one-liner Excel formulas. We’ll use a surprise trick that helps us implement and visualize algorithms like Face Detection, Hough Transform, etc., within Excel, with no dependence on any script or a third-party plugin.”
- AI algorithms intended to root out welfare fraud often end up punishing the poor instead
“As it turns out, a state review later determined that 93% of the fraud determinations were wrong.”
- Evaluating Ray: Distributed Python for Massive Scalability
Ray is an open-source system for scaling Python applications from single machines to large clusters.
- Holes in Bayesian Statistics
“Our main concern is to wrestle with the larger issues of incoherence in Bayesian data analysis.”
- Fast Differentiable Sorting and Ranking
“The sorting operation is one of the most basic and commonly used building blocks in computer programming. In machine learning, it is commonly used for robust statistics. However, seen as a function, it is piecewise linear and as a result includes many kinks at which it is non-differentiable. In this paper, we propose the first differentiable sorting and ranking operators with O(n log n) time and O(n) space complexity.”
- AI is not just another technology project
Implementing AI technology requires a different approach altogether.
- Only AI can save us from a world of fakes (a world AI is also creating)
Deepfake video and audio. AI-generated texts, poetry and lyrics. Fake sites. Fake influencers. Fake news. Will life ever be real again?
- New artificial intelligence algorithm better predicts corn yield
With some reports predicting the precision agriculture market will reach $12.9 billion by 2027, there is an increasing need to develop sophisticated data-analysis solutions that can guide management decisions in real time.
- Pitch Detection with Convolutional Networks
“I wondered if you could build neural networks to classify pitches, intervals, and chords in recorded audio. Turns out the answer is yes. To all of them.”
- Introducing Materialize: the Streaming Data Warehouse
“Today we’d like to introduce Materialize: the first Streaming Data Warehouse. It connects directly to your existing event-streaming infrastructure, and to the client, it walks and quacks like Postgres, so that familiar tooling can plug-and-play with it exactly as if they’re talking to an analytics-capable read-replica of an OLTP database.”
- Suspicious discontinuities
“If you read any personal finance forums late last year, there’s a decent chance you ran across a question from someone who was desperately trying to lose money before the end of the year. There are a number of ways someone could do this; one commonly suggested scheme was to buy put options that were expected to expire worthless, allowing the buyer to (probably) take a loss.”
- Disappearing-People – Person removal from complex backgrounds over time
Removing people from complex backgrounds in real time using TensorFlow.js in the web browser.
- We’ve Just Seen the First Use of Deepfakes in an Indian Election Campaign
While many of the popular deepfake videos are complete face swaps, a subtler version is to alter only the lip movements from an original video to match the target audio.
- Apache Flink 1.10.0 Release Announcement
“Flink 1.10 marks the completion of the Blink integration, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage.”
- ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters
“Microsoft is releasing an open-source library called DeepSpeed, which vastly advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train 100-billion-parameter models. DeepSpeed is compatible with PyTorch. One piece of that library, called ZeRO, is a new parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained.”
- HiPlot: High-dimensional interactive plots made easy
HiPlot is a lightweight interactive visualization tool to help AI researchers discover correlations and patterns in high-dimensional data.
- ML-fairness-gym: A Tool for Exploring Long-Term Impacts of Machine Learning Systems
“In order to facilitate algorithmic development with this broader context, we have released ML-fairness-gym, a set of components for building simple simulations that explore potential long-run impacts of deploying machine learning-based decision systems in social environments.”
- 10 differentiable physical simulators built with Taichi differentiable programming (DiffTaichi, ICLR 2020)
Differentiable programming in Taichi allows you to optimize neural network controllers efficiently with brute-force gradient descent, instead of using reinforcement learning.
- AutoFlip: An Open Source Framework for Intelligent Video Reframing
“We are happy to announce AutoFlip, an open source framework for intelligent video reframing.”
- These Lyrics Do Not Exist
Lyrics generated using artificial intelligence.