Every two weeks, we find the most interesting data science links from around the web and collect them in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to get up to speed on interesting resources.
- 98 personal data points that Facebook uses to target ads to you
“Say you’re scrolling through your Facebook Newsfeed and you encounter an ad so eerily well-suited, it seems someone has possibly read your brain. Whatever the subject, you’ve seen ads like this. You’ve wondered — maybe worried — how they found their way to you.”
- Interactive Data Visualization of Geospatial Data using D3.js, DC.js, Leaflet.js and Python
“The goal of this tutorial is to introduce the steps for building an interactive visualization of geospatial data. We will cover a wide range of technologies in this tutorial: Pandas for cleaning the data, Flask for building the server, Javascript libraries d3.js, dc.js and crossfilter.js for building the charts and Leaflet.js for building the map.”
- Introducing Time Series Analysis with dplyr
Interesting blog post again emphasizing the greatness of dplyr and friends (lubridate in this case).
- CrimeRadar is using machine learning to predict crime in Rio
The software carves the city into sectors of 250 square metres and predicts crimes based on time and place.
- Was there a problem with the Rio pool?
Barry Revzin seems to have noticed something very interesting about the pool in Rio. Fascinating article!
- What’s Next for Artificial Intelligence
The best minds in the business—Yann LeCun of Facebook, Luke Nosek of the Founders Fund, Nick Bostrom of Oxford University and Andrew Ng of Baidu—on what life will look like in the age of the machines.
- Forget Python vs. R: how they can work together
“A few weeks ago I had the opportunity to speak at SciPy about how we use both Python and R at Civis. Why go all the way to a Python conference to talk about R? Was I fanning the flames of yet another Python vs R language war? No! We happily work in both languages—not only for our daily work solving data science problems, but also in writing tools.”
- AI’s Language Problem
Machines that truly understand language would be incredibly useful. But we don’t know how to build them.
- A Deep Learning Approach to Fraud
Short blog post on the potential of deep learning in fraud analytics.
- Goods: organizing Google’s datasets
Great point: “You can (try and) build a data cathedral. Or you can build a data bazaar. By data cathedral I’m referring to a centralised Enterprise Data Management solution that everyone in the company buys into and pays homage to, making a pilgrimage to the EDM every time they want to publish or retrieve a dataset. A data bazaar on the other hand abandons premeditated centralised control.”
- Seven ways to be data-driven off a cliff
Quick fun post on how data can “drive you off a cliff.”
- Rodeo for Windows is here!
From yhat: “Earlier this summer we released version 2.0 of our Python IDE, Rodeo, for Mac & Linux. After lots of code rewriting and TLC, we are excited to finally officially support Windows with version 2.1.”
- Deep Deterministic Policy Gradients in TensorFlow
Google DeepMind has devised a solid algorithm for tackling the continuous action space problem. Building off the prior work on Deterministic Policy Gradients, they have produced a policy-gradient actor-critic algorithm called Deep Deterministic Policy Gradients (DDPG) that is off-policy and model-free, and that uses some of the deep learning tricks that were introduced along with Deep Q-Networks.
- Machine Learning Exercises In Python
This post is part of a series covering the exercises from Andrew Ng’s machine learning class on Coursera.
- Apple acquires Turi
Machine learning and artificial intelligence startup Turi has been acquired by Apple in a deal characterized as a blockbuster exit for the Seattle-based company, formerly known as Dato and GraphLab, GeekWire has learned.
- Facebook V: Predicting Check Ins
Fellow Belgian Tom Van de Wiele won the Kaggle competition, congrats!
- Gaussian Processes for Dummies
A very basic intro to Gaussian Processes.
- Building a Data Pipeline with Airflow
“In this blog post I’ll set up a data pipeline that takes currency exchange rates, stores them in PostgreSQL and then caches the latest exchange rates in Redis.”
- Using Machine Learning for Network Intrusion Detection (presentation PDF)
An older presentation (from 2010) that already hints at the possibilities of machine learning for intrusion detection.