Curriculum Vitae

David Adrián Cañones Castellano


Summary

I am a Data Scientist with more than five years of experience helping companies and institutions solve complex problems using data.

I have successfully completed projects ranging from predictive modeling to data pipeline design for both enterprises and startups.

I have extensive experience with the Python Data Science toolkit (pandas, scikit-learn, TensorFlow, Keras, PySpark, etc.) and work with large amounts of data (gigabytes to terabytes) on a daily basis, so I am also experienced with Big Data and distributed computing tools (Hadoop ecosystem: Spark, Hive, Impala, etc.) as well as parallel computing (GPU-accelerated computing).

In 2019 I started my own company with two different business lines.


Selected Experience

Director of Data Science & Partner, WhiteBoxᴹᴸ (10/2019 - Today):

Tasks:

  • Leading Data Science practice in the company
  • Advising our clients in AI adoption processes
  • Discovering and testing new algorithms and libraries and integrating them into our company stack

Achievements:

  • Helped dozens of companies, from enterprises to mid-sized businesses and startups, gain a competitive advantage using data science techniques
  • Overcame technical and cultural challenges within our clients' organizations
  • Built a company that was profitable from the beginning and made a dent in a market dominated by big players
  • Developed a new consulting approach based on transparency and high quality standards

Partner, TheGurus (1/2019 - Today):

Tasks:

  • Evaluating investment opportunities
  • Developing AI solutions for our invested companies

Portfolio:

  • Moderator Guru: an AI-based text content moderation solution
  • El Tren Barato: a web-scraping-based search engine for high-speed
    trains in Spain, along with an alarm system and a price forecasting tool

Lead Teacher of Data & Analytics, Ironhack (7/2019 - Today):

Data & Analytics Lead Teacher at one of the best Data Science bootcamps worldwide, combining theory and a technical stack to train the next generation of Data Scientists. The most important asset I offer my students is my connection to and experience with real-world projects, as I spend no more than 25 percent of my time in academia.

Senior Data Scientist, Pragsis Bidoop (7/2018 - 10/2019):

Tasks:

  • Development of Machine Learning models using traditional (scikit-learn, XGBoost, LightGBM) and Deep Learning (TensorFlow, Keras) frameworks
  • Scaling Machine Learning models from prototyping to production using distributed and parallel computing (Spark, Dask, Celery)
  • Orchestrating data pipelines using Apache Airflow
  • Leveraging the Python Data Science toolkit to extract valuable information from data and communicate it through visualizations and reports, using pandas, NumPy, Matplotlib, Plotly, Bokeh, and Seaborn, among other tools
  • Development of Computer Vision solutions using Google Edge TPUs, NVIDIA GPUs, TensorFlow, and OpenCV

Achievements:

  • Reduced power production forecasting error by 10 percentage points for a cluster of 13 wind farms (about 1 GW total managed power) located in Washington, USA, resulting in significant savings for our client
  • Developed a Reinforcement Learning algorithm for the Amazon Web Services DeepRacer League and was a member of the winning team, which took the Gold, Silver, and Bronze positions
  • Developed a live people-tracking system using Computer Vision techniques, optimized for low-power hardware (Google Edge TPU, Raspberry Pi)

Data Scientist, Kernel Analytics (10/2017 - 7/2018):

Tasks:

  • Development of Machine Learning models using traditional (scikit-learn) and Big Data (Spark MLlib) frameworks
  • Designing custom KPIs based on customers' needs and data availability
  • Designing data pipelines able to ingest data from heterogeneous inputs into Hadoop Distributed File System (HDFS)
  • Orchestrating data pipeline executions using Apache Airflow
  • Extraction of insights from customer data and creation of meaningful visualizations using Matplotlib, ggplot2,
    Seaborn, Plotly, and Bokeh
  • Creating interactive dashboards using Plotly Dash and Microsoft Power BI

Achievements:

  • Developed a Customer Experience Management framework for a successful Mobile Operator. Designed the pipeline, from ingesting 3G/4G antenna data to building a model relating Customer Experience to Churn and Complaints, enabling our client to monitor the impact of its mobile network infrastructure on Customer Experience
  • Developed a predictive model for a well-known Mobile Operator that predicts user complaints from consumption patterns and personal profiles, enabling our client to automate part of its support process

Data Scientist, Grupo Servinform (2/2015 - 10/2017):

Tasks:

  • Designing data pipelines able to automatically clean and ingest data from heterogeneous inputs into relational databases using pandas and NumPy
  • Developing Natural Language Processing models using NLTK
  • Developing a data product (web app) to make data exploration easier for pharmaceutical and healthcare researchers, allowing them to establish complex relationships between data from different sources
  • Identifying potential automation opportunities internally and for our clients and validating technical feasibility

Achievements:

  • Developed a Natural Language Processing model able to interpret user queries written in natural language and translate them into database queries
  • Developed a framework able to automate parts of manual back-office processes and integrate seamlessly with human workers, resulting in a new business line for our company and the ability to tackle projects that would otherwise have been discarded
  • Collaborated with my manager on a proposal for the H2020 R&D program, with an excellent result and public research funds granted to our company

Education

EOI Business School (9/2014 - 9/2015):

MBA, Corporate Finance

Universidad de Sevilla (9/2007 - 9/2014):

MS Industrial Engineering, Energy


Honors & Awards

AWS DeepRacer League Madrid, 3rd position (5/2019)

Machine Learning competition organized by AWS, which consisted of developing a Reinforcement Learning model for an autonomous car. I finished 3rd in the Spanish competition and was a member of the team that took the top three positions (Gold, Silver, and Bronze).


Courses & Certifications


Technical Stack

Machine Learning: scikit-learn, XGBoost, LightGBM, H2O, MLlib
Deep Learning: TensorFlow, Keras, PyTorch
Computer Vision: OpenCV
NLP: NLTK, spaCy, fastText
In-Memory: pandas, NumPy, Apache Arrow
Relational Databases: PostgreSQL, Oracle
Big Data: Apache Spark
Cloud Computing: AWS
Orchestration: Apache Airflow
Languages: Python, R, SQL
Visualization: Matplotlib, Seaborn, Plotly, Power BI