eRum 2018

May 14-16
Budapest, Hungary

Stay informed!

Not on Twitter, or want to get notified in a more traditional way? No problem! Simply sign up for the low-volume eRum 2018 mailing list.

The European R Users Meeting, eRum, is an international conference that aims at integrating users of the R language living in Europe. The useR! conference series serves similar goals, but because it alternates between Europe and the USA (and more recently Australia in 2018), we decided to start another conference series for the years when useR! is held outside Europe.

The first eRum conference was held in 2016 in Poznan, Poland with around 250 attendees and 20 sessions spanning 3 days, including more than 80 speakers. Around that time, we also held another, shorter conference in Budapest: the first satRday event, with 25 speakers and almost 200 attendees from 19 countries. The eRum 2018 conference brings together the heritage of these two successful events: we are planning for 400-500 attendees from all around Europe at this 1+2-day international R conference.

The local Organizing Committee is led by Gergely Daróczi, who also chaired the Budapest satRday event. Just like the satRday series, eRum is a nonprofit conference driven by enthusiasm for open source, R and the related community, so we make no financial gains and do not get paid at all for working on this event. If you want to get in touch, please feel free to e-mail us.

The Program Committee (including members from the local R User Group, eRum 2016, R Forwards, R-Ladies and other European R User Groups) guarantees to bring you a fantastic lineup of speakers and a quality program:

  • Adolfo Alvarez, Poland
  • Ágnes Salánki, Hungary
  • Andrew Lowe, Hungary
  • Bence Arató, Hungary
  • Branko Kovač, Serbia
  • Eszter Windhager-Pokol, Hungary
  • Gergely Daróczi, Hungary
  • Heather Turner, UK
  • Kevin O'Brien, Ireland
  • Imre Kocsis, Hungary
  • László Gönczy, Hungary
  • Maciej Beresewicz, Poland
  • Mariachiara Fortuna, Italy
  • Przemyslaw Biecek, Poland
  • Szilárd Pafka, USA

We try to make this nonprofit event as affordable as possible for attendees while keeping the quality of the venue, catering and program extremely high (eg look at our venues and expect exciting keynotes and invited talks), so we rely heavily on our generous sponsors to make this event happen. We have already received a good number of great offers, but sponsorship opportunities are still available (starting from $1,000) -- if interested, please get in touch!

We thank all our generous sponsors for supporting this conference -- their financial help and great commitment to the R community are highly appreciated and were essential to bring this event to life! Please find below the list of our partners per sponsorship level; we kindly ask you to visit their homepages to get some quick insights into what they are doing and how they use R:

Platinum

Gold

Silver

Bronze

Please find below the most important milestones of the conference based on the preliminary schedule:

Event | Date
Workshop Ideas Submission Deadline | Jan 14, 2018
Abstract Submissions Deadline | Feb 25, 2018
Notification of Acceptance | March 14, 2018
Early-Bird Registration Deadline | March 15, 2018
Registration Deadline | April 29, 2018
Workshops | May 14, 2018
Conference | May 15-16, 2018

To minimize the financial barriers to attending this nonprofit conference, and thanks to the generous contributions of our sponsors, we decided to keep the registration fees as low as possible, so that they are affordable even for students and others paying for the registration on their own:

Registration | Student | Academic | Industry
Early bird registration (between Jan 1 and Mar 15) | 15,000 HUF (~50 EUR) | 30,000 HUF (~100 EUR) | 60,000 HUF (~200 EUR)
Standard registration (between Mar 16 and Apr 29) | 25,000 HUF (~80 EUR) | 45,000 HUF (~150 EUR) | 85,000 HUF (~275 EUR)
Late and on-site registration | Not available.

We also offer "Supporter" level tickets for around 30% over the "Industry" ticket prices, which let you express your support for this event and the R community by making it possible for others without the required financial means to attend. We suggest this option for freelancers and smaller companies without the budget to become an official sponsor of the conference. Note that "Supporter" level tickets provide exactly the same features as any other ticket; the only extra is a special-colored badge to highlight your generous contribution.

Not sure which conference ticket to buy?

  • We decided to keep the student ticket fees as low as possible (not even covering our catering and related expenses -- so heavily relying on our sponsors' generous contributions) and affordable for students even without any salary. Pick this option if you are a full-time student without an income that would let you pick a more expensive ticket type
  • R users working in academia (eg higher education or research institutes etc) should pick the Academic option, which is still very reasonably priced and planned to be affordable for most European researchers, professors and PhD candidates -- hopefully covered by your institute
  • Everyone else with a full-time and paid job (or equivalent, eg freelancing) should pick the Industry option (which is still pretty affordable compared to past years' useR! and other conference ticket prices) -- hopefully sponsored or reimbursed by your employer
Buy Conference Ticket(s)

Registering for the event and purchasing a ticket entitles you to attend two half-day workshops or one full-day workshop on May 14 and all conference talks on May 15-16 -- including coffee breaks and lunch on all 3 days, with no hidden costs. The 27% Hungarian VAT is included, and we provide electronic invoices for all purchases within a few weeks after the payment has cleared. You can pay by PayPal (including easy payment options with credit/debit card and wire transfer), but please get in touch if you need any special assistance with the payment, invoice etc.

Why wait any longer? Register for the event today -- the number of available spots is limited!

Why should you consider giving a talk?

  • there's a fantastic R community in Hungary and an expected 400-500 R users from all around Europe looking forward to attending your talk or poster,
  • get quick feedback on your proposal -- send an e-mail to the Program Committee with your questions and we are happy to help you improve your abstract(s) and will also notify everyone before the end of March so that you can plan ahead,
  • we might provide financial support to reimburse registration and/or travel & accommodation expenses,
  • and most importantly: this is a pretty unique opportunity to give a talk focusing on R to a larger crowd in Europe in 2018 -- especially in a club under a pool :)

Please feel free to submit one or more proposal(s) in English at the URL below, in any of the following presentation formats:

  • Workshop (3-6 hours): Tutorial for 10-50 (or more) persons on a beginner or advanced R topic
  • Regular talk: Accepted talks will be given during oral sessions. Each talk is allowed 20 minutes, including questions and answers.
  • Lightning talk (5min): A variation of the pecha kucha and ignite formats that we love at the useR! conferences: 15 slides shown for 20 seconds each.
  • Poster: Accepted posters will be presented during an afternoon poster session, which is a social event. No parallel talks or events are scheduled, so everyone can stop by and discuss the posters they are interested in. Each poster should not exceed 4' x 4' or 120cm x 120cm.
Abstract Submission Form

The already confirmed keynote and invited speakers have been announced! We also received a very good number of high quality submissions on half-day and full-day workshop ideas, so a very promising tutorial day is to be expected! The Call for Papers for regular talks and posters is still open, so if you are interested in giving a talk, head to our CFP section. Also, make sure to follow us to get notified about the most recent news as soon as possible!

We are extremely happy to announce that five fantastic keynote speakers have confirmed their attendance at the conference:

Achim Zeileis
Professor of Statistics
at Universität Innsbruck (AT)

Being an R user since version 0.64.0, Achim is co-author of a variety of CRAN packages such as zoo, colorspace, party(kit), sandwich, or exams.

He is a Professor of Statistics at the Faculty of Economics and Statistics at Universität Innsbruck.

In the R community he is active as an ordinary member of the R Foundation, co-creator of the useR! conference series, and co-editor-in-chief of the open-access Journal of Statistical Software.

Martin Mächler
Senior Scientist in Statistics
at ETH Zurich (CH)

Martin is a Mathematician (Ph.D. ETH Z) and Statistician, Lecturer and Senior Scientist at Seminar für Statistik, ETH Zurich, R Core member, Secretary General of the R Foundation.

Authored more than 20 R packages (such as Matrix, cluster, robustbase, cobs, VLMC, bitops or copula).

Emacs ESS Core Developer since 1997 and Project Leader since 2004, author of several books and over 50 scientific journal articles.

Nathalie Villa-Vialaneix
Researcher
at INRA (FR)

Nathalie is a researcher at the French National Institute for Agronomical Research (INRA) in the Unit of Applied Mathematics and Computer Sciences in Toulouse.

She is the maintainer of the SOMbrero, SISIR and RNAseqNet R packages and author of a number of others.

She received her PhD in Mathematics from the University Toulouse 2 (Le Mirail), in 2005. She is a board member of the biostatistics platform in Toulouse and a former board member of the French Statistical Association (SFdS).

Stefano Maria Iacus
Professor of Statistics
at University of Milan (IT)

Stefano is a full professor in Statistics, former R Core Team member (1999-2014) and maintainer of several R packages (eg sde, cem, rrp and opefimor).

Founder and president of Voices from the Blogs, running sentiment analysis and text mining projects.

Author of several scientific books, book chapters and journal articles.

Roger Bivand
Professor
at Norwegian School of Economics (NO)

Roger received his PhD in geography from the London School of Economics and post-doctoral degree from Adam Mickiewicz University.

His current research interests are in developing open source software for analysing spatial data. He has been active in the R community since 1997.

He is an auditor of the R Foundation, editor of the Journal of Statistical Software, Journal of Geographical Systems, Geographical Analysis and Norsk Geografisk Tidsskrift; and Editor-in-Chief of the R Journal.

We are also very excited to share the news on our invited speakers who already confirmed their attendance:

Arthur Charpentier
Professor
at Univ. de Rennes (FR)

Professor at the Faculty of Economics at Université de Rennes, in France.

Editor of 'Computational Actuarial Science with R' (CRC Press, 2014) and of the blog https://freakonometrics.hypotheses.org/.

Barbara Borges Ribeiro
Software Engineer
at RStudio (US)

Barbara is a software engineer at RStudio working primarily on the Shiny package.

She holds a double major in Statistics and Computer Science from Macalester College.

After four freezing Minnesota winters, she is back in her warm homeland of Portugal (but to the disappointment of many, she’s not a soccer fan).

Claudia Vitolo
Scientist
at ECMWF (UK)

I'm an Italian scientist working for the European Centre for Medium-range Weather Forecasts (ECMWF) on forecasting natural hazards.

In the past, I worked on air pollution modelling and the development of web services for environmental monitoring. I'm passionate about reproducible research and open source projects, and addicted to the R statistical language.

I'm a member of the R-Ladies leadership team, a world-wide organization to promote gender diversity in the R community.

Colin Gillespie
Senior Lecturer
at Newcastle University (UK)

Colin Gillespie is Senior lecturer (Associate professor) at Newcastle University, UK.

He has been running R courses (www.jumpingrivers.com) for over eight years at a variety of levels, ranging from beginners to advanced programming.

He is co-author of the recent book: Efficient R programming.

Erin LeDell
Chief ML Scientist
at H2O.ai (US)

Erin LeDell is the Chief Machine Learning Scientist at H2O.ai, an artificial intelligence company in Mountain View, California, USA, where she works on developing H2O, an open source library for scalable machine learning.

Before joining H2O.ai, she was the Principal Data Scientist at Wise.io and Marvin Mobile Security, and the founder of DataScientific, Inc.

Erin received her Ph.D. in Biostatistics from University of California, Berkeley and has a B.S. and M.A. in Mathematics.

Gábor Csárdi
Software Engineer
at RStudio (UK)

Gábor has been writing R tools for more than 10 years, and is the co-author of the igraph R package, and the author of several others.

He is the main architect of the Metacran services and r-hub, the first major project of the R Consortium. In April 2017 he joined Hadley Wickham's team at RStudio. Gábor received his Ph.D. in Computer Science from ELTE, Budapest in 2010.

Henrik Bengtsson
Associate Professor
at Univ. of California (US)

Henrik Bengtsson has a background in Computer Science (MSc) and Mathematical Statistics (PhD) and is an Associate Professor at the UCSF Department of Epidemiology and Biostatistics.

He has extensive experience in applied statistics, computational genomics, and large-scale processing. He has worked with R since 2000 and has since contributed 30+ packages to CRAN and Bioconductor.

Jeroen Ooms
Postdoctoral researcher
at rOpenSci (US)

Jeroen graduated in 2014 from the UCLA department of statistics and is now a postdoctoral researcher at UC Berkeley with the rOpenSci group.

His official job description involves development of algorithms and software to enable processing, security and archiving of research data to facilitate data-driven open science. In practice he writes R packages that do cool and important stuff.

Some popular ones are opencpu, jsonlite, curl, V8, openssl, mongolite, commonmark, pdftools and hunspell.

Recently he developed an interest in cryptography and the decentralized web.

Mark van der Loo
Methodologist
at Statistics Netherlands

Mark van der Loo works as a consultant and researcher at the department of statistical methods of Statistics Netherlands. He has (co)authored and published several R packages related to data cleaning, including 'validate', 'dcmodify', 'errorlocate', 'extremevalues', and 'stringdist'.

Mark is coauthor of the book 'Statistical Data Cleaning with Applications in R' published by Wiley, Inc (2018).

Matthias Templ
Senior Lecturer
at ZHAW (CH)

Matthias Templ is a lecturer at the Zurich University of Applied Sciences, Switzerland. His research interests include imputation, statistical disclosure control, compositional data analysis and computational statistics.

He has published two books and more than 45 papers. Additionally, he is the author of several R packages.

In addition, Matthias Templ is the editor-in-chief of the Austrian Journal of Statistics. With two of his colleagues he founded and owns the company data-analysis OG.

Olga Mierzwa-Sulima
Senior Data Scientist
at Appsilon (PL)

Olga is a senior data scientist at Appsilon Data Science and a co-founder of datahero.tech. She leads a team of data scientists, builds predictive/explanatory data science solutions and deploys them in production, usually wrapped in a Shiny App UI.

She develops Appsilon’s open-source R packages. Olga holds an MSc degree in Econometrics from the University of Rotterdam.

She co-organizes the largest meetup of R users in Poland and is a co-founder of R-Ladies Warsaw chapter.

Przemyslaw Biecek
Data Scientist
at Warsaw University (PL)

Data Scientist with background in both mathematical statistics and software engineering.

Research activities are mainly focused on high-throughput genetic profiling in oncology.

Also interested in evidence based education, evidence based medicine, general machine learning modeling and statistical software engineering.

An R enthusiast: three books, a dozen packages, lots of talks, classes and workshops.

Szilard Pafka
Chief Scientist
at Epoch (US)

Szilard has a PhD in Physics for using statistical methods to analyze the risk of financial portfolios.

For the last decade he's been the Chief Scientist of a tech company in California doing everything data (analysis, modeling, data visualization, machine learning etc).

He is the founder of the LA R meetup, the author of a machine learning benchmark on github (1000+ stars), a frequent speaker at conferences, and he has taught graduate machine learning courses at two universities (UCLA, CEU).

Aimee Gott
Sr Data Science Consultant
at Mango Solutions (UK)

Aimee, Douglas and Mark work in the data science team at Mango Solutions.

Aimee is the lead trainer at Mango and has taught courses across all aspects of data science with a particular focus on R.

Anne Helby Petersen
Research Assistant
at Univ. of Copenhagen (DK)

Anne Helby Petersen holds an MS in statistics and is the primary author of the dataMaid R package.

She is experienced in communicating statistical topics to a wide audience as a teaching assistant at the University of Copenhagen.

Benjamin Ortiz Ulloa
Data Visualization Engineer
at VT-ARC (US)

A data visualization engineer who is very passionate about graph data structures.

Claus Ekstrøm
Professor
at Univ. of Copenhagen (DK)

Claus Thorn Ekstrøm is the creator of or a contributor to several R packages (dataMaid, MESS, MethComp, SuperRanker) and is the author of "The R Primer" book.

He has previously given tutorials on Dynamic and interactive graphics and the role of interactive graphics in teaching.

Colin Fay
Data Analyst and R Trainer
at ThinkR (FR)

Colin Fay is a Data Analyst, R trainer and Social Media Expert at ThinkR, a French agency focused on everything R-related.

Colin is a prolific open source developer, author of more than 12 R packages actively maintained on GitHub (6 of them being on CRAN): attempt, proustr, tidystringdist...

He also contributes to several other packages.

He is a member of the RWeekly team, a collaborative news bulletin about R, and the cofounder of the Breizh Data Club, an association of French data professionals.

Colin Gillespie
Senior Lecturer
at Newcastle University (UK)

Colin Gillespie is Senior lecturer (Associate professor) at Newcastle University, UK.

He has been running R courses (www.jumpingrivers.com) for over eight years at a variety of levels, ranging from beginners to advanced programming.

He is co-author of the recent book: Efficient R programming.

Douglas Ashton
Principal Data Scientist
at Mango Solutions (UK)

Aimee, Douglas and Mark work in the data science team at Mango Solutions.

Douglas is a principal consultant and specialises in machine learning/deep learning, working with customers to embed these techniques in their analytic workflows.

Grace Meyer
Technical Adviser
at Oxera (UK)

Grace Meyer is a commercial analytics expert with proven success in advising business strategy based on data driven insights.

At Oxera, she applies machine learning to strategic projects and leads the data science and programming team.

Grace & Kasia are both mentors in R-Ladies London.

Heather Turner
Statistical/R Consultant
at Freelance (UK)

Over ten years' experience providing statistical and programming support to investigators in statistics, social science, drug discovery, bioinformatics and agriculture.

Specialties: statistical programming, R programming, non-clinical statistics, biostatistics, statistical modelling

Heather and Isabella are core team members of the R Foundation Forwards taskforce for women and under-represented groups.

Ildiko Czeller
Data Scientist
at Emarsys (HU)

Ildi Czeller is a mathematician who has worked as a data scientist at Emarsys in Budapest for almost 3 years now.

She writes code mainly in R using the ggplot2, shiny, data.table, purrr and rmarkdown packages.

She has a major role in developing an in-house R package ecosystem of 5+ packages.

Isabella Gollini
Assistant Professor, Stats
at Univ. College Dublin (IE)

Dr Isabella Gollini is an Assistant Professor in Statistics at University College Dublin, Ireland.

She is the author of or a contributor to three R packages: tailloss, GWmodel, lvm4net.

Isabella and Heather are core team members of the R Foundation Forwards taskforce for women and under-represented groups.

Jakub Nowosad
Postdoc
at Univ. of Cincinnati (US)

Jakub Nowosad is a postdoc in the Space Informatics Lab at University of Cincinnati.

He is co-author of the spData package and the author of the rcartocolor, pollen and rgeopat2 packages.

Jannes Muenchow
Postdoc
at Friedrich Schiller Univ. (DE)

The speaker:

  • has a special interest in and passion for predictive mapping of landslide susceptibility and biodiversity (using statistical and machine learning models).
  • worked as a geo-data scientist for a location analyst consulting company.
  • is the creator and maintainer of the R package RQGIS and a co-author of the forthcoming book "Geocomputation with R".

János Divényi
Lead Data Scientist
at Emarsys (HU)

János Divényi is a PhD candidate in economics at the Central European University (CEU) who works as lead data scientist at Emarsys in Budapest.

He writes code in R (and Python), likes to think carefully about causality, and seeks intuitive understanding of complicated stuff.

He is an occasional speaker at the local R meetup, and has more than 5 years of teaching experience at various institutions (CEU, BME, MCC).

Jenő Pál
Data Scientist
at Emarsys (HU)

Jenő Pál is an economist holding a PhD from the Central European University (CEU).

He works as a data scientist at Emarsys in Budapest and also occasionally teaches at CEU.

He has done empirical research on online news and earlier on numerical dynamic programming.

He is an enthusiastic R user and he also likes working with Python.

Jo-fai Chow
Data Scientist
at H2O.ai (UK)

Jo-fai (or Joe) is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media in the UK, where he developed data products to enable quick and smart business decisions.

He also worked remotely for Domino Data Lab in the US as a data science evangelist promoting products via blogging and giving talks at meetups.

Kasia Kulma
Data Scientist
at Aviva (UK)

Dr. Kasia Kulma is a Data Scientist at Aviva with experience in building recommender systems, customer segmentations, predictive models and is now leading an NLP project.

She is the author of the blog R-tastic.

Kasia & Grace are both mentors in R-Ladies London.

Mark Sellors
Head of Data Engineering
at Mango Solutions (UK)

Aimee, Douglas and Mark work in the data science team at Mango Solutions.

Mark is head of data engineering and works with Mango customers to set up their infrastructure ready for advanced analytics techniques.

He is the author of the 'Field Guide to the R Ecosystem'.

Martijn Tennekes
Data Scientist
at Statistics Netherlands

Martijn Tennekes has a PhD in game theory, and has been working at Statistics Netherlands for eight years on data visualization, big data, and R.

He authored three data visualization packages (treemap, tabplot, and tmap).

Mateusz Staniak
Math/Big Data MSc Student
at University of Wrocław (PL)

Interests and experience in theoretical and applied Mathematics and Statistics. Huge enthusiast of mathematical modelling - both probabilistic and statistical.

Fan of R programming interested in creating tools for data analysis.

Currently associated with MI^2 Warsaw group and working on machine learning interpretability.

Przemyslaw Biecek
Data Scientist
at Warsaw University (PL)

Data Scientist with background in both mathematical statistics and software engineering.

Research activities are mainly focused on high-throughput genetic profiling in oncology.

Also interested in evidence based education, evidence based medicine, general machine learning modeling and statistical software engineering.

An R enthusiast: three books, a dozen packages, lots of talks, classes and workshops.

Robin Lovelace
Researcher
at University of Leeds (UK)

I’m a geographer and environmental scientist specialising in spatial data analysis and computing, especially R programming.

Based at the Leeds Institute for Transport Studies (ITS) I have a strong interest in transport modelling. This is described in a review of a book on the subject by Boyce and Williams (2015).

There is no extra fee for attending workshops, although we kindly ask you to register for the workshop day when purchasing your conference ticket, so that we can plan the required capacity. The list of planned workshops/tutorials, each 2 x 1.5 hours long:

Packages & Performance

Title | Presenter(s)
Building a package that lasts | Colin Fay

Overview

You’ve got the killer idea for an R package, but you’re not sure where to start? Then you’ve come to the right place!

During this workshop, we’ll go through the whole process of building a package that lasts. In other words, we’ll review the best practices to create a package that works (obviously), but more importantly a package that is extensively tested to prevent bugs, that will be easier to maintain in the long run, and that will be fully documented.

At the end of this workshop, the attendees will have a road map for writing robust packages, designed for industrial use and/or for CRAN.

Plan of the workshop

  • Package basics: understand and organise your package structure
  • Best practices for writing functions in packages (optimised for speed and for maintenance)
  • Using tests: why you should write tests for your package, and how to do it.
  • Documentation: using roxygen to document your functions, creating a vignette to explain what your package does, and enhancing your documentation with pkgdown.
  • Continuous integration and code coverage with GitHub, Travis and Codecov

Required packages

  • devtools
  • usethis
  • attempt
  • testthat

Required skills of participants

  • Basic knowledge of R
  • Functions
  • Basic Markdown

Required work to do before workshop

  • Install a recent version of R and RStudio on your laptop
  • Participants can come with a series of functions they want to put in a package, or use the examples that will be provided by the speaker during the workshop.

Clean R code - how to write it and what will the benefits be | Ildiko Czeller, Jenő Pál

Goals

By the end of the tutorial participants should be able to:
  • recognize code improvement possibilities,
  • refactor analysis code by extracting some parts into functions,
  • reason whether a piece of code could be regarded as clean,
  • understand the benefits of applying clean code principles.

Detailed Outline

During the tutorial the participants will perform a guided data analysis task and make several refactoring steps as we progress. They will immediately experience the benefit of applying the shown techniques. The tutorial will cover both language-agnostic and R-specific topics.

Some of the language-agnostic topics covered:
  • extract code into functions
  • organize code into files and folders
  • organize functions within an R file
  • how to choose variable and function names
  • single responsibility principle
  • what is a pure function
Some R-specific topics covered:
  • how to extract some ggplot2 layers into functions
  • how to create functions operating on different columns of a data frame
  • parametrized rmarkdown documents
  • useful RStudio keyboard shortcuts

Pre-requisites

Participants should be able to create simple R functions and use R for data analysis. We believe the tutorial is useful for beginners as well as for more experienced R programmers.

Justification

Most R users do not have a formal background in software engineering; however, coding is a significant part of their job. We believe every R user can benefit from writing cleaner and simpler code, which also makes it more reusable and less error-prone. Writing clean code also helps with reproducibility. Refactoring early and often makes life easier for your future self, your collaborators and anyone else wishing to understand your code.

Efficient R programming | Colin Gillespie

Detailed Outline

This tutorial will cover a variety of techniques that will increase the productivity of anyone using R. Topics include optimizing your set-up, tips for increasing code performance and ways to avoid memory issues. Ranging from guidance on the use of RStudio to ensure an efficient workflow to leveraging C++, this tutorial provides practical advice suitable for people from a wide range of backgrounds.

The topics covered are:
  • Efficient set-up: the .Rprofile and .Renviron files, the importance of a good IDE, and switching BLAS libraries.
  • Efficient hardware: assessing your computer hardware with the benchmarkme package.
  • Efficient collaboration: coding guidelines and the importance of version control.
  • Efficient programming: common R data types, good programming techniques, parallel computing and the byte compiler.
  • Efficient learning: practical suggestions for improving your general R knowledge.
  • Efficient C++ programming: A brief introduction to Rcpp.

Pre-requisites

Participants should be familiar with for loops, if statements and writing simple functions.

Justification

R is now used in many disparate settings. However, the majority of R programmers have no formal computer training, and instead have "learned on the job". This tutorial aims to fill in some of these gaps.

Forwards Package Development Workshop for Women | Isabella Gollini, Heather Turner

Overview

An analysis of CRAN maintainers in 2016 estimated that 11.4% were women (http://forwards.github.io/data/). This proportion is much lower than the proportion of women that attended the R conference useR! in the same year (28%). In addition, a survey of the participants at that useR! conference found that women were less likely than men to have experience of contributing to or writing packages (http://forwards.github.io/blog/2017/03/11/users-relationship-with-r/). Women were also less likely to use R recreationally, so perhaps have less opportunity to develop package development skills.

This workshop is designed to address this skills gap. It is for women who have done some R coding and are ready to take the next step in providing it to others to use.

During the tutorial participants will learn how to
  • make code into an R package,
  • do collaborative coding with GitHub,
  • write a vignette or an article,
  • build a package web page,
  • submit a package to CRAN.
Participants can bring their own code that they wish to make into a package, or work with our example.

R packages: knitr, devtools, pkgdown, rmarkdown, roxygen2, testthat
IDE: RStudio

Thanks to funding from the R Consortium, we are able to offer a limited number of scholarships to attend the Package Development Workshop. Please apply by filling in the application form before Feb 25, 2018.

The beauty of data manipulation with data.table | János Divényi

Goals

The data.table package is a powerful tool for manipulating data, especially if the underlying data set gets large (~ above 1 GB). In spite of its clear advantages the package is underused. Many R users are afraid of it because of its "ugly" syntax. This workshop aims to dismantle this belief by showing the beauty in the package logic, and illustrating its strengths in performance.

By the end of the tutorial participants should be able to:

  • use data.table for common data manipulation tasks (data input/output, aggregation, reshaping, joins)
  • build on their understanding of data.table syntax to solve more complicated tasks
  • compare the performance of the package to other widely used alternatives
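
As a taste of the syntax in question, here is a minimal sketch of an aggregation and a join in the general DT[i, j, by] form. The tables `orders` and `customers` and their values are made up for illustration and are not taken from the workshop material.

```r
library(data.table)

# Hypothetical toy data: one row per order, plus a customer lookup table
orders <- data.table(
  customer = c("a", "b", "a", "c", "b"),
  amount   = c(10, 20, 30, 40, 50)
)
customers <- data.table(
  customer = c("a", "b", "c"),
  country  = c("HU", "PL", "HU")
)

# DT[i, j, by]: filter rows in i, compute in j, group with by
totals <- orders[amount > 10,
                 .(total = sum(amount), n_orders = .N),
                 by = customer]

# Join: attach each order's country, then aggregate by country
by_country <- customers[orders, on = "customer"][
  , .(total = sum(amount)), by = country]

print(totals)
print(by_country)
```

The same bracket handles filtering, computing and grouping, which is why the syntax looks dense at first but stays consistent across tasks.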

Method

The workshop will start with a short introduction to the logic and syntax of the data.table package. Participants will then discover the power and beauty of the package through guided data manipulation tasks borrowed mainly from Emarsys use cases.

Pre-requisites

Participants should be able to solve data manipulation tasks in R (using base R or the hadleyverse/tidyverse). No knowledge of the data.table package is required.

Machine Learning

Automatic and Interpretable Machine Learning in R with H2O and LIME

Jo-fai Chow

Overview

The General Data Protection Regulation (GDPR) is just around the corner: it becomes enforceable on 25 May 2018, just over a week after the eRum conference. Are you and your organization ready to explain your models?

This is a hands-on tutorial for R beginners. I will demonstrate the use of two R packages, h2o and lime, for automatic and interpretable machine learning. Participants will be able to follow along and build regression and classification models quickly with H2O's AutoML. They will then be able to explain the model outcomes with a framework called Local Interpretable Model-Agnostic Explanations (LIME).

Building an Interpretable NLP model to classify tweets

Grace Meyer

Kasia Kulma

Overview

Unstructured text data is rapidly increasing in volume and variety, and advances in Machine Learning are making it possible to tap it for insights and patterns. One use case of predictive modelling in text analytics is identifying an author based on text alone. However, even the most accurate model may be difficult to interpret, which makes it hard to judge how reliable it is or whether its insights can be generalized. One solution is to apply the Local Interpretable Model-agnostic Explanations (LIME) framework to the classifiers to generate interpretable explanations.

In this workshop, we will take you step by step through tidytext principles and a text-analytics pipeline to create a predictive model that classifies tweets as written by Clinton or Trump. We will go over the data collection, exploration, feature engineering and model building. Finally, we will apply the LIME framework to better understand and interpret what drives model predictions.

R packages: readr, dplyr, tm, tidytext, text2vec, caret, xgboost, lime

Data used: Clinton-Trump-tweets

DALEX: Descriptive mAchine Learning EXplanations. Tools for exploration, validation and explanation of complex machine learning models

Przemyslaw Biecek

Mateusz Staniak

Overview

Complex machine learning models are frequently used in predictive modelling. There are many examples of random-forest-like or boosting-like models in medicine, finance, agriculture, etc.

In this workshop we will show why and how one would analyse the structure of the black-box model.

This will be a hands-on workshop with four parts. In each part there will be a short lecture (around 20 minutes) and then time for practice and discussion (around 20 min).

Introduction

Here we will show what problems may arise from the blind application of black-box models. We will also show situations in which understanding a model's structure leads to model improvements, model stability and greater trust in the model.

During the hands-on part we will fit a few complex models (like xgboost, randomForest) with the mlr package and discuss basic diagnostic tools for these models.

Conditional Explainers

In this part we will introduce techniques for understanding the marginal/conditional response of a model given one or two variables. We will cover PDP (Partial Dependence Plots) and ICE (Individual Conditional Expectations) packages for continuous variables and MPP (Merging Path Plot from the factorMerger package) for categorical variables.

Local Explainers

In this part we will introduce techniques that explain the key factors that drive single model predictions. This covers Break Down plots for linear models (lm / glm) and tree-based models (randomForestExplainer, xgboostExplainer) along with model-agnostic approaches implemented in the live package (an extension of the LIME method).

Global Explainers

In this part we will introduce tools for global analysis of the black-box model, like variable importance plots, interaction importance plots and tools for model diagnostics.

Packages

  • mlr (Bernd Bischl and others)
  • live (Staniak Mateusz and Przemysław Biecek)
  • factorMerger (Sitko Agnieszka and Przemysław Biecek)
  • pdp (Greenwell, Brandon)
  • ALEPlot (Apley, Dan)

Deep Learning with Keras for R

Aimee Gott

Douglas Ashton

Mark Sellors

Overview

If you don’t work at one of the big tech giants, then deep learning may seem out of reach. Fortunately, in recent years the barrier to entry has dropped dramatically. Libraries such as TensorFlow have made it much easier to implement the low-level linear algebra, while Keras builds on this to provide a high-level Python API specifically for building neural networks. A single data scientist can now quickly build a deep network, layer by layer, without losing time on implementation details. You don’t need terabytes of data and GPU clusters to get started; even relatively small problems can now benefit from deep learning.

A key aim of Keras is to reduce the time from idea to implementation. Many data scientists choose to use the R language for its first-class statistics functionality, powerful data manipulation, and vibrant community. While Python is also a fantastic choice for data science, learning it is a significant investment when what you really want to be doing is trying out your idea. For this reason RStudio created the R Interface to Keras. This allows an R user to quickly experiment with neural networks to see if they are right for their problem.

In this workshop we will get you up and running with Keras for R. We will cover some theoretical background but the focus is on implementation. We will demonstrate how to set up different types of neural network to solve simple problems with time series, and give you the opportunity to build your own with guided exercises.

A cloud-based RStudio Server environment will be provided, so attendees only require a laptop with internet access and a modern browser. Basic R knowledge is required, and it will help if attendees are familiar with packages such as dplyr for data manipulation.

Spatial Data

Geocomputation with R

Jannes Muenchow

Jakub Nowosad

Robin Lovelace

Geocomputation with R

Geographic data is special and has become ubiquitous. Hence, we need computational power, software and related tools to handle and extract the most interesting patterns of this ever-increasing amount of (geo-)data.

This workshop shows how to do so using R. It will explain how the two most important spatial data models, vector and raster, are implemented in R. The workshop will also give an introduction to spatial data visualization.

Maps are a compelling way to display complex data in a beautiful way while allowing first inferences about spatial relationships and patterns.

Additionally, we will bridge R with Geographic Information Systems (GIS), i.e., we show how to combine the best of two worlds: the geoprocessing power of a GIS and the (geo-)statistical data science power of R.

We will do so with a use case presenting spatial and predictive modeling.

Learning objectives

By the end of this workshop, the participants should:
  • know how to handle the two spatial data models (vector and raster) in R.
  • be able to import/export different geographic data formats.
  • know the importance of coordinate reference systems.
  • be able to visualize geographic data in a compelling fashion.
  • know about geospatial software interfaces and how they are integrated with R (GEOS, GDAL, QGIS, GRASS, SAGA).
  • know about the specific challenges when modeling geographic data.

Tutorial content

  • The R spatial ecosystem
  • Vector data model: simple features (sf)
  • Raster data model (raster)
  • Geographic data visualization (ggplot2, mapview, tmap)
  • Bridges to GIS (RQGIS, RSAGA, rgrass7)
  • Spatial modeling case study

Plotting spatial data in R

Martijn Tennekes

Overview

In this workshop you will learn how to plot spatial data in R by using the tmap package. This package is an implementation of the grammar of graphics for thematic maps, and resembles the syntax of ggplot2. This package is useful for both exploration and publication of spatial data, and offers both static and interactive plotting.

For those of you who are unfamiliar with spatial data in R, we will briefly introduce the fundamental packages for spatial data, which are sf, sp, and raster. With demonstrations and exercises, you will learn how to process spatial objects of various types (polygons, points, lines, rasters, and simple features), and how to plot them. Feel free to bring your own spatial data.

Besides plotting spatial data, we will also discuss the options for publication. Maps created with tmap can be exported as static images or HTML files, and can also be embedded in rmarkdown documents and shiny apps.

R packages: tmap, sf, sp, raster, rmarkdown, shiny

Tennekes, M. (2018) tmap: Thematic Maps in R. Forthcoming in the Journal of Statistical Software (JSS).

Data Structures

Building a pipeline for reproducible data screening and quality control

Claus Ekstrøm

Anne Helby Petersen

Overview

One of the biggest challenges for a data analyst is to ensure the reliability of the data, since the validity of the conclusions from the analysis hinges on the quality of the input data. This tutorial will cover the workflow of data screening and validation that transforms raw data into data that can be used for statistical analysis.

In particular, we will discuss organizing research projects, tidy data formats, internal and external validity of data, requirements for reproducible research, the dataMaid R package for customized data screening, the assertr and assertive R packages for data validation and data validation rule sets, and how to produce code books that summarize the final result of the data screening process and provide a starting point for the subsequent statistical analyses.

Graphs: A datastructure to query

Benjamin Ortiz Ulloa

Overview

When people think of graphs, they often think about mapping out social media connections. While graphs are indeed useful for mapping out social networks, they offer so much more. Graphs provide a data structure that scales very well and can be queried in intuitive ways. Data structures in the real world resemble vertices and edges more than they resemble rows and columns. Gremlin and Cypher are query languages that take advantage of the natural structure of graph databases. In this tutorial I will show how we can use igraph in a similar manner to these graph query languages to get new insights into our data.

The workshop will be divided into 4 parts:
  1. A survey of graphs and how they are used (45 min)
    • Social Network Analyses
    • Natural Language Processing
    • NoSQL
    • Epidemiology
  2. An introduction to igraph (70 min)
    • Graph structures
    • igraph syntax
    • Graph IO
    • Summary Statistics
    • Base Plotting
    • ggraph
  3. Case Study: NLP (45 min)
    • Filter logic
    • Traversal logic
    • tidytext
    • magrittr
  4. Graph Ecosystem (30 min)
    • Gephi
    • D3
    • TinkerPop
R packages: igraph, magrittr, tidyr, ggplot2, ggraph
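
To give a flavour of the filter and traversal logic mentioned in the outline, here is a minimal igraph sketch. The edge list (who follows whom) is made up for illustration and is not taken from the workshop material.

```r
library(igraph)

# Hypothetical edge list: who follows whom
edges <- data.frame(
  from = c("ann", "ann", "bob", "cat", "dan"),
  to   = c("bob", "cat", "cat", "dan", "eve"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Filter logic: keep vertices with at least two outgoing edges
busy <- V(g)[degree(g, mode = "out") >= 2]$name

# Traversal logic: one hop out from "ann", then everything reachable from her
one_hop   <- neighbors(g, "ann", mode = "out")$name
reachable <- subcomponent(g, "ann", mode = "out")$name

print(busy)
print(one_hop)
print(reachable)
```

Selecting vertices by a condition and then walking along edges is roughly the pattern that graph query languages such as Gremlin and Cypher express declaratively.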

Please note that no computers will be provided at the workshops, so we recommend bringing your own computer.

Please note that this is still a preliminary program that might change at any time:

Workshop day (May 14, 2018 -- Monday)

Morning workshops (9:00-12:30, with a coffee break from 10:30 to 11:00), by track:
  • Packages & performance I. -- Efficient R programming
  • Machine Learning I. -- DALEX: Descriptive mAchine Learning EXplanations
  • Packages & performance II. -- Clean R code - how to write it and what will the benefits be
  • Machine Learning II. -- Building an Interpretable NLP model to classify tweets
  • Spatial data -- Geocomputation with R
  • Data structures -- Graphs: A datastructure to query
  • Packages & performance III. -- Forwards Package Development Workshop for Women

Lunch break: 12:30-13:30

Afternoon workshops (from 13:30, with a coffee break from 15:00 to 15:30; the last half-hour slot starts at 16:30), by track:
  • Packages & performance I. -- Building a package that lasts
  • Machine Learning I. -- Deep Learning with Keras for R
  • Packages & performance II. -- The beauty of data manipulation with data.table
  • Machine Learning II. -- Automatic and Interpretable Machine Learning in R with H2O and LIME
  • Spatial data -- Plotting spatial data in R
  • Data structures -- Building a pipeline for reproducible data screening and quality control
  • Packages & performance III. -- Forwards Package Development Workshop for Women

Conference day 1 (May 15, 2018 -- Tuesday)

  • 8:00-9:00: Registration
  • 9:00-11:00: Keynotes
  • 11:00-11:30: Break
  • 11:30-13:00: Session 2 & 3
  • 13:00-14:00: Lunch
  • 14:00-15:30: Session 4 & 5
  • 15:30-16:00: Break
  • 16:00-17:30: Session 6 & 7
  • 19:00-22:00: Conference Dinner

Conference day 2 (May 16, 2018 -- Wednesday)

  • 8:00-9:00: Registration
  • 9:00-11:00: Keynotes
  • 11:00-11:30: Break
  • 11:30-13:00: Session 8 & 9
  • 13:00-14:00: Lunch
  • 14:00-15:30: Session 10 & 11
  • 15:30-15:45: Closing Remarks

We'll have a welcome reception and poster session on the workshop day (May 14), which is free and open to all conference attendees. The conference dinner (May 15) requires a separate ticket, available for purchase at registration time, and will offer not only a good selection of food and drinks, but a 2-hour sightseeing boat tour on the Danube as well.

The workshop day will take place at the Central European University. This new building offers plenty of rooms for the tutorials -- ranging from small seminar rooms with 25 seats to the big auditorium, which holds up to 400 participants.

The second and third days of the conference will take place in the Akvarium Klub, a few minutes' walk from the workshops, located only a few hundred meters from the bank of the Danube, and right in the middle of the city center. This venue features a large concert hall with a capacity of nearly 1300, a smaller hall, a bar, and a restaurant with a unique atmosphere under a pool. In short, we'll have the perfect place for 2 or up to 3 parallel sessions and networking!

Weather forecast

May is a very beautiful month in Hungary, when the daily temperature rises above 20°C but stays below summer heat-wave levels. Late nights can be a bit chilly (make sure to bring a sweater!), but the temperature stays above 10°C. Rain is also not unexpected at this time of the year, but it should be fine. For the most recent forecast, click here.

Parking

There's a paid parking garage right under the conference venue, for around 2 EUR per hour. Street parking is a bit cheaper: around 1.5 EUR per hour between 8am and 6pm, but there is a 3-hour restriction, so you have to move your car (?) and get a new ticket pretty frequently. On the other hand, Budapest has a dense network of public transport lines (even at night), so you might consider leaving your car behind (maybe at a free P+R spot) and taking the metro/underground instead. Sporty visitors may choose the BUBI, the public bike-sharing system.

Smoking policy

Smoking is banned in all enclosed public places, on public transport, and in all workplaces in Hungary. Regarding the conference venues, there will be a dedicated smoking area outside the building.

Alcohol policy

Entering the conference venue with any open or visible alcohol container is strictly prohibited. The consumption of alcoholic beverages is forbidden in the conference building except for beverages served as part of the official catering.

Harassment policy

This conference is dedicated to providing a harassment-free conference experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion. We do not tolerate harassment of conference participants in any form. Harassment includes offensive verbal comments related to gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention. Sexual language or imagery is not appropriate for talks, exhibitors’ displays, or social and dining events. Violators may be sanctioned, including being expelled from the conference without a refund, at the complete discretion of the Conference Organizing Committee.

Budapest offers many different accommodation options for backpackers, business travellers, tourists, etc. Please check the available hotels, youth hostels, guesthouses and other options on, e.g., booking.com, szallas.hu or Airbnb.

You can reach Budapest and the city center in a variety of different ways:

Please click on the above references to see the alternative routes and actual timetables provided by Google.

Unfortunately, Uber is not available any more in Hungary, but we have cabs -- called "taxi" in Hungary -- which charge a standard fare of:

  • base fee (450 HUF)
  • distance charge (280 HUF/km)
  • waiting fee (70 HUF/min)
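
To estimate a fare from these rates, the three components can simply be added up. The sketch below assumes the components are additive and uses a made-up trip of 20 km with 5 minutes of waiting; the function name `taxi_fare` is ours, and the actual distance and waiting time of your ride will differ.

```r
# Estimate a taxi fare in HUF from the standard rates listed above.
# Assumption: the total is base fee + distance charge + waiting fee.
taxi_fare <- function(distance_km, waiting_min = 0) {
  base_fee_huf <- 450   # fixed fee per ride
  per_km_huf   <- 280   # distance charge per km
  per_min_huf  <- 70    # waiting fee per minute
  base_fee_huf + per_km_huf * distance_km + per_min_huf * waiting_min
}

# Hypothetical trip: 20 km with 5 minutes of waiting
fare <- taxi_fare(20, waiting_min = 5)
print(fare)  # 6400
```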

Public transportation is also pretty good and much more affordable -- e.g. you can take bus E200 from the Airport to Kőbánya-Kispest, then the Metro (underground) to Deák tér in ~20-45 minutes for around 3 EUR overall.