Introduction to R for Data Science

Cambridge Bioinformatics Training

14–17 December 2020


R is one of the leading programming languages in data science. It is widely used for statistics, machine learning, visualisation and data analysis. R is open source, so all the software we will use in the course is free. This workshop is an introduction to R designed for participants with no programming experience. We will start from scratch by introducing how to program in R, then progress to reading and writing files, manipulating data and visualising it in different plots: all the fundamental tasks you need to get started analysing your data. During the workshop we will work with one of the most popular packages in R, the tidyverse, which will allow you to manipulate your data effectively and visualise it to a publication-quality standard. The workshop will be taught online via our online training environment. We hope that you and your loved ones stay safe, and we look forward to welcoming you to our virtual classroom in December.

Dr Gita Yadav Course Organiser

Dr Alexia Cardona Course Organiser

To register: https://forms.gle/vvQQCMGznqKBB5deA


Programme

Times displayed in the schedule below are in India Standard Time (IST).

Monday 14th December 2020: Introduction to programming in R

14:00

Welcome, icebreaker and group photo (Paul Judge, Cathy Hemmings)

14:30

Introduction to programming in R
Speaker: Dr Alexia Cardona
Tutors: Dr Kamal Kishore, Dr Gitanjali Yadav

In this session we will learn how to use R and the RStudio IDE and cover the basic syntax of the R programming language. You will be able to create scripts and use functions to perform specific operations on your data.
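As a flavour of the basic syntax covered in this session, here is a minimal sketch that could be saved as a script and run in RStudio. The variable names, the values and the helper function `g_to_kg` are made up for illustration and are not taken from the course material.

```r
# Assign values to variables with the assignment operator <-
weight_kg <- 55
weight_lb <- 2.2 * weight_kg      # arithmetic on a variable

# A vector holds several values of the same type
weights_g <- c(50, 60, 65, 82)

# Built-in functions perform operations on your data
mean_weight <- mean(weights_g)

# You can also define your own function
# (g_to_kg is a hypothetical helper, not part of R)
g_to_kg <- function(grams) {
  grams / 1000
}

weights_kg <- g_to_kg(weights_g)  # the function is applied to every element

print(mean_weight)
print(weights_kg)
```

Typing `?mean` in the R console opens the help page for a function, which is one place to look for help about R code.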

15:30

Tea/Coffee break

15:45

Introduction to programming in R (cont.)

16:45

Tea/Coffee break

17:00

Introduction to programming in R (cont.)

18:00

End of day 1

Tuesday 15th December 2020: Starting with data

14:00

Welcome to Day 2 - R recap

14:45

Starting with data
Speaker: Dr Gitanjali Yadav
Tutors: Dr Kamal Kishore, Dr Alexia Cardona

In this session we will learn how to load data from a file into memory. We will learn about data frames, the most popular data structure in R for holding tabular data, and continue building our knowledge of R programming.
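Loading a file into a data frame might look like the sketch below. It uses only base R, and it first writes a tiny CSV file so that it runs on its own; the file contents and column names are invented for illustration and are not the workshop dataset.

```r
# Create a small CSV file so that the example is self-contained
# (the rows and column names are made up for illustration)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "species,weight,sex",
  "DM,40,female",
  "DO,48,male",
  "PF,8,female"
), csv_path)

# Load the file into memory as a data frame
animals <- read.csv(csv_path)

str(animals)     # column names and types
head(animals)    # first rows of the table
dim(animals)     # number of rows and columns

# Extract a single column as a vector
animals$weight

# Write the data frame back to a new file
out_path <- tempfile(fileext = ".csv")
write.csv(animals, out_path, row.names = FALSE)
```

With your own data you would pass the path of your file to `read.csv()` instead of the temporary file created here.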

15:30

Tea/Coffee break

15:45

Starting with data (cont.)

16:45

Tea/Coffee break

17:00

Starting with data (cont.)

18:00

End of day 2


Wednesday 16th December 2020: Data manipulation and visualisation with tidyverse

14:00

Welcome to day 3 and recap (Dr Gitanjali Yadav)

14:45

Data manipulation and visualisation
Speaker: Dr Gitanjali Yadav
Tutors: Dr Kamal Kishore, Dr Alexia Cardona

In this session we will learn how to manipulate data and visualise it in plots using the tidyverse.

15:30

Tea/Coffee break

15:45

Data manipulation and visualisation (cont.)

16:45

Tea/Coffee break

17:00

Data manipulation and visualisation (cont.)

18:00

End of day 3

Thursday 17th December 2020: More data manipulation and visualisation with tidyverse

14:00

Welcome to day 4 and recap (Dr Alexia Cardona)

14:45

More data manipulation and visualisation
Speaker: Dr Alexia Cardona
Tutors: Dr Kamal Kishore, Dr Gitanjali Yadav

In this session we will build upon what we learnt on Wednesday and learn more advanced ways to visualise and manipulate data. We will learn how to group and summarise data, which can help you explore your dataset.
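Grouping and summarising can be sketched as follows, assuming the dplyr package (part of the tidyverse) is installed. The data frame and its values are invented for illustration; the course exercises may use different data and column names.

```r
library(dplyr)

# Made-up measurements for illustration
animals <- data.frame(
  species = c("DM", "DM", "DO", "DO", "PF"),
  weight  = c(40, 44, 48, 52, 8)
)

# Group the rows by species, then compute one summary row per group
weight_summary <- animals %>%
  group_by(species) %>%
  summarise(
    mean_weight = mean(weight),  # average weight within each species
    n = n()                      # number of rows within each species
  )

print(weight_summary)
```

The result has one row per species, which makes it a quick way to explore how a measurement varies between groups.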

15:30

Tea/Coffee break

15:45

More data manipulation and visualisation (cont.)

16:45

Tea/Coffee break

17:00

More data manipulation and visualisation (cont.)

18:00

End of workshop


About this workshop

Aims
The aim of this workshop is to:
• Provide an introduction to programming and the tools required for scripting in R.
• Provide practical experience and guidance on how to manipulate data in R.
• Help participants understand and learn how to visualise data in different plots in R.

Target audience
This course is aimed at women from India, including graduate students, postdocs and other individuals. It will be delivered to Indian women applicants under the We-VIDYA Initiative (Women enabled for Virtual Induction as Data Youth and AI Professionals).

Prerequisites
This course is aimed at beginners; no prior knowledge is required. Fluency in English and a good internet connection will help you gain the most from the course.

Topics covered: R Programming, Data handling, Manipulation, Visualisation

Presentation of the course
The course will consist of a mixture of trainer-led lectures and practical work with computer exercises that will introduce the participants to software tools, including R, to analyse data under the guidance of the trainers and teaching assistants. As a result of attending the course participants should be able to:
• Write and execute R code
• Know where to look for help about R code
• Read data from a file and write data to a file
• Extract data from tables
• Create plots from data

Live online training
This year the workshop will be delivered online via our online training environment. You will be given access before the course, so there will be no need to install software in advance. We will provide software installation instructions and support in case you would like to install the software on your own computer. We aim to provide an experience as close as possible to a classroom, with opportunities for one-to-one discussion with tutors.


Course Fee
The registration fee for this course is INR 10,000/- per participant. This fee covers:
• Access to BTF Online Live Training Environment
• Password-based access to remote cloud workstations
• Pre-installed datasets and software required for the workshop
• Access to full course material, lecture slides, exercises, solutions
• One-on-one (classroom) training experience
• Video recordings of all lectures for later use
• A living document with introductions, Q&A, and relevant links
• Short-term and long-term course feedback
• Certificate of attendance

Scholarships and grants to cover the registration fee are available from our sponsors upon request. Please mention this in your pre-enrolment application below.

Course Enrolment & Selection Procedure
All interested candidates must fill in the pre-enrolment form available here. The deadline for booking is November 30, 2020. Fifty applicants will be selected based on these applications. Selected women will receive a link to formally register for the course, followed by communication from Cambridge by December 05, 2020.

Reading and resources list
Listed below are a number of texts that might be of interest for future reference, but do not need to be bought (or consulted) for the course.

Books to read following the course:
• Garrett Grolemund and Hadley Wickham. R for Data Science. O'Reilly, 2017 (available at: http://r4ds.had.co.nz/)
• Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. An Introduction to Statistical Learning. Springer, 2017 (available at: http://www-bcf.usc.edu/~gareth/ISL/)

URLs:
• https://www.rstudio.com/resources/cheatsheets/

Contact us
For more information contact us at cbl@nipgr.ac.in or grad.bioinfo@lifesci.cam.ac.uk.
Subscribe to our mailing list: https://lists.cam.ac.uk/mailman/listinfo/ucam-bioinfo-training


Programme Instructors

Alexia Cardona
Dr Cardona leads training development for the University of Cambridge's Bioinformatics Training Programme. Her role involves managing the different aspects of training, including the design, development, coordination and teaching of undergraduate and postgraduate training in Bioinformatics and Data Science. She is a leader in the ELIXIR international community, where together with the other leaders and partners she drives the establishment of high-quality training in Data Management for the Life Sciences. Dr Cardona is an advocate of participation in Communities of Practice and of women in leadership and computational sectors, where they are currently underrepresented.

Gitanjali Yadav
Dr Yadav is a Lecturer at the Dept of Plant Science, University of Cambridge. She also holds a joint appointment as Group Leader at NIPGR, New Delhi. She is a specialist in Genomics and Complex Networks, with applications in food security and conservation. Dr Yadav holds academic degrees in Botany (BSc), Biomedical Research (MSc) and Computational Immunology (PhD). She is a strong proponent of Open Science and has been actively involved with Indian and international science academies for outreach and Bioinformatics training. She is also keen on addressing technological challenges faced by women due to the digital gender divide, especially during the pandemic.

Kamal Kishore
Dr Kishore obtained his PhD at the University of Milan in the field of computational epigenomics. His PhD involved developing computational tools for integrative analysis of epigenomics data. He joined the Bioinformatics Core at CRUK CI in 2016. His main activities involve the analysis of heterogeneous genomics and proteomics data. He works in close collaboration with biologists to analytically answer challenging biological questions. The development of robust software solutions, with contributions to the Bioconductor project, forms an integral part of his work.


This workshop is sponsored by:

Contact us:
Cambridge Bioinformatics Training
Craik-Marshall Building
Downing Site
University of Cambridge
Cambridge CB2 3EB
United Kingdom
Email: grad.bioinfo@lifesci.cam.ac.uk
Telephone: +44 (0)1223 333614
Website: https://bioinfotraining.bio.cam.ac.uk/
Mailing list: https://lists.cam.ac.uk/mailman/listinfo/ucam-bioinfo-training

