BIOL4585

Introduction to Computational Biology

Instructor

Weekly meeting time

Mondays, 4:00 PM - 5:50 PM (in Shannon 109)

Office Hours

Wednesdays, Noon - 2:00 PM (in PLSB 403)

Course Schedule

Week	Topic
1	Course introduction
2	Connecting to the command-line
3	Navigating in bash
4	Regular expressions
5	What is GWAS?
6	Mapping sequencing data
7	Association testing
8	Batch processing / HPC clusters
9	Break
10	Intro to R programming
11	Using data.table in R
12	Plotting with ggplot
13	Data Interpretation Project - Open Worktime
14	Data Interpretation Project - Open Worktime
15	Data Interpretation Project - Presentations

Course Description

Consider the human genome.

If you could copy, paste, and post one tweet (280 characters) per second, it would still take you 127 days just to get through a single human genome. Modern biological studies can involve hundreds to thousands of times more data. How do researchers handle data at such a scale? How is it possible to glean meaningful insight from big data?
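The 127-day figure can be checked with a short calculation. The sketch below assumes about 3.08 billion bases in one (haploid) human genome, written as one character per base. The syllabus does not state the genome size it used, so that number is an assumption; published estimates are around 3.1 billion bases, and slightly different values shift the result by a day or so.

```python
# Worked check of the "127 days" tweet estimate.
# ASSUMPTION: ~3.08 billion bases in one haploid human genome
# (the syllabus does not state the figure it used).
GENOME_BASES = 3_080_000_000
CHARS_PER_TWEET = 280        # one base written as one character
TWEETS_PER_SECOND = 1
SECONDS_PER_DAY = 60 * 60 * 24


def days_to_post(bases: int) -> float:
    """Return the days needed to post `bases` characters as 280-character tweets, one per second."""
    tweets = bases / CHARS_PER_TWEET
    seconds = tweets / TWEETS_PER_SECOND
    return seconds / SECONDS_PER_DAY


print(f"{days_to_post(GENOME_BASES):.1f} days")  # about 127 days
```

Under this assumption, the genome fills 11 million tweets, and 11 million seconds is a little over 127 days.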

In this course,

you will develop your computational repertoire while practicing on a model framework: identifying genetic variation related to a phenotype, such as disease. You will gain theoretical background from primary literature, manipulate files on a command-line, access online repositories, transform and analyze data within a programming language, and produce effective graphical displays.

After completing the course,

you’ll have conducted a genome-wide association study from start to finish. More importantly, you’ll have gained the confidence and knowledge necessary to tackle computational problems in biology.

Prerequisites: One 3000-level Biology course. While no programming knowledge is required, it is recommended that you have completed an introductory statistics class, such as STAT-2020.

Course Objectives

This course is intended to introduce you to the computational methods that biologists use to answer questions. In practice, computational problem solving is much like any other kind of problem solving: you identify a question or problem, break it into smaller, manageable parts, and tackle those parts one by one. Not only will you be able to answer computational questions in biology, you will become a better problem solver in general. Specifically, by the end of this course, you will be able to:

navigate directories and manipulate files via a command-line
read and write basic code for manipulating and visualizing biological data
collaboratively code with others and give/receive feedback
generate reproducible examples when debugging code
identify, locate, and access resources when attempting a computational project

Course materials

This course is intended to be as accessible as possible—all you will need is a modern laptop (with a keyboard) capable of accessing the internet. Primary literature will be distributed via Collab. All software used in this course will be free/open-source. Windows, Mac, and Linux operating systems are all acceptable. Accessing the UVA network off-grounds requires use of the Cisco VPN.

Activities

Readings, reflections & discussions

Prior to some class meetings, you will read an excerpt of scientific literature, including relevant figures. At the start of each class meeting, you will discuss the reading and its figure(s), reflect on the rationale for representing data in a specific way, and consider alternative ways of representing the same data. This discussion will lead into a guided problem built on data similar to (or drawn directly from) your reading.

In-class guided problems

Most of our meeting time will be dedicated to tackling a project. For example, to prepare for running an association test, you will retrieve sequencing data from an online repository, align it to a reference genome, and generate summary files. We’ll introduce new content as we go, building on what you’ve already learned. Initial project sets will be heavily guided in terms of steps and desired outcome, while later work will require you to develop your own workflow.

Out-of-class practice

Before the next class meeting, you will work through a problem set on your own. This is where different students may arrive at different methods of solving the same kind of problem. You will then compare your workflow and outcomes with those of other students, reflect on the similarities and differences you find, and comment on any new tools, techniques, or methods that you found useful.

Group project

For the final few weeks of the semester, you will form a small group and work together to perform a supplemental analysis of your choice. Class time will be dedicated to working on the project, though working together outside of class may be to your benefit. On the final day, your group will present your methods, results, and conclusions to the class.

Attendance

Because participation is so important, you are expected to attend every class you can. Even if you feel able to learn or complete activities entirely on your own, part of this course is learning to collaborate with others. If you bring expertise to the classroom, sharing it benefits both you and your classmates. Because life can be unpredictable, missing one class will not incur any participation penalty. Any additional absences will be addressed on an individual basis.

Want to do well in this class?

The skills developed in this course can only be learned by practicing them. This will involve a lot of:

typing commands into your terminal,
testing the output to see if it worked,
modifying your commands, and trying again

You will do well in this class if you actively participate and make good-faith efforts to complete all required activities. Ideally, you would revisit and practice newly introduced concepts more than once per week, even if only for a few minutes per day. If you prefer to solve problems in groups, you’re free to meet up with other students outside of class or to collaborate online. However, simply copying someone else’s work and pasting it into the command-line is not a genuine effort to practice computational skills.

To get the most out of this class, I highly encourage you to attend our community’s weekly Biology Programming Hours (Noon to 2:00 PM on Wednesdays, in PLSB 403).

“…but I can’t code”

If this sounds like you, perfect! This course is designed for those without any prior experience with programming languages or computer science. We will use a combination of in-class guided activities and online interactive tutorials to get you up to speed. We will also have a Slack channel for tech help, so if you’re having problems, you can easily submit a question.