Mondays, 4:00 PM - 5:50 PM (in Shannon 109)
Wednesdays, Noon - 2:00 PM (in PLSB 403)
Week | Topic |
---|---|
1 | Course introduction |
2 | Connecting to the command-line |
3 | Navigating in bash |
4 | Regular expressions |
5 | What is GWAS? |
6 | Mapping sequencing data |
7 | Association testing |
8 | Batch processing / HPC clusters |
9 | Break |
10 | Intro to R programming |
11 | Using data.table in R |
12 | Plotting with ggplot |
13 | Data Interpretation Project - Open Worktime |
14 | Data Interpretation Project - Open Worktime |
15 | Data Interpretation Project - Presentations |
If you could copy, paste, and post one tweet (280 characters) per second, it would still take you 127 days just to get through a single human genome. Modern biological studies can involve hundreds to thousands of times more data. How do researchers handle data at such a scale? How is it possible to glean meaningful insight from big data?
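The tweet estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: the genome length used here (about 3.1 billion bases) is an assumed round figure, not a number taken from this syllabus, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the "tweeting a genome" estimate.
# ASSUMPTION: one human genome is roughly 3.1 billion bases (letters) long.
GENOME_BASES = 3_100_000_000
TWEET_CHARS = 280              # characters per tweet, as stated in the text
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_tweet(n_bases, chars_per_tweet=TWEET_CHARS, tweets_per_second=1.0):
    """Days needed to post n_bases characters, one tweet at a time."""
    tweets = n_bases / chars_per_tweet       # number of tweets needed
    seconds = tweets / tweets_per_second     # time spent at the given posting rate
    return seconds / SECONDS_PER_DAY


print(f"About {days_to_tweet(GENOME_BASES):.0f} days for one genome")
```

With this assumed genome length the result comes out at roughly four months, in line with the figure quoted above. The exact number of days shifts slightly depending on which genome length you plug in.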
In this course, you will develop your computational repertoire while practicing on a model framework: how to identify genetic variation related to a phenotype, such as disease. You will gain theoretical background from primary literature, manipulate files on the command line, access online repositories, transform and analyze data within a programming language, and produce effective graphical displays.
By the end of the course, you’ll have conducted a genome-wide association study from start to finish. More importantly, you’ll have gained the confidence and knowledge necessary to tackle computational problems in biology.
Prerequisites: One 3000-level Biology course. While no programming knowledge is required, it is recommended that you have completed an introductory statistics class, such as STAT-2020.
This course is intended to introduce you to the computational methods that biologists use to answer questions. In actuality, computational problem solving is much like any other kind of problem solving. You will identify a question or problem, break that problem apart into smaller manageable parts, and tackle those smaller problems one by one. Not only will you be able to answer computational questions in biology, you will become a better problem solver in general. Specifically, by the end of this course, you will be able to:
This course is intended to be as accessible as possible—all you will need is a modern laptop (with a keyboard) capable of accessing the internet. Primary literature will be distributed via Collab. All software used in this course will be free/open-source. Windows, Mac, and Linux operating systems are all acceptable. Accessing the UVA network off-grounds requires use of the Cisco VPN.
Prior to some class meetings, you will read an excerpt of scientific literature, including relevant figures. At the start of each class meeting, you will discuss the reading and its figure(s), reflect on the rationale for representing data in a specific way, and consider alternative methods of representing data. This will function as a lead-in to a guided problem modeled after data similar to (or directly drawn from) your reading.
Most of our meeting time will be dedicated to tackling a project. For example, to get ready for running an association test, you will retrieve sequencing data from an online repository, align it to a reference genome, and generate summary files. We’ll introduce new content along the way, building on what you’ve already learned. Initial project sets will be heavily guided in terms of steps and desired outcome, while later work will require you to develop your own workflow.
Before the next meeting, you will work through a problem set outside of class. Here, different students may arrive at different methods of solving a specific kind of problem. You will then compare your workflow and outcomes with those of other students, reflect on any similarities or differences you find, and comment on any new tools, techniques, or methods that you found useful.
For the final few weeks of the semester, you will form a small group and work together to perform a supplemental analysis of your choice. Class time will be dedicated to working on the project, though working together outside of class may be to your benefit. On the final day, your group will present your methods, results, and conclusions to the class.
Because participation is so important, you are expected to attend every class possible. Even if you feel able to learn or complete activities entirely independently, part of this course is learning to collaborate with others. If you can bring expertise to the classroom, sharing your knowledge benefits both you and other students. Because life can be unpredictable, missing one class will not incur any participation penalties. Any additional absences will be addressed on an individual basis.
The skills developed in this course can only be learned by practicing them. This will involve a lot of:
You will do well in this class if you actively participate and make good-faith efforts to complete all required activities. Ideally, you would revisit and practice newly introduced concepts more than once per week, even if only for a few minutes per day. If you prefer learning to solve problems in groups, you’re free to meet up with other students outside of class, or to collaborate online. Simply copying someone else’s work and pasting it into the command line does not count as making an effort to practice computational skills.
To get the most out of this class, I highly encourage you to attend our community’s weekly Biology Programming Hours (Noon to 2:00 PM on Wednesdays, in PLSB 403).
If this sounds like you, perfect! This course is designed for those without any prior experience with programming languages or computer science. We will use a combination of in-class guided activities and online interactive tutorials to get you up to speed. We will also have a Slack channel for tech help, so if you’re having problems, you can easily submit a question.