# Brazilian Edition of the Summer Institute in Statistical Genetics


## General Information:

We are pleased to host the Summer Institute in Statistical Genetics at the Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, SP, Brazil.
The Institute is in its 18th edition (http://www.biostat.washington.edu/suminst/sisg/general/) and consists of a series of two-and-a-half-day workshops designed to introduce geneticists to modern methods of statistical analysis and to introduce statisticians to the statistical challenges posed by modern genetic data. Prerequisites are minimal, and the modular nature of the Institute enables participants to design a program best suited to their backgrounds and interests. Most participants take two or three modules. The Brazilian edition also includes introductory modules, aimed at students who lack either quantitative or genetics training.
Individuals attending the Institute will receive certificates of course completion in recognition of their participation.


## Modules:

15 hours each, 2.5 days long.

## Venue:

Piracicaba, SP (http://pt.wikipedia.org/wiki/Piracicaba, Piracicaba's TripAdvisor Guide), Campus of the “Luiz de Queiroz” College of Agriculture, University of São Paulo, Department of Genetics (http://www.esalq.usp.br/acom/mapa/mapa.htm).

## Date:

Jan 22^{nd} - Feb 12^{th}, 2014.

## Introductory Modules:

These modules are directed to students who need more background in topics required by other courses in the Summer Institute. One module is directed to those with a quantitative background who lack basic genetics, molecular biology and population genetics. The other is directed to those with a biological background who lack quantitative training. Both modules will be taught (in English) by a team of 2 or 3 Brazilian professors.

- **(M1)** Principles of Statistics for Geneticists - Antonio Augusto F Garcia (ESALQ/USP), Roland Vencovsky (ESALQ/USP), Gabriel R.A. Margarido (ESALQ/USP)
- **(M2)** Principles of Genetics for Statisticians - Diogo Meyer (IB/USP), Tatiana Teixeira Torres (IB/USP) and Maria Vibranovski (IB/USP)

## SISG Modules:

We have grouped the modules into two tracks, catering to the interests of two general groups of students. The first is directed to those interested in plant and animal breeding, a strong component of the agriculture college’s research interest. The second is directed to evolution and population genetics, and should attract those interested in human genetics as well as those more broadly interested in phylogeography, population genetics, forensics and conservation genetics, among other fields.

**Modules for students interested in Plant and Animal Breeding**

- **(M3)** Introduction to QTL Mapping
- **(M4)** Mixed Models in Quantitative Genetics
- **(M5)** Plant and Animal Association Mapping

**Modules for students interested in Evolution and Population Genetics**

- **(M6)** Computing for Statistical Genetics
- **(M7)** Population Genetic Data Analysis
- **(M8)** Quantitative Genetics
- **(M9)** MCMC for Genetics

**Modules for both audiences**

- **(M10)** High-dimensional Omics Data
- **(M11)** Network and Pathway Analysis of Omics Data

## Module Descriptions:

**M1**: Principles of Statistics for Geneticists

This module will cover the fundamental concepts and statistical methods needed to better understand the more advanced modules, and is designed for researchers with a non-quantitative background. We will present and discuss linear models, analysis of variance and covariance, linear regression, correlation, likelihood and tests of hypothesis, as well as an introduction to the widely used package R. All topics will include practical examples in genetics, implemented with R scripts. Our goal is to introduce geneticists to important topics in statistics using intuitive concepts.

**M2**: Principles of Genetics for Statisticians

In this module we present basic concepts of genetics and evolution. Our goal is to help researchers with a non-biological background (e.g., statistical, computational or biomedical training) to better understand the nature of biological data and the types of questions researchers are asking. The module covers a review of basic Mendelian genetics, introductory molecular biology, and notions of cell division and its relationship to Mendelian genetics. We survey the types of biological data that are currently being generated (microarray genotyping, next-generation DNA sequencing and RNA-seq). We review the basic measures used to quantify genetic variation and present an introduction to basic concepts in evolutionary biology, including genetic drift and natural selection. We introduce students to the neutral theory of molecular evolution and the nearly neutral theory, and discuss the importance of demographic history in shaping extant patterns of genetic variation. We discuss two ongoing challenges in evolutionary genomics: evaluating the relative importance of selection and drift in evolution, and assigning function to DNA sequences based on genomic sequence analysis.

**M3**: Introduction to QTL Mapping

This module will systematically introduce statistical methods for mapping quantitative trait loci (QTL) in experimental cross populations. Topics include experimental designs, linkage map construction, single-marker analyses, interval mapping, composite interval mapping and multiple interval mapping. Significance thresholds for genome scans and model selection will also be discussed. Computer lab exercises use the public domain software Windows QTL-Cartographer. The emphasis is on procedures for QTL mapping data analysis and appropriate interpretation of mapping results rather than on formulas.

**M4**: Mixed Models in Quantitative Genetics

This module covers the analysis of linear models containing both fixed and random effects. Topics to be discussed include a basic matrix algebra review, the general linear model, derivation of the mixed model, BLUP and REML estimation, estimation and design issues, and Bayesian formulations. Applications to be discussed include estimation of breeding values and genetic variances in general pedigrees, association mapping, genomic selection, direct and associative effects models of general group and kin selection, and genotype by environment interaction models. Background reading: Lynch, M. and B. Walsh. 1998. Genetics and analysis of quantitative traits. Sinauer Associates.

**M5**: Plant and Animal Association Mapping

This module is an introduction to association mapping, focusing on plant and animal populations. Topics include theory of linkage disequilibrium and mapping, population and family-based association techniques for discrete and continuous traits, methods for detecting and accounting for population structure, methods to identify causative genes and variants, issues in polyploid organisms, multiple testing issues, and genotyping strategies. Examples from real data will be presented, including a discussion of linkage disequilibrium in plant and animal populations.

**M6**: Computing for Statistical Genetics

This module introduces software for analysis of genetic data in the R statistical environment. Data management in R, programming concepts for R, and standard regression analyses will be discussed. These topics will be followed by analyses more specific to genetic data, including association analysis and handling large data files. Use of the extensive collection of genomics packages from the Bioconductor project will be introduced, and use of R as an interface to other more specialized, "legacy" software will be demonstrated. Examples are drawn from a range of genetic studies, but analyses of whole-genome association study data are particularly featured. While the module assumes no prior knowledge of R, some programming experience, in R or other software, will be helpful.

**M7**: Population Genetic Data Analysis

This module serves as a foundation for many of the later modules. Topics include estimates and sample variances of allele frequencies, Hardy-Weinberg and linkage disequilibrium, characterization of population structure with F-statistics, relationship estimation, and statistical genetic aspects of forensic science and association mapping. Concepts are illustrated with R exercises. Background reading: Holsinger, K. and Weir, B.S. 2009. Genetics in geographically structured populations: defining, estimating, and interpreting Fst. Nature Reviews Genetics 10:639-650. Weir, B.S. and Laurie, C.C. 2011. Characterizing allelic association in the genome era. Genetics Research 92:461-470.

**M8**: Quantitative Genetics

Quantitative Genetics is the analysis of complex characters where both genetic and environmental factors contribute to trait variation. Since this includes most traits of interest, such as disease susceptibility, crop yield, and all microarray data, a working knowledge of quantitative genetics is critical in diverse fields, from plant and animal breeding, human genetics and genomics to ecology and evolutionary biology. The course will cover the basics of quantitative genetics, including Fisher's variance decomposition, covariance between relatives, heritability, inbreeding and crossbreeding, and response to selection. It also offers an introduction to advanced topics such as mixed models, BLUP, QTL mapping, correlated characters, and the multivariate response to selection. Background reading: Lynch, M. and Walsh, B. 1998. Genetics and analysis of quantitative traits. Sinauer Associates.

**M9**: MCMC for Genetics

This module examines the use of Bayesian Statistics and Markov chain Monte Carlo methods in modern analyses of genetic data. It assumes a solid foundation in basic statistics and the concept of likelihood as well as some population genetics. A basic familiarity with the R statistical package, or other computing language, will be helpful. The first day includes an introduction to Bayesian statistics, Monte Carlo, and MCMC. Mathematical concepts covered include expectation, laws of large numbers, and ergodic and time-reversible Markov chains. Algorithms include the Metropolis-Hastings algorithm and Gibbs sampling. Some mathematical detail is given; however, there is considerable emphasis on concepts and practical issues arising in applications. Mathematical ideas are illustrated with simple examples and reinforced with a computer practical using the R statistical language. With that background, two applications of MCMC are investigated in detail: inference of population structure (using the program STRUCTURE) and haplotype inference (using the program PHASE). Computer practicals using both programs are included. Further topics include the use of MCMC in model evaluation and model checking, strategies for assessing MCMC convergence and diagnosing MCMC mixing problems, importance sampling, and Metropolis-coupled MCMC. Software used: R, STRUCTURE, PHASE. Background reading: Shoemaker, J.S., Painter, I.S. and Weir, B.S. (1999). Bayesian statistics in genetics. Trends in Genetics 15:354-358. Beaumont, M.A. and Rannala, B. (2004). The Bayesian revolution in genetics. Nature Reviews Genetics 5:251-261. Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). "Markov Chain Monte Carlo in Practice." Chapman and Hall.

**M10**: High-dimensional Omics Data

In this course, we will present a number of statistical machine learning methods for the analysis of high-dimensional biological data, often referred to as 'omics.' Examples include genomic, transcriptomic, metabolomic, proteomic, and other large-scale data sets, typically characterized by a huge number of molecular measurements (such as genes) and a relatively small number of samples (such as patients). In the first part of the course, we will cover supervised learning methods that are useful in the analysis of omics data. These include penalized approaches for performing regression, classification, and survival analysis in the high-dimensional setting. In the second part of the course, we will discuss unsupervised approaches for the analysis of omics data, such as clustering and principal components analysis. Throughout the course, we will highlight the effects of high dimensionality and focus on common pitfalls in the analysis of omics data, and how to avoid them. The techniques discussed will be demonstrated in R. This course assumes a previous course in regression and statistical hypothesis testing, and some familiarity with R or other command line programming languages.

**M11**: Network and Pathway Analysis of Omics Data

Networks represent the interactions among components of biological systems. In the context of high dimensional omics data, relevant networks include gene regulatory networks, protein-protein interaction networks, and metabolic networks. These networks provide a window into biological systems as well as complex diseases, and can be used to understand how biological functions are implemented and how homeostasis is maintained. On the other hand, pathway-based analyses can be used to leverage biological knowledge available from literature, gene ontologies or previous experiments in order to identify the pathways associated with disease or an outcome of interest. In this module, various statistical learning methods for reconstruction and analysis of networks from omics data are discussed, as well as methods of pathway enrichment analysis. Particular attention will be paid to omics datasets with a large number of variables, e.g. genes, and a small number of samples, e.g. patients. The techniques discussed will be demonstrated in R. This course assumes a previous course in regression, previous exposure to the material covered in Module 16, and familiarity with R or other command line programming languages.

## Instructors:

- **(M3)** Rebecca Doerge and Zhao-Bang Zeng
- **(M4)** Bruce Walsh and Guilherme Rosa
- **(M5)** Michel Georges and Dahlia Nielsen
- **(M6)** Thomas Lumley and Ken Rice
- **(M7)** Jerome Goudet and Bruce Weir
- **(M8)** Bruce Walsh and Guilherme Rosa
- **(M9)** Eric Anderson and Matthew Stephens
- **(M10)** Alison Motsinger-Reif and Ali Shojaie
- **(M11)** Alison Motsinger-Reif and Ali Shojaie

## SCHEDULE:

Week 1

Mon

Tue

Jan 22

^{nd}Jan 23

^{rd}Jan 24

^{th}08:00-08:30

08:30-10:00

M1, M2

M1, M210:00-10:20

coffee break

coffee break

10:20-12:00

M1, M2

M1, M212:00-14:00

Reception

lunch

lunch

14:00-15:30

M1, M2

M1, M2

M1, M215:30-15:50

coffee break

coffee break

coffee break

15:50-17:00

M1, M2

M1, M2

M1, M2Week 2

Jan 27

^{th}Jan 28

^{th}Jan 29

^{th}Jan 30

^{rd}Jan 31

^{st}08:00-08:30

Reception

08:30-10:00

M3, M6

M3, M6

M3, M6

M4, M7

M4, M710:00-10:20

coffee break

coffee break

coffee break

coffee break

coffee break

10:20-12:00

M3, M6

M3, M6

M3, M6

M4, M7

M4, M712:00-14:00

lunch

lunch

lunch

lunch

lunch

14:00-15:30

M3, M6

M3, M6

M4, M7

M4, M7

M4, M715:30-15:50

coffee break

coffee break

coffee break

coffee break

coffee break

15:50-17:00

M3, M6

M3, M6

M4, M7

M4, M7

M4, M7Week 3

Feb 3

^{rd}Feb 4

^{th}Feb 5

^{th}Feb 6

^{th}Feb 7

^{th}08:00-08:30

Reception

08:30-10:00

M5, M8

M5, M8

M5, M8

M9, M10

M9, M1010:00-10:20

coffee break

coffee break

coffee break

coffee break

coffee break

10:20-12:00

M5, M8

M5, M8

M5, M8

M9, M10

M9, M1012:00-14:00

lunch

lunch

lunch

lunch

lunch

14:00-15:30

M5, M8

M5, M8

M9, M10

M9, M10

M9, M1015:30-15:50

coffee break

coffee break

coffee break

coffee break

coffee break

15:50-17:00

M5, M8

M5, M8

M9, M10

M9, M10

M9, M10Week 4

Feb 10

^{th}Feb 11

^{st}Feb 12

^{nd}08:00-08:30

Reception

08:30-10:00

M11

M11

M1110:00-10:20

coffee break

coffee break

coffee break

10:20-12:00

M11

M11

M1112:00-14:00

lunch

lunch

14:00-15:30

M11

M1115:30-15:50

coffee break

coffee break

15:50-17:00

M11

M11

## Fees (per module):

- USD $ 250 for graduate students

- USD $ 500 for professionals (academics, research institutes, public institutions)

- USD $ 900 for non-academic professionals

- Modules **M1** and **M2**: free for people taking at least 2 modules.

## Lodging:

Suggestions:

- Antonios Palace Hotel - Av. Independência, 2805, Phone: +55-19-3417-6000.

- New Life Piracicaba Apart Hotel - Rua Moraes Barros, 555, Phone: +55-19-3301-6800.

- Hotel Ibis Piracicaba - Rua Armando Dedini, 125, Phone: +55-19-3421-6400.

- Hotel Ibis Budget Piracicaba - Rua Armando Dedini, 155, Phone: +55-19-3372-5150.

- Beira Rio Palace Hotel - Rua Luiz de Queiroz, 51, Phone: +55-19-3422-0066.

- Hotel Center Flat Service - Rua José Pinto de Almeida, 877, Phone: +55-19-3403-6400 - reservas@centerflatservice.com.br

- Arco Hotel Express Piracicaba - Av. Saldanha Marinho, 1515, Phone: +55-19-3373-3000.

## Organization:

- Antonio Augusto Franco Garcia (coordinator)
- Diogo Meyer
- Bruce Weir
- Ricardo Antunes de Azevedo
- José Baldin Pinheiro

## Registration:

From November 1^{st} to December 10^{th}, 2013.

**Information**:

After providing your personal data in the registration form, please choose your professional status. Fees will be calculated according to this information.

- a) Students: this category is for undergraduate students, graduate students and post-docs.

- b) Professional (academics and public institutions): choose this if you work at a university or public institution (e.g., EMBRAPA, IAC, USDA).

- c) Professional (non-academics): choose this if you work in a private company.

**IMPORTANT I**: if you register for at least two modules from M3 to M11, you qualify to attend M1 or M2 for free. Please indicate which module you will attend (if any). Fees will be calculated accordingly.

**IMPORTANT II**: for registrations in categories a) and b) above, you will need to provide proof of your status, such as an identification card or student ID. Details will be presented once you start the process.
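As an illustration only, the fee rules above can be sketched in a few lines of Python. This is not the official registration system: the function name `total_fee` and the status keys are hypothetical. The sketch also assumes that M1 or M2 is charged at the normal per-module rate when fewer than two modules from M3 to M11 are taken, which the rules above do not state explicitly.

```python
# Per-module fees in USD, as listed in the Fees section.
FEE_PER_MODULE = {
    "student": 250,        # category a)
    "academic": 500,       # category b)
    "non_academic": 900,   # category c)
}

INTRODUCTORY = {"M1", "M2"}
SISG = {f"M{i}" for i in range(3, 12)}  # M3 .. M11


def total_fee(status, modules):
    """Return the total fee in USD for one participant (illustrative only)."""
    chosen = set(modules)
    unknown = chosen - INTRODUCTORY - SISG
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    if INTRODUCTORY <= chosen:
        # M1 and M2 share the same time slots, so only one can be attended.
        raise ValueError("M1 and M2 are taught in parallel; choose one")

    rate = FEE_PER_MODULE[status]
    sisg_count = len(chosen & SISG)
    intro_count = len(chosen & INTRODUCTORY)

    # Assumption: the introductory module is paid at the normal rate unless
    # at least two modules from M3 to M11 are taken, in which case it is free.
    if sisg_count >= 2:
        intro_count = 0
    return rate * (sisg_count + intro_count)


if __name__ == "__main__":
    # A graduate student taking M1 plus M3 and M4 pays for two modules.
    print(total_fee("student", ["M1", "M3", "M4"]))
```

The sketch does not check for schedule clashes among M3 to M11 (for example, M3 and M6 run in the same slots); participants should consult the schedule when choosing modules.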

**Links for registration:**

Portuguese | English

## Support:

- Department of Genetics, ESALQ/USP
- Graduate Program in Genetics and Plant Breeding, ESALQ/USP
- Graduate Program in Genetics and Evolutionary Biology, IB/USP
- Brazilian Society of Genetics