element;intro
#Welcome;Welcome to ExploreModelMatrix! Here you can visualize and explore your design matrix in an interactive environment, and gain a better understanding of the meaning of the model coefficients and how they are related to your contrast of interest. <br><br>In this tour you will get an overview of the user interface. You can always leave the tour by pressing "Skip", or by clicking anywhere outside of the tour box. At any stage, you can access the tour by clicking on the question mark in the top right corner of the app.
#choose_design_formula;In this box you specify the desired (fixed-effect) design formula. The formula must start with the ~ character. <br><br>If you start the app without providing a sample table, there will also be a control here to select a tab-separated file with the sample information. After selecting the file, carefully check that all the columns have been correctly interpreted (e.g., in the Sample table summary panel). <br><br>All the controls in the sidebar are interactive, meaning that if you change one of them, the content displayed in the app will update accordingly.
#use_example_design;Instead of supplying your own sample table and design formula, you can choose to explore one of the built-in example designs, available in this dropdown menu.
#reflevels;These dropdown menus let you set or change the reference level for each of the factor variables in the supplied sample data table.
#dropcols;This box lets you specify columns to drop from the design matrix. This can be useful, e.g., to obtain a design matrix with full rank.
#ui_settings;These menus allow you to change the settings for the plots, such as the plot height and font sizes. This is particularly useful if you have a large and/or complex experimental design, in which case the text may be difficult to read with the default plot settings. The menus can be expanded or collapsed by clicking on the small arrows to the right of the respective title.
#ui_fitted_values_box;This panel shows the fitted values, expressed in terms of the model coefficients (or more generally, the value of the linear predictor for a generalized linear model), for each combination of predictor values. The information can be shown either as a plot or as a table.<br><br>These values can be useful when deriving contrasts. For example, to get the contrast required to compare the fitted values in two cells in this table, take the linear combination given in the first cell and subtract the linear combination given in the second cell. This difference will tell you which linear contrast you need to specify to perform your test of interest.
#sample_table_box;This panel shows the full provided sample information table. You can collapse and expand each panel by clicking on the + or - symbol in the top right corner.
#sample_table_summary_box;This panel shows a summary of the provided sample information table. Character variables are converted to factors when loaded into the application.
#design_matrix_box;This panel displays the current design matrix, obtained by applying the model.matrix() R function to the specified design formula and sample data table, after excluding any columns specified under "Drop columns". The design matrix shows, for each sample (row), how the different model coefficients (columns) contribute to the fitted value for the sample. In a regular linear model of the form y=Xb+e, where y is the vector of observed response values, b is the vector of model coefficients and e is an error term, X represents the design matrix. An accessible introduction to linear models in general is provided, e.g., by Irizarry & Love: Data Analysis for the Life Sciences (LeanPub, 2015).
#design_matrix_rank_box;Here, the rank of the current design matrix, as well as the number of columns, are displayed. If the rank is lower than the number of columns (i.e., the design matrix is not full rank), a warning will be displayed. You will also see the residual degrees of freedom, calculated as the number of observations minus the rank of the design matrix. If this is zero, quantities such as the variance or dispersion cannot be estimated from the data.
#pinv_design_matrix_box;This panel shows the pseudoinverse of the design matrix X. In a regular linear model of the form y=Xb+e, the ordinary least squares estimates of the regression coefficients are obtained by multiplying the pseudoinverse of X with the observed response values y. When X has full column rank, this gives $$\hat{b}=(X^TX)^{-1}X^Ty.$$ Thus, the pseudoinverse indicates how the response values for the different samples contribute to the estimated model coefficients in a linear model.
#vifs_box;This panel displays the estimated variance inflation factors (VIFs) for the model coefficients, calculated by successively modeling each non-intercept column of the design matrix as a function of the other columns, using a linear model. For each variable, the VIF is obtained as $$1/(1-R^2),$$ where R^2 is the coefficient of determination of the linear model. In an ordinary least squares regression analysis, the VIF provides a measure of the increase in a coefficient's variance caused by collinearity with the other predictors in the model. The VIF for a predictor is 1 when the corresponding column of X is orthogonal to all other columns, and larger than 1 otherwise.
#cooccurrence_matrix_box;The co-occurrence plot displays the number of observations (rows in the sample data table) for each combination of predictor values. It can be useful in order to visualize whether the setup is balanced, or whether some combinations of predictors are not represented by any observations.
#correlation_matrix_box;This panel shows the correlation among the regression coefficients. In a regular linear model y=Xb+e, the variance-covariance matrix for the vector of regression coefficients b is proportional to $$(X^TX)^{-1}.$$ The panel displays this matrix, converted to a correlation matrix.
#goodbye;Thank you for taking the tour of ExploreModelMatrix! For more detailed information about (generalized) linear models, design specifications and contrasts, especially in the life sciences field, consider the following references: <br><ul><li>RA Irizarry & MI Love (2015): <a href="https://leanpub.com/dataanalysisforthelifesciences">Data analysis for the life sciences with R.</a></li><li>The <a href="https://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf">limma</a>, <a href="https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf">edgeR</a> and <a href="https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html">DESeq2</a> vignettes.</li><li>JM Chambers & T Hastie (1992): <a href="https://www.taylorfrancis.com/books/e/9780203738535">Statistical Models in S.</a></li></ul>

