Given the following problem. Suppose we have a large symmetric matrix of a certain dimension \(n\) of which the lower triangular part is stored in a vector. We want to find all row- and column-indices of elements of the original matrix that fullfill a certain property.
The genetic relationship matrix (GRM) between \(n\) individuals is a symmetric matrix of dimension \(n\times n\). Software programs that are used to compute the GRM such as GCTA store the lower triangular part of the matrix in a vector. In case we want to know pairs of individuals that have genetic relationship coefficients above a certain threshold, we need to know the row- and column-indices of the original matrix where the coefficients above a certain threshold occur in the original GRM.
Let us try to explain the problem based on a small example. Suppose the GRM between a small number of individuals has the following structure. For the purpose of this blog-post, we create a matrix based on random uniform numbers. Since the diagonals of a GRM are all larger than 1, we add 1 to the diagonal.
set.seed(4321)
nNrInd <- 6
mGrmP <- matrix(runif(n = nNrInd * nNrInd), ncol = nNrInd)
mGrm <- crossprod(mGrmP)/nNrInd
diag(mGrm) <- 1+diag(mGrm)
print(mGrm)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.3759968 0.3467317 0.2801808 0.2441776 0.1192618 0.2616150
## [2,] 0.3467317 1.4614261 0.3586357 0.2385468 0.1402472 0.3813145
## [3,] 0.2801808 0.3586357 1.3017744 0.2064611 0.1386737 0.2711936
## [4,] 0.2441776 0.2385468 0.2064611 1.2571203 0.1146292 0.1705466
## [5,] 0.1192618 0.1402472 0.1386737 0.1146292 1.1433730 0.1460782
## [6,] 0.2616150 0.3813145 0.2711936 0.1705466 0.1460782 1.4259352
The lower triangular part of the above matrix can be stored in a vector as follows.
vGrmLowTri <- vector(mode = "numeric", length = sum(1:nNrInd))
nVecIdx <- 1
for (nRowIdx in 1:nrow(mGrm)){
for (nColIdx in 1:nRowIdx){
vGrmLowTri[nVecIdx] <- mGrm[nRowIdx, nColIdx]
nVecIdx <- nVecIdx + 1
}
}
print(vGrmLowTri)
## [1] 1.3759968 0.3467317 1.4614261 0.2801808 0.3586357 1.3017744 0.2441776
## [8] 0.2385468 0.2064611 1.2571203 0.1192618 0.1402472 0.1386737 0.1146292
## [15] 1.1433730 0.2616150 0.3813145 0.2711936 0.1705466 0.1460782 1.4259352
From the above, we can see that the order of the elements is to be taken row-wise, that means, in the outer loop we step through each row of the matrix and inside of each row, the inner loop runs from the first element up until the diagonal element.
R has internal functions to extract lower and upper triangular elements of a matrix. Those functions are lower.tri()
and upper.tri()
as arguments they take the matrix and a boolean flag which indicates whether the diagonal elements should be included or not.
Since we are interested in the lower triangular of our matrix, it would seam obvious to use the function lower.tri()
.
lower.tri(mGrm, diag = TRUE)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] TRUE FALSE FALSE FALSE FALSE FALSE
## [2,] TRUE TRUE FALSE FALSE FALSE FALSE
## [3,] TRUE TRUE TRUE FALSE FALSE FALSE
## [4,] TRUE TRUE TRUE TRUE FALSE FALSE
## [5,] TRUE TRUE TRUE TRUE TRUE FALSE
## [6,] TRUE TRUE TRUE TRUE TRUE TRUE
The result of applying the function lower.tri()
to our matrix is a boolean matrix of the same dimension as the input matrix where all elements of the lower triangular including the diagnal elements are TRUE and all other elements are FALSE.
Using the boolean matrix that comes out of the function lower.tri()
as index selector for our original GRM matrix, flattens the lower triangular elements into a vector but the order is taken column-wise and not row-wise. As a consequence the resulting flattened vector is not the same as the vector that we obtained from our two nested loops above. Therefore, it seams that our idea of using R’s internal function lower.tri()
for flattening the GRM matrix into a vector does not work.
mGrm[lower.tri(mGrm, diag = TRUE)]
## [1] 1.3759968 0.3467317 0.2801808 0.2441776 0.1192618 0.2616150 1.4614261
## [8] 0.3586357 0.2385468 0.1402472 0.3813145 1.3017744 0.2064611 0.1386737
## [15] 0.2711936 1.2571203 0.1146292 0.1705466 1.1433730 0.1460782 1.4259352
Our original numeric matrix is symmetric and hence the elements above the diagonal are the same as below the diagonal. Further more the order of the elements is the same when running through the matrix row-wise below the diagonal as when running through the matrix column-wise above the diagonal. Hence, instead of using lower.tri()
, we have to use upper.tri()
. As shown below, we get the same flattened vector as from the two iterated loops.
vFlatMatElem <- mGrm[upper.tri(mGrm, diag = TRUE)]
print(vFlatMatElem)
## [1] 1.3759968 0.3467317 1.4614261 0.2801808 0.3586357 1.3017744 0.2441776
## [8] 0.2385468 0.2064611 1.2571203 0.1192618 0.1402472 0.1386737 0.1146292
## [15] 1.1433730 0.2616150 0.3813145 0.2711936 0.1705466 0.1460782 1.4259352
identical(vFlatMatElem, vGrmLowTri)
## [1] TRUE
Let us assume, we are given the flattened vector of the lower triangular part of our GRM matrix. Furthermore, we want to select some special components in the vector and want to trace back where they occured in the original matrix. That means we want to know the row and the column indices of the special component in the original matrix.
One group of special elements are the diagonal elements of the original matrix. From our process of flattening the lower triangular part of our original matrix into a vector, we can observe that on a given row i, there are exactly i elements from column 1 until the diagonal element of row i. Based on that observation, the number of elements that are stored in the flattened vector up to the diagonal element of row i corresponds to the sum of all natural numbers from 1 up to and including i.
\[ \sum_{j=i}^i j\]
Computing that number for all rows in the matrix we can store all indices of the flattened vector corresponding to diagonal elements as follows.
vFlatDiagIdx <- cumsum(1:nrow(mGrm))
print(vFlatDiagIdx)
## [1] 1 3 6 10 15 21
From that index vector, all diagonal elements can be extracted by
vFlatMatElem[vFlatDiagIdx]
## [1] 1.375997 1.461426 1.301774 1.257120 1.143373 1.425935
Now it is easy to also get all the off-diagonal elements by
vFlatMatElem[-c(vFlatDiagIdx)]
## [1] 0.3467317 0.2801808 0.3586357 0.2441776 0.2385468 0.2064611 0.1192618
## [8] 0.1402472 0.1386737 0.1146292 0.2616150 0.3813145 0.2711936 0.1705466
## [15] 0.1460782
Let us assume, we are interested in element 12 of the flattened vector.
nElemIdx <- 12
vFlatMatElem[nElemIdx]
## [1] 0.1402472
Our goal is to find out on which row and which column this element occurs in the original matrix. Our index vector of the diagonal elements already tells us that element 10 is the last element of the row 4 and element 15 is the last element of the row 5. Element 12 must then be somewhere on row 5. We now want to compute where exactly that element occurs in the matrix. First we start with the row index. The row index corresponds to the index of our vector of diagonal element indices, where for the first time the element in the index vector is larger than the index of our special element. The column index is computed as the difference between the index of our special element (12) and the index of the diagonal element (10) of the previous row (4)
nElemIdx <- 12
nElementRow <- which(vFlatDiagIdx > nElemIdx)[1]
nElementCol <- nElemIdx-vFlatDiagIdx[nElementRow-1]
cat(" * Row: ", nElementRow, "\n * Col: ", nElementCol, "\n")
## * Row: 5
## * Col: 2
Checking our result
identical(vFlatMatElem[12], mGrm[nElementRow,nElementCol])
## [1] TRUE
In the above sections we gave an outline of an algorithm that allows us to trace back elements from a flattened vector back to the original matrix. We want to wrap that up into a function, that takes as input, the flattened vector and the index of a special element. The function computes the row and the column indices and returns the respective element of the matrix.
#' Find a single element of a symmetric matrix based on a flattened vector representation
#'
#' Given a flattened vector representation of a symmatric matrix \code{findSingleMatElement}
#' computes the row and column indices in the original matrix of a given element. The element
#' is given by the index of the flattened vector. The flattened vector is a vector that contains
#' all elements of the lower triangular part of the matrix including all diagonal elements. The
#' order in which the elements are written to the vector is going through each row and copying
#' each element from column one up to the diagonal element of that row into the flattened
#' vector.
#'
#' @param pnElementIdx Vector index of element to be searched for
#' @param pvFlatVec Flattened vector represenation of symmetric matrix
#' @return lResultList Result list with components nRowIndex, nColIndex and nCoefficient
findSingleMatElement <- function(pnElementIdx, pvFlatVec){
# pnElementIdx must a be positive integer
if (pnElementIdx < 0)
stop(paste("Length of flattened vector must be positive, found: ", pnElementIdx))
if (as.integer(pnElementIdx) - pnElementIdx != 0)
stop(paste("Length of flattened vector must be an integer, found: ", pnElementIdx))
### # pnElementIdx cannot be larger than the length of pvFlatVec
if (pnElementIdx > length(pvFlatVec))
stop("Elementindex cannot be larger than length of index vector")
### # compute the dimensionality of the original matrx
nMatDim <- nGetMatDim(pnLenFlatVec = length(pvFlatVec))
### # compute vector of indices of diagonal elements
vDiagIdx <- cumsum(1:nMatDim)
### # check whether pnElementIdx is a diagonal element, if yes, then special case
if (any(vDiagIdx == pnElementIdx)) {
nDiagIdx <- which(vDiagIdx == pnElementIdx)
lResultList <- list(nRowIndex=nDiagIdx, nColIndex=nDiagIdx, nCoefficient = pvFlatVec[pnElementIdx])
} else {
### # off-diagonal
nElementRow <- which(vDiagIdx > pnElementIdx)[1]
nElementCol <- pnElementIdx - vDiagIdx[nElementRow-1]
lResultList <- list(nRowIndex=nElementRow, nColIndex=nElementCol, nCoefficient = pvFlatVec[pnElementIdx])
}
return(lResultList)
}
The function findSingleMatElement()
searches for a single element in the original matrix. The search is done only based on the element index and based on the flattened vector. In order to find our special element, we need a helper function to determine the dimension of the original matrix based on the length of the flattened vector. This can be done using the fact that the flattened vector contains all elements of the lower triangular part of the original matrix including the diagonal elements. Assuming that the original matrix has dimension \(n\times n\), the total number of elements of the original matrix is \(n^2\). The total number of elements below the diagonal without the diagonal elements is \((n^2-n)/2\). Adding the diagonal elements back in leads to
\[(n^2-n)/2 + n = n*(n+1)/2\]
This expression corresponds to the length (\(l\)) of our flattened vector. Given the length of the flattened vector, we can compute the number of rows and columns (n) of our original matrix, based on the solution formula for quadratic equations.
\[n = (-1 + \sqrt{1 + 8*l})/2 \]
#' Compute matrix dimension based on the length of a flattened vector
#'
#' The flattened vector contains all elements of the lower triangular
#' part of a symmetric matrix including all diagonal elements. Because
#' any row i of a quadratic matrix contains i elements below the
#' diagonal, including the diagonal element, the flattened vector of
#' an n * n symmetric matrix contains the sum of all natural numbers
#' up to and including n. That length has a closed form and corresponds
#' to n*(n+1)/2. Given the length of the flattened vector, we have
#' a quadratic equation for n, that can be solved. We are only interested
#' in the positive solution which is n = (-1 + \sqrt(1+8l))/2
#'
#' @param pnLenFlatVec length of flattened vector
#' @return nMatDimResult dimension of symmetric matrix
nGetMatDim <- function(pnLenFlatVec) {
# pnLenFlatVec must a be positive integer
if (pnLenFlatVec < 0)
stop(paste("Length of flattened vector must be positive, found: ", pnLenFlatVec))
if (as.integer(pnLenFlatVec) - pnLenFlatVec != 0)
stop(paste("Length of flattened vector must be an integer, found: ", pnLenFlatVec))
# compute the matrix dimension
nMatDimResult <- (sqrt(1 + 8*pnLenFlatVec) - 1) / 2
# result must be integer
if ((as.integer(nMatDimResult) - nMatDimResult) != 0 )
stop(paste("Resulting matrix dimension not an integer:", nMatDimResult ))
return(nMatDimResult)
}
Searching element 12 is done with
lElemFound <- findSingleMatElement(pnElementIdx = 12, pvFlatVec = vFlatMatElem)
print(lElemFound)
## $nRowIndex
## [1] 5
##
## $nColIndex
## [1] 2
##
## $nCoefficient
## [1] 0.1402472
identical(lElemFound$nCoefficient, mGrm[lElemFound$nRowIndex, lElemFound$nColIndex])
## [1] TRUE
As an example, we want to find all elements that are smaller than 0.12. Elements that meet that criterion are found with
nElementThreshold <- 0.12
vElemGroup <- which(vFlatMatElem < nElementThreshold)
That element vector can be used together with our function findSingleMatElement()
as arguments to lapply()
.
lapply(vElemGroup, findSingleMatElement, vFlatMatElem)
## [[1]]
## [[1]]$nRowIndex
## [1] 5
##
## [[1]]$nColIndex
## [1] 1
##
## [[1]]$nCoefficient
## [1] 0.1192618
##
##
## [[2]]
## [[2]]$nRowIndex
## [1] 5
##
## [[2]]$nColIndex
## [1] 4
##
## [[2]]$nCoefficient
## [1] 0.1146292
The complete test searches for all elements in the flattend vector. This can be done again using lapply()
.
lCompTestResults <- lapply(1:length(vFlatMatElem), findSingleMatElement, vFlatMatElem)
The results from selecting all elements is compared to the elements in the matrix.
nRowIndex <- NULL
nColIndex <- NULL
nCoefficient <- NULL
nMatElement <- NULL
bCompareResult <- NULL
for (lCurResult in lCompTestResults){
### # collect items
if (is.null(nRowIndex)){
nRowIndex <- lCurResult$nRowIndex
nColIndex <- lCurResult$nColIndex
nCoefficient <- lCurResult$nCoefficient
nMatElement <- mGrm[lCurResult$nRowIndex,lCurResult$nColIndex]
bCompareResult <- identical(lCurResult$nCoefficient,mGrm[lCurResult$nRowIndex,lCurResult$nColIndex])
} else {
nRowIndex <- c(nRowIndex, lCurResult$nRowIndex)
nColIndex <- c(nColIndex, lCurResult$nColIndex)
nCoefficient <- c(nCoefficient, lCurResult$nCoefficient)
nMatElement <- c(nMatElement, mGrm[lCurResult$nRowIndex,lCurResult$nColIndex])
bCompareResult <- c(bCompareResult, identical(lCurResult$nCoefficient,mGrm[lCurResult$nRowIndex,lCurResult$nColIndex]))
}
}
dfResults <- data.frame(nRowIndex,
nColIndex,
nCoefficient,
nMatElement,
bCompareResult,
stringsAsFactors = FALSE)
Showing the above collected results as a table looks as follows
knitr::kable(dfResults)
nRowIndex | nColIndex | nCoefficient | nMatElement | bCompareResult |
---|---|---|---|---|
1 | 1 | 1.3759968 | 1.3759968 | TRUE |
2 | 1 | 0.3467317 | 0.3467317 | TRUE |
2 | 2 | 1.4614261 | 1.4614261 | TRUE |
3 | 1 | 0.2801808 | 0.2801808 | TRUE |
3 | 2 | 0.3586357 | 0.3586357 | TRUE |
3 | 3 | 1.3017744 | 1.3017744 | TRUE |
4 | 1 | 0.2441776 | 0.2441776 | TRUE |
4 | 2 | 0.2385468 | 0.2385468 | TRUE |
4 | 3 | 0.2064611 | 0.2064611 | TRUE |
4 | 4 | 1.2571203 | 1.2571203 | TRUE |
5 | 1 | 0.1192618 | 0.1192618 | TRUE |
5 | 2 | 0.1402472 | 0.1402472 | TRUE |
5 | 3 | 0.1386737 | 0.1386737 | TRUE |
5 | 4 | 0.1146292 | 0.1146292 | TRUE |
5 | 5 | 1.1433730 | 1.1433730 | TRUE |
6 | 1 | 0.2616150 | 0.2616150 | TRUE |
6 | 2 | 0.3813145 | 0.3813145 | TRUE |
6 | 3 | 0.2711936 | 0.2711936 | TRUE |
6 | 4 | 0.1705466 | 0.1705466 | TRUE |
6 | 5 | 0.1460782 | 0.1460782 | TRUE |
6 | 6 | 1.4259352 | 1.4259352 | TRUE |
Checking whether there are any differences between the elements from the flattened vector and the matrix elements leads to
any(bCompareResult == FALSE)
## [1] FALSE
sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.3 (El Capitan)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.2.1 htmltools_0.3 tools_3.2.3
## [5] yaml_2.1.13 memoise_1.0.0 stringi_1.0-1 rmarkdown_0.9.5
## [9] highr_0.5.1 knitr_1.12.3 stringr_1.0.0 digest_0.6.9
## [13] devtools_1.10.0 evaluate_0.8