Disclaimer

This document shows a few experiments on how to sample random time points in a given interval.

Experiments

Interval definition

Given a start-time and an end-time, a certain number of intervals between start- and end-times are specified. Within each interval a given number of random time-points are to be sampled.

start_time <- as.POSIXct(x = strftime("2018-04-02 08:00:00", format = "%Y-%m-%d %H:%M:%S"))
end_time <- as.POSIXct(x = strftime("2018-04-02 17:00:00", format = "%Y-%m-%d %H:%M:%S"))
n_nr_interval <- 9

Vector of interval boundaries

Intervals within which we want to sample random time points are sampled are given in a vector

(vec_bound <- seq(start_time, end_time, by = "hour"))
##  [1] "2018-04-02 08:00:00 CEST" "2018-04-02 09:00:00 CEST"
##  [3] "2018-04-02 10:00:00 CEST" "2018-04-02 11:00:00 CEST"
##  [5] "2018-04-02 12:00:00 CEST" "2018-04-02 13:00:00 CEST"
##  [7] "2018-04-02 14:00:00 CEST" "2018-04-02 15:00:00 CEST"
##  [9] "2018-04-02 16:00:00 CEST" "2018-04-02 17:00:00 CEST"
(vec_bound <- seq(start_time, end_time, length.out =  (n_nr_interval+1)))
##  [1] "2018-04-02 08:00:00 CEST" "2018-04-02 09:00:00 CEST"
##  [3] "2018-04-02 10:00:00 CEST" "2018-04-02 11:00:00 CEST"
##  [5] "2018-04-02 12:00:00 CEST" "2018-04-02 13:00:00 CEST"
##  [7] "2018-04-02 14:00:00 CEST" "2018-04-02 15:00:00 CEST"
##  [9] "2018-04-02 16:00:00 CEST" "2018-04-02 17:00:00 CEST"

Samples of random timepoints within given intervals

Inside of the intervals given by the boundaries in vec_bound, we sample a given number of random time points

n_nr_sample <- 2
set.seed(2018)
sample(seq(vec_bound[1], vec_bound[2], by = "sec"), n_nr_sample)
## [1] "2018-04-02 08:20:10 CEST" "2018-04-02 08:27:49 CEST"

Doing the same for all intervalls

sapply(1:(length(vec_bound)-1), function(x) sample(seq(vec_bound[x], vec_bound[x+1], by = "sec"), n_nr_sample))
##            [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
## [1,] 1522649018 1522654108 1522658184 1522663052 1522664624 1522670336
## [2,] 1522649510 1522653483 1522656468 1522661568 1522665592 1522669241
##            [,7]       [,8]       [,9]
## [1,] 1522673302 1522674974 1522680257
## [2,] 1522672683 1522675990 1522680582

It seams that when using sapply, we cannot use the option simplify=TRUE which is used by default, otherwise the date-times are converted to seconds which is the internal storage format of POSIXct. Hence, we can either explicitly specify simplify=FALSE or we can use lapply instead of sapply leading to the same result, as shown below.

sapply(1:(length(vec_bound)-1), function(x) sample(seq(vec_bound[x], vec_bound[x+1], by = "sec"), n_nr_sample), simplify = FALSE)
## [[1]]
## [1] "2018-04-02 08:15:36 CEST" "2018-04-02 08:34:06 CEST"
## 
## [[2]]
## [1] "2018-04-02 09:09:05 CEST" "2018-04-02 09:05:41 CEST"
## 
## [[3]]
## [1] "2018-04-02 10:45:43 CEST" "2018-04-02 10:32:09 CEST"
## 
## [[4]]
## [1] "2018-04-02 11:19:40 CEST" "2018-04-02 11:55:27 CEST"
## 
## [[5]]
## [1] "2018-04-02 12:35:54 CEST" "2018-04-02 12:03:52 CEST"
## 
## [[6]]
## [1] "2018-04-02 13:08:29 CEST" "2018-04-02 13:51:45 CEST"
## 
## [[7]]
## [1] "2018-04-02 14:02:01 CEST" "2018-04-02 14:29:24 CEST"
## 
## [[8]]
## [1] "2018-04-02 15:30:22 CEST" "2018-04-02 15:45:43 CEST"
## 
## [[9]]
## [1] "2018-04-02 16:02:45 CEST" "2018-04-02 16:03:07 CEST"
lapply(1:(length(vec_bound)-1), function(x) sample(seq(vec_bound[x], vec_bound[x+1], by = "sec"), n_nr_sample))
## [[1]]
## [1] "2018-04-02 08:34:52 CEST" "2018-04-02 08:10:30 CEST"
## 
## [[2]]
## [1] "2018-04-02 09:53:49 CEST" "2018-04-02 09:46:22 CEST"
## 
## [[3]]
## [1] "2018-04-02 10:35:56 CEST" "2018-04-02 10:35:57 CEST"
## 
## [[4]]
## [1] "2018-04-02 11:36:07 CEST" "2018-04-02 11:00:10 CEST"
## 
## [[5]]
## [1] "2018-04-02 12:57:47 CEST" "2018-04-02 12:04:08 CEST"
## 
## [[6]]
## [1] "2018-04-02 13:06:40 CEST" "2018-04-02 13:46:27 CEST"
## 
## [[7]]
## [1] "2018-04-02 14:24:18 CEST" "2018-04-02 14:12:35 CEST"
## 
## [[8]]
## [1] "2018-04-02 15:08:41 CEST" "2018-04-02 15:45:48 CEST"
## 
## [[9]]
## [1] "2018-04-02 16:39:45 CEST" "2018-04-02 16:57:59 CEST"

Result

Now we are ready to put together all the results into a single function.

#' Sample random time points from a set of intervals
#'
#' The set of intervals is defined by a start-time ps_start 
#' and an end-time ps_end. Both are specified as strings which 
#' are converted into POSIXct objects inside of the function. 
#' The third function argument pn_nr_interval defines how many
#' intervals are to be created between start- and end-time. 
sample_time_points <- function(ps_start, ps_end, pn_nr_interval, pn_nr_sample){
  # convert strings ps_start and ps_end to POSIXct
  start_time <- as.POSIXct(x = strftime(ps_start, format = "%Y-%m-%d %H:%M:%S"))
  end_time <- as.POSIXct(x = strftime(ps_end, format = "%Y-%m-%d %H:%M:%S"))
  # vector of interval boundary times
  vec_bound <- seq(start_time, end_time, length.out = (pn_nr_interval + 1))
  # get a random sample of pn_nr_sample timepoints in each interval
  # frist, a function that returns a random sample for one interval is defined
  sample_one_int <- function(x, pvec_bound, pn_nr_sample){
    vec_sample_result <- sample(seq(pvec_bound[x], pvec_bound[x+1], by = "sec"), pn_nr_sample)
    return(vec_sample_result[order(vec_sample_result)])
  }
  # sampling random time points for all intervals is done with lapply
  lst_time_points <- lapply(1:(length(vec_bound)-1), sample_one_int, vec_bound, pn_nr_sample)
  return(lst_time_points)
}

The above function can be tested using the following call

sample_time_points(ps_start = "2018-04-03 08:00:00", ps_end = "2018-04-03 17:00:00", pn_nr_interval = 9, pn_nr_sample = 2)
## [[1]]
## [1] "2018-04-03 08:23:35 CEST" "2018-04-03 08:36:15 CEST"
## 
## [[2]]
## [1] "2018-04-03 09:15:49 CEST" "2018-04-03 09:29:22 CEST"
## 
## [[3]]
## [1] "2018-04-03 10:21:01 CEST" "2018-04-03 10:24:35 CEST"
## 
## [[4]]
## [1] "2018-04-03 11:06:51 CEST" "2018-04-03 11:52:51 CEST"
## 
## [[5]]
## [1] "2018-04-03 12:00:31 CEST" "2018-04-03 12:23:42 CEST"
## 
## [[6]]
## [1] "2018-04-03 13:38:44 CEST" "2018-04-03 13:42:05 CEST"
## 
## [[7]]
## [1] "2018-04-03 14:02:18 CEST" "2018-04-03 14:32:27 CEST"
## 
## [[8]]
## [1] "2018-04-03 15:13:41 CEST" "2018-04-03 15:18:44 CEST"
## 
## [[9]]
## [1] "2018-04-03 16:03:53 CEST" "2018-04-03 16:55:15 CEST"

Observations

Compared to the experiments above, we can see that now the random time points are ordered within an interval. The way how the function sample_time_points is written, the intervals have equal length and are evenly spread between start- and end-time. Hence when calling the function with more intervals, the single interval will be smaller, such as shown below

sample_time_points(ps_start = "2018-04-03 08:00:00", ps_end = "2018-04-03 17:00:00", pn_nr_interval = 10, pn_nr_sample = 2)
## [[1]]
## [1] "2018-04-03 08:07:37 CEST" "2018-04-03 08:26:12 CEST"
## 
## [[2]]
## [1] "2018-04-03 09:08:54 CEST" "2018-04-03 09:27:13 CEST"
## 
## [[3]]
## [1] "2018-04-03 09:53:04 CEST" "2018-04-03 10:10:35 CEST"
## 
## [[4]]
## [1] "2018-04-03 11:01:25 CEST" "2018-04-03 11:21:54 CEST"
## 
## [[5]]
## [1] "2018-04-03 11:41:13 CEST" "2018-04-03 11:57:31 CEST"
## 
## [[6]]
## [1] "2018-04-03 12:40:22 CEST" "2018-04-03 13:19:05 CEST"
## 
## [[7]]
## [1] "2018-04-03 13:29:14 CEST" "2018-04-03 14:10:08 CEST"
## 
## [[8]]
## [1] "2018-04-03 14:34:37 CEST" "2018-04-03 14:41:49 CEST"
## 
## [[9]]
## [1] "2018-04-03 15:20:21 CEST" "2018-04-03 15:57:40 CEST"
## 
## [[10]]
## [1] "2018-04-03 16:18:42 CEST" "2018-04-03 16:59:38 CEST"

Furthermore, we cannot specify any break where we do not want to sample any time points. This can most easily be done by either ignoring the sample timepoints or by calling the function several times for each sub-interval around the break. As an example, if we did not want to sample any timepoints between 12 pm and 1 pm, we can do that by the following two calls

# morning samples
sample_time_points(ps_start = "2018-04-03 08:00:00", ps_end = "2018-04-03 12:00:00", pn_nr_interval = 4, pn_nr_sample = 2)
## [[1]]
## [1] "2018-04-03 08:13:05 CEST" "2018-04-03 08:45:38 CEST"
## 
## [[2]]
## [1] "2018-04-03 09:16:05 CEST" "2018-04-03 09:19:05 CEST"
## 
## [[3]]
## [1] "2018-04-03 10:56:45 CEST" "2018-04-03 10:57:06 CEST"
## 
## [[4]]
## [1] "2018-04-03 11:16:42 CEST" "2018-04-03 11:21:23 CEST"

and similarly for the afternoon

# afternoon samples
sample_time_points(ps_start = "2018-04-03 13:00:00", ps_end = "2018-04-03 17:00:00", pn_nr_interval = 4, pn_nr_sample = 2)
## [[1]]
## [1] "2018-04-03 13:17:56 CEST" "2018-04-03 13:24:42 CEST"
## 
## [[2]]
## [1] "2018-04-03 14:19:18 CEST" "2018-04-03 14:43:28 CEST"
## 
## [[3]]
## [1] "2018-04-03 15:06:06 CEST" "2018-04-03 15:13:49 CEST"
## 
## [[4]]
## [1] "2018-04-03 16:25:13 CEST" "2018-04-03 16:36:10 CEST"

Session Info

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.3  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] tools_3.4.3     htmltools_0.3.6 yaml_2.1.16     Rcpp_0.12.15   
##  [9] stringi_1.1.5   rmarkdown_1.8   knitr_1.19      stringr_1.2.0  
## [13] digest_0.6.15   evaluate_0.10.1

Latest Changes: 2018-04-03 07:45:10 (peter)