This document shows a few experiments on how to sample random time points in a given interval.
Given a start-time and an end-time, a certain number of intervals between start- and end-times are specified. Within each interval a given number of random time-points are to be sampled.
start_time <- as.POSIXct(x = strftime("2018-04-02 08:00:00", format = "%Y-%m-%d %H:%M:%S"))
end_time <- as.POSIXct(x = strftime("2018-04-02 17:00:00", format = "%Y-%m-%d %H:%M:%S"))
n_nr_interval <- 9
Intervals within which we want to sample random time points are sampled are given in a vector
(vec_bound <- seq(start_time, end_time, by = "hour"))
## [1] "2018-04-02 08:00:00 CEST" "2018-04-02 09:00:00 CEST"
## [3] "2018-04-02 10:00:00 CEST" "2018-04-02 11:00:00 CEST"
## [5] "2018-04-02 12:00:00 CEST" "2018-04-02 13:00:00 CEST"
## [7] "2018-04-02 14:00:00 CEST" "2018-04-02 15:00:00 CEST"
## [9] "2018-04-02 16:00:00 CEST" "2018-04-02 17:00:00 CEST"
(vec_bound <- seq(start_time, end_time, length.out = (n_nr_interval+1)))
## [1] "2018-04-02 08:00:00 CEST" "2018-04-02 09:00:00 CEST"
## [3] "2018-04-02 10:00:00 CEST" "2018-04-02 11:00:00 CEST"
## [5] "2018-04-02 12:00:00 CEST" "2018-04-02 13:00:00 CEST"
## [7] "2018-04-02 14:00:00 CEST" "2018-04-02 15:00:00 CEST"
## [9] "2018-04-02 16:00:00 CEST" "2018-04-02 17:00:00 CEST"
Inside of the intervals given by the boundaries in vec_bound
, we sample a given number of random time points
n_nr_sample <- 2
set.seed(2018)
sample(seq(vec_bound[1], vec_bound[2], by = "sec"), n_nr_sample)
## [1] "2018-04-02 08:20:10 CEST" "2018-04-02 08:27:49 CEST"
Doing the same for all intervalls
sapply(1:(length(vec_bound)-1), function(x) sample(seq(vec_bound[x], vec_bound[x+1], by = "sec"), n_nr_sample))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1522649018 1522654108 1522658184 1522663052 1522664624 1522670336
## [2,] 1522649510 1522653483 1522656468 1522661568 1522665592 1522669241
## [,7] [,8] [,9]
## [1,] 1522673302 1522674974 1522680257
## [2,] 1522672683 1522675990 1522680582
It seams that when using sapply
, we cannot use the option simplify=TRUE
which is used by default, otherwise the date-times are converted to seconds which is the internal storage format of POSIXct
. Hence, we can either explicitly specify simplify=FALSE
or we can use lapply
instead of sapply
leading to the same result, as shown below.
sapply(1:(length(vec_bound)-1), function(x) sample(seq(vec_bound[x], vec_bound[x+1], by = "sec"), n_nr_sample), simplify = FALSE)
## [[1]]
## [1] "2018-04-02 08:15:36 CEST" "2018-04-02 08:34:06 CEST"
##
## [[2]]
## [1] "2018-04-02 09:09:05 CEST" "2018-04-02 09:05:41 CEST"
##
## [[3]]
## [1] "2018-04-02 10:45:43 CEST" "2018-04-02 10:32:09 CEST"
##
## [[4]]
## [1] "2018-04-02 11:19:40 CEST" "2018-04-02 11:55:27 CEST"
##
## [[5]]
## [1] "2018-04-02 12:35:54 CEST" "2018-04-02 12:03:52 CEST"
##
## [[6]]
## [1] "2018-04-02 13:08:29 CEST" "2018-04-02 13:51:45 CEST"
##
## [[7]]
## [1] "2018-04-02 14:02:01 CEST" "2018-04-02 14:29:24 CEST"
##
## [[8]]
## [1] "2018-04-02 15:30:22 CEST" "2018-04-02 15:45:43 CEST"
##
## [[9]]
## [1] "2018-04-02 16:02:45 CEST" "2018-04-02 16:03:07 CEST"
lapply(1:(length(vec_bound)-1), function(x) sample(seq(vec_bound[x], vec_bound[x+1], by = "sec"), n_nr_sample))
## [[1]]
## [1] "2018-04-02 08:34:52 CEST" "2018-04-02 08:10:30 CEST"
##
## [[2]]
## [1] "2018-04-02 09:53:49 CEST" "2018-04-02 09:46:22 CEST"
##
## [[3]]
## [1] "2018-04-02 10:35:56 CEST" "2018-04-02 10:35:57 CEST"
##
## [[4]]
## [1] "2018-04-02 11:36:07 CEST" "2018-04-02 11:00:10 CEST"
##
## [[5]]
## [1] "2018-04-02 12:57:47 CEST" "2018-04-02 12:04:08 CEST"
##
## [[6]]
## [1] "2018-04-02 13:06:40 CEST" "2018-04-02 13:46:27 CEST"
##
## [[7]]
## [1] "2018-04-02 14:24:18 CEST" "2018-04-02 14:12:35 CEST"
##
## [[8]]
## [1] "2018-04-02 15:08:41 CEST" "2018-04-02 15:45:48 CEST"
##
## [[9]]
## [1] "2018-04-02 16:39:45 CEST" "2018-04-02 16:57:59 CEST"
Now we are ready to put together all the results into a single function.
#' Sample random time points from a set of intervals
#'
#' The set of intervals is defined by a start-time ps_start
#' and an end-time ps_end. Both are specified as strings which
#' are converted into POSIXct objects inside of the function.
#' The third function argument pn_nr_interval defines how many
#' intervals are to be created between start- and end-time.
sample_time_points <- function(ps_start, ps_end, pn_nr_interval, pn_nr_sample){
# convert strings ps_start and ps_end to POSIXct
start_time <- as.POSIXct(x = strftime(ps_start, format = "%Y-%m-%d %H:%M:%S"))
end_time <- as.POSIXct(x = strftime(ps_end, format = "%Y-%m-%d %H:%M:%S"))
# vector of interval boundary times
vec_bound <- seq(start_time, end_time, length.out = (pn_nr_interval + 1))
# get a random sample of pn_nr_sample timepoints in each interval
# frist, a function that returns a random sample for one interval is defined
sample_one_int <- function(x, pvec_bound, pn_nr_sample){
vec_sample_result <- sample(seq(pvec_bound[x], pvec_bound[x+1], by = "sec"), pn_nr_sample)
return(vec_sample_result[order(vec_sample_result)])
}
# sampling random time points for all intervals is done with lapply
lst_time_points <- lapply(1:(length(vec_bound)-1), sample_one_int, vec_bound, pn_nr_sample)
return(lst_time_points)
}
The above function can be tested using the following call
sample_time_points(ps_start = "2018-04-03 08:00:00", ps_end = "2018-04-03 17:00:00", pn_nr_interval = 9, pn_nr_sample = 2)
## [[1]]
## [1] "2018-04-03 08:23:35 CEST" "2018-04-03 08:36:15 CEST"
##
## [[2]]
## [1] "2018-04-03 09:15:49 CEST" "2018-04-03 09:29:22 CEST"
##
## [[3]]
## [1] "2018-04-03 10:21:01 CEST" "2018-04-03 10:24:35 CEST"
##
## [[4]]
## [1] "2018-04-03 11:06:51 CEST" "2018-04-03 11:52:51 CEST"
##
## [[5]]
## [1] "2018-04-03 12:00:31 CEST" "2018-04-03 12:23:42 CEST"
##
## [[6]]
## [1] "2018-04-03 13:38:44 CEST" "2018-04-03 13:42:05 CEST"
##
## [[7]]
## [1] "2018-04-03 14:02:18 CEST" "2018-04-03 14:32:27 CEST"
##
## [[8]]
## [1] "2018-04-03 15:13:41 CEST" "2018-04-03 15:18:44 CEST"
##
## [[9]]
## [1] "2018-04-03 16:03:53 CEST" "2018-04-03 16:55:15 CEST"
Compared to the experiments above, we can see that now the random time points are ordered within an interval. The way how the function sample_time_points
is written, the intervals have equal length and are evenly spread between start- and end-time. Hence when calling the function with more intervals, the single interval will be smaller, such as shown below
sample_time_points(ps_start = "2018-04-03 08:00:00", ps_end = "2018-04-03 17:00:00", pn_nr_interval = 10, pn_nr_sample = 2)
## [[1]]
## [1] "2018-04-03 08:07:37 CEST" "2018-04-03 08:26:12 CEST"
##
## [[2]]
## [1] "2018-04-03 09:08:54 CEST" "2018-04-03 09:27:13 CEST"
##
## [[3]]
## [1] "2018-04-03 09:53:04 CEST" "2018-04-03 10:10:35 CEST"
##
## [[4]]
## [1] "2018-04-03 11:01:25 CEST" "2018-04-03 11:21:54 CEST"
##
## [[5]]
## [1] "2018-04-03 11:41:13 CEST" "2018-04-03 11:57:31 CEST"
##
## [[6]]
## [1] "2018-04-03 12:40:22 CEST" "2018-04-03 13:19:05 CEST"
##
## [[7]]
## [1] "2018-04-03 13:29:14 CEST" "2018-04-03 14:10:08 CEST"
##
## [[8]]
## [1] "2018-04-03 14:34:37 CEST" "2018-04-03 14:41:49 CEST"
##
## [[9]]
## [1] "2018-04-03 15:20:21 CEST" "2018-04-03 15:57:40 CEST"
##
## [[10]]
## [1] "2018-04-03 16:18:42 CEST" "2018-04-03 16:59:38 CEST"
Furthermore, we cannot specify any break where we do not want to sample any time points. This can most easily be done by either ignoring the sample timepoints or by calling the function several times for each sub-interval around the break. As an example, if we did not want to sample any timepoints between 12 pm and 1 pm, we can do that by the following two calls
# morning samples
sample_time_points(ps_start = "2018-04-03 08:00:00", ps_end = "2018-04-03 12:00:00", pn_nr_interval = 4, pn_nr_sample = 2)
## [[1]]
## [1] "2018-04-03 08:13:05 CEST" "2018-04-03 08:45:38 CEST"
##
## [[2]]
## [1] "2018-04-03 09:16:05 CEST" "2018-04-03 09:19:05 CEST"
##
## [[3]]
## [1] "2018-04-03 10:56:45 CEST" "2018-04-03 10:57:06 CEST"
##
## [[4]]
## [1] "2018-04-03 11:16:42 CEST" "2018-04-03 11:21:23 CEST"
and similarly for the afternoon
# afternoon samples
sample_time_points(ps_start = "2018-04-03 13:00:00", ps_end = "2018-04-03 17:00:00", pn_nr_interval = 4, pn_nr_sample = 2)
## [[1]]
## [1] "2018-04-03 13:17:56 CEST" "2018-04-03 13:24:42 CEST"
##
## [[2]]
## [1] "2018-04-03 14:19:18 CEST" "2018-04-03 14:43:28 CEST"
##
## [[3]]
## [1] "2018-04-03 15:06:06 CEST" "2018-04-03 15:13:49 CEST"
##
## [[4]]
## [1] "2018-04-03 16:25:13 CEST" "2018-04-03 16:36:10 CEST"
sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.4.3 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
## [5] tools_3.4.3 htmltools_0.3.6 yaml_2.1.16 Rcpp_0.12.15
## [9] stringi_1.1.5 rmarkdown_1.8 knitr_1.19 stringr_1.2.0
## [13] digest_0.6.15 evaluate_0.10.1
Latest Changes: 2018-04-03 07:45:10 (peter)