---
title: "Theory"
output:
  prettydoc::html_pretty:
    theme: tactile
    highlight: vignette
vignette: >
  %\VignetteIndexEntry{Theory}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette presents a general overview of the _clugen_ algorithm. A complete description of the algorithm's theoretical framework is available in the article "[Generating multidimensional clusters with support lines](https://doi.org/10.1016/j.knosys.2023.110836)" (an open version is [available on arXiv](https://arxiv.org/abs/2301.10327)).

_Clugen_ is an algorithm for generating multidimensional clusters. Each cluster is supported by a line segment, whose position, orientation and length guide where the respective points are placed. For brevity, *line segments* will be referred to as *lines*.

Given an $n$-dimensional direction vector $\mathbf{d}$ (and a number of additional parameters, which will be discussed shortly), the _clugen_ algorithm works as follows ($^*$ marks a stochastic step):

1. Normalize $\mathbf{d}$.
2. $^*$Determine cluster sizes.
3. $^*$Determine cluster centers.
4. $^*$Determine lengths of cluster-supporting lines.
5. $^*$Determine angles between $\mathbf{d}$ and cluster-supporting lines.
6. For each cluster:
   1. $^*$Determine the direction of the cluster-supporting line.
   2. $^*$Determine the distances of point projections from the center of the cluster-supporting line.
   3. Determine the coordinates of point projections on the cluster-supporting line.
   4. $^*$Determine points from their projections on the cluster-supporting line.

Figure 1 provides a stylized overview of the algorithm's steps.

```{asis, echo = crul::ok("https://raw.githubusercontent.com/clugen/.github/main/images/algorithm.png")}
![**Figure 1** - Stylized overview of the *clugen* algorithm.
Background tiles are 10 units wide and tall, when applicable.](https://raw.githubusercontent.com/clugen/.github/main/images/algorithm.png)
```

The example in Figure 1 was generated with the following parameters:

| Parameter value | Description |
|:----------------- | :------------------------ |
| $n=2$ | Number of dimensions. |
| $c=4$ | Number of clusters. |
| $p=200$ | Total number of points. |
| $\mathbf{d}=\begin{bmatrix}1 & 1\end{bmatrix}^T$ | Average direction. |
| $\theta_\sigma=\pi/16\approx{}11.25^{\circ}$ | Angle dispersion. |
| $\mathbf{s}=\begin{bmatrix}10 & 10\end{bmatrix}^T$ | Average cluster separation. |
| $l=10$ | Average line length. |
| $l_\sigma=1.5$ | Line length dispersion. |
| $f_\sigma=1$ | Cluster lateral dispersion. |

Additionally, all optional parameters (not listed above) were left at their default values. The complete list of parameters is presented in the `clugen()` function documentation.
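To make the listed steps more concrete, they can be traced in a minimal base-R sketch. This is a hypothetical illustration, not the `clugen()` implementation: the function `clugen_sketch_2d()` is invented for this example, it is restricted to two dimensions (so step 6.1 reduces to rotating $\mathbf{d}$), and the distributions used in the stochastic steps (multinomial sizes, uniform centers, folded-normal lengths, normal angles, normal projection distances with standard deviation $l_i/6$, perpendicular normal offsets) are simplifying assumptions.

```r
# Hypothetical 2D sketch of the clugen steps (NOT the clugen package code).
# All distributions below are simplifying assumptions for illustration.
clugen_sketch_2d <- function(n_clusters, n_points, d, theta_sigma, s,
                             l, l_sigma, f_sigma) {
  # Step 1: normalize the average direction d
  d <- d / sqrt(sum(d^2))

  # Step 2*: cluster sizes (assumption: multinomial split of n_points)
  sizes <- as.vector(rmultinom(1, n_points, rep(1, n_clusters)))

  # Step 3*: cluster centers (assumption: uniform in a box scaled by c * s)
  centers <- matrix(runif(n_clusters * 2, -0.5, 0.5), nrow = n_clusters)
  centers <- sweep(centers, 2, n_clusters * s, "*")

  # Step 4*: line lengths (assumption: folded normal around l)
  lengths <- abs(rnorm(n_clusters, mean = l, sd = l_sigma))

  # Step 5*: angles between d and each line (assumption: normal around 0)
  angles <- rnorm(n_clusters, mean = 0, sd = theta_sigma)

  points <- matrix(0, nrow = n_points, ncol = 2)
  clusters <- integer(n_points)
  start <- 1
  for (i in seq_len(n_clusters)) {
    n_i <- sizes[i]
    if (n_i == 0) next
    idx <- start:(start + n_i - 1)
    start <- start + n_i

    # Step 6.1*: rotate d counterclockwise by angles[i] (2D only)
    rot <- matrix(c(cos(angles[i]), sin(angles[i]),
                    -sin(angles[i]), cos(angles[i])), nrow = 2)
    dir_i <- as.vector(rot %*% d)

    # Step 6.2*: signed distances of projections from the line center
    # (assumption: normal with sd = line length / 6)
    t <- rnorm(n_i, mean = 0, sd = lengths[i] / 6)

    # Step 6.3: projection coordinates = center + t * direction
    proj <- matrix(centers[i, ], nrow = n_i, ncol = 2, byrow = TRUE) +
      outer(t, dir_i)

    # Step 6.4*: offset each projection along the line's perpendicular
    perp <- c(-dir_i[2], dir_i[1])
    points[idx, ] <- proj + outer(rnorm(n_i, mean = 0, sd = f_sigma), perp)
    clusters[idx] <- i
  }
  list(points = points, clusters = clusters, sizes = sizes,
       centers = centers, lengths = lengths, angles = angles)
}

# Usage with the Figure 1 parameter values; the output will not match
# Figure 1, since the distributions above are simplified assumptions
set.seed(123)
res <- clugen_sketch_2d(n_clusters = 4, n_points = 200, d = c(1, 1),
                        theta_sigma = pi / 16, s = c(10, 10), l = 10,
                        l_sigma = 1.5, f_sigma = 1)
table(res$clusters)
```

The resulting points could then be inspected with `plot(res$points, col = res$clusters, asp = 1)`. In two dimensions, the hyperplane orthogonal to a line is itself a line, which is why step 6.4 only needs a single perpendicular vector here; higher dimensions would require an orthogonal basis, which this sketch does not attempt.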