---
title: "Theory"
output:
prettydoc::html_pretty:
theme: tactile
highlight: vignette
vignette: >
%\VignetteIndexEntry{Theory}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
This vignette presents a general overview of the _clugen_ algorithm. A complete
description of the algorithm's theoretical framework is available in the article
"[Generating multidimensional clusters with support
lines](https://doi.org/10.1016/j.knosys.2023.110836)" (an open version is
[available on arXiv](https://arxiv.org/abs/2301.10327)).
_Clugen_ is an algorithm for generating multidimensional clusters. Each cluster
is supported by a line segment, the position, orientation and length of which
guide where the respective points are placed. For brevity, *line segments* will
be referred to as *lines*.
Given an $n$-dimensional direction vector $\mathbf{d}$ (and a number of
additional parameters, which will be discussed shortly), the _clugen_ algorithm
works as follows ($^*$ means the algorithm step is stochastic):
1. Normalize $\mathbf{d}$.
2. $^*$Determine cluster sizes.
3. $^*$Determine cluster centers.
4. $^*$Determine lengths of cluster-supporting lines.
5. $^*$Determine angles between $\mathbf{d}$ and cluster-supporting lines.
6. For each cluster:
1. $^*$Determine direction of the cluster-supporting line.
2. $^*$Determine distance of point projections from the center of the
cluster-supporting line.
3. Determine coordinates of point projections on the cluster-supporting line.
4. $^*$Determine points from their projections on the cluster-supporting
line.
Figure 1 provides a stylized overview of the algorithm's steps.
```{asis, echo = crul::ok("https://raw.githubusercontent.com/clugen/.github/main/images/algorithm.png")}
![**Figure 1** - Stylized overview of the *clugen* algorithm. Background tiles
are 10 units wide and tall, when
applicable.](https://raw.githubusercontent.com/clugen/.github/main/images/algorithm.png)
```
The example in Figure 1 was generated with the following parameters:
| Parameter values | Description |
|:----------------- | :------------------------ |
| $n=2$ | Number of dimensions. |
| $c=4$ | Number of clusters. |
| $p=200$ | Total number of points. |
| $\mathbf{d}=\begin{bmatrix}1 & 1\end{bmatrix}^T$ | Average direction. |
| $\theta_\sigma=\pi/16\approx{}11.25^{\circ}$ | Angle dispersion. |
| $\mathbf{s}=\begin{bmatrix}10 & 10\end{bmatrix}^T$ | Average cluster separation. |
| $l=10$ | Average line length. |
| $l_\sigma=1.5$ | Line length dispersion. |
| $f_\sigma=1$ | Cluster lateral dispersion. |
Additionally, all optional parameters (not listed above) were left to their
default values. The complete list of parameters is presented in the `clugen()`
function documentation.