## Introduction

The purpose of this tutorial is to demonstrate the basic usage of the
`icosa`

package. The primary targeted application of the
package is in global biological sciences (e.g. in macroecological,
biogeographical analyses), but other fields might find the structures
and procedures relevant, given that they operate with point vector data
and variables that depend on the density of such data.

Note that this is just a brief introduction to the package’s capabilities. The complete documentation of the package and the rest of the tutorials will be made available on the package website. Relevant how-to guides will be posted on the evolv-ED blog (https://www.evolv-ed.net/).

## Rationale: Point Data

One of the many problems with ecological samples is that due to density and uniformity issues, the data points are to be aggregated to distinct units. As coordinate recording is very efficient on the 2d surface of a polar coordinate system (i.e. latiude and longitude data), this was primarly achieved by rectangular gridding of the surface (for instance 1° times 1° grid cells). Unfortunately, this method suffers from systematic biasing effects: as the poles are approached, the cells become smaller in area, and come closer to each other.

#### Random points on the sphere

We can illustrate this problem by generating random points and assign
them to different spatial bins. The `icosa`

package has a
utility function to generate such points:

The function `rpsphere()`

generates randomly distributed
points on the surface of a sphere.

```
# set seed for exact reproducibility
set.seed(0)
# 10000 random points
randPoints <- rpsphere(10000, output="polar")
# the first 6 points
head(randPoints)
```

```
## long lat
## [1,] -43.998368 -16.764084
## [2,] -105.191289 6.095493
## [3,] -20.456553 -10.867072
## [4,] 3.011043 26.776595
## [5,] -7.820389 67.246328
## [6,] 170.576713 -35.169054
```

#### Land Polygons

You can visualize this pointset and plot them on a map, with some
example data, such as the land
polygons from Natural Earth - which are included in the package for
the sake of this tutorial. You can load and plot such vector data with
the `sf`

package. You can install it from the CRAN with the regular
`install.packages()`

`## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE`

Once the package is installed, you can load the example data with
`sf::st_read()`

:

```
# reading in example data
ne <- sf::st_read(file.path(system.file(package="icosa"), "extdata/ne_110m_land.shx"))
```

```
## Reading layer `ne_110m_land' from data source
## `/usr/local/lib/R/site-library/icosa/extdata/ne_110m_land.shx'
## using driver `ESRI Shapefile'
## Simple feature collection with 127 features and 3 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
```

This vector data object can be plotted easilly and we can use it to
contextualize the `randPoints`

object - which can represent
any kind of point data:

#### Note

The default margins of `sf`

plots can be too large! The
margins of some of these plots were set with:

Note that the points appear to have a non-random distribution: they seem to have higher density in lower-latitude areas. But this is in fact a distorting effect of not the points, but the equirectangular map projection itself! The points are are created with 3D normal distributions and then they are projected to the surface of the sphere around the center of this distribution.

#### Binning with a Gaussian (UV)-grid

Let’s count these data points in a simple longitude-latitude
(Gaussian, polar, or UV) grid, which you can easilly implement with the
`terra`

package (i.e. raster)! Similar to `sf`

this package can be
installed from the CRAN with `install.packages()`

function
call.

Let’s make a 10x10° raster grid, and count the number of points in every cell:

```
# a 10-degree resolution grid
r <- terra::rast(res=10)
# count the number of points in raster cells
counts <- terra::rasterize(randPoints,r, fun=length)
counts
```

```
## class : SpatRaster
## dimensions : 18, 36, 1 (nrow, ncol, nlyr)
## resolution : 10, 10 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84
## source(s) : memory
## name : values
## min value : 1
## max value : 44
```

The counts of these points can be easilly visualized with a plot.

```
# the raster itself
plot(counts)
# plotting the map on plot
plot(ne$geometry, col=NA, border="black", add=TRUE)
```

The point counts have a similar latitudinal pattern as we saw with their distribution: as latitude increases, the cell sizes decrease and the expected number of points in them decreases.0

One solution to this problem can be the use of a different
projection. For instance the Lambert’s
Cylindrical Equal Area projection
(`["EPSG:54034"](https://epsg.io/54034)`

) can be used instead
of the Equirectangular projection. This will, however distort the polar
cells that will become more and more elongated with higher the latitude
values - which might be a problem, or might not.

## About icosahedral grids

The `icosa`

package approaches this problem by not using a
rectangular grid, i.e. a grid that has cells that are rectangular
shape.

Instead, `icosa`

relies on the tessellation of a regular
icosahedron to a given resolution. This procedure ends up with a
polyhedral object of triangular faces of higly isometric properties:
very similar shapes of cells which are roughly equally distanced, and
similar in cell area. Such bodies are often called icospheres.

Here is how such grid looks like in 3D (also plotted with
`icosa`

, see a later tutorial for such plotting):

Still, triangular grids are somewhat unintuitive, because distances between the cell can vary considerably: the corner of a cell is much farther away from the center than the side. For this reason, we usually use the inverted versions of these grids: the triangle cell-midpoints become the corners of the cells (i.e. vertices), which defines a grid where cells are hexagonal - with exactly 12 pentagons (where the vertices of the icosahedron used to be). This is the inverted version of the grid above:

## Creating an icosahedral grid

Once `icosa`

is loaded, you can create such a
penta-hexagonal grid with a single line of code, using the
`hexagrid()`

function:

```
# create a trigrid class object
hexa <- hexagrid(deg=5, sf=TRUE)
```

```
## Selecting hexagrid with tessellation vector: c(2, 4).
## Mean edge length: 4.993 degrees.
```

The argument

`deg`

refers to the expected length of the edge length of the grid. This is given as angular distance between two cell vertices in**degrees**- which means the same as degrees longitudinally (or latitudinally on the equator), 1° =~111 km. This is used to select a`tessellation`

vector, which is directly controlling the grid resolution, describing how the faces of icosahedron are split up. The higher the product of this vector, the higher the resolution - ie. the more cells there will be in the grid. The example`c(2, 4)`

means that the original icosahedrons edges are split up 2 * 4 = 8 times. This triangular grid (`trigrid`

) is then inverted (vertices become cell midpoints), which results in penta-hexagonal grid.The argument

`sf=TRUE`

indicates that we want to create a representation of the grid that can be used for additional spatial operations using the`sf`

package, for example 2D (projection-based) plotting. Since this is not always needed, and can significantly decrease performance, you have to indicate that you need this option.

If you type in the name of the object, you can immediately see some properties about the structure and resolution of the grid.

`hexa`

```
## A/An hexagrid object with 1280 vertices, 1920 edges and 642 faces.
## The mean grid edge length is 555.24 km or 4.99 degrees.
## Use plot3d() to see a 3d render.
```

The most important detail here is the number of faces of the 3d body (642) i.e. the number of cells in the grid.

## Plotting the grid in 2d

If you have generated the `sf`

-represenation as above, the
`plot()`

function can be invoked on the grid to see its 2D
(longitude-latitude projection).

`plot(hexa)`

Putting the map and points generated above is just as easy as with any other kind of mapping object:

```
# plotting the world map
plot(ne$geometry, col="gray", border=NA)
# the grid
plot(hexa, add=TRUE)
# plotting the point cloud
points(randPoints, col="#FF000022", pch=3)
```

## Spatial binning with the `hexagrid`

The grids themselves can only be useful if we can figure out how the points actually interact with them: for instance, if we want to repeat the same binning as earlier, but with the grid that we created earlier. To make this process as flexible and useful as possible, we will be doing this in three short steps:

- Finding the cells that the points fall onto with
`locate()`

- Count the number of points in a cell. (using base R)
- Plot the counts on the grid

The cells of the icoshedral grid are named with the convention
`"F<integer>"`

(`"F"`

stands for
`face`

), such as as `"F12"`

. You can see how these
are distributed using the `gridlabs()`

function:

Cells names are assigned according to a spiral from the North pole of the grid.

#### Point lookup - `locate()`

The only thing that we need to able to do with the grid to make this process feasible is the ability to find the cells on which every points falls. This is a very basic feature, but with this we can do most computations that we might want to do with the grid.

Simplest way to illustrate this is with a couple of points. Let’s try it with the first 5 from the randomly generated set:

```
# the first five points
examples <- randPoints[1:5,]
examples
```

```
## long lat
## [1,] -43.998368 -16.764084
## [2,] -105.191289 6.095493
## [3,] -20.456553 -10.867072
## [4,] 3.011043 26.776595
## [5,] -7.820389 67.246328
```

We can easily plot this object on the gird map we created above:

```
# visualize exact locations
points(examples, col="#FF0000", pch=3, cex=3)
# add an identifier
text(label=1:nrow(examples), examples, col="#FF0000", pch=3, pos=2, cex=3)
```

The `locate()`

function returns the name of the cell under
a point: one cell name for every coordinate pairs. The function takes
the grid (`hexa`

), and the point coordinates
(`examples`

) as arguments and returns a vector of cell
names:

```
exampleCells <- locate(hexa, examples)
exampleCells
```

`## [1] "F405" "F277" "F367" "F208" "F25"`

Note that the we plotted the number identifiers so it is clearly visible which point is which. The ordering of the face/cell names will match that of the points. The match is easier to see if we highlight the cells the points fall on:

```
# replot the map for clarity
plot(hexa)
# Subsetting the sf representation of the grid
plot(hexa, exampleCells, col="green", add=TRUE)
# labels
gridlabs(hexa, cex=0.5)
# points again
points(examples, col="#FF0000", pch=3, cex=3)
```

This localization of the points does a lot of heavy lifting in
`icosa`

, and it can be applied thousands/millions of points
at once.

`## chr [1:10000] "F405" "F277" "F367" "F208" "F25" "F510" "F79" "F539" "F481" ...`

To keep things orgnanized this output of `locate()`

is so
designed that it can be used as a new column in the table of coordinates
(optional):

```
# transform this to a data frame
rdf <- data.frame(randPoints)
# assign the face names as a column
rdf$cells <- cells
# the first 6 rows
head(rdf)
```

```
## long lat cells
## 1 -43.998368 -16.764084 F405
## 2 -105.191289 6.095493 F277
## 3 -20.456553 -10.867072 F367
## 4 3.011043 26.776595 F208
## 5 -7.820389 67.246328 F25
## 6 170.576713 -35.169054 F510
```

#### Counting the cells

To assess the density of points, we need to count the number of
points in every cell - which is actually nothing else, but the
tabulation of the number of times that a point falls on a cell. This we
can easilly do with the `table()`

function:

```
## 'table' int [1:642(1d)] 9 17 10 23 15 24 21 15 17 11 ...
## - attr(*, "dimnames")=List of 1
## ..$ cells: chr [1:642] "F1" "F10" "F100" "F101" ...
```

… which gives us the number of points that fall on one particular
cell (cell name in the `names()`

attribute), frequency as the
value.

#### Plotting the frequency

There is a number of ways to visualize this number on the map, the
simplest is to use a dedicated method of `plot()`

, that takes
the grid (with an `sf`

representation, as above) and a named entity (vector, table) and
plots this with the features made available via `sf`

:

`plot(hexa, tCells)`

You can modify this plotting in any way you could with
`sf::plot()`

. For instance, if you want to use a different
color scheme, a different binning

```
plot(hexa, tCells,
border="white",
pal=c("#440154FF", "#2D708EFF", "#29AF7FFF", "#FDE725FF"),
breaks=c(0, 10, 15, 20, 40)
)
```

You can also set the plotted cooordinate reference system directly
here in the plotting command. For instance, if you would like this to be
plotted directly on a Mollweide projection
(`"ESRI:54009"`

):

```
# the base map
plot(hexa, tCells,
crs="ESRI:54009",
border="white",
pal=c("#440154FF", "#2D708EFF", "#29AF7FFF", "#FDE725FF"),
breaks=c(0, 10, 15, 20, 40)
)
```

If you do not forget to transform the landmasses

`neMoll <- sf::st_transform(ne, "ESRI:54009")`

… you can also put them on top. Just make sure to set
`reset=FALSE`

(see `sf::plot`

why):

```
# the base map
plot(hexa, tCells,
crs="ESRI:54009",
border="white",
pal=c("#440154FF", "#2D708EFF", "#29AF7FFF", "#FDE725FF"),
breaks=c(0, 10, 15, 20, 40),
reset=FALSE
)
# the landmasses tranparent gray with black contour
plot(neMoll$geometry, add=TRUE, col="#66666688")
```

All these options come from `sf:plot()`

. If you
familiarize yourself with how to use that,then only your imagination
will be the limit!

## Using the result in calculations

The by-product of this workflow is that now we get explicit access to
not only the cells’ name, but also their attributes. For instance if you
want to see whether the number of points in a cell has a latitudinal
pattern, you just need to get the coordinates of the face centers, and
assign them to the densities. You can get these with
`centers()`

:

```
# translate the returned 2-column matrix to a data.frame
faceInfo <- as.data.frame(centers(hexa))
# the first 6 rows
head(faceInfo)
```

```
## long lat
## F1 0 90.00000
## F2 -162 82.07063
## F3 -90 82.07063
## F4 -18 82.07063
## F5 54 82.07063
## F6 126 82.07063
```

which returns the longitude and latitude coordinates of the face
centers. Now we just need to assign the counts to these, which is easy
with base R using the `rownames`

as a character subscript and
assigning the result as a new column of `faceInfo`

:

```
## long lat count
## F1 0 90.00000 9
## F2 -162 82.07063 13
## F3 -90 82.07063 16
## F4 -18 82.07063 20
## F5 54 82.07063 18
## F6 126 82.07063 18
```

Now it is easy to see whether there is an association with latitude:

```
plot(faceInfo$lat, faceInfo$count,
xlab="Latitude (deg)", ylab="Point count",
pch=16, col="#99000044")
```

There is no latitudinal trend in the points, but you can see that the spread decreases latitudinally (because there are fewer cells) at high latitudes.

## Summary

If you have some data (e.g. `randPoints`

) and you want to
execute the binning and plotting, you can do it with this much code -
now using a different, coarser resolution grid:

`gr <- hexagrid(deg=10, sf=TRUE) # create grid`

```
## Selecting hexagrid with tessellation vector: c(2, 2).
## Mean edge length: 9.994 degrees.
```