Using macroeco¶
This tutorial describes the basic usage of the macroeco Python package. Users who wish to use the high-level MacroecoDesktop interface should refer to the Using MacroecoDesktop tutorial. To start, load the following packages in your python environment
>>> import macroeco as meco
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> plt.style.use('ggplot') # For pretty plotting
The models subpackage¶
The models subpackage contains a number of common statistical distributions and curves used in macroecological analyses. A full list of the available models and their documentation can be found at Models (macroeco.models).
Note
When just using the models subpackage one will typically start with
>>> import macroeco.models as mdThis will be sufficient for most users as it will import the core models of macroeco. However, if you want access to all the functions and classes used to build these models you can use the notation
>>> import macroeco.models._distributions as dist >>> import macroeco.models._curves as curves
Distributions¶
All the statistical distributions contained in models inherit from the rv_discrete or rv_continuous classes defined in the package scipy.stats.distributions. Therefore, all the distribution can be used in the same way as any distributions defined in scipy.stats. For example, a Fisher logseries distribution, a common distribution used for species abundance distributions, is defined by one parameter p which can take values between 0 and 1. This distribution could defined as follows
>>> logser_dist = meco.models.logser(p=0.9)
The probability of observing a species with one individual is
>>> logser_dist.pmf(1)
0.39086503371292664
The probability of observing and species with 10 individuals or less (given by the cumulative distribution function) is
>>> logser_dist.cdf(10)
0.9201603889810761
Similarly, the following function call gives the same results as above
>>> meco.models.logser.pmf(1, 0.9)
0.39086503371292664
>>> meco.models.logser.cdf(10, 0.9)
0.9201603889810761
The distributions contained in the models package also contain some special functions such as rank, which gives the rank abundance distribution for a given distribution. For example, the rank abundance distribution for the logseries distribution given above with 30 species in the community is given by
>>> meco.models.logser.rank(30, 0.9)
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 2., 2., 2., 2., 2., 3., 3., 3., 4., 4.,
5., 5., 6., 7., 8., 10., 13., 21.])
A community with 30 species following this logseries distribution is expected to have 15 species with one individual.
Curves¶
The models subpackage also contains objects known as Curves. These consist of macroecological curves such as species area relationships (SAR)s and endemic area relationships (EAR)s. Currently, there are 7 implemented curves (Models (macroeco.models))
- Power law curve
- A general sampling SAR with direct upscaling or downscaling
- A general sampling SAR with iterative upscaling or downscaling
- A general sampling EAR with direct downscaling
- Maximum Entropy Theory of Ecology (METE) SAR with direct upscaling and downscaling
- METE SAR with iterative upscaling and downscaling
- METE EAR with direct downscaling
The general sampling SAR/EAR is discussed in Kitzes and Harte (2014) (also see sampling_sar) and is a combination of a species abundance distribution (SAD) and a species-level spatial abundance distribution. In macroeco, we implement the general sampling SAR/EAR with a flexible SAD (zero-truncated negative binomial distribution, nbinom_ztrunc) and a flexible SSAD (conditional negative binomial distribution, cnbinom) both are which are distributions defined within macroeco.models. Each of these distributions has an aggregation parameter k_agg that that allows them to take the shapes of most realistically observed SADs and SSADs.
To use the general sampling SAR, you need to specify 4 parameters
- The total number of species at some base areas (S0)
- The total number of individuals in a community at some base area (N0)
- The aggregation parameter of the SAD (sad_k)
- The aggregation parameter of the SSAD (ssad_k)
For example, to estimate the expected number of species present in an area that is double the size of the base area and half the size of the base area given S0 = 30, N0 = 1000, and a Broken Stick SAD (sad_k = 1) and an approximately binomial SSAD (ssad_k = 10) we can use the following code
>>> # Number of species in base area
>>> S0 = 30
>>> # Number of individuals in base area
>>> N0 = 1000
>>> # A list of habitat areas. Base area is 1
>>> areas = [1, 2, 0.5]
>>> # Get the non-iterative sampling SAR
>>> meco.models.sampling_sar.vals(areas, S0, N0, sad_k=1, ssad_k=10, approx=True)
array([ 30. , 30.50645744, 29.03925601])
>>> # Get the iterative sampling SAR
>>> meco.models.sampling_sar_iterative.vals(areas, S0, N0, sad_k=1, ssad_k=10, approx=True)
array([ 30. , 30.50645744, 29.03925601])
For the parameter areas, the first number in the list (1 in this example) is always the base area (e.g. 50 ha, 2.5 m^2, 300 in^2), and the following numbers are additional areas at which to calculate species richness (2 and 0.5 in this example). Using the argument approx=True significantly speeds up the calculation and will tend to given very similar answers to approx=False. The default is approx=True.
Note that the iterative approach the non-iterative approach are not generally the same
>>> areas = [1, 0.5, 0.25, 0.125, 0.0625]
>>> noiter_sar = meco.models.sampling_sar.vals(areas, S0, N0, 1, 1)
>>> iter_sar = meco.models.sampling_sar_iterative.vals(areas, S0, N0, 1, 1)
>>> # Plot the results
>>> plt.plot(areas, noiter_sar, label="Non-iterative SAR")
>>> plt.plot(areas, iter_sar, label="Iterative SAR")
>>> plt.legend(loc="center right")
>>> plt.xlabel("Area")
>>> plt.ylabel("# of Species")
We can generate sampling EARs with identical syntax
>>> # Get a sampling EAR
>>> meco.models.sampling_ear.vals(areas, S0, N0, 1, 1)
array([ 30. , 2.42629621, 0.42783611, 0.14823899, 0.06399121])
The METE SAR/EAR (mete_sar, mete_sar_iterative, mete_ear) are a special cases of the sampling SAR where sad_k = 0 (Logseries SAD) and ssad_k = 1 (truncated geometric SSAD). This SAR that is described at length in the book Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics by John Harte (2011). Just like the general sampling SAR/EAR, it can be used to upscale and downscale species richness, but only requires two parameters: total species at the base area (S0) and total individuals at the base area (N0).
>>> # Non-iterative METE SAR
>>> areas = [1, 0.5, 2, 0.25, 0.125, 4]
>>> meco.models.mete_sar.vals(areas, S0, N0)
array([ 30. , 24.35087775, 36.15434332, 19.76518824,
15.76150633, 41.73194557])
>>> # Iterative METE SAR
>>> meco.models.mete_sar_iterative.vals(areas, S0, N0)
array([ 30. , 24.35087775, 36.15434332, 19.25568734,
14.76483053, 42.77067166])
The METE EAR is called with identical syntax
>>> # Get the METE EAR
>>> areas = [1, 0.9, 0.7, 0.2, 0.001]
>>> meco.models.mete_ear.vals(areas, S0, N0)
array([ 3.00000000e+01, 2.22524222e+01, 1.15798199e+01,
1.44475285e+00, 5.79588423e-03])
Additional subpackages¶
In addition to the models package, the macroeco package contains two other main subpackages of interest:
- empirical - loads data tables and performs empirical analysis of macroecological metrics, such as the species abundance distribution and species area relationship (Empirical (macroeco.empirical))
- compare - provides utility functions for comparing the fit of models to empirical metrics, such as AIC weights and r-squared statistics (Compare (macroeco.compare))
A common workflow involves loading data, calculating an empirical metric, fitting one or more models to the empirical metric, and evaluating the fit of the model to the metric.
A simple species abundance distribution analysis¶
The following example shows a simple species abundance distribution analysis for the demo ANBO data that can be downloaded here by clicking on the demo_files_ANBO.zip under the latest version.
First, the Patch class from the empirical subpackage is used to create a Patch object that holds the data table and a metadata dictionary describing the data. Patch requires a path, absolute or relative, to a metadata file as a mandatory argument (see Preparing Data for information on creating a metadata file for a new data set).
>>> pat = meco.empirical.Patch('~/Desktop/ANBO.txt')
The data table can be accessed by the via the table attribute of the Patch object
>>> pat.table
year cell row column spp count
0 2010 1 3 3 cabr 3
1 2010 1 3 3 caspi1 20
2 2010 1 3 3 crcr 3
3 2010 1 3 3 crsp2 1
4 2010 1 3 3 gnwe 11
5 2010 1 3 3 grass 11
6 2010 1 3 3 lesp1 1
7 2010 1 3 3 phdi 5
8 2010 1 3 3 pypo 6
9 2010 1 3 3 ticr 50
10 2010 2 3 2 caspi1 17
11 2010 2 3 2 comp1 2
12 2010 2 3 2 crsp2 7
13 2010 2 3 2 gnwe 4
14 2010 2 3 2 grass 26
15 2010 2 3 2 phdi 7
16 2010 2 3 2 pypo 8
17 2010 2 3 2 ticr 12
18 2010 2 3 2 unsp1 1
19 2010 3 3 1 arsp1 1
20 2010 3 3 1 caspi1 9
21 2010 3 3 1 crsp2 8
22 2010 3 3 1 grass 120
23 2010 3 3 1 mobe 4
24 2010 3 3 1 phdi 14
25 2010 3 3 1 pypo 12
26 2010 3 3 1 ticr 7
27 2010 3 3 1 unsp1 1
28 2010 4 3 0 crcr 23
29 2010 4 3 0 crsp2 13
.. ... ... ... ... ... ...
[121 rows x 6 columns]
The empirical subpackage contains a number of functions that operate on patch objects and return macroecological metrics. Here we’ll use the function sad to calculate a species abundance distribution. The function sad has the following arguments
- The first argument is a Patch object
- The second is a string specifying which column in the data table has the species names (i.e. the spp_col) and which, if any, has a count of individuals at a particular location (i.e. the count_col). For this data set, the column containing species names is spp and the column containing counts is count. Therefore, the string would read 'spp_col:spp; count_col:count'
Note
If count_col is not given, the abundance/count at any given location is assumed to be 1.
- The third is a string specifying how to split the data. We are leaving this argument blank ('') in this example but see Empirical (macroeco.empirical) or later in the tutorial (A more complex analysis) for more information on splitting.
We can then call the sad function as follows
>>> sad = meco.empirical.sad(pat, 'spp_col:spp; count_col:count', '')
All functions for macroecological metrics return their results as a list of tuples. Each tuple has two elements
- A string describing how the data were split (no split in this case)
- A result table with a column y (for univariate distributions like the species abundance distribution) or columns y and x (for curves such as a species area relationship) giving the results of the analysis.
Since the data were not split in this example, the list has only one tuple. The result is
>>> sad
[('', spp y
0 arsp1 2
1 cabr 31
2 caspi1 58
3 chst 1
4 comp1 5
5 cran 4
6 crcr 65
7 crsp2 79
8 enfa 1
9 gnwe 41
10 grass 1110
11 lesp1 1
12 magl 1
13 mesp 6
14 mobe 4
15 phdi 210
16 plsp1 1
17 pypo 73
18 sasp 2
19 ticr 729
20 unsh1 1
21 unsp1 18
22 unsp3 1
23 unsp4 1)]
where the first element of the tuple is '' (an empty string because no split occurred) and the second element in the tuple is a pandas DataFrame with two columns: 1) the species ID (spp) and 2) the abundance of each species (y). The DataFrame itself can easily be extracted
>>> sad_df = sad[0][1]
where we recognize that the DataFrame is the second element (index 1) of the first tuple in the list (index 0). This notation will make more sense when we consider splitting the data below (A more complex analysis).
Any number of distributions from the models subpackage can be fit to the resulting empirical metric. The code below fits a Fisher’s logseries distribution and uses the function AIC from the compare subpackage to calculate the AIC for this distribution and data.
>>> # Fit the logseries distribution to the empirical SAD
>>> p = meco.models.logser.fit_mle(sad_df['y'])
>>> p
(0.9984913251355505,)
We can then get an AIC value to determine the “goodness of fit” of the logseries distribution to the empirical data.
>>> # Get the AIC value
>>> logser_aic = meco.compare.AIC(sad_df['y'], meco.models.logser(p[0]))
>>> logser_aic
206.2729258353742
If you are using the ipython environment you can see the arguments that meco.compare.AIC takes using meco.compare.AIC?. In short, the function takes in the data (in this case the species abundance distribution) and fitted model object and returns the AIC value. Of course, AICs aren’t very useful by themselves, so let’s compare the logseries fit to a broken stick distribution, another classic theoretical SAD. This is equivalent to a zero-truncated negative binomial distribution with aggregation parameter k equal to 1.
>>> # Get Broken Stick AIC
>>> broken_stick_aic = meco.compare.AIC(sad_df['y'], meco.models.nbinom_ztrunc(np.mean(sad_df['y']), 1))
>>> broken_stick_aic
274.27490655552322
We can see that the lower AIC for the logseries suggests that this is a more appropriate model for this SAD.
We could also visually compare these models using their rank abundance distributions. We first generate the rank abundance distributions for the fitted logseries and the broken stick distributions and then plot it against the empirical data.
>>> logser_rad = meco.models.logser.rank(len(sad_df), p)
>>> broken_stick_rad = meco.models.nbinom_ztrunc.rank(len(sad_df), np.mean(sad_df['y']), 1)
>>> # Plot the empirical data. Note that [::-1] reverses the order of a vector
>>> ranks = np.arange(1, len(sad_df['y']) + 1)
>>> plt.semilogy(ranks, np.sort(sad_df['y'])[::-1], label="Empirical RAD")
>>> # Plot the RAD of the models
>>> plt.semilogy(ranks, logser_rad[::-1], label="Logseries RAD")
>>> plt.semilogy(ranks, broken_stick_rad[::-1], label="Broken Stick RAD")
>>> plt.xlabel("Rank")
>>> plt.ylabel("Log(Abundance)")
>>> plt.legend()
>>> plt.show()
A simple species-area relationship analysis¶
We can also analyze species-area relationships (SAR)s using macroeco. To get an empirical SAR from the ANBO data we use the function meco.empirical.sar. As described in the documentation, this function takes 4 key arguments
- patch: The empirical Patch object
- cols: A semicolon-separated column string that identifies the species column (i.e. spp_col, the column containing the species names), the count column (i.e. count_col, the column containing the species counts at a particular location), the x column (i.e. x_col, the column specifying the spatial location of an individual in the x direction), and the y column (i.e. y_col, which column specifying the spatial location of an individual in the y direction). For example, this string for the ANBO data would be 'spp_col:spp; count_col:count; x_col:row; y_col:column' because the column that contains the species names is spp, the column that contains the counts is count, the column that contain the spatial location of an individual in the x direction is row and the column that contains the spatial location of an individual in the y direction is column. For the SAR analysis, x_col and y_col must be specified.
- splits: A string specifying whether the analysis should be run on different subsets of the data. For example, if one had a column year specifying different years that the community census was completed the string year:split would run the analysis on each year separately. split is a key word described in the documentation.
- divs: A semicolon-separated string that describes how to successively divide the patch along the x_col and y_col dimensions. For example, the string '1,2; 2,2; 2,4' will calculate the average species richness at three areas. The first areas (1,2) will be made by dividing the x column into 1 equal part and the y column into 2 equal parts. The second areas (2, 2) will be made by dividing the x column and the y column into 2 equal parts. The third areas (2, 4) will be made by dividing the x column in 2 equal parts and the y column in 4 equal parts.
To illustrate this dividing, let’s look at the ANBO plot. The ANBO census was performed on a 4m x 4m = 16 m^2 grid, as shown below
A division of (1, 2) gives two areas of size 8 m^2 by dividing the plot horizontally into two halves
A division of (2, 1) gives two areas of size 8 m^2 by dividing the plot vertically into two halves
A division of (2, 2) gives four areas of size 4 m^2 by dividing the plot vertically and horizontally
The get the SAR for the areas 1, 2, 4, 8, and 16 m^2 we use the following code.
>>> sar = meco.empirical.sar(pat, 'spp_col:spp; count_col:count; x_col:row; y_col:column', "", "1,1; 1,2; 2,1; 2,2; 2,4; 4,2; 4,4")
>>> sar
[('', div n_individs n_spp x y
0 1,1 2445.0000 24.0000 16 24.0000
1 1,2 1222.5000 18.5000 8 18.5000
2 2,1 1222.5000 17.0000 8 17.0000
3 2,2 611.2500 13.5000 4 13.5000
4 2,4 305.6250 10.1250 2 10.1250
5 4,2 305.6250 10.5000 2 10.5000
6 4,4 152.8125 7.5625 1 7.5625)]
The output of the SAR function is a list of tuples where each tuple is a particular split. Because we did not split the data (i.e. the split parameter was ''), we have one tuple. The second item in this tuple is a pandas DataFrame that contains the key results of the analysis
>>> sar_table = sar[0][1]
>>> sar_table
div n_individs n_spp x y
0 1,1 2445.0000 24.0000 16 24.0000
1 1,2 1222.5000 18.5000 8 18.5000
2 2,1 1222.5000 17.0000 8 17.0000
3 2,2 611.2500 13.5000 4 13.5000
4 2,4 305.6250 10.1250 2 10.1250
5 4,2 305.6250 10.5000 2 10.5000
6 4,4 152.8125 7.5625 1 7.5625
The column div gives the divisions specified in the function call. The column n_individs specifies the average number of individuals across the cells made from the given division. n_spp gives the average species across the cells made from the given division. x gives the absolute area of a cell for the given division. y gives the same information as n_spp and is included for easy plotting.
For example, the row with div = 2,2 is a result of dividing the ANBO plot into 4 equal sized areas of 4 m^2, calculating the the species richness and total number of individuals in each of the 4 areas and returning the average species richness and total number of individuals over the four areas.
For plotting, one might want to combine like areas to a single value and then plot.
>>> # Combine similar areas
>>> combined_sar = sar_table.groupby('x').mean().reset_index()
>>> # Plot the SAR
>>> plt.loglog(combined_sar['x'], combined_sar['y'], '-o', label="Empirical SAR")
>>> plt.xlabel("Log(Area)")
>>> plt.ylabel("Log(Species)")
If we want to compare the empirical SAR to a power law SAR and a METE SAR we can first fit each of these curves to the data. To fit the METE SAR, we only need the total number of species (n_spp) and total number of individuals (n_individs) at the base scale (i.e. at div = 1,1). We could either look at the table at see that n_spp at div = 1,1 is 24 and n_individs is 2445 or pass in the data frame to the fit_lsq method of the mete_sar curve
>>> # Fit the METE SAR
>>> S0, N0 = meco.models.mete_sar_iterative.fit_lsq(sar_table)
>>> S0, N0
(24.0, 2445.0)
>>> # Get the predicted values from the fitted METE SAR
>>> pred_mete = meco.models.mete_sar_iterative.vals(combined_sar['x'][::-1], S0, N0, approx=True)
We can fit a power law SAR using similar notation
>>> # Fit the power law
>>> c, z = meco.models.power_law.fit_lsq(combined_sar['x'], combined_sar['y'])
>>> c, z
(7.617934680879773, 0.41241825356358003)
>>> # Get the predicted value from the fitted power law
>>> pred_power_law = meco.models.power_law.vals(combined_sar['x'][::-1], c, z)
and then compare these theoretical SARs to the empirical SAR
>>> plt.loglog(combined_sar['x'][::-1], pred_power_law, '-o', label="Power Law SAR")
>>> plt.loglog(combined_sar['x'][::-1], pred_mete, '-o', label="METE SAR")
>>> plt.legend()
>>> plt.show()
Clearly the power law SAR provides a better fit to the data than the METE SAR. We can confirm this quantitatively using a one to one R^2 value when we compare observed (Empirical SAR) and predicted values (METE or Power Law SAR). If the predicted SAR is a perfect fit to the observed SAR, the predicted values will exactly equal the observed values (i.e. fall along the one to one line).
>>> r2_mete = meco.compare.r_squared(combined_sar['y'][::-1], pred_mete, one_to_one=True, log_trans=True)
>>> r2_mete
0.65340238146107854
>>> r2_power_law = meco.compare.r_squared(combined_sar['y'][::-1], pred_power_law, one_to_one=True, log_trans=True)
>>> r2_power_law
0.99939083620342017
The R^2 for the power law is close to one and greater than the R^2 for the METE SAR supporting the plot that the power law is a better model for the SAR. Note that unadjusted R^2 values are not generally comparable across different models.
A simple spatial analysis¶
Another potential analysis we can do with macroeco is to analyze the spatial patterns of individuals in the plot. We can get the spatial patterns of all the species in plot by using the meco.empirical.ssad function.
The SSAD is a species-level spatial abundance distribution. In other words, how are the individuals of a species distributed in space? The empirical SSAD function has three arguments. The first is the Patch object, the second is the cols string, and the third is the split string specifying how to grid a given landscape.
For example, the split string 'row:4; column:4' says to divide the column row into 4 equally spaced sections and divide the column column into 4 equally spaced sections. This gives a grid with 16 equally sized cells.
We can do this for the ANBO data using the following code
>>> all_spp_ssads = meco.empirical.ssad(pat, 'spp_col:spp; count_col:count', 'row:4; column:4')
The result all_spp_ssads is a list with 24 tuples where each tuple contains two items. The first item is a string giving a species name and the second item is a data frame giving the abundance of the given species in each of the 16 cells.
>>> all_spp_ssads[0]
('arsp1', y
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 1
9 0
10 0
11 0
12 0
13 1
14 0
15 0)
If we want to quantify the aggregation of each of these species in space, we can loop through all of the species in all_spp_ssads (24 of them) and fit a finite negative binomial distribution to each species. A finite negative binomial distribution describes the probability of a single cell on the landscape having an abundance of 0-n where n is the total number of individuals in the species of interest.
The k parameter of this distribution specifies how aggregated a species is in space with k approaching 0 being very aggregated and k approaching infinity being binomially distributed. Here is how we can fit the spatial distribution of each species in the landscape to a finite negative binomial distribution and extract the aggregation parameter k
# Store the results
agg_res = {}
# Loop through all species
for spp_name, data in all_spp_ssads:
# Fit finite negative binomial distribution
k_param = meco.models.cnbinom.fit_mle(data['y'], k_array=np.linspace(0.01, 5, num=1000))[1]
# Get total abundance for a given species
total_abund = data['y'].sum()
# Store k parameter and total abundance for each species
agg_res[spp_name] = (k_param, total_abund)
The dictionary agg_res contains the k parameter and total abundance for each species in the ANBO data. If we wanted to see how k varied with abundance we could plot k versus abundance for each species
# Extract k and abundance
k, abund = zip(*list(agg_res.viewvalues()))
# Get abundances greater than 20
gt_20 = np.array(abund) > 20
plt.semilogx(np.array(abund)[gt_20], np.array(k)[gt_20], 'o')
plt.xlabel("log(Abundance)")
plt.ylabel("k parameter")
plt.show()
A more complex analysis¶
One of the major benefits of macroeco is that you can explore how macroecological patterns vary across scale and/or for different subsets of your data. For example, what if we wanted to explore how an SAD changed across scale? We will again use the ANBO data to illustrate this example.
Remember that the ANBO census was conducted on a 4m x 4m grid where each cell was 1m x 1m. To examine how the SAD changes across scale, we will take the following steps.
First, split the ANBO plot on the row and column columns and get the empirical SAD for each of the resulting cells. For example, if I split row into 2 equal halves and column into 4 equal halves I will get a plot that contains 8 cells of that are 2 m^2.
Each of these cells has a unique SAD. I get these SADs using the following code
>>> # Split row by 2, split column by 4
>>> split_sads = meco.empirical.sad(pat, "spp_col:spp; count_col:count", splits="row:2; column:4")
split_sads is a list with 8 tuples and each tuple contains the empirical SAD for one of the 8 cells created by splits. For example,
>>> split_sads[0]
('row>=-0.5; row<1.5; column>=-0.5; column<0.5',
spp y
1 cabr 7
3 chst 1
5 cran 2
6 crcr 13
7 crsp2 7
9 gnwe 5
10 grass 130
15 phdi 22
16 plsp1 1
17 pypo 4
19 ticr 210
23 unsp4 1)
The second item in this tuple is the empirical SAD for one of the 8 cells created by splits. The first item is a string that tells us that this is an SAD from the cell where row is greater than -0.5 and less than 1.5 and the column is greater than -0.5 and less than 0.5. Here is the visual representation of that cell
The second step is to fit a an theoretical SAD to the empirical SAD in each cell. If we are interested in how the shape of the SAD changes with scale, we might want to fit a zero-truncated negative binomial distribution and look at the shape parameter of this distribution, k in each cell. We could then take the average of k across all SADs at that scale.
Third, we want to repeat this analysis across multiple scales
Here is the code to get the empirical SADs for 1m x 1m, 2m x 2m, 2m x 4m, 4m x 2m, and 4m x 4m scales.
# Redefining the patch
pat = meco.empirical.Patch("~/Desktop/ANBO.txt")
# Get the empirical SAD in each 1m x 1m cell
splits1 = "row:4; column:4"
# Get the empirical SAD in each 2m x 1m cell
splits2 = "row:2; column:4"
# Get the empirical SAD in 4 2m x 2m cells (upper left , upper right, lower left, lower right)
splits3 = "row:2; column:2"
# Get the empirical SAD in left half and right half 4m x 2m cells
splits4 = "row:1; column:2"
# Get the SAD for the full plot
splits5 = "row:1; column:1"
all_splits = [splits1, splits2, splits3, splits4, splits5]
# Store all the empirical SAD results
results = []
for split in all_splits:
results.append(meco.empirical.sad(pat, 'spp_col:spp; count_col:count', splits=split))
The parameter results stores the empirical SAD results across scales. For example, results[0] is a list of length 16 that has the SAD for each cell in the plot.
>>> len(results[0])
16
>>> results[0][0]
('row>=-0.5; row<0.5; column>=-0.5; column<0.5', spp y
1 cabr 2
3 chst 1
5 cran 1
6 crcr 3
10 grass 42
15 phdi 8
16 plsp1 1
17 pypo 3
19 ticr 140
23 unsp4 1)
Now we fit all the SADs to a zero-truncated negative binomial distribution and plot the results
# Fit the SAD
# Store the average ks
avg_ks = []
for tres, split_str in zip(results, all_splits):
within_scale_ks = []
for split in tres:
within_scale_ks.append(meco.models.nbinom_ztrunc.fit_mle(split[1]['y'])[1])
avg_ks.append(np.mean(within_scale_ks))
# Plot the results
areas = [1, 2, 4, 8, 16]
plt.plot(areas, avg_ks, '-o')
plt.xlabel("Scale/Area in m^2")
plt.ylabel("k of zero-truncated NBD")
plt.show()
For this data, k is clearly decreasing with increasing scale.