# Getting Started with NNS: Clustering and Regression

```r
require(NNS)
require(knitr)
require(rgl)
require(data.table)
```

# Clustering and Regression

Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.

## NNS Partitioning: `NNS.part`

NNS.part is both a partitional and hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants, and then assigns a quadrant identification (1:4) at each partition.

NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.
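As an illustrative sketch only (not the NNS implementation), a single partitioning step can be mimicked by comparing each observation to the means of $$x$$ and $$y$$. The 1 and 4 labels below match the convention visible in the output (quadrant 1: both values above the means; quadrant 4: both below); the ordering of quadrants 2 and 3 here is an assumption.

```python
# Sketch of one partial moment quadrant split (illustrative, not NNS internals).
# Labels: 1 = above both means, 4 = below both; the 2/3 ordering is assumed.
def quadrant_labels(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    labels = []
    for x, y in zip(xs, ys):
        if x >= mx and y >= my:
            labels.append(1)   # upper-right partial moment quadrant
        elif x < mx and y >= my:
            labels.append(2)   # upper-left
        elif x >= mx and y < my:
            labels.append(3)   # lower-right
        else:
            labels.append(4)   # lower-left
    return labels

xs = [i * 0.05 for i in range(-100, 101)]   # x = seq(-5, 5, .05)
ys = [x ** 3 for x in xs]                   # y = x^3
labels = quadrant_labels(xs, ys)
print(labels[0], labels[-1])  # (-5, -125) -> quadrant 4; (5, 125) -> quadrant 1
```

Applying the same split recursively within each quadrant yields the iterated identifications (q4, q44, q444, ...) seen below.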

```r
x = seq(-5, 5, .05) ; y = x ^ 3

for(i in 1 : 4){NNS.part(x, y, order = i, noise.reduction = "off", Voronoi = TRUE)}
```

## $order
## [1] 4
##
## $dt
##          x         y quadrant prior.quadrant
##   1: -5.00 -125.0000    q4444           q444
##   2: -4.95 -121.2874    q4444           q444
##   3: -4.90 -117.6490    q4444           q444
##   4: -4.85 -114.0841    q4444           q444
##   5: -4.80 -110.5920    q4444           q444
##  ---
## 197:  4.80  110.5920    q1111           q111
## 198:  4.85  114.0841    q1111           q111
## 199:  4.90  117.6490    q1111           q111
## 200:  4.95  121.2874    q1111           q111
## 201:  5.00  125.0000    q1111           q111
##
## $regression.points
##     quadrant      x          y
##  1:     q111  4.600  98.164000
##  2:     q113  4.150  71.473375
##  3:     q114  3.650  49.448375
##  4:     q131  3.025  27.746813
##  5:     q134  2.700  19.764000
##  6:     q141  2.050   9.076375
##  7:     q143  1.425   2.924813
##  8:     q144  0.650   0.528125
##  9:     q411 -0.600  -0.450000
## 10:     q412 -1.400  -2.786000
## 11:     q414 -2.025  -8.712562
## 12:     q421 -2.650 -18.689125
## 13:     q424 -3.000 -27.090000
## 14:     q441 -3.625 -48.366563
## 15:     q442 -4.125 -70.197187
## 16:     q444 -4.600 -98.164000

### X-only Partitioning

NNS.part offers a partitioning based on $$x$$ values only, using the entire bandwidth in its regression point derivation, and shares the same limit condition as partitioning via both $$x$$ and $$y$$ values.

```r
for(i in 1 : 4){NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE)}
```

Note that the partition identifications are limited to 1's and 2's (left and right of each partition, respectively), rather than the 4 values produced by partitioning on both $$x$$ and $$y$$.

## $order
## [1] 4
##
## $dt
##          x         y quadrant prior.quadrant
##   1: -5.00 -125.0000    q1111           q111
##   2: -4.95 -121.2874    q1111           q111
##   3: -4.90 -117.6490    q1111           q111
##   4: -4.85 -114.0841    q1111           q111
##   5: -4.80 -110.5920    q1111           q111
##  ---
## 197:  4.80  110.5920    q2222           q222
## 198:  4.85  114.0841    q2222           q222
## 199:  4.90  117.6490    q2222           q222
## 200:  4.95  121.2874    q2222           q222
## 201:  5.00  125.0000    q2222           q222
##
## $regression.points
##     quadrant      x          y
## 1:     q111 -4.375 -85.585938
## 2:     q112 -3.100 -31.000000
## 3:     q121 -1.850  -7.053125
## 4:     q122 -0.600  -0.450000
## 5:     q211  0.650   0.528125
## 6:     q212  1.900   7.600000
## 7:     q221  3.150  32.484375
## 8:     q222  4.400  86.900000
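The 1/2 labeling under type = "XONLY" can be sketched as a recursive binary split on $$x$$ alone, appending 1 for the left side of each partition and 2 for the right. This is an illustrative reconstruction, not NNS internals; here each split point is taken as the mean of $$x$$ within the current partition.

```python
# Recursive x-only split sketch: append "1" (left of the split) or "2" (right)
# at each order, mirroring the q1111 / q2222-style IDs in the output above.
def xonly_ids(xs, order):
    ids = ["q"] * len(xs)
    groups = {"q": list(range(len(xs)))}
    for _ in range(order):
        new_groups = {}
        for gid, idx in groups.items():
            m = sum(xs[i] for i in idx) / len(idx)  # split at the partition mean
            left = [i for i in idx if xs[i] < m]
            right = [i for i in idx if xs[i] >= m]
            for i in left:
                ids[i] = gid + "1"
            for i in right:
                ids[i] = gid + "2"
            if left:
                new_groups[gid + "1"] = left
            if right:
                new_groups[gid + "2"] = right
        groups = new_groups
    return ids

xs = [i * 0.05 for i in range(-100, 101)]  # x = seq(-5, 5, .05)
ids = xonly_ids(xs, 4)
print(ids[0], ids[-1])  # extremes fall in q1111 and q2222, as in the $dt above
```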

## Clusters Used in Regression

The right column of plots shows the regression corresponding to each order of NNS partitioning.

```r
for(i in 1 : 3){NNS.part(x, y, order = i, noise.reduction = "off", Voronoi = TRUE) ; NNS.reg(x, y, order = i, ncores = 1)}
```

## NNS Regression: `NNS.reg`

NNS.reg can fit any $$f(x)$$, for both uni- and multivariate cases, and returns the self-explanatory list of values provided below.

### Univariate:

```r
NNS.reg(x, y, order = 4, noise.reduction = "off", ncores = 1)
```

## $R2
## [1] 0.9998462
##
## $SE
## [1] 0.7165028
##
## $Prediction.Accuracy
## NULL
##
## $equation
## NULL
##
## $x.star
## NULL
##
## $derivative
##     Coefficient X.Lower.Range X.Upper.Range
##  1:    75.52082        -5.000        -4.800
##  2:    58.65918        -4.800        -4.600
##  3:    58.87750        -4.600        -4.125
##  4:    43.66125        -4.125        -3.625
##  5:    34.04250        -3.625        -3.000
##  6:    24.00250        -3.000        -2.650
##  7:    15.96250        -2.650        -2.025
##  8:     9.48250        -2.025        -1.400
##  9:     2.92000        -1.400        -0.600
## 10:     0.78250        -0.600         0.650
## 11:     3.09250         0.650         1.425
## 12:     9.84250         1.425         2.050
## 13:    16.44250         2.050         2.700
## 14:    24.56250         2.700         3.025
## 15:    34.72250         3.025         3.650
## 16:    44.05000         3.650         4.150
## 17:    59.31250         4.150         4.600
## 18:    58.65918         4.600         4.800
## 19:    75.52082         4.800         5.000
##
## $Point.est
## NULL
##
## $regression.points
##          x           y
##  1: -5.000 -125.000000
##  2: -4.800 -109.895837
##  3: -4.600  -98.164000
##  4: -4.125  -70.197187
##  5: -3.625  -48.366563
##  6: -3.000  -27.090000
##  7: -2.650  -18.689125
##  8: -2.025   -8.712562
##  9: -1.400   -2.786000
## 10: -0.600   -0.450000
## 11:  0.650    0.528125
## 12:  1.425    2.924813
## 13:  2.050    9.076375
## 14:  2.700   19.764000
## 15:  3.025   27.746813
## 16:  3.650   49.448375
## 17:  4.150   71.473375
## 18:  4.600   98.164000
## 19:  4.800  109.895837
## 20:  5.000  125.000000
##
## $Fitted.xy
##          x         y     y.hat NNS.ID gradient   residuals
##   1: -5.00 -125.0000 -125.0000  q4444 75.52082  0.00000000
##   2: -4.95 -121.2874 -121.2240  q4444 75.52082  0.06341583
##   3: -4.90 -117.6490 -117.4479  q4444 75.52082  0.20108166
##   4: -4.85 -114.0841 -113.6719  q4444 75.52082  0.41224749
##   5: -4.80 -110.5920 -109.8958  q4444 58.65918  0.69616332
##  ---
## 197:  4.80  110.5920  109.8958  q1111 75.52082 -0.69616332
## 198:  4.85  114.0841  113.6719  q1111 75.52082 -0.41224749
## 199:  4.90  117.6490  117.4479  q1111 75.52082 -0.20108166
## 200:  4.95  121.2874  121.2240  q1111 75.52082 -0.06341583
## 201:  5.00  125.0000  125.0000  q1111 75.52082  0.00000000

### Multivariate:

Multivariate regressions return a plot of $$y$$ and $$\hat{y}$$, as well as the regression points ($RPM) and partitions ($rhs.partitions) for each regressor.

```r
f = function(x, y) x ^ 3 + 3 * y - y ^ 3 - 3 * x
y = x ; z = expand.grid(x, y)
g = f(z[ , 1], z[ , 2])

NNS.reg(z, g, order = "max", ncores = 1)
```

## $R2
## [1] 1
##
## $rhs.partitions
##         Var1 Var2
##     1: -5.00   -5
##     2: -4.95   -5
##     3: -4.90   -5
##     4: -4.85   -5
##     5: -4.80   -5
##    ---
## 40397:  4.80    5
## 40398:  4.85    5
## 40399:  4.90    5
## 40400:  4.95    5
## 40401:  5.00    5
##
## $RPM
##        Var1  Var2         y.hat
##     1: -4.8 -4.80 -7.105427e-15
##     2: -4.8 -2.55 -8.726063e+01
##     3: -4.8 -2.50 -8.806700e+01
##     4: -4.8 -2.45 -8.883587e+01
##     5: -4.8 -2.40 -8.956800e+01
##    ---
## 40397: -2.6 -2.80  3.776000e+00
## 40398: -2.6 -2.75  2.770875e+00
## 40399: -2.6 -2.70  1.807000e+00
## 40400: -2.6 -2.65  8.836250e-01
## 40401: -2.6 -2.60  1.776357e-15
##
## $Point.est
## NULL
##
## $Fitted.xy
##         Var1 Var2          y      y.hat      NNS.ID residuals
##     1: -5.00   -5   0.000000   0.000000     201.201         0
##     2: -4.95   -5   3.562625   3.562625     402.201         0
##     3: -4.90   -5   7.051000   7.051000     603.201         0
##     4: -4.85   -5  10.465875  10.465875     804.201         0
##     5: -4.80   -5  13.808000  13.808000    1005.201         0
##    ---
## 40397:  4.80    5 -13.808000 -13.808000 39597.40401         0
## 40398:  4.85    5 -10.465875 -10.465875 39798.40401         0
## 40399:  4.90    5  -7.051000  -7.051000 39999.40401         0
## 40400:  4.95    5  -3.562625  -3.562625 40200.40401         0
## 40401:  5.00    5   0.000000   0.000000 40401.40401         0
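Conceptually, an estimate between two regression points behaves like linear interpolation along the fitted segment. A minimal sketch (illustrative only; NNS.reg's actual point estimation may differ in its details), using two adjacent regression points from the univariate output above:

```python
# Piecewise-linear interpolation between regression points (conceptual sketch,
# not NNS internals).
def interpolate(points, x):
    # points: sorted list of (x, y) regression points
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            w = (x - x0) / (x1 - x0)
            return y0 + w * (y1 - y0)
    raise ValueError("x outside regression point range")

# Two adjacent regression points from the univariate $regression.points above.
pts = [(-0.600, -0.450000), (0.650, 0.528125)]
est = interpolate(pts, 0.0)
print(round(est, 6))  # -> 0.0195
```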

NNS.reg can inter- or extrapolate any point of interest. The NNS.reg(x, y, point.est = ...) parameter accepts data of any size with the same dimensions as $$x$$; the resulting estimates are accessed with $Point.est.

### Classification

For a classification problem, we simply set NNS.reg(x, y, type = "CLASS", ...).

```r
NNS.reg(iris[ , 1 : 4], iris[ , 5], type = "CLASS", point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est
```

##  [1] 0.9908216 0.9915350 0.9908216 0.9908216 0.9915350 1.0000000 1.0000000
##  [8] 0.9930402 1.0000000 0.9915350

### NNS Dimension Reduction Regression

NNS.reg also provides a dimension reduction regression via the NNS.reg(x, y, dim.red.method = "cor", ...) parameter, which reduces all regressors to a single dimension using the returned equation $equation.

```r
NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", location = "topleft", ncores = 1)$equation
```

##        Variable Coefficient
## 1: Sepal.Length   0.7825612
## 2:  Sepal.Width  -0.4266576
## 3: Petal.Length   0.9490347
## 4:  Petal.Width   0.9565473
## 5:  DENOMINATOR   4.0000000

Thus, our model for this regression would be: $Species = \frac{0.7825612*Sepal.Length -0.4266576*Sepal.Width + 0.9490347*Petal.Length + 0.9565473*Petal.Width}{4}$
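As a quick arithmetic check of the equation above (with the first iris observation's measurements hard-coded), the single reduced regressor for that observation evaluates to roughly 1.004:

```python
# Evaluate the dimension reduction equation above for the first iris observation
# (Sepal.Length = 5.1, Sepal.Width = 3.5, Petal.Length = 1.4, Petal.Width = 0.2).
coefs = {
    "Sepal.Length": 0.7825612,
    "Sepal.Width": -0.4266576,
    "Petal.Length": 0.9490347,
    "Petal.Width": 0.9565473,
}
obs = {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Petal.Length": 1.4, "Petal.Width": 0.2}
reduced = sum(coefs[k] * obs[k] for k in coefs) / 4  # DENOMINATOR = 4
print(round(reduced, 6))  # -> 1.00443, the single synthetic regressor
```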

#### Threshold

NNS.reg(x, y, dim.red.method = "cor", threshold = ...) offers a method of reducing the regressors further, by requiring a minimum absolute correlation for a regressor to be retained.

```r
NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, location = "topleft", ncores = 1)$equation
```

##        Variable Coefficient
## 1: Sepal.Length   0.7825612
## 2:  Sepal.Width   0.0000000
## 3: Petal.Length   0.9490347
## 4:  Petal.Width   0.9565473
## 5:  DENOMINATOR   3.0000000

Thus, our model for this further reduced dimension regression would be: $Species = \frac{0.7825612*Sepal.Length - 0*Sepal.Width + 0.9490347*Petal.Length + 0.9565473*Petal.Width}{3}$

The point.est = (...) parameter operates in the same manner as in the full regression above, with the estimates again accessed via $Point.est.

```r
NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est
```

##  [1] 1.0273181 0.9973628 0.9980254 0.9979093 1.0075029 1.2034139 0.9976448
##  [8] 1.0221656 0.9985283 0.9976272
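The thresholding step itself is simple to sketch: coefficients whose absolute value falls below the threshold are zeroed out, and the denominator shrinks to the count of surviving regressors. The following is a hypothetical reconstruction of the behavior shown in the output above, not NNS source code:

```python
# Zero out coefficients below the absolute-correlation threshold and shrink the
# denominator to the number of surviving regressors (assumed behavior, matching
# the $equation output above).
def apply_threshold(coefs, threshold):
    kept = {k: (v if abs(v) >= threshold else 0.0) for k, v in coefs.items()}
    denominator = sum(1 for v in kept.values() if v != 0.0)
    return kept, denominator

coefs = {
    "Sepal.Length": 0.7825612,
    "Sepal.Width": -0.4266576,
    "Petal.Length": 0.9490347,
    "Petal.Width": 0.9565473,
}
kept, denom = apply_threshold(coefs, 0.75)
print(kept["Sepal.Width"], denom)  # Sepal.Width drops to 0, DENOMINATOR becomes 3
```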

# References

If the user is so motivated, detailed arguments and further examples are provided within the following: