System-wide Monitoring Program

Synthesis of the Water Quality Data from 1995 to 2000
Chapter 3: Classification by Physical, Chemical, Climatic and Land-use Attributes.


Introduction
Previous classification of NERRs based on hierarchical cluster analysis indicated strong regional groupings, suggesting that climate played an important role in controlling water quality (Wenner et al. 2001). Because critical input attribute data used in those cluster analyses were not standardized to the watershed of the water body monitored by each NERR, a classification of NERRs according to physical and chemical water quality indices and adjacent land-use practices was included in this synthesis. In addition to the standardization and refinement of attribute data, 11 NERRs not evaluated by Wenner et al. (2001) were included, substantially increasing the scope of this classification.

The principal objectives of this chapter are to (1) explore relationships among water quality, habitat and soil variables and detect groupings for the NERR sites based on the input variables, and (2) discriminate the groupings and geographic regions that were used to group the reserves. In order to detect the presence of natural groupings and to provide a baseline for comparison of the results with the new techniques and the 1996-1998 analyses (Wenner et al. 2001), hierarchical cluster analysis and correlation analyses were again utilized. In addition to these analyses, three additional analytical techniques (i.e., principal components analysis (PCA), nonlinear multidimensional scaling and discriminant analysis) were also included. Principal components analysis extracted independent and conceptually meaningful factors from correlated variables (Kleinbaum and Kupper 1978); thus, this technique reduced the dimensionality of the data to facilitate understanding of the complex nature behind the interrelated attributes. Multidimensional scaling was similar to PCA, but established the nonlinear relationship among variables to reduce the dimensionality. Discriminant analyses were performed to differentiate the NERR sites based on water quality and habitat characteristics. The stepwise discriminant analyses were used to select the differentiating variables.

Methods
Data
Eleven physical and chemical attributes represented half of the input data used in these analyses (Table 8). These data primarily consisted of site-specific summary statistics from the 1995-2000 NERR SWMP database. Physical data included mean water depth and mean water body width. Mean daily water depth was determined from the NERR SWMP database for all sites except three where YSI loggers were attached to floating platforms. At these sites (SAPML, SAPFD, GRBGB), depth was determined from the respective site metadata. Water body width was also determined from the site metadata; however, manual determination using Geographic Information System (GIS) technology was required when these data were not included in the site metadata. Six additional water quality variables [daily mean salinity, daily salinity range, hypoxia frequency, supersaturation frequency, frequency of cold water temperature (≤ 10°C), and frequency of warm water temperature (≥ 25°C)] were calculated using data from 1995 to 2000. Frequency of high turbidity (> 25 NTU) and frequencies of extreme pH values (< 7 and > 8) were calculated using data from 1999 to 2000.
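The threshold-based frequencies described above can be sketched as follows. This is a minimal illustration, assuming that each frequency is the percentage of non-missing logger readings that meet the stated threshold; the report does not specify whether readings, days, or deployments were the unit of counting. The function name and the sample readings are hypothetical.

```python
def threshold_frequency(values, predicate):
    """Percent of non-missing readings for which predicate(value) is true."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None  # no usable readings for this site
    hits = sum(1 for v in valid if predicate(v))
    return 100.0 * hits / len(valid)

# Hypothetical readings for one site (None marks a missing record).
temperature_c = [8.5, 9.9, 10.0, 14.2, None, 25.0, 27.3, 24.9]
turbidity_ntu = [5.0, 30.0, 25.0, 26.0]
ph = [6.8, 7.0, 7.6, 8.0, 8.3]

# Thresholds are the ones stated in the text.
cold_freq = threshold_frequency(temperature_c, lambda t: t <= 10.0)
warm_freq = threshold_frequency(temperature_c, lambda t: t >= 25.0)
high_turbidity_freq = threshold_frequency(turbidity_ntu, lambda x: x > 25.0)
low_ph_freq = threshold_frequency(ph, lambda x: x < 7.0)
high_ph_freq = threshold_frequency(ph, lambda x: x > 8.0)
```

Note that the inequalities for temperature are inclusive while those for turbidity and pH are strict, following the symbols used in the text.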

Nine land-use and climatic data attributes represented the remaining input data used in these analyses (Table 9). Land-use data were derived from a Geographic Information System (GIS) database. Water quality sampling locations were first digitized using ArcView® software (Environmental Systems Research Institute, Inc., California). After digitization, each location was checked against the descriptions in the metadata and against the GIS files provided by the National Oceanic and Atmospheric Administration, Estuarine Reserves Division (NOAA ERD, Gunnar Lauenstein, pers. comm.). Watershed boundaries were delineated for each sampling location using digital raster data within NERRs, nationwide elevation, drainage basins, 8-digit HUC (Hydrological Unit Codes), creek, shoreline, and/or river data (Table 10). Each watershed was defined as the area, bounded by topographic features and height of land, from which waters drained to a shared destination. The majority of digital data was downloaded from the Coastal Assessment & Data Synthesis (CA&DS) system (Table 10), a national- and regional-level database and mapping analysis tool under development by the NOS Special Projects Office, in cooperation with other NOS offices. We quantified the watershed sizes after research coordinators had confirmed the delineation of watershed boundaries. ArcView extensions (i.e., Geo-processing and X-tools) were used to quantify seven habitat and soil attributes within each watershed for each sampling location: permeability (inches/hour), clay (%), agricultural land (% of area), forest land (% of area), wetland (% of area), urban/developed (% of area), and shellfish bed (% of area). GIS files of land use and soil were obtained through the FTP server of the CA&DS system. Four major classes (agriculture, forest, wetland, and urban) of the seven land-use types categorized in the files were included in the analyses. The remaining three types (“barren” land, “range” land, and water) were excluded because their resolution was not acceptable for small watersheds. Total precipitation (cm) from 1995 to 2000 was taken from the nearest NCDC weather station, except at the North Inlet-Winyah Bay and North Carolina NERRs, where partial data from closer stations were used.

Analyses
A total of 51 sites in the NERR system were included in the analyses. Two Jobos Bay NERR sites were not included in the multivariate analyses because land-use information was unavailable in GIS. The Model Marsh (Tijuana River Estuary NERR) site was excluded because its recent inclusion in the NERR SWMP (Dec 2000) left insufficient water quality data. Lastly, the Lower Duplin (Sapelo Island NERR) site was excluded because it is essentially co-located with the Marsh Landing site. The Lower Duplin site was created in 1999 in close proximity to the Marsh Landing site to provide a long-term reference after responsibility for maintaining the Marsh Landing site changed hands.

Pair-wise correlations were calculated to detect linear relationships between each pair of variables. These correlations were needed to evaluate whether the variables were correlated with one another and to aid interpretation of the multivariate analyses described below.
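A pair-wise correlation screen of this kind can be sketched as below. The sketch assumes Pearson's product-moment correlation, which is consistent with the r values reported later but is not named explicitly in the text. The function names and the sample values are hypothetical. The p-value for each pair follows from comparing the t statistic with a t distribution on n - 2 degrees of freedom; that lookup is left out to keep the example free of third-party dependencies.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r = 0 with n paired observations (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def pairwise_correlations(columns):
    """Map each unordered pair of variable names to its correlation."""
    names = list(columns)
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical site attributes (one value per site).
attributes = {
    "clay_pct": [10.0, 20.0, 30.0, 40.0, 50.0],
    "permeability": [5.0, 4.1, 3.2, 1.9, 1.0],
    "wetland_pct": [12.0, 18.0, 15.0, 30.0, 41.0],
}
correlations = pairwise_correlations(attributes)
```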

regions; however, the percent of deployments with hypoxic events was substantially greater at West Coast and Gulf of Mexico/Puerto Rico sites (22-28%) than observed for the Mid-Atlantic (15%), Southeast (11%), and Northeast (6%) regions.

A total of 1,564 hypoxic events were observed in the deployments examined (Table 2). Thirty-two percent of these events were observed at West Coast NERRs, down 8% from 1996-1998 (Wenner et al. 2001). This finding loosely suggests that hypoxic events may have decreased in 1999-2000; however, 11% of West Coast deployments were not examined. Twelve percent of hypoxic events were observed at Northeast NERRs, the same as previously reported. Hypoxic events at Mid-Atlantic, Southeast, and Gulf of Mexico/Puerto Rico NERRs increased 2-4% (12-20% total) from 1996-1998 levels.

Frequency of hypoxic duration for 1995-2000 data was similar to frequency of hypoxic duration in 1996-1998 (Wenner et al. 2001). Hypoxic events lasting less than 4 hours decreased by one percent and were compensated for by hypoxic events lasting 12-16 hours, which subsequently increased by one percent. Ninety-five percent of all hypoxic events lasted less than 12 hours, similar to 1996-1998 estimates (Wenner et al. 2001). (See PDF for details)

Table 10. Summary of sources of GIS data obtained for multivariate analyses. (See full PDF for details)

No. | GIS source or website | GIS data
1 | The Coastal Assessment & Data Synthesis (CA&DS) system: ftp://sposerver.nos.noaa.gov/datasets/CADS/GIS_Files/ShapeFiles | Land use, shellfish, elevation, shoreline, soil, drainage basin
2 | NOAA Coastal Service Center, Charleston, SC | NERR base maps, digital raster data within NERR sites
3 | San Diego State University Geospatial Data Clearinghouse: http://hurricane.sdsu.edu/tj/physdata_trw.html | Sub-basin boundary, soil, land use data for Tijuana River Estuary Reserve
4 | U. S. Environmental Protection Agency: http://www.epa.gov/nsdi/projects/rf1_meta.html | Rivers
5 | USGS: http://water.usgs.gov/lookup/getspatial?huc250k | 8-digit HUCs (Hydrological Unit Codes)

Hierarchical cluster analysis was used to detect groupings among NERR sites. Euclidean distance was used as the measure for clustering sites, with average linkage as the clustering method. These analyses were carried out using Community Analysis Package (Pisces Conservation Ltd., UK). The resulting dendrogram indicated how the clusters were formed and provided a measure of the linkage distance for clustering. Clusters on the dendrogram were identified at a linkage distance established from an amalgamation schedule provided by STATISTICA (StatSoft, Inc., Tulsa, OK). The amalgamation schedule, which lists linkage distances across the consecutive steps of the clustering process, was used to detect the distance at which a discontinuity among distinctive groupings could be observed.
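The clustering procedure described above can be sketched in a few lines. This is an illustrative reimplementation, not the Community Analysis Package or STATISTICA code. It assumes that attributes were standardized to z-scores before distances were computed, which the text does not state; all function names and the sample sites are hypothetical.

```python
import math
from itertools import combinations

def zscore_columns(rows):
    """Standardize each attribute (column) to mean 0 and sample SD 1 (assumed step)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points):
    """Return the amalgamation schedule as (cluster_a, cluster_b, distance) tuples."""
    clusters = [(i,) for i in range(len(points))]
    schedule = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean distance over all between-cluster pairs.
            d = sum(euclidean(points[i], points[j]) for i in a for j in b)
            d /= len(a) * len(b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        schedule.append((a, b, d))
    return schedule

def largest_jump(schedule):
    """Linkage distances on either side of the biggest gap between merge steps."""
    dists = [d for _, _, d in schedule]
    gaps = [later - earlier for earlier, later in zip(dists, dists[1:])]
    step = max(range(len(gaps)), key=gaps.__getitem__)
    return dists[step], dists[step + 1]

# Hypothetical sites, each described by two attributes.
sites = [[30.0, 5.0], [31.0, 6.0], [2.0, 40.0], [3.0, 42.0], [15.0, 20.0]]
schedule = average_linkage(zscore_columns(sites))
lower, upper = largest_jump(schedule)  # a cut between these separates the groups
```

Choosing the cut at the largest gap is only one way to read an amalgamation schedule; the report describes looking for a discontinuity, which in practice may involve judgment rather than a fixed rule.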

Principal components analysis (PCA) was performed to explore the relationship among water quality, habitat, and climatic characteristics. This analysis resulted in the computation of principal components (PCs) scores for each of the NERR sites. A scree plot was evaluated to determine if a clear breaking point was observed. A relationship between the first two principal components and the twenty variables was plotted to determine the orientation of each variable. The first two principal components were then plotted against each other by NERR site to evaluate if groups of sites were observed.
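The PCA steps above can be sketched with NumPy. The sketch assumes that the analysis was run on the correlation matrix (that is, on standardized variables), which is a common choice when attributes have different units but is not stated in the text. The data are random placeholders with the same shape as the study (51 sites by 20 attributes), not the actual site attributes.

```python
import numpy as np

def pca_correlation(data):
    """PCA on the correlation matrix; returns eigenvalues, loadings, explained share, scores."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize each attribute
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]             # re-sort descending
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    explained = eigenvalues / eigenvalues.sum()
    scores = z @ eigenvectors                         # PC scores for each site
    return eigenvalues, eigenvectors, explained, scores

# Placeholder data: 51 sites by 20 attributes (random, for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(size=(51, 20))
eigenvalues, loadings, explained, scores = pca_correlation(data)

# Values one would plot in a scree plot, and a count of eigenvalues above 1.
scree_values = eigenvalues
n_above_one = int((eigenvalues > 1.0).sum())
cumulative_explained = np.cumsum(explained)
```

Plotting `scree_values` against component number gives the scree plot, and the first two columns of `scores` give the site-by-site plot of PC1 against PC2.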

Non-linear multi-dimensional scaling was performed using an auto-associative Artificial Neural Network (ANN). Almeida (2002) states, “The use of ANNs has gained increasing popularity for applications where a mechanistic description of the dependency between dependent and independent variables is either unknown or very complex.” An ANN deconvolutes complex signals by allowing the fitted function to take on the shape of any unbroken curve. The curve fitting is repeated many times, allowing the network to learn the data and develop a predictive model (Almeida 2002). A variable number of hidden nodes was used, and the activation values of the hidden nodes for each case defined the reduced coordinate system.
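The idea of using hidden-node activations as reduced coordinates can be illustrated with a very small auto-associative network. This is a sketch under strong assumptions: a single tanh hidden layer, linear outputs, plain gradient descent, and squared-error loss. The architecture and training scheme actually used are not described in the text, and the data, function name, and hyperparameters here are hypothetical.

```python
import numpy as np

def train_autoassociative_net(z, n_hidden, epochs=3000, lr=0.01, seed=0):
    """Fit inputs -> tanh hidden layer -> linear outputs, with targets equal to inputs."""
    rng = np.random.default_rng(seed)
    n_cases, n_vars = z.shape
    w1 = rng.normal(scale=0.1, size=(n_vars, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.1, size=(n_hidden, n_vars))
    b2 = np.zeros(n_vars)
    losses = []
    for _ in range(epochs):
        hidden = np.tanh(z @ w1 + b1)            # hidden-node activations
        error = hidden @ w2 + b2 - z             # reconstruction error
        losses.append(float((error ** 2).sum() / n_cases))
        grad_out = 2.0 * error / n_cases         # gradient of loss w.r.t. outputs
        grad_pre = (grad_out @ w2.T) * (1.0 - hidden ** 2)  # back through tanh
        w2 -= lr * (hidden.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        w1 -= lr * (z.T @ grad_pre)
        b1 -= lr * grad_pre.sum(axis=0)
    coords = np.tanh(z @ w1 + b1)                # reduced coordinates per case
    return coords, losses

# Placeholder data: 51 cases, 5 standardized variables driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(51, 2))
raw = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(51, 5))
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

coords, losses = train_autoassociative_net(z, n_hidden=2)
total_ss = float((z ** 2).sum() / z.shape[0])
explained_fraction = 1.0 - losses[-1] / total_ss  # share of variance reconstructed
```

Repeating the fit with different values of `n_hidden` and recording `explained_fraction` each time gives a curve that can be compared with the cumulative variance explained by the same number of principal components.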

Discriminant analysis was performed using SAS (SAS Institute, Inc., Cary, NC) to differentiate the five geographic regions previously classified in this report using the twenty site attributes. The variables that distinguished the groupings were selected using stepwise, forward, and backward discriminant analyses. Significance levels for entering and removing variables were left at the default value of 0.15. Box plots of the selected attributes were graphed to visualize the differences among regions using STATA (Stata Corp, TX).

No turbidity data were collected during 1999-2000 at five sites (CBMJB, CBMPR, SAPFD, SAPML, WQBCB). The two most common methods for handling missing data in multivariate analyses were used: “complete case analysis” (excluding all data from the five sites) and the “mean substitution method” (substituting the mean value for the missing data). The two methods yielded very similar results for cluster analyses, discriminant analyses, and PCA. Only the results using the mean substitution method are presented here in order to be inclusive but succinct.
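The two missing-data strategies can be sketched as follows. The site values are hypothetical, and the sketch assumes the substituted mean is computed over all sites with an observed value for that attribute.

```python
def complete_cases(rows):
    """Keep only sites (rows) with no missing attribute values."""
    return [row for row in rows if all(v is not None for v in row)]

def mean_substitute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    means = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present))
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

# Hypothetical sites: (turbidity frequency, mean salinity); None = not collected.
site_rows = [
    [10.0, 1.0],
    [None, 2.0],
    [30.0, 3.0],
]
kept = complete_cases(site_rows)     # drops the site with missing turbidity
filled = mean_substitute(site_rows)  # keeps all sites, fills in the column mean
```

Mean substitution retains every site but shrinks the variance of the filled attribute, which is one reason to compare its results with the complete-case analysis, as was done here.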

Results and Discussion
Correlation
Soils rich in clay content were positively associated with low soil permeability (r = -0.76, p < 0.0001) and wetland area (r = 0.393, p = 0.004). Wetland areas were negatively correlated with the amount of forested land (r = -0.404, p = 0.003), agriculture (r = -0.332, p = 0.02), and urban (r = -0.285, p = 0.04) land uses. Watersheds with a large percentage of functional wetlands were associated with abundant shellfish beds (r = 0.363, p = 0.009), which primarily occurred in areas with high salinity (r = 0.34, p = 0.01) and high pH (r = 0.304, p = 0.03). Wetland areas were also associated with warm water temperature (r = 0.458, p = 0.007) and high occurrences of summer hypoxia (r = 0.358, p = 0.01), but negatively correlated with cool water temperature (r = -0.398, p = 0.004).

In contrast, the amount of forested land was positively correlated with cool water temperature (r = 0.352, p = 0.01) and high precipitation (r = 0.527, p = 0.0001), but negatively correlated with high summer hypoxia frequencies (r = -0.366, p = 0.008). Furthermore, the amount of agricultural land was negatively correlated with the amount of forested land (r = -0.35, p = 0.01), salinity (r = -0.458, p = 0.0007), and range in depth (r = -0.49, p = 0.0003). It appeared that sites with lower salinity water and reduced tidal influence had a higher percentage of land available for agricultural purposes. Sites with a large percentage of agricultural land were also positively associated with high turbidity (r = 0.423, p = 0.002).

Salinity was positively correlated with alkaline waters (r = 0.373, p = 0.07) and daily depth range (r = 0.404, p = 0.003), but negatively associated with turbidity (r = -0.5, p = 0.0002). Cold water was positively correlated with more acidic waters (r = 0.3, p = 0.03) and negatively correlated with hypoxia (r = -0.408, p = 0.003). Supersaturation was associated with more alkaline waters (r = 0.561, p < 0.0001) and small daily depth range (r = -0.336, p = 0.01), which implied sites were either located in non- or small-tidal areas.

Cluster Analysis
Cluster analysis was used to group 51 sites in the NERR SWMP according to physical, chemical, land use, climatic, and soil attributes (Tables 8-9). A Euclidean linkage distance of approximately 5.8 was used as the grouping criterion based on the amalgamation schedule (Figure 14). Four distinct groupings of NERRs were identified, with the exception of eight sites (TJROS, TJRTL, HUDTS, HUDTN, NARTW, CBVGI, RKBBR, NOCMS) that were not easily classified (Figure 15). Groups two and four appeared to correspond to geographical region and latitude. Sites belonging to group two were primarily located in South Carolina, except for two sites along the West Coast (ELKSM and PDBBY). In contrast, group four was a large grouping, primarily consisting of sites with cooler water temperature located in the Northeast/Great Lakes and the Mid-Atlantic regions (Figure 15). One West Coast site, PDBJL, located at a latitude similar to that of other sites in this group, was also included. Group three was also a large grouping, primarily consisting of sites associated with more saline and less turbid water than the group two sites. The smallest grouping was group one, consisting of two Elkhorn Slough sites (ELKNM, ELKAP) and both Waquoit Bay sites.

Amalgamation schedule used to identify groupings among 51 NERR sites. The two straight lines indicate the linkage distance

Dendrogram of 51 NERR sites based on habitat and water quality characteristics. (See PDF for details).

Eight sites did not fit into one of the four major groups (Figure 15). Masonboro Island in the North Carolina Reserve was very dissimilar to other sites because of its small watershed, high permeability, and high percentage of wetland and shellfish. High temperature and summer hypoxia occurrences differentiated Blackwater River in the Rookery Bay Reserve from other sites. Two sites in the Tijuana River Estuary NERR with 100% urban/developed land and extremely low precipitation were also very dissimilar from other NERR sites. Large watersheds at NARTW and CBVGI distinguished these two sites from others. Similarly, the extremely large watershed sizes at HUDTS and HUDTN distinguished these sites from other NERR sites. With the exception of nine reserves (Padilla Bay, Wells, Narragansett Bay, Mullica River, North Inlet-Winyah Bay, North Carolina, Chesapeake Bay-Virginia, Delaware, and Weeks Bay), at least two sites within each reserve were more similar to each other than to sites located in other Reserves.

The dendrogram of site attributes produced four major groupings (Figure 16). Within group one, hypoxia, warm water temperature, and wetland area were most similar, which reinforces the correlative relationship noted previously. Within group two, supersaturation and more alkaline water were most similar. Within group three, agricultural land and turbidity were similar. Within group four, the amount of forested land and precipitation were most similar. These findings were consistent with the results from the correlation analyses. (See PDF for details)

Principal components analysis
No clear breaking point in the scree plot was observed (Figure 17); thus, eigenvalues greater than 1 were used to select the first 7 principal components. Seventy-six percent of the variance was explained by these seven components (Appendices 33-34). This large number of PCs weakened the purpose of dimension reduction and made it difficult to interpret the abstract PCs. The large number of PCs also emphasized the high variability among water quality and habitat characteristics of interest.


Scree plot of principal components analysis.

The first three principal components explained 44% of the total variation in all variables, and are briefly discussed here. The first principal component distinguished two major groups based on water temperature and habitat (i.e., warm wetland vs. cold forest) and explained 16% of the original variation (Figures 18-19). Warm water NERR sites with high wetland components were primarily located in the Southeast, Gulf of Mexico, and California, while cold water, predominantly forested NERR sites were located in the Northeast, Mid-Atlantic, and the northern West Coast. The second PC explained 15% of the original variation and illustrated salinity regime and land use influence (Figure 19). Specifically, a high percentage of agricultural land was associated with low salinity and high turbidity, as determined using correlation and cluster analyses. Examples of low salinity, high agricultural NERRs include Old Woman Creek, Delaware Bay, and both Chesapeake Bay Reserves. Examples of high salinity, low agricultural NERRs include Reserves located along the Southeast Coast and within California. These findings were consistent with correlation and cluster analyses (Figure 20). The third component accounted for 13% of the variance (Appendix 34) and represented precipitation and mean daily range in depth. No distinct boundaries between the groupings were identified. (See PDF for details)

Nonlinear multidimensional scaling using artificial neural networks
For any given number of components or dimensions below 20, nonlinear multidimensional scaling explained more of the total variance than PCA (Appendix 34). For example, the first three multidimensional scaling components explained 13% more variance than the first three PCA components. This result indicated non-linearity in the relationships among these site attributes. Similar trends of explained variance were generally observed in both PCA and multidimensional scaling; however, ten PCA components were needed to account for more than 85% of total variance, compared to only six multidimensional scaling components for a similar amount (Figure 21). Consequently, the relationships among these variables were complex and were not reducible to lower dimensions as we had hoped, a conclusion similar to that reached using PCA and cluster analysis.

Fifty-one sites were mapped onto the first two dimensions, paralleling the method used to examine the pattern of NERR sites using PCA (Figure 22). The distribution pattern of the sites was very similar to the PCA result, except that both axes were reversed. For instance, the group on the upper right of the PCA map was located in the lower left corner of this new map. Again, the sites with more distinctive features, such as RKBBR, NOCMS, HUDTS, HUDTN, NARTW, and CBVGI, consistently occurred on the edge of the map. The sites with warmer water separated from the others along the first dimension, while the second axis differentiated the sites by salinity, turbidity, and percent area of agricultural land. These predominant natural trends were similar to the first two dimensions from the PCA. (See PDF for details)

Discriminant analysis
Discriminant analyses successfully differentiated the five geographic regions with an error rate of 6% (Table 11). Two Mid-Atlantic sites and one West Coast site were grouped with sites in the Northeast region, and one site located in the Northeast was grouped with sites in the Mid-Atlantic region. Based on the generalized squared distance, attributes were most different between the West Coast and the Gulf of Mexico (Table 12). Attributes were most similar between the Northeast and the Mid-Atlantic. The Southeast and the Gulf of Mexico were also fairly similar when compared to other regions.
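The between-region distance used above can be illustrated as follows. This sketch assumes a pooled within-group covariance matrix and equal prior probabilities, in which case the generalized squared distance between two groups reduces to the squared Mahalanobis distance between their mean vectors. Whether the SAS analysis used these settings is not stated in the text, and the function name, region labels, and values below are hypothetical.

```python
import numpy as np

def pairwise_squared_distances(x, labels):
    """Squared Mahalanobis distance between group means, using pooled covariance."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    groups = [str(g) for g in np.unique(labels)]
    n_vars = x.shape[1]
    scatter = np.zeros((n_vars, n_vars))
    means = {}
    for g in groups:
        xg = x[labels == g]
        means[g] = xg.mean(axis=0)
        centered = xg - means[g]
        scatter += centered.T @ centered          # within-group sums of squares
    pooled = scatter / (len(x) - len(groups))     # pooled covariance matrix
    inverse = np.linalg.inv(pooled)
    distances = {}
    for i, a in enumerate(groups):
        for b in groups[i + 1:]:
            diff = means[a] - means[b]
            distances[(a, b)] = float(diff @ inverse @ diff)
    return distances

# Hypothetical sites in two regions, each described by two attributes.
attributes = [[0, 0], [1, 1], [0, 1], [1, 0], [5, 5], [6, 6], [5, 6], [6, 5]]
regions = ["A", "A", "A", "A", "B", "B", "B", "B"]
region_distances = pairwise_squared_distances(attributes, regions)
```

Larger values indicate regions whose attribute means are further apart relative to the within-region spread.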

Differentiating attributes (Figure 23) were selected using three distinct discriminant analysis methods: stepwise, forward selection, and backward elimination (Table 13). Stepwise and forward selection kept the same ten attributes in the model: warm water temperature (≥ 25°C), cold water temperature (≤ 10°C), pH < 7, pH > 8, clay, permeability, turbidity, precipitation, salinity range, and mean range in depth. Backward elimination was in general agreement with these two methods. (See PDF for details)

Conclusions
In summary, eight distinguishing attributes were consistently important factors for all four methods: warm water temperature (≥ 25°C), cold water temperature (≤ 10°C), pH < 7, pH > 8, clay, precipitation, salinity range, and mean range in depth. As expected, temperature was the most distinguishing characteristic. Water temperature was very warm at sites in the Gulf of Mexico and fairly warm in the Southeast, while water temperature in the Mid-Atlantic and the Northeast was cooler. Water was more acidic in the Mid-Atlantic than in other regions, where most sites were more oceanic and alkaline. Tidal dynamics (mean daily range in depth and salinity range) seemed greater in the Southeast and the West Coast than at sites in the Gulf of Mexico and the Mid-Atlantic. However, the variation of these attributes was large within the West Coast, so no clear-cut conclusion could be drawn. Precipitation was highest in the Gulf of Mexico, and the proportion of clay was high in the Southeast and the West Coast.