Suggestions

Proper HTTP Headers

XML files and DODS/OPeNDAP responses should carry HTTP headers that make them properly cacheable. At a minimum this means the Last-Modified header, but depending on the behavior of the server it may also require Cache-Control: public lines and Vary: specifications. We, for example, include Vary: Authorization lines on pages that are derived from password-protected datasets.

Content-Length also improves reliability: caches will not store a response whose body does not match its Content-Length, so a transfer that is cut short is not saved as if it were complete.

Most HTTP servers send Last-Modified headers for ordinary static files automatically; for generated pages, the CGI scripts have to set those header lines themselves if the pages are to be cacheable.
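As a minimal sketch of what such a CGI script might emit, the hypothetical helper `cgi_headers` below builds the header block for a generated page. It assumes the script already knows the modification time of the underlying dataset (how it finds that is server-specific) and follows the pattern described above of adding Cache-Control: public and Vary: Authorization only for password-protected data.

```python
import sys
from email.utils import formatdate


def cgi_headers(body: bytes, mtime: float, protected: bool = False,
                content_type: str = "text/xml") -> str:
    """Return the header block a CGI script prints before `body`.

    `mtime` is assumed to be the modification time (seconds since the epoch)
    of the dataset the page is derived from.
    """
    lines = [
        f"Content-Type: {content_type}",
        # HTTP dates are always expressed in GMT
        f"Last-Modified: {formatdate(mtime, usegmt=True)}",
        # Lets a cache detect and discard a truncated transfer
        f"Content-Length: {len(body)}",
    ]
    if protected:
        # Shared caches normally refuse to store responses to requests that
        # carried Authorization; "public" permits it, and Vary keys each cached
        # copy on the Authorization header so different users are not mixed up.
        lines.append("Cache-Control: public")
        lines.append("Vary: Authorization")
    # A blank line ends the CGI header block
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    page = b"<catalog/>\n"
    sys.stdout.write(cgi_headers(page, mtime=0.0))
    sys.stdout.write(page.decode("ascii"))
```

The body is built before the headers so that Content-Length can be computed exactly; a script that streams its output would have to omit that header instead.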

Ingrid displays the last-modified information, which can be helpful when checking a given collection or dataset.

example with last-modified	-	A THREDDS catalog with Last-Modified headers pointing to a DODS server that also sends Last-Modified headers. For pages derived from datasets with Last-Modified headers, Ingrid gives that time as the "Last updated" time at the bottom of the page. In this case, every page from the THREDDS catalog on down has a last-updated time.
THREDDS page	-	The top THREDDS page is currently served without a Last-Modified time. Some of the servers it points to do send Last-Modified times, so some of the subpages have "Last updated" lines at the bottom.
DODS page without last-modified times	-	Another example, showing how a DODS dataset without Last-Modified times appears in Ingrid.

There are, of course, other ways of inspecting the HTTP headers to make sure that one's servers are delivering Last-Modified headers on a given response.
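One such way is a short script that requests only the headers and reports which of the cache-related ones are missing. This is a sketch using the Python standard library; the list of expected headers is an assumption you would adjust, and because some servers answer HEAD differently from GET, a clean result here is a good sign rather than proof.

```python
import sys
import urllib.request

# Headers we expect on a cacheable XML or DODS response (adjust as needed)
EXPECTED = ("Last-Modified", "Content-Length")


def missing_headers(headers, expected=EXPECTED):
    """Names from `expected` that do not appear in `headers` (case-insensitive)."""
    present = {name.lower() for name in headers.keys()}
    return [name for name in expected if name.lower() not in present]


def fetch_headers(url):
    """Issue a HEAD request so the body is not downloaded."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.headers


if __name__ == "__main__":
    # Usage: python check_headers.py URL [URL ...]
    for url in sys.argv[1:]:
        missing = missing_headers(fetch_headers(url))
        print(url, "OK" if not missing else "missing: " + ", ".join(missing))
```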

DODS Request Size Negotiation

One way to get good transfer rates is to ask for larger pieces. This is a pure win for servers that stream, and even for servers that process a request in a single chunk, the ideal size could very well be larger than a single lat/lon slice. Short of the client trying many sizes and keeping track of the results, there is no good way to find the optimal size. It is also easy to end up with a server whose behavior is ill-defined when the request is too large, the classic response being an error message inserted into the data stream.
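To make "trying sizes and keeping track of the results" concrete, here is a minimal sketch of an adaptive client. The `fetch(start, count)` callable and the `RequestTooLarge` error are hypothetical stand-ins for a DODS request and a server that cleanly reports an oversized request; the doubling/halving policy is one simple choice, not a DODS feature.

```python
from typing import Callable, List


class RequestTooLarge(Exception):
    """Hypothetical error raised by `fetch` when the server rejects a request."""


def fetch_all(n_items: int, fetch: Callable[[int, int], List[float]],
              chunk: int = 1, max_chunk: int = 1 << 20) -> List[float]:
    """Fetch items [0, n_items) via fetch(start, count), adapting the chunk size.

    The chunk doubles after each success and is halved after each failure;
    a failed size lowers a ceiling so that size is not tried again.
    """
    out: List[float] = []
    ceiling = max_chunk
    start = 0
    while start < n_items:
        count = min(chunk, ceiling, n_items - start)
        try:
            part = fetch(start, count)
        except RequestTooLarge:
            if count == 1:
                raise  # even a single item fails: not a size problem
            ceiling = count - 1
            chunk = max(1, count // 2)
            continue
        out.extend(part)
        start += count
        chunk = min(chunk * 2, ceiling)
    return out
```

This only works when the server signals the failure; a server that splices an error message into the data stream would not raise anything here, which is exactly why that behavior is so troublesome for clients.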

The classic C behavior, where the client asks for as much as it wants and the server returns as much as it can, has a certain grace and style; it is not entirely clear whether we can achieve the same.
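A sketch of what that style could look like, modeled on the short reads of the C `read()` call and assuming a protocol in which the server may legitimately send fewer elements than requested and the reply says how many it sent. `make_server` and its `limit` are hypothetical; the client simply keeps asking for the remainder.

```python
from typing import Callable, List

# server(start, count) -> up to `count` items starting at `start`; it may
# return fewer (a "short read") and returns [] only when no data remains.
Server = Callable[[int, int], List[int]]


def make_server(data: List[int], limit: int) -> Server:
    """Hypothetical server that never sends more than `limit` items per reply."""
    def serve(start: int, count: int) -> List[int]:
        return data[start:start + min(count, limit)]
    return serve


def read_fully(server: Server, n: int) -> List[int]:
    """Ask for everything still wanted; accept whatever comes back and repeat."""
    out: List[int] = []
    while len(out) < n:
        part = server(len(out), n - len(out))
        if not part:  # like read() returning 0: end of data
            break
        out.extend(part)
    return out


if __name__ == "__main__":
    serve = make_server(list(range(25)), limit=10)
    print(read_fully(serve, 25) == list(range(25)))  # True
```

The attraction is that the client never needs to know the server's limit: the limit is discovered implicitly from the size of each reply.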

Given gigabit ethernet, can we really stick with a 2GB limit on the size of a single request?
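As a rough check, assuming the limit is the $2^{31}$-byte ceiling of a signed 32-bit length and ignoring protocol overhead, a maximum-size request occupies a gigabit link for well under half a minute:

```latex
t = \frac{2^{31}\ \text{bytes}}{10^{9}\ \text{bit/s} \,/\, 8\ \text{bit/byte}}
  = \frac{2\,147\,483\,648\ \text{bytes}}{1.25 \times 10^{8}\ \text{bytes/s}}
  \approx 17.2\ \text{s}
```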

global aliases

THREDDS can list multiple DODS servers for a single dataset. This means that a server that is re-serving a dataset could make that explicit by also marking the dataset with the original DODS server. There are a few cases where one might want to carry this further:

If one has picked out one variable from a much larger dataset (e.g. the best estimate from a dataset that also includes number of observations, standard deviation, and smoothed and unsmoothed versions), it would be nice if that relationship could be indicated as well.
If only some of the metadata has changed, it would be nice if the client could figure out that the data itself does not need to be recopied.
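The sketch below illustrates both the multiple-server listing and the second case. It writes an InvCatalog 1.0-style catalog (element names differ in older catalog versions) in which one dataset has an `access` element for a mirror and one for the original server. The `metadataChecksum` and `dataChecksum` properties are hypothetical, not part of THREDDS: they show one way a client could detect that only the metadata changed. Server names and URLs in the usage are placeholders.

```python
import hashlib
import json
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
ET.register_namespace("", NS)


def q(tag: str) -> str:
    """Qualify a tag name with the THREDDS catalog namespace."""
    return f"{{{NS}}}{tag}"


def catalog_xml(name, attributes, data, servers):
    """Catalog for one dataset reachable through several DODS servers.

    servers: list of (service_name, base_url, url_path) tuples.
    The two checksum properties are hypothetical, not THREDDS-defined.
    """
    cat = ET.Element(q("catalog"), {"name": "multi-server example"})
    for svc, base, _ in servers:
        ET.SubElement(cat, q("service"),
                      {"name": svc, "serviceType": "DODS", "base": base})
    ds = ET.SubElement(cat, q("dataset"), {"name": name})
    meta = json.dumps(attributes, sort_keys=True).encode("utf-8")
    ET.SubElement(ds, q("property"), {"name": "metadataChecksum",
                                      "value": hashlib.sha256(meta).hexdigest()})
    ET.SubElement(ds, q("property"), {"name": "dataChecksum",
                                      "value": hashlib.sha256(data).hexdigest()})
    for svc, _, path in servers:
        ET.SubElement(ds, q("access"), {"serviceName": svc, "urlPath": path})
    return ET.tostring(cat, encoding="unicode")


def data_checksum(xml_text):
    """Value of the hypothetical dataChecksum property, or None if absent."""
    for prop in ET.fromstring(xml_text).iter(q("property")):
        if prop.get("name") == "dataChecksum":
            return prop.get("value")
    return None


def needs_recopy(old_xml, new_xml):
    """Recopy only if the data checksum changed or is missing."""
    old, new = data_checksum(old_xml), data_checksum(new_xml)
    return old is None or new is None or old != new


if __name__ == "__main__":
    servers = [("mirror", "http://mirror.example.org/dods/", "cdc/precip"),
               ("original", "http://original.example.org/dods/", "precip")]
    print(catalog_xml("precip", {"units": "mm"}, b"\x00\x01", servers))
```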

Literature references in XML

Frequently (one hopes), the dataset metadata includes literature references. There must be one or more XML standards for transmitting such information; it would be great if we could pick one and support it.

Visualization metadata

Some visualization metadata should be transmitted with the dataset, particularly preferred colorscales. At the moment, we have a list of named colorscales and carry the name across, but we would prefer to be able to describe an arbitrary colorscale. My preference would be to transmit it as a specialized DODS dataset, with the independent variable corresponding to the data values and the dependent variable(s) giving the color values. This would be one example of an attribute that is a reference to another DODS dataset/variable.
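Here is a minimal sketch of the client side of such a colorscale, assuming it arrives as parallel arrays: increasing data values as the independent variable and red, green and blue (0 to 255) as dependent variables. Linear interpolation between entries and clamping outside the range are assumptions; a colorscale with discrete bands would use piecewise-constant lookup instead.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


class Colorscale:
    """Colorscale as parallel arrays: `values` (strictly increasing data
    values) and `red`, `green`, `blue` (0-255) at those values."""

    def __init__(self, values: Sequence[float], red: Sequence[int],
                 green: Sequence[int], blue: Sequence[int]) -> None:
        if len(values) < 2 or not (len(values) == len(red) == len(green) == len(blue)):
            raise ValueError("need at least two entries in every array")
        if any(b <= a for a, b in zip(values, values[1:])):
            raise ValueError("values must be strictly increasing")
        self.values = list(values)
        self.channels = (list(red), list(green), list(blue))

    def color(self, x: float) -> Tuple[int, int, int]:
        """RGB for data value x: linear interpolation, clamped at both ends."""
        v = self.values
        if x <= v[0]:
            i, f = 0, 0.0
        elif x >= v[-1]:
            i, f = len(v) - 2, 1.0
        else:
            i = bisect_right(v, x) - 1  # v[i] <= x < v[i + 1]
            f = (x - v[i]) / (v[i + 1] - v[i])
        r, g, b = (round(c[i] + f * (c[i + 1] - c[i])) for c in self.channels)
        return (r, g, b)
```

Because the arrays are just a one-dimensional dataset, a colorscale with any number of entries fits the same layout, which is the advantage over a fixed list of named colorscales.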

short and long names for datasets

Language-based clients can make good use of short as well as more complete descriptions of datasets. THREDDS should facilitate that.

For example, the CDC dataset from my earlier example is represented in Ingrid as

THREDDS
  (Public Climate Data from the NOAA-CIRES Climate Diagnostics Center) @@
  .CPC_.25x.25_Daily_US_UNIFIED_Precipitation
  (Monthly Accumulated Precipitation) @@

and the dataset that I read via THREDDS from the Data Library is

THREDDS
  (IRI/LDEO Climate Data Library) @@
  .NOAA .NCEP .EMC .CMB .GLOBAL .Reyn_SmithOIv2 .weekly .ssta

While long names are good for display, it is very useful to have short, unique names that concisely refer to a dataset on a server, or to a server within a collection of servers. We all concoct such short names for internal use; THREDDS should let us share them with each other. Something as simple as having both name and long_name in the standard, with only name required, would be enough to let data providers share their short names.
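A minimal sketch of that proposal as a data structure: `name` is required and must be unique on a server, `long_name` is optional, and a display routine falls back to the short name. The class and function names are hypothetical, and so is the short name "CDC" in the usage; only the long name is taken from the example above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DatasetName:
    """Hypothetical catalog entry: `name` required, `long_name` optional."""
    name: str                        # short name, unique on its server
    long_name: Optional[str] = None  # human-readable description

    def display(self) -> str:
        """Long name for display when present, else the short name."""
        return self.long_name or self.name


def check_unique(entries: Iterable[DatasetName]) -> None:
    """Raise if two entries on the same server share a short name."""
    seen = set()
    for entry in entries:
        if entry.name in seen:
            raise ValueError(f"duplicate short name: {entry.name}")
        seen.add(entry.name)


if __name__ == "__main__":
    cdc = DatasetName("CDC", "Public Climate Data from the NOAA-CIRES "
                             "Climate Diagnostics Center")
    print(cdc.name, "->", cdc.display())
```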

arrays of bytes vs. strings

Some servers translate netCDF arrays of byte data into DODS arrays of one-character strings. It is easier for client writers if you leave them as byte arrays or, better yet, translate them to multi-character strings. If you do not translate them to multi-character strings, the client has to figure out that it needs to, and since the client does not even know that the data came from netCDF files in the first place, it has an even harder problem than the netCDF-to-DODS server did.
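For a server, the translation is simple because it knows the array's shape. A sketch, assuming the last (fastest-varying) dimension is the string length and that short strings are padded with NUL characters; files padded with blanks instead would need a different strip.

```python
from typing import List, Sequence


def join_char_rows(chars: Sequence[str], strlen: int) -> List[str]:
    """Rebuild strings from a flattened [n][strlen] array of one-character strings.

    Assumes the last dimension is the string length and that shorter strings
    are padded with NUL characters, which are stripped from the end.
    """
    if strlen <= 0 or len(chars) % strlen != 0:
        raise ValueError("array length is not a multiple of the string length")
    return ["".join(chars[i:i + strlen]).rstrip("\0")
            for i in range(0, len(chars), strlen)]


if __name__ == "__main__":
    print(join_char_rows(list("abc\0de\0\0"), 4))  # ['abc', 'de']
```

A client handed only the flattened one-character strings would have to guess `strlen` and the padding convention, which is the harder problem described above.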