Signal and noise

I'd like to make available to a wider audience a stream of GNSS sky recordings.

The recordings are made with the GNSS Firehose digitizers and Tallysman TW3972 antennas. Each recording is 200 ms long and contains three GNSS bands, each sampled at about 70 MHz (69.984 MHz exactly, as given in the metadata) with a useful bandwidth of about 50 MHz. The schedule repeats every 5 minutes:

GPS time modulo 300 seconds	Recorded bands
0	L1, L2, L5
150	L1, E6, L5

Thus, the cadence for L1 and L5 is every 150 seconds and for L2 and E6 every 300 seconds.
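As a rough illustration of the schedule, here is a minimal Python sketch that looks up which bands are recorded at a given instant and when a band is next recorded. The function names are hypothetical, and it assumes the time is given as integer seconds of GPS time counted from an origin that is itself a multiple of 300 seconds (seconds of the GPS week or seconds of the day both qualify).

```python
# Schedule from the table above: offset within the 300-second cycle -> bands.
SCHEDULE = {0: ("L1", "L2", "L5"), 150: ("L1", "E6", "L5")}
CYCLE_SECONDS = 300


def bands_at(gps_seconds):
    """Return the bands recorded at this GPS time, or () if nothing is scheduled."""
    return SCHEDULE.get(gps_seconds % CYCLE_SECONDS, ())


def next_recording(gps_seconds, band):
    """Return the first scheduled GPS time >= gps_seconds that includes `band`."""
    offsets = sorted(o for o, bands in SCHEDULE.items() if band in bands)
    if not offsets:
        raise ValueError(f"band {band!r} is not in the schedule")
    cycle_start = gps_seconds - gps_seconds % CYCLE_SECONDS
    # Look in the current cycle first, then in the following one.
    for start in (cycle_start, cycle_start + CYCLE_SECONDS):
        for offset in offsets:
            if start + offset >= gps_seconds:
                return start + offset
```

For example, next_recording(151, "E6") skips ahead to second 450, while next_recording(151, "L1") returns 300.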

Here are the nominal center frequencies and spans for the various bands:

Band	Center frequency, MHz	Approximate useful span, MHz
L1	1584.754875	1558-1610
E6	1273.654125	1248-1300
L2	1227.727125	1201-1253
L5	1191.641625	1165-1217

I think this covers every GNSS signal except the S-band signal from IRNSS. (Incidentally, my antennas are not characterized for reception at E6, but the signal is good enough to be useful.)

Now, these are pretty short recordings, but they should still be long enough to obtain good-quality observables for both pseudorange and carrier phase (or at least the fractional parts thereof). The timing is chosen to coincide with observables from other GNSS receivers, which conventionally output estimates every 30 seconds (or faster), aligned with GPS time. The maximum difference in arrival times between observers on Earth is about 20 ms, so a recording duration of 200 ms guarantees at least 180 ms of overlap.

Currently I have two locations continuously uploading these recordings, one in Indiana, USA and one in California, USA. Indiana has the better sky coverage: about half the sky is available, from azimuth 85 to 275 degrees, down to an elevation of 10 degrees or so. I'd like to encourage others to upload waveforms of this type, so that worldwide signal monitoring becomes possible. Of course, IGS and IGS/MGEX have been covering this ground for many years, but at the observable level, not the waveform level.

The carrier-phase observables from successive waveform snippets hundreds of seconds apart will not be linked by integer cycles, unlike those from a continuously tracking receiver (unless the receiver clock happens to be very stable, with TDEV(300s) << 1 cycle). But this is not an insurmountable drawback. The integers can be put back in by differencing against a nearby reference receiver that does track continuously; alternatively, analysis methods can be used that don't depend on integer cycles, for example the ambiguity function.

Storage details

The files are stored on Amazon's S3 cloud-storage platform. For the first month, standard S3 is used, so the data is available immediately upon request (with request latency of less than one second). After one month, the files are migrated to S3 Glacier Deep Archive, which has a much lower storage cost but a retrieval latency of up to 48 hours. The files remain accessible, but because of that wait it's best to do any processing (such as observable estimation, cross-correlation among stations, signal-health metrics, etc.) during the first month.

The S3 bucket is "s3://gnss-recordings-pmonta", and its region is us-east-1 (Northern Virginia). If you want to do computation with these recordings, it would be cheapest to use EC2 compute resources in that region, since data transfer between S3 and EC2 is then free. If you transfer the files to other AWS regions, or download them over the Internet to your own storage and computation, then transfer costs are imposed.

If you upload GNSS waveforms to AWS, please also use us-east-1 if possible.

Speaking of costs, the S3 bucket is marked "public" and "requester pays". This is so I don't have to foot the bill for possibly voluminous worldwide downloads of these files (I do have to pay for the storage costs though). So downloading these files requires that you have an AWS account. I'd prefer to not have this speed bump, but I don't see any way around it. Sorry. Maybe at some point AWS can make it part of their "free public data set" offering, in which case anonymous access would be free I think. Note that there is a free tier for AWS, which, after signup, allows a limited amount of storage, computation, and data transfer per month.
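For anyone unfamiliar with requester-pays buckets, here is a minimal sketch of a download using boto3, the AWS SDK for Python. It assumes boto3 is installed and that AWS credentials are already configured; the helper names are hypothetical. The essential part is the RequestPayer argument, without which S3 refuses the request.

```python
BUCKET = "gnss-recordings-pmonta"  # bucket name given in the text
REGION = "us-east-1"


def requester_pays_args():
    """Extra arguments telling S3 that the caller accepts the transfer charges."""
    return {"RequestPayer": "requester"}


def download(key, dest):
    """Download one object from the bucket to the local path `dest`."""
    # boto3 is a third-party package; it is imported here so that the rest
    # of this sketch can be loaded without it.
    import boto3

    s3 = boto3.client("s3", region_name=REGION)
    s3.download_file(BUCKET, key, dest, ExtraArgs=requester_pays_args())


# Example call, using a key from the listing on this page:
# download("2020/029/PAOC00USA_2020029103500_L1.iq.bz2",
#          "PAOC00USA_2020029103500_L1.iq.bz2")
```

Note that a file older than one month will have moved to Deep Archive, so a plain download like this is only expected to work for recent recordings; archived objects need a restore request first.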

A shallow hierarchy is used inside the s3://gnss-recordings-pmonta bucket, of the form <year>/<doy>/filename, where <doy> is the ordinal day within the year, ranging from 1 to 366 and zero-padded to three digits. All dates and times are in GPS time, that is, TAI minus 19 seconds.

2020/
  029/
    ...
    PAOC00USA_2020029103500.json
    PAOC00USA_2020029103500_L1.iq.bz2
    ...
    SBTH00USA_2020029103500.json
    SBTH00USA_2020029103500_L1.iq.bz2
    ...
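The object keys for one recording can be generated from the site string and the recording epoch. The following is a sketch with hypothetical function names; it assumes the datetime passed in is already expressed in GPS time, and it takes the three-digit zero padding of the day of year from the listing above.

```python
from datetime import datetime


def file_stem(site, t):
    """Site string plus timestamp, e.g. PAOC00USA_2020029103500."""
    # %j is the day of the year, zero-padded to three digits.
    return f"{site}_{t:%Y%j%H%M%S}"


def object_keys(site, t, bands):
    """Return the metadata key followed by one key per band."""
    prefix = f"{t:%Y}/{t:%j}/"
    stem = file_stem(site, t)
    keys = [f"{prefix}{stem}.json"]
    keys += [f"{prefix}{stem}_{band}.iq.bz2" for band in bands]
    return keys


keys = object_keys("PAOC00USA", datetime(2020, 1, 29, 10, 35, 0), ["L1", "L2", "L5"])
```

Here keys[0] is the JSON metadata key and the remaining entries are the IQ files for L1, L2, and L5.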

To reduce the friction of getting started with these files, I have a small subset of them available here on a public basis with no need for AWS credentials:

http://gnss-recordings-pmonta-sample.s3-website-us-east-1.amazonaws.com/index.html

This document is an index to the available files: two sets of recordings, seven hours apart, from the two sites. The sample files can be downloaded with a web browser using the provided links, or alternatively with wget as follows:

wget http://gnss-recordings-pmonta-sample.s3-website-us-east-1.amazonaws.com/PAOC00USA_2020029103500.json
wget http://gnss-recordings-pmonta-sample.s3-website-us-east-1.amazonaws.com/PAOC00USA_2020029103500_L1.iq.bz2
...

File format and naming

Each recording consists of a small amount of metadata and sample streams for the various RF bands. There are many choices for packaging them: HDF5, or the recent ION standard for GNSS waveform files [1], or the various RF/IF formats from radio astronomy. After reading this interesting essay about HDF5, archivalness, and software transparency [2], I opted for flat binary files compressed with a well-known compressor (bzip2), together with metadata in JSON. I don't claim it's the best choice, but at least it is easy to understand and can be easily translated into any of these other formats.

Prior to compression, the sample streams consist of 8-bit signed integers, alternating between in-phase and quadrature part (or real and imaginary part): i,q,i,q, etc. This is a common format for SDR sample streams. 8 bits is ordinarily enough precision for GNSS signals; in fact my recordings are 2-bit (4-level), with signal values -3,-1,1,3. (A signal value of 0 is used for missing data resulting from a dropped packet during recording, which is rare.) Higher-precision data, such as 3-bit or 4-bit, would fit seamlessly into this scheme. The compression reduces the file size to approximately the entropy of the underlying source.
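To make the sample layout concrete, here is a minimal sketch of a reader that uses only the Python standard library. The function names are hypothetical. It decompresses the bzip2 stream, interprets the bytes as signed 8-bit integers, and separates the interleaved in-phase and quadrature parts.

```python
import bz2
from array import array


def split_iq(raw):
    """Split interleaved signed 8-bit i,q,i,q,... bytes into (i, q) arrays."""
    samples = array("b")  # "b" is a signed 8-bit integer
    samples.frombytes(raw)
    if len(samples) % 2:
        raise ValueError("odd number of bytes: stream is not whole I/Q pairs")
    return samples[0::2], samples[1::2]


def decode_iq(compressed):
    """Decompress a bzip2 byte string and split it into (i, q) arrays."""
    return split_iq(bz2.decompress(compressed))


def read_iq(path):
    """Read a .iq.bz2 file from disk and return its (i, q) arrays."""
    with bz2.open(path, "rb") as f:
        return split_iq(f.read())


def zero_fraction(i, q):
    """Fraction of sample components equal to 0, the missing-data marker."""
    total = len(i) + len(q)
    return (i.count(0) + q.count(0)) / total if total else 0.0
```

For serious processing one would more likely load the decompressed bytes into an array library (for instance numpy.frombuffer with dtype int8), but the layout is the same. If a recording is exactly 0.2 s long at 69,984,000 samples per second, each band should decompress to 13,996,800 I/Q pairs, or 27,993,600 bytes.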

File names roughly follow the RINEX3 convention:

9-character site string containing a 4-character site ID, 1-digit marker number, 1-digit receiver number, and 3-character country code
Date and time (GPS time), written as year, day of year, and hhmmss
Suffix, either .json for JSON metadata or _<band>.iq.bz2 for IQ sample data

A full set of four files for a given site and time looks like this:

PAOC00USA_2020029103500.json
PAOC00USA_2020029103500_L1.iq.bz2
PAOC00USA_2020029103500_L2.iq.bz2
PAOC00USA_2020029103500_L5.iq.bz2
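Going the other way, a file name can be split back into its parts. This is a sketch with a hypothetical regular expression and function name; the exact character classes (uppercase letters and digits in the site ID, alphanumeric band names) are my assumptions based on the examples shown here.

```python
import re
from datetime import datetime

NAME_RE = re.compile(
    r"^(?P<site_id>[A-Z0-9]{4})(?P<marker>\d)(?P<receiver>\d)(?P<country>[A-Z]{3})"
    r"_(?P<stamp>\d{13})"  # year (4), day of year (3), hhmmss (6)
    r"(?:\.json|_(?P<band>[A-Za-z0-9]+)\.iq\.bz2)$"
)


def parse_name(name):
    """Parse a recording file name into a dict of its fields."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    fields = m.groupdict()
    fields["marker"] = int(fields["marker"])
    fields["receiver"] = int(fields["receiver"])
    # The timestamp is GPS time; it is returned as a naive datetime.
    fields["time"] = datetime.strptime(fields.pop("stamp"), "%Y%j%H%M%S")
    return fields


info = parse_name("PAOC00USA_2020029103500_L1.iq.bz2")
```

For a .json name the "band" entry is None, which distinguishes metadata from sample files.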

This scheme results in many small files (about 1000 per site per day), but has the advantage that the user can request just the data desired.

Metadata

I chose metadata fields mostly inspired by RINEX3. In fact one goal is to be able to construct a single-epoch RINEX3 file from each file-set, and for that one needs observables from all the systems of interest and enough RINEX3 metadata to fill in the blanks.

Here is an example JSON file:

{
    "antenna": {
        "serial_number": "xxxx",
        "type": "Tallysman TW3972"
    },
    "approx_position": [
        -2701201.6,
        -4291624.4,
        3855647.9
    ],
    "bands": [
        {
            "center_freq": [
                5797,
                256
            ],
            "filename": "PAOC00USA_2020029103500_L1.iq.bz2",
            "name": "L1",
            "receiver_channel": 1
        },
        {
            "center_freq": [
                4491,
                256
            ],
            "filename": "PAOC00USA_2020029103500_L2.iq.bz2",
            "name": "L2",
            "receiver_channel": 2
        },
        {
            "center_freq": [
                4359,
                256
            ],
            "filename": "PAOC00USA_2020029103500_L5.iq.bz2",
            "name": "L5",
            "receiver_channel": 3
        }
    ],
    "marker_to_ARP": [
        0,
        0,
        0
    ],
    "observer": {
        "name": "Peter Monta"
    },
    "receiver": {
        "serial_number": "70:B3:D5:F7:90:09",
        "software_version": "2",
        "type": "GNSS Firehose"
    },
    "sample_rate": 69984000,
    "site": {
        "country": "USA",
        "marker_number": 0,
        "name": "PAOC",
        "receiver_number": 0
    },
    "time_duration": "0.2",
    "time_start": "1580294099.9"
}

The fields are mostly self-explanatory (when read together with the RINEX3 spec) except for the "bands" list. For each band, a center frequency is given as a ratio of integers P/Q, which is to be multiplied by the sample rate. This represents the coherence between the sample rate and each channel's downconverter local oscillator. If, for example, the user wanted to construct a single wideband signal from multiple overlapping subbands, then the exact ratios would allow the frequency difference to be set in stone, i.e., not estimated. Only the relative phases would need to be estimated. Those phases are fixed because the various PLLs in the receiver frequency chain never go out of lock. (What, never? Well, hardly ever.)
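As a small worked example, the center frequencies can be computed exactly from the metadata using rational arithmetic. This sketch uses a trimmed copy of the example JSON above; the function name is hypothetical.

```python
import json
from fractions import Fraction

# Trimmed copy of the example metadata, keeping only the fields used here.
EXAMPLE = """
{
  "sample_rate": 69984000,
  "bands": [
    {"name": "L1", "center_freq": [5797, 256]},
    {"name": "L2", "center_freq": [4491, 256]},
    {"name": "L5", "center_freq": [4359, 256]}
  ]
}
"""


def center_frequencies(meta):
    """Map each band name to its exact center frequency in Hz, (P/Q) * sample_rate."""
    fs = meta["sample_rate"]
    return {b["name"]: Fraction(*b["center_freq"]) * fs for b in meta["bands"]}


freqs = center_frequencies(json.loads(EXAMPLE))
for name, f in freqs.items():
    print(name, float(f) / 1e6, "MHz")

# The spacing between two bands is exact as well, with nothing to estimate.
l1_minus_l2 = freqs["L1"] - freqs["L2"]
```

With these ratios, L1 comes out to exactly 1,584,754,875 Hz and the L1 to L2 spacing to exactly 357,027,750 Hz, which agrees with the table of nominal center frequencies.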

Licensing

I'm not an expert on copyright, but my intent is to provide these files to be used freely by anyone, roughly according to the Creative Commons license CC BY-SA 4.0. If the attribution or derivative portions of this license turn out to be unwieldy, especially if others contribute similar data, then it might be revised. I don't know if entities like IGS or IERS have a formal license for their data, but if so, that might be a good model.

Applications

Possible applications, exclusive of the usual ones involving observables, positioning, etc.:

satellite health monitoring
receiver development and testing
observables obtainable only from waveforms, e.g. cross-correlation of GPS M code, Galileo PRS, etc.
cooperative estimation of things like W bits or M bits (requires many stations)
archival "history" of GNSS

Using the current upload schedule, the average data rate per station is about 1 Mbit/s, with a duty cycle of either 0.07% or 0.13% depending on the band. It is not currently feasible to provide megabit-class uplinks from GNSS stations in the middle of nowhere, although the large LEO constellations under construction might change that in the near future. Continuous waveforms would be even better [3], but currently require deep pockets for the storage and bandwidth. A reasonable path might be a gradual increase in duty cycle.
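The figures above can be checked with a little arithmetic. This sketch assumes 2 bits per I or Q component (taking the compressed size to be close to the raw 2-bit entropy) and three bands in every 150-second slot; the names are hypothetical.

```python
SAMPLE_RATE = 69_984_000   # complex samples per second, from the metadata
DURATION = 0.2             # seconds per recording
BITS_PER_COMPONENT = 2     # assumption: 4-level samples, about 2 bits each
BANDS_PER_RECORDING = 3
SLOT_SECONDS = 150         # a recording is made every 150 seconds


def duty_cycle(cadence_seconds):
    """Fraction of time a band is recorded at the given cadence."""
    return DURATION / cadence_seconds


def average_rate_bps():
    """Approximate average upload rate in bits per second for one station."""
    bits_per_band = SAMPLE_RATE * 2 * BITS_PER_COMPONENT * DURATION  # I and Q
    return BANDS_PER_RECORDING * bits_per_band / SLOT_SECONDS


print(f"duty cycle at 300 s: {100 * duty_cycle(300):.2f}%")
print(f"duty cycle at 150 s: {100 * duty_cycle(150):.2f}%")
print(f"average rate: {average_rate_bps() / 1e6:.2f} Mbit/s")
```

Under these assumptions the rate works out to roughly 1.1 Mbit/s, in line with the "about 1 Mbit/s" figure.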

Disclaimer

I can't make any guarantees about data uploads or data availability. "Best effort."

References

[1]	GNSS Software Defined Receiver Metadata Standard, http://sdr.ion.org.s3-website-us-east-1.amazonaws.com

[2]	Moving away from HDF5, https://cyrille.rossant.net/moving-away-hdf5

[3]	Considerations for Future IGS Receivers, http://www.ngs.noaa.gov/IGSWorkshop2008/docs/recDev-positionpaper.pdf
