Problems with ancient date-time values

04 Sep 2023

Sometimes you get questions that really lead you down rabbit holes:

Year Zero

My database uses 0000-12-31 as a timestamp and it crashes Google Big Query as it doesnt consider 0000 as a year. So to solve this customer asks if we can replace all 0000-12-31 values with 0001-01-01.

Why does the year 0000 crash Big Query?

Its a “bug” in big query: using Anno Domini calendaring instead of astronomical year numbering which you would think be more appropriate for scientific systems. Year 0000 doesn’t exist in AD, it goes 1BC, 1AD. In this case year 0000 should be converted to -0001 not 0001.

What does year 0000 even mean to a computer?

Computers interpret date-time as an instant in time represented by milliseconds relative to Epoch. This works with with remarkable consistency across implementations for dates after the adoption of UTC in 1972 but what about dates in the ancient past?

Parsing 0000-01-01T00:00:00Z gives different results depending what programming language you are using:

$ java Main
-62135769600000
$ node date.js
-62135596800000
$ python date.py # smallest date allowed is "0001-01-02T00:00:00Z"
-62135546692000.0
$ bash date.bash
-62135596800000

Here are some of the important things happening around the this date:

The Roman Empire was at its peak, and Emperor Augustus was in power.
Herod the Great was king of Judea.
The construction of the Great Wall of China began during the reign of Emperor Cheng of the Han dynasty.
The Silk Road trade route was well-established, connecting China to the Mediterranean.
The Jewish temple in Jerusalem was still standing, but would be destroyed in the year 70 AD by the Romans.

So why are the answers different? Basically it could be anything but some prime suspects are:

Leap seconds
Leap years
Pope/Emperor decree
Timezones (or lack of)
Early dates are a great place to put “fiddle factors” since no one should be using them
The length of a day in milliseconds in 1 AD was different from 2023. The Earth’s rotation is gradually slowing down so the length of a day in milliseconds in 1 AD was approximately 86,400.002 seconds, whereas in 2023, it is approximately 86,400.0002 seconds

Most importantly:

GMT was adopted in 1884
UTC was adopted in 1972

…And probably many more issues

I hope its now clear why referencing a precise instant of time in Year Zero is very difficult!

Why would you even need a precise definition of time for an instant in Year Zero?

This is a very good question.

In the past, I’ve dealt with datasets going back to the 1800s that needed accurate timing and Oracle gave us zero issues, but what on earth could someone be doing that needed accurate timestamps going this far back?

Turns out this customer was doing with realtime data.

What was the real intention?

I have no idea, but its likely 0000-12-31 is a one year offset to a more recent time, such as today. Notice how by missing the time component its not actually the full year too. Already we are seeing problems.

How do we fix this?

Ideally at the source, for example a Go app could calculate date offsets as below. Note that years have different lengths due to leap years:

package main

import (
	"fmt"
	"time"
)

func main() {
    // Length of year in Java:
    // https://stackoverflow.com/questions/29899299/is-there-a-way-to-have-a-java8-duration-of-one-year-that-accounts-for-leap-years#29899551
    // surely there is a Golang module providing this functionality
    durationString := "8760h" // duration string representing one year (365 days * 24 hours/day)
    duration, err := time.ParseDuration(durationString)
    if err != nil {
        fmt.Println(err)
        return
    }
    milliseconds := duration.Milliseconds()

    // 31536000000
    fmt.Printf("One year has a duration of %v milliseconds\n", milliseconds)
}

With fixed time offsets in milliseconds represented as long integers all of the above issues and concerns vanish instantly and there is no need to worry about such nonsense or waste time and money identifying and working around the quirks of each system the data passes through.

1000 AD

I have this specific date 1000-01-01T00:00:00Z, when I sink this data to Oracle I get 0999-12-27T00:00:00Z, all other dates work fine

This seems familiar, but lets ask GPT for its thoughts on timekeeping in 1000 AD:

The accuracy of time measurement in 1000 AD was limited by the available technology and the understanding of astronomy and mechanics at the time. It’s important to remember that the concept of precise minutes and seconds as we understand them today did not exist in the same form during this period. Time was often measured in broader intervals, and the exact time of day was not as crucial for most daily activities as it is in modern society.

Makes perfect sense. This time were using dates-time values as magic numbers, eg:

1000 AD means no date set for “early” event
3000 AD means no date set for “late” event

Now we know how computers represent date-time its obvious for us to see why this will give us problems vs a simple integer such as -1. For a magic number to work properly we need to match an exact integer value but:

date-time values in the Middle Ages are not scientifically definable
date-time values in the Next Millennium are scientifically defined… But the definition may change at some point in the future, so perhaps your app will mysteriously break in a few (hundred?) years!

How can we fix this?

Ideally in the same way as our Year Zero problem. At the source - one/both of these approaches would work:

boolean field to indicate whether a date time is relevant or expected to be a usable value (replaces magic number)
null value in date-time field to indicate date is not expected to be usable

But I cant fix my data!

Unfortunately this is the reality for a lot of data consumers. In this case you need to clean the data before you can work on it - either once at the ingest point of your system or if thats not possible, at each point of use where problems are occurring.

Now we understand why the exact representation of this data is nonsense, its probably easiest to:

Convert date-time values to string
Carry out any required cleanup by doing find and replace with regular expressions
Convert back to data-time for downstream systems to use.

Yuk!

Im a programmer, how can I avoid these problems?

Say what you mean. If you want to add 365 days to the current time write data structures to support this and then do so
Dont shoe-horn multiple items of data into a single field
If you have an on/off switch use a boolean value
null out fields that have no meaning

And most importantly:

Stay away from date-times in the ancient past unless you actually need to reference an instant of time in this period (you dont).

Sample Programs

Dont just take my word for it…

Java

Main.java

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class Main {
    public static void main(String[] args) throws ParseException {
        String dateString = "0001-01-01T00:00:00Z";
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));

        Date date = dateFormat.parse(dateString);
        long milliseconds = date.getTime();

        System.out.println(milliseconds);
    }
}

JavaScript

date.js

const dateString = "0001-01-01T00:00:00Z";
const dateObject = new Date(Date.parse(dateString));
const milliseconds = dateObject.getTime();
console.log(milliseconds)

Python

date.py

import datetime

date_string = "0001-01-02T00:00:00Z"
date_format = "%Y-%m-%dT%H:%M:%S%fZ"

date_object = datetime.datetime.strptime(date_string, date_format)
milliseconds = date_object.timestamp() * 1000

print(milliseconds)

BASH

date.bash

#!/bin/bash

date_string="0001-01-01T00:00:00Z"
date_format="%Y-%m-%dT%H:%M:%S%Z"

# Convert date string to timestamp in seconds
timestamp=$(date -u -d "$date_string" +"%s")

# Multiply by 1000 to get timestamp in milliseconds
timestamp_ms=$(($timestamp * 1000))

echo $timestamp_ms

geoffwilliams@home:~$