Building a Date-Time Library: Timezones

Matthias Ngeo (Pante)
Aug 18, 2023


The Persistence of Memory — Salvador Dalí

This article discusses the lessons about timezones that I learnt while creating Sugar, a date-time library for Dart. Hopefully, it proves useful to those interested in creating their own date-time libraries.

Creating a date-time library that only handles UTC and local date-times is trivial in languages that have an FFI with C/C++. You need only call two native functions to retrieve the current Unix timestamp and the local timezone offset from the operating system, then perform some rudimentary arithmetic to obtain both.
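A minimal sketch of that arithmetic in Dart. `DateTime.now()` supplies both inputs through its `microsecondsSinceEpoch` and `timeZoneOffset` getters; the helper `localWallClock` is hypothetical and returns a UTC-flagged `DateTime` whose fields read as local wall-clock time.

```dart
/// Returns the local wall-clock date-time for [epochMicros] (a Unix timestamp
/// in microseconds) given the local [offset] from UTC. The result is flagged
/// as UTC so that its fields (year, month, hour, ...) are not shifted again.
DateTime localWallClock(int epochMicros, Duration offset) =>
    DateTime.fromMicrosecondsSinceEpoch(
      epochMicros + offset.inMicroseconds,
      isUtc: true,
    );

/// Wires the helper to the operating system through Dart's core library.
DateTime currentLocalWallClock() {
  final now = DateTime.now();
  return localWallClock(now.microsecondsSinceEpoch, now.timeZoneOffset);
}
```

Note that this only works for the *current* offset of the *local* timezone; nothing here yields the offset of an arbitrary timezone at an arbitrary instant.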

The same cannot be said for timezones. In most cases, it isn’t possible to “just ask the operating system for the offset of an arbitrary timezone”.

But why is support for timezones even important? For trivial use cases, such as displaying the current date-time, having only UTC and local date-times is indeed sufficient. However, timezone support becomes essential when performing any form of date-time calculation or scheduling across different timezones.

Suppose you scheduled a reminder for an important appointment one week in advance. Without support for timezones, the reminder will be sent early or late, typically by an hour, if a Daylight Saving Time (DST) transition occurs that week, rendering it moot.
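A small worked sketch of that failure: a reminder at 9:00 a.m. on March 8th, 2023 in Detroit, naively repeated exactly one week later. It assumes America/Detroit's 2023 offsets are hard-coded (UTC-5 before the transition at 07:00 UTC on March 12th, UTC-4 afterwards); the helper names are hypothetical, and a real library would look the offsets up from transition rules instead.

```dart
// Hypothetical, hard-coded offsets for America/Detroit around March 2023.
const est = Duration(hours: -5); // before 2023-03-12 07:00 UTC
const edt = Duration(hours: -4); // from 2023-03-12 07:00 UTC onwards
final dstStart = DateTime.utc(2023, 3, 12, 7);

/// Offset in effect at the given UTC instant (only valid for early 2023).
Duration detroitOffsetAt(DateTime utc) => utc.isBefore(dstStart) ? est : edt;

/// Converts a UTC instant to Detroit wall-clock time (fields read as local).
DateTime toDetroitWallClock(DateTime utc) => utc.add(detroitOffsetAt(utc));

/// Naive scheduling: add exactly 7 * 24 hours to the instant.
DateTime remindNaively(DateTime appointmentUtc) =>
    appointmentUtc.add(const Duration(days: 7));
```

The appointment at 14:00 UTC reads as 9:00 a.m. locally, but the naive reminder at 14:00 UTC a week later reads as 10:00 a.m., because adding exactly 168 hours never consults the offset in effect on March 15th.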

Perpetually Changing Timezones

A geographical location’s timezone offset often varies over time, frequently at the whims of its governing body. To determine the timezone offset at a given moment, a set of timezone-specific transition rules is applied. These transitions can be either periodic or one-time occurrences.

For example, in the United States, which observes DST, clocks change twice a year: from “winter” to “summer” time at 2:00 a.m. on the second Sunday of March, and back at 2:00 a.m. on the first Sunday of November.
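Periodic rules like these are usually expressed relative to a weekday rather than a fixed date; TZDB, for instance, writes “the second Sunday of March” as `Sun>=8`, the first Sunday on or after the 8th. A minimal sketch of evaluating such rules, with hypothetical helper names and results expressed as local wall-clock fields:

```dart
/// Returns the first [weekday] (DateTime.monday..DateTime.sunday) on or after
/// [day] of [month] in [year]; TZDB's `Sun>=8` is weekday = Sunday, day = 8.
DateTime weekdayOnOrAfter(int year, int month, int day, int weekday) {
  // Work in UTC so that adding days is never distorted by the host's own DST.
  final date = DateTime.utc(year, month, day);
  // Dart's % always returns a non-negative result for a positive divisor.
  final delta = (weekday - date.weekday) % 7;
  return date.add(Duration(days: delta));
}

/// Local wall-clock start of US DST: 2:00 a.m. on the second Sunday of March.
DateTime usDstStart(int year) {
  final d = weekdayOnOrAfter(year, DateTime.march, 8, DateTime.sunday);
  return DateTime.utc(d.year, d.month, d.day, 2);
}

/// Local wall-clock end of US DST: 2:00 a.m. on the first Sunday of November.
DateTime usDstEnd(int year) {
  final d = weekdayOnOrAfter(year, DateTime.november, 1, DateTime.sunday);
  return DateTime.utc(d.year, d.month, d.day, 2);
}
```

For 2023, this yields March 12th and November 5th, matching the Detroit transitions discussed later.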

In another example, Sudan permanently changed its standard time from UTC+3 to UTC+2 in 2017 with less than one month’s notice.
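Whether periodic or one-time, rules are typically expanded into a sorted list of transitions, each recording the instant at which a new offset takes effect, so finding the offset at a given moment becomes a binary search. A sketch under that assumption; the `Transition` class and `offsetAt` function are hypothetical, and the data is only America/Detroit’s two 2023 transitions.

```dart
/// A moment at which a new UTC offset takes effect.
class Transition {
  final int atMicros; // Unix timestamp (microseconds) when the offset starts
  final Duration offset; // UTC offset in effect from [atMicros] onwards
  const Transition(this.atMicros, this.offset);
}

/// Returns the offset in effect at [micros], given [transitions] sorted by
/// [Transition.atMicros] and the [initial] offset before the first one.
Duration offsetAt(List<Transition> transitions, Duration initial, int micros) {
  // Binary search for the last transition that starts at or before micros.
  var low = 0, high = transitions.length - 1;
  var found = -1;
  while (low <= high) {
    final mid = (low + high) >> 1;
    if (transitions[mid].atMicros <= micros) {
      found = mid;
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return found == -1 ? initial : transitions[found].offset;
}

// America/Detroit in 2023 (hypothetical excerpt; earlier years omitted).
final detroit2023 = [
  Transition(DateTime.utc(2023, 3, 12, 7).microsecondsSinceEpoch,
      const Duration(hours: -4)),
  Transition(DateTime.utc(2023, 11, 5, 6).microsecondsSinceEpoch,
      const Duration(hours: -5)),
];
```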

One challenge arises from the unpredictable and volatile nature of the transition rules. They can change for arbitrary reasons, as was the case with DST in the United States in 2005. There are even proposals to abolish DST in the United States and the European Union. Adding to the chaos, certain governments are terrible at communicating their timezone changes in advance. These include Sudan, as mentioned above, and Morocco, which abruptly abandoned DST less than two days before a transition in 2018, among many others.

The volatile nature of the transition rules means that a date-time library’s embedded rules can quickly become obsolete, or even wrong. For example, a date-time library last updated before late 2018 will produce incorrect local date-times for Morocco. In other words, the difficulty lies in keeping the transition rules up-to-date.

Libraries that embed transition rules often publish regular maintenance releases to remedy this. Joda Time, one of the first well-designed Java date-time libraries, still publishes maintenance releases with updated timezone data as of 2023, nearly a decade after its successor, java.time (JSR-310), shipped with JDK 8 in 2014. If you’re serious about creating a date-time library, be prepared to maintain it in the long run.

Some libraries also offer a fallback mechanism to override the embedded transition rules. This proves particularly beneficial for projects that outlive a date-time library’s maintenance since those projects can supply their own transition rules.

An alternative to embedding transition rules is to use the underlying platform’s or operating system’s transition rules. However, this approach has several drawbacks. Most importantly, reliably locating and parsing transition rules is difficult due to discrepancies across platforms and versions. For example, on Windows the transition rules are stored in a proprietary format inside the registry, while on macOS/Linux they are typically stored as TZif files.

Timezone Transitions

Converting a local date-time to a Unix timestamp using a set of transition rules isn’t straightforward. Although every instant has exactly one offset, a local date-time can be invalid or even ambiguous during a timezone transition because the clock jumps forward or is set back. This problem is especially pronounced in timezones that observe DST.

There are three cases to consider.

In the normal case, a local date-time corresponds to exactly one instant. This is always true for timezones that do not observe DST, and for the majority of the year in timezones that do.

The gap (invalid) case is caused by the clock jumping forward and leaving behind an “empty space”. No valid local date-time exists in that “empty space”. This typically occurs when transitioning from “winter” to “summer” time in timezones that observe DST.

Clock jumping forward

In Detroit, United States, the DST transition from “winter” to “summer” time occurs on March 12th, 2023, at 2:00 a.m. The clock jumps from 2:00 a.m. straight to 3:00 a.m. Therefore, any local date-time within that gap, such as March 12th, 2023, 2:30 a.m., does not exist.

The overlap (ambiguous) case is the reverse of the gap case. It is caused by the clock being set back, leaving behind an overlap where time “repeats”. This typically occurs when transitioning from “summer” to “winter” time in timezones that observe DST.

Clock being set back

In Detroit, United States, the DST transition from “summer” to “winter” time occurs on November 5th, 2023, at 2:00 a.m. The clock is set back from 2:00 a.m. to 1:00 a.m. Therefore, any local date-time within that overlap, such as November 5th, 2023, 1:30 a.m., occurs twice.
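One way to see the three cases is to try every offset the timezone uses and count how many map the local date-time back onto itself: none means a gap, one is the normal case, and two means an overlap. A brute-force sketch, assuming Detroit’s 2023 transitions are hard-coded (all names are hypothetical); it only classifies a local date-time and is no substitute for a proper conversion algorithm.

```dart
enum LocalCase { normal, gap, overlap }

const est = Duration(hours: -5);
const edt = Duration(hours: -4);

/// Offset in effect at a UTC instant in America/Detroit during 2023
/// (hypothetical hard-coded transitions).
Duration detroitOffsetAt(DateTime utc) {
  final dstStart = DateTime.utc(2023, 3, 12, 7); // 2:00 a.m. EST
  final dstEnd = DateTime.utc(2023, 11, 5, 6); // 2:00 a.m. EDT
  return !utc.isBefore(dstStart) && utc.isBefore(dstEnd) ? edt : est;
}

/// Classifies [local], whose (UTC-flagged) fields hold Detroit wall-clock time.
LocalCase classify(DateTime local) {
  var matches = 0;
  for (final offset in [est, edt]) {
    // Interpret the wall-clock time with this candidate offset...
    final instant = local.subtract(offset);
    // ...and keep it only if that offset is actually in effect at that instant.
    if (detroitOffsetAt(instant) == offset) matches++;
  }
  return switch (matches) {
    0 => LocalCase.gap,
    1 => LocalCase.normal,
    _ => LocalCase.overlap,
  };
}
```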

To account for these three cases, the algorithm for converting a local date-time to a Unix timestamp can be implemented as:

```dart
// Implementation adapted from Sugar, https://github.com/forus-labs/cauldron/blob/sugar/3.1.0/sugar/lib/src/time/zone/dynamic_timezone.dart#L40
// Algorithm based on Joda Time, https://github.com/JodaOrg/joda-time/blob/main/src/main/java/org/joda/time/DateTimeZone.java#L951

/// A period during which a timezone's offset is constant. This record is a
/// stand-in inferred from the snippet, not Sugar's actual type: [start] is the
/// Unix timestamp at which the period begins and [offset] is the offset from
/// UTC, both in microseconds.
typedef Timezone = ({int start, int offset});

/// Converts [local], a local date-time expressed as microseconds since the
/// epoch (wall-clock time read as if it were UTC), to a Unix timestamp in
/// microseconds. [timezone] returns the period in effect at a Unix timestamp.
(int, Timezone) resolveLocal(int local, Timezone Function({required int at}) timezone) {
  // Step 1: Estimate the offset by treating the local time as if it were a Unix timestamp.
  final localMicroseconds = local;
  final localTimezone = timezone(at: localMicroseconds);
  final localOffset = localTimezone.offset; // Microseconds by which the local time differs from UTC+0.

  // Step 2: Adjust the local time using the initial estimate and recalculate the offset.
  final adjustedMicroseconds = localMicroseconds - localOffset;
  final adjustedTimezone = timezone(at: adjustedMicroseconds);
  final adjustedOffset = adjustedTimezone.offset;

  // Step 3: Tentatively convert the local time to a Unix timestamp using the adjusted offset.
  var microseconds = localMicroseconds - adjustedOffset;

  // Step 4: If the initial and adjusted offsets differ, we are near a transition.
  if (localOffset != adjustedOffset) {
    // Step 4.1: If the offset increases (a gap), ensure the time falls on or after the gap.
    //
    // This happens naturally for positive offsets, but not for negative.
    // If we just use adjustedOffset then the time is pushed back before the
    // transition, whereas it should be on or after the transition.
    if (localOffset - adjustedOffset < 0 &&
        adjustedOffset != timezone(at: microseconds).offset) {
      microseconds = adjustedMicroseconds;
    }
  } else if (localOffset >= 0) {
    // Step 4.2: The offsets agree, but the local time may still fall in an overlap
    // just after a transition. If so, prefer the earlier (pre-transition) offset.
    final previousTimezone = timezone(at: adjustedTimezone.start - 1);
    if (previousTimezone.start < adjustedMicroseconds) {
      final previousOffset = previousTimezone.offset;
      final difference = previousOffset - localOffset;

      // Step 4.2.1: Within `difference` of the transition, resolve using the previous offset.
      if (adjustedMicroseconds - adjustedTimezone.start < difference) {
        microseconds = localMicroseconds - previousOffset;
      }
    }
  }

  // Step 5: Return the Unix timestamp in microseconds and the timezone in effect at it.
  return (microseconds, timezone(at: microseconds));
}
```

Due to the complexity of converting a local date-time to a Unix timestamp, it is, unsurprisingly, a common source of bugs in date-time libraries.

A particularly nasty bug I observed in the wild was a date-time library that inconsistently resolved local date-times in an overlap to the offset either before or after the transition. Which offset was returned depended on whether the timezone offset was positive, e.g. Berlin, Germany, or negative, e.g. Detroit, United States. Further investigation revealed that the library omitted Step 4.2 and unconditionally returned the higher/lower timezone offset.

To avoid such issues, it is recommended to port an existing, proven implementation such as Joda Time’s, and/or the relevant test cases to verify your implementation.

Picking & Parsing a Timezone Database

Coordinating and maintaining a database of transition rules across all timezones is a substantial undertaking, and not one that date-time libraries should tackle themselves. Thankfully, finding a portable and well-maintained timezone database to embed in your date-time library is straightforward these days.

Most operating systems and date-time libraries rely on the Internet Assigned Numbers Authority (IANA) timezone database (TZDB). Barring unique circumstances, such as embedded systems with significant memory constraints, TZDB should always be the preferred database to embed in your date-time library due to its impressive governance and maintenance, and its availability in the public domain.

After picking a timezone database, the question becomes: how do you use it?

In the case of TZDB, the timezone information is stored as a series of text files optimized for human readability.

```
# Rule  NAME  FROM  TO    TYPE  IN   ON       AT      SAVE  LETTER/S
Rule    US    1918  1919  -     Mar  lastSun  2:00    1:00  D
Rule    US    1918  1919  -     Oct  lastSun  2:00    0     S
Rule    US    1942  only  -     Feb  9        2:00    1:00  W # War
Rule    US    1945  only  -     Aug  14       23:00u  1:00  P # Peace
Rule    US    1945  only  -     Sep  30       2:00    0     S
Rule    US    1967  2006  -     Oct  lastSun  2:00    0     S
Rule    US    1967  1973  -     Apr  lastSun  2:00    1:00  D
Rule    US    1974  only  -     Jan  6        2:00    1:00  D
Rule    US    1975  only  -     Feb  23       2:00    1:00  D
Rule    US    1976  1986  -     Apr  lastSun  2:00    1:00  D
Rule    US    1987  2006  -     Apr  Sun>=1   2:00    1:00  D
Rule    US    2007  max   -     Mar  Sun>=8   2:00    1:00  D
Rule    US    2007  max   -     Nov  Sun>=1   2:00    0     S
```

This format is ill-suited for direct use by machines. The text files must first be compiled into a machine-friendly format, often TZif. This is typically done by a TZDB compiler. There are two options at this juncture: develop your own TZDB compiler, or leverage one of the existing compilers.

If you’re planning to develop a TZDB compiler, this guide on reading the text files will be useful.
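As a taste of what a compiler’s first stage involves, here is a minimal sketch that splits a `Rule` line into the named columns from the header comment above. The `TzRule` type and `parseRule` function are hypothetical; a real compiler must also handle `Zone` and `Link` lines, continuation lines, and the many forms of the ON and AT fields.

```dart
/// One `Rule` line from a TZDB source file, kept as raw strings.
class TzRule {
  final String name, from, to, inMonth, on, at, save, letter;
  const TzRule(this.name, this.from, this.to, this.inMonth, this.on, this.at,
      this.save, this.letter);

  /// The last year the rule applies to; `only` means the same year as [from].
  String get lastYear => to == 'only' ? from : to;
}

/// Parses a `Rule` line, ignoring any trailing `#` comment.
TzRule parseRule(String line) {
  final hash = line.indexOf('#');
  final content = hash == -1 ? line : line.substring(0, hash);
  final fields = content.trim().split(RegExp(r'\s+'));
  if (fields.length != 10 || fields[0] != 'Rule') {
    throw FormatException('Not a Rule line', line);
  }
  // fields[4] is the obsolete TYPE column, which should be "-".
  return TzRule(fields[1], fields[2], fields[3], fields[5], fields[6],
      fields[7], fields[8], fields[9]);
}
```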

For those using an existing compiler, it is important to choose one that produces easily decodable output, such as TZif, for reasons mentioned below. Not all compilers are created equal. Many compilers, such as TZUpdater, are part of existing date-time libraries. These libraries typically neither produce TZif files nor provide a public API for decoding the compiled output, since the output is meant solely for internal consumption.

It is recommended to pick zic, the reference compiler maintained by IANA. It produces TZif files and comes bundled in the complete TZDB distribution.

Besides implementing or picking a compiler, you will also need a companion library in your programming language to decode the compiled output. This is why compilers that produce TZif files are highly recommended: numerous TZif decoders already exist.

In the best case, a TZif decoder already exists for your programming language. In the worst case, you may have to port or develop one from scratch. Thankfully, in Sugar’s case, a Dart implementation exists. Hopefully, it serves as a helpful reference for those who need to port a TZif decoder to their language.
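For a sense of where a decoder starts, here is a minimal sketch that reads the fixed 44-byte header defined by RFC 8536, the TZif specification. `TzifHeader` and `parseTzifHeader` are hypothetical names, not Sugar’s API. A full decoder goes on to read the transition times, local time types and abbreviations whose counts the header declares, and, for version 2 and later files, a second header followed by 64-bit data.

```dart
import 'dart:typed_data';

/// The counts declared in a TZif header (RFC 8536, section 3.1).
class TzifHeader {
  final int version; // 1, 2, 3 or 4
  final int isUtCount, isStdCount, leapCount, timeCount, typeCount, charCount;
  const TzifHeader(this.version, this.isUtCount, this.isStdCount,
      this.leapCount, this.timeCount, this.typeCount, this.charCount);
}

/// Parses the 44-byte header at the start of [bytes].
TzifHeader parseTzifHeader(Uint8List bytes) {
  if (bytes.length < 44) {
    throw const FormatException('A TZif header is 44 bytes long');
  }
  // Bytes 0-3: the magic string "TZif".
  if (String.fromCharCodes(bytes.sublist(0, 4)) != 'TZif') {
    throw const FormatException('Missing "TZif" magic');
  }
  // Byte 4: NUL for version 1, otherwise an ASCII digit such as '2'.
  final version = bytes[4] == 0 ? 1 : bytes[4] - 0x30;
  // Bytes 5-19 are reserved; six big-endian 32-bit counts follow.
  final data = ByteData.sublistView(bytes);
  int count(int index) => data.getUint32(20 + 4 * index, Endian.big);
  return TzifHeader(
      version, count(0), count(1), count(2), count(3), count(4), count(5));
}
```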

Detecting the Current Timezone

Support for detecting the current timezone is a mixed bag across programming languages. Certain languages, such as Java, provide built-in methods for doing so. Others, such as Dart, refuse to, even a decade after it was requested. If your language is in the latter camp, it’s DIY.

Detecting the current operating system’s timezone may seem difficult at first glance. However, it is manageable for the most part.

A naive solution is to spawn a native process, invoke a command such as timedatectl show, and parse its output. However, this approach has numerous pitfalls. It is terribly inefficient and fragile. Furthermore, spawning a process is asynchronous in some languages. Due to function coloring, this bubbles upwards, eventually making the creation of a timezone-aware date-time in the current timezone asynchronous. Yuck.

Far better alternatives exist.

Windows

Windows provides several system functions, such as GetDynamicTimeZoneInformation(), that return information about the current timezone. That information includes a timezone key name (TimeZoneKeyName), which is Windows’ proprietary timezone identifier. It can be mapped to a TZDB identifier using the CLDR project’s mappings.
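The mapping itself is a lookup table. A minimal sketch with a handful of entries copied by hand from CLDR’s windowsZones.xml (territory “001”, the default TZDB identifier for each Windows timezone); `windowsToTzdb` and `tzdbIdentifierFor` are hypothetical names, and a real implementation should generate the complete table from that file rather than hard-code it.

```dart
/// A small excerpt of CLDR's Windows-to-TZDB mappings (territory "001").
const windowsToTzdb = <String, String>{
  'Eastern Standard Time': 'America/New_York',
  'Central Standard Time': 'America/Chicago',
  'Pacific Standard Time': 'America/Los_Angeles',
  'GMT Standard Time': 'Europe/London',
  'W. Europe Standard Time': 'Europe/Berlin',
  'Singapore Standard Time': 'Asia/Singapore',
  'Tokyo Standard Time': 'Asia/Tokyo',
};

/// Maps a Windows timezone identifier to a TZDB identifier, or returns null
/// when the identifier is not in the table.
String? tzdbIdentifierFor(String windowsId) => windowsToTzdb[windowsId];
```

Looking up “Singapore Standard Time” yields Asia/Singapore, whereas a name like “Malay Peninsula Standard Time” finds nothing, which is exactly the failure mode described next.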

Among these system functions, GetTimeZoneInformation() should be avoided. It may return a historical standard name for a geographical location. For example, in Singapore, the function returns the historical “Malay Peninsula Standard Time” instead of the current “Singapore Standard Time”. This is problematic since the CLDR project’s mappings may not include historical standard names.

If your programming language has an FFI with C/C++, only a single system function call is necessary. Otherwise, you will need to read the timezone from the Windows registry, under HKEY_LOCAL_MACHINE > SYSTEM > CurrentControlSet > Control > TimeZoneInformation (the TimeZoneKeyName value). Either approach may be easier depending on your programming language. YMMV.

macOS/Linux

On macOS and most major Linux distributions, the current timezone can be detected by resolving the /etc/localtime symbolic link, whose target path contains the current timezone identifier. To quote the man page for /etc/localtime:
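A minimal sketch in Dart, assuming `dart:io` is available (i.e. not on the web). It reads the link’s target with `Link.targetSync()` and keeps whatever follows the last `zoneinfo/` segment. The helper names are hypothetical, and the `posix/` and `right/` prefixes it strips are variant subdirectories that some systems ship inside `zoneinfo`.

```dart
import 'dart:io';

/// Extracts a TZDB identifier from the target of /etc/localtime, e.g.
/// "/usr/share/zoneinfo/Europe/Berlin" becomes "Europe/Berlin".
String? identifierFromLocaltimeTarget(String target) {
  const marker = 'zoneinfo/';
  final index = target.lastIndexOf(marker);
  if (index == -1) return null;
  var identifier = target.substring(index + marker.length);
  // Some systems link into the zoneinfo/posix/ or zoneinfo/right/ variants.
  for (final prefix in const ['posix/', 'right/']) {
    if (identifier.startsWith(prefix)) {
      identifier = identifier.substring(prefix.length);
    }
  }
  return identifier.isEmpty ? null : identifier;
}

/// Returns the current timezone identifier, or null if /etc/localtime is
/// missing or isn't a symbolic link.
String? detectTimezoneFromLocaltime() {
  const path = '/etc/localtime';
  if (!FileSystemEntity.isLinkSync(path)) return null;
  try {
    return identifierFromLocaltimeTarget(Link(path).targetSync());
  } on FileSystemException {
    return null;
  }
}
```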

It should be an absolute or relative symbolic link pointing to /usr/share/zoneinfo/, followed by a timezone identifier such as "Europe/Berlin" or "Etc/UTC". The resulting link should lead to the corresponding binary tzfile(5) timezone data for the configured timezone.

Some distributions may have a /etc/timezone text file, which contains either an offset from UTC or a path to a tzfile similar to /etc/localtime. However, using /etc/localtime is still preferred since it is present in more distributions.

The trouble stems from distributions that have neither. In these cases, one approach is to retrieve the current offset and timezone abbreviation via system calls, then apply heuristics to the combination to approximate the timezone.

For example, CST can mean China Standard Time (UTC+8), Cuba Standard Time (UTC−5), or (North American) Central Standard Time (UTC−6). If the current offset is UTC−6, CST almost certainly means Central Standard Time. Nevertheless, Central Standard Time can still map to numerous TZDB identifiers, such as America/Chicago and America/El_Salvador. America/Chicago observes DST while America/El_Salvador does not, which can lead to incorrect local date-time calculations.
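A sketch of such a heuristic, assuming a hand-written table keyed by abbreviation and offset in seconds. The table, its ordering and the function names are all hypothetical and far from complete. In Dart, the two inputs are available from `DateTime.now().timeZoneName` and `DateTime.now().timeZoneOffset`. The offset narrows CST down, but the heuristic still has to pick one of several candidates.

```dart
/// Candidate TZDB identifiers for an (abbreviation, UTC offset in seconds)
/// pair, in a hypothetical order of preference. Illustrative, not exhaustive.
const Map<String, Map<int, List<String>>> candidates = {
  'CST': {
    8 * 3600: ['Asia/Shanghai', 'Asia/Taipei'],
    -5 * 3600: ['America/Havana'],
    -6 * 3600: ['America/Chicago', 'America/El_Salvador', 'America/Regina'],
  },
};

/// Returns every candidate for [abbreviation] at [offsetSeconds].
List<String> candidatesFor(String abbreviation, int offsetSeconds) =>
    candidates[abbreviation]?[offsetSeconds] ?? const [];

/// Returns the preferred candidate, or null when nothing matches.
String? guessTimezone(String abbreviation, int offsetSeconds) {
  final matches = candidatesFor(abbreviation, offsetSeconds);
  return matches.isEmpty ? null : matches.first;
}
```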

To my knowledge, there isn’t any foolproof method to eliminate the ambiguity and this approach should therefore be the last resort.

Supporting every distribution has rapidly diminishing returns. Detecting the timezone on Linux is always a “best effort” endeavour due to the variability between distributions. In Sugar’s case, all distributions without an /etc/localtime symbolic link default to the Factory (UTC+0) timezone.

The TZ Environment Variable

Besides the operating system’s timezone, users can also supply a timezone through the TZ environment variable, and honouring it falls under the purview of date-time libraries. Although processing the environment variable isn’t strictly required, it is still a nice touch.

There are two problems with processing the TZ environment variable.

Firstly, the TZ format is intricate; a brief glance at its man page should be enough to convince you. It can include timezone abbreviations with optional offsets, as well as syntax for specifying DST rules, such as EST5EDT,M3.2.0,M11.1.0. Some distributions even have their own formats that accept TZDB identifiers. In short, parsing the TZ environment variable isn’t trivial.

Secondly, as discussed above, the supplied timezone abbreviation can still be ambiguous despite our best efforts.

Due to these difficulties, we chose to support only TZ values that are TZDB identifiers in Sugar.
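A minimal sketch of that policy, assuming a hypothetical set of known TZDB identifiers (in practice, the names embedded in the library). It reads `TZ` through `Platform.environment`, drops the leading colon that POSIX reserves for implementation-defined values, and ignores anything that isn’t a known identifier.

```dart
import 'dart:io';

/// Returns [value] as a TZDB identifier if it is one of [known], else null.
String? tzdbIdentifierFromTz(String? value, Set<String> known) {
  if (value == null) return null;
  // POSIX leaves the meaning of values that start with ':' to the
  // implementation; here the remainder is treated as a candidate identifier.
  final candidate = value.startsWith(':') ? value.substring(1) : value;
  return known.contains(candidate) ? candidate : null;
}

/// Reads the TZ environment variable of the current process.
String? tzdbIdentifierFromEnvironment(Set<String> known) =>
    tzdbIdentifierFromTz(Platform.environment['TZ'], known);
```

POSIX-style values such as EST5EDT,M3.2.0,M11.1.0 fall through to null, at which point the caller can fall back to the operating system’s timezone.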

Conclusion

In this article, we explored the difficulties associated with timezones when developing a date-time library, and their solutions.

It was a blast writing this article, and I sincerely hope you learnt something new. On another note, if you’re using Dart, do consider checking out Sugar!

Kthxbye.
