Per RFC 7231 §7.1.1.1:
A recipient that parses a timestamp value in an HTTP header field MUST accept all three HTTP-date formats.
These formats are then described as follows (the first is the only preferred format; the latter two are designated "obsolete"), converted for this post into strftime(3) syntax:
%a, %d %b %Y %H:%M:%S %Z
with the timezone always given as "GMT", but to be interpreted as UTC.
%A, %d-%b-%y %H:%M:%S %Z
where the timezone may be any of an array of "standard" abbreviations from RFC 850 §2.1.4.
%a %b %e %H:%M:%S %Y
where the day of the month is space-padded and you simply pray/assume that the remote server is operating in UTC.
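As a quick sanity check of the three patterns, here is a minimal sketch that runs the example timestamps from RFC 7231 §7.1.1.1 through Python's strptime. It makes a few assumptions: a literal "GMT" stands in for %Z, the asctime day uses %d (strptime has no padding flags, and %d accepts the space-padded form), the locale is English/C, and the names EXAMPLES and parse_all are purely illustrative.

```python
from datetime import datetime

# Example timestamps from RFC 7231 §7.1.1.1; all three denote the same instant.
EXAMPLES = {
    "IMF-fixdate": ("Sun, 06 Nov 1994 08:49:37 GMT", "%a, %d %b %Y %H:%M:%S GMT"),
    "RFC 850": ("Sunday, 06-Nov-94 08:49:37 GMT", "%A, %d-%b-%y %H:%M:%S GMT"),
    # Whitespace in a strptime pattern matches any run of whitespace,
    # so the double space before a single-digit day is fine.
    "asctime": ("Sun Nov  6 08:49:37 1994", "%a %b %d %H:%M:%S %Y"),
}


def parse_all(examples=EXAMPLES):
    """Return {format name: naive datetime} for each (text, pattern) pair."""
    return {
        name: datetime.strptime(text, pattern)
        for name, (text, pattern) in examples.items()
    }


for name, value in parse_all().items():
    print(f"{name:12} -> {value.isoformat()}")
```

All three results are naive datetime objects at this point; attaching the timezone is a separate step.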
However, Python's strptime function does not really support timezones: %Z eats the abbreviation (and only recognizes UTC, GMT, and the local zone's names), but does not actually use it, so the result stays naive. Therefore, we will have to hack this support in ourselves. The pytz module is indispensable for resolving the abbreviations, so we will use it. (We additionally have to crack open re because, depressingly, strptime does not even make available to us that which it matched as %Z.)
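To illustrate the point, a small sketch (assuming Python 3) shows that a timestamp parsed with %Z comes back without any tzinfo, and that a regular expression can recover the zone token instead. The helper name split_zone and its pattern are hypothetical, chosen only for this example.

```python
import re
from datetime import datetime

text = "Sun, 06 Nov 1994 08:49:37 GMT"

# %Z consumes the abbreviation, yet the result carries no tzinfo.
parsed = datetime.strptime(text, "%a, %d %b %Y %H:%M:%S %Z")
print(parsed.tzinfo)  # prints None


def split_zone(value):
    """Split 'timestamp ZONE' into (timestamp, zone). Hypothetical helper."""
    match = re.fullmatch(r"(.+) ([A-Za-z]+)", value)
    if match is None:
        raise ValueError(f"no trailing zone abbreviation in {value!r}")
    return match.group(1), match.group(2)


stamp, zone = split_zone(text)
print(stamp, "|", zone)
```

The zone name recovered this way still has to be mapped to an actual offset, which is where a timezone database comes in.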
So, a Python function to parse an HTTP Date header into a datetime object would be something like:
from datetime import datetime
from pytz import timezone, utc, UnknownTimeZoneError
import re

def parse_http_date(date):
    try:
        imf1 = '%a, %d %b %Y %H:%M:%S GMT'
        return datetime.strptime(date, imf1).replace(tzinfo=utc)
    except ValueError:
        try:
            # Split the zone off ourselves: strptime's %Z recognizes only a
            # few names and never hands back what it matched.
            rfc850 = '%A, %d-%b-%y %H:%M:%S'
            stamp, tzname = re.fullmatch(r'(\w+, \d+-\w+-\d+ \d+:\d+:\d+) (.+)', date).groups()
            if tzname == 'GMT': tzname = 'UTC'
            # pytz zones should be attached with localize(), not replace()
            return timezone(tzname).localize(datetime.strptime(stamp, rfc850))
        except (ValueError, AttributeError, UnknownTimeZoneError):
            # AttributeError: the regex did not match (fullmatch returned None)
            pass
        try:
            # strptime has no '%-d'; '%d' also accepts a space-padded day
            asctime = '%a %b %d %H:%M:%S %Y'
            return datetime.strptime(date, asctime).replace(tzinfo=utc)
        except ValueError:
            pass
        # Neither of the "obsolete" formats worked, so re-raise original strptime error from preferred format
        raise
If you don't care about parsing obsolete formats, this can be reduced to:
from datetime import datetime
from datetime import timezone as tz

def parse_http_date(date):
    imf1 = '%a, %d %b %Y %H:%M:%S GMT'
    return datetime.strptime(date, imf1).replace(tzinfo=tz.utc)
…which only uses the standard library!
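For completeness, here is a short usage sketch of the reduced version. It repeats the function so the snippet runs on its own, and adds a hypothetical wrapper, try_parse, that returns None rather than raising when the header is in one of the obsolete formats. Whether swallowing the error is appropriate depends on the caller, so treat that part as an assumption.

```python
from datetime import datetime, timezone


def parse_http_date(date):
    # Same body as the reduced version above, repeated so this runs standalone.
    imf1 = '%a, %d %b %Y %H:%M:%S GMT'
    return datetime.strptime(date, imf1).replace(tzinfo=timezone.utc)


def try_parse(date):
    """Hypothetical wrapper: return None instead of raising ValueError."""
    try:
        return parse_http_date(date)
    except ValueError:
        return None


preferred = try_parse("Sun, 06 Nov 1994 08:49:37 GMT")   # preferred format
obsolete = try_parse("Sunday, 06-Nov-94 08:49:37 GMT")   # RFC 850 format
print(preferred, obsolete)
```

The preferred format yields a timezone-aware datetime in UTC, while the RFC 850 string is rejected because the pattern only matches the preferred format.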