9. Python datetime & String Operations

1. datetime.strptime() - Parse String to Datetime

  • Converts a string → datetime object
  • Need to specify exact format matching the string
from datetime import datetime
timestamp = datetime.strptime("14/Dec/2024:16:45:11 -0500", "%d/%b/%Y:%H:%M:%S %z")

2. %d/%b/%Y:%H:%M:%S %z - Apache Log Format

  • Most exam-relevant format string
  • Used to parse Apache/Nginx log timestamps
[14/Dec/2024:16:45:11 -0500]
 %d %b  %Y   %H %M %S %z
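
In a raw log line the timestamp sits inside square brackets, which are not part of the format string. A minimal sketch of handling that, assuming the field has already been isolated; the helper name `parse_bracketed_timestamp` and the constant `APACHE_TS_FORMAT` are hypothetical, and `%b` is assumed to run under an English/C locale:

```python
from datetime import datetime

# Format string from the notes above; the constant name is illustrative only.
APACHE_TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_bracketed_timestamp(field: str) -> datetime:
    """Parse a field such as '[14/Dec/2024:16:45:11 -0500]'."""
    # The brackets belong to the log line, not to the format string,
    # so strip them before calling strptime.
    return datetime.strptime(field.strip("[]"), APACHE_TS_FORMAT)


ts = parse_bracketed_timestamp("[14/Dec/2024:16:45:11 -0500]")
print(ts.isoformat())  # 2024-12-14T16:45:11-05:00
```

Passing the field without brackets works as well, since `strip("[]")` only removes those characters when they are present at the ends.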

3. %z - Timezone Offset (Critical!)

  • Parses timezone offset like -0500, +0530
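  • Without %z in the format → strptime raises ValueError (the offset is left as unconverted data); if the offset is cut from the string first, the result is a naive datetime with no timezone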
  • ✅ Always include %z for Apache log timestamps
  • %Z → timezone NAME (EST, UTC) - different from offset
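
The three outcomes described above can be checked directly. This is a small sketch using the sample timestamp from section 1; the variable names are illustrative:

```python
from datetime import datetime, timedelta

raw = "14/Dec/2024:16:45:11 -0500"

# With %z the result is timezone-aware and carries the -05:00 offset.
aware = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")
print(aware.utcoffset() == timedelta(hours=-5))  # True

# Without %z the trailing " -0500" is unconverted data, so strptime raises.
try:
    datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S")
    failed = False
except ValueError:
    failed = True
print(failed)  # True

# Cutting the offset off the string parses, but the datetime is naive.
naive = datetime.strptime(raw.rsplit(" ", 1)[0], "%d/%b/%Y:%H:%M:%S")
print(naive.tzinfo)  # None
```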

4. timestamp.hour - Extract Hour:

timestamp.hour     # 0-23 integer
timestamp.minute   # 0-59
timestamp.second   # 0-59
timestamp.date()   # date part only
timestamp.time()   # time part only

5. timestamp.weekday() - Day of Week:

timestamp.weekday()   # returns integer

# Mapping:
0  Monday
1  Tuesday
2  Wednesday
3  Thursday
4  Friday
5  Saturday
6  Sunday
  • weekday() == 6 → Sunday (exam answer JAN_FN)
  • weekday() == 5 → Saturday (exam answer JAN_AN)
  • weekday() == 0 → Monday
  • timestamp.dayname() → no such method on datetime (raises AttributeError)
  • timestamp.strftime('%A') == 'Sunday' → works, but builds a string on every call and depends on the locale; prefer weekday()
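
A short sketch of the weekday check, assuming a parsed Apache timestamp. The helper `is_sunday` and the constant `SUNDAY` are hypothetical names, and the sample date is chosen only because 15 Dec 2024 falls on a Sunday:

```python
from datetime import datetime

SUNDAY = 6  # weekday() numbering: Monday=0 ... Sunday=6


def is_sunday(ts: datetime) -> bool:
    # weekday() uses the timestamp's own wall-clock date, not the UTC date.
    return ts.weekday() == SUNDAY


ts = datetime.strptime("15/Dec/2024:09:30:00 +0000", "%d/%b/%Y:%H:%M:%S %z")
print(is_sunday(ts))    # True
print(ts.isoweekday())  # 7 (isoweekday() counts Monday=1 ... Sunday=7)
```

Note that `isoweekday()` uses a different numbering, so mixing it with the 0 to 6 mapping above gives off-by-one results.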

6. Time Range Filtering - Core Logic:

Key Rule:

For range X:00 to Y:59 (both hours included)
→ X <= timestamp.hour <= Y

Equivalent half-open form (upper bound excluded)
→ X <= timestamp.hour < Y+1

Exam Scenarios:

# 12:00 to 15:59 (includes 12,13,14,15)
12 <= timestamp.hour <= 15   # ✅

# 16:00 to 18:59 (includes 16,17,18)
16 <= timestamp.hour < 19    # ✅ exam answer JAN_FN

# 14:00 to 16:59 (includes 14,15,16)
14 <= timestamp.hour < 17    # ✅ exam answer JAN_AN

Common Wrong Answers:

16 <= timestamp.hour <= 16        # ❌ only hour 16
16 < timestamp.hour < 19          # ❌ excludes 16:xx entries
timestamp.hour in range(16, 20)   # ❌ includes 19:xx entries
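
The boundary cases are where these conditions differ, so it can help to test them explicitly. A minimal sketch for the 16:00 to 18:59 window; the helper `in_hour_window` and the sample timestamps are made up for illustration:

```python
from datetime import datetime


def in_hour_window(ts: datetime, first_hour: int, last_hour: int) -> bool:
    """True if ts falls in first_hour:00:00 through last_hour:59:59."""
    return first_hour <= ts.hour <= last_hour


fmt = "%d/%b/%Y:%H:%M:%S %z"
samples = [
    "14/Dec/2024:15:59:59 -0500",  # just before the window
    "14/Dec/2024:16:00:00 -0500",  # first second inside
    "14/Dec/2024:18:59:59 -0500",  # last second inside
    "14/Dec/2024:19:00:00 -0500",  # just after the window
]
results = [in_hour_window(datetime.strptime(s, fmt), 16, 18) for s in samples]
print(results)  # [False, True, True, False]
```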

7. Format Codes - Complete Reference:

Code  Meaning                      Example
%d    Day of month (zero-padded)   01–31
%m    Month as number              01–12
%b    Abbreviated month name       Jan, Dec
%B    Full month name              January
%Y    4-digit year                 2024
%y    2-digit year                 24
%H    Hour 24-hr                   00–23
%I    Hour 12-hr                   01–12
%M    Minutes                      00–59
%S    Seconds                      00–59
%z    UTC offset                   -0500
%Z    Timezone name                EST, UTC
%A    Full weekday name            Monday
%a    Abbreviated weekday          Mon
%p    AM/PM                        AM, PM

8. String Splitting - Request Field Parsing

Apache Request Field:

"GET /checkout/payment HTTP/1.1"
 [0]        [1]           [2]
request = "GET /checkout/payment HTTP/1.1"
parts = request.split(' ')

method   = parts[0]   # 'GET'
url      = parts[1]   # '/checkout/payment' ✅ exam answer
protocol = parts[2]   # 'HTTP/1.1' ✅ exam answer
  • request.split(' ')[1] → URL (exam answer JAN_FN Q306)
  • request.split(' ')[2] → Protocol (exam answer JAN_FN Q305)
  • request.split('/')[1] → splits on slashes instead of spaces, gives 'checkout' (wrong)
  • request.split(' ')[0] → gives HTTP method, not URL
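
Real logs sometimes contain request fields that do not have exactly three parts (for example a bare "-"), and indexing them blindly raises IndexError. A defensive sketch, assuming such lines should simply be skipped; the function name `split_request` is hypothetical:

```python
from typing import Optional, Tuple


def split_request(request: str) -> Optional[Tuple[str, str, str]]:
    """Return (method, url, protocol), or None if the field is malformed."""
    parts = request.split(" ")
    if len(parts) != 3:
        return None
    method, url, protocol = parts
    return method, url, protocol


print(split_request("GET /checkout/payment HTTP/1.1"))
# ('GET', '/checkout/payment', 'HTTP/1.1')
print(split_request("-"))  # None
```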

9. URL Path Matching:

url = '/tamilmp3/song.mp3'

url.startswith('/tamilmp3/')    # ✅ prefix match (exam answer)
url.endswith('.mp3')            # suffix match

# ❌ Wrong approaches:
'/tamilmp3/' in url             # matches anywhere, not just start
url == '/tamilmp3/'             # exact match only, misses sub-paths
url.endswith('/tamilmp3/')      # checks end, not start
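
The difference between a prefix match and a substring match shows up once the prefix also appears deeper in another path. A small sketch over a made-up list of URLs:

```python
# Hypothetical sample URLs, chosen to expose the difference.
urls = [
    "/tamilmp3/song.mp3",
    "/tamilmp3/",
    "/music/tamilmp3/song.mp3",  # contains the text, but not at the start
    "/tamilmp3-old/a.mp3",       # similar name, different directory
]

prefix_hits = [u for u in urls if u.startswith("/tamilmp3/")]
substring_hits = [u for u in urls if "/tamilmp3/" in u]

print(prefix_hits)     # ['/tamilmp3/song.mp3', '/tamilmp3/']
print(substring_hits)  # ['/tamilmp3/song.mp3', '/tamilmp3/', '/music/tamilmp3/song.mp3']
```

Keeping the trailing slash in the prefix is what excludes `/tamilmp3-old/`. `startswith` also accepts a tuple of prefixes when several directories need to be matched.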

10. String Methods - Complete Reference:

Case Operations:

s.lower()        # 'HELLO' → 'hello'
s.upper()        # 'hello' → 'HELLO'
s.title()        # 'hello world' → 'Hello World'
s.capitalize()   # 'hello world' → 'Hello world'
s.swapcase()     # 'Hello' → 'hELLO'

Whitespace:

s.strip()        # remove leading + trailing whitespace
s.lstrip()       # remove leading only
s.rstrip()       # remove trailing only

Search & Check:

s.startswith('pre')    # True/False
s.endswith('suf')      # True/False
'sub' in s             # True/False
s.find('sub')          # index of first occurrence (-1 if not found)
s.index('sub')         # index (raises ValueError if not found)
s.count('sub')         # count occurrences

Replace & Split:

s.replace('old', 'new')    # replace all occurrences
s.split(',')               # split by delimiter → list
s.split()                  # split by whitespace
','.join(['a','b','c'])    # join list → 'a,b,c'
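
`split(' ')` and `split()` behave differently when the input has repeated spaces, which matters for the index-based parsing in section 8. A short sketch with a made-up line containing extra spaces:

```python
line = "GET  /index.html   HTTP/1.1"  # two spaces, then three spaces

by_space = line.split(" ")  # every single space is a delimiter
by_whitespace = line.split()  # runs of whitespace collapse into one delimiter

print(by_space)       # ['GET', '', '/index.html', '', '', 'HTTP/1.1']
print(by_whitespace)  # ['GET', '/index.html', 'HTTP/1.1']

# join() is the inverse operation: rebuild a normalized line.
normalized = " ".join(by_whitespace)
print(normalized)     # GET /index.html HTTP/1.1
```

With `split(' ')` the URL is no longer at index 1 here, so `split()` is the safer choice when the spacing of the input is not guaranteed.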

Type Checking:

s.isdigit()     # all digits?
s.isalpha()     # all letters?
s.isalnum()     # alphanumeric?
s.isnumeric()   # numeric?
s.isspace()     # all whitespace?
s.islower()     # all lowercase?
s.isupper()     # all uppercase?

Format:

f"Hello {name}"                    # f-string ✅ most common
"Hello {}".format(name)            # .format()
"Hello %s" % name                  # % operator (old style)
s.zfill(5)                         # '42' → '00042' (pad with zeros)
s.ljust(10), s.rjust(10)           # left/right align
s.center(10)                       # center align

11. re Module - Regular Expressions

What is Regex?

  • Pattern matching for strings
  • Used heavily in log parsing

Core Functions:

import re

re.search(pattern, string)    # find first match anywhere
re.match(pattern, string)     # match only at START
re.findall(pattern, string)   # return ALL matches as list
re.sub(pattern, replacement, string)  # replace matches
re.split(pattern, string)     # split by pattern
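
To see how the five functions differ, it helps to run them on the same input. A minimal sketch on a made-up string:

```python
import re

line = "client 10.0.0.7 sent 200 then 404"  # hypothetical sample text

at_start = re.match(r"\d+", line)               # None: the line starts with 'c'
first = re.search(r"\d+", line).group()         # first run of digits anywhere
three_digit = re.findall(r"\b\d{3}\b", line)    # all standalone 3-digit numbers
masked = re.sub(r"\d+", "N", line)              # replace every run of digits
tokens = re.split(r"\s+", line)                 # split on runs of whitespace

print(at_start)     # None
print(first)        # 10
print(three_digit)  # ['200', '404']
print(masked)       # client N.N.N.N sent N then N
print(tokens)       # ['client', '10.0.0.7', 'sent', '200', 'then', '404']
```

`re.match` and `re.search` return a match object or None, so the result should be checked before calling `.group()` on input that may not match.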

Common Patterns:

\d         # any digit (0-9)
\d+        # one or more digits
\w         # word character (letter, digit, underscore)
\s         # whitespace
.          # any character except newline
^          # start of string
$          # end of string
[A-Z]      # any uppercase letter
[0-9]      # any digit
(a|b)      # a or b
*          # zero or more
+          # one or more
?          # zero or one
{3}        # exactly 3
{2,5}      # between 2 and 5

Log-Relevant Patterns:

# Extract IP address
re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', log_line)

# Extract HTTP method
re.search(r'"(GET|POST|PUT|DELETE)', log_line)

# Extract status code
re.search(r'" (\d{3}) ', log_line)

# Extract timestamp
re.search(r'\[(\d{2}/\w+/\d{4}:\d{2}:\d{2}:\d{2})', log_line)

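# Match hours 08 to 15 in a log timestamp (grep-style filter)
# Anchoring on the year's trailing colon stops the pattern from matching minutes:seconds such as 08:15
re.compile(r'\d{4}:(0[89]|1[0-5]):[0-5][0-9]')  # 08:xx to 15:xx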
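
The individual patterns above can be combined into one expression with named groups. The following is a sketch only: it assumes the common "IP ident user [timestamp] "request" status size" layout, and the names `LOG_PATTERN` and `parse_line` as well as the sample line are hypothetical:

```python
import re
from datetime import datetime

# Assumed layout: ip ident user [timestamp] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ '
    r'\[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str):
    """Return a dict of parsed fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        "timestamp": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "request": m.group("request"),
        "status": int(m.group("status")),
    }


sample = ('203.0.113.9 - - [14/Dec/2024:16:45:11 -0500] '
          '"GET /checkout/payment HTTP/1.1" 200 512')
record = parse_line(sample)
print(record["ip"], record["status"], record["timestamp"].hour)
# 203.0.113.9 200 16
```

The IP part only checks the shape of the address (it would accept 999.999.999.999), which is usually enough for extraction but not for validation.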

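datetime - Additional Useful Operations:

from datetime import datetime, timedelta, timezone

# Current time
now = datetime.now()                  # naive local time
utc_now = datetime.now(timezone.utc)  # timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12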

# Date arithmetic
tomorrow = now + timedelta(days=1)
last_week = now - timedelta(weeks=1)
three_hours_later = now + timedelta(hours=3)

# Compare timestamps (both naive or both timezone-aware; mixing the two raises TypeError)
ts1 > ts2    # is ts1 after ts2?
ts1 < ts2    # is ts1 before ts2?

# Convert to string
timestamp.strftime("%Y-%m-%d %H:%M:%S")

# pandas datetime (assumes: import pandas as pd, and df is a DataFrame)
pd.to_datetime(df['col'])                    # parse column
pd.to_datetime(df['col'], format='%d/%b/%Y:%H:%M:%S %z')
df['col'].dt.hour                            # extract hour
df['col'].dt.weekday                         # day of week (Monday=0)
df['col'].dt.tz_convert('UTC')               # convert timezone (column must be tz-aware)
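
For a single timestamp, the standard-library counterpart of `tz_convert` is `astimezone`. A minimal sketch, reusing the sample timestamp from section 1, that shows why the timezone matters for hour-based filters:

```python
from datetime import datetime, timezone

ts = datetime.strptime("14/Dec/2024:16:45:11 -0500", "%d/%b/%Y:%H:%M:%S %z")
ts_utc = ts.astimezone(timezone.utc)

print(ts_utc.isoformat())    # 2024-12-14T21:45:11+00:00
print(ts == ts_utc)          # True: same instant, different wall-clock time
print(ts.hour, ts_utc.hour)  # 16 21
```

Because `.hour` reflects the wall-clock time of whichever offset the object carries, a range check such as `16 <= ts.hour < 19` gives different answers before and after conversion.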

Quick Reference

Apache log format string:
  "%d/%b/%Y:%H:%M:%S %z"  ✅

Weekday numbers:
  0=Mon, 1=Tue, 2=Wed, 3=Thu, 4=Fri, 5=Sat, 6=Sun

Time range check:
  12:00-15:59 → 12 <= hour <= 15  ✅
  16:00-18:59 → 16 <= hour < 19   ✅
  14:00-16:59 → 14 <= hour < 17   ✅

Request field splitting:
  split(' ')[0] → method
  split(' ')[1] → URL      ✅
  split(' ')[2] → protocol ✅

URL matching:
  url.startswith('/path/')  ✅
  '/path/' in url           ❌ (matches anywhere)