9. Python datetime & String Operations
1. datetime.strptime() - Parse String to Datetime
- Converts a string → datetime object
- The format string must match the input exactly; any mismatch raises ValueError
from datetime import datetime
timestamp = datetime.strptime("14/Dec/2024:16:45:11 -0500", "%d/%b/%Y:%H:%M:%S %z")
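2. Apache Log Timestamp Format
- Most exam-relevant format string: "%d/%b/%Y:%H:%M:%S %z"
- Used to parse Apache/Nginx log timestamps
- Strip the surrounding [ ] first; the format string does not include them
[14/Dec/2024:16:45:11 -0500]
14 → %d, Dec → %b, 2024 → %Y, 16 → %H, 45 → %M, 11 → %S, -0500 → %z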
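A minimal sketch of the format string applied to a bracketed log field. It assumes the field still carries its [ ], which are stripped before parsing; parse_apache_timestamp and APACHE_TS_FORMAT are hypothetical names, not library APIs.

```python
from datetime import datetime, timedelta

# Apache/Nginx access-log timestamp format (brackets not included)
APACHE_TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def parse_apache_timestamp(field: str) -> datetime:
    """Parse a bracketed field such as '[14/Dec/2024:16:45:11 -0500]'.

    parse_apache_timestamp is a hypothetical helper for illustration.
    """
    # strip("[]") removes any '[' or ']' characters from both ends
    return datetime.strptime(field.strip("[]"), APACHE_TS_FORMAT)

ts = parse_apache_timestamp("[14/Dec/2024:16:45:11 -0500]")
print(ts.isoformat())                          # 2024-12-14T16:45:11-05:00
print(ts.utcoffset() == timedelta(hours=-5))   # True
```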
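3. %z - Timezone Offset (Critical!)
- Parses a numeric UTC offset such as -0500 or +0530
- Without %z, a string that still contains the offset fails with ValueError ("unconverted data remains"); if the offset is cut off beforehand, parsing succeeds but the timezone is lost (naive datetime)
- ✅ Always include %z for Apache log timestamps
- ❌ %Z → timezone NAME (EST, UTC), not a numeric offset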
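The difference %z makes can be checked directly. This sketch parses the same string with %z, without %z, with the offset cut off, and with a positive offset; the variable names are illustrative only.

```python
from datetime import datetime

RAW = "14/Dec/2024:16:45:11 -0500"

# With %z: the offset is parsed and the result is timezone-aware
aware = datetime.strptime(RAW, "%d/%b/%Y:%H:%M:%S %z")
print(aware.tzinfo)  # UTC-05:00

# Without %z: the leftover " -0500" makes strptime raise ValueError
try:
    datetime.strptime(RAW, "%d/%b/%Y:%H:%M:%S")
    missing_z_failed = False
except ValueError:
    missing_z_failed = True
print(missing_z_failed)  # True

# Cutting the offset off first parses, but the timezone is gone (naive datetime)
naive = datetime.strptime(RAW.rsplit(" ", 1)[0], "%d/%b/%Y:%H:%M:%S")
print(naive.tzinfo)  # None

# %z handles positive offsets too, e.g. +0530
ist = datetime.strptime("14/Dec/2024:16:45:11 +0530", "%d/%b/%Y:%H:%M:%S %z")
print(ist.utcoffset().total_seconds())  # 19800.0
```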
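4. Accessing Date and Time Parts
timestamp.hour # 0-23 integer (attribute, no parentheses)
timestamp.minute # 0-59
timestamp.second # 0-59
timestamp.date() # date part only (method)
timestamp.time() # time part only (method; drops the UTC offset)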
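A quick check of which parts are attributes and which are methods, using the same sample timestamp. timetz() is the standard-library method that keeps the offset when you need it.

```python
from datetime import datetime

ts = datetime.strptime("14/Dec/2024:16:45:11 -0500", "%d/%b/%Y:%H:%M:%S %z")

# Attributes: no parentheses
print(ts.hour, ts.minute, ts.second)   # 16 45 11

# Methods: need parentheses
print(ts.date())                        # 2024-12-14
print(ts.time())                        # 16:45:11
print(ts.time().tzinfo)                 # None (time() drops the offset)
print(ts.timetz().tzinfo)               # UTC-05:00 (timetz() keeps it)
```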
5. timestamp.weekday() - Day of Week:
timestamp.weekday() # returns integer
# Mapping:
0 → Monday
1 → Tuesday
2 → Wednesday
3 → Thursday
4 → Friday
5 → Saturday
6 → Sunday
- ✅ weekday() == 6 → Sunday (exam answer JAN_FN)
- ✅ weekday() == 5 → Saturday (exam answer JAN_AN)
- ✅ weekday() == 0 → Monday
- ❌ timestamp.dayname() → no such method (AttributeError)
- ❌ timestamp.strftime('%A') == 'Sunday' → works, but slower and locale-dependent; not the expected answer
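A sketch of the integer comparison, assuming the Apache format above; is_sunday and the SATURDAY/SUNDAY constants are hypothetical helpers. weekday() is evaluated in the timestamp's own offset, so an entry near midnight can land on a different day once converted to UTC. isoweekday() uses a different numbering and is a common source of off-by-one errors.

```python
from datetime import datetime

FMT = "%d/%b/%Y:%H:%M:%S %z"
SATURDAY, SUNDAY = 5, 6  # weekday(): Monday=0 ... Sunday=6

def is_sunday(ts: datetime) -> bool:
    # Integer comparison: no string formatting, no locale dependence
    return ts.weekday() == SUNDAY

sat = datetime.strptime("14/Dec/2024:16:45:11 -0500", FMT)
sun = datetime.strptime("15/Dec/2024:09:00:00 -0500", FMT)

print(sat.weekday(), is_sunday(sat))   # 5 False
print(sun.weekday(), is_sunday(sun))   # 6 True
print(sun.isoweekday())                # 7 (isoweekday(): Monday=1 ... Sunday=7)
```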
6. Time Range Filtering - Core Logic:
Key Rule:
For range X:00 to Y:59 (hour Y included)
→ X <= timestamp.hour <= Y, or equivalently X <= timestamp.hour < Y+1
For range X:00 up to Y:00, excluding hour Y
→ X <= timestamp.hour < Y
Exam Scenarios:
# 12:00 to 15:59 (includes 12,13,14,15)
12 <= timestamp.hour <= 15 # ✅
# 16:00 to 18:59 (includes 16,17,18)
16 <= timestamp.hour < 19 # ✅ exam answer JAN_FN
# 14:00 to 16:59 (includes 14,15,16)
14 <= timestamp.hour < 17 # ✅ exam answer JAN_AN
Common Wrong Answers:
16 <= timestamp.hour <= 16 # ❌ only hour 16
16 < timestamp.hour < 19 # ❌ excludes 16:xx entries
timestamp.hour in range(16, 20) # ❌ includes 19:xx entries
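The boundary behaviour is easiest to see on timestamps that sit exactly on the edges of the window. The sample strings below are invented for illustration, and in_hour_window is a hypothetical helper implementing the X <= hour <= Y rule.

```python
from datetime import datetime

FMT = "%d/%b/%Y:%H:%M:%S %z"

def in_hour_window(ts: datetime, first_hour: int, last_hour: int) -> bool:
    """True if ts lies in first_hour:00:00 .. last_hour:59:59 (both hours inclusive)."""
    return first_hour <= ts.hour <= last_hour

# Hypothetical boundary timestamps on Sunday 15 Dec 2024
samples = [
    "15/Dec/2024:15:59:59 -0500",  # one second before the window
    "15/Dec/2024:16:00:00 -0500",  # first second inside
    "15/Dec/2024:18:59:59 -0500",  # last second inside
    "15/Dec/2024:19:00:00 -0500",  # first second after
]
parsed = [datetime.strptime(s, FMT) for s in samples]

# 16:00-18:59 on Sundays, i.e. 16 <= hour < 19 and weekday() == 6
hits = [ts for ts in parsed if in_hour_window(ts, 16, 18) and ts.weekday() == 6]
print([ts.strftime("%H:%M:%S") for ts in hits])  # ['16:00:00', '18:59:59']
```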
7. Format Codes Reference
| Code | Meaning | Example |
|---|---|---|
| %d | Day of month (zero-padded) | 01–31 |
| %m | Month as number (zero-padded) | 01–12 |
| %b | Abbreviated month name | Jan, Dec |
| %B | Full month name | January |
| %Y | 4-digit year | 2024 |
| %y | 2-digit year | 24 |
| %H | Hour, 24-hour clock | 00–23 |
| %I | Hour, 12-hour clock | 01–12 |
| %M | Minutes | 00–59 |
| %S | Seconds | 00–59 |
| %z | UTC offset | -0500 |
| %Z | Timezone name | EST, UTC |
| %A | Full weekday name | Monday |
| %a | Abbreviated weekday name | Mon |
| %p | AM/PM | AM, PM |
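The same codes work in reverse with strftime(). This sketch assumes the default C locale; under another LC_TIME locale, the %a, %A, %b, %B and %p names would be localized.

```python
from datetime import datetime

ts = datetime.strptime("14/Dec/2024:16:45:11 -0500", "%d/%b/%Y:%H:%M:%S %z")

# Same codes, used in the other direction (datetime -> string)
print(ts.strftime("%Y-%m-%d %H:%M:%S %z"))  # 2024-12-14 16:45:11 -0500
print(ts.strftime("%a %d %b %y, %I:%M %p"))  # Sat 14 Dec 24, 04:45 PM
print(ts.strftime("%A, %d %B %Y"))           # Saturday, 14 December 2024
```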
8. String Splitting - Request Field Parsing
Apache Request Field:
"GET /checkout/payment HTTP/1.1"
split(' ') → [0] = 'GET', [1] = '/checkout/payment', [2] = 'HTTP/1.1'
request = "GET /checkout/payment HTTP/1.1"
parts = request.split(' ')
method = parts[0] # 'GET'
url = parts[1] # '/checkout/payment' ✅ exam answer
protocol = parts[2] # 'HTTP/1.1' ✅ exam answer
- ✅ request.split(' ')[1] → URL (exam answer JAN_FN Q306)
- ✅ request.split(' ')[2] → Protocol (exam answer JAN_FN Q305)
- ❌ request.split('/')[1] → splits on slashes and returns 'checkout', not the URL
- ❌ request.split(' ')[0] → gives the HTTP method, not the URL
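A slightly more defensive version of the split, assuming the request is the first double-quoted field of a Common or Combined Log Format line. parse_request is a hypothetical helper; returning None for malformed requests is one design choice among several (skipping or logging them would also work).

```python
def parse_request(request: str):
    """Split an Apache request field into (method, url, protocol).

    parse_request is a hypothetical helper; it returns None for malformed
    requests (e.g. '-') instead of raising IndexError on parts[1].
    """
    parts = request.split(' ')
    if len(parts) != 3:
        return None
    method, url, protocol = parts
    return method, url, protocol

# Hypothetical Common Log Format line (192.0.2.0/24 is a documentation range)
line = '192.0.2.1 - - [14/Dec/2024:16:45:11 -0500] "GET /checkout/payment HTTP/1.1" 200 512'

# Assumes the request is the first double-quoted field
request = line.split('"')[1]
print(parse_request(request))   # ('GET', '/checkout/payment', 'HTTP/1.1')
print(parse_request('-'))       # None
print(request.split('/')[1])    # checkout  (why split('/') is the wrong tool)
```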
9. URL Path Matching:
url = '/tamilmp3/song.mp3'
url.startswith('/tamilmp3/') # ✅ prefix match (exam answer)
url.endswith('.mp3') # suffix match
# ❌ Wrong approaches:
'/tamilmp3/' in url # matches anywhere, not just start
url == '/tamilmp3/' # exact match only, misses sub-paths
url.endswith('/tamilmp3/') # checks end, not start
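A sketch contrasting the prefix check with the substring check on a few made-up URLs (all paths here, including /hindimp3/, are hypothetical).

```python
# Hypothetical URLs for illustration
urls = [
    '/tamilmp3/song.mp3',
    '/tamilmp3/albums/2024/track01.mp3',
    '/telugump3/tamilmp3/song.mp3',  # contains '/tamilmp3/' but not at the start
    '/tamilmp3',                      # no trailing slash: not under the directory
]

under_prefix = [u for u in urls if u.startswith('/tamilmp3/')]
contains = [u for u in urls if '/tamilmp3/' in u]

print(len(under_prefix))  # 2
print(len(contains))      # 3 (includes the /telugump3/... false positive)

# startswith() also accepts a tuple of prefixes ('/hindimp3/' is hypothetical)
print('/hindimp3/a.mp3'.startswith(('/tamilmp3/', '/hindimp3/')))  # True
```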
10. String Methods - Complete Reference:
Case Operations:
s.lower() # 'HELLO' → 'hello'
s.upper() # 'hello' → 'HELLO'
s.title() # 'hello world' → 'Hello World'
s.capitalize() # 'hello world' → 'Hello world'
s.swapcase() # 'Hello' → 'hELLO'
Whitespace:
s.strip() # remove leading + trailing whitespace
s.lstrip() # remove leading only
s.rstrip() # remove trailing only
Search & Check:
s.startswith('pre') # True/False
s.endswith('suf') # True/False
'sub' in s # True/False
s.find('sub') # index of first occurrence (-1 if not found)
s.index('sub') # index (raises ValueError if not found)
s.count('sub') # count occurrences
Replace & Split:
s.replace('old', 'new') # replace all occurrences
s.split(',') # split by delimiter → list
s.split() # split by whitespace
','.join(['a','b','c']) # join list → 'a,b,c'
Type Checking:
s.isdigit() # all digits?
s.isalpha() # all letters?
s.isalnum() # alphanumeric?
s.isnumeric() # numeric?
s.isspace() # all whitespace?
s.islower() # all lowercase?
s.isupper() # all uppercase?
Formatting & Padding:
f"Hello {name}" # f-string ✅ most common
"Hello {}".format(name) # .format()
"Hello %s" % name # % operator (old style)
s.zfill(5) # '42' → '00042' (pad with zeros)
s.ljust(10), s.rjust(10) # left/right align
s.center(10) # center align
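A few of the methods above exercised on one sample string, including two edge cases that often trip people up: find() versus index() on a missing substring, and the is*() checks on an empty string or on '½'.

```python
s = "  Hello World  "
clean = s.strip()                      # 'Hello World'

print(clean.lower())                   # hello world
print(clean.find("World"))             # 6
print(clean.find("xyz"))               # -1
print(clean.count("o"))                # 2
print(clean.replace("o", "0"))         # Hell0 W0rld
print("-".join(clean.split()))         # Hello-World

try:
    clean.index("xyz")
except ValueError:
    print("index() raised ValueError")  # unlike find(), which returns -1

print("42".zfill(5))                   # 00042
print("".isdigit())                    # False: isdigit/isalpha/isalnum/isspace are all False for ''
print("½".isdigit(), "½".isnumeric())  # False True
```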
11. re Module - Regular Expressions
What is Regex?
- Pattern matching for strings
- Used heavily in log parsing
Core Functions:
import re
re.search(pattern, string) # find first match anywhere
re.match(pattern, string) # match only at START
re.findall(pattern, string) # return ALL matches as list
re.sub(pattern, replacement, string) # replace matches
re.split(pattern, string) # split by pattern
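The core functions side by side on one invented key=value line; the main point is that match() only tries the start of the string, while search() scans the whole string.

```python
import re

line = "user=alice id=42 retries=3"

print(re.search(r"\d+", line).group())         # 42 (first match anywhere)
print(re.match(r"\d+", line))                  # None (match() only tries position 0)
print(re.match(r"user=(\w+)", line).group(1))  # alice
print(re.findall(r"\d+", line))                # ['42', '3']
print(re.findall(r"(\w+)=", line))             # ['user', 'id', 'retries'] (one group: returns the group)
print(re.sub(r"\d", "#", line))                # user=alice id=## retries=#
print(re.split(r"\s+", line))                  # ['user=alice', 'id=42', 'retries=3']
```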
Common Patterns:
\d # any digit (0-9)
\d+ # one or more digits
\w # word character (letter, digit, underscore)
\s # whitespace
. # any character except newline
^ # start of string
$ # end of string
[A-Z] # any uppercase letter
[0-9] # any digit
(a|b) # a or b
* # zero or more
+ # one or more
? # zero or one
{3} # exactly 3
{2,5} # between 2 and 5
Log-Relevant Patterns:
# Extract IP address
re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', log_line)
# Extract HTTP method
re.search(r'"(GET|POST|PUT|DELETE)', log_line)
# Extract status code
re.search(r'" (\d{3}) ', log_line)
# Extract timestamp (date and time only; the UTC offset such as -0500 is not captured)
re.search(r'\[(\d{2}/\w+/\d{4}:\d{2}:\d{2}:\d{2})', log_line)
# Match time pattern for grep (HH:MM for hours 08-15)
re.compile(r'(0[89]|1[0-5]):[0-5][0-9]') # 08:xx to 15:xx; unanchored, so it can also hit MM:SS (e.g. '12:30' inside '16:12:30')
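A sketch that pulls all the fields above out in one pass with named groups. It assumes the Combined Log Format layout (ip, identity, user, [time], "request", status, size, ...); LOG_RE is a hypothetical name and the pattern does not handle every edge case (IPv6 addresses, for example). The second half shows a pitfall of the unanchored 08:xx-15:xx pattern and one way to tighten it, assuming the hour is always preceded by ':' as in Apache timestamps.

```python
import re
from datetime import datetime

# Sketch of a Combined Log Format parser; the field layout is an assumption
LOG_RE = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ '
    r'\[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical log line (192.0.2.0/24 is reserved for documentation)
line = ('192.0.2.10 - - [14/Dec/2024:16:45:11 -0500] '
        '"GET /tamilmp3/song.mp3 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"')

m = LOG_RE.match(line)
if m is None:
    raise ValueError("line does not look like Combined Log Format")
fields = m.groupdict()
ts = datetime.strptime(fields["ts"], "%d/%b/%Y:%H:%M:%S %z")
print(fields["method"], fields["url"], fields["status"], ts.hour)
# GET /tamilmp3/song.mp3 200 16

# The unanchored 08:xx-15:xx pattern also fires on MM:SS ('12:30' in '16:12:30')
loose = re.compile(r'(0[89]|1[0-5]):[0-5][0-9]')
# Requiring ':' on both sides pins it to the HH:MM of an Apache timestamp
tight = re.compile(r':(0[89]|1[0-5]):[0-5][0-9]:')
late = "[14/Dec/2024:16:12:30 -0500]"
print(bool(loose.search(late)), bool(tight.search(late)))  # True False
```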
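datetime - Additional Useful Operations:
from datetime import datetime, timedelta, timezone
# Current time
now = datetime.now() # local time, naive
utc_now = datetime.now(timezone.utc) # aware UTC time (datetime.utcnow() is deprecated since Python 3.12 and returns a naive value)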
# Date arithmetic
tomorrow = now + timedelta(days=1)
last_week = now - timedelta(weeks=1)
three_hours_later = now + timedelta(hours=3)
# Compare timestamps (both aware or both naive; mixing them in < or > raises TypeError)
ts1 > ts2 # is ts1 after ts2?
ts1 < ts2 # is ts1 before ts2?
# Convert to string
timestamp.strftime("%Y-%m-%d %H:%M:%S")
# pandas datetime
import pandas as pd
pd.to_datetime(df['col']) # parse column
pd.to_datetime(df['col'], format='%d/%b/%Y:%H:%M:%S %z') # add utc=True if rows have different offsets
df['col'].dt.hour # extract hour
df['col'].dt.weekday # day of week
df['col'].dt.tz_convert('UTC') # convert timezone
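A sketch of how timezone-aware values with different offsets behave under comparison, conversion and arithmetic, using only the standard library. The two sample timestamps (and the names ny and india) are invented so that they describe the same instant.

```python
from datetime import datetime, timedelta, timezone

FMT = "%d/%b/%Y:%H:%M:%S %z"
ny = datetime.strptime("14/Dec/2024:16:45:11 -0500", FMT)
india = datetime.strptime("15/Dec/2024:03:15:11 +0530", FMT)

# Aware datetimes compare by absolute instant, regardless of offset
print(ny.astimezone(timezone.utc))      # 2024-12-14 21:45:11+00:00
print(india.astimezone(timezone.utc))   # 2024-12-14 21:45:11+00:00
print(ny == india, india - ny)          # True 0:00:00

# ...but weekday() and hour use each value's own offset
print(ny.weekday(), india.weekday())    # 5 6

# Arithmetic keeps the original offset
print(ny + timedelta(hours=3))          # 2024-12-14 19:45:11-05:00

# Ordering a naive datetime against an aware one raises TypeError
try:
    ny < datetime(2024, 12, 14, 12, 0)
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                         # False
```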
Quick Reference
Apache log format string:
"%d/%b/%Y:%H:%M:%S %z" ✅
Weekday numbers:
0=Mon, 1=Tue, 2=Wed, 3=Thu, 4=Fri, 5=Sat, 6=Sun
Time range check:
12:00-15:59 → 12 <= hour <= 15 ✅
16:00-18:59 → 16 <= hour < 19 ✅
14:00-16:59 → 14 <= hour < 17 ✅
Request field splitting:
split(' ')[0] → method
split(' ')[1] → URL ✅
split(' ')[2] → protocol ✅
URL matching:
url.startswith('/path/') ✅
'/path/' in url ❌ (matches anywhere)