The Gap Between Working Code and Production Code
Working code successfully posts some credits and fetches some participants. Production code runs automatically, unattended, at 2 AM when the office is closed. Network hiccups happen. API deployments happen. You need to handle all failure modes gracefully.
The difference:
- Working code: "If the request fails, the program crashes."
- Production code: "If the request fails, we log it, retry intelligently, alert the ops team if it's truly broken, and the job completes with a summary."
Rate Limits: Understanding 429
The SAP SuccessFactors IM API enforces rate limits. If you make too many requests too fast, you get HTTP 429 Too Many Requests:
```
HTTP/1.1 429 Too Many Requests
Retry-After: 30

{
  "error": "Rate limit exceeded. Max 100 requests per minute."
}
```
Common causes of rate limiting:
- Fetching a new token per request: Token requests count toward your limit. Without caching, 10,000 credits = 10,000 token requests. Instant rate limit.
- Individual POSTs instead of batch: 10,000 individual credit POSTs = 10,000 API requests. Batch into 100-per-request = 100 requests.
- Not following pagination correctly: Making multiple requests for the same data because you didn't follow @odata.nextLink.
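The batching arithmetic above is easy to get right with a small helper. A minimal sketch (the `chunk` helper and the sample payloads are illustrative, not part of the IM API):

```python
def chunk(items, size=100):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 10,000 credits become 100 batch requests instead of 10,000 individual POSTs
credits = [{"sourceRef": f"ORD-{n}"} for n in range(10_000)]
batches = chunk(credits)
print(len(batches))  # 100
```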
Exponential Backoff: The Retry Pattern
When you get 429 (or 5xx), don't retry immediately. Back off exponentially:
```python
import requests
import time

def post_with_backoff(url, headers, payload, max_retries=3):
    """POST with exponential backoff for 429 and 5xx errors."""
    for attempt in range(max_retries):
        try:
            resp = requests.post(url, json=payload, headers=headers, timeout=10)
            # 4xx errors (except 429) are not retriable
            if 400 <= resp.status_code < 500 and resp.status_code != 429:
                raise Exception(f"Non-retriable 4xx: {resp.status_code}")
            resp.raise_for_status()
            return resp
        except requests.exceptions.HTTPError as e:
            # 429 or 5xx -- retriable
            if e.response.status_code in (429, 500, 502, 503, 504):
                # Exponential backoff: 1s, 2s, 4s, ...
                wait_time = 2 ** attempt
                # Honor the Retry-After header if present
                retry_after = e.response.headers.get("Retry-After")
                if retry_after:
                    wait_time = int(retry_after)
                if attempt < max_retries - 1:
                    print(f"Error {e.response.status_code}, retrying in {wait_time}s...")
                    time.sleep(wait_time)
                else:
                    raise
            else:
                # Other HTTP errors -- don't retry
                raise
```
Token Refresh Mid-Job: Handling 401
Your token expires in ~30 minutes. Long-running jobs might span that boundary. When you get 401 Unauthorized mid-job:
```python
import requests

def api_call_with_token_refresh(token_mgr, method, url, **kwargs):
    """Make an API call, refresh the token on 401, retry once."""
    headers = kwargs.get("headers", {})
    headers["Authorization"] = f"Bearer {token_mgr.get_token()}"
    kwargs["headers"] = headers

    resp = requests.request(method, url, **kwargs)

    # On 401, the token likely expired mid-request
    if resp.status_code == 401:
        print("Token expired, refreshing...")
        # Invalidate the cache and fetch a new token
        token_mgr.expires_at = 0
        new_token = token_mgr.get_token()
        # Retry the request with the new token
        headers["Authorization"] = f"Bearer {new_token}"
        resp = requests.request(method, url, **kwargs)

    resp.raise_for_status()
    return resp
```
Logging: What to Capture
Log every integration job run. Structured logging (JSON format) makes it easy to grep and analyze:
```python
import json
import logging
from datetime import datetime

logger = logging.getLogger(__name__)

def log_integration_summary(job_id, start_time, end_time,
                            request_count, success_count, error_count,
                            failed_refs):
    """Log a structured summary of the job run."""
    duration_sec = (end_time - start_time).total_seconds()
    summary = {
        "job_id": job_id,
        "timestamp": datetime.utcnow().isoformat(),
        "start_time": start_time.isoformat(),
        "end_time": end_time.isoformat(),
        "duration_seconds": duration_sec,
        "request_count": request_count,
        "success_count": success_count,
        "error_count": error_count,
        "success_rate": success_count / request_count if request_count > 0 else 0,
        "failed_refs": failed_refs,  # list of sourceRef values that failed
    }
    # Log as JSON (ops tools can parse this easily)
    logger.info(json.dumps(summary))
```
Idempotency: The sourceRef Contract
Idempotency is non-negotiable for production integrations. Your job crashes. You rerun it. Same sourceRef = already exists in IM = 409 Conflict = safe to skip or handle gracefully.
Implementation:
- Set sourceRef to a stable ID from your source system: CRM Order ID, ERP Posting ID, etc. Not a UUID or timestamp (those change on each run).
- On 409 Conflict: Log it, don't treat it as a failure. The record already exists.
- On job restart: Rerun the entire batch. Records that were posted before the crash return 409 (already exists) and are skipped; records that never made it through are posted normally. The rerun converges to the same final state with no duplicates.
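The rerun behavior above can be sketched without a live API call. A minimal example (status codes are simulated; the function and record names are illustrative) showing how a rerun treats 409 as idempotent success rather than failure:

```python
def handle_post_status(status_code, source_ref, posted, duplicates, failures):
    """Classify one credit POST result for an idempotent rerun.

    409 means the sourceRef already exists in IM: count it as a
    duplicate (safe to skip), not a failure.
    """
    if status_code in (200, 201):
        posted.append(source_ref)
    elif status_code == 409:
        duplicates.append(source_ref)  # already posted on a previous run
    else:
        failures.append(source_ref)

# Simulated rerun after a crash: the first two records were already posted
posted, duplicates, failures = [], [], []
for code, ref in [(409, "ORD-98765"), (409, "ORD-98766"), (201, "ORD-98767")]:
    handle_post_status(code, ref, posted, duplicates, failures)

print(posted)      # only the record the crashed run never reached
print(duplicates)  # safe skips, not errors
```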
Error Response Handling Decision Table
| Status Code | Meaning | Retriable? | Action |
|---|---|---|---|
| 4xx (except 429) | Client error. Request is invalid. | No | Log the error, move on. Retrying won't fix an invalid request. |
| 400 | Bad Request | No | Check the payload. Missing required field? Wrong format? |
| 401 | Unauthorized | Yes | Token expired. Fetch new token, retry once. |
| 403 | Forbidden | No | Insufficient permissions. No retry. |
| 404 | Not Found | No | Resource doesn't exist. Validate IDs. |
| 409 | Conflict | No | Duplicate sourceRef. Already exists. Log and move on (idempotent success). |
| 422 | Unprocessable Entity | No | Semantic error (e.g., participant not eligible for this period). Investigate, don't retry. |
| 429 | Too Many Requests | Yes | Rate limit hit. Exponential backoff (1s, 2s, 4s, 8s). Retry. |
| 5xx | Server Error | Yes | Transient. Exponential backoff. Retry. |
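The decision table translates directly into a dispatch function. A minimal sketch (the function name and action strings are illustrative):

```python
def retry_decision(status_code):
    """Mirror the decision table: (retriable?, action) for a status code."""
    if status_code == 429:
        return True, "rate limited: exponential backoff, honor Retry-After"
    if status_code == 401:
        return True, "token expired: refresh token, retry once"
    if 500 <= status_code < 600:
        return True, "transient server error: exponential backoff"
    if status_code == 409:
        return False, "duplicate sourceRef: log as idempotent success"
    if 400 <= status_code < 500:
        return False, "client error: log and investigate, retrying cannot fix it"
    return False, "success or unexpected status: no retry"

retriable, action = retry_decision(503)
print(retriable, action)
```

Note the ordering: the 429, 401, and 409 checks must come before the generic 4xx/5xx range checks, or the specific cases would be swallowed by the general ones.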
The Complete Nightly Credit Push Pattern
Here's a production-grade integration script that ties all lessons together:
```python
import json
import logging
import os
import time
import uuid
from datetime import datetime

import requests

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger(__name__)


class TokenManager:
    """OAuth token manager with caching (from Lesson 2)."""

    def __init__(self, client_id, client_secret, scope):
        self.client_id = client_id
        self.client_secret = client_secret
        self.scope = scope
        self.oauth_endpoint = "https://api.sap.com/oauth/token"
        self.access_token = None
        self.expires_at = 0

    def _fetch_new_token(self):
        data = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
            "scope": self.scope,
        }
        resp = requests.post(self.oauth_endpoint, data=data)
        resp.raise_for_status()
        result = resp.json()
        self.access_token = result["access_token"]
        self.expires_at = time.time() + result["expires_in"]

    def get_token(self):
        # Refresh 60 seconds before expiry to avoid mid-request 401s
        if time.time() >= (self.expires_at - 60):
            self._fetch_new_token()
        return self.access_token


def post_credits_with_backoff(token_mgr, credits_batch):
    """POST a credits batch with exponential backoff for 429/5xx."""
    url = "https://api.sap.com/successfactors/im/credits/batch"
    max_retries = 3
    for attempt in range(max_retries):
        try:
            headers = {
                "Authorization": f"Bearer {token_mgr.get_token()}",
                "Content-Type": "application/json",
            }
            payload = {"value": credits_batch}
            resp = requests.post(url, json=payload, headers=headers, timeout=30)

            # Non-retriable 4xx errors
            if 400 <= resp.status_code < 500 and resp.status_code != 429:
                logger.error(f"Non-retriable error {resp.status_code}: {resp.text}")
                raise Exception(f"4xx error: {resp.status_code}")

            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.RequestException as e:
            # e.response is None for connection errors (no HTTP response)
            status = e.response.status_code if e.response is not None else 0
            if attempt < max_retries - 1 and status in (429, 500, 502, 503, 504):
                wait_time = 2 ** attempt
                logger.warning(f"Retriable error, backoff {wait_time}s: {e}")
                time.sleep(wait_time)
            else:
                raise


def nightly_credit_push():
    """Complete nightly credit push from CRM to IM."""
    job_id = str(uuid.uuid4())
    start_time = datetime.utcnow()
    logger.info(json.dumps({"event": "job_started", "job_id": job_id}))

    # Initialize the token manager from environment variables
    token_mgr = TokenManager(
        client_id=os.getenv("ICM_CLIENT_ID"),
        client_secret=os.getenv("ICM_CLIENT_SECRET"),
        scope="im.write",
    )

    # Step 1: Read from CRM (simulated)
    crm_orders = [
        {"order_id": "ORD-98765", "participant_id": "P001234", "amount": 4500.00},
        {"order_id": "ORD-98766", "participant_id": "P001235", "amount": 2250.00},
    ]

    # Step 2: Format as IM credits (with sourceRef for idempotency)
    credits = []
    for order in crm_orders:
        credits.append({
            "participantId": order["participant_id"],
            "transactionDate": datetime.utcnow().strftime("%Y-%m-%d"),
            "amount": order["amount"],
            "currencyCode": "USD",
            "periodId": "Q2-2026",
            "sourceRef": order["order_id"],  # stable ID for idempotency
        })

    # Step 3: Batch POST (100 per request)
    successes = []
    errors = []
    batch_size = 100
    for i in range(0, len(credits), batch_size):
        batch = credits[i:i + batch_size]
        try:
            result = post_credits_with_backoff(token_mgr, batch)
            results = result.get("results", [])
            # Separate successes and errors
            successes.extend(r for r in results if r["status"] == "CREATED")
            errors.extend(r for r in results if r["status"] == "ERROR")
        except Exception as e:
            logger.error(f"Batch {i // batch_size} failed: {e}")

    # Step 4: Log the summary
    end_time = datetime.utcnow()
    duration_sec = (end_time - start_time).total_seconds()
    failed_refs = [err["sourceRef"] for err in errors]
    summary = {
        "event": "job_completed",
        "job_id": job_id,
        "duration_seconds": duration_sec,
        "request_count": len(credits),
        "success_count": len(successes),
        "error_count": len(errors),
        "failed_refs": failed_refs,
    }
    logger.info(json.dumps(summary))

    # Step 5: Alert if errors exceed the threshold
    if len(errors) > 5:
        logger.critical(f"Job {job_id}: {len(errors)} errors, manual review required")


if __name__ == "__main__":
    nightly_credit_push()
```
Integration Go-Live Checklist
Before going live, verify all production patterns are in place:
- Credentials are not hardcoded. Using environment variables. Secrets manager if available.
- Token caching is implemented. TokenManager or equivalent. Fetch new token only when expires_at - 60 passes.
- Batch operations are used. Credits: POST /credits/batch (100 per request). Not individual POSTs.
- Pagination is correct. GET requests follow @odata.nextLink. Never manually build skip tokens. Always use $orderby when paginating.
- Error handling is comprehensive. 4xx (non-retriable), 5xx/429 (exponential backoff), 401 (token refresh).
- Idempotency is verified. sourceRef is set to stable ID from source system. Tested: rerun job, verify no duplicates created.
- Rate limit handling is in place. Exponential backoff (1s, 2s, 4s, 8s). Respect Retry-After header.
- Logging is structured. JSON format. Each job gets a unique job_id. Log: start, end, duration, success count, error count, failed refs.
- Monitoring/alerting is configured. Alert ops team if error_count > threshold. Monitor logs for patterns of failure.
- Rollback plan exists. If something breaks mid-run, can you revert? Can you re-post the same data safely (idempotency)? Is there a manual escape hatch?
Debugging in Production
When things go wrong at 2 AM, you need good logs:
- Job ID: Unique UUID per run. Search logs by job_id to find all related entries.
- sourceRef in error logs: Which specific records failed? Log the sourceRef so you can look them up in your CRM/ERP.
- Status codes: Always log HTTP status code and response body (sans tokens) when requests fail.
- Duration: Is the job taking much longer than usual? A sign of rate limiting or API slowdown.
- Success rate: 99% success is good. 80% success means something is systematically broken.
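With structured JSON logs, the 2 AM triage above is a filter, not a hunt. A minimal sketch of pulling every entry for one run out of a mixed log stream (the function name and sample lines are illustrative):

```python
import json

def entries_for_job(log_lines, job_id):
    """Filter structured JSON log lines down to a single job run."""
    entries = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (tracebacks, library noise)
        if entry.get("job_id") == job_id:
            entries.append(entry)
    return entries

logs = [
    '{"event": "job_started", "job_id": "abc-123"}',
    'Traceback (most recent call last): ...',
    '{"event": "job_completed", "job_id": "abc-123", "error_count": 2}',
    '{"event": "job_started", "job_id": "def-456"}',
]
for entry in entries_for_job(logs, "abc-123"):
    print(entry["event"])
```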
Summary
You've completed the REST API learning path:
- Lesson 1: What REST is, HTTP methods, resources, endpoints, JSON, status codes.
- Lesson 2: OAuth 2.0, client credentials, token requests, token caching, security.
- Lesson 3: GET requests, OData query parameters, pagination, performance optimization.
- Lesson 4: POST/PUT/PATCH, batch operations, credit transactions, quota updates, pipeline triggers.
- Lesson 5: Rate limits, exponential backoff, logging, idempotency, complete production pattern.
You can now build production-grade integrations between your CRM, HR, payroll, and BI systems and SAP SuccessFactors IM. The nightly credit push pattern is the template for any ICM integration.