Use this sample script to add a `url` column to full site scanning CSV exports.
The script supports a Confluence mode as well as two Jira URL modes:

- Default: builds Jira issue search links (`/issues/?jql=id%3D<issueId>`).
- API mode (`--use-jira-api`): resolves issue IDs to keys and creates direct `/browse/<KEY>` links.

URL behavior summary:

- Confluence page: `<site-url>/wiki/spaces/spaceKey/pages/<pageId>`.
- Confluence comment: `<site-url>/wiki/spaces/spaceKey/pages/<pageId>?focusedCommentId=<commentId>`.
- The `?focusedCommentId=<commentId>` suffix is added only when the script can extract a comment ID from CSV data (for example `comment23141298`).

You need a CSV export that contains `App`, `Content ARI`, and `Container ARI` columns (or equivalent custom columns), plus your site base URL (for example `https://your-site.atlassian.net`). The script auto-detects these columns by default:
- `App` (or `app`)
- `Content ARI` (or `resourceAri`, `resource_ari`, `contentAri`, `content_ari`)
- `Container ARI` (or `containerAri`, `container_ari`)

For Jira comment deep links, the script only checks these exact CSV headers for comment field values:

- `Data field`
- `Sensitive data field`

You can also override the ARI and App columns with:

- `--content-ari-column`
- `--container-ari-column`
- `--app-column`

Run from the directory that contains `enrich_csv_with_urls.py`.
```shell
python enrich_csv_with_urls.py \
  --csv-in <path-to-input-csv> \
  --csv-out <path-to-output-csv> \
  --site-url https://your-site.atlassian.net
```
This mode always uses Jira search links (`/issues/?jql=id%3D<issueId>`).
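The default link shape can be reproduced in isolation. This minimal sketch mirrors the script's search URL template; the site URL and issue ID are placeholder values:

```python
from urllib.parse import quote_plus

site = "https://your-site.atlassian.net"
issue_id = "6140101"  # numeric issue ID parsed from the Content/Container ARI

# The script URL-encodes the ID and embeds it in a JQL "id=" search
url = f"{site}/issues/?jql=id%3D{quote_plus(issue_id)}"
print(url)  # https://your-site.atlassian.net/issues/?jql=id%3D6140101
```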
To override the auto-detected column names explicitly:

```shell
python enrich_csv_with_urls.py \
  --csv-in <path-to-input-csv> \
  --csv-out <path-to-output-csv> \
  --site-url https://your-site.atlassian.net \
  --content-ari-column "Content ARI" \
  --container-ari-column "Container ARI" \
  --app-column "App"
```
When `--use-jira-api` is enabled, the script produces direct `/browse/<KEY>` URLs: it tries the Jira Search API first and falls back to per-issue API requests if needed.
Pass your credentials (`email:api_token`) in the form `<email>:<api_token>`:

```shell
python enrich_csv_with_urls.py \
  --csv-in <path-to-input-csv> \
  --csv-out <path-to-output-csv> \
  --site-url https://your-site.atlassian.net \
  --use-jira-api \
  --jira-auth "name@example.com:<api_token>"
```
```python
#!/usr/bin/env python3
"""
enrich_csv_with_urls.py

Adds a `url` column to each row of a CSV based on Confluence/Jira identifiers
from full site scanning exports.

Inputs:
- --csv-in: path to input CSV
- --csv-out: path to output CSV (will be created or overwritten)
- --site-url: your Jira or Confluence base URL
  (e.g., https://example.atlassian.net or https://example.jira.atlassian.cloud)
- --content-ari-column: name of the Content ARI column
  (default: auto-detect "Content ARI"/resourceAri/resource_ari)
- --container-ari-column: name of the Container ARI column (default: auto-detect "Container ARI")
- --app-column: name of the App column (default: auto-detect "App")

Column detection (auto):
- App: App, app
- Content ARI: "Content ARI", resourceAri, resource_ari, contentAri, content_ari
- Container ARI: "Container ARI", containerAri, container_ari

Behavior:
- Confluence page URL: <site-url>/wiki/spaces/spaceKey/pages/<pageId>
- Confluence comment URL: <site-url>/wiki/spaces/spaceKey/pages/<pageId>?focusedCommentId=<commentId>
- Jira issue search URL: <site-url>/issues/?jql=id%3D<issueId>

By default, this script resolves URLs with JQL issue lookup. When `--use-jira-api`
is set with `--jira-auth`, it resolves issue IDs to keys and provides URLs
directly to issue keys.
"""

from __future__ import annotations

import argparse
import csv
import json
import re
import sys
import base64
import time
import urllib.request
import urllib.error
from typing import Dict, Iterable, List, Optional, Sequence, Tuple
from urllib.parse import quote_plus

CONFLUENCE_PAGE_URL_TEMPLATE = "{site}/wiki/spaces/spaceKey/pages/{page_id}"
CONFLUENCE_FOCUSED_COMMENT_SUFFIX = "?focusedCommentId={comment_id}"
JIRA_SEARCH_URL_TEMPLATE = "{site}/issues/?jql=id%3D{issue_id}"
JIRA_BROWSE_URL_TEMPLATE = "{site}/browse/{issue_key}"
DATA_FIELD_COLUMN_NAMES = ("Data field", "Sensitive data field")
APPLICATION_JSON_CONTENT_TYPE = "application/json"
SEARCH_API_RETRY_ATTEMPTS = 3
SEARCH_API_RETRY_BASE_SECONDS = 0.5


def column_match(row: Dict[str, str], column_names: List[str]) -> Optional[str]:
    for name in column_names:
        if name in row:
            return name
    return None


def parse_ari(ari: str) -> Optional[Dict[str, str]]:
    """
    Parse minimal bits from an ARI.

    Expected patterns (examples, may vary):
    - ari:cloud:confluence:...:page/5278120279
    - ari:cloud:confluence:...:comment/6600688521
    - ari:cloud:jira:...:issue/6140101
    - ari:cloud:jira:...:issue/activation/<tenantId>/<issueId>
    """
    if not ari or not isinstance(ari, str):
        return None
    ari_parts = ari.strip().split(":")
    if len(ari_parts) < 5:
        return None
    resource_app = ari_parts[2]
    resource_path = ari_parts[-1]
    path_segments = [seg for seg in resource_path.split("/") if seg]
    if not path_segments:
        return None
    resource_type = path_segments[0]  # page, comment, issue
    resource_id = path_segments[-1]  # pageId, commentId, issueId
    return {"product": resource_app, "type": resource_type, "id": resource_id}


def build_confluence_url(site_url: str, page_id: Optional[str], comment_id: Optional[str]) -> Optional[str]:
    if not page_id:
        return None
    base = CONFLUENCE_PAGE_URL_TEMPLATE.format(site=site_url.rstrip("/"), page_id=page_id)
    if comment_id:
        return base + CONFLUENCE_FOCUSED_COMMENT_SUFFIX.format(comment_id=comment_id)
    return base


def build_jira_search_url(site_url: str, issue_id: Optional[str]) -> Optional[str]:
    if not issue_id:
        return None
    normalized_issue_id = issue_id.strip()
    if not re.match(r"^\d+$", normalized_issue_id):
        return None
    safe_issue_id = quote_plus(normalized_issue_id)
    return JIRA_SEARCH_URL_TEMPLATE.format(site=site_url.rstrip("/"), issue_id=safe_issue_id)


def build_jira_browse_url(site_url: str, issue_key: Optional[str]) -> Optional[str]:
    if not issue_key:
        return None
    return JIRA_BROWSE_URL_TEMPLATE.format(site=site_url.rstrip("/"), issue_key=issue_key)


def extract_comment_id_from_data_field(row: Dict[str, str]) -> Optional[str]:
    """
    Extract Jira comment ID from 'Data field' when present (e.g., 'comment23141298').
    Supported CSV headers for this lookup are exactly: 'Data field' and 'Sensitive data field'.
    Returns the numeric string or None.
    """
    value = None
    for col in DATA_FIELD_COLUMN_NAMES:
        if col in row and row[col]:
            value = row[col]
            break
    if not value:
        return None
    m = re.search(r"comment(\d+)", str(value))
    return m.group(1) if m else None


def batch_issue_ids(items: Sequence[str], size: int) -> Iterable[Sequence[str]]:
    for i in range(0, len(items), size):
        yield items[i : i + size]


def build_basic_auth_headers(content_type: str, jira_auth: str) -> Dict[str, str]:
    basic = base64.b64encode(jira_auth.encode("utf-8")).decode("utf-8")
    return {
        "Content-Type": content_type,
        "Accept": APPLICATION_JSON_CONTENT_TYPE,
        "Authorization": f"Basic {basic}",
    }


def classify_search_request_error(
    error: Exception,
    url: str,
    suppress_known_missing_logs: bool,
) -> Tuple[bool, Optional[str]]:
    if isinstance(error, urllib.error.HTTPError):
        should_retry = error.code == 429 or 500 <= error.code < 600
        if suppress_known_missing_logs and error.code in (404, 405, 410):
            return should_retry, None
        return should_retry, f"Jira API HTTPError {error.code} on {url}: {error.reason}"
    if isinstance(error, urllib.error.URLError):
        return True, f"Jira API URLError on {url}: {error.reason}"
    return True, f"Jira API unexpected error on {url}: {error}"


def request_issue_keys_from_search_endpoint(
    url: str,
    data: bytes,
    headers: Dict[str, str],
    suppress_known_missing_logs: bool,
) -> Optional[Dict[str, str]]:
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    max_attempts = SEARCH_API_RETRY_ATTEMPTS + 1
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                body = json.loads(resp.read().decode("utf-8"))
                resolved = {
                    issue.get("id"): issue.get("key")
                    for issue in body.get("issues", [])
                    if issue.get("id") and issue.get("key")
                }
                return resolved
        except Exception as e:
            should_retry, final_log = classify_search_request_error(
                error=e,
                url=url,
                suppress_known_missing_logs=suppress_known_missing_logs,
            )
            if should_retry and attempt < max_attempts:
                delay = SEARCH_API_RETRY_BASE_SECONDS * (2 ** (attempt - 1))
                print(f"Retrying Jira Search API after transient error in {delay:.1f}s...", file=sys.stderr)
                time.sleep(delay)
                continue
            if final_log:
                print(final_log, file=sys.stderr)
            return None
    return None


def fetch_issue_keys_via_search_api(
    api_base_url: str,
    jira_auth: str,
    issue_ids: Sequence[str],
    batch_size: int = 100,
) -> Optional[Dict[str, str]]:
    """
    Resolve Jira issue numeric IDs to issue keys using the Search API.

    Returns:
    - mapping id -> key when Search API calls succeed (including empty mapping)
    - None when Search API calls are unavailable/failed
    """
    if not issue_ids:
        return {}
    headers = build_basic_auth_headers(APPLICATION_JSON_CONTENT_TYPE, jira_auth)
    # Try the newer endpoint first; fall back to legacy if unavailable
    new_search_url = api_base_url.rstrip("/") + "/rest/api/3/search/jql"
    legacy_search_url = api_base_url.rstrip("/") + "/rest/api/3/search"
    resolved: Dict[str, str] = {}
    for group in batch_issue_ids(list(issue_ids), batch_size):
        jql = "id in (" + ",".join(group) + ")"
        payload = {"jql": jql, "fields": ["key"], "maxResults": len(group)}
        data = json.dumps(payload).encode("utf-8")
        batch_succeeded = False
        for _, url, suppress_known_missing_logs in (
            ("search/jql", new_search_url, True),
            ("search", legacy_search_url, False),
        ):
            parsed = request_issue_keys_from_search_endpoint(
                url=url,
                data=data,
                headers=headers,
                suppress_known_missing_logs=suppress_known_missing_logs,
            )
            if parsed is not None:
                batch_succeeded = True
                resolved.update(parsed)
                break
        if not batch_succeeded:
            return None
    return resolved


def fetch_issue_keys_via_issue_get(
    api_base_url: str,
    jira_auth: str,
    issue_ids: Sequence[str],
) -> Dict[str, str]:
    """
    Resolve Jira issue numeric IDs to issue keys using per-issue GET calls.

    Returns a mapping: id -> key
    """
    if not issue_ids:
        return {}
    headers = {
        "Accept": APPLICATION_JSON_CONTENT_TYPE,
    }
    basic = base64.b64encode(jira_auth.encode("utf-8")).decode("utf-8")
    headers["Authorization"] = f"Basic {basic}"
    resolved: Dict[str, str] = {}
    for issue_id in issue_ids:
        url = f"{api_base_url.rstrip('/')}/rest/api/3/issue/{issue_id}?fields=key"
        req = urllib.request.Request(url, headers=headers, method="GET")
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                body = json.loads(resp.read().decode("utf-8"))
                key = body.get("key")
                if key:
                    resolved[issue_id] = key
        except urllib.error.HTTPError as e:
            print(f"Jira API HTTPError {e.code} on {url}: {e.reason}", file=sys.stderr)
        except urllib.error.URLError as e:
            print(f"Jira API URLError on {url}: {e.reason}", file=sys.stderr)
        except Exception as e:
            print(f"Jira API unexpected error on {url}: {e}", file=sys.stderr)
    return resolved


def extract_row_values(
    row: Dict[str, str],
    content_ari_column: Optional[str],
    container_ari_column: Optional[str],
    app_column: Optional[str],
) -> Dict[str, Optional[str]]:
    content_ari_col = content_ari_column or column_match(
        row, ["Content ARI", "resourceAri", "resource_ari", "contentAri", "content_ari"]
    )
    container_ari_col = container_ari_column or column_match(
        row, ["Container ARI", "containerAri", "container_ari"]
    )
    app_col = app_column or column_match(row, ["App", "app"])
    return {
        "content_ari": row.get(content_ari_col, "") if content_ari_col else "",
        "container_ari": row.get(container_ari_col, "") if container_ari_col else "",
        "app_value": (row.get(app_col, "") if app_col else "").strip().lower(),
    }


def build_confluence_url_for_row(
    site_url: str,
    resource_type: str,
    resource_id: str,
    container_info: Optional[Dict[str, str]],
) -> Optional[str]:
    if resource_type in ("page", "blogpost") and resource_id:
        return build_confluence_url(site_url, resource_id, None)
    if resource_type == "comment":
        page_id = container_info["id"] if container_info and container_info.get("type") == "page" else None
        comment_id = resource_id or None
        return build_confluence_url(site_url, page_id, comment_id)
    return None


def get_jira_issue_id(
    resource_type: str,
    resource_id: str,
    container_info: Optional[Dict[str, str]],
) -> str:
    if resource_type == "issue" and resource_id:
        return resource_id
    if container_info and container_info.get("type") == "issue":
        return container_info.get("id") or ""
    return ""


def build_jira_url_for_row(
    site_url: str,
    row: Dict[str, str],
    issue_id: str,
    use_jira_api: bool,
    id_to_key: Optional[Dict[str, str]],
) -> Optional[str]:
    if use_jira_api and id_to_key and issue_id:
        issue_key = id_to_key.get(issue_id)
        browse_url = build_jira_browse_url(site_url, issue_key)
        if browse_url:
            comment_id = extract_comment_id_from_data_field(row)
            return f"{browse_url}?focusedCommentId={comment_id}" if comment_id else browse_url
    return build_jira_search_url(site_url, issue_id)


def parse_row_ari_context(
    row: Dict[str, str],
    content_ari_column: Optional[str],
    container_ari_column: Optional[str],
    app_column: Optional[str],
) -> Dict[str, Optional[Dict[str, str]]]:
    row_values = extract_row_values(row, content_ari_column, container_ari_column, app_column)
    content_ari = row_values["content_ari"] or ""
    container_ari = row_values["container_ari"] or ""
    app_value = row_values["app_value"] or ""
    content_info = parse_ari(content_ari) if content_ari else None
    container_info = parse_ari(container_ari) if container_ari else None
    return {
        "content_info": content_info,
        "container_info": container_info,
        "product": ((content_info or {}).get("product", "") or app_value),
        "resource_type": (content_info or {}).get("type", ""),
        "resource_id": (content_info or {}).get("id", ""),
    }


def enrich_row_with_url(
    site_url: str,
    row: Dict[str, str],
    content_ari_column: Optional[str],
    container_ari_column: Optional[str],
    app_column: Optional[str],
    use_jira_api: bool = False,
    id_to_key: Optional[Dict[str, str]] = None,
) -> str:
    """
    Returns the computed URL string (or empty string if not derivable).
    """
    context = parse_row_ari_context(row, content_ari_column, container_ari_column, app_column)
    product = (context.get("product") or "").lower()
    resource_type = context.get("resource_type") or ""
    resource_id = context.get("resource_id") or ""
    container_info = context.get("container_info")
    if "confluence" in product:
        return build_confluence_url_for_row(site_url, resource_type, resource_id, container_info) or ""
    if product.startswith("jira"):
        issue_id = get_jira_issue_id(resource_type, resource_id, container_info)
        return build_jira_url_for_row(site_url, row, issue_id, use_jira_api, id_to_key) or ""
    return ""


def build_argument_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Enrich CSV with Confluence/Jira URLs")
    parser.add_argument("--csv-in", required=True, help="Path to input CSV")
    parser.add_argument("--csv-out", required=True, help="Path to output CSV (will overwrite if exists)")
    parser.add_argument(
        "--site-url",
        required=True,
        help="Your Jira or Confluence base URL, e.g., https://example.atlassian.net",
    )
    parser.add_argument("--content-ari-column", help='Custom column name for "Content ARI" (default auto-detect)')
    parser.add_argument("--container-ari-column", help='Custom column name for "Container ARI" (default auto-detect)')
    parser.add_argument("--app-column", help='Custom column name for "App" (default auto-detect)')
    parser.add_argument("--use-jira-api", action="store_true", help="Resolve Jira issue IDs to keys using Jira REST")
    parser.add_argument(
        "--jira-auth",
        help='Jira credentials "<email>:<api_token>" for Basic auth when using --use-jira-api',
    )
    return parser


def read_input_csv_rows(input_path: str) -> Tuple[List[Dict[str, str]], List[str]]:
    with open(input_path, "r", newline="", encoding="utf-8") as f_in:
        reader = csv.DictReader(f_in)
        raw_rows: List[Dict[str, str]] = list(reader)
        raw_fieldnames = list(reader.fieldnames or [])
    rows: List[Dict[str, str]] = []
    for raw_row in raw_rows:
        normalized_row: Dict[str, str] = {}
        for key, value in raw_row.items():
            if not isinstance(key, str):
                continue
            normalized_row[key.strip()] = value
        rows.append(normalized_row)
    fieldnames = [name.strip() for name in raw_fieldnames if isinstance(name, str) and name.strip()]
    fieldnames = list(dict.fromkeys(fieldnames))
    if "url" not in fieldnames:
        fieldnames.append("url")
    return rows, fieldnames


def collect_unique_jira_issue_ids(
    rows: List[Dict[str, str]],
    content_ari_column: Optional[str],
    container_ari_column: Optional[str],
    app_column: Optional[str],
) -> List[str]:
    issue_ids_set: Dict[str, None] = {}
    for row in rows:
        context = parse_row_ari_context(row, content_ari_column, container_ari_column, app_column)
        product = (context.get("product") or "").lower()
        if not product.startswith("jira"):
            continue
        issue_id = get_jira_issue_id(
            context.get("resource_type") or "",
            context.get("resource_id") or "",
            context.get("container_info"),
        )
        if issue_id and re.match(r"^\d+$", issue_id):
            issue_ids_set[issue_id] = None
    return list(issue_ids_set.keys())


def resolve_jira_issue_keys(
    api_base_url: str,
    jira_auth: str,
    issue_ids: List[str],
) -> Dict[str, str]:
    if not issue_ids:
        return {}
    id_to_key = fetch_issue_keys_via_search_api(
        api_base_url=api_base_url,
        jira_auth=jira_auth,
        issue_ids=issue_ids,
    )
    if id_to_key is not None:
        return id_to_key
    print("Falling back to per-issue resolution via /rest/api/3/issue/{id}", file=sys.stderr)
    return fetch_issue_keys_via_issue_get(
        api_base_url=api_base_url,
        jira_auth=jira_auth,
        issue_ids=issue_ids,
    )


def write_enriched_csv_rows(
    output_path: str,
    fieldnames: List[str],
    rows: List[Dict[str, str]],
    site_url: str,
    content_ari_column: Optional[str],
    container_ari_column: Optional[str],
    app_column: Optional[str],
    use_jira_api: bool,
    id_to_key: Dict[str, str],
) -> Tuple[int, int]:
    total = 0
    enriched = 0
    with open(output_path, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            total += 1
            url = enrich_row_with_url(
                site_url=site_url,
                row=row,
                content_ari_column=content_ari_column,
                container_ari_column=container_ari_column,
                app_column=app_column,
                use_jira_api=use_jira_api,
                id_to_key=id_to_key,
            )
            row["url"] = url
            if url:
                enriched += 1
            writer.writerow(row)
    return total, enriched


def main() -> int:
    args = build_argument_parser().parse_args()
    input_path = args.csv_in
    output_path = args.csv_out
    site_url = args.site_url.rstrip("/")
    api_base_url = site_url
    rows, fieldnames = read_input_csv_rows(input_path)
    id_to_key: Dict[str, str] = {}
    if args.use_jira_api:
        if not args.jira_auth:
            print(
                "--use-jira-api was set but --jira-auth is missing; falling back to search links.",
                file=sys.stderr,
            )
        else:
            unique_issue_ids = collect_unique_jira_issue_ids(
                rows=rows,
                content_ari_column=args.content_ari_column,
                container_ari_column=args.container_ari_column,
                app_column=args.app_column,
            )
            id_to_key = resolve_jira_issue_keys(
                api_base_url=api_base_url,
                jira_auth=args.jira_auth,
                issue_ids=unique_issue_ids,
            )
    total, enriched = write_enriched_csv_rows(
        output_path=output_path,
        fieldnames=fieldnames,
        rows=rows,
        site_url=site_url,
        content_ari_column=args.content_ari_column,
        container_ari_column=args.container_ari_column,
        app_column=args.app_column,
        use_jira_api=bool(args.use_jira_api and args.jira_auth),
        id_to_key=id_to_key,
    )
    print(f"Processed {total} rows; enriched {enriched} with URLs.", file=sys.stderr)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
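To sanity-check the ARI handling without a full CSV, the parsing step can be exercised standalone. This sketch re-implements the script's `parse_ari` split rules; the ARI values are made-up examples:

```python
def parse_ari(ari: str):
    # Same split rules as the script: product at index 2,
    # type from the first path segment, ID from the last
    parts = ari.strip().split(":")
    if len(parts) < 5:
        return None
    segments = [s for s in parts[-1].split("/") if s]
    if not segments:
        return None
    return {"product": parts[2], "type": segments[0], "id": segments[-1]}

info = parse_ari("ari:cloud:confluence:tenant-123:page/5278120279")
print(info)  # {'product': 'confluence', 'type': 'page', 'id': '5278120279'}

# The activation-style Jira ARI still yields the trailing issue ID
print(parse_ari("ari:cloud:jira:tenant-123:issue/activation/abc/6140101")["id"])  # 6140101
```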
Example with environment variables:
```shell
export JIRA_SITE_URL="https://your-site.atlassian.net"
export JIRA_EMAIL="name@example.com"
export JIRA_API_TOKEN="<api_token>"
export JIRA_AUTH="${JIRA_EMAIL}:${JIRA_API_TOKEN}"

python enrich_csv_with_urls.py \
  --csv-in <path-to-input-csv> \
  --csv-out <path-to-output-csv> \
  --site-url "${JIRA_SITE_URL}" \
  --use-jira-api \
  --jira-auth "${JIRA_AUTH}"
```
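After a run, you can verify how many rows received a non-empty `url`. A minimal sketch (it first writes a tiny stand-in file named `output.csv`; substitute your actual `--csv-out` path):

```python
import csv

# Write a tiny stand-in export (two rows, one with a url) so the check is runnable
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["App", "Content ARI", "url"])
    writer.writeheader()
    writer.writerow({"App": "jira", "Content ARI": "ari:cloud:jira:abc:issue/6140101",
                     "url": "https://your-site.atlassian.net/issues/?jql=id%3D6140101"})
    writer.writerow({"App": "other", "Content ARI": "", "url": ""})

# Count rows whose url column is non-empty
with open("output.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
with_url = sum(1 for r in rows if r.get("url"))
print(f"{with_url} of {len(rows)} rows have a url")  # 1 of 2 rows have a url
```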
Notes:

- `--use-jira-api was set but --jira-auth is missing`: this is a warning (not a hard error). The script continues and falls back to Jira search links (`/issues/?jql=id%3D<issueId>`). Add `--jira-auth` if you want direct `/browse/<KEY>` links.
- Jira comment deep links require the comment value to appear under the exact headers `Data field` or `Sensitive data field`.
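The comment-ID extraction can be checked in isolation; this sketch uses the same regex the script applies to `Data field` / `Sensitive data field` values:

```python
import re

def extract_comment_id(value: str):
    # Same pattern as the script: digits following a literal "comment"
    m = re.search(r"comment(\d+)", value)
    return m.group(1) if m else None

print(extract_comment_id("comment23141298"))  # 23141298
print(extract_comment_id("summary"))          # None
```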