# ACI JIRA AI Fixer - Technical Document
**Version:** 1.1
**Date:** 2026-02-18
**Update:** Azure OpenAI mandatory for compliance
**Classification:** Internal - Technical Team

---
## 1. Overview
### 1.1 Objective
Develop an artificial intelligence system that integrates with JIRA and Bitbucket to automate Support Case analysis, identify affected modules in source code (COBOL/SQL/JCL), propose fixes, and automatically document solutions.
### 1.2 Scope
- **Products:** ACQ-MF (Acquirer) and ICG-MF (Interchange)
- **Repositories:** Client-specific forks (e.g., ACQ-MF-safra-fork, ICG-MF-safra-fork)
- **Issues:** Support Cases in JIRA
- **Languages:** COBOL, SQL, JCL
### 1.3 High-Level Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ACI JIRA AI FIXER - ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ │
│ │ JIRA │ │
│ │ gojira.tsacorp│ │
│ │ .com │ │
│ └───────┬───────┘ │
│ │ Webhook (issue_created, issue_updated) │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ EVENT PROCESSOR │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │ │
│ │ │ Queue │ │ Filter │ │ Issue Classifier │ │ │
│ │ │ (Redis) │──▶ (Support │──▶ (Product, Module, │ │ │
│ │ │ │ │ Cases) │ │ Severity) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ CODE INTELLIGENCE ENGINE │ │
│ │ │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐ │ │
│ │ │ Bitbucket │ │ Code Index │ │ Context │ │ │
│ │ │ Connector │ │ (Azure OpenAI │ │ Builder │ │ │
│ │ │ │ │ Embeddings) │ │ │ │ │
│ │ │ bitbucket. │ │ - COBOL procs │ │ - CALLs │ │ │
│ │ │ tsacorp.com │ │ - SQL tables │ │ - COPYBOOKs │ │ │
│ │ │ │ │ - JCL jobs │ │ - Includes │ │ │
│ │ └─────────────────┘ └─────────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ Repositories: │ │
│ │ ├── ACQ-MF (base) │ │
│ │ │ └── ACQ-MF-safra-fork (client) │ │
│ │ │ └── ACQ-MF-safra-ai (AI) ← NEW │ │
│ │ ├── ICG-MF (base) │ │
│ │ │ └── ICG-MF-safra-fork (client) │ │
│ │ │ └── ICG-MF-safra-ai (AI) ← NEW │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ FIX GENERATION ENGINE │ │
│ │ │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐ │ │
│ │ │ LLM Engine │ │ Fix Validator │ │ Output │ │ │
│ │ │ (Azure OpenAI) │ │ │ │ Generator │ │ │
│ │ │ - GPT-4o │ │ - Syntax check │ │ │ │ │
│ │ │ - GPT-4 Turbo │ │ - COBOL rules │ │ - JIRA │ │ │
│ │ │ │ │ - SQL lint │ │ comment │ │ │
│ │ │ │ │ - JCL validate │ │ - PR/Branch │ │ │
│ │ └─────────────────┘ └─────────────────┘ └──────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ JIRA │ │ Bitbucket │ │
│ │ Comment │ │ Pull Request│ │
│ │ (Analysis + │ │ (AI Fork) │ │
│ │ Suggestion)│ │ │ │
│ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 2. Detailed Components
### 2.1 Event Processor
#### 2.1.1 JIRA Webhook Receiver
```yaml
Endpoint: POST /api/webhook/jira
Events:
- jira:issue_created
- jira:issue_updated
Filters:
- issueType: "Support Case"
- project: ["ACQ", "ICG"]
Authentication: Webhook Secret (HMAC-SHA256)
```
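The HMAC-SHA256 check listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the function name `verify_webhook_signature` and the `sha256=<hex>` header format are assumptions, and the actual header depends on how the JIRA webhook is configured.

```python
import hashlib
import hmac


def verify_webhook_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True if the header carries HMAC-SHA256(secret, body) as hex."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Assumed header format: an optional "sha256=" prefix followed by hex digits.
    received = signature_header.removeprefix("sha256=")
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected.encode(), received.encode())


secret = b"example-secret"
body = b'{"webhookEvent": "jira:issue_created"}'
good = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_webhook_signature(secret, body, good))
print(verify_webhook_signature(secret, body, "sha256=00"))
```

The signature must be computed over the raw request body, before any JSON parsing, because re-serializing the payload can change its bytes.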
#### 2.1.2 Queue System
```yaml
Technology: Redis + Python job queue (Python-RQ or Celery; Bull is Node.js-only)
Queues:
- jira-events: Raw JIRA events
- analysis-jobs: Pending analysis jobs
- fix-generation: Fix generation tasks
Retry Policy:
- Max attempts: 3
- Backoff: exponential (1min, 5min, 15min)
Dead Letter Queue: jira-events-dlq
```
#### 2.1.3 Issue Classifier
The classifier extracts the metadata that later stages use to locate the affected code:
```python
class IssueClassifier:
    def classify(self, issue: JiraIssue) -> ClassifiedIssue:
        return ClassifiedIssue(
            product=self._detect_product(issue),      # ACQ-MF or ICG-MF
            module=self._detect_module(issue),        # Authorization, Clearing, etc.
            severity=self._detect_severity(issue),    # P1, P2, P3
            keywords=self._extract_keywords(issue),   # Technical terms
            stack_trace=self._parse_stack_trace(issue),
            affected_programs=self._detect_programs(issue),
        )
```
### 2.2 Code Intelligence Engine
#### 2.2.1 Bitbucket Connector
```yaml
Base URL: https://bitbucket.tsacorp.com
API Version: REST 1.0 (Bitbucket Server)
Authentication: Personal Access Token or OAuth
Operations:
- Clone/Pull: Sparse checkout (relevant directories only)
- Read: Specific file contents
- Branches: Create/list branches in AI fork
- Pull Requests: Create PR from AI fork → client fork
```
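The pull request operation can be sketched as a request builder for the Bitbucket Server REST 1.0 endpoint `/rest/api/1.0/projects/{key}/repos/{slug}/pull-requests`. The example only builds the URL and JSON body and sends nothing. The project key `ACQ`, the use of repository names as slugs, and the target branch `main` are assumptions for illustration.

```python
import json

BITBUCKET_URL = "https://bitbucket.tsacorp.com"


def _ref(project_key: str, repo_slug: str, branch: str) -> dict:
    """Build a Bitbucket Server ref object for a branch in a repository."""
    return {
        "id": f"refs/heads/{branch}",
        "repository": {"slug": repo_slug, "project": {"key": project_key}},
    }


def build_pull_request(project_key: str, ai_fork: str, client_fork: str,
                       branch: str, title: str, description: str,
                       target_branch: str = "main") -> tuple[str, dict]:
    """Return the POST URL and JSON payload for a PR from AI fork to client fork."""
    url = (f"{BITBUCKET_URL}/rest/api/1.0/projects/{project_key}"
           f"/repos/{client_fork}/pull-requests")
    payload = {
        "title": title,
        "description": description,
        "fromRef": _ref(project_key, ai_fork, branch),
        "toRef": _ref(project_key, client_fork, target_branch),
    }
    return url, payload


url, payload = build_pull_request(
    "ACQ", "ACQ-MF-safra-ai", "ACQ-MF-safra-fork",
    "ai-fix/JIRA-1234-description",
    "[AI-FIX] JIRA-1234: Short fix description", "Generated analysis goes here.",
)
print(url)
print(json.dumps(payload, indent=2))
```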
**Access Structure per Repository:**
| Repository | AI Permission | Usage |
|------------|---------------|-------|
| ACQ-MF (base) | READ | Reference, standards |
| ACQ-MF-safra-fork | READ | Current client code |
| ACQ-MF-safra-ai | WRITE | AI branches and commits |
| ICG-MF (base) | READ | Reference, standards |
| ICG-MF-safra-fork | READ | Current client code |
| ICG-MF-safra-ai | WRITE | AI branches and commits |
#### 2.2.2 Code Index (Embeddings)
**⚠️ IMPORTANT: Azure OpenAI Embeddings (Mandatory)**
Client compliance requirements prohibit processing source code through public APIs. All embeddings are therefore generated with Azure OpenAI, which is **mandatory**:
```yaml
Provider: Azure OpenAI (data remains in client's Azure tenant)
Model: text-embedding-ada-002 or text-embedding-3-large
Region: Brazil South (recommended) or East US
Compliance: Data not used for training Microsoft models
Contract: ACI's existing Enterprise Agreement
```
**Why not use GitHub Copilot for embeddings?**
- GitHub Copilot is an IDE tool with no API for integration
- It does not offer indexing or semantic search functionality
- It cannot be used to search code programmatically
**COBOL Code Indexing:**
```yaml
Granularity: By PROGRAM-ID / SECTION / PARAGRAPH
Extracted metadata:
- PROGRAM-ID
- COPY statements (dependencies)
- CALL statements (called programs)
- FILE-CONTROL (accessed files)
- SQL EXEC (tables/queries)
- Working Storage (main variables)
Embedding Model: Azure OpenAI text-embedding-3-large
Vector DB: Qdrant (self-hosted on ACI infra) or Azure AI Search
Dimensions: 3072
Index separated by: product + client
```
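Part of the metadata extraction above can be approximated with regular expressions before a full parser is available. The sketch below assumes fixed-format source (indicator in column 7, code in columns 8-72) and only captures static `CALL 'literal'` statements; dynamic calls through a variable would need data-flow analysis. The function name and the sample program are hypothetical.

```python
import re

PROGRAM_ID = re.compile(r"\bPROGRAM-ID\.\s+([A-Z0-9-]+)", re.IGNORECASE)
COPY_STMT = re.compile(r"\bCOPY\s+([A-Z0-9-]+)", re.IGNORECASE)
CALL_STMT = re.compile(r"\bCALL\s+['\"]([A-Z0-9-]+)['\"]", re.IGNORECASE)


def extract_metadata(source: str) -> dict:
    """Collect PROGRAM-ID, COPY and static CALL targets from COBOL source."""
    program_id = None
    copybooks: list[str] = []
    calls: list[str] = []
    for line in source.splitlines():
        if len(line) > 6 and line[6] in "*/":
            continue  # comment line: indicator in column 7
        code = line[7:72]  # drop sequence area and identification area
        if program_id is None and (match := PROGRAM_ID.search(code)):
            program_id = match.group(1).upper()
        copybooks += [name.upper() for name in COPY_STMT.findall(code)]
        calls += [name.upper() for name in CALL_STMT.findall(code)]
    return {"program_id": program_id, "copybooks": copybooks, "calls": calls}


SAMPLE = (
    "       IDENTIFICATION DIVISION.\n"
    "       PROGRAM-ID. ACQAUTH.\n"
    "      * CALL 'IGNORED' appears only in a comment\n"
    "       COPY ACQCPY01.\n"
    "           CALL 'ACQLOG' USING WS-MSG.\n"
)
print(extract_metadata(SAMPLE))
```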
**SQL Indexing:**
```yaml
Granularity: By table/view/procedure
Extracted metadata:
- Object name
- Columns and types
- Foreign keys
- Referencing procedures
```
**JCL Indexing:**
```yaml
Granularity: By JOB / STEP
Extracted metadata:
- JOB name
- Executed PGMs
- DD statements (datasets)
- Passed PARMs
- Dependencies (JCL INCLUDEs)
```
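The JOB / STEP granularity can be illustrated with a small line-based parser. This is a simplified sketch: it ignores continuation lines, in-stream data, and PROCs, and the job and dataset names in the sample are invented for the example.

```python
import re

STATEMENT = re.compile(r"^//(\S*)\s+(JOB|EXEC|DD)\b\s*(.*)$")
PGM = re.compile(r"\bPGM=([A-Z0-9@#$]+)")
DSN = re.compile(r"\bDSN=([A-Z0-9@#$.()+-]+)")


def parse_jcl(text: str) -> dict:
    """Return the job name and, per step, the executed program and datasets."""
    job = None
    steps: list[dict] = []
    for line in text.splitlines():
        if line.startswith("//*"):
            continue  # JCL comment statement
        match = STATEMENT.match(line)
        if not match:
            continue
        name, operation, operands = match.groups()
        if operation == "JOB":
            job = name
        elif operation == "EXEC":
            pgm = PGM.search(operands)
            steps.append({"step": name, "pgm": pgm.group(1) if pgm else None,
                          "datasets": []})
        elif steps:  # a DD statement belongs to the most recent step
            dsn = DSN.search(operands)
            if dsn:
                steps[-1]["datasets"].append(dsn.group(1))
    return {"job": job, "steps": steps}


SAMPLE_JCL = """\
//ACQDAILY JOB (ACCT),'DAILY CLEARING',CLASS=A
//* Nightly clearing run
//STEP01   EXEC PGM=ACQCLR01,PARM='FULL'
//INFILE   DD DSN=ACQ.PROD.TXN.INPUT,DISP=SHR
//OUTFILE  DD DSN=ACQ.PROD.TXN.OUTPUT,DISP=(NEW,CATLG)
//STEP02   EXEC PGM=IEFBR14
"""
print(parse_jcl(SAMPLE_JCL))
```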
#### 2.2.3 Context Builder
Assembles relevant context for LLM analysis:
```python
class ContextBuilder:
    def build_context(self, issue: ClassifiedIssue) -> AnalysisContext:
        # 1. Search programs mentioned in the issue
        mentioned_programs = self._search_by_keywords(issue.keywords)
        # 2. Find similar past issues and the fixes applied to them
        similar_issues = self._find_similar_issues(issue)
        # 3. Expand dependencies (COPYBOOKs, CALLs)
        dependencies = self._expand_dependencies(mentioned_programs)
        # 4. Get configured business rules
        business_rules = self._get_business_rules(issue.product)
        # 5. Build final context (fixed caps keep it within the token limit)
        return AnalysisContext(
            primary_code=mentioned_programs[:5],  # Max 5 main programs
            dependencies=dependencies[:10],       # Max 10 dependencies
            similar_fixes=similar_issues[:3],     # Max 3 examples
            business_rules=business_rules,
            total_tokens=self._count_tokens(),
        )
```
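Fixed caps such as "max 5 programs" bound the number of items but not their size. A token budget can be enforced explicitly with a greedy pass over the snippets in priority order. The sketch below is one possible approach, not the project's code: `estimate_tokens` uses a rough four-characters-per-token heuristic as an assumption, and a real implementation would use the tokenizer of the deployed model.

```python
def estimate_tokens(text: str) -> int:
    """Rough size estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def fit_to_budget(snippets: list[str], max_tokens: int) -> list[str]:
    """Keep snippets in priority order until the token budget is exhausted."""
    selected: list[str] = []
    used = 0
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > max_tokens:
            break  # stop at the first snippet that does not fit
        selected.append(snippet)
        used += cost
    return selected


snippets = ["A" * 40, "B" * 40, "C" * 40]  # about 10 tokens each
print(len(fit_to_budget(snippets, max_tokens=25)))
```

Stopping at the first oversized snippet preserves priority order; skipping it and trying smaller ones would pack more text but could drop the most relevant program.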
### 2.3 Fix Generation Engine
#### 2.3.1 LLM Engine
```yaml
Primary: Azure OpenAI GPT-4o (data does not leave Azure environment)
Fallback: Azure OpenAI GPT-4 Turbo
Gateway: LiteLLM (unified interface)
Configuration:
  temperature: 0.2  # Low for code
  max_tokens: 4096
  top_p: 0.95
```
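The primary/fallback arrangement amounts to trying each deployment in order with the same sampling parameters. The sketch below shows that control flow with stub callables in place of real model calls; it does not use the LiteLLM API, and `complete_with_fallback` and the stub functions are hypothetical names.

```python
from typing import Callable, Sequence

# Sampling parameters from the configuration above.
LLM_PARAMS = {"temperature": 0.2, "max_tokens": 4096, "top_p": 0.95}


def complete_with_fallback(
    prompt: str, backends: Sequence[tuple[str, Callable[..., str]]]
) -> tuple[str, str]:
    """Try each backend in order; return (backend name, completion)."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt, **LLM_PARAMS)
        except Exception as exc:  # in practice, catch the client's error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all LLM backends failed: " + "; ".join(errors))


def primary(prompt: str, **params) -> str:
    raise TimeoutError("primary deployment timed out")  # simulated outage


def fallback(prompt: str, **params) -> str:
    return f"analysis produced with temperature={params['temperature']}"


model, answer = complete_with_fallback(
    "Analyze the issue", [("gpt-4o", primary), ("gpt-4-turbo", fallback)]
)
print(model, "->", answer)
```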
**Note on GitHub Copilot:** The client has GitHub Copilot, but it is intended for developers working in the IDE. Copilot **does not have a public API** for integration into automated systems and **does not offer embedding/indexing functionality**, so the solution uses Azure OpenAI for all AI operations.
**COBOL Prompt Template:**
```
You are an expert in mainframe payment systems,
specifically ACI Acquirer (ACQ-MF) and Interchange (ICG-MF) products.
## System Context
{business_rules}
## Reported Issue
{issue_description}
## Current Code
{code_context}
## Similar Fix History
{similar_fixes}
## Task
Analyze the issue and:
1. Identify the probable root cause
2. Locate the affected program(s)
3. Propose a specific fix
4. Explain the impact of the change
## Rules
- Maintain COBOL-85 compatibility
- Preserve existing copybook structure
- Do not change interfaces with other systems without explicit mention
- Document all proposed changes
## Response Format
{response_format}
```
#### 2.3.2 Fix Validator
**COBOL Validations:**
```yaml
Syntax:
- Compilation with GnuCOBOL (syntax check)
- Verification of referenced copybooks
Semantics:
- CALLs to existing programs
- Variables declared before use
- Compatible PIC clauses
Style:
- Standard indentation (Area A/B)
- ACI naming conventions
- Mandatory comments
```
**SQL Validations:**
```yaml
- Syntax check with SQL parser
- Verification of existing tables/columns
- Performance analysis (EXPLAIN)
```
**JCL Validations:**
```yaml
- JCL syntax check
- Referenced datasets exist
- Referenced PGMs exist
```
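As an example of a style rule, the Area A/B check can be sketched for one case: division and section headers must begin in Area A (columns 8-11). The code below assumes fixed-format source and covers only headers that end in `DIVISION.` or `SECTION.`; `check_area_a` is a hypothetical name, and the GnuCOBOL compile step remains the authoritative syntax check.

```python
def check_area_a(source: str) -> list[str]:
    """Report division/section headers that do not start in Area A."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > 6 and line[6] in "*/":
            continue  # comment line: indicator in column 7
        code = line[7:72]  # columns 8-72: Area A followed by Area B
        text = code.strip()
        if not text.upper().endswith(("DIVISION.", "SECTION.")):
            continue
        indent = len(code) - len(code.lstrip())
        if indent >= 4:  # Area A is columns 8-11, i.e. an indent of 0-3
            problems.append(
                f"line {number}: '{text}' should start in Area A (columns 8-11)"
            )
    return problems


SAMPLE = (
    "       PROCEDURE DIVISION.\n"
    "           MAIN-LOGIC SECTION.\n"
    "           MOVE 1 TO WS-COUNT.\n"
)
print(check_area_a(SAMPLE))
```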
---
## 3. Repository Structure (AI Fork)
### 3.1 AI Fork Creation
```bash
# Proposed structure in Bitbucket
projects/
├── ACQ/
│   ├── ACQ-MF                # Base product (existing)
│   ├── ACQ-MF-safra-fork     # Client fork (existing)
│   └── ACQ-MF-safra-ai       # AI fork (NEW)
└── ICG/
    ├── ICG-MF                # Base product (existing)
    ├── ICG-MF-safra-fork     # Client fork (existing)
    └── ICG-MF-safra-ai       # AI fork (NEW)
```
### 3.2 Branch Flow
```
ACQ-MF-safra-fork (client)
        │ fork
        ▼
ACQ-MF-safra-ai (AI)
├── main (sync with client)
└── ai-fix/JIRA-1234-description
        │ Pull Request
        ▼
ACQ-MF-safra-fork (client)
        │ Review + Approve
        ▼
      merge
```
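Branch names of the form `ai-fix/JIRA-1234-description` can be derived from the issue key and summary. The helper below is a sketch under assumptions: the slug rules (lowercase, hyphen-separated, at most 40 characters) and the function name `ai_fix_branch` are not specified in this document.

```python
import re


def ai_fix_branch(issue_key: str, summary: str, max_slug_length: int = 40) -> str:
    """Build an 'ai-fix/<ISSUE>-<slug>' branch name from an issue summary."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    # Truncate, then drop a hyphen left dangling by the cut.
    slug = slug[:max_slug_length].rstrip("-")
    return f"ai-fix/{issue_key}-{slug}"


print(ai_fix_branch("JIRA-1234", "Fix S0C7 abend in ACQAUTH!"))
```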
### 3.3 Commit Convention
```
[AI-FIX] JIRA-1234: Short fix description

Problem:
- Original problem description

Solution:
- What was changed and why

Modified files:
- src/cobol/ACQAUTH.CBL (lines 1234-1256)

Confidence: 85%
Generated by: ACI JIRA AI Fixer v1.0
Co-authored-by: ai-fixer@aci.com
```
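A message in this convention can be assembled from structured fields so that every AI commit has the same shape. The formatter below is an illustrative sketch: the function name and parameters are hypothetical, and the blank line after the subject follows the usual Git convention for separating subject and body.

```python
def format_commit(issue_key: str, summary: str, problem: str, solution: str,
                  files: list[str], confidence: float,
                  version: str = "v1.0") -> str:
    """Render a commit message following the [AI-FIX] convention."""
    lines = [
        f"[AI-FIX] {issue_key}: {summary}",
        "",
        "Problem:",
        f"- {problem}",
        "",
        "Solution:",
        f"- {solution}",
        "",
        "Modified files:",
        *[f"- {path}" for path in files],
        "",
        f"Confidence: {confidence:.0%}",  # 0.85 renders as 85%
        f"Generated by: ACI JIRA AI Fixer {version}",
        "Co-authored-by: ai-fixer@aci.com",
    ]
    return "\n".join(lines)


message = format_commit(
    "JIRA-1234", "Short fix description",
    "Original problem description", "What was changed and why",
    ["src/cobol/ACQAUTH.CBL (lines 1234-1256)"], 0.85,
)
print(message)
```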
### 3.4 Recommended Permissions
| User/Group | ACQ-MF (base) | Client Fork | AI Fork |
|------------|---------------|-------------|---------|
| ai-fixer-svc | READ | READ | WRITE |
| devs-aci | WRITE | WRITE | READ |
| tech-leads | ADMIN | ADMIN | ADMIN |
---
## 4. Technology Stack
### 4.1 Backend
```yaml
Runtime: Python 3.11+
Framework: FastAPI
Async: asyncio + httpx
Queue: Redis 7+ with a Python job queue (Python-RQ or Celery)
Database: PostgreSQL 15+ (metadata, configurations, logs)
Vector DB: Qdrant 1.7+ (self-hosted)
Cache: Redis
```
### 4.2 Frontend (Admin Panel)
```yaml
Framework: React 18+ or Vue 3+
UI Kit: Tailwind CSS + shadcn/ui
State: React Query or Pinia
Build: Vite
```
### 4.3 Infrastructure
```yaml
Container: Docker + Docker Compose
Orchestration: Docker Swarm (initial) or Kubernetes (scale)
CI/CD: Bitbucket Pipelines
Reverse Proxy: Traefik or nginx
SSL: Let's Encrypt (DNS-01 challenge, since the service is not internet-facing) or internal CA
Monitoring: Prometheus + Grafana
Logs: ELK Stack or Loki
```
### 4.4 External Integrations
```yaml
LLM (Azure OpenAI - MANDATORY):
  Primary: Azure OpenAI GPT-4o
  Fallback: Azure OpenAI GPT-4 Turbo
  Region: Brazil South or East US
  Gateway: LiteLLM (natively supports Azure OpenAI)
  Compliance: Data not used for training, stays in Azure tenant
Embeddings (Azure OpenAI - MANDATORY):
  Model: Azure OpenAI text-embedding-3-large
  Alternative: Azure OpenAI text-embedding-ada-002
  Vector DB: Qdrant (self-hosted) or Azure AI Search
JIRA:
  API: REST API v2 (Server)
  Auth: Personal Access Token
Bitbucket:
  API: REST API 1.0 (Server)
  Auth: Personal Access Token
```
**⚠️ Note on GitHub Copilot:**
The client has GitHub Copilot licenses, but the tool **is not applicable** to this solution because it:
1. Is an IDE tool (code autocomplete), not an API
2. Has no public endpoint for programmatic integration
3. Does not offer embeddings or semantic search functionality
4. Does not allow indexing or querying code repositories

Developers will continue to use GitHub Copilot in their daily work, while the ACI AI Fixer solution uses Azure OpenAI for automation.

---
## 5. Security
### 5.1 Sensitive Data
```yaml
Source code:
- Processed in memory, not persisted to disk
- Embeddings stored in Qdrant (encrypted at-rest)
- Sanitized logs (no complete code)
Credentials:
- Vault (HashiCorp) or AWS Secrets Manager
- Automatic token rotation
- Access audit log
LLM and Embeddings:
- MANDATORY: Azure OpenAI (data does not leave Azure tenant)
- Data is not used to train Microsoft models
- Compliance with ACI corporate policies
- Brazil South region for lower latency
```
### 5.2 Network
```yaml
Deployment:
- Internal network (not exposed to internet)
- HTTPS/TLS 1.3 communication
- Firewall: only JIRA and Bitbucket can access webhooks
Authentication:
- Admin Panel: SSO via SAML/OIDC (integrate with ACI AD)
- API: JWT tokens with short expiration
- Webhooks: HMAC-SHA256 signature verification
```
### 5.3 Compliance
```yaml
Requirements:
- [ ] Data segregation by client/fork
- [ ] Complete audit trail (who, when, what)
- [ ] Configurable log retention
- [ ] Option for 100% on-premise processing
- [ ] Data flow documentation
```
---
## 6. Estimates
### 6.1 Development Timeline
| Phase | Duration | Deliverables |
|-------|----------|--------------|
| **1. Initial Setup** | 2 weeks | Infra, repos, basic CI/CD |
| **2. Integrations** | 3 weeks | JIRA webhook, Bitbucket connector |
| **3. Code Intelligence** | 4 weeks | COBOL/SQL/JCL indexing, embeddings |
| **4. Fix Engine** | 3 weeks | LLM integration, prompt engineering |
| **5. Output & PR** | 2 weeks | JIRA comments, Bitbucket PRs |
| **6. Admin Panel** | 2 weeks | Dashboard, configurations |
| **7. Tests & Adjustments** | 2 weeks | Validation with real issues |
| **Total MVP** | **18 weeks** | ~4.5 months |
### 6.2 Suggested Team
| Role | Quantity | Dedication |
|------|----------|------------|
| Tech Lead | 1 | 100% |
| Backend Developer | 2 | 100% |
| Frontend Developer | 1 | 50% |
| DevOps | 1 | 25% |
| **Total** | **5** | 3.75 FTE |
### 6.3 Monthly Operational Costs (Estimate)
| Item | Cost/Month |
|------|------------|
| LLM APIs (10 issues × ~$3/issue) | ~$30 |
| Infra (VPS/On-premise) | $200-500 |
| Vector DB (Qdrant self-hosted) | $0 (infra) |
| **Total** | **~$230-530/month** |
*Note: Low volume (5-10 issues/month) results in minimal operational cost.*

---
## 7. Technical Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| LLM generates incorrect fix | High | High | Mandatory human review, confidence score |
| Insufficient COBOL context | Medium | High | RAG with copybooks, fix examples |
| High latency | Low | Medium | Async queue, visual feedback |
| Bitbucket API rate limit | Low | Low | Aggressive cache, sparse checkout |
| Security (code exposure) | Medium | High | Azure OpenAI (mandatory); data stays within the Azure tenant |
---
## 8. Success Metrics
### 8.1 Technical KPIs
| Metric | MVP Target | 6-Month Target |
|--------|------------|----------------|
| Successful analysis rate | 80% | 95% |
| Accepted fixes (no modification) | 30% | 50% |
| Accepted fixes (with adjustments) | 50% | 70% |
| Average analysis time | < 5 min | < 2 min |
| System uptime | 95% | 99% |
### 8.2 Business KPIs
| Metric | Target |
|--------|--------|
| Initial analysis time reduction | 50% |
| Issues with useful suggestion | 70% |
| Team satisfaction | > 4/5 |
---
## 9. Next Steps
1. **Week 1-2:**
   - Provision development infrastructure
   - Create AI forks in Bitbucket
   - Configure JIRA webhooks (test environment)
2. **Week 3-4:**
   - Implement Bitbucket connector
   - Index code from 1 repository (ACQ-MF-safra-fork)
   - Test embeddings with 5 historical issues
3. **Week 5-6:**
   - Integrate LLM (Azure OpenAI GPT-4o)
   - Develop COBOL-specific prompts
   - Validate outputs with technical team
---
**Document prepared for technical review.**
*Contact: [Development Team]*