Context Slice

Purpose

This guide explains how to interpret CSV columns semantically: classify each column by what its data means, not by keyword matching on the header. A column named "Customer_Rating" is a score column even though its name doesn't contain the word "score."

Interpretation Process

For each column, examine three signals:

  1. Header name — What does the name suggest? Consider synonyms, abbreviations, domain conventions
  2. Sample values — What does the actual data look like? Numbers, dates, text patterns?
  3. Context hints — What column types were requested? Match against those categories
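
The three signals above can be gathered mechanically before any judgment is made. The following is a minimal sketch, assuming each column's values arrive as strings; the names `normalize_header` and `collect_signals` and the `sample_size` parameter are hypothetical. It only prepares the inputs and leaves the semantic decision to the interpreter.

```python
import re


def normalize_header(header):
    """Split a header into lowercase tokens, e.g. 'CustomerID' -> ['customer', 'id']."""
    # Insert a space at lower-to-upper boundaries so camelCase names split cleanly.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", header)
    # Split on anything that is not a letter or digit (underscores, spaces, dashes).
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", spaced) if tok]


def collect_signals(header, values, requested_types, sample_size=5):
    """Bundle the three signals for one column (hypothetical helper)."""
    return {
        "header_tokens": normalize_header(header),                      # signal 1
        "samples": [v for v in values if v.strip()][:sample_size],      # signal 2
        "requested_types": list(requested_types),                       # signal 3
    }
```

Tokenizing the header makes synonyms and abbreviations easier to compare, but the tokens are only evidence; the classification still depends on what the name and values mean together.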

Common Column Categories

Score Columns

Headers suggesting ratings, satisfaction, or numeric evaluations:

  • Direct: score, rating, nps, csat, satisfaction, stars
  • Indirect: sentiment, feedback_score, review_rating, customer_rating, health_score
  • Patterns: typically 1-10, 1-5, or 0-100 scales; sometimes -100 to 100 (NPS)
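
The scale patterns listed above can be checked against sample values. This is a minimal sketch, assuming values arrive as strings; `looks_like_score` and the scale labels it returns are hypothetical. A match only means the values are consistent with a common scale, not that the column is a score.

```python
def looks_like_score(values):
    """Return the narrowest common scale that contains every value, or None."""
    nums = []
    for v in values:
        try:
            nums.append(float(v))
        except ValueError:
            return None  # a non-numeric value rules out a numeric scale
    if not nums:
        return None
    lo, hi = min(nums), max(nums)
    # Scales from the guide, ordered narrowest first.
    scales = [
        ("1-5", 1, 5),
        ("1-10", 1, 10),
        ("0-100", 0, 100),
        ("-100 to 100", -100, 100),
    ]
    for name, scale_min, scale_max in scales:
        if scale_min <= lo and hi <= scale_max:
            return name
    return None
```

Values such as 42, 87, 15 fit the 0-100 scale but could equally be amounts or counts, so the header and the requested types still need to agree before the column is treated as a score.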

Customer/Account Columns

Headers identifying entities being measured:

  • Direct: customer, account, company, client, organization, tenant
  • Indirect: customer_name, account_id, company_name, org, business_name
  • Patterns: identifiers, names, or codes that are unique per entity but may repeat across rows

Date/Time Columns

Headers indicating when something occurred:

  • Direct: date, time, timestamp, datetime, created, updated
  • Indirect: created_at, submitted_on, closed_date, renewal_date, start_date, end_date
  • Patterns: ISO dates, US dates (MM/DD/YYYY), epoch timestamps, relative dates
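
The date patterns above can be probed on a single sample value. This is a minimal sketch; `detect_date_format` and its labels are hypothetical. Treating 10- or 13-digit integers as epoch timestamps and the small set of relative phrases are assumptions, not rules from this guide.

```python
import re
from datetime import datetime

# (label, strptime format) pairs tried in order.
_STRPTIME_FORMATS = [
    ("iso", "%Y-%m-%d"),
    ("iso", "%Y-%m-%dT%H:%M:%S"),
    ("us", "%m/%d/%Y"),
]
# Assumed relative phrases; real data may use others.
_RELATIVE = re.compile(
    r"(today|yesterday|tomorrow|\d+ (day|week|month|year)s? ago)", re.IGNORECASE
)


def detect_date_format(value):
    """Return 'iso', 'us', 'epoch', 'relative', or None for one sample value."""
    v = value.strip()
    for label, fmt in _STRPTIME_FORMATS:
        try:
            datetime.strptime(v, fmt)
            return label
        except ValueError:
            continue
    # Assumption: 10 digits = epoch seconds, 13 digits = epoch milliseconds.
    if re.fullmatch(r"\d{10}|\d{13}", v):
        return "epoch"
    if _RELATIVE.fullmatch(v):
        return "relative"
    return None
```

A value like 01/02/2024 parses as a US date here but could be day-first in other locales, so checking several samples (for example, one with a day above 12) is safer than relying on a single value.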

Amount/Value Columns

Headers indicating monetary or quantity values:

  • Direct: amount, value, revenue, price, cost, total, sum
  • Indirect: deal_value, arr, mrr, contract_value, order_total, spend
  • Patterns: numbers, often with currency symbols ($, €), and often large in magnitude
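
Amount values often need light parsing before they can be compared. This is a minimal sketch, assuming a leading currency symbol and comma thousands separators as in "$50,000"; `parse_amount` is a hypothetical helper, and formats that use a comma as the decimal separator are deliberately not handled.

```python
import re

# Optional leading symbol, then either comma-grouped digits or plain digits,
# with an optional decimal part.
_AMOUNT = re.compile(r"([$€])?\s*(-?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)")


def parse_amount(value):
    """Return (number, currency_symbol_or_None), or None if value is not an amount."""
    match = _AMOUNT.fullmatch(value.strip())
    if match is None:
        return None
    symbol, digits = match.groups()
    return float(digits.replace(",", "")), symbol
```

A present currency symbol is strong evidence for an amount column; a bare number such as 42 parses successfully but says little on its own.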

Category/Type Columns

Headers indicating classification or grouping:

  • Direct: category, type, status, priority, tier, segment, stage
  • Indirect: issue_type, ticket_category, deal_stage, customer_tier, severity
  • Patterns: limited set of repeated string values (low cardinality)
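
Low cardinality can be estimated from the ratio of distinct values to non-empty values. This is a minimal sketch; `is_low_cardinality` and both thresholds (`max_ratio`, `max_distinct`) are assumptions chosen for illustration and would need tuning for real data.

```python
def is_low_cardinality(values, max_ratio=0.5, max_distinct=20):
    """True when a column has few distinct values relative to its row count."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return False
    distinct = len(set(non_empty))
    # Both conditions must hold: few distinct values overall, and values repeat.
    return distinct <= max_distinct and distinct / len(non_empty) <= max_ratio
```

On very small samples the ratio is noisy (three rows with three statuses looks fully unique), so the check is more reliable when run over the whole column than over a handful of sample rows.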

Person/Rep Columns

Headers identifying people (employees, owners, assignees):

  • Direct: rep, owner, assignee, agent, manager, employee
  • Indirect: sales_rep, account_owner, assigned_to, created_by, handled_by
  • Patterns: names or email addresses, limited unique values

Campaign/Source Columns

Headers indicating origin or attribution:

  • Direct: campaign, source, channel, medium, referrer
  • Indirect: utm_source, lead_source, marketing_campaign, acquisition_channel
  • Patterns: campaign names, channel codes, UTM parameters

Ticket/Issue Columns

Headers related to support cases:

  • Direct: ticket, issue, case, incident, request
  • Indirect: ticket_id, case_number, issue_description, support_request
  • Patterns: IDs (often numeric), or text descriptions

Usage/Activity Columns

Headers indicating engagement or usage metrics:

  • Direct: usage, logins, sessions, active, engagement
  • Indirect: login_count, daily_active, feature_usage, page_views, api_calls
  • Patterns: numeric counts, often integers

Confidence Levels

Assign confidence based on signal strength:

  • High: Header clearly indicates type AND sample values confirm it
  • Medium: Header suggests type OR sample values match, but not both
  • Low: Weak signals, ambiguous header, values could fit multiple types
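
The three levels above map onto a small decision function. This is a minimal sketch; `assign_confidence` and its parameters are hypothetical, and `candidate_count` is assumed to be the number of types that remain plausible after the header and values have both been considered.

```python
def assign_confidence(header_matches, values_match, candidate_count=1):
    """Map signal strength to 'high', 'medium', or 'low'."""
    # Values or header that fit several types are ambiguous, so confidence is low.
    if candidate_count > 1:
        return "low"
    if header_matches and values_match:
        return "high"
    if header_matches or values_match:
        return "medium"
    return "low"
```

For example, a header "Value" with samples 42, 87, 15 leaves both amount and score plausible, so it would be called with `candidate_count=2` and come back as low.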

Output Format

Update the parsed CSV output file with interpretation metadata:

{
  "source": "original file path",
  "rowCount": N,
  "columns": ["header1", "header2", ...],
  "interpretations": {
    "header1": {
      "type": "score|customer|date|amount|category|person|campaign|ticket|usage|text",
      "confidence": "high|medium|low",
      "reasoning": "Brief explanation of why this classification"
    }
  },
  "detected": {
    "scoreColumns": ["columns identified as scores"],
    "customerColumns": ["columns identified as customers"],
    "dateColumns": ["columns identified as dates"]
  },
  "possibleAnalyses": ["what analyses this data supports based on detected columns"],
  "data": [all rows from parsed CSV]
}

The detected object should include one key for each column type requested in the requirements.
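
The detected object can be derived from the interpretations and the requested types. This is a minimal sketch; `build_detected` is a hypothetical helper, and the `<type>Columns` key pattern for types other than score, customer, and date is an assumption extrapolated from the three keys shown in the template.

```python
def build_detected(interpretations, requested_types):
    """Group column headers by interpreted type, one key per requested type."""
    # Start with an empty list per requested type so every requested key exists.
    detected = {f"{t}Columns": [] for t in requested_types}
    for header, info in interpretations.items():
        key = f"{info['type']}Columns"
        if key in detected:  # ignore types that were not requested
            detected[key].append(header)
    return detected


interpretations = {
    "CSAT_Score": {"type": "score", "confidence": "high", "reasoning": "1-5 ratings"},
    "Account": {"type": "customer", "confidence": "high", "reasoning": "company names"},
    "Status": {"type": "category", "confidence": "high", "reasoning": "few repeated values"},
}
detected = build_detected(interpretations, ["score", "customer", "date"])
```

Requested types with no matching column keep an empty list, which lets downstream steps distinguish "requested but not found" from "not requested".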

Handling Ambiguity

When a column could fit multiple types:

  1. Prefer the type explicitly mentioned in requirements
  2. Let sample values break ties (dates look like dates, scores look like scores)
  3. If still ambiguous, note it and pick the most likely based on context
  4. Flag low confidence so downstream analysis can ask for clarification
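
The four steps above can be sketched as a single function. This is a minimal sketch, assuming `candidates` is ordered most likely first; `resolve_ambiguity`, its parameters, and the returned keys (`confidence_cap`, `alternatives`) are hypothetical names.

```python
def resolve_ambiguity(candidates, requested_types, value_supported):
    """Pick one type from several candidates and report whether doubt remains."""
    if not candidates:
        raise ValueError("at least one candidate type is required")
    # Step 1: prefer candidates explicitly mentioned in the requirements.
    requested = [c for c in candidates if c in requested_types]
    pool = requested or list(candidates)
    # Step 2: let sample values break ties where they support only some candidates.
    confirmed = [c for c in pool if c in value_supported]
    pool = confirmed or pool
    # Steps 3 and 4: pick the most likely, note the alternatives, flag low confidence.
    ambiguous = len(pool) > 1
    return {
        "type": pool[0],
        "confidence_cap": "low" if ambiguous else None,
        "alternatives": pool[1:],
    }
```

Returning the alternatives alongside the chosen type gives downstream analysis something concrete to ask about when it requests clarification.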

Examples

| Header | Sample Values | Likely Type | Confidence |
|---|---|---|---|
| CSAT_Score | 4, 5, 3, 5, 4 | score | high |
| Customer_Rating | 8.5, 9.0, 7.2 | score | high |
| Account | "Acme Corp", "Widget Inc" | customer | high |
| CustomerID | 10042, 10043, 10044 | customer | high |
| Created | 2024-01-15, 2024-01-16 | date | high |
| Status | "Open", "Closed", "Pending" | category | high |
| Owner | "John Smith", "Jane Doe" | person | high |
| Amount | $50,000, $75,000 | amount | high |
| Notes | "Customer requested..." | text | medium |
| Value | 42, 87, 15 | amount OR score | low (needs context) |