CSV Column Interpretation Guide

Context Slice

Purpose

This guide helps interpret CSV columns semantically: by what the data means, not by keyword matching. A column named "Customer_Rating" is a score column even though its name doesn't contain the word "score."

Interpretation Process

For each column, examine three signals:

Header name: What does the name suggest? Consider synonyms, abbreviations, and domain conventions.
Sample values: What does the actual data look like (numbers, dates, text patterns)?
Context hints: Which column types were requested? Match against those categories.
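
As a rough sketch, the three signals can be gathered per column before any interpretation happens. The helper name collect_signals, the sample_size parameter, and the output field names are hypothetical, chosen only for illustration. Deciding what the signals mean remains a semantic judgment that this code does not attempt.

```python
import csv
import io
import re

def collect_signals(csv_text, requested_types, sample_size=5):
    """Gather header words, sample values, and context hints per column.

    Hypothetical helper: it only collects evidence, it does not classify.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    signals = {}
    for header in reader.fieldnames or []:
        # Split "Customer_Rating" or "createdAt" into lowercase words.
        spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", header)
        words = [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]
        # Keep the first few non-empty values as samples.
        samples = [row[header] for row in rows if row[header]][:sample_size]
        signals[header] = {
            "header_words": words,
            "sample_values": samples,
            "requested_types": list(requested_types),
        }
    return signals
```

Splitting the header into words makes synonyms and abbreviations easier to inspect, but it is not a substitute for reading the name in its domain context.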

Common Column Categories

Score Columns

Headers suggesting ratings, satisfaction, or numeric evaluations:

Direct: score, rating, nps, csat, satisfaction, stars
Indirect: sentiment, feedback_score, review_rating, customer_rating, health_score
Patterns: typically 1-10, 1-5, or 0-100 scales; sometimes -100 to 100 (NPS)
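
The scale patterns above can be checked mechanically. The following is a minimal sketch, assuming sample values arrive as strings. The name guess_score_scale and the order in which scales are tried (narrowest first) are assumptions. A match only shows that the values are compatible with a score scale, not that the column is a score.

```python
# Scales named in this guide, ordered narrowest first (ordering is an assumption).
SCALES = [
    ("1-5", 1, 5),
    ("1-10", 1, 10),
    ("0-100", 0, 100),
    ("-100 to 100", -100, 100),  # NPS-style range
]

def guess_score_scale(values):
    """Return the narrowest listed scale that contains every value, else None."""
    numbers = []
    for value in values:
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            return None  # a non-numeric sample rules out a numeric scale
    if not numbers:
        return None
    lowest, highest = min(numbers), max(numbers)
    for name, low, high in SCALES:
        if low <= lowest and highest <= high:
            return name
    return None
```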

Customer/Account Columns

Headers identifying entities being measured:

Direct: customer, account, company, client, organization, tenant
Indirect: customer_name, account_id, company_name, org, business_name
Patterns: identifiers, names, or codes that are unique to one entity but may repeat across rows

Date/Time Columns

Headers indicating when something occurred:

Direct: date, time, timestamp, datetime, created, updated
Indirect: created_at, submitted_on, closed_date, renewal_date, start_date, end_date
Patterns: ISO dates, US dates (MM/DD/YYYY), epoch timestamps, relative dates
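
A minimal sketch of testing a sample value against some of these patterns follows. The function name detect_date_format is hypothetical, and the epoch rule (10 digits for seconds, 13 for milliseconds) is an assumption. It covers date-only ISO strings, US dates, and epoch timestamps; relative dates such as "3 days ago" are not handled.

```python
from datetime import datetime

def detect_date_format(value):
    """Return "iso", "us", "epoch", or None for a single sample value."""
    text = value.strip()
    for label, fmt in (("iso", "%Y-%m-%d"), ("us", "%m/%d/%Y")):
        try:
            datetime.strptime(text, fmt)
            return label
        except ValueError:
            pass  # not this format, try the next one
    # Assumption: 10 digits = epoch seconds, 13 digits = epoch milliseconds.
    if text.isdigit() and len(text) in (10, 13):
        return "epoch"
    return None
```

The digit-length rule keeps short numeric IDs such as 10042 from being mistaken for timestamps, though a 10-digit ID would still match, so the header remains the deciding signal.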

Amount/Value Columns

Headers indicating monetary or quantity values:

Direct: amount, value, revenue, price, cost, total, sum
Indirect: deal_value, arr, mrr, contract_value, order_total, spend
Patterns: numbers often with currency symbols ($, €) or large values
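
One way to test for this pattern is to strip the currency symbol and try to parse what remains. The sketch below is illustrative only: parse_amount is a hypothetical name, only the two symbols listed above are recognized, and commas are assumed to be thousands separators (US-style formatting).

```python
import re
from decimal import Decimal, InvalidOperation

def parse_amount(value):
    """Return (number or None, had_currency_symbol) for one sample value."""
    text = value.strip()
    has_symbol = bool(re.search(r"[$€]", text))
    # Assumption: commas are thousands separators, not decimal marks.
    cleaned = re.sub(r"[$€,\s]", "", text)
    try:
        return Decimal(cleaned), has_symbol
    except InvalidOperation:
        return None, has_symbol
```

A currency symbol is strong evidence for an amount column. A bare number such as 42 parses successfully but could equally be a score or a count.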

Category/Type Columns

Headers indicating classification or grouping:

Direct: category, type, status, priority, tier, segment, stage
Indirect: issue_type, ticket_category, deal_stage, customer_tier, severity
Patterns: limited set of repeated string values (low cardinality)
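
Low cardinality can be estimated from the sample values. In this sketch the function name and both thresholds (max_distinct and max_ratio) are arbitrary assumptions that would need tuning against real data.

```python
def is_low_cardinality(values, max_distinct=10, max_ratio=0.5):
    """True when few distinct values account for many rows.

    Both thresholds are illustrative defaults, not values from this guide.
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return False
    distinct = len(set(non_empty))
    return distinct <= max_distinct and distinct / len(non_empty) <= max_ratio
```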

Person/Rep Columns

Headers identifying people (employees, owners, assignees):

Direct: rep, owner, assignee, agent, manager, employee
Indirect: sales_rep, account_owner, assigned_to, created_by, handled_by
Patterns: names or email addresses, limited unique values

Campaign/Source Columns

Headers indicating origin or attribution:

Direct: campaign, source, channel, medium, referrer
Indirect: utm_source, lead_source, marketing_campaign, acquisition_channel
Patterns: campaign names, channel codes, UTM parameters

Ticket/Issue Columns

Headers related to support cases:

Direct: ticket, issue, case, incident, request
Indirect: ticket_id, case_number, issue_description, support_request
Patterns: IDs (often numeric), or text descriptions

Usage/Activity Columns

Headers indicating engagement or usage metrics:

Direct: usage, logins, sessions, active, engagement
Indirect: login_count, daily_active, feature_usage, page_views, api_calls
Patterns: numeric counts, often integers

Confidence Levels

Assign confidence based on signal strength:

High: Header clearly indicates type AND sample values confirm it
Medium: Header suggests type OR sample values match, but not both
Low: Weak signals, ambiguous header, values could fit multiple types
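
These rules can be written as a small decision function. This is a sketch under assumptions: the function name and the candidate_types parameter (how many types the values could plausibly fit) are hypothetical.

```python
def assign_confidence(header_suggests, values_confirm, candidate_types=1):
    """Map signal strength to "high", "medium", or "low"."""
    if candidate_types > 1:
        return "low"  # values could fit multiple types
    if header_suggests and values_confirm:
        return "high"  # header and samples agree
    if header_suggests or values_confirm:
        return "medium"  # only one signal supports the type
    return "low"  # weak signals on both sides
```

For a header such as "Value" with samples 42, 87, 15, both amount and score remain plausible, so candidate_types would be 2 and the result "low".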

Output Format

Update the parsed CSV output file with interpretation metadata:

{
  "source": "original file path",
  "rowCount": N,
  "columns": ["header1", "header2", ...],
  "interpretations": {
    "header1": {
      "type": "score|customer|date|amount|category|person|campaign|ticket|usage|text",
      "confidence": "high|medium|low",
      "reasoning": "Brief explanation of why this classification"
    }
  },
  "detected": {
    "scoreColumns": ["columns identified as scores"],
    "customerColumns": ["columns identified as customers"],
    "dateColumns": ["columns identified as dates"]
  },
  "possibleAnalyses": ["what analyses this data supports based on detected columns"],
  "data": [all rows from parsed CSV]
}

The detected object should include one key for each column type requested in requirements, following the naming shown above (for example, scoreColumns for the score type).
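
A minimal sketch of deriving the detected object from the interpretations follows. It assumes the key naming follows the pattern in the template (type name plus "Columns"); the function name build_detected is hypothetical.

```python
def build_detected(interpretations, requested_types):
    """Group column headers under one key per requested type."""
    # Assumption: keys are named "<type>Columns", as in the template above.
    detected = {f"{t}Columns": [] for t in requested_types}
    for header, info in interpretations.items():
        key = f"{info['type']}Columns"
        if key in detected:  # ignore types that were not requested
            detected[key].append(header)
    return detected
```

Requested types with no matching column keep an empty list, which makes a missing column visible to downstream analysis.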

Handling Ambiguity

When a column could fit multiple types:

Prefer the type explicitly mentioned in requirements
Let sample values break ties (dates look like dates, scores look like scores)
If still ambiguous, note it and pick the most likely based on context
Flag low confidence so downstream analysis can ask for clarification
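
The tie-breaking order above can be sketched as follows. The function name, its parameters, and the convention that candidates are listed most likely first are all assumptions made for illustration.

```python
def resolve_type(candidates, requested_types, value_matches):
    """Pick one type and report whether the choice is still ambiguous.

    candidates: plausible types, most likely first (from context).
    requested_types: types named in the requirements.
    value_matches: types the sample values are compatible with.
    """
    if not candidates:
        raise ValueError("at least one candidate type is required")
    # Step 1: prefer types explicitly mentioned in requirements.
    preferred = [t for t in candidates if t in requested_types]
    pool = preferred or list(candidates)
    # Step 2: let sample values break remaining ties.
    if len(pool) > 1:
        narrowed = [t for t in pool if t in value_matches]
        pool = narrowed or pool
    # Steps 3 and 4: pick the most likely and flag leftover ambiguity.
    return pool[0], len(pool) > 1
```

When the second element of the result is True, the column should be recorded with low confidence so that downstream analysis can ask for clarification.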

Examples

Header	Sample Values	Likely Type	Confidence
CSAT_Score	4, 5, 3, 5, 4	score	high
Customer_Rating	8.5, 9.0, 7.2	score	high
Account	"Acme Corp", "Widget Inc"	customer	high
CustomerID	10042, 10043, 10044	customer	high
Created	2024-01-15, 2024-01-16	date	high
Status	"Open", "Closed", "Pending"	category	high
Owner	"John Smith", "Jane Doe"	person	high
Amount	"$50,000", "$75,000"	amount	high
Notes	"Customer requested..."	text	medium
Value	42, 87, 15	amount OR score	low (needs context)

Slice Info

Description

Semantic guidance for interpreting CSV column types based on headers and sample values

Tokens

1,239

Used By

Data Utilities skill

Analyze A/B Test Results task

Analyze Adoption Patterns task

slice:stdlib.csv.interpretation