# Parse and Interpret CSV
Parse a CSV file into JSON, interpret the semantic type of each column against the column types requested in the parent task's context, validate the data quality, and write the combined interpretation and validation results to the output file. If validation finds errors, ask the user how to proceed before any downstream analysis. The full procedure is defined in the Steps below.
To run this task you must have the following required information:
> - CSV file path to parse
> - Column type hints (e.g., "scores, customers, dates, categories")
> - Output file path for the interpreted data

If you do not have all of this information, exit here and respond asking for whatever is missing, along with instructions to run this task again with ALL required information.
---
You MUST use a todo list to complete these steps in order. Never move on to a step until you have completed the previous one. If you have multiple read steps in a row, read them all at once (in parallel).
Add all steps to your todo list now and begin executing.
## Steps
1. [Gather Arguments: Parse CSV] The next step has the following requirements for its arguments; do not proceed until you have all of the required information:
- `inputPath`: the CSV file path from the requirements
- `outputPath`: the output file path from the requirements
- `hasHeaders` (default: `"true"`): whether the first row contains headers (`true` or `false`)
- `delimiter`: the field delimiter (auto-detected if empty)
- Packages: `papaparse`
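Purely for illustration, a hypothetical set of gathered arguments might look like the following (the file paths are placeholders, not values supplied by this task):
```json
{
  "inputPath": "./data/sales.csv",
  "outputPath": "./output/sales-parsed.json",
  "hasHeaders": "true",
  "delimiter": ""
}
```
Leaving `delimiter` empty defers to auto-detection, per the argument description above.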
2. [Run Code: Parse CSV]: Call `run_script` with:
```json
{
"file": {
"path": https://sk.ills.app/code/stdlib.csv.parse/preview,
"args": [
"inputPath",
"outputPath",
"hasHeaders",
"delimiter"
]
},
"packages": ["papaparse"]
}
```
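The script writes the parsed rows to `outputPath` as an array of row objects keyed by header name. As a sketch only, assuming the hypothetical `sales.csv` above, the output file might contain something like this (the column names and values are invented, and whether values arrive as strings or numbers depends on the parsing script):
```json
[
  { "order_id": "1001", "order_date": "2024-03-01", "customer": "Acme Corp", "amount": "250.00" },
  { "order_id": "1002", "order_date": "2024-03-02", "customer": "Globex", "amount": "99.50" }
]
```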
3. [Read CSV Column Interpretation Guide]: Read the documentation in: `./skills/sauna/[skill_id]/references/stdlib.csv.interpretation.md` (Semantic column interpretation guidance)
4. [Interpret Columns]: Read the JSON file at the output path. It contains an array of row objects from the parsed CSV.
Extract the column names from the keys of the first row object.
For each column, apply the interpretation guide:
1. Examine the header name for semantic meaning
2. Look at sample values from the data
3. Match against the column types requested in the parent task's context
Construct the interpretation output (do not write yet):
```
{
  "source": "[CSV file path from context]",
  "rowCount": [number of rows],
  "columns": [list of column names],
  "interpretations": { [column]: { "type": "...", "confidence": "...", "reasoning": "..." } },
  "detected": { "[type]Columns": [columns matching each requested type] },
  "possibleAnalyses": [what analyses this data supports],
  "data": [original row array]
}
```
Log a summary of detected column types.
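As a non-authoritative sketch, the assembled interpretation output for the hypothetical sales data above might look like this (the column names, types, confidence labels, and analyses are illustrative assumptions, and `data` is truncated to one row for brevity):
```json
{
  "source": "./data/sales.csv",
  "rowCount": 2,
  "columns": ["order_id", "order_date", "customer", "amount"],
  "interpretations": {
    "order_id": { "type": "id", "confidence": "high", "reasoning": "Header names an identifier and values are unique" },
    "order_date": { "type": "date", "confidence": "high", "reasoning": "Values parse as ISO dates" },
    "customer": { "type": "category", "confidence": "medium", "reasoning": "Repeating free-text labels" },
    "amount": { "type": "amount", "confidence": "high", "reasoning": "Numeric values with two-decimal precision" }
  },
  "detected": { "dateColumns": ["order_date"], "amountColumns": ["amount"], "idColumns": ["order_id"] },
  "possibleAnalyses": ["revenue over time", "top customers by total amount"],
  "data": [{ "order_id": "1001", "order_date": "2024-03-01", "customer": "Acme Corp", "amount": "250.00" }]
}
```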
5. [Read CSV Data Validation Guide]: Read the documentation in: `./skills/sauna/[skill_id]/references/shared.csv.validation.md` (Data quality validation rules)
6. [Validate Data Quality]: Validate the data quality using the validation guide:
1. Check each column for missing values (nulls, empty strings)
2. For date columns, verify formats are parseable; flag invalid dates with row numbers
3. If an ID column exists, check for duplicates
4. Flag suspicious values: negative amounts, future dates in historical columns, percentages outside 0-100
Add a "validation" object to the output with findings:
```
{
  "validation": {
    "status": "clean | warnings | errors",
    "issues": [{ "type": "...", "column": "...", "severity": "...", "count": N, "rows": [...], "message": "..." }],
    "summary": "Brief assessment"
  }
}
```
Write the complete output (interpretations + validation) to the output file.
If any errors are found, ask the user how to proceed before continuing with downstream analysis.
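For reference, a hypothetical validation block for data with a couple of missing amounts and one duplicated ID might look like the following sketch (the issue types and severity labels are illustrative, not a fixed vocabulary):
```json
{
  "validation": {
    "status": "warnings",
    "issues": [
      { "type": "missing_values", "column": "amount", "severity": "warning", "count": 2, "rows": [14, 87], "message": "2 rows have an empty amount" },
      { "type": "duplicate_id", "column": "order_id", "severity": "warning", "count": 1, "rows": [42], "message": "order_id 1042 appears more than once" }
    ],
    "summary": "Mostly clean; a few missing amounts and one duplicate ID"
  }
}
```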