# Parse and Interpret CSV
Parse a CSV file into JSON, interpret the semantic type of each column against the column types requested in the parent task's context, validate the data quality, and write the combined interpretation and validation results to the output file. If validation finds errors, ask the user how to proceed before any downstream analysis. The full procedure is defined in the Steps below.
To run this task you must have the following required information:
> - CSV file path to parse
> - Column type hints (e.g., "scores, customers, dates, categories")
> - Output file path for the interpreted data

If you do not have all of this information, exit here and respond asking for whatever is missing, along with instructions to run this task again with ALL required information.
---
You MUST use a todo list to complete these steps in order. Never move on to a step until you have completed the previous one. If you have multiple read steps in a row, read them all at once (in parallel).
Add all steps to your todo list now and begin executing.
## Steps
1. [Gather Arguments: Parse CSV] The next step has the following requirements for its arguments; do not proceed until you have all of the required information:
- `inputPath`: the CSV file path from the requirements
- `outputPath`: the output file path from the requirements
- `hasHeaders` (default: `"true"`): whether the first row contains headers (`true` or `false`)
- `delimiter`: the field delimiter (auto-detected if empty)
- Packages: `papaparse`
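Purely for illustration, a hypothetical set of gathered arguments might look like the following (the file paths are placeholders, not values supplied by this task):
```json
{
  "inputPath": "./data/sales.csv",
  "outputPath": "./output/sales-parsed.json",
  "hasHeaders": "true",
  "delimiter": ""
}
```
Leaving `delimiter` empty defers to auto-detection, per the argument description above.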
2. [Run Code: Parse CSV]: Call `run_script` with:
```json
{
"file": {
"path": https://sk.ills.app/code/stdlib.csv.parse/preview,
"args": [
"inputPath",
"outputPath",
"hasHeaders",
"delimiter"
]
},
"packages": ["papaparse"]
}
```
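The script writes the parsed rows to `outputPath` as an array of row objects keyed by header name. As a sketch only, assuming the hypothetical `sales.csv` above, the output file might contain something like this (the column names and values are invented, and whether values arrive as strings or numbers depends on the parsing script):
```json
[
  { "order_id": "1001", "order_date": "2024-03-01", "customer": "Acme Corp", "amount": "250.00" },
  { "order_id": "1002", "order_date": "2024-03-02", "customer": "Globex", "amount": "99.50" }
]
```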
3. [Read CSV Column Interpretation Guide]: Read the documentation in: `./skills/sauna/[skill_id]/references/stdlib.csv.interpretation.md` (Semantic column interpretation guidance)
4. [Interpret Columns]: Read the JSON file at the output path. It contains an array of row objects from the parsed CSV.
Extract the column names from the keys of the first row object.
For each column, apply the interpretation guide:
1. Examine the header name for semantic meaning
2. Look at sample values from the data
3. Match against the column types requested in the parent task's context
Construct the interpretation output (do not write yet):
```
{
  "source": "[CSV file path from context]",
  "rowCount": [number of rows],
  "columns": [list of column names],
  "interpretations": { [column]: { "type": "...", "confidence": "...", "reasoning": "..." } },
  "detected": { "[type]Columns": [columns matching each requested type] },
  "possibleAnalyses": [what analyses this data supports],
  "data": [original row array]
}
```
Log a summary of detected column types.
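As a non-authoritative sketch, the assembled interpretation output for the hypothetical sales data above might look like this (the column names, types, confidence labels, and analyses are illustrative assumptions, and `data` is truncated to one row for brevity):
```json
{
  "source": "./data/sales.csv",
  "rowCount": 2,
  "columns": ["order_id", "order_date", "customer", "amount"],
  "interpretations": {
    "order_id": { "type": "id", "confidence": "high", "reasoning": "Header names an identifier and values are unique" },
    "order_date": { "type": "date", "confidence": "high", "reasoning": "Values parse as ISO dates" },
    "customer": { "type": "category", "confidence": "medium", "reasoning": "Repeating free-text labels" },
    "amount": { "type": "amount", "confidence": "high", "reasoning": "Numeric values with two-decimal precision" }
  },
  "detected": { "dateColumns": ["order_date"], "amountColumns": ["amount"], "idColumns": ["order_id"] },
  "possibleAnalyses": ["revenue over time", "top customers by total amount"],
  "data": [{ "order_id": "1001", "order_date": "2024-03-01", "customer": "Acme Corp", "amount": "250.00" }]
}
```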
5. [Read CSV Data Validation Guide]: Read the documentation in: `./skills/sauna/[skill_id]/references/shared.csv.validation.md` (Data quality validation rules)
6. [Validate Data Quality]: Validate the data quality using the validation guide:
1. Check each column for missing values (nulls, empty strings)
2. For date columns, verify formats are parseable; flag invalid dates with row numbers
3. If an ID column exists, check for duplicates
4. Flag suspicious values: negative amounts, future dates in historical columns, percentages outside 0-100
Add a "validation" object to the output with findings:
```
{
  "validation": {
    "status": "clean | warnings | errors",
    "issues": [{ "type": "...", "column": "...", "severity": "...", "count": N, "rows": [...], "message": "..." }],
    "summary": "Brief assessment"
  }
}
```
Write the complete output (interpretations + validation) to the output file.
If any errors are found, ask the user how to proceed before continuing with downstream analysis.
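For reference, a hypothetical validation block for data with a couple of missing amounts and one duplicated ID might look like the following sketch (the issue types and severity labels are illustrative, not a fixed vocabulary):
```json
{
  "validation": {
    "status": "warnings",
    "issues": [
      { "type": "missing_values", "column": "amount", "severity": "warning", "count": 2, "rows": [14, 87], "message": "2 rows have an empty amount" },
      { "type": "duplicate_id", "column": "order_id", "severity": "warning", "count": 1, "rows": [42], "message": "order_id 1042 appears more than once" }
    ],
    "summary": "Mostly clean; a few missing amounts and one duplicate ID"
  }
}
```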