Clean CSV Data

Requirements
CSV data to clean (file or pasted content)

To run this task you must have the following required information:

> CSV data to clean (file or pasted content)

If you don't have all of this information, stop here and respond with a request for the missing information, plus instructions to run this task again with ALL required information.

---

You MUST use a todo list to complete these steps in order. Never move on to a step until the previous step is complete. If you have multiple read steps in a row, read them all at once (in parallel).

Add all steps to your todo list now and begin executing.

## Steps

1. [Read CSV Transformation Guide]: Read the documentation in: `./skills/sauna/[skill_id]/references/data.csv.guide.md`

2. Get the CSV data and understand the issues:

DATA
- CSV file or pasted content
- What should this data represent?

KNOWN ISSUES
- Specific problems the user has noticed
- Columns with issues
- Rows to filter out

DESIRED OUTPUT
- What does "clean" mean for this data?
- Keep or remove duplicates?
- How to handle missing values?


3. Analyze the data and identify issues:

STRUCTURE
- Row count
- Column count and names
- Apparent data types per column

QUALITY ISSUES
- Missing values (which columns, how many)
- Duplicate rows (count)
- Formatting inconsistencies
- Potential data type problems

Present a summary of findings and recommended fixes.
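
One way to gather the structure and quality figures listed above is a short profiling pass. The sketch below uses only Python's standard library. `profile_csv` and `guess_type` are hypothetical helper names, and the code assumes the first row is a header and that "missing" means an empty or whitespace-only cell; adjust both assumptions to the actual data.

```python
import csv
import io
from collections import Counter


def guess_type(values):
    """Guess an apparent type ("int", "float", "text", "empty") for one column."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for label, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def profile_csv(text):
    """Summarize structure and quality issues in CSV text (header row assumed)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]  # csv yields [] for blank lines

    missing, untrimmed = Counter(), Counter()
    columns = {name: [] for name in header}
    for row in rows:
        # Pad short rows so every column is inspected.
        padded = row + [""] * (len(header) - len(row))
        for name, value in zip(header, padded):
            columns[name].append(value)
            if value.strip() == "":
                missing[name] += 1
            elif value != value.strip():
                untrimmed[name] += 1

    # A duplicate is any row identical to an earlier row.
    counts = Counter(tuple(row) for row in rows)
    return {
        "row_count": len(rows),
        "columns": header,
        "types": {name: guess_type(vals) for name, vals in columns.items()},
        "missing": dict(missing),
        "untrimmed": dict(untrimmed),
        "duplicate_rows": sum(n - 1 for n in counts.values()),
        "ragged_rows": sum(1 for row in rows if len(row) != len(header)),
    }
```

The returned dictionary maps directly onto the STRUCTURE and QUALITY ISSUES checklists, so it can be turned into the findings summary without further computation.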


4. Apply cleaning operations:

- Remove or fill missing values as agreed
- Remove duplicates if requested
- Fix formatting (whitespace, case, dates)
- Standardize values

Output the cleaned CSV with a summary of changes made.
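
The cleaning operations above could be applied roughly as follows. This is a minimal standard-library sketch, not a prescribed implementation: `clean_csv`, `to_iso_date`, the parameter names, and the `DATE_FORMATS` list are all assumptions made for illustration, and the choices for missing values and duplicates should come from what was agreed in step 2.

```python
import csv
import io
from datetime import datetime
from itertools import zip_longest

# Assumed input date formats; day-first is a guess, so confirm it with the user.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def to_iso_date(value):
    """Return value as YYYY-MM-DD if it matches a known format, else unchanged."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return value


def clean_csv(text, missing="keep", fill_value="", drop_duplicates=True,
              lower_columns=(), date_columns=()):
    """Clean CSV text and return (cleaned_text, summary_of_changes).

    missing: "keep" leaves blanks, "drop" removes rows with any blank cell,
    "fill" replaces blank cells with fill_value.
    """
    if missing not in ("keep", "drop", "fill"):
        raise ValueError("missing must be 'keep', 'drop', or 'fill'")

    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    summary = {"cells_reformatted": 0, "cells_filled": 0,
               "rows_dropped_missing": 0, "duplicates_removed": 0}
    cleaned, seen = [], set()

    for row in reader:
        if not row:  # skip blank lines
            continue
        new_row = []
        # zip_longest pads short rows and keeps extra fields in long rows.
        for name, value in zip_longest(header, row, fillvalue=""):
            new = value.strip()
            if name in lower_columns:
                new = new.lower()
            if name in date_columns:
                new = to_iso_date(new)
            if new != value:
                summary["cells_reformatted"] += 1
            new_row.append(new)

        blanks = sum(1 for v in new_row if v == "")
        if blanks and missing == "drop":
            summary["rows_dropped_missing"] += 1
            continue
        if blanks and missing == "fill":
            summary["cells_filled"] += blanks
            new_row = [v if v != "" else fill_value for v in new_row]

        key = tuple(new_row)
        if drop_duplicates and key in seen:
            summary["duplicates_removed"] += 1
            continue
        seen.add(key)
        cleaned.append(new_row)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cleaned)
    return out.getvalue(), summary
```

Duplicates are detected after formatting is fixed, so rows that differ only in whitespace or case collapse into one. The returned summary dictionary supplies the counts for the summary of changes.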