Clean CSV Files Using Claude Code

How to give Claude Code a CSV and get clean, usable data back.

CSV Basics: Reading and Cleaning Data

In Getting Started, you cleaned up a messy contacts file. That was the appetizer. Now let's get into the real stuff — the kinds of CSV tasks that eat up hours of your week.

What's a CSV, really?

A CSV (comma-separated values) file is just a text file where each line is a row, and commas separate the columns. Open one in Notepad and you'll see something like:

name,email,amount
Jane Smith,jane.smith@example.com,1500
Bob Lee,bob.lee@example.com,2300

Every spreadsheet app (Excel, Google Sheets, Numbers) can export to CSV. It's the universal format for moving data between tools, which is why you'll bump into CSVs constantly.
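If you're curious what "just a text file" means in practice, here is a minimal Python sketch that parses a three-line file like the one shown earlier, using the standard csv module. The email values are made-up placeholders and parse_csv is a name invented for this illustration. Claude Code may well write something different when it works on your files.

```python
import csv
import io

# Sample text shaped like the three-line example; the emails are placeholders.
SAMPLE = """name,email,amount
Jane Smith,jane.smith@example.com,1500
Bob Lee,bob.lee@example.com,2300
"""

def parse_csv(text):
    """Return the rows of a CSV string as a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = parse_csv(SAMPLE)
print(rows[0]["name"])                    # Jane Smith
print(type(rows[0]["amount"]).__name__)   # str
```

Note that the amount comes back as text, not a number. A CSV has no data types of its own, which is where several of the problems in the next section come from.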

The problem with CSVs

CSVs look simple, but they're full of traps:

  • Inconsistent formatting — one row has "CA", another has "California", another has "calif."
  • Missing values — half the rows have no phone number
  • Duplicates — the same person shows up three times with slightly different names
  • Mixed data types — some "numbers" are actually text with dollar signs or commas
  • Encoding issues — names with accents or special characters show up as garbage
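To make the "mixed data types" trap concrete, here is a small Python sketch of what converting text amounts into real numbers can involve. The function name parse_amount and the decision to return None for unreadable values are assumptions made for this example, not a fixed rule.

```python
def parse_amount(raw):
    """Convert text like '$1,500' or ' 2300 ' to a float.

    Returns None when the value is not a recognizable number,
    so the caller can decide what to do with bad cells.
    """
    cleaned = raw.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None

print(parse_amount("$1,500"))   # 1500.0
print(parse_amount("n/a"))      # None
```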

You could fix all of this manually in Excel. But why would you, when you can describe what you want in English?

Reading and understanding a CSV

Before you clean anything, it helps to know what you're working with. Download sample.csv and try this:

Read sample.csv. Tell me:
- How many rows and columns
- What the column names are
- Whether there are any empty cells, and in which columns
- A sample of the first 5 rows

This gives you a quick snapshot without opening the file. It's especially useful for large CSVs that make Excel choke.
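For reference, the four questions in that prompt map onto a fairly short script. The sketch below is one possible version using only Python's standard library; profile_csv and its return keys are hypothetical names chosen for this example, and it takes the file contents as a string so it can be tried without a file on disk.

```python
import csv
import io

def profile_csv(text, sample_size=5):
    """Summarize a CSV string: size, column names, empty cells, and a sample."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []

    # Count cells that are missing or contain only whitespace, per column.
    empty_counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if not (row[col] or "").strip():
                empty_counts[col] += 1

    return {
        "row_count": len(rows),
        "column_count": len(columns),
        "columns": columns,
        "empty_cells": {c: n for c, n in empty_counts.items() if n},
        "sample": rows[:sample_size],
    }

demo = "name,email,phone\nJane,jane@example.com,\nBob,,555-0100\n"
print(profile_csv(demo)["empty_cells"])   # {'email': 1, 'phone': 1}
```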

Cleaning up messy data

This is the task you'll run most often. Say you exported a customer list from your CRM and it's a mess. Download customers.csv — it has inconsistent state names, messy phone numbers, and duplicates:

Read customers.csv. Clean it up:
- Standardize the "state" column to two-letter abbreviations (e.g., "California" → "CA")
- Format the "phone" column as (XXX) XXX-XXXX
- Remove rows where both email AND phone are empty
- Trim extra whitespace from all fields
- Deduplicate by email, keeping the row with the most filled-in fields
Save as customers_clean.csv

Notice the specificity. "Clean it up" alone would give you unpredictable results. Each bullet tells Claude exactly what "clean" means to you.
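Each bullet in that prompt corresponds to a concrete rule. The Python sketch below shows one way those rules could be written, assuming the file has "email", "phone", and "state" columns and US-style 10-digit phone numbers. The state lookup is deliberately partial and every function name is invented for this illustration. Treat it as a sketch of the logic, not as what Claude Code will produce.

```python
import csv
import io
import re

# Partial lookup for illustration; a real run needs every state and every
# spelling that actually appears in your data.
STATE_ABBREVIATIONS = {"california": "CA", "calif.": "CA", "new york": "NY", "texas": "TX"}

def normalize_state(value):
    """Map known state spellings to two letters; leave unknown values alone."""
    if value.lower() in STATE_ABBREVIATIONS:
        return STATE_ABBREVIATIONS[value.lower()]
    return value.upper() if len(value) == 2 else value

def format_phone(raw):
    """Format 10-digit numbers as (XXX) XXX-XXXX; return other values unchanged."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading US country code
    if len(digits) != 10:
        return raw
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

def clean_customers(text):
    """Apply the cleaning rules and return the surviving rows as dicts."""
    best_by_email = {}   # lowercased email -> (filled field count, row)
    no_email = []        # rows that have a phone but no email to dedupe on
    for row in csv.DictReader(io.StringIO(text)):
        # Trim whitespace from every field.
        row = {k: (v or "").strip() for k, v in row.items() if k is not None}
        # Drop rows where both email AND phone are empty.
        if not row["email"] and not row["phone"]:
            continue
        row["state"] = normalize_state(row["state"])
        row["phone"] = format_phone(row["phone"])
        key = row["email"].lower()
        if not key:
            no_email.append(row)
            continue
        # Keep the duplicate with the most filled-in fields.
        filled = sum(1 for v in row.values() if v)
        if key not in best_by_email or filled > best_by_email[key][0]:
            best_by_email[key] = (filled, row)
    return [row for _, row in best_by_email.values()] + no_email
```

One design choice worth noticing: values the script cannot interpret, such as a phone number with too few digits, are left unchanged instead of being guessed at. If you want different behavior, say so in your prompt.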

Filtering and extracting subsets

Sometimes you don't need the whole file, just a slice of it. Download orders.csv and try:

Read orders.csv.
Create a new CSV with only:
- Orders from 2025
- Where the total is over $500
- Sorted by date, most recent first
Save as big_orders_2025.csv
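The filter in that prompt boils down to two conditions and a sort. Here is a hedged Python sketch, assuming the file has a "date" column in YYYY-MM-DD form and a "total" column that may contain dollar signs or commas. The column names and the function name big_orders are assumptions, since the real layout of orders.csv isn't shown here.

```python
import csv
import io
from datetime import date

def big_orders(text, year=2025, min_total=500.0):
    """Return orders from `year` with a total over `min_total`, newest first."""
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        order_date = date.fromisoformat(row["date"])
        total = float(row["total"].replace("$", "").replace(",", ""))
        # "Over $500" is read strictly: an order of exactly 500 is excluded.
        if order_date.year == year and total > min_total:
            kept.append(row)
    kept.sort(key=lambda r: date.fromisoformat(r["date"]), reverse=True)
    return kept
```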

You can also pull out a specific list. Download contacts.csv for this example:

Read contacts.csv.
Extract just the email addresses into a plain text file, one per line.
Only include contacts where the "opted_in" column is "yes".
Save as email_list.txt
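As a rough picture of what this extraction involves, here is a short Python sketch. It assumes the columns are named "email" and "opted_in", and it treats "Yes", "YES", and " yes " as matches too, which is a judgment call you might want to spell out in your own prompt. The function name is hypothetical.

```python
import csv
import io

def opted_in_emails(text):
    """Return the emails of opted-in contacts as text, one address per line."""
    emails = []
    for row in csv.DictReader(io.StringIO(text)):
        if (row.get("opted_in") or "").strip().lower() == "yes":
            email = (row.get("email") or "").strip()
            if email:  # skip opted-in rows that have no address
                emails.append(email)
    return "\n".join(emails)
```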

Reformatting for import

You have data in one format. Another tool needs it in a different format. Download hubspot_export.csv to try this:

Read hubspot_export.csv.
Reformat it to match Mailchimp's import format:
- Rename "Email Address" to "email"
- Rename "First Name" to "fname" and "Last Name" to "lname"
- Combine "City", "State", and "Zip" into a single "address" column, comma-separated
- Drop all other columns
Save as mailchimp_import.csv

You don't need to memorize any tool's import format. Just paste the column requirements from their help docs into your instruction, and Claude will figure out the mapping.
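Under the hood, a reformat like this is mostly a rename table plus one combined column. The Python sketch below follows the column names from the prompt; the function name and constants are invented for this example, and it makes no claim about what any particular tool's import format actually requires.

```python
import csv
import io

RENAMES = {"Email Address": "email", "First Name": "fname", "Last Name": "lname"}
ADDRESS_PARTS = ["City", "State", "Zip"]

def reformat_for_import(text):
    """Rename columns, merge the address parts, and drop everything else."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[*RENAMES.values(), "address"], lineterminator="\n"
    )
    writer.writeheader()
    for row in csv.DictReader(io.StringIO(text)):
        new_row = {new: (row.get(old) or "").strip() for old, new in RENAMES.items()}
        parts = [(row.get(p) or "").strip() for p in ADDRESS_PARTS]
        # Skip empty parts so a missing city doesn't leave a stray comma.
        new_row["address"] = ", ".join(p for p in parts if p)
        writer.writerow(new_row)
    return out.getvalue()
```

Because the combined address contains commas, the csv module wraps it in double quotes when writing. That quoting is what keeps "Austin, TX, 78701" in one column instead of three.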

Adding calculated columns

Claude Code can add new data based on what's already there. Download sales_log.csv and try:

Read sales_log.csv.
Add these columns:
- "revenue" = quantity × unit_price
- "quarter" = Q1/Q2/Q3/Q4 based on the date column
- "deal_size" = "small" if revenue < 1000, "medium" if 1000-5000, "large" if over 5000
Save as sales_enriched.csv
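The three rules in that prompt can be written out for a single row as follows. This is a sketch that assumes "date" is in YYYY-MM-DD form and that "quantity" and "unit_price" are plain numbers; the function name enrich is made up for this example. It reads "1000-5000" as including both ends.

```python
from datetime import date

def enrich(row):
    """Return a copy of `row` with revenue, quarter, and deal_size added."""
    revenue = float(row["quantity"]) * float(row["unit_price"])
    month = date.fromisoformat(row["date"]).month
    quarter = f"Q{(month - 1) // 3 + 1}"   # months 1-3 -> Q1, 4-6 -> Q2, ...
    if revenue < 1000:
        deal_size = "small"
    elif revenue <= 5000:
        deal_size = "medium"
    else:
        deal_size = "large"
    return {**row, "revenue": f"{revenue:.2f}", "quarter": quarter, "deal_size": deal_size}

print(enrich({"date": "2025-05-10", "quantity": "4", "unit_price": "250"}))
# {'date': '2025-05-10', 'quantity': '4', 'unit_price': '250',
#  'revenue': '1000.00', 'quarter': 'Q2', 'deal_size': 'medium'}
```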

Generating summaries

Sometimes you don't want another CSV. You want answers. Download expenses_2025.csv and try:

Read expenses_2025.csv.
Create a summary report that includes:
- Total expenses by category
- Highest single expense
- Average monthly spend
- Any categories that increased more than 20% from the prior month
Format as a markdown file. Save as expense_report.md

The output here isn't a CSV at all. It's a readable document, and the output format is up to you.
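The numbers behind a report like this come from a few running totals. The Python sketch below assumes "date" (YYYY-MM-DD), "category", and "amount" columns, and it compares each category only against the previous month in which that category appears. Those are assumptions for illustration, as is the function name; writing the results out as markdown is left out to keep the example short.

```python
import csv
import io
from collections import defaultdict

def summarize_expenses(text):
    """Compute category totals, the largest expense, average monthly spend,
    and (category, month) pairs that rose more than 20% over the prior month."""
    by_category = defaultdict(float)
    by_month = defaultdict(float)
    by_cat_month = defaultdict(lambda: defaultdict(float))
    highest = None
    for row in csv.DictReader(io.StringIO(text)):
        amount = float(row["amount"])
        month = row["date"][:7]   # "YYYY-MM"
        by_category[row["category"]] += amount
        by_month[month] += amount
        by_cat_month[row["category"]][month] += amount
        if highest is None or amount > float(highest["amount"]):
            highest = row

    jumps = []
    for category, months in by_cat_month.items():
        ordered = sorted(months)  # "YYYY-MM" strings sort chronologically
        for prev, cur in zip(ordered, ordered[1:]):
            if months[prev] > 0 and months[cur] > months[prev] * 1.2:
                jumps.append((category, cur))

    return {
        "total_by_category": dict(by_category),
        "highest": highest,
        "average_monthly": sum(by_month.values()) / len(by_month) if by_month else 0.0,
        "jumps": jumps,
    }
```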

Handling large CSVs

Claude Code can handle CSVs with thousands of rows, but a few tips help.

"Summarize" is faster than "rewrite the entire file" for a 50,000-row dataset. Be specific about what you need.

For bigger jobs, ask Claude to work with a subset first. "Process the first 100 rows and show me the result before doing the rest" lets you verify the output before committing to the full run.

And don't ask it to print the whole file. If you say "show me the data," Claude may try to display all 50,000 rows in the terminal. Ask for a summary or a sample instead.
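The "first 100 rows" tip has a direct equivalent in code: read rows one at a time and stop early, instead of loading the whole file. Here is a minimal Python sketch of that idea. The function name preview_rows is hypothetical, and the demo builds its own fake data so it runs without a file.

```python
import csv
import io
from itertools import islice

def preview_rows(file_obj, limit=100):
    """Read only the first `limit` data rows from an open CSV file object."""
    # DictReader is lazy, so rows past `limit` are never parsed.
    return list(islice(csv.DictReader(file_obj), limit))

# Demo with generated data standing in for a large file.
lines = ["id,amount"] + [f"{i},{i * 10}" for i in range(50_000)]
sample = preview_rows(io.StringIO("\n".join(lines)), limit=100)
print(len(sample))   # 100
```

With a real file you would pass an open file object, for example `open("orders.csv", newline="")`, in place of the StringIO.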

Getting quick answers

Sometimes you don't need a file — you just want an answer:

Read expenses_2025.csv.
Tell me: what's the total amount spent, and which category has the highest spend?

Claude prints the answer directly in your terminal:

Total spend: $12,847.50
Highest category: Software & Subscriptions ($4,230.00)

No output file needed. Ask questions, get answers.

Download expenses_2025.csv to try this yourself.

Recap

CSVs are the most common file format you'll throw at Claude Code. The pattern is almost always the same:

  1. Tell it what file to read
  2. Describe the transformation in specific terms
  3. Tell it what to save and where (or ask for the answer right in the terminal)
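The same three steps show up if you write the work as a script: read a file, transform each row, save the result. The Python sketch below is one hypothetical way to express that pattern; transform_csv and its parameters are names made up for this example, and it assumes UTF-8 files where every transformed row has the same columns.

```python
import csv

def transform_csv(source_path, dest_path, transform):
    """Read source_path, apply `transform` to each row dict, write dest_path.

    Returns the number of rows written.
    """
    # Step 1: read the file.
    with open(source_path, newline="", encoding="utf-8") as src:
        # Step 2: apply the transformation to every row.
        rows = [transform(row) for row in csv.DictReader(src)]
    if not rows:
        return 0
    # Step 3: save the result.
    with open(dest_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```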

You already know how to write good instructions from Getting Started. The only new piece here is knowing what's possible. Now you've seen filtering, reformatting, calculating, summarizing, and cleaning. Combine them however you need to.

Next up

Most real-world data doesn't start as a CSV. In the next lesson, you'll learn how to work with Excel and Google Sheets exports — getting data out of spreadsheets and into a format Claude Code can work with.