Data Frosch

How do I begin working with data?

Skills and tools to consider when leveling up your data game 💪

When working with databases and large datasets, the way we work really depends on our current capabilities and research needs.

Consider, for example, the Financial Transparency System, where the European Commission publishes its yearly expenditures. There are multiple ways to access this data.

We can access it through a web interface, which is suitable for quick lookups and filters. But we can also download the data and analyse it locally on our own computer. This makes it possible to interrogate older data and analyse trends, and generally allows for more flexibility.

Different approaches lend themselves to different sorts of analysis and require different skills. Let's explore these approaches below.

Level 1: Web Interface Only

Pros: No technical skills required

Cons: No real data analysis beyond searching and filtering

Tools: browser

Time investment to master: hours

This approach is best for quick searches, look-ups of names, fact-checking and small-scale verification. The time investment is limited to learning how the web interface works: what data it contains, how that data is collected, what the columns mean, and what is missing. This is usually pretty straightforward. Often we can also export data from these interfaces and move on to spreadsheet analysis.

Level 2: Spreadsheet Analysis

Pros: Spreadsheet software widely available, low barrier to entry

Cons: Sheets with more than ~100,000 records can become slow to process

Tools: Excel, Google Sheets, LibreOffice Calc

Time investment to master: days

Spreadsheet analysis requires basic data skills: data cleaning, sorting, filtering, pivot tables, lookups and simple visualisation. These are usually enough for most day-to-day journalistic analysis. The time investment to learn spreadsheet techniques is measured in days and is therefore still pretty approachable.

Level 3: Database Tools

Pros: Large-scale analysis

Cons: Data visualisation not supported by most databases

Tools: SQL, DuckDB, Neo4j

Time investment to master: weeks

With database tools, we are approaching master levels of data analysis. We can analyse and combine tables with millions of records. Some databases handle tabular data (SQL databases, DuckDB), while others handle graph data, meaning networks of relationships (Neo4j). Some store data row by row (traditional SQL databases), others column by column (DuckDB). All these tools are free to use and most of them are open source, so we are not dependent on proprietary software. With database analysis, we can handle big datasets pretty well and learn the complexities that arise from joining and grouping data. We learn about unique identifiers and why they are important. The time investment gets a bit larger here and is measured in weeks rather than days.
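To make joining, grouping and unique identifiers a bit more concrete, here is a minimal sketch. It uses SQLite through Python's built-in sqlite3 module as a stand-in for any SQL database, simply because it needs no installation. The tables, column names and figures (recipients, payments, Green Pond Ltd and so on) are all made up for illustration and do not come from any real dataset.

```python
import sqlite3

# In-memory database with two small, made-up tables (hypothetical data).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE recipients (
        recipient_id INTEGER PRIMARY KEY,
        name TEXT,
        country TEXT
    );
    CREATE TABLE payments (
        payment_id INTEGER PRIMARY KEY,
        recipient_id INTEGER,
        year INTEGER,
        amount REAL
    );

    -- Recipients 1 and 2 share a name but are different organisations.
    INSERT INTO recipients VALUES
        (1, 'Green Pond Ltd', 'DE'),
        (2, 'Green Pond Ltd', 'FR'),
        (3, 'Lily Pad Foundation', 'NL');

    INSERT INTO payments VALUES
        (1, 1, 2022, 1000.0),
        (2, 1, 2023, 1500.0),
        (3, 2, 2023, 200.0),
        (4, 3, 2023, 750.0);
""")

# Join the tables on the unique identifier, then group by it:
# one total per actual organisation.
totals_by_id = con.execute("""
    SELECT r.recipient_id, r.name, r.country, SUM(p.amount) AS total
    FROM payments AS p
    JOIN recipients AS r ON r.recipient_id = p.recipient_id
    GROUP BY r.recipient_id, r.name, r.country
    ORDER BY r.recipient_id
""").fetchall()

# Grouping by name alone silently merges the two 'Green Pond Ltd' rows.
totals_by_name = con.execute("""
    SELECT r.name, SUM(p.amount) AS total
    FROM payments AS p
    JOIN recipients AS r ON r.recipient_id = p.recipient_id
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()

print(totals_by_id)
print(totals_by_name)
```

Grouping by the identifier keeps the German and French organisations apart, while grouping by name adds their payments together. That is the kind of quiet error unique identifiers protect us from.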

Level 4: Programming Languages

Pros: Swiss army knife, maximal freedom in analysis

Cons: Time investment

Tools: Python, R

Time investment to master: months

Learning Python or R gives us all the freedom we want when it comes to data analysis. We can analyse and combine tables of millions of records. We can ask any question, do any transformation, visualise anything. We can write scrapers and analyse relational or geographical data. We can plug in APIs and build proper software solutions. Moreover, we can combine these languages with spreadsheets (for example through the Google Sheets API) or with databases (for example by connecting to a database when the data is too large to fit into memory). The time investment is significant and therefore requires commitment.
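As a small taste of what this looks like in practice, the sketch below reads a CSV export and computes yearly totals and the change from one year to the next, using only Python's standard library. The column names (year, recipient, amount) and all the figures are invented for illustration. A real export will have its own columns, so treat this as a starting point rather than a recipe.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for a downloaded CSV export (hypothetical data).
SAMPLE_CSV = """year,recipient,amount
2021,Green Pond Ltd,1000
2022,Green Pond Ltd,1500
2022,Lily Pad Foundation,750
2023,Lily Pad Foundation,900
"""


def total_by_year(csv_file):
    """Sum the 'amount' column for each value of the 'year' column."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[int(row["year"])] += float(row["amount"])
    return dict(sorted(totals.items()))


def year_over_year_change(totals):
    """Return the change in total compared with the previous year in the data."""
    years = sorted(totals)
    return {year: totals[year] - totals[prev] for prev, year in zip(years, years[1:])}


totals = total_by_year(io.StringIO(SAMPLE_CSV))
changes = year_over_year_change(totals)

print(totals)
print(changes)
```

To run the same functions on a real download, replace io.StringIO(SAMPLE_CSV) with an opened file, for example open("expenditures.csv", newline="", encoding="utf-8"), where the file name is a placeholder for whatever you downloaded.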

Chatbots

AI chatbots can help us use all of the skills and tools above. We do not want chatbots to analyse our data directly, but they can write scripts for us and explain how the tools work.

Moreover, there are special applications:

ChatGPT

  • Most widely used all-round model
  • API integration for scripts
  • Codex for coding

Claude

  • Sonnet model is good for agents, coding, and general computer use
  • Web interface includes interactive artifacts for data visualization
  • Claude Code for coding on your own computer

DeepSeek

  • All-round model
  • Cheap API

Gemini Tools

  • You may run into document upload limits on the free tier

Local models

Mentorship

Would you like some help on the way to data skills mastery? Join our mentorship program!

Not sure it's right for you? Let's chat! ☕
