Data and Getting Data

Structured Data

How do we get data from an RDB (relational database)?

With a query (SQL code) run against a table in the database.
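As a minimal sketch of what "a query to a table" can look like in practice, the example below uses Python's built-in sqlite3 module. The students table, its columns, and its rows are made up for illustration and are not from the course material.

```python
import sqlite3

# Hypothetical in-memory database with one made-up table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO students (id, name, major) VALUES (?, ?, ?)",
    [(1, "Ada", "Math"), (2, "Grace", "CS"), (3, "Alan", "CS")],
)

# The query: ask the table for the names of all students with a given major.
# The "?" placeholder lets sqlite3 fill in the value safely.
rows = conn.execute(
    "SELECT name FROM students WHERE major = ? ORDER BY name", ("CS",)
).fetchall()

# Each row comes back as a tuple, so unpack the single column.
cs_names = [name for (name,) in rows]
print(cs_names)  # ['Alan', 'Grace']

conn.close()
```

The SQL string is the query itself; the surrounding Python only opens the database, sends the query, and collects the rows that come back.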

RDB (Relational Database)

Software (strictly speaking, a relational database management system, or RDBMS) that lets you relate tables of data to one another and perform actions on that data.

Joins

Inner Join

Include only rows that have a match in both tables

Left Join

Include all rows from the first (left) table, plus any matching rows from the second

Right Join

Include all rows from the second (right) table, plus any matching rows from the first

Full Join (Outer)

Include all rows from both tables, matched where possible
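The four join types can be compared side by side with a small sketch. The customers and orders tables below are hypothetical. Because older SQLite versions lack RIGHT JOIN and FULL OUTER JOIN, this sketch emulates the right join as a left join with the tables swapped, and the full join as a UNION of the two; other databases may let you write RIGHT JOIN and FULL OUTER JOIN directly.

```python
import sqlite3

# Hypothetical tables: Grace and Alan have no orders, and order 12
# belongs to a customer_id (4) that is not in the customers table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, item TEXT);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 4, 'lamp');
""")

def run(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Inner join: only rows with a match in both tables.
inner = run("""
    SELECT c.name, o.item FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY o.order_id""")

# Left join: every customer, with NULL (None) where there is no order.
left = run("""
    SELECT c.name, o.item FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id""")

# Right join, emulated: every order, with None where there is no customer.
right = run("""
    SELECT c.name, o.item FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id""")

# Full join, emulated: union of the two queries above.
# Note that UNION also collapses any exact duplicate rows.
full = run("""
    SELECT c.name, o.item FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    UNION
    SELECT c.name, o.item FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id""")

print(inner)      # [('Ada', 'book'), ('Ada', 'pen')]
print(left)       # [('Ada', 'book'), ('Ada', 'pen'), ('Grace', None), ('Alan', None)]
print(right)      # [('Ada', 'book'), ('Ada', 'pen'), (None, 'lamp')]
print(len(full))  # 5

conn.close()
```

The inner join drops every unmatched row, the left and right joins each keep the unmatched rows from one side, and the full join keeps the unmatched rows from both sides.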

RDB

A relational database (RDB) is a way of structuring information in tables, rows, and columns. An RDB has the ability to establish links - or relationships - between information by joining tables, which makes it easy to understand and gain insights about the relationship between various data points.

Unstructured Data

Unstructured Data Types

  • Text files and documents
  • Websites and applications
  • Sensor data
  • Image files
  • Audio files
  • Video files
  • Email data
  • Social media data

Questions to ask when dealing with data

  1. How do I access the data?
  2. What is the format of this data?
  3. Are there technical or ethical concerns with accessing the data?
  4. How should I store this data?

Review

Data Types

  • Qualitative vs. Quantitative
  • What is data?
  • What is a data standard (e.g., ISO 8601)?
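As a quick illustration of the data-standard question above, ISO 8601 defines a common text format for dates and times. The sketch below uses Python's standard datetime module; the specific date is made up.

```python
from datetime import datetime, timezone

# A made-up moment in time, with an explicit UTC timezone.
moment = datetime(2024, 3, 5, 14, 30, 0, tzinfo=timezone.utc)

# Write it out as ISO 8601 text: year-month-day, "T", then the time and offset.
iso_text = moment.isoformat()
print(iso_text)  # 2024-03-05T14:30:00+00:00

# Because the format is standardized, the text can be read back without guessing.
parsed = datetime.fromisoformat(iso_text)
print(parsed == moment)  # True
```

A standard like this removes ambiguity: a string such as 03/05/2024 could mean March 5 or May 3 depending on the reader, while the ISO 8601 form always puts the year first, then the month, then the day.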

Data Formats

  • Structured vs. semi-structured vs. unstructured
  • What are CSV, XML, and JSON? How are they different, and how are they alike?
  • What is the relationship between an RDB and SQL?
  • How do we access data from an RDB?
  • How do joins in SQL allow us to get rich and complex data from an RDBMS?
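To make the CSV, XML, and JSON comparison in the list above concrete, here is a small sketch that stores the same two made-up student records in all three formats and reads them back with Python's standard library. The field names and values are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same two records, written in three formats.
csv_text = "id,name,major\n1,Ada,Math\n2,Grace,CS\n"
json_text = (
    '[{"id": 1, "name": "Ada", "major": "Math"},'
    ' {"id": 2, "name": "Grace", "major": "CS"}]'
)
xml_text = (
    "<students>"
    "<student id='1'><name>Ada</name><major>Math</major></student>"
    "<student id='2'><name>Grace</name><major>CS</major></student>"
    "</students>"
)

# Alike: each format can carry the same names.
names_from_csv = [row["name"] for row in csv.DictReader(io.StringIO(csv_text))]
names_from_json = [record["name"] for record in json.loads(json_text)]
names_from_xml = [s.findtext("name") for s in ET.fromstring(xml_text).findall("student")]
print(names_from_csv, names_from_json, names_from_xml)

# Different: CSV is flat text, so every value arrives as a string,
# while JSON keeps the distinction between numbers and strings.
ids_from_csv = [row["id"] for row in csv.DictReader(io.StringIO(csv_text))]
ids_from_json = [record["id"] for record in json.loads(json_text)]
print(ids_from_csv)   # ['1', '2']
print(ids_from_json)  # [1, 2]
```

CSV is a flat grid of rows and columns, while JSON and XML can nest records inside one another. In XML, a value can sit either in an attribute (the id here) or in a child element (the name), which is a design choice the other two formats do not have.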

Understand the Process

  • What questions do we need to ask when reviewing the data used in a data science project?
  • Be able to talk about how the example projects we went over in class were able to use raw data
    E.g., a raw Google Street View image was used to create new data, which was then analyzed.