Data and Getting Data
Structured Data
How do we get data from an RDB (relational database)?
By sending a query (SQL code) to a table in the database
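A minimal sketch of what that can look like, using Python's built-in sqlite3 module and a throwaway in-memory database. The `students` table, its columns, and its rows are made up for illustration; a real project would connect to an existing database instead.

```python
import sqlite3

# In-memory database; the `students` table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO students (id, name, major) VALUES (?, ?, ?)",
    [(1, "Ada", "Math"), (2, "Grace", "CS"), (3, "Alan", "CS")],
)

# The query: SQL code sent to a table in the database.
# The "?" placeholder lets the driver fill in the value safely.
rows = conn.execute(
    "SELECT name FROM students WHERE major = ? ORDER BY name", ("CS",)
).fetchall()
print(rows)  # [('Alan',), ('Grace',)]

conn.close()
```

Each result row comes back as a tuple, one element per selected column.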
RDB (Relational Database)
A piece of software that enables you to relate tables of data together and perform actions on that data.
Joins
Inner Join
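Include only rows that have a match in both tables
Left Join
Include all rows from the first (left) table, plus matching rows from the second
Right Join
Include all rows from the second (right) table, plus matching rows from the first
Full Join (Outer)
Include all rows from both tables, matched where possible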
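The four join types can be tried out with a small sketch using Python's built-in sqlite3 module. The `students` and `grades` tables are hypothetical. Older SQLite versions lack RIGHT JOIN and FULL JOIN, so this sketch emulates them: a right join is a left join with the tables swapped, and a full join is the union of both left joins (note that UNION also collapses duplicate rows, which a true full join would keep).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: Ada has no grade, and grade 'B' has no matching student.
conn.executescript("""
CREATE TABLE students (id INTEGER, name TEXT);
CREATE TABLE grades (student_id INTEGER, grade TEXT);
INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO grades VALUES (2, 'A'), (3, 'B');
""")

# Inner join: only rows with a match in both tables.
inner = conn.execute(
    "SELECT s.name, g.grade FROM students s "
    "INNER JOIN grades g ON s.id = g.student_id"
).fetchall()

# Left join: every student, with NULL (None) where no grade matches.
left = conn.execute(
    "SELECT s.name, g.grade FROM students s "
    "LEFT JOIN grades g ON s.id = g.student_id ORDER BY s.id"
).fetchall()

# Right join, emulated by swapping the tables in a left join.
right = conn.execute(
    "SELECT s.name, g.grade FROM grades g "
    "LEFT JOIN students s ON s.id = g.student_id ORDER BY g.student_id"
).fetchall()

# Full join, emulated as the union of the two left joins.
full = conn.execute(
    "SELECT s.name, g.grade FROM students s "
    "LEFT JOIN grades g ON s.id = g.student_id "
    "UNION "
    "SELECT s.name, g.grade FROM grades g "
    "LEFT JOIN students s ON s.id = g.student_id"
).fetchall()

print(inner)  # [('Grace', 'A')]
print(left)   # [('Ada', None), ('Grace', 'A')]
print(right)  # [('Grace', 'A'), (None, 'B')]
print(full)   # three rows; order is not guaranteed
conn.close()
```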
RDB
A relational database (RDB) is a way of structuring information in tables, rows, and columns. An RDB can establish links, or relationships, between pieces of information by joining tables, which makes it easier to see how different data points relate to one another.
Unstructured Data
Unstructured Data Types
- Text files and documents
- Websites and applications
- Sensor data
- Image files
- Audio files
- Video files
- Email data
- Social media data
Questions to ask when dealing with data
- How do I access the data?
- What is the format of this data?
- Are there technical or ethical concerns with accessing the data?
- How should I store this data?
Review
Data Types
- Qualitative vs. Quantitative
- What is data?
- What is a data standard (e.g., ISO 8601)?
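As a quick reminder of why a standard like ISO 8601 matters, here is a small sketch using Python's standard datetime module. The timestamp itself is made up; the point is that a date written in the standard form can be read back without ambiguity.

```python
from datetime import datetime, timezone

# A hypothetical timestamp: 5 March 2024, 14:30 UTC.
moment = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)

# Write it in ISO 8601 form (year-month-day, then time and UTC offset).
text = moment.isoformat()
print(text)  # 2024-03-05T14:30:00+00:00

# Read it back: the round trip recovers the same moment in time.
parsed = datetime.fromisoformat(text)
print(parsed == moment)  # True
```

Compare this with a string like 03/05/2024, which could mean March 5 or May 3 depending on the reader's convention.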
Data Formats
- Structured vs. semi-structured vs. unstructured
- What are CSV, XML, and JSON? How are they different, and how are they alike?
- What is the relationship between an RDB and SQL?
- How do we access data from an RDB?
- How do joins in SQL allow us to get rich and complex data from an RDBMS?
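For the format questions above, it can help to see one record written three ways. This is a sketch using only Python's standard library; the record and the `student` element name are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# One hypothetical record, with every value stored as text.
record = {"id": "1", "name": "Ada", "major": "Math"}

# CSV: a header row of field names, then one row of values.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
csv_text = buf.getvalue()

# JSON: key/value pairs, with the field names repeated in every record.
json_text = json.dumps(record)

# XML: each field wrapped in its own opening and closing tag.
root = ET.Element("student")
for key, value in record.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(csv_text)
print(json_text)
print(xml_text)
```

All three carry the same information as plain text. CSV is flat and tabular, while JSON and XML can nest records inside records, which is why they are usually described as semi-structured.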
Understand the Process
- What questions do we need to ask when reviewing the data used in a data science project?
- Be able to talk about how the example projects we went over in class used raw data
E.g., a raw Google Street View image was used to create new data, which was then analyzed.