[Notes] Meta Database Engineer Professional Certificate - part 1
Table of contents:
Introduction:
Here i’ll be writing down notes about my learning journey through this certification made by Meta about ‘Database engineering’
Module 1: Introduction to Databases
Databases and Data
→ What is a Database ?
A database is form of electronic storage in which data is stored systematically
- systematic data:
- Identifiable features or attributes (eg: age, height, hair color…)
- Stored in entities (entity is relational elements like a table )
- entities can be physical representations like employee, a customer, or product. Or they could be conceptual like order, an invoice, or a quotation
- type of databases:
- relational database : data stored in form of entities/tables related to each other (sql db)
- Object oriented database : data stored in form of objects
- Graph databases : data stored in form of nodes
- Documents databases : data stored in from of JSON objects (non-sql db)
→ How is Data related ?
In a database, each table contains a unique field that cannot be replicated in other rows within the table. This unique field serves as a PRIMARY KEY, acting as a distinct identifier for the entity set.
Utilizing this primary key, we establish connections between tables. For instance, TABLE A can be linked to TABLE B using A’s primary key, where Table B contains a field with the Primary Key of Table A as a FOREIGN KEY.
+---------+------------------+---------+
| Table A | | Table B |
+---------+------------------+---------+
| | | |
| PK: id | | PK: id |
| | | FK: idA |
| | | |
| 1 | | 1 |
| 2 | | 1 |
| 3 | ← 1 to Many → | 2 |
| 4 | | 2 |
| | | 3 |
+---------+------------------+---------+
→ Relational data example charts
We store many types of data in our databases from various sourses, in order to give clear idea about this data, we have to process & present it in a clear way. for this purpopse we can use charts to read data :
-
Bar Chart:
A bar chart is a graph that presents categorical data with rectangular bars, where the heights of the bars are proportional to the values that they represent. -
Bubble Chart:
A Bubble chart is another popular type of data chart. It shows how different values compare to each other in terms of bubble size. The smaller bubbles represent smaller values, and the larger bubbles represent larger values. -
Line chart:
A line chart presents information as a series of data points called “markers” connected by straight line segments. Line charts are extremely popular and are widely used in most data analytics fields. -
Pie chart:
A pie chart is another type of data chart that displays how various data make up a whole of 100 percent. In this type of chart, each data point is allocated a “slice” of the pie according to its value. -
Dual axis charts:
Dual axis charts display two sets of data on the same graph with two separate y-axes, allowing for easy comparison of related variables. They are useful for visualizing correlations or trends between two different datasets. -
Gantt charts:
Gantt charts are a type of bar chart used in project management to illustrate a project schedule. They display tasks or activities along a horizontal timeline, with each task represented as a bar. The length of the bar corresponds to the duration of the task, and the position of the bar indicates when the task starts and ends.Gantt charts are a powerful tool for project managers to visualize project schedules, allocate resources, and monitor project progress.
-
Heat maps:
Visual representations of data where values are depicted using colors, typically ranging from low (e.g., blue) to high (e.g., red) intensity, allowing quick visualization of patterns, trends, and variations across a dataset.Heat maps visually represent complex data sets using colors to indicate values, making it easy to identify patterns, trends, and anomalies at a glance. They help in quickly identifying areas of interest and making data-driven decisions.
Example: In website analytics, a heat map can be used to visualize user interaction with a webpage. By tracking where users click or spend the most time, the heat map can reveal which areas of the webpage are most engaging or which elements are being ignored. This information can then be used to optimize the layout and design of the webpage to improve user experience and engagement.
-
Scatter plot charts:
Scatter Plot Chart: A type of data visualization that displays individual data points as dots on a two-dimensional graph, where each axis represents a different variable. Scatter plots are useful for visually identifying relationships, trends, and patterns between variables, such as correlation or clustering.
-
What chart do I choose to present my data?
Different charts server multiple purposes, for example Line chart is usefull to identify trends and helpt predict the future, while Pie charts can explain who multiple part create the whole. The Gantt for intance can be very usefull to track the progress of a project or ay continious tasks over time.In general, choosing the right chart depends on :
- the target audiance who will consume the information
- the idea/theory you trying to present/prove
- your goal from visualizing the data
- …
→ Alternative types of databases
- Relational databases primarily store structured data but are limited in accommodating unstructured data.
- The increasing demand for storing unstructured data has led to a trend towards NoSQL databases.
- NoSQL databases provide a flexible structure, allowing storage of data in various formats.
- This flexibility simplifies scaling and enables modifications to the database structure without the complexities of traditional data models.
- Big data and cloud databases represent recent changes in data and database management.
- Big data encompasses complex, exponentially growing data from sources like social media, e-commerce, and IoT devices.
- This data is often unstructured or semi-structured, posing challenges for traditional relational databases designed for structured data.
- Big data combines structured, semi-structured, and unstructured data from diverse sources, offering insights for decision-making.
- Industries leverage big data analytics for predictive maintenance, customer demand forecasting, and process optimization.
- Cloud databases enable organizations to shift data storage and management to reduce costs and simplify maintenance.
- Business intelligence (BI) technologies extract valuable insights from data for informed decision-making.
- Ongoing advancements in database technology drive innovation in data management and analytics capabilities.
→ Database Evolution
(1970s-1990s) - Flat files, hierarchical and network
(1980s-present) - Relational
(1990s- present) - Object-oriented, object-relational, web-enabled
-
Flat Files:
- Flat file databases, prevalent from the 1970s to the 1990s, store data in a single file or table. These text-based systems organize data in records, with fields separated by fixed lengths or delimiters like commas. Unlike relational databases, flat files cannot contain multiple tables.
-
Hierarchical database systems:
-
Hierarchical database systems, prevalent during a specific era, organize data in a hierarchical structure, resembling a parent-child relationship where one entity can have multiple children but each child has only one parent. This model is exemplified by its one-to-many relationship, with all attributes of a specific record listed under an entity type.
-
Example:
-
Consider a hierarchical database storing data on college students and the courses they are taking. In this model, a course can be assigned to only one student, but a student can enroll in multiple courses, establishing a one-to-many relationship.
- Students: John, Anil, and Rohan
- Courses: C#, Perl, Python, and Java
-
John is enrolled in the C# course, Anil in both Python and Java, while Rohan is taking Perl.
-
Entity Types:
- Student:** John, Anil, Rohan
- Course: C#, Perl, Python, Java
-
-
Diagram:
-
-
Network Database:
Unlike the hierarchical database model, a network database allows multiple parent and child relationships.
A network database has a graph-like structure, and it allows you to represent more complex relationships among data. -
Relational Database System:
Data stored in tables. Columns hold attributes of this data. Each record has a unique ID called Primary Key. A relational DB provides access to data related to each other using Foreign Key -
Object-Oriented Databases:
-
Introduction: Object-oriented databases (OODBs) store data as objects, departing from the tabular structure of relational databases to accommodate complex data structures.
-
Data Structure: Data in OODBs is organized into objects, encapsulating both data and functions. Objects are instances of classes, defining the structure and behavior of the data.
-
Encapsulation: Encapsulation bundles data and functions within objects, facilitating data abstraction and modularity.
-
Inheritance: OODBs support inheritance, allowing objects to inherit properties and behaviors from other objects, enabling code reuse and hierarchical organization of data.
-
Polymorphism: OODBs facilitate polymorphism, treating objects as instances of their parent classes for flexibility and extensibility.
-
Example: OODBs define classes like “Person” and “Employee”, where “Employee” inherits from “Person”. Objects represent individual entities with associated attributes and behaviors.
-
-
NoSQL Databses:
Relational databases store structured data but struggle with unstructured data. NoSQL databases emerged to handle unstructured data efficiently, driven by the demands of the internet for speed and flexibility.-
Advantages of NoSQL:
- Higher scalability
- Distributed architecture
- Lower costs
- Flexible schema
- Ability to process unstructured and semi-structured data
- No complex relationships
-
Use Cases: NoSQL databases are favored by social media platforms (e.g., Twitter, LinkedIn, Facebook) and tech giants like Google due to their capability to handle unstructured big data.
-
Types of NoSQL Databases:
-
Document Databases: Store data in documents resembling JSON objects, offering flexibility in data types.
-
Key-Value Databases: Simple databases storing items as keys and values.
-
Wide-Column Databases: Store data in tables, rows, and dynamic columns.
-
Graph Databases: Represent data as nodes and edges, suitable for managing complex relationships between entities. …
-
Intro to SQL
→ What is Structured Query Language ?
- Standard Language that can interact with structured data in databases.
- How SQL works ?
- a DBMS (database management system) takes the SQL query (that consists of CRUD operation), and translate it into a form of instructions understood by the underlying database.
→ SQL usage
SQL subsets:
- DDL Data definition language :
- create command : create storage objects in DB like tables
- alter command : update created table like adding new column.
- drop command : remove table from DB
- DML Data manipulation language : used to inserting, update, delete data, most CRUD operation fall under DML
- DQL Data query language : used to read data from database
- select command : lets you retrieve data from one or multiple tables, letting you specify data fields you want based on preferred criteria.
- DCL Data control language : let you controll the access to a database, with DCL you grant or revoke users access to data in database.
→ SQL syntax introduction
- create a database using Data Definition Language (DDL)
- Populating tables with data using INSERT INTO syntax
- Updating or modifying data using UPDATE syntax
- Deleting data using DELETE syntax
- Querying data within databases using Data Query Language (DQL) subset
- Reading data from database tables using SELECT statement
- Populating and modifying data in a database using Data Manipulation Language (DML) subset
→ Common SQL Commands
Data Definition Language (DDL)
- create new table in the database:
CREATE TABLE TABLE_NAME ( column_name1 datatype(SIZE), column_name2 datatype(SIZE), column_name3 datatype(SIZE) );
- delete table
DROP TABLE table_name;
- change the structure of the tables in the database such as changing the name of a table, adding a primary key to a table, or adding or deleting a column in a table.
--Syntax to add a column into a table: ALTER TABLE table_name ADD ( column_name datatype(size) ); -- Syntax to add a primary key to a table: ALTER TABLE table_name ADD primary key (column_name);
- remove all records from a table without drop the table itself.
TRUNCATE TABLE table_name;
Data Query Language (DQL)
- retrieve data from table:
SELECT * FROM table_name;
Data Manipulation Language (DML)
- Add records of data into an existing table:
--Syntax to insert data into three columns in a table: INSERT INTO table_name (column1, column2, column3) VALUES (value1, value2, value3);
- Purpose: To modify or update data contained within a table in the database.
--Syntax to update data in two columns: UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;
- Purpose: To delete data from a table in the database.
--Syntax to delete data DELETE FROM table_name WHERE condition;
Data Control Language (DCL)
- GRANT Command to provide the user of the database with the privileges required to allow users to access and manipulate the database.
-- Grant SELECT and INSERT privileges on the 'Students' table to the 'student_user' user GRANT SELECT, INSERT ON Students TO student_user; -- Grant ALL privileges on the 'Courses' table to the 'admin_user' user GRANT ALL ON Courses TO admin_user; -- Grant SELECT privilege on the 'Faculty' table to the 'researcher_role' role GRANT SELECT ON Faculty TO researcher_role;
- REVOKE Command to remove permissions from any user.
-- Revoke INSERT privilege on the 'Students' table from the 'student_user' user REVOKE INSERT ON Students FROM student_user; -- Revoke ALL privileges on the 'Courses' table from the 'admin_user' user REVOKE ALL ON Courses FROM admin_user; -- Revoke SELECT privilege on the 'Faculty' table from the 'researcher_role' role REVOKE SELECT ON Faculty FROM researcher_role;
Transaction Control Language (TCL):
-
COMMIT Command to save all the work you have already done in the database.
-- Begin a transaction START TRANSACTION; -- Update some data UPDATE Students SET FirstName = 'John' WHERE StudentID = 123; -- Insert new data INSERT INTO Students (StudentID, FirstName, LastName) VALUES (456, 'Jane', 'Doe'); -- Commit the transaction to save the changes COMMIT;
-
ROLLBACK Command to restore a database to the last committed state.
-- Begin a transaction START TRANSACTION; -- Update some data UPDATE Students SET FirstName = 'John' WHERE StudentID = 123; -- Insert new data INSERT INTO Students (StudentID, FirstName, LastName) VALUES (456, 'Jane', 'Doe'); -- Something went wrong, rollback the transaction to undo the changes ROLLBACK;