Inner Join
- SQL is a relational database
- The relation part refers to relationships between tables
- Think of tables as classes where the rows are the instance data, and the rows are instances
- The attributes we have seen so far have been primitive (int, float, str, …)
- Just like it is possible to have instance data that refer to other objects of other classes
- It is possible to have tables that refer to other tables
To create a query with more than on table, use INNER JOIN
SELECT * FROM Table1 INNER JOIN Table2;
- This creates the Cartesian produce of the two tables
- Every row in Table1 is joined with a all the rows in Table2
Note, inner join is so common, you can leave the INNER part out
SELECT * FROM Table1 INNER JOIN Table2;
- So what is the relationship?
- It related every row to every other row, which is not very useful
A more useful relationship relates the values in Table1 to the values in Table2
SELECT * FROM Table1 INNER JOIN Table2 ON id = id;
- Which produces an error, because roleID is two different tables
- SQL is not sure which roleID it refers to
To fix this, use the table name in the column reference
SELECT * FROM Table1 INNER JOIN Table2 ON Table1.id = Table2.id;
- Note that the order of the tables does not matter
It just changes the order of the columns in the retrieved table
SELECT * FROM Table2 INNER JOIN Table1 ON Table1.id = Table2.id;
- This works a lot like the WHERE clause, it filters the rows to only include the ones that meet the condition
The columns that are used for the join often, do not have another purpose and are left out of the retrieved table
SELECT name, value FROM Table1 INNER JOIN Table2 ON (id);
Can also use the table names in the column selection to make the query more readable
SELECT Table1.name, Table2.value FROM Table1 INNER JOIN Table2 ON (id);
- Note that naming columns in two different columns the same thing is common
- It makes it clear that the two columns form a relationship
It is so common that there is a syntactic convenience
SELECT * FROM Table1 INNER JOIN Table2 USING (id);
- Note, that this doesn’t retrieve the joining column twice like ON does
- Two tables are often used to implement the equivalent of an enum
- That is restricting values that a column can have by using a second table that contains the possible values
- This can save memory if the values are text
- It makes changing an enum value easy, because it only exists once
- And it can prevent database errors
If the enum value only exists in one place then there can’t be two different versions (think typo on entry)
Keys
- Note that it is possible for multiple rows in Table1 to have the same value for id
- And this makes sense, when thinking about it like an enum
- Multiple objects can have the same value
- It is also possible for there to be multiple rows in the Table2 that have the same value for id
- This does not make sense when thinking about it like an enum
- That would mean it is possible for a value to have two different meanings
To see what happens, add a row with a duplicate id
INSERT INTO Table2 VALUES(id, 'value'); SELECT * FROM Table1 INNER JOIN Table2 USING (id);
- The row gets duplicated with the two different values
To remove the row
DELETE FROM Table2 value = 'value';
- To prevent this, columns in a table can be designated as a primary key or a foreign key
- A primary key can not have a duplicate value in a table
- A foreign key refers to a primary key in another table and can be duplicated in the table
- When you describe the table from the textbook, columns labeled as primary key or foreign key
And if you try to drop a table that has a primary key that is another table’s foreign key, you get an error
Alias Tables
Just like columns can be named, table can be named
SELECT * FROM Table1 AS T1 INNER JOIN Table2 AS T2 ON T1.id = T2.id;
What’s a little weird is that the ALIAS can also be used before the AS
SELECT T1.name, T2.value FROM Table1 AS T1 INNER JOIN Table2 AS T2 ON T1.id = T2.id;
- And that the new name is now mandatory
This will produce an error
SELECT Table1.name, T2.value FROM Table1 AS T1 INNER JOIN Table2 AS T2 ON T1.id = T2.id;
- But T1 and T2 are actually worse names
- If table is created correctly everything already has good names, so why bother?
Could be necessary with more complex joins
Embedding Select
Can filter with a join as normal
SELECT * FROM Table1 INNER JOIN Table2 ON Table1.id = Table2.id WHERE name = 'something';
- Since what is retrieved from the database is a table, and a JOIN has tables as operands, it is possible to use a select as an operand
This can be more efficient because it filters before joining, which reduces the number of rows in the Cartesian product
SELECT * FROM (SELECT * FROM Table1 WHERE name = 'something') INNER JOIN Table2 ON Table1.id = Tabl2.id;
- But this produces an error ‘Every derived table must have its own alias’
So add an alias
SELECT * FROM (SELECT * FROM Table1 WHERE name = 'something') AS P INNER JOIN ON Table1.id = Tabl2.id;
And this produces another error because it doesn’t know what Table1 is anymore, so use the alias
SELECT * FROM (SELECT * FROM Table1 WHERE name = 'something') AS T1 INNER JOIN Table2 ON T1.id = Tabl2.id;
- Note that the alias is required even if using USING
This is an error
SELECT * FROM (SELECT * FROM Table1 WHERE name = 'something') INNER JOIN USING id;
This works even though the alias is not used
SELECT * FROM (SELECT * FROM Table1 WHERE name = 'something') AS T1 INNER JOIN USING id;
- Now that we’ve got queries inside of queries, they’re getting a little too long and difficult to read
Can use line wrapping and indentation to make it easier
SELECT * FROM (SELECT * FROM Table1 WHERE name = 'something') AS T1 INNER JOIN Table2 ON T1.id = Tabl2.id;
- Each part of the query goes on its own line
- Sub-queries get indented
Emacs can handle indenting these, so writing these larger queries is easier in a text editor than on the command line
Joining Joins
Can even use queries that have joins as subqueries to another join
SELECT * FROM (SELECT * FROM Table1 INNER JOIN Table2 USING id) AS T12 INNER JOIN Table3 USING value;
This is such a common pattern that there is a convenience syntax
SELECT * FROM Table1 INNER JOIN Table2 USING id INNER JOIN Table3 USING value;
Added bonus, don’t need to name the subquery