Working With Repeated Information
Repeated information, groups, and data structures (are all the same thing)
Many variables are 1 to 1: each person has one birthdate, one full name, and so on. But right away we run into information where one person has more than one of the same kind of thing. For example, a person might have more than one phone number or more than one child.
One way to handle that could be to make a bunch of variables: child1, child2, etc. That gets messy fast! In computer programming, we can replace a bunch of variables with one variable that stores that repeated information, giving it a single variable name like children. Those special variables are called data structures in computer science. Docassemble calls them "groups". The most common data structures you will run into in Docassemble are lists, dictionaries, and sets.
Introduction to lists
The first kind of data structure you should learn about is called a list. Lists can store any kind of repeated information: numbers, text, objects, or even other lists. Lists are similar to arrays in other computer programming languages.
Here's a portion of a real paper intake form from Greater Boston Legal Services:
Right away, the first thing you might notice is that our form can only handle up to 9 people. That probably is plenty for most households, but not every household (I come from a family of 11!).
If we wanted to model this intake form in Docassemble, we might start out by using a list named household_members.
Below is a short Python program that demonstrates two ways to handle a list of children: as separate variables, and as one list.
Click the "run" button to run the code sample. After you have run it once, change the value of use_variables to False and run it again.
For easy reference, here is the full code:
use_variables = True
child1 = "James"
child2 = "Alice"
child3 = "Richard"
children = ["Alexandra","Robert","Lisa"]
if use_variables:
    print(child1)
    print(child2)
    print(child3)
else:
    for child in children:
        print(child)
We can see that using several variables and using a list can get us to the same outcome. But using the list is more flexible: we can keep track of many items without having to create a variable name for each item in advance.
Accessing items in a list
Once we have created a list, we can access the items in it like this:
children = ["Jenny","Shakira"]
children[0]
Try copying and running the code above, one line at a time, in an interactive Python console.
The number in the square brackets is called an index. In Python, the first item in a list has an index of 0, not 1.
Inside the computer, our list is stored like a spreadsheet:
| Index | Value |
|---|---|
| 0 | Jenny |
| 1 | Shakira |
We read down to the row (index) we want, then across to find the value stored in that row.
After running the code above, try typing children[1]. What value does it have? What happens if we try to access children[2]?
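If you want to check your answers, here is a minimal sketch. It uses try and except, which this lesson has not introduced, only so the example can keep running after the error; the names second_child and error_message are made up for illustration.

```python
children = ["Jenny", "Shakira"]

# Index 1 is the second item, because counting starts at 0.
second_child = children[1]
print(second_child)  # Shakira

# The list has only two items (indexes 0 and 1), so index 2 is out of range.
try:
    children[2]
    error_message = None
except IndexError as error:
    error_message = str(error)

print(error_message)
```

Without the try and except lines, Python would stop at children[2] and report an IndexError.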
Changing the value of a list
Changing the value of an item in a list is the same as changing the value of a variable. We use the item's index to say which item we want to change:
children[1] = "Sean"
You can add items to a list by using .append(), like this:
Try adding a new name to the list.
Here's the code for reference:
children = ["Alexandra","Robert","Lisa"]
print(children)
children.append("Miles")
print(children)
Deleting an item in a list
You can delete an item in a list two different ways: by value, or by index.
By value, using .remove():
shopping_list = ["Eggs","Milk","Cereal"]
shopping_list.remove("Milk")
print(shopping_list)
By index, using .pop():
shopping_list = ["Eggs","Milk","Cereal"]
shopping_list.pop(1)
print(shopping_list)
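The two methods differ in a couple of ways that are easy to miss. The sketch below uses an illustrative shopping list with a duplicate item to show that .remove() deletes only the first match, and that .pop() hands back the item it deleted. The name removed_item is an assumption made for this example.

```python
shopping_list = ["Eggs", "Milk", "Cereal", "Milk"]

# .remove() deletes only the first item that matches the value.
shopping_list.remove("Milk")
print(shopping_list)  # ['Eggs', 'Cereal', 'Milk']

# .pop() deletes the item at an index and gives it back to you.
removed_item = shopping_list.pop(0)
print(removed_item)   # Eggs
print(shopping_list)  # ['Cereal', 'Milk']
```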
for loops
In the first example, we used a for loop to print the name of each child. Let's take a closer look at that now.
children = ["Alexandra","Robert","Lisa"]
for child in children:
    print(child)
When you use a for loop like the one above, Python will run the same series of actions for each item in the list. The variable child is a temporary variable. Each time the loop runs, the value changes.
- On the first loop, child is equal to "Alexandra"
- On the second loop, child is equal to "Robert"
- On the third loop, child is equal to "Lisa"
The DAList class in Docassemble
When you use a list in Docassemble and want it to handle collecting items for you, you will create an object of class DAList instead of creating it using my_list = []. Once you've done that, you can access, modify, and otherwise work with the items the same way you do in Python.
As a reminder, we use the objects block to create a Docassemble object. Here's an interview that creates a DAList:
objects:
  - my_list: DAList
You can read Docassemble's documentation about lists.
More explorations of lists
- https://www.w3schools.com/python/python_lists.asp
- https://teachcomputerscience.com/gcse-python/lists/
Dictionaries
Like a list, a dictionary is a data structure that can store repeated information. The main difference is that in a list, the index is numeric. In a dictionary, the index is a keyword. Dictionaries are similar to associative arrays or hash tables in other computer programming languages.
If a list is a good way to store an unknown number of items, a dictionary is a good way to store an unknown number of items where each item belongs to exactly one named category.
Sticking with our intake analogy, let's think about an intake that asks someone to report all of their expenses. We know everyone has some expenses, and we want to be able to work with each expense separately. For example:
- Rent
- Food
- Utilities
- Credit card payments
- Student loan payments
- Medical bills
Here's a small example of a Python dictionary:
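A minimal sketch, using illustrative expense names and amounts:

```python
expenses = {
    "rent": 1000,
    "utilities": 300,
    "food": 400,
}

# Looping over a dictionary gives you each key, one at a time.
for expense in expenses:
    print(expense)
```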
Run the code sample above. Notice that when we use a for loop on this dictionary, the loop variable gets the keyword (the dictionary's index) on each loop, not the value stored there.
A Python dictionary is created with curly braces, like this: {}. Each entry is a keyword, followed by a : and then a value. Commas separate the key/value pairs.
Accessing items in a dictionary
Once we have created a dictionary, we can access items in it like this:
expenses = {
    "rent": 1000,
    "utilities": 300,
    "food": 400,
    "credit cards": 120,
    "student loans": 1000,
    "medical bills": 200
}
print(expenses['rent'])
This is very similar to how we access an item in a list. The difference is that the index can be a descriptive word (or even a sentence) instead of a number. A dictionary item's index is called a key. You can picture a dictionary as a spreadsheet where each row has a unique name instead of a row number:
| Key | Value |
|---|---|
| rent | 1000 |
| utilities | 300 |
| food | 400 |
For a quick example of why using a dictionary is powerful, let's introduce the sum() function. Try running the code sample below.
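A minimal sketch, reusing the expenses dictionary from earlier (the variable name total is an assumption made for this example):

```python
expenses = {
    "rent": 1000,
    "utilities": 300,
    "food": 400,
    "credit cards": 120,
    "student loans": 1000,
    "medical bills": 200,
}

# .values() gives us every amount; sum() adds them together.
total = sum(expenses.values())
print(total)  # 3020
```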
Try changing some of the numbers and see how the value that is printed out changes at the same time.
In the example above, we use the .values() method of a dictionary to get all of the values as a single group. Then we use sum() to add all of the numbers together. Storing items in a dictionary lets us label them, while still working on them as a group. It gives us a little more context than a list, where we would only know that one expense was the first, second, third, and so on.
Adding a new item to a dictionary
You can add a new item to a dictionary by assigning a value to a key that you haven't used yet. Assigning a value to an existing key changes the value stored at that key.
Run the code sample below. Try adding an expense for automobile insurance.
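A minimal sketch of both cases, with illustrative amounts:

```python
expenses = {"rent": 1000, "food": 400}

# "utilities" is not in the dictionary yet, so this adds a new item.
expenses["utilities"] = 300

# "rent" already exists, so this replaces the value stored there.
expenses["rent"] = 1100

print(expenses)  # {'rent': 1100, 'food': 400, 'utilities': 300}
```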
Docassemble's DADict class
When you use a dictionary in Docassemble and want it to handle collecting items for you, you will create an object of class DADict instead of creating it using my_dict = {}. Once you've done that, you can access, modify, and otherwise work with the items the same way you do in Python: you can refer to my_dict["keyword"] and use the method .update() to combine two DADicts.
As a reminder, we use the objects block to create a Docassemble object. Here's an interview that creates a DADict:
objects:
  - my_dict: DADict
You can read Docassemble's documentation about dictionaries.
More explorations of dictionaries
Introduction to sets
Sets come from the world of mathematics: think Venn diagrams. In a set, each item can appear only once. No duplicates allowed. Unlike items stored in lists and dictionaries, items in a set don't have an index or a key.
Suppose you go on a bird watch. You want to know how many different species of birds you see. If you store the names of each bird in a set, each bird species appears only once, so you don't have to worry about duplicates.
Run the code sample below.
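A minimal sketch, using made-up sightings (the names bird_list and bird_set are assumptions made for this example):

```python
# A list keeps every sighting, including repeats.
bird_list = ["Blue jay", "Robin", "Blue jay", "Cardinal"]

# A set starts out empty; .add() puts an item in it.
bird_set = set()
bird_set.add("Blue jay")
bird_set.add("Robin")
bird_set.add("Blue jay")  # already in the set, so nothing changes
bird_set.add("Cardinal")

print(len(bird_list))  # 4
print(len(bird_set))   # 3
```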
Notice that when we stored the bird names in a list, we had duplicates. When we use a set instead, each bird species only appears one time. Try adding a new bird species to both the set and to the list and see what happens when you run it again.
An alternative Python way to create a set is with curly braces, like the example below:
birds = {"Blue jay","Pileated woodpecker","Ivory billed woodpecker"}
Accessing items in a set
An item is either in a set or not. It doesn't have an index or a key, and the items don't have any order. You can use standard mathematical operations on sets, such as union, intersection, and difference. You can also use a for loop to work through every item in the set.
birds = {"Blue jay","Pileated woodpecker","Ivory billed woodpecker"}
for bird in birds:
    print(bird)
See the explorations below for more about operations on sets, including use of the union and intersection operators.
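As a quick taste of those operations, here is a minimal sketch that compares two made-up days of bird watching. The set names are assumptions made for this example, and sorted() is used only so the printed order is predictable, since sets have no order of their own.

```python
saturday_birds = {"Blue jay", "Pileated woodpecker", "Robin"}
sunday_birds = {"Blue jay", "Cardinal"}

# Union: birds seen on either day.
either_day = saturday_birds | sunday_birds

# Intersection: birds seen on both days.
both_days = saturday_birds & sunday_birds

# Difference: birds seen on Saturday but not on Sunday.
only_saturday = saturday_birds - sunday_birds

print(sorted(either_day))
print(sorted(both_days))
print(sorted(only_saturday))
```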
Turning lists into sets
Sometimes, you may want to collect everything in a list, and then turn it into a set later, so you can keep track just of the unique values.
You can do that with set(). Try running the code sample below:
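A minimal sketch, using a made-up list of sightings (the variable names are assumptions made for this example):

```python
bird_sightings = ["Blue jay", "Robin", "Blue jay", "Cardinal", "Robin"]

# set() keeps one copy of each value and drops the repeats.
unique_birds = set(bird_sightings)

print(len(bird_sightings))   # 5
print(len(unique_birds))     # 3
print(sorted(unique_birds))  # ['Blue jay', 'Cardinal', 'Robin']
```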
Docassemble's DASet class
When you use a set in Docassemble and want it to handle collecting items for you, you will create an object of class DASet. Once you've done that, you can work with the items the same way you do in Python. Use the method .update() to combine two DASets.
As a reminder, we use the objects block to create a Docassemble object. Here's an interview that creates a DASet:
objects:
  - my_set: DASet
You can read Docassemble's documentation about sets.