To remove duplicates from a list in Python, you have several options.
Each method suits a different case: some can handle unhashable elements such as nested lists, some preserve the original order, and others can't.
Keep reading to see which one fits your case.
Remove Duplicates From a List Using a Set
A set is a collection of unique elements. Converting a list into a set removes every duplicate, i.e. every element that appears a second time or more.
```python
names = ["Minh", "Musk", "Olon", "Durian", "Minh"]
unique_names = list(set(names))
print(unique_names)  # ['Musk', 'Durian', 'Minh', 'Olon'] (order may vary)
```
Remember that this approach can change the order of the elements in the list. If you want to keep the order, use the next method.
Remove Duplicates From a List Using dict.fromkeys()
dict.fromkeys() is a class method that creates a new dictionary from an iterable of keys. Dictionary keys are unique, so duplicates are dropped, and since Python 3.7 dictionaries preserve insertion order.
```python
names = ["Minh", "Musk", "Olon", "Durian", "Minh"]
unique_names = list(dict.fromkeys(names))
print(unique_names)  # ['Minh', 'Musk', 'Olon', 'Durian']
```
This method is fast and keeps the order of the elements. However, it only works when the elements are hashable (strings, numbers, tuples, and so on). If you have a list of lists, you can't use this method.
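For instance, trying it on a list of lists raises a TypeError, because list objects are unhashable and can't be used as dictionary keys (a quick sketch with made-up data):

```python
# Nested lists are unhashable, so dict.fromkeys() rejects them.
groups = [["Minh", 1], ["Musk", 2], ["Minh", 1]]

try:
    unique_groups = list(dict.fromkeys(groups))
except TypeError as e:
    print(e)  # unhashable type: 'list'
```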
Remove Duplicates From a List Using a New List
This method is the most straightforward: create a new list and append each element only if it is not already in the new list.
```python
names = ["Minh", "Musk", "Olon", "Durian", "Minh"]
unique_names = []
for name in names:
    if name not in unique_names:
        unique_names.append(name)
print(unique_names)  # ['Minh', 'Musk', 'Olon', 'Durian']
```
This method is not efficient: it has a time complexity of O(n^2), because the in operator scans the whole new list on every iteration of the loop. If you have a large list, it will take a long time to run.
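If the elements happen to be hashable, a common variation of this loop (just a sketch, not part of the methods above) tracks already-seen elements in a set, making each membership check O(1) and the whole pass O(n) while still preserving order:

```python
names = ["Minh", "Musk", "Olon", "Durian", "Minh"]

seen = set()           # hashable elements we have already kept
unique_names = []
for name in names:
    if name not in seen:       # O(1) set lookup instead of scanning a list
        seen.add(name)
        unique_names.append(name)
print(unique_names)  # ['Minh', 'Musk', 'Olon', 'Durian']
```

This trades a little extra memory for the auxiliary set in exchange for linear time.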
However, it works with any data type inside the list, including unhashable ones. For example, you can remove duplicate lists from a list using this method.
```python
names = [["Minh", 1], ["Musk", 2], ["Olon", 3], ["Durian", 4], ["Minh", 1]]
unique_names = []
for name in names:
    if name not in unique_names:
        unique_names.append(name)
print(unique_names)  # [['Minh', 1], ['Musk', 2], ['Olon', 3], ['Durian', 4]]
```
Remove Duplicates From a List Using itertools.groupby()
itertools.groupby() is a function that groups consecutive elements of an iterable. It groups them based on a key function: if the key function returns the same value for adjacent elements, they end up in the same group.
Because only consecutive elements are grouped, the list must be sorted first so that all duplicates sit next to each other.
```python
import itertools

names = ["Minh", "Musk", "Olon", "Durian", "Minh"]
unique_names = [k for k, g in itertools.groupby(sorted(names))]
print(unique_names)  # ['Durian', 'Minh', 'Musk', 'Olon']
```
This method is also easy to implement. It can also handle different data types in the list.
```python
import itertools

names = [["Minh", 1], ["Musk", 2], ["Olon", 3], ["Durian", 4], ["Minh", 1]]
unique_names = [k for k, g in itertools.groupby(sorted(names))]
print(unique_names)  # [['Durian', 4], ['Minh', 1], ['Musk', 2], ['Olon', 3]]
```
But as I mentioned, the original order of the elements is not kept (the result comes out sorted), so use it with caution.
Conclusion
To recap, there are 4 ways to remove duplicates from a list in Python:
- Use a set: list(set(names)) (fastest, order not kept)
- Use dict.fromkeys(): list(dict.fromkeys(names)) (fast, order kept)
- Create a new list (slow, order kept, works with any data type in the list)
- Use itertools.groupby(): [k for k, g in itertools.groupby(sorted(names))] (requires sorting, order not kept)
Each method fits a different case; pick the one that suits yours best.
If you have any questions, feel free to leave a comment below.