Using Sets in Python

Last updated: 2025-10-26

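In Python, a set is a built-in data type that works like a mathematical set: it stores each element at most once and supports operations such as union, intersection, and difference.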

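What Is a Set

A set is an unordered collection of unique elements. It is not a sequence, so its elements have no positions or indexes. If you add an element that is already present, the set is left unchanged.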

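There are two ways to create a set:

# Method 1: use curly braces
fruits = {'apple', 'banana', 'orange', 'apple'}  # the duplicate 'apple' is dropped

# Method 2: use the set() constructor
numbers = set([1, 2, 3, 2, 1])  # built from a list; duplicates are dropped

print(fruits)   # Output: {'apple', 'banana', 'orange'} (element order may vary)
print(numbers)  # Output: {1, 2, 3}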

Take care when creating an empty set:

# Correct: use set() to create an empty set
empty_set = set()
print(type(empty_set))  # Output: <class 'set'>

# Wrong: {} creates an empty dict, not a set
not_a_set = {}
print(type(not_a_set))  # Output: <class 'dict'>
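
One detail the description above implies but does not show is that set elements must be hashable: immutable values such as numbers, strings, and tuples work, while lists do not. The following is a minimal sketch; the variable names are illustrative only.

```python
# Tuples are hashable, so they can be set elements
points = {(0, 0), (1, 2), (0, 0)}
print(len(points))  # 2, because the duplicate (0, 0) is stored once

# Lists are mutable and unhashable, so using them as elements raises TypeError
try:
    bad = {[1, 2], [3, 4]}
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# set() accepts any iterable, including a string (one element per character)
letters = set("hello")
print(len(letters))  # 4: 'h', 'e', 'l', 'o'
```

If you need a set of sets, the built-in frozenset type is the usual hashable alternative.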

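Basic Set Operations

Adding Elements

There are two ways to add elements to a set:

colors = {'red', 'green'}

# Use add() to add a single element
colors.add('blue')
print(colors)  # Output: {'red', 'green', 'blue'} (element order may vary)

# Use update() to add several elements from any iterable
colors.update(['yellow', 'purple'])
colors.update({'pink', 'orange'})
print(colors)  # Output: {'red', 'green', 'blue', 'yellow', 'purple', 'pink', 'orange'} (order may vary)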
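
A common pitfall with the two methods above is passing a bare string to update(). Because update() iterates over its argument, the string is split into characters. A short sketch, with illustrative variable names:

```python
tags = {'red', 'green'}

# update() iterates over its argument, so a bare string is split into characters
chars = set()
chars.update('blue')
print(chars == {'b', 'l', 'u', 'e'})  # True

# add() inserts the string as a single element
tags.add('blue')
print('blue' in tags)  # True

# To add one string with update(), wrap it in a list or tuple
tags.update(['yellow'])
print(len(tags))  # 4
```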

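Removing Elements

There are several ways to remove elements:

fruits = {'apple', 'banana', 'orange', 'grape'}

# remove(): raises KeyError if the element is not in the set
fruits.remove('banana')
print(fruits)  # Output: {'apple', 'orange', 'grape'} (element order may vary)

# discard(): does nothing if the element is not in the set
fruits.discard('watermelon')  # 'watermelon' is not present, but no error is raised

# pop(): removes and returns an arbitrary element (not a random one)
removed_fruit = fruits.pop()
print(f"Removed: {removed_fruit}")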
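
To make the difference between the removal methods concrete, here is a small sketch of the error cases. It assumes you want to handle a missing element explicitly rather than let the exception propagate.

```python
fruits = {'apple', 'orange'}

# remove() raises KeyError when the element is missing
try:
    fruits.remove('watermelon')
    remove_failed = False
except KeyError:
    remove_failed = True
print(remove_failed)  # True

# discard() silently does nothing in the same situation
fruits.discard('watermelon')

# pop() on an empty set also raises KeyError, so check for emptiness first
empty = set()
popped = empty.pop() if empty else None
print(popped)  # None
```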

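Set Operations

Sets support the standard mathematical operations:

set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}

# Union: every element that is in either set
union_set = set_a | set_b
print(union_set)  # Output: {1, 2, 3, 4, 5, 6, 7, 8}

# Intersection: elements that are in both sets
intersection_set = set_a & set_b
print(intersection_set)  # Output: {4, 5}

# Difference: elements that are in the first set but not in the second
difference_set = set_a - set_b
print(difference_set)  # Output: {1, 2, 3}

# Symmetric difference: elements that are in exactly one of the two sets
symmetric_difference = set_a ^ set_b
print(symmetric_difference)  # Output: {1, 2, 3, 6, 7, 8}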
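
Each operator above also has a method form. One practical difference is that the methods accept any iterable, while the operators require both operands to be sets. A minimal sketch (list_b is an illustrative name):

```python
set_a = {1, 2, 3, 4, 5}
list_b = [4, 5, 6, 7, 8]

# The method forms accept any iterable, not only sets
union_result = set_a.union(list_b)
intersection_result = set_a.intersection(list_b)
difference_result = set_a.difference(list_b)
symmetric_result = set_a.symmetric_difference(list_b)

print(union_result == {1, 2, 3, 4, 5, 6, 7, 8})  # True
print(intersection_result == {4, 5})             # True

# The operator forms require both operands to be sets
try:
    set_a | list_b
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False
```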

Common Set Methods

Checking Membership

vowels = {'a', 'e', 'i', 'o', 'u'}

# Check whether an element is in the set
print('a' in vowels)      # Output: True
print('x' in vowels)      # Output: False
print('e' not in vowels)  # Output: False

Comparing Sets

set1 = {1, 2, 3}
set2 = {1, 2, 3, 4, 5}
set3 = {4, 5, 6}

# Check for a subset
print(set1.issubset(set2))  # Output: True

# Check for a superset
print(set2.issuperset(set1))  # Output: True

# Check whether two sets are disjoint
print(set1.isdisjoint(set3))  # Output: True (they share no elements)
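
The comparison methods above also have operator equivalents, which additionally distinguish a subset from a proper subset. A brief sketch (the result variable names are illustrative):

```python
set1 = {1, 2, 3}
set2 = {1, 2, 3, 4, 5}

is_subset = set1 <= set2          # same as set1.issubset(set2)
is_proper_subset = set1 < set2    # subset and not equal
is_superset = set2 >= set1        # same as set2.issuperset(set1)
self_subset = set1 <= set1        # every set is a subset of itself
self_proper = set1 < set1         # but never a proper subset of itself
same_elements = {1, 2, 3} == {3, 2, 1}  # order does not matter for equality

print(is_subset, is_proper_subset, is_superset)  # True True True
print(self_subset, self_proper, same_elements)   # True False True
```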

Other Useful Methods

numbers = {1, 2, 3, 4, 5}

# Number of elements
print(len(numbers))  # Output: 5

# Copy the set
numbers_copy = numbers.copy()
print(numbers_copy)  # Output: {1, 2, 3, 4, 5}

# Remove all elements
numbers.clear()
print(numbers)  # Output: set()
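
Why copy() matters is easiest to see next to plain assignment, which only creates a second name for the same set. A small sketch with illustrative names:

```python
numbers = {1, 2, 3}

alias = numbers            # another name for the same set object
snapshot = numbers.copy()  # an independent shallow copy

numbers.add(4)

print(alias == {1, 2, 3, 4})  # True, the alias sees the change
print(snapshot == {1, 2, 3})  # True, the copy does not
```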

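Practical Uses of Sets

Removing Duplicates

The most common use of a set is removing duplicates:

# Remove duplicate elements from a list
names = ['张三', '李四', '王五', '张三', '李四']
unique_names = list(set(names))
print(unique_names)  # Possible output: ['张三', '李四', '王五'] (order is not guaranteed)

# Count the distinct words
text = "apple banana apple orange banana grape"
words = text.split()
unique_words = set(words)
print(f"There are {len(unique_words)} distinct fruits: {unique_words}")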
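
Converting to a set discards the original order of the list. When the order of first appearance matters, one common alternative is dict.fromkeys(), which relies on dictionaries preserving insertion order (Python 3.7 and later). A minimal sketch:

```python
names = ['张三', '李四', '王五', '张三', '李四']

# dict keys are unique and keep insertion order (Python 3.7+)
unique_in_order = list(dict.fromkeys(names))
print(unique_in_order)  # ['张三', '李四', '王五']

# Alternative: sort the set when any deterministic order is enough
unique_sorted = sorted(set(names))
print(len(unique_sorted))  # 3
```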

Relationship Tests

# Course enrollment
math_students = {'张三', '李四', '王五'}
physics_students = {'李四', '王五', '赵六'}

# Students taking both math and physics
both = math_students & physics_students
print(f"Students taking both courses: {both}")

# Students taking only math
only_math = math_students - physics_students
print(f"Students taking only math: {only_math}")

# Students taking at least one course
at_least_one = math_students | physics_students
print(f"Students taking at least one course: {at_least_one}")
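
The same enrollment data can answer one more question with the symmetric difference: who takes exactly one of the two courses. A short sketch reusing the sets from the example above:

```python
math_students = {'张三', '李四', '王五'}
physics_students = {'李四', '王五', '赵六'}

# Students taking exactly one of the two courses
exactly_one = math_students ^ physics_students
print(exactly_one == {'张三', '赵六'})  # True

# Students taking only physics
only_physics = physics_students - math_students
print(only_physics == {'赵六'})  # True
```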

Set Comprehensions

Like lists, sets support comprehensions:

# Build a set of squares
squares = {x**2 for x in range(1, 6)}
print(squares)  # Output: {1, 4, 9, 16, 25}

# Keep only the even numbers
numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
evens = {x for x in numbers if x % 2 == 0}
print(evens)  # Output: {2, 4, 6, 8, 10}
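
A set comprehension is also a convenient way to normalize and deduplicate in one step, since different inputs that map to the same value collapse into a single element. A sketch with made-up sample data:

```python
words = ['Apple', 'banana', 'APPLE', 'Banana', 'cherry']

# Lowercasing makes 'Apple' and 'APPLE' the same element
normalized = {w.lower() for w in words}
print(normalized == {'apple', 'banana', 'cherry'})  # True

# Distinct word lengths: 'Apple' has 5 letters, the others have 6
lengths = {len(w) for w in words}
print(lengths == {5, 6})  # True
```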

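How Sets Differ from Lists and Tuples

It is important to understand how sets differ from the other collection types:

# List: ordered, allows duplicates, mutable
list_example = [1, 2, 2, 3, 3, 3]

# Tuple: ordered, allows duplicates, immutable
tuple_example = (1, 2, 2, 3, 3, 3)

# Set: unordered, no duplicates, mutable
set_example = {1, 2, 2, 3, 3, 3}  # actually stores {1, 2, 3}

print(f"List: {list_example}")
print(f"Tuple: {tuple_example}")
print(f"Set: {set_example}")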
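
One practical consequence of a set being unordered is that it cannot be indexed the way a list or tuple can. The sketch below shows the error and one way around it, converting to a sorted list first:

```python
set_example = {3, 1, 2}

# Sets have no positions, so indexing raises TypeError
try:
    first = set_example[0]
    indexing_works = True
except TypeError:
    indexing_works = False
print(indexing_works)  # False

# Convert to a sorted list when positional access is needed
ordered = sorted(set_example)
print(ordered[0])  # 1
```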

Project Examples

User Tag System

class TagSystem:
    """Stores a set of tags for each user."""

    def __init__(self):
        # Maps user_id to that user's set of tags
        self.user_tags = {}

    def add_tag(self, user_id, tag):
        if user_id not in self.user_tags:
            self.user_tags[user_id] = set()
        self.user_tags[user_id].add(tag)

    def remove_tag(self, user_id, tag):
        if user_id in self.user_tags:
            # discard() does not raise if the tag is missing
            self.user_tags[user_id].discard(tag)

    def common_tags(self, user1, user2):
        if user1 in self.user_tags and user2 in self.user_tags:
            return self.user_tags[user1] & self.user_tags[user2]
        return set()

# Usage example
system = TagSystem()
system.add_tag('user1', 'tech')
system.add_tag('user1', 'programming')
system.add_tag('user2', 'tech')
system.add_tag('user2', 'music')

common = system.common_tags('user1', 'user2')
print(f"Common tags: {common}")  # Output: Common tags: {'tech'}
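
As a possible variation on the tag system above, collections.defaultdict(set) removes the need to check whether a user already has an entry. This is only a sketch: the class name DefaultDictTagSystem and the method users_with_tag are hypothetical additions, not part of the original example.

```python
from collections import defaultdict


class DefaultDictTagSystem:
    """Hypothetical variant of the tag system that uses defaultdict(set)."""

    def __init__(self):
        # Missing users get an empty set automatically on first access
        self.user_tags = defaultdict(set)

    def add_tag(self, user_id, tag):
        self.user_tags[user_id].add(tag)

    def common_tags(self, user1, user2):
        # .get() avoids creating empty entries for unknown users
        return self.user_tags.get(user1, set()) & self.user_tags.get(user2, set())

    def users_with_tag(self, tag):
        # Set comprehension over all users whose tag set contains the tag
        return {user for user, tags in self.user_tags.items() if tag in tags}


system = DefaultDictTagSystem()
system.add_tag('user1', 'tech')
system.add_tag('user1', 'programming')
system.add_tag('user2', 'tech')
system.add_tag('user2', 'music')

common = system.common_tags('user1', 'user2')
tech_users = system.users_with_tag('tech')
unknown = system.common_tags('user1', 'nobody')
print(common == {'tech'})  # True
```

Using .get() in common_tags is a deliberate choice here: indexing a defaultdict with an unknown user would silently create an empty entry for that user.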

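Data Cleaning Tool

def clean_data(data_list):
    """
    Clean the data by removing duplicate items.
    The original order of the items is not preserved.
    """
    # Remove exact duplicates
    unique_data = list(set(data_list))

    # This assumes data_list contains hashable items such as strings.
    # Deduplicating by a specific field (for example, by name) needs a different approach.
    return unique_data

# Test the data cleaning
raw_data = ['张三', '李四', '王五', '张三', '李四', '赵六']
cleaned_data = clean_data(raw_data)
print(f"Raw data: {raw_data}")
print(f"Cleaned data: {cleaned_data}")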
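
The cleaning function above mentions deduplicating by a field without showing it. One way this could look, assuming the records are dictionaries, is to track the values already seen in a set. The function name dedupe_by_key and the sample records are hypothetical.

```python
def dedupe_by_key(records, key):
    """Keep the first record for each distinct value of record[key]."""
    seen = set()
    result = []
    for record in records:
        value = record[key]
        if value not in seen:
            # First time this value appears: remember it and keep the record
            seen.add(value)
            result.append(record)
    return result


raw_records = [
    {'name': '张三', 'age': 20},
    {'name': '李四', 'age': 22},
    {'name': '张三', 'age': 21},
]
cleaned = dedupe_by_key(raw_records, 'name')
print([r['name'] for r in cleaned])  # ['张三', '李四']
```

Unlike list(set(...)), this keeps the records in their original order and works even though dictionaries themselves are not hashable.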

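Complete List of Built-in Set Methods

Method	Description
add()	Adds an element to the set
clear()	Removes all elements from the set
copy()	Returns a shallow copy of the set
difference()	Returns a new set with the elements that are in this set but not in the given sets
difference_update()	Removes from this set, in place, every element that also appears in the given sets
discard()	Removes the given element if it is present; does nothing otherwise
intersection()	Returns a new set with the elements common to this set and the given sets
intersection_update()	Updates this set in place, keeping only the elements also found in the given sets; returns None
isdisjoint()	Returns True if the two sets have no elements in common, otherwise False
issubset()	Returns True if every element of this set is in the argument
issuperset()	Returns True if every element of the argument is in this set
pop()	Removes and returns an arbitrary element; raises KeyError if the set is empty
remove()	Removes the given element; raises KeyError if it is not present
symmetric_difference()	Returns a new set with the elements that are in exactly one of the two sets
symmetric_difference_update()	Updates this set in place, keeping only the elements found in exactly one of the two sets
union()	Returns a new set with the elements of this set and all the given sets
update()	Adds all elements of the given iterables to this set
len()	Built-in function (not a method): returns the number of elements in the set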
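
The table pairs several methods with an _update variant. The difference is that the plain method returns a new set and leaves the original alone, while the _update variant modifies the set in place and returns None. A short sketch with illustrative variable names:

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6}

# difference() returns a new set; a itself is unchanged
new_set = a.difference(b)
print(new_set == {1, 2, 3})  # True
print(len(a))                # 5

# difference_update() modifies a in place and returns None
result = a.difference_update(b)
print(result)          # None
print(a == {1, 2, 3})  # True

# The other in-place variants work the same way
a.intersection_update({2, 3, 9})       # a is now {2, 3}
a.symmetric_difference_update({3, 4})  # a is now {2, 4}
a |= {10}                              # operator form of update()
print(a == {2, 4, 10})  # True
```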

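Study Tips

To become fluent with set operations:

Understand that sets are unordered and hold no duplicates
Practice the set operations (union, intersection, difference)
Reach for a set whenever you need to remove duplicates
Keep the differences between sets, lists, and dictionaries in mind

Sets are Python's tool for working with unique data. They are used less often than lists, but membership tests, deduplication, and relationship tests are exactly the tasks where a set gives a concise and efficient solution.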

Link: https://fly63.com/course/36_2087