Early Visual Simultaneous Localization and Mapping (VSLAM) systems were designed primarily for static environments and rely on the scene remaining static for both map construction and localization. In practice, however, robots often operate in dynamic environments, such as city streets, where moving objects are present. These moving objects make it difficult for a robot to estimate its own position accurately. This paper proposes a real-time localization and mapping method tailored to dynamic environments that mitigates the interference of moving objects. First, each depth image is clustered and subdivided into sub-point clouds to obtain clearer local information. Second, when processing regular frames, we exploit the structural invariance of static sub-point clouds and their relative relationships; the sub-point cloud is a novel concept introduced in this paper. Using the results computed from the sub-poses, we quantify the disparities between a regular frame and its reference frame, which allows us to accurately detect dynamic areas within the regular frame. Furthermore, the dynamic areas of keyframes are refined using historical observation data, which further improves the robustness of the system. We conducted comprehensive experimental evaluations on challenging dynamic sequences from the TUM dataset and compared our approach with state-of-the-art dynamic VSLAM systems. The experimental results demonstrate that our method significantly enhances the accuracy and robustness of pose estimation. Additionally, we validated the effectiveness of the system in dynamic environments through real-world scenario tests.
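
To make the idea of comparing sub-poses concrete, the following is a minimal illustrative sketch and not the implementation evaluated in this paper. It assumes that the sub-point clouds of a reference frame and a regular frame have already been extracted and matched point-to-point, that static sub-point clouds form the majority, and that a fixed residual threshold in metres separates static from dynamic regions. The function names (estimate_rigid_transform, flag_dynamic_clusters), the Kabsch-based pose fit, and the consensus rule are all assumptions introduced here for illustration.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Least-squares rigid fit (Kabsch): find R, t with dst ~= R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3-D points.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def flag_dynamic_clusters(ref_clusters, cur_clusters, threshold=0.05):
    """Flag sub-point clouds whose motion disagrees with the consensus pose.

    ref_clusters, cur_clusters: lists of (N_k, 3) arrays, matched per cluster
    and per point (an assumption of this sketch).
    threshold: mean point residual in metres above which a cluster is
    treated as dynamic (hypothetical value).
    Returns (flags, (R, t)): a boolean array and the consensus pose.
    """
    pairs = list(zip(ref_clusters, cur_clusters))
    # One pose hypothesis ("sub-pose") per sub-point cloud.
    poses = [estimate_rigid_transform(p, q) for p, q in pairs]

    def residuals(R, t):
        # Mean alignment error of every cluster under the pose (R, t).
        return np.array(
            [np.linalg.norm(p @ R.T + t - q, axis=1).mean() for p, q in pairs]
        )

    all_res = [residuals(R, t) for R, t in poses]
    # Consensus: the sub-pose that explains the typical cluster best.
    best = min(range(len(poses)), key=lambda k: np.median(all_res[k]))
    return all_res[best] > threshold, poses[best]
```

In this sketch a sub-point cloud is labelled dynamic when the consensus pose, taken as the sub-pose with the smallest median residual over all clusters, fails to align it with its counterpart in the reference frame. The median makes the choice tolerant of a minority of moving clusters, but it would fail if moving objects dominated the view.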