Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images


Authors: Qinfeng Zhu, Yuanzhi Cai, Lei Fan

Recent advances in autoregressive networks with linear complexity have driven significant research progress, and these networks have demonstrated exceptional performance in large language models. A representative model is the Extended Long Short-Term Memory (xLSTM), which incorporates gating mechanisms and memory structures and performs comparably to Transformer architectures in long-sequence language tasks. Autoregressive networks such as xLSTM can use image serialization to extend their application to visual tasks such as classification and segmentation. Although existing studies have demonstrated Vision-LSTM's impressive results in image classification, its performance in image semantic segmentation remains unverified. Our study represents the first attempt to evaluate the effectiveness of Vision-LSTM in the semantic segmentation of remotely sensed images. The evaluation is based on a specifically designed encoder-decoder architecture named Seg-LSTM and on comparisons with state-of-the-art segmentation networks. Our study found that Vision-LSTM's performance in semantic segmentation was limited and generally inferior to Vision-Transformer-based and Vision-Mamba-based models in most comparative tests. Future research directions for enhancing Vision-LSTM are recommended. The source code is available from https://github.com/zhuqinfeng1999/Seg-LSTM.
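
The image serialization mentioned in the abstract can be illustrated with a minimal sketch. It assumes non-overlapping square patches read in row-major order, with each patch flattened into one token. The function name `serialize_image`, the nested-list image layout, and the scan order are hypothetical choices for illustration. They are not taken from the Seg-LSTM repository, and a real model would normally follow this step with a learned projection of each token.

```python
from typing import List

# Assumed layout: image[row][col] is a list of channel values (H x W x C).
Image = List[List[List[float]]]


def serialize_image(image: Image, patch_size: int) -> List[List[float]]:
    """Turn an H x W x C image into a sequence of flattened patch tokens.

    Patches are non-overlapping squares visited in row-major order
    (left to right, then top to bottom). This scan order is an assumption.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")

    tokens: List[List[float]] = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token: List[float] = []
            # Flatten the pixels of this patch, row by row, into one token.
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    token.extend(image[row][col])
            tokens.append(token)
    return tokens


# Toy 4 x 4 single-channel image whose pixel value is row * 4 + col.
toy = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
sequence = serialize_image(toy, patch_size=2)
print(len(sequence), len(sequence[0]))
```

With a 2 x 2 patch size, the toy image becomes a sequence of four tokens with four values each, which a sequence model can then consume one token at a time.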

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

Published: 2024-06-20 08:01:28 UTC

2406.14086
