Recent years have witnessed the rapid evolution of Transformer models in fields such as computer vision and natural language processing. Though promising, Transformer models usually require more computation and memory, as well as support for a more diverse set of kernels. In this talk, I will discuss some of our recent work that leverages network/hardware co-design and co-optimization to improve the efficiency of Transformer models across different platforms. I will also discuss interesting future directions for Transformer and LLM acceleration.