The inner ear sensory epithelium harbors mechanosensory hair cells responsible for detecting sound and maintaining balance. This protocol describes a three-dimensional (3D) culture system that efficiently generates inner ear sensory epithelia from aggregates of mouse embryonic stem (mES) cells. To mimic the activation and repression of key signaling pathways during in vivo inner ear development, mES cell aggregates are treated sequentially with recombinant proteins and small-molecule inhibitors that activate or inhibit the Bmp, TGFβ, Fgf, and Wnt signaling pathways. These stepwise treatments drive mES cells to differentiate in sequence into epithelia representing the non-neural ectoderm, preplacodal ectoderm, otic placodal ectoderm, and, ultimately, hair cell-containing sensory epithelia. The derived hair cells are surrounded by a layer of supporting cells and are innervated by sensory neurons. This in vitro inner ear organoid culture system may serve as a valuable tool for developmental and physiological research, disease modeling, drug testing, and potential cell-based therapies.