You are extremely limited in two respects:
What you can produce visually, because of the limitations of voxel graphics.
What you can produce character-wise, because of the lack of facial expressions, body language, and animation.
It is imperative that you have some spoken audio to narrate what the character thinks and experiences, so that both limitations are lessened through dialogue.
Love the idea of a Total Miner film, though.