DTCC: Multi-level dilated convolution with transformer for weakly-supervised crowd counting