Moderators face a twofold challenge in reducing offensive and harmful content on social media. Such content should not circulate freely, yet strict censorship cannot be imposed either: platforms must preserve free speech online while limiting harmful content, without overreacting. Existing systems largely do not exploit the correlation between hate-offensive content and aggressive posts; instead, they address the tasks individually. Cost-effective multi-task systems that detect aggressive and offensive content on social media are therefore needed. This work presents a novel multifaceted transformer-based framework to identify aggressive and hate posts on social media. Through an end-to-end transformer-based multi-task network, our approach addresses the following tasks: (a) aggression identification, (b) misogynistic aggression identification, (c) distinguishing hate-offensive from non-hate-offensive content, (d) identifying hate, profane, and offensive posts, and (e) identifying the type of offense. We further investigate the role of emotion in improving the system’s overall performance by learning emotion detection jointly with the other tasks. We evaluate our approach on two popular benchmark datasets of aggression and hate speech, covering four languages, and compare its performance with various state-of-the-art methods. Results indicate that our multi-task system performs well on all tasks across multiple languages, outperforming several benchmark methods. Moreover, the secondary task of emotion detection substantially improves performance on all tasks, indicating a strong correlation among aggression, hate, and emotion, and opening avenues for future research.