Many new Internet of Things (IoT) applications, such as intelligent and connected vehicles, smart healthcare, and bike sharing, have emerged in recent years as a result of the advent of IoT technologies such as Narrowband IoT (NB-IoT) [1] and IPv6 over low-power wireless personal area networks (6LoWPAN) [2]. A wide variety of IoT mobile devices (MDs) has also come into use, for instance virtual reality (VR) glasses, smart bands, smartphones, and smart cameras. Meanwhile, driven by the rapid development of 5G communication technologies, one of the trends in future communication networks is the deployment of ultra-dense 5G cells [3], [4]. The deployment of massive numbers of IoT MDs together with ultra-dense 5G cells will therefore expand the IoT towards ultra-dense IoT networks [5]. In terms of quality of service (QoS) and quality of experience (QoE) [6], [7], the ultra-dense IoT imposes a number of new requirements on communication networks compared with the existing IoT. While some IoT applications (e.g., NB-IoT applications for shared bicycles) have elastic requirements on network reliability and latency, others (e.g., smart healthcare and transportation) have stringent requirements on network reliability, latency, and throughput. Because of these diverse QoS and QoE requirements, future ultra-dense IoT networks will face unprecedented challenges [8], [9], such as resolving the conflict between resource-constrained IoT MDs and resource-hungry mobile applications [10]–[12]. Many of the mobile applications requested by IoT MDs, such as face identification, fingerprint recognition, interactive gaming, and natural language processing, are computationally intensive and consume a large amount of energy [13], [14]. However, because of their constrained physical size, these lightweight IoT MDs have limited battery life and computational resources [15].
Mobile-edge computation offloading (MECO) [16]–[18] is a feasible solution to this issue. By offloading the computation tasks of IoT MDs to edge servers deployed at the radio access infrastructure, such as small cells (e.g., femtocells, picocells, and relays), Wi-Fi access points (APs), and macrocells, MECO can substantially reduce the task processing latency and the energy consumption of the IoT MDs, which in turn enhances the QoS and QoE in ultra-dense IoT networks [19], [20]. Most of the available MECO research considers a single-tier base station (BS) scenario with a very simple computation offloading scheme [21], [22]. In particular, the MDs either execute their computation tasks locally on their own CPUs or offload part of their computationally intensive tasks to the edge server located at the macro base station (MBS). However, given the large number of IoT MDs and mobile applications in ultra-dense IoT networks, the MBS is liable to become congested, and the QoS and QoE will be severely degraded. To lessen the burden on the MBS, lightweight edge servers can be deployed at the small cells, which are closer to the IoT MDs, and part of the computationally intensive tasks can be offloaded to these edge servers. The MECO problem in ultra-dense networks [23], [24], [26] has recently attracted attention among researchers. However, these studies mainly consider simple computation task scenarios and ignore the cases in which the types of computation tasks requested by the MDs are random and the computing resources at the edge servers change dynamically. Taking these limitations of existing work into account, this paper investigates the MECO problem in ultra-dense IoT networks, i.e., a multi-user ultra-dense mmWave small-cell scenario.
In this paper, the joint minimization of energy consumption and execution delay for the MDs is investigated in a 5G heterogeneous network. The major contributions of this paper are summarized as follows.
An MEC system with sub-6 GHz macrocells (Mcells) and millimeter-wave (mmWave) small cells (Scells) in ultra-dense IoT networks, i.e., a multi-user ultra-dense mmWave Scell scenario, is investigated. Each MD can independently connect to either type of BS in the uplink. To enable an in-depth study of the energy consumption and delay performance, different queueing models are applied to different network elements: the queue at each MD is modeled as an M/M/1 queue, the queue at the APs (fog nodes) as an M/M/c queue with a defined maximum request rate, and the queue at the central cloud as an M/M/1 queue. Such a fog computing system has rarely been studied in previous work on mobile cloud computing (MCC). In particular, both wireless transmission and computing capabilities are explicitly and jointly considered when modeling the energy consumption and delay performance. A joint optimization problem is outlined that accounts for the processing delay of the computation tasks and the energy consumption of the IoT MDs in the local execution process, the computation task transmission process, the fog execution and transmission process, and the MEC execution and transmission process. This complements the existing analysis of fog computing systems and formulates the optimal MECO problem in ultra-dense IoT networks as a constrained optimization problem that reduces the overall computation overhead while satisfying the delay and energy constraints.
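
As a rough illustration of the queueing models named above, the following Python sketch computes the mean time a task spends in an M/M/1 queue and in an M/M/c queue, using the standard textbook formulas (the M/M/c case uses the Erlang C waiting probability). The function names, parameter names, and example rates are hypothetical and do not come from the paper, and the "defined maximum request rate" at the fog node is not modeled here.

```python
import math


def mm1_sojourn_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (waiting + service) for a stable M/M/1 queue."""
    if service_rate <= 0 or not 0 <= arrival_rate < service_rate:
        raise ValueError("M/M/1 requires 0 <= arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)


def mmc_sojourn_time(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time in system for a stable M/M/c queue (Erlang C formula)."""
    if servers < 1 or service_rate <= 0:
        raise ValueError("need servers >= 1 and service_rate > 0")
    offered_load = arrival_rate / service_rate      # a = lambda / mu
    utilization = offered_load / servers            # rho = a / c
    if not 0 <= utilization < 1:
        raise ValueError("M/M/c requires 0 <= arrival_rate < servers * service_rate")
    # Erlang C: probability that an arriving task has to wait for a server.
    tail = offered_load ** servers / math.factorial(servers) / (1.0 - utilization)
    head = sum(offered_load ** k / math.factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)
    mean_wait = p_wait / (servers * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate           # queueing delay + service time


if __name__ == "__main__":
    # Hypothetical rates in tasks per second, chosen only for illustration.
    print("MD (M/M/1):      ", mm1_sojourn_time(2.0, 5.0))
    print("fog (M/M/c, c=2):", mmc_sojourn_time(2.0, 5.0, 2))
    print("cloud (M/M/1):   ", mm1_sojourn_time(2.0, 50.0))
```

With a single server the M/M/c expression reduces to the M/M/1 result, which gives a quick sanity check on the formulas.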
A multi-objective optimization problem that minimizes the energy consumption and the execution delay from the perspective of the MDs is formulated mathematically, with the transmit power, the local computing frequency, and the processing location as optimization variables. Using the scalarization method, the multi-objective optimization problem is transformed into a single-objective optimization problem, in which the remaining energy of the MDs is used to weight the objective functions.
To address this optimization problem, a successive convex approximation (SCA) method combined with an iterative search algorithm is proposed to obtain the optimal offloading decision and resource allocation, which optimizes local computing frequency scheduling, power allocation, and computation offloading. The proposed algorithm effectively reduces the accumulated error and improves the calculation accuracy during the iteration process.
The remainder of this paper is organized as follows. Section II reviews the related work. Section III presents the system model of multi-device MEC computation offloading and formulates the minimization problem as a mixed-integer nonlinear programming (MINLP) problem. Sections IV and V present the optimal computation offloading and resource allocation schemes for single-cell and multi-cell MEC networks, respectively. Section VI discusses the performance evaluation and simulation results. Finally, Section VII concludes the paper.
Much research has been carried out on MECO in recent years owing to its increased popularity [27]–[29]. Some studies have focused on the design of computation offloading schemes in single-user or multi-user scenarios with a single edge server, where the problem of efficient computation offloading is solved together with interference management, radio resource allocation, and computational resource allocation [22], [30]. In a dynamic setting, Zheng et al.
[21] investigated the multi-user MECO problem in which the MDs’ activeness and wireless channel gains are time-varying, and formulated it as a stochastic game. They then proposed an efficient multi-agent stochastic learning algorithm as the solution, which reduced the system-wide computation cost. Wang et al. [31] studied the MECO problem together with interference management and formulated the physical resource block allocation, the computation offloading decision, and the MEC computation resource allocation as three optimization problems. These problems were solved step by step, with the result of each step fed into the next, which led to superior performance. In addition, to further reduce the energy consumed by the MDs, some researchers have designed computation offloading schemes that integrate energy harvesting (EH) techniques or dynamic voltage and frequency scaling (DVFS) [32]. For example, Mao et al. [33] investigated a green MEC system with EH devices and proposed an energy-efficient online computation offloading scheme, namely the Lyapunov optimization-based dynamic computation offloading algorithm. Their algorithm determines the computation offloading decision, the MD’s transmit power for computation offloading, and the MD’s CPU-cycle frequencies for task execution.
In the past several years, some researchers have designed MECO schemes for multicell scenarios in an attempt to combine the ultra-dense network (UDN) and mobile-edge computing (MEC), which are considered two highly promising technologies of the 5G era. Sardellitti et al. [34] studied the MECO problem in a MIMO multicell system connected to a common cloud server and proposed an energy-efficient iterative algorithm based on the SCA technique as the solution. By jointly optimizing the computation resource allocation and the communication, Zhang et al.
[22] presented an energy-aware computation offloading scheme to investigate the tradeoff between the MDs’ energy consumption and the transmission latency in single-cell and multicell networks. As the solution, they proposed an iterative search algorithm that combines the interior penalty function with the difference of two convex functions/sets, which reduced the energy consumption cost and the processing latency. However, research on multicell networks has focused on scenarios in which multiple cells connect to a common cloud/edge server.
Recently, some studies have focused on MECO in UDNs with multiple edge servers [23]–[25]. By introducing software-defined networking, Chen et al. [23] studied the task offloading problem in UDNs, aiming to minimize the delay while reducing the MDs’ energy consumption. The problem was formulated as an MINLP problem, and by decomposing it into two sub-problems, they proposed an efficient software-defined task offloading scheme. By randomly offloading tasks, in varying numbers and weights, to different MEC servers, Yang et al. [26] presented an MEC collaborative architecture that aims to share resources among the MEC-BSs in a UDN, so as to utilize the computing resources efficiently and relieve the unfairness among MEC-BSs. Although several works have addressed the computation offloading problem with multiple edge servers in UDNs in recent years, they mainly consider simple computation task scenarios in which a fixed number of computation tasks is presupposed during a computation offloading period, typically with only one computation task to accomplish at each MD. Additionally, most of the available work neglects the cases in which different types of computation tasks are randomly requested by the MDs and the computing resources at the edge servers change dramatically.
Considering the limitations of the existing work, this paper investigates the computation offloading problem subject to specified energy and delay constraints in ultra-dense IoT networks in which traditional sub-6 GHz macrocells coexist with denser mmWave small cells, and an MD can opportunistically connect to either type of cell via the non-orthogonal multiple access (NOMA) protocol. In other words, computation offloading techniques for mobile edge computing are merged with mmWave communications, and the radio and computational resources are allocated collaboratively among the IoT MDs. This problem is modeled as an MINLP problem, which is NP-hard. An algorithm that solves the problem with adjustable accuracy is proposed in what follows. Additionally, an iterative algorithm is designed to obtain the optimal offloading decision.
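
To make the idea of a scalarized, constrained offloading decision more concrete, the sketch below combines energy and delay into a single weighted cost for one MD and picks the cheapest feasible processing location. It is only a toy illustration under stated assumptions: the linear rule that maps remaining battery energy to a weight, the normalization by the constraint bounds, and all names and numbers are hypothetical, and a plain exhaustive search over three candidate locations stands in for the SCA-based iterative algorithm proposed in this paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Option:
    """One candidate processing location with hypothetical per-task costs."""
    name: str
    energy_j: float  # energy spent by the MD, in joules
    delay_s: float   # task completion delay, in seconds


def energy_weight(remaining_energy_j: float, battery_capacity_j: float) -> float:
    """Weight on the energy term; assumed to grow linearly as the battery drains."""
    level = min(max(remaining_energy_j / battery_capacity_j, 0.0), 1.0)
    return 1.0 - level


def scalarized_cost(option: Option, weight: float,
                    max_energy_j: float, max_delay_s: float) -> float:
    """Weighted sum of energy and delay, each normalized by its constraint bound."""
    return (weight * option.energy_j / max_energy_j
            + (1.0 - weight) * option.delay_s / max_delay_s)


def best_option(options: Sequence[Option], remaining_energy_j: float,
                battery_capacity_j: float, max_energy_j: float,
                max_delay_s: float) -> Optional[Option]:
    """Exhaustive search: cheapest option that meets both constraints, else None."""
    feasible = [o for o in options
                if o.energy_j <= max_energy_j and o.delay_s <= max_delay_s]
    if not feasible:
        return None
    weight = energy_weight(remaining_energy_j, battery_capacity_j)
    return min(feasible,
               key=lambda o: scalarized_cost(o, weight, max_energy_j, max_delay_s))


if __name__ == "__main__":
    # Hypothetical costs: local execution is fast but energy-hungry.
    candidates = [
        Option("local", energy_j=0.8, delay_s=0.10),
        Option("scell", energy_j=0.2, delay_s=0.25),
        Option("mcell", energy_j=0.3, delay_s=0.40),
    ]
    for remaining in (10.0, 1.0):
        choice = best_option(candidates, remaining, 10.0, 1.0, 0.5)
        print(remaining, choice.name if choice else None)
```

In this toy setting a fully charged MD weights delay only and keeps the task local, while an MD with little remaining energy shifts the weight towards energy and offloads. Exhaustive search is practical only for a handful of options; the joint problem over many MDs, transmit powers, and CPU frequencies is the MINLP discussed above.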