Mixing C and Java programming in embedded, IoT designs

Although Java ranks as the number one programming language in the world in popularity indices [1, 2], one might assume that its adoption lags in "traditional" embedded systems because of its "fat and slow" reputation.

Actually, Java technologies have already changed the game in one particular type of embedded system: the cell phone. Cell phones have always had their own specific hardware platforms and operating systems (for example, Nokia's Symbian and BlackBerry OS), but the advent of smartphones contributed to the emergence of app ecosystems such as Google's Android. Android apps are programmed in the Java language, so Java has in fact already won a significant share of embedded systems development. Yes, smartphones have become "big" devices with powerful processors and plenty of memory and storage, but they are still embedded devices.

Today, the Java language is winning more and more designs in traditional non-mobile embedded systems and, in conjunction with real-time operating systems (RTOSs) and traditional C programming, is poised to become the solution of choice for IoT developers. To understand why, let's look at what the Internet of Things demands of embedded software.

Taking embedded development to the next level

The Internet of Things is the next level for embedded systems, as IoT can be seen as "embedded" on a much larger scale:

  Programmability: Billions of IoT devices cannot be programmed by the limited pool of embedded/C/RTOS experts in the world (in the hundreds of thousands). The industry needs to leverage the much larger communities (millions) of mobile/PC/server programmers to meet the massive demands of the Internet of Things.

  Connectivity: IoT involves multiple wired and wireless physical layers and IP-based protocols such as UDP, TCP, TLS, and HTTP (often used for REST-style APIs), as well as newer protocols and frameworks like CoAP, MQTT, and LWM2M.

  Complexity: IoT devices embed more software, with more features and the ability to add new features dynamically (in the field) to address evolving technical or market needs.

  User experience: Consumers expect to interact with IoT devices as they do with their smartphones and tablets.

  Security: IoT devices need security at all levels – code execution, communications, identification/authentication, data storage, etc.

Java platforms provide a good answer to these challenges:

  Java is the number one language in the world, and most software engineering students learn it at university.

  Java platforms offer generic implementations and APIs for IP-based networking, IoT protocols, and most non-IP protocols.

  The Java language and object-oriented programming (OOP) are well known for managing complexity, improving productivity, and reducing bugs.

  Java platforms enable dynamic downloading of code.

  Java platforms provide built-in security.

The success of Java platform implementations in embedded systems relies on tight integration with the underlying world of C, leveraging it to the fullest extent. Java programming is not meant to replace C programming: the C language and RTOSs are very good at providing a base runtime on top of embedded microprocessors (MPUs) and microcontrollers (MCUs), and at solving the challenges of hardware-dependent software. Java programming, however, is better at dealing with (developing, debugging, and maintaining) larger software packages and complexity, and at addressing hardware-independent application code.

Just like Android's virtual machine sits on top of Linux, an embedded Java platform can sit on top of an embedded RTOS and C runtime. The embedded Java platform has to be open, so that the C developer responsible for software bring-up on the embedded hardware can integrate it as an independent piece of software. This combined approach lets embedded projects benefit from the best of both worlds: C for hardware interfacing and performance, and Java for portability and scalability. It also solves device programmability and software productivity issues, as a few low-level C developers can enable dozens of higher-level Java developers to build applications on a Java platform running on top of their C runtime.

Four key ingredients for Java integration

Java source code is compiled into a specific format called bytecode, stored in class (.class) files. Class files are usually packaged into Java archive (.jar) files, which are in fact zip files whose contents must first be inflated (decompressed) before their bytecode can be executed. Standard Java platforms on PCs interpret bytecode with a Java virtual machine and, to improve performance, use a just-in-time (JIT) compiler to compile it to machine code on the fly. Unfortunately, this process cannot be transposed to MCU-based systems: storing the archives, running the inflater, and running the JIT compiler require more memory and processing power than that class of device offers.
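To make the difference between the two delivery formats concrete, the C sketch below tells them apart by their leading bytes: a class file starts with the magic number 0xCAFEBABE, while a JAR is a zip archive that normally starts with a local file header ("PK" followed by the bytes 3 and 4) and whose entries are typically deflate-compressed. The function and enum names are hypothetical; the point is only that a device consuming JAR files would need an inflater before it could even reach the bytecode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Container formats distinguishable by their leading magic bytes. */
typedef enum {
    FORMAT_UNKNOWN,
    FORMAT_CLASS,   /* raw class file: begins with 0xCAFEBABE */
    FORMAT_ZIP      /* JAR/zip archive: local file header "PK\3\4" */
} file_format_t;

/* Reads a big-endian 32-bit value (class files are big-endian). */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: identifies the container from its first bytes. */
file_format_t detect_format(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    if (len < 4)
        return FORMAT_UNKNOWN;
    if (read_be32(data) == 0xCAFEBABEu)
        return FORMAT_CLASS;
    /* A non-empty JAR usually starts with a zip local file header; its
     * entries must then be inflated before any bytecode is visible. */
    if (data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4)
        return FORMAT_ZIP;
    return FORMAT_UNKNOWN;
}
```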

Four key ingredients, however, make Java platforms suitable for integration with an embedded C-based environment, with minimal memory footprint overhead (tens of kilobytes) and equivalent performance (yes, Java code can run as fast as C code). Let's review them:

1. A single, standards-based binary code format

The Executable and Linkable Format (ELF) [3] has become the de facto industry-standard binary format for compiled code on MCUs. It is supported by the open source GNU toolchain (GCC) and by commercial toolchains. ARM, the industry-leading MCU architecture, defines its application binary interface (ABI) and relocations based on ELF.

ELF should be used as the single, final binary code format for all programming languages used in an embedded software project.
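As an illustration of why a single format simplifies tooling, the following sketch checks that a buffer starts with a 32-bit little-endian ARM ELF header. The offsets and constant values (for example, EM_ARM = 40) come from the ELF specification; the function name is hypothetical, and the header is parsed from raw bytes rather than through a system header such as <elf.h> so the code stays portable. Under the approach described here, every object file in a mixed project, whether produced by the C compiler or by an off-board Java compiler, would be expected to pass the same check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the ELF specification. */
#define EI_CLASS     4   /* index of the file class in e_ident */
#define EI_DATA      5   /* index of the data encoding in e_ident */
#define ELFCLASS32   1
#define ELFDATA2LSB  1   /* little-endian */
#define ET_REL       1   /* relocatable object file (.o) */
#define ET_EXEC      2   /* executable file */
#define EM_ARM       40

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns true if the buffer starts with a 32-bit little-endian ARM ELF
 * header describing a relocatable object or an executable. */
bool is_arm32_elf(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < 20)                            /* e_ident, e_type, e_machine */
        return false;
    if (hdr[0] != 0x7F || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return false;
    if (hdr[EI_CLASS] != ELFCLASS32 || hdr[EI_DATA] != ELFDATA2LSB)
        return false;
    uint16_t type = read_le16(hdr + 16);     /* e_type */
    uint16_t machine = read_le16(hdr + 18);  /* e_machine */
    return (type == ET_REL || type == ET_EXEC) && machine == EM_ARM;
}
```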

Sidebar | Details of ELF

There are two main notions in ELF: symbols (names) and sections (memory zones holding data or code). Basically, symbols point to sections.

A symbol is an entity composed of a name and a value. A symbol may be absolute (also called a link-time constant) or relative to a section, in which case its value is resolved only when the linker has assigned a definitive position to the target section. A symbol can be local to the relocatable file or global to the link process. All global symbol names must be unique in the system (the name is the key used to connect an unresolved symbol reference to a symbol definition).

Sections can be of two sorts:

  Allocation sections, representing a part of the program (image or runtime)

  Control sections, containing metadata (relocation sections, symbol tables, debug sections, etc.)

An allocation section can either hold image bytes (machine instructions and raw data: a PROGBITS section) or declare runtime memory that takes no space in the image (zero-initialized statics, main stack, heap, etc.: a NOBITS section). A section has a conventional name representing the kind of data it holds: .text sections for binary instructions, .rodata sections for constant data, .bss sections for zero-initialized read/write data, and .data sections for pre-initialized read/write data.
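The distinction between allocation and control sections, and between PROGBITS and NOBITS, is encoded in each section header's sh_type and sh_flags fields. The following sketch classifies a section from those two fields using the constant values defined in the ELF specification; the enum and function names are hypothetical, and the mapping to conventional names such as .text and .bss reflects typical toolchain output rather than a rule.

```c
#include <assert.h>
#include <stdint.h>

/* Section types and flags from the ELF specification. */
#define SHT_PROGBITS  1
#define SHT_NOBITS    8
#define SHF_WRITE     0x1
#define SHF_ALLOC     0x2
#define SHF_EXECINSTR 0x4

typedef enum {
    SEC_CONTROL,  /* metadata: symbol tables, relocations, debug info */
    SEC_TEXT,     /* like .text:   instructions stored in the image */
    SEC_RODATA,   /* like .rodata: constants stored in the image */
    SEC_DATA,     /* like .data:   initial values, typically copied from
                     flash to RAM by the startup code */
    SEC_BSS,      /* like .bss:    RAM zeroed at boot, no image bytes */
    SEC_OTHER     /* any other allocation section */
} section_kind_t;

section_kind_t classify_section(uint32_t sh_type, uint32_t sh_flags)
{
    if (!(sh_flags & SHF_ALLOC))
        return SEC_CONTROL;             /* not part of the running program */
    if (sh_type == SHT_NOBITS)
        return (sh_flags & SHF_WRITE) ? SEC_BSS : SEC_OTHER;
    if (sh_type != SHT_PROGBITS)
        return SEC_OTHER;
    if (sh_flags & SHF_EXECINSTR)
        return SEC_TEXT;
    return (sh_flags & SHF_WRITE) ? SEC_DATA : SEC_RODATA;
}

/* Bytes a section contributes to the image (NOBITS contributes none). */
uint32_t image_bytes(uint32_t sh_type, uint32_t sh_size)
{
    return sh_type == SHT_NOBITS ? 0 : sh_size;
}
```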

A relocation section is usually associated with an allocation section; it contains the instructions for resolving that section's dependencies on other sections, such as a call to another function.
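To show what a relocation instruction does, the sketch below applies the simplest ARM case, R_ARM_ABS32, whose result is essentially S + A: the symbol's final value plus an addend (the ARM ELF supplement also ORs in a Thumb bit T, ignored here). ARM objects commonly use REL-type relocation sections, where the addend is the value already stored at the patched location, so the code reads that word, adds the symbol value, and writes it back in little-endian order. The sym_values table and function name are hypothetical, and branch relocations, which encode PC-relative offsets inside instruction bits as a call to another function would, are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Relocation entry without an explicit addend (SHT_REL). */
typedef struct {
    uint32_t r_offset;  /* where to patch, relative to the section start */
    uint32_t r_info;    /* symbol index (high 24 bits), type (low 8 bits) */
} elf32_rel_t;

#define ELF32_R_SYM(i)  ((i) >> 8)
#define ELF32_R_TYPE(i) ((i) & 0xffu)
#define R_ARM_ABS32     2   /* 32-bit absolute address */

/* Applies one R_ARM_ABS32 relocation to a section's contents.
 * sym_values[] holds the final value S of each symbol; the addend A is
 * the word already stored at the location. Other types are rejected. */
bool apply_abs32(uint8_t *section, size_t section_size,
                 const elf32_rel_t *rel, const uint32_t *sym_values)
{
    assert(section != NULL && rel != NULL && sym_values != NULL);
    if (ELF32_R_TYPE(rel->r_info) != R_ARM_ABS32)
        return false;
    if (rel->r_offset > section_size || section_size - rel->r_offset < 4)
        return false;                       /* patch would overflow */
    uint8_t *p = section + rel->r_offset;
    uint32_t addend = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                      ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    uint32_t value = sym_values[ELF32_R_SYM(rel->r_info)] + addend;
    p[0] = (uint8_t)value;                  /* store back little-endian */
    p[1] = (uint8_t)(value >> 8);
    p[2] = (uint8_t)(value >> 16);
    p[3] = (uint8_t)(value >> 24);
    return true;
}
```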

2. Minimal on-board runtime linking

The bytecode format should not be considered an embedded binary format, but rather an intermediate format between the source code and the binary (machine-specific) code, which is compiled and linked off-board (cross-compilation process). Off-board bytecode compilation, or ahead-of-time (AOT) compilation and linking, makes it possible to leverage desktop compiler optimization techniques and take advantage of the target instruction set and its characteristics to produce efficient code.

The Java code is linked and programmed into flash memory at the same time as the C code. With such an implementation, no special Java linking program is required in on-board flash: the embedded virtual machine library is just a small runtime engine whose footprint can be as small as a few tens of kilobytes. All code can execute in place directly from flash, which keeps boot time short.

3. A single, standards-based native linker

The main idea behind successfully integrating a Java platform on MCU-based systems is simply to treat the Java language as another programming language alongside C, without changing the production toolchains C developers use today. This involves converting Java bytecode into ELF object files that can be mixed with the ELF object files produced from C code and linked with off-the-shelf linker tools:

  The bytecode is compiled into a regular object file by a dedicated off-board compiler. Java methods are compiled into regular ELF sections, each targeted by an ELF symbol whose name follows a naming convention that lets standard ELF linkers resolve Java symbols.

  The virtual machine is just a new ELF library added to the global project.

  The virtual machine APIs are described using regular C header files. These APIs must be as generic as possible so that the virtual machine can be ported to any underlying C runtime and its associated RTOS, drivers, board support package (BSP), and C libraries. In the extreme case, when the virtual machine integrates its own internal scheduler, only a timer is required and no RTOS is needed at all.

  All the (mixed) object files are statically linked with an off-the-shelf ELF linker, so C developers can still use their favorite toolchain and integrated development environment (IDE).
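The naming convention used for Java symbols is not spelled out here. As a familiar precedent, the Java Native Interface (JNI) derives C symbol names for native methods from the class and method names; the sketch below implements a small subset of that JNI mangling scheme to show how a Java method can map to a name any ELF linker can resolve. Treat it as a hypothetical illustration: an embedded Java compiler may use a completely different convention, and Unicode escapes and overloaded-method signatures are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Appends one character, keeping room for the terminating NUL. */
static bool put(char *out, size_t cap, size_t *n, char c)
{
    if (*n + 1 >= cap)
        return false;
    out[(*n)++] = c;
    return true;
}

/* Mangles one name using a subset of the JNI rules: '/' or '.' becomes
 * '_', and '_' becomes "_1". Characters needing other escapes (Unicode
 * "_0xxxx", signature characters) are rejected. */
static bool mangle(const char *s, char *out, size_t cap, size_t *n)
{
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        bool ok;
        if (c == '/' || c == '.')
            ok = put(out, cap, n, '_');
        else if (c == '_')
            ok = put(out, cap, n, '_') && put(out, cap, n, '1');
        else if (c < 0x80 && isalnum(c))
            ok = put(out, cap, n, (char)c);
        else
            return false;
        if (!ok)
            return false;
    }
    return true;
}

/* Builds "Java_<class>_<method>"; returns false if it does not fit. */
bool java_symbol(const char *cls, const char *method, char *out, size_t cap)
{
    static const char prefix[] = "Java_";
    size_t n = sizeof prefix - 1;
    assert(cls != NULL && method != NULL && out != NULL);
    if (n >= cap)
        return false;
    memcpy(out, prefix, n);
    if (!mangle(cls, out, cap, &n) || !put(out, cap, &n, '_') ||
        !mangle(method, out, cap, &n))
        return false;
    out[n] = '\0';
    return true;
}
```

For example, method toggle of class com/example/Led maps to Java_com_example_Led_toggle, and an underscore in a package name such as my_pkg becomes my_1pkg so the result stays unambiguous.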

Figure 1 shows the full mixed C and Java code compilation and linking steps.

Sidebar | Details of the ELF linking process

The linking process can be divided into three main steps:

1.  Symbols and sections resolution – Starting from root symbols and root sections, the linker embeds all sections targeted by symbols and all symbols referred to by sections. The process repeats transitively as long as new symbols or sections are found. At the end of this step, the linker may stop and output errors (unresolved symbols, duplicate symbols, etc.).

2.  Memory positioning – Sections are laid out in memory ranges according to the memory layout instructions described in the linker file (sometimes called a scatter file). Then relocation instructions are executed (i.e., symbol values are resolved and section content is patched). At the end of this step, the linker may stop and output errors (if it cannot satisfy constraints, e.g., not enough memory).

3.  Output ELF executable file generation – Along with the executable file, the linker generates a memory map file: a text file that lists what content has been linked, where it has been positioned, sizes, etc.
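Step 1 is essentially a graph traversal. The toy model below, which assumes each section defines exactly one global symbol and lists the symbols its relocations reference, keeps every section reachable from a root symbol and counts references that cannot be resolved. All names are hypothetical and duplicate-symbol detection is omitted; real linkers work on full symbol tables, and GNU ld only discards unreferenced sections this way when asked to (its --gc-sections option).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_REFS     4
#define MAX_SECTIONS 64

/* Toy section: defines one global symbol and references up to MAX_REFS
 * other symbols by name (as its relocations would). */
typedef struct {
    const char *defines;
    const char *refs[MAX_REFS];   /* NULL-terminated if shorter */
} toy_section_t;

static int find_definition(const toy_section_t *secs, size_t n,
                           const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(secs[i].defines, name) == 0)
            return (int)i;
    return -1;
}

/* Marks in keep[] every section reachable from the root symbol and
 * returns the number of unresolved references (0 on success). */
int resolve(const toy_section_t *secs, size_t n, const char *root,
            bool *keep)
{
    assert(secs != NULL && keep != NULL && n <= MAX_SECTIONS);
    int worklist[MAX_SECTIONS];   /* each section is pushed at most once */
    size_t top = 0;
    int unresolved = 0;
    memset(keep, 0, n * sizeof *keep);

    int r = find_definition(secs, n, root);
    if (r < 0)
        return 1;
    keep[r] = true;
    worklist[top++] = r;

    while (top > 0) {             /* repeat while new sections appear */
        const toy_section_t *s = &secs[worklist[--top]];
        for (size_t k = 0; k < MAX_REFS && s->refs[k] != NULL; k++) {
            int d = find_definition(secs, n, s->refs[k]);
            if (d < 0) {
                unresolved++;     /* an "undefined reference" error */
            } else if (!keep[d]) {
                keep[d] = true;
                worklist[top++] = d;
            }
        }
    }
    return unresolved;
}
```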

4. Optimized Java-to-C code programming bridges

The embedded Java programming environment must offer access to some embedded-specific features that are traditionally handled in C code:

  Immutable data (read-only data) for managing persistent (const) data stored in flash

  Bridges between Java and C programs, linked as standard function calls, so that any routine can be rewritten in C or assembly if needed with zero runtime linking cost (linking Java code to C code is done by the off-the-shelf ELF linker)

  Fixed-size buffer sharing without any copy

The embedded Java runtime environment has to be implemented in an optimized way on top of the C runtime in order to:

  Provide an autonomous scheduler with built-in threads to ensure predictable scheduling adapted to embedded constraints ("green thread" integration with the RTOS: all Java threads run inside a single RTOS thread)

  Support object-oriented specifics (e.g., late binding to implement polymorphism)

  Manage memory (e.g., garbage collection adapted to embedded constraints, array copies optimized with the C memcpy function)
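The "green thread" idea can be sketched in a few lines of C. In this toy model, which is an assumption rather than how any particular VM works, each Java thread is a step function that runs until its next blocking point and returns how long to sleep (or -1 when finished). A round-robin scheduler runs all of them from one RTOS thread and only needs a notion of time; here the clock jumps straight to the next wake-up, standing in for programming a hardware timer and sleeping. A real VM would instead switch between per-thread Java stacks. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy green thread body: runs until its next blocking point, then
 * returns how many ms to sleep, or -1 once it has finished. */
typedef int32_t (*green_step_fn)(void *ctx);

typedef struct {
    green_step_fn step;
    void *ctx;
    uint32_t wake_at;   /* time (ms) at which the thread is ready again */
    int alive;
} green_thread_t;

/* Runs every green thread inside the calling RTOS thread and returns
 * the virtual time at which the last one finished. */
uint32_t green_run(green_thread_t *t, size_t n, uint32_t now_ms)
{
    assert(t != NULL || n == 0);
    if (n == 0)
        return now_ms;
    size_t last = n - 1;                  /* first scan starts at thread 0 */
    for (;;) {
        green_thread_t *next = NULL;
        size_t next_i = 0;
        /* Round-robin: first ready thread after the one that ran last. */
        for (size_t k = 1; k <= n && next == NULL; k++) {
            size_t i = (last + k) % n;
            if (t[i].alive && t[i].wake_at <= now_ms) {
                next = &t[i];
                next_i = i;
            }
        }
        if (next == NULL) {
            /* Nothing ready: advance the clock to the earliest wake-up,
             * as if a timer had been programmed and had fired. */
            for (size_t i = 0; i < n; i++) {
                if (t[i].alive &&
                    (next == NULL || t[i].wake_at < next->wake_at)) {
                    next = &t[i];
                    next_i = i;
                }
            }
            if (next == NULL)
                return now_ms;            /* all threads have finished */
            now_ms = next->wake_at;
        }
        last = next_i;
        int32_t sleep_ms = next->step(next->ctx);
        if (sleep_ms < 0)
            next->alive = 0;
        else
            next->wake_at = now_ms + (uint32_t)sleep_ms;
    }
}

/* Example thread body: runs `remaining` times, `period` ms apart. */
typedef struct {
    int remaining;
    uint32_t period;
    int runs;
} periodic_t;

int32_t periodic_step(void *ctx)
{
    periodic_t *p = ctx;
    p->runs++;
    return --p->remaining > 0 ? (int32_t)p->period : -1;
}
```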

This makes it easy to reuse legacy C code and integrate it into the overall Java application.

It is common practice in software engineering to link object files with the linker provided by the same toolchain that compiled them, which avoids issues caused by mixing objects with different binary formats. This rule still holds for embedded C and Java programming on MCU-based systems: because the Java code is compiled into standard ELF object files, the same off-the-shelf ELF linker handles both.

The four ingredients detailed above ensure that using the Java language to program MCU-based embedded systems does not result in a large footprint overhead. Furthermore, developers can benefit from the compactness of Java bytecode. They can use widespread Java APIs (e.g., for networking and file systems) that make software truly portable, instead of porting their source code to heterogeneous C APIs, stacks, and compilers, or working around uneven support for standards such as POSIX across MCUs, RTOSs, and compilers. Off-the-shelf binary components can be created and reused across multiple MCU architectures and associated C runtimes without porting or even recompiling source code. Binary components can be configured at link time (using link-time constants), avoiding source-level configuration with C #define statements and interdependent source files.
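One way to get link-time configuration with a GNU-compatible toolchain is sketched below. A prebuilt component defines its default as a weak symbol (a GCC/Clang extension); an integrator overrides it by linking an object that provides a strong definition of the same name, without recompiling the component. Absolute ELF symbols, the link-time constants mentioned above, are another option: GNU ld's --defsym option defines one, and C code reads it as the address of an extern declaration. The symbol and function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Default configuration compiled into a prebuilt binary component.
 * Being weak, it can be overridden at link time by another object that
 * defines a strong symbol with the same name, e.g. in board_config.c:
 *     const uint32_t net_rx_buffer_size = 1536;
 * The component itself is not recompiled. */
__attribute__((weak)) const uint32_t net_rx_buffer_size = 512;

/* Component code reads the value through the symbol, not a #define,
 * so the compiled binary does not depend on the final configuration. */
uint32_t net_rx_buffer_bytes(void)
{
    assert(net_rx_buffer_size > 0);
    return net_rx_buffer_size;
}
```

Linked alone, the component uses 512; linked together with the hypothetical board_config.o, it uses 1536.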

Vincent Perrier is Chief Product Officer at MicroEJ.

1. The PYPL PopularitY of Programming Language Index: http://pypl.github.io/PYPL.html

2. The TIOBE Programming Community Index: http://www.tiobe.com/tiobe_index?page=index

3. "Object File Format" chapter of the Sun Solaris "Linkers and Libraries Guide", the industry-accepted ELF specification: https://docs.oracle.com/cd/E26502_01/pdf/E26507.pdf