Linux malware programming languages

0001-01-01

Linux malware can be written in any programming language that runs on Linux.

Conventionally, Linux malware tends to be written in C, C++, shell script, and whichever interpreted languages are popular at the time.

In the 1990s and 2000s, Perl was popular and almost always installed. Nowadays, Python is the more popular choice. There is still a very high chance that both (and possibly other interpreters) are installed on any given Linux host.

As programming languages gain popularity, malware authors will almost assuredly use them for malicious purposes. Go and Rust are good examples: both are newer, popular, and have features conducive to writing malware. There are numerous samples written in each, but historically far more malicious programs have been written in C.

Each language has its pros and cons. A detailed breakdown of the pros and cons of each language in malware contexts would be lengthy, but generally, the following observations and gotchas apply:

  • Compiled programs should “just work” if they have been built for the CPU architecture of the victim host. For example, an x86_64 binary should work on any x86_64 system once it is copied over and run.

  • Dynamically linked binaries will fail to run if the libraries they expect are not available on the victim host.

  • Statically linked binaries do not rely on external libraries and should just run, but tend to be much larger.

  • Binary analysis can determine which language a sample was written in, and this identification can be automated with tools such as YARA. Samples written in an uncommon language may stand out and lead to the malware being discovered easily. For example, if Rust isn’t widely used in an environment and a Rust binary suddenly lands on a host, this is suspicious.

  • Most malware analysts are great at analyzing samples in languages they are familiar with. An analyst who has never been exposed to Go and suddenly has to work through a Go sample may take significantly longer due to the learning curve, a potential lack of tooling, support, or documentation, and simple unfamiliarity.

  • Malware written in interpreted languages tends to be much smaller than a compiled sample, and by its nature the code can simply be read by an analyst; no decompilation required. Some malware authors are great at disguising malware to appear benign. Others use obfuscators to muddy the analysis process, but heavy obfuscation tends to be a dead giveaway that the sample is indeed malicious.

  • Interpreted-language samples will fail if the interpreter is not available on the host. For example, if exploitation is successful and the final payload is written in Perl, but Perl is not installed on the host, the malware will fail to run.

  • Interpreted-language samples may require a specific version of the interpreter in order to work. For example, if the sample was written for Python 2.7 but only Python 3.5 is installed, there is a very real risk that the sample will not run. Interpreted languages tend to have development cycles that introduce changes that are not forward or backward compatible.
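
The points above about CPU architecture and linking can be checked on a sample without running it. What follows is a minimal Python sketch, not a complete ELF parser: it reads the ELF header, maps a handful of e_machine values to names, and treats the presence of a PT_INTERP program header as a sign that the binary asks for a dynamic loader. The function name describe_elf and the short MACHINES table are illustrative assumptions, and edge cases (for example static-PIE binaries or deliberately malformed headers) are not handled.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader

# Small, incomplete mapping of e_machine values (assumption: extend as needed).
MACHINES = {3: "x86", 8: "mips", 40: "arm", 62: "x86_64", 183: "aarch64", 243: "riscv"}


def describe_elf(data: bytes) -> dict:
    """Return bit width, architecture, and whether a PT_INTERP header exists."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little, 2 = big
    if len(data) < (64 if is_64 else 52):
        raise ValueError("truncated ELF header")

    e_machine = struct.unpack_from(endian + "H", data, 18)[0]
    if is_64:
        e_phoff = struct.unpack_from(endian + "Q", data, 32)[0]
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        e_phoff = struct.unpack_from(endian + "I", data, 28)[0]
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)

    # p_type is the first 4-byte field of every program header entry.
    p_types = set()
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if off + 4 > len(data):
            break  # truncated file: stop rather than guess
        p_types.add(struct.unpack_from(endian + "I", data, off)[0])

    return {
        "bits": 64 if is_64 else 32,
        "machine": MACHINES.get(e_machine, f"unknown ({e_machine})"),
        "has_interp": PT_INTERP in p_types,
    }
```

Calling describe_elf(open(path, "rb").read()) on a file returns a small dictionary. A has_interp value of False suggests a statically linked binary, which fits the size and portability trade-offs described above.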

Malware authors should research their intended targets and make choices based on what they expect the victim hosts’ architecture, distribution, and installed software and libraries to be.

Malware analysts should focus on what they expect to encounter based on the hosts they are defending, as well as on trends in malware authors’ preferred languages and anti-analysis techniques.
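
As a rough illustration of automated language identification, an analyst could start triage with a byte-string scan before reaching for full YARA rules. The sketch below is hedged accordingly: the marker strings are assumptions based on artifacts commonly seen in unstripped Go and Rust binaries, they are not authoritative signatures, and stripped or packed samples may contain none of them. The names MARKERS and guess_language are hypothetical.

```python
# Illustrative marker strings only (assumption): real rule sets are broader
# and should be validated against known-good and known-bad samples.
MARKERS = {
    "go": [b"Go build ID:", b"runtime.goexit", b".gopclntab"],
    "rust": [b"/rustc/", b"rust_begin_unwind", b"rust_panic"],
}


def guess_language(data: bytes) -> dict:
    """Map each candidate language to the marker strings found in data."""
    hits = {}
    for lang, needles in MARKERS.items():
        found = [n.decode() for n in needles if n in data]
        if found:
            hits[lang] = found
    return hits
```

An empty result means only that none of these particular markers were present; it does not rule out either language.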

System administrators should make efforts to reduce the attack surface of their systems: remove any unused programs or scripting languages, set permissions on interpreters so only the intended users can run them, collect usage data that would be helpful for incident response, and monitor their systems’ overall health.
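
A first step toward that hardening is knowing which interpreters are present and who can execute them. The following Python sketch is one possible starting point under stated assumptions: the INTERPRETERS list is an arbitrary sample of common names, not an exhaustive inventory, and the function name inventory is hypothetical. It looks each name up on the search path and reports whether the resolved file is executable by "other" users.

```python
import os
import shutil
import stat

# Assumed sample of interpreter names; adjust for the environment in question.
INTERPRETERS = ["sh", "bash", "perl", "python", "python2", "python3",
                "ruby", "php", "node", "lua"]


def inventory(names=INTERPRETERS, path=None):
    """List interpreters found on the search path and their exposure."""
    report = []
    for name in names:
        found = shutil.which(name, path=path)
        if found is None:
            continue  # not installed, or not on the search path
        real = os.path.realpath(found)  # follow symlinks such as python -> python3
        mode = os.stat(real).st_mode
        report.append({
            "name": name,
            "path": real,
            "world_executable": bool(mode & stat.S_IXOTH),
        })
    return report
```

Running inventory() with no arguments uses the current PATH. Entries with world_executable set to True are candidates for removal or tighter permissions if the interpreter is not needed by all users.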

