
Featured

Cheat sheets for our Python seminars

For our Python seminars we have created cheat sheets that allow the course participants to quickly reuse what they have learned:

Python Cheat Sheet


python-for-beginners-cheat-sheet.pdf, PDF, 70.4 KB

Pandas Cheat Sheet


pandas-cheat-sheet.pdf, PDF, 52 KB

Git Cheat Sheet


git-cheatsheet-web.pdf, PDF, 437 KB

And if you have any further questions about our training courses, please give us a call on +49 30 22430082 or send an email to training@cusy.io.

Are Jupyter notebooks ready for production?


In recent years, the use of Jupyter notebooks has grown rapidly; see, for example, Octoverse: Growth of Jupyter notebooks, 2016-2019. Jupyter is a Mathematica-inspired application that combines text, visualisation and code in one document. Jupyter notebooks are widely used by our customers for prototyping, research, analysis and machine learning. However, we have also seen that, with their growing popularity, Jupyter notebooks are increasingly used in other areas of data analysis, and that additional tools are used to run extensive calculations with them.

However, Jupyter notebooks tend to be unsuitable for creating scalable, maintainable and long-lived production code. With a few tricks, notebooks can be meaningfully versioned and automated tests can even be run against them, but in complex projects the mixture of code, comments and tests becomes an obstacle: Jupyter notebooks cannot be modularised sufficiently. Although notebooks can be imported as modules, the options for doing so are severely limited: the notebook must first be fully loaded into memory and a new module must be created before each cell can be run in it.
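
To illustrate this, here is a minimal sketch of how such a notebook import typically works under the hood; the helper name is hypothetical, and real loaders, such as the one shown in the IPython documentation, are considerably more involved:

import types

import nbformat


def import_notebook(path, name="notebook_module"):
    # The whole notebook file must be loaded into memory first …
    nb = nbformat.read(path, as_version=4)
    # … and a fresh module object must be created …
    module = types.ModuleType(name)
    # … before each code cell can be executed in its namespace.
    for cell in nb.cells:
        if cell.cell_type == "code":
            exec(cell.source, module.__dict__)
    return module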

As a result, the first notebook war broke out, which was essentially a conflict between data scientists and software engineers.

How To Bridge The Gap?

Notebooks are rapidly gaining popularity among data scientists and are becoming the de facto standard for rapid prototyping and exploratory analysis. Netflix, above all, has created an extensive ecosystem of additional tools and services around them, such as Genie and Metacat. These tools simplify the complexity and support a broader audience of analysts, scientists and, in particular, computer scientists. In general, each of these roles relies on different tools and languages. Superficially, the workflows seem different, if not complementary. At a more abstract level, however, these workflows share several overlapping tasks:

Data exploration
occurs early in a project
may include displaying sample data, statistical profiling and data visualisation

Data preparation
iterative task
may include cleaning, standardising, transforming, denormalising and aggregating data

Data validation
recurring task
may include displaying sample data, performing statistical profiling and aggregated analysis queries, and visualising data

Product creation
occurs late in a project
may include providing code for production, training models and scheduling workflows

A JupyterHub can already do a good job here of making these tasks as simple and manageable as possible. It is scalable and significantly reduces the number of tools required.

To understand why Jupyter notebooks are so compelling for us, we highlight their core functionalities:

  • A messaging protocol for inspecting and executing code, independent of the language
  • An editable file format for writing and capturing code, code output and markdown notes
  • A web-based user interface for writing and executing code interactively and for visualising data

Use Cases

Of their many possible applications, notebooks are currently most commonly used with us for data access, parameterization and workflow planning.

Data access

We first introduced notebooks to support data science workflows. As acceptance grew, we saw an opportunity to leverage the versatility and architecture of Jupyter notebooks for general data access. In mid-2018, we began to expand our notebooks from a niche product into a universal data platform.

From the user’s point of view, notebooks provide a convenient interface for iteratively running code, exploring and visualising data – all on a single development platform. Because of this combination of versatility, performance and ease of use, we have seen rapid adoption across many user groups of the platform.

Parameterization

Along with increasing acceptance, we have introduced additional features for other use cases. As a result of this work, notebooks became easily parameterizable. This provided our users with a simple mechanism for defining notebooks as reusable templates.
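
One widely used tool for executing notebooks as parameterised templates is papermill; the text does not name a specific tool, so the following is only an illustrative sketch with hypothetical notebook names and parameters:

import papermill as pm

# Run a template notebook with concrete parameter values;
# papermill injects them into the cell tagged "parameters".
pm.execute_notebook(
    "report_template.ipynb",   # reusable template
    "report_2021-04.ipynb",    # executed copy containing the output
    parameters={"start_date": "2021-04-01", "region": "EU"},
)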

Workflow planning

As a further area of notebook applications, we discovered the scheduling of workflows. Notebooks have, among others, the following advantages here:

  • On the one hand, notebooks allow interactive work and rapid prototyping; on the other hand, they can be put into production almost seamlessly: for this, the notebooks are modularised and marked as trustworthy.
  • Another advantage of notebooks is their different kernels, which let users choose the right execution environment.
  • In addition, errors in notebooks are easier to understand because they are assigned to specific cells and the outputs can be stored.

Logging

In order to use notebooks not only for rapid prototyping but also long-term in production, certain process events must be logged so that, for example, errors can be diagnosed more easily and the entire process can be monitored. IPython notebooks can use the logging module of the standard Python library or loguru; see also Jupyter Tutorial: Logging.
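
A minimal sketch of what such logging can look like with the standard library; the file name and messages are, of course, placeholders:

import logging

logging.basicConfig(
    filename="notebook.log",    # persist events for later diagnosis
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("pipeline")

log.info("Processing started")
try:
    result = 1 / 0              # a failing cell
except ZeroDivisionError:
    log.exception("Step failed")  # records the full traceback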

Testing

There have been a number of approaches to automating the testing of notebooks, such as nbval, but with ipytest writing notebook tests has become much easier; see also Jupyter Tutorial: ipytest.
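
A minimal sketch of a notebook test with ipytest; it assumes a recent ipytest version, and the function under test is invented for the example:

import ipytest

ipytest.autoconfig()   # configure pytest for running inside the notebook


def to_fahrenheit(celsius):        # hypothetical function under test
    return celsius * 9 / 5 + 32


def test_to_fahrenheit():          # an ordinary pytest-style test in a cell
    assert to_fahrenheit(0) == 32
    assert to_fahrenheit(100) == 212


ipytest.run()          # collect and run the tests defined in this notebook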

Summary

Over the last few years, we have been promoting close collaboration between software engineers and data scientists to achieve scalable, maintainable and production-ready code. Together, we have found solutions that can also provide production-ready models for machine learning projects.

enterPy on 15 April 2021

enterPy is a conference for Python in Business, Web and DevOps. It is aimed at professionals who use Python productively in their company, or intend to do so, because they want to exploit the potential of the language in data analysis, machine learning, web programming or the DevOps environment.
When Apr 15, 2021
from 09:00 AM to 03:45 PM
Where Online
Contact Phone +49 30 22430082


Veit Schiele will give a talk on Thursday, 15 April entitled «Why gRPC? – and how to implement it in Python».

gRPC is a modern, high-performance RPC (Remote Procedure Call) framework with a defined IDL (Interface Definition Language). Since a client application can call a method on a server as if it were a local object, client-server applications can be created very easily. gRPC supports most languages, so that, for example, a Python service can also communicate with an Android Java client.
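
As a small appetiser for the talk, here is a hedged sketch of a Python client based on gRPC's canonical «Greeter» example; it assumes that helloworld.proto has already been compiled with grpcio-tools into the two generated modules:

import grpc

import helloworld_pb2
import helloworld_pb2_grpc

# The stub makes the remote SayHello method look like a local call.
with grpc.insecure_channel("localhost:50051") as channel:
    stub = helloworld_pb2_grpc.GreeterStub(channel)
    reply = stub.SayHello(helloworld_pb2.HelloRequest(name="enterPy"))
    print(reply.message)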

For more information, see the enterPy conference website.

enterPy on 6 May 2021

enterPy is a conference for Python in Business, Web and DevOps. It is aimed at professionals who use Python productively in their company, or intend to do so, because they want to exploit the potential of the language in data analysis, machine learning, web programming or the DevOps environment.
When May 06, 2021
from 09:00 AM to 04:45 PM
Where Online
Contact Phone +49 30 22430082
Attendees Frank Hofmann


Frank Hofmann will give a talk on Thursday, 6 May entitled «Version control in machine learning projects».

In this talk, you will learn by example how model development for machine learning (ML) can be systematically organised with DVC and scikit-learn. The performance of a model can be improved, for example, by fine-tuning its parameters or when more training data becomes available. To measure such improvements, you should be able to track which data was used for training, with which model definition and configuration, and which model performance was achieved. Both the data and the associated Python code are therefore recorded together in one version.
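
A small, hedged sketch of what this can look like with DVC's Python API; the repository URL, file path and tag are hypothetical:

import dvc.api

# Read exactly the training data that belongs to the tagged model version;
# the Git tag pins data, code and configuration together.
with dvc.api.open(
    "data/train.csv",
    repo="https://example.com/ml-project.git",
    rev="model-v1.0",
) as f:
    train_csv = f.read()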

For more information, see the enterPy conference website.

Data protection in times of Covid-19

Companies and organisations have data that they do not want to make available to others. They also have a special responsibility towards their customers, partners and employees. Losing sovereignty over this data means not only a loss of trust, but usually also commercial losses.

Show your customers, partners and employees that data protection is important to you and that you take responsibility for protecting their privacy. Show that you have implemented the rules of the European General Data Protection Regulation (GDPR), applicable since May 2018.

Therefore, do without Google services and use alternatives. Google makes money from the data you provide:

With your permission you give us more information about you, about your friends, and we can improve the quality of our searches. We don’t need you to type at all. We know where you are. We know where you’ve been. We can more or less know what you’re thinking about. [1]

This statement by the then Google CEO, Eric Schmidt, is more relevant than ever. It is unsettling to think that a company more or less knows what you are thinking about. And the company reveals only part of this information to you, even if you still have a Google account – saved graphs and other evaluations remain hidden from you.

In the following we would like to introduce you to some privacy-friendly alternatives to Google services:

… for your office work

  • Jitsi instead of Google Hangout, Zoom or Microsoft Teams
  • Mattermost instead of Slack
  • Nextcloud and OnlyOffice instead of Google Docs, Google Sheets, Google Slides, Google Calendar and Google Drive

… for your website

… for your apps

For further reading

Telearbeit und Mobiles Arbeiten
Information from the Federal Commissioner for Data Protection and Freedom of Information (BfDI), January 2019
Top Tips for Cybersecurity when Working Remotely
Article by the European Union Agency for Cybersecurity (ENISA), March 2020
Home-Office? – Aber sicher!
Information from the Federal Office for Information Security (BSI), March 2020

[1] Google’s CEO: ‹The Laws Are Written by Lobbyists›, 2010.

Choosing the right NoSQL database

Relational databases dominated the software industry for a long time and are very mature, with mechanisms such as redundancy, transaction control and standard interfaces. Initially, however, they could respond only moderately well to rising demands on scalability and performance. Thus, from early 2010 onwards, the term NoSQL was increasingly used to describe new types of databases that better met these requirements.

NoSQL databases should solve the following problems:

  • Bridging the gap between the application's internal data structures and the relational data structures of the database
  • Moving away from integrating a wide variety of data structures into a single uniform data model
  • Handling growing amounts of data that increasingly require clusters for data storage

Aggregated data models

Relational database modelling is very different from the kinds of data structures that application developers use. Using the data structures modelled by developers to solve different problem domains has led to a move away from relational modelling towards aggregate models, much of it inspired by Domain-Driven Design. An aggregate is a collection of data that we interact with as a unit. These aggregates form the boundaries for ACID operations, and key-value, document and column-family databases can all be seen as forms of aggregate-oriented databases.

Aggregates make it easier for the database to manage data storage on a cluster, as a unit of data can now reside on any machine. Aggregate-oriented databases work best when most data interactions are performed on the same aggregate, e.g. when a profile needs to be retrieved with all its details. It is then better to store the profile as a single aggregate and use this aggregate to retrieve the profile details.
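
As an illustration, such a profile aggregate might look like the following Python structure; the fields are invented:

profile = {
    "_id": "u42",
    "name": "Alice",
    "addresses": [
        {"type": "billing", "city": "Berlin"},
        {"type": "shipping", "city": "Hamburg"},
    ],
    "settings": {"newsletter": True, "language": "de"},
}
# One read returns the complete profile with all its details,
# and the whole aggregate can be placed on a single cluster node.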

Distribution models

Aggregate-oriented databases facilitate the distribution of data because the distribution mechanism only has to move the aggregate and does not have to worry about related data, since all related data is contained in the aggregate itself. There are two main types of data distribution:

Sharding
Sharding distributes different data across multiple servers so that each server acts as a single source for a subset of data.
Replication
Replication copies data across multiple servers so that the same data can be found in multiple locations. Replication takes two forms:

Master-slave replication makes one node the authoritative copy, processing writes, while slaves are synchronised with the master and may process reads.

Peer-to-peer replication allows writes to any node. Nodes coordinate to synchronise their copies of the data.

Master-slave replication reduces the likelihood of update conflicts, but peer-to-peer replication avoids writing all operations to a single server, thus avoiding a single point of failure. A system can use one or both techniques.

CAP Theorem

In distributed systems, the following three aspects are important:

  • Consistency
  • Availability
  • Partition tolerance

Eric Brewer established the CAP theorem, which states that in a distributed system we can only ever choose two of these three properties. Many NoSQL databases therefore offer options with which the database can be set up according to your requirements. For example, if you consider Riak, a distributed key-value database, there are essentially three variables:

r
Number of nodes that must respond to a read request before it is considered successful
w
Number of nodes that must respond to a write request before it is considered successful
n
Number of nodes on which the data is replicated, also called the replication factor

In a Riak cluster with 5 nodes, we can adjust the values for r, w and n so that the system is very consistent by setting r = 5 and w = 5. However, this makes the cluster vulnerable to network partitions, as no write is possible if even one node is not responding. We can make the same cluster highly available for writes or reads by setting r = 1 and w = 1; however, consistency may now suffer, as some nodes may not have the latest copy of the data. The CAP theorem states that in the event of a network partition you have to trade the availability of data against its consistency. Durability can also be traded against latency, especially if you want to survive failures with replicated data.
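
This trade-off can be summarised in the usual quorum rule of thumb, sketched here in Python:

def overlapping_quorums(r, w, n):
    """Reads see the latest write if read and write quorums overlap."""
    return r + w > n

print(overlapping_quorums(5, 5, 5))  # True: consistent, but one dead node blocks writes
print(overlapping_quorums(1, 1, 5))  # False: highly available, but reads may be stale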

With relational databases, you often needed little understanding of these requirements; now they become important again. For example, you may have been used to relying on transactions in relational databases. In NoSQL databases, however, these are no longer available, and you have to think about how their guarantees should be implemented: does writing have to be transaction-safe, or is it acceptable for data to be lost from time to time? Finally, an external transaction manager such as ZooKeeper can sometimes be helpful.

Different types of NoSQL databases

NoSQL databases can be roughly divided into four types:

Key-value databases

Key-value databases are the simplest NoSQL data stores from an API perspective: the client can retrieve the value for a key, set a value for a key, or delete a key from the data store. The value is a blob that the data store simply stores, without caring or knowing what is inside; it is solely the application's responsibility to understand what was stored. Because key-value databases always use primary-key access, they generally offer high performance and can easily be scaled.
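
A minimal sketch of this API using redis-py; it assumes a Redis server running locally:

import redis

r = redis.Redis(host="localhost", port=6379)

r.set("session:42", '{"user": "alice"}')  # store an opaque blob under a key
value = r.get("session:42")               # retrieve the value by its key
r.delete("session:42")                    # remove the key
# The store neither knows nor cares what the blob contains;
# interpreting it is entirely the application's job.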

Some of the most popular key-value databases are

Riak KV
Home | GitHub | Docs
Redis
Home | GitHub | Docs
Memcached
Home | GitHub | Docs
Berkeley DB
Home | GitHub | Docs
Upscaledb
Home | GitHub | C API Docs

You need to choose them carefully as there are big differences between them. For example, while Riak stores data persistently, Memcached usually does not.

Document databases

These databases store and retrieve documents, which may be XML, JSON, BSON, etc. The documents are hierarchical tree data structures that can consist of maps, collections and scalar values. Document databases offer rich query languages and constructs such as databases and indexes, which make the transition from relational databases easier.
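
A minimal sketch with pymongo; the server address and the database and collection names are assumptions:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
profiles = client["shop"]["profiles"]   # hypothetical database and collection

# Store a hierarchical document and query on a field inside it.
profiles.insert_one({"name": "Alice", "city": "Berlin", "orders": [1001, 1002]})
doc = profiles.find_one({"city": "Berlin"})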

Some of the most popular document databases are

MongoDB
Home | GitHub | Docs
CouchDB
Home | GitHub | Docs
RavenDB
Home | GitHub | Docs
Elasticsearch
Home | GitHub | Docs
eXist
Home | GitHub | Docs

Column Family Stores

These databases store data in column families as rows assigned to a row key. They are excellent for groups of related data that are frequently accessed together. For example, this could be all of a person’s profile information, but not their activities.

While a column family can be compared to a row in an RDBMS table, where the key identifies the row and the row consists of multiple columns, in column-family stores the different rows do not have to have the same columns.

Some of the most popular Column Family Stores are

Cassandra
Home | GitHub | Docs
HBase
Home | GitHub | Docs
Hypertable
Home | GitHub | Docs

Cassandra can be described as fast and easily scalable because writes are distributed across the cluster. The cluster does not have a master node, so reads and writes can be performed by any node in the cluster.
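
A small, hedged sketch using the DataStax Python driver (cassandra-driver); the keyspace and table are hypothetical, and any node in the cluster can accept the requests:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])   # contact point; there is no master node
session = cluster.connect("shop")  # hypothetical keyspace

session.execute(
    "INSERT INTO profiles (user_id, name, city) VALUES (%s, %s, %s)",
    ("u42", "Alice", "Berlin"),
)
row = session.execute(
    "SELECT name, city FROM profiles WHERE user_id = %s", ("u42",)
).one()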

Graph database

In graph databases you can store entities with certain properties, as well as relationships between these entities. Entities are also called nodes; think of a node as an instance of an object in an application. Relationships can then be called edges; they can also have properties and are directed.

Graph models
Labeled Property Graph
In a labelled property graph, both nodes and edges can have properties.
Resource Description Framework (RDF)
In RDF, graphs are represented using triples. A triple consists of three elements in the form node–edge–node (subject –predicate→ object), which are defined as resources in the form of a globally unique URI or as an anonymous resource. In order to manage different graphs within one database, they are stored as quads, where a quad extends each triple with a reference to the associated graph. Building on RDF, the RDF Schema vocabulary has been developed to formalise weak ontologies, and fully decidable ontologies can be described with the Web Ontology Language.
Algorithms

Important algorithms for querying nodes and edges are:

Breadth-first search, depth-first search
Breadth-first search (BFS) is a method for traversing the nodes of a graph. In contrast to depth-first search (DFS), all nodes that can be reached directly from the initial node are traversed first. Only then are subsequent nodes traversed.
Shortest path
Path between two different nodes of a graph, which has minimum length with respect to an edge weight function.
Eigenvector
In linear algebra, a vector different from the zero vector, whose direction is not changed by the mapping. An eigenvector is therefore only scaled and the scaling factor is called the eigenvalue of the mapping.
Query languages
Blueprints
a Java API for property graphs that can be used together with various graph databases.
Cypher
a query language developed by Neo4j.
GraphQL
a query language for APIs that returns data in graph-like structures
Gremlin
an open source graph programming language that can be used with various graph databases (Neo4j, OrientDB).
SPARQL
query language specified by the W3C for RDF data models.
Distinction from relational databases

When we store graphs in relational databases, this is usually only done for specific kinds of relationships, e.g. relationships between people. Adding further types of relationships then usually requires many schema changes.

In graph databases, traversing the links or relationships is very fast because the relationship between nodes doesn’t have to be calculated at query time.
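
A hedged sketch with the official Neo4j Python driver and Cypher; the connection details and the FRIEND data model are assumptions:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

with driver.session() as session:
    # Follow FRIEND edges directly instead of computing joins at query time.
    result = session.run(
        "MATCH (p:Person {name: $name})-[:FRIEND]->(f:Person) RETURN f.name AS friend",
        name="Alice",
    )
    friends = [record["friend"] for record in result]

driver.close()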

Some of the most popular graph databases are

Neo4j
Home | GitHub | Docs
InfiniteGraph
Home

Selecting the NoSQL database

What all NoSQL databases have in common is that they do not enforce a particular schema. Unlike relational databases with their strong schemas, schema changes do not have to be managed together with the source code that accesses the data. Schema-less databases can tolerate changes to the implied schema, so they do not require downtime for migration; they are therefore especially popular for systems that need to be available 24/7.

But how do we choose the right NoSQL database from so many? In the following we can only give you some general criteria:

Key-value databases
are generally useful for storing sessions, user profiles and settings. However, if relationships between the stored data are to be queried or multiple keys are to be edited simultaneously, we would avoid key-value databases.
Document databases
are generally useful for content management systems and e-commerce applications. However, we would avoid using document databases if complex transactions are required or multiple operations or queries are to be made for different aggregate structures.
Column Family Stores
are generally useful for content management systems and high-volume writes such as log aggregation. We would avoid using column-family stores for systems that are in early development and whose query patterns may still change.
Graph databases
are well suited to problem areas where we need to connect data, such as social networks, geospatial data, routing information and recommender systems.

Conclusion

The rise of NoSQL databases has not led to the demise of relational databases; they coexist well. Often, different data storage technologies are used side by side, each chosen to match the structure of the data and the queries required.

Migration from Jenkins to GitLab CI/CD

Our experience is that migrations are often postponed for a very long time because they do not promise any immediate advantage. However, when the tools in use are getting on in years and no longer really fit the new requirements, technical debt accumulates and jeopardises further development.

Advantages

The advantages of GitLab CI/CD over Jenkins are:

Seamless integration
GitLab provides a complete DevOps workflow that seamlessly integrates with the GitLab ecosystem.
Better visibility
Better integration also leads to greater visibility across pipelines and projects, allowing teams to stay focused.
Lower cost of ownership
Jenkins requires significant effort in maintenance and configuration. GitLab, on the other hand, provides code review and CI/CD in a single application.

Getting started

Migrating from Jenkins to GitLab doesn’t have to be scary though. Many projects have already been switched from Jenkins to GitLab CI/CD, and there are quite a few tools available to ease the transition, such as:

  • Run Jenkins files in GitLab CI/CD.

    A short-term solution that teams can use when migrating from Jenkins to GitLab CI/CD is to use Docker to run a Jenkins file in GitLab CI/CD while gradually updating the syntax. While this does not fix the external dependencies, it already provides better integration with the GitLab project.

  • Use Auto DevOps

    It may be possible to use Auto DevOps to build, test and deploy your applications without requiring any special configuration. One of the more involved tasks of a Jenkins migration can be converting pipelines from Groovy to YAML; however, Auto DevOps provides predefined CI/CD configurations that create a suitable default pipeline in many cases. Auto DevOps offers further features such as security, performance and code quality testing. Finally, you can easily adapt the templates if you need further customisation; a minimal sketch of such a pipeline follows this list.
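
To give an impression of the target format, here is a minimal, hedged sketch of a .gitlab-ci.yml; the image, stages and commands are placeholders, not a drop-in configuration:

stages:
  - test
  - deploy

test-job:
  stage: test
  image: python:3.9        # placeholder build image
  script:
    - pip install -r requirements.txt
    - pytest

deploy-job:
  stage: deploy
  script:
    - echo "Replace with your own deployment tooling"
  only:
    - main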

Best Practices

  • Start small!

    The Getting Started steps above allow you to make incremental changes. This way you can make continuous progress in your migration project.

  • Use the tools effectively!

    Docker and Auto DevOps provide you with tools that simplify the transition.

  • Communicate transparently and clearly!

    Keep the team informed about the migration process and share the progress of the project. Also aim for clear job names and design your configuration in such a way that it gives the best possible overview. If necessary, write comments for variables and code that is difficult to understand.

Let me advise you

I will be happy to advise you and create a customised offer for the migration of your Jenkins pipeline to GitLab CI/CD.

Veit Schiele
Phone: +49 30 22430082

I will also be happy to call you!

Request now

Atlassian discontinues the server product range

Atlassian announced in mid-October 2020 that it would completely discontinue its server product line for Jira, Confluence, Bitbucket and Bamboo on 2 February 2021. Existing server licences can still be used until 2 February 2024, although it is doubtful whether Atlassian will actually continue to provide comprehensive support until then.

The product series will be phased out in stages:

  • 2 February 2021: new server licences will no longer be sold, and price increases come into effect
  • 2 February 2022: upgrades and downgrades will no longer be possible
  • 2 February 2023: app purchases for existing server licences will no longer be possible
  • 2 February 2024: end of support

While Atlassian recommends migrating to the cloud, many of our customers refuse to do so due to business requirements or data protection reasons. We work with our customers to analyse the requirements of their existing Jira, Confluence, Bitbucket and Bamboo servers and then develop suitable migration plans, e.g. to GitLab.

Let me advise you

Even if you are not yet a customer of ours, I will be happy to advise you and create a customised offer for the migration of your Atlassian servers.

Veit Schiele
Phone: +49 30 22430082

I will also be happy to call you!

Request now

Python Pattern Matching in Admin Magazine #63


Python, originally an object-oriented programming language, is to receive a new feature in version 3.10 that is mainly known from functional languages: pattern matching. The change is controversial in the Python community and has triggered a heated debate.

Pattern matching is a symbol-processing method that uses a pattern to identify discrete structures or subsets, e.g. strings, trees or graphs. This procedure is found in functional or logical programming languages where a match expression is used to process data based on its structure, e.g. in Scala, Rust and F#. A match statement takes an expression and compares it to successive patterns specified as one or more cases. This is superficially similar to a switch statement in C, Java or JavaScript, but much more powerful.

Python 3.10 is now also to receive such a match expression. The implementation is described in PEP (Python Enhancement Proposal) 634. [1] Further information on the plans can be found in PEP 635 [2] and PEP 636 [3]. How pattern matching is supposed to work in Python 3.10 is shown by this very simple example, where a value is compared with several literals:

def http_error(status):
    match status:
        case 400:
            return "Bad request"
        case 401:
            return "Unauthorized"
        case 403:
            return "Forbidden"
        case 404:
            return "Not found"
        case 418:
            return "I'm a teapot"
        case _:
            return "Something else"

In the last case of the match statement, the underscore _ acts as a placeholder that catches everything. This has caused irritation among developers, because in Python an underscore is usually prefixed to a variable name to mark it as intended for internal use. While Python does not distinguish between private and public variables as strictly as Java does, this is still a very widely used convention, which is also laid down in the Style Guide for Python Code [4].

However, the proposed match statement can not only check patterns, i.e. detect a match between the value of a variable and a given pattern, it also rebinds the variables that match the given pattern.

This means that in Python we suddenly have to deal with Schrödinger’s constants, which remain constant only until we take a closer look at them in a match statement. The following example illustrates this:

NOT_FOUND = 404
retcode = 200

match retcode:
    case NOT_FOUND:
        print('not found')

print(f"Current value of {NOT_FOUND=}")

This results in the following output:

not found
Current value of NOT_FOUND=200

This behaviour leads to harsh criticism of the proposal from experienced Python developers such as Brandon Rhodes, author of «Foundations of Python Network Programming»:

If this poorly-designed feature is really added to Python, we lose a principle I’ve always taught students: “if you see an undocumented constant, you can always name it without changing the code’s meaning.” The Substitution Principle, learned in algebra? It’ll no longer apply.

— Brandon Rhodes on 12 February 2021, 2:55 pm on Twitter [5]

Many long-time Python developers, however, are not only grumbling about the structural pattern matching that is to come in Python 3.10. They generally regret developments of recent years in which more and more syntactic sugar has been sprinkled over the language; they fear that the original principles, as laid down in the Zen of Python [6], are being forgotten and that functional stability will be lost.

Although Python has a sophisticated process with the Python Enhancement Proposals (PEPs) [7] for steering the further development of the language collaboratively, there is always criticism on Twitter and other social media, as now with structural pattern matching. In fact, the topic had already been discussed intensively in the Python community, and the Python Steering Council [8] recommended the adoption of the proposals as early as December 2020. Nevertheless, the topic only really boiled up once the proposals were accepted.

The reason for this is surely the size and diversity of the Python community: most programmers are only interested in discussions about extensions that solve their own problems, and other developments are overlooked until the PEPs are accepted. This is probably what happened with structural pattern matching. It opens up solutions to problems that were hardly possible in Python before; for example, it allows data scientists to write parsers and compilers based on pattern matching, for which they previously had to resort to functional or logical programming languages.

With the adoption of the PEP, the discussion has now been taken into the wider Python community. Incidentally, Brett Cannon, a member of the Python Steering Council, pointed out in an interview [9] that the last word has not yet been spoken: until the first beta version, there is still time for changes if problems arise in practically used code. He also held out the possibility of changing the meaning of _ once again.

So maybe we will be spared Schrödinger’s constants.


[1] PEP 634: Specification
[2] PEP 635: Motivation and Rationale
[3] PEP 636: Tutorial
[4] https://pep8.org/#descriptive-naming-styles
[5] @brandon_rhodes
[6] PEP 20 – The Zen of Python
[7] Index of Python Enhancement Proposals (PEPs)
[8] Python Steering Council
[9] Python Bytes Episode #221

Criteria for safe and sustainable software

Open Source
Open source software is the best way to verify how well your data is protected against unauthorised access.
Virtual Private Network
A VPN is usually the basis for accessing a company network from outside. However, do not blindly trust the often false promises of VPN providers; instead, use open source programmes such as OpenVPN or WireGuard.
Remote desktop software
Remotely is a good open source alternative to TeamViewer or AnyDesk.
Configuration

Even with open-source software, check whether the default settings are really privacy-friendly:

For example, Jitsi Meet creates external connections to gravatar.com and logs far too much information at the INFO logging level. Previous Jitsi apps also embedded the trackers Google Crashlytics, Google Firebase Analytics and Amplitude. If possible, run your own STUN server; otherwise meet-jit-si-turnrelay.jitsi.net is used.

Encryption methods

Here you should distinguish between transport encryption – ideally end-to-end – and encryption of stored data.

The synchronisation software Syncthing, for example, uses both TLS and Perfect Forward Secrecy to protect communication.

You should be informed if the fingerprint of a key changes.

Metadata
Make sure that communication software avoids metadata or at least protects it; metadata can reveal a lot about users’ lives.
Audits
Even the security risks of open source software can only be detected by experts. Use software that has successfully passed a security audit.
Tracker

Smartphone apps often integrate a lot of trackers that pass on data to third parties such as Google or Facebook without the user’s knowledge. εxodus Privacy is a website that analyses Android apps and shows which trackers are included in an app.

It also checks whether the permissions requested by an app fit its intended use. For example, it is incomprehensible why messengers such as Signal, Telegram and WhatsApp compulsorily require one’s own telephone number to be entered.

Malvertising

Avoid apps that embed advertising and thus pose the risk of malicious code advertising. Furthermore, tracking companies can evaluate and market the activities of users via embedded advertising.

There are numerous tools such as uBlock Origin for Firefox, Blokada for Android and iOS or AdGuard Pro for iOS that prevent the delivery of advertising and the leakage of personal data. With HttpCanary for Android apps and Charles Proxy for iOS apps, users can investigate for themselves how apps behave unless the app developers resort to certificate pinning. Burp Suite intercepts much more than just data packets and can also bypass certificate pinning.

Decentralised data storage
It is safest if data is stored decentrally. If this is not possible, federated systems, such as email infrastructure, are preferable to centralised ones.
Financial transparency
If there are companies behind open source software, they should be transparent about their finances and financial interests in the software. A good example in this respect is Delta Chat.
Availability
Check whether an Android app, for example, is available only via Google's Play Store or also via the more privacy-friendly F-Droid store.
Data economy
When selecting software, check not only whether it meets all functional requirements, but also whether it stores only the necessary data.
Data synchronisation
Data from a software should be able to be synchronised between multiple devices without the need for a central server to mediate it. For example, we sync our KeePass database directly between our devices using Syncthing and not via WebDAV or Nextcloud. This means that password data is not cached anywhere, but only stored where it is needed.
Backup
To ensure that all relevant data is securely available for the entire period of use, backup copies should be made. These should be stored in a safe place that is also legally permissible. The backup should also be automatic and the backups should be encrypted.