Doing It Wrong

learn in public

The Pragmatic Programmer: Chapter 6

posted on December 27, 2023

Concurrency

  • concurrency is when two or more pieces of code act as if they run at the same time: it is a software mechanism
  • parallelism is when they do run at the same time: it is a hardware concern
  • Concurrency requires that you run code in an environment that can switch execution between different parts of your code. This is often implemented using things like fibers, threads, and processes.
  • Almost any decent-sized codebase will need to handle concurrency: it’s just a real-world requirement.
  • temporal coupling happens when your code imposes a sequence of things that is not actually required to solve the problem at hand.
  • Shared state is the biggest liability once two things can happen at the same time.

Breaking Temporal Coupling

There are two aspects of time that are important to us: concurrency (things happening at the same time), and ordering (the relative positions of things in time). We should think about concurrency in our project by analyzing our workflow through an activity diagram.

  • drawn with rounded boxes for actions
  • arrow leaving an action leads to another action (A -> B means B must wait on A) or to a thick line called a synchronization bar
  • once all actions leading to a synchronization bar are complete, you can proceed along any arrows leaving it (to another action)
  • actions with no arrows leading in can be completed at any time.
  • this activity helps to identify activities that could be performed in parallel
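
As a rough sketch of that last point, the arrows of an activity diagram can be written down as a table of "what waits on what" and grouped into waves that could run in parallel. The action names and the parallelWaves helper are made up for illustration; they are not from the book.

```javascript
// Hypothetical morning-routine actions; each maps to the actions it must wait on.
const waitsOn = {
  boilWater: [],
  grindBeans: [],
  brewCoffee: ['boilWater', 'grindBeans'], // like a synchronization bar: needs both
  toastBread: [],
  eatBreakfast: ['brewCoffee', 'toastBread'],
};

// Group actions into "waves": every action in a wave has all of its
// prerequisites completed by earlier waves, so a wave could run in parallel.
function parallelWaves(graph) {
  const done = new Set();
  const waves = [];
  while (done.size < Object.keys(graph).length) {
    const ready = Object.keys(graph).filter(
      (action) => !done.has(action) && graph[action].every((dep) => done.has(dep))
    );
    if (ready.length === 0) throw new Error('cycle: some actions wait on each other');
    ready.forEach((action) => done.add(action));
    waves.push(ready);
  }
  return waves;
}

console.log(parallelWaves(waitsOn));
// [ [ 'boilWater', 'grindBeans', 'toastBread' ], [ 'brewCoffee' ], [ 'eatBreakfast' ] ]
```

Actions with no arrows leading in all land in the first wave, which matches the "can be completed at any time" rule above.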

Section Challenges

  • [ ] How many tasks do you perform in parallel when you get ready for work in the morning? Could you express this in a UML activity diagram? Can you find some way to get ready more quickly by increasing concurrency?

Shared State is Incorrect State

Scenario: two servers in a restaurant sell pie (from a pie case) to their tables. Both see the last piece of pie and promise it to their table. One table is disappointed.

  • The problem is not that two processes can write to the same memory; the problem is that neither process can guarantee that its view of that memory is consistent.
  • This is because fetching then updating the pie count is not an atomic operation: the underlying value can change in the middle.
  • Semaphores (a thing that only one process can own at a time) can solve this: only the server holding a plastic Leprechaun from the pie case can sell a pie.
  • trade-offs: this only works if all follow the convention
  • mitigation: move resource locking / semaphore handling into resource
  • multiple-resource transactions (pie and ice cream) should generally be a separate resource (so that you don’t have a server holding pie without ice cream)
  • non-transactional updates: concurrency problems aren’t restricted to writing to memory, but can pop up in files, databases, external services, etc.
  • whenever two or more instances of your code can access some resource at the same time, you have a potential for a problem
  • random failures are often concurrency issues
  • many languages have library support for mutexes (mutual exclusion), monitors, or semaphores
  • one could argue that functional languages (tendency to make all data immutable) make concurrency simpler, though they also run in the real world, subject to temporal restrictions, so you need to be aware of concurrency
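
A small sketch of the pie problem, with the interleaving written out by hand so it is deterministic. The names (displayCase, PieCase, takeSlice) are my own. It also leans on an assumption: JavaScript runs a synchronous method to completion, so takeSlice cannot be interrupted here. In a multi-threaded language the same method would still need a mutex or semaphore inside it.

```javascript
// Unsafe: "look, then act" is two separate steps, so another server can
// slip in between them.
const displayCase = { pieSlices: 1 };

const serverASees = displayCase.pieSlices > 0; // server A looks: one slice left
const serverBSees = displayCase.pieSlices > 0; // server B looks before A acts
if (serverASees) displayCase.pieSlices -= 1;   // A sells the last slice
if (serverBSees) displayCase.pieSlices -= 1;   // B sells a slice that is gone
console.log(displayCase.pieSlices);            // -1: two tables were promised one slice

// Safer: the resource owns the check-and-update, so callers cannot split it.
class PieCase {
  #slices;
  constructor(slices) {
    this.#slices = slices;
  }
  // Returns true if a slice was taken, false if the case is empty.
  takeSlice() {
    if (this.#slices === 0) return false;
    this.#slices -= 1;
    return true;
  }
}

const pieCase = new PieCase(1);
const tableOneGetsPie = pieCase.takeSlice(); // true
const tableTwoGetsPie = pieCase.takeSlice(); // false: told "no pie" up front
```

This is the "move the locking into the resource" mitigation: nobody has to remember a convention, because there is no way to read the count without going through takeSlice.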

Actors and Processes

Actors and processes offer a means to implement concurrency without the burden of synchronizing access to shared memory. Use actors for concurrency without shared state.

  • an actor is an independent virtual processor with its own local (and private) state
  • each actor has a mailbox, whose messages get processed as soon as the actor is idle
  • there’s no single thing in control: nothing schedules what happens next, no orchestration of raw data to final output
  • the only state in the system is held in messages and the local, private state of the actor
  • all messages are one-way: there’s no built-in concept of replying
    • replies can be built by including a mailbox address in the message; make replying part of the message processing
  • an actor processes each message to completion, and only processes one message at a time
  • a process is typically a more general-purpose virtual processor, often implemented by the operating system to facilitate concurrency

The diner / server scenario with actors:

  • the customer becomes hungry
  • they respond by asking the server for pie
  • the server asks the pie case for pie
  • if available, pie case sends pie to customer and notifies waiter to add to bill
  • if not available, pie case informs server

In all of the code for this, there’s no explicit concurrency handling, as there is no shared state. The Erlang language and runtime are a great example of an actor implementation. Erlang calls actors processes, but they’re not regular operating system processes as described in this chapter’s notes. Erlang also has a supervision system that manages process lifetimes.
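
Here is a rough, single-threaded sketch of the diner. It is not the book’s code and not a real actor library: spawn, send, and run are hypothetical helpers, and the run loop simply hands each actor one message at a time until every mailbox is empty.

```javascript
// Each actor has private state and a mailbox; the only way to interact with
// it is to send it a message.
const actors = new Map();
const log = [];

function spawn(name, state, handlers) {
  actors.set(name, { state, handlers, mailbox: [] });
}

function send(to, message) {
  actors.get(to).mailbox.push(message); // one-way: nothing is returned
}

// Deliver messages one at a time until every mailbox is empty.
function run() {
  let delivered = true;
  while (delivered) {
    delivered = false;
    for (const actor of actors.values()) {
      const message = actor.mailbox.shift();
      if (message) {
        actor.handlers[message.type](message, actor.state);
        delivered = true;
      }
    }
  }
}

spawn('pieCase', { slices: 1 }, {
  getSlice(msg, state) {
    if (state.slices === 0) {
      send(msg.server, { type: 'error', customer: msg.customer, text: 'sold out' });
      return;
    }
    state.slices -= 1;
    send(msg.customer, { type: 'putOnTable', food: 'pie' });
    send(msg.server, { type: 'addToBill', customer: msg.customer, food: 'pie' });
  },
});

spawn('server', {}, {
  hungryForPie(msg) {
    // The reply addresses travel inside the message itself.
    send('pieCase', { type: 'getSlice', customer: msg.customer, server: 'server' });
  },
  addToBill(msg) { log.push(`bill ${msg.customer} for ${msg.food}`); },
  error(msg) { log.push(`tell ${msg.customer}: ${msg.text}`); },
});

for (const customer of ['customer1', 'customer2']) {
  spawn(customer, {}, {
    hungry() { send('server', { type: 'hungryForPie', customer }); },
    putOnTable(msg) { log.push(`${customer} eats ${msg.food}`); },
  });
}

send('customer1', { type: 'hungry' });
send('customer2', { type: 'hungry' });
run();
console.log(log);
```

Only the pie case ever touches the slice count, so with one slice and two hungry customers, exactly one gets pie and the other hears "sold out" from the server. No locks needed.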

Section Challenges

  • [ ] Do you currently have code that uses mutual exclusion to protect shared data? Why not try a prototype of the same code written using actors?
  • [ ] The actor code for the diner only supports ordering slices of pie. Extend it to let customers order pie a la mode, with separate agents managing the pie slices and the scoops of ice cream. Arrange things so that it handles the situation where one or the other runs out.

Blackboards

Blackboards provide a form of laissez faire concurrency, where the blackboard is the storage repository for independent processes, agents, actors, etc. They may be a good fit when writing an application with many sometimes independent and sometimes inter-dependent steps. Use them to coordinate workflows. Imagine writing an application to process loan applications.

  • responses (to credit inquiries, bank account balances, etc.) can arrive in any order.
  • data aggregation may be done by many different people, distributed across different time zones
  • some data gathering may be done automatically by other systems; data may arrive asynchronously
  • certain data may be dependent on other data (e.g., cannot start a car’s title search until the system has proof of ownership)
  • the arrival of new data raises new questions and policies; e.g., a bad result from a credit check means stricter requirements for the down payment
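
A minimal sketch of how the loan scenario might look on a blackboard. The Blackboard class, the fact names, the 600 credit threshold, and the down payment percentages are all invented for illustration.

```javascript
// Facts are posted in any order, and independent "agents" react whenever
// the facts they care about are all present.
class Blackboard {
  constructor() {
    this.facts = new Map();
    this.agents = [];
  }
  // An agent declares which facts it needs and what to do once they exist.
  register(needs, act) {
    this.agents.push({ needs, act, fired: false });
  }
  post(key, value) {
    this.facts.set(key, value);
    for (const agent of this.agents) {
      if (!agent.fired && agent.needs.every((need) => this.facts.has(need))) {
        agent.fired = true; // each agent runs once in this sketch
        agent.act(this);
      }
    }
  }
}

const board = new Blackboard();

// The title search cannot start until proof of ownership has arrived.
board.register(['proofOfOwnership'], (b) => b.post('titleSearch', 'clear'));

// A bad credit check raises a new requirement: a stricter down payment.
board.register(['creditScore'], (b) => {
  const minimum = b.facts.get('creditScore') < 600 ? 0.2 : 0.1;
  b.post('minimumDownPayment', minimum);
});

// This agent does not care who produced its inputs, or in what order.
board.register(['titleSearch', 'minimumDownPayment'], (b) => b.post('readyForReview', true));

// Data arrives in no particular order.
board.post('creditScore', 580);
board.post('proofOfOwnership', 'deed.pdf');

console.log(board.facts.get('readyForReview')); // true
```

None of the agents know about each other; they only know the shape of the facts on the board.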

Messaging systems (think NATS, SQS, Kafka) can be like blackboards, since they offer message persistence and the ability to retrieve messages through pattern matching. Using blackboards comes with trade-offs:

  • harder to reason about business logic because everything is disconnected
  • observability can suffer unless you deliberately build in a way to trace what happened
  • agreeing on a communication format (or at least having a repository of different formats) requires work
  • more troublesome to deploy as there are more moving parts

Section Challenges

  • [ ] Exercise 24 Would a blackboard-style system be appropriate for the following applications? Why or why not?
    • Image processing: you’d like to have a number of parallel processes grab chunks of an image, process them, and put the completed chunk back.
    • Group calendaring: you’ve got people scattered across the globe, in different time zones, speaking different languages, trying to schedule a meeting.
    • Network monitoring tool: the system gathers performance statistics and collects trouble reports, which agents use to look for trouble in the system.
  • [ ] Do you use blackboard systems in the real world—the message board by the refrigerator, or the big whiteboard at work? What makes them effective? Are messages ever posted with a consistent format? Does it matter?

Filed Under: Development Tagged With: Book, Notes

The Pragmatic Programmer: Chapter 5

posted on December 20, 2023

Bend, or Break

Make every effort to write code that’s as flexible as possible.

Decoupling

  • Coupling is the enemy of change, because it links together things that must change in parallel.
  • Software is not bridges: we don’t want components to be rigidly coupled together, or else changing one component requires changing its coupled components.
  • Coupling is transitive: A -> B, B -> C means A -> C.
  • Decoupled code is easier to change (ETC).
  • Symptoms of coupling:
    • dependencies between unrelated modules / libraries
    • simple changes to one module propagate through unrelated modules (or break them)
    • developers are afraid to change code because they don’t know what will be affected
    • meetings where everyone has to attend because nobody is sure who will be affected by a change

Train Wrecks

function applyDiscount(customer, order_id, discount) {
  // Reaches through customer -> orders -> order -> totals to do its work.
  const totals = customer
    .orders
    .find(order_id)
    .getTotals();
  totals.grandTotal = totals.grandTotal - discount;
  totals.discount = discount;
}

That code traverses five levels of abstraction, from customers to total amounts. We have to know that customer exposes orders that have a .find() method, etc., all the way down. That’s a lot of implicit knowledge and things that cannot change in the future.

Tell, don’t ask. Don’t make decisions based on the internal state of an object and then update that object. This is a pattern, not a law of nature, so don’t follow it slavishly.

function applyDiscount(customer, order_id, discount) {
  // Tell the order what to do; it updates its own totals.
  customer
    .findOrder(order_id)
    .applyDiscount(discount);
}
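
For what it’s worth, here is one guess at what could sit behind that call. Only the method names findOrder and applyDiscount come from the snippet above; the Order and Customer classes and their fields are hypothetical.

```javascript
class Order {
  constructor(id, grandTotal) {
    this.id = id;
    this.grandTotal = grandTotal;
    this.discount = 0;
  }
  // The order owns the rule for how a discount changes its totals.
  applyDiscount(discount) {
    this.grandTotal -= discount;
    this.discount = discount;
  }
}

class Customer {
  constructor(orders) {
    this.orders = orders;
  }
  // Callers no longer need to know how orders are stored.
  findOrder(orderId) {
    const order = this.orders.find((candidate) => candidate.id === orderId);
    if (!order) throw new Error(`no order with id ${orderId}`);
    return order;
  }
}

const customer = new Customer([new Order(42, 100)]);
customer.findOrder(42).applyDiscount(15);
console.log(customer.findOrder(42).grandTotal); // 85
```

If the way totals are stored changes later, only Order has to change.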

The Law of Demeter expresses the following sentiment in a more detailed way (the LoD is probably not very relevant today): don’t chain method calls. This “one-dot rule” doesn’t apply if the things you’re chaining are very unlikely to change (like language-level features). The following Ruby code doesn’t violate the one-dot rule, because it’s language-level.

people
  .sort_by { |person| person.age }
  .first(10)
  .map { |person| person.name }

Pipelines are not method-call chains: pipelines transform data, passing it from one function to the next. We’re not relying on hidden implementation details.

  • [ ] How does “tell, don’t ask” strike you?
  • [ ] Do you think the “one-dot rule” is practical? Could it be helpful?

The Evils of Globalization

  • Global data is coupling, as you never know what will break if you change it. Reuse should not be your primary concern when writing code, but the thinking that makes code reusable should be in your mind as you create it. Avoid global data—it slows you down.
  • Singletons are global data, though at least they have intelligence behind Config.getLogLevel() that can help you not break calling code.
  • Any mutable external resource is global data. You can’t avoid using a database, so you can minimize the impact of global data by wrapping these resources behind code you control.
  • If it’s important enough to be global, wrap it in an API.
  • [ ] What are appropriate uses of global data?

Inheritance Adds Coupling

Subclassing just isn’t shy: it doesn’t deal with only its own concern. Alterations in one place (the parent class) can change the subclass elsewhere.

  • [ ] How does this strike you? Do you prefer to work in an OOP mental model / language?
  • [ ] Can you imagine using OOP without subclassing?

Juggling the Real World

Events represent the availability of information. They can come from external or internal sources. When we write applications that respond to events, here are a few strategies:

  • Finite State Machines
  • The Observer Pattern
  • Publish / Subscribe
  • Reactive Programming and Streams

Finite State Machines

  • There exist a limited number of states for your application. You can be in only one state at any given time.
  • Events move you from one state to another.
  • Actions can be triggered upon moving.
  • [ ] What do you think about using a FSM in your next application? Have you ever used one before?
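
A table-driven sketch of those three bullets, using a made-up turnstile rather than anything from the book. The transitions table and the runMachine helper are assumptions for illustration.

```javascript
// transitions[state][event] gives the next state and an action to run while moving.
const transitions = {
  locked: {
    coin: { next: 'unlocked', action: 'open the latch' },
    push: { next: 'locked', action: 'sound the alarm' },
  },
  unlocked: {
    push: { next: 'locked', action: 'close the latch' },
    coin: { next: 'unlocked', action: 'return the coin' },
  },
};

function runMachine(table, initialState, events) {
  let state = initialState; // exactly one current state at any time
  const actions = [];
  for (const event of events) {
    const transition = table[state][event];
    if (!transition) throw new Error(`no transition from ${state} on ${event}`);
    actions.push(transition.action); // the action triggered by this move
    state = transition.next;
  }
  return { state, actions };
}

const turnstile = runMachine(transitions, 'locked', ['coin', 'push', 'push']);
console.log(turnstile.state);   // locked
console.log(turnstile.actions); // [ 'open the latch', 'close the latch', 'sound the alarm' ]
```

All of the behavior lives in the table, so adding a state or an event means editing data, not control flow.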

The Observer Pattern

  • source of events is the observable
  • observers are watching the observable
  • fairly simple pattern: push a function reference into a list, and call those functions when the event occurs
  • because the observers have to register with the observable, you introduce coupling
  • callbacks are handled synchronously, so you have more opportunity for performance bottlenecks
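
The "list of function references" idea really is this small. A sketch, with Observable, subscribe, and notify as my own names:

```javascript
class Observable {
  constructor() {
    this.observers = [];
  }
  // Registering is the coupling: the observer must know about this object.
  subscribe(callback) {
    this.observers.push(callback);
    // Hand back a function that removes this callback again.
    return () => {
      this.observers = this.observers.filter((observer) => observer !== callback);
    };
  }
  // Every callback runs synchronously, in order, before notify() returns.
  notify(event) {
    for (const observer of this.observers) observer(event);
  }
}

const orderEvents = new Observable();
const seen = [];
const unsubscribe = orderEvents.subscribe((event) => seen.push(event));

orderEvents.notify('order 1 completed');
unsubscribe();
orderEvents.notify('order 2 completed'); // no observers left, so nothing is recorded

console.log(seen); // [ 'order 1 completed' ]
```

A slow callback inside notify holds up every observer after it, which is the bottleneck mentioned above.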

Publish / Subscribe (PubSub)

  • generalizes the observer pattern, dealing with coupling and performance bottlenecks
  • publishers and subscribers are connected via channels (the how is an implementation detail hidden from your application logic)
  • subscribers register to one or more channels
  • publishers write to channels
  • good choice for decoupling the handling of asynchronous events
  • observability is hard with such a distributed system
  • is a good example of reducing coupling by abstracting up through a shared interface (the channel)
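
A toy, in-memory version of the channel idea. Broker and its methods are hypothetical, and delivery is a manual deliver() call so the example stays deterministic. A real system would put a message queue or a network hop where the pending array is.

```javascript
class Broker {
  constructor() {
    this.subscribers = new Map(); // channel name -> handlers
    this.pending = [];            // published but not yet delivered
  }
  subscribe(channel, handler) {
    if (!this.subscribers.has(channel)) this.subscribers.set(channel, []);
    this.subscribers.get(channel).push(handler);
  }
  // Publishing only records the message; the publisher never calls handlers.
  publish(channel, message) {
    this.pending.push({ channel, message });
  }
  // Delivery happens later, separately from publishing.
  deliver() {
    while (this.pending.length > 0) {
      const { channel, message } = this.pending.shift();
      for (const handler of this.subscribers.get(channel) || []) handler(message);
    }
  }
}

const broker = new Broker();
const received = [];
broker.subscribe('orders.completed', (order) => received.push(`billing saw ${order.id}`));
broker.subscribe('orders.completed', (order) => received.push(`reporting saw ${order.id}`));

broker.publish('orders.completed', { id: 1001 });
const handledBeforeDelivery = received.length; // 0: publish returned without running handlers
broker.deliver();

console.log(received); // [ 'billing saw 1001', 'reporting saw 1001' ]
```

The publisher and the two subscribers share only a channel name, which is the "abstracting up through a shared interface" part.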

Reactive Programming

Reactive programming, as a paradigm, is often compared to using a spreadsheet: you change one value, and other values reactively update. Reactivity can be created with events, but streams build reactivity in. RxJS is a good example of this paradigm. Event streams unify synchronous and asynchronous processing behind a common API.

Section Challenges

  • [ ] Exercise 19 In the FSM section we mentioned that you could move the generic state machine implementation into its own class. That class would probably be initialized by passing in a table of transitions and an initial state. Try implementing the string extractor that way.
  • [ ] Exercise 20 Which of these technologies (perhaps in combination) would be a good fit for the following situations:
    • If you receive three network interface down events within five minutes, notify the operations staff.
    • If it is after sunset, and there is motion detected at the bottom of the stairs followed by motion detected at the top of the stairs, turn on the upstairs lights.
    • You want to notify various reporting systems that an order was completed.
    • In order to determine whether a customer qualifies for a car loan, the application needs to send requests to three backend services and wait for the responses.

Transforming Programming

All programs transform data, yet we rarely think about creating transformations when designing software. There’s great value in thinking about programs as something that transforms inputs into outputs, like an industrial assembly line.

  • think of the Unix philosophy
  • programming is about code, but programs are about data
  • break down your program into transform |> transform, then repeat
  • even if your language doesn’t support pipes, you can still use the philosophy of design
    const content = File.read(fileName);
    const lines = findMatchingLines(content, pattern);
    const result = truncateLines(lines);
  • the reason transforms are worthwhile is that instead of hoarding state (encapsulation in objects), you pass it around—you lose a whole category of complexity and coupling
  • data becomes a flow…a peer to functionality
  • error handling can be done with either:
    • an :ok/:error tuple (I like [error, data]), handled inside each transformation
    • handle it in the pipeline (some kind of andThen function that only continues if no error)
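
A sketch of the second option, using the [error, data] shape. The andThen and pipeline helpers are my own, and the two steps borrow the names findMatchingLines and truncateLines from the snippet above but take a string instead of reading a file, so the example runs on its own.

```javascript
// Each step takes data and returns [error, data].
const findMatchingLines = (pattern) => (content) => {
  const lines = content.split('\n').filter((line) => line.includes(pattern));
  return lines.length > 0 ? [null, lines] : [`no lines match "${pattern}"`, null];
};

const truncateLines = (maxLength) => (lines) =>
  [null, lines.map((line) => line.slice(0, maxLength))];

// andThen only runs the next step if the previous one succeeded.
const andThen = ([error, data], step) => (error ? [error, null] : step(data));

// Thread the input through every step, stopping at the first error.
const pipeline = (input, ...steps) => steps.reduce(andThen, [null, input]);

const content = 'alpha one\nbeta two\nalpha three';

const [okError, okResult] = pipeline(content, findMatchingLines('alpha'), truncateLines(7));
console.log(okError, okResult); // null [ 'alpha o', 'alpha t' ]

const [badError, badResult] = pipeline(content, findMatchingLines('gamma'), truncateLines(7));
console.log(badError, badResult); // no lines match "gamma" null
```

The individual steps never check for an earlier failure; that decision lives in one place.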

Section Challenges

  • [ ] Exercise 21 Can you express the following requirements as a top-level transformation? That is, for each, identify the input and the output.
  1. Shipping and sales tax are added to an order
  2. Your application loads configuration information from a named file
  3. Someone logs in to a web application
  • [ ] Exercise 22 You’ve identified the need to validate and convert an input field from a string into an integer between 18 and 150. The overall transformation is described by
  field contents as string
    -> [validate & convert]
      -> {:ok, value} | {:error, reason}

Write the individual transformations that make up validate & convert.

  • [ ] Exercise 23 In “Language X Doesn’t Have Pipelines”, on page 153, we wrote:
  const content = File.read(file_name);
  const lines = find_matching_lines(content, pattern);
  const result = truncate_lines(lines);

Many people write OO code by chaining together method calls, and might be tempted to write this as something like:

  const result = content_of(file_name)
                  .find_matching_lines(pattern)
                  .truncate_lines()

What’s the difference between these two pieces of code? Which do you think we prefer and why?

Inheritance Tax

  • two types of inheritance (from two origins):
    • Simula, where inheritance was a way of combining types
    • Smalltalk, where inheritance was a dynamic organization of behaviors
  • both types have the issue of coupling code
  • alternatives to inheritance:
    • interfaces and protocols, which allow us to express polymorphism without inheritance
    • delegation
    • mixins and traits
  • delegate to services: has-a trumps is-a
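
Two of those alternatives in a quick sketch. MemoryStore, Account, and Describable are invented names; the point is only the shape of "has-a" and of mixing behavior in without a parent class.

```javascript
// Delegation: Account has-a store instead of being a kind of store.
class MemoryStore {
  constructor() {
    this.records = new Map();
  }
  save(id, record) {
    this.records.set(id, record);
  }
  load(id) {
    return this.records.get(id);
  }
}

class Account {
  constructor(id, store) {
    this.id = id;
    this.store = store; // any object with a save(id, record) method will do
    this.balance = 0;
  }
  deposit(amount) {
    this.balance += amount;
  }
  save() {
    this.store.save(this.id, { balance: this.balance });
  }
}

// Mixin: shared behavior copied onto a class without a parent/child relationship.
const Describable = {
  describe() {
    return `${this.constructor.name} #${this.id}`;
  },
};
Object.assign(Account.prototype, Describable);

const store = new MemoryStore();
const account = new Account(7, store);
account.deposit(50);
account.save();

console.log(account.describe());    // Account #7
console.log(store.load(7).balance); // 50
```

Swapping MemoryStore for something else means passing a different object in, not rearranging a class hierarchy.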

Section Challenges

  • [ ] The next time you find yourself subclassing, take a minute to examine the options. Can you achieve what you want with interfaces, delegation, and / or mixins? Can you reduce coupling by doing so?

Configuration

Parameterize your application by using external configuration. Common configurable data:

  • credentials for external services
  • logging levels and destinations
  • ports, IP addresses, machine names, cluster names
  • environment-specific validation parameters
  • externally-set parameters (like tax rates)
  • site-specific formatting details
  • license keys

You could structure this as a flat-file off-the-shelf plain-text document (that works). You can also store it in a database table if it is likely to be changed by the customer. You can also do both!

Consider putting your configuration data behind a thin API:

  • multiple applications can share configuration data (with appropriate authN and authZ)
  • configuration changes can be made globally
  • configuration data can be maintained via a specialized UI
  • configuration data becomes dynamic (no application restart necessary)

As with all things, don’t overdo it. You can have too much configuration.
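
A rough idea of what "thin API" could mean inside one application. createConfig, get, and refresh are hypothetical names, and the load function is a stand-in for reading a file, a database table, or a remote service.

```javascript
function createConfig(load) {
  let values = load();
  return {
    // Callers ask for a key; they never see where the data lives.
    get(key, fallback) {
      return Object.prototype.hasOwnProperty.call(values, key) ? values[key] : fallback;
    },
    // Re-read the source without restarting the application.
    refresh() {
      values = load();
    },
  };
}

// Stand-in for an external source that someone edits while the app runs.
const externalSource = { taxRate: 0.07, logLevel: 'info' };
const config = createConfig(() => ({ ...externalSource }));

const before = config.get('taxRate'); // 0.07
externalSource.taxRate = 0.08;        // the external data changes
const stale = config.get('taxRate');  // still 0.07 until refreshed
config.refresh();
const after = config.get('taxRate');  // 0.08
```

Because callers only use get, moving the data from a flat file to a database later should not touch them.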


The Pragmatic Programmer: Chapter 4

posted on December 13, 2023

Pragmatic Paranoia

Nobody writes perfect code. Just as we’ve all been taught to be defensive drivers, so we should be defensive coders.

Design By Contract

First developed by Bertrand Meyer for the language Eiffel. A correct program is one that does no more and no less than it claims to do. Documenting and verifying those claims is the heart of Design By Contract (DBC). Expectations and claims are described as follows:

  • preconditions: what needs to be true for a routine to be called
    • without DBC, maybe like conditionally parsing input before deciding whether to call a function
  • postconditions: the state of data after the routine is done (requires that it will conclude, so no infinite loops)
    • without DBC, maybe like parsing output and returning data or error
  • class invariants: the class ensures this condition is true from the perspective of the caller (not necessarily internal to the routine when running) once the routine is finished
    • without DBC, maybe like an assertion about output

Summarized as: If all the routine’s preconditions are met by the caller, the routine shall guarantee that all postconditions and invariants will be true when it completes.

If the contract is broken, the “remedy” is invoked, which may be an exception or program termination. This shouldn’t happen; it’s a bug.

Some languages have better support for these concepts than others. Clojure has pre-conditions and post-conditions. Elixir has guard clauses. Even in languages that don’t support these concepts, you can honor the principles (Zod is one example).

If orthogonal (decoupled) code is “shy” (so that its concerns are its own), DBC code is “lazy”: be strict in what you will accept before you begin, and promise as little as possible in return. This may seem to contradict Postel’s Law / the Robustness Principle:

Be liberal in what you accept, and conservative in what you send.

But a series of “lazy” functions can sum to a liberal / robust acceptance.

DBC differs from Test-Driven Development and Defensive Programming in the following:

  • DBC requires no mocking or setup.
  • DBC defines the parameters for success or failure in all cases, whereas testing can only target one specific case at a time.
  • TDD happens only during the build cycle; DBC and assertions are runtime, so they exist through all cycles.
  • TDD does not generally focus on checking internal invariants.
  • DBC is more efficient and DRY-er than defensive programming: when no one is sure who has to validate the data, everyone does.

Implementing DBC

Simply enumerating the input domain range, boundary conditions, and what the routine promises to deliver (and therefore what it doesn’t promise to deliver) is a huge leap forward in writing better software. Most languages don’t support DBC in the code, so you can implement it the best you can.

  • Assertions: runtime checks of logical conditions
    • if used in classes that extend a parent / superclass, assertions must be manually called or recreated; they are not auto-inherited
    • if you tie assertions to a log level, they may be turned off
    • there’s no concept of “old” values (values as they existed at the entry to a method), so you have to save / assign any data you want to check in the postcondition
    • the runtime doesn’t support checking contracts, so you’re left with bolting it on (like throwing an error)
  • Crashing Early
    • validate your input and crash early so that, for example, you’re not passing a NaN value down the line to a sqrt function
  • Semantic Invariants: a kind of “philosophical contract”
    • semantic invariants are endemic to the meaning of the thing; they are not changeable business logic
    • when you find one, state it clearly and concisely
    • e.g., if building a debit transaction system: “Err in favor of the consumer.”
  • Dynamic Contracts and Agents
    • e.g., “I can’t provide that, but if you give me this, then I might provide something else.”
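
Since JavaScript has no contract support, this is the "bolting it on" approach in sketch form. The contract helper and safeSqrt are hypothetical names, and the tolerance in the postcondition is an arbitrary choice for illustration.

```javascript
// Wraps a function with a precondition and a postcondition.
function contract({ pre, post }, fn) {
  return (...args) => {
    // Crash early: refuse bad input before doing any work.
    if (!pre(...args)) {
      throw new Error(`precondition failed for ${fn.name}(${args.join(', ')})`);
    }
    const result = fn(...args);
    // The postcondition sees the original arguments as well as the result.
    if (!post(result, ...args)) {
      throw new Error(`postcondition failed for ${fn.name}`);
    }
    return result;
  };
}

const safeSqrt = contract(
  {
    // NaN >= 0 is false, so NaN is rejected here too.
    pre: (x) => typeof x === 'number' && x >= 0,
    // The result squared should give back the input, within a small tolerance.
    post: (root, x) => Math.abs(root * root - x) <= 1e-9 * Math.max(1, x),
  },
  function sqrt(x) {
    return Math.sqrt(x);
  }
);

console.log(safeSqrt(9)); // 3
// safeSqrt(-1) and safeSqrt(NaN) both throw a "precondition failed" error.
```

The wrapped function stays "lazy": it accepts only what it promised to handle and fails loudly on anything else.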

Section Challenges

  • [ ] Points to ponder: If DBC is so powerful, why isn’t it used more widely? Is it hard to come up with the contract? Does it make you think about issues you’d rather ignore for now? Does it force you to THINK!? Clearly, this is a dangerous tool!
  • [ ] Exercise 14 Design an interface to a kitchen blender. It will eventually be a web-based, IoT-enabled blender, but for now we just need the interface to control it. It has ten speed settings (0 means off). You can’t operate it empty, and you can change the speed only one unit at a time (that is, from 0 to 1, and from 1 to 2, not from 0 to 2). Here are the methods. Add appropriate pre- and postconditions and an invariant.
int getSpeed()
void setSpeed(int x)
boolean isFull()
void fill()
void empty()
  • [ ] Exercise 15 How many numbers are in the series 0, 5, 10, 15, …, 100?

Dead Programs Tell No Lies

It’s easy to fall into the “that can’t happen” mentality: “Does my switch statement really need a default case?!” But we’re coding defensively; we make sure the data is what we think it is, the code in production is the code we think it is, the correct dependency versions were loaded, etc.

The application code shouldn’t be eclipsed by the error handling code. If the caller has to catch every form of exception and raise the appropriate error, the code is coupled: if the author of the called function adds another exception, the caller is subtly out-of-date.

The Erlang and Elixir languages embrace a “Crash Early” (crash, don’t trash) philosophy.

Defensive programming is a waste of time. Let it crash!—Joe Armstrong

In these environments, crashes are managed with supervisors, which are responsible for cleaning up after a crashed process, restarting it, etc. Supervisors are themselves supervised, creating a design of supervisor trees. This technique creates high-availability, fault-tolerant systems. Crashing might not always be appropriate: you may have allocated resources that need freeing, or need to log messages, finish transactions, etc.

Still, if the “impossible” happens, your program is no longer viable, so terminate it as soon as possible. A dead program normally does a lot less damage than a broken one.

Assertive Programming

We deceive ourselves when we say “This can never happen…”. Use assertions to prevent the impossible. Whenever you find yourself thinking “but of course this could never happen”, add code to check it. Assertions, however, do not replace error handling.

  • Be careful of side effects when making assertions, like calling .next() on an iterator.
  • Leave assertions on in prod!
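
The side-effect warning is easy to show with an iterator. The check helper below is a hand-rolled stand-in for whatever assertion function you use.

```javascript
function check(condition, message) {
  if (!condition) throw new Error(message);
}

const queue = ['a', 'b'][Symbol.iterator]();

// Risky: calling next() inside the assertion would consume 'a', so the code
// after it never sees that value, and behavior changes if the check is removed.
// check(!queue.next().done, 'queue must not be empty');

// Safer: perform the side effect once, then assert on the saved result.
const first = queue.next();
check(!first.done, 'queue must not be empty');

console.log(first.value); // a
```

The assertion now only reads a value, so adding or removing it cannot change what the rest of the code sees.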

Section Challenges

  • [ ] Exercise 16 A quick reality check. Which of these “impossible” things can happen?
    • A month with fewer than 28 days
    • Error code from a system call: can’t access the current directory
    • In C++: a = 2; b = 3; but (a + b) does not equal 5
    • A triangle with an interior angle sum ≠ 180°
    • A minute that doesn’t have 60 seconds
    • (a + 1) <= a

How to Balance Resources

In short, be careful when allocating resources (like opening a file). Don’t couple functions tightly together by sharing a file resource that one opens and the other closes. Instead, act locally. Some languages have fail-safes for closing filesystem resource references automatically, like Java’s try-with-resources statements. General advice:

  • Deallocate resources in the reverse order of their allocation so you don’t orphan resources if one contains a reference to another.
  • When allocating the same set of resources throughout your codebase, use the same ordering. This reduces the possibility of deadlock, e.g., process A claims resource 1 and wants resource 2, but process B claims resource 2 and wants resource 1, causing both to hang.
  • The resource could be transactions, network connections, memory, files, threads, windows, etc.
  • Consider balancing over time. Applied to log files, you might ask:
    • Do you rotate the logs and clean them up?
    • How do you handle the finite space you have for logs?
    • What are you doing with your unofficial debug logs?
    • If using a DB, do you expire the records?
  • Object oriented languages can wrap the resource usage in a class. When the class representing the resource is constructed, you allocate the resource; when destructed (and garbage collected, maybe) you deallocate. This can really help when the language you’re using allows exceptions to interfere with resource deallocation.
  • How to ensure that you deallocate resources if there’s an exception? Generally two choices:
    • Use variable scope.
    • Use a finally clause (of try...catch...finally).
  • Sometimes you cannot balance resources through the resource allocation pattern. Try to establish a semantic invariant for memory allocation. Who is responsible for the data in an aggregate data structure? Three main options:
    • Top-level structure is responsible for freeing any substructures it contains. These structures recursively delete data they contain.
    • Top-level structure is deallocated and structures that it points to are orphaned.
    • Top-level structure refuses to deallocate itself if it contains any substructures.
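
A sketch of the finally-clause option combined with reverse-order release. The acquire helper and the resource names are fake; they only record what happens so the ordering is visible.

```javascript
// Hypothetical resources that just record what happens to them.
const events = [];
function acquire(name) {
  events.push(`open ${name}`);
  return { name, release: () => events.push(`close ${name}`) };
}

// The function that allocates each resource is also the one that releases it.
function copyReport() {
  const connection = acquire('connection');
  try {
    const file = acquire('file');
    try {
      throw new Error('disk full'); // simulate a failure while both are held
    } finally {
      file.release(); // released first, because it was acquired last
    }
  } finally {
    connection.release(); // released last, because it was acquired first
  }
}

try {
  copyReport();
} catch (error) {
  events.push(`failed: ${error.message}`);
}

console.log(events);
// [ 'open connection', 'open file', 'close file', 'close connection', 'failed: disk full' ]
```

Both resources are released even though the work in the middle threw, and the caller still gets to see the error.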

Section Challenges

  • [ ] Although there are no guaranteed ways of ensuring that you always free resources, certain design techniques, when applied consistently, will help. In the text we discussed how establishing a semantic invariant for major data structures could direct memory deallocation decisions. Consider how Topic 23, Design by Contract, could help refine this idea.
  • [ ] Exercise 17 Some C and C++ developers make a point of setting a pointer to NULL after they deallocate the memory it references. Why is this a good idea?
  • [ ] Exercise 18 Some Java developers make a point of setting an object variable to NULL after they have finished using the object. Why is this a good idea?

Don’t Outrun Your Headlights

Take small steps—always. The rate of feedback you can receive is your speed limit. Feedback is what independently confirms or disproves your action. Steps too large are those that require any “fortune telling”. Fortune telling feels like:

  • Estimate completion dates months in the future.
  • Plan a design for future maintenance or extendability.
  • Guess users’ future needs.
  • Guess future tech availability.

Designing for future maintenance only works up to a point—only as far ahead as you can see. If you’re aiming further than that, instead, design code that’s easy to change. Make it easy to delete.
