All posts by bondy

Interested in end-user computing, particularly in relation to deployment and scripting. Site is as much a reminder for myself about how I fixed stuff as it is to help others....!

KB5025885 (Black Lotus) – Making Everything Work After It Breaks

The rumours about Microsoft's enforcement of the Black Lotus bootkit mitigations have been around since May 2023, but so far Microsoft have (sensibly) held back enforcement. And with good reason – the mitigations, once applied, are known to cause all sorts of problems with booting: PXE boot, USB boot, CD, even SecureBoot itself. As you can imagine, this could become a support headache, especially when applied across enterprise customers.

I have recently had the dubious pleasure of investigating this and assessing the impact it might have when it's forcibly imposed in some future KB (incidentally, no concrete date has been set from what I've heard so far, but possibly Q1 2025, although this date keeps slipping).

In any case, I thought I'd share the benefit of my experience for those struggling to get things working, particularly PXE booting via WDS. In all likelihood, the machines you're preparing already have the 'payload' in place – it just needs activating. This is achieved via several registry updates and reboots, a process which can take the best part of half an hour. Quite a pain but, as I say, I would expect this to be handled in a more streamlined way in a future KB. Once the mitigations have been applied there shouldn't be any immediately noticeable issues with the machine, although some have reported being unable to boot into the OS. If this happens, you may wish to temporarily disable SecureBoot or, alternatively, apply a workaround from the linked article above. However, I wouldn't expect this to happen and I haven't personally come across it yet.
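
For reference, the activation is driven by the AvailableUpdates value under the Secureboot key. Below is a minimal sketch of the first two steps as I understand them from KB5025885; the flag values and reboot choreography change as Microsoft moves through the enforcement phases, so verify them against the current KB before deploying this anywhere that matters.

# Sketch only - apply the PCA2023 certificate to the DB (step 1 of KB5025885), then reboot
# as the KB describes and let the servicing task run before moving on
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Secureboot" -Name "AvailableUpdates" -Value 0x40 -Type DWord
Restart-Computer
# Next pass (once the DB update has applied): install the 2023-signed boot manager
# Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Secureboot" -Name "AvailableUpdates" -Value 0x100 -Type DWord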

However, you will start to notice problems when you attempt to PXE boot or boot off some other media. The reason is that the certificates in the UEFI firmware have now been updated from 2011 to 2023, so boot binaries signed with the unmatched (old) certificates fail the integrity check. There are three areas we have to update:

  • The Boot Image
  • The WDS/Native SCCM PXE Binaries
  • The OS Media

Boot Image

This should be the first port of call. When creating the image, make sure you have installed ADK 10.1.26100.1 (May 2024) and, in particular, the WinPE add-on. This version contains the latest certificates within the winpe.wim file and will ensure compatibility/bootability(!).

Windows Deployment Server

  1. Ensure the server has a recent KB update (at least June 2024). This ensures it has a copy of the PXE boot binaries signed with the new certs available.
  2. Copy C:\Windows\System32\RemInst\boot_EX\x64\wdsmgfw_EX.efi E:\RemoteInstall\Boot\x64\wdsmgfw.efi
  3. Copy C:\Windows\System32\RemInst\boot_EX\x64\en-US\wdsmgfw_EX.efi.mui E:\RemoteInstall\Boot\x64\en-US\wdsmgfw.efi.mui
  4. Copy C:\Windows\System32\RemInst\boot_EX\x64\en-US\bootmgfw_EX.efi.mui E:\RemoteInstall\Boot\x64\en-US\bootmgfw.efi.mui
  5. Copy C:\Windows\System32\RemInst\boot_EX\x64\bootmgfw_EX.efi E:\RemoteInstall\Boot\x64\bootmgfw.efi

Strictly speaking, the .mui files aren't necessary, but belt and braces. What we're doing here is replacing the old binaries with the new 2023-signed binaries supplied in the latest KB. If you have more than one WDS server, obviously ensure all are updated appropriately. In our environment we also had one or two hard-coded boot paths which caused further confusion, but I wouldn't expect that normally.
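
If you'd rather script the copies above, here's a minimal PowerShell sketch; the paths assume the default RemInst location and an E: RemoteInstall share, so adjust to suit your environment.

# Copy the 2023-signed (_EX) boot binaries from the latest KB over the WDS versions
$src = "C:\Windows\System32\RemInst\boot_EX\x64"
$dst = "E:\RemoteInstall\Boot\x64"
Copy-Item "$src\wdsmgfw_EX.efi" "$dst\wdsmgfw.efi" -Force
Copy-Item "$src\bootmgfw_EX.efi" "$dst\bootmgfw.efi" -Force
Copy-Item "$src\en-US\wdsmgfw_EX.efi.mui" "$dst\en-US\wdsmgfw.efi.mui" -Force
Copy-Item "$src\en-US\bootmgfw_EX.efi.mui" "$dst\en-US\bootmgfw.efi.mui" -Force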

PXE via Native SCCM

If you use the native PXE responder in SCCM rather than WDS, follow the steps below:

  1. Download the Windows 11 24H2 Enterprise ISO and mount it. This contains the new binaries we need.
  2. Browse to <MOUNTPOINT>\sources and copy boot.wim to a new folder (D:\Updates).
  3. Mount the boot image, eg Dism /mount-wim /wimfile:D:\Updates\boot.wim /index:1 /mountdir:D:\Updates\MOUNT
  4. Copy the following files to your update directory:
     D:\Updates\MOUNT\Windows\Boot\EFI_EX\bootmgfw_EX.efi
     D:\Updates\MOUNT\Windows\Boot\PXE_EX\wdsmgfw_EX.efi
  5. Rename bootmgfw_EX.efi to bootmgfw.efi and wdsmgfw_EX.efi to wdsmgfw.efi, then unmount the boot image with the /discard option.
  6. Copy the renamed files to D:\SMS_DP$\sms\bin\SMSBoot\<PkgID>\x64 on your PXE server (substitute the drive letter if necessary).
  7. You should be able to boot an updated machine now – see the consolidated sketch below if you'd rather script it.
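
Here's a consolidated sketch of steps 3–6, assuming the D:\Updates working folder from above; the package ID and DP drive letter are placeholders you'll need to substitute.

# Mount the copied boot.wim, pull out the 2023-signed binaries, rename them and tidy up
$updates = "D:\Updates"
$mount = "$updates\MOUNT"
New-Item -ItemType Directory -Path $mount -Force | Out-Null
Dism /Mount-Wim "/WimFile:$updates\boot.wim" /Index:1 "/MountDir:$mount"
Copy-Item "$mount\Windows\Boot\EFI_EX\bootmgfw_EX.efi" "$updates\bootmgfw.efi"
Copy-Item "$mount\Windows\Boot\PXE_EX\wdsmgfw_EX.efi" "$updates\wdsmgfw.efi"
Dism /Unmount-Wim "/MountDir:$mount" /Discard
# Finally, copy the renamed files to the PXE-enabled DP (substitute your drive letter and package ID):
# Copy-Item "$updates\bootmgfw.efi","$updates\wdsmgfw.efi" "D:\SMS_DP$\sms\bin\SMSBoot\<PkgID>\x64"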

NOTE: wdsmgfw_EX.efi and bootmgfw_EX.efi can also be found in the boot image in the May 24 ADK. I had no luck getting the versions included there to work correctly, even though the certificates stated 2023, so it's recommended you stick with the versions in the latest 24H2 ISO.

OS Media (Task Sequence)

So… we should now be working, right? Not so fast. You should now be in a position to boot from WDS, but if your OS media is out of date you will also be in trouble. If this is the case, you'll find your task sequence failing once the OS has been laid down, with bcdboot.exe unable to find the files it needs. From an OS media perspective you will need a minimum of the following:

  1. Windows 11 23H2 OS 22621.3737
  2. Windows 10 22H2 OS 19044.4529

If you ensure these or later versions are part of your task sequence, everything should be rosy.
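
If you're not sure what build your current OS media is at, DISM can tell you. A quick check (the wim path and index below are just examples – pick the index that matches your edition):

# Show the detailed image info (including build/version) for an image inside install.wim
Dism /Get-WimInfo /WimFile:"D:\OSMedia\sources\install.wim" /Index:1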

OS Media (ISO Image)

Sometimes we need an ISO image to boot from. The instructions below should help when rebuilding an ISO so that it boots on machines with the mitigations applied:

On SCCM – Upgrade to latest ADK and create a bootable ISO

Pre-requisites (on a client machine, eg Win 11):
Install the latest Windows ADK (10.1.26100.1, May 2024) on your system, ensuring you include the Deployment Tools.
Obtain the most recent Windows 11 23H2/24H2 installation media through your preferred distribution channel, such as the Volume Licensing portal.
This latest installation media contains the UEFI 2023 CA signed boot managers within the boot.wim file.

Once you have installed ADK, open Deployment and Imaging Tools Environment with administrative privileges.
Create the following directories:
mkdir C:\UpdateWinMedia\DVD
mkdir C:\UpdateWinMedia\Boot
mkdir C:\UpdateWinMedia\Mount

Mount the Windows installation ISO media on your system. In this example, the drive letter is F:

Copy the contents of the Windows installation media (F:) to C:\UpdateWinMedia\DVD
xcopy /s /h F:\ C:\UpdateWinMedia\DVD

Mount the Boot.wim file to extract updated boot files
Dism /mount-wim /wimfile:C:\UpdateWinMedia\DVD\sources\boot.wim /index:1 /mountdir:C:\UpdateWinMedia\Mount

xcopy /s /h C:\UpdateWinMedia\Mount\Windows\Boot C:\UpdateWinMedia\Boot
Dism /unmount-wim /discard /mountdir:C:\UpdateWinMedia\Mount

Create bootable DVD using oscdimg tool with the UEFI 2023 CA signed efisys.bin
oscdimg -m -o -u2 -udfver102 -pEF -b"C:\UpdateWinMedia\Boot\DVD_EX\EFI\en-US\efisys_EX.bin" C:\UpdateWinMedia\DVD C:\UpdateWinMedia\Windows11_UEFI2023.iso

Boot using the updated ISO and install Windows 11.

Hope the above is useful to someone, certainly cost me a good few hours of my life!

Azure Site-To-Site VPN – Lab Setup

In the fine traditions of this site, I am not going to go into the minutiae of every aspect of this or why we do it. The goal here is to get it up and running as quickly as possible with as few steps as possible. Whether I achieve this or not, you'll have to be the judge; suffice to say there are some basic steps I assume you will be able to do. So let's get cracking.

1. Create a resource Group (eg RG_S2SVPN)
2. Create a VNet (eg vnet_s2svpn – 10.0.0.0/16)
3. Create a Subnet (eg Subnet1, 10.0.0.0/24)
4. Create a VM on the subnet you just created (this will be used for testing connectivity later)
5. Create a Gateway Subnet (eg GatewaySubnet, 10.0.1.0/29)
6. Create a Virtual Network Gateway. This can be done manually in the portal like anything else, but not if you wish to use the Basic SKU. If you want the Basic SKU, update the code below if necessary and run it in Cloud Shell:

$location = "east us"
$resourceGroup = "RG_S2SVPN"
$VNetName = "vnet_s2svpn"
$VNGWPIPName = "s2svnetgw-ip"
$vnetgwipconfig = "vnetgwipconfig1"
$VNetGWName = "s2svnetgw-gw"
# Grab the VNet and the GatewaySubnet created earlier
$vnet = Get-AzVirtualNetwork -Name $VNetName -ResourceGroupName $resourceGroup
$subnet = Get-AzVirtualNetworkSubnetConfig -Name GatewaySubnet -VirtualNetwork $vnet
# Public IP for the gateway (Basic SKU, dynamic allocation)
$vnetgwPIP = New-AzPublicIpAddress -Name $VNGWPIPName -ResourceGroupName $resourceGroup -Location $location -Sku Basic -AllocationMethod Dynamic
$vnetgwIpConfig = New-AzVirtualNetworkGatewayIpConfig -Name $vnetgwipconfig -SubnetId $subnet.Id -PublicIpAddressId $vnetgwPIP.Id
# Create the route-based VPN gateway on the Basic SKU (creation can take a while)
New-AzVirtualNetworkGateway -Name $VNetGWName -ResourceGroupName $resourceGroup -Location $location -IpConfigurations $vnetgwIpConfig -GatewayType Vpn -VpnType RouteBased -GatewaySku Basic

7. Create a Local Network Gateway (eg OnPremGateway; IP = <Physical Internet Router IP> – hint: search 'What's my IP' in Google; Address Space = your on-prem range, eg 192.168.0.0/24)
8. Create a local VPN router (typically on a server OS VM on your home network):
  – 'Configure and enable Routing and Remote Access'
  – Custom Configuration
  – Select 'VPN' and 'LAN routing'
  – Start Service
  – Click Network Interfaces | New Demand-Dial Interface
  – Configure:
    Name ('AzureS2S')
    Connect using VPN
    IKEv2
    Public IP of your VPN gateway in Azure
    Route IP packets on this interface
    Static route with a metric for your Azure subnet, eg 10.0.0.0 / 255.255.255.0 / Metric (eg 5)
    No need to specify any credentials
  – Click the new connection (AzureS2S) | Options | Persistent Connection
  – Security | Specify a password for the Pre-shared Key
9. Create a static route on your physical network/broadband router, pointing to the software router you created in (8). Different routers will have slightly different options but you should aim to provide the information below:
  – On the WAN options, select port forwarding
  – Enable this and add ports 500/1701/4500 (UDP)
  – For the internal IP address, give the IP of the router you created in (8)
10. In the portal, search for 'connections' (or see the PowerShell sketch after this list):
  – Basics: Create Site to Site (IPsec), bi-directional connectivity, name and region
  – Settings: Select the virtual and on-prem gateways and the pre-shared key from above. Leave the defaults, Create
11. From the local VPN router you set up in (8), right-click the connection you created and click 'connect'. If all is well, 'connection state' should change to 'connected' after a few seconds. The Azure portal connection should also show a 'connected' status after a refresh.
12. Now you have the connection in place, log into your Azure VM. For the purposes of testing, turn off the firewall (or at least let ICMP traffic through). You should be able to ping the VM on its local network IP (eg 10.0.0.4) from the router computer.
13. To communicate with your Azure VM from other machines on your local ('on prem'/lab) network, you will need to create a static route on those machine(s):
  • On the local machine in question, open an admin cmd prompt
  • ROUTE ADD 10.0.0.0 MASK 255.255.255.0 <IP of the VPN router from (8)> METRIC 5
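
For what it's worth, the local network gateway (step 7) and the connection (step 10) can also be created from Cloud Shell rather than the portal. A rough sketch using the names from earlier – the public IP, address space and pre-shared key are placeholders:

# Local network gateway representing the on-prem side (step 7)
$lng = New-AzLocalNetworkGateway -Name "OnPremGateway" -ResourceGroupName "RG_S2SVPN" -Location "east us" -GatewayIpAddress "<your public internet IP>" -AddressPrefix "192.168.0.0/24"
# Site-to-site connection between the virtual network gateway and the local network gateway (step 10)
$vnetgw = Get-AzVirtualNetworkGateway -Name "s2svnetgw-gw" -ResourceGroupName "RG_S2SVPN"
New-AzVirtualNetworkGatewayConnection -Name "AzureS2S-Connection" -ResourceGroupName "RG_S2SVPN" -Location "east us" -VirtualNetworkGateway1 $vnetgw -LocalNetworkGateway2 $lng -ConnectionType IPsec -SharedKey "<your pre-shared key>"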

Try pinging the VM again… you should now be able to communicate with your Azure VM. You can browse shares on it, drop files, etc, just as you would with a machine on your local network/lab (you'll need to provide the appropriate credentials, obvs).

Relaxing Intune BitLocker policy for removable disks

A quick post about an issue I ran into today whilst trying to create an OSDCloud USB stick (BTW, anyone interested in cloud imaging, I thoroughly recommend checking out David Segura's site: OSDCloud.com).

Anyway, I created a new image on my Intune-managed laptop and, as you might expect, we have BitLocker policies for drive encryption. However, by default this will also ask the user either to encrypt the drive to allow writing to it, or not to do so and open it in read-only mode. Given that I needed a bootable USB drive, the encryption option wasn't going to work for me. Digging through the policy settings, I eventually came to a setting called Deny write access to removable drives not protected by BitLocker, which needed to be set to disabled. After several minutes/syncs and a few restarts (yes, I'm impatient), the previously greyed-out 'paste' option when I selected the drive appeared, and for all intents and purposes I figured all would now be well. Unfortunately not.

At this point I was scratching my head a bit until I noticed a file on my desktop called WriteAccessUSB.reg. I guess I must have run into a similar issue in the past and this did the trick. Open regedit and browse to the following location:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Policies\Microsoft\FVE

Add/change the following setting:

"RDVDenyWriteAccess"=dword:00000000

Finally just remove and replace your USB drive (no need to restart) and it should be readable.
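
If you prefer PowerShell to a .reg file, by the way, the equivalent of the value above is something like:

# 0 = do not deny write access to removable drives not protected by BitLocker
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FVE" -Name "RDVDenyWriteAccess" -Value 0 -Type DWord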

Generic exception : ImportUpdateFromCatalogSite failed. Arg = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. Error =The underlying connection was closed: An unexpected error occurred on a send.

I recently rebuilt my WSUS/SUP server and after running a sync, was presented with a sea of red giving me an error for each and every update it tried to sync.

Transpires this is a result of the (relatively) recent enforced strengthening of the TLS protocol by Microsoft. The fix is pretty simple though. Jump onto your WSUS server and run the command line below to configure the .NET Framework to support strong cryptography:

reg add HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319 /V SchUseStrongCrypto /T REG_DWORD /D 1
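
On a 64-bit server it may also be worth setting the same value under the WOW6432Node path so 32-bit .NET processes are covered; a small PowerShell sketch, assuming both hives apply in your case:

# Enable strong cryptography for both 64-bit and 32-bit .NET Framework 4.x processes
$paths = @(
    "HKLM:\SOFTWARE\Microsoft\.NETFramework\v4.0.30319",
    "HKLM:\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319"
)
foreach ($p in $paths) {
    Set-ItemProperty -Path $p -Name "SchUseStrongCrypto" -Value 1 -Type DWord
}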

Now resync your updates and all should be well.

NB

I also ran into an issue after this whereby wsyncmgr.log showed all the updates as 'synchronized' (or appeared to), but no metadata appeared in the console. To fix this, I unchecked all products and categories, synced again, then rechecked those I needed. I ran the sync once again and they all appeared.

SCCM Content Distribution Broken – WMI?

There can of course be many reasons for broken ConfigMgr content distribution – lack of space, physical security on disks and many, many others. This is one possibility though – can the site server actually reach the DP through WMI? If not, this will undoubtedly cause problems.

This happened to my infrastructure, I suspect, through a patch deployment. See here for more information. Anyway, to test if this is an issue, run up a session of WBEMTEST and connect to the DP in question from your site server via:

\\<ConfigMgr DP>\root\CimV2

Assuming you’re getting ‘Access Denied’ (typically with an 80004005 error) this may well be the fix you’re looking for. You will also see the following in the SYSTEM eventlog of the DP:

The server-side authentication level policy does not allow the user XXX from address XXX to activate DCOM server. Please raise the activation authentication level at least to RPC_C_AUTHN_LEVEL_PKT_INTEGRITY in client application.
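
As a scripted alternative to WBEMTEST, by the way, you can exercise the same WMI-over-DCOM path from the site server with something like the line below (the DP name is a placeholder; Get-WmiObject uses DCOM, which is exactly what fails here):

# Quick remote WMI/DCOM check from the site server - replace the DP name with your own
Get-WmiObject -ComputerName "ConfigMgrDP01.contoso.com" -Namespace "root\CimV2" -Class Win32_OperatingSystem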

You'll likely also see corresponding errors in the Distribution Manager status messages.

SOLUTION:

In REGEDIT, browse to

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Ole\AppCompat

Create a DWORD value:

RequireIntegrityActivationAuthenticationLevel

Give this a value of 0, then restart the machine.
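
Scripted, the same fix looks roughly like this (run on the affected DP, then let it reboot):

# Create the AppCompat key if it doesn't already exist, then add the override described above
New-Item -Path "HKLM:\SOFTWARE\Microsoft\Ole\AppCompat" -Force | Out-Null
Set-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Ole\AppCompat" -Name "RequireIntegrityActivationAuthenticationLevel" -Value 0 -Type DWord
Restart-Computer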

You should now be able to successfully connect via WMI, as will your site server.

Configuration Manager can’t connect to the administration service

The configuration manager console can’t connect to the site database through the administration service on server

I am looking to test out one or two features which rely on MECM’s Administration Service so was somewhat disappointed when I got the error above whenever I clicked on the respective nodes. Mine is a fully PKI environment and my initial suspicion was that it was certificate-related. Having spent several hours tinkering with the certificates and messing with IIS and getting nowhere I decided to sleep on it…

The first thing I noticed was that the SMS_REST_PROVIDER.log hadn’t logged anything for over a month so something must be broken. I went to the SMS_REST_PROVIDER component on the server with the SMS Provider role and noticed I was unable to query/start the component. Looking at the status messages, it was constantly trying to reinstall and failing. A little more detective work and I found a possible lead to DCOM security, so I opened DCOMCNFG, expanded Component Services, expanded Computers, and then expanded My Computer. First red flag I saw was that there was clearly an error ‘bang’ on the My Computer icon. Anyway, I persevered and right-clicked it and selected MSDTC whereby I got an error:

“The remote server has been paused or is in the process of being started.”

This led me to another post which was talking about a cluster configuration receiving the same error message. This got me thinking… I don't have a cluster, what's this on about? Anyway, I went back and checked the MECM box and it transpired I did have an old cluster I'd set up ages ago and forgotten about – and I'd since deleted one of its nodes! This was no longer required, so I simply ran a couple of PowerShell commands:

Remove-Cluster -Force

Remove-WindowsFeature Failover-Clustering -Restart

After restarting, I checked DCOMCNFG and the My Computer icon no longer had the bang in place. Nice. I looked at the console but still no joy. It was still telling me the Admin Service was unavailable 🙁

I nonetheless sensed I was close. I went back to the DCOMCNFG applet and went down to the Distributed Transaction Coordinator node, under which there is another node called Local DTC. I right-clicked this and went to the Security tab. I was interested to see whether the DTC logon account was correct. Unfortunately, it was (it should be NT AUTHORITY\NetworkService, by the way), so another dead end. This time, however, I tried selecting the Network DTC Access check box and opened up the MECM console again. I clicked on the Console Extensions node and this time there was a short pause and everything appeared!

One weird thing I noticed: I was able to uncheck the Network DTC Access check box and the admin service seemed to remain in place without error. I will monitor this, but from my observations so far it seems it just needed temporary access.

UPDATE:

Following the above, I found that a remote console I was using kept crashing. I had to re-check the Network DTC Access box before it would load correctly. Further, it appears this checkbox should be kept checked, as the console will begin to crash again over time if it's opened without it.

XML Parsing Error at line XXXXX char : Operation Aborted: MaxXMLSize constraint violated

Been a while since I last posted, but I ran into an issue today that had everyone confused, as it was quite a tricky one to track down. There had been a number of changes in the environment over the last few weeks, so each and every one of them was examined in microscopic detail. Let me explain…

We started to see a few hundred machines fail on a certain application (actually just a simple PowerShell script packaged as an application) during the OSD build task sequence. As it happens this app was part of a nested TS, but that's probably irrelevant. In any case, some machines were fine, others were failing. Nobody had touched the app in any way for several months.

After much digging and many red herrings, tucked away in the SMSTSLOG.log was the following message :

XML Parsing Error at line 473689 char 52: Operation Aborted: MaxXMLSize constraint violated.

The cause of this error was down to ‘too much policy’. Basically the affected machines had a lot of Defender Updates deployed to them and it was essentially too much for the machines to handle. Once removed everything started to work again.

If you’re pulling your hair out and can’t figure out why something is failing, then there are thousands of possibilities, admittedly. But it might be worth a quick search for the words XML Parsing Error.

Script Status Message queries!

If, like me, you spend more than your fair share of time searching through status messages to figure out what broke in the deployment over the weekend, then you’ll know what an arduous process it can be putting the criteria into each query. If you have a good few machines to check then you literally spend half your time typing in machine names and times.

Well, no more, because did you know it is perfectly possible to script this? Status Message Viewer (statview.exe) is simply an executable and, with the right parameters and the correct time format applied, you can pull the status messages from as many machines as you see fit (although I'd recommend you limit this to no more than 15-20 at a time).

One observation when running this against multiple machines is that you’ll notice some of the status messages won’t always contain as much info as you expect – simply refresh the status message and all info will display as expected.

Finally, create a text file containing a list of the machines you wish to take status messages from and use the path as a parameter along with the date from which you wish to obtain the messages, in the format YYYY-MM-DD.

Please note this script assumes you have installed the ConfigMgr admin console on the machine on which you run the script, and in the default location. If you have installed it elsewhere, please change the statview.exe path accordingly.

# Pull status messages for a list of machines via Status Message Viewer (statview.exe)
Param(
 [string]$path,
 [string]$date
 )
If($date -eq "" -or $path -eq "")
 {
     Write-Host "File path and date must be supplied as parameters.
     Example:
     -path C:\Temp\Computers.txt
     -date 2021-04-09"
     exit
 }
# Default admin console install location - adjust if yours differs
$command = "C:\Program Files (x86)\Microsoft Configuration Manager\AdminConsole\bin\i386\statview.exe"
$siteServer = "SCCMSiteSvr.contoso.com"
$startDate = Get-Date -Format "yyyyMMddHH:mmm.000" -Date $date
$Computers = Get-Content $path
foreach($compName in $Computers)
{
    # Pass each switch as its own argument so values containing spaces survive quoting
    $commandArgs = @(
        "/SMS:Server=\\$siteServer\",
        "/SMS:SYSTEM=$compName",
        "/SMS:COMPONENT=Task Sequence Engine",
        "/SMS:COMPONENT=Task Sequence Action",
        "/SMS:SEVERITY=ERROR",
        "/SMS:SEVERITY=WARNING",
        "/SMS:SEVERITY=INFORMATION",
        "/SMSSTARTTIME=$startDate"
    )
    & $command $commandArgs
}

in-line script execution time-out…

Had this recently on a machine we were upgrading to Win 10 1909. Initially it looked as though there was an issue detecting that the application had installed correctly, but on closer inspection the AppDiscovery log file revealed the same timeout issue was happening on several applications. Googling about, there were quite a few posts on how later versions of ConfigMgr incorporate a client property to change the script timeout setting, but sadly this appeared not to be the case here. Other posts suggested a script that could be run at server level to fix this – not really the short-term fix I needed, as it would doubtless take weeks to get the change through at work.

Then I found what I needed – a client-side script which I have now lost the source to, so really sorry if this came from you; I'm happy to set the record straight and link as needed. In any case, I do have the script itself, see below. This will set the timeout to 1200 seconds (from the 60s default). This fixed my issue, and I would imagine it could be added to the start of a task sequence if required. Note it's a VBScript… old skool.

On Error Resume Next
' Bump the client's in-line script execution timeout to 1200 seconds in the local policy
strQuery = "SELECT * FROM CCM_ConfigurationManagementClientConfig"
Set objWMIService = GetObject("winmgmts:\\" & "." & "\ROOT\ccm\Policy\Machine\ActualConfig")
Set colItems = objWMIService.ExecQuery(strQuery, "WQL")
For Each objItem in colItems
    objItem.ScriptExecutionTimeOut = 1200
    objItem.Put_()
Next

' Re-read the setting to confirm the change took effect
Set objWMIService = GetObject("winmgmts:\\" & "." & "\ROOT\ccm\Policy\Machine\ActualConfig")
Set colItems = objWMIService.ExecQuery(strQuery, "WQL")
For Each objItem in colItems
    If 1200 = objItem.ScriptExecutionTimeOut Then
        WScript.Echo "True"
    Else
        WScript.Echo "False"
    End If
Next

Timed out waiting for CcmExex service to be fully operational

The misspelling above is intentional BTW; this is how it appears in the SMSTSLOG.log file. Typically a task sequence will be ticking along, then something will happen after which the above error is displayed, almost always when it is trying to install either a ConfigMgr package or a ConfigMgr application (ie not a command line). This is because the client isn't actually required in order to execute, for example, a PowerShell command line, but it must be up and running if a package or application is called upon.

Ultimately, in my experience, this always comes down to one issue – an inability to reach the Management Point. There may be various reasons for this, and one such reason is described here.

However in my case this wasn’t the problem. Following a task sequence upgrade to 1909, I found all our laptops failing with this same error. These laptops were all on the internal network (ie not connected via VPN/DA, etc). If I logged on locally, I found that while I was able to ping machines, I was unable to reach shares from other machines on the network. Something was very wrong with networking (and obviously why the build was failing).

Running an IPCONFIG /all revealed that the broken laptops were all trying to use the IP-HTTPS tunnel adapter interface. This synthetic interface is typically used for DirectAccess, which these laptops were certainly not using at the time. On further investigation, if I removed the group policies the laptops inherited for the duration of the upgrade, I was then able to complete the upgrade without issue.

THE SOLUTION

The group policies cause the tunnel adaptor to become active, albeit briefly (I haven't got to the bottom of why this happens, BTW). Unfortunately, the adaptor isn't able to communicate with the MP as it should. Then I read an article about a bug found in the 1909/2004 OSes: you must ensure the relevant patch (the Oct 2020 update) is applied to the installation media prior to upgrade. Essentially, certificates were disappearing, causing the communication problem with the tunnel adaptors on the laptop models. Once the patch was added, all was well.